Event-Level Digital Analytics using Google Analytics, Fivetran, BigQuery and Looker
A few weeks ago I posted a blog on using Mixpanel, Looker and Google BigQuery to analyze listener data from the Drill to Detail website, with Mixpanel tracking individual episode play events by visitors to the site and Fivetran replicating that event data over to BigQuery for analysis using Looker.
Mixpanel is great but like most digital businesses using Google Analytics (GA) to analyze and measure activity on their website, we actually use Google Tag Manager and GA events to track visitor activity on the mjr-analytics.com website. Whilst a full site license for Google Analytics 360 is way beyond our budget right now, the free version of GA together with GTM provides us with an unsampled, event-level stream of visitor events and together with goals and simple A/B testing tools a way to start optimizing the content on our site.
Couple all of this with Fivetran and Looker and we can start to do some interesting analytics on all this data to not only count sessions, visitors and pages viewed but also start looking at visitor retention and the routes those visitors take through the various features on our website.
Events in Google Analytics are a way of recording interactions visitors to your site make with items on a page such as selecting an item on a navigation menu, clicking on a download link or seeing a special offer or product recommendation. On the mjr-analytics.com website we’ve setup events, for example, to record how far down the page a visitor scrolls so we can see how many people actually scroll past the hero image on our front page and look at the content on Looker services, Oracle services and so on.
Another event records clicks on the specific hero image a visitor clicks on with another recording what navigation menu item was clicked on, in each case storing the image name or menu item clicked on as the event label.
Enabling GTM’s debug mode for our site shows us which events then trigger when visiting our site, with the screenshot below showing a typical visitor journey through our homepage and the events that are triggered as they scroll-through and interact with various items on the page.
Looking at the Data Layer view in Google Tag Manager you can see the event entries recorded for this visitor to the site page, but note also how nothing personally identifiable is recorded except for a unique client ID set at the device/browser level that can then be used to analyze new vs. returning visitors and track visitor retention and engagement over time.
As we then did with the Mixpanel example and the Drill to Detail website, we then use Fivetran to replicate all of the data layer events over to Google BigQuery, storing this granular visitor behavioural data in a BigQuery table and dataset. As you can see from the screenshot below we use Fivetran extensively within MJR Analytics; replicating invoices and all our other accounting data from Xero into BigQuery along with timesheet data from Harvest, CRM data from Hubspot, commits and PRs from Github, tasks outstanding and completed from Asana and search rankings and keywords from Google Search Console all into a Google BigQuery project whose datasets we then combine and analyze together using Looker.
For the event data coming across from GA we use Fivetran to replicate the relevant event tracking dimensions and metrics into BigQuery every five minutes, giving us near real-time numbers on what’s popular and what’s engaging on our company website.
As the event data from GA arrives as table rows of individual events being triggered, I then sessionize those individual events into rows of visitor sessions, using BigQuery SQL to pivot the incoming data and create individual columns for the first event, second event, first page view and first product item clicked on, for example, within an individual visitor session. The actual SQL used for the view is too long really to include here but to get the idea, here’s a condensed version with repeating parts removed for clarity.
SELECT global_session_id, visitor_session_id, visitor_id, visitor_session_start_date_time, MAX(visitor_session_length_mins) AS visitor_session_length_mins, visitor_first_event_week_cohort, city, session_events_count, ... event_1_category, event_1_label, ... page_view_1, page_view_2, ... FROM ( SELECT e.global_session_id, e.visitor_session_id, e.visitor_id, e.visitor_session_start_date_time, e.visitor_session_length_mins, e.visitor_first_event_week_cohort, e.city, MAX(e.visitor_session_event_seq_num) OVER (PARTITION BY e.global_session_id) AS session_events_count, ( SELECT COUNT(*) FROM `aerial-vehicle-148023.dw_staging.ga_events` WHERE global_session_id = e.global_session_id AND event_category = 'Page View') AS page_view_count, SELECT COUNT(*) FROM `aerial-vehicle-148023.dw_staging.ga_events` WHERE global_session_id = e.global_session_id AND event_category = 'Nav Bar Clicks') AS nav_bar_click_count, ..., e1.event_label AS event_1_label, e2.event_category AS event_2_category, e2.event_label AS event_2_label, ... pv.event_label AS page_view_1, pv2.event_label AS page_view_2, ..., igc.event_label AS index_gallery_first_click FROM `aerial-vehicle-148023.dw_staging.ga_events` e LEFT OUTER JOIN ( SELECT global_session_id, event_label, ROW_NUMBER() OVER (PARTITION BY global_session_id ORDER BY visitor_session_event_seq_num) page_view_seq_num FROM `aerial-vehicle-148023.dw_staging.ga_events` WHERE event_category = 'Page View' ORDER BY 3) pv ON e.global_session_id = pv.global_session_id AND pv.page_view_seq_num = 1 LEFT OUTER JOIN ( SELECT global_session_id, event_label, ROW_NUMBER() OVER (PARTITION BY global_session_id ORDER BY visitor_session_event_seq_num) page_view_seq_num FROM `aerial-vehicle-148023.dw_staging.ga_events` WHERE event_category = 'Page View' ORDER BY 3) pv2 ON e.global_session_id = pv2.global_session_id AND pv2.page_view_seq_num = 2 ... JOIN `aerial-vehicle-148023.dw_staging.ga_events` e1 ON e.global_session_id = e1.global_session_id AND e1.visitor_event_seq_num = 1 LEFT OUTER JOIN `aerial-vehicle-148023.dw_staging.ga_events` e2 ON e.global_session_id = e2.global_session_id AND e2.visitor_event_seq_num = 2 ... GROUP BY 1, 2, ..., 46 ORDER BY 1
Creating a view or materialized table from this BigQuery SQL then gives me one record per session with event totals, page view and event paths along with session and client IDs that are perfect then for bringing into Looker for further analysis.
Then finally it’s just a case of bringing that table of data into Looker as a LookML view and using one of Looker’s custom visualization blocks to display visitor event paths as a Sankey diagram, for example.
Or we can aggregate and analyze counts of events from sessions to see in graphical form which offers, download items and navigation menu items are most popular over a given period.
If you’d like to understand more about how MJR Analytics can help you better understand what visitors are looking at and interacting with on your website then drop me an email at email@example.com; or even better, if you’re in Copenhagen on November 22nd and have the evening free, come along to the free meetup we’re hosting together with our partners CIMA and Looker and you’ll see me demo this all live, in person and show you how it works.