Customer Journey and Lifetime Value Analytics using Looker, Google BigQuery, Stitch and dbt
One of the most common analytics use-cases I come across on client projects is to understand the “customer journey”; the series of stages prospects and customers typically go through from becoming aware of a company’s products and services through to consideration, conversion, up-sell and repurchase and eventually, to churn.
Customers typically interact with multiple digital and offline channels over their overall lifetime, and if we take the model of a consulting services business such as Rittman Analytics, a typical customer journey and use of those touch-points might look like the diagram below.
Each customer and prospect interaction with those touch-points sends signals about their buying intentions, product and service preferences and readiness to move through to the next stage in the purchase funnel.
Creating visualization such as these in Looker typically involves two steps of data transformation either through a set of Looker (persistent) derived tables, or as we’ve done for our own operational analytics platform, through two dbt (Data Build Tool) models sequenced together as a transformation graph.
To start, we select a common set of columns from each digital touchpoint channel as brought in by our project data pipeline tool, Stitch; Hubspot for inbound and outbound marketing communctions together with sales activity and closes/loses; Harvest for billable and non-billable client days; Segment for website visits and so on. For each source we turn each type of customer activity they record into a type of event, details of the event and a monetary value (revenue, cost etc) that we can then use to measure the overall lifetime value of each client later on.
Note the expressions at the start of the model definition that define a number of recency measures (days since last billable day, last contact day and so on) along with another set that are used for defining customer cohorts and bucketing activity into months and weeks since those first engagement date.
{{config(materialized='table',partition_by='DATE(event_ts)')}}SELECT*,{{ dbt_utils.datediff('last_billable_day_ts', current_timestamp(), 'day')}} AS days_since_last_billable_day,{{ dbt_utils.datediff('last_incoming_email_ts', current_timestamp(), 'day')}} AS days_since_last_incoming_email,{{ dbt_utils.datediff('last_outgoing_email_ts', current_timestamp(), 'day')}} AS days_since_last_outgoing_email,{{ dbt_utils.datediff('last_site_visit_day_ts', current_timestamp(), 'month')}} AS months_since_last_site_visit_day,{{ dbt_utils.datediff('last_site_visit_day_ts', current_timestamp(), 'week')}} AS weeks_since_last_site_visit_dayFROM(SELECT*,ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_ts) AS event_seq,MIN(CASE WHEN event_type = 'Billable Day' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS first_billable_day_ts,MAX(CASE WHEN event_type = 'Billable Day' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS last_billable_day_ts,MIN(CASE WHEN event_type = 'Client Invoiced' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS first_invoice_day_ts,MAX(CASE WHEN event_type = 'Client Invoiced' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS last_invoice_day_ts,MAX(CASE WHEN event_type = 'Site Visited' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS last_site_visit_day_ts,MAX(CASE WHEN event_type = 'Incoming Email' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS last_incoming_email_ts,MAX(CASE WHEN event_type = 'Outgoing Email' THEN event_ts END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS last_outgoing_email_ts,MIN(event_ts){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS first_contact_ts,DATE_DIFF(date(event_ts),MIN(CASE WHEN event_type = 'Billable Day' THEN date(event_ts) END) {{ customer_window_over('customer_id', 'event_ts', 'ASC') }},MONTH) AS months_since_first_billable_day,DATE_DIFF(date(event_ts),MIN(CASE WHEN event_type = 'Billable Day' THEN date(event_ts) END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }},WEEK) AS weeks_since_first_billable_day,DATE_DIFF(date(event_ts),MIN(CASE WHEN event_type like '%Email%' THEN date(event_ts) END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }},MONTH) AS months_since_first_contact_day,DATE_DIFF(date(event_ts),MIN(CASE WHEN event_type like '%Email%' THEN date(event_ts) END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }},WEEK) AS weeks_since_first_contact_day,MAX(CASE WHEN event_type = 'Billable Day' THEN true ELSE false END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS billable_client,MAX(CASE WHEN event_type LIKE '%Sales%' THEN true ELSE false END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS sales_prospect,MAX(CASE WHEN event_type LIKE '%Site Visited%' THEN true ELSE false END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS site_visitor,MAX(CASE WHEN event_details LIKE '%Blog%' THEN true ELSE false END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS blog_reader,MAX(CASE WHEN event_type LIKE '%Podcast%' THEN true ELSE false END){{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS podcast_readerFROM-- sales opportunity stages(SELECTdeals.lastmodifieddate AS event_ts,customer_master.customer_id AS customer_id,customer_master.customer_name AS customer_name,deals.dealname AS event_details,deals.dealstage AS event_type,AVG(deals.amount) AS event_valueFROM{ } AS customer_masterLEFT JOIN{ } AS dealsON customer_master.hubspot_company_id = deals.associatedcompanyidsLEFT JOIN{ } AS ownersON deals.hubspot_owner_id = CAST(owners.ownerid AS STRING)WHEREdeals.lastmodifieddate IS NOT null{ }UNION ALL-- consulting daysSELECTtime_entries.spent_date AS event_ts,customer_master.customer_id AS customer_id,customer_master.customer_name AS customer_name,projects.name AS event_details,CASE WHEN time_entries.billable THEN 'Billable Day' ELSE 'Non-Billable Day' END AS event_type,time_entries.hours * time_entries.billable_rate AS event_valueFROM{ } AS customer_masterLEFT JOIN{ } AS projectsON customer_master.harvest_customer_id = projects.client_idLEFT JOIN{ } AS time_entriesON time_entries.project_id = projects.idWHEREtime_entries.spent_date IS NOT null{ }UNION ALL-- incoming and outgoing emailsSELECTcommunications.communication_timestamp AS event_ts,customer_master.customer_id AS customer_id,customer_master.customer_name AS customer_name,communications.communications_subject AS event_details,CASE WHEN communications.communication_type = 'INCOMING_EMAIL' THEN 'Incoming Email'WHEN communications.communication_type = 'EMAIL' THEN 'Outgoing Email'ELSE communications.communications_subjectEND AS event_type,1 AS event_valueFROM{ } AS customer_masterLEFT JOIN{ } AS communicationsON customer_master.hubspot_company_id = communications.hubspot_company_idWHEREcommunications.communication_timestamp IS NOT null{ }UNION ALL-- sales opportunity stagesSELECTinvoices.issue_date AS event_ts,customer_master.customer_id AS customer_id,customer_master.customer_name AS customer_name,invoices.subject AS event_details,'Client Invoiced' AS event_type,SUM(invoices.amount) AS event_valueFROM{ } AS customer_masterLEFT JOIN{ } AS invoicesON customer_master.harvest_customer_id = invoices.client_idWHEREinvoices.issue_date IS NOT null{ }UNION ALLSELECTinvoices.issue_date AS event_ts,customer_master.customer_id AS customer_id,customer_master.customer_name AS customer_name,invoice_line_items.description AS event_details,'Client Credited' AS event_type,COALESCE(SUM(invoice_line_items.amount ), 0) AS event_valueFROM{ } AS customer_masterLEFT JOIN{ } AS invoicesON customer_master.harvest_customer_id = invoices.client_idLEFT JOIN{ } AS invoice_line_itemsON invoices.id = invoice_line_items.invoice_id{ }HAVING(COALESCE(SUM(invoice_line_items.amount ), 0)
<
0)
UNION
ALL
SELECT
pageviews.timestamp
AS
event_ts,
customer_master.customer_id
AS
customer_id,
customer_master.customer_name
AS
customer_name,
pageviews.page_subcategory
AS
event_details,
'Site Visited'
AS
event_type,
sum(1)
as
event_value
FROM
{{
ref(
'customer_master'
)
}}
AS
customer_master
LEFT
JOIN
{{
ref(
'pageviews'
)
}}
AS
pageviews
ON
customer_master.customer_name
=
pageviews.network
{{
dbt_utils.group_by(n
=
5)
}}
UNION
ALL
SELECT
*
FROM
(SELECT
invoices.paid_at
AS
event_ts,
customer_master.customer_id
AS
customer_id,
customer_master.customer_name
AS
customer_name,
invoices.subject
AS
event_details,
CASE
WHEN
invoices.paid_at
<= invoices.due_date THEN 'Client Paid' ELSE 'Client Paid Late' END AS event_type,
SUM(invoices.amount) AS event_valueFROM{ } AS customer_masterLEFT JOIN { } AS invoicesON customer_master.harvest_customer_id = invoices.client_idWHEREinvoices.paid_at IS NOT null{ }))WHEREcustomer_name NOT IN ('Rittman Analytics', 'MJR Analytics'))
Deploying this model and then querying the resulting table, filtering on just a single customer and ordering rows by timestamp, you can see the history of all interactions by that customer with our sales and delivery channels.
You can also plot all of these activity types into a combined area and bar chart, for example, to show billable days initially growing from zero to a level that we maintain over several months, and with events such as client being invoiced, incoming and outgoing communications and the client actually paying in-full at the end.
We can also do things such as restrict the event type reported on to billable days, and then use Looker’s dimension group and timespans feature to split the last set of weeks into days and week numbers to then show client utilisation over that period.
Of course we can also take those monetary values assigned to events and use them to calculate the overall revenue contribution for this customer, factoring in costs incurred around preparing sales proposals, answering support questions and investing non-billable days in getting a project over the line.
Our second dbt model takes this event data and pivots each customer’s events into a series of separate columns, one for the first event in sequence, another for the second, another for the third and so on.
Again, code for this dbt model is available in our project git repo.
{{config(materialized='table')}}WITH event_type_seq_final AS (SELECTcustomer_id,user_session_id AS event_type_seq,event_type,event_ts,is_new_sessionFROM(SELECTcustomer_id,event_type,last_event,event_ts,is_new_session,SUM(is_new_session) OVER (ORDER BY customer_id ASC, event_ts ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS global_session_id,SUM(is_new_session) {{ customer_window_over('customer_id', 'event_ts', 'ASC') }} AS user_session_idFROM(SELECT*,CASE WHEN event_type != LAG(event_type, 1) OVER (PARTITION BY customer_id ORDER BY event_ts ASC) OR last_event IS NULL THEN 1ELSE 0END AS is_new_sessionFROM(SELECTcustomer_id,customer_name,CASE WHEN event_type LIKE '%Email%' THEN 'Presales'WHEN event_type LIKE '%Bill%' THEN 'Delivery' ELSE event_type END AS event_type,event_ts,LAG(CASE WHEN event_type LIKE '%Email%' THEN 'Presales' WHEN event_type LIKE '%Bill%' THEN 'Delivery' ELSE event_type END, 1) OVER (PARTITION BY customer_id ORDER BY event_ts ASC) AS last_eventFROM{ }ORDER BYcustomer_id ASC,event_ts ASC,event_type ASC) lastORDER BYcustomer_id ASC,event_ts ASC,event_type ASC) finalORDER BYcustomer_id ASC,is_new_session DESC,event_ts ASC)WHEREis_new_session = 1{ }ORDER BYcustomer_id,user_session_id,event_ts)SELECTcustomer_id,MAX(event_type_1) AS event_type_1,MAX(event_type_2) AS event_type_2,MAX(event_type_3) AS event_type_3,MAX(event_type_4) AS event_type_4,MAX(event_type_5) AS event_type_5,MAX(event_type_6) AS event_type_6,MAX(event_type_7) AS event_type_7,MAX(event_type_8) AS event_type_8,MAX(event_type_9) AS event_type_9,MAX(event_type_10) AS event_type_10FROM(SELECTcustomer_id,CASE WHEN event_type_seq = 1 THEN event_type END AS event_type_1,CASE WHEN event_type_seq = 2 THEN event_type END AS event_type_2,CASE WHEN event_type_seq = 3 THEN event_type END AS event_type_3,CASE WHEN event_type_seq = 4 THEN event_type END AS event_type_4,CASE WHEN event_type_seq = 5 THEN event_type END AS event_type_5,CASE WHEN event_type_seq = 6 THEN event_type END AS event_type_6,CASE WHEN event_type_seq = 7 THEN event_type END AS event_type_7,CASE WHEN event_type_seq = 8 THEN event_type END AS event_type_8,CASE WHEN event_type_seq = 9 THEN event_type END AS event_type_9,CASE WHEN event_type_seq = 10 THEN event_type END AS event_type_10FROMevent_type_seq_final){ }
Pivoting each customer’s events in this way makes it possible then to visualize the first n events for a given set of customers using Looker’s Sankey custom visualization, as shown in the final screenshot below.
More details on our own internal use of Looker, Stitch, dbt and Google BigQuery to create a modern operational analytics platform can be found in this earlier blog post, and if you’re interested in how we might help you get started with Looker, check out our Services page.