Multi-Channel Marketing Attribution using Segment, Google BigQuery, dbt and Looker

Mark Rittman


1_V36-h1V-t7MDWZBG-Eqoaw.png1_V36-h1V-t7MDWZBG-Eqoaw.png

If you’re a marketer responsible for acquiring new users for your online service, you don’t want to rely just on the reporting and attribution models provided by your ad networks. Why? Because they only show the effectiveness of each channel in isolation and don’t really consider the way in which visitors interact with a number of different channels and touch-points on the path to conversion.

What you really want is attribution and conversion reporting thats independent of the networks themselves, uses a multi-touch attribution model that considers all the touch-points in the conversion journey and is under your control, not the advertisers.

When you control your own attribution model and how those numbers are reported you can switch between models that align with the goals that your business has at particular points in-time; for example using a first click attribution model if your priority is lead generation, last click if conversions are your priority or even time-decay, if you want to place more emphasis on touchpoint interactions closer to conversion.

And just as importantly; if you connect your downstream operational systems to your attribution model and user tracking service you can use the actual revenue or contribution each customer has delivered rather than guessing at an average contribution, something particularly important if one of the channels you’re spending your acquisition budget on brings in users that churn quickly whilst another brings in loyal, long-term and valuable customers.


Screenshot 2020-02-08 at 23.45.01.pngScreenshot 2020-02-08 at 23.45.01.png

For example, an online marketplace may use Facebook Ads, Google Ads and other ad networks to recruit new users who then, to some degree or another go on to generate revenue for the business each time they transact. Like most high-growth online businesses they’re spending a significant amount of their current funding round on growing their user base and wanted to know which channels brought in the ones that actual generate revenue.

They use Segment Connections connected to a downstream Google BigQuery data warehouse to record all their website visitor page views and interactions, and we deployed dbt, Stitch and Looker a while ago to provide their team with point-and-click operational analytics dashboards using the same BigQuery data warehouse to hold their analytics data. We therefore added the Segment package to their dbt project and used it to stitch together all of the interactions by a site visitor both when they were unauthenticated as a visitor to the site and after they’d registered and authenticated, so that we could look at the complete history of a user’s sessions including each session’s UTM campaign parameters using a BigQuery SQL query like this:

SELECT w.full_name,
       s.session_start_tstamp,
       s.session_end_tstamp,
       s.session_id
FROM   `ra-development.analytics.segment_web_sessions__stitched`

Now because we’d previously centralised this customer’s operational system’s data into the same BigQuery data warehouse, it was a simple matter to join the blended user ID coming from Segment to each users’ ID coming from their main operational system to retrieve their date of registration. Adding a few more columns to the query output to give us the first page in session and other potentially useful information and filtering to only return those sessions taking place up until the time of registration, the SQL query we then used against the centralised Segment and operational dataset now looked like this:

SELECT  w.full_name,
        s.session_start_tstamp,
        s.session_end_tstamp,
        w.created_at,
        w.id as user_id,
        s.session_id,
        row_number() over 
        (partition by w.id order by s.session_start_tstamp) as session_seq,
        case when w.created_at between s.session_start_tstamp and s.session_end_tstamp then true
                    else false
                    end as conversion_session,
               case when w.created_at < s.session_start_tstamp then true
                    else false
                    end as prospect_session,
        coalesce(s.utm_source,'Direct') utm_source, 
        coalesce(s.utm_content,'Direct') utm_content, 
        coalesce(s.utm_medium,'Direct') utm_medium, 
        coalesce(s.utm_campaign,'Direct') utm_campaign,
        s.first_page_url_path as entrance_url_path,
        s.last_page_url_path as exit_url_path,
        referrer,
        duration_in_s,
        page_views
 FROM `ra-development.analytics.segment_web_sessions__stitched` s
 JOIN `ra-development.analytics.users`  w
 ON cast(w.id as string) = s.blended_user_id
 WHERE w.created_at >= s.session_start_tstamp
 ORDER BY w.id, s.session_start_tstamp)

To return a user-specific measure of value that we can then use in our attribution model, we’ll create a simple query that returned each user’s ID and the total number of transactions they’d completed on the platform.

SELECT user_id AS user_id,
       SUM(transactions) AS total_user_transactions
FROM   `ra-development.analytics.user_transactions`
WHERE  is_approved
GROUP BY 1

Now comes the part when we start adding attribution logic into the SQL query using the following rules:

  • For each of the sessions for a given user, use the FIRST_VALUE analytic SQL function to work out if that session is the first one in sequence for the user; if it is and we’re using the first click attribution method then allocate 100% of the conversion value to that session and none to the ones that follow, an attribution method that places all the value on the acquisition channel that first brought that user to our website

  • Similarly, use the LAST_VALUE function on each user session row and allocate 100% of the conversion value to that row if we’re using the last click attribution method, and none for the ones that preceded it, placing all the value on the channel that eventually converted the visitor into a paying user

  • If we’re using the even click attribution method, count how many sessions the user took in-total to eventually convert and allocate the conversion value equally to each of them

  • And if we’re using the time decay attribution method, work out how many days did the visitor take to convert and then allocate 100% of the value to sessions in the 7 days up to conversion, 50% of the conversion value to those session between 7 and 14 days and halve and halve the value attributed again for sessions up to 21 and 28 days before conversion.

Added to our SQL query as a CTE with the user transactions SELECT as a second CTE, the query to implement these attribution rules looks like this:

WITH session_attribution AS
      (SELECT *,
       CASE WHEN session_id = LAST_VALUE(session_id)
                              OVER (partition by user_id order by session_start_tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) THEN 1
            ELSE 0
       END AS LAST_click_attrib_pct,
       CASE WHEN session_id = FIRST_VALUE(session_id)
                              OVER (partition by user_id order by session_start_tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) THEN 1
            ELSE 0
       END AS first_click_attrib_pct,
       1/count(session_id) OVER (partition by worker_id) AS even_click_attrib_pct,
       CASE WHEN session_start_tstamp <= timestamp_sub(created_at, interval 7 day) THEN 1
            WHEN session_start_tstamp > timestamp_sub(created_at, interval 14 day) THEN .5
            WHEN session_start_tstamp > timestamp_sub(created_at, interval 14 day) AND session_start_tstamp > timestamp_sub(created_at, interval 21 day) THEN .25
            WHEN session_start_tstamp > timestamp_sub(created_at, interval 21 day) AND session_start_tstamp > timestamp_sub(created_at, interval 28 day) THEN .125
            ELSE 0
       END AS time_decay_attrib_pct
       FROM
           (
           -- sessions_and_registration_dates_CTE
           ),
     user_transactions AS
        (
    -- user_transactions_CTE
         )

Then we complete the attribution model query with a final SELECT from both CTEs at the end, multiplying the total transactions for each user by the various model percentages for each user session row coming from Segment, like this:

SELECT session_attribution.*,
       total_user_transactions * first_click_attrib_pct AS first_click_attrib_total_transactions,
       total_user_transactions * last_click_attrib_pct AS last_click_attrib_total_transactions,
       total_user_transactions * even_click_attrib_pct AS even_click_attrib_total_transactions,
       total_user_transactions * time_decay_attrib_pct AS time_decay_attrib_total_transactions,
       1 * first_click_attrib_pct AS first_click_attrib_registrations,
       1 * last_click_attrib_pct AS last_click_attrib_registrations,
       1 * even_click_attrib_pct AS even_click_attrib_registrations,
       1 * time_decay_attrib_pct AS time_decay_attrib_registrations
FROM session_attribution
JOIN user_transactions
ON session_attribution.user_id = user_transactions.user_id

The final, compete SQL query used for the attribution model is also available for download as a Github gist here.

Finally, we can add this query into dbt as a model or just plug it directly into Looker as a SQL derived table, then create reports such as this one that aggregate each week’s registrations by week and channel to show where each week’s new users came from, using just one of the attribution models:


Screenshot 2020-02-08 at 23.04.41.pngScreenshot 2020-02-08 at 23.04.41.png

Or you can create a Looker dashboard with attribution models selectable via a dashboard filter, allowing users to show the value of each channel, campaign and source over time and by share of revenue generated using either of the first click, last click, even click or time-decay attribution methods.


Screenshot 2020-02-08 at 23.11.55.pngScreenshot 2020-02-08 at 23.11.55.png

As well as building multi-touch attribution models using Google BigQuery and dbt, Rittman Analytics partners with Qubit, Segment, Stitch, Fivetran and Snowflake to offer a full range of data analytics and data centralization services for high-growth, data-rich businesses; see our Data Centralization and Getting Started with dbt solutions pages for details or contact us now to find out more.