Ad Spend and Campaign RoI Analytics using Segment, Looker, dbt and Google BigQuery

You’ve probably heard of the saying “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”, and digital marketing was supposed to fix this problem. If only it were that easy.

Mark Rittman

You’ve probably heard of the saying “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”, and digital marketing was supposed to fix this problem. If only it were that simple.

You spend money on online ads and know precisely who clicked on the advert, how many then went on to buy something (or “convert” in eCommerce terminology) and how many then went on to buy again and again and become your best customers.

Except you don’t. Because your ad spend data comes from the ad network and they’re attributing just about anyone who’s even been in the same room as one of your ads to their network as a conversion; you’ve got initial conversion revenue data in Stripe and predicted lifetime value numbers in Baremetrics but the chance of getting someone’s time in Engineering to connect it all together is about zero.

Which means once again, another month comes around and you’re spending $50k, $60k, $100k on Facebook Ads, Google Ads and Bing Ads, you’ve got a target to hit for new customer acquisition … and you just don’t know which channels bring in your most profitable customers.


Previously on this blog I’d written about the revenue part of marketing attribution in Multi-Channel Marketing Attribution using Segment, Google BigQuery, dbt and Looker, but what about the cost side? How can we bring in ad network cost data and use it, together with attributed revenue data, to work out how effective our ad campaigns have actually been?

Take as an example a mobile app business that sells subscriptions via Stripe, uses Baremetrics to predict lifetime value for those subscriptions and uses Segment to collect user activity data across their mobile and web applications.


Together with a cloud data warehouse service such as Google BigQuery and a data visualization tool such as Looker, we can put together a solution that lets us not only analyze the clicks and spend data from our ad networks, but also lets us connect that spend to the revenue we’re recording from Stripe and attributing to those ad campaigns to create dashboards like the one below.


Putting this solution together involves a series of steps that acquire, transform, join and combine data from all of these sources.

  1. We first need to extract details of our ads, ad campaigns and the performance (clicks, spend, impressions etc) using APIs each of the ad networks provides for campaign analysis

  2. Then we need to create a combined dataset from that ad network data that enables us to analyze spend and performance across all networks and campaigns

  3. Next we bring the ad network data, typically only available in aggregated form at the ad, campaign and day level, and join it to the session and page view-level visitor event data we’re recording from our website and mobile app, enabling us to compare ad network reported numbers with the actual numbers we’re seeing from Segment

  4. Separately to this ad spend analysis activity, we bring in data from the revenue attribution model described in Multi-Channel Marketing Attribution using Segment, Google BigQuery, dbt and Looker that assigns the credit for conversions, immediate and predicted lifetime revenue from those conversions to the sessions (touchpoints) that the converter interacted with on the path to conversion

  5. Finally we aggregate the session-level data from the revenue attribution model up to the ad campaign and day level, and then join that revenue data to the ad network performance and observed clicks data so that we can report on the revenue earned and conversions resulting from the spend we’ve made on our individual ads and campaigns.


Shown in the form of a data flow or data transformations graph with data sources on the left and the final transformed datasets on the right, the above steps would translate into the diagram above.

For the first step our usual approach to sourcing data from Facebook Ads and Google Ads would be to use a service such as Stitch or Fivetran to create a fully-managed data extract pipeline into Google BigQuery, but as Segment Connections is already being used to bring in website and mobile app event data we can make use of another Segment feature, Object Cloud App Sources, to replicate the Facebook and Google Ads data into BigQuery instead.

Object Cloud App Sources (as opposed to Cloud Event Sources and regular Event Sources) don’t register ad activity as events in Segment’s event stream but they do the same job as Stitch and Fivetran’s API extract data pipelines, and they’re included in the entry-level Team Plan if you’re a paying Segment customer.


Before we can combine data from the two ad network Segment sources for step 2 in our plan, we first have to standardise table and column naming across the two sources. We can do this using SQL SELECT statements either in SQL views or using a tool such as dbt (“Data Build Tool”) that we use on most client analytics projects.


Google Ads’ API provides spend and click performance data at each of the ad, ad group and campaign levels whereas Facebook’s API provides it at just ad-level, so if you want to base your reports and dashboards at the campaign level you’ll need to roll-up the Facebook data to campaign-level before joining it to Google Ads’ campaign performance data.


To enable comparison of this ad click and cost data from Facebook and Google to the actual clicks and revenue we’re recording via Segment from our website page view data we’ll need a way of connecting these two sets of data together. Facebook Ads provides UTM parameter values for your ads in the API tables it provides, making it easy to look for those values in the session entrance page view data Segment records when people clicking on those adverts arrive at your site.

Google Ads prefers that you use their encrypted gclid (“Google Click Identifier”) tracking codes rather than utm parameters when tracking campaign performance as they enable more campaign metadata to be included in tracking details and then encrypt it, stopping what could potentially be sensitive campaign names or other personally-identifiable data being sent in clear text in the URL querystring.

However only Google Analytics can decrypt the contents of a gclid tracking code rendering them fairly useless when need that metadata to be readable by another process, for displaying the name of the campaign, ad or creative version in a non-GA dashboard report.

As we want our ad spend analytics to be cross-platform and usable outside of GA we instead match on ad_id and store it in the utm_content parameter when setting up tracking codes for new ads, rolling the numbers up to ad group and campaign level from there when needed for reporting.


Put together, the transformations for these first three steps in the process look like the subset shown below of the full transformation graph.


Each step in this graph is a SQL transformation needs to be executed in the right order, waiting for any other transformations the next step depends on to complete before handing-off control to the next stage, and so we use dbt to create a DAG (directed acyclic graph) of transformations to ensure everything gets executed in the correct order, and that we can test everything properly before rolling it out to end-users.


You can read more about our use of this tool in Introducing the RA Warehouse dbt Framework : How Rittman Analytics Does Data Centralization using dbt, Google BigQuery, Stitch and Looker and the code we use for ad spend analysis and marketing attribution is contained within the RA Data Warehouse for dbt public git repo on our Github site.

Once we’ve run all of these ad spend and performance transformations and combined it with actual click data from our website through Segment, we end-up with a table that looks like the one below.


Using this data we can put together some useful charts and dashboards in Looker to show how spend on our advertising campaigns has translated into clicks and traffic sent to our website.


To calculate ratios and metrics such as Return on Investment (RoI), Return on Advertising Spend (ROAS) and our LTV/CAC (Customer Lifetime Value / Customer Acquisition Cost) ratio, we’ll need to add ad campaign revenue data from the Multi-Channel Marketing Attribution model I mentioned right back at the start of this article.

Our revenue attribution approach assigns a percentage of each conversion to sessions (touchpoints) leading up to the conversion (up to a maximum of 90 days before the conversion date) using first click, last click, even click and time-decay attribution models.


Time decay attribution splits the credit for conversions again across all qualifying sessions (touchpoints) but more favourably weights this credit towards more recent touchpoints using a formula where credit halves every 7 days (the “half-life”) using the formula 2^(-x/7) where x is the number of days prior to the conversion event. Expressed in BigQuery Standard SQL this formula comes out as shown below:

  WHEN TIMESTAMP_DIFF( session_start_ts, conversion_ts, DAY ) > -90
  THEN POWER( 2, TIMESTAMP_DIFF( session_start_ts, conversion_ts,
    DAY ) / 7 ) 
  CASE WHEN TIMESTAMP_DIFF( session_start_ts, conversion_ts, DAY ) > -90 
  THEN POWER( 2, TIMESTAMP_DIFF( session_start_ts, conversion_ts, DAY ) / 7 ) 
  END ) 
OVER (PARTITION BY blended_user_id, user_conversion_cycle ) 
AS time_decay_attrib_pct

and a typical time series attribution looks like the screenshot below:


The final step in our series of transformations is to join the ad campaign spend and clicks data to the ad revenue attribution data, aggregated from the one-row-per-session (touchpoint) level of detail its stored in the attribution model to the same day/ad/campaign level as the ad campaign spend data.


Now we’ve got ad spend and revenue linked together in this way, we can add a final set of metrics to our ad spend analytics dashboard and properly understand the actual return on investment we’re getting on our ad campaigns, enabling us to allocate budget to those campaigns that bring in the most conversions and customers with the greatest expected lifetime value.


So if you’re also looking to understand which half of your ad spend budget is being wasted, which ad channels and campaign approaches really bring in long term revenue and which ones just fill the funnel but never convert, then contact us now at [email protected] or book in a free discovery call with Mark Rittman, author if this article and lets see if we can do the same for you and your business.

Column 1

Some text..