How Rittman Analytics does Analytics: Modern BI Stack Operational Analytics using Looker, Stitch, dbt and Google BigQuery

As well as creating modern BI stack analytics solutions for clients such as Florence, Let’s Do This and Colourpop using technologies from our partners at Looker, Stitch, Segment, Fivetran and Snowflake, it’s probably no surprise that we’ve used those same technologies and project experience to build out our own internal analytics platform, covering data and operational analytics use cases such as:

  • Showing high-level KPIs that track progress towards our four main business objectives

  • Providing actionable context for these high-level KPIs and the ability to drill into the detail behind them

  • Enabling ad-hoc querying and analysis of our complete dataset by technical and non-technical users

  • Providing a trusted, integrated analysis-ready data platform for other internal projects

overview.png

I thought it might be of interest to go through how we’ve applied the tools and techniques we use for client projects to deliver an analytics solution for our own internal use, sharing some of our thinking around how we defined our underlying KPI framework, the tools and design patterns we used when building out the data management part of the platform, and some of the front-end components and data visualizations we’ve put together using the Looker BI tool.

Defining your KPI Framework

When creating any sort of analytics solution, the first things you need to understand are: what are your business objectives; what do you need to measure to understand your progress towards those objectives; and how do you then define success?

Key Performance Indicators (KPIs) and measures within our analytics platform are based around what's needed to measure our progress towards four business objectives we set ourselves for FY2019:

  1. Increase billing revenue

  2. Increase profitability

  3. Increase customer retention

  4. Increase our operational and delivery efficiency

In practice, we “certify” data around these four primary KPIs and also a number of secondary measures, with data in the platform normally up-to-date as of the previous business day and with historical data going back to when we started the business.

kpi.png

Other metrics, measures and dimension attributes are provided alongside these certified KPIs and measures on a “best endeavours” basis in terms of our certifying their accuracy, and users are of course able to create their own calculations for use in reports and analyses outside of this formal curation.

Platform Architecture

At a high-level our internal analytics platform extracts data on a regular basis from the SaaS-based services we use to run our business operations, joins that data together so that each dataset uses the same definition of clients, products and other reference data, then makes that combined dataset available for analysis as a set of business KPIs and an ad-hoc query environment.

ra_analytics_architecture.png

The design of our internal analytics platform was based around the modular BI technology stack, ELT ("Extract, Load and Transform") data warehouse loading pattern and modern software development approach that we use on all of our client analytics development engagements.

Data Pipeline and Source Connectivity

We’re digital-first, paperless and run our business operations using cloud-hosted software services such as:

  • Harvest for timesheets, project invoicing and client-rechargeable expenses

  • Harvest Forecast for project resource planning and revenue forecasts

  • Xero for accounting and payroll

  • Hubspot CRM for sales and contact management

  • Docusign for contract management

  • Jira for project management and issue tracking

  • BambooHR for human resources and recruitment

On the delivery side, Rittman Analytics partners with four different data pipeline-as-a-service software vendors, each of which targets a different segment of the market and that segment’s particular use cases:

  • Stitch, who focus on small to mid-market startup customers with data engineers who value service extensibility and their land-and-expand pricing model ($100/month for up to five sources, including Hubspot and Xero certified and Harvest community-supported connectors)

  • Fivetran, aimed more at mid-market to enterprise customers and the data analyst personas, with a more turnkey solution and increased levels of service and coverage of enterprise data sources

  • Segment, more of an "enterprise service bus" for event-level digital data streams, connecting multiple event producers with downstream event consumers and offering additional features around customer journey mapping, personas and schema validation

  • Supermetrics, aimed at growth hackers and digital marketers with a focus on simple self-service setup, detailed data dictionaries and a comprehensive catalog of advertising, marketing and social data sources, delivered through a Google Sheets add-in or, more recently, a direct connector into Google BigQuery through the GCP Marketplace.

We chose to use Stitch's service for our data pipeline for this internal project due to their out-of-the-box ability to source data from Harvest and Harvest Forecast; functionally both Fivetran and Stitch would meet our needs other than support for these two particular sources, and we've used Fivetran extensively on other internal and client projects in the past.

stitch3.png

We also use Segment’s event tracking service to track visitor activity on our website, landing those raw tracking events in our BigQuery project as well, and we intend to make use of Segment’s Personas product in due course to build out 360-degree views of our clients, site visitors and prospects based on their interactions across all of our digital touchpoints (web, social media, inbound and outbound lead generation, chat etc.).
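
Once those raw Segment events land in BigQuery they’re just tables we can query alongside everything else; as a rough illustration (the dataset name below stands in for whatever your Segment source is called, so treat this as a sketch rather than our actual code), counting unique daily website visitors looks something like this:

-- Illustrative only: daily unique website visitors from Segment's raw page-call table.
-- "website_js" stands in for whatever dataset name your Segment BigQuery source uses.
SELECT
    DATE(timestamp) AS visit_date,
    COUNT(DISTINCT anonymous_id) AS unique_visitors
FROM
    website_js.pages
GROUP BY 1
ORDER BY 1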

Data Transformation, Data Integration and Orchestration

We transform and integrate raw data that Stitch syncs from each of our SaaS application sources into an integrated, query-orientated data warehouse dataset using the open-source dbt toolkit.

Once we’d decided to work with a data pipeline-as-a-service such as Stitch together with a SQL-based data management platform like Google BigQuery, transforming and integrating our data via a series of SQL SELECT statements was the obvious next design choice; by using dbt and version-controlling our scripts in a Github repository we increased our productivity when developing these transformations, adhered to modern software design principles and avoided cut-and-paste scripting in favour of templated, maintainable code.

Screenshot_2019-05-07_at_19.44.16.png

Transformation steps in dbt are essentially SQL SELECTs, with references to models replacing hard-coded table names and macros providing reusable, often cross-database transformation components. This approach gives three main benefits over hand-written, author-specific code:

  • Increased analyst productivity through leveraging libraries of best-practice transformation components

  • More maintainable loading processes with concise, “don’t repeat yourself” (DRY) logic

  • Portable code that can be deployed on BigQuery today or Snowflake tomorrow if required

For example, the code snippet below de-duplicates the Harvest Forecast assignment records that Stitch has synced, keeping just the latest version of each record and calculating the number of days each assignment covers:

SELECT
    *
FROM (
    SELECT
        harvest_user_id,
        a.id,
        allocation,
        project_id,
        start_date,
        end_date,
        -- number of days each forecast assignment covers
        {{ dbt_utils.datediff('start_date', 'end_date', 'day') }} + 1 AS forecast_days,
        a._sdc_sequence,
        -- most recent Stitch sync sequence per assignment, used to de-duplicate below
        MAX(a._sdc_sequence) OVER (PARTITION BY a.id ORDER BY a._sdc_sequence
            RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS latest_sdc_sequence
    FROM
        {{ ref('harvest_forecast_assignments') }} a
    INNER JOIN
        {{ ref('harvest_forecast_people') }} p
        ON a.person_id = p.id
)
WHERE
    _sdc_sequence = latest_sdc_sequence

In addition to the open-source dbt toolkit we also use the commercial dbtCloud service from our friends at Fishtown Analytics, mainly so that we can execute our transformation graph in the cloud, like this:

dbt_cloud.png

dbtCloud also hosts the data dictionary automatically generated from the metadata in our dbt package, and diagrams the transformation graph and data lineage we’ve defined through the dependencies we created between each of the models in the package.

Screenshot_2019-05-07_at_13.00.00.png

As well as standardising each data source and mapping each source's set of customer references into a single, master customer lookup table that allows us to analyze operations from sales prospect through project delivery to invoicing and payment, we also turn every customer touchpoint into events recorded against a single view of each customer, like this:

Screenshot_2019-05-07_at_20.12.37.png
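
To give a flavour of how that looks in dbt, a much-simplified sketch of the master customer lookup might be a model along these lines, reconciling each source’s identifiers against a single customer key (the model names, columns and matching logic here are illustrative, not our actual code):

-- Illustrative sketch of a master customer lookup model; the model names,
-- matching logic and columns are simplified stand-ins, not our actual code.
WITH harvest_clients AS (
    SELECT id AS harvest_client_id, name AS customer_name
    FROM {{ ref('harvest_clients') }}
),
hubspot_companies AS (
    SELECT id AS hubspot_company_id, name AS customer_name
    FROM {{ ref('hubspot_companies') }}
)
SELECT
    TO_HEX(MD5(UPPER(TRIM(h.customer_name)))) AS customer_pk,
    h.customer_name,
    h.harvest_client_id,
    c.hubspot_company_id
FROM harvest_clients h
LEFT JOIN hubspot_companies c
    ON UPPER(TRIM(h.customer_name)) = UPPER(TRIM(c.customer_name))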

We then assign a financial revenue or cost value to each event in the customer journey, calculate metrics such as time between interactions (to measure engagement) and value delivered to the client, and use that information to better understand and segment our clients so that we can provide the advice and service they’re looking for at their particular stage in their lifecycle and relationship with our team.

event_model.png
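
The engagement-metric side of this is essentially a window function over that events model; a hedged sketch, again with illustrative model and column names rather than our actual code, might look like this:

-- Illustrative sketch: days since each customer's previous touchpoint,
-- using assumed model and column names rather than our actual ones.
SELECT
    customer_pk,
    event_type,
    event_ts,
    event_revenue,
    DATE_DIFF(
        DATE(event_ts),
        DATE(LAG(event_ts) OVER (PARTITION BY customer_pk ORDER BY event_ts)),
        DAY
    ) AS days_since_last_interaction
FROM
    {{ ref('customer_events') }}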

Analytics and Data Visualization

We then use Looker as our primary analytics tool over the data prepared by dbt and stored in Google BigQuery. Everyone’s homepage in Looker is set by default to show those four main operational KPIs, with comparison to target and the last six months’ trend history; combined with Looker’s ability to go deep into the detail behind those headline numbers, this provides the context users need to decide and act on what they’re seeing (note that I’ve altered and obfuscated our actual numbers in the next few screenshots).

dashboard2.png
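
Behind a KPI tile like billed revenue versus target there’s no real magic, just the invoicing data joined to a table of targets; a simplified sketch of the underlying query, with the invoice model, targets seed and column names all assumed for illustration, might look like this:

-- Illustrative sketch: monthly billed revenue compared to target;
-- the invoice model, targets seed and column names are assumptions.
SELECT
    DATE_TRUNC(i.invoice_date, MONTH) AS invoice_month,
    SUM(i.amount) AS billed_revenue,
    ANY_VALUE(t.target_revenue) AS target_revenue,
    ROUND(SAFE_DIVIDE(SUM(i.amount), ANY_VALUE(t.target_revenue)) * 100, 1) AS pct_of_target
FROM {{ ref('harvest_invoices') }} i
LEFT JOIN {{ ref('revenue_targets') }} t
    ON DATE_TRUNC(i.invoice_date, MONTH) = t.target_month
GROUP BY 1
ORDER BY 1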

We've also created a single business operations-wide Looker explore that enables users to analyze and explore all aspects of our sales, delivery and financial relationship with clients, suppliers, partners and prospects.

In the example below we’re again using the event model, applying sequencing to those events bucketed by months since first billable day to help us understand how the demand decay curve looks for each cohort of clients we’ve taken on over the first year we’ve been operating.

decay.png
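
The bucketing behind that chart is straightforward SQL over the same events model, with Looker then pivoting months since first billable day across each client cohort; a simplified sketch using illustrative names might look like this:

-- Illustrative sketch: billable days bucketed by months since each client's
-- first billable day; model, event type and column names are assumptions.
WITH first_billing AS (
    SELECT
        customer_pk,
        MIN(DATE(event_ts)) AS first_billable_day
    FROM {{ ref('customer_events') }}
    WHERE event_type = 'billable_day'
    GROUP BY 1
)
SELECT
    f.customer_pk,
    DATE_TRUNC(f.first_billable_day, MONTH) AS cohort_month,
    DATE_DIFF(DATE(e.event_ts), f.first_billable_day, MONTH) AS months_since_first_billable_day,
    COUNT(*) AS billable_days
FROM {{ ref('customer_events') }} e
JOIN first_billing f USING (customer_pk)
WHERE e.event_type = 'billable_day'
GROUP BY 1, 2, 3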

Embedded Analytics using Powered by Looker

We also link fields in our Looker explore back to the online services that can provide more background detail on a metric displayed on the screen; in the example below we link back to Harvest to see details of the project displayed in the explore data visualisation.

drilltoapp.png

Finally, for team members who are more likely to be working within our knowledge-sharing and internal company portal site, we use Powered by Looker embedded analytics within Notion, connecting our internal analytics metrics and insights directly into the productivity and workflow tools used by the team, bringing data and analytics to the problems they’re looking to solve rather than the other way around.

notion.png

So, hopefully this look behind the scenes at how we use these modern analytics stack technologies for our own analytics needs has been useful, and feel free to leave comments or contact me at mark.rittman@rittmananalytics.com if any of this would be applicable to your business. In the meantime, the dbt and Looker projects used to build out our platform are available as public repos on Github:

Feel free to fork, improve on what we’ve done, submit PRs or just see a bit more of the detail behind the examples in this post.

Supermetrics, Google BigQuery and Data Pipelines for Digital Marketers

Chances are that around the corner from the analytics project you’re working on, with its SaaS data piped into your data warehouse using a service from Stitch or Fivetran, there’s a marketing analyst using Google Sheets and an add-in from Supermetrics to analyze their channel advertising spend.

Supermetrics are the biggest analytics tool vendor you’ve never heard of, and that’s because their products are aimed squarely at marketers, not data engineers or data analysts, and within that particular market they’re massive; what they sell is easy-to-setup connectivity to all the major advertising and social networks directly from within Google Sheets in the form of the Supermetrics Google Sheets Add-In, a product we’ve used in the past when looking to understand how well we place in Google search results for keywords relevant to our business.

googlesheets.png

Supermetrics do two things well: they provide connectivity to all the major ad networks that marketers use (and there are lots of them beyond the basic Adwords, Facebook Ads and Snapchat Ads that we’re most familiar with), and they focus exclusively on the non-technical marketing user rather than data engineers and data analysts, providing, in addition to their Google Sheets add-in, a connector for Google Data Studio, for example.

But although you can use Google Sheets as federated sources for Google BigQuery tables, and theoretically you could then use the scheduled data refresh feature in Supermetrics’ add-in to regularly download new data and thereby create a sort-of data pipeline into your warehouse, in practice this probably requires a bit more knowledge of how BigQuery works than the average marketer is likely to have.
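
For anyone curious about what that federated-source approach involves, the BigQuery side is a table defined directly over the Google Sheet that the add-in refreshes; a hedged sketch using BigQuery’s external table DDL, with placeholder dataset, table, column and sheet URL values, would look something like this:

-- Illustrative sketch: a BigQuery external table defined over a Google Sheet
-- refreshed by the Supermetrics add-in. The dataset, table, columns and sheet
-- URL are placeholders, not a real configuration.
CREATE OR REPLACE EXTERNAL TABLE marketing.search_rankings_sheet (
    report_date DATE,
    keyword STRING,
    avg_position FLOAT64
)
OPTIONS (
    format = 'GOOGLE_SHEETS',
    uris = ['https://docs.google.com/spreadsheets/d/<your-sheet-id>'],
    skip_leading_rows = 1
);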

So I was particularly interested to see that one of the new BigQuery features announced at Google Cloud Next’19 last week was a number of Supermetrics-provided data connectors for BigQuery in the Google Cloud Marketplace, providing data pipeline-as-a-service connectivity from ad network sources such as Bing Ads, LinkedIn Ads, AdRoll and others, with data fed directly into Google BigQuery using BigQuery’s Data Transfer Service.

supermetricspage.png

Some of these sources are also covered by new connectors from Fivetran in this same Google Cloud Marketplace, so what’s the difference between a data pipeline provided by Supermetrics and one from Fivetran or Stitch? Setting up a Supermetrics data pipeline using Google BigQuery’s Data Transfer Service is pretty straightforward and not that different from setting up a data pipeline using Fivetran or Google’s own underlying data transfer technology, as shown in the screenshots below:

steps.png

It’s when you look at how Supermetrics lands data into BigQuery, though, that you see the difference in approach; instead of presenting you with several tables of data mapped to the structure of each service’s API calls, the Supermetrics BigQuery tables are typically denormalized (flattened into one big table of data) and come with extensive, verbose column descriptions and metadata, making it much easier for marketers to know what each column is used for and the precise definition of all the metrics and dimension attributes.

metadata.png

Pricing is per data source with a 14-day free trial at the start; to me it looks like Supermetrics are looking to position their pipeline-as-a-service between Stitch’s and Fivetran’s in terms of pricing, but with the service aimed at marketers, not data engineers. Details of pricing and pricing metrics are on the Supermetrics support page, and full details of the service are on the Supermetrics for BigQuery product homepage.

Join us at Looker JOIN 2019, London on April 9th - The Brewery, 52 Chiswell Street

Looker JOIN is coming to London next week and Rittman Analytics will be at the event as one of the Silver Sponsors - come and visit us at Stand 10, right at the front of the main hall just as you walk in!

JOIN London is an exciting opportunity to meet with Looker experts, data mavericks and industry thought leaders, all on-hand to share proven insights on how to become more data-driven. Get ready for thought-provoking presentations, informative discussions and interactive sessions, adapted from the best of Looker’s JOIN conferences that originally ran in San Francisco, USA.

We’ll be there all day and keen to hear about your data initiatives, project needs or just to have a chat about Looker, analytics and driving value from data. Registration is free and still open, check-out the event details page for more information.

Released Today: The M.O.M.E. Project - Multiple OLAP Machine Emulator

On this day last year we broke the news on Oracle Discoverer Cloud Service; this year we go one better with the announcement of a new open-source software and homebrew hardware project now hitting its 1.0.0 release milestone: M.O.M.E. - Multiple OLAP Machine Emulator, sponsored by Rittman Analytics.

From the project homepage on Github:

console.png

“MOME is a framework for emulating multi-dimensional Online Analytical Processing ("MOLAP") engines, 4GL programming languages and the servers and OLAP applications that ran on those engines.

MOME's purpose is to preserve decades of multi-dimensional MOLAP engine and software history. As analytical processing technology continues to rush forward, MOME prevents this important "vintage" technology from being lost and forgotten. This is achieved by documenting the MOLAP engine, its multi-dimensional storage format and 4GL query languages, along with the software and sample databases for those engines such as Microsoft OLAP Services 7.0 AdventureWorks.

Over time, MOME (originally stood for Multiple OLAP Machine Emulator) absorbed the sister-project MESS (Multi-dimensional Engine Super System), so MOME now documents a wide variety of (mostly vintage) MOLAP, Relational OLAP ("ROLAP") and now Hybrid OLAP "HOLAP" engines in addition to the MOLAP engines and desktop applications that were its initial focus.”

MJR Analytics is now Rittman Analytics ... and an Update on our First Six Months

As of last Friday our company name has changed from MJR Analytics to Rittman Analytics. Nothing else has changed in terms of ownership or the services we provide, but we wanted to take advantage of the opportunity to put our name and reputation front-and-centre of our branding and website. All emails to our old @mjr-analytics.com domain will automatically redirect, and any links pointing to https://www.mjr-analytics.com will automatically redirect to http://rittmananalytics.com.

Other than changing our name, since our launch last September we’ve worked with a number of great customers together with our vendor partners, with five of those projects written up as case studies and customer quotes in the Customers section of our website:

  • Colourpop : eCommerce and customer analytics using Shopify, Redshift and Looker

  • Let's Do This : eCommerce/conversion analytics using Segment, Redshift and Looker

  • Florence : Operational analytics, retention and RFM segmentation using Looker and PostgreSQL

  • INTO University Partnerships : Digital Analytics Strategy for MS PowerBI and Heap Analytics

  • Qubit : Data Engineering and customer/personalization analytics using Google BigQuery and Looker

We also co-chaired and helped organize the recent Looker London Dev Meetup together with Jon Palmer from GoCardless, and presented a short session on Improving your User Engagement with Looker’s new System Activity Model - look out for the next meetup in June later this year!

Looker is our primary, strategic vendor partner and since our launch last year we’ve also partnered with selected vendors that provide features complementary to Looker:

  • Segment for integrating event-level data sources, single customer view and data quality

  • Fivetran for no-ops data pipelines

  • Qubit for consumer behavioural, segmentation and personalization analytics

  • Snowflake for no-ops cloud data warehousing-as-a-service

  • Oracle for enterprise-level analytics and ERP/CRM integration

We’re also now recruiting for our first permanent consulting roles to work alongside me and our team of experienced contract developers and data scientists, and we're looking forward to being one of the sponsors for the upcoming Looker JOIN 2019 event in London - see some of you there if not sooner!

Drill to Detail Podcast Returns with Ep.60 'A Deeper Look Into Looker' With Special Guest Lloyd Tabb
DTDv3 iTunes Logo with Border.jpeg

One of our CEO and founder Mark Rittman’s side projects is the Drill to Detail Podcast, a podcast series where Mark interviews a special guest each episode to talk about their work, their products and their take on the analytics, big data and data management industry.

Drill to Detail is now on its 60th episode and starts its third season, with Mark joined by Lloyd Tabb, Founder, CTO and Chairman of Looker, to talk about the foundational story of Looker and LookML, query latency and semantic models, analytic engines and code IDEs, analytics developer workflows and the rise of elastically-scalable cloud databases, packaged applications and embedded analytics, and why learning (and loving) are the long-term keys to analytics and business success.

Looker London Meetup on Thursday, February 7th 2019 - Registration Now Open!
meetup2.png

If you’re in London on Thursday 7th February 2019 and are a developer or end-user working with the Looker Data Platform, we’re very pleased to be launching, together with Jon Palmer from GoCardless, the first ever Looker London Dev Meetup, sponsored by the good folks at Looker and hosted at GoCardless’ HQ at 65 Goswell Road, London EC1V 7EN.

The aim is to create a space for showcasing and discussing what data people in London are building in and around the Looker ecosystem, and for this first meetup we have three great customer and partner speakers sharing their insights on Looker development and project implementations:

  • "Scale securely with the Looker API" Dylan Baker, Consultant

  • "Internal and External Multi-tenancy with Looker" - Ian Thompson, King

  • "Increase End-User Engagement with Looker 6 System Activity Model" - Mark Rittman, MJR Analytics

followed by a panel session where the speakers are joined by one of our guests from Looker:

  • “Data Analytics Panel - Hear from Industry experts and Looker users from GoCardless, King, MJR analytics and Looker.”

Registration and attendance are free but spaces are limited, so make sure you register soon if you’d like to come along - the meetup starts at 5.30pm with drinks, food and introductions from Jon, the full agenda is on the meetup page, and we’re aiming to wrap up around 9pm with closing comments from me and a short survey on future meetup content.

MJR Analytics Sessions at UKOUG Tech'18, Liverpool ACC, 3rd-5th December 2018
ukoug3.png

Oracle Openworld may have San Francisco and the Moscone Center but you can’t beat Liverpool and the wind whipping in off the Mersey in early December, and so we’re very pleased to be presenting once again at the UK Oracle User Group Technology Conference & Exhibition running next week at the Liverpool ACC, December 3rd-5th 2018.

Mark Rittman is presenting two sessions on Oracle’s new autonomous analytics and data warehousing platforms:

  • “Planning a Strategy for Autonomous Analytics & Data Warehouse Cloud”, Monday 3rd December 2:30 PM - 3:15 PM in Room 13

    ”As Oracle Analytics and Data Warehousing becomes self-driving and autonomous, the need for a strategy within your BI function becomes all the more important. How you deliver BI content to your users, the skills your developers now need and the most efficient way to manage your cloud estate are vital components of an autonomous cloud analytics strategy; this session will explain what’s changed, what’s significant and what are the implications of that change.

    By reference to today’s Oracle BI technology and applications and explaining how they now run autonomously as services in Oracle Cloud we’ll cover what are the key changes and its implications, what this means in terms of budgets and planning your investment, what are the key technology and process changes, what this means for team resourcing and roles - and just as importantly, what can be safely ignored for now.”

  • “BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition”, Tuesday 4th December 12:35 PM - 1:20 PM Room 11A

    ”In this session, we'll look at the role of the data engineer in designing, provisioning, and enabling an Oracle Cloud data lake using Oracle Analytics Cloud Data Lake Edition. We’ll also examine the use of data flow and data pipeline authoring tools and how machine learning and AI can be applied to this task. Furthermore, we’ll explore connecting to database and SaaS sources along with sources of external data via Oracle Data-as-a-Service. Finally we’ll delve into how traditional Oracle Analytics developers can transition their skills into this role and start working as data engineers on Oracle Public Cloud data lake projects.”

Mark Rittman will be at the event all week, so if you’ve got any questions about the topics we’re talking about in the two sessions or any aspect of Oracle Analytics, big data or Autonomous Data Warehouse either drop us an email beforehand, give us a call on +44 7866 568246 or just stop us in the corridor and we’ll be pleased to chat then.

Five Thoughts About Thomas Kurian’s Move to become CEO of Google Cloud Platform

The news broke late on Friday that Thomas Kurian is joining Google to become their new CEO for Google Cloud Platform. Five thoughts on Kurian’s move:

  1. It’s a smart move made by an exceptionally smart guy. Brave, as well, given the recent history between Oracle and Google, but also not surprising given his drive and, presumably, a point to prove. I met him a few times as part of Oracle’s ACE Director program and he’s the only software exec I know who can talk long-term strategy and vision one minute and then explain to you in detail how it all works, doing it all with great humility and courtesy.

  2. The fact that GCP is spoken of as an also-ran at 10% market share whilst Oracle Cloud gets bundled in with the “Next 10” shows what a mountain Oracle have to climb to even become a contender to compete with Microsoft and Amazon in the cloud business - and their insistence on only allowing their SaaS and PaaS apps to run in Oracle Cloud is a worrying parallel with the “Windows Tax” that Microsoft’s Office and Server product teams had to pay back in the Steve Ballmer era, but with Oracle’s equivalent of Satya Nadella having lost the argument and jumped ship.

  3. But Oracle will survive, and this has happened many times before - Ray Lane, Charles Phillips, Tom Siebel and Marc Benioff all left and in many cases founded massively successful and competitive businesses; client-server gave way to internet architectures and the internet then gave way to cloud; it’s all part of how businesses renew and people move on and up, and there are plenty more smart (and most likely smarter) people left in Oracle, with Larry Ellison still just as driven, competitive and on top of his game.

  4. Look out for a very interesting (as in Brexit, interesting to watch but not to be part of) culture clash at GCP, with TK about the most top-down leader of a product development team I’ve ever seen and Google famously engineering-focused, beanbag-friendly and bottom-up. Add to that the feelings most Googlers have towards Oracle and TK will have his work cut out early on to win the various teams over - my guess is that his work ethic, technical chops and sheer ability will do just that, and if this role is a success, Sergey and Larry may well have found their new Eric Schmidt, but this time with technical skills to match theirs - but there’s always the chance that culture will prevail and he’ll be the next Marissa Mayer instead. No pressure there then...

  5. Expect to see GCP moving increasingly into SaaS B2B areas such as ERP, CRM, financials and industry-vertical applications to complement their commoditised IaaS and PaaS cloud business, leveraging their massive G Suite and, increasingly, ChromeOS install base. Just think what they could do if they had access to all the world’s structured business transactional data as well as the unstructured search, email and spreadsheet data they have now - even more comprehensive and market-leading search, the world’s biggest and most valuable ML training set, and a business model that could provide these applications for free in exchange for exclusive access to that data and Google being your default search engine. That’s the real existential threat to Oracle: spending all their time trying to win an unwinnable cloud infrastructure war, and then GCP coming along and making ERP, CRM and business applications essentially free.

Slides and Forbes.com Article from "Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse Cloud"

One of the sessions we delivered at Oracle Openworld 2018 was “Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse Cloud”, a presentation on the agility that Oracle Autonomous Data Warehouse Cloud (ADWC) can provide for teams within big, traditional businesses that would otherwise have to wait months to provision a physical Oracle data warehousing database. Moreover, ADWC instances can scale up and scale down as demand grows over time and hits short-term peaks such as Black Friday, which in the past would have required companies to pay for and provision that level of capacity all year round even though it sat idle most of the time.

The story behind the presentation was also written up as a profile article in Forbes magazine by Oracle’s Jeff Erickson, with a great description of how MJR Analytics works on client projects.

To understand the type of agility Rittman is talking about, look at how he works with a client on a data analytics question: “You want to help them get inside their data to see trends and patterns” that they couldn’t otherwise see, he says. That might mean quickly bringing in public data or data from a data-as-a-service company and connecting outside analytics tools. “I work fast,” he says. With a retail customer, for example, “we’ll look to understand the lifetime value of a customer, their cycle of purchases, repeat purchases—all kinds of scenarios. And the whole thing will happen in a day. Much of it collaboratively with the client on a web conference.”
— Jeff Erickson, "How A Big Business Can Use An Autonomous Database To Move Like A Startup"

But don’t just take our word for it, or even Forbes Magazine’s - our objective is for every client engagement to be referenceable where client confidentiality allows, and you can check out our first two testimonials, from Colourpop and from Florence, on our new Customers page … with a couple more to be added in the next week or so!