10 Ways Your Modern Data Stack Project Can Fail

The rise of the modern data stack has made it easier, cheaper and faster than ever for businesses to centralize their data. Built on cloud-based services, the modern data stack enables businesses to provide reports and KPI dashboards for business end-users, and to activate that data to improve the efficiency and effectiveness of their customer interactions.

Over the past few years, we’ve engaged with businesses in the “discovery” or “adoption” phase of a modern data stack project. These teams were using services such as Fivetran, Snowflake and Looker to centralize one or two data sources in a cloud data warehouse and to build dashboards for “early-adopter” departments and end-users.

Those businesses are now moving through the adoption phase and are experiencing common concerns. These include:

  • Data quality

  • Data velocity 

  • Data architecture 

We have also noticed that businesses are struggling to use their data to hit business goals and objectives, due to a lack of clarity and direction in their data stack projects.

With ever-increasing cloud infrastructure costs, the modern data stack could well be entering the “trough of disillusionment” stage in the classic “hype cycle” diagram of technology adoption.


So what are the main ways that your data projects can go wrong and turn your modern data stack into an expensive failure? 

Our specialist data team have produced a list of 10 of the most common pitfalls that many businesses are experiencing: 

1. Unreliable Data that Users Don’t Trust

Everyone’s got a new dashboard, but they still don’t trust the numbers. The modern data stack makes it easy to replicate your SaaS data into a warehouse and build dashboards for users, but how do you guarantee that all of the data is correct and keeps your users’ trust as they use it for decision-making?

The answer is to make testing central to your analytics development workflow, and automate that testing so that it becomes embedded in how your data team works. 

This means testing your assumptions about new data sources, testing any new features and functionality provided to end-users, and testing data once it’s in production: observing the shape and skew of data that passes all of your validity checks but is clearly implausible in real life.
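In a dbt-based workflow, for example, much of this testing can be written as plain SQL. Here’s a minimal sketch of a dbt-style “singular” test, a query that returns the rows violating an assumption so that the test fails if any rows come back; the orders table, its columns and the thresholds are all hypothetical, chosen for illustration:

```sql
-- A dbt-style singular test: the test fails if this query returns any rows.
-- Flags orders that pass basic validity checks but are implausible in
-- real life: future-dated, negative, or suspiciously large values.
select
    order_id,
    order_date,
    amount
from analytics.orders               -- hypothetical table; in dbt, use {{ ref('orders') }}
where order_date > current_date     -- an order dated in the future
   or amount < 0                    -- a negative order value
   or amount > 1000000              -- implausibly large for this business
```

Run automatically as part of every deployment, a library of tests like this catches bad data before your end-users do.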

2. Data Centralization too Complex to Scale

As new data sources are added to your warehouse, centralizing your data becomes increasingly complex.

When none of your systems is the definitive source of customer, product, sales or other key business entities, a design approach of working it out as you go along can quickly leave you with a data model that’s complex and confusing, grinding your project to a halt.

Scaling your data centralization project beyond a trivial number of sources requires your team to have a set of design patterns and a layered warehouse design, such as the one we use on our client projects (see the sketch after this list), that:

  • combines identity across multiple systems when none of those systems are the definitive source of that data

  • deduplicates multiple sources of customer data even from within the same source system

  • keeps the velocity and agility of project delivery consistent even as complexity increases

  • and, most importantly, makes sure that numbers are trusted, tested and actually add up properly
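As a concrete illustration of the deduplication pattern, here’s a minimal SQL sketch; the source systems, table names and columns are all hypothetical. It merges customer records from two systems on a normalized email address and keeps the most recently updated record as the survivor:

```sql
-- Sketch of a simple identity-resolution / deduplication pattern.
-- Source systems, tables and columns are hypothetical; real projects
-- typically match on several identifiers with more robust normalization.
with all_customers as (

    select 'crm' as source_system, customer_id, email, updated_at
    from crm.customers

    union all

    select 'billing' as source_system, customer_id, email, updated_at
    from billing.customers

),

ranked as (

    select
        *,
        -- group records sharing a normalized email; rank freshest first
        row_number() over (
            partition by lower(trim(email))
            order by updated_at desc
        ) as rn
    from all_customers

)

select source_system, customer_id, email, updated_at
from ranked
where rn = 1
```

Real-world identity resolution usually matches on a combination of identifiers (email, phone, postal address) with fuzzier matching rules, but the survivorship logic follows this same pattern.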

3. Pace of Delivery too Slow for the Business

A new data stack is of little value when it is unable to provide the data you need – when you need it. And, as your business scales, that demand for more data will only grow stronger.

The pace of delivery for your data team needs to match the velocity of your business; otherwise it’ll quickly become irrelevant, and you’re back to the bad old days of decisions being taken by the “loudest person in the room”.

Addressing this issue of project velocity requires three things to be in place:

  1. Data team leadership with experience and an ability to communicate clear and achievable goals

  2. An analytics delivery approach that delivers “right first time” at a predictable and rapid pace

  3. Access to design patterns, toolkits and accelerators that help your data team complete the simple tasks quickly, freeing them up to focus on the more complex problem-solving tasks

4. Misalignment with Business Goals

Just as fatal to the success of your data stack are projects that deliver, but which are misaligned with your business goals. However hard the data team works and regardless of how innovative their approach, a data team and data stack that fall short of solving real business problems will struggle to drive the business forward.

Avoiding this issue depends on two things: making sure that your analytics team and data stack are consistently being used to solve complex business problems that deliver near-term, quantifiable and tangible benefits for key, influential stakeholders; and ensuring that your pace of delivery is sufficient, with a focus on solving enough of a problem to get something of value delivered, rather than satisfying technical curiosity.

5. Lack of Experience with New Technologies

Building a modern data stack requires new tools, technologies and processes, but a central challenge is that your new data team lacks experience in this environment.

There just aren’t many Heads of Data available for hire who have built a modern data stack; those who have are expensive, and right now it’s unlikely you’d get sign-off for a new full-time hire at this level at all.

A better approach is to bring in a consulting partner such as Rittman Analytics who has worked with, and built data stacks for, hundreds of venture-funded, fast-growing businesses.

Your consulting partner should also be deeply plugged into the networks of the innovators in the space, in order to provide your data team with the context and understanding they need of exactly how these tools work.

At Rittman Analytics, we’re typically engaged after Series A and Series B funding rounds, or when a modern data stack project gets the go-ahead in a more established business, and we usually work directly with the CTO and executive team. We set up analytics infrastructure and processes and build the core warehouse, metrics and reports the business needs.

As our clients’ businesses grow, we typically transition to more of a supporting role, helping clients grow their data teams by recruiting, interviewing, and training new team members on best practices and our delivery approach.

6. Lack of Planning for Support and User Enablement

Smart data teams enjoy learning and solving new problems; building your new data stack is one of the most interesting and rewarding projects your team can work on right now. But what do you do when they move on, taking their new skills away with them to another challenge or opportunity? Who do you have left to maintain and extend the data stack that is so critical to the growth of your business?

Consulting partners such as Rittman Analytics address this need by providing ongoing support for data stacks that they or your team built. This is typically structured through a monthly package of hours and services that maintain and monitor your analytics infrastructure. Your consulting partner should also be able to help you identify and resolve any issues caused by source data changes, giving you the reassurance that everything is in safe hands.

7. Cloud Infrastructure Costs getting Out-of-Control

Back in the old days of on-premises data warehouses, you knew when it was time to start looking at your query costs when you couldn’t fit any more servers in the server room.

Today, the only limit to how many Snowflake credits or BigQuery slots your data stack’s queries can consume is the limit on your company credit card, and tuning and optimizing the SQL generated by your BI tools and transformation processes requires specialist skills.

Keeping your cloud infrastructure costs under control comes down to three things:

  1. A development process that considers the cost of ownership of your data pipeline and analytics platform and tests that it runs within acceptable cost parameters

  2. A monitoring and alerting process that identifies queries and pipeline processes that exceed expected cost limits (see the sketch after this list)

  3. Experience within the delivery team to tune and optimize the SQL running in your data warehouse
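To illustrate the monitoring step, here’s a sketch of the kind of query you could schedule against BigQuery’s INFORMATION_SCHEMA jobs view to surface the most expensive queries of the past week. The region qualifier and the $5/TB on-demand rate are assumptions; check the pricing and region for your own project:

```sql
-- Surface the most expensive BigQuery queries from the last 7 days.
-- Assumes the US multi-region and an on-demand rate of $5 per TB billed;
-- adjust both for your own project.
select
    user_email,
    job_id,
    total_bytes_billed / pow(10, 12)     as tb_billed,
    total_bytes_billed / pow(10, 12) * 5 as approx_cost_usd,
    left(query, 100)                     as query_snippet
from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
where job_type = 'QUERY'
  and creation_time > timestamp_sub(current_timestamp(), interval 7 day)
order by total_bytes_billed desc
limit 20
```

Feeding the results into an alerting channel turns this from an occasional report into the monitoring process described above.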

8. Data Warehouse Platforms that Don’t Scale

Of course, the reason nobody wants to be a DBA today is that with Snowflake, BigQuery and managed service Postgres databases, you don’t need one as they’re largely self-managing. No partitioning needed for Snowflake, no sizing needed for BigQuery, and you can even run your transactional and ad-hoc query workloads on the same Postgres database.

The challenge comes, however, when your data stack scales to the point where this hands-off, “it-just-works” approach reaches its limit and the only choice is to throw more and more money at the problem … or accept that this is the limit of how far it will scale.

Or you could work with a partner such as Rittman Analytics who has experience working with database technologies such as Firebolt, designed to work economically and efficiently at scale. All database technologies have “sweet spots” and scenarios that they optimize for, and what you need is a partner on your team who knows when your current approach has reached its limits, what alternatives would provide a better solution, and what the new trade-offs of that solution would be.

9. Data Pipelines that are Always Breaking

Your data stack is a success until Engineering releases a new version of your back-end application and your data pipeline breaks. It’s something that seems to happen every few weeks when your engineering and data teams aren’t aligned, and the resulting outages mean that vital dashboard KPIs aren’t updated for hours, even days.

Solving this problem is part process and part technology. Regular communication and a productive relationship between your data and engineering teams can help identify and resolve potential issues before they impact your dashboard availability; thorough testing and alerting, sandbox and staging environments, and newer concepts such as data contracts provide a smart and thorough back-stop.
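On the technology side, a lightweight version of a data contract can be as simple as a scheduled test that detects schema drift in a source table before it breaks your downstream models. In this sketch (the dataset, table and column names are hypothetical, and the syntax is BigQuery-flavored), any rows returned indicate a contract violation:

```sql
-- Detect schema drift in a source table before it breaks downstream models.
-- Dataset, table and expected columns are hypothetical; in practice the
-- expected list would live in version control as part of a data contract.
with expected as (
    select column_name
    from unnest(['order_id', 'customer_id', 'order_date', 'amount']) as column_name
),

actual as (
    select column_name
    from raw_app.INFORMATION_SCHEMA.COLUMNS
    where table_name = 'orders'
)

-- columns the contract expects but the source no longer provides
select 'missing' as issue, column_name
from expected
where column_name not in (select column_name from actual)

union all

-- columns the source added that the contract doesn't yet cover
select 'unexpected' as issue, column_name
from actual
where column_name not in (select column_name from expected)
```

Wired into your alerting, a check like this turns a silent breaking change into a conversation between your data and engineering teams before the dashboards go stale.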

10. You Built It … but They Didn’t Come

Perhaps the worst outcome for a modern data stack project is when it’s built as planned but fails to get the level of user adoption you’d hoped for: a self-service BI project where delivery of new dashboards and reports is still bottlenecked through a small central team; or, the worst fate of all, when your expensive new BI tool is only used to export data to Microsoft Excel!

The reality is that building your warehouse and doing the technical work is only the end of the beginning, not the beginning of the end. Unless you plan now for enabling your end-users to use the platform you’ve built, and for activating your data to innovate and transform your business, a strategy of “build it and they will come” will most likely lead to an expensive failure.

IS YOUR DATA STACK READY FOR 2023?

Now you’ve seen some of the most common pitfalls, why not get in touch with us for a deep dive into how well your Modern Data Stack implementation is performing?

Our new Modern Data Stack Healthcheck service gets to the detail of how you’re currently using and building your data stack, and compares it to industry best-practices. Together we’ll evaluate whether or not your investment has delivered the value it should have, create an action plan and work with your team to make your data stack world-class.

Find out more

Rittman Analytics is a boutique analytics consultancy that specializes in analytics engineering and the modern data stack. We’ve successfully implemented more than 50 modern data stacks for clients in the UK, USA and Europe, are a dbt Preferred Consulting Partner and blog regularly on modern data stack topics at https://rittmananalytics.com/blog

Mark Rittman

CEO of Rittman Analytics, host of the Drill to Detail Podcast, ex-product manager and twice company founder.

https://rittmananalytics.com