Counterfactuals, LLC

Tyler Hanson

When it Comes to BI, Less is Still More

How I Leveraged Snowflake and dbt to Create a Sustainable Tableau Implementation for Tango

2023-11-06

Introduction

Since early 2023, I’ve been working with the team at Tango to implement a modern data stack for their analytics and reporting needs. A key goal of the initial engagement was to implement Tableau as their source of truth for business intelligence (BI).

Tango is a Chrome extension and desktop app that allows users to automatically generate rich how-to guides by walking through the desired steps on their own computers. Their business operates on a “product-led growth” model, meaning that many of their paid customers start off as users of the free tier of their product. Collecting and analyzing product usage data is therefore instrumental to driving their sales and revenue processes, and a robust, reliable BI practice is essential as they continue to grow. I wanted to make sure that both Tableau and the upstream components of the data stack were implemented from the start in a way that was effective and sustainable.

Why Do BI Implementations Fail?

Having worked as a data analyst for over ten years, I’ve seen my fair share of less-than-ideal BI implementations. More often than not, even the best setups eventually decay into a mess of impossible-to-navigate, contradictory dashboards. “Dashboard” can even become a dirty word for analysts who spend all of their time chasing down why two time series don’t match or satisfying constant requests from business users for small updates and cosmetic tweaks. It’s no better for end users, who become frustrated when they can’t find the answers they’re looking for, or have to wait weeks or months for the data team to implement even the smallest of updates.

Along with being a poor experience for everyone involved, a messy BI implementation will also inevitably reinforce bad practices and culture around the use of data. Instead of using dashboards as a launching point for thinking critically about business success and conducting more thorough analysis, end users are conditioned to focus on the minutiae of the specific reports that data analysts spend all of their time fixing and adjusting.

I stress the sustainability of BI implementations because the ultimate end-state for a poor implementation is failure: fed up with an out-of-control mess, organizations will make the decision to wipe the slate clean and migrate to a new tool - a process that’s expensive and painful for everyone involved. Worse yet, without any fundamental changes to strategy, another migration is inevitable a few years down the road.

From my experience, BI practices tend to fail for a few key reasons:

  • Too many dashboards: they proliferate faster than they are retired, until nobody can say which one is authoritative
  • Too many updates: dashboards change and refresh so often that analysts spend their time on maintenance rather than analysis
  • Too many data points: dashboards draw on overlapping tables and fields, which makes redundant and contradictory numbers inevitable

In other words - there are too many dashboards updating too often and accessing too many data points.

Enter the Combined Power of Snowflake, dbt, and Tableau

When I started working with Tango, they had already implemented Snowflake for some limited reporting usage, and we agreed to implement Tableau as their BI tool. I was excited to get started because I knew that between Snowflake, Tableau, and dbt Cloud I would be able to set up a simple but scalable BI practice.

My goal with the Tableau implementation was to maximize the amount of reporting and analysis that could be achieved with the minimum number of dashboards and data sources. Fewer dashboards meant lower maintenance and faster turnaround on new requests. If the same dashboard could be used for multiple use cases, it would also introduce economies of scope, allowing new teams to get the reporting they needed from Tableau much faster.

Setting up Tableau Data Sources

Tableau allows for a great deal of flexibility in how data can be accessed by dashboards - possibly too much flexibility. For Tango’s implementation, I followed these guidelines:

  • All data sources are extracted daily onto the Tableau server, rather than being live connections against Snowflake
  • All data sources are extracted as-is from their source tables in Snowflake, without any additional custom SQL

As a data transformation tool, dbt becomes especially useful for managing the business logic stored in the tables that Tableau extracts from Snowflake. There's no need to include additional custom SQL in Tableau if column definitions can be managed in a documented and version-controlled fashion in dbt. This does mean that any new or updated column requires opening a pull request against Tango's dbt project, but that extra friction is generally a positive addition, since it creates opportunities for testing and feedback.
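
As a rough sketch of what this looks like in practice (the model and column names here are illustrative, not taken from Tango's actual project), a business-logic column is just another expression in a dbt model, reviewed and versioned like any other code change:

    -- a derived column defined once in a dbt model, rather than in per-dashboard custom SQL
    select
        user_id,
        signed_up_at,
        plan_tier,
        (lifetime_revenue_usd > 0) as is_paying_customer
    from {{ ref('int_users_enriched') }}

The matching schema.yml entry can then carry the column's description and tests, so the definition, its documentation, and its validation all ship in the same pull request.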

dbt is also useful for limiting the scope of tables available to Tableau. Within Tango's dbt project, I've defined a “reporting layer”: a set of data models that are designed exclusively for use in Tableau. I've even configured the permissions of the Tableau Snowflake user so that it can only access models in the reporting layer.
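
A minimal sketch of that permission setup in Snowflake might look like the following (the database, schema, and role names are placeholders, and this assumes the reporting models are materialized as tables):

    -- scope the Tableau role to the reporting schema only
    grant usage on database analytics to role tableau_reporting;
    grant usage on schema analytics.reporting to role tableau_reporting;
    grant select on all tables in schema analytics.reporting to role tableau_reporting;
    -- cover reporting models added later without another permissions change
    grant select on future tables in schema analytics.reporting to role tableau_reporting;

Because the grants are scoped to a single schema, adding a new reporting model never requires touching permissions, and nothing outside the reporting layer is ever visible to Tableau.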

Within the reporting layer, I've consolidated data models such that there is only one model for each grain of data - a grain being a column or combination of columns that uniquely identifies a relevant entity. For example, a reporting model for Tango users includes one row per Tango user ID. Using this model, it's possible to report on new signups, activation, or any other metric derived from user-level data. This can also include metrics aggregated from multiple different data sources - for example, site visit data logged by Segment and revenue data from Stripe. Following a “one big table” paradigm, there's no need for Tableau to access any additional tables at the user level - for any new reporting use case, additional columns can simply be added to the users reporting table. This makes it much less likely that redundant or contradictory columns will exist across different tables.
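
As a simplified sketch of what a user-grain reporting model might look like (the source models and columns below are hypothetical, not the actual project), each upstream source is aggregated to the user grain before joining, so the final output stays at one row per user ID:

    -- models/reporting/rpt_users.sql (hypothetical): one row per user_id
    with site_visits as (
        select user_id, count(*) as total_page_views, min(visited_at) as first_visit_at
        from {{ ref('stg_segment__page_views') }}
        group by 1
    ),

    revenue as (
        select user_id, sum(amount_usd) as lifetime_revenue_usd
        from {{ ref('stg_stripe__invoices') }}
        group by 1
    )

    select
        u.user_id,
        u.signed_up_at,
        u.plan_tier,
        coalesce(v.total_page_views, 0) as total_page_views,
        v.first_visit_at,
        coalesce(r.lifetime_revenue_usd, 0) as lifetime_revenue_usd
    from {{ ref('stg_users') }} as u
    left join site_visits as v on u.user_id = v.user_id
    left join revenue as r on u.user_id = r.user_id

When a new user-level reporting need comes up, the pattern is to add a column to this model rather than to introduce another table at the same grain.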

This “one model per grain” strategy also increases the efficiency of Tableau extracts: if multiple teams use the same data source for different dashboards, they can all draw from the same extract.

Building Tableau Dashboards for Flexible Self-serve Analytics

I'm a big fan of the flexibility that Tableau dashboards can provide via the smart use of dashboard parameters and filters, and this is a big part of the strategy I've pursued to keep Tango’s dashboards more manageable.

Just as I've sought to minimize the number of data sources used by Tableau via flexible data models, I've also created flexible dashboards that can satisfy multiple use cases by adjusting filters and parameters. This reduces many inbound data requests to simply finding the right combination of filters and parameters that yields the desired metrics, without having to make any updates to dashboards or data models.

For example, most of Tango’s dashboards include a “slice” parameter that controls the breakdown dimension of plots or tables. This essentially replicates a group by operation in a SQL query. Slicing is not a built-in Tableau feature, but it can easily be set up by putting a list of fields in a drop-down parameter and tying the value of a calculated field to the selected value.
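
As a rough sketch in Tableau's calculation language (the parameter and field names here are made up for illustration), the calculated field behind the slice is a simple CASE expression over a string parameter:

    // calculated field "Slice Value", driven by the string parameter [Slice]
    CASE [Slice]
        WHEN "Plan Tier"     THEN [Plan Tier]
        WHEN "Signup Source" THEN [Signup Source]
        WHEN "Company Size"  THEN [Company Size]
    END

Dropping the "Slice Value" field onto rows or color then behaves like swapping the grouping column in a SQL query, with no change to the underlying data source.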

Filters, on the other hand, are a key built-in Tableau feature. When multiple charts are tied to the same data source, it becomes much easier to manage a standard set of filters that apply across an entire dashboard.

Finally, parameters can be used to create dynamic metrics based on a user’s selection. This can be especially useful for metrics like user activation, which might have multiple definitions based on different time windows or activities. Instead of maintaining multiple charts for different definitions, the necessary logic can be centralized into a single chart.
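
For instance (again with hypothetical field and parameter names), an activation flag driven by an integer parameter for the time window might be defined as a single calculated field:

    // calculated field "Is Activated", driven by the integer parameter [Activation Window (Days)]
    IF DATEDIFF('day', [Signed Up At], [First Workflow Created At]) <= [Activation Window (Days)]
    THEN 1 ELSE 0 END

Changing the parameter re-computes the metric across every chart that uses it, so one chart can cover what would otherwise be several near-duplicate views.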

None of these elements should be mind-blowing or unfamiliar to an experienced analyst, but having the discipline to apply them consistently means that new reporting requests can most often be satisfied quickly with existing dashboards - minimizing the amount of time that the data team needs to spend updating and maintaining dashboards. If a new slice, filter, or metric does need to be added to a dashboard, this can often be completed in as little as five minutes if the necessary field already exists in a reporting table.

More importantly, these flexible dashboards enable a more thoughtful data culture for end users. Instead of poring over endless pages of pre-configured charts, users are encouraged to explore and decompose key metrics using the slices and filters available to them. If a key metric like signups or revenue has trended in a new direction, stakeholders can often begin to answer the question of why without having to ask the data team to commit to any ad-hoc work. This ensures that the data team’s time is reserved for the highest-value analysis.

Beyond Implementation

Setting up a successful BI implementation from scratch is one thing; maintaining the quality of that implementation over a longer period of time also requires effort and forethought. As Tango’s analytics practice matures, I’m planning to conduct regular audits (roughly once a quarter) of dashboards and data models to keep quality consistent.

Since the number of dashboards being maintained for Tango is relatively low, evaluating all of the live dashboards does not require an inordinately large time commitment. Some dashboards might have been set up to centralize metrics for a time-sensitive topic, like a new product launch. Others may have been useful at a certain point in time, but their functionality has since been supplanted by newer, more general-purpose dashboards. In addition to consolidating dashboards, we’ll also look to remove any slices or filters that are no longer actively used, in order to streamline the end-user experience.

Using a combination of Tableau’s built-in viewership analytics and direct conversations with the users of these dashboards, it’s straightforward to identify candidates for dashboards and features that can be retired or consolidated. This process has the side effect of giving the data team an opportunity to check in with stakeholders and learn more about their use cases. Whether or not anything is actually retired, we’ll be able to document important information about usage for the future, and we may well come away with requests for updates and additions.

Any changes that are made to dashboards will also provide an opportunity to remove unnecessary fields and models from Tango’s dbt project and to conduct any refactors that might simplify key models.

This process for monitoring the quality and usefulness of dashboards and data models over the long run can help Tango avoid an expensive and painful BI migration. As someone who’s had to bear the brunt of several migrations, I consider that the ultimate criterion for success.