How to ensure accuracy in data reporting

This article explores the concept of a "single source of truth": centralizing your data pipelines and fostering an ongoing dialogue about data.

If you work with data, you’ve probably been in a meeting that goes something like this:

Analyst: “The number is X”

Business person: “But my other dashboard says this number is Y”

Analyst: *hopes for an earthquake to end her misery*

So, what went wrong? There are many sources of information, and they all show different numbers. How does this happen?

Here is an example of how data needs evolve at an organization. A stakeholder needs some information about app users. So, she chats with the data team. Then, data engineers create some pipelines and analysts write code to generate dashboards that visualize the information.

Every single time a data product like a dashboard is created, a whole set of assumptions and business logic rules is baked into it. For example, do we define churned users as “users inactive for more than 70 days”? Or “users inactive for 50 days”? Do we use table 1, which doesn’t count unregistered users? Or do we use the bigger table? Many decisions are made in the process of creating a data product.

So, say we created a dashboard that tracks user churn. A month passes, and a different stakeholder comes in and asks for another dashboard. Now business leadership has decided that churned users are defined as “users inactive for 50 days.” However, the previously created dashboard defines churned users as “users inactive for 70 days.” Unless everyone pays close attention to what is going on, there will be two conflicting numbers circulating in the organization.
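To make the conflict concrete, here is a minimal sketch of two dashboards counting churn over the same users with different thresholds. The user IDs, dates, and the `count_churned` helper are all made up for illustration, and "inactive for N days" is assumed to mean strictly more than N days since the last activity.

```python
from datetime import date

# Hypothetical sample data: user id -> date the user was last active.
LAST_ACTIVE = {
    "u1": date(2024, 1, 1),
    "u2": date(2024, 2, 10),
    "u3": date(2024, 3, 1),
    "u4": date(2024, 3, 25),
}

def count_churned(last_active, today, inactive_days):
    """Count users inactive for more than `inactive_days` days as of `today`."""
    return sum(
        1 for seen in last_active.values()
        if (today - seen).days > inactive_days
    )

TODAY = date(2024, 4, 1)

# The older dashboard still uses the 70-day rule.
old_dashboard = count_churned(LAST_ACTIVE, TODAY, inactive_days=70)

# The newer dashboard uses the 50-day rule.
new_dashboard = count_churned(LAST_ACTIVE, TODAY, inactive_days=50)

print(old_dashboard, new_dashboard)
```

Same users, same day, two different churn numbers. Neither dashboard is buggy; they simply encode different business rules.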

What are the issues associated with conflicting information? Well, for one, it becomes hard for business owners to make the right decision. As a result of bad decisions, the business might suffer. Also, data analysts begin to seem incompetent and business leaders lose trust in what they do.

SOLUTION: THE SINGLE SOURCE OF TRUTH

There is this heaven-like ideal my team has always strived towards. This nirvana is called “the single source of truth.” This is a state where all of your data products (e.g., dashboards, models) use the same pipelines and the same business logic, so all the numbers line up. Of course, that is easier said than done. And, it costs time and money. But, it is well worth it. So, how do we execute it?

First of all, stop the proliferation of dashboards and pipelines. Sometimes, it might seem that the more information we provide to the users, the more value we create. No. This is incorrect. Less is more. It is much better to have fewer dashboards that are all integrated with each other and use the same business logic. Put real effort into reviewing every dashboard you have created, and mercilessly delete anything that is out of sync with the current state of business rules and reporting standards. Do the same thing with data engineering pipelines.
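One way this could look in code, as a rough sketch rather than a prescription: keep the churn rule in a single shared place, and check an inventory of dashboards against it. Everything here is hypothetical, including the 50-day constant, the dashboard names, and the idea that each dashboard records the threshold it was built with.

```python
from datetime import date

# Assumption: the current business rule, written down in exactly one place.
CHURN_INACTIVE_DAYS = 50

def is_churned(last_active: date, today: date) -> bool:
    """Shared churn rule that every dashboard imports instead of re-implementing."""
    return (today - last_active).days > CHURN_INACTIVE_DAYS

# Hypothetical inventory: dashboard name -> churn threshold it was built with.
DASHBOARDS = {
    "churn_overview": 70,
    "weekly_retention": 50,
    "exec_summary": 50,
}

def stale_dashboards(inventory: dict[str, int]) -> list[str]:
    """Return the dashboards whose threshold no longer matches the shared rule."""
    return sorted(
        name for name, days in inventory.items()
        if days != CHURN_INACTIVE_DAYS
    )

# Candidates to update or delete during the review.
to_review = stale_dashboards(DASHBOARDS)
print(to_review)
```

When leadership changes the definition, the constant changes once, and the review list shows which dashboards have fallen behind.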

Also, make sure everyone on the analytics and business sides of the organization is always talking to each other. I know that we all dread having more meetings. They can seem like a colossal waste of time. But, when done right, meetings and Slack updates can save you lots of confusion.

Maintaining a single source of truth is an ongoing, painful but very worthwhile process. Remember that there is no state of perfection that you are striving towards. You are simply trying to run fast and keep up with the changes to the business and business reporting. I will leave you with a quote from ‘Through the Looking-Glass,’ the sequel to ‘Alice in Wonderland,’ where the Red Queen explains that you have to run just to stay in one place. It’s up to you whether you think this quote is inspiring or the opposite. I think it’s both.

“Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”

