Top 5 Principles for Building User-Friendly Data Tables
Having worked in data science and analytics for seven years, I have created and queried countless tables. Time and again I have found myself wondering, "What does this column mean?" "Why are there two columns with the same name in table A and table B, and which one should I use?" "What is the granularity of this table?"
If you've faced the same frustration, this article is for you!
In this article, I will share five principles that will help you create tables your colleagues will appreciate. Please note that this is written from the perspective of a data scientist, so it will not cover traditional database design best practices; instead, it focuses on strategies for making tables user-friendly.

I. Single Source of Truth
Maintaining a single source of truth for each key data point or metric is critical for reporting and analysis. The same calculation logic should not be repeated across multiple tables.
For convenience, we sometimes compute the same metric in multiple tables. For example, the Gross Merchandise Value (GMV) calculation might exist in the customer table, the monthly financial report table, the merchant table, and so on. These columns are then referenced in even more downstream tables, with even more variations, and over time they can diverge. Everything worked fine until one day a stakeholder came to us asking, "Why is the monthly GMV in this dashboard different from the other one? Is your data wrong?" When we dug into layer after layer of the data pipeline, we realized that last quarter we had agreed to include only cleared transactions in GMV, but had forgotten to make the change in every table… This hurts stakeholders' trust, and we end up spending more and more time investigating and maintaining an increasingly complex data model.
So how do we solve this? Maintain a single version of the GMV calculation in a dedicated table. Every other table that needs the GMV metric should then reference this table instead of implementing its own calculation logic.
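To make this concrete, here is a minimal sketch in Postgres-style SQL. All names (transactions, gmv_daily, monthly_gmv_report, and the column names) are hypothetical, not from any actual pipeline; the point is that the business rule "only cleared transactions count toward GMV" lives in exactly one place.

```sql
-- A minimal sketch; all table and column names are hypothetical.

-- One dedicated table owns the GMV definition, including the rule
-- that only cleared transactions count.
CREATE TABLE gmv_daily AS
SELECT
    merchant_id,
    CAST(created_at AS DATE) AS txn_date,
    SUM(amount)              AS gmv
FROM transactions
WHERE status = 'cleared'   -- the single place this business rule lives
GROUP BY merchant_id, CAST(created_at AS DATE);

-- Downstream tables reference gmv_daily instead of re-implementing the logic.
CREATE TABLE monthly_gmv_report AS
SELECT
    DATE_TRUNC('month', txn_date) AS month,
    SUM(gmv)                      AS monthly_gmv
FROM gmv_daily
GROUP BY DATE_TRUNC('month', txn_date);
```

When the GMV definition changes again, only gmv_daily needs to be updated, and every downstream table picks up the change on the next refresh.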
DON'Ts:
- Keep multiple versions of the same calculation across different tables.
DOs:
- Maintain one version of each key metric.
- Reference that source-of-truth table in downstream tables, rather than duplicating the logic everywhere.
II. Consistent Granularity
If a table is at the daily level, keep it daily. Mixing data of different granularities in a single table can lead to confusion and data integrity issues.
Here is a real example from my own work: I once created a table to report on Operations Service Level Agreement (SLA) performance. We had SLAs defined for different Operations workflows and wanted to track how often we met them. The SLA was defined at two levels: the touchpoint level (each interaction) and the case level (the entire process); a case can involve back-and-forth and be touched by multiple people on the Operations team. We wanted each touchpoint to be completed within a specific time limit and the whole case to be resolved within a given time range. I thought it would be much easier to dump one table into BI tools for reporting, so I created a table mixing the two granularities. It had a column called `sla_level` with two values, `touchpoint` and `case`, and I hoped people would filter on a specific SLA level to get the metric they needed.
Why is this problematic? Not everyone had the context I just explained. People often did not include this filter in their reporting and ended up double-counting cases, since each case showed up at both the case level and the touchpoint level.
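To see the failure mode, imagine the mixed table is called sla_performance with a boolean met_sla column (both names are hypothetical). An innocent-looking aggregate silently inflates the numbers:

```sql
-- Looks reasonable, but every case is counted more than once:
-- once from its 'case' row and again from each of its 'touchpoint' rows.
SELECT COUNT(*) AS rows_meeting_sla
FROM sla_performance
WHERE met_sla = TRUE;
-- Correct usage requires remembering: AND sla_level = 'case'
```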
What should I have done instead? Create two tables, one at the touchpoint level and another at the case level, explain their differences clearly in the table documentation, and use the appropriate one for each use case.
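Sticking with the same hypothetical names (the columns touchpoint_id and case_id are likewise assumptions), the fix is mechanical: split the mixed table into one table per granularity, so no filter is needed to use either one correctly.

```sql
-- Each row is exactly one touchpoint.
CREATE TABLE sla_touchpoint AS
SELECT touchpoint_id, case_id, met_sla
FROM sla_performance
WHERE sla_level = 'touchpoint';

-- Each row is exactly one case; aggregates can no longer double-count.
CREATE TABLE sla_case AS
SELECT case_id, met_sla
FROM sla_performance
WHERE sla_level = 'case';
```

Each table now carries a single granularity, and its documentation can state that granularity in one sentence.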
DON'Ts:
- Mix rows or columns with different data granularities in the same table. This could lead to misuse of your data and incorrect analysis.
DOs:
- Keep only one granularity in each table.
III. Descriptive Naming Conventions
I have to admit that I have created temp tables called `txn_temp`, `tnx_temp_v2`, `tnx_subset_v3`, `txn_final`