On Being Data Driven

"Data driven" is a buzzword many people/teams/companies throw around these days. Like many other buzzwords, it's often misused and misunderstood.

To many, "data driven" equals AB-testing. They setup a live AB test, look at metrics (e.g., click-through rate, revenue, etc.), and make a launch decision based on which treatment has the bigger number.

While valid, AB testing is only a small part of what being data driven means.

What it really means is something much more pervasive throughout the product cycle. The practices below seem obvious, but it's funny how often the obvious is forgotten.

  • Rely on data in product planning. Often, one is faced with many possible ways to iterate on a product and has to prioritize among them. Instead of relying purely on instinct or experience, add data to the mix. Use analysis to figure out which areas have the biggest gaps, which areas have the most to gain, and so on.
  • Invest in metrics infrastructure. Many companies have analysts whose full-time job is to run data warehouse queries and make pretty charts, often the same ones over and over. Though the charts can be useful or even critical, the process does not move the org forward. Instead, hire engineers to build the infrastructure to automate those charts, and use the analysts to discover new charts that should be automated.
  • Focus on data quality. Pick any internet company and its data warehouse is full of logs. Raw, dirty, noisy logs. Every team that wants to use data has to clean those logs from scratch, often in different ways. As a result, different people will come up with different numbers for the same question. Instead, have a single source of cleaned logs that can be universally accepted (a minimal sketch of such a cleaning pass follows this list). If different teams have different definitions of "clean", use the least common denominator.
  • Look beyond the top-line metrics. With AB testing, some people just look at top-line metrics such as revenue or clicks and decide based on those. But when asked why the metrics moved, the answer is often just a hypothesis. Validate that hypothesis with more data, for instance by segmenting the metric (see the second sketch below). If that data is not available, make it available and understand the cause behind the top-line movement.
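On the data quality point, here's a minimal sketch of what a shared cleaning pass might look like: one function that canonicalizes raw log records so every team reads from the same cleaned output. The field names and rules are hypothetical stand-ins for whatever an org agrees is the least common denominator of "clean".

```python
from datetime import datetime, timezone

def clean_log_record(raw: dict) -> dict | None:
    """Canonicalize one raw log record; return None to drop it.

    The fields (user_id, event, ts_epoch) and rules are hypothetical,
    standing in for an org's agreed-upon definition of "clean".
    """
    # Drop records missing the fields every team depends on.
    if not raw.get("user_id") or not raw.get("event") or "ts_epoch" not in raw:
        return None
    # Normalize timestamps to UTC ISO-8601 so joins across teams agree.
    ts = datetime.fromtimestamp(int(raw["ts_epoch"]), tz=timezone.utc)
    return {
        "user_id": raw["user_id"].strip().lower(),
        "event": raw["event"].strip().lower(),
        "ts": ts.isoformat(),
    }

def clean_logs(raw_records):
    """One shared pass over raw logs; every team reads its output."""
    seen = set()
    for raw in raw_records:
        rec = clean_log_record(raw)
        if rec is None:
            continue
        # Dedupe exact replays, a common source of inflated counts.
        key = (rec["user_id"], rec["event"], rec["ts"])
        if key in seen:
            continue
        seen.add(key)
        yield rec
```

The point is less the specific rules than that they run in exactly one place, so two teams asking the same question start from the same numbers.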
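And on the last point, a sketch of going one level below the top line: computing the same metric per segment, so a hypothesis like "mobile is dragging click-through rate down" can be checked against data instead of asserted. The "platform" dimension is a hypothetical field on the cleaned records above.

```python
from collections import defaultdict

def ctr_by_segment(events, dimension="platform"):
    """Break a top-line CTR down by one dimension.

    events: cleaned log records where 'event' is 'view' or 'click',
    each carrying a segmentation field such as the hypothetical
    'platform'. Returns {segment: click-through rate}.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for e in events:
        seg = e.get(dimension, "unknown")
        if e["event"] == "view":
            views[seg] += 1
        elif e["event"] == "click":
            clicks[seg] += 1
    # A flat top-line CTR can hide one segment moving sharply
    # while the others stand still.
    return {seg: clicks[seg] / views[seg] for seg in views if views[seg]}
```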