I Asked 3 Data Analysts The Same Question. The Answers Made Me Start My Startup
In my previous company, I worked as a team lead of data analysts and engineers. It was a medium size SaaS product development company.
Basically, product managers, marketers, salespeople (etc) came to us with requests, and we tried to deliver them ASAP. We collected the usage data from the web application into our data warehouse with Segment.
We avoided Google Analytics, Mixpanel and similar for security and privacy reasons. Also, transferring the data to external tools would have increased our operating costs significantly.
All this meant data analysts answered all product-related questions in the data warehouse.
Measure With (Guilty) Pleasure
One of the main metrics the company wanted to track was cohort user retention. With that, we wanted to see what impact the application changes had on newly acquired users.
Measuring cohort retention using SQL seemed simple enough as we already had all user behavioural data in our data warehouse. However, a devil sitting on my shoulder nudged me to do a little experiment.
I asked the three data analysts in my team the same question without them knowing about it:
“What is the week-over-week retention of our users that visited our website on the week of 2023–01–02?”
Who Was Right?
To my surprise, all three data analysts came back with different results. Down below you can see the differences (for the cohort 2023–01–02).
The charts describe the percentage of users visiting the application week over week after the initial week of 2023–01–02.
Here are the 3 SQL snippets. I simplified them a bit so they are easier to read.
Junior Analyst’s Code
This one has an obvious flaw. In the JOIN the user IDs are ignored. The final SELECT doesn’t care if the same users are retained as in the initial week.
First Senior Analyst’s Code
This one was cool as it was very easy to read. However, my main problem with it is that it can be difficult to extend and iterate on. Also not very suitable for BI tools for the same reason.
Second Senior Analyst's Code
I liked this one much more, as it was robust and easy to extend. For example, switching from week-over-week to month-over-month would require only 2 changes and a couple of renamings.
And The Winner Is! But…
We kept the second senior analyst’s solution. However, it got me thinking:
- It took (on average) 1 hour to write the queries.
Is this the most efficient way to get these results?
- The second analyst’s results were very close to the first’s.
Who was correct?
(hint: neither of them, which I will cover in a future blog post 😃)
- We get similar many product usage-related questions about user funnels and segmentation every day.
Can’t these tasks be automated?
My First Startup Project
With this short story, I wanted to highlight 3 problems with product analytics in the data warehouse:
- It can be tediously slow. Product managers, sales, and marketing people in an organisation have hundreds of similar questions daily. Data analysts have more critical tasks than answering these questions.
- With people from different backgrounds and experiences, the results will be just as different for each product analytics question. The answers depend on who you ask.
- I believe non-techy people should be able to gain insights from company data without a middleman or copying data to external tools like GA or Mixpanel.
These problems motivated me to start working on my first startup project mitzu.io.
Mitzu is:
- No-code
- Self-served
- Product analytics web-application
- Supports: Snowflake, Databricks, Redshift, Athena, Postgres, MySQL, Trino, etc.
- Handles retention, user funnel and event segmentation type of questions (and more)
Cohort user retention in Mitzu
Putting together a cohort user retention metric takes less than a minute.
Mitzu generates a SQL that was reviewed by multiple data analysts and data scientists.
It is hard to read, but that is not the goal here.
Conclusion
Product analytics with SQL over the data warehouse:
- can be very slow
- produces inconsistent results
- scales poorly since data analysts are a bottleneck
Mitzu is a tool that can help any product development company analyze usage behavioural data natively from the data warehouse.
We have a cloud version of Mitzu that you can try out at:
https://beta.mitzu.io