Product analytics with RudderStack and Mitzu
In this short post, I will show you how to set up RudderStack with BigQuery for Mitzu warehouse-native product analytics.
Introduction
Short intro to RudderStack CDP
For people who don't know RudderStack, it is a customer data platform (CDP) that gathers and enriches usage data collected from your website or application and puts it into your data warehouse.
RudderStack is one of the few CDPs that is warehouse-native. Being a warehouse-native has many benefits for organizations of any size:
- Full ownership and transparency over your data
- Complete "customer 360" view accessible with SQL
- Secure as all data is accessible only by you and your colleagues
- Cost-effective
Short intro to Mitzu
Mitzu is a warehouse-native product analytics application directly connecting to your data warehouse. It autogenerates native SQL queries executed on top of your data warehouse. The generated SQL queries can answer various business questions in product and marketing analytics. The top business question categories are:
- User segmentation
- Conversion rates (marketing or feature adaptation)
- Retention rates
- User journeys
Benefits of using Mitzu warehouse-native PA compared to traditional PA like Mixpanel or Amplitude:
- Accessing all data that is present in the data warehouse, not just product usage logs
- Enhanced data security
- The most cost-effective approach for product analytics
- No vendor lock-in
- Fits well into a company data strategy once data is centralized in a data warehouse
Benefits of using RudderStack with Mitzu
The main point of RudderStack is to collect all usage analytics events in a single location, such as a data warehouse or data lake. You have all the data required to make good product and marketing decisions if you already maintain your data warehouse.
The only missing piece in the puzzle is enabling all product managers and marketers to access that data. Since product managers and marketing specialists are typically not fluent in SQL or Python, specialized tools would be needed to access insights that enable them.
Enter Mitzu, which is a tool that enables non-technical employees to access data warehouse andlearn key insights from the product data.
Setting up RudderStack and Mitzu
In this tutorial, I will use BigQuery as a data warehouse. BigQuery is a great data warehouse with a generous free tier. It is also extremely simple to set up and maintain.
Configuring BigQuery data warehouse
The prerequisite for this is to have a GCP account.
Here is a great guide on how to set up BigQuery and a storage Bucket in GCP for RudderStack.
https://cloud.google.com/bigquery/docs/bigquery-web-ui
This should take less than 10 minutes to set up. If you didn’t yet have a data warehouse, now you have it 😊.
Here is an example of how your BigQuery will look after connecting it to RudderStack.
Data collection with RudderStack
RudderStack has many data sources and SDKs to collect data from your websites or mobile applications. This post will not cover how to set up tracking with RudderStack. I will only cover how to move data to BigQuery.
Let's imagine we already have two data sources:
- Marketing page events (Website source)
- Application event logs from the server side (Application server events)
We also have two destinations:
- Our newly created BigQuery data warehouse
- Postgres DB - this is just for demonstration of how you can multicast events in RudderStack
This setup will collect all events from our marketing website and the server side of our application. It will then ingest this data to both BigQuery and, after some filtering, Postgres.
If your configuration is correct, you should be able to see a similar chart on the Events Trend tab in RudderStack.
Configuring the BigQuery destination is easy. You can find all the information here. Once the connection is set up, after a while, your BigQuery data set should look something like this:
Each event tracked on the landing page or the application should be stored in a single table.Each table is partitioned by the events' timestamp, enabling efficient data processing.
Connecting Mitzu to RudderStack
All events in our BigQuery data warehouse can be processed with SQL or Python.
The BigQuery dataset that contains our event tables allows you to connect Mitzu with a few clicks.
First, set up the connection to BigQuery from Mitzu. You can reuse the same service account created for RudderStack integration for simplicity.
Next, add the entire dataset as a schema to Mitzu. Navigate to the Event tables tab and hit Add schema. This will ensure that new events are synced with a single click to Mitzu.
Note RudderStack creates these important columns in all tables:
- user_id - will contain the users’ identifiers
- timestamp - will contain the events’ timestamp
- _PARTITIONTIME - this is specific to BigQuery only, this column should be used as the date partition field. Based on this column BigQuery can optimize product analytics queries.
Hint: Make sure only the tables, and not the views, are synced.You can do this with the following regular expression: .*(?<!view)$
Once you are done select the newly added Schema and hit Fetch Selected
. You should see something like this. All tables are listed from the BigQuery.
Next, click Save & update
event catalog
. This will ensure that all tables are represented as product events inside Mitzu.
For your landing page events, RudderStack creates a table called pages
, which is not a typical event table. This table may only have the anonymous_id
as user ID.So you need to reconfigure this table by selecting and clicking on the Configure
table
button.
Product analytics with Mitzu and RudderStack
Once your tables are listed and you have saved and updated your event catalog, you can head to the "insights" page.
I will show you some examples of what you can do with Mitzu and the data in BigQuery.
Start with segmentation:
Create your first funnel insight:
Measure user retention:
Discover user journeys:
Summary
In this post, I covered how to set up RudderStack with BigQuery for warehouse-native event tracking. This setup created separate tables in BigQuery for each event tracked with RudderStack, making it ideal for SQL and Python-based analytics.
This setup would work with other data warehouses like Clickhouse, Databricks, or Snowflake. I used BigQuery because it is very simple to set up.
I then connected Mitzu to BigQuery and indexed the event tables in the dataset. This process took less than 10 minutes in total.
Lastly, I demonstrated Mitzu's product analytics capabilities over BigQuery event tables.
We have segmented our users and created a simple funnel insight. After that, we measured our users' retention and visualized their journey in the application.