June 18, 2024

  • 8 minutes

Product analytics with star schema and Mitzu

István Mészáros

n data warehousing, star schema is an excellent model for organizing all the information about your business. Its simplicity and performance capabilities, particularly in product analytics, make it a go-to structure for analysts and business intelligence professionals.

The star schema's design, characterized by central fact tables connected to surrounding dimension tables, streamlines the querying process and improves insight retrieval speed.
Whether you're tracking sales performance, understanding customer behavior, or analyzing product trends, the star schema provides a robust framework that supports detailed and high-performing analytics.
This blog post explores how you can use star schema in warehouse native product analytics.

Star schema example

Before diving deeper into this topic, let me demonstrate a simple star schema data model.

Here, we have one fact table and two dimension tables. (See below)

  • Fact: page_events - this table contains simulated visitor logs from a non-existing website
  • Dimension: dim_countries - this table contains country-level information
  • Dimension: dim_items - this table contains product information that is presented on the website

Dimension Tables:

Items dimension
Countries dimension

Fact Table:

Fact table

Notice that the table has country_code and item_id columns as foreign keys.
In the fact table, we have two event types:

  • page_visit
  • element_clicked_on_page

The element_clicked_on_page event has a foreign key item_id that shows which item was clicked on the page. We have information about the items in the corresponding dimension table.

Challenges with star schema

Although the star schema is a great model for organizing information, data analysts can easily explore dimensions and facts due to its deduplicated format. Joining facts to dimensions is straightforward for data analysts on a small scale.
However, as the company and the complexity of the data grows, a significant challenge surfaces in the data organization: too many fact and dimension tables lead to knowledge-sharing issues, slowing business intelligence and data analytics.

This is especially a problem when it comes to ad-hoc questions. Answering simple questions for the management with SQL will take longer as the business grows in complexity.

Denormalization of datasets

Data analysts are slow to answer questions primarily because each SQL query requires research on which dimension table holds the required information and how to join to that table.
One potential solution to this is the denormalization of datasets.
In practice, this means creating so-called enriched fact tables where each dimension is pre-joined to the fact table.
This way, the fact tables are enriched with information from different dimensions.

This simple SQL query creates a VIEW for a denormalized fact table.


create or replace view page_events_extended as 
select 
    t1.*, 
    d1.item_name, 
    d1.cost as item_cost,
    d2.country_name,
    d2.currency as original_currency,
    d2.continent
from page_events t1
left join dim_items d1 on d1.item_id = t1.item_id
left join dim_countries d2 on d2.iso_country_code = t1.user_country_code


Notice we use left joins, assuming not all foreign keys are present.
The benefit of this table is that everything is already in place. Data analysts do not need to do any research.

Warehouse-native product analytics

These enriched fact tables are ultimately great for quick ad hoc analytics. For example, an account manager might want to answer a question about the sales performance of a particular business segment. Another example is a product manager who might want to know which features in the app convert the most users to paying customers. Most business questions originate from non-technical people in a company. However, answering them requires special skills like SQL knowledge. Delegating questions to data analysts is, in fact, the main contributor to slow decision-making.
Warehouse native product analytics tools enable the non-technical folks in the company to answer their questions themself without SQL knowledge.
In the following section, I will walk through enabling Mitzu’s warehouse-native product analytics using enriched fact tables.

Setting up Mitzu

Mitzu is a warehouse-native product analytics solution that connects directly to your enriched fact tables and enables self-served product analytics.
With Mitzu, non-technical users can create user segmentation, funnels, and retention insights with a handful of clicks.

Step 1.

Connect Mitzu to your data warehouse. In the example below, you can see Snowflake.

Configuring warehouse connection

Step 2.

List your enriched fact tables. Click save & update event catalog.

Configuring fact tables

Step 3.

Start analyzing your data without SQL.

Product analytics

Summary

This post discussed star schema's benefits and potential problems in product analytics. We focused on the usability of facts and dimension datasets with ad-hoc behavioral analytics SQL queries. We also discussed reducing the time to insight by introducing enriched fact tables.

Last, we demonstrated how improving organizational efficiency is possible by introducing a warehouse-native product analytics solution that connects directly to the enriched fact tables.

Explore warehouse native product analytics

See how you can benefit from warehouse native product analytics

Blogs for your growth