
Transforming Product Analytics with Mitzu and Databricks

Warehouse-native analytics on your lakehouse

Learn how Mitzu and Databricks work together to bring warehouse-native product analytics to your lakehouse — no data exports, no ETL, full SQL transparency.

István Mészáros

Co-founder & CEO

Published September 14, 2023 · Updated April 8, 2026
14 min read

TL;DR

Mitzu runs product analytics directly on Databricks: it reads your Delta tables, generates transparent SQL, and gives product teams self-serve funnel, retention, segmentation, and journey analysis without copying events into a separate vendor store. The lakehouse stays the single source of truth, with no exports, no ETL, and no duplicate pipelines.

Introduction

Product teams want fast, self-serve analytics, but many companies now keep their event data in Databricks instead of in a proprietary analytics vendor store. That creates tension: the data warehouse has the best governance and freshest data, while classic product analytics tools still expect teams to copy events into a separate system. The result is duplicated pipelines, sync lag, and conflicting numbers in planning meetings.

Mitzu changes this model by running directly on Databricks. Instead of exporting events to another tool, Mitzu reads your Delta tables, generates SQL transparently, and gives product managers funnel, retention, segmentation, and journey analysis in a self-serve UI. The warehouse stays the single source of truth and your team can still move quickly.

What is Databricks for analytics?

Databricks is a lakehouse platform that combines warehouse-style analytics with data lake flexibility. In practice, engineering and data teams centralize events, product usage logs, billing records, and customer dimensions into one governed environment. Unity Catalog handles table discovery, permissions, masking, and lineage so data access is consistent across teams.

Delta Lake provides ACID tables and scalable performance on top of cloud storage, which is exactly why product event streams increasingly land in Databricks first. Teams ingest these events from Segment, RudderStack, Fivetran pipelines, CDC flows, or custom Kafka/Spark jobs. Once the event stream is in Delta tables, the main question becomes how to analyze it without moving it out again.

The problem with classical product analytics on a lakehouse

Classical product analytics tools were designed around their own event stores. If your data platform is Databricks, that usually means duplicating the same events: one copy in Delta, one copy in the SaaS analytics vendor. This duplication creates operational overhead and introduces subtle reporting drift when schemas or transformations diverge.

  • Data sync lag between warehouse and analytics UI
  • Per-event ingestion pricing on top of warehouse spend
  • Privacy and compliance risk when exporting sensitive user events
  • Vendor lock-in caused by proprietary event schemas
  • Double storage and duplicated governance work

The economics become obvious at scale. A product with 50M monthly events accumulates roughly 600M rows per year, so a few years of active history quickly reaches billions of rows. Paying a per-event SaaS fee to re-ingest data you already store in Databricks is hard to justify when product teams can query the source directly.

How Mitzu connects to Databricks

A typical Mitzu + Databricks setup is straightforward. You connect using a Databricks service principal or personal access token, select the Unity Catalog schema that contains your event tables, and map your event model fields: event name, user identifier, and event timestamp.

  1. Create a Databricks service principal (or PAT for smaller deployments).
  2. Grant SELECT permissions on the relevant catalog and schema.
  3. Connect Databricks credentials in Mitzu.
  4. Run the event catalog wizard and map event columns.
  5. Save the model and run your first funnel query.

No reverse ETL is required. Mitzu generates SQL and executes it directly on Delta tables, so query logic is auditable and identical to what your data team would run manually.
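
For the permission grant in step 2, a minimal Unity Catalog setup might look like the sketch below; the service principal name (mitzu-analytics) and the catalog and schema names are placeholders.

-- Illustrative grants for a Mitzu service principal (names are placeholders)
GRANT USE CATALOG ON CATALOG analytics TO `mitzu-analytics`;
GRANT USE SCHEMA ON SCHEMA analytics.events TO `mitzu-analytics`;
GRANT SELECT ON SCHEMA analytics.events TO `mitzu-analytics`;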

Key analyses you can run

Once connected, product teams can answer high-value product questions without data exports: where conversion drops, which cohorts retain, how enterprise users behave differently, and what paths drive activation. Mitzu exposes these analyses through UI workflows while keeping generated SQL visible.

Funnel analysis

A common funnel is sign_up -> onboarding_completed -> paid_subscription_started. Mitzu builds this as a staged SQL query that enforces event order and time windows, then returns conversion and drop-off by step.

-- Events from the last 30 days
WITH base AS (
  SELECT user_id, event_name, event_time
  FROM analytics.product_events
  WHERE event_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
),
-- Step 1: first sign_up per user
signup AS (
  SELECT user_id, MIN(event_time) AS t1
  FROM base
  WHERE event_name = 'sign_up'
  GROUP BY 1
),
-- Step 2: first onboarding_completed after step 1
activated AS (
  SELECT b.user_id, MIN(b.event_time) AS t2
  FROM base b
  JOIN signup s ON s.user_id = b.user_id AND b.event_time >= s.t1
  WHERE b.event_name = 'onboarding_completed'
  GROUP BY 1
),
-- Step 3: first paid_subscription_started after step 2
subscribed AS (
  SELECT b.user_id, MIN(b.event_time) AS t3
  FROM base b
  JOIN activated a ON a.user_id = b.user_id AND b.event_time >= a.t2
  WHERE b.event_name = 'paid_subscription_started'
  GROUP BY 1
)
SELECT
  COUNT(DISTINCT s.user_id) AS step1_signup,
  COUNT(DISTINCT a.user_id) AS step2_activated,
  COUNT(DISTINCT p.user_id) AS step3_subscribed
FROM signup s
LEFT JOIN activated a USING (user_id)
LEFT JOIN subscribed p USING (user_id)

Retention and segmentation

Retention cohorts can be defined by first key action (for example, Account Created) and measured against return actions (for example, Dashboard Viewed). Segmentation is equally flexible because dimensions can come from any Delta-backed table, not just event payload properties.

-- Active users in the last 30 days, segmented by CRM company tier
SELECT c.company_tier, COUNT(DISTINCT e.user_id) AS active_users
FROM analytics.product_events e
JOIN crm.companies c ON c.company_id = e.company_id
WHERE e.event_name = 'dashboard_viewed'
  AND e.event_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 2 DESC
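
A weekly retention cohort for the Account Created -> Dashboard Viewed pair could be sketched as follows; the table and event names mirror the examples above and are assumptions about your event model.

-- Cohort users by week of first account_created, count dashboard_viewed returns by week offset
WITH first_action AS (
  SELECT user_id, DATE_TRUNC('week', MIN(event_time)) AS cohort_week
  FROM analytics.product_events
  WHERE event_name = 'account_created'
  GROUP BY 1
),
returns AS (
  SELECT e.user_id,
         f.cohort_week,
         FLOOR(DATEDIFF(e.event_time, f.cohort_week) / 7) AS week_offset
  FROM analytics.product_events e
  JOIN first_action f ON f.user_id = e.user_id
  WHERE e.event_name = 'dashboard_viewed'
    AND e.event_time >= f.cohort_week
)
SELECT cohort_week, week_offset, COUNT(DISTINCT user_id) AS retained_users
FROM returns
GROUP BY 1, 2
ORDER BY 1, 2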

Agentic analytics on Databricks

Mitzu's agentic analytics layer allows product stakeholders to ask questions in natural language and get SQL-backed answers quickly. A prompt like "What is 30-day retention for users who completed onboarding in March?" is translated into Databricks SQL, executed against your lakehouse, and returned as a chart plus query text.

This is more powerful than AI on a proprietary event store because the agent can use all governed warehouse context. It can join product events with CRM stages, ARR data, support tickets, or NPS tables in one query space. That unlocks cross-functional analysis that legacy product analytics stacks cannot do without brittle exports.
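
As an illustration, the SQL generated for the retention prompt above might resemble the sketch below; the table and event names follow the earlier examples, and the March date range (with an illustrative year) is an assumption.

-- 30-day retention for users who completed onboarding in March (year is illustrative)
WITH onboarded AS (
  SELECT user_id, MIN(event_time) AS onboarded_at
  FROM analytics.product_events
  WHERE event_name = 'onboarding_completed'
    AND event_time >= DATE '2025-03-01'
    AND event_time <  DATE '2025-04-01'
  GROUP BY 1
)
SELECT
  COUNT(DISTINCT o.user_id) AS onboarded_users,
  COUNT(DISTINCT CASE WHEN e.event_time <= DATEADD(day, 30, o.onboarded_at)
                      THEN e.user_id END) AS retained_within_30_days
FROM onboarded o
LEFT JOIN analytics.product_events e
  ON e.user_id = o.user_id
 AND e.event_time > o.onboarded_at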

Data governance and security

Mitzu follows Databricks permissions rather than replicating an independent permission model. If a role is restricted in Unity Catalog, that restriction is preserved in Mitzu-generated SQL execution. Teams can maintain one security policy plane instead of re-implementing access in every downstream tool.

  • Unity Catalog grants control table and column visibility.
  • Column-level masking policies apply automatically in query results.
  • Row filters remain enforced when analysts segment by attributes.
  • Warehouse query history provides auditability for every question.
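
For example, a row filter defined once in Unity Catalog is enforced for Mitzu-generated queries just like any other client; the function, group, table, and column names below are placeholders.

-- Members of analysts_global see all rows, everyone else only region = 'EU'
CREATE OR REPLACE FUNCTION analytics.eu_only_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('analysts_global'), TRUE, region = 'EU');

ALTER TABLE analytics.product_events
SET ROW FILTER analytics.eu_only_filter ON (region);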

Performance considerations

Large-scale event analysis on Databricks is efficient when table layout and compute settings are tuned for analytics workloads. Partition event tables by event date, cluster by high-cardinality fields like user_id where appropriate, and use Photon-enabled SQL warehouses for heavy scans.

  • Partition by event_date to limit scanned files.
  • Optimize tables and ZORDER on user_id / event_name when needed.
  • Use Photon for high-throughput query performance.
  • Prefer incremental models for rolling retention/funnel windows.

Mitzu also helps control cost by pushing date filters early and reusing query patterns designed for incremental windows. In practice, teams get self-serve analytics without opening the door to uncontrolled full-table scans.
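
As a rough illustration of these recommendations, the table layout and maintenance might look like the sketch below; the table and column names are placeholders, and the maintenance cadence depends on ingestion volume.

-- Partitioned Delta event table with a generated event_date column
CREATE TABLE IF NOT EXISTS analytics.product_events (
  user_id    STRING,
  event_name STRING,
  event_time TIMESTAMP,
  event_date DATE GENERATED ALWAYS AS (CAST(event_time AS DATE))
)
USING DELTA
PARTITIONED BY (event_date);

-- Periodic compaction, clustered on frequently filtered columns
OPTIMIZE analytics.product_events
ZORDER BY (user_id, event_name);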

Getting started

  1. Create a Databricks service principal.
  2. Grant SELECT on the target catalog/schema (and referenced dimensions).
  3. Connect Databricks in Mitzu credentials settings.
  4. Run the event catalog wizard and map event_name, user_id, and event_time.
  5. Run your first funnel and validate SQL output with your data team.

For implementation details, see Mitzu documentation and your Databricks workspace security baseline.

Conclusion

Databricks already centralizes the data your product organization depends on. Mitzu makes that same lakehouse usable for product analytics without copy pipelines, vendor lock-in, or black-box metrics. You keep full SQL transparency, inherit governance from Unity Catalog, and gain a practical path to agentic analytics on top of warehouse truth.

FAQ

Does Mitzu support Unity Catalog row-level security?

Yes. Mitzu queries Databricks with your configured credentials, so Unity Catalog row filters and policy constraints are enforced at query time. You do not need to duplicate row-level rules in a separate analytics permission layer. This keeps governance centralized and auditable.

Can I use Mitzu with Delta Sharing datasets?

If shared datasets are queryable with the Databricks identity used by Mitzu, they can be modeled for analysis the same way as native Delta tables. Teams should validate sharing permissions and performance characteristics before production rollout. The key requirement is stable SQL access to the exposed schema.

How does Mitzu handle very large Databricks event tables?

Mitzu pushes down filtered SQL to Databricks and relies on table design best practices like partitioning by date and optimized clustering. For recurring analyses, teams typically model incremental windows so only relevant slices are scanned. Photon-enabled SQL warehouses also improve query latency significantly at scale.

Do I need a dedicated Databricks cluster for Mitzu?

Most teams use a dedicated SQL warehouse for predictable analytics performance and spend control, rather than a shared all-purpose compute cluster. This is not strictly required for initial setup, but it is recommended for production. Dedicated compute also makes cost attribution and workload governance easier.

Can Mitzu's AI agent query more than events in Databricks?

Yes. The agent can generate SQL that joins event tables with other governed warehouse sources such as CRM, billing, and support data, as long as they are included in the model and permitted by access controls. That is a major advantage over AI assistants bound to proprietary event stores. The resulting SQL remains visible for review.

Key Takeaways

  • Mitzu runs product analytics directly on Databricks, so event data never leaves your lakehouse and no export or reverse ETL pipelines are needed.
  • Funnels, retention, segmentation, and journeys are generated as transparent SQL your data team can audit.
  • Unity Catalog permissions, row filters, and masking stay enforced because Mitzu queries with your Databricks credentials.
  • Agentic analytics can join governed warehouse context (events, CRM, billing, support) in a single query space.

About the Author

István Mészáros

Co-founder & CEO

LinkedIn: https://www.linkedin.com/in/imeszaros/

Co-founder and CEO of Mitzu. Passionate about product analytics and helping companies make data-driven decisions.
