Back to Blog
Technology

Product Analytics with Snowflake and Mitzu

Run product analytics directly on Snowflake event data

Run funnel, retention, and segmentation analyses directly on your Snowflake event data with Mitzu — no data export, no ETL, full SQL transparency.

István Mészáros
István Mészáros

Co-founder & CEO

May 20, 2026
16 min read
Product Analytics with Snowflake and Mitzu

TL;DR

Run funnel, retention, and segmentation analyses directly on your Snowflake event data with Mitzu — no data export, no ETL, full SQL transparency. The Configuration Agent scans your Snowflake account, identifies event tables, and builds the semantic layer automatically. Every analysis runs natively on Snowflake compute with SQL surfaced for verification at every step.

Introduction

Product teams at Snowflake-first companies face a familiar paradox. Their cleanest event data is already in Snowflake, but many product analytics tools still require duplicating that data into proprietary event stores before teams can run funnels or retention analyses. This introduces pipeline maintenance, freshness lag, and competing metric definitions across tools. For the strategic case against duplication, see warehouse-native vs first-generation product analytics.

Mitzu removes that duplication by running agentic product analytics directly on Snowflake. Product managers get a self-serve interface for funnel, retention, segmentation, and journey analysis, while data teams keep SQL transparency and governance where it belongs: in the warehouse. If you are evaluating AI analytics on Snowflake separately, start with the AI analytics for Snowflake buyer guide or the product overview.

The result is a warehouse-native analytics workflow where non-technical teams can answer product questions quickly and technical teams can still validate every generated query. No reverse ETL. No event re-ingestion taxes. No second source of truth.

Why Snowflake for product event data?

Snowflake is often the canonical event store for modern product organizations because it separates compute from storage, scales virtual warehouses independently, and integrates naturally with the tools that feed event data — Segment, Snowplow, Rudderstack, Fivetran, and custom Snowpipe loads.

  • Separate compute and storage — query costs scale with usage, not with data volume at rest
  • Virtual warehouses — analytics, dbt transformations, and application queries run on isolated compute without contention
  • Snowpipe and continuous ingestion — event data lands in near real-time without batch ETL windows
  • VARIANT and semi-structured support — event property payloads stored as JSON are queryable without pre-flattening (Snowflake semi-structured types)
  • Native dbt integration — Mitzu reads dbt-modelled tables and raw event streams with the same connection
  • Strong ecosystem for cross-domain joins — billing data, CRM exports, and support tickets commonly live in the same Snowflake account

The cost of duplicating data into a SaaS analytics tool

Per-event pricing looks simple at small scale but becomes expensive quickly. At 100M events per month, annual SaaS analytics costs commonly land in the tens or hundreds of thousands of dollars depending on plan and retained history. In contrast, keeping events in Snowflake and querying selectively can be significantly cheaper for many teams.

Beyond budget, duplication creates privacy and compliance risk. Sending user-level behavioral data to third-party infrastructure adds policy overhead for GDPR, regional data residency, and deletion workflows. Warehouse-native analytics keeps the data in one governed environment while still supporting self-serve analysis.

A realistic comparison for many teams: SaaS ingestion might range from $50k–$200k/year at 100M monthly events, while Snowflake storage and selective querying can be a small fraction of that. Your exact spend varies by virtual warehouse size and query frequency, but the structural gap is usually meaningful.

How Mitzu connects to Snowflake?

Connection uses a dedicated Snowflake service user with a role that has USAGE on the relevant database and schema, SELECT on the event tables, and USAGE on a virtual warehouse sized for analytics queries. Key-pair authentication is recommended for production deployments; username/password works for evaluation. See Mitzu Snowflake setup docs for step-by-step connection details.

  1. Create a dedicated Snowflake user and role for Mitzu.
  2. Grant USAGE on the target database, schema, and virtual warehouse.
  3. Grant SELECT on the event tables (or the schema, then restrict by table as needed).
  4. Enter credentials in Mitzu — account identifier, warehouse, database, schema, role.
  5. Run the Configuration Agent to scan and index your event catalog.

No pipelines or sync jobs are required after setup. Mitzu sends SQL directly to Snowflake, so teams can inspect every generated query and validate logic in their existing warehouse workflows.

How the Configuration Agent scans Snowflake event tables?

After credentials are saved, the Configuration Agent queries INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.TABLES across the connected schema to identify candidate event tables. It looks for columns that match required product analytics semantics — event timestamp, event name, and user identifier — and recognizes common naming conventions from Segment, Snowplow, Rudderstack, Firebase, GA4, and custom schemas. The agent builds Mitzu's semantic layer automatically — events, properties, entities, and sampled filter values.

Snowflake pipeline flow: Segment, RudderStack, and Snowplow events queried in place by Mitzu
Mitzu queries Snowflake event tables in place — no export, no duplicate event store.
  • Maps required fields — detects event_time / occurred_at / timestamp, event_name / event / event_type, and user_id / anonymous_id / distinct_id
  • Samples VARIANT properties — runs lightweight SAMPLE queries on JSON/VARIANT columns to populate filter value suggestions with real data, not guesses
  • Identifies dimension tables — detects company, account, or plan tables that share a join key with the event table, and maps them as dimension sources
  • Resolves identifier joins — when a pre-login anonymous_id and post-login user_id exist, it configures identity stitching automatically (see identifying users in the warehouse)
  • Indexes the semantic layer — saves events, event properties, entities, dimension properties, and sampled filter values into Mitzu's data catalog

The scan runs incrementally. New event types added to the table surface in Mitzu automatically on the next indexing pass without manual reconfiguration.

Defining your event model

Mitzu's event model starts with required semantics: event_name, user_id, and event_time. You can model one wide events table or multiple event tables, as long as key fields are consistently mapped. Optional fields like anonymous_id, company_id, and property columns improve pre-login analysis and segmentation depth. For modeling patterns, see modeling events in the data warehouse.

A typical Snowflake event table setup:

CREATE TABLE analytics.product_events (
    event_time      TIMESTAMP_TZ,
    event_name      VARCHAR,
    user_id         VARCHAR,
    anonymous_id    VARCHAR,
    company_id      VARCHAR,
    plan_tier       VARCHAR,
    source_channel  VARCHAR,
    event_properties VARIANT
)
CLUSTER BY (TO_DATE(event_time), event_name)

Clustering by event date and event name keeps per-event scans efficient (Snowflake clustering keys). If event volume is very high, a separate table per major event group (e.g. onboarding_events, checkout_events) reduces scanned micro-partitions per query.

Running key product analyses

After modeling, product teams can run core workflows quickly: funnel conversion, cohort retention, and segment comparisons. Mitzu builds these analyses through a visual interface and a natural-language agent, then executes deterministic SQL on Snowflake — making every query transparent for data review.

Funnel example

A 4-step activation funnel broken down by plan tier, with a 7-day conversion window:

WITH base AS (
    SELECT user_id, event_name, event_time, plan_tier
    FROM analytics.product_events
    WHERE event_time >= DATEADD('day', -90, CURRENT_TIMESTAMP())
),
step_1 AS (
    SELECT user_id, MIN(event_time) AS step_1_at, ANY_VALUE(plan_tier) AS plan_tier
    FROM base
    WHERE event_name = 'sign_up'
    GROUP BY user_id
),
step_2 AS (
    SELECT s1.user_id, s1.plan_tier, MIN(b.event_time) AS step_2_at
    FROM step_1 s1
    JOIN base b
        ON b.user_id = s1.user_id
        AND b.event_name = 'onboarding_started'
        AND b.event_time > s1.step_1_at
        AND b.event_time <= DATEADD('day', 7, s1.step_1_at)
    GROUP BY s1.user_id, s1.plan_tier
),
step_3 AS (
    SELECT s2.user_id, s2.plan_tier, MIN(b.event_time) AS step_3_at
    FROM step_2 s2
    JOIN base b
        ON b.user_id = s2.user_id
        AND b.event_name = 'integration_connected'
        AND b.event_time > s2.step_2_at
        AND b.event_time <= DATEADD('day', 7, s2.step_2_at - INTERVAL '0 seconds')
    GROUP BY s2.user_id, s2.plan_tier
),
step_4 AS (
    SELECT s3.user_id, s3.plan_tier, MIN(b.event_time) AS step_4_at
    FROM step_3 s3
    JOIN base b
        ON b.user_id = s3.user_id
        AND b.event_name = 'first_value_moment'
        AND b.event_time > s3.step_3_at
        AND b.event_time <= DATEADD('day', 7, s3.step_3_at - INTERVAL '0 seconds')
    GROUP BY s3.user_id, s3.plan_tier
)
SELECT
    s1.plan_tier,
    COUNT(DISTINCT s1.user_id)  AS step_1_users,
    COUNT(DISTINCT s2.user_id)  AS step_2_users,
    COUNT(DISTINCT s3.user_id)  AS step_3_users,
    COUNT(DISTINCT s4.user_id)  AS step_4_users,
    ROUND(COUNT(DISTINCT s4.user_id) * 100.0
        / NULLIF(COUNT(DISTINCT s1.user_id), 0), 1) AS overall_conv_pct
FROM step_1 s1
LEFT JOIN step_2 s2 USING (user_id)
LEFT JOIN step_3 s3 USING (user_id)
LEFT JOIN step_4 s4 USING (user_id)
GROUP BY s1.plan_tier
ORDER BY step_1_users DESC

The conversion window is enforced strictly at each step — a user who completes step 3 two weeks after step 1 does not count in step 3. Distinct user counts prevent double-counting when a user fires the same event multiple times.

Retention example

Weekly cohort retention for users who signed up in Q1 2026, measuring return via session_start:

WITH cohort_base AS (
    SELECT
        user_id,
        DATE_TRUNC('week', event_time::DATE) AS cohort_week
    FROM analytics.product_events
    WHERE event_name = 'sign_up'
      AND event_time BETWEEN '2026-01-01' AND '2026-03-31'
),
return_events AS (
    SELECT user_id, event_time
    FROM analytics.product_events
    WHERE event_name = 'session_start'
),
retention AS (
    SELECT
        c.cohort_week,
        DATEDIFF('week', c.cohort_week, DATE_TRUNC('week', r.event_time::DATE)) AS week_number,
        COUNT(DISTINCT c.user_id) AS retained_users
    FROM cohort_base c
    JOIN return_events r
        ON r.user_id = c.user_id
        AND r.event_time >= c.cohort_week
    GROUP BY 1, 2
),
cohort_sizes AS (
    SELECT cohort_week, COUNT(DISTINCT user_id) AS cohort_size
    FROM cohort_base
    GROUP BY 1
)
SELECT
    r.cohort_week,
    r.week_number,
    cs.cohort_size,
    r.retained_users,
    ROUND(r.retained_users * 100.0 / cs.cohort_size, 1) AS retention_rate_pct
FROM retention r
JOIN cohort_sizes cs USING (cohort_week)
WHERE r.week_number <= 8
ORDER BY r.cohort_week, r.week_number

Users are binned by signup week, and the return event is counted once per week bucket per user — avoiding double-counting when a user has multiple session_start events in the same week.

Joining Snowflake tables for richer context

The strongest warehouse-native benefit is joining product behavior with operational context that lives elsewhere in the same Snowflake account. For example, enriching a funnel with Salesforce ARR bands can reveal whether aggregate conversion numbers hide enterprise-specific onboarding friction.

SELECT
    c.arr_segment,
    COUNT(DISTINCT e.user_id) AS converted_users,
    ROUND(COUNT(DISTINCT e.user_id) * 100.0
        / NULLIF(COUNT(DISTINCT s.user_id), 0), 1) AS conv_pct
FROM analytics.product_events e
JOIN crm.companies c
    ON c.company_id = e.company_id
JOIN (
    SELECT user_id
    FROM analytics.product_events
    WHERE event_name = 'sign_up'
      AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
) s ON s.user_id = e.user_id
WHERE e.event_name = 'first_value_moment'
  AND e.event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY c.arr_segment
ORDER BY converted_users DESC

This type of join — product events enriched with CRM, billing, or support data — is only possible when analytics runs on the same warehouse. Tools that ingest events into a vendor silo cannot reach the crm.companies table.

Agentic analytics on Snowflake

Mitzu's Analytics Agent accepts natural-language questions and converts them into typed analysis specifications. A deterministic query engine then turns those specifications into SQL — the agent does not write the SQL itself. This means funnel conversion windows, retention cohort bucketing, and segmentation logic are always correct by construction, not dependent on what the model happened to generate for that prompt.

Example prompt: "Why did week-2 retention drop for users who signed up in February compared to January?" The agent fans out into multiple tool calls: it defines the January and February cohorts separately, runs weekly retention curves for both, identifies the step where the divergence starts, then surfaces a segmented breakdown by acquisition channel and plan tier to narrow the cause. Each sub-query runs natively on Snowflake compute. The full investigation returns as a synthesized narrative with the SQL shown as a verification artifact.

Example prompt: "Show activation funnel drop-off for enterprise accounts from last month, broken down by the integration they connected first." The agent joins the event table to the company dimension, filters to enterprise ARR band, runs a multi-step funnel with the configured conversion window, and groups step 3 completions by the integration_type property value sampled during indexing — no manual schema mapping required.

Performance and cost optimization

  • Cluster event tables by (TO_DATE(event_time), event_name) — most product analytics queries filter on both; clustering keeps pruned micro-partitions small.
  • Size the Mitzu virtual warehouse appropriately — an X-SMALL or SMALL warehouse handles most funnel and retention queries; auto-suspend after 60 seconds prevents idle spend.
  • Use a dedicated warehouse for analytics queries — isolates Mitzu's compute from dbt transformation jobs and application queries.
  • Partition large VARIANT columns — if event_properties payloads are large, materialize frequently-used properties as typed columns in a dbt model and point Mitzu at the model instead.
  • Enable result caching — Snowflake caches identical query results for 24 hours; repeated dashboard loads on unchanged data cost nothing.
  • Set a resource monitor on the Mitzu warehouse — caps monthly credit spend regardless of query volume.

Getting started in 15 minutes

  1. Create a dedicated Snowflake user and role with least-privilege access to your event tables.
  2. Grant USAGE on the virtual warehouse, database, and schema; grant SELECT on the event tables.
  3. Connect credentials in Mitzu — account identifier, warehouse, database, schema, role.
  4. Run the Configuration Agent — it scans INFORMATION_SCHEMA, samples VARIANT properties, and builds the semantic layer.
  5. Review the indexed events and dimensions in the Mitzu catalog; adjust any field mappings.
  6. Run your first funnel and retention chart — or start free in the app.

Conclusion

Snowflake is already where many teams keep their most trustworthy event data. Mitzu turns that same data into a practical agentic product analytics system without duplication, vendor lock-in, or opaque logic. The Configuration Agent handles discovery and semantic-layer setup automatically — no YAML, no manual mapping. Every funnel, retention chart, and deep-dive investigation runs natively on Snowflake compute, with SQL surfaced for verification at every step.

FAQ

Does Mitzu support Snowflake's VARIANT type for event properties?

Yes. The Configuration Agent samples VARIANT and OBJECT columns during indexing and populates filter value suggestions from real data. Mitzu generates the appropriate GET_PATH or colon-notation expressions in the SQL it sends to Snowflake — analysts do not need to know the JSON path syntax.

Can I connect Mitzu to dbt-modelled tables in Snowflake?

Yes. Mitzu reads dbt-modelled tables and raw event streams with the same connection. Many teams point Mitzu at a dbt model that flattens VARIANT properties into typed columns for performance, while keeping the raw event table as the source of truth for the transformation.

How does Mitzu control Snowflake compute costs?

Mitzu generates date-constrained SQL that benefits from Snowflake's micro-partition pruning. The virtual warehouse used for analytics can be auto-suspended between queries and size-limited with a Snowflake resource monitor. Because queries run directly on Snowflake, cost control stays entirely on the customer's side — there is no per-event pricing on Mitzu's end.

What Snowflake permissions does Mitzu need?

A dedicated role needs USAGE on the target database, schema, and virtual warehouse, plus SELECT on the event and dimension tables. Key-pair authentication is recommended for production. The principle of least privilege applies — Mitzu does not need CREATE, INSERT, UPDATE, or DELETE permissions on any object.

Can I connect Mitzu to Snowpipe-loaded tables?

Yes. Snowpipe-loaded tables appear in INFORMATION_SCHEMA like any other table. The Configuration Agent discovers them during the scan. Near-real-time event ingestion via Snowpipe is compatible with Mitzu's query patterns — analysis always reads from the current state of the table at query time.

Does Mitzu work with Snowflake's cross-database joins?

Yes. If your event tables and dimension tables (CRM, billing, support) live in different databases within the same Snowflake account, Mitzu can join across them using fully-qualified three-part names (database.schema.table). This is the most common setup for teams that keep product events in one database and operational data in another.

Key Takeaways

  • Warehouse-native product analytics: events stay in Snowflake; Mitzu queries in place with no reverse ETL or per-event SaaS ingestion.
  • Configuration Agent auto-discovers event tables via INFORMATION_SCHEMA, samples VARIANT properties, and maps identity stitching.
  • Deterministic query engine turns funnel, retention, and segmentation specs into correct Snowflake SQL — not LLM-generated SQL.
  • Cross-database joins let product events enrich CRM, billing, and support tables in the same Snowflake account.

About the Author

István Mészáros

Co-founder & CEO

LinkedIn: https://www.linkedin.com/in/imeszaros/

Co-founder and CEO of Mitzu. Passionate about product analytics and helping companies make data-driven decisions.

Share this article

Subscribe to our newsletter

Get the latest insights on product analytics.

Ready to transform your analytics?

See how Mitzu can help you gain deeper insights from your product data.

Get Started

How to get started with Mitzu

Start analyzing your product data in three simple steps

Connect your data warehouse

Securely connect Mitzu to your existing data warehouse in minutes.

Define your events

Map your product events and user properties with our intuitive interface.

Start analyzing

Create funnels, retention charts, and user journeys without writing SQL.