Product Analytics with Snowflake and Mitzu

Q: What Snowflake permissions does Mitzu need?

A dedicated role needs USAGE on the target database, schema, and virtual warehouse, plus SELECT on the event and dimension tables. Key-pair authentication is recommended for production. The principle of least privilege applies — Mitzu does not need CREATE , INSERT , UPDATE , or DELETE permissions on any object.

Q: Can I connect Mitzu to Snowpipe-loaded tables?

Yes. Snowpipe-loaded tables appear in INFORMATION_SCHEMA like any other table. The Configuration Agent discovers them during the scan. Near-real-time event ingestion via Snowpipe is compatible with Mitzu's query patterns — analysis always reads from the current state of the table at query time.

TL;DR

Run funnel, retention, and segmentation analyses directly on your Snowflake event data with Mitzu — no data export, no ETL, full SQL transparency. The Configuration Agent scans your Snowflake account, identifies event tables, and builds the semantic layer automatically. Every analysis runs natively on Snowflake compute with SQL surfaced for verification at every step.

Introduction

Product teams at Snowflake-first companies face a familiar paradox. Their cleanest event data is already in Snowflake, but many product analytics tools still require duplicating that data into proprietary event stores before teams can run funnels or retention analyses. This introduces pipeline maintenance, freshness lag, and competing metric definitions across tools. For the strategic case against duplication, see warehouse-native vs first-generation product analytics.

Mitzu removes that duplication by running agentic product analytics directly on Snowflake. Product managers get a self-serve interface for funnel, retention, segmentation, and journey analysis, while data teams keep SQL transparency and governance where it belongs: in the warehouse. If you are evaluating AI analytics on Snowflake separately, start with the AI analytics for Snowflake buyer guide or the product overview.

The result is a warehouse-native analytics workflow where non-technical teams can answer product questions quickly and technical teams can still validate every generated query. No reverse ETL. No event re-ingestion taxes. No second source of truth.

Why Snowflake for product event data?

Snowflake is often the canonical event store for modern product organizations because it separates compute from storage, scales virtual warehouses independently, and integrates naturally with the tools that feed event data — Segment, Snowplow, Rudderstack, Fivetran, and custom Snowpipe loads.

Separate compute and storage — query costs scale with usage, not with data volume at rest
Virtual warehouses — analytics, dbt transformations, and application queries run on isolated compute without contention
Snowpipe and continuous ingestion — event data lands in near real-time without batch ETL windows
VARIANT and semi-structured support — event property payloads stored as JSON are queryable without pre-flattening (Snowflake semi-structured types)
Native dbt integration — Mitzu reads dbt-modelled tables and raw event streams with the same connection
Strong ecosystem for cross-domain joins — billing data, CRM exports, and support tickets commonly live in the same Snowflake account

The cost of duplicating data into a SaaS analytics tool

Per-event pricing looks simple at small scale but becomes expensive quickly. At 100M events per month, annual SaaS analytics costs commonly land in the tens or hundreds of thousands of dollars depending on plan and retained history. In contrast, keeping events in Snowflake and querying selectively can be significantly cheaper for many teams.

Beyond budget, duplication creates privacy and compliance risk. Sending user-level behavioral data to third-party infrastructure adds policy overhead for GDPR, regional data residency, and deletion workflows. Warehouse-native analytics keeps the data in one governed environment while still supporting self-serve analysis.

A realistic comparison for many teams: SaaS ingestion might range from $50k–$200k/year at 100M monthly events, while Snowflake storage and selective querying can be a small fraction of that. Your exact spend varies by virtual warehouse size and query frequency, but the structural gap is usually meaningful.

How Mitzu connects to Snowflake?

Connection uses a dedicated Snowflake service user with a role that has USAGE on the relevant database and schema, SELECT on the event tables, and USAGE on a virtual warehouse sized for analytics queries. Key-pair authentication is recommended for production deployments; username/password works for evaluation. See Mitzu Snowflake setup docs for step-by-step connection details.

Create a dedicated Snowflake user and role for Mitzu.
Grant USAGE on the target database, schema, and virtual warehouse.
Grant SELECT on the event tables (or the schema, then restrict by table as needed).
Enter credentials in Mitzu — account identifier, warehouse, database, schema, role.
Run the Configuration Agent to scan and index your event catalog.

No pipelines or sync jobs are required after setup. Mitzu sends SQL directly to Snowflake, so teams can inspect every generated query and validate logic in their existing warehouse workflows.

How the Configuration Agent scans Snowflake event tables?

After credentials are saved, the Configuration Agent queries INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.TABLES across the connected schema to identify candidate event tables. It looks for columns that match required product analytics semantics — event timestamp, event name, and user identifier — and recognizes common naming conventions from Segment, Snowplow, Rudderstack, Firebase, GA4, and custom schemas. The agent builds Mitzu's semantic layer automatically — events, properties, entities, and sampled filter values.

Mitzu queries Snowflake event tables in place — no export, no duplicate event store.

Maps required fields — detects event_time / occurred_at / timestamp, event_name / event / event_type, and user_id / anonymous_id / distinct_id
Samples VARIANT properties — runs lightweight SAMPLE queries on JSON/VARIANT columns to populate filter value suggestions with real data, not guesses
Identifies dimension tables — detects company, account, or plan tables that share a join key with the event table, and maps them as dimension sources
Resolves identifier joins — when a pre-login anonymous_id and post-login user_id exist, it configures identity stitching automatically (see identifying users in the warehouse)
Indexes the semantic layer — saves events, event properties, entities, dimension properties, and sampled filter values into Mitzu's data catalog

The scan runs incrementally. New event types added to the table surface in Mitzu automatically on the next indexing pass without manual reconfiguration.

Defining your event model

Mitzu's event model starts with required semantics: event_name, user_id, and event_time. You can model one wide events table or multiple event tables, as long as key fields are consistently mapped. Optional fields like anonymous_id, company_id, and property columns improve pre-login analysis and segmentation depth. For modeling patterns, see modeling events in the data warehouse.

A typical Snowflake event table setup:

CREATE TABLE analytics.product_events (
    event_time      TIMESTAMP_TZ,
    event_name      VARCHAR,
    user_id         VARCHAR,
    anonymous_id    VARCHAR,
    company_id      VARCHAR,
    plan_tier       VARCHAR,
    source_channel  VARCHAR,
    event_properties VARIANT
)
CLUSTER BY (TO_DATE(event_time), event_name)

Clustering by event date and event name keeps per-event scans efficient (Snowflake clustering keys). If event volume is very high, a separate table per major event group (e.g. onboarding_events, checkout_events) reduces scanned micro-partitions per query.

Running key product analyses

After modeling, product teams can run core workflows quickly: funnel conversion, cohort retention, and segment comparisons. Mitzu builds these analyses through a visual interface and a natural-language agent, then executes deterministic SQL on Snowflake — making every query transparent for data review.

Funnel example

A 4-step activation funnel broken down by plan tier, with a 7-day conversion window:

WITH base AS (
    SELECT user_id, event_name, event_time, plan_tier
    FROM analytics.product_events
    WHERE event_time >= DATEADD('day', -90, CURRENT_TIMESTAMP())
),
step_1 AS (
    SELECT user_id, MIN(event_time) AS step_1_at, ANY_VALUE(plan_tier) AS plan_tier
    FROM base
    WHERE event_name = 'sign_up'
    GROUP BY user_id
),
step_2 AS (
    SELECT s1.user_id, s1.plan_tier, MIN(b.event_time) AS step_2_at
    FROM step_1 s1
    JOIN base b
        ON b.user_id = s1.user_id
        AND b.event_name = 'onboarding_started'
        AND b.event_time > s1.step_1_at
        AND b.event_time <= DATEADD('day', 7, s1.step_1_at)
    GROUP BY s1.user_id, s1.plan_tier
),
step_3 AS (
    SELECT s2.user_id, s2.plan_tier, MIN(b.event_time) AS step_3_at
    FROM step_2 s2
    JOIN base b
        ON b.user_id = s2.user_id
        AND b.event_name = 'integration_connected'
        AND b.event_time > s2.step_2_at
        AND b.event_time <= DATEADD('day', 7, s2.step_2_at - INTERVAL '0 seconds')
    GROUP BY s2.user_id, s2.plan_tier
),
step_4 AS (
    SELECT s3.user_id, s3.plan_tier, MIN(b.event_time) AS step_4_at
    FROM step_3 s3
    JOIN base b
        ON b.user_id = s3.user_id
        AND b.event_name = 'first_value_moment'
        AND b.event_time > s3.step_3_at
        AND b.event_time <= DATEADD('day', 7, s3.step_3_at - INTERVAL '0 seconds')
    GROUP BY s3.user_id, s3.plan_tier
)
SELECT
    s1.plan_tier,
    COUNT(DISTINCT s1.user_id)  AS step_1_users,
    COUNT(DISTINCT s2.user_id)  AS step_2_users,
    COUNT(DISTINCT s3.user_id)  AS step_3_users,
    COUNT(DISTINCT s4.user_id)  AS step_4_users,
    ROUND(COUNT(DISTINCT s4.user_id) * 100.0
        / NULLIF(COUNT(DISTINCT s1.user_id), 0), 1) AS overall_conv_pct
FROM step_1 s1
LEFT JOIN step_2 s2 USING (user_id)
LEFT JOIN step_3 s3 USING (user_id)
LEFT JOIN step_4 s4 USING (user_id)
GROUP BY s1.plan_tier
ORDER BY step_1_users DESC

The conversion window is enforced strictly at each step — a user who completes step 3 two weeks after step 1 does not count in step 3. Distinct user counts prevent double-counting when a user fires the same event multiple times.

Retention example

Weekly cohort retention for users who signed up in Q1 2026, measuring return via session_start:

WITH cohort_base AS (
    SELECT
        user_id,
        DATE_TRUNC('week', event_time::DATE) AS cohort_week
    FROM analytics.product_events
    WHERE event_name = 'sign_up'
      AND event_time BETWEEN '2026-01-01' AND '2026-03-31'
),
return_events AS (
    SELECT user_id, event_time
    FROM analytics.product_events
    WHERE event_name = 'session_start'
),
retention AS (
    SELECT
        c.cohort_week,
        DATEDIFF('week', c.cohort_week, DATE_TRUNC('week', r.event_time::DATE)) AS week_number,
        COUNT(DISTINCT c.user_id) AS retained_users
    FROM cohort_base c
    JOIN return_events r
        ON r.user_id = c.user_id
        AND r.event_time >= c.cohort_week
    GROUP BY 1, 2
),
cohort_sizes AS (
    SELECT cohort_week, COUNT(DISTINCT user_id) AS cohort_size
    FROM cohort_base
    GROUP BY 1
)
SELECT
    r.cohort_week,
    r.week_number,
    cs.cohort_size,
    r.retained_users,
    ROUND(r.retained_users * 100.0 / cs.cohort_size, 1) AS retention_rate_pct
FROM retention r
JOIN cohort_sizes cs USING (cohort_week)
WHERE r.week_number <= 8
ORDER BY r.cohort_week, r.week_number

Users are binned by signup week, and the return event is counted once per week bucket per user — avoiding double-counting when a user has multiple session_start events in the same week.

Joining Snowflake tables for richer context

The strongest warehouse-native benefit is joining product behavior with operational context that lives elsewhere in the same Snowflake account. For example, enriching a funnel with Salesforce ARR bands can reveal whether aggregate conversion numbers hide enterprise-specific onboarding friction.

SELECT
    c.arr_segment,
    COUNT(DISTINCT e.user_id) AS converted_users,
    ROUND(COUNT(DISTINCT e.user_id) * 100.0
        / NULLIF(COUNT(DISTINCT s.user_id), 0), 1) AS conv_pct
FROM analytics.product_events e
JOIN crm.companies c
    ON c.company_id = e.company_id
JOIN (
    SELECT user_id
    FROM analytics.product_events
    WHERE event_name = 'sign_up'
      AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
) s ON s.user_id = e.user_id
WHERE e.event_name = 'first_value_moment'
  AND e.event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY c.arr_segment
ORDER BY converted_users DESC

This type of join — product events enriched with CRM, billing, or support data — is only possible when analytics runs on the same warehouse. Tools that ingest events into a vendor silo cannot reach the crm.companies table.

Agentic analytics on Snowflake

Mitzu's Analytics Agent accepts natural-language questions and converts them into typed analysis specifications. A deterministic query engine then turns those specifications into SQL — the agent does not write the SQL itself. This means funnel conversion windows, retention cohort bucketing, and segmentation logic are always correct by construction, not dependent on what the model happened to generate for that prompt.

Example prompt: "Why did week-2 retention drop for users who signed up in February compared to January?" The agent fans out into multiple tool calls: it defines the January and February cohorts separately, runs weekly retention curves for both, identifies the step where the divergence starts, then surfaces a segmented breakdown by acquisition channel and plan tier to narrow the cause. Each sub-query runs natively on Snowflake compute. The full investigation returns as a synthesized narrative with the SQL shown as a verification artifact.

Example prompt: "Show activation funnel drop-off for enterprise accounts from last month, broken down by the integration they connected first." The agent joins the event table to the company dimension, filters to enterprise ARR band, runs a multi-step funnel with the configured conversion window, and groups step 3 completions by the integration_type property value sampled during indexing — no manual schema mapping required.

Performance and cost optimization

Cluster event tables by (TO_DATE(event_time), event_name) — most product analytics queries filter on both; clustering keeps pruned micro-partitions small.
Size the Mitzu virtual warehouse appropriately — an X-SMALL or SMALL warehouse handles most funnel and retention queries; auto-suspend after 60 seconds prevents idle spend.
Use a dedicated warehouse for analytics queries — isolates Mitzu's compute from dbt transformation jobs and application queries.
Partition large VARIANT columns — if event_properties payloads are large, materialize frequently-used properties as typed columns in a dbt model and point Mitzu at the model instead.
Enable result caching — Snowflake caches identical query results for 24 hours; repeated dashboard loads on unchanged data cost nothing.
Set a resource monitor on the Mitzu warehouse — caps monthly credit spend regardless of query volume.

Getting started in 15 minutes

Create a dedicated Snowflake user and role with least-privilege access to your event tables.
Grant USAGE on the virtual warehouse, database, and schema; grant SELECT on the event tables.
Connect credentials in Mitzu — account identifier, warehouse, database, schema, role.
Run the Configuration Agent — it scans INFORMATION_SCHEMA, samples VARIANT properties, and builds the semantic layer.
Review the indexed events and dimensions in the Mitzu catalog; adjust any field mappings.
Run your first funnel and retention chart — or start free in the app.

Conclusion

Snowflake is already where many teams keep their most trustworthy event data. Mitzu turns that same data into a practical agentic product analytics system without duplication, vendor lock-in, or opaque logic. The Configuration Agent handles discovery and semantic-layer setup automatically — no YAML, no manual mapping. Every funnel, retention chart, and deep-dive investigation runs natively on Snowflake compute, with SQL surfaced for verification at every step.

FAQ

Does Mitzu support Snowflake's VARIANT type for event properties?

Yes. The Configuration Agent samples VARIANT and OBJECT columns during indexing and populates filter value suggestions from real data. Mitzu generates the appropriate GET_PATH or colon-notation expressions in the SQL it sends to Snowflake — analysts do not need to know the JSON path syntax.

Can I connect Mitzu to dbt-modelled tables in Snowflake?

Yes. Mitzu reads dbt-modelled tables and raw event streams with the same connection. Many teams point Mitzu at a dbt model that flattens VARIANT properties into typed columns for performance, while keeping the raw event table as the source of truth for the transformation.

How does Mitzu control Snowflake compute costs?

Mitzu generates date-constrained SQL that benefits from Snowflake's micro-partition pruning. The virtual warehouse used for analytics can be auto-suspended between queries and size-limited with a Snowflake resource monitor. Because queries run directly on Snowflake, cost control stays entirely on the customer's side — there is no per-event pricing on Mitzu's end.

What Snowflake permissions does Mitzu need?

A dedicated role needs USAGE on the target database, schema, and virtual warehouse, plus SELECT on the event and dimension tables. Key-pair authentication is recommended for production. The principle of least privilege applies — Mitzu does not need CREATE, INSERT, UPDATE, or DELETE permissions on any object.

Can I connect Mitzu to Snowpipe-loaded tables?

Yes. Snowpipe-loaded tables appear in INFORMATION_SCHEMA like any other table. The Configuration Agent discovers them during the scan. Near-real-time event ingestion via Snowpipe is compatible with Mitzu's query patterns — analysis always reads from the current state of the table at query time.

Does Mitzu work with Snowflake's cross-database joins?

Yes. If your event tables and dimension tables (CRM, billing, support) live in different databases within the same Snowflake account, Mitzu can join across them using fully-qualified three-part names (database.schema.table). This is the most common setup for teams that keep product events in one database and operational data in another.

Product

Insights

Data

Product Analytics with Snowflake and Mitzu

TL;DR

Introduction

Why Snowflake for product event data?

The cost of duplicating data into a SaaS analytics tool

How Mitzu connects to Snowflake?

How the Configuration Agent scans Snowflake event tables?

Defining your event model

Running key product analyses

Funnel example

Retention example

Joining Snowflake tables for richer context

Agentic analytics on Snowflake

Performance and cost optimization

Getting started in 15 minutes

Conclusion

FAQ

Does Mitzu support Snowflake's VARIANT type for event properties?

Can I connect Mitzu to dbt-modelled tables in Snowflake?

How does Mitzu control Snowflake compute costs?

What Snowflake permissions does Mitzu need?

Can I connect Mitzu to Snowpipe-loaded tables?

Does Mitzu work with Snowflake's cross-database joins?

Key Takeaways

About the Author

Subscribe to our newsletter

Ready to transform your analytics?

Related Articles

Transforming Product Analytics with Mitzu and Databricks

Using Mitzu With Trino For Product Analytics

Using GA4 with BigQuery for Product Analytics

How to get started with Mitzu

Connect your data warehouse

Define your events

Start analyzing

Product

Insights

Data

Product Analytics with Snowflake and Mitzu

TL;DR

Introduction

Why Snowflake for product event data?

The cost of duplicating data into a SaaS analytics tool

How Mitzu connects to Snowflake?

How the Configuration Agent scans Snowflake event tables?

Defining your event model

Running key product analyses

Funnel example

Retention example

Joining Snowflake tables for richer context

Agentic analytics on Snowflake

Performance and cost optimization

Getting started in 15 minutes

Conclusion

Related reading

FAQ

Does Mitzu support Snowflake's VARIANT type for event properties?

Can I connect Mitzu to dbt-modelled tables in Snowflake?

How does Mitzu control Snowflake compute costs?

What Snowflake permissions does Mitzu need?

Can I connect Mitzu to Snowpipe-loaded tables?

Does Mitzu work with Snowflake's cross-database joins?

Key Takeaways

About the Author

Subscribe to our newsletter

Ready to transform your analytics?

Related Articles

Transforming Product Analytics with Mitzu and Databricks

Using Mitzu With Trino For Product Analytics

Using GA4 with BigQuery for Product Analytics

How to get started with Mitzu

Connect your data warehouse

Define your events

Start analyzing