Trino

L1 · Multi-Modal Storage · Federated Query · Free (OSS) / Starburst Galaxy (managed) · Apache-2.0 · OSS

OSS distributed SQL query engine for federated analytics across data lakes, warehouses, and operational stores. Apache-2.0. Originally PrestoSQL, renamed to Trino. Connectors for Iceberg, Hudi, Delta Lake, Hive, Postgres, MongoDB, Snowflake, BigQuery, and dozens more.

AI Analysis

Trino is the dominant OSS distributed SQL query engine for federated analytics across data lakes, warehouses, and operational stores. Apache-2.0, originally PrestoSQL, renamed to Trino in 2020 by the founding team. Connectors for Iceberg/Hudi/Delta Lake/Hive/Postgres/MongoDB/Snowflake/BigQuery and dozens more — the universal SQL-over-everything layer. Pick Trino when you need cross-source analytical queries without ETL-loading everything into one warehouse, when you want SQL ergonomics over a lakehouse, or when query federation across heterogeneous stores is the architectural pattern. Starburst Galaxy is the managed offering; Trino itself runs anywhere Java + Kubernetes does.

Trust Before Intelligence

Trino's trust posture is dominated by one architectural choice: it's a query engine, not a storage engine. Data sovereignty stays with the underlying stores (your Postgres, your S3 lakehouse, your Snowflake). Trino's role is to push query fragments down to those stores and aggregate the results. From a Trust Before Intelligence lens, that means audit and access control are split: Trino handles authentication and query-time RBAC via System Access Control (file-based, OPA, custom plugins), but the underlying stores' own access controls remain authoritative. Misconfiguration here is the dominant risk — Trino with overly-broad service-account credentials to Postgres can effectively bypass Postgres RLS. The right pattern is Trino with per-user impersonation plus OPA policies that mirror the underlying stores' authorization model.
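A minimal sketch of that pattern (property names are from the Hive connector and the OPA access-control plugin; impersonation support varies by connector and the URI below is a placeholder, so verify against the docs for your Trino version):

```properties
# etc/catalog/hive.properties — propagate the end user's identity to HDFS
hive.hdfs.impersonation.enabled=true

# etc/access-control.properties — delegate authorization decisions to OPA
access-control.name=opa
opa.policy.uri=https://opa.internal.example/v1/data/trino/allow
```

With impersonation on, the underlying store sees the querying user, not Trino's service account, so the store's own RLS/RBAC stays authoritative.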

INPACT Score

27/36
I — Instant
4/6

Distributed query latency depends on connector + data volume. Sub-second for cached + small-scan queries; multi-second for large lakehouse scans. Predicate pushdown to underlying stores is the key performance lever. Cap rule N/A.

N — Natural
5/6

ANSI SQL across heterogeneous sources. SQL knowledge transfers; cross-source JOINs work. The ergonomics that made Presto/Trino the de facto OSS lakehouse query engine. Cap rule N/A.
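The cross-source ergonomics look like ordinary SQL; the catalog, schema, and table names below are hypothetical:

```sql
-- Join an Iceberg fact table on S3 against live Postgres reference data
-- in one statement; each side is scanned by its own connector.
SELECT c.region, sum(o.amount) AS revenue
FROM iceberg.marts.orders AS o
JOIN postgres.crm.customers AS c ON o.customer_id = c.id
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY c.region;
```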

P — Permitted
4/6

System Access Control framework: file-based, OPA integration, custom plugins. Per-user impersonation to underlying stores. ABAC via OPA conditions on user/role/source/object. Cap rule N/A — closer to ABAC than pure RBAC when configured properly.
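To make the ABAC shape concrete, here is an illustrative Python sketch of the user/role/source/object decision that an OPA policy for Trino typically encodes. All attributes and role names are hypothetical; real deployments express this in Rego, not Python.

```python
# Illustrative ABAC decision sketch — mirrors the shape of a query-time
# authorization check, not any real Trino or OPA API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    catalog: str
    schema: str


# role -> set of (catalog, schema) pairs it may query; None = unrestricted
POLICY = {
    "analyst": {("iceberg", "marts"), ("postgres", "sales")},
    "admin": None,
}


def allow(req: Request) -> bool:
    scopes = POLICY.get(req.role)
    if scopes is None and req.role in POLICY:
        return True   # unrestricted role
    if not scopes:
        return False  # unknown role: deny by default
    return (req.catalog, req.schema) in scopes
```

The key property to preserve in the real Rego version is the default-deny fallthrough: an unknown role gets nothing.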

A — Adaptive
5/6

Connector-driven multi-source — runs anywhere Java + K8s does. Multi-cloud, hybrid, on-prem. Cross-cloud federation is the killer feature; query federation removes data movement requirements.

C — Contextual
5/6

Query plans expose source pushdown, partition pruning, cost-based optimizer decisions. Event listener API captures query lifecycle. Connector metadata feeds into the planner. Strong contextual richness.

T — Transparent
4/6

Web UI shows running + completed queries with full plan + per-stage stats. JMX metrics for monitoring. Event listener for audit. Cap rule N/A.

GOALS Score

18/30
G — Governance
3/6

G1=Y (OPA integration enables ABAC), G2=Y (event listener captures all queries for audit), G3=N, G4=N, G5=N, G6=N. 2/6 -> 2 strict; lifted to 3 for OPA integration depth.

O — Observability
4/6

O1=Y (JMX + Prometheus exporters), O2=N (no native distributed tracing across stages), O3=Y (per-query CPU + memory + bytes-scanned), O4=Y (query failure + latency alarms), O5=N, O6=N. 3/6 -> 4 lenient (per-query observability is among Trino's strong dimensions).

A — Availability
4/6

A1=Y (sub-second cached), A2=Y (predicate pushdown enables real-time queries on streaming sources), A3=N, A4=Y (multi-coordinator + worker pools for HA), A5=Y (production deployments at hyperscale documented at Netflix, LinkedIn, Pinterest), A6=Y (massive parallelism is the design center). 5/6 -> 4.

L — Lexicon
3/6

L1=N, L2=N, L3=N, L4=N, L5=Y (catalog + schema + table naming across heterogeneous sources is its terminology surface), L6=N. 1/6 -> 2 strict; bumped to 3 for catalog richness.

S — Solid
4/6

S1=Y (query results deterministic given source data), S2=Y (typed result columns), S3=Y (snapshot consistency at query start when sources support it), S4=Y (typed query plan), S5=N (Trino doesn't validate source data quality), S6=Y (event listener flags slow queries / errors). 5/6 -> 4.

AI-Identified Strengths

  • + Universal SQL-over-everything: connectors for 30+ data sources from a single query interface
  • + Apache-2.0 license with strong community + Starburst commercial backing; project trajectory is healthy
  • + Predicate pushdown to underlying stores eliminates data movement for filtered queries — critical for cost + latency
  • + OPA integration for ABAC at query time; per-user impersonation propagates identity to underlying stores
  • + Production-proven at hyperscale (Netflix, LinkedIn, Pinterest, Uber, Airbnb)
  • + Iceberg + Hudi + Delta Lake connectors are first-class — Trino is a default lakehouse query engine choice
  • + Cost-based optimizer with table statistics produces good plans across complex multi-source queries

AI-Identified Limitations

  • - Operational complexity: coordinator + worker tier requires K8s expertise. Memory tuning + spill configuration matters for stability under load.
  • - JVM-based — heap tuning + GC pauses are real operational concerns at scale
  • - No data storage; results live in client memory or get materialized via CTAS. Trino is a query layer, not a warehouse.
  • - Per-query memory limits + workload management require careful configuration to prevent one bad query from impacting cluster
  • - Connector quality varies — Iceberg/Hudi/Delta/Postgres are excellent; some long-tail connectors are less polished
  • - OPA integration is powerful but adds latency to query authorization (typically <10ms; verify in your deployment)
  • - Compliance attestations come from Starburst Galaxy or your substrate; Trino the project has none

Industry Fit

Best suited for

  • OSS lakehouse stacks querying Iceberg/Delta/Hudi tables with SQL ergonomics
  • Cross-source analytical queries spanning warehouses + operational stores + lakehouse
  • Cost-optimization plays where data movement (loading everything into Snowflake/BigQuery) dominates the bill
  • Multi-cloud / hybrid stacks needing a portable SQL access layer across heterogeneous storage
  • Modernization paths from legacy Presto deployments — Trino is the supported continuation
  • Workloads using Starburst Galaxy or Starburst Enterprise for managed compliance

Compliance certifications

Trino (OSS) holds no compliance certifications. Starburst Galaxy (managed Trino, separate company) holds SOC 2 + HIPAA BAA + ISO 27001. Starburst Enterprise (commercial self-hosted) provides additional security features but compliance posture is operator-driven. For regulated workloads, use Starburst Galaxy or self-host Trino in attested substrate (AWS GovCloud, Azure Gov).

Use with caution for

  • Workloads requiring engine-managed storage (Snowflake-style) where Trino's federation isn't valuable
  • Single-user analytical workloads — DuckDB is operationally simpler
  • Single-source raw query performance at extreme scale — ClickHouse / Druid / Pinot beat Trino on dedicated columnar workloads
  • Teams without K8s + JVM operational expertise — operational burden is real
  • Compliance-attested workloads needing FedRAMP — use Starburst Galaxy with appropriate tier

AI-Suggested Alternatives

Snowflake

Choose Snowflake for fully-managed cloud DW with rich governance + collaboration. Trino wins on cross-source federation + OSS license; Snowflake wins on managed compliance + zero-ops + simpler workload isolation.

Google BigQuery

Choose BigQuery for serverless analytics in GCP with Vertex AI integration. Trino wins on multi-cloud + multi-source query federation; BigQuery wins on serverless ergonomics + no cluster management.

DuckDB

Choose DuckDB for single-process analytical queries on Parquet/Iceberg. Trino wins on multi-user serving + cross-source federation; DuckDB wins on embedded simplicity + no operational footprint. Different tools for different scales.

ClickHouse

Choose ClickHouse for purpose-built columnar OLAP with sub-second analytics. Trino wins on heterogeneous-source federation; ClickHouse wins on single-source raw query performance. ClickHouse stores its own data; Trino doesn't.


Integration in 7-Layer Architecture

Role: L1 federated SQL query engine. Reads from heterogeneous data sources via connectors; aggregates results in distributed worker tier; serves SQL clients via Trino protocol or ODBC/JDBC.

Upstream: Receives SQL queries from BI tools (Tableau, Looker, Mode), notebooks (Jupyter, Hex), L3 transformation engines (dbt-trino), and L7 agent runtimes (LangChain SQLDatabase tool against Trino). Reads data from Iceberg/Hudi/Delta/Hive/Postgres/MongoDB/Snowflake/BigQuery via connectors.

Downstream: Returns query results to clients. Emits JMX + Prometheus metrics to L6 observability. Event listener publishes query lifecycle to L6 LLM-cost-style attribution backends. Lineage exported via OpenLineage to L3 catalogs (DataHub, Marquez, OpenMetadata).

⚡ Trust Risks

high Trino service account has broad credentials to underlying stores, bypassing per-user authorization

Mitigation: Configure per-user impersonation in connector configs. Underlying stores' RLS / RBAC must apply at the impersonated user's level, not Trino's service account. Audit query event listener to verify user identity propagates.

high OPA policies not configured; only default System Access Control which is permissive

Mitigation: Deploy OPA + Trino opa-authorizer plugin. Mirror underlying stores' authorization model in OPA policies. Test access matrix end-to-end. Ship policies via CI.
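An illustrative Rego shape for such a policy (package name, input structure, and the `data.allowed_catalogs` mapping are assumptions; align them with the OPA plugin version you deploy):

```rego
package trino

import rego.v1

default allow := false

# Show each user only their approved catalogs; mirror the same rule in
# the underlying stores so Trino is never the sole gatekeeper.
allow if {
    input.action.operation == "FilterCatalogs"
    input.action.resource.catalog.name in data.allowed_catalogs[input.context.identity.user]
}
```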

high One bad query exhausts coordinator memory and brings down the cluster

Mitigation: Configure query.max-memory and query.max-memory-per-node limits. Enable spill-to-disk for memory-intensive operations. Use resource groups for workload isolation. Monitor query memory usage; alert on outliers.
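A sketch of those guardrails (values are placeholders to size for your cluster; property names and file locations follow the Trino admin docs — verify for your version):

```properties
# etc/config.properties — cap what a single query can consume
query.max-memory=50GB
query.max-memory-per-node=8GB

# spill memory-intensive operators to local disk instead of failing
spill-enabled=true
spiller-spill-path=/var/trino/spill

# isolate workloads via resource groups
resource-groups.configuration-manager=file
resource-groups.config-file=etc/resource-groups.json
```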

medium Query event listener not enabled — no audit trail for cross-source queries

Mitigation: Enable EventListener plugin. Ship events to durable storage (S3, Kafka, OpenSearch). Validate audit captures user identity + source tables + filter predicates.
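A minimal sketch using the built-in HTTP event listener (property names per recent Trino releases, with a placeholder collector URI; verify before relying on them):

```properties
# etc/event-listener.properties — ship query lifecycle events to a collector
event-listener.name=http
http-event-listener.connect-ingest-uri=https://audit.internal.example/trino/events
http-event-listener.log-completed=true
```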

medium Connector misuse: querying Postgres with full table scans instead of pushdown — destroys source DB performance

Mitigation: Train developers on predicate pushdown. Use EXPLAIN to validate query plans before commit. Set per-source query timeout + memory limits to prevent abuse.
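Validating pushdown is a one-line habit; the table names here are hypothetical:

```sql
-- If the WHERE predicate appears inside the Postgres connector's scan
-- node, it executes in Postgres; if the plan shows a separate Filter
-- above a full TableScan, the predicate did not push down.
EXPLAIN
SELECT id, total
FROM postgres.sales.orders
WHERE created_at >= DATE '2024-06-01';
```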

Use Case Scenarios

strong Lakehouse-first data platform replacing Snowflake with Iceberg + Trino

Trino queries Iceberg tables on S3 directly. Cost: S3 storage + Trino compute (vs Snowflake credits). Governance via OPA policies + per-user impersonation + Iceberg table-level RBAC. Cross-source federation to Postgres for transactional joins.
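The catalog wiring for that setup is small (a sketch assuming a REST catalog with a placeholder endpoint; adjust the catalog type and URIs for your stack):

```properties
# etc/catalog/iceberg.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://catalog.internal.example
```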

strong Multi-cloud financial-services analytics with regulatory data residency

Trino federates queries across AWS-resident PII + Azure-resident customer data + on-prem Oracle. Each source's data stays in jurisdiction; Trino aggregates without moving raw data. OPA enforces region-aware access policies.

weak Single-table OLAP dashboards on a 10TB fact table

ClickHouse or Druid is a better fit for purpose-built single-source columnar OLAP. Trino can do this but per-query latency is higher than dedicated columnar engines. Use Trino when federation is the killer feature; use ClickHouse when it isn't.

Stack Impact

L1 Trino is the L1 federated query engine that sits in front of L1 lakehouse formats (Iceberg/Hudi/Delta), L1 warehouses (Snowflake/BigQuery), and L1 operational stores (Postgres/MongoDB). The choice cascades to whether you need ETL-into-warehouse or query-in-place patterns.
L3 L3 transformation tools (dbt) target Trino as a SQL execution engine for lakehouse workloads. dbt-trino adapter is well-maintained. Pairs naturally with Iceberg-as-medallion-architecture.
L5 Trino's OPA integration is the L5 governance bridge. Per-user impersonation propagates identity to underlying stores; OPA policies provide ABAC at query time. Audit log feeds L6 SIEM.
L6 Trino emits JMX + Prometheus metrics consumed by L6 observability (Prometheus/Grafana, Datadog, SigNoz). Event listener provides per-query event stream to L6 LLM-cost-style attribution for analytical workloads.


This analysis is AI-generated using the INPACT and GOALS frameworks from "Trust Before Intelligence." Scores and assessments are algorithmic and may not reflect the vendor's complete capabilities. Always validate with your own evaluation.