Trino is a widely adopted OSS distributed SQL query engine for federated analytics across data lakes, warehouses, and operational stores. Apache-2.0. It began as PrestoSQL and was renamed Trino in 2020 by the founding team. Connectors for Iceberg/Hudi/Delta Lake/Hive/Postgres/MongoDB/Snowflake/BigQuery and dozens more make it a general SQL-over-everything layer. Pick Trino when you need cross-source analytical queries without ETL-loading everything into one warehouse, when you want SQL ergonomics over a lakehouse, or when query federation across heterogeneous stores is the architectural pattern. Starburst Galaxy is the best-known managed offering; Trino itself runs anywhere a supported Java runtime does, on bare metal, VMs, or Kubernetes.
Trino's trust posture is dominated by one architectural choice: it's a query engine, not a storage engine. Data stays in the underlying stores (your Postgres, your S3 lakehouse, your Snowflake). Trino's role is to push queries down to those stores where the connector allows it and to combine the results. Through a Trust Before Intelligence lens, that means audit + access control are split: Trino handles authentication and query-time authorization via System Access Control (file-based, OPA, custom plugins), but the underlying stores' own access controls remain authoritative. Misconfiguration here is the dominant risk: if Trino connects to Postgres with an overly broad service account, Postgres only ever sees that account, so Trino can effectively bypass Postgres RLS. The right pattern is per-user impersonation + OPA policies that mirror the underlying stores' authorization model.
Distributed query latency depends on connector + data volume. Sub-second for cached + small-scan queries; multi-second for large lakehouse scans. Predicate pushdown to underlying stores is the key performance lever. Cap rule N/A.
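Why pushdown is the lever can be shown with a toy model. This is not Trino's planner; it only counts how many rows a hypothetical source ships to the engine with and without the filter pushed down. Source, run_query, and the sample data are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class Source:
    """Hypothetical stand-in for a connector-backed store (e.g. a Postgres table)."""
    rows: List[Row]

    def scan(self, pushed: Optional[Predicate] = None) -> Iterator[Row]:
        # With a pushed-down predicate, the source filters before returning rows.
        for row in self.rows:
            if pushed is None or pushed(row):
                yield row


def run_query(source: Source, predicate: Predicate, pushdown: bool) -> Tuple[List[Row], int]:
    """Return (result rows, rows transferred from the source to the engine)."""
    shipped = list(source.scan(predicate if pushdown else None))
    # Without pushdown, the engine applies the filter itself after the transfer.
    result = [r for r in shipped if predicate(r)]
    return result, len(shipped)


if __name__ == "__main__":
    src = Source([{"id": i, "region": "eu" if i % 10 == 0 else "us"} for i in range(1000)])
    is_eu = lambda r: r["region"] == "eu"
    for pushdown in (False, True):
        result, moved = run_query(src, is_eu, pushdown)
        print(f"pushdown={pushdown}: {len(result)} result rows, {moved} rows moved")
```

Both runs return the same 100 rows, but without pushdown all 1000 rows cross the wire; with it, only 100 do. Real connectors add partial pushdown, partition pruning, and network cost on top of this.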
ANSI SQL across heterogeneous sources. SQL knowledge transfers; cross-source JOINs work. The ergonomics that made Presto/Trino the de facto OSS lakehouse query engine. Cap rule N/A.
System Access Control framework: file-based, OPA integration, custom plugins. Per-user impersonation to underlying stores (connector-dependent). ABAC via OPA conditions on user/role/source/object. Cap rule N/A; closer to ABAC than pure RBAC when configured properly.
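A minimal sketch of the file-based option: a Python helper that emits a rules document giving one group read-only access to one catalog, hiding a sensitive column, and applying a row filter. All catalog, schema, table, group, column, and filter values are hypothetical. The field names (catalogs, tables, allow, privileges, columns, filter) follow Trino's file-based access control rules format as we understand it; check them against the docs for your Trino version before deploying.

```python
import json


def analyst_rules(catalog: str, schema: str, table: str, group: str,
                  hidden_column: str, row_filter: str) -> dict:
    """Build a file-based access-control rules document (hypothetical names)."""
    return {
        "catalogs": [
            # The group may read this catalog; every other catalog is denied.
            {"group": group, "catalog": catalog, "allow": "read-only"},
            {"catalog": ".*", "allow": "none"},
        ],
        "tables": [
            {
                "group": group,
                "catalog": catalog,
                "schema": schema,
                "table": table,
                "privileges": ["SELECT"],
                # Column-level control: the sensitive column is not visible at all.
                "columns": [{"name": hidden_column, "allow": False}],
                # Row filter: an SQL predicate applied to every query on this table.
                "filter": row_filter,
            },
        ],
    }


if __name__ == "__main__":
    rules = analyst_rules("lakehouse", "sales", "orders", "analysts",
                          "customer_email", "region = 'EU'")
    print(json.dumps(rules, indent=2))
```

Generating the file from code (rather than hand-editing JSON) makes it easy to review policy changes in CI alongside the stores' own grants.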
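Connector-driven multi-source; runs anywhere a supported Java runtime does (bare metal, VMs, Kubernetes). Multi-cloud, hybrid, on-prem. Cross-cloud federation is the killer feature: querying sources in place removes the need to copy data into a central store, although the rows each query reads still travel to Trino workers.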
Query plans expose source pushdown, partition pruning, cost-based optimizer decisions. Event listener API captures query lifecycle. Connector metadata feeds into the planner. Strong contextual richness.
Web UI shows running + completed queries with full plan + per-stage stats. JMX metrics for monitoring. Event listener for audit. Cap rule N/A.
G1=Y (OPA integration enables ABAC), G2=Y (event listener captures all queries for audit), G3=N, G4=N, G5=N, G6=N. 2/6 -> 2 strict; lifted to 3 for OPA integration depth.
O1=Y (JMX + Prometheus exporters), O2=N (no native distributed tracing across stages), O3=Y (per-query CPU + memory + bytes-scanned), O4=Y (query failure + latency alarms), O5=N, O6=N. 3/6 -> 4 lenient (per-query observability is among Trino's strong dimensions).
A1=Y (sub-second cached), A2=Y (predicate pushdown enables real-time queries on streaming sources), A3=N, A4=Y (horizontally scaled worker pools; OSS Trino runs one coordinator per cluster, so coordinator HA typically means multiple clusters behind a gateway such as Trino Gateway), A5=Y (production deployments at hyperscale documented at Netflix, LinkedIn, Pinterest), A6=Y (massive parallelism is the design center). 5/6 -> 4.
L1=N, L2=N, L3=N, L4=N, L5=Y (catalog + schema + table naming across heterogeneous sources is its terminology surface), L6=N. 1/6 -> 2 strict; bumped to 3 for catalog richness.
S1=Y (query results deterministic given source data), S2=Y (typed result columns), S3=Y (snapshot consistency at query start when sources support it), S4=Y (typed query plan), S5=N (Trino doesn't validate source data quality), S6=Y (event listener flags slow queries / errors). 5/6 -> 4.
Best suited for
Compliance certifications
Trino (OSS) holds no compliance certifications. Starburst Galaxy (managed Trino from Starburst, a separate company) holds SOC 2 and ISO 27001 and offers a HIPAA BAA. Starburst Enterprise (commercial self-hosted) adds security features, but its compliance posture is operator-driven. For regulated workloads, use Starburst Galaxy, or self-host Trino on attested infrastructure (e.g., AWS GovCloud, Azure Government) and own the remaining controls yourself.
Use with caution for
Choose Snowflake for fully-managed cloud DW with rich governance + collaboration. Trino wins on cross-source federation + OSS license; Snowflake wins on managed compliance + zero-ops + simpler workload isolation.
Choose BigQuery for serverless analytics in GCP with Vertex AI integration. Trino wins on multi-cloud + multi-source query federation; BigQuery wins on serverless ergonomics + no cluster management.
Choose DuckDB for single-process analytical queries on Parquet/Iceberg. Trino wins on multi-user serving + cross-source federation; DuckDB wins on embedded simplicity + no operational footprint. Different tools for different scales.
Choose ClickHouse for purpose-built columnar OLAP with sub-second analytics. Trino wins on heterogeneous-source federation; ClickHouse wins on single-source raw query performance. ClickHouse stores its own data; Trino doesn't.
Role: L1 federated SQL query engine. Reads from heterogeneous data sources via connectors; aggregates results in distributed worker tier; serves SQL clients via Trino protocol or ODBC/JDBC.
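The Trino client protocol is HTTP + JSON: a client POSTs SQL to /v1/statement and follows nextUri links until the server stops returning one, collecting columns and data from each page. A minimal standard-library sketch of that loop follows; the server URL and user are placeholders, the injectable transport exists only so the loop can be tested offline, and production clients should use the official JDBC driver or the trino Python client instead.

```python
import json
import urllib.request
from typing import Callable, List, Optional, Tuple

# A transport takes (method, url, body, headers) and returns the parsed JSON page.
Transport = Callable[[str, str, Optional[bytes], dict], dict]


def http_transport(method: str, url: str, body: Optional[bytes], headers: dict) -> dict:
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_statement(server: str, user: str, sql: str,
                  transport: Transport = http_transport) -> Tuple[List[str], list]:
    """Return (column names, rows) for one SQL statement."""
    headers = {"X-Trino-User": user}  # session user the query runs as
    page = transport("POST", f"{server}/v1/statement", sql.encode("utf-8"), headers)
    columns: List[str] = []
    rows: list = []
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Column metadata may only appear on a later page, so keep checking.
        if not columns and "columns" in page:
            columns = [c["name"] for c in page["columns"]]
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:  # no more pages: the query has finished
            return columns, rows
        page = transport("GET", next_uri, None, headers)
```

Usage against a real cluster would look like run_statement("http://localhost:8080", "alice", "SELECT 1"), with host, port, and user adjusted to your deployment and authentication added as needed.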
Upstream: Receives SQL queries from BI tools (Tableau, Looker, Mode), notebooks (Jupyter, Hex), L3 transformation engines (dbt-trino), and L7 agent runtimes (LangChain SQLDatabase tool against Trino). Reads data from Iceberg/Hudi/Delta/Hive/Postgres/MongoDB/Snowflake/BigQuery via connectors.
Downstream: Returns query results to clients. Emits JMX + Prometheus metrics to L6 observability. Event listener publishes query lifecycle events to L6 cost-attribution backends (analogous to LLM cost tracking). Lineage exported via OpenLineage to L3 catalogs (DataHub, Marquez, OpenMetadata).
Mitigation: Configure per-user impersonation in connector configs where the connector supports it. Underlying stores' RLS / RBAC must apply at the impersonated user's level, not Trino's service account. Audit event listener output to verify that the end user's identity propagates.
Mitigation: Deploy OPA + Trino's OPA access-control plugin. Mirror underlying stores' authorization model in OPA policies. Test the access matrix end-to-end. Ship policies via CI.
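"Test the access matrix end-to-end" can be a small CI script: for each (user, groups, table, expected) row, send OPA the kind of input Trino would send and compare decisions. A sketch, assuming the input shape (context.identity plus action.operation/resource) mirrors what Trino's OPA plugin sends for a SELECT; confirm the exact fields for your Trino version. The OPA URL, policy path, and matrix rows are hypothetical. OPA's Data API returns {"result": ...}, and a missing result is treated as deny.

```python
import json
import urllib.request
from typing import Callable

OPA_URL = "http://localhost:8181/v1/data/trino/allow"  # hypothetical policy path


def select_input(user: str, groups: list, catalog: str, schema: str, table: str) -> dict:
    """Build an OPA input document for a SELECT (field names are assumptions)."""
    return {
        "input": {
            "context": {"identity": {"user": user, "groups": groups}},
            "action": {
                "operation": "SelectFromColumns",
                "resource": {"table": {"catalogName": catalog, "schemaName": schema,
                                       "tableName": table}},
            },
        }
    }


def opa_decide(doc: dict) -> bool:
    """POST the input to OPA's Data API; anything but an explicit true is deny."""
    req = urllib.request.Request(OPA_URL, data=json.dumps(doc).encode("utf-8"),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read()).get("result") is True


def check_matrix(matrix: list, decide: Callable[[dict], bool] = opa_decide) -> list:
    """Return rows whose actual decision differs from the expected one."""
    failures = []
    for user, groups, catalog, schema, table, expected in matrix:
        actual = decide(select_input(user, groups, catalog, schema, table))
        if actual != expected:
            failures.append((user, catalog, schema, table, expected, actual))
    return failures
```

The expected column should be derived from the underlying store's own grants (for example, Postgres roles), so a non-empty result from check_matrix means Trino and the store disagree and the CI job should fail.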
Mitigation: Configure query.max-memory and query.max-memory-per-node limits. Enable spill-to-disk for memory-intensive operations. Use resource groups for workload isolation. Monitor query memory usage; alert on outliers.
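For the resource-group part, a sketch that builds a file-based resource-groups document isolating ETL, BI, and ad-hoc workloads. The group names, client-source patterns, and limits are hypothetical placeholders to tune per cluster; the keys (rootGroups, subGroups, softMemoryLimit, hardConcurrencyLimit, maxQueued, selectors) follow Trino's resource-groups JSON format as we understand it. The memory properties named above live in config.properties and are not covered here.

```python
import json


def group(name: str, mem: str, concurrency: int, queued: int) -> dict:
    """One resource group with memory, concurrency, and queue limits."""
    return {"name": name, "softMemoryLimit": mem,
            "hardConcurrencyLimit": concurrency, "maxQueued": queued}


def resource_groups() -> dict:
    root = group("global", "80%", 100, 1000)
    root["subGroups"] = [
        group("etl", "50%", 10, 100),     # batch transformations (e.g. dbt)
        group("bi", "30%", 40, 400),      # dashboards
        group("adhoc", "20%", 20, 100),   # everything else
    ]
    return {
        "rootGroups": [root],
        # Selectors are evaluated in order; the first match assigns the group.
        "selectors": [
            {"source": ".*dbt.*", "group": "global.etl"},
            {"source": "(?i).*(tableau|looker).*", "group": "global.bi"},
            {"group": "global.adhoc"},  # catch-all
        ],
    }


if __name__ == "__main__":
    print(json.dumps(resource_groups(), indent=2))
```

Keeping a catch-all selector last means unmatched clients land in a bounded ad-hoc pool instead of competing with ETL and dashboards.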
Mitigation: Enable an event listener plugin. Ship events to durable storage (S3, Kafka, OpenSearch). Validate that audit records capture user identity, source tables, and filter predicates.
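The validation step can run as a periodic job over shipped events. A sketch, assuming each record is a JSON serialization of Trino's query-completed event and that the field paths used here (context.user, metadata.query, metadata.queryId, ioMetadata.inputs[].catalogName/schema/table) match your Trino version; confirm them against real events first. The service-account name is hypothetical. Filter predicates are checked indirectly, by requiring the SQL text to be present.

```python
import json

SERVICE_ACCOUNTS = {"svc-trino"}  # hypothetical shared credentials to flag


def audit_problems(event: dict) -> list:
    """Return human-readable problems; an empty list means the event passes."""
    problems = []
    user = event.get("context", {}).get("user")
    if not user:
        problems.append("missing context.user")
    elif user in SERVICE_ACCOUNTS:
        problems.append(f"query ran as service account {user!r}, not an end user")
    if not event.get("metadata", {}).get("query"):
        problems.append("missing SQL text (filter predicates are unrecoverable)")
    if "ioMetadata" not in event:
        problems.append("missing ioMetadata (source tables not recorded)")
    # Constant queries such as SELECT 1 legitimately have no inputs; only
    # check that any recorded inputs are fully qualified.
    for item in event.get("ioMetadata", {}).get("inputs", []):
        if not all(item.get(k) for k in ("catalogName", "schema", "table")):
            problems.append(f"incomplete input table reference: {item}")
    return problems


def scan_jsonl(lines) -> dict:
    """Map queryId -> problems for every failing event in a JSON-lines stream."""
    report = {}
    for line in lines:
        event = json.loads(line)
        problems = audit_problems(event)
        if problems:
            report[event.get("metadata", {}).get("queryId", "?")] = problems
    return report
```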
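Mitigation: Train developers on predicate pushdown. Use EXPLAIN before committing a query to confirm its filters reach the source. Set query timeouts (e.g., query.max-execution-time) and memory limits to prevent abuse, scoped per client source where workloads differ.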
Trino queries Iceberg tables on S3 directly. Cost: S3 storage + Trino compute (vs Snowflake credits). Governance via OPA policies + per-user impersonation + Iceberg table-level RBAC. Cross-source federation to Postgres for transactional joins.
Trino federates queries across AWS-resident PII + Azure-resident customer data + on-prem Oracle. Data at rest stays in its jurisdiction and is queried in place, but the rows each query reads (after pushdown) flow to Trino workers, so worker placement matters for residency. OPA enforces region-aware access policies.
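The region-aware rule itself is simple to state: a query is allowed only if every catalog it touches sits in a region the user is entitled to. In production this would be a Rego policy evaluated by OPA; Python is used here only to make the logic concrete. The catalog names, regions, and entitlements are all hypothetical.

```python
CATALOG_REGION = {
    "aws_pii": "us",          # AWS-resident PII
    "azure_customers": "eu",  # Azure-resident customer data
    "onprem_oracle": "de",    # on-prem Oracle
}

USER_REGIONS = {
    "alice": {"us", "eu", "de"},
    "bob": {"eu", "de"},
}


def allowed(user: str, catalogs: set) -> bool:
    """Allow a query only if every catalog it touches is in a permitted region."""
    permitted = USER_REGIONS.get(user, set())  # unknown users get no regions
    # Unknown catalogs map to None, which is never in a permitted set (default deny).
    return all(CATALOG_REGION.get(c) in permitted for c in catalogs)
```

Evaluating the whole set of catalogs at once, rather than each table in isolation, is what blocks a cross-jurisdiction join that a per-table check might let through piecemeal.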
ClickHouse or Druid is a better fit for purpose-built single-source columnar OLAP. Trino can do this but per-query latency is higher than dedicated columnar engines. Use Trino when federation is the killer feature; use ClickHouse when it isn't.
This analysis is AI-generated using the INPACT and GOALS frameworks from "Trust Before Intelligence." Scores and assessments are algorithmic and may not reflect the vendor's complete capabilities. Always validate with your own evaluation.