Parquet
Libraries
- Java
- High-level interfaces such as parquet-hadoop and parquet-avro are tightly coupled with Hadoop.
- If you do not use Hadoop and want to avoid dependency hell, you may need to implement your own Parquet writer on top of low-level modules such as parquet-{common,column,encoding}.
- e.g. Iceberg’s parquet writer, Trino’s parquet writer
Links
- Capacitor (BigQuery’s columnar storage format)
- Has the same ancestor as Parquet (Dremel)
- Motivation of Parquet
ORC
- ORC Specification v1
- A v2 specification exists, but there appears to have been no progress on it since 2018.
- protobuf definition