Parquet

Libraries

Java
- High-level interfaces such as parquet-hadoop and parquet-avro are tightly coupled with Hadoop.
  - If you do not use Hadoop and want to avoid dependency hell, you may need to implement your own Parquet writer on top of the low-level modules such as parquet-{common,column,encoding}.
  - e.g. Iceberg’s Parquet writer, Trino’s Parquet writer
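
Whichever modules a custom writer is built on, the file it emits still has to follow the Parquet container framing: the 4-byte magic `PAR1`, the data, the footer metadata, a 4-byte little-endian footer length, and `PAR1` again. The sketch below is a sanity check for that framing, assuming a file with a plaintext footer. It uses only the JDK and none of the parquet-* modules; the class and method names (`ParquetFraming`, `footerLength`) are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of any Parquet library.
final class ParquetFraming {
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns the footer length in bytes, or -1 if the framing is invalid. */
    static long footerLength(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long size = f.length();
            // Smallest possible frame: magic (4) + footer length (4) + magic (4).
            if (size < 12) return -1;

            byte[] head = new byte[4];
            f.readFully(head);

            // The last 8 bytes are the footer length followed by the magic.
            byte[] tail = new byte[8];
            f.seek(size - 8);
            f.readFully(tail);

            if (!Arrays.equals(head, MAGIC)) return -1;
            if (!Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) return -1;

            // The footer length is stored little-endian; widen it to an unsigned value.
            long len = ByteBuffer.wrap(tail, 0, 4)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .getInt() & 0xFFFFFFFFL;

            // The footer must fit between the leading magic and the trailing 8 bytes.
            return len <= size - 12 ? len : -1;
        }
    }
}
```

Running `ParquetFraming.footerLength(path)` against the output of a hand-rolled writer is a cheap first check before opening the file in a real reader. It says nothing about whether the footer metadata itself is valid.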

Links

Capacitor (BigQuery’s columnar storage format)
- Has the same ancestor as Parquet (Dremel)
- Motivation of Parquet

ORC

ORC Specification v1
- A v2 specification exists, but there appears to have been no progress on it since 2018.
protobuf definition

Links
