Expected Hive version is 4.
Optimizer
There are two separate cardinality estimation systems in Hive that operate at different stages of query compilation.
- Calcite-Level Statistics (HiveRelMdSelectivity, HiveRelMdRowCount, etc.)
  - Package: org.apache.hadoop.hive.ql.optimizer.calcite.stats
  - Works on: Calcite RelNodes (logical plan)
  - When used: During Cost-Based Optimization (CBO) phase
  - Executed if hive.cbo.enable = true
- Operator-Level Statistics (StatsRulesProcFactory.java)
  - Package: org.apache.hadoop.hive.ql.optimizer.stats.annotation
  - Works on: Hive operator tree (physical plan)
  - When used: After physical plan generation (Tez compilation phase)
  - Entry point: AnnotateWithStatistics transform
Operator-level statistics
Notations:
- T(S) - Number of tuples in relation S
- V(S,A) - Number of distinct values of attribute A in relation S
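
To make the notation concrete, here is a minimal sketch that computes T(S) and V(S,A) by brute force over a tiny in-memory relation. The relation `S`, its columns, and the helper names `tuple_count` and `distinct_values` are hypothetical and are not part of Hive. Hive itself normally takes these quantities from collected table and column statistics (row count and number of distinct values) rather than scanning the data at compile time.

```python
from typing import Dict, List

Row = Dict[str, object]


def tuple_count(relation: List[Row]) -> int:
    """T(S): number of tuples in relation S."""
    return len(relation)


def distinct_values(relation: List[Row], attribute: str) -> int:
    """V(S, A): number of distinct values of attribute A in relation S."""
    return len({row[attribute] for row in relation})


# Hypothetical toy relation S with two attributes (illustration only).
S: List[Row] = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": "Rome"},
    {"id": 3, "city": "Paris"},
    {"id": 4, "city": "Oslo"},
]

print("T(S) =", tuple_count(S))
print("V(S, city) =", distinct_values(S, "city"))
print("V(S, id) =", distinct_values(S, "id"))
```

For this toy relation, V(S, id) equals T(S) because `id` is unique, while V(S, city) is smaller because "Paris" repeats.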