Run locally with a container

Run an interactive PySpark session

podman run -it -v ./:/opt/spark/work-dir spark:4.0.0-java21-python3 /opt/spark/bin/pyspark

Submit a PySpark app

podman run -it spark:4.0.0-java21-python3 \
  /opt/spark/bin/spark-submit \
  --conf spark.log.level=WARN \
  /opt/spark/examples/src/main/python/pi.py

Iceberg

Write dummy data

spark.sql("create database test_db location 's3://bucket_name/iceberg/test_db'")
spark.sql(
  """
  create table test_db.test_tbl (id int)
  using iceberg
  location 's3://bucket_name/iceberg/test_db/test_tbl'
  """
)
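# spark.range() yields a single bigint column named "id"; cast it to match the table's int column
df = spark.range(0, 100).selectExpr("cast(id as int) as id")
# The table was created in test_db, so it must be referenced as test_db.test_tbl
df.writeTo("test_db.test_tbl").append()
spark.sql("select count(*) from test_db.test_tbl").show()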

Inspecting metadata
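
Iceberg exposes a table's metadata as queryable metadata tables, addressed by appending the metadata table name to the table identifier. The following is a minimal sketch, assuming the `spark` session is already configured with an Iceberg catalog and that `test_db.test_tbl` from the previous section exists. The helper names `metadata_queries` and `show_metadata` are hypothetical and only serve the illustration. Depending on the catalog setup, the identifier may need a catalog prefix.

```python
# Iceberg metadata tables queried in this sketch (a subset of those Iceberg provides).
METADATA_TABLES = ("snapshots", "history", "files", "manifests")


def metadata_queries(table, metadata_tables=METADATA_TABLES):
    """Build one SELECT statement per metadata table (hypothetical helper)."""
    return {name: f"select * from {table}.{name}" for name in metadata_tables}


def show_metadata(spark, table):
    """Run each metadata query on an existing SparkSession and print the result."""
    for name, query in metadata_queries(table).items():
        print(f"-- {name}")
        spark.sql(query).show(truncate=False)
```

In the interactive pyspark session this could be called as `show_metadata(spark, "test_db.test_tbl")`. After the single append above, the `snapshots` table would be expected to list one snapshot.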

Links