Use Spark with AWS Glue Iceberg REST API and S3 Tables
A quick experiment using Apache Spark to query Iceberg tables that are stored in S3 table buckets and managed by the Glue Data Catalog via the Iceberg REST API.
S3 Tables provides built-in support for the Apache Iceberg format, along with integration with AWS Glue and Lake Formation. When the integration is enabled, tables stored in table buckets are registered in the Glue Data Catalog and become available through the Iceberg REST API.
This post explains how to access Iceberg tables in S3 Tables from Apache Spark via the Glue Iceberg REST API.
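If you don't have a table bucket yet, one can be created with the AWS SDK. Below is a minimal sketch using boto3 (1.35.x or later, which includes S3 Tables support); the bucket name and region are hypothetical examples, and the Glue/Lake Formation integration itself is typically enabled from the S3 console:
import boto3

# create an S3 table bucket (name and region are example values)
s3tables = boto3.client("s3tables", region_name="us-east-1")
response = s3tables.create_table_bucket(name="my-table-bucket")

# the response carries the new table bucket's ARN
print(response.get("arn"))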
Environment
- Run Spark locally in a container using the apache/spark image
Run an interactive PySpark session in a local container
export AWS_REGION=${your_region}
# retrieve and export credentials
eval $(aws configure export-credentials --format env)
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# packages required to use Iceberg and S3
ICEBERG_VERSION="1.9.2"
SPARK_SCALA_VERSION="3.5_2.12"
AWS_SDK_VERSION="2.32.10"
HADOOP_AWS_VERSION="3.3.6"
SPARK_PACKAGES_CONFIG="org.apache.iceberg:iceberg-spark-runtime-${SPARK_SCALA_VERSION}:${ICEBERG_VERSION},software.amazon.awssdk:s3:${AWS_SDK_VERSION},software.amazon.awssdk:sts:${AWS_SDK_VERSION},org.apache.hadoop:hadoop-aws:${HADOOP_AWS_VERSION}"
GLUE_CATALOG_ID="${AWS_ACCOUNT_ID}"
# If the S3 Tables and Glue/Lake Formation integration is enabled, a catalog is created per table bucket:
# GLUE_CATALOG_ID="${AWS_ACCOUNT_ID}:s3tablescatalog/${BUCKET_NAME}"
podman run --rm -it \
--name spark-iceberg-job \
-v ./:/opt/spark/work-dir \
-e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
-e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
-e AWS_SESSION_TOKEN="${AWS_SESSION_TOKEN}" \
-e AWS_REGION="${AWS_REGION}" \
spark:3.5.6-java17-python3 \
/opt/spark/bin/pyspark \
--conf "spark.jars.packages=${SPARK_PACKAGES_CONFIG}" \
--conf "spark.jars.ivy=/opt/spark/work-dir/.ivy" \
--conf "spark.sql.catalog.glue_rest_catalog=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.glue_rest_catalog.type=rest" \
--conf "spark.sql.catalog.glue_rest_catalog.warehouse=${GLUE_CATALOG_ID}" \
--conf "spark.sql.catalog.glue_rest_catalog.uri=https://glue.${AWS_REGION}.amazonaws.com/iceberg" \
--conf "spark.sql.catalog.glue_rest_catalog.rest.auth.type=sigv4" \
--conf "spark.sql.catalog.glue_rest_catalog.rest.signing-name=glue" \
--conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.sql.defaultCatalog=glue_rest_catalog"
SQL examples
- list databases
spark.sql("SHOW databases").show()
- list tables
spark.sql("SHOW tables in test_db").show()
- create database
spark.sql("CREATE DATABASE test_db")
- create table
# BUCKET_NAME, DATABASE_NAME, and TABLE_NAME must be defined beforehand
create_table_sql = f"""
CREATE TABLE IF NOT EXISTS test_db.test_tbl (id LONG)
USING iceberg
LOCATION 's3://{BUCKET_NAME}/{DATABASE_NAME}/{TABLE_NAME}'
TBLPROPERTIES ('write.format.default'='parquet')
"""
spark.sql(create_table_sql)
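As a quick end-to-end check, you can write a few rows and read them back; Iceberg also exposes metadata tables such as snapshots. The examples below assume the table created above:
- insert and query data
spark.sql("INSERT INTO test_db.test_tbl VALUES (1), (2), (3)")
spark.sql("SELECT * FROM test_db.test_tbl ORDER BY id").show()
- inspect table snapshots via Iceberg's metadata table
spark.sql("SELECT snapshot_id, committed_at, operation FROM test_db.test_tbl.snapshots").show()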