Study Apache Iceberg ecosystems in AWS

Study notes on the Apache Iceberg ecosystem in AWS.

S3 Tables

S3 Tables stores Iceberg tables in a dedicated bucket type, the S3 table bucket. It supports IAM-based and resource-based access control, and it runs maintenance operations on the stored tables automatically. It was released on 2024/12/03.

Table maintenance

Compaction, snapshot management, and unreferenced file removal are enabled by default. Compaction and snapshot management are configurable per table; unreferenced file removal is configured per table bucket.
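As a rough illustration of a per-table maintenance setting, the sketch below builds the keyword arguments for a compaction configuration. The field names follow the boto3 `s3tables` client's `put_table_maintenance_configuration` call as I understand it; the ARN, namespace, table name, and the 256 MB target are hypothetical values, so check the shape against the current API reference before relying on it.

```python
from typing import Any


def compaction_request(
    bucket_arn: str,
    namespace: str,
    table: str,
    target_file_size_mb: int,
    enabled: bool = True,
) -> dict[str, Any]:
    """Build keyword arguments for a per-table compaction setting.

    Assumption: the keys mirror the S3 Tables PutTableMaintenanceConfiguration
    request; they are not taken from this note.
    """
    return {
        "tableBucketARN": bucket_arn,
        "namespace": namespace,
        "name": table,
        "type": "icebergCompaction",
        "value": {
            "status": "enabled" if enabled else "disabled",
            "settings": {
                "icebergCompaction": {"targetFileSizeMB": target_file_size_mb}
            },
        },
    }


# Hypothetical identifiers, for illustration only.
request = compaction_request(
    "arn:aws:s3tables:us-east-1:111122223333:bucket/example-bucket",
    "example_namespace",
    "example_table",
    target_file_size_mb=256,
)
```

With AWS credentials in place, the dictionary could presumably be passed as `boto3.client("s3tables").put_table_maintenance_configuration(**request)`.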

Integration with Glue and Lake Formation

S3 table buckets can be integrated with Glue and Lake Formation. When the integration is enabled, a Glue catalog is created per table bucket, and the Iceberg tables in that bucket are managed through the catalog.

The integration is enabled by the following steps:

  1. Register the table buckets with Lake Formation as a data location
  2. Create a federated catalog in Glue
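The two steps above can be sketched as a pair of request payloads. This is a minimal sketch under assumptions: the catalog name `s3tablescatalog`, the connection name `aws:s3tables`, and the wildcard bucket ARN are what I recall from the AWS integration guide, and the account ID and role name are hypothetical. Nothing here is taken from this note, so verify each field before use.

```python
from typing import Any


def integration_requests(
    region: str, account_id: str, role_arn: str
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (register_resource kwargs, create_catalog kwargs).

    Assumption: field names follow the Lake Formation RegisterResource and
    Glue CreateCatalog APIs as recalled; they are not confirmed by this note.
    """
    # One ARN covering every table bucket in the account and region.
    resource_arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/*"

    # Step 1: register the table buckets with Lake Formation.
    register = {
        "ResourceArn": resource_arn,
        "WithFederation": True,
        "RoleArn": role_arn,
    }

    # Step 2: create the federated catalog in Glue.
    catalog = {
        "Name": "s3tablescatalog",
        "CatalogInput": {
            "FederatedCatalog": {
                "Identifier": resource_arn,
                "ConnectionName": "aws:s3tables",
            },
            "CreateDatabaseDefaultPermissions": [],
            "CreateTableDefaultPermissions": [],
        },
    }
    return register, catalog


# Hypothetical account and role, for illustration only.
register, catalog = integration_requests(
    "us-east-1",
    "111122223333",
    "arn:aws:iam::111122223333:role/ExampleLakeFormationRole",
)
```

With credentials available, these would presumably be sent as `boto3.client("lakeformation").register_resource(**register)` followed by `boto3.client("glue").create_catalog(**catalog)`.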

Quotas

Limitations


AWS Glue

AWS launched the Glue Iceberg REST endpoint together with S3 Tables, on the same release date.
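To make the endpoint concrete, here is a small sketch that assembles client properties for an Iceberg REST catalog pointed at Glue. The URL pattern, the `glue` signing name, and the property keys are what I understand PyIceberg's REST catalog to accept; the region and warehouse value are hypothetical, and the exact warehouse format should be checked against the AWS documentation.

```python
def glue_rest_catalog_properties(region: str, warehouse: str) -> dict[str, str]:
    """Build REST catalog properties for the Glue Iceberg REST endpoint.

    Assumption: the property keys follow PyIceberg's REST catalog
    configuration with SigV4 signing; they are not taken from this note.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": warehouse,
        # Requests to the endpoint are signed with SigV4 for the Glue service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


# Hypothetical account ID used as the warehouse, for illustration only.
props = glue_rest_catalog_properties("us-east-1", "111122223333")
```

With PyIceberg installed, the properties could presumably be used as `load_catalog("glue_rest", **props)` from `pyiceberg.catalog`.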

Data Catalog

Access control

Quotas


AWS Lake Formation (LF)

Lake Formation provides an RDBMS-style permissions model for granting and revoking access to Data Catalog resources.

Permissions model

Lake Formation manages two types of permissions: metadata permissions and underlying data access permissions.

Lake Formation uses a combination of Lake Formation permissions and IAM permissions. A principal must pass both Lake Formation and IAM permissions checks.
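The "must pass both checks" rule is a logical AND. The toy model below only illustrates that rule; the class and field names are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessDecision:
    """Toy model of the combined check (hypothetical, not an AWS API)."""

    lake_formation_allows: bool
    iam_allows: bool

    @property
    def allowed(self) -> bool:
        # Access is granted only when both checks pass.
        return self.lake_formation_allows and self.iam_allows


# A Lake Formation grant alone is not enough without the IAM permission.
lf_only = AccessDecision(lake_formation_allows=True, iam_allows=False)
both = AccessDecision(lake_formation_allows=True, iam_allows=True)
```

Here `lf_only.allowed` evaluates to `False` and `both.allowed` to `True`.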

Metadata permissions

Underlying data access permissions

The following permissions are required to enable principals to read and write underlying data

The Lake Formation permissions model doesn’t prevent access to Amazon S3 locations through the Amazon S3 API or console if you have access to them through IAM or Amazon S3 policies. You can attach IAM policies to principals to block this access.

(from Underlying data access control)
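One way the blocking IAM policy mentioned above could look is an explicit deny on object and listing actions for the data bucket. This is a sketch under assumptions: the bucket name is hypothetical, the action list is illustrative rather than complete, and whether an explicit deny (as opposed to simply not granting S3 access) suits a given setup is not something this note settles.

```python
import json


def deny_direct_s3_access_policy(bucket: str) -> str:
    """Render an IAM policy document denying direct S3 access to one bucket.

    Assumption: the action list and the explicit-deny approach are
    illustrative choices, not taken from this note.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDirectS3Access",
                "Effect": "Deny",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                # Bucket ARN for ListBucket, object ARN for the object actions.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket name, for illustration only.
policy_json = deny_direct_s3_access_policy("example-data-lake-bucket")
```

The resulting JSON string could then be attached to a principal as an inline or managed policy.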

Cross-account data sharing

Example steps for cross-account data sharing with LF-TBAC

Permissions enforcement

Storage access management

Credential vending

Quotas