This example demonstrates migrating a single Oracle table (MTL_SYSTEM_ITEMS_B
in the INV
schema) into an Iceberg table stored on Amazon S3, using AWS Glue as the Iceberg catalog.
Prerequisites:
- Java Installation: Ensure JDK 11 or later is installed.
- ora2iceberg Binary: Download the latest binary from the GitHub releases page, or build from source.
- Oracle Connectivity: Have access to your Oracle database and correct credentials.
- AWS Credentials: You’ll need valid AWS credentials (Access Key, Secret Key) with permissions to write to the specified S3 bucket and register tables in AWS Glue.
AWS Environment Variables:
Set your AWS environment variables so that ora2iceberg can authenticate with the AWS services it needs to access. For example:
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<ACCESS_KEY_ID>
AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
You can export these in your terminal session, or provide them via your runtime environment.
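For a typical shell session, the exports look like this (placeholder values shown), optionally followed by two AWS CLI calls to confirm the credentials can reach the target bucket and Glue:
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
# Optional sanity checks against the bucket and catalog used below:
aws s3 ls s3://iceberg-ora
aws glue get-databases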
Command Example:
java -jar ora2iceberg.jar \
--source-jdbc-url jdbc:oracle:thin:@oebsdbtst:1591/TST \
--source-user apps \
--source-password <password> \
-T glue \
-C test \
-U 1 \
-H "s3://iceberg-ora" \
-R io-impl=org.apache.iceberg.aws.s3.S3FileIO \
--source-object MTL_SYSTEM_ITEMS_B \
--source-schema INV \
-m "%_ID:NUMBER=LONG; %ATED_BY:NUMBER=INTEGER" \
-P ORGANIZATION_ID=IDENTITY \
-P INVENTORY_ITEM_ID=BUCKET,5
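If the run fails at the connection stage, you can verify the Oracle coordinates independently of ora2iceberg, for example with SQL*Plus using the same host, port, and service name as the JDBC URL (this assumes the sqlplus client is installed; it will prompt for the password):
sqlplus apps@//oebsdbtst:1591/TST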
Parameter Breakdown:
- Source Connection:
--source-jdbc-url jdbc:oracle:thin:@oebsdbtst:1591/TST points to your Oracle database. --source-user apps and --source-password <password> authenticate the Oracle connection. --source-schema INV sets the schema from which we’re reading, and --source-object MTL_SYSTEM_ITEMS_B is the table to migrate.
- Destination Configuration (AWS & Iceberg):
-T glue specifies the Iceberg catalog type; here, glue integrates with AWS Glue as the metadata catalog. -C test names the Iceberg catalog within AWS Glue. -U 1 supplies the catalog endpoint URI; the parameter is required but does not apply to Glue, so any placeholder value works. -H "s3://iceberg-ora" sets the warehouse location in Amazon S3, where Iceberg data files and metadata will be stored. -R io-impl=org.apache.iceberg.aws.s3.S3FileIO ensures Iceberg uses the S3 FileIO implementation for reading and writing table data.
- Data Type and Schema Customization:
-m "%_ID:NUMBER=LONG; %ATED_BY:NUMBER=INTEGER" defines two data type mapping rules:
- Any column whose name ends with _ID and is typed as NUMBER in Oracle becomes a LONG in Iceberg.
- Any column whose name ends with ATED_BY (e.g., CREATED_BY, UPDATED_BY) and is typed as NUMBER becomes an INTEGER in Iceberg.
- This approach ensures that ambiguous NUMBER columns are translated into more precise data types based on their naming conventions and intended usage.
- Table Partitioning:
-P ORGANIZATION_ID=IDENTITY applies an IDENTITY partition transform to the ORGANIZATION_ID column, effectively creating separate data files per distinct organization ID value. -P INVENTORY_ITEM_ID=BUCKET,5 applies a bucket partitioning transform on INVENTORY_ITEM_ID with 5 buckets, improving parallelism and query efficiency by organizing data into balanced subsets. A Spark SQL sketch of the equivalent partition spec follows this breakdown.
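For intuition, the partition spec above corresponds to what the following Spark SQL DDL would produce. This is an illustration only, not ora2iceberg output: the catalog and namespace names, the _demo table name, and the trimmed column list are assumptions.
# Assumes a Spark session with an Iceberg catalog named "test" configured
# (see the verification sketch further below).
spark-sql -e "CREATE TABLE test.inv.mtl_system_items_b_demo (
                inventory_item_id BIGINT,
                organization_id   BIGINT,
                segment1          STRING)
              USING iceberg
              PARTITIONED BY (organization_id, bucket(5, inventory_item_id))"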
What to Expect After Running the Command:
- Schema Mapping: ora2iceberg reads the Oracle table schema, applies the custom type mappings, and prepares an Iceberg-compatible schema.
- Data Extraction & Loading: Data is fetched from Oracle and written into S3 in the Iceberg format. Partitioning rules (IDENTITY, BUCKET) are applied, improving downstream performance.
- Catalog Registration: The Iceberg table is registered in AWS Glue. You’ll find metadata in Glue’s Data Catalog, making the table discoverable by analytic engines like Athena or Spark.
- Verification: Once complete, you can query the Iceberg table using a supported engine (e.g., Spark, Athena with the Iceberg connector) to confirm that the data and schema look correct, and the partitions are as defined.
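As a concrete verification sketch, here is one way to query the table with spark-sql against the Glue catalog. The Iceberg runtime and AWS SDK versions, the Glue database name (inv), and the lowercased table name are assumptions to adapt to your environment:
# Versions and namespace below are examples; match them to your setup.
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,software.amazon.awssdk:bundle:2.25.60 \
  --conf spark.sql.catalog.test=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.test.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.test.warehouse=s3://iceberg-ora \
  --conf spark.sql.catalog.test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  -e "SELECT COUNT(*) FROM test.inv.mtl_system_items_b;
      SELECT * FROM test.inv.mtl_system_items_b.partitions;"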
Next Steps:
- Refine Data Types: If certain columns need more precise decimal definitions or other numeric types, adjust --default-number-type or refine your --data-type-map patterns; see the sketch after this list.
- Check Partitions & Performance: Adjust partitioning strategies if queries aren’t performing as expected.
- Explore Other Catalogs & Modes: ora2iceberg supports multiple catalogs and operation modes. Experiment with incremental updates or merges for ongoing data synchronization.
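As a sketch of the first item above, a re-run that pins a fallback type for NUMBER columns matching no --data-type-map pattern might look like this. The DECIMAL(38,10) value syntax is an assumption; confirm the exact form with the tool’s --help output:
java -jar ora2iceberg.jar \
  --source-jdbc-url jdbc:oracle:thin:@oebsdbtst:1591/TST \
  --source-user apps \
  --source-password <password> \
  --source-schema INV \
  --source-object MTL_SYSTEM_ITEMS_B \
  -T glue -C test -U 1 -H "s3://iceberg-ora" \
  -R io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --default-number-type "DECIMAL(38,10)" \
  -m "%_ID:NUMBER=LONG; %ATED_BY:NUMBER=INTEGER"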
By following this example, you’ll gain a practical understanding of how to configure ora2iceberg for AWS, tailor data types and partitioning to your needs, and confidently begin migrating your data from Oracle to Iceberg.