Migrating data from Oracle to Iceberg can feel daunting—but with the right parameters, Ora2Iceberg makes it straightforward. Here’s a handy guide to what each parameter does, plus some tips on how (and when) to use them.
1. Source Connection Parameters
--source-jdbc-url (-j)
- What It Does: Provides the JDBC URL to connect to your Oracle database (e.g., jdbc:oracle:thin:@host:1521/service).
- Why It Matters: Without a valid JDBC URL, Ora2Iceberg can’t talk to your database. Think of this as the “address” that lets the tool find Oracle.
- Pro Tip: Make sure you include all required details: hostname, port, and service/SID. A minor typo here can thwart your entire migration.
--source-user (-u)
- What It Does: Tells Ora2Iceberg which Oracle user account to log in as.
- Why It Matters: The account must have sufficient privileges (at least READ) on the data you plan to move.
- Security Tip: Avoid storing credentials in plain text. Use environment variables or a secure secret manager whenever possible.
--source-password (-p)
- What It Does: Supplies the password for the account given in --source-user.
- Why It Matters: Provides essential authentication to Oracle. If this is incorrect, you’ll get connection failures right away.
- Security Tip: As with --source-user, store or supply this securely to protect sensitive info.
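A minimal sketch of that tip, assuming you launch the tool from a shell (the variable name ORA_SRC_PASSWORD is just an illustration):

```bash
# Export the password once -- ideally injected by a secret manager -- so it never
# appears as a literal on the command line or in your shell history.
export ORA_SRC_PASSWORD='change-me'   # placeholder; set this from a secure source
```

The combined example at the end of this section reuses this variable with --source-password.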
--source-schema (-s)
- What It Does: Names the Oracle schema containing the data you want.
- Why It Matters: Without this, Ora2Iceberg defaults to using the same name as your user. Specifying the schema is particularly important if your user has access to multiple schemas.
- Example: --source-schema HR if your data lives in the “HR” schema.
--source-object (-o)
- What It Does: Points to the exact table or view you want to migrate.
- Why It Matters: This is your main data set. Currently, Ora2Iceberg doesn’t support custom SQL statements—so you must specify a table or view.
- Example: --source-object EMPLOYEES
--source-where (-w)
- What It Does: Applies a WHERE clause at the source, filtering rows to move.
- Why It Matters: Ideal for partial migrations or incremental loads. Reduce data volume and speed things up by filtering out unnecessary rows.
- Example: --source-where "WHERE LAST_UPDATE_DATE > SYSDATE - 1"
2. Iceberg Destination Parameters
--iceberg-catalog-type (-T)
- What It Does: Tells Ora2Iceberg which Iceberg catalog type you’re targeting (e.g., NESSIE, REST).
- Why It Matters: Different catalogs have different connection methods, so Ora2Iceberg needs to know which type to use under the hood.
--iceberg-catalog (-C)
- What It Does: Names the catalog that will hold your new (or updated) table.
- Why It Matters: Helps Ora2Iceberg register the table in the correct “directory” of your Iceberg ecosystem.
--iceberg-catalog-uri (-U)
- What It Does: The endpoint for your chosen catalog (e.g., http://nessie-server:19120/api/v1 for Nessie).
- Why It Matters: Similar to --source-jdbc-url, this is the “address” Ora2Iceberg needs to connect to your Iceberg catalog.
- Pro Tip: Make sure your chosen --iceberg-catalog-type supports this URI. Nessie-based catalogs, for instance, need a proper REST API endpoint.
--iceberg-warehouse (-H)
- What It Does: Specifies where your actual Iceberg data files and metadata will live (S3, HDFS, local filesystem, etc.).
- Why It Matters: If the warehouse location is off, queries or data loads could fail, or the resulting tables could be invisible to your analytics engine.
--iceberg-catalog-properties (-R)
- What It Does: Lets you pass additional key-value properties for your catalog.
- Why It Matters: Perfect for adding authentication tokens, adjusting timeouts, or fine-tuning config for your environment.
- Example: --iceberg-catalog-properties "authToken=XYZ;clientTimeout=60000"
--iceberg-namespace (-N)
- What It Does: The Iceberg namespace (like a database schema) where the table will be created.
- Why It Matters: Helps keep your tables organized. If not specified, Ora2Iceberg may default to the source schema name, but using --iceberg-namespace offers clearer structure.
--iceberg-table (-t)
- What It Does: Names the table in Iceberg.
- Why It Matters: This is how you’ll reference the table for future queries. By default, it might match the source object—but customizing it can keep naming conventions consistent across projects.
--iceberg-id-columns (-I)
- What It Does: Tells Ora2Iceberg which columns uniquely identify each row.
- Why It Matters: Useful for incremental updates or deduplication. If you don’t specify these, any merges or updates will rely on less precise conditions.
--iceberg-partition (-P)
- What It Does: Defines how Iceberg partitions your data, such as by date or another key.
- Why It Matters: Partitioning is huge for performance and cost savings—queries can skip big chunks of data, reading only what’s relevant.
--iceberg-max-file-size (-Z)
- What It Does: Caps the size of each data file created during migration.
- Why It Matters: Too many tiny files can slow down reads; enormous files can hamper parallel reads. Tweak this to optimize performance.
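To see how the destination flags fit together, here’s a sketch of the Iceberg-side portion of the command line for a Nessie catalog. The server address, warehouse bucket, and catalog/namespace/table names are illustrative assumptions; these flags go on the same command line as the source flags from section 1:

```bash
# Iceberg-side portion of an Ora2Iceberg run (illustrative values, Nessie catalog).
  --iceberg-catalog-type NESSIE \
  --iceberg-catalog my_catalog \
  --iceberg-catalog-uri "http://nessie-server:19120/api/v1" \
  --iceberg-warehouse "s3://analytics-lake/warehouse" \
  --iceberg-namespace hr \
  --iceberg-table employees
```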
3. Additional Options
--add-rowid-to-iceberg (-r)
- What It Does: Includes Oracle’s ROWID as ORA_ROW_ID in Iceberg.
- Why It Matters: ROWID can be helpful for tracking lineage (where the row originally lived) or for incremental merges.
- When to Use: If you plan on debugging or analyzing row origins later, or need that tight linkage to Oracle.
--rowid-column (-q)
- What It Does: Gives a custom name to the ROWID column in Iceberg.
- Why It Matters: If you prefer a more meaningful name (e.g., ORACLE_ROW_REF), this is how to set it.
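Assuming --add-rowid-to-iceberg is a simple on/off switch (as its description suggests), the two ROWID options pair up like this, reusing the ORACLE_ROW_REF name from above:

```bash
# Carry the Oracle ROWID into Iceberg under a custom column name.
  --add-rowid-to-iceberg \
  --rowid-column ORACLE_ROW_REF
```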
--upload-mode (-L)
- What It Does: Decides how data is uploaded. Options:
- full: Replace the entire table.
- incremental: Append new records since the last load.
- merge: Not yet supported; a future release will use it to synchronize changes.
- Why It Matters: Helps align your data flow with business needs. Are you recreating the table each time, or just adding new rows?
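For example, an incremental nightly append pairs naturally with the --source-where filter from section 1 (a sketch of just the relevant flags):

```bash
# Recreate the target table from scratch:
  --upload-mode full

# Or append only yesterday's changes:
  --upload-mode incremental \
  --source-where "WHERE LAST_UPDATE_DATE > SYSDATE - 1"
```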
--default-number-type (-d)
- What It Does: Specifies a default Iceberg type for Oracle NUMBER columns if no precision is defined.
- Why It Matters: Oracle NUMBER can be ambiguous. This ensures consistent typing (e.g., decimal(38,8)), preventing accidental overflow or truncation.
- Example: --default-number-type "decimal(38,8)"
--data-type-map (-m)
- What It Does: Gives fine-grained control over how source columns map to Iceberg types.
- Why It Matters: Suppose you want certain NUMBER columns to always become long. This parameter is your key to customizing conversions.
- Example: --data-type-map "ZONE_CONTROL:NUMBER=integer; %_ID:NUMBER=long; LOCATOR_%:NUMBER=decimal(38,0)"
- Any column exactly named ZONE_CONTROL → integer
- Any column ending in _ID → long
- Any column starting with LOCATOR_ → decimal(38,0)
--auto-infer-types (-f)
- What It Does: Currently not supported—in future versions, Ora2Iceberg will automatically infer whether a NUMBER column is integer, long, or decimal.
- Why It Matters: Automated inference can make migrations more accurate and speed up schema definition, reducing manual mapping.
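Putting it all together, here is a minimal end-to-end sketch: a full load of HR.EMPLOYEES into a Nessie-backed catalog. The ora2iceberg launcher name, hosts, bucket, and catalog names are illustrative assumptions (substitute however you invoke the tool and your own endpoints), and ORA_SRC_PASSWORD is the variable exported earlier:

```bash
ora2iceberg \
  --source-jdbc-url "jdbc:oracle:thin:@oracle-host:1521/ORCLPDB1" \
  --source-user HR_READER \
  --source-password "$ORA_SRC_PASSWORD" \
  --source-schema HR \
  --source-object EMPLOYEES \
  --iceberg-catalog-type NESSIE \
  --iceberg-catalog my_catalog \
  --iceberg-catalog-uri "http://nessie-server:19120/api/v1" \
  --iceberg-warehouse "s3://analytics-lake/warehouse" \
  --iceberg-namespace hr \
  --iceberg-table employees \
  --upload-mode full \
  --default-number-type "decimal(38,8)"
```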
Wrapping Up
Whether you’re just dipping a toe into Iceberg or orchestrating a full-scale production pipeline, these parameters give you granular control over your Oracle-to-Iceberg migration. By fine-tuning source connections, catalog details, data types, and partition strategies, you can optimize performance, maintain data integrity, and keep your warehouse environment squeaky clean.
Next Steps:
- Try It Out: Test a basic load with minimal parameters to get a feel for Ora2Iceberg’s workflow.
- Refine & Repeat: Layer in
--source-where
,--iceberg-partition
, and custom type mappings as you refine your data flow. - Stay Tuned: Keep an eye out for future updates like
merge
andauto-infer-types
to make Ora2Iceberg even more powerful.
Happy migrating!