Migrating data from Oracle to Iceberg can feel daunting—but with the right parameters, Ora2Iceberg makes it straightforward. Here’s a handy guide to what each parameter does, plus some tips on how (and when) to use them.
1. Source Connection Parameters
--source-jdbc-url (-j)
- What It Does: Provides the JDBC URL to connect to your Oracle database (e.g., jdbc:oracle:thin:@host:1521/service).
- Why It Matters: Without a valid JDBC URL, Ora2Iceberg can’t talk to your database. Think of this as the “address” that lets the tool find Oracle.
- Pro Tip: Make sure you include all required details: hostname, port, and service/SID. A minor typo here can thwart your entire migration.
--source-user (-u)
- What It Does: Tells Ora2Iceberg which Oracle user account to log in as.
- Why It Matters: The account must have sufficient privileges (at least READ) on the data you plan to move.
- Security Tip: Avoid storing credentials in plain text. Use environment variables or a secure secret manager whenever possible.
--source-password (-p)
- What It Does: Supplies the password for the account given in --source-user.
- Why It Matters: Provides essential authentication to Oracle. If this is incorrect, you’ll get connection failures right away.
- Security Tip: As with --source-user, store or supply this securely to protect sensitive info.
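A minimal sketch of that tip, assuming you launch the tool from a shell (the variable name ORA_SRC_PASSWORD is just an illustration):

```bash
# Export the password once -- ideally injected by a secret manager -- so it never
# appears as a literal on the command line or in your shell history.
export ORA_SRC_PASSWORD='change-me'   # placeholder; set this from a secure source
```

The combined example at the end of this section reuses this variable with --source-password.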
--source-schema (-s)
- What It Does: Names the Oracle schema containing the data you want.
- Why It Matters: Without this, Ora2Iceberg defaults to using the same name as your user. Specifying the schema is particularly important if your user has access to multiple schemas.
- Example: --source-schema HR if your data lives in the “HR” schema.
--source-object (-o)
- What It Does: Points to the exact table or view you want to migrate.
- Why It Matters: This is your main data set. Currently, Ora2Iceberg doesn’t support custom SQL statements—so you must specify a table or view.
- Example: --source-object EMPLOYEES
--source-where (-w)
- What It Does: Applies a WHERE clause at the source, filtering rows to move.
- Why It Matters: Ideal for partial migrations or incremental loads. Reduce data volume and speed things up by filtering out unnecessary rows.
- Example: --source-where "WHERE LAST_UPDATE_DATE > SYSDATE - 1"
2. Iceberg Destination Parameters
--iceberg-catalog-type (-T)
- What It Does: Tells Ora2Iceberg which Iceberg catalog type you’re targeting (e.g., NESSIE, REST).
- Why It Matters: Different catalogs have different connection methods, so Ora2Iceberg needs to know which type to use under the hood.
--iceberg-catalog (-C)
- What It Does: Names the catalog that will hold your new (or updated) table.
- Why It Matters: Helps Ora2Iceberg register the table in the correct “directory” of your Iceberg ecosystem.
--iceberg-catalog-uri (-U)
- What It Does: The endpoint for your chosen catalog (e.g., http://nessie-server:19120/api/v1 for Nessie).
- Why It Matters: Similar to --source-jdbc-url, this is the “address” Ora2Iceberg needs to connect to your Iceberg catalog.
- Pro Tip: Make sure your chosen --iceberg-catalog-type supports this URI. Nessie-based catalogs, for instance, need a proper REST API endpoint.
--iceberg-warehouse (-H)
- What It Does: Specifies where your actual Iceberg data files and metadata will live (S3, HDFS, local filesystem, etc.).
- Why It Matters: If the warehouse location is off, queries or data loads could fail, or the resulting tables could be invisible to your analytics engine.
--iceberg-catalog-properties (-R)
- What It Does: Lets you pass additional key-value properties for your catalog.
- Why It Matters: Perfect for adding authentication tokens, adjusting timeouts, or fine-tuning config for your environment.
- Example: --iceberg-catalog-properties "authToken=XYZ;clientTimeout=60000"
--iceberg-namespace (-N)
- What It Does: The Iceberg namespace (like a database schema) where the table will be created.
- Why It Matters: Helps keep your tables organized. If not specified, Ora2Iceberg may default to the source schema name, but using --iceberg-namespace offers clearer structure.
--iceberg-table (-t)
- What It Does: Names the table in Iceberg.
- Why It Matters: This is how you’ll reference the table for future queries. By default, it might match the source object—but customizing it can keep naming conventions consistent across projects.
--iceberg-id-columns (-I)
- What It Does: Tells Ora2Iceberg which columns uniquely identify each row.
- Why It Matters: Useful for incremental updates or deduplication. If you don’t specify these, any merges or updates will rely on less precise conditions.
--iceberg-partition (-P)
- What It Does: Defines how Iceberg partitions your data, such as by date or another key.
- Why It Matters: Partitioning is huge for performance and cost savings—queries can skip big chunks of data, reading only what’s relevant.
--iceberg-max-file-size (-Z)
- What It Does: Caps the size of each data file created during migration.
- Why It Matters: Too many tiny files can slow down reads; enormous files can hamper parallel reads. Tweak this to optimize performance.
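To see how the destination flags fit together, here’s a sketch of the Iceberg-side portion of the command line for a Nessie catalog. The server address, warehouse bucket, and catalog/namespace/table names are illustrative assumptions; these flags go on the same command line as the source flags from section 1:

```bash
# Iceberg-side portion of an Ora2Iceberg run (illustrative values, Nessie catalog).
  --iceberg-catalog-type NESSIE \
  --iceberg-catalog my_catalog \
  --iceberg-catalog-uri "http://nessie-server:19120/api/v1" \
  --iceberg-warehouse "s3://analytics-lake/warehouse" \
  --iceberg-namespace hr \
  --iceberg-table employees
```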
3. Additional Options
--add-rowid-to-iceberg (-r)
- What It Does: Includes Oracle’s ROWID as ORA_ROW_ID in Iceberg.
- Why It Matters: ROWID can be helpful for tracking lineage (where the row originally lived) or for incremental merges.
- When to Use: If you plan on debugging or analyzing row origins later, or need that tight linkage to Oracle.
--rowid-column (-q)
- What It Does: Gives a custom name to the ROWID column in Iceberg.
- Why It Matters: If you prefer a more meaningful name (e.g., ORACLE_ROW_REF), this is how to set it.
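Assuming --add-rowid-to-iceberg is a simple on/off switch (as its description suggests), the two ROWID options pair up like this, reusing the ORACLE_ROW_REF name from above:

```bash
# Carry the Oracle ROWID into Iceberg under a custom column name.
  --add-rowid-to-iceberg \
  --rowid-column ORACLE_ROW_REF
```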
--upload-mode (-L)
- What It Does: Decides how data is uploaded. Options:
- full: Replace the entire table.
- incremental: Append new records since the last load.
- merge: Not yet supported; a future release will use it to synchronize changes.
- Why It Matters: Helps align your data flow with business needs. Are you recreating the table each time, or just adding new rows?
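For example, an incremental nightly append pairs naturally with the --source-where filter from section 1 (a sketch of just the relevant flags):

```bash
# Recreate the target table from scratch:
  --upload-mode full

# Or append only yesterday's changes:
  --upload-mode incremental \
  --source-where "WHERE LAST_UPDATE_DATE > SYSDATE - 1"
```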
--default-number-type (-d)
- What It Does: Specifies a default Iceberg type for Oracle NUMBER columns if no precision is defined.
- Why It Matters: Oracle NUMBER can be ambiguous. This ensures consistent typing (e.g., decimal(38,8)), preventing accidental overflow or truncation.
- Example: --default-number-type "decimal(38,8)"
--data-type-map (-m)
- What It Does: Gives fine-grained control over how source columns map to Iceberg types.
- Why It Matters: Suppose you want certain NUMBER columns to always become long. This parameter is your key to customizing conversions.
- Example: --data-type-map "ZONE_CONTROL:NUMBER=integer; %_ID:NUMBER=long; LOCATOR_%:NUMBER=decimal(38,0)"
- Any column exactly named ZONE_CONTROL → integer
- Any column ending in _ID → long
- Any column starting with LOCATOR_ → decimal(38,0)
--auto-infer-types (-f)
- What It Does: Currently not supported—in future versions, Ora2Iceberg will automatically infer whether a NUMBER column is integer, long, or decimal.
- Why It Matters: Automated inference can make migrations more accurate and speed up schema definition, reducing manual mapping.
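Putting it all together, here is a minimal end-to-end sketch: a full load of HR.EMPLOYEES into a Nessie-backed catalog. The ora2iceberg launcher name, hosts, bucket, and catalog names are illustrative assumptions (substitute however you invoke the tool and your own endpoints), and ORA_SRC_PASSWORD is the variable exported earlier:

```bash
ora2iceberg \
  --source-jdbc-url "jdbc:oracle:thin:@oracle-host:1521/ORCLPDB1" \
  --source-user HR_READER \
  --source-password "$ORA_SRC_PASSWORD" \
  --source-schema HR \
  --source-object EMPLOYEES \
  --iceberg-catalog-type NESSIE \
  --iceberg-catalog my_catalog \
  --iceberg-catalog-uri "http://nessie-server:19120/api/v1" \
  --iceberg-warehouse "s3://analytics-lake/warehouse" \
  --iceberg-namespace hr \
  --iceberg-table employees \
  --upload-mode full \
  --default-number-type "decimal(38,8)"
```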
Wrapping Up
Whether you’re just dipping a toe into Iceberg or orchestrating a full-scale production pipeline, these parameters give you granular control over your Oracle-to-Iceberg migration. By fine-tuning source connections, catalog details, data types, and partition strategies, you can optimize performance, maintain data integrity, and keep your warehouse environment squeaky clean.
Next Steps:
- Try It Out: Test a basic load with minimal parameters to get a feel for Ora2Iceberg’s workflow.
- Refine & Repeat: Layer in
--source-where
,--iceberg-partition
, and custom type mappings as you refine your data flow. - Stay Tuned: Keep an eye out for future updates like
merge
andauto-infer-types
to make Ora2Iceberg even more powerful.
Happy migrating!