Parameters

Migrating data from Oracle to Iceberg can feel daunting—but with the right parameters, Ora2Iceberg makes it straightforward. Here’s a handy guide to what each parameter does, plus some tips on how (and when) to use them.

1. Source Connection Parameters

--source-jdbc-url (-j)

  • What It Does: Provides the JDBC URL to connect to your Oracle database (e.g., jdbc:oracle:thin:@host:1521/service).
  • Why It Matters: Without a valid JDBC URL, Ora2Iceberg can’t talk to your database. Think of this as the “address” that lets the tool find Oracle.
  • Pro Tip: Make sure you include all required details: hostname, port, and service/SID. A minor typo here can thwart your entire migration.

--source-user (-u)

  • What It Does: Tells Ora2Iceberg which Oracle user account to log in as.
  • Why It Matters: The account needs enough privileges (at least READ) on the data you plan to move.
  • Security Tip: Avoid storing credentials in plain text. Use environment variables or a secure secret manager whenever possible.

--source-password (-p)

  • What It Does: Supplies the password for the account named in --source-user.
  • Why It Matters: Provides essential authentication to Oracle. If this is incorrect, you’ll get connection failures right away.
  • Security Tip: As with --source-user, store or supply this securely to protect sensitive info.
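
One way to act on these security tips is to keep the credentials out of your scripts and read them from the environment when the argument list is assembled. The sketch below is only an illustration: the variable names ORA2ICEBERG_USER and ORA2ICEBERG_PASSWORD are made up for this example and are not something Ora2Iceberg defines.

```python
import os

def source_connection_args(jdbc_url: str, env=os.environ) -> list[str]:
    """Build the source connection flags, reading credentials from the environment.

    ORA2ICEBERG_USER and ORA2ICEBERG_PASSWORD are hypothetical variable names
    chosen for this sketch; Ora2Iceberg itself does not define them.
    """
    user = env.get("ORA2ICEBERG_USER")
    password = env.get("ORA2ICEBERG_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set ORA2ICEBERG_USER and ORA2ICEBERG_PASSWORD first")
    return [
        "--source-jdbc-url", jdbc_url,
        "--source-user", user,
        "--source-password", password,
    ]

# Demo with a fake environment so the sketch runs without real credentials.
demo_env = {
    "ORA2ICEBERG_USER": "hr_reader",
    "ORA2ICEBERG_PASSWORD": "not-a-real-secret",
}
args = source_connection_args("jdbc:oracle:thin:@host:1521/service", demo_env)
print(args[:4])  # the password is deliberately left out of the printout
```

Failing fast when a variable is missing tends to be easier to debug than a connection error from Oracle later on.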

--source-schema (-s)

  • What It Does: Names the Oracle schema containing the data you want.
  • Why It Matters: Without this, Ora2Iceberg defaults to using the same name as your user. Specifying the schema is particularly important if your user has access to multiple schemas.
  • Example: --source-schema HR if your data lives in the “HR” schema.

--source-object (-o)

  • What It Does: Points to the exact table or view you want to migrate.
  • Why It Matters: This is your main data set. Currently, Ora2Iceberg doesn’t support custom SQL statements—so you must specify a table or view.
  • Example: --source-object EMPLOYEES

--source-where (-w)

  • What It Does: Applies a WHERE clause at the source, filtering rows to move.
  • Why It Matters: Ideal for partial migrations or incremental loads. Reduce data volume and speed things up by filtering out unnecessary rows.
  • Example: --source-where "WHERE LAST_UPDATE_DATE > SYSDATE - 1"
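
For a recurring incremental load, the filter usually changes only in how far back it looks. Here is a small sketch of building the schema, object, and filter flags together. It copies the shape of the example above, including the leading WHERE keyword. The helper name and its days_back parameter are hypothetical, and LAST_UPDATE_DATE is just the example's column.

```python
from typing import Optional

def source_selection_args(schema: str, obj: str,
                          days_back: Optional[int] = None) -> list[str]:
    """Build the flags that pick the schema, the object, and an optional row filter."""
    args = ["--source-schema", schema, "--source-object", obj]
    if days_back is not None:
        if days_back < 1:
            raise ValueError("days_back must be a positive integer")
        # Mirrors the guide's example filter; replace LAST_UPDATE_DATE with
        # whichever timestamp column your table actually has.
        args += ["--source-where",
                 f"WHERE LAST_UPDATE_DATE > SYSDATE - {days_back}"]
    return args

selection = source_selection_args("HR", "EMPLOYEES", days_back=1)
print(selection)
```

Keeping the filter as a single list element means it reaches the tool as one argument, spaces and all, when the list is passed to something like subprocess.run.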

2. Iceberg Destination Parameters

--iceberg-catalog-type (-T)

  • What It Does: Tells Ora2Iceberg which Iceberg catalog type you’re targeting (e.g., NESSIE, REST).
  • Why It Matters: Different catalogs have different connection methods, so Ora2Iceberg needs to know which type to use under the hood.

--iceberg-catalog (-C)

  • What It Does: Names the catalog that will hold your new (or updated) table.
  • Why It Matters: Helps Ora2Iceberg register the table in the correct “directory” of your Iceberg ecosystem.

--iceberg-catalog-uri (-U)

  • What It Does: Sets the endpoint for your chosen catalog (e.g., http://nessie-server:19120/api/v1 for Nessie).
  • Why It Matters: Similar to --source-jdbc-url—this is the “address” Ora2Iceberg needs to connect to your Iceberg catalog.
  • Pro Tip: Make sure your chosen --iceberg-catalog-type supports this URI. Nessie-based catalogs, for instance, need a proper REST API endpoint.

--iceberg-warehouse (-H)

  • What It Does: Specifies where your actual Iceberg data files and metadata will live (S3, HDFS, local filesystem, etc.).
  • Why It Matters: If the warehouse location is wrong, data loads could fail, or the files they write may not be visible to your analytics engine.

--iceberg-catalog-properties (-R)

  • What It Does: Lets you pass additional key-value properties for your catalog.
  • Why It Matters: Perfect for adding authentication tokens, adjusting timeouts, or fine-tuning config for your environment.
  • Example: --iceberg-catalog-properties "authToken=XYZ;clientTimeout=60000"
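
If your properties live in a config file or dictionary, you can generate the string rather than typing it by hand. The sketch below assumes the format is exactly what the example shows, key=value pairs joined by semicolons. The helper name is hypothetical, and rejecting keys or values that contain the separator characters is a cautious choice made for this sketch, not documented Ora2Iceberg behavior.

```python
def format_catalog_properties(props: dict[str, str]) -> str:
    """Join properties as 'key=value;key=value', the shape used in the guide's example."""
    parts = []
    for key, value in props.items():
        if not key or any(ch in key for ch in ";="):
            raise ValueError(f"unsupported property key: {key!r}")
        if ";" in value:
            raise ValueError(f"value for {key!r} contains ';', which would split the entry")
        parts.append(f"{key}={value}")
    return ";".join(parts)

props = format_catalog_properties({"authToken": "XYZ", "clientTimeout": "60000"})
print(["--iceberg-catalog-properties", props])
```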

--iceberg-namespace (-N)

  • What It Does: Sets the Iceberg namespace (like a database schema) where the table will be created.
  • Why It Matters: Helps keep your tables organized. If not specified, Ora2Iceberg may default to the source schema name, but using --iceberg-namespace offers clearer structure.

--iceberg-table (-t)

  • What It Does: Names the table in Iceberg.
  • Why It Matters: This is how you’ll reference the table for future queries. By default, it might match the source object—but customizing it can keep naming conventions consistent across projects.

--iceberg-id-columns (-I)

  • What It Does: Tells Ora2Iceberg which columns uniquely identify each row.
  • Why It Matters: Useful for incremental updates or deduplications. If you don’t specify these, any merges or updates will rely on less precise conditions.

--iceberg-partition (-P)

  • What It Does: Defines how Iceberg partitions your data, such as by date or another key.
  • Why It Matters: Partitioning is huge for performance and cost savings—queries can skip big chunks of data, reading only what’s relevant.

--iceberg-max-file-size (-Z)

  • What It Does: Caps the size of each data file created during migration.
  • Why It Matters: Too many tiny files can slow down reads; enormous files can hamper parallel reads. Tweak this to optimize performance.

3. Additional Options

--add-rowid-to-iceberg (-r)

  • What It Does: Includes Oracle’s ROWID as ORA_ROW_ID in Iceberg.
  • Why It Matters: ROWID can be helpful for tracking lineage (where the row originally lived) or for incremental merges.
  • When to Use: If you plan on debugging or analyzing row origins later, or need that tight linkage to Oracle.

--rowid-column (-q)

  • What It Does: Gives a custom name to the ROWID column in Iceberg.
  • Why It Matters: If you prefer a more meaningful name (e.g., ORACLE_ROW_REF), this is how to set it.

--upload-mode (-L)

  • What It Does: Decides how data is uploaded. Options:
    • full: Replace the entire table.
    • incremental: Append new records since the last load.
    • merge: Not yet supported; intended to synchronize changes in a future release.
  • Why It Matters: Helps align your data flow with business needs. Are you recreating the table each time, or just adding new rows?
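
If you wrap Ora2Iceberg in your own scripts, it can help to catch an unsupported mode before the job starts. This is a minimal sketch of such a guard, based only on the three modes listed above. The helper and its error messages are hypothetical and do not reflect how the tool itself reports the problem.

```python
SUPPORTED_MODES = ("full", "incremental")
PLANNED_MODES = ("merge",)  # listed in the guide as not yet supported

def upload_mode_args(mode: str) -> list[str]:
    """Return the --upload-mode flag pair, rejecting modes that cannot run yet."""
    if mode in PLANNED_MODES:
        raise NotImplementedError(f"upload mode '{mode}' is not supported yet")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unknown upload mode '{mode}'; expected one of {SUPPORTED_MODES}")
    return ["--upload-mode", mode]

print(upload_mode_args("incremental"))
```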

--default-number-type (-d)

  • What It Does: Specifies a default Iceberg type for Oracle NUMBER columns if no precision is defined.
  • Why It Matters: Oracle NUMBER can be ambiguous. This ensures consistent typing (e.g., decimal(38,8)), preventing accidental overflow or truncation.
  • Example: --default-number-type "decimal(38,8)"
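
Iceberg decimals allow a precision of at most 38, so it can be worth sanity-checking a decimal(p,s) value before passing it along. The helper below is a hypothetical pre-flight check written for this guide. It only understands the decimal(p,s) form shown in the example and says nothing about which other type names the flag may accept.

```python
import re

_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")

def check_decimal_type(type_name: str) -> tuple[int, int]:
    """Parse 'decimal(p,s)' and confirm the precision fits Iceberg's limit of 38."""
    match = _DECIMAL.fullmatch(type_name.strip().lower())
    if match is None:
        raise ValueError(f"not a decimal(p,s) type: {type_name!r}")
    precision, scale = int(match.group(1)), int(match.group(2))
    if not 1 <= precision <= 38:
        raise ValueError(f"precision {precision} is outside the range 1..38")
    return precision, scale

precision, scale = check_decimal_type("decimal(38,8)")
print(["--default-number-type", f"decimal({precision},{scale})"])
```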

--data-type-map (-m)

  • What It Does: Gives fine-grained control over how source columns map to Iceberg types.
  • Why It Matters: Suppose you want certain NUMBER columns to always become long. This parameter is your key to customizing conversions.
  • Example: --data-type-map "ZONE_CONTROL:NUMBER=integer; %_ID:NUMBER=long; LOCATOR_%:NUMBER=decimal(38,0)"
    • Any column exactly named ZONE_CONTROL → integer
    • Any column ending in _ID → long
    • Any column starting with LOCATOR_ → decimal(38,0)
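
To get a feel for how a mapping string like this behaves, here is a rough model of the matching. It is not Ora2Iceberg's actual implementation. It assumes that % stands for any run of characters, that every other character (including the underscore) is literal, that matching is case-sensitive, and that the first matching rule wins. All function names are hypothetical.

```python
import re
from typing import Optional

def parse_type_map(spec: str) -> list[tuple[str, str, str]]:
    """Split 'PATTERN:SOURCE_TYPE=target; ...' into (pattern, source_type, target) rules."""
    rules = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        source, target = entry.split("=", 1)
        pattern, source_type = source.split(":", 1)
        rules.append((pattern.strip(), source_type.strip(), target.strip()))
    return rules

def pattern_to_regex(pattern: str):
    """Treat % as 'any run of characters' and every other character literally."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("%")))

def resolve_type(column: str, source_type: str, rules,
                 default: Optional[str] = None) -> Optional[str]:
    """Return the target type of the first rule matching the column, else the default."""
    for pattern, rule_type, target in rules:
        if rule_type == source_type and pattern_to_regex(pattern).fullmatch(column):
            return target
    return default

rules = parse_type_map(
    "ZONE_CONTROL:NUMBER=integer; %_ID:NUMBER=long; LOCATOR_%:NUMBER=decimal(38,0)"
)
for column in ("ZONE_CONTROL", "EMPLOYEE_ID", "LOCATOR_X", "SALARY"):
    print(column, "->", resolve_type(column, "NUMBER", rules))
```

A column such as LOCATOR_ID would match two of these rules. Which one the real tool picks is not stated here, so check a case like that against an actual run before relying on it.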

--auto-infer-types (-f)

  • What It Does: Currently not supported—in future versions, Ora2Iceberg will automatically infer whether a NUMBER column is integer, long, or decimal.
  • Why It Matters: Automated inference can make migrations more accurate and speed up schema definition, reducing manual mapping.

Wrapping Up

Whether you’re just dipping a toe into Iceberg or orchestrating a full-scale production pipeline, these parameters give you granular control over your Oracle-to-Iceberg migration. By fine-tuning source connections, catalog details, data types, and partition strategies, you can optimize performance, maintain data integrity, and keep your warehouse environment squeaky clean.

Next Steps:

  1. Try It Out: Test a basic load with minimal parameters to get a feel for Ora2Iceberg’s workflow.
  2. Refine & Repeat: Layer in --source-where, --iceberg-partition, and custom type mappings as you refine your data flow.
  3. Stay Tuned: Keep an eye out for future updates such as the merge upload mode and --auto-infer-types, which should make Ora2Iceberg even more powerful.
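
The first two steps can be sketched as a base set of flags plus refinements layered on top. Everything concrete here is a placeholder: "ora2iceberg" stands in for however you actually launch the tool, the catalog name and warehouse path are invented, and this guide does not say which flags are strictly required, so treat the "basic" set as an assumption.

```python
import shlex

# Placeholder launcher name; substitute the real way you start Ora2Iceberg.
LAUNCHER = "ora2iceberg"

# Illustrative values only. The catalog name and warehouse path are invented.
basic_params = {
    "--source-jdbc-url": "jdbc:oracle:thin:@host:1521/service",
    "--source-user": "hr_reader",
    "--source-password": "not-a-real-secret",
    "--source-object": "EMPLOYEES",
    "--iceberg-catalog-type": "NESSIE",
    "--iceberg-catalog": "demo_catalog",
    "--iceberg-catalog-uri": "http://nessie-server:19120/api/v1",
    "--iceberg-warehouse": "s3://example-bucket/warehouse",
}

def build_command(params: dict[str, str]) -> list[str]:
    """Flatten a flag-to-value mapping into an argument list for subprocess.run."""
    command = [LAUNCHER]
    for flag, value in params.items():
        command += [flag, value]
    return command

# Step 1: a basic load. Step 2: the same load with a row filter layered on.
basic = build_command(basic_params)
refined = build_command(
    {**basic_params, "--source-where": "WHERE LAST_UPDATE_DATE > SYSDATE - 1"}
)

# shlex.join quotes the filter so the line can be pasted into a POSIX shell.
print(shlex.join(refined))
```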

Happy migrating!