# oceanbase-data

Part of **OCEANBASE**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Import or export data to/from OceanBase](../../intent/oceanbase-import-data/SKILL.md). If you're unsure which path to take, check the routing skill first.

# OceanBase Data Loading

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|----------------|--------------|-------------|
| Load Data from File | Synchronous | Load data from external files into OceanBase Database tables using the `LOAD DATA` SQL command with support for parallelism, formatting, and duplicate handling. |

## API Calling Mode

### Authentication
The primary authentication method is database-level credentials via the `obclient` CLI or compatible MySQL client:
- Use connection parameters: `-h <host>`, `-u <user@tenant>`, `-P <port>`, `-p <password>`
- Environment variables are not used; credentials are passed directly in the client command
- Alternative: Standard MySQL-compatible drivers (e.g., JDBC, Python mysql-connector) can be used with username/password authentication

### Service Endpoint
OceanBase Data Loading uses direct database connections rather than HTTP REST endpoints:
- Connect to the OceanBase database instance using its IP address or hostname and port (default: 2881)
- No region-specific URL pattern — connect directly to your deployed OceanBase cluster
- Example connection: `obclient -h192.168.1.101 -utpcc@obbmsql -P2881 -p123456`

### Synchronous Pattern
The `LOAD DATA` command executes synchronously within a database session:
1. Establish a connection to the OceanBase database using `obclient` or a MySQL-compatible client
2. Ensure the input file is placed on an OBServer node where the table leader resides
3. Execute the `LOAD DATA` SQL statement with required parameters (e.g., `infile`, table name)
4. The command returns immediately upon completion with a summary of affected rows
5. Parse the response to extract metrics like records loaded, duplicates skipped, and warnings

Key requirements:
- The `secure_file_priv` global variable must be set to an empty string (`''`) to allow file access
- User must have the `FILE` privilege granted (`GRANT FILE ON *.* TO 'user'@'%'`)
- For `REPLACE` or `IGNORE` options, the target table must have a primary key

## Parameter Reference

### Load Data from File

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| infile | string | Yes | — | File must be on OBServer with table leader | Specifies the directory and name of the input file to be loaded |
| parallel | integer | No | — | Range [0, max CPUs of tenant] | Specifies the degree of parallelism for loading data |
| replace | boolean | No | — | MySQL tenants only | Replaces existing rows if primary key conflict occurs |
| ignore | boolean | No | — | Requires primary key | Ignores duplicate data in the input file |
| fields_terminated_by | string | No | `,` | — | Specifies the delimiter used between fields in the input file |
| fields_enclosed_by | string | No | — | — | Specifies the character that encloses each field |
| fields_escaped_by | string | No | — | — | Specifies the escape character used in the input file |
| lines_terminated_by | string | No | `\n` | — | Specifies the end-of-line character in the input file |
| lines_starting_by | string | No | — | — | Specifies the start-of-line character in the input file |
| ignore_lines | integer | No | — | — | Ignores a specified number of lines from the beginning of the file |
| column_mapping | array | No | — | Must match column count or use NULL | Specifies mapping between table columns and file fields |

## Code Examples

### Basic Load with Parallelism - Bash - All Regions

```bash
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t
```

### Load with Custom Delimiters - Bash - All Regions

```bash
load data /*+ parallel(4) */ infile '/home/admin/a.csv' into table t fields terminated by ',' lines terminated by '\n';
```

### Load with Column Mapping and Timestamp - Bash - All Regions

```bash
load data infile '/home/admin/a.csv' into table t (id, names) set dates=CURRENT_TIMESTAMP;
```

### Full End-to-End Example with obclient - Bash - All Regions

```bash
[admin@h07g12092.sqa.eu95 /home/admin/csvdata]
$more ware__df8f30ac_64e0_474c_9cc4_9919d64c5e4c
2,1200,.0862,L6xwRsbDk,xEdT1jkENtbLwoI1Zb0,NT0j4RCQ4OqrS,vlwzndw2FPrO,XR,063311111
1,1200,.1868,n1P4zYo8OH,jTNkXKWXOdh,lf9QXTXXGoF04IZBkCP7,srRq15uvxe5,GQ,506811111

$obclient  -h192.168.1.101 -utpcc@obbmsql -P2881 -p123456

obclient> load data infile '/home/admin/csvdata/ware__df8f30ac_64e0_474c_9cc4_9919d64c5e4c' into table ware2 fields terminated by ',' lines terminated by '\n';
Query OK, 2 rows affected (0.02 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

obclient> select * from ware2\G
*************************** 1. row ***************************
      W_ID: 2
     W_YTD: 1200
     W_TAX: .0862
    W_NAME: L6xwRsbDk
W_STREET_1: xEdT1jkENtbLwoI1Zb0
W_STREET_2: NT0j4RCQ4OqrS
    W_CITY: vlwzndw2FPrO
   W_STATE: XR
     W_ZIP: 063311111
*************************** 2. row ***************************
      W_ID: 1
     W_YTD: 1200
     W_TAX: .1868
    W_NAME: n1P4zYo8OH
W_STREET_1: jTNkXKWXOdh
W_STREET_2: lf9QXTXXGoF04IZBkCP7
    W_CITY: srRq15uvxe5
   W_STATE: GQ
     W_ZIP: 506811111
2 rows in set (0.00 sec)
obclient>
```

## Response Format

```text
Query OK, 2 rows affected (0.02 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0
```

**Key Fields**:
- `rows_affected` — Number of rows inserted or updated in the table
- `records` — Total number of data records processed from the input file
- `deleted` — Number of rows deleted (relevant when using `REPLACE`)
- `skipped` — Number of duplicate or invalid records skipped
- `warnings` — Number of non-fatal issues encountered during loading

## Error Handling

| Error Code | Description | Recommended Action |
|------------|-------------|---------------------|
| 1062 | Duplicated primary key error. Occurs when inserting data with a duplicate primary key. | Use `REPLACE` to overwrite conflicting rows or `IGNORE` to skip duplicates. Ensure input data uniqueness if neither option is suitable. |

## FAQ

Q: Where must the input file be located for LOAD DATA to work?
A: The file must be placed on an OBServer node where the leader replica of the target table resides. You cannot load files from arbitrary client machines.

Q: Why do I get a "file not found" error even when the path is correct?
A: Verify that the `secure_file_priv` global variable is set to an empty string (`SET GLOBAL secure_file_priv = '';`). Also confirm the file permissions allow the OceanBase process to read it.

Q: Can I use LOAD DATA with Oracle tenants?
A: The `REPLACE` option is only available for MySQL tenants. Core `LOAD DATA` functionality works in both MySQL and Oracle modes, but duplicate handling differs.

Q: How do I handle large files efficiently?
A: Use the `parallel(N)` hint to enable parallel loading, where N is within [0, max CPU cores of your tenant]. Also ensure the file is split appropriately if needed.

Q: What privileges are required to run LOAD DATA?
A: The user must have the `FILE` privilege (`GRANT FILE ON *.* TO 'user'@'%';`) and standard INSERT privileges on the target table.

## Pricing & Billing

### Billing Model
Free — Data loading operations are part of core OceanBase database functionality and are not billed separately.

### Free Tier
No explicit cost mentioned; included in standard OceanBase licensing or deployment.

### Usage Limits
No specific quota limits mentioned in documentation. Resource usage (CPU, memory, I/O) is subject to tenant-level resource configurations.

### Billing Notes
While the `LOAD DATA` command itself is free, underlying resource consumption (storage, compute) follows standard OceanBase billing based on your deployment model (self-managed or cloud).