An AI training or inference job on PAI reads data from OceanBase (or RDS) and runs slowly; the developer monitors the PAI job to identify resource bottlenecks, discovers slow database queries are the cause, then optimizes those SQL queries in OceanBase.
When a PAI training or inference job shows high io_wait or prolonged GPU idle time, the bottleneck typically stems from data ingestion rather than compute. This workflow correlates PAI resource telemetry with OceanBase query diagnostics to isolate slow SQL, apply targeted optimizations, and restore pipeline throughput.
`TrainingJobId`.

```bash
pai-cli training-job describe --job-id <TrainingJobId> --metrics cpu,gpu,io_wait
```

Flag jobs where `io_wait > 40%` during the data-loading epoch.
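The 40% cutoff can be applied mechanically once the metrics are in plain text. The following is a minimal sketch only: the helper name `flag_io_wait` and the two-column input format (`job_id io_wait_pct`, one pair per line) are assumptions made for illustration, not the actual output format of `pai-cli`.

```bash
# Hypothetical input on stdin: one "job_id io_wait_pct" pair per line,
# e.g. copied by hand from the metrics output. The real pai-cli output
# format is NOT assumed here.

# Print the job IDs whose io_wait percentage is strictly above the
# threshold (default 40, the cutoff used in this workflow).
flag_io_wait() {
  local threshold="${1:-40}"
  awk -v t="$threshold" 'NF >= 2 && ($2 + 0) > (t + 0) { print $1 }'
}

# Usage: printf 'job-a 55.2\njob-b 12.0\n' | flag_io_wait 40
```

Passing a different first argument reuses the same filter for other cutoffs, for example the 15% level used when verifying the fix.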
`/api/v1/jobs/{TrainingJobId}/logs`. Filter for JDBC execution traces and copy the exact query string.

```sql
EXPLAIN SELECT * FROM training_data WHERE feature_ts BETWEEN '2024-01-01' AND '2024-01-02';
```

Look for `table_scan` or high `cost` in the plan output.
```sql
SELECT query_sql, elapsed_time, scan_type
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE query_sql LIKE '%training_data%';
```
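The audit query can return many rows, so it may help to rank them by `elapsed_time` before deciding which statement to optimize. The sketch below assumes the result was saved as tab-separated text without a header line (for instance via the `mysql` client's `--batch --skip-column-names` options) in the column order `query_sql`, `elapsed_time`, `scan_type`. The helper name `top_slow_queries` is hypothetical.

```bash
# Assumed input on stdin: tab-separated rows
#   query_sql<TAB>elapsed_time<TAB>scan_type
# with no header line. This layout is an assumption for illustration.

# Print the N slowest statements (default 5), slowest first, as
# "elapsed_time<TAB>query_sql".
top_slow_queries() {
  local n="${1:-5}"
  local tab
  tab="$(printf '\t')"
  # Sort numerically on the second field, descending, then keep N rows.
  sort -t "$tab" -k2,2nr | head -n "$n" | awk -F '\t' '{ print $2 "\t" $1 }'
}

# Usage: top_slow_queries 3 < audit_rows.tsv
```

Sorting outside the database keeps the audit query itself unchanged, which is convenient when the same export is reused for several checks.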
```sql
CREATE INDEX idx_feature_ts ON training_data(feature_ts, label_id);
ALTER SYSTEM FLUSH PLAN CACHE;
```
`io_wait` drops below 15% and GPU utilization stabilizes.

PAI orchestrates compute workloads and pulls training batches via JDBC/ODBC. OceanBase executes SQL queries, manages indexes, and streams result sets back to PAI containers. Monitoring flows bidirectionally: PAI emits node-level metrics to CloudMonitor, while OceanBase exposes query execution plans and audit trails. The integration bridges PAI’s I/O telemetry with OceanBase’s SQL diagnostics to isolate data-fetch latency.
Prerequisites:

- `oceanbase` system DB accessible
- `AliyunPAIFullAccess` + `AliyunOceanBaseDBAccess`
- `pai-cli` and `mysql-client` available in your debug environment

Common pitfalls:

- `io_wait` may originate from VPC routing, not slow SQL. Always cross-check OceanBase `elapsed_time` before rewriting queries.
- Skipping `FLUSH PLAN CACHE` after creating the index lets the optimizer keep reusing the original full-scan path.
- A type mismatch (for example `VARCHAR` vs `INT`) bypasses indexes. Ensure PAI query parameters exactly match OceanBase column definitions.
- `max_connections`.
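The cross-check between `io_wait` and `elapsed_time` can be written down as a small decision rule. This is a rough sketch under stated assumptions: the helper name `classify_bottleneck` is hypothetical, `elapsed_time` is assumed to be reported in microseconds, and the default "slow" cutoff of 1,000,000 µs is an illustrative value rather than a recommendation from OceanBase or PAI. Only the 40% `io_wait` threshold comes from this workflow.

```bash
# classify_bottleneck IO_WAIT_PCT ELAPSED_US [SLOW_US]
#   IO_WAIT_PCT  io_wait percentage observed on the PAI job
#   ELAPSED_US   elapsed_time of the suspect query (assumed microseconds)
#   SLOW_US      hypothetical "slow query" cutoff, default 1000000
# Prints one of: ok | slow-sql | check-network
classify_bottleneck() {
  local io_wait_pct="$1"
  local elapsed_us="$2"
  local slow_us="${3:-1000000}"
  awk -v w="$io_wait_pct" -v e="$elapsed_us" -v s="$slow_us" 'BEGIN {
    if (w + 0 <= 40)         print "ok"             # no I/O stall flagged
    else if (e + 0 >= s + 0) print "slow-sql"       # stall and slow query
    else                     print "check-network"  # stall, query is fast
  }'
}

# Usage: classify_bottleneck 55 2500000
```

A `check-network` result corresponds to the first pitfall: the job stalls on I/O while the query itself is fast, so rewriting SQL is unlikely to help.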