A data scientist runs ML training jobs on PAI that read and write large datasets stored in RDS. When training is slow or fails, they need to monitor PAI job metrics (GPU utilization, training logs) alongside RDS performance (slow queries, database CPU utilization) to determine whether the bottleneck is in the compute layer or the data layer.
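The compute-versus-data decision can be sketched as a simple rule over one time window of metrics. This is a minimal illustration, assuming you have already exported GPU utilization samples from the PAI job and CPU samples plus a slow-query count from the RDS instance; it calls no PAI or RDS API. The `WindowMetrics` structure, the `classify_bottleneck` function, and the threshold values are all hypothetical and should be tuned for your workload.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical thresholds (assumptions, not PAI or RDS defaults).
GPU_BUSY_PCT = 70.0      # mean GPU utilization at or above this counts as saturated
DB_BUSY_PCT = 80.0       # mean RDS CPU at or above this counts as stressed
SLOW_QUERY_LIMIT = 10    # slow queries per window at or above this counts as stressed


@dataclass
class WindowMetrics:
    """Metrics for one time window, already exported from monitoring (hypothetical)."""
    gpu_util_pct: list[float]   # PAI job GPU utilization samples, 0-100
    db_cpu_pct: list[float]     # RDS instance CPU utilization samples, 0-100
    slow_query_count: int       # slow queries recorded by RDS in the same window


def classify_bottleneck(m: WindowMetrics) -> str:
    """Return 'compute', 'data', 'both', or 'inconclusive' for one window."""
    gpu_busy = mean(m.gpu_util_pct) >= GPU_BUSY_PCT
    db_stressed = (
        mean(m.db_cpu_pct) >= DB_BUSY_PCT
        or m.slow_query_count >= SLOW_QUERY_LIMIT
    )
    if gpu_busy and not db_stressed:
        return "compute"       # GPUs saturated while the database looks healthy
    if not gpu_busy and db_stressed:
        return "data"          # GPUs idle while the database struggles to serve data
    if gpu_busy and db_stressed:
        return "both"          # both layers under pressure in the same window
    return "inconclusive"      # neither layer saturated; check the training logs


if __name__ == "__main__":
    window = WindowMetrics(gpu_util_pct=[20, 25, 18],
                           db_cpu_pct=[92, 95, 90],
                           slow_query_count=37)
    print(classify_bottleneck(window))
```

Run on the sample window, the script prints `data`: the GPUs average 21% utilization while RDS CPU averages above 90%, which points at the data layer. Comparing both layers over the same window matters because low GPU utilization on its own could also come from a bug in the training code; the "inconclusive" branch is where the training logs become the next thing to check.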
For PAI job metrics and training logs, see pai/pai-monitor-jobs.
For RDS slow queries and database CPU, see rds/rds-monitor-performance.