Custom RAG: Train Embeddings to Production App

Train a custom domain-specific embedding model on PAI, build vector indexes in OpenSearch/Elasticsearch from documents stored in OSS, then deploy the production RAG retrieval application with Bailian's orchestration layer. The workflow covers the full lifecycle, from model training through semantic search deployment.

Products involved

OSS, PAI (PAI-DSW), OpenSearch, Elasticsearch, Bailian

Scenario

Use this workflow when generic embedding models fail to capture proprietary terminology, compliance jargon, or other domain-specific semantics. You train a custom model on PAI, stage its vectors in OSS, index them in Elasticsearch, and orchestrate retrieval through Bailian. The result is a RAG pipeline whose retrieval reflects your own enterprise vocabulary instead of a generic model's.

Integration steps

Stage raw data in OSS: ossutil cp -r ./domain_docs/ oss://<bucket>/raw/ --acl private
Train embeddings on PAI: Mount OSS in PAI-DSW and submit: pai submit --job-name custom-emb --oss-input oss://<bucket>/raw/ --oss-output oss://<bucket>/models/ --framework pytorch --image registry.cn-hangzhou.aliyuncs.com/pai/pytorch:2.0
Generate & persist vectors: Run inference using the trained checkpoint, outputting .parquet files to oss://<bucket>/vectors/.
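Deploy inference model in OpenSearch: Export the checkpoint in a format ML Commons accepts (e.g., a TorchScript .zip) and expose it over HTTPS (for example, an OSS pre-signed URL), since ML Commons cannot fetch oss:// paths directly. Then register it: curl -X POST "https://<opensearch-endpoint>/_plugins/_ml/models/_register" -H "Content-Type: application/json" -d '{"name": "custom-emb-v1", "version": "1.0.0", "model_format": "TORCH_SCRIPT", "model_content_hash_value": "<sha256-of-zip>", "model_config": {"model_type": "bert", "embedding_dimension": 768, "framework_type": "sentence_transformers"}, "url": "<presigned-https-url-to-model-zip>"}'. The call returns a task ID. Once the task completes and reports the model_id, deploy it with POST /_plugins/_ml/models/<model_id>/_deploy.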
Create vector index in Elasticsearch: PUT /rag-index { "mappings": { "properties": { "embedding": { "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }, "content": { "type": "text" } } } }
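Bulk ingest data: Download the vectors with ossutil and convert them to _bulk NDJSON, with one action line followed by one source line per document. Then load them into ES. Use --data-binary, because -d strips the newlines that _bulk depends on: curl -X POST "https://<es-endpoint>/rag-index/_bulk" -H "Content-Type: application/x-ndjson" --data-binary @vectors.ndjson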
Configure Bailian RAG pipeline: POST /v1/apps/rag/create -d '{"name": "domain-rag", "retriever": {"type": "elasticsearch", "endpoint": "https://<es-endpoint>", "index": "rag-index", "top_k": 5}}'
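Deploy & invoke: Call the Bailian orchestration API to route each query through the hybrid (keyword plus vector) retriever and then the LLM. Queries must be embedded with the same custom model used at indexing time; otherwise the vector scores are meaningless.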
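The generate and ingest steps above depend on turning embedding records into a valid _bulk body. Below is a minimal Python sketch. It assumes each record carries hypothetical "id", "content", and "embedding" keys that match the rag-index mapping; for example, a row read from the .parquet files with pyarrow's pq.read_table(path).to_pylist(). The function emits newline-delimited JSON in batches, so no single request grows too large. It also rejects vectors whose length disagrees with the mapping.

```python
import json
from typing import Iterable, Iterator, Mapping


def bulk_ndjson_batches(
    records: Iterable[Mapping],
    index: str = "rag-index",
    dims: int = 768,
    max_docs: int = 1000,
) -> Iterator[str]:
    """Yield NDJSON bodies for POST /_bulk, each holding at most max_docs documents.

    Assumes hypothetical record keys "id", "content" and "embedding".
    """
    lines: list[str] = []
    for rec in records:
        vector = [float(x) for x in rec["embedding"]]
        # Fail fast here instead of letting Elasticsearch reject the document later.
        if len(vector) != dims:
            raise ValueError(f"record {rec['id']}: expected {dims} dims, got {len(vector)}")
        # Action line: an explicit _id makes re-running the ingestion overwrite, not duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(rec["id"])}}))
        # Source line: only the fields declared in the rag-index mapping.
        lines.append(json.dumps({"content": rec["content"], "embedding": vector}))
        if len(lines) == 2 * max_docs:
            # Every line of a _bulk body, including the last one, must end with "\n".
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"
```

You can write each batch to a file and send it with curl --data-binary, or POST it with any HTTP client using Content-Type: application/x-ndjson. The 1,000-document default is an arbitrary starting point. Tune it against your cluster's request size limits.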

Architecture

Raw documents reside in OSS. PAI mounts OSS to train a domain-specific embedding model, outputting weights back to OSS. OpenSearch deploys the model for inference, while Elasticsearch hosts the dense_vector index storing both text chunks and embeddings. Bailian acts as the orchestration layer, querying the ES index via hybrid search, reranking results, and injecting context into the LLM prompt for generation.
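One way to express the hybrid search step is an Elasticsearch _search body that combines a BM25 match with a kNN clause. In Elasticsearch 8.4 and later, this returns the union of both result sets and scores each hit by the boost-weighted sum of its lexical and vector scores. The Python sketch below builds such a body for rag-index. It is an assumption: it does not describe the exact query Bailian issues internally. The query vector is expected to come from the same custom embedding model.

```python
from typing import Sequence


def hybrid_query_body(
    query_text: str,
    query_vector: Sequence[float],
    k: int = 5,
    num_candidates: int = 50,
) -> dict:
    """Build an Elasticsearch 8.x _search body combining BM25 and kNN retrieval."""
    # Elasticsearch rejects kNN requests where num_candidates is smaller than k.
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        # Lexical leg: BM25 relevance on the chunk text.
        "query": {"match": {"content": query_text}},
        # Semantic leg: approximate kNN over the dense_vector field.
        "knn": {
            "field": "embedding",
            "query_vector": [float(x) for x in query_vector],
            "k": k,
            "num_candidates": num_candidates,
        },
        "size": k,
        # Return only the text that will be injected into the LLM prompt.
        "_source": ["content"],
    }
```

Send the body with POST /rag-index/_search. A larger num_candidates improves recall at the cost of latency, and k=5 mirrors the top_k used in the Bailian retriever config.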

Prerequisites

Alibaba Cloud account with OSS, PAI-DSW, OpenSearch/Elasticsearch, and Bailian enabled.
OSS bucket with lifecycle rules configured for raw data and model artifacts.
PAI workspace with GPU quota and VPC network access to OSS.
Elasticsearch 8.x instance (dense_vector kNN search is built in) and an OpenSearch instance with the ML Commons plugin enabled for model hosting.
Bailian API key and application quota for RAG orchestration.

Common pitfalls

Dimension mismatch: For example, the PAI-trained model outputs 768-dimensional vectors while the ES index mapping declares 1024. Always verify that dims in the dense_vector mapping matches the model's output dimension as recorded in its config.json.
OSS latency during bulk ingestion: Streaming large .parquet files from OSS straight into ES causes timeouts, and _bulk accepts NDJSON, not Parquet. Download the files locally with ossutil, convert them to NDJSON, and send them in chunks through _bulk.
OpenSearch model deployment failure: OpenSearch lacks the RAM permissions needed to read the OSS paths. Attach AliyunOSSReadOnlyAccess to the OpenSearch service-linked role.
Bailian context window overflow: Retrieving top_k=10 with long chunks can exceed the LLM's token limit. Cap chunks at 512 tokens and set top_k=5 in the Bailian retriever config. This bounds retrieved context at 5 × 512 = 2,560 tokens.
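Embedding drift: Vectors from a retrained model are not comparable with the vectors already indexed, so retraining without versioning silently breaks existing indexes. Tag model artifacts in OSS (e.g., v1.2.0), re-embed the corpus into a new index, and switch an ES index alias to swap with zero downtime.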
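A few lines of code run before ingestion and at cut-over can guard against two of these pitfalls: dimension mismatch and embedding drift. Below is a minimal Python sketch. Reading the output size from hidden_size in config.json assumes a Hugging Face-style encoder with no extra projection layer. The versioned index names (rag-index-v1, rag-index-v2) are hypothetical. The alias swap also only works if rag-index is an alias rather than a concrete index.

```python
from typing import Mapping


def embedding_dims(model_config: Mapping, key: str = "hidden_size") -> int:
    """Read the output dimension from a parsed config.json.

    Assumption: the model's pooled output has hidden_size dimensions
    (a plain BERT-style encoder without a projection head).
    """
    return int(model_config[key])


def check_mapping_dims(mapping: Mapping, model_dims: int, field: str = "embedding") -> None:
    """Raise if the dense_vector mapping disagrees with the model's output size."""
    declared = mapping["mappings"]["properties"][field]["dims"]
    if declared != model_dims:
        raise ValueError(f"mapping declares {declared} dims, model emits {model_dims}")


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases; Elasticsearch applies both actions atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, after re-embedding the corpus with a v1.2.0 model into rag-index-v2, POST alias_swap_body("rag-index", "rag-index-v1", "rag-index-v2") to /_aliases. The Bailian retriever keeps pointing at rag-index and picks up the new vectors without downtime.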

Typical questions

end-to-end custom RAG pipeline
train embeddings and deploy RAG app
full stack RAG Alibaba Cloud
PAI to Elasticsearch RAG deployment
custom model RAG with Bailian
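train embedding models and deploy a RAG application
from model training to RAG production deployment
deploy a vector retrieval service after training on PAI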

FAQ

Q: How is the end-to-end custom RAG pipeline implemented on Alibaba Cloud? A: The end-to-end custom RAG pipeline trains domain-specific embedding models on PAI, builds vector indexes in OpenSearch or Elasticsearch using documents stored in OSS, and deploys the production application via Bailian’s orchestration layer. This integrated workflow spans multiple Alibaba Cloud services to cover the full lifecycle from model training through semantic search deployment.