Custom RAG: Train Embeddings to Production App

Train a custom domain-specific embedding model on PAI, build a vector index in OpenSearch or Elasticsearch from documents stored in OSS, then deploy the production RAG retrieval application through Bailian's orchestration layer. The workflow covers the full lifecycle, from model training through semantic search deployment.

Products involved

Scenario

Use this workflow when generic embedding models fail to capture proprietary terminology, compliance jargon, or domain-specific semantics. By training a custom model on PAI, persisting the generated vectors in OSS, indexing them in Elasticsearch, and orchestrating retrieval through Bailian, you get a RAG pipeline whose retrieval is tuned to the vocabulary of your enterprise data.

Integration steps

  1. Stage raw data in OSS: ossutil cp -r ./domain_docs/ oss://<bucket>/raw/ --acl private
  2. Train embeddings on PAI: Mount OSS in PAI-DSW and submit: pai submit --job-name custom-emb --oss-input oss://<bucket>/raw/ --oss-output oss://<bucket>/models/ --framework pytorch --image registry.cn-hangzhou.aliyuncs.com/pai/pytorch:2.0
  3. Generate & persist vectors: Run inference using the trained checkpoint, outputting .parquet files to oss://<bucket>/vectors/.
  4. Deploy inference model in OpenSearch: curl -X PUT "https://<opensearch-endpoint>/_plugins/_ml/models/_upload" -H "Content-Type: application/json" -d '{"model_id": "custom-emb-v1", "model_path": "oss://<bucket>/models/"}'
  5. Create vector index in Elasticsearch (dims must equal the output dimension of the trained model; 768 is used here): PUT /rag-index { "mappings": { "properties": { "embedding": { "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }, "content": { "type": "text" } } } }
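  6. Bulk ingest data: Download vectors via ossutil and convert them to newline-delimited JSON (one action line followed by one document line per record, with a trailing newline), then load into ES. Use --data-binary rather than -d, because -d strips the newlines the _bulk API requires: curl -X POST "https://<es-endpoint>/rag-index/_bulk" -H "Content-Type: application/x-ndjson" --data-binary @vectors.json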
  7. Configure Bailian RAG pipeline: POST /v1/apps/rag/create -d '{"name": "domain-rag", "retriever": {"type": "elasticsearch", "endpoint": "https://<es-endpoint>", "index": "rag-index", "top_k": 5}}'
  8. Deploy & invoke: Call Bailian orchestration API to route queries through the hybrid retriever and LLM.
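The conversion needed before the bulk ingest in step 6 can be sketched as follows. This is a minimal illustration, not the pipeline's actual tooling: it assumes each vector record has already been read from the .parquet output into a dict with the hypothetical keys id, content, and embedding (the source does not specify the column names), and it only relies on the documented Elasticsearch _bulk format of alternating action and document lines.

```python
import json
from typing import Iterable, Iterator

# Assumption: must match "dims" in the rag-index mapping from step 5.
EXPECTED_DIMS = 768


def to_bulk_ndjson(records: Iterable[dict], index: str = "rag-index",
                   dims: int = EXPECTED_DIMS) -> Iterator[str]:
    """Yield alternating action and document lines for the _bulk API.

    Each record is assumed (hypothetically) to look like
    {"id": ..., "content": ..., "embedding": [...]}.
    """
    for rec in records:
        embedding = rec["embedding"]
        # Reject vectors whose length differs from the index mapping,
        # since Elasticsearch would refuse them at ingest time anyway.
        if len(embedding) != dims:
            raise ValueError(
                f"record {rec.get('id')!r}: expected {dims} dims, "
                f"got {len(embedding)}"
            )
        action = {"index": {"_index": index, "_id": rec["id"]}}
        source = {
            "content": rec["content"],
            "embedding": [float(x) for x in embedding],
        }
        yield json.dumps(action)
        yield json.dumps(source, ensure_ascii=False)


def write_bulk_file(records: Iterable[dict], path: str) -> None:
    """Write the NDJSON payload; every line ends with a newline."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in to_bulk_ndjson(records):
            fh.write(line + "\n")
```

Calling write_bulk_file(records, "vectors.json") would produce a file suitable for the curl command in step 6. Validating the dimension up front is a design choice that surfaces a model/index mismatch before any network call is made.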

Architecture

Raw documents reside in OSS. PAI mounts OSS to train a domain-specific embedding model, outputting weights back to OSS. OpenSearch deploys the model for inference, while Elasticsearch hosts the dense_vector index storing both text chunks and embeddings. Bailian acts as the orchestration layer, querying the ES index via hybrid search, reranking results, and injecting context into the LLM prompt for generation.
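The retrieval and context-injection stages described above can be illustrated with a short sketch. How Bailian actually queries the index is not documented here, so this only shows what an equivalent hybrid request might look like if issued directly against Elasticsearch 8.x, combining the top-level knn search option with a lexical match query. The function names build_hybrid_query and build_prompt, the num_candidates default, and the prompt wording are all assumptions made for illustration.

```python
from typing import Sequence


def build_hybrid_query(question: str, query_vector: Sequence[float],
                       top_k: int = 5, num_candidates: int = 50) -> dict:
    """Build a search body combining kNN and lexical matching.

    Field names "embedding" and "content" follow the rag-index mapping.
    """
    if num_candidates < top_k:
        raise ValueError("num_candidates must be >= top_k")
    return {
        # Approximate nearest-neighbour search over the dense_vector field.
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": top_k,
            "num_candidates": num_candidates,
        },
        # Lexical match on the text field; scores are combined with kNN.
        "query": {"match": {"content": question}},
        "size": top_k,
        "_source": ["content"],
    }


def build_prompt(question: str, hits: Sequence[dict]) -> str:
    """Inject retrieved chunks into a prompt (hypothetical template)."""
    context = "\n\n".join(
        f"[{i}] {hit['_source']['content']}"
        for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The query vector would come from the custom embedding model, and hits corresponds to the hits.hits array of the search response. Any reranking step would sit between the search call and build_prompt.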

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How is the end-to-end custom RAG pipeline implemented on Alibaba Cloud?
A: The end-to-end custom RAG pipeline trains domain-specific embedding models on PAI, builds vector indexes in OpenSearch or Elasticsearch using documents stored in OSS, and deploys the production application via Bailian’s orchestration layer. This integrated workflow spans multiple Alibaba Cloud services to cover the full lifecycle from model training through semantic search deployment.