Custom RAG Pipeline: Train Embeddings to Deploy Application

A developer trains a custom embedding model on PAI using domain-specific datasets, builds a vector search pipeline in which OpenSearch generates the embeddings and stores them in OSS-backed indexes, and then deploys a production RAG chatbot on Elasticsearch. The chatbot queries the custom-trained vector indexes so that retrieval-augmented generation stays accurate for the domain.

Products involved

PAI, OSS, OpenSearch, Elasticsearch, Bailian

Scenario

Use this workflow when off-the-shelf embeddings fail to capture domain-specific terminology, jargon, or compliance requirements. By training a custom model on PAI, persisting vectors in OSS-backed OpenSearch indexes, and deploying the retrieval layer on Elasticsearch, you achieve high-precision, low-latency RAG grounded in proprietary enterprise data.

Integration steps

Stage raw data in OSS: Upload domain documents via ossutil cp -r ./data oss://<bucket>/raw/ (skill: oss-manage-objects).
Train embeddings on PAI: Mount the OSS path in a PAI-DSW notebook and submit a training job: pai submit --job-name custom-emb --image pai/pytorch:2.0 --script train.py --oss_input oss://<bucket>/raw/ --oss_output oss://<bucket>/models/ (skill: pai-manage-data).
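Register model in OpenSearch: Register the trained TorchScript (.pt) model with the ML plugin: POST /_plugins/_ml/models/_register {"name": "domain-emb", "function_name": "TEXT_EMBEDDING", "model_format": "TORCH_SCRIPT", "model_content_hash_value": "<sha256>"} (skill: opensearch-deploy-model). The body shown is abbreviated; registering a custom model with the ML plugin typically also expects version, model_config, and a url pointing to the packaged model artifact.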
Create vector index: Enable k-NN on the index and define a knn_vector mapping whose dimension matches your model output: PUT /domain_vectors {"settings": {"index": {"knn": true}}, "mappings": {"properties": {"embedding": {"type": "knn_vector", "dimension": 768, "method": {"name": "hnsw", "space_type": "cosinesimil"}}}}}.
Ingest & embed: Send documents through OpenSearch’s _bulk API with an ingest pipeline that contains a text_embedding processor, so vectors are generated at index time and persisted to OSS-backed storage.
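Sync to Elasticsearch: Mirror the domain_vectors index to your production ES cluster. Cross-Cluster Replication (CCR) only links clusters running the same engine, so use it when the source is also Elasticsearch; when the source is OpenSearch, use Logstash (or a comparable reindex job) and map the embedding field as dense_vector on the Elasticsearch side, since knn_vector is an OpenSearch field type.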
Deploy RAG app on ES: Route through es-deploy-application to initialize the pipeline: POST /_application/rag {"name": "prod-chat", "retriever": {"type": "vector", "index": "domain_vectors", "top_k": 5}, "llm": {"provider": "bailian", "model": "qwen-plus"}}.
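Query & generate: Execute hybrid search by fusing a keyword retriever and a vector retriever with rrf: POST /domain_vectors/_search {"_source": ["text"], "retriever": {"rrf": {"retrievers": [{"standard": {"query": {"match": {"text": "<question>"}}}}, {"knn": {"field": "embedding", "query_vector": [...], "k": 5, "num_candidates": 50}}]}}}.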
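
To make the rrf step less opaque, here is a minimal Python sketch of Reciprocal Rank Fusion itself. It is an illustration only, not the Elasticsearch implementation: the function name rrf_fuse, the sample document IDs, and the rank constant k=60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Only ranks are used, never raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword (BM25) query and a vector (kNN) query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
knn_hits = ["doc_c", "doc_a", "doc_d"]

fused = rrf_fuse([bm25_hits, knn_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (doc_a, doc_c) rise to the top, regardless of how large the BM25 scores were relative to the cosine similarities.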

Architecture

Raw documents flow from OSS into PAI for custom model training. The resulting weights are registered in OpenSearch, which runs inference on new documents and stores vector embeddings in OSS-backed storage. Elasticsearch replicates the index, executes hybrid retrieval (keyword + vector), and routes context to Bailian LLMs for answer synthesis, completing the RAG loop.

Prerequisites

Alibaba Cloud account with PAI, OSS, OpenSearch, and Elasticsearch instances provisioned in the same VPC.
Domain dataset pre-split into chunks (<1024 tokens).
PAI workspace with GPU quota (e.g., ecs.gn7i) and OpenSearch ML plugin enabled.
RAM role with AliyunOSSFullAccess and AliyunElasticsearchFullAccess attached to service nodes.
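
The chunking prerequisite can be sketched as follows. This is a rough illustration that assumes whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the tokenizer of the embedding model. The function name chunk_text and the sample document are hypothetical.

```python
def chunk_text(text, max_tokens=1023):
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    max_tokens defaults to 1023 so every chunk stays below 1024 tokens.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


# Hypothetical 2,500-word document.
document = " ".join(f"w{i}" for i in range(2500))
chunks = chunk_text(document)
print([len(chunk.split()) for chunk in chunks])  # [1023, 1023, 454]
```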

Common pitfalls

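Dimension mismatch: Indexing fails when the model and the index disagree on vector size, for example when the PAI-trained model outputs 768-dimensional vectors but the index was created with dimension 384. Verify that dimension in the knn_vector mapping matches the model output exactly.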
OSS permission errors: Ingestion fails with AccessDenied if OpenSearch/ES nodes lack the correct RAM policy. Attach AliyunOSSFullAccess to the service role.
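Scoring imbalance: BM25 scores are unbounded while cosine similarity stays within [-1, 1], so BM25 tends to dominate when raw scores are summed in hybrid queries. Use rrf (Reciprocal Rank Fusion), which combines ranks rather than raw scores, to balance keyword and vector signals.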
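Context truncation: Retrieved chunks are dropped or cut off when the assembled context exceeds the limit accepted by the Bailian LLM. Enforce chunk limits during PAI preprocessing and set max_context_length: 4096 in the ES app config. Note that top_k: 5 with chunks of up to 1,023 tokens can produce 5,115 tokens of context, which exceeds 4,096, so lower top_k or shorten the chunks accordingly.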
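
The context budget can be checked before deployment with a few lines of arithmetic. The sketch below assumes that max_context_length counts only retrieved-context tokens and that chunks are added in rank order until the next one no longer fits; the function name fit_context and the reserved_tokens parameter are hypothetical.

```python
def fit_context(chunk_token_counts, max_context_tokens=4096, reserved_tokens=0):
    """Return (chunks_kept, tokens_used) for chunks taken in rank order.

    reserved_tokens is an assumed allowance for the system prompt and the
    user question, subtracted from the budget before chunks are added.
    """
    budget = max_context_tokens - reserved_tokens
    used = 0
    kept = 0
    for count in chunk_token_counts:
        if used + count > budget:
            break  # the next chunk would overflow the budget
        used += count
        kept += 1
    return kept, used


top_k = 5
worst_case = [1023] * top_k          # every chunk at the size limit
print(sum(worst_case))               # 5115 tokens requested
print(fit_context(worst_case))       # (4, 4092): only four chunks fit
print(fit_context(worst_case, reserved_tokens=512))  # (3, 3069)
```

In the worst case only four of the five retrieved chunks fit, and fewer once prompt overhead is reserved.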

Typical questions

end-to-end custom RAG pipeline
train embeddings and deploy RAG app
custom model RAG deployment
PAI to Elasticsearch RAG
train an embedding model and deploy a RAG application
from model training to RAG deployment
build RAG with custom embeddings
full stack RAG pipeline Alibaba Cloud

FAQ

Q: How do I build an end-to-end custom RAG pipeline that trains embeddings and deploys the application?
A: You can build this pipeline by combining Alibaba Cloud services including PAI for training, OpenSearch and Elasticsearch for vector search, and OSS for storage. The workflow involves training custom embedding models on domain-specific datasets, building the vector search pipeline, and deploying a production RAG chatbot on Elasticsearch to query the custom-trained indexes.