A developer trains a custom embedding model on PAI using domain-specific datasets, builds a vector search pipeline with OpenSearch and Elasticsearch (with embeddings stored in OSS), and then deploys a production RAG chatbot on Elasticsearch that queries the custom-trained vector indexes for domain-accurate retrieval-augmented generation.
Use this workflow when off-the-shelf embeddings fail to capture domain-specific terminology, jargon, or compliance requirements. By training a custom model on PAI, persisting vectors in OSS-backed OpenSearch indexes, and deploying the retrieval layer on Elasticsearch, you get retrieval tuned to your domain vocabulary and RAG answers grounded in proprietary enterprise data.
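1. Upload the raw domain documents to OSS: ossutil cp -r ./data oss://<bucket>/raw/ (skill: oss-manage-objects).
2. Submit the embedding training job on PAI: pai submit --job-name custom-emb --image pai/pytorch:2.0 --script train.py --oss_input oss://<bucket>/raw/ --oss_output oss://<bucket>/models/ (skill: pai-manage-data).
3. Register the trained .pt file in OpenSearch using the ML plugin: POST /_plugins/_ml/models/_register {"name": "domain-emb", "function_name": "TEXT_EMBEDDING", "model_format": "TORCH_SCRIPT", "model_content_hash_value": "<sha256>"} (skill: opensearch-deploy-model). ML Commons expects a TORCH_SCRIPT model packaged as a zip archive (the .pt file plus its tokenizer), and the request typically also carries version, model_config (including embedding_dimension), and a url pointing to that zip.
4. Create a k-NN index whose mapping matches your model's output dimension. The index.knn setting must be enabled for knn_vector fields: PUT /domain_vectors {"settings": {"index": {"knn": true}}, "mappings": {"properties": {"embedding": {"type": "knn_vector", "dimension": 768, "method": {"name": "hnsw", "space_type": "cosinesimil"}}}}}.
5. Ingest documents through the _bulk API, routing them through an ingest pipeline with a text_embedding processor that generates the vectors, and persist them to OSS-backed storage.
6. Copy the domain_vectors index to your production ES cluster. Elasticsearch does not support OpenSearch's knn_vector type, so map embedding there as dense_vector with "dims": 768 and "similarity": "cosine".
7. Use the es-deploy-application skill to initialize the RAG application: POST /_application/rag {"name": "prod-chat", "retriever": {"type": "vector", "index": "domain_vectors", "top_k": 5}, "llm": {"provider": "bailian", "model": "qwen-plus"}}.
8. Verify hybrid retrieval. RRF only helps when it fuses more than one retriever, so combine a keyword retriever with the kNN retriever: POST /domain_vectors/_search {"_source": ["text"], "retriever": {"rrf": {"retrievers": [{"standard": {"query": {"match": {"text": "<question>"}}}}, {"knn": {"field": "embedding", "query_vector": [...], "k": 5, "num_candidates": 50}}]}}}.

Raw documents flow from OSS into PAI for custom model training. The resulting weights are registered in OpenSearch, which runs inference on new documents and stores their embeddings in OSS-backed storage. Elasticsearch hosts a copy of the index, executes hybrid retrieval (keyword + vector), and routes the retrieved context to Bailian LLMs for answer synthesis, completing the RAG loop.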
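The registration request in the workflow above omits several fields. The sketch below assembles a fuller request body in Python, assuming OpenSearch ML Commons' fields for local TorchScript models (version, model_config, url). The model_type and framework_type values and the download URL are illustrative assumptions, not values from this workflow. The hash is assumed to be the SHA-256 of the zip archive bytes.

```python
import hashlib
import json


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest used for model_content_hash_value."""
    return hashlib.sha256(data).hexdigest()


def build_register_body(zip_bytes: bytes, zip_url: str, dimension: int) -> dict:
    """Build a register request for a zipped TorchScript text-embedding model.

    Assumption: ML Commons local custom-model fields (version, model_config, url).
    model_type and framework_type below are illustrative, not confirmed values.
    """
    return {
        "name": "domain-emb",
        "version": "1.0.0",
        "function_name": "TEXT_EMBEDDING",
        "model_format": "TORCH_SCRIPT",
        "model_content_hash_value": sha256_of_bytes(zip_bytes),
        "model_config": {
            "model_type": "bert",  # assumption: a BERT-style encoder
            "embedding_dimension": dimension,  # must match the knn_vector dimension
            "framework_type": "sentence_transformers",  # assumption
        },
        "url": zip_url,  # hypothetical location of the zipped model
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the real zip archive read from disk.
    body = build_register_body(b"placeholder", "https://example.com/domain-emb.zip", 768)
    print(json.dumps(body, indent=2))
```

In practice you would read the archive with pathlib.Path("domain-emb.zip").read_bytes() (hypothetical filename) and send the resulting JSON to POST /_plugins/_ml/models/_register.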
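The embedding dimension has to match the model exactly in both the OpenSearch index and its Elasticsearch copy. One way to keep them in sync is to derive both index bodies from a single constant and reject mismatched vectors before ingestion. This is a sketch under stated assumptions: the extra text field is added so the _source filter in the search example has something to return, and engine-specific options such as the HNSW engine are left at their defaults.

```python
EMBEDDING_DIM = 768  # must equal the custom model's output size


def opensearch_index_body(dim: int) -> dict:
    """OpenSearch k-NN index; index.knn must be true for knn_vector fields."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # assumption: raw chunk text is stored here
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {"name": "hnsw", "space_type": "cosinesimil"},
                },
            }
        },
    }


def elasticsearch_index_body(dim: int) -> dict:
    """Equivalent Elasticsearch mapping; ES uses dense_vector, not knn_vector."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dim,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def check_vector(vec: list[float], dim: int = EMBEDDING_DIM) -> None:
    """Fail fast if a model output does not match the mapped dimension."""
    if len(vec) != dim:
        raise ValueError(f"embedding has {len(vec)} values, mapping expects {dim}")
```

Deriving both bodies from EMBEDDING_DIM means a model change that alters the output size only needs one edit, and check_vector surfaces the mismatch before the index rejects documents.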
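The ingestion step pairs the _bulk API with a text_embedding processor. The sketch below builds the ingest pipeline body (for PUT /_ingest/pipeline/<name>) and a newline-delimited _bulk payload. The pipeline name domain-emb-pipeline and the model ID are hypothetical placeholders, and the field_map assumes the source text lives in a field called text.

```python
import json

PIPELINE_NAME = "domain-emb-pipeline"  # hypothetical pipeline name


def text_embedding_pipeline(model_id: str) -> dict:
    """Ingest pipeline body using OpenSearch's text_embedding processor.

    field_map maps the input field (text) to the vector field (embedding).
    """
    return {
        "description": "Embed the text field with the registered domain model",
        "processors": [
            {"text_embedding": {"model_id": model_id, "field_map": {"text": "embedding"}}}
        ],
    }


def bulk_ndjson(index: str, docs: list[dict]) -> str:
    """Build a _bulk body: one action line and one source line per document.

    IDs are left to the cluster to generate. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

You would then send the payload with POST /_bulk?pipeline=domain-emb-pipeline so every document passes through the embedding processor; setting index.default_pipeline on the index is an alternative that avoids the query parameter.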
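Reciprocal Rank Fusion, used for the hybrid query above, scores each document by summing 1 / (rank_constant + rank) over every result list it appears in, with ranks starting at 1. The local Python version below only illustrates that formula so you can inspect how keyword and vector rankings combine; it is not Elasticsearch's implementation. rank_constant = 60 mirrors Elasticsearch's default, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    result_lists: list[list[str]], rank_constant: int = 60, top_k: int = 5
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with RRF.

    A document scores sum(1 / (rank_constant + rank)) over the lists that
    contain it; lists that miss the document contribute nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]


if __name__ == "__main__":
    keyword_hits = ["doc-a", "doc-b", "doc-c"]  # e.g. BM25 match results
    vector_hits = ["doc-c", "doc-a", "doc-d"]   # e.g. kNN results
    # order: doc-a, doc-c, doc-b, doc-d
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Here doc-a wins because it sits near the top of both lists, while doc-c ranks first only in the vector results. Because RRF uses ranks rather than raw scores, BM25 and cosine scores never need to be normalized against each other.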
GPU-backed PAI training resources (for example, ecs.gn7i) and the OpenSearch ML plugin enabled.
RAM policies AliyunOSSFullAccess and AliyunElasticsearchFullAccess attached to the service nodes.
The dimension in the knn_vector mapping must match the model's output size exactly.
AccessDenied errors occur if OpenSearch/ES nodes lack the correct RAM policy. Attach AliyunOSSFullAccess to the service role.
Use rrf (Reciprocal Rank Fusion) to balance keyword and vector signals.
If the retrieved context exceeds the LLM's max_tokens, enforce chunk limits during PAI preprocessing and set max_context_length: 4096 in the ES app config.

Q: How do I build an end-to-end custom RAG pipeline that trains embeddings and deploys the application? A: You can build this pipeline by combining Alibaba Cloud services: PAI for training, OpenSearch and Elasticsearch for vector search, and OSS for storage. The workflow trains custom embedding models on domain-specific datasets, builds the vector search pipeline, and deploys a production RAG chatbot on Elasticsearch that queries the custom-trained indexes.
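The context-length fix above (enforce chunk limits during PAI preprocessing and cap context at max_context_length: 4096) reduces to simple arithmetic: with top_k = 5 retrieved chunks and some tokens reserved for the prompt and question, each chunk gets (4096 - reserve) // 5 tokens. The sketch below assumes a 512-token reserve and approximates tokens with whitespace-separated words; a real pipeline should count tokens with the embedding model's own tokenizer.

```python
def chunk_budget(max_context_length: int = 4096, top_k: int = 5, reserve: int = 512) -> int:
    """Per-chunk token budget so top_k chunks plus a prompt reserve fit the context."""
    return (max_context_length - reserve) // top_k


def chunk_words(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Consecutive chunks share `overlap` words so sentences cut at a boundary
    still appear intact in one of the two chunks.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With the defaults, chunk_budget() gives 716 tokens per chunk, so five retrieved chunks use at most 3,580 tokens and leave the 512-token reserve for the prompt and question.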