Custom RAG Pipeline: Train Embeddings to Deploy Application

In this workflow, a developer trains a custom embedding model on PAI using domain-specific datasets, builds a vector search pipeline on OpenSearch and Elasticsearch with embeddings stored in OSS, and then deploys a production RAG chatbot on Elasticsearch. The chatbot queries the custom-trained vector index so that retrieval-augmented generation stays accurate for the domain.

Products involved

Scenario

Use this workflow when off-the-shelf embeddings fail to capture domain-specific terminology, jargon, or compliance language. Training a custom model on PAI, persisting vectors in OSS-backed OpenSearch indexes, and deploying the retrieval layer on Elasticsearch is intended to improve retrieval precision on domain queries and to keep RAG answers grounded in proprietary enterprise data.

Integration steps

  1. Stage raw data in OSS: Upload domain documents via ossutil cp -r ./data oss://<bucket>/raw/ (skill: oss-manage-objects).
  2. Train embeddings on PAI: Mount the OSS path in a PAI-DSW notebook and submit a training job: pai submit --job-name custom-emb --image pai/pytorch:2.0 --script train.py --oss_input oss://<bucket>/raw/ --oss_output oss://<bucket>/models/ (skill: pai-manage-data).
  3. Register model in OpenSearch: Deploy the trained .pt file using the ML plugin: POST /_plugins/_ml/models/_register {"name": "domain-emb", "function_name": "TEXT_EMBEDDING", "model_format": "TORCH_SCRIPT", "model_content_hash_value": "<sha256>"} (skill: opensearch-deploy-model).
  4. Create vector index: Define a k-NN mapping whose dimension matches your model's output size, and enable k-NN on the index: PUT /domain_vectors {"settings": {"index": {"knn": true}}, "mappings": {"properties": {"embedding": {"type": "knn_vector", "dimension": 768, "method": {"name": "hnsw", "space_type": "cosinesimil"}}}}}.
  5. Ingest & embed: Send documents through OpenSearch’s _bulk API, routed to an ingest pipeline that contains a text_embedding processor, so that vectors are generated at index time and persisted to OSS-backed storage.
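  6. Sync to Elasticsearch: Mirror the domain_vectors index to your production ES cluster, for example with Logstash. Cross-Cluster Replication (CCR) applies only between Elasticsearch clusters, and the ES-side mapping must declare the embedding field as dense_vector rather than OpenSearch's knn_vector.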
  7. Deploy RAG app on ES: Route through es-deploy-application to initialize the pipeline: POST /_application/rag {"name": "prod-chat", "retriever": {"type": "vector", "index": "domain_vectors", "top_k": 5}, "llm": {"provider": "bailian", "model": "qwen-plus"}}.
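  8. Query & generate: Execute hybrid search by fusing a keyword retriever and a kNN retriever with RRF: POST /domain_vectors/_search {"_source": ["text"], "retriever": {"rrf": {"retrievers": [{"standard": {"query": {"match": {"text": "<user question>"}}}}, {"knn": {"field": "embedding", "query_vector": [...], "k": 5, "num_candidates": 50}}]}}} (the knn retriever requires num_candidates; 50 is an illustrative value).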
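
The request bodies in steps 4, 5, and 8 can be sketched in Python as plain dictionaries, which makes it easy to check that the index dimension and the query vector length agree before anything is sent. This is a minimal sketch under stated assumptions: the helper names (knn_index_body, bulk_payload, hybrid_search_body), the index.knn setting, the "text" field mapping, the keyword ("standard") retriever, and the num_candidates value are illustrative choices, not part of the workflow as documented. No HTTP client is included.

```python
import json

EMBEDDING_DIM = 768  # must equal the output size of the trained model (step 4)


def knn_index_body(dimension=EMBEDDING_DIM):
    """Body for PUT /domain_vectors on OpenSearch (step 4)."""
    return {
        # Assumption: k-NN is enabled explicitly through the index settings.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # assumed field holding the raw passage
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "space_type": "cosinesimil"},
                },
            }
        },
    }


def bulk_payload(docs, index="domain_vectors"):
    """NDJSON body for the _bulk API (step 5); docs is a list of (id, text) pairs."""
    lines = []
    for doc_id, text in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"text": text}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def hybrid_search_body(question, query_vector, k=5, dimension=EMBEDDING_DIM):
    """Body for the hybrid RRF search on Elasticsearch (step 8)."""
    if len(query_vector) != dimension:
        raise ValueError(
            f"query vector has {len(query_vector)} dims, index expects {dimension}"
        )
    return {
        "_source": ["text"],
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Keyword leg (assumed): full-text match on the passage field.
                    {"standard": {"query": {"match": {"text": question}}}},
                    # Vector leg: nearest neighbours of the query embedding.
                    {
                        "knn": {
                            "field": "embedding",
                            "query_vector": query_vector,
                            "k": k,
                            "num_candidates": 10 * k,  # illustrative value
                        }
                    },
                ]
            }
        },
    }
```

Serializing any of these with json.dumps gives a body that can be pasted into a REST console. The dimension check fails fast when the embedding model and the index mapping drift apart.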

Architecture

Raw documents flow from OSS into PAI for custom model training. The resulting weights are registered in OpenSearch, which runs inference on new documents and stores vector embeddings in OSS-backed storage. Elasticsearch replicates the index, executes hybrid retrieval (keyword + vector), and routes context to Bailian LLMs for answer synthesis, completing the RAG loop.
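
The hybrid retrieval stage merges a keyword ranking with a vector ranking. The step 8 query names reciprocal rank fusion (RRF) for this, where each document scores the sum of 1 / (rank_constant + rank) over the rankings it appears in. The sketch below illustrates that general formula only. It is not Elasticsearch's internal implementation, and the function name, the sample document IDs, and the rank_constant value of 60 are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes the most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]  # hypothetical keyword ranking
vector_hits = ["b", "d", "a"]   # hypothetical vector ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents "a" and "b" appear in both lists and therefore rise above "c" and "d", which each appear in only one. This is why hybrid retrieval favours passages that match both the query wording and its meaning.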

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I build an end-to-end custom RAG pipeline that trains embeddings and deploys the application?
A: You can build this pipeline by combining Alibaba Cloud services: PAI for training, OpenSearch and Elasticsearch for vector search, and OSS for storage. The workflow involves training custom embedding models on domain-specific datasets, building the vector search pipeline, and deploying a production RAG chatbot on Elasticsearch to query the custom-trained indexes.