DaaS / Products / Custom-Trained Models Power ES RAG App

Custom-Trained Models Power ES RAG App

Fine-tune domain-specific LLM and embedding models on PAI (Skill 3's custom training pipeline), then deploy a RAG chatbot application using Elasticsearch as the retrieval engine that calls these custom-trained models for enterprise-grade, domain-specialized question answering.

Products involved

Scenario

Use this workflow when off-the-shelf LLMs fail to grasp proprietary terminology, compliance constraints, or niche domain logic. By fine-tuning both the generation and embedding layers on PAI and anchoring retrieval in Elasticsearch, you achieve a fully customized, low-latency RAG system tailored to enterprise knowledge bases.

Integration steps

  1. Stage domain corpus in OSS: aliyun oss cp ./domain_data/ oss://rag-bucket/corpus/ --recursive
  2. Launch PAI training pipeline: Submit a DLC job using pai-dlc submit --config train_config.yaml with --model_name_or_path Qwen-7B, --output_dir oss://rag-bucket/models/llm-ft, and --embedding_dim 768.
  3. Deploy inference endpoints: Run alinux-deploy-model to containerize the embedding model (docker run -p 8000:8000 pai-emb:v1). Register the fine-tuned LLM in Bailian via POST https://dashscope.aliyuncs.com/api/v1/models/deploy with {"model_id": "llm-ft", "instance_type": "ml.gu7i.c2xlarge"}.
  4. Configure Elasticsearch index: PUT /rag-kb with mapping: "dense_vector": {"type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine"}.
  5. Build ES ingest pipeline: PUT _ingest/pipeline/rag-embed with {"processors": [{"split": {"field": "content", "separator": "\n\n"}}, {"http": {"url": "http://<alinux-ip>:8000/embed", "request_method": "POST", "field_map": {"content": "input"}, "json_path": "embedding"}}]}.
  6. Ingest & vectorize: POST /rag-kb/_doc?pipeline=rag-embed with {"content": "Enterprise policy v2...", "metadata": {"source": "oss://..."}}.
  7. Deploy RAG app: Execute es-deploy-application to scaffold the orchestration layer. Configure retrieval to query GET /rag-kb/_search with {"knn": {"field": "dense_vector", "query_vector": [0.12, ...], "k": 5}}, then forward context to Bailian's POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions with {"model": "llm-ft", "messages": [...]}.

Architecture

Raw documents reside in OSS and are pulled into PAI for supervised fine-tuning of both the LLM and embedding models. The trained LLM is hosted on Bailian for managed, scalable inference, while the embedding model runs on an Alibaba Cloud Linux ECS instance. Elasticsearch acts as the vector knowledge base, using an ingest pipeline to chunk text and call the custom embedding endpoint. The RAG application orchestrates the workflow: it queries ES for top-k context via KNN search, then passes the prompt + retrieved chunks to Bailian for domain-specialized answer synthesis.

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I fine-tune custom models and deploy a RAG application? A: You fine-tune domain-specific LLM and embedding models on PAI and then deploy a RAG chatbot application using Elasticsearch as the retrieval engine. This configuration enables enterprise-grade, domain-specialized question answering by calling the custom-trained models during retrieval.