Custom RAG with Tuned Search Relevance

Build a fully custom RAG pipeline (a fine-tuned LLM trained on PAI and deployed to Bailian, custom embeddings, and an OpenSearch vector store), then tune the OpenSearch retrieval layer (BM25 weights, tokenization, NER, and ranking models) so that retrieval supplies the best possible context for answer generation.

Products involved

Scenario

Use this workflow when building a production RAG system requiring domain-specific generation and highly tuned retrieval. You will fine-tune a custom LLM and embedding model on PAI, deploy the LLM via Bailian, and store vectors in OpenSearch while optimizing BM25, tokenization, NER, and learning-to-rank models to maximize context precision.
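This page does not define context precision. A minimal sketch, assuming it means the fraction of the top-k retrieved chunks that a human or labeled set marks as relevant, could look like the following. The function name, the chunk ids, and the relevance labels are all hypothetical.

```python
from typing import Iterable, Sequence


def context_precision_at_k(retrieved_ids: Sequence[str],
                           relevant_ids: Iterable[str],
                           k: int) -> float:
    """Fraction of the top-k retrieved chunk ids that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved_ids)[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    # Divide by the number of chunks actually returned, so a result list
    # shorter than k is scored on what it contains.
    return hits / len(top_k)


# Hypothetical labeled query: two of the three returned chunks are relevant.
score = context_precision_at_k(["doc1", "doc7", "doc3"], {"doc1", "doc3"}, k=3)
```

Averaging this value over a fixed set of labeled queries before and after each relevance change gives a simple way to check whether a tuning step helped.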

Integration steps

  1. Stage Data in OSS: Upload domain corpus using ossutil cp -r ./docs oss://rag-data/. Format as JSONL: {"id": "doc1", "text": "...", "meta": {}}.
  2. Train on PAI: In a PAI-DSW notebook, submit training via pai-deploy-inference SDK: pai.training.submit(model="qwen-7b", dataset="oss://rag-data/train.jsonl", instance="ecs.gn7i-c8g1.2xlarge"). Train both the LLM and a custom embedding model (e.g., bge-m3 base).
  3. Deploy LLM to Bailian: Register and deploy the fine-tuned checkpoint: bailian.models.register(name="custom-llm", path="oss://models/llm/") then bailian.deploy_model(model_id="custom-llm", endpoint="rag-gen-prod", min_instances=1).
  4. Index Vectors in OpenSearch: Generate embeddings via PAI, then create the index: PUT /rag-index {"mappings": {"properties": {"embedding": {"type": "dense_vector", "dims": 1024, "index": true, "similarity": "cosine"}, "content": {"type": "text", "analyzer": "custom_ik"}}}}.
  5. Tune Relevance Layer: Adjust BM25 weights, tokenization, NER, and ranking models in OpenSearch, and compare retrieval quality after each change.
  6. Execute Hybrid Query: POST /rag-index/_search {"query": {"bool": {"should": [{"match": {"content": {"query": "q", "boost": 0.35}}}, {"knn": {"embedding": {"vector": [...], "k": 5, "boost": 0.65}}}]}}}. Pass top-k context to Bailian: bailian.inference(endpoint="rag-gen-prod", prompt=f"Context: {ctx}\nQ: {q}").
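The data shapes used in the steps above can be sketched in plain Python. This example only builds the JSONL record from the staging step and the request body from the hybrid query step; it does not call OSS, PAI, Bailian, or OpenSearch. The helper names are hypothetical, and the knn clause simply mirrors the body shown above, so whether a given OpenSearch deployment accepts that exact syntax is an assumption.

```python
import json
from typing import Any, Dict, List


def to_jsonl_line(doc_id: str, text: str, meta: Dict[str, Any]) -> str:
    """Serialize one corpus document in the id/text/meta shape used for staging."""
    return json.dumps({"id": doc_id, "text": text, "meta": meta},
                      ensure_ascii=False)


def build_hybrid_query(question: str,
                       vector: List[float],
                       k: int = 5,
                       text_boost: float = 0.35,
                       vector_boost: float = 0.65) -> Dict[str, Any]:
    """Build a bool/should body combining a match clause and a knn clause."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Lexical (BM25) clause on the analyzed text field.
                    {"match": {"content": {"query": question,
                                           "boost": text_boost}}},
                    # Vector clause on the embedding field.
                    {"knn": {"embedding": {"vector": vector,
                                           "k": k,
                                           "boost": vector_boost}}},
                ]
            }
        }
    }


# Hypothetical inputs; a real vector would have the index's full dimension.
line = to_jsonl_line("doc1", "example text", {})
body = build_hybrid_query("q", [0.1, 0.2, 0.3])
```

Keeping the two boosts as parameters makes it easy to sweep the lexical/vector balance during relevance tuning instead of editing the request by hand.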

Architecture

Raw documents flow from OSS to PAI for dual training (LLM + embeddings). The LLM is deployed as a managed inference endpoint on Bailian. OpenSearch acts as the retrieval orchestrator, applying custom analyzers, BM25 tuning, and ML ranking to score chunks. The application queries OpenSearch for context, then forwards it to the Bailian endpoint for generation.
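The request path described here (query OpenSearch for context, then forward it to the generation endpoint) can be sketched with the two services injected as plain callables. This is an illustration under stated assumptions, not the product's client code: `retrieve` and `generate` are hypothetical stand-ins for the OpenSearch query and the Bailian endpoint call, and the prompt layout follows the `Context: ... Q: ...` pattern from the integration steps.

```python
from typing import Callable, List

# Hypothetical signatures for the two external services.
Retriever = Callable[[str, int], List[str]]   # (question, top_k) -> chunks
Generator = Callable[[str], str]              # prompt -> answer text


def answer(question: str,
           retrieve: Retriever,
           generate: Generator,
           top_k: int = 5) -> str:
    """Fetch top-k context chunks, build the prompt, and request generation."""
    chunks = retrieve(question, top_k)
    ctx = "\n".join(chunks)
    prompt = f"Context: {ctx}\nQ: {question}"
    return generate(prompt)


# In-memory stand-ins so the flow can be exercised without any service.
def fake_retrieve(question: str, top_k: int) -> List[str]:
    corpus = ["chunk A", "chunk B", "chunk C"]
    return corpus[:top_k]


def fake_generate(prompt: str) -> str:
    # Echo the prompt so the caller can inspect what would be sent.
    return "ECHO:" + prompt


reply = answer("What is X?", fake_retrieve, fake_generate, top_k=2)
```

Injecting the retriever and generator keeps the orchestration testable offline and lets the retrieval layer be swapped or re-tuned without touching the generation call.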

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I build a custom RAG pipeline and tune its search relevance and retrieval ranking?
A: Fine-tune an LLM on PAI, deploy it via Bailian, generate custom embeddings, and use OpenSearch as the vector store. Then tune the OpenSearch retrieval layer to maximize answer quality by adjusting its BM25 weights, tokenization, named entity recognition, and ranking models. These capabilities are implemented through integrated product combinations such as full-custom-rag-custom-llm-custom-embeddings and opensearch-optimize-relevance.
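As a rough illustration of what BM25 tuning changes, the sketch below implements the standard Okapi BM25 formula (with the Lucene-style IDF) over pre-tokenized documents. It assumes that "BM25 weights" includes the usual `k1` (term-frequency saturation) and `b` (length normalization) parameters; how OpenSearch exposes them is not covered here, and the toy corpus is hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_score(query_terms: List[str],
               doc_terms: List[str],
               corpus: List[List[str]],
               k1: float = 1.2,
               b: float = 0.75) -> float:
    """Okapi BM25 score of one tokenized document against a tokenized corpus."""
    if not corpus:
        return 0.0
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b controls how strongly longer-than-average documents are penalized.
        norm = 1 - b + b * len(doc_terms) / avg_len
        # k1 controls how quickly repeated occurrences stop adding score.
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
    return score


# Toy corpus: the second document is longer than average.
corpus = [["vector", "search"],
          ["bm25", "search", "search", "tuning", "guide", "notes"]]
with_length_norm = bm25_score(["search"], corpus[1], corpus, b=0.75)
without_length_norm = bm25_score(["search"], corpus[1], corpus, b=0.0)
```

For the longer document, setting `b` to 0 removes the length penalty and raises its score, which is the kind of effect to watch for when the corpus mixes short and long chunks.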