Custom RAG with Optimized Search Relevance

This workflow builds a fully custom RAG system. A domain-specific LLM and custom embedding models are trained on PAI, vectors are retrieved through OpenSearch or Elasticsearch with OSS as document storage, and search relevance is then tuned with neural reranking by the custom model, BM25 weight configuration, and synonym management.

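Products involved

PAI (model training), Bailian (model deployment), OpenSearch or Elasticsearch (vector and keyword retrieval), OSS (corpus storage).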

Scenario

Use this workflow when a production RAG system needs domain-specific accuracy and fine-grained control over search ranking. Custom embedding and LLM models are trained on PAI and deployed via Bailian. OpenSearch, with the corpus stored in OSS, runs hybrid retrieval that combines BM25 scoring, synonym expansion, and kNN search, and a neural reranker re-scores the retrieved candidates.

Integration steps

  1. Stage corpus in OSS: Copy the domain documents into the bucket.
     ossutil cp -r ./domain_data/ oss://rag-bucket/corpus/

  2. Fine-tune on PAI: Submit a DLC job to train the embedding model and the LLM.
     pai submit --workspace ws-123 --job-type DLC --config '{"model": "qwen-7b", "train_data": "oss://rag-bucket/corpus/"}'

  3. Deploy via Bailian: Register the trained model as a managed endpoint.
     curl -X POST https://bailian.aliyuncs.com/v1/models/deploy -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" -d '{"model_id": "pai-embed-v1", "instance_type": "ml.gu7.xlarge"}'

  4. Index vectors in OpenSearch: Generate embeddings via the Bailian SDK, then bulk-index them. The bulk body is a flat list in which each action line is followed by its document. The list position stands in for the document ID here; substitute your own IDs.
     actions = [line for i, (t, v) in enumerate(zip(texts, vectors)) for line in ({"index": {"_id": str(i)}}, {"text": t, "vector": v})]
     es.bulk(index="rag_docs", body=actions)

  5. Configure BM25 & Synonyms: Apply index-level relevance tuning. On Elasticsearch-compatible engines, analysis settings are static, so close the index before the _settings call and reopen it afterwards. The syn filter takes effect only once an analyzer used by the text field references it.
     PUT /rag_docs/_mapping {"properties": {"text": {"type": "text", "similarity": "BM25", "boost": 0.6}}}
     PUT /rag_docs/_settings {"analysis": {"filter": {"syn": {"type": "synonym", "synonyms_path": "dict.txt"}}}}

  6. Attach Neural Reranker: Create an OpenSearch search pipeline pointing to Bailian.
     PUT /_search/pipeline/neural-rerank {"phase_results_processors": [{"rerank": {"model_id": "pai-rerank-v1", "endpoint": "https://bailian.aliyuncs.com/v1/rerank", "weight": 0.8}}]}

  7. Run Hybrid Query: Combine kNN, BM25, and reranking. The search pipeline is selected with the search_pipeline query parameter, and the knn clause nests the query vector and k under the name of the vector field.
     GET /rag_docs/_search?search_pipeline=neural-rerank {"query": {"hybrid": {"queries": [{"match": {"text": "query"}}, {"knn": {"vector": {"vector": [...], "k": 50}}}]}}}
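
The indexing and query steps above can be sketched as two small Python helpers that only build request bodies, so they can be checked without a running cluster. This is a minimal sketch under stated assumptions: the function names `build_bulk_actions` and `build_hybrid_query` are hypothetical, the field names `text` and `vector` follow the steps above, the list position is used as a stand-in document ID, and the knn clause uses the OpenSearch shape in which the field name wraps the query vector and `k`.

```python
from typing import Any, Dict, List, Sequence


def build_bulk_actions(
    texts: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[Dict[str, Any]]:
    """Return the flat action/document list that the bulk API expects."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    actions: List[Dict[str, Any]] = []
    for position, (text, vector) in enumerate(zip(texts, vectors)):
        # Assumption: the list position doubles as the document ID.
        actions.append({"index": {"_id": str(position)}})
        actions.append({"text": text, "vector": list(vector)})
    return actions


def build_hybrid_query(
    query_text: str, query_vector: Sequence[float], k: int = 50
) -> Dict[str, Any]:
    """Return a hybrid query body with one BM25 match clause and one kNN clause."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical clause, scored with BM25 on the "text" field.
                    {"match": {"text": query_text}},
                    # Vector clause: field name wraps the query vector and k.
                    {"knn": {"vector": {"vector": list(query_vector), "k": k}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    actions = build_bulk_actions(["alpha", "beta"], [[0.1, 0.2], [0.3, 0.4]])
    print(len(actions))
    print(build_hybrid_query("alpha", [0.1, 0.2], k=5))
```

The list returned by `build_bulk_actions` can be passed as the `body` argument of `es.bulk`, and the dictionary returned by `build_hybrid_query` as the body of the `_search` request.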

Architecture

Raw documents in OSS feed PAI for model training. PAI outputs the trained embedding, reranker, and LLM weights, which are deployed as low-latency endpoints on Bailian. OpenSearch stores dense vectors alongside inverted indices. At query time, OpenSearch scores candidates with BM25 over the synonym-expanded text field and with kNN over the vectors, then calls the Bailian reranker to re-score the top-k candidates. The reranked passages are passed as context to the Bailian-hosted LLM for generation.
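
The query-time flow described above (retrieve, rerank, generate) can be illustrated with a short Python sketch. It is not the products' API: the function `answer` and its three callables are hypothetical placeholders for the OpenSearch hybrid query, the Bailian reranker endpoint, and the Bailian-hosted LLM, and the prompt layout is an assumption chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces standing in for the real services.
Retriever = Callable[[str, int], List[str]]             # question, top_k -> passages
Reranker = Callable[[str, Sequence[str]], List[float]]  # question, passages -> scores
Generator = Callable[[str], str]                        # prompt -> answer text


def answer(
    question: str,
    retrieve: Retriever,
    rerank: Reranker,
    generate: Generator,
    top_k: int = 50,
    context_size: int = 5,
) -> Tuple[str, List[str]]:
    """Retrieve top_k candidates, re-score them, and generate from the best ones."""
    candidates = retrieve(question, top_k)
    if not candidates:
        # Nothing retrieved: fall back to the bare question.
        return generate(question), []

    scores = rerank(question, candidates)
    if len(scores) != len(candidates):
        raise ValueError("reranker must return one score per candidate")

    # Sort by reranker score, highest first, and keep the best passages.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    context = [passage for _, passage in ranked[:context_size]]

    # Assumed prompt layout: context block followed by the question.
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    return generate(prompt), context


if __name__ == "__main__":
    # Stub services so the sketch runs without any network access.
    docs = ["short", "a much longer passage", "medium one"]
    reply, used = answer(
        "example question",
        retrieve=lambda q, k: docs[:k],
        rerank=lambda q, passages: [float(len(p)) for p in passages],
        generate=lambda prompt: "answer based on %d characters" % len(prompt),
        top_k=3,
        context_size=2,
    )
    print(used)
    print(reply)
```

Passing the services in as callables keeps the ordering logic testable on its own, and each stub can later be swapped for a real client call.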

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I build a custom RAG pipeline with optimized search relevance?
A: Combine PAI for training, OSS for corpus storage, OpenSearch or Elasticsearch for retrieval, and Bailian for deployment. Train the domain-specific LLM and custom embedding models on PAI, then tune relevance with neural reranking, BM25 weight configuration, and synonym management.