
Custom RAG with Optimized Search Relevance

A developer builds a full custom RAG system by training both a domain-specific LLM and custom embedding models on PAI, with vector retrieval via OpenSearch/Elasticsearch and OSS. Search relevance tuning is then layered on top: neural reranking with the custom model, BM25 weight configuration, and synonym management for production-grade retrieval quality.

Products involved

PAI, Bailian, OpenSearch (or Elasticsearch), OSS

Scenario

Use this workflow when building a production-grade RAG system that needs domain-specific accuracy and granular control over search. You train custom embedding models and an LLM on PAI, deploy them via Bailian, and run hybrid retrieval through OpenSearch with the corpus stored in OSS. Relevance is then tuned with neural reranking, BM25 weights, and synonym configuration.

Integration steps

Stage corpus in OSS: ossutil cp -r ./domain_data/ oss://rag-bucket/corpus/
Fine-tune on PAI: Submit a DLC job to train embeddings and the LLM.

pai submit --workspace ws-123 --job-type DLC --config '{"model": "qwen-7b", "train_data": "oss://rag-bucket/corpus/"}'

Deploy via Bailian: Register the trained model as a managed endpoint.

curl -X POST https://bailian.aliyuncs.com/v1/models/deploy -H "Authorization: Bearer $KEY" -d '{"model_id": "pai-embed-v1", "instance_type": "ml.gu7.xlarge"}'

Index vectors in OpenSearch: Generate embeddings via Bailian SDK, then bulk-index.

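# The bulk body alternates an action line and a document line; the loop index serves as the document ID here.
body = [line for i, (t, v) in enumerate(zip(texts, vectors)) for line in ({"index": {"_id": str(i)}}, {"text": t, "vector": v})]
es.bulk(index="rag_docs", body=body)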
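For larger corpora, it is usually safer to send several bounded bulk requests than one very large one. The helper below is a minimal sketch of that idea; the name bulk_bodies and the batch_size default are illustrative assumptions, not part of any SDK, and document IDs are simply the position in the input list.

```python
from typing import Iterator, List, Sequence


def bulk_bodies(
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    batch_size: int = 500,  # hypothetical default; tune to your cluster
) -> Iterator[List[dict]]:
    """Yield bulk request bodies, each covering at most batch_size documents."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    for start in range(0, len(texts), batch_size):
        body: List[dict] = []
        for i in range(start, min(start + batch_size, len(texts))):
            body.append({"index": {"_id": str(i)}})  # action line
            body.append({"text": texts[i], "vector": list(vectors[i])})  # document line
        yield body
```

Each yielded body can be passed to the client in turn, for example: for body in bulk_bodies(texts, vectors): es.bulk(index="rag_docs", body=body).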

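Configure BM25 & Synonyms: Apply index-level relevance tuning. Analysis settings such as the synonym filter are static, so define them at index creation or close the index before updating them.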

PUT /rag_docs/_mapping {"properties": {"text": {"type": "text", "similarity": "BM25", "boost": 0.6}}}
PUT /rag_docs/_settings {"analysis": {"filter": {"syn": {"type": "synonym", "synonyms_path": "dict.txt"}}}}

Attach Neural Reranker: Create an OpenSearch search pipeline pointing to Bailian.

PUT /_search/pipeline/neural-rerank {"phase_results_processors": [{"rerank": {"model_id": "pai-rerank-v1", "endpoint": "https://bailian.aliyuncs.com/v1/rerank", "weight": 0.8}}]}

Run Hybrid Query: Combine kNN, BM25, and reranking.

GET /rag_docs/_search?search_pipeline=neural-rerank {"query": {"hybrid": {"queries": [{"match": {"text": "query"}}, {"knn": {"vector": {"vector": [...], "k": 50}}}]}}}
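When the query is issued from application code, the request body can be built as a plain dictionary. The sketch below assumes the open-source OpenSearch k-NN query shape, in which the vector and k are nested under the vector field name; the function name hybrid_query and its parameters are illustrative, and the field names match the rag_docs index used above.

```python
from typing import Sequence


def hybrid_query(
    text: str,
    vector: Sequence[float],
    k: int = 50,
    text_field: str = "text",
    vector_field: str = "vector",
) -> dict:
    """Build a hybrid search body combining a BM25 match clause and a k-NN clause."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical clause, scored with the field's BM25 similarity.
                    {"match": {text_field: text}},
                    # Dense clause: nearest neighbours of the query embedding.
                    {"knn": {vector_field: {"vector": list(vector), "k": k}}},
                ]
            }
        }
    }
```

With opensearch-py, a body like this would typically be sent as es.search(index="rag_docs", body=hybrid_query(q, q_vec), params={"search_pipeline": "neural-rerank"}); treat the exact parameter handling as something to confirm against your client version.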

Architecture

Raw documents in OSS feed PAI for model training. PAI outputs the trained embedding and reranker weights, which are deployed as low-latency endpoints on Bailian. OpenSearch stores the dense vectors and inverted indices and applies BM25 scoring and synonym filters at query time. The search pipeline then calls the Bailian reranker to re-score the top-k candidates, and the reranked passages are passed as context to the Bailian-hosted LLM for generation.
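The query-time half of this architecture can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the product's implementation: embed, retrieve, rerank, and generate are hypothetical callables standing in for the Bailian embedding endpoint, the OpenSearch hybrid query, the Bailian reranker, and the Bailian-hosted LLM respectively.

```python
from typing import Callable, List, Sequence


def answer(
    question: str,
    embed: Callable[[str], Sequence[float]],
    retrieve: Callable[[str, Sequence[float], int], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    top_k: int = 50,
    context_size: int = 5,
) -> str:
    """Run one RAG request: embed, retrieve, rerank, then generate."""
    query_vector = embed(question)                        # dense query representation
    candidates = retrieve(question, query_vector, top_k)  # hybrid BM25 + k-NN recall
    ranked = rerank(question, candidates)                 # neural re-scoring of candidates
    context = ranked[:context_size]                       # keep only the best passages
    return generate(question, context)                    # grounded generation
```

Passing the stages in as callables keeps each service replaceable and makes the flow easy to exercise with stubs before any endpoint is deployed.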

Prerequisites

Alibaba Cloud account with PAI, Bailian, OpenSearch, and OSS enabled
Domain dataset uploaded to an OSS bucket
PAI workspace with GPU quota
Bailian API key and OpenSearch cluster with search_pipeline plugin
Python environment with alibabacloud-bailian-sdk, opensearch-py, and oss2

Common pitfalls

Dimension mismatch: PAI embedding output must exactly match OpenSearch knn field dimensions.
Weight imbalance: Overweighting neural rerank (>0.9) suppresses exact keyword matches; start at 0.6 BM25 / 0.4 neural.
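Synonym formatting: Synonym files use the Solr format, either equivalent terms (syn1, syn2) or explicit mappings (syn1, syn2 => canonical); a malformed line causes the analyzer to fail to load, so the index settings update is rejected.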
Pipeline timeouts: Bailian inference can exceed OpenSearch’s 30s default; set timeout: 60s in the pipeline config.
Non-UTF-8 OSS files: Breaks PAI training; validate encoding with file -i before ingestion.
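Several of these pitfalls can be caught with simple pre-flight checks before training or indexing. The functions below are a minimal sketch using only the standard library; their names are illustrative, and the synonym check covers only the basic Solr-style line shapes (comma-separated terms, with at most one => per line), not every rule the analyzer enforces.

```python
from typing import Iterable, List, Sequence


def mismatched_dimensions(vectors: Sequence[Sequence[float]], expected_dim: int) -> List[int]:
    """Return the positions of vectors whose length differs from the index dimension."""
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]


def invalid_synonym_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers that are not 'a, b' or 'a, b => c' shaped."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # blank lines and comments are allowed
            continue
        sides = line.split("=>")
        if len(sides) > 2:  # more than one arrow on a line
            bad.append(number)
            continue
        terms = [t.strip() for side in sides for t in side.split(",")]
        if any(not t for t in terms):  # empty term, e.g. "a,, b" or "=> c"
            bad.append(number)
    return bad


def is_utf8(data: bytes) -> bool:
    """Report whether the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Running these against the embedding output, dict.txt, and each corpus file before the PAI job and the bulk index step surfaces problems earlier than a failed training run or a rejected settings update would.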

Typical questions

build custom RAG with optimized search relevance
full RAG pipeline with search tuning
train custom LLM and embeddings then optimize OpenSearch
PAI custom models with BM25 and reranking pipeline
end-to-end RAG with neural reranking and synonym config
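custom RAG system with search relevance optimization
PAI-trained models combined with RAG and OpenSearch tuning
end-to-end RAG with BM25 weights and embedding training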

FAQ

Q: How do I build a custom RAG pipeline with optimized search relevance?
A: You can build a custom RAG system with optimized search relevance by combining PAI for training, OpenSearch or Elasticsearch with OSS for retrieval, and Bailian for deployment. This workflow supports training domain-specific LLMs and custom embeddings on PAI, then applying comprehensive tuning such as neural reranking, BM25 weight configuration, and synonym management.