DaaS / Products / Custom RAG Training to Personalized Production Search

Custom RAG Training to Personalized Production Search

A team trains domain-specific embedding models and fine-tunes LLMs on PAI using curated datasets, deploys a production-grade hybrid retrieval pipeline (custom vector + BM25 with reranking) into OpenSearch/Elasticsearch, then layers AIRec on top to deliver personalized, semantically-rich search results — combining the full production RAG hardening (fine-tuning, reranking, deployment best practices) with a recommendation engine that tailors results per user.

Products involved

Scenario

Use this workflow when off-the-shelf embeddings fail to capture domain-specific terminology and standard search lacks user-level personalization. It combines PAI-trained custom embeddings and fine-tuned LLMs with OpenSearch/Elasticsearch hybrid retrieval (vector + BM25 + reranking), then layers AIRec to dynamically rank results based on real-time user behavior and historical preferences.

Integration steps

  1. Stage raw corpora in OSS: Upload domain documents and telemetry: ossutil cp -r ./domain_data oss://<bucket>/rag-corpus/ --include ".pdf,.json".
  2. Train custom embeddings on PAI: In a PAI-DSW notebook, mount the bucket and run: pai submit --job-name custom-emb --image registry.cn-hangzhou.aliyuncs.com/pai/nlp-emb:latest --data oss://<bucket>/rag-corpus/ --output oss://<bucket>/models/emb-v1/ --config '{"dim": 768, "batch_size": 32}'.
  3. Fine-tune LLM via Bailian: Generate instruction-tuning pairs from the corpus, then submit: bailian model finetune --base qwen-turbo --dataset oss://<bucket>/rag-corpus/instructions.jsonl --output oss://<bucket>/models/llm-ft/.
  4. Deploy hybrid index in OpenSearch: Create an index with knn and text mappings. Bulk-ingest vectors: curl -X POST "https://<opensearch-endpoint>/rag-index/_bulk" -H "Content-Type: application/json" -d @vectors.json.
  5. Configure hybrid retrieval & reranking: Query with a weighted script_score: {"query": {"script_score": {"query": {"match": {"content": "$query"}}, "script": {"source": "cosineSimilarity(params.query_vector, 'embedding') + 0.5 * _score"}}}}. Pipe top-50 results to Bailian reranker: POST https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank.
  6. Layer AIRec for personalization: Sync user events: POST https://<airec-endpoint>/v2/openapi/instances/<instance-id>/actions/bulk. Configure a ranking strategy that blends semantic relevance with AIRec’s CTR prediction.
  7. Provision & serve: Run terraform apply to deploy ALinux/ECS inference nodes behind Cloudflare Workers for edge caching and JWT validation.

Architecture

Raw documents and user telemetry land in OSS. PAI consumes the corpus to train domain-specific embeddings and fine-tune the LLM. Resulting vectors and metadata are bulk-indexed into OpenSearch/Elasticsearch, which executes hybrid retrieval and applies a Bailian reranker. AIRec ingests real-time signals and dynamically re-ranks candidates. The final response is served through ALinux/ECS endpoints, cached and secured at the edge via Cloudflare.

Prerequisites

Common pitfalls

Typical questions