Custom Model RAG Platform Across ES and OpenSearch

A developer fine-tunes custom embedding and reranking models on PAI and deploys them to Bailian for neural reranking across both Elasticsearch and OpenSearch. On top of that unified search layer, the developer builds a full RAG chatbot with AIRec-powered document recommendations. The result is a dual-channel AI platform in which conversational QA and personalized recommendations share the same custom-model-enhanced search backbone.

Products involved

Scenario

Use this workflow when you need a unified search and recommendation backbone that powers both conversational RAG and personalized content feeds. By fine-tuning domain-specific embedding and reranking models on PAI and deploying them via Bailian, you can apply identical neural relevance logic across both Elasticsearch and OpenSearch clusters while AIRec handles contextual document recommendations.

Integration steps

  1. Stage Training Data on OSS: Upload query-document pairs to an OSS bucket, then mount the bucket to PAI-DSW:
     pai-dsw-cli mount --bucket oss://<bucket>/data --mount-point /data

  2. Fine-tune on PAI: Submit a training job for each model (embedding and reranker). The embedding job, for example:
     pai-job submit --model qwen-embedding-v2 --task embedding --epochs 3 --output oss://<bucket>/models/emb_v1

  3. Deploy to Bailian: Register the model and test inference:
     POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-embedding/v1
     Headers: Authorization: Bearer $BAILIAN_KEY
     Body: {"model": "custom-emb-v1", "input": ["<text>"]}

  4. Enable Neural Reranking in ES & OpenSearch: Add the Bailian plugin config to elasticsearch.yml and opensearch.yml:
     neural_reranker.endpoint: https://dashscope.aliyuncs.com/api/v1/services/aigc/rerank
     neural_reranker.model: custom-rerank-v1

  5. Create Unified Index Mapping: Apply an equivalent schema to both engines. The example below uses Elasticsearch's dense_vector syntax; OpenSearch needs its own equivalent definition of a 768-dimensional cosine vector field:
     PUT /rag_docs
     {
       "mappings": {
         "properties": {
           "content_vector": {
             "type": "dense_vector",
             "dims": 768,
             "index": true,
             "similarity": "cosine"
           }
         }
       }
     }

  6. Sync to AIRec: Pipe ES/OpenSearch query logs to AIRec via DataHub, then request recommendations:
     POST https://airec.cn-shanghai.aliyuncs.com/v2/openapi/instances/<id>/actions/recommend
     Payload: {"userId": "u1", "scene": "rag_context", "returnCount": 5}

  7. Orchestrate RAG Pipeline: Query ES/OpenSearch with hybrid search → pass the top 50 hits to Bailian for reranking → feed the top 10 reranked documents to AIRec for contextual suggestions → generate the LLM response.
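
The final orchestration step can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the Doc type, the answer_query function, and the four injected callables (retrieve, rerank, recommend, generate) are hypothetical names that do not come from any Alibaba Cloud SDK. Each callable stands in for whatever client code your application uses to reach ES/OpenSearch, Bailian, AIRec, and the LLM. Only the 50-candidate and 10-document cutoffs come from the steps above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


# Hypothetical callable types; each one wraps a real client call in your app.
Retriever = Callable[[str, int], list]              # hybrid search on ES or OpenSearch
Reranker = Callable[[str, Sequence[Doc]], list]     # one relevance score per document
Recommender = Callable[[str, Sequence[Doc]], list]  # suggestions for a user, given context
Generator = Callable[[str, Sequence[Doc]], str]     # LLM answer from query plus context


def answer_query(
    query: str,
    user_id: str,
    retrieve: Retriever,
    rerank: Reranker,
    recommend: Recommender,
    generate: Generator,
    candidates: int = 50,
    context_size: int = 10,
) -> dict:
    """Retrieve, rerank, recommend, and generate in one pass."""
    hits = retrieve(query, candidates)
    if not hits:
        # Nothing retrieved: let the LLM answer without context.
        return {"answer": generate(query, []), "context": [], "recommendations": []}

    scores = rerank(query, hits)
    if len(scores) != len(hits):
        raise ValueError("reranker must return exactly one score per document")

    # Sort by reranker score, highest first; ties keep retrieval order.
    ranked = [doc for _, doc in sorted(zip(scores, hits), key=lambda p: p[0], reverse=True)]
    context = ranked[:context_size]

    return {
        "answer": generate(query, context),
        "context": [d.doc_id for d in context],
        "recommendations": recommend(user_id, context),
    }
```

Passing the service calls in as functions keeps the orchestration logic testable with stubs and lets the same flow run against either search engine.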

Architecture

Raw documents and logs reside in OSS. PAI trains custom embedding/reranker weights offline. Bailian serves these models as real-time APIs for vectorization and neural scoring. Elasticsearch and OpenSearch run parallel hybrid queries, invoking Bailian’s reranker via plugin. AIRec consumes search telemetry to generate personalized document suggestions. The application layer orchestrates retrieval, reranking, recommendation, and LLM generation into a single RAG flow.
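
The architecture has both engines answering the same query in parallel, but it does not say how the two result lists are combined before reranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the platform's documented behavior. The names search_both and rrf_merge, the engine callables, and the constant 60 are illustrative choices; each engine callable is assumed to return a ranked list of document IDs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical signature: (query, k) -> ranked list of document IDs.
SearchFn = Callable[[str, int], List[str]]


def search_both(query: str, engines: Dict[str, SearchFn], k: int = 50) -> Dict[str, List[str]]:
    """Run the same query against every engine concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query, k) for name, fn in engines.items()}
        return {name: fut.result() for name, fut in futures.items()}


def rrf_merge(rankings: List[List[str]], rrf_k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (rrf_k + rank) per document."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

The fused list can then be truncated to the candidate count and handed to the reranker, so documents that both engines rank highly move to the front without comparing raw scores across engines.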

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I build a custom model RAG platform with chatbot and recommendation features across Elasticsearch and OpenSearch?
A: You build this platform by fine-tuning custom embedding and reranking models on PAI, deploying them to Bailian for neural reranking across both Elasticsearch and OpenSearch, and then integrating a RAG chatbot with AIRec-powered recommendations on top. This architecture creates a dual-channel AI system where conversational QA and personalized recommendations share a unified, custom-model-enhanced search backbone.