A developer fine-tunes a custom embedding and reranking model on PAI and deploys it to Bailian for Elasticsearch neural reranking. On top of that optimized search layer, the developer builds a full RAG chatbot for document Q&A alongside AIRec-powered semantic recommendations. The result is an end-to-end pipeline in which the custom-trained model directly improves retrieval quality for both the conversational and recommendation surfaces.
Use this pipeline when enterprise search and recommendation surfaces require domain-specific semantic understanding that off-the-shelf models cannot provide. It suits teams that need a unified retrieval layer in which a single fine-tuned embedding/reranker powers both conversational Q&A and personalized content discovery without duplicating infrastructure.
pai-cli model export --workspace-id <ws_id> --model-path oss://<bucket>/models/reranker-v1/
pai-cli eas deploy --model oss://<bucket>/models/reranker-v1 --instance-type ecs.g6.xlarge --replicas 2
POST https://dashscope.aliyuncs.com/api/v1/models/register {"model_name": "custom-rerank-v1", "endpoint": "<pai-eas-url>", "type": "reranker"}
POST https://dashscope.aliyuncs.com/api/v1/pipeline/ingest {"source": "oss://<bucket>/docs/", "target_index": "enterprise-kb", "chunk_size": 512}
PUT /enterprise-kb/_settings {"index.knn": true, "neural_search.model_id": "custom-rerank-v1", "rerank.top_k": 50}
GET /enterprise-kb/_search {"_source": ["content"], "query": {"neural": {"text_embedding": {"query_text": "user_question", "k": 10}}}}
POST /v2/openapi/instances/<airec_id>/actions/sync-schema {"feature_source": "elasticsearch", "index_name": "enterprise-kb", "custom_ranker_url": "<pai-eas-url>"}
wrangler deploy --name airec-pai-proxy --config cloudflare.toml
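The search request in the steps above uses the literal string "user_question" as a placeholder. As a minimal sketch of how an application might fill it in, the helper below builds the same request body for a real question. The function name build_neural_query and the sample question are hypothetical; the field name text_embedding, the _source filter, and the default k of 10 are copied from the request above. It only constructs the JSON body and does not contact a cluster.

```python
import json


def build_neural_query(question: str, k: int = 10, field: str = "text_embedding") -> dict:
    """Build the body for GET /enterprise-kb/_search for one user question.

    `field` and `k` default to the values used in the request above.
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {
        "_source": ["content"],
        "query": {"neural": {field: {"query_text": question, "k": k}}},
    }


if __name__ == "__main__":
    # Hypothetical question, for illustration only.
    body = build_neural_query("How do I rotate the auth token?")
    print(json.dumps(body, indent=2))
```

Validating the question and k before building the body keeps malformed requests from reaching the search layer.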
Raw documents reside in OSS and are parsed/chunked by Bailian before indexing into Elasticsearch. Bailian’s neural plugin routes embedding and reranking requests to the PAI-EAS endpoint hosting your fine-tuned model. Elasticsearch acts as the unified vector/hybrid search layer. The RAG application queries ES directly for context retrieval, while AIRec consumes the same ES index for candidate generation, applying the PAI-EAS model for real-time personalized ranking. Cloudflare sits at the edge to cache frequent inference payloads and enforce rate limits.
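The retrieve-then-rerank flow described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: rerank and overlap_score are hypothetical names, the candidates stand in for hits returned by Elasticsearch, and overlap_score is a toy word-overlap scorer used in place of a call to the PAI-EAS endpoint so that the example runs offline.

```python
from typing import Callable, Dict, List, Sequence


def rerank(
    query: str,
    candidates: Sequence[Dict],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Dict]:
    """Score each candidate against the query and return the top_n by score.

    score_fn is a stand-in for the fine-tuned reranker; in the pipeline above
    that role is played by the model hosted on PAI-EAS.
    """
    scored = [{**c, "rerank_score": score_fn(query, c["content"])} for c in candidates]
    scored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return scored[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer (assumption): fraction of query words that appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


if __name__ == "__main__":
    # Hypothetical candidates standing in for Elasticsearch hits.
    hits = [
        {"id": 1, "content": "reset your password"},
        {"id": 2, "content": "rotate the auth token"},
    ]
    for hit in rerank("rotate token", hits, overlap_score, top_n=2):
        print(hit["id"], hit["rerank_score"])
```

Because the scorer is injected, the RAG application and the recommendation surface could share one rerank function and differ only in how they obtain candidates.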
model:register and pipeline:ingest permissions
knn and neural-search plugins installed
dims in the ES knn mapping; otherwise, ingestion fails with IllegalArgumentException.
rerank.top_k ≤ 50 to avoid cascading timeouts.
sync-schema must be re-run, or the custom ranker will silently drop missing features.
neural_search.auth_token rotation in ES cluster settings.

Q: How do I fine-tune a custom model on PAI and use it to build a RAG chatbot with semantic recommendations?
A: You can build this end-to-end pipeline by fine-tuning a custom embedding and reranking model on PAI, deploying it to Bailian for Elasticsearch neural reranking, and then constructing a RAG chatbot alongside AIRec-powered semantic recommendations. This workflow establishes an optimized search layer where the custom-trained model directly improves retrieval quality for both the conversational and recommendation surfaces.
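Two of the constraints noted earlier, the dims value in the ES knn mapping and the rerank.top_k ceiling of 50, can be checked before ingestion rather than discovered through an IllegalArgumentException or a timeout. The sketch below is one possible preflight check. The function names, the assumed mapping shape (properties, then field name, then dims), and the 768-dimension value are illustrative assumptions, not values from the pipeline.

```python
MAX_RERANK_TOP_K = 50  # ceiling taken from the rerank.top_k note


def check_dims(mapping: dict, field: str, model_dims: int) -> None:
    """Raise if the mapping's declared dims differ from the model's output size.

    Assumes a mapping shaped like {"properties": {<field>: {"dims": <int>}}}.
    """
    declared = mapping.get("properties", {}).get(field, {}).get("dims")
    if declared is None:
        raise KeyError(f"no 'dims' declared for field {field!r}")
    if declared != model_dims:
        raise ValueError(
            f"model emits {model_dims}-dim vectors but mapping declares dims={declared}"
        )


def check_top_k(top_k: int) -> None:
    """Raise if rerank.top_k falls outside the range 1..MAX_RERANK_TOP_K."""
    if not 1 <= top_k <= MAX_RERANK_TOP_K:
        raise ValueError(f"rerank.top_k must be between 1 and {MAX_RERANK_TOP_K}, got {top_k}")


if __name__ == "__main__":
    # Illustrative mapping; 768 is an assumed embedding size.
    mapping = {"properties": {"text_embedding": {"dims": 768}}}
    check_dims(mapping, "text_embedding", model_dims=768)
    check_top_k(50)
    print("preflight checks passed")
```

Running such a check in the deployment script makes a dimension mismatch fail fast, with a message naming both values.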