A developer fine-tunes a custom embedding or reranking model on PAI using domain-specific search data, deploys it as a managed inference endpoint on Bailian, then integrates it into Elasticsearch for neural reranking to optimize search result relevance for their application.
Use this pipeline when generic search models fail to capture domain-specific terminology or ranking preferences. By fine-tuning a custom embedding or cross-encoder reranker on PAI, deploying it as a managed Bailian endpoint, and wiring it into Elasticsearch’s neural reranking pipeline, you get search relevance tuned to your domain without provisioning dedicated inference clusters.
Upload the training data to OSS with `ossutil cp ./train_data.jsonl oss://<bucket>/pai-training/`, then follow the `pai-deploy-inference` intent pattern. Configure the training spec:

```bash
pai-cli job submit --name rerank-ft \
  --image registry.cn-hangzhou.aliyuncs.com/pai/pytorch:2.1 \
  --code oss://<bucket>/pai-training/train.py \
  --data oss://<bucket>/pai-training/ \
  --output oss://<bucket>/pai-training/model_artifacts/ \
  --instance_type ml.gn7i-c8g1.2xlarge
```
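The source does not specify the layout of `train_data.jsonl`. As a minimal sketch, assuming one JSON object per line with hypothetical `query`, `doc`, and `label` fields (adapt these to whatever your `train.py` actually reads), labelled pairs could be serialized like this before uploading:

```python
import json
from typing import Iterable, Tuple


def to_jsonl(examples: Iterable[Tuple[str, str, int]]) -> str:
    """Serialize (query, doc, label) triples as JSON Lines.

    The field names are an assumed schema, not one defined by PAI.
    """
    lines = []
    for query, doc, label in examples:
        if not query.strip() or not doc.strip():
            raise ValueError("query and doc must be non-empty")
        if label not in (0, 1):
            raise ValueError("label must be 0 (irrelevant) or 1 (relevant)")
        record = {"query": query, "doc": doc, "label": label}
        # ensure_ascii=False keeps non-ASCII domain terms readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


examples = [
    ("enterprise storage", "Rack-mounted storage array for data centers", 1),
    ("enterprise storage", "Portable USB drive for home use", 0),
]
jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

Writing `jsonl_text` to `./train_data.jsonl` produces the file that the `ossutil cp` command above uploads. Rejecting empty text and unexpected labels early is cheaper than discovering them after a GPU job has started.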
Deploy the model artifacts using `bailian-deploy-model`:

```bash
curl -X POST https://dashscope.aliyuncs.com/api/v1/models/deploy \
  -H "Authorization: Bearer $BAILIAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name": "custom-rerank-v1", "source_uri": "oss://<bucket>/pai-training/model_artifacts/", "instance_type": "ml.gu7i.large"}'
```
Use `es-optimize-results` routing to install the `neural-search` plugin. Create a search pipeline that proxies to Bailian:

```json
PUT /_search/pipeline/bailian-rerank
{
  "description": "Neural rerank via Bailian",
  "request_processors": [
    {
      "neural_rerank": {
        "model_id": "custom-rerank-v1",
        "endpoint": "https://dashscope.aliyuncs.com/api/v1/services/rerank",
        "field": "query_text",
        "top_k": 10,
        "headers": {"Authorization": "Bearer $BAILIAN_API_KEY"}
      }
    }
  ]
}
```
```json
GET /catalog/_search?search_pipeline=bailian-rerank
{
  "query": {
    "multi_match": {
      "query": "enterprise storage",
      "fields": ["title^2", "desc"]
    }
  },
  "size": 5
}
```
Training data flows from OSS into PAI-DLC for distributed fine-tuning. The resulting model weights are persisted in OSS and registered in Bailian’s model catalog. Bailian provisions a serverless inference endpoint that exposes a REST API compatible with ES/OpenSearch neural plugins. At query time, ES performs initial BM25/vector retrieval, forwards top-N candidates to the Bailian endpoint for cross-encoder scoring, and reorders results before returning them to the client.
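The query-time flow described above (first-stage retrieval, cross-encoder scoring of the top-N candidates, reordering) can be sketched in plain Python. This is an illustration of the reordering logic only, not the plugin's implementation: `rerank_hits`, `overlap_scorer`, and the `_rerank_score` key are hypothetical names, and the token-overlap scorer stands in for the remote Bailian call so the example runs offline.

```python
from typing import Callable, Dict, List


def rerank_hits(
    query: str,
    hits: List[Dict],
    score_pairs: Callable[[List[List[str]]], List[float]],
    top_k: int = 10,
    text_field: str = "desc",
) -> List[Dict]:
    """Rescore the first top_k hits and return all hits in the new order."""
    head, tail = hits[:top_k], hits[top_k:]
    if not head:
        return list(hits)
    # A cross-encoder scores each (query, passage) pair jointly.
    pairs = [[query, hit["_source"][text_field]] for hit in head]
    scores = score_pairs(pairs)
    if len(scores) != len(head):
        raise ValueError("scorer returned a different number of scores than pairs")
    # sorted() is stable, so ties keep their first-stage order.
    order = sorted(range(len(head)), key=lambda i: scores[i], reverse=True)
    reranked = [{**head[i], "_rerank_score": scores[i]} for i in order]
    # Hits beyond top_k are not rescored and keep their original positions.
    return reranked + tail


def overlap_scorer(pairs: List[List[str]]) -> List[float]:
    """Stand-in for the remote model: counts tokens shared by query and passage."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


hits = [
    {"_id": "a", "_source": {"desc": "consumer usb drive"}},
    {"_id": "b", "_source": {"desc": "enterprise storage array"}},
    {"_id": "c", "_source": {"desc": "storage shelf"}},
]
reranked = rerank_hits("enterprise storage", hits, overlap_scorer, top_k=3)
print([h["_id"] for h in reranked])  # ['b', 'c', 'a']
```

Passing the scorer in as a callable keeps the reordering logic testable without network access; in a real deployment it would wrap the HTTP call to the inference endpoint.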
Prerequisites:

- PAI training instance (`ml.gn7i-c8g1.2xlarge` or higher)
- `Model:Deploy` and `Inference:Invoke` permissions
- `neural-search` plugin installed

Common pitfalls:

- The endpoint returns `{"output": {"scores": [...]}}` while ES expects a flat array. Wrap the endpoint with a lightweight proxy or set `response_parser: "dashscope_v2"` in the pipeline config.
- Timeouts surface as `search_phase_execution_exception`. Set `timeout: "2s"` in the pipeline and enable Bailian `keep_alive: 300s`.
- Older `neural-search` releases lack custom header support. Upgrade to `opensearch-neural-search:2.10+` and verify the `request_processors` syntax.
- The reranker expects `[query, doc]` pairs. If ES sends raw text without pairing, scores default to 0.0. Ensure `field_mapping` in the pipeline explicitly binds query and passage fields.

Q: How do I fine-tune and deploy a custom search relevance model for Elasticsearch?

A: Fine-tune a custom embedding or reranking model on PAI, deploy it as a managed inference endpoint on Bailian, and integrate it into Elasticsearch for neural reranking. This pipeline uses domain-specific search data to optimize your application's search result relevance.
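For the response-shape and pairing pitfalls, the core of a lightweight proxy is two small transformations. The sketch below assumes the response shape quoted above (`{"output": {"scores": [...]}}`); the function names `flatten_scores` and `build_pairs` are hypothetical, and the HTTP layer around them is left out.

```python
from typing import Any, Dict, List


def flatten_scores(payload: Dict[str, Any]) -> List[float]:
    """Turn {"output": {"scores": [...]}} into a flat list of floats."""
    try:
        scores = payload["output"]["scores"]
    except (KeyError, TypeError) as exc:
        raise ValueError("payload has no output.scores field") from exc
    if not isinstance(scores, list):
        raise ValueError("output.scores is not a list")
    return [float(s) for s in scores]


def build_pairs(query: str, passages: List[str]) -> List[List[str]]:
    """Pair the query with every passage so the model never sees unpaired text."""
    if not query.strip():
        raise ValueError("query must be non-empty")
    # Passages are kept in order so score i still belongs to passage i.
    return [[query, passage] for passage in passages]


pairs = build_pairs("enterprise storage", ["storage array", "usb drive"])
flat = flatten_scores({"output": {"scores": [0.91, 0.07]}})
print(pairs)
print(flat)
```

Raising on a malformed payload, rather than returning an empty list, makes a shape mismatch visible immediately instead of silently producing results that look unranked.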