
Custom Search Relevance Model Pipeline

A developer fine-tunes a custom embedding or reranking model on PAI using domain-specific search data, deploys it as a managed inference endpoint on Bailian, then integrates it into Elasticsearch for neural reranking to optimize search result relevance for their application.

Products involved

Scenario

Use this pipeline when generic search models fail to capture domain-specific terminology or ranking preferences. Fine-tuning a custom embedding model or cross-encoder reranker on PAI, deploying it as a managed Bailian endpoint, and wiring it into Elasticsearch’s neural reranking pipeline gives you relevance tuned to your domain without provisioning dedicated inference clusters.

Integration steps

Stage training data in OSS: Upload query-document relevance pairs to an OSS bucket. Run ossutil cp ./train_data.jsonl oss://<bucket>/pai-training/.
Fine-tune on PAI: Trigger a PAI-DLC job using the pai-deploy-inference intent pattern. Configure the training spec:

```bash
pai-cli job submit --name rerank-ft \
  --image registry.cn-hangzhou.aliyuncs.com/pai/pytorch:2.1 \
  --code oss://<bucket>/pai-training/train.py \
  --data oss://<bucket>/pai-training/ \
  --output oss://<bucket>/pai-training/model_artifacts/ \
  --instance_type ml.gn7i-c8g1.2xlarge
```

Deploy via Bailian: Register the checkpoint and provision an endpoint using bailian-deploy-model:

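```bash
# Send the body as JSON; without an explicit Content-Type, curl -d defaults to form encoding.
curl -X POST https://dashscope.aliyuncs.com/api/v1/models/deploy \
  -H "Authorization: Bearer $BAILIAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "custom-rerank-v1",
    "source_uri": "oss://<bucket>/pai-training/model_artifacts/",
    "instance_type": "ml.gu7i.large"
  }'
```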

Configure ES rerank pipeline: Follow es-optimize-results routing to install the neural-search plugin. Create a search pipeline that proxies to Bailian:

```json
PUT /_search/pipeline/bailian-rerank
{
  "description": "Neural rerank via Bailian",
  "request_processors": [
    {
      "neural_rerank": {
        "model_id": "custom-rerank-v1",
        "endpoint": "https://dashscope.aliyuncs.com/api/v1/services/rerank",
        "field": "query_text",
        "top_k": 10,
        "headers": {"Authorization": "Bearer $BAILIAN_API_KEY"}
      }
    }
  ]
}
```

Execute reranked queries: Attach the pipeline to your search request:

```json
GET /catalog/_search?search_pipeline=bailian-rerank
{
  "query": {
    "multi_match": {
      "query": "enterprise storage",
      "fields": ["title^2", "desc"]
    }
  },
  "size": 5
}
```
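The final step can also be issued from application code. The following is a minimal sketch, assuming a cluster reachable at a base URL of your choice; the function name `build_search_request` and the localhost address are illustrative and not part of any client library. It builds the request without sending it, and it uses POST, which the `_search` API accepts alongside GET, because some HTTP clients drop bodies on GET requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, index, query_text,
                         pipeline="bailian-rerank", size=5):
    """Build (but do not send) a search request that attaches the pipeline."""
    # The pipeline is selected through the search_pipeline query parameter.
    params = urlencode({"search_pipeline": pipeline})
    url = f"{base_url.rstrip('/')}/{index}/_search?{params}"
    body = {
        "query": {
            "multi_match": {"query": query_text, "fields": ["title^2", "desc"]}
        },
        "size": size,
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local cluster address, used only for illustration.
    req = build_search_request("http://localhost:9200", "catalog",
                               "enterprise storage")
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; authentication and TLS settings depend on your cluster and are omitted here.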

Architecture

Training data flows from OSS into PAI-DLC for distributed fine-tuning. The resulting model weights are persisted in OSS and registered in Bailian’s model catalog. Bailian provisions a serverless inference endpoint that exposes a REST API compatible with ES/OpenSearch neural plugins. At query time, ES performs initial BM25/vector retrieval, forwards top-N candidates to the Bailian endpoint for cross-encoder scoring, and reorders results before returning them to the client.
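The query-time flow described above can be sketched as a two-stage rerank. This is an illustration under stated assumptions, not the plugin's actual implementation: `score_fn` is a hypothetical stand-in for the call to the Bailian endpoint, and the `desc` field and the `rerank_score` key are chosen here for the example.

```python
from typing import Callable, Dict, List, Sequence


def rerank(
    query: str,
    candidates: Sequence[Dict],
    score_fn: Callable[[List[List[str]]], List[float]],
    text_field: str = "desc",
    top_k: int = 10,
) -> List[Dict]:
    """Reorder first-stage hits by cross-encoder score, highest first."""
    if not candidates:
        return []
    # A cross-encoder scores [query, document] pairs, not documents alone.
    pairs = [[query, str(hit.get(text_field, ""))] for hit in candidates]
    scores = score_fn(pairs)
    if len(scores) != len(candidates):
        raise ValueError("scorer returned a different number of scores than pairs")
    # sorted() is stable, so ties keep their first-stage (BM25/vector) order.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [dict(hit, rerank_score=score) for hit, score in ranked[:top_k]]


if __name__ == "__main__":
    def toy_scorer(pairs):
        # Toy scorer for the demo: counts words shared by query and document.
        return [
            float(len(set(q.lower().split()) & set(d.lower().split())))
            for q, d in pairs
        ]

    hits = [
        {"id": "a", "desc": "consumer laptop"},
        {"id": "b", "desc": "enterprise storage array"},
    ]
    for hit in rerank("enterprise storage", hits, toy_scorer, top_k=2):
        print(hit["id"], hit["rerank_score"])
```

Keeping the scorer behind a callable makes the reordering logic testable without network access, and the length check surfaces truncated endpoint responses early.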

Prerequisites

Active PAI workspace with GPU quota (ml.gn7i-c8g1.2xlarge or higher)
Bailian API key with Model:Deploy and Inference:Invoke permissions
Elasticsearch/OpenSearch cluster (v7.10+/v2.0+) with neural-search plugin installed; custom header support requires opensearch-neural-search 2.10+ (see Common pitfalls)
OSS bucket configured for artifact storage and lifecycle management
Curated dataset of query-doc pairs with relevance labels (0/1 or 1–5 scale)
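The dataset prerequisite can be illustrated with a small sketch. The field names `query`, `doc`, and `label` are assumptions made for this example; the source does not specify the schema that train.py expects, so adapt them to your training script. The sketch checks labels against the two scales listed above and serializes valid records as one JSON object per line.

```python
import json

# Hypothetical schema: these keys are illustrative assumptions,
# not a format mandated by PAI or Bailian.
REQUIRED_KEYS = ("query", "doc", "label")


def validate_record(record, scale="binary"):
    """Return True if the record has all keys and a label on the chosen scale."""
    if any(key not in record for key in REQUIRED_KEYS):
        return False
    label = record["label"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(label, bool) or not isinstance(label, int):
        return False
    if scale == "binary":
        return label in (0, 1)
    if scale == "graded":
        return 1 <= label <= 5
    raise ValueError(f"unknown scale: {scale}")


def to_jsonl(records, scale="binary"):
    """Serialize valid records as JSON Lines, skipping invalid ones."""
    lines = [
        json.dumps(record, ensure_ascii=False)
        for record in records
        if validate_record(record, scale)
    ]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    sample = [
        {"query": "enterprise storage", "doc": "All-flash storage array", "label": 1},
        {"query": "enterprise storage", "doc": "Office chair", "label": 0},
    ]
    print(to_jsonl(sample), end="")
```

Writing the returned string to `train_data.jsonl` would produce a file matching the name used in the staging step.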

Common pitfalls

Response format mismatch: Bailian returns scores as {"output": {"scores": [...]}} while ES expects a flat array. Wrap the endpoint with a lightweight proxy or set response_parser: "dashscope_v2" in the pipeline config.
ES timeout on inference: Bailian cold starts or high concurrency can push latency >1s, triggering search_phase_execution_exception. Set timeout: "2s" in the pipeline and enable Bailian keep_alive: 300s.
Plugin version drift: Older neural-search releases lack custom header support. Upgrade to opensearch-neural-search:2.10+ and verify request_processors syntax.
Dimension/schema mismatch: Fine-tuned rerankers expect [query, doc] pairs. If ES sends raw text without pairing, scores default to 0.0. Ensure field_mapping in the pipeline explicitly binds query and passage fields.
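For the response format mismatch, the lightweight proxy mentioned above mostly needs to unwrap the nested payload. The following is a minimal sketch, assuming the response shape quoted in that pitfall; the function names are hypothetical, and the all-zero check is only a heuristic for spotting the unpaired-input symptom described in the last pitfall.

```python
from typing import Any, Dict, List


def flatten_scores(response: Dict[str, Any]) -> List[float]:
    """Unwrap {"output": {"scores": [...]}} into a flat list of floats."""
    try:
        scores = response["output"]["scores"]
    except (KeyError, TypeError) as exc:
        raise ValueError("response does not contain output.scores") from exc
    if not isinstance(scores, list):
        raise ValueError("output.scores is not a list")
    return [float(score) for score in scores]


def all_zero(scores: List[float]) -> bool:
    """Heuristic: a non-empty, all-zero result may indicate unpaired inputs."""
    return bool(scores) and all(score == 0.0 for score in scores)


if __name__ == "__main__":
    raw = {"output": {"scores": [0.12, 0.87, 0.45]}}
    flat = flatten_scores(raw)
    print(flat)
    if all_zero(flat):
        print("warning: every score is 0.0; check the query/passage binding")
```

Raising on a malformed payload, rather than returning an empty list, keeps a broken endpoint from silently producing unranked results.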

Typical questions

fine-tune custom reranking model for elasticsearch
train embedding model and deploy to improve search
custom neural search pipeline
PAI model for ES relevance
train and deploy search relevance model
fine-tune a ranking model to optimize search
train a custom embedding model to improve ES search
custom model to optimize Elasticsearch ranking

FAQ

Q: How do I fine-tune and deploy a custom search relevance model for Elasticsearch? A: You can implement this by fine-tuning a custom embedding or reranking model on PAI, deploying it as a managed inference endpoint on Bailian, and integrating it into Elasticsearch for neural reranking. This pipeline uses domain-specific search data to optimize your application's search result relevance.