A team trains domain-specific embedding models and fine-tunes LLMs on PAI using curated datasets, deploys a production-grade hybrid retrieval pipeline (custom vector + BM25 with reranking) into OpenSearch/Elasticsearch, then layers AIRec on top to deliver personalized, semantically rich search results. The workflow combines production RAG hardening (fine-tuning, reranking, deployment best practices) with a recommendation engine that tailors results per user.
Use this workflow when off-the-shelf embeddings fail to capture domain-specific terminology and standard search lacks user-level personalization. It combines PAI-trained custom embeddings and fine-tuned LLMs with OpenSearch/Elasticsearch hybrid retrieval (vector + BM25 + reranking), then layers AIRec to dynamically rank results based on real-time user behavior and historical preferences.
1. Upload the corpus to OSS:
   ossutil cp -r ./domain_data oss://<bucket>/rag-corpus/ --include "*.pdf" --include "*.json"
2. Train the custom embedding model on PAI:
   pai submit --job-name custom-emb --image registry.cn-hangzhou.aliyuncs.com/pai/nlp-emb:latest --data oss://<bucket>/rag-corpus/ --output oss://<bucket>/models/emb-v1/ --config '{"dim": 768, "batch_size": 32}'
3. Fine-tune the LLM:
   bailian model finetune --base qwen-turbo --dataset oss://<bucket>/rag-corpus/instructions.jsonl --output oss://<bucket>/models/llm-ft/
4. Create the index with knn and text mappings, then bulk-ingest vectors:
   curl -X POST "https://<opensearch-endpoint>/rag-index/_bulk" -H "Content-Type: application/json" --data-binary @vectors.json
5. Run hybrid retrieval with script_score:
   {"query": {"script_score": {"query": {"match": {"content": "$query"}}, "script": {"source": "cosineSimilarity(params.query_vector, 'embedding') + 0.5 * _score", "params": {"query_vector": $query_vector}}}}}
   Pipe the top-50 results to the Bailian reranker:
   POST https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank
6. Push user behavior signals to AIRec:
   POST https://<airec-endpoint>/v2/openapi/instances/<instance-id>/actions/bulk
   Configure a ranking strategy that blends semantic relevance with AIRec's CTR prediction.
7. Run terraform apply to deploy ALinux/ECS inference nodes behind Cloudflare Workers for edge caching and JWT validation.

Raw documents and user telemetry land in OSS. PAI consumes the corpus to train domain-specific embeddings and fine-tune the LLM. The resulting vectors and metadata are bulk-indexed into OpenSearch/Elasticsearch, which executes hybrid retrieval and applies the Bailian reranker. AIRec ingests real-time signals and dynamically re-ranks candidates. The final response is served through ALinux/ECS endpoints, cached and secured at the edge via Cloudflare.
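The hybrid retrieval and re-ranking flow described above can be sketched in Python. This is a minimal illustration, not the pipeline's actual client code: build_hybrid_query, blend_scores, bm25_weight, and alpha are hypothetical names, and the client-side blend only approximates what a ranking strategy configured inside AIRec might do. The field names content and embedding are taken from the script_score step.

```python
import json
from typing import Dict, List

# Same scoring expression as the script_score step, with the BM25 weight
# lifted into a script parameter (bm25_weight is a hypothetical name).
SCRIPT_SOURCE = (
    "cosineSimilarity(params.query_vector, 'embedding')"
    " + params.bm25_weight * _score"
)


def build_hybrid_query(query_text: str, query_vector: List[float],
                       bm25_weight: float = 0.5, size: int = 50) -> Dict:
    """Build a script_score request body combining BM25 and cosine similarity."""
    return {
        "size": size,  # top-50 candidates are handed to the reranker
        "query": {
            "script_score": {
                # BM25 match produces _score for the script
                "query": {"match": {"content": query_text}},
                "script": {
                    "source": SCRIPT_SOURCE,
                    "params": {
                        "query_vector": query_vector,
                        "bm25_weight": bm25_weight,
                    },
                },
            }
        },
    }


def blend_scores(rerank_scores: Dict[str, float],
                 ctr_scores: Dict[str, float],
                 alpha: float = 0.7) -> List[str]:
    """Order document ids by alpha * rerank score + (1 - alpha) * predicted CTR.

    Assumption: both score sets are already normalized to a comparable range.
    Documents without a CTR prediction get 0.0.
    """
    blended = {
        doc_id: alpha * score + (1 - alpha) * ctr_scores.get(doc_id, 0.0)
        for doc_id, score in rerank_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)


if __name__ == "__main__":
    body = build_hybrid_query("pump seal failure", [0.1, 0.2, 0.3])
    print(json.dumps(body, indent=2))
    print(blend_scores({"doc-1": 0.9, "doc-2": 0.8},
                       {"doc-1": 0.1, "doc-2": 0.6}))
```

Passing the weight as a script parameter keeps the script source constant across requests, which makes it easier to vary the weight during offline tuning. Elasticsearch rejects negative script scores, so if cosine similarity can go negative for your vectors, adding a constant such as 1.0 to the source is a common adjustment.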
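Prerequisites:
- GPU instance (ecs.gn7i-c8g1.2xlarge)
- OpenSearch/Elasticsearch cluster (knn plugin enabled)
- RAM permissions: AliyunPAIFullAccess, AliyunOSSFullAccess, AliyunOpenSearchFullAccess
- Terraform with alicloud provider v1.210+

Pitfalls:
- Embedding dimensions must match the knn mapping; otherwise, ingestion fails or returns zero scores.
- Tune script_score weights using offline A/B testing.
- Set index.knn.memory_limit to ≤ 50% of JVM heap and disable doc_values on unused fields.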
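A mismatch between the vectors and the knn mapping can be caught before ingestion rather than after it. The sketch below assumes the mismatch in question is embedding dimensionality (768 in the training config) and that each document is a dict with id, content, and embedding keys. The id key and both function names are hypothetical. It also emits the newline-delimited payload the _bulk endpoint expects.

```python
import json
from typing import Dict, List

EXPECTED_DIM = 768  # assumed to equal the "dim" value in the training config


def find_dim_mismatches(docs: List[Dict],
                        expected_dim: int = EXPECTED_DIM) -> List[str]:
    """Return ids of documents whose embedding length differs from the mapping."""
    return [d["id"] for d in docs if len(d["embedding"]) != expected_dim]


def to_bulk_ndjson(docs: List[Dict], index: str = "rag-index") -> str:
    """Serialize documents as an action line plus a source line per document."""
    lines = []
    for d in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": d["id"]}}))
        lines.append(json.dumps({"content": d["content"],
                                 "embedding": d["embedding"]}))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        {"id": "doc-1", "content": "seal replacement", "embedding": [0.0] * 768},
        {"id": "doc-2", "content": "valve torque", "embedding": [0.0] * 512},
    ]
    bad = find_dim_mismatches(docs)
    if bad:
        print("dimension mismatch, skipping:", bad)
    good = [d for d in docs if d["id"] not in bad]
    with open("vectors.json", "w", encoding="utf-8") as fh:
        fh.write(to_bulk_ndjson(good))
```

When posting the resulting file with curl, --data-binary preserves the newlines that separate bulk actions, whereas -d strips them.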