A team trains domain-specific embedding models and fine-tunes LLMs on PAI, deploys a hybrid retrieval pipeline (vector + BM25) into OpenSearch, then layers AIRec on top to deliver personalized, semantically aware recommendations. The result is a train-to-serve pipeline in which custom models power both retrieval and recommendation.
Use this integration when off-the-shelf embeddings and generic recommendation algorithms fail to capture domain-specific terminology or user intent. Training custom models on PAI, indexing their vectors alongside text for hybrid retrieval in OpenSearch, and orchestrating personalized ranking through AIRec grounds recommendations in proprietary enterprise data.
1. Upload the raw data to OSS: ossutil cp -r ./data oss://<bucket>/raw/
2. Submit the training job on PAI: pai submit --job-name custom-emb --oss-path oss://<bucket>/raw/ --framework pytorch --instance-type ecs.gn7i-c8g1.2xlarge
3. Serve the trained model on the ALinux ECS instance: torchserve --start --model-store /opt/models --models custom_emb.mar
4. Deploy the model in OpenSearch: POST /_plugins/_ml/models/_deploy with {"model_id": "custom-emb-v1"}. Create an index with knn (vector) and text (BM25) mappings, then bulk-index vectors generated by your ALinux endpoint.
5. In AIRec, configure semantic_recall and enable hybrid_ranking with {"bm25_weight": 0.3, "vector_weight": 0.7}.
6. Deploy the recommendation service with airec-deploy-service, or call POST /v1/instances/{instanceId}/deploy with payload {"service_type": "recommendation", "model_version": "custom_v1"}. Verify via GET /v1/instances/{instanceId}/recommend.

Raw documents and logs reside in OSS. PAI trains domain-specific embeddings, which are served on an ALinux ECS instance for real-time vectorization. OpenSearch stores the indexed vectors and handles hybrid retrieval (k-NN + BM25). AIRec consumes these semantic signals alongside behavioral data to perform personalized ranking, and exposes a unified recommendation API to downstream applications.
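The index-creation and bulk-indexing step above can be sketched offline as plain request bodies. This is a minimal sketch, assuming the open-source OpenSearch k-NN plugin mapping syntax (a `knn_vector` field with a `dimension`), which the `_plugins` path in the steps suggests but the source does not confirm. The field names `embedding` and `content`, the index name, and the dimension of 768 are hypothetical; nothing here contacts a cluster.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Assumed value: must equal the output size of your own embedding model.
EMBEDDING_DIM = 768


def build_index_body(dim: int = EMBEDDING_DIM) -> Dict:
    """Index definition with one vector field (k-NN) and one text field (BM25)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Hypothetical field names chosen for illustration.
                "embedding": {"type": "knn_vector", "dimension": dim},
                "content": {"type": "text"},
            }
        },
    }


def build_bulk_lines(
    index: str,
    docs: Iterable[Tuple[str, str, List[float]]],
    dim: int = EMBEDDING_DIM,
) -> str:
    """Build an NDJSON body for the _bulk endpoint.

    Vectors whose length differs from the mapped dimension are rejected
    before anything is sent to the cluster.
    """
    lines = []
    for doc_id, content, vector in docs:
        if len(vector) != dim:
            raise ValueError(
                f"doc {doc_id}: expected {dim} dimensions, got {len(vector)}"
            )
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"content": content, "embedding": vector}))
    return "\n".join(lines) + "\n"  # _bulk bodies end with a newline


def build_knn_query(vector: List[float], k: int = 10) -> Dict:
    """k-NN search body, usable with a dummy vector as a smoke test."""
    return {"size": k, "query": {"knn": {"embedding": {"vector": vector, "k": k}}}}
```

The body from `build_index_body()` would be sent when creating the index, and the string from `build_bulk_lines()` to `_bulk`. Checking vector length on the client side surfaces a dimension mismatch as a local `ValueError` rather than as an indexing error from the cluster.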
- An ECS instance (ecs.gn7i series) for low-latency model inference.
- Vectors must match the knn field definition in OpenSearch, or indexing fails with mapper_parsing_exception.
- A high BM25 weight (>0.5) in AIRec’s ranking config often drowns out semantic recall; start with 0.3 BM25 / 0.7 vector and tune via offline A/B testing.
- Run a test query (POST /<index>/_search with dummy vectors) before routing production traffic.
- [...] ConnectionRefused during deployment.

Q: How does the custom-trained RAG pipeline integrate model training, hybrid search, and personalized recommendations?
A: Custom models trained on PAI power both retrieval and recommendation, forming a complete train-to-serve pipeline. A hybrid vector and BM25 retrieval pipeline is deployed into OpenSearch, and AIRec is layered on top to deliver semantically aware, personalized suggestions.
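The effect of the 0.3 BM25 / 0.7 vector starting point can be explored offline before touching any ranking config. The following is a generic weighted-sum sketch with min-max normalization; it is not AIRec's actual ranking formula, which the source does not describe. The function names and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    bm25_weight: float = 0.3,
    vector_weight: float = 0.7,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores, highest first.

    A document missing from one result list contributes 0 for that signal.
    """
    b, v = min_max(bm25), min_max(vector)
    fused = {
        doc_id: bm25_weight * b.get(doc_id, 0.0) + vector_weight * v.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }
    # Sort by score descending, then by id for a stable order on ties.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up scores: "a" wins on keywords, "b" wins on semantic similarity.
    bm25_scores = {"a": 12.0, "b": 4.0, "c": 8.0}
    vector_scores = {"a": 0.2, "b": 0.9, "c": 0.6}
    print(fuse(bm25_scores, vector_scores))            # vector-leaning weights
    print(fuse(bm25_scores, vector_scores, 0.7, 0.3))  # BM25-leaning weights
```

With these sample scores, the vector-leaning weights rank the semantically closest document first, while the BM25-leaning weights put the keyword match back on top. That reversal is the behavior the BM25 weight note warns about.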