AIRec with Custom Models and Semantic Search

Train and serve custom embedding and ranking models on Alibaba Cloud Linux instances, index the resulting embeddings in OpenSearch for vector-based semantic retrieval, and use AIRec to orchestrate the full recommendation pipeline. The pipeline combines custom model inference with semantic search to produce higher-quality personalized recommendations.

Products involved

AIRec
OpenSearch
Alibaba Cloud Linux (ALinux)

Scenario

Use this integration when standard recommendation algorithms lack domain-specific context or semantic understanding. By training custom embedding and ranking models on Alibaba Cloud Linux (ALinux) instances, indexing vectors in OpenSearch, and orchestrating the pipeline through AIRec, you achieve highly personalized, context-aware recommendations that combine semantic recall with fine-tuned ranking.

Integration steps

Deploy custom models on ALinux (alinux-deploy-model): Provision a GPU-enabled ECS instance and serve your PyTorch/ONNX model via FastAPI:

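```bash
# RunCommand needs a region, a command type, and an indexed instance list.
aliyun ecs RunCommand \
  --RegionId <region-id> \
  --Type RunShellScript \
  --InstanceId.1 i-uf6xxxx \
  --CommandContent "pip install fastapi uvicorn torch && python serve_model.py --port 8080"
```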

Expose inference endpoint: Verify VPC connectivity to http://<alinux-vpc-ip>:8080/infer (embedding) and /rank (ranking).
Configure OpenSearch vector index (opensearch-deploy-model): Create an index with dense_vector mapping matching your model’s output dimension:

```json
PUT /airec_semantic_index
{
  "mappings": {
    "properties": {
      "item_id": { "type": "keyword" },
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```
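To keep dims in step with the model, the index body can be generated from the model's output size instead of being typed by hand. The following is a minimal Python sketch that reproduces the mapping shown above; `build_index_mapping` is a hypothetical helper name used for illustration and is not part of any OpenSearch SDK.

```python
import json


def build_index_mapping(dims: int, similarity: str = "cosine") -> dict:
    """Return an index body whose vector field has exactly `dims` dimensions."""
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    return {
        "mappings": {
            "properties": {
                "item_id": {"type": "keyword"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,  # must equal the model's output vector size
                    "index": True,
                    "similarity": similarity,
                },
            }
        }
    }


def mapping_as_json(dims: int) -> str:
    """Serialize the mapping so it can be sent as the PUT request body."""
    return json.dumps(build_index_mapping(dims), indent=2)
```

Calling `mapping_as_json(768)` yields the same body as the request above; passing the dimension reported by the model at load time avoids hard-coding it in two places.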

Ingest embeddings: Batch-encode catalog items via the ALinux endpoint and push to OpenSearch:

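```bash
# --data-binary preserves the newlines that the bulk format depends on; -d would strip them.
curl -X POST "http://<opensearch-host>:9200/airec_semantic_index/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @batch_vectors.json
```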
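The bulk payload is newline-delimited JSON: one action line followed by one document line per item, with a trailing newline. The sketch below shows one way to build that payload and to inspect the response, assuming the endpoint follows the Elasticsearch-style _bulk format that the curl call above implies. `build_bulk_body` and `failed_items` are hypothetical helper names, and the document fields mirror the item_id and embedding fields from the index mapping.

```python
import json
from typing import Iterable, List, Tuple


def build_bulk_body(items: Iterable[Tuple[str, List[float]]], dims: int) -> str:
    """Serialize (item_id, vector) pairs as newline-delimited JSON for _bulk."""
    lines = []
    for item_id, vector in items:
        # Catch dimension mismatches locally, before the request is sent.
        if len(vector) != dims:
            raise ValueError(f"{item_id}: expected {dims} dims, got {len(vector)}")
        # Action line: index the document under its item_id.
        lines.append(json.dumps({"index": {"_id": item_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({"item_id": item_id, "embedding": vector}))
    if not lines:
        return ""
    # The bulk format requires a newline after the last line.
    return "\n".join(lines) + "\n"


def failed_items(bulk_response: dict) -> List[dict]:
    """Return the per-item results that carry an error."""
    if not bulk_response.get("errors"):
        return []
    return [
        result
        for entry in bulk_response.get("items", [])
        for result in entry.values()
        if "error" in result
    ]
```

In this response format a bulk request can return HTTP 200 while individual items fail, so checking the top-level errors flag and the per-item results is more reliable than checking the status code alone.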

Provision AIRec service (airec-deploy-service): Initialize the recommendation instance:

```bash
aliyun airec CreateInstance \
  --InstanceType standard \
  --RegionId cn-hangzhou \
  --Name "custom-semantic-rec"
```

Wire data sources: In AIRec, configure OpenSearch as the semantic recall source and register the ALinux /rank endpoint as a custom ranking service via aliyun airec UpdateDataSource --InstanceId <id> --Config '{"recall":"opensearch","rank":"http://<alinux-ip>:8080/rank"}'.
Orchestrate pipeline: Define the execution flow: SemanticRecall (OpenSearch) -> CustomRank (ALinux) -> DiversityFilter -> Serve. Validate with aliyun airec CreatePipeline --InstanceId <id> --Config pipeline_v1.json.
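The stage order can be prototyped in plain Python before it is configured in AIRec. The sketch below is an in-process illustration of the flow SemanticRecall -> CustomRank -> DiversityFilter -> Serve and is not AIRec's configuration format. The `recall` and `rank` callables stand in for the OpenSearch query and the ALinux /rank call, and the per-category cap is an assumed diversity rule chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]  # e.g. {"item_id": "sku-1", "category": "shoes", "score": 0.9}


def diversity_filter(ranked: List[Candidate], max_per_category: int) -> List[Candidate]:
    """Keep at most `max_per_category` items per category, preserving rank order."""
    seen: Dict[Any, int] = {}
    kept: List[Candidate] = []
    for cand in ranked:
        category = cand.get("category")
        if seen.get(category, 0) < max_per_category:
            kept.append(cand)
            seen[category] = seen.get(category, 0) + 1
    return kept


def run_pipeline(
    user_id: str,
    recall: Callable[[str], List[Candidate]],
    rank: Callable[[str, List[Candidate]], List[Candidate]],
    max_per_category: int = 2,
    top_n: int = 10,
) -> List[Candidate]:
    """Recall candidates, score them, diversify, and return the top results."""
    candidates = recall(user_id)            # SemanticRecall
    scored = rank(user_id, candidates)      # CustomRank: adds a "score" to each candidate
    ranked = sorted(scored, key=lambda c: c["score"], reverse=True)
    diversified = diversity_filter(ranked, max_per_category)  # DiversityFilter
    return diversified[:top_n]              # Serve
```

Passing the stages in as callables keeps the sketch testable with stubs, which is useful for checking the diversity rule before any service is wired up.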

Architecture

Item metadata flows through the ALinux-hosted embedding model, generating vectors stored in OpenSearch. At query time, AIRec triggers semantic recall against OpenSearch, retrieves candidate items, and passes them to the ALinux ranking endpoint for personalized scoring. AIRec handles business logic, diversity filtering, and final response formatting, while ALinux and OpenSearch handle heavy ML inference and vector search respectively.
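To make the recall step concrete, the sketch below scores a query vector against stored item vectors by cosine similarity and returns the closest items. It is a brute-force illustration of what semantic recall computes. A production vector index typically uses approximate nearest-neighbor search rather than a full scan, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k item ids most similar to the query, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The candidate ids returned by a step like this are what the ranking endpoint would then score for the specific user.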

Prerequisites

Active Alibaba Cloud account with the following RAM policies attached: AliyunAIRECFullAccess, AliyunOpenSearchFullAccess, AliyunECSFullAccess
Pre-trained embedding/ranking models exported to ONNX or TorchScript
Provisioned ALinux ECS instance (GPU recommended) with VPC peering to OpenSearch and AIRec
OpenSearch instance with vector search plugin enabled
AIRec instance with schema aligned to your item/user behavior data

Common pitfalls

Dimension mismatch: OpenSearch dims must exactly match your model’s output vector size; otherwise, _bulk ingestion fails with mapper_parsing_exception.
Cross-VPC latency: Ensure ALinux, OpenSearch, and AIRec share the same VPC or use Express Connect; public routing adds >50ms latency, breaking real-time ranking SLAs.
AIRec schema drift: AIRec requires strict item_id, user_id, and behavior fields; missing or malformed fields cause pipeline execution to silently drop candidates.
ALinux scaling bottlenecks: Without horizontal scaling (for example, multiple instances behind a load balancer) or batched inference, the ALinux endpoint becomes a throughput bottleneck during traffic spikes.
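Because schema problems can cause candidates to be dropped without an error, it can help to validate records before they reach AIRec. The sketch below is a minimal pre-ingestion check, assuming the three field names from the schema-drift pitfall above and assuming each is a non-empty string. Your instance's actual schema may use different names or types, and `split_valid` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Field names taken from the pitfall above; adjust to your instance's schema.
REQUIRED_FIELDS = ("item_id", "user_id", "behavior")

Record = Dict[str, Any]


def split_valid(records: Iterable[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Separate well-formed records from those with missing or empty required fields."""
    valid: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for record in records:
        problems = [
            field
            for field in REQUIRED_FIELDS
            if not isinstance(record.get(field), str) or not record[field].strip()
        ]
        if problems:
            # Keep the offending field names so the rejection can be logged.
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected
```

Logging the rejected list makes dropped records visible at ingestion time rather than leaving them to be discovered as missing recommendations.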

Typical questions

deploy recommendation with custom embeddings and semantic search
AIRec with OpenSearch and custom model serving
full stack recommendation pipeline with vector search
serve embedding models and feed into AIRec via OpenSearch
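deploy a recommendation system with custom models and semantic search
AIRec combined with OpenSearch and custom embedding models
end-to-end recommendation system with vector retrieval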
custom ranking model with semantic recommendation

FAQ

Q: How do I deploy a full-stack recommendation pipeline with custom models and semantic search?
A: Train and serve custom embedding or ranking models on Alibaba Cloud Linux instances, index the embeddings in OpenSearch for vector-based retrieval, and layer AIRec on top to orchestrate the complete recommendation pipeline. This combination of AIRec, OpenSearch, and Alibaba Cloud Linux delivers higher-quality personalized recommendations by uniting custom model inference with semantic search.