
AI Recommendation Platform with RAG Explanations

Deploy a custom ranking model on Alibaba Cloud Linux (ALinux), use OpenSearch for vector retrieval and AIRec for recommendation orchestration, and integrate a Bailian-hosted LLM that generates natural-language explanations and contextual answers for each recommended item. The result is a recommendation system that not only ranks candidates but also tells users why each result is relevant.

Products involved

AIRec
OpenSearch
Alibaba Cloud Linux (ALinux) on ECS
Bailian

Scenario

Use this workflow when building a recommendation engine that needs both high-precision vector recall and transparent, context-aware explanations. It suits e-commerce, content platforms, and enterprise search, where users need to understand why an item was recommended, not just see a ranked list.

Integration steps

Deploy custom ranking model on ALinux: Use the alinux-deploy-model intent to containerize and serve your PyTorch/ONNX ranking model.

```bash
alinux-cli deploy \
  --model-path ./ranking-model.onnx \
  --port 8080 \
  --instance-type ecs.g7.xlarge
```
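Before wiring AIRec to the ranker, it helps to pin down the /predict request and response contract. Below is a minimal standard-library sketch of such an endpoint on port 8080. The JSON schema (candidates, features, scores), the predict helper, and the linear WEIGHTS scorer are hypothetical stand-ins; a real service would run the ONNX model (for example with onnxruntime) instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical weights standing in for the real ONNX ranking model.
WEIGHTS = [0.6, 0.3, 0.1]


def predict(payload: dict) -> dict:
    """Score each candidate with a linear stand-in model.

    Assumed request schema: {"candidates": [{"id": str, "features": [float, ...]}]}
    Scores are returned in the same order as the input candidates.
    """
    return {
        "scores": [
            {"id": c["id"], "score": sum(w * f for w, f in zip(WEIGHTS, c["features"]))}
            for c in payload.get("candidates", [])
        ]
    }


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.dumps(predict(json.loads(self.rfile.read(length)))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the stand-in ranker on the port used by the deploy command."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Keeping the scoring logic in a plain function makes the contract testable without starting a server.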

Deploy embedding model in OpenSearch: Trigger opensearch-deploy-model to register a vector encoder for candidate retrieval. In ML Commons the model is first registered with POST /_plugins/_ml/models/_register, whose task result contains the model_id; the model is then deployed by ID, which needs no request body.

```http
POST /_plugins/_ml/models/<model_id>/_deploy
```
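Because registration and deployment both run as asynchronous tasks in ML Commons, a script has to poll GET /_plugins/_ml/tasks/<task_id> between the calls. The sketch below shows that sequence. The send callable is a hypothetical injection point for your HTTP client (for example requests or opensearch-py), and register_body must follow the ML Commons _register schema for your model type.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def wait_for_task(send: Send, task_id: str, poll_s: float = 2.0, max_polls: int = 60) -> dict:
    """Poll an ML Commons task until it reaches COMPLETED or FAILED."""
    for _ in range(max_polls):
        task = send("GET", f"/_plugins/_ml/tasks/{task_id}", None)
        state = task.get("state")
        if state == "COMPLETED":
            return task
        if state == "FAILED":
            raise RuntimeError(f"ML task {task_id} failed: {task.get('error')}")
        time.sleep(poll_s)
    raise TimeoutError(f"ML task {task_id} did not finish after {max_polls} polls")


def register_and_deploy(send: Send, register_body: dict, poll_s: float = 2.0) -> str:
    """Register a model, wait for it, deploy it by ID, and return the model_id."""
    reg = send("POST", "/_plugins/_ml/models/_register", register_body)
    model_id = wait_for_task(send, reg["task_id"], poll_s)["model_id"]
    dep = send("POST", f"/_plugins/_ml/models/{model_id}/_deploy", None)
    wait_for_task(send, dep["task_id"], poll_s)
    return model_id
```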

Index candidates with vectors: Ingest item metadata and embeddings into an OpenSearch index with k-NN enabled (index.knn) and an explicit dimension that equals the embedding model's output size (768 here).

```http
PUT /rec_candidates_v1
{
  "settings": {"index": {"knn": true}},
  "mappings": {
    "properties": {
      "embedding": {"type": "knn_vector", "dimension": 768, "method": {"name": "hnsw"}}
    }
  }
}
```
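Dimension mismatches are easiest to catch before documents reach the cluster. The sketch below validates each vector against the mapping's dimension and builds an NDJSON body for the OpenSearch _bulk API. The item shape (id, embedding, plus metadata fields) and the build_bulk_body helper are assumptions for illustration.

```python
import json
from typing import Iterable

EMBEDDING_DIM = 768  # must equal "dimension" in the rec_candidates_v1 mapping


def build_bulk_body(index: str, items: Iterable[dict], dim: int = EMBEDDING_DIM) -> str:
    """Build an NDJSON _bulk body, rejecting vectors whose length != dim.

    Each item is assumed to look like {"id": ..., "embedding": [...], <metadata>}.
    """
    lines = []
    for item in items:
        vec = item["embedding"]
        if len(vec) != dim:
            raise ValueError(f"item {item['id']}: expected {dim} dims, got {len(vec)}")
        doc = {k: v for k, v in item.items() if k != "id"}
        lines.append(json.dumps({"index": {"_index": index, "_id": item["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```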

Configure AIRec orchestration: Point AIRec to OpenSearch as the recall source and ALinux as the ranking endpoint.

```http
POST https://airec.cn-shanghai.aliyuncs.com/v2/openapi/instances/{instanceId}/actions/import

{
  "source": "opensearch",
  "endpoint": "<os-host>",
  "index": "rec_candidates_v1",
  "ranker_url": "http://<alinux-ip>:8080/predict"
}
```

Deploy Bailian LLM endpoint: Use bailian-deploy-model to provision a fine-tuned Qwen endpoint for explanation generation.

```bash
bailian-cli model deploy \
  --model-id qwen-plus \
  --endpoint-name rag-explainer \
  --max-tokens 512 \
  --temperature 0.3
```

Wire post-recommendation RAG hook: Configure AIRec’s callback_url to forward top-N results to a middleware that queries OpenSearch for context, then calls Bailian:

```http
POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
Authorization: Bearer <DASHSCOPE_API_KEY>
Content-Type: application/json

{"model": "rag-explainer", "messages": [{"role": "user", "content": "Explain why this matches: {item_context}"}]}
```
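A minimal sketch of the middleware's call to Bailian, using only the Python standard library. It assumes the item metadata has already been fetched from OpenSearch, that title, category, and summary are the relevant fields, and that the endpoint accepts rag-explainer as the model name; the prompt wording and the build_prompt, build_request, and explain helpers are hypothetical. The max_tokens and temperature values mirror the deploy command above.

```python
import json
import os
import urllib.request

DASHSCOPE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
# Hypothetical context fields; adjust to your index schema.
CONTEXT_FIELDS = ("title", "category", "summary")


def build_prompt(item: dict) -> str:
    """Turn an item's metadata into a short prompt grounded in those fields only."""
    context = "\n".join(f"{f}: {item[f]}" for f in CONTEXT_FIELDS if f in item)
    return (
        "Using only the item details below, explain in one or two sentences "
        f"why this item matches the user's request.\n{context}"
    )


def build_request(item: dict, model: str = "rag-explainer", api_key: str = "") -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request for DashScope."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": build_prompt(item)}],
        "max_tokens": 512,
        "temperature": 0.3,
    }
    key = api_key or os.environ.get("DASHSCOPE_API_KEY", "")
    return urllib.request.Request(
        DASHSCOPE_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def explain(item: dict, timeout_s: float = 10.0) -> str:
    """Send the request and return the generated explanation text."""
    with urllib.request.urlopen(build_request(item), timeout=timeout_s) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Only whitelisted fields reach the prompt, which keeps it short and limits the model to facts stored in the index.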

Architecture

User requests reach AIRec, which queries OpenSearch for dense-vector recall. OpenSearch returns a candidate pool, which the custom model hosted on ALinux scores. AIRec ranks the scored candidates and returns the top-N items. A synchronous webhook then fetches item metadata from OpenSearch, builds a prompt, and sends it to the Bailian LLM. The LLM generates grounded, natural-language explanations, which are merged with the ranked list before final delivery.
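To make the request flow concrete, the sketch below expresses it as a single function with each service injected as a callable. The stage names (recall, rank, fetch_meta, explain) and the Recommendation type are hypothetical; in the deployed setup AIRec performs recall and ranking itself, so a function like this is mainly useful for testing the explanation stage locally against stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Recommendation:
    item_id: str
    score: float
    explanation: Optional[str] = None


def recommend_with_explanations(
    query_vec: List[float],
    recall: Callable[[List[float], int], List[str]],  # OpenSearch k-NN recall
    rank: Callable[[List[str]], Dict[str, float]],  # ranker hosted on ALinux
    fetch_meta: Callable[[str], dict],  # OpenSearch metadata lookup
    explain: Callable[[dict], str],  # Bailian LLM call
    pool_size: int = 100,
    top_n: int = 10,
) -> List[Recommendation]:
    """Recall a candidate pool, rank it, keep the top-N, and attach explanations."""
    candidates = recall(query_vec, pool_size)
    scores = rank(candidates)
    top = sorted(candidates, key=lambda c: scores.get(c, float("-inf")), reverse=True)[:top_n]
    results = []
    for item_id in top:
        rec = Recommendation(item_id, scores.get(item_id, float("-inf")))
        rec.explanation = explain(fetch_meta(item_id))
        results.append(rec)
    return results
```

Passing stubs for every stage lets you check ordering and merging without any cloud resources.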

Prerequisites

Active Alibaba Cloud account with AIRec, OpenSearch, ECS (ALinux), and Bailian enabled
Pre-trained ranking model (ONNX/PyTorch) and an embedding model whose output dimension matches the index mapping
OpenSearch cluster with the ML Commons plugin (_plugins/_ml) enabled and at least 16 GB RAM for HNSW indexing
Bailian API key with DashScope permissions and a deployed Qwen or fine-tuned endpoint
VPC peering or public endpoints connecting AIRec, OpenSearch, ALinux, and Bailian

Common pitfalls

Vector dimension mismatch: The dimension in the OpenSearch mapping must exactly match the embedding model's output size. A mismatch causes ingestion or k-NN query failures.
AIRec callback timeout: Bailian generation often exceeds AIRec’s default 3s webhook timeout. Implement async processing or increase timeout_ms in AIRec routing config.
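Context window overflow: Passing full item descriptions to Bailian can exceed the model's input length limit and cause request errors. Use OpenSearch source filtering (_source_includes, or "_source": {"includes": [...]} in the request body) to limit the payload to essential fields.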
Cold-start latency: ALinux model serving requires warm-up. Pre-warm instances or use PAI-EAS auto-scaling to avoid 30s+ initial ranking delays.
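The timeout and payload pitfalls can both be handled in the middleware. The sketch below limits OpenSearch results to a few fields with _source filtering and generates explanations concurrently with a per-item timeout, returning None for any item that does not finish so the ranked list is still delivered. ESSENTIAL_FIELDS, the items' id key, the async explain callable, and the 2.5 s default (chosen to stay under the 3 s webhook timeout mentioned above) are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical whitelist: only these fields are fetched and sent to the LLM.
ESSENTIAL_FIELDS = ["title", "category", "summary"]


def knn_query(vector: List[float], k: int = 10) -> dict:
    """OpenSearch k-NN search body that returns only the essential fields."""
    return {
        "size": k,
        "_source": {"includes": ESSENTIAL_FIELDS},
        "query": {"knn": {"embedding": {"vector": vector, "k": k}}},
    }


async def explain_all(
    items: List[dict],
    explain: Callable[[dict], Awaitable[str]],
    timeout_s: float = 2.5,
) -> Dict[str, Optional[str]]:
    """Generate explanations concurrently; items that time out map to None."""

    async def one(item: dict) -> Optional[str]:
        try:
            return await asyncio.wait_for(explain(item), timeout=timeout_s)
        except asyncio.TimeoutError:
            return None  # deliver the item without an explanation

    texts = await asyncio.gather(*(one(i) for i in items))
    return {item["id"]: text for item, text in zip(items, texts)}
```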

Typical questions

recommendation system with explanations
AI recommendations plus intelligent Q&A
recommend plus RAG chatbot
deploy recommendation and QA together
recommendation system plus generative explanations
vector retrieval plus ranking plus generation
AIRec with LLM explanations
natural-language explanations of recommendation results

FAQ

Q: How do I deploy a recommendation system that integrates RAG or AI Q&A to generate explanations?
A: Combine AIRec for recommendation orchestration, OpenSearch for vector retrieval, a custom ranking model served on Alibaba Cloud Linux, and a Bailian-hosted LLM that generates natural-language explanations and contextual answers for each ranked item. The resulting pipeline not only ranks candidates but also tells users why each result is relevant.