
AI Recommendation Platform with RAG Explanations

Deploy a custom inference model on Alibaba Cloud Linux (ALinux) and combine it with OpenSearch vector retrieval and AIRec orchestration to produce recommendations. A Bailian-hosted LLM then generates natural-language explanations and contextual answers for each recommended item. The result is a recommendation system that not only ranks candidates but also tells users why each result is relevant.

Products involved

Scenario

Use this workflow when building a next-generation recommendation engine that requires both high-precision vector recall and transparent, context-aware explanations. It is ideal for e-commerce, content platforms, or enterprise search where users need to understand why an item was recommended, not just see a ranked list.

Integration steps

  1. Deploy custom ranking model on ALinux: Use the alinux-deploy-model intent to containerize and serve your PyTorch/ONNX ranking model.

     ```bash
     alinux-cli deploy --model-path ./ranking-model.onnx --port 8080 --instance-type ecs.g7.xlarge
     ```

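  2. Deploy embedding model in OpenSearch: Trigger opensearch-deploy-model to load a registered vector encoder (text-embedding-v3 in this example) for candidate retrieval. The ML Commons deploy API takes the model ID assigned at registration in the request path. Settings such as framework_type (sentence_transformers) belong in the registration request's model_config, not in the deploy call.

     ```http
     POST /_plugins/_ml/models/<model_id>/_deploy
     ```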

  3. Index candidates with vectors: Create a k-NN-enabled OpenSearch index whose vector dimension matches the encoder's output (768 here), then ingest item metadata and embeddings into it.

     ```http
     PUT /rec_candidates_v1
     {"settings": {"index": {"knn": true}}, "mappings": {"properties": {"embedding": {"type": "knn_vector", "dimension": 768, "method": {"name": "hnsw"}}}}}
     ```

  4. Configure AIRec orchestration: Point AIRec to OpenSearch as the recall source and ALinux as the ranking endpoint.

     ```http
     POST https://airec.cn-shanghai.aliyuncs.com/v2/openapi/instances/{instanceId}/actions/import
     {"source": "opensearch", "endpoint": "<os-host>", "index": "rec_candidates_v1", "ranker_url": "http://<alinux-ip>:8080/predict"}
     ```

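  5. Deploy Bailian LLM endpoint: Use bailian-deploy-model to provision a Qwen endpoint for explanation generation. The example uses qwen-plus; substitute the ID of a fine-tuned model if you have one. A low temperature keeps explanations less variable from request to request.

     ```bash
     bailian-cli model deploy --model-id qwen-plus --endpoint-name rag-explainer --max-tokens 512 --temperature 0.3
     ```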

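  6. Wire the post-recommendation RAG hook: Configure AIRec’s callback_url to forward the top-N results to a middleware service. The middleware queries OpenSearch for each item's context and then calls the Bailian endpoint through the OpenAI-compatible API, authenticating with a Bearer API key:

     ```http
     POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
     Authorization: Bearer $DASHSCOPE_API_KEY
     Content-Type: application/json
     {"model": "rag-explainer", "messages": [{"role": "user", "content": "Explain why this matches: {item_context}"}]}
     ```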
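Candidate recall against rec_candidates_v1 is an approximate k-NN search on its `embedding` field. The following is a minimal sketch that builds both the index body and a recall query as plain Python dictionaries. It assumes the open-source OpenSearch k-NN plugin conventions: `index.knn` enabled, a `dimension` on the `knn_vector` field, and a `knn` query clause. The helper names `build_index_body` and `build_knn_query` are hypothetical, and the sketch does not describe how AIRec itself issues recall queries.

```python
from __future__ import annotations

import json

# Values taken from the index definition in step 3 of this workflow.
INDEX_NAME = "rec_candidates_v1"
VECTOR_FIELD = "embedding"
DIMENSION = 768


def build_index_body(dimension: int = DIMENSION) -> dict:
    """Body for PUT /rec_candidates_v1: a k-NN enabled index with one HNSW vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                VECTOR_FIELD: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw"},
                }
            }
        },
    }


def build_knn_query(query_vector: list[float], k: int = 100) -> dict:
    """Body for POST /rec_candidates_v1/_search: the k nearest candidates to query_vector."""
    # Fail fast if the query encoder and the index mapping disagree on dimension.
    if len(query_vector) != DIMENSION:
        raise ValueError(
            f"query vector has {len(query_vector)} dims, index expects {DIMENSION}"
        )
    return {
        "size": k,
        "query": {"knn": {VECTOR_FIELD: {"vector": query_vector, "k": k}}},
    }


if __name__ == "__main__":
    # Placeholder vector; in practice this is the user's query embedding.
    body = build_knn_query([0.0] * DIMENSION, k=50)
    print(INDEX_NAME, json.dumps(body)[:60])
```

The local dimension check is a design choice. It surfaces an encoder/mapping mismatch in the middleware, before a malformed query ever reaches the cluster.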
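The middleware can issue the explanation request to the DashScope compatible-mode endpoint using only the Python standard library. This is a minimal sketch under three assumptions: the endpoint name `rag-explainer` is accepted as the `model` value, the API key is read from a `DASHSCOPE_API_KEY` environment variable, and the response follows the OpenAI-style `choices[0].message.content` shape. The function names `build_explain_request` and `call_explainer` are hypothetical.

```python
from __future__ import annotations

import json
import os
import urllib.request

DASHSCOPE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_explain_request(item_context: str, model: str = "rag-explainer") -> dict:
    """Chat-completions body with the same prompt template as the workflow's request."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Explain why this matches: {item_context}"}
        ],
    }


def call_explainer(
    item_context: str, api_key: str | None = None, timeout: float = 10.0
) -> str:
    """POST the request and return the generated explanation text."""
    # Assumption: the key lives in DASHSCOPE_API_KEY unless passed explicitly.
    key = api_key or os.environ["DASHSCOPE_API_KEY"]
    data = json.dumps(build_explain_request(item_context)).encode("utf-8")
    req = urllib.request.Request(
        DASHSCOPE_URL,
        data=data,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed OpenAI-compatible response shape: choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

Keeping request construction separate from the network call lets the prompt template be tested offline.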

Architecture

User requests hit AIRec, which queries OpenSearch for dense vector recall. OpenSearch returns a candidate pool, which the custom model hosted on ALinux then scores. AIRec sorts the candidates by score and returns the top-N items. A synchronous webhook passes these items to the middleware, which fetches their metadata from OpenSearch, constructs a prompt, and sends it to the Bailian LLM. The LLM generates natural-language explanations grounded in that metadata, and these are merged with the ranked list before final delivery.
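The request flow described above can be sketched as a single function whose stages are passed in as callables, so it runs without any cloud services. The stage parameters (`recall`, `score`, `fetch_context`, `explain`) and the `Candidate` type are hypothetical. In a real deployment they would wrap OpenSearch, the ALinux ranking endpoint, and the Bailian endpoint, and AIRec would perform the recall and ranking stages itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    item_id: str
    score: float = 0.0
    explanation: str = ""


def recommend_with_explanations(
    query_vector: list[float],
    recall: Callable[[list[float]], list[str]],
    score: Callable[[list[str]], list[float]],
    fetch_context: Callable[[str], str],
    explain: Callable[[str], str],
    top_n: int = 10,
) -> list[Candidate]:
    # 1. Dense vector recall (OpenSearch k-NN in this architecture).
    item_ids = recall(query_vector)
    # 2. Score the candidate pool with the custom ranker (the ALinux endpoint).
    scores = score(item_ids)
    if len(scores) != len(item_ids):
        raise ValueError("ranker must return one score per candidate")
    # 3. Sort by score, highest first, and keep the top-N.
    ranked = sorted(
        (Candidate(i, s) for i, s in zip(item_ids, scores)),
        key=lambda c: c.score,
        reverse=True,
    )[:top_n]
    # 4. RAG hook: fetch item metadata, then ask the LLM for a grounded explanation.
    for cand in ranked:
        cand.explanation = explain(fetch_context(cand.item_id))
    # 5. Explanations travel with the ranked list to the caller.
    return ranked
```

Because the hook is synchronous and calls the LLM once per item, end-to-end latency grows with N. Batching the top-N contexts into one prompt is one possible alternative, at the cost of a more complex response format.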

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I deploy a recommendation system that integrates RAG or AI Q&A to generate explanations? A: Combine AIRec for recommendation orchestration, OpenSearch for vector retrieval, and a Bailian-hosted LLM that generates natural-language explanations and contextual answers for each ranked item. The custom ranking model is served on Alibaba Cloud Linux. Together these form a pipeline that not only ranks candidates but also tells users why each result is relevant.