RAG Pipeline with Retrieval and Generation

Products involved

  1. OpenSearch
  2. Bailian

Scenario

Deploy an embedding model in OpenSearch to power vector-based document retrieval, then deploy a fine-tuned LLM on Bailian as the generation endpoint. Together, the two deployments form a Retrieval-Augmented Generation (RAG) application: OpenSearch retrieves the relevant context, and the Bailian endpoint generates responses grounded in it.

How the products combine

  1. OpenSearch (opensearch · opensearch-deploy-model): deploy an embedding model for inference. See opensearch/opensearch-deploy-model.
  2. Bailian (bailian · bailian-deploy-model): deploy custom or fine-tuned AI models as endpoints. See bailian/bailian-deploy-model.
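
The retrieval half of this combination can be illustrated with a minimal, self-contained sketch. It makes no OpenSearch calls: a toy bag-of-words function stands in for the deployed embedding model, and an in-memory list stands in for the vector index. The names embed, cosine, retrieve, and DOCS are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the deployed embedding model (assumption):
    a bag-of-words count vector instead of a learned dense embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical corpus; a real deployment would index documents in OpenSearch.
DOCS = [
    "OpenSearch hosts the embedding model used for vector retrieval.",
    "Bailian serves the fine-tuned LLM as a generation endpoint.",
    "Unrelated note about office parking.",
]

if __name__ == "__main__":
    for doc in retrieve("which service hosts the embedding model", DOCS, k=1):
        print(doc)
```

In a real deployment, embed would be replaced by a call to the embedding model deployed in OpenSearch, and the ranking would be done by the vector index rather than by sorting in memory.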

Typical questions

Q: How do I build and deploy a RAG system that combines vector search with an LLM?
A: Combine OpenSearch and Bailian, with OpenSearch handling retrieval and Bailian handling generation. First, deploy an embedding model in OpenSearch to power vector-based document retrieval. Next, deploy a fine-tuned LLM on Bailian as the generation endpoint, which produces responses grounded in the retrieved context.
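
As a rough sketch of how the two steps in this answer fit together at request time, the following code passes retrieved passages into a prompt and hands that prompt to a generation function. The retrieval and generation backends are injected as plain callables, so nothing here reflects the actual OpenSearch or Bailian APIs. The names build_prompt and answer, the prompt wording, and the stub backends are all assumptions made for illustration.

```python
from typing import Callable


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: numbered context passages, then the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve k passages, then ask the generation backend for a grounded reply."""
    passages = retrieve(question, k)
    return generate(build_prompt(question, passages))


if __name__ == "__main__":
    # Stub backends (hypothetical): replace with calls to the deployed
    # embedding-based retriever and the deployed LLM endpoint.
    def fake_retrieve(question: str, k: int) -> list[str]:
        return ["Passage A", "Passage B", "Passage C"][:k]

    def fake_generate(prompt: str) -> str:
        return f"(model reply to a {len(prompt)}-character prompt)"

    print(answer("What does passage A say?", fake_retrieve, fake_generate, k=2))
```

Keeping the backends behind callables means the orchestration logic can be tested with stubs before either model is deployed.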