Custom RAG Pipeline with Deployed Frontend

A developer trains custom embedding models on PAI using domain-specific datasets, builds a vector search pipeline on OpenSearch or Elasticsearch with source data and model artifacts stored in OSS, then deploys a chatbot web frontend to Vercel. The result is a complete production AI Q&A application, from custom model training through to end-user access.

Products involved

Scenario

Use this workflow when off-the-shelf embeddings fail to capture domain-specific terminology or compliance requirements. By training custom models on PAI, indexing the resulting vectors in OpenSearch/Elasticsearch (with raw data and model artifacts kept in OSS), and serving the UI via Vercel, you build a RAG Q&A system whose answers are grounded in proprietary enterprise data.

Integration steps

  1. Provision Infrastructure: Run terraform apply -var-file="prod.tfvars" using the es+rds+terraform module to spin up VPC, ECS, RDS (PostgreSQL), and an Elasticsearch cluster.
  2. Stage Raw Data in OSS: Upload domain documents: ossutil cp -r ./data oss://<bucket>/raw/.
  3. Train Custom Embeddings on PAI: Mount the OSS path in PAI-DSW and submit: pai submit --job-name custom-emb --oss-path oss://<bucket>/raw/ --output oss://<bucket>/models/.
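  4. Configure Vector Search Mapping: Create the index with k-NN support before loading any data: PUT /rag_index { "settings": { "index": { "knn": true } }, "mappings": { "properties": { "embedding": { "type": "knn_vector", "dimension": 768 } } } }. The knn_vector type and the index.knn setting are open-source OpenSearch k-NN syntax; stock Elasticsearch 8.x uses "type": "dense_vector" with "dims": 768 instead, so check which applies to your cluster. The dimension must match the output size of the trained model.
  5. Index Vectors in OpenSearch/ES: Generate embeddings and push them via the _bulk API as newline-delimited JSON: curl -X POST "https://<es-endpoint>:9200/_bulk" -H "Content-Type: application/x-ndjson" --data-binary @vectors.json. Use --data-binary rather than -d, because -d strips the newlines that the bulk format depends on.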
  6. Link Backend to RDS: Configure the ES RAG pipeline to write conversation metadata and session logs to the provisioned RDS instance via JDBC.
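  7. Deploy Frontend to Vercel: Set NEXT_PUBLIC_ES_ENDPOINT and ES_API_KEY in .env for local development and as environment variables in the Vercel project (for example with vercel env add), then run vercel --prod to publish the chatbot UI. Variables prefixed with NEXT_PUBLIC_ are bundled into browser code, so keep ES_API_KEY unprefixed and read it only from server-side code.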
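
The mapping and bulk-indexing steps above can be sketched in Python. This is a minimal illustration, not the pipeline's actual code: the function names, the "text" field, and the document shape (id, text, embedding) are hypothetical, and the OpenSearch branch assumes the open-source k-NN plugin syntax. It only builds request bodies; sending them to the cluster is left to whatever HTTP client you use.

```python
import json
from typing import Dict, Iterable, List

EMBEDDING_DIM = 768  # must match the "dimension" declared in the index mapping


def build_index_body(engine: str = "opensearch", dim: int = EMBEDDING_DIM) -> Dict:
    """Return an index-creation body for the chosen engine (hypothetical helper)."""
    if engine == "opensearch":
        # Assumption: open-source OpenSearch k-NN plugin syntax.
        return {
            "settings": {"index": {"knn": True}},
            "mappings": {
                "properties": {"embedding": {"type": "knn_vector", "dimension": dim}}
            },
        }
    if engine == "elasticsearch":
        # Assumption: Elasticsearch 8.x dense_vector syntax.
        return {
            "mappings": {
                "properties": {"embedding": {"type": "dense_vector", "dims": dim}}
            }
        }
    raise ValueError(f"unsupported engine: {engine}")


def build_bulk_payload(
    docs: Iterable[Dict], index: str = "rag_index", dim: int = EMBEDDING_DIM
) -> str:
    """Serialize documents into the newline-delimited format the _bulk API expects."""
    lines: List[str] = []
    for doc in docs:
        if len(doc["embedding"]) != dim:
            raise ValueError(
                f"document {doc['id']!r} has {len(doc['embedding'])} dimensions, expected {dim}"
            )
        # Each document is an action line followed by a source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps({"text": doc["text"], "embedding": doc["embedding"]}))
    # The bulk body must end with a newline; an empty input yields an empty body.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = [{"id": "doc-1", "text": "example passage", "embedding": [0.0] * EMBEDDING_DIM}]
    with open("vectors.json", "w", encoding="utf-8") as fh:
        fh.write(build_bulk_payload(sample))
```

Validating the vector length before serialization surfaces a model/mapping mismatch locally instead of as a partial failure inside the bulk response.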

Architecture

Terraform provisions the foundational network, compute (ECS), and storage (OSS, RDS, ES). Domain data flows from OSS into PAI for custom embedding training. The resulting model generates vectors stored in Elasticsearch/OpenSearch with knn indexing. The RAG backend runs on ES, querying vectors and logging metadata to RDS. The Vercel-hosted frontend communicates with the ES API gateway, delivering a seamless chat interface to end users.
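
The query side of this architecture can be sketched as follows. This is a rough illustration under stated assumptions: the helper names and the "text" source field are hypothetical, and the query body uses the open-source OpenSearch k-NN query form (Elasticsearch 8.x instead takes a top-level "knn" section). The question embedding is assumed to come from the custom model trained on PAI, and the search response is assumed to follow the standard hits.hits layout.

```python
from typing import Dict, List, Sequence


def build_knn_query(query_vector: Sequence[float], k: int = 3, field: str = "embedding") -> Dict:
    """Build a nearest-neighbour search body for the vector field."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }


def extract_passages(response: Dict) -> List[str]:
    """Pull the stored passage text out of a search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["text"] for hit in hits]


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Combine retrieved passages and the user question into one grounded prompt."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

A backend handler would embed the incoming question, send build_knn_query(...) to the rag_index search endpoint, and pass the result of build_prompt(...) to the language model, while the frontend only ever talks to that handler.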

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I build and deploy an end-to-end custom RAG application with a web frontend?
A: Train custom embedding models on PAI, index the resulting vectors in OpenSearch or Elasticsearch (with raw data and model artifacts kept in OSS), and deploy the chatbot interface to Vercel. This workflow produces a complete production AI Q&A application that spans model training through end-user web access.