Document-Aware App with RAG and Recommendations

A developer builds a document-aware application using Supabase for structured CRUD and Bailian OCR for document extraction, and indexes the extracted content in Elasticsearch for keyword search. A RAG vector knowledge base and AIRec-powered semantic recommendations are then layered on top, delivering both traditional search and AI-driven personalized discovery over the same document corpus.

Products involved

Scenario

Use this integration when building a document-centric application requiring precise full-text search, context-grounded RAG Q&A, and behavior-driven content recommendations. It is ideal for knowledge management or technical documentation platforms where developers must unify structured metadata (Supabase), extracted document content (Bailian), keyword indexing (Elasticsearch), semantic retrieval (OpenSearch), and personalized discovery (AIRec) into a single workflow.

Integration steps

  1. Upload to OSS: Push raw PDFs/images to an OSS bucket. Attach AliyunOSSReadOnlyAccess to the Bailian and OpenSearch service roles.
  2. Extract via Bailian: Route to bailian-extract-documents for OCR, then call DashScope:
     POST https://dashscope.aliyuncs.com/api/v1/services/aigc/document-processing/generation
     Payload: {"model": "doc-parser-v2", "input": {"file_url": "oss://bucket/doc.pdf"}, "parameters": {"enable_ocr": true, "layout_analysis": true}}
  3. Persist in Supabase: Insert the extracted JSON with POST /rest/v1/documents, sending the apikey and Authorization headers. Body: {"title": "...", "ocr_text": "...", "metadata": {...}}.
  4. Index in Elasticsearch: Sync for BM25 search via _bulk:
     POST /documents/_bulk
     {"index":{"_id":"doc_1"}}
     {"title":"...","content":"...","type":"structured"}
  5. Vectorize & Index in OpenSearch: Generate embeddings using text-embedding-v3, then ingest via OpenSearch _bulk with a knn mapping: "vector_field": {"type": "knn_vector", "dimension": 1024}.
  6. Configure AIRec: Initialize an instance, then push interactions:
     POST /v2/openapi/instances/{id}/actions/bulk
     {"action_type": "click", "item_id": "doc_1", "user_id": "u_123"}
  7. Query Pipeline: Route exact matches to ES _search, semantic/RAG to OpenSearch _knn_search, and discovery to AIRec GET /v2/openapi/instances/{id}/recommendations.
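
The Elasticsearch and OpenSearch indexing steps above can be sketched as plain request-body builders. This is a minimal illustration, not a definitive implementation: the function names, the input document shape (id, title, content), and the extra "content" text field in the vector index are assumptions made for the example, and the index settings assume the open-source OpenSearch k-NN plugin. No network calls are made.

```python
import json

# Dimension taken from the knn mapping in the integration steps.
EMBEDDING_DIM = 1024


def build_es_bulk_body(docs):
    """Serialize documents into NDJSON for the Elasticsearch _bulk API.

    Each document becomes an action line followed by a source line.
    The _bulk API requires the body to end with a newline.
    `docs` is a hypothetical list of dicts with id, title and content keys.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps({
            "title": doc["title"],
            "content": doc["content"],
            "type": "structured",
        }))
    return "\n".join(lines) + "\n"


def build_knn_index_body(dimension=EMBEDDING_DIM):
    """Index-creation body for a k-NN enabled index (assumed layout)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Assumption: keep the raw text next to the vector for RAG context.
                "content": {"type": "text"},
                "vector_field": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def check_embedding(vector, dimension=EMBEDDING_DIM):
    """Reject embeddings whose length does not match the index mapping."""
    if len(vector) != dimension:
        raise ValueError(
            f"expected {dimension} dimensions, got {len(vector)}"
        )
    return vector
```

The bulk body would be sent with a Content-Type of application/x-ndjson. Checking the embedding length before ingestion surfaces a model/mapping mismatch early instead of as a per-item bulk failure.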

Architecture

Raw documents reside in OSS. Bailian acts as the extraction engine, parsing text and layout into structured JSON. Supabase serves as the system of record for CRUD operations and metadata. A dual-indexing strategy routes extracted content to Elasticsearch for keyword search and to OpenSearch for dense vector storage powering RAG. AIRec consumes user interaction logs and document metadata to generate personalized recommendation feeds. The frontend orchestrates queries across ES (precision), OpenSearch (semantic context), and AIRec (discovery).
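
The query orchestration described above could be sketched as a small router. This is only an illustration under stated assumptions: the QueryKind names, the classify heuristic (quoted phrase means exact match, empty query means discovery), and the backend labels are hypothetical and not part of any of the products' APIs.

```python
from enum import Enum


class QueryKind(Enum):
    EXACT = "exact"          # precise keyword match
    SEMANTIC = "semantic"    # RAG / vector retrieval
    DISCOVERY = "discovery"  # personalized recommendations


# Backend per query kind, mirroring the architecture description.
ROUTES = {
    QueryKind.EXACT: "elasticsearch",
    QueryKind.SEMANTIC: "opensearch",
    QueryKind.DISCOVERY: "airec",
}


def classify(query_text):
    """Hypothetical heuristic for picking a query kind.

    - empty query: the user is browsing, so show recommendations
    - fully quoted phrase: the user wants an exact keyword match
    - anything else: treat as a natural-language question
    """
    text = query_text.strip()
    if not text:
        return QueryKind.DISCOVERY
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return QueryKind.EXACT
    return QueryKind.SEMANTIC


def route(kind):
    """Return the backend label that should serve this query kind."""
    return ROUTES[kind]
```

For example, route(classify('"error code 42"')) selects the Elasticsearch backend, while a free-form question is sent to the vector index. In practice the frontend might let the user pick the mode explicitly rather than rely on a heuristic.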

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How can I build a document-aware application with search and recommendations?

A: You can build this application by combining Supabase for structured CRUD, Bailian OCR for document extraction, Elasticsearch for keyword indexing, and a RAG vector knowledge base paired with AIRec for semantic recommendations. This architecture delivers both traditional search and AI-driven personalized discovery over a unified document corpus.