A developer builds a document-aware application that uses Supabase for structured CRUD and Bailian OCR for document extraction, indexes the extracted content in Elasticsearch for keyword search, and then layers a RAG vector knowledge base and AIRec-powered semantic recommendations on top. The result delivers both traditional search and AI-driven personalized discovery over the same document corpus.
Use this integration when building a document-centric application requiring precise full-text search, context-grounded RAG Q&A, and behavior-driven content recommendations. It is ideal for knowledge management or technical documentation platforms where developers must unify structured metadata (Supabase), extracted document content (Bailian), keyword indexing (Elasticsearch), semantic retrieval (OpenSearch), and personalized discovery (AIRec) into a single workflow.
Grant AliyunOSSReadOnlyAccess to the Bailian and OpenSearch service roles.
Use bailian-extract-documents for OCR, then call DashScope: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/document-processing/generation
Payload: {"model": "doc-parser-v2", "input": {"file_url": "oss://bucket/doc.pdf"}, "parameters": {"enable_ocr": true, "layout_analysis": true}}
Store the record in Supabase: POST /rest/v1/documents with apikey and Authorization headers. Body: {"title": "...", "ocr_text": "...", "metadata": {...}}.
Index into Elasticsearch with _bulk: POST /documents/_bulk → {"index":{"_id":"doc_1"}}\n{"title":"...","content":"...","type":"structured"}
Generate embeddings with text-embedding-v3, then ingest via OpenSearch _bulk with a knn mapping: "vector_field": {"type": "knn_vector", "dimension": 1024}.
Send behavior events to AIRec: POST /v2/openapi/instances/{id}/actions/bulk → {"action_type": "click", "item_id": "doc_1", "user_id": "u_123"}.
Route keyword queries to Elasticsearch _search, semantic/RAG queries to OpenSearch _knn_search, and discovery requests to AIRec GET /v2/openapi/instances/{id}/recommendations.

Raw documents reside in OSS. Bailian acts as the extraction engine, parsing text and layout into structured JSON. Supabase serves as the system of record for CRUD operations and metadata. A dual-indexing strategy routes extracted content to Elasticsearch for keyword search and to OpenSearch for dense vector storage powering RAG. AIRec consumes user interaction logs and document metadata to generate personalized recommendation feeds. The frontend orchestrates queries across Elasticsearch (precision), OpenSearch (semantic context), and AIRec (discovery).
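The query routing described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the intent labels ("keyword", "semantic", "discovery"), the Route type, the use of POST for the two search calls, and the reuse of the documents index name for OpenSearch are hypothetical and not taken from any of the services' documentation. Only the endpoint suffixes come from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    backend: str  # which service should answer the query
    method: str   # HTTP method (assumed for the two search calls)
    path: str     # endpoint path relative to that service's base URL


def route_query(intent: str, index: str = "documents", instance_id: str = "{id}") -> Route:
    """Map a hypothetical intent label to the backend named in the steps above."""
    if intent == "keyword":
        # Precise full-text lookups go to Elasticsearch.
        return Route("elasticsearch", "POST", f"/{index}/_search")
    if intent == "semantic":
        # RAG context retrieval goes to the OpenSearch vector index.
        return Route("opensearch", "POST", f"/{index}/_knn_search")
    if intent == "discovery":
        # Personalized feeds come from AIRec.
        return Route("airec", "GET", f"/v2/openapi/instances/{instance_id}/recommendations")
    raise ValueError(f"unknown intent: {intent!r}")
```

A caller would classify the user's request first and then issue the HTTP call against the returned backend, for example route_query("discovery", instance_id="my-instance"). Keeping the mapping in one place makes it easier to change a backend without touching the frontend code paths.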
layout_analysis must be enabled; otherwise, table and multi-column text merges incorrectly, degrading RAG chunking.
text-embedding-v3.
_bulk retries with exponential backoff.

Q: How can I build a document-aware application with search and recommendations?
A: You can build this application by combining Supabase for structured CRUD, Bailian OCR for document extraction, Elasticsearch for keyword indexing, and a RAG vector knowledge base paired with AIRec for semantic recommendations. This architecture delivers both traditional search and AI-driven personalized discovery over a unified document corpus.
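For the ingestion note on _bulk retries with exponential backoff, a minimal sketch follows. It builds the newline-delimited bulk body shown in the indexing step and retries a failed send with doubling delays plus jitter. The helper names (build_bulk_body, with_backoff), the send callable, the choice of ConnectionError as the retryable failure, and the delay values are all assumptions made for illustration; a real client would plug in its own HTTP call and error types.

```python
import json
import random
import time
from typing import Callable, Iterable, Mapping


def build_bulk_body(docs: Iterable[Mapping]) -> str:
    """Serialize documents as NDJSON: one action line, then one source line each."""
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "_id"}
        lines.append(json.dumps({"index": {"_id": doc["_id"]}}))
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def with_backoff(
    send: Callable[[str], object],
    body: str,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call send(body), retrying on ConnectionError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send(body)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Jitter spreads out retries from concurrent workers.
            sleep(delay + random.uniform(0, delay / 2))
```

Here send stands in for whatever function posts the body to /documents/_bulk. The sleep parameter is injectable so the retry schedule can be exercised without actually waiting.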