Document-Aware App with Unified Search

This pattern uses Supabase as the primary CRUD datastore for structured records and for metadata extracted from uploaded documents (via Bailian OCR/document understanding). All records, both native structured data and OCR-extracted content, are then synced into Elasticsearch for unified full-text search across the entire dataset.

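Products involved

Supabase (transactional datastore and file storage), Bailian (OCR and document parsing), and Elasticsearch (unified full-text search index).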

Scenario

Use this workflow when building a document-heavy application where Supabase manages transactional CRUD operations and file metadata, while Bailian extracts structured text from uploaded PDFs or images. The extracted content and native records are synchronized into Elasticsearch to deliver a single, low-latency full-text search interface across both structured and unstructured data.

Integration steps

  1. Initialize Supabase Schema: Create a documents table (id, file_url, ocr_text, ocr_status) and a records table. Enable row-level security (RLS) on both tables so that CRUD access is governed by policies.
  2. Trigger Bailian Extraction: On file upload, call POST https://dashscope.aliyuncs.com/api/v1/services/document/document-async/parse with {"model": "doc-parser-v2", "input": {"file_url": "<supabase_storage_url>"}}.
  3. Store Extracted Content: Poll GET /api/v1/tasks/{task_id} until status: "SUCCEEDED". Update the matching Supabase row via PATCH /rest/v1/documents?id=eq.{id} with {"ocr_text": "<extracted>", "ocr_status": "ready"}.
  4. Configure Elasticsearch Index: Create a unified index: PUT /unified-search with {"mappings": {"properties": {"source": {"type": "keyword"}, "content": {"type": "text", "analyzer": "standard"}}}}.
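  5. Sync Supabase Records: Listen for changes via Supabase Realtime. Format each change as a two-line NDJSON pair and push it to ES using POST /_bulk (Content-Type: application/x-ndjson; the body must end with a newline): {"index":{"_index":"unified-search","_id":"<id>"}}\n{"source":"supabase","content":"<record_data>"}\n.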
  6. Sync OCR Results: When ocr_status="ready", batch-extract and push to ES: {"index":{"_index":"unified-search","_id":"<doc_id>_ocr"}}\n{"source":"bailian","content":"<ocr_text>"}.
  7. Execute Unified Query: Search across both: GET /unified-search/_search with {"query":{"multi_match":{"query":"<input>","fields":["content"]}}}.
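Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive client: fetch_status is a hypothetical callable that wraps GET /api/v1/tasks/{task_id} and returns the parsed JSON, the "status" field follows step 3, and the failure states are assumed because only "SUCCEEDED" is named above. The update path uses the PostgREST filter form (?id=eq.<id>) that the Supabase REST API expects for row updates.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Assumed terminal failure states; the steps above only name "SUCCEEDED".
FAILURE_STATES = {"FAILED", "CANCELED"}


def poll_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until the task succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "SUCCEEDED":
            return task
        if status in FAILURE_STATES:
            raise RuntimeError(f"task {task_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")
        sleep(interval_s)


def build_ocr_patch(doc_id: str, ocr_text: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, path, JSON body) for the Supabase REST update in step 3."""
    path = f"/rest/v1/documents?id=eq.{doc_id}"
    body = {"ocr_text": ocr_text, "ocr_status": "ready"}
    return "PATCH", path, body
```

Injecting fetch_status and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses before wiring it to the real endpoint.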

Architecture

Supabase serves as the primary transactional datastore and object storage. Bailian acts as an asynchronous processing layer: it consumes storage URLs and returns parsed text and tables. A lightweight sync worker bridges both to Elasticsearch, which functions as a read-optimized, denormalized search index. Data flows in one direction only (Supabase/Bailian → ES), so Supabase remains the ACID-compliant system of record, while ES serves full-text queries and relevance scoring and can be rebuilt from Supabase if it falls out of sync.
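The sync worker's core job, turning a change event into Elasticsearch bulk lines, might look like the sketch below. The event shape (eventType, new, old) is an assumption modeled on Supabase Realtime change payloads, and change_to_actions and to_ndjson are hypothetical helper names. Flattening a record into one content string is only one possible choice. The index name, the _ocr ID suffix, and the source values come from the integration steps.

```python
import json
from typing import Any, Dict, List, Optional

INDEX = "unified-search"


def change_to_actions(table: str, event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate one change event into Elasticsearch bulk lines (as dicts)."""
    kind = event["eventType"]  # assumed values: INSERT, UPDATE, DELETE
    row = event["old"] if kind == "DELETE" else event["new"]
    content: Optional[str]
    if table == "documents":
        doc_id, source = f"{row['id']}_ocr", "bailian"
        # Only index OCR text once extraction has finished.
        content = row.get("ocr_text") if row.get("ocr_status") == "ready" else None
    else:
        doc_id, source = str(row["id"]), "supabase"
        # Flatten the non-null column values into one searchable string.
        content = " ".join(
            str(v) for k, v in row.items() if k != "id" and v is not None
        )
    meta = {"_index": INDEX, "_id": doc_id}
    if kind == "DELETE":
        return [{"delete": meta}]
    if content is None:
        return []
    return [{"index": meta}, {"source": source, "content": content}]


def to_ndjson(lines: List[Dict[str, Any]]) -> str:
    """Serialize bulk lines as NDJSON; every line, including the last, ends in a newline."""
    return "".join(json.dumps(line) + "\n" for line in lines)
```

Because each action carries a deterministic _id, replaying the same event overwrites the same Elasticsearch document rather than creating a duplicate, which fits the one-way flow described above.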

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I implement unified full-text search across structured data and uploaded documents using Supabase and Elasticsearch?
A: Keep Supabase as the datastore for structured records and document metadata, use Bailian OCR to extract text from uploaded files such as PDFs, and store the extracted text alongside the native data. A sync worker then indexes both into a single Elasticsearch index, which serves every search query.
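As an illustration of the query side, the sketch below builds the request body for GET /unified-search/_search. build_search_body is a hypothetical helper, not part of any client library. The optional source filter is an assumption layered on the keyword-typed source field from the index mapping: it narrows results to "supabase" or "bailian" entries without affecting relevance scores.

```python
from typing import Any, Dict, Optional


def build_search_body(
    text: str, source: Optional[str] = None, size: int = 10
) -> Dict[str, Any]:
    """Build a search body over the shared content field, optionally filtered by source."""
    query: Dict[str, Any] = {"multi_match": {"query": text, "fields": ["content"]}}
    if source is not None:
        # A term filter on the keyword field restricts hits without scoring them.
        query = {"bool": {"must": [query], "filter": [{"term": {"source": source}}]}}
    return {"query": query, "size": size}
```

Calling build_search_body("invoice") searches structured records and OCR text together, while build_search_body("invoice", source="bailian") restricts the same query to OCR-extracted content.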