A developer builds a full-stack application that uses Supabase/RDS as the primary datastore for structured records and document metadata, Bailian OCR to extract content from uploaded PDFs stored in OSS, Elasticsearch for unified full-text search, OpenSearch for semantic/RAG retrieval, and AIRec for personalized document and content recommendations based on user behavior. The result is a complete document intelligence platform, from ingestion to personalization.
Use this combination when building a document intelligence platform requiring end-to-end processing: ingesting raw PDFs, extracting structured text, enabling hybrid full-text and semantic search, and delivering behavior-driven personalized recommendations. Ideal for knowledge bases, legal tech, or enterprise portals where contextual discovery drives user engagement.
1. ossutil cp ./docs/ oss://my-doc-bucket/raw/ --recursive
2. POST https://dashscope.aliyuncs.com/api/v1/services/aigc/document-parse/async with {"model": "doc-parser-v1", "input": {"file_url": "oss://my-doc-bucket/raw/report.pdf"}}. Poll /api/v1/tasks/{task_id} until status: "SUCCEEDED".
3. POST https://<proj>.supabase.co/rest/v1/documents with headers apikey: <key> and body {"title": "...", "ocr_text": "..."}.
4. es-ingest-documents pipeline. Logstash config: input { http_poller { urls => { supabase => "https://<proj>.supabase.co/rest/v1/documents" } } } output { elasticsearch { hosts => ["https://<es-endpoint>:9200"] index => "docs-index" } }
5. POST https://<os-endpoint>/docs-index/_doc/{id} with {"vector_field": <1024-dim-array>, "text": "..."}. Set mapping: "knn": true, "knn.algo_param.ef_search": 100.
6. POST https://airec.aliyuncs.com/v2/openapi/instances/{inst}/items with {"itemId": "...", "itemType": "document"}. Log interactions: POST .../behaviors with {"behaviorType": "click", "itemId": "..."}.
7. GET https://airec.aliyuncs.com/v2/openapi/instances/{inst}/recommendations?userId={uid}&size=10. Merge with ES/OpenSearch hybrid queries for final UI rendering.

Raw files reside in OSS. Bailian OCR asynchronously extracts text/metadata, which Supabase persists as structured records. A sync pipeline pushes this data to Elasticsearch for BM25 full-text indexing. OpenSearch consumes the same corpus to generate and store dense vector embeddings for RAG queries. AIRec operates in parallel, ingesting item metadata and real-time user behavior logs to train a ranking model. The app orchestrates queries across ES (keyword), OpenSearch (semantic), and AIRec (personalized) to deliver unified results.
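The final orchestration step, where keyword, semantic, and personalized results are combined into one list, could be sketched roughly as follows. The source does not say how the three result sets are merged; this sketch assumes reciprocal rank fusion (RRF), a common rank-combination technique, as a stand-in. The function name, the constant k=60, and the sample document IDs are all illustrative, and the three input lists stand in for responses that would really come from Elasticsearch, OpenSearch, and AIRec.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(ranked_lists: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists into one list using RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a value
    taken from the source.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, as if returned by the three backends.
es_hits = ["doc-a", "doc-b", "doc-c"]      # Elasticsearch (keyword)
os_hits = ["doc-b", "doc-d", "doc-a"]      # OpenSearch (semantic)
airec_hits = ["doc-d", "doc-b", "doc-e"]   # AIRec (personalized)

merged = reciprocal_rank_fusion([es_hits, os_hits, airec_hits])
print(merged)
```

RRF works on ranks only, so the differing score scales of BM25, vector similarity, and the recommendation model never need to be normalized against each other. A weighted variant may be preferable if one source should dominate the final ordering.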
itemId, userId, and behaviorType fields.
IllegalArgumentException.

Q: How does the Document-Aware App with AI Recommendations architecture handle document ingestion, search, and personalization?
A: The architecture integrates Supabase or RDS for metadata, Bailian OCR for PDF extraction, Elasticsearch and OpenSearch for full-text and semantic search, and AIRec for personalized recommendations. Uploaded PDFs are stored in OSS and processed by Bailian OCR before indexing into Elasticsearch for unified search. OpenSearch then layers semantic retrieval on top, while AIRec delivers personalized suggestions based on user behavior to complete the pipeline.