A developer builds a full-stack application using Supabase as the primary CRUD datastore, extracts text from uploaded documents (PDFs, scanned images) via Bailian's document understanding, and indexes the content into Elasticsearch for unified full-text search. On top of that, the developer builds a RAG knowledge base for conversational retrieval and layers AIRec-powered semantic recommendations, delivering an intelligent document management platform with personalized search, chatbot Q&A, and content suggestions.
Use this integration when building an intelligent document management platform that requires automated OCR extraction, unified full-text/vector search, conversational RAG Q&A, and personalized content discovery. It bridges raw unstructured files with a Supabase-backed CRUD layer, Elasticsearch indexing, and AIRec-driven semantic recommendations.
Steps:
1. Submit each uploaded file for extraction: POST https://dashscope.aliyuncs.com/api/v1/services/document-understanding/async/process with {"oss_uri": "oss://<bucket>/<file>", "task_type": "ocr_layout_analysis"}.
2. When the task reports status: SUCCEEDED, insert the structured output via supabase.from('documents').insert({ id: uuid, title, extracted_text, oss_url }).
3. Index the documents into Elasticsearch via _bulk. Define the mapping: {"mappings": {"properties": {"content": {"type": "text", "analyzer": "ik_max_word"}, "embedding": {"type": "dense_vector", "dims": 1024}}}}.
4. Chunk extracted_text and call Bailian POST /v1/services/embedding/text-embedding/v2 with {"model": "text-embedding-v2", "input": {"texts": ["<chunk>"]}}. Update the ES embedding field via POST /<index>/_update/<id>.
5. Retrieve RAG context with a KNN search filtered by a full-text match: {"query": {"knn": {"field": "embedding", "query_vector": <vec>, "k": 5, "filter": {"match": {"content": "<query>"}}}}}. Inject the top hits into the LLM context.
6. Report user interactions to AIRec: POST https://airec.cn-shanghai.aliyuncs.com/v2/openapi/instances/<id>/actions with {"action_type": "click", "item_id": "<doc_id>", "user_id": "<uid>"}. Train a semantic rule targeting document_id to surface related files.

Architecture: Raw files land in OSS and trigger Bailian for async OCR/layout extraction. Extracted text and metadata persist in Supabase as the system of record. A background worker chunks the text, generates dense vectors via Bailian, and pushes both to Elasticsearch. The RAG layer queries ES using hybrid BM25+KNN retrieval, while AIRec ingests real-time interaction logs to train a personalized recommendation model that surfaces contextually relevant documents alongside search results.
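The indexing and retrieval steps above mostly come down to assembling request bodies, so a small sketch of those bodies may help. The following TypeScript is a minimal illustration under assumptions: the ChunkDoc type, the function names, and the EMBEDDING_DIMS constant are hypothetical and not part of any SDK. It only builds payloads (the mapping, an NDJSON body for _bulk, and the KNN query) and leaves the HTTP call to whichever client you use.

```typescript
// Hypothetical shape of one chunk ready for indexing (not an SDK type).
interface ChunkDoc {
  id: string;
  content: string;
  embedding: number[];
}

// Dimension declared in the index mapping; it must match the embedding model's output.
const EMBEDDING_DIMS = 1024;

// Index mapping from the steps above, parameterized by vector dimension.
function buildMapping(dims: number = EMBEDDING_DIMS) {
  return {
    mappings: {
      properties: {
        content: { type: "text", analyzer: "ik_max_word" },
        embedding: { type: "dense_vector", dims },
      },
    },
  };
}

// Build the NDJSON body for the Elasticsearch _bulk endpoint: one action line
// followed by one source line per document, with a trailing newline.
// Send it with Content-Type: application/x-ndjson.
function buildBulkBody(index: string, docs: ChunkDoc[], dims: number = EMBEDDING_DIMS): string {
  const lines: string[] = [];
  for (const doc of docs) {
    // Fail early on a dimension mismatch instead of letting the cluster reject the batch.
    if (doc.embedding.length !== dims) {
      throw new Error(`doc ${doc.id}: embedding has ${doc.embedding.length} dims, mapping expects ${dims}`);
    }
    lines.push(JSON.stringify({ index: { _index: index, _id: doc.id } }));
    lines.push(JSON.stringify({ content: doc.content, embedding: doc.embedding }));
  }
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}

// Build the KNN query with a full-text match filter, mirroring the retrieval step.
function buildKnnQuery(queryVector: number[], queryText: string, k: number = 5) {
  return {
    query: {
      knn: {
        field: "embedding",
        query_vector: queryVector,
        k,
        filter: { match: { content: queryText } },
      },
    },
  };
}
```

Keeping payload construction separate from transport makes the worker easy to unit test, and the dimension check catches a mapping mismatch before any request is sent.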
Prerequisites:
- Supabase project with documents and user_interactions tables
- DASHSCOPE_API_KEY and AIREC_ACCESS_KEY configured in environment
- Elasticsearch cluster with the ik analyzer and dense_vector field support enabled
- @supabase/supabase-js, elasticsearch, and dashscope SDKs installed

Common pitfalls:
- Embedding dimension mismatch: text-embedding-v2 outputs 1024 dims. If the ES mapping uses 768, indexing fails. Explicitly align dims in the index template.
- Missing OSS permissions: the service role lacks oss:GetObject rights, causing extraction timeouts. Attach AliyunOSSReadOnlyAccess to the service role.
- Chunking: chunk extracted text with overlap=128 and sentence-boundary splitting.

Q: How can I build a full-stack document application with PDF extraction, search, RAG chatbot, and AI recommendations?
A: You can build this application by using Supabase as the primary datastore, Bailian for extracting text from PDFs, Elasticsearch for indexing and search, and AIRec for semantic recommendations. This combination delivers an intelligent platform featuring unified full-text search, a RAG-based chatbot for conversational Q&A, and personalized content suggestions.
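For the chunking guidance above (overlap=128 and sentence-boundary splitting), one possible TypeScript sketch follows. It rests on assumptions the guide does not state: overlap is measured in characters rather than tokens, the 512-character chunk limit is a hypothetical default, and sentences are detected with a naive punctuation regex that will also split on decimals such as "3.14". The function names are illustrative only.

```typescript
// Split text into sentences on ., !, ? and their CJK equivalents, keeping the terminator.
// Naive by design: abbreviations and decimals are not handled.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?。！？]+[.!?。！？]*\s*/g);
  return matches ? matches.map((s) => s.trim()).filter((s) => s.length > 0) : [];
}

// Group sentences into chunks of at most maxChars characters. Each new chunk
// starts with the trailing sentences of the previous one, up to `overlap`
// characters, so context is not cut at chunk borders.
// A single sentence longer than maxChars still becomes one oversized chunk.
function chunkText(text: string, maxChars: number = 512, overlap: number = 128): string[] {
  if (overlap >= maxChars) {
    throw new Error("overlap must be smaller than maxChars");
  }
  const sentences = splitSentences(text);
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLen = 0;

  for (const sentence of sentences) {
    if (current.length > 0 && currentLen + sentence.length + 1 > maxChars) {
      chunks.push(current.join(" "));
      // Walk backwards and carry whole sentences forward while they fit in `overlap`.
      const carried: string[] = [];
      let carriedLen = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        if (carriedLen + current[i].length > overlap) break;
        carried.unshift(current[i]);
        carriedLen += current[i].length + 1;
      }
      current = carried;
      currentLen = carriedLen;
    }
    current.push(sentence);
    currentLen += sentence.length + 1;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Carrying whole sentences rather than a fixed character window keeps every chunk readable on its own, at the cost of an overlap that varies with sentence length. If your embedding budget is token-based, swap the character counts for a tokenizer's counts.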