A team migrates legacy databases to Alibaba Cloud using ECS snapshots staged through OSS into RDS, then builds a document-aware application layering Bailian OCR extraction, Elasticsearch indexing, RAG-powered semantic search, and AIRec personalized recommendations on the newly migrated data.
Use this workflow when modernizing legacy on-premises databases to Alibaba Cloud while simultaneously building a document-centric application. It enables teams to securely migrate relational data via ECS snapshots and OSS staging into managed RDS, then layer Bailian OCR, Elasticsearch full-text search, RAG semantic retrieval, and AIRec personalization on top of the unified dataset.
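Before running any of the steps, a short pre-flight check can confirm that the CLIs and credentials the workflow relies on are present. This is a minimal sketch: the tool list (`aliyun`, `curl`) and the variable names (`DASHSCOPE_API_KEY`, `AIREC_ACCESS_KEY`, from the prerequisites) are assumptions to extend for your environment.

```bash
#!/usr/bin/env bash
# Pre-flight sketch: confirm tools and credentials exist before starting the migration.
# ASSUMPTION: the tool and variable lists below match this workflow; adjust as needed.

# Print each command name from the arguments that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Print each variable name from the arguments that is unset or empty.
missing_vars() {
  local var
  for var in "$@"; do
    [ -n "${!var:-}" ] || printf '%s\n' "$var"
  done
}

# Report everything that is missing to stderr; return 1 if anything is missing.
preflight() {
  local tools vars
  tools=$(missing_tools aliyun curl)
  vars=$(missing_vars DASHSCOPE_API_KEY AIREC_ACCESS_KEY)
  [ -z "$tools$vars" ] && return 0
  [ -n "$tools" ] && printf 'missing tool: %s\n' $tools >&2
  [ -n "$vars" ] && printf 'missing variable: %s\n' $vars >&2
  return 1
}
```

Calling `preflight || exit 1` at the top of a migration script stops it before any snapshot is taken.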
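```bash
# CreateSnapshot takes a disk ID, not an instance ID: DISK_ID is the disk of i-uf612345 that holds the database.
aliyun ecs CreateSnapshot --DiskId "$DISK_ID" --SnapshotName legacy_db_snap
# Snapshots cannot be exported directly: build a custom image from the snapshot, then export it (region cn-shanghai assumed).
aliyun ecs CreateImage --RegionId cn-shanghai --SnapshotId "$SNAPSHOT_ID" --ImageName img-legacy
aliyun ecs ExportImage --RegionId cn-shanghai --ImageId "$IMAGE_ID" --OSSBucket migration-staging --OSSPrefix db-dump/
```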
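Snapshot creation and image export run asynchronously, so a script that chains these commands needs to wait between steps. A minimal polling sketch follows; the `cn-shanghai` region, the `accomplished` status that `DescribeSnapshots` is assumed to report for a finished snapshot, and the helper names are assumptions to check against the ECS API reference.

```bash
#!/usr/bin/env bash
# Sketch: block until an asynchronous ECS step reaches a target status.
# ASSUMPTION: region, IDs, and status strings are placeholders to verify for your account.

# wait_for_status EXPECTED MAX_TRIES INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD until its stdout equals EXPECTED; returns 1 once MAX_TRIES is exhausted.
wait_for_status() {
  local expected=$1 max_tries=$2 interval=$3
  shift 3
  local i status
  for ((i = 1; i <= max_tries; i++)); do
    status=$("$@")
    [ "$status" = "$expected" ] && return 0
    sleep "$interval"
  done
  return 1
}

# Extract the first "Status" value from JSON on stdin without requiring jq.
first_status() {
  grep -o '"Status": *"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Hypothetical wrapper: print the current status of one snapshot.
snapshot_status() {
  aliyun ecs DescribeSnapshots --RegionId cn-shanghai --SnapshotIds "[\"$1\"]" | first_status
}
```

For example, `wait_for_status accomplished 60 10 snapshot_status "$SNAPSHOT_ID"` polls every 10 seconds for up to 10 minutes, where `SNAPSHOT_ID` holds the ID returned by `CreateSnapshot`. The aliyun CLI's own `--waiter` option may cover simple cases without a helper.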
```bash
# Load the staged SQL dump from OSS into the RDS PostgreSQL instance.
aliyun rds ImportDataFromOSS --DBInstanceId rm-uf6789 \
  --OSSBucket migration-staging --FileName db-dump/full.sql --Engine PostgreSQL
```
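The import step expects a plain SQL file at `db-dump/full.sql` in the `migration-staging` bucket. If that dump still has to be produced from the legacy PostgreSQL server, the sketch below shows one way, using `pg_dump --no-owner --no-privileges` to avoid the ownership-related permission errors noted in the pitfalls. The host, database, and user names are hypothetical, `ossutil` is assumed to be installed and configured, and `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: dump the legacy PostgreSQL database as plain SQL and stage it in OSS.
# ASSUMPTION: host/db/user are placeholders; ossutil is already configured.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# dump_and_stage HOST DB USER [OUTFILE]
dump_and_stage() {
  local host=$1 db=$2 user=$3 out=${4:-full.sql}
  # --no-owner / --no-privileges drop ALTER OWNER and GRANT statements that
  # reference roles which do not exist on the RDS instance.
  run pg_dump --host "$host" --username "$user" --dbname "$db" \
    --format plain --no-owner --no-privileges --file "$out"
  run ossutil cp "$out" "oss://migration-staging/db-dump/$out"
}
```

Running `DRY_RUN=1 dump_and_stage legacy-db.internal appdb migrator` first shows exactly what would execute.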
```bash
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/document-analysis/async \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "doc-ocr-v2", "input": {"url": "oss://app-docs/invoice_001.pdf"}, "parameters": {"ocr_type": "general"}}'
```
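Because the OCR request is asynchronous, the client has to pick up the task ID and poll for the result. The sketch below assumes the response follows DashScope's generic async-task shape (`{"output": {"task_id": "...", "task_status": "PENDING"}}`) and that the task is queryable at `/api/v1/tasks/<task_id>`; verify both against the actual document-analysis response. It uses `DASHSCOPE_API_KEY`, the Bailian key named in the prerequisites.

```bash
#!/usr/bin/env bash
# Sketch: follow up on the async OCR submission shown above.
# ASSUMPTION: DashScope async-task response shape and tasks endpoint (verify first).

TASKS_URL="https://dashscope.aliyuncs.com/api/v1/tasks"

# Print the task_id from a submission response read on stdin (sed-based, no jq).
task_id_from_response() {
  sed -n 's/.*"task_id": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeded, failed, and canceled tasks will not change again; anything else is in flight.
is_terminal() {
  case "$1" in
    SUCCEEDED | FAILED | CANCELED) return 0 ;;
    *) return 1 ;;
  esac
}

# poll_task TASK_ID [INTERVAL_SECONDS] [MAX_TRIES]
# Prints the terminal status and returns 0, or returns 1 if MAX_TRIES runs out.
poll_task() {
  local task_id=$1 interval=${2:-5} max_tries=${3:-60} i status
  for ((i = 0; i < max_tries; i++)); do
    status=$(curl -s -H "Authorization: Bearer $DASHSCOPE_API_KEY" "$TASKS_URL/$task_id" |
      sed -n 's/.*"task_status": *"\([^"]*\)".*/\1/p' | head -n 1)
    if is_terminal "$status"; then
      printf '%s\n' "$status"
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

Piping the submission output through `task_id_from_response` and then calling `poll_task "$TASK_ID" 5 120` waits up to ten minutes. The retry cap keeps a network failure (empty status) from looping forever.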
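Index the extracted text and metadata into Elasticsearch with the `_bulk` API, after defining a hybrid mapping for keyword and vector search. The `dense_vector` field sets `index` and `similarity` explicitly because some 8.x releases do not index vectors for kNN by default:

```json
PUT /docs_index
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "embedding": { "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }
    }
  }
}
```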
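Once the mapping exists, OCR output can be converted into `_bulk` NDJSON: one action line plus one source line per document. This sketch targets the `docs_index`, `content`, and `embedding` names from the mapping; the embedding is assumed to arrive as a precomputed JSON array string, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: turn OCR output into _bulk NDJSON for docs_index.
# ASSUMPTION: the embedding is already computed and passed as a JSON array string.

# Escape backslash, double quote, newline, tab, and CR for a JSON string literal.
# (Other control characters are not handled.)
json_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//'"'/"$bs"'"'}
  s=${s//$'\n'/"$bs"n}
  s=${s//$'\t'/"$bs"t}
  s=${s//$'\r'/"$bs"r}
  printf '%s' "$s"
}

# bulk_entry DOC_ID CONTENT EMBEDDING_JSON -> action line and source line
bulk_entry() {
  printf '{"index":{"_index":"docs_index","_id":"%s"}}\n' "$(json_escape "$1")"
  printf '{"content":"%s","embedding":%s}\n' "$(json_escape "$2")" "$3"
}
```

Append several `bulk_entry` calls to a file and send it with `curl -H 'Content-Type: application/x-ndjson' --data-binary @bulk.ndjson "$ES_URL/_bulk"`. Use `--data-binary` rather than `-d`, because `-d` strips the newlines that `_bulk` depends on.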
Retrieve semantically similar passages for RAG with a `knn` search:

```json
POST /docs_index/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.45, ...],
    "k": 5,
    "num_candidates": 10
  }
}
```
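Hybrid retrieval sends a `match` query on `content` and the `knn` clause in the same request. In recent Elasticsearch 8.x releases, `query` and `knn` can be combined in one `_search` body, and their scores are summed. This sketch builds that body; the function names are hypothetical, and the query vector must be supplied as a JSON array.

```bash
#!/usr/bin/env bash
# Sketch: build a hybrid _search body (BM25 match on "content" + kNN on "embedding").
# ASSUMPTION: the query embedding is computed elsewhere and passed as a JSON array.

# Escape backslashes and double quotes and fold newlines into spaces for a JSON string.
escape_query() {
  printf '%s' "$1" | tr '\n' ' ' | sed 's/[\\"]/\\&/g'
}

# hybrid_query TEXT VECTOR_JSON [K] [NUM_CANDIDATES]
hybrid_query() {
  local text vector=$2 k=${3:-5} num_candidates=${4:-10}
  text=$(escape_query "$1")
  printf '{"query":{"match":{"content":"%s"}},"knn":{"field":"embedding","query_vector":%s,"k":%d,"num_candidates":%d},"size":%d}\n' \
    "$text" "$vector" "$k" "$num_candidates" "$k"
}
```

Post the result with `curl -H 'Content-Type: application/json' -d "$(hybrid_query 'invoice total' "$VEC")" "$ES_URL/docs_index/_search"`. If keyword matches dominate or vanish, per-clause `boost` values can rebalance the two scores.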
```bash
curl -X POST "https://airec.cn-shanghai.aliyuncs.com/v2/openapi/instances/$INSTANCE_ID/data" \
  -H "Authorization: Bearer $AIREC_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d @doc_behavior_payload.json
```
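Recommendations stay weak until roughly 10k behavior events have been pushed, so seeding synthetic interactions during migration helps. This sketch writes such a payload. The record layout (`{"cmd":"add","fields":{...}}`), the field names (`user_id`, `item_id`, `item_type`, `bhv_type`, `bhv_time`), and the `click` value are assumptions modeled on AIRec's behavior table. A real instance may require more fields, so check the schema before pushing.

```bash
#!/usr/bin/env bash
# Sketch: generate synthetic document-click events to seed AIRec.
# ASSUMPTION: record layout and field names must be verified against the instance schema.

# synthetic_events COUNT USERS ITEMS -> JSON array of COUNT events (USERS, ITEMS > 0)
synthetic_events() {
  local count=$1 users=$2 items=$3 i now sep=""
  now=$(date +%s)
  printf '['
  for ((i = 0; i < count; i++)); do
    # Random user and document IDs; all events share the generation timestamp.
    printf '%s{"cmd":"add","fields":{"user_id":"u%d","item_id":"doc%d","item_type":"doc","bhv_type":"click","bhv_time":"%d"}}' \
      "$sep" $((RANDOM % users)) $((RANDOM % items)) "$now"
    sep=","
  done
  printf ']\n'
}
```

For example, `synthetic_events 10000 500 2000 > doc_behavior_payload.json` produces the file that the push request above sends.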
Legacy data flows from ECS → OSS → RDS for structured persistence. The application reads RDS records, routes files to Bailian for OCR extraction, and pipes both metadata and extracted text into Elasticsearch. ES handles hybrid keyword/vector retrieval, while AIRec ingests behavioral events and document attributes to serve personalized recommendations. All services communicate over a shared VPC with strict security group isolation.
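The step where the application reads RDS records and routes their files to Bailian can be sketched as a filter. It takes tab-separated rows exported with `psql -At -F $'\t'` and emits one OCR request body per row, using the payload shape from the OCR call. The `documents` table and its `doc_id` / `file_url` columns are hypothetical. Each output line keeps the `doc_id` next to its request, so the extracted text can later be joined back to the RDS record when it is indexed in Elasticsearch.

```bash
#!/usr/bin/env bash
# Sketch: route RDS document records to OCR.
# ASSUMPTION: hypothetical schema, exported e.g. with
#   psql -At -F $'\t' -c 'SELECT doc_id, file_url FROM documents' > rows.tsv

# Read "doc_id<TAB>file_url" rows on stdin; print "doc_id<TAB>request_json" per row.
# URLs are not JSON-escaped, so they must not contain double quotes or backslashes.
ocr_requests_from_rows() {
  local doc_id url
  while IFS=$'\t' read -r doc_id url || [ -n "$doc_id" ]; do
    [ -n "$url" ] || continue  # skip records with no stored file
    printf '%s\t{"model": "doc-ocr-v2", "input": {"url": "%s"}, "parameters": {"ocr_type": "general"}}\n' \
      "$doc_id" "$url"
  done
}
```

`ocr_requests_from_rows < rows.tsv` yields one request per stored file and silently skips rows without one.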
Prerequisites:

- Infrastructure provisioned with Terraform (`terraform apply -var-file=prod.tfvars`)
- API keys exported as `DASHSCOPE_API_KEY` (Bailian) and `AIREC_ACCESS_KEY`
- RAM policies `AliyunECSFullAccess`, `AliyunOSSFullAccess`, and `AliyunRDSFullAccess`

Common pitfalls:

- Dumps created without `--no-owner`, or in a mismatched SQL dialect, trigger permission errors during restoration.
- … `split_pages: true`.
- … `text` vs `keyword` analyzers break hybrid search relevance; always define explicit mappings before bulk indexing.
- … `PushBehavior` exceeds ~10k events; seed with synthetic interactions during migration.

Q: How do I migrate legacy data to build a document-aware AI search and recommendation system?

A: You can migrate legacy databases to Alibaba Cloud by staging ECS snapshots through OSS into RDS, then building a document-aware application that layers Bailian OCR extraction, Elasticsearch indexing, RAG-powered semantic search, and AIRec personalized recommendations. The result puts the migrated relational records and the OCR-extracted document text behind a single hybrid keyword/vector search index, with personalized recommendations on top.