Cloud Migration with AI Search and Recommendations

A team migrates legacy databases to Alibaba Cloud using ECS snapshots staged through OSS into RDS, then builds a document-aware application layering Bailian OCR extraction, Elasticsearch indexing, RAG-powered semantic search, and AIRec personalized recommendations on the newly migrated data.

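Products involved

Elastic Compute Service (ECS), Object Storage Service (OSS), ApsaraDB RDS, Bailian, Elasticsearch, AIRec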

Scenario

Use this workflow when moving a legacy, self-managed database to Alibaba Cloud while building a document-centric application on the same data. Relational data moves through ECS snapshots and OSS staging into managed RDS. Bailian OCR, Elasticsearch full-text search, RAG semantic retrieval, and AIRec personalization are then layered on the migrated dataset.

Integration steps

  1. Snapshot & Stage Legacy Data: Snapshot the data disk of the legacy ECS instance, then export a custom image built from it to the OSS staging bucket. `CreateSnapshot` takes a disk ID rather than an instance ID; `d-uf612345` and `img-legacy` are placeholders.

     ```bash
     aliyun ecs CreateSnapshot --DiskId d-uf612345 --SnapshotName legacy_db_snap
     aliyun ecs ExportImage --ImageId img-legacy --OSSBucket migration-staging --OSSPrefix db-dump/
     ```

  2. Import to RDS: Restore the staged dump into a new RDS instance. This step assumes a logical dump (`full.sql`) is present under `db-dump/` in the staging bucket.

     ```bash
     aliyun rds ImportDataFromOSS --DBInstanceId rm-uf6789 --OSSBucket migration-staging --FileName db-dump/full.sql --Engine PostgreSQL
     ```

  3. Extract Documents with Bailian: Trigger async OCR on uploaded files.

     ```bash
     curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/document-analysis/async \
       -H "Authorization: Bearer $BAILIAN_KEY" \
       -H "Content-Type: application/json" \
       -d '{"model": "doc-ocr-v2", "input": {"url": "oss://app-docs/invoice_001.pdf"}, "parameters": {"ocr_type": "general"}}'
     ```

  4. Index in Elasticsearch: Sync RDS metadata and Bailian JSON output using the `_bulk` API. Define a hybrid mapping for keyword + vector search. The `dense_vector` field type is specific to Elasticsearch; OpenSearch uses a different vector field type and needs its own mapping. Setting `index` and `similarity` explicitly keeps the field searchable with kNN on Elasticsearch 8.x versions where indexing is not the default.

     ```json
     PUT /docs_index
     {
       "mappings": {
         "properties": {
           "content": { "type": "text" },
           "embedding": {
             "type": "dense_vector",
             "dims": 768,
             "index": true,
             "similarity": "cosine"
           }
         }
       }
     }
     ```

  5. Enable RAG Semantic Search: Query the vector field with kNN search. The query vector must have the same dimension as the mapping (768), and `num_candidates` must be at least `k`.

     ```json
     POST /docs_index/_search
     {
       "knn": {
         "field": "embedding",
         "query_vector": [0.12, -0.45, ...],
         "k": 5,
         "num_candidates": 10
       }
     }
     ```

  6. Configure AIRec Recommendations: Push document metadata and user telemetry to AIRec.

     ```bash
     curl -X POST https://airec.cn-shanghai.aliyuncs.com/v2/openapi/instances/$INSTANCE_ID/data \
       -H "Authorization: Bearer $AIREC_KEY" \
       -H "Content-Type: application/json" \
       -d @doc_behavior_payload.json
     ```
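The indexing step mentions the `_bulk` API without showing a request body. The following is a minimal sketch of how that body could be assembled, not a definitive implementation. It assumes each record is a dictionary with hypothetical keys `id` (the RDS primary key), `content` (text extracted by Bailian), and `embedding` (a vector produced elsewhere); none of these key names come from the original workflow. The `_bulk` API expects newline-delimited JSON: one action line followed by one source line per document, with a trailing newline.

```python
import json


def build_bulk_payload(index_name, docs):
    """Return an NDJSON string suitable as the body of an Elasticsearch _bulk request.

    `docs` is an iterable of dicts with the assumed keys "id", "content",
    and "embedding" (hypothetical names chosen for this sketch).
    """
    lines = []
    for doc in docs:
        # Action line: index the document under its RDS primary key.
        action = {"index": {"_index": index_name, "_id": str(doc["id"])}}
        lines.append(json.dumps(action))
        # Source line: only the fields defined in the docs_index mapping.
        source = {"content": doc["content"], "embedding": doc["embedding"]}
        lines.append(json.dumps(source, ensure_ascii=False))
    if not lines:
        return ""
    # The _bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Short vectors for readability; the real mapping expects 768 dimensions.
    sample = [
        {"id": 1, "content": "Invoice 001", "embedding": [0.1, 0.2]},
        {"id": 2, "content": "Invoice 002", "embedding": [0.3, 0.4]},
    ]
    print(build_bulk_payload("docs_index", sample), end="")
```

The resulting string would be sent to `POST /_bulk` with the `Content-Type: application/x-ndjson` header. Using the RDS primary key as `_id` makes re-running the sync overwrite existing documents instead of duplicating them.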

Architecture

Legacy data flows from ECS → OSS → RDS for structured persistence. The application reads RDS records, routes files to Bailian for OCR extraction, and pipes both metadata and extracted text into Elasticsearch. ES handles hybrid keyword/vector retrieval, while AIRec ingests behavioral events and document attributes to serve personalized recommendations. All services communicate over a shared VPC with strict security group isolation.
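The hybrid keyword/vector retrieval described above can be expressed as a single search request. The sketch below builds such a request body under the assumption that the cluster runs an Elasticsearch 8.x release that accepts a `query` clause and a top-level `knn` clause in the same search, and that the index uses the `content` and `embedding` fields from the mapping in the integration steps. The function name and its default values are illustrative, not part of any Alibaba Cloud or Elasticsearch SDK.

```python
import json


def build_hybrid_search(query_text, query_vector, k=5, num_candidates=50):
    """Build a search body that combines a keyword match with kNN retrieval.

    Hypothetical helper for this sketch; field names follow the docs_index mapping.
    """
    if num_candidates < k:
        # Elasticsearch rejects kNN requests where num_candidates is below k.
        raise ValueError("num_candidates must be >= k")
    return {
        # Lexical side: full-text match on the extracted document text.
        "query": {"match": {"content": query_text}},
        # Vector side: approximate nearest neighbours on the embedding field.
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        "size": k,
    }


if __name__ == "__main__":
    # Two-element vector for readability; the real mapping expects 768 dimensions.
    body = build_hybrid_search("overdue invoice", [0.12, -0.45], k=3, num_candidates=20)
    print(json.dumps(body, indent=2))
```

The body would be posted to `/docs_index/_search`. In a RAG flow, the `content` of the returned hits is what gets passed to the language model as context.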

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How do I migrate legacy data to build a document-aware AI search and recommendation system?
A: Stage ECS snapshots through OSS into RDS, then build the application on the migrated data: Bailian OCR extracts document text, Elasticsearch indexes it for hybrid keyword and vector search, RAG uses the retrieved passages for semantic search, and AIRec serves personalized recommendations.