A developer uses PAI to fine-tune a domain-specific LLM and to train custom embedding models, builds a vector retrieval pipeline on OpenSearch/Elasticsearch and OSS, and then deploys the whole RAG system through Bailian. The custom LLM generates answers grounded in context retrieved with the custom embeddings, for maximum domain accuracy.
This integration is required when developers need a fully customized Retrieval-Augmented Generation (RAG) system that leverages proprietary domain data. By fine-tuning both a domain-specific LLM and custom embedding models on PAI, then orchestrating vector retrieval via OpenSearch/Elasticsearch and deploying the inference pipeline through Bailian, teams achieve maximum domain accuracy and controlled data residency.
1. Fine-tune the domain LLM on PAI:

    aliyun pai CreateJob --JobName "domain-llm-sft" --AlgorithmSpec "qwen2.5-7b-sft" --DatasetUri "oss://<bucket>/sft_data.jsonl" --InstanceType "ecs.gn7i-c8g1.2xlarge"

2. Train the custom embedding model on PAI:

    aliyun pai CreateJob --JobName "custom-emb-train" --AlgorithmSpec "text-embedding-v3" --DatasetUri "oss://<bucket>/emb_corpus.csv" --OutputUri "oss://<bucket>/models/emb_v1"

3. Register the fine-tuned LLM in Bailian:

    aliyun bailian CreateModel --ModelName "domain-llm-v1" --ModelSource "PAI" --ModelId "<pai-job-id>" --EndpointType "managed"

4. Create the vector index. The mapping below uses Elasticsearch syntax (dense_vector with dims):

    PUT /rag_vectors
    {
      "mappings": {
        "properties": {
          "embedding": { "type": "dense_vector", "dims": 1024, "index": true, "similarity": "cosine" },
          "chunk_text": { "type": "text" }
        }
      }
    }

   On an OpenSearch k-NN index, set "index.knn": true under settings and declare the field as "type": "knn_vector" with "dimension": 1024 and "method": { "name": "hnsw", "space_type": "cosinesimil" } instead.

5. Ingest the chunk embeddings:

    POST /_bulk with {"index": {"_index": "rag_vectors"}} payloads

6. Create the RAG application in Bailian:

    aliyun bailian CreateApplication --Name "DomainRAG" --Model "domain-llm-v1" --RetrievalConfig '{"vector_store": "opensearch", "endpoint": "<es-endpoint>", "index": "rag_vectors", "top_k": 5}'

7. Invoke the application:

    aliyun bailian InvokeApplication --AppId "<app-id>" --Query "domain-specific-question" --Stream true

PAI acts as the training engine for both the generative LLM and the embedding models. OSS serves as the centralized data lake for raw datasets, training artifacts, and chunked documents. OpenSearch/Elasticsearch hosts the dense vector indexes for low-latency k-NN retrieval.

Bailian orchestrates the runtime RAG workflow: it intercepts user queries, executes hybrid retrieval against OpenSearch, injects the top-k context into the prompt, and routes generation to the deployed Bailian LLM endpoint.
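The ingestion and runtime steps above can be sketched in Python as three pure helpers: one builds the NDJSON body for POST /_bulk, one builds a top-k vector query, and one injects retrieved chunks into the prompt. This is a minimal sketch under stated assumptions, not Bailian's actual implementation. The function names, the prompt template, and the num_candidates heuristic are hypothetical, and the query body assumes Elasticsearch 8.x kNN search syntax (OpenSearch uses a different query shape).

```python
import json
from typing import Dict, List, Sequence

INDEX_NAME = "rag_vectors"  # index name taken from the steps above


def build_bulk_payload(chunks: Sequence[str], vectors: Sequence[Sequence[float]]) -> str:
    """Return an NDJSON body for POST /_bulk: one action line and one document line per chunk."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    lines: List[str] = []
    for text, vec in zip(chunks, vectors):
        lines.append(json.dumps({"index": {"_index": INDEX_NAME}}))
        lines.append(json.dumps({"chunk_text": text, "embedding": list(vec)}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def build_knn_query(query_vector: Sequence[float], top_k: int = 5) -> Dict:
    """Build a kNN search body (assumption: Elasticsearch 8.x top-level "knn" option)."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": top_k,
            # Hypothetical heuristic: scan more candidates than k for better recall.
            "num_candidates": max(top_k * 10, 50),
        },
        "_source": ["chunk_text"],
    }


def build_prompt(question: str, hits: Sequence[Dict]) -> str:
    """Inject retrieved chunks into a grounded prompt (hypothetical template)."""
    context = "\n".join(
        f"[{i}] {hit['_source']['chunk_text']}" for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In a real pipeline the query vector would come from the PAI-trained embedding model and the hits from the search response; both are left out here so the sketch stays self-contained.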
- Required RAM policies: AliyunPAIFullAccess, AliyunBailianFullAccess, and AliyunOpenSearchFullAccess.
- GPU instance quota for the PAI training jobs (ecs.gn7i series).
- The dims parameter in the OpenSearch/Elasticsearch index must exactly match the output dimension of the PAI-trained embedding model (e.g., 1024 vs 768); otherwise vector ingestion fails.
- Bailian needs a service role (AliyunServiceRoleForBailian) to read OSS chunks; a missing sts:AssumeRole permission breaks the ingestion pipeline.
- If retrieval_config lacks temperature: 0.1 and strict system_prompt constraints, the LLM may hallucinate instead of strictly using the retrieved context.

Q: How do I build and deploy a fully custom RAG system using fine-tuned LLMs and custom embeddings?

A: Use PAI to fine-tune a domain-specific LLM and train custom embedding models, then route the entire pipeline through Bailian. The architecture pairs these trained components with a vector retrieval pipeline built on OpenSearch or Elasticsearch and OSS. This configuration keeps the custom LLM's answers grounded in content retrieved with your custom embeddings, for maximum domain accuracy.
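The dimension mismatch noted above can be caught before ingestion with a small pre-flight check. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the mapping was fetched with GET /rag_vectors/_mapping and uses the Elasticsearch dense_vector field with a dims key (on an OpenSearch knn_vector field the key would be dimension).

```python
from typing import Dict, List, Sequence


def index_dims(mapping_response: Dict, index: str = "rag_vectors", field: str = "embedding") -> int:
    """Read the declared vector dimension from a GET /<index>/_mapping response."""
    props = mapping_response[index]["mappings"]["properties"]
    return int(props[field]["dims"])


def find_dim_mismatches(vectors: Sequence[Sequence[float]], expected_dims: int) -> List[int]:
    """Return the positions of vectors whose length differs from the index dimension."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dims]


# Example: a 1024-dim index receiving one 768-dim vector by mistake.
sample_mapping = {
    "rag_vectors": {
        "mappings": {
            "properties": {
                "embedding": {"type": "dense_vector", "dims": 1024},
                "chunk_text": {"type": "text"},
            }
        }
    }
}
batch = [[0.0] * 1024, [0.0] * 768, [0.0] * 1024]
bad_positions = find_dim_mismatches(batch, index_dims(sample_mapping))
```

Rejecting a batch when bad_positions is non-empty surfaces the problem with the offending positions, rather than as a failure partway through a bulk request.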