Train domain-specific embedding models on PAI and store the optimized vector indexes in OSS (Skill 1). Then feed those custom embeddings into a fully custom RAG pipeline in which a fine-tuned LLM deployed on Bailian generates answers grounded in the vector-retrieved context (Skill 3). The result is a complete train-to-inference loop with both custom embeddings and a custom generative model.
Use this workflow when generic embeddings and foundation models underperform on proprietary domain data and a fully customized RAG pipeline is required. Training both the embedding model and the generative LLM on PAI, and orchestrating them through OSS, Elasticsearch/OpenSearch, and Bailian, targets high-precision semantic retrieval and domain-accurate generation in a single train-to-inference loop.
1. Upload the raw documents to OSS:
   ossutil cp -r ./data oss://<bucket>/raw-docs/
2. Train the embedding model on PAI:
   pai submit --workspace <ws-id> --job-name emb-train --framework pytorch --script train_emb.py --data oss://<bucket>/raw-docs/ --output oss://<bucket>/models/emb-v1/
3. Fine-tune the LLM on PAI:
   pai submit --job-name llm-ft --framework deepspeed --script train_llm.py --base-model qwen-7b --data oss://<bucket>/qa-pairs/ --output oss://<bucket>/models/llm-ft/
4. Register the fine-tuned model with Bailian:
   POST https://dashscope.aliyuncs.com/api/v1/models
   with payload {"model_name": "custom-llm-v1", "model_path": "oss://<bucket>/models/llm-ft/"}
5. Store the vector indexes under oss://<bucket>/vector-indexes/.
6. Create the vector index in Elasticsearch/OpenSearch:
   PUT /rag-index {"mappings": {"properties": {"embedding": {"type": "dense_vector", "dims": 768}, "text": {"type": "text"}}}}
7. Retrieve the top-k chunks:
   POST /rag-index/_search {"knn": {"field": "embedding", "query_vector": [...], "k": 5}}
   then route chunks to Bailian:
   POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
   with {"model": "custom-llm-v1", "messages": [{"role": "user", "content": f"Context: {chunks}\nQuestion: {query}"}]}

Raw documents reside in OSS as the immutable source of truth. PAI consumes this data to train the embedding model (retrieval layer) and the generative LLM (reasoning layer). The embedding model outputs high-dimensional vectors, which are stored in OSS and indexed in Elasticsearch/OpenSearch for low-latency knn search. At runtime, the application queries the vector index, retrieves the top-k context chunks, and routes them alongside the user prompt to the Bailian-deployed LLM endpoint, closing the end-to-end RAG loop.
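The runtime half of the loop (query the vector index, collect the top-k chunks, send them to the Bailian endpoint) can be sketched as a set of request builders. This is a minimal sketch, not a definitive client: the helper names (`build_knn_request`, `extract_chunks`, `build_chat_request`) are hypothetical, the response shape assumes the standard Elasticsearch `hits.hits[]._source` layout, and joining chunks with newlines is an illustrative choice. The sketch only builds payloads; sending them (and authenticating) is left to whichever HTTP client the application already uses.

```python
import json

# Values taken from the steps above.
INDEX_NAME = "rag-index"
MODEL_NAME = "custom-llm-v1"
CHAT_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_knn_request(query_vector, k=5):
    """Return (path, body) for the knn search against the vector index."""
    body = {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
        }
    }
    return f"/{INDEX_NAME}/_search", body


def extract_chunks(search_response):
    """Collect the 'text' field of each hit (assumes an ES-style response)."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"]["text"] for hit in hits]


def build_chat_request(chunks, query):
    """Build the chat-completions payload that grounds the answer in chunks."""
    context = "\n".join(chunks)  # assumption: newline-joined context
    return {
        "model": MODEL_NAME,
        "messages": [
            {
                "role": "user",
                "content": f"Context: {context}\nQuestion: {query}",
            }
        ],
    }


if __name__ == "__main__":
    # Illustrative response; real data would come from the search call.
    sample_response = {
        "hits": {"hits": [{"_source": {"text": "example chunk"}}]}
    }
    payload = build_chat_request(
        extract_chunks(sample_response), "example question"
    )
    print(CHAT_URL)
    print(json.dumps(payload, indent=2))
```

Keeping the builders free of network calls makes the payload shapes easy to check before any endpoint is involved.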
- [...] ecs.gn7i-c8g1.2xlarge)
- ossutil, pai-cli, and Python SDK installed locally

- dims must exactly match PAI embedding output (e.g., 768 vs 1024), otherwise indexing fails silently or throws mapping errors.
- [...] knn search; skipping this degrades cosine similarity accuracy.
- [...] knn without tuning BM25 rank_feature in ES causes keyword noise to dominate semantic results.

Q: How do I build an end-to-end RAG pipeline using custom-trained embeddings and a fine-tuned LLM?
A: You can create a complete train-to-inference RAG loop by training domain-specific embedding models on PAI, storing the optimized vector indexes in OSS, and feeding them into a pipeline where a fine-tuned LLM deployed on Bailian generates answers grounded in the vector-retrieved context. This workflow is implemented through predefined integration combinations such as the Full Custom RAG and Full-Stack Custom RAG skills.
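The dims-matching and knn notes earlier in this section can be guarded against with a small pre-indexing check. This is a minimal sketch under two assumptions: that unit-length (L2) normalization is the preprocessing intended before knn search with cosine similarity, and that the index uses the 768-dimension mapping shown in the steps. The function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so cosine similarity is well behaved."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def prepare_for_index(vector, expected_dims=768):
    """Check the dimension against the index mapping, then normalize."""
    if len(vector) != expected_dims:
        raise ValueError(
            f"embedding has {len(vector)} dims, index expects {expected_dims}"
        )
    return l2_normalize(vector)
```

Raising early on a dimension mismatch surfaces the problem in the application rather than in the indexing response.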