AI-Driven Search Knowledge Platform

Bailian AI agents autonomously create and curate knowledge content in Notion, secured via IDaaS M2M auth. A parallel pipeline ingests that content into Elasticsearch, where PAI-trained ranking models order the results. Together these form a self-maintaining enterprise search platform in which the AI both produces the knowledge base and optimizes its discoverability. Search is gated behind IDaaS end-user authentication and served via a Vercel frontend on Alinux with Cloudflare CDN.

Products involved

Scenario

Use this pipeline when building a self-maintaining enterprise search platform where an autonomous Bailian AI agent generates and curates knowledge in Notion, while a parallel ML pipeline indexes that content into Elasticsearch with PAI-optimized ranking. It suits teams that require zero-touch content updates, secure M2M orchestration, and globally distributed, identity-gated search delivery.

Integration steps

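  1. Provision Infrastructure: Run `terraform apply -var="ecs_instance_type=ecs.c6.xlarge" -var="oss_bucket=search-artifacts"` to deploy the ECS nodes that host Elasticsearch and the OSS bucket that stores PAI model weights.
  2. Initialize Notion CMS: Create a Notion database, create an internal integration with the read, update, and insert content capabilities, share the database with that integration, and record the integration token and the DATABASE_ID.
  3. Scaffold Notion MCP Server: Configure mcp.json with:

     ```json
     {
       "server": "notion-build-ai",
       "env": { "NOTION_TOKEN": "<token>", "DATABASE_ID": "<id>" }
     }
     ```

     Deploy via `npx @modelcontextprotocol/server-notion`.
  4. Configure Bailian Agent & IDaaS M2M: In Bailian, attach the MCP endpoint. Enable M2M auth with the OAuth 2.0 client credentials grant: the agent calls `POST /v1/idaas/oauth/token` with `grant_type=client_credentials`, `client_id`, and `client_secret`, then presents the returned access token on its MCP calls. Because this flow relies on a client secret, store the secret in a secrets manager rather than in the agent configuration. Set the agent system prompt to autonomously generate, structure, and publish assets.
  5. Build ES Ingestion Pipeline: Define the pipeline with `PUT /_ingest/pipeline/notion-sync` so that it normalizes incoming documents to `title`, `body`, and `metadata`. An ingest pipeline runs at index time and does not receive webhooks itself, so have the ECS-hosted webhook receiver index each Notion payload with `POST /knowledge_base/_doc?pipeline=notion-sync`.
  6. Integrate PAI Ranking: Train a learning-to-rank model on PAI and export it to OSS. Painless cannot load an external model, so an expression such as `pai_rank_model.predict(_source.features)` does not run inside a `script_score` query as written. Apply the model at query time either through a learning-to-rank mechanism available in your Elasticsearch version, or by passing the exported weights as script `params` and computing the score from doc values.
  7. Deploy Vercel Frontend + Cloudflare CDN: Run `vercel deploy --prod` with NEXT_PUBLIC_IDAAS_CLIENT_ID and NEXT_PUBLIC_ES_ENDPOINT set. Variables prefixed with NEXT_PUBLIC_ are inlined into the browser bundle, so they must not hold secrets. Point your Alinux edge domain to Cloudflare, set `Cache-Control: public, max-age=3600` for static assets, and enable Always Online as a CDN fallback.
  8. Enforce IDaaS End-User Auth: Implement Vercel middleware that validates OIDC tokens (for example, `import { getSession } from '@auth/nextjs';`) and proxies only authenticated search requests to ES with `Authorization: Bearer <jwt>`.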
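
The ingestion step above can be sketched in Python as two pure functions: one that flattens a Notion page object into the `title`, `body`, `metadata` shape, and one that builds the Elasticsearch index request routed through the `notion-sync` pipeline. This is a minimal sketch under assumptions, not the platform's actual receiver: it assumes the webhook handler has already fetched the full page object and its plain-text body from the Notion API, and the function names `normalize_page` and `build_index_request` are hypothetical. It uses `PUT /knowledge_base/_doc/<id>` with the Notion page ID as the document ID (an illustrative choice, so that a re-edited page overwrites its earlier version instead of creating a duplicate).

```python
import json
from urllib.parse import quote


def normalize_page(page: dict, body_text: str) -> dict:
    """Flatten a Notion page object into the title/body/metadata shape."""
    title = ""
    # A Notion page has exactly one property of type "title", under any name.
    for prop in page.get("properties", {}).values():
        if prop.get("type") == "title":
            title = "".join(part.get("plain_text", "") for part in prop.get("title", []))
            break
    return {
        "title": title.strip(),
        "body": body_text.strip(),
        "metadata": {
            "notion_id": page.get("id"),
            "url": page.get("url"),
            "last_edited_time": page.get("last_edited_time"),
        },
    }


def build_index_request(doc: dict, index: str = "knowledge_base",
                        pipeline: str = "notion-sync") -> tuple[str, str, str]:
    """Return (method, path, json_body) for indexing through the ingest pipeline."""
    doc_id = quote(doc["metadata"]["notion_id"], safe="")
    path = f"/{index}/_doc/{doc_id}?pipeline={quote(pipeline, safe='')}"
    return "PUT", path, json.dumps(doc)


# Hypothetical sample page, trimmed to the fields the sketch reads.
sample_page = {
    "id": "abc-123",
    "url": "https://www.notion.so/abc123",
    "last_edited_time": "2024-01-01T00:00:00.000Z",
    "properties": {
        "Name": {"type": "title", "title": [{"plain_text": "Reset "}, {"plain_text": "VPN"}]},
    },
}
sample_doc = normalize_page(sample_page, "  Open the client and choose Reset.  ")
method, path, body = build_index_request(sample_doc)
```

The returned tuple can be handed to whatever HTTP client the receiver uses; keeping the functions free of network calls makes the normalization easy to unit test.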

Architecture

Bailian agents call Notion via MCP using IDaaS M2M tokens to create and update pages. Notion webhooks trigger an ECS-hosted ingestion pipeline that normalizes JSON payloads and pushes them to Elasticsearch. PAI-trained ranking models score results at query time inside Elasticsearch. End users authenticate via IDaaS OIDC, query through the Vercel frontend, and receive ranked results, with static assets cached at the Cloudflare/Alinux edge.
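
One way to apply a trained ranking model at query time, sketched in Python under stated assumptions: if the PAI model is linear (or can be approximated by linear weights), the weights can be passed as `params` to a `script_score` query and combined with the text relevance score. The function name `build_ranked_query` and the feature fields `freshness` and `click_rate` are hypothetical, and the sketch assumes every indexed document carries those fields as numeric doc values. A non-linear model would need a different mechanism.

```python
def build_ranked_query(text: str, weights: dict[str, float]) -> dict:
    """Build a script_score query body that adds weighted feature values to _score."""
    for name in weights:
        # Field names are interpolated into the script, so restrict them to identifiers.
        if not name.isidentifier():
            raise ValueError(f"unsafe feature name: {name!r}")
    terms = [f"params.weights['{name}'] * doc['{name}'].value" for name in sorted(weights)]
    # script_score rejects negative scores, so clamp the result at zero.
    source = "Math.max(0.0, " + " + ".join(["_score"] + terms) + ")"
    return {
        "query": {
            "script_score": {
                "query": {"multi_match": {"query": text, "fields": ["title^2", "body"]}},
                "script": {"source": source, "params": {"weights": dict(weights)}},
            }
        }
    }


# Hypothetical weights exported from a trained model.
ranked = build_ranked_query("reset vpn", {"freshness": 0.3, "click_rate": 1.5})
```

Passing the weights as `params` instead of hard-coding them keeps the script text stable across retraining, which lets Elasticsearch reuse the compiled script.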

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How does the platform operate as an autonomous AI knowledge base with a self-maintaining search pipeline? A: The platform functions as a self-maintaining enterprise search system where Bailian AI agents autonomously create and curate knowledge content in Notion. This content is automatically ingested into Elasticsearch with PAI-trained ranking models to continuously optimize discoverability without manual intervention.

Q: How do Bailian agents feed Elasticsearch using PAI ranking models? A: Bailian agents do not write to Elasticsearch directly. They create and update pages in Notion, and a parallel ingestion pipeline normalizes that content and indexes it into Elasticsearch. The PAI-trained ranking model is then applied at query time, so newly generated knowledge becomes searchable and ranked as soon as ingestion completes.

Q: How does IDaaS secure search access for AI-generated content? A: IDaaS protects the platform by enforcing end-user authentication for search queries while simultaneously managing machine-to-machine authorization for the AI agents. This unified identity layer ensures that both content creation and search retrieval remain securely gated.
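
The machine-to-machine half of that identity layer can be illustrated with a short Python sketch of the client credentials request. It only builds the request and parses a response, without sending anything. The token path `/v1/idaas/oauth/token` comes from the integration steps; the base URL, the function names, and the choice to send the credentials in the form body are assumptions, and the response fields `access_token` and `expires_in` follow the OAuth 2.0 specification instead of any IDaaS-specific documentation.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/idaas/oauth/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_token_response(raw: bytes) -> tuple[str, int]:
    """Extract the access token and its lifetime in seconds from the JSON response."""
    payload = json.loads(raw)
    return payload["access_token"], int(payload.get("expires_in", 0))


# Hypothetical host and credentials, for illustration only.
token_request = build_token_request("https://idaas.example.com/", "agent-client", "s3cret")
token, lifetime = parse_token_response(b'{"access_token": "abc", "expires_in": 3600}')
```

To obtain a real token, the request object would be passed to `urllib.request.urlopen` and the response body to `parse_token_response`; the agent would then cache the token until shortly before `expires_in` elapses.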