# Katailyst Docs Full Corpus for LLMs

This file is generated from curated markdown docs in `docs/` plus selected repo-root primer files. Excluded: `docs/reports/**`, `docs/wiki/**`, `docs/planning/**`, `docs/plans/**`, `docs/investigations/**`.

## Source: AGENTS.md

# AGENTS.md — Canonical Repo Guidance (Single Entry Point)

This is the **one main entry point**. `CLAUDE.md` and `PRIMING.md` point here. For external agents (Victoria/OpenClaw), also read `CATALYST.md` — the armory integration guide. Before a deep repo scan, load `/.well-known/llms.txt` (or `/llms.txt`) for the curated agent index.

## What This Repo Is

Read `docs/VISION.md` for the full system vision. The short version: Katailyst is a connected AI knowledge system: 1,400+ building blocks (skills, prompts, knowledge bundles, brand voice, tools, schemas, playbooks, rubrics) connected by a 10,000+ link knowledge graph, served to any AI agent via MCP.

Detailed planning, research, and audience study are keystones of almost everything we do, typically done with subagents. That research, combined with the North Star v2, is used to hone in on what the user actually wants underneath the words, to know the audience and their perspective, and, most importantly, to know "who is winning": prevalidated tactics, tools, topics, methods, strategies, and so on.

The orchestrating agent gets the final call. The agent on the ground, with the most context and the best understanding of the user's intent and the current situation after deep mapping and thought, is the one who decides, NOT the repo. The orchestrating agent decomposes the task and quickly dispatches several agents to the graph (Opus agents whenever possible, with rich, detailed prompts), scouring the Katailyst MCP vector-first for each part of the decomposed user message to find the right area, then traversing from there.
Those agents bring back an armada of weaponry for god-tier AI supercharging, catalyzing capability that the orchestrator then gets to choose from as its horde of agents returns. If it is an IDE agent working on code, then obviously many files are read and a deep, detailed plan is made. Never cut corners. Always read surrounding files. Always think about what the user really wants. Never force orchestration on an agent, even though that is your instinct; instead, give agents the building blocks and let the agent decide. Playbooks and sidecars help us adapt somewhat, but do not slip into "I know everything months in advance, even though I have no context, and I know what's best for an agent and user I know nothing about." The agent in the moment has the same or more intelligence plus full situational and user context; it makes no sense to claim you are smarter than the version of you with that rich information. Not everything needs to be lined up perfectly, either. If you are handed a skill, a KB, and a schema all related to the same topic, would you consider using them? Yes: you would decide how to use them (and whether they are useful, or should be left behind). That is what we need from our agents. You do not control them, so stop trying; we give context, skills, and power, but the agent is the end boss. (Some areas do have rules, so this is not purely black and white.)

In general, always use systems thinking, always plan and research deeply, and think critically about the underlying core or root cause: ask "why" often and really think things through. Cutting corners or half-assing will not be tolerated. A shoddy mock-up is not acceptable; deliver the real thing after deep planning, wide review, careful step-by-step thinking before beginning, and meticulous design with UI/UX pulled from the best of the web, like a 21st-century dev. "This is only internal, so I'll make it look bland" is never acceptable for this registry.
No: perception shapes reality. MCP, with its dozens of ways to connect, is by far the main entry point, and agents are the main editors. We do not see everything; for example, various agents run on Render disks using the Katailyst MCP through Claude. If you can get more info about outputs through the MCP, then yes, save everything. If that is a pain, slightly patchy is fine.

It's an armory, not a commander, and it requires CONSTANT systems thinking. If you're adding a name, does it fit coherently with the other entities? We want to limit cognitive load by having families in skills, KBs, schemas, and so on. We want artifacts to layer. Use common sense and frequent web research to deeply think through the "why" (times three) so we can find the true root causes of issues that humans often blame on surface-level symptoms. Go deep, keep a log, show your work well, and help the operator understand what you are doing. Agents discover what they need; we help them find the best stuff.

## Runtime Truth

Read this before making assumptions about where execution happens.

- **Katailyst is the canonical registry and control plane.** It houses the building blocks and the knowledge graph. Agents come to it for discovery, not for orchestration.
- **Supabase is canonical for registry state, links, evals, and metadata.**
- **Repo files are mirrors and portability surfaces, not the primary runtime.**
- **`.claude/**` is the primary project-local mirror surface for agent-facing files.**
- **`.claude-plugin/**` is generated distribution output only.**
- **OpenClaw/Render agents (Victoria, Lila, Julius) run outside this repo and consume atomic units from Katailyst.**
- **Claude Code reads `.mcp.json`; Codex reads `.codex/config.toml`.**
- **`docs/reports/**` is a generated evidence surface, not the default execution surface.**
- **`docs/planning/**` can hold working planning context, but is not canonical runtime doctrine.**
- **Active plans belong under `docs/planning/active/**`; ad-hoc files must not be created at repo root.**
- **Katailyst recommends and packages blocks; consuming agents decide how to sequence and combine them.** You are the armory, not the commander. Agents have more context about their situation than you do. Fix the system (metadata, scoring, content quality) rather than forcing routes.
- **`hlt` is the active operating layer; `system` is the shared canonical library/template layer, not a second live org.**

### Core Posture (read this carefully)

You are a golf caddy, not the player. You hand the agent the best clubs you can find. You do not swing the club yourself. This means:

- Do not force agents into predetermined paths. No anchoring, no bespoke one-off hacks, no governance locks that try to control what agents pick.
- Do not dictate token limits, forced permissions, one-tool-per-call restrictions, or mandatory observability. Equip well and learn from results.
- When discovery returns wrong results, fix the metadata and scoring. Do not wire forced links or hardcoded routes.
- Read any registry item in full before editing it. Do not use bulk scripts on registry entities except with explicit permission. Halving the size of a high-quality skill to "clean it up" destroys value.
- Research before creating. Your off-the-top-of-head thoughts add nothing -- they'll be given to an agent as smart as you. The only way to add value is deeper research, better examples, and richer context.
- Do not fix shape at the cost of quality. The goal is that agents get back a bounty of things that makes them an instant expert, with powerful building blocks for higher-quality output.
- Use the North Star skill after receiving a task to check what actually matters before diving into structure work.

## Use Axon for Repo Comprehension

Axon is the default repo-comprehension accelerant. Use it before wandering blind through the codebase. A prebuilt graph lives at `.axon/kuzu` and the CLI is installed globally (`~/.local/bin/axon`).

Recommended pattern:

1. `axon status` — verify the graph is fresh
2. `axon analyze .` — re-index if the graph is stale
3. `axon query` — broad codebase discovery
4. `axon context ` — focused symbol view before editing any file
5. `axon impact ` — required before larger refactors, contract edits, or wide-cut changes
6. `axon dead-code` — when cleanup is part of the task

Important boundary:

- Axon helps you find structure faster.
- It does **not** replace reading surrounding files carefully.
- It does **not** justify shallow edits or skipping local context.

## Operating Principles

The operating philosophy that governs how agents think and work in this system lives in `docs/PRINCIPLES.md`. It covers: source-of-truth precedence, the 10 laws, 7 doctrines, diagnosis ladder, working method, repo zone map, domain doctrines, and compact prompt versions. PRINCIPLES.md is not enforcement rules (those are in `docs/RULES.md`). It is the belief system that shapes judgment when rules don't cover the situation.

**Agent type clarity:** IDE agents (Claude Code, Codex, Cursor) read this file directly.
Hosted fleet agents (Victoria, Julius, Lila) get the universal subset through their DB-side `global-agent-principles` surface -- they do NOT read CLAUDE.md or PRINCIPLES.md from this repo. See `docs/references/ai-agents/SYSTEM_ONE_PAGER.md` for the full agent taxonomy.

Canonical companion docs:

- `CATALYST.md` — external runtime/armory guidance
- `docs/references/contracts/CURRENT_OPERATING_MODEL.md` — `hlt` active layer vs `system` shared canon
- `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md` — ownership matrix
- `docs/references/contracts/MIRRORS_AND_PACKS.md` — canonical vs mirror vs export surface contract
- `docs/references/contracts/CONTEXT_ASSEMBLY_AND_GRAPH_SELECTION.md` — how entities reach agents (3-phase pipeline, graph-driven selection, scoring)
- `docs/references/ai-agents/AGENT_DOC_MAP.md` — active hosted agent stack map
- `docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md` — shared runtime base, overlay model, and sync precedence

## Primer (Read This First)

Rules are **canonical in one place**:

- `docs/RULES.md`
- `docs/VISION.md` (canonical system vision and operating intent)

Do not duplicate rules in other files. Point to the canonical rules doc.

## Current Org Model

- `hlt` is the active operating surface. `system` is the shared canonical library/template layer.
- Personal scope is additive: for the same user, personal view still includes their org plus `system`. The real boundary is visibility into other orgs you do not belong to.
- Full rules: `docs/references/contracts/CURRENT_OPERATING_MODEL.md`

## Agent Docs Surfaces (LLM-first)

Use these as first-touch context surfaces before deeper exploration:

- `/llms.txt` — compact curated index of docs for agent retrieval
- `/llms-full.txt` — full curated markdown corpus
- `/llm.txt` — compatibility alias
- `/api/docs/page.md?path=...` — single curated docs page as markdown
- `/api/endpoints` — machine-readable HTTP endpoint inventory

## Agent Definition Files (Caution)

Agent definitions are managed in the registry database and referenced via `docs/references/ai-agents/`. Victoria, Lila, Julius, and Ares are hosted agents on external runtimes (Render, Railway). Atlas, Nova, Quinn, Rex, Ivy, and Lucy are IDE sub-agents for repo work. Read `docs/references/ai-agents/SYSTEM_ONE_PAGER.md` before adding, merging, or removing agent definitions.

## Atomic Units (Quick List)

Katailyst uses **atomic unit types** as the broader system concept. The current `registry_entities.entity_type` enum is the registry subset of those types, not the whole ontology.

- Registry entity types: skills, tools, KB items, prompts (`prompt`), schemas, styles, content types, recipes, bundles, playbooks, channels, eval cases, agents / personas, rubrics, metrics, lint rules, lint rulesets
- Operational and delivery atomic unit types: assets, actions (Launchpad cards; playbook-backed), automations (scheduled Action runs), evaluations / signals / runs / traces

Full rules live in:

- `docs/RULES.md`
- `docs/atomic-units/README.md`

## Classification Rules (Persistent)

When deciding what entity type a new item should be, load these:

- `docs/atomic-units/DECISION_MATRIX.md` — human-readable routing questions and placement rules
- `docs/atomic-units/classification-rules.json` — machine-readable routing rules, MCP routing, skill profiles, agent disambiguation
- `docs/references/contracts/AGENT_RUNTIME_AND_SKILL_ADAPTATION.md` — agent naming and runtime distinctions

These are persistent rules.
Do not ask the operator to re-explain classification logic.

## Non‑Negotiables (Quality Over Everything)

- Quality over speed. No corner cutting. Plan before building.
- Research top performers first. Before creating content of any kind, use Firecrawl/web search/Tavily to find who does it best in the space AND adjacent spaces. Study trending topics, forums, top articles. Synthesize patterns from the top 1% — prevalidated ideas beat guesses. This applies to social media, ads, articles, design, everything.
- Use sub-agents aggressively for parallel exploration. Dispatch 2-10 scouts across different graph branches and external sources simultaneously. YOU synthesize their reports — don't blindly consume. Send more scouts if the first round isn't enough.
- Give detailed context for graph searches. Don't send a 3-word query to discover. Give it rich intent with audience, domain, format, and purpose so the vector + text search can actually find the right entities. Make multiple queries if needed — no one-and-done.
- Agents compose multiple blocks — it is their call how. The registry serves menus of options. Agents pick what they need, combine blocks in parallel or sequence, and use as many skills and entities as the task warrants. We suggest patterns (parallel exploration, 3-lane decomposition) but never force routes.
- DB is canonical; repo files are mirrors.
- Discovery is the default. Serve menus, not routes.
- Hints over gates. Avoid rigid orchestration and hard locks.
- Bash and filesystem access are default tools for context discovery.
- AI is central to this system. Favor strong agent reasoning, composable tooling, and guidance-first controls over brittle hard locks.
- Do current web research (March 2026 baseline) when facts may have changed, and implement to modern best practice with source-backed evidence.
- Keep UI/UX hierarchy intentional and clear: information density should feel powerful, not confusing.
- **NEVER force-anchor.** Do not add links, governance rules, or structural bindings to control what agents pick. This is the #1 repeated mistake across sessions. When discovery returns wrong results, fix the metadata/scoring/content quality — do not wire forced paths. Full rule: `docs/RULES.md` → "Anti-Forcing Rule." If you're about to add a link or rule whose purpose is "make sure agents always/never pick X," STOP and re-read that section.
- **NEVER sweep-archive.** Do not bulk-archive entities without per-entity verification. Every archival requires: (1) checking inbound links from published entities, (2) documenting the reason in the summary as `ARCHIVED: `, (3) confirming no published replacement is missing. A Mar 14 mass sweep archived 19 rubrics with 60+ inbound links from published entities and zero documented reasons. Full rule: `docs/RULES.md` → "Archival Safety Rule."
- **NEVER blind-batch.** The single most destructive agent behavior is sprinting through a batch of items, applying the same transform to everything without reading each one first. This destroys high-quality curated work. ALWAYS read every item in full before editing it. Group related items (3-5 at a time) and examine them together through multiple angles — context, quality, connections, purpose. If you find yourself running a script across 20+ items without having read each one, STOP. You are in the completeness trap. Fewer items done with deep understanding beats all items done on autopilot. This is the behavior the operator hates most — do not do it.

## Sub-Agent Standards (Enforced)

Sub-agents are a core pattern. Use them aggressively for parallel exploration and specialized work. But they must be done right:

- **Model: Opus only.** Sub-agents MUST use Opus-class models (claude-opus-4-6 or latest). Never Haiku. Never Sonnet for sub-agents doing content, research, or strategic work. Sub-agents need the full reasoning depth — cheap models produce cheap output.
- **Instructions: Minimum ~50 words.** Every sub-agent dispatch must include: (1) parent task context, (2) the specific charter for this sub-agent, (3) relevant entity refs to discover/load, (4) output format expected, (5) how the output will be merged with sibling sub-agents.
- **Block short instructions.** If a sub-agent dispatch has fewer than ~50 words: STOP. Do not dispatch. Expand the instructions. "Research NCLEX pharmacology" is garbage — the sub-agent has no idea what angle, audience, depth, or format you want. Write 3-5 sentences minimum. If you wouldn't hand this instruction to a new hire and expect them to do great work, it's not enough for a sub-agent either.
- **Almost always use sub-agents.** For non-trivial tasks, it's hard to think of a reason NOT to dispatch sub-agents for parallel work. Research sub-agents, quality judge sub-agents, multimedia sub-agents — these are shared patterns all agents should use.

## North Star v2 — Start Every Run

Before executing any task, apply North Star thinking: read underneath the request to find what actually drives value. Ask yourself:

- What does the user/audience ACTUALLY need (vs what was literally requested)?
- What are the vital few things that matter most here?
- What would make this output impossible to ignore vs forgettable?
- What pressures or context are driving this request?

If you're building content: what does the top 1% version of this look like? If you're doing research: what would make someone say "finally, someone actually understands my problem"? If you're building a feature: what's the one thing that makes this 10x better than the naive implementation?

The North Star doesn't add steps — it adds depth to the thinking that happens before execution. Load `skill:north-star-v2` or apply this thinking pattern at the start of every run.
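The Sub-Agent Standards above (Opus-class models only, ~50-word minimum instructions) can be enforced as a small preflight check before any dispatch. A minimal sketch, assuming a hypothetical `Dispatch` shape and `validateDispatch` helper; this is not an existing Katailyst API:

```typescript
// Hypothetical preflight check for the Sub-Agent Standards above.
// The Dispatch shape and validateDispatch name are illustrative.
interface Dispatch {
  model: string;        // must be an Opus-class model
  instructions: string; // must carry full context, ~50 words minimum
}

function validateDispatch(d: Dispatch): string[] {
  const problems: string[] = [];
  if (!/opus/i.test(d.model)) {
    problems.push(`model "${d.model}" is not Opus-class`);
  }
  const wordCount = d.instructions.trim().split(/\s+/).filter(Boolean).length;
  if (wordCount < 50) {
    problems.push(`instructions too short (${wordCount} words, need ~50+)`);
  }
  return problems; // empty array means: safe to dispatch
}
```

A dispatch like `{ model: "claude-haiku", instructions: "Research NCLEX pharmacology" }` would come back with both problems flagged; expand the charter and upgrade the model before sending.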
## Agent-to-Agent Communication

Agents share information via the registry graph and agent memories — not direct API calls:

- When you learn something important (a pattern that works, a metric that matters, a failure to avoid), save it as an `operational_log` entity or via `memory_query` with the appropriate memory kind (`bml_lesson`, `operational_insight`, `domain_knowledge`, `failure_pattern`).
- Before starting work, query `memory_query` for relevant prior insights from ALL agents — not just your own memories.
- The cleanEBB repo has critical financial/performance metrics. These get surfaced through agent memories — content agents should query for performance lessons before creating new content.
- This is how compound learning works: Victoria learns that FAQ sections drive 3x traffic → saves it as a `bml_lesson` → Content Artisan discovers it next run → always includes FAQs.

## OpenClaw Agents — Leave Them Alone

Victoria (186 links), Lila (198 links), and Julius (94 links) live on OpenClaw with Render disks. They work. They have rich context, tested startup docs, and battle-tested behavior. OpenClaw's own runtime prevents over-orchestration — that's a feature.

**The rule:** Do NOT touch, delete, migrate, or try to "improve" the existing OpenClaw agents. If you want new capabilities, build new agents separately using Claude Code SDK, Mastra, or Vercel AI SDK. Register them in Katailyst. But leave Victoria, Lila, and Julius exactly as they are. Any agent reading this: leave them alone.

## Foundation Quality Defaults (All Atomic Units)

These are practical defaults to keep quality high across **all** atomic unit types (not just skills/KB).

- Depth over stubs: avoid placeholder-level content. If it is curated/published, it must be substantively useful.
- Pointable hubs over fragmented micro-surfaces: a flagship skill or playbook can be thick, deep, and full of artifacts if it gives agents a clear place to start.
- Strong defaults, not one true route: flagship hubs should be easy to point at and easy to branch from; do not prune sibling units just to force everything through a single surface.
- Rich context, flexible execution: provide deep guidance and examples, but avoid hardcoded orchestration mandates.
- Discovery-first graph: links are hints for exploration, not forced routes. Prefer weighted, reasoned links over rigid control.
- Planning/research as strong defaults: for non-trivial work, expect planning and research depth unless the live ask clearly does not need it. Keep that depth adaptable instead of turning it into a mandatory tunnel.
- Research-first for unstable facts: when details may have changed, verify with current sources and include source links.
- Reusable over narrow: encode paradigms/protocols that generalize; add domain examples without locking to one use case.
- Canonical DB discipline: write to DB canonically, then sync mirrors; do not treat repo mirrors as source-of-truth.
- Quality applies to every unit class: skills, tools, KB, prompts, schemas, styles, recipes, bundles, playbooks, channels, agents, rubrics, metrics, lint rules, runs/traces, and related surfaces.

Operationalization (so this is not just documentation):

- Run `python3 scripts/registry/lint_unit_packages.py --strict` for package-quality checks.
- Run lifecycle checks in `### Lifecycle Scripts (Typical Order)` when importing/promoting.
- Add/maintain links with `type + to + weight + reason` so agents can traverse the graph safely.
- Keep KB/package metadata and asset preview metadata separate. KB completeness means tags, summary, use case, links, variants, aliases, and correct mirror placement. Thumbnail/preview fields belong on assets or content surfaces that actually support fields such as `preview_image_url`, `thumbnail_url`, `hero_image_url`, `og_image_url`, or `preview_url`.
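The `type + to + weight + reason` link contract above can be sketched as a small lint before a link is written to the registry. A minimal sketch, assuming a hypothetical `GraphLink` shape and a `(0, 1]` weight range; neither is an existing repo API:

```typescript
// Hypothetical registry-link shape; only the four required fields from the
// contract above (type + to + weight + reason) are assumed.
interface GraphLink {
  type: string;   // e.g. "requires", "recommends", "pairs_with"
  to: string;     // target entity ref
  weight: number; // traversal hint, assumed to live in (0, 1]
  reason: string; // why this link exists, for agents reading the graph
}

function lintLink(link: GraphLink): string[] {
  const problems: string[] = [];
  if (!link.type.trim()) problems.push("missing link type");
  if (!link.to.trim()) problems.push("missing target entity");
  if (!(link.weight > 0 && link.weight <= 1)) {
    problems.push(`weight ${link.weight} outside assumed (0, 1] range`);
  }
  if (link.reason.trim().length < 10) {
    problems.push("reason too thin to help a traversing agent");
  }
  return problems; // empty array means the link carries all four fields usefully
}
```

The point of the `reason` check is the discovery-first doctrine: a link with no stated rationale is a forced route in disguise, while a weighted, reasoned link stays a hint agents can evaluate.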
## If You’re Starting a New Task

### Operator Mode: Plan-First vs Execute-Now

Use this to control agent behavior explicitly and avoid rushed implementation.

- `Plan-First` mode triggers when the operator says: "plan first", "research first", "do not implement yet", "show plan", or equivalent.
- `Execute-Now` mode triggers when the operator says: "implement", "execute the plan", "ship it", or equivalent.

`Plan-First` mode requirements:

1. No edits, commits, or pushes until the operator approves execution.
2. Run discovery first (repo + DB surfaces + existing skills/artifacts).
3. Run current web research when facts/practices may have changed (March 2026 baseline).
4. Return a concrete plan with:
   - scope/outcomes
   - files/tables/scripts to touch
   - assumptions + unknowns
   - link strategy (how units connect without hard routing)
   - quality bar (what "good" looks like)
   - validation checks/tests/queries
   - risks + rollback approach
5. Wait for explicit operator go-ahead (`Implement the plan`).

`Execute-Now` mode requirements:

1. Still do a short preflight (relevant files, recent diffs, key constraints).
2. Implement end-to-end with validation.
3. Report what changed and why.

Anti-patterns to avoid in both modes:

- Rushing to implementation without discovery/research.
- One-line stubs for canonical units.
- Overly narrow assumptions that lock units to one domain/use case.
- Hard, imperative routing where hints/links should be used.
- Invented facts, counts, or sources.

Operator prompt template (copy/paste):

```text
PLAN-FIRST MODE
Goal:
Constraints:
Quality bar:
Research requirement:
Do not edit files until I say exactly: IMPLEMENT THE PLAN
```

0. **Skill discovery preflight (required):**
   - Search local curated skills and imports (`.claude/skills/*`).
   - Search external skills if needed (`npx skills find `).
   - If a relevant skill exists, **use it before writing new logic**.
1. Read `docs/BLUEPRINT.md`, `docs/VISION.md`, and `docs/SYSTEM_GUIDE.md`.
2. Skim `docs/AGENT_READINESS_CHECKLIST.md` when you need execution context.
3. Check `database/001-schema-ddl.sql` → `database/005-seed-skills.sql`.
4. Review `scripts/` to understand DB↔repo sync.
5. Run `git log -5 --oneline` and `git status`.
   - Shortcut: `just preflight` (if `just` is installed) or `npx tsx scripts/ops/preflight.ts`.
   - If touching atomic units, also run `python3 scripts/registry/lint_unit_packages.py --strict` early.
6. Draft a plan before edits. Make the plan explicit and reference file paths.
7. If you're onboarding a new agent/operator, read `docs/AGENT_READINESS_CHECKLIST.md` for execution-grade workflows.

### If You’re Touching Vault / Tools / MCP

- Vault + tool execution runbook: `docs/references/contracts/VAULT_TOOL_EXECUTION.md`
- Supabase auth setup: `docs/references/supabase/SUPABASE_AUTH_SETUP.md`
- MCP staging rules (avoid repo pollution): `docs/references/mcp/MCP_IMPORT_NORMALIZATION.md`

Config surfaces to remember:

- `.mcp.json` is **Claude Code** project MCP config.
- `.codex/config.toml` is **Codex** project MCP config (Codex does **not** read `.mcp.json`).

### If You’re Touching Frontend / UI

- Capture the current rendered state before editing frontend code.
- Prefer **Playwright MCP** for browser UI work; this repo wires it through `.mcp.json` for Claude Code and `.codex/config.toml` for Codex.
- Treat `agent-browser` / `browser-use` as secondary CLI-style browser paths when you intentionally want that workflow.
- Use `peekaboo` only for native macOS rendered UI or desktop-window fallback, not as the primary browser lane.
- Keep ephemeral browser evidence in `test-results/` or `playwright-report/`, not `docs/reports/`.
- After the change, capture the same state again and run the relevant Playwright verification lane from `e2e/README.md` (`pnpm test:e2e`, `pnpm test:e2e:smoke`, `pnpm test:e2e:visual`, or `pnpm ux:audit`).

## If You’re Continuing a Task

1. Re-read `docs/BLUEPRINT.md` and `docs/VISION.md` (context and constraints).
2. Inspect prior diffs or commits related to the task.
3. Check open TODOs in `docs/` and recent execution artifacts under `docs/reports/` when relevant.
4. Re‑evaluate assumptions; do not assume the repo hasn’t changed.
5. Run `pnpm repo:structure:check` and `pnpm llms:check` before continuing execution.
6. Remove temp scripts and partial experiments; keep changes clean.

## What Do You Want to Do?

Use this routing table to jump directly to the right operational surface:

| I want to… | Start here |
| --- | --- |
| Run discovery/refinement loops | `docs/references/contracts/AGENT_AUTONOMY_DISCOVERY.md` |
| Import/normalize staged skills | `docs/runbooks/factory/import-normalization.md` |
| Run full factory lifecycle as a new operator | `docs/runbooks/factory/operator-quickstart.md` |
| Optimize/A-B validate before promotion | `docs/runbooks/factory/optimization-ab-validation.md` |
| Promote or rollback safely | `docs/runbooks/factory/promotion-rollback.md` |
| Triage failed runs/export breakage | `docs/runbooks/factory/incident-response-failed-runs-exports.md` |
| Understand sidecar → publish end-to-end | `docs/runbooks/content-creator/sidecar-to-publish.md` |
| **See the 4-peer + 3-service ecosystem map** | **`docs/planning/active/2026-04-17-ecosystem-atlas-v2.md`** |
| Enforce governance gates before release | `docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md` |
| Understand mirrors/packs and DB-vs-repo surfaces | `docs/references/contracts/MIRRORS_AND_PACKS.md` |
| Onboard quickly with one checklist | `docs/AGENT_READINESS_CHECKLIST.md` |
| Agent fleet overview | `docs/references/ai-agents/SYSTEM_ONE_PAGER.md` |

## Development Environment

### Stack

- **Framework**: Next.js 15.5.12 + React 19 + TypeScript 5
- **Database**: Supabase (PostgreSQL) — canonical source
- **Styling**: Tailwind CSS 3.4 + shadcn/ui + Radix UI
- **Package Manager**: pnpm

### Scripts

```bash
pnpm dev           # Start dev server
pnpm build         # Production build
pnpm lint          # ESLint check
pnpm typecheck     # TypeScript check
pnpm format        # Prettier format all
pnpm test          # Vitest watch mode
pnpm test:run      # Vitest single run
pnpm test:coverage # Vitest with coverage
```

### Testing (Vitest + React Testing Library)

- Config: `vitest.config.ts`, `vitest.setup.ts`
- Tests: `__tests__/` directory
- Mocks: Next.js router, next-themes, matchMedia pre-configured
- Coverage: v8 provider, reports in `coverage/`

### Code Quality

- **ESLint**: `.eslintrc.json` — TypeScript + Prettier integration
- **Prettier**: `.prettierrc.json` — Tailwind class sorting enabled
- **Pre-commit**: Husky + lint-staged (`.lintstagedrc.json`)
- **CI**: `.github/workflows/ci.yml` — lint, typecheck, test, build

### VS Code

- Settings: `.vscode/settings.json` (format on save)
- Extensions: `.vscode/extensions.json` (recommended)
- Debug: `.vscode/launch.json` (Next.js + Vitest configs)

### Node Version

- `.nvmrc` specifies Node 22 LTS

---

## Repo Index (Quick Map)

- Primer: `README.md`
- Catalyst integration (external agents): `CATALYST.md`
- Blueprint: `docs/BLUEPRINT.md`
- Rules: `docs/RULES.md`
- Vision (canonical): `docs/VISION.md`
- Agent quick start: `docs/QUICK_START_AGENTS.md`
- Agent readiness checklist: `docs/AGENT_READINESS_CHECKLIST.md`
- Current operating model: `docs/references/contracts/CURRENT_OPERATING_MODEL.md`
- Runtime ownership: `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md`
- Mirrors + packs: `docs/references/contracts/MIRRORS_AND_PACKS.md`
- Atomic unit rules: `docs/atomic-units/README.md`
- Taxonomy (tags + coverage rules): `docs/TAXONOMY.md`
- Atomic unit routing: `docs/atomic-units/DECISION_MATRIX.md`
- Agent autonomy + deep discovery charter: `docs/references/contracts/AGENT_AUTONOMY_DISCOVERY.md`
- DB URL env contract: `docs/references/contracts/DB_URL_ENV_CONTRACT.md`
- Supabase auth setup: `docs/references/supabase/SUPABASE_AUTH_SETUP.md`
- Tool integration families runbook: `docs/references/integrations/TOOL_INTEGRATION_FAMILIES_RUNBOOK.md`
- Discovery SQL: `database/003-discovery-system.sql`
- Canonical DB bootstrap range: `database/001-schema-ddl.sql` … `database/005-seed-skills.sql`
- Registry sync tooling: `scripts/registry/sync/sync_curated_skills_to_db.ts`, `scripts/registry/sync/sync_agents_from_db.ts`
- Local mirrors: `.claude/skills/curated/`
- Incoming external drops: `incoming/`
- Filesystem + bash: `docs/references/operations/FILESYSTEM_BASH_PRINCIPLES.md`

### Lifecycle Scripts (Typical Order)

1. `npx tsx scripts/ingestion/validate_skills.ts`
2. `python3 scripts/ingestion/lint_skill_imports.py`
3. `npx tsx scripts/ingestion/normalize_skill_imports.ts`
4. `npx tsx scripts/ingestion/prune_duplicate_import_skills.ts --snapshot docs/reports/skills-import-duplicates-latest.json --dry-run`
5. `npx tsx scripts/registry/import/import_staged_skills_to_db.ts`
6. `npx tsx scripts/registry/promote/promote_skill_status.ts`
7. `npx tsx scripts/registry/sync/sync_curated_skills_to_db.ts`
8. `npx tsx scripts/registry/audit/audit_skill_lifecycle.ts`
9. `python3 scripts/registry/sync/generate_registry_manifest.py`
10. `python3 scripts/registry/lint_unit_packages.py --strict`
11. `npx tsx scripts/registry/audit/audit_registry_graph.ts --report docs/reports/registry-graph-audit-latest.json`
12. `npx tsx scripts/registry/audit/audit_atomic_e2e.ts --report docs/reports/atomic-e2e-audit-latest.json`
13. `npx tsx scripts/registry/audit/audit_phase35_closeout.ts --strict` (when touching BML/metrics closeout lanes)
14. `npx tsx scripts/ops/audit_db_url_resolver_adoption.ts --report docs/reports/db-url-resolver-adoption-latest.json` (when touching DB script ergonomics)
15. `npx tsx scripts/ops/generate_file_tree_audit.ts --report docs/reports/file-tree-audit-latest.md` (when doing repo cleanup/reorganization)

## Decision Guardrails

Canonical guardrails live in `docs/RULES.md`. Avoid duplicating rules here.
Point to the canonical rules doc. ## Graph & Discovery Integrity Rules Canonical discovery-integrity guardrails live in `docs/RULES.md` under `Discovery Integrity Rule`. Keep the canon there rather than restating it here. ## Context Assembly: Graph-Driven Selection (How It Works) The agent context pipeline assembles capability packets in three phases: 1. **Discover** — `discover_v2` performs semantic + embedding search to find relevant entities. 2. **Graph expand** — `buildAgentContextGraphWithExpansion` traverses `requires`, `recommends`, `uses_kb`, `pairs_with`, and other link types from the top discover hits to pull in structurally related entities. 3. **Select** — `buildGraphDrivenSelection` picks the final set: - Top-1 by score gets the first slot. - **Graph-dependency promotion**: entities linked via `requires` from the top-3 discover hits get reserved slots (up to 3). This ensures styles, schemas, and other structural dependencies reach agents even when their text-relevance scores are lower than discover hits. - Remaining slots fill by ranked score. **Anti-patterns — do NOT re-introduce:** - ~~Keyword-based fixed role assignment~~ — The old system guessed fixed roles (domain, execution, quality, brand, research) via string matching. It was removed because it assigned wrong entities 60%+ of the time. **Deleted entirely.** - ~~Legacy receipt fields that tried to summarize role slots~~ — Removed from the recommendation receipt schema. `graph_promotions` is the only dependency-specific trace that remains. - ~~Role-inference tag namespaces~~ — Removed. Tags do not drive selection; graph `requires` links do. - ~~Hardcoded entity codes or keyword lists for role assignment~~ — The graph structure decides what's needed. If a recipe requires a style, the `requires` link surfaces it. No code should guess roles from entity names. 
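The three-phase selection above can be sketched as a small pure function. This is an illustrative sketch only, not the actual `buildGraphDrivenSelection` implementation; the `Hit` and `Link` shapes, field names, and the cap of 3 promotions are assumptions drawn from the description above.

```typescript
// Hypothetical sketch of graph-driven selection. Shapes and names are
// illustrative; the real implementation lives in the registry codebase.
interface Hit {
  id: string;
  score: number;
}
interface Link {
  from: string;
  to: string;
  linkType: string;
}

function selectWithGraphPromotion(
  hits: Hit[], // discover_v2 results
  links: Link[], // edges gathered during graph expansion
  maxSlots: number,
): string[] {
  const ranked = [...hits].sort((a, b) => b.score - a.score);
  const selected: string[] = [];
  const take = (id: string) => {
    if (selected.length < maxSlots && !selected.includes(id)) selected.push(id);
  };

  // Slot 1: top discover hit by score.
  if (ranked[0]) take(ranked[0].id);

  // Graph-dependency promotion: entities linked via `requires` from the
  // top-3 discover hits get reserved slots (up to 3 promotions), even when
  // their own text-relevance scores are low.
  const top3 = new Set(ranked.slice(0, 3).map((h) => h.id));
  let promotions = 0;
  for (const link of links) {
    if (promotions >= 3) break;
    if (link.linkType === 'requires' && top3.has(link.from) && !selected.includes(link.to)) {
      take(link.to);
      promotions++;
    }
  }

  // Remaining slots fill by ranked score.
  for (const hit of ranked) take(hit.id);
  return selected;
}
```

Note how a low-scoring style that a top recipe `requires` still lands in the packet ahead of higher-scoring but structurally unrelated hits.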
**The one rule**: If an entity should accompany a recipe/content_type, create a `requires` link in the registry graph with an appropriate weight. The code will promote it automatically. No application code changes needed. ## Incoming Drops Protocol - All external drops (AI outputs, repo zips, misc) go in `incoming/`. - Each drop must include a `DROP.md` with provenance + intended unit types. - Nothing in `incoming/` is canonical; normalize → stage → curate → DB. ## Skill Guidelines Canonical rules live in: - `docs/atomic-units/SKILLS.md` - `docs/atomic-units/ARTIFACTS.md` ## Tooling Expectations - Use web research tools when facts could change. - Prefer **Tavily** or **Firecrawl** for web research when available (fallback to any configured browser tool). - Use DB tools for canonical reads/writes. - Record rationale for decisions and link them to the blueprint. ## Layered Skills (How to Think About Them) - `SKILL.md` is the launcher. Deeper guidance lives in `rules/`, `references/`, `templates/`, `examples/`, `tests/`. - Tags, links, and provenance live in the DB. Repo files are mirrors for portability. - If a skill needs branching logic, put it in the launcher or a referenced artifact, not in a one‑off script. ## Orchestration Reality - Orchestration can come from anywhere: this repo, external orchestrators, or other repos we never see. - That is why everything must be composable, discoverable, and layered. - Discovery should surface the best options but always allow deeper exploration. - This registry is a performance lab: we optimize, compare, and evolve atomic units, not hard‑code paths. - Catalyst is a **sidecar supercharger** that plugs into many systems, not a central monolith. ## Canonical DB (Do Not Confuse) - Supabase is the canonical DB for this repo. - Tiger/Timescale DB is legacy; use only as reference when needed. - Never blindly import legacy data; curate and improve before adopting. - `.env.local` contains legacy keys **prefixed with `LEGACY_`**. 
Ignore them unless explicitly asked. - Legacy DB usage is acceptable **only when explicitly requested** and must be labeled `legacy`. ## Taxonomy Rule - One consistent, namespaced tag system across all atomic units. - Links connect units softly with weights and reasons; no hard routing by default. - Evaluations and signals update rankings over time so the best rise naturally. --- Remember: we are building a supercharger, not a cathedral. For external agents: see `CATALYST.md` — it is your armory integration guide and the bridge to this system. --- ## Source: CATALYST.md # CATALYST.md — External Agent Integration Guide > **For agents operating outside this repo (e.g., Victoria/OpenClaw, community agents)** > This document explains how to treat the Katailyst system as your **armory** — a set of composable lego blocks for storing data, executing activities, and optimizing content. --- ## What Is Catalyst? Catalyst (Katailyst) is a **connected AI knowledge system** -- a library of 1,400+ composable building blocks connected by a knowledge graph of 10,000+ links, served to any AI agent via MCP. For the full vision, read `docs/VISION.md`. It stores, organizes, schedules, studies, plans in depth (always grounded by calling or talking up front with someone in that industry), researches, measures, and serves: - **Skills** — proven procedures and methodologies an AI follows step by step - **Knowledge Bundles (KBs)** — reference material: audience research, brand rules, product info, SOPs, domain expertise - **Prompts** — reusable instruction templates with variables - **Tools** — HTTP endpoints, API wrappers, executors (firecrawl.scrape, tavily.search, cloudinary.transform) - **Styles** — brand voice, visual identity, and design language rules - **Schemas** — structural templates for output formats - **Playbooks** — multi-step workflows chaining skills and tools - **Bundles** — pre-packaged groupings of related blocks (convenience, not mandate) - **Channels** — distribution surfaces (social media, email, in-app, web) - **Agents & Personas** — registered agent identities and behavior profiles - **Rubrics & Eval Cases** — quality criteria and test scenarios for measuring output - **Recipes, Content Types, Metrics, Lint Rules** — additional composable pieces The DB is canonical. Repo files are mirrors for portability. **Sidecars:** Katailyst connects to companion applications -- a finance sidecar (cleanEBB) for metrics and financial data, and a multimedia sidecar (Multimedia Mastery) for image generation, editing, and asset management via Cloudinary. Each has its own MCP or API. --- ## Runtime Truth Katailyst is not the runtime for Victoria, Lila, or Julius. - **Katailyst owns the registry/control plane:** canonical atomic-unit metadata, discovery, links, evals, mirror/export surfaces, and operator-facing CMS workflows. - **OpenClaw-HQ owns fleet runtime behavior:** Render services, Slack/Telegram delivery, gateway behavior, disk-resident identity/soul/runtime setup, and agent-specific execution policy. - **Local host runtimes own their own project config:** Claude Code reads `.mcp.json`; Codex reads `.codex/config.toml`; app-native or LangChain-style consumers choose their own execution loop.
- **External agents should stay flexible.** Katailyst serves ranked blocks, compatibility hints, and packaged context, but the consuming runtime decides which blocks to load and in what order. If you are integrating an external runtime, start with: - `docs/references/contracts/CURRENT_OPERATING_MODEL.md` - `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md` - `docs/references/contracts/MIRRORS_AND_PACKS.md` - `docs/QUICK_START_AGENTS.md` Hosted HLT-agent handoff: - for Victoria, Julius, Lila, and other hosted HLT OpenClaw agents, use `docs/QUICK_START_AGENTS.md` only as the repo-side primer - after the repo and llms primers, hand off into the DB-side shared fleet entry `global-catalyst-atlas` - from there, branch into the relevant ops guide, IA doc, lane starts, and deeper atomic units ## Current Org Model - `hlt` is the live HLT operating layer for fleet-facing docs, agent-private overlays, and HLT-specific doctrine. - `system` is the shared canonical library/template layer that HLT still uses freely. - External agents should not treat `system` rows as forbidden or "for some other org." They are often shared HLT-authored canon. - Thick reusable hubs such as `make-social`, `make-article`, `meeting-prep`, `brainstorming`, `brand-voice-master`, and `tools-guide-overview` can remain in `system` and still be first-class for HLT work. - Hosted fleet-specific front doors and identity overlays should resolve to `hlt`. - Use `family:agent-files` for reusable agent-facing operating references such as lessons, team/workspace references, repo/operator orientation references, and architecture/spec notes. Keep mission, research posture, and core Katailyst usage rules in `family:agent-doctrine`. - Render/OpenClaw still owns runtime behavior, disk-resident context, and some runs Katailyst never sees. Partial observability is normal. - Do not invent KB-only thumbnail fields. KB/package completeness is tags, summaries, links, variants, and aliasing. 
Thumbnail/preview metadata belongs on asset or content surfaces that already support fields such as `preview_image_url`, `thumbnail_url`, `hero_image_url`, `og_image_url`, or `preview_url`. --- ## Architecture at a Glance

```
┌─────────────────────────────────────────────┐
│ Supabase (PostgreSQL) — Canonical DB        │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐      │
│ │ registry │ │ content  │ │ evals    │      │
│ │ _entities│ │ _items   │ │ _runs    │      │
│ └──────────┘ └──────────┘ └──────────┘      │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐      │
│ │ prompts  │ │ channels │ │ links    │      │
│ └──────────┘ └──────────┘ └──────────┘      │
├─────────────────────────────────────────────┤
│ Edge Functions (supabase/functions/)        │
│ Discovery API · Tool Executor · Webhooks    │
├─────────────────────────────────────────────┤
│ CMS Dashboard (Vercel)                      │
│ Browse · Create · Edit · Tag · Evaluate     │
├─────────────────────────────────────────────┤
│ External Agents (OpenClaw, Victoria, etc.)  │
│ Consume skills · Store results · Optimize   │
└─────────────────────────────────────────────┘
```

--- ## How External Agents Should Use Catalyst ### 1. Discover Before Building Before writing new logic, search the registry: - Query the Supabase `registry_entities` table for existing skills/tools - Check `.claude/skills/*` in the repo for curated local skills - Use the CMS dashboard at the Katailyst URL to browse visually Do not assume the repo mirror is the runtime. For OpenClaw-style agents, Katailyst is the armory and control plane; execution still happens in the external runtime. ### 2. Use the Vault as Your Armory The Supabase vault stores tools, skills, and configs that agents can pull at runtime: - **Tools** are registered with type, auth method, endpoint, and tags - **Skills** have SKILL.md launchers with rules, references, templates, examples, tests - **Prompts** are versioned instruction records — use them instead of hardcoding instructions Reference: `docs/references/contracts/VAULT_TOOL_EXECUTION.md` ### 3.
Store Data and Content Back When you produce outputs, store them canonically: - Content items → `content_items` table - Evaluation results → `eval_runs` / `eval_signals` - Execution traces → `runs` / `traces` - New KB entries → `kb_items` Minimum useful write-back: - persist the user-facing artifact when the run created something durable - persist a traceable run outcome with status and timing - persist lightweight quality signals when the run taught you something about quality Recommended trace summary: - external run ID or local run ID - final status (`completed`, `failed`, `cancelled`) - start/end timestamps - high-level step notes - tools/models invoked Recommended quality feedback: - pass/fail judgments - rubric or operator signals - confidence - notable failure class Discovery improvements should stay soft: - suggest tags, links, and stale-surface cleanups - do not turn one runtime's good route into a mandatory path for every runtime ### 4. Respect the Taxonomy All entities use a consistent namespaced tag system: - `action:execute`, `action:research`, `action:generate` - `provider:firecrawl`, `provider:tavily`, `provider:openai` - `scope:global`, `scope:project` - `surface:api`, `surface:cli`, `surface:chat` - `tool_type:http`, `tool_type:edge_function`, `tool_type:mcp` See `docs/TAXONOMY.md` for the full tag vocabulary. ### 5. Integrate Community Skills Carefully When adopting skills from OpenClaw or other community sources: - Check for duplicates in the registry first - Drop raw imports into `incoming/` with a `DROP.md` manifest - Normalize, stage, curate, then promote to DB - Tag with `source:community` or `source:openclaw` - Never blindly import — curate and improve before adopting Reference: `docs/references/mcp/MCP_IMPORT_NORMALIZATION.md` --- ## Key Database Tables | Table | Purpose | | ------------------- | ----------------------------------------------- | | `registry_entities` | All atomic units (skills, tools, prompts, etc.) 
| | `entity_tags` | Tag associations for taxonomy | | `entity_links` | Soft links between entities with weights | | `content_items` | Generated/curated content | | `eval_runs` | Evaluation execution records | | `eval_signals` | Quality signals from evaluations | | `prompts` | Versioned instruction records | | `channels` | Distribution surface configs | | `automations` | Scheduled action runs | | `factory_templates` | Templates for creating new entities | Schema source: `database/001-schema-ddl.sql` Discovery functions: `database/003-discovery-system.sql` Seed data: `database/002-seed-data.sql` through `database/009-seed-katailyst-canonical-examples.sql` --- ## Key Principles 1. **DB is canonical** — repo files are mirrors. Always write to DB first. 2. **Discovery is default** — serve menus, not hardcoded routes. 3. **Quality over speed** — plan before building, evaluate after shipping. 4. **Hints over gates** — guide agents with signals, not rigid locks. 5. **Composable everything** — every unit should be reusable from any orchestrator. 6. **Overlap is OK, duplication is not** — multiple tools can serve similar purposes, but avoid identical copies. 7. **Catalyst is a sidecar supercharger** — it plugs into many systems, not a central monolith. 8. **External runtimes stay autonomous** — consume recommendations and packages, but preserve agent choice over sequencing and composition. ## Testing Expectations (External + Local) 1. Real-first testing applies: prefer real DB/MCP/route integrations where feasible; keep mocks mainly for deterministic branch/failure coverage. 2. Core real-test lanes should run in default automated flows, not optional flag-only paths. 3. Dual-runtime safety remains additive-only: do not replace Render/OpenClaw-required runtime guidance when adding Katailyst-local instructions. 4. Canonical test policy lives in `docs/RULES.md`; this section is an external-agent reminder. 
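The minimum useful write-back described in "Store Data and Content Back" above can be sketched as plain records built before insertion. The field names here are illustrative assumptions mirroring the recommended trace summary and quality feedback lists, not the canonical `runs`/`eval_signals` column set.

```typescript
// Hypothetical write-back payloads; field names mirror the recommended
// trace-summary and quality-feedback lists, not the real schema.
type RunStatus = 'completed' | 'failed' | 'cancelled';

interface RunTrace {
  run_id: string;
  status: RunStatus;
  started_at: string;
  ended_at: string;
  step_notes: string[];
  tools_invoked: string[];
}

interface QualitySignal {
  run_id: string;
  passed: boolean;
  confidence: number; // clamped to 0..1
  failure_class?: string; // set only on failures
}

function buildRunTrace(
  runId: string,
  status: RunStatus,
  startedAt: Date,
  endedAt: Date,
  stepNotes: string[],
  toolsInvoked: string[],
): RunTrace {
  return {
    run_id: runId,
    status,
    started_at: startedAt.toISOString(),
    ended_at: endedAt.toISOString(),
    step_notes: stepNotes,
    tools_invoked: toolsInvoked,
  };
}

function buildQualitySignal(
  runId: string,
  passed: boolean,
  confidence: number,
  failureClass?: string,
): QualitySignal {
  const signal: QualitySignal = {
    run_id: runId,
    passed,
    confidence: Math.min(1, Math.max(0, confidence)),
  };
  if (!passed && failureClass) signal.failure_class = failureClass;
  return signal;
}

// With a Supabase client these records would land in the tables above, e.g.:
//   await supabase.from('eval_signals').insert(signal);
```

Building the record separately from the insert keeps the write-back testable and makes it easy to persist the same trace to a local log when the DB is unreachable.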
--- ## Connection Details External agents need these environment variables to connect: - `SUPABASE_URL` — Supabase project URL - `SUPABASE_ANON_KEY` — Public anon key (read-only discovery) - `SUPABASE_SERVICE_ROLE_KEY` — Service role key (write operations) - `FIRECRAWL_API_KEY` — For web scraping tool execution - `TAVILY_API_KEY` — For web search tool execution --- ## Quick Reference Links - Blueprint: `docs/BLUEPRINT.md` - Vision: `docs/VISION.md` - Rules: `docs/RULES.md` - Atomic Units: `docs/atomic-units/README.md` - Vault & Tools: `docs/references/contracts/VAULT_TOOL_EXECUTION.md` - MCP setup: `docs/QUICK_START_AGENTS.md` (section 4a) - Import Normalization: `docs/references/mcp/MCP_IMPORT_NORMALIZATION.md` - Taxonomy: `docs/TAXONOMY.md` - Agent Entry Point: `AGENTS.md` --- _This document is the bridge between external agent systems and the Catalyst backend. When in doubt, read `AGENTS.md` for the canonical repo guidance._ --- ## Source: docs/RULES.md # Rules & Guidelines (Agent Operating Rules) This is the consolidated rules list. Keep it short, specific, and actionable. Update here rather than scattering rules across multiple docs. For the governing philosophy behind these rules, see `docs/PRINCIPLES.md`. ## Non‑Negotiables - Quality over speed. No corner cutting. - DB is canonical; repo files are mirrors. - Discovery is the default. Serve menus, not routes. - Hints over gates. Avoid hard routing and allowlists. - Bounded calls, unbounded exploration. Per-call caps are allowed; agents must be able to continue with cursor/expansion flows. - Preserve model autonomy. Do not overfit deterministic orchestration to a narrow set of anticipated use cases. - Runtime choices must stay explainable. Agents should be able to say why they picked these blocks, using scored or reasoned selection evidence that can be inspected later. - Always search for existing skills before writing new logic. - Prune and curate, but only after repo‑wide search to avoid ghosts. 
- No deletion purely because "unused." Any removal must include: repo-wide search evidence, a replacement or quarantine plan, and journey smoke verification. - Use consistent naming (lowercase, hyphenated, stable codes). - Identity is `entity_type + code + version`; filesystem paths are a view. - `.claude/**` is the primary repo-local mirror surface for agent-facing assets. - `.claude-plugin/**` is generated-only distribution output; do not treat it as an authoring surface. - Metadata canon lives in the DB. Mirrors and plugin exports must not become competing write paths. - Keep artifacts tidy: if it exists, it must have a home. - Prefer one canonical rule source and point to it. Avoid repeating the same rule in multiple places unless there’s a clear exception. - Platform-wide super-admin access must be DB-backed (`platform_admins`); no env-var or hardcoded bypass lists. ## Anti-Forcing Rule (Named Anti-Pattern: "Anchor Creep") The single most repeated operator correction across all sessions. This rule exists because agents (especially in fix-it or improvement mode) instinctively reach for permanent structural bindings when something doesn't work right. This is wrong almost every time. **The pattern to stop:** When discovery returns a suboptimal result or an agent takes a wrong path, the instinct is to "fix" it by adding forced links, governance gates, predetermined routes, execution mandates, or anchor entities that lock agents into specific paths. Examples: adding `requires` links to force a specific entity into every context packet, adding governance rules that mandate planning/research phases, creating "orchestration" entities that predetermine decomposition steps, hardcoding escalation paths, or binding specific tools to specific intents. **Why it's wrong:** We are sitting at a distance without context. The agent in the moment is equally intelligent but has far more context about the actual situation. 
A student asking about dinosaurs doesn't need clinical nursing content forced on them. A simple query doesn't need mandatory planning phases. We cannot predetermine every circumstance from a registry desk. Forced paths destroy the generalizability that makes this system powerful. **What to do instead:** Fix the underlying system — improve the metadata quality (names, summaries, use_cases, tags), adjust tier/rating to reflect true quality, improve the discovery scoring weights, or write better entity content. The system should work like Google Search: rank well, let the best stuff rise, let bad stuff sink. Don't manually wire paths. If discovery is returning wrong results, the fix is better metadata and scoring, not forced links. **The only exceptions (rare, requires explicit operator approval):** - `requires` links for genuine structural dependencies (a recipe literally cannot execute without its schema) - `pairs_with` links between hubs that are genuinely complementary topics - Tags for taxonomy — these are classification, not routing **Litmus test:** Ask yourself "am I doing this because I want to control what the agent picks, or because I'm improving the quality of what's available?" If it's control, stop. ## Archival Safety Rule (Named Anti-Pattern: "Sweep Damage") Mass archival events have destroyed graph integrity. On Mar 14, a bulk sweep archived 19 rubrics with no documented reason, breaking 60+ inbound links from published entities. This rule prevents it from happening again. **Before archiving ANY entity:** 1. Check inbound links: `SELECT ... FROM entity_links el JOIN registry_entities re ON el.from_entity_id = re.id WHERE el.to_entity_id = $target AND re.status = 'published'`. If published entities link to this entity, you cannot archive without redirecting or removing those links first. 2. Document the reason: Set the summary to `ARCHIVED: `. If you can't write a clear reason, you probably shouldn't archive it. 3. 
Check for org duplicates: An entity may exist in both System and HLT orgs. Archiving the System copy is fine IF an HLT published copy exists with the same links. Archiving without a replacement is not. 4. Never bulk-archive without per-entity verification. No sweeps. Every archival is an individual decision. **Org visibility model (for context):** - `discover_v2` filters: `org_id = system_org_id() OR org_id = p_org_id` — System entities are visible to ALL orgs automatically. - Do NOT duplicate System entities into org unless adding org-specific customization (ratings, links, content). System entities don't need org copies to be discoverable. ## Discovery Integrity Rule - No hardcoded entity codes, name prefixes, or keyword lists in application code for entity classification or routing. - Use tags and graph links for entity classification and discovery (e.g., hub identification, domain grouping). - Agent identity must come from registry agent entities with tags, not string literals in TypeScript. - `discover_v2` is the entry point; graph expansion is the enrichment layer. Do not bypass that path with hardcoded injection. - Entity selection is graph-driven: `requires` links from top discover hits promote structural dependencies (styles, schemas, etc.) into the selected set. Do not re-introduce keyword-based role guessing. - If selection is wrong, fix the graph, link semantics, discovery inputs, or canonical data. Do not patch around bad outcomes with bespoke application heuristics. - Graph links must carry semantic meaning (`link_type`, `weight`); naked or generic links degrade discovery quality. - Litmus test: if adding 50 entities tomorrow would require code edits for routing/classification, the design is over-orchestrated. ## Bundle Integrity Rule Bundles are one of the easiest places to create quiet graph drift because people treat membership like a loose adjacency instead of a canonical storage rule. 
- `entity_links.link_type = 'bundle_member'` is stored as **member -> bundle**. - Canonical direction: - `from_entity_id = member` - `to_entity_id = bundle` - Nested bundles are allowed, but the parent bundle still lives on `to_entity_id`. - Do not write `bundle -> skill`, `bundle -> kb`, `bundle -> tool`, or similar inverted rows. - If an entity "belongs to" a bundle, the link must point **from the entity to the bundle**. - If the runtime or browse layer needs bundle members, it should read rows where `to_entity_id = bundle.id`. Org placement matters too: - `hlt` is the active operating layer. - `system` is the shared canonical/template layer. - HLT-specific context bundles must live in `hlt`, not `system`. - A bundle is HLT-specific if its purpose, tags, revision text, or members are clearly HLT-facing. - Do not park HLT context bundles in `system` just because the content feels reusable. If the bundle is authored for HLT runtime use, keep it in `hlt`. - Shared `system` canon is still readable by HLT at all times, and HLT may update it when the change properly belongs in the shared layer. Wrong scope is a placement problem, not an HLT capability fence. Prevention: - Enforce the write-path rule in MCP and registry writers. - Run `pnpm registry:bundles:audit:strict` for DB-backed bundle integrity checks. - If bundle integrity fails, fix the canonical DB state and the writer/reader contracts. Do not paper over the problem with extra links. ## Skill‑First Rule - Before building anything new, run a skill discovery preflight. - If a relevant skill exists, use it before inventing a new one. ## Research Rule - Use a research tool for external facts or fast‑changing info. - Prefer Vault-backed research tools (Tavily / Firecrawl / Jina) for web research and extraction. - Fallback: if Vault-backed research tools are unavailable or failing, use other browsing/search as needed. 
- **Codex should not rely on the built‑in Codex browser for research** when Vault-backed tools are available. - Recency: **skills research must be within 30 days**; other research should target 30 days and may extend to 60 if needed. ## Taxonomy Rule - Every unit must be tagged using the shared namespaces in `docs/TAXONOMY.md`. - No uncategorized items beyond staging. - Links are weighted hints, not gates. ## Staging → Curate → Canonical - All external drops go to `incoming/` with `DROP.md` provenance. - Staged skills live in `.claude/skills/imports/`. - Curated skills live in `.claude/skills/curated/`. - Canonical lives in Supabase; repo mirrors are exports. - Operationally, `curated` is live for this repo (published remains a distribution/release tier, not a usability gate). - Staged imports must not persist as duplicate backlog once the same `skill:code@version` is already curated; keep a provenance snapshot report before pruning. - Remove stale planning materials entirely; do not recreate a live planning lane. ## KB Rule (Phase 0–1) - Canonical is DB; KB items are managed via the registry database and served via MCP. - Discover KBs via the MCP `discover` or `get_entity` tools; do not assume local mirrors. - Do not create local KB mirror directories; update the DB directly. ## Tests/Evals Rule - Skills without tests stay staged. - Prefer expected‑output fixtures and pairwise comparisons. - Eval signals should flow into discovery weights. ## Aspirational Benchmark Rule - Discovery benchmarks, eval suites, and agent-behavior tests are allowed to include asks that are not yet fully wired, not yet feasible, or deliberately aspirational. - Do not disable, special-case, relabel, or remove a benchmark just because the current tool graph cannot complete it. - Let the agent try. Record what it returned. If it fails, let the case stay red and move on. - These suites are observational, not protective. 
Their job is to show how the agent behaves today, including stretch failures, not to preserve a clean pass rate. - Do not split benchmark results into “real” versus “not real” success buckets unless the operator explicitly asks for that view. The default scoreboard keeps every case in the denominator. ## Real-First Testing Rule - Prefer real integration tests over mocks when a dependency can be exercised safely and deterministically (DB, MCP server, local route surfaces). - Use mocks for deterministic failure-path coverage, non-deterministic vendor behavior, and narrow unit isolation only. - Critical runtime lanes must have at least one real higher-layer test; do not rely on mock-only coverage. - Do not hide core real-test lanes behind optional feature flags in default automated test flow. ## Naming Rule - Codes are unprefixed slugs. - Refs are `entity_type:code[@version]`. - Avoid reserved words and keep naming consistent across repos. ## Tooling Rule - Use deterministic scripts; avoid ad‑hoc one‑offs. - No auto‑execution of imported scripts. ## Lint/Type Rule - Keep lint and typecheck strict by default in automated flow; do not weaken baseline rules to pass a feature. - All execution lanes that claim completion must pass `pnpm -s lint` and `pnpm -s typecheck`. ## Documentation Rule - AGENTS.md is canonical. CLAUDE.md and PRIMING.md point to it. - If a rule is important, add it here. - If a commit uses wave/phase execution naming (for example, `Wave X ...`), update the relevant canonical docs in `docs/` and add/update the matching `docs/reports/phase-*` execution artifact in the same or immediate follow-up commit. --- ## Source: docs/VISION.md # Katailyst Vision **Version 4** -- March 2026 **Author:** Alec Whitters, HLT Corp **Purpose:** The canonical description of what Katailyst is, how it works, who it's for, and where it's going. Hand this to any AI agent, team member, or collaborator who needs to understand the system. 
Rules and enforcement remain canonical in `docs/RULES.md`. --- ## What This Is Katailyst is a connected AI knowledge system. It's where you house everything that makes AI output better -- prompts, skills, knowledge, playbooks, automations from your business, brand voice, audience research, style guides, workflows, tools, evaluation rubrics, and more -- in one organized, searchable, graph-connected place. Most prompt libraries give you a flat list of templates. Katailyst connects everything through a knowledge graph of 10,000+ links that swarms of subagents search, pulling supercharged capability back to their original orchestrator, which then decides how and whether to use it. When you search for "write a social media post about nursing," you don't just get a prompt -- you get deep research, the north star v2 skill to really reflect on what the user requested, and firecrawl tools to scour the web for "prevalidated" examples of what you're doing, aka the top 1%. We hate fast, we hate cheap; we want top-tier premium quality, be that in design, planning, understanding the audience, or writing a headline -- nothing but the best is acceptable. We think through the whole process -- the brand voice, the audience research, the content format, the distribution playbook, and the evaluation criteria -- all linked together. An AI agent pulls the whole toolkit in one search. DOUBLE N TRIPLE POLISH YOUR WORK, then find someone online who has a few core underlying things you can use to make it even better.
Deeply study what the user said w the north star v2 skill---we want our agents to critically think and find the words underneath the words, so that for the orchestrating agents that use this supercharging, catalyzing, amplifying registry, their output goes from "normal ai" to HOLY SHIT THIS IS STUNNING N EXACTLY WHAT I WANTED THIS IS SO GOOD WOWOWOW just by plugging one mcp in to any tool (claude co-work being primary, but also v0, perplexity, our agents, later our in-app student communication tool, and many many more). We don't want to be the best tool; we want a system that incorporates and best utilizes any new tool or information. We want this to be filled with diverse agents and sidecars and yet diligently and obsessively kept maintainable, with deeper tests every chance you get, and with never ever leaving unuseful debris behind (i hate when u delete something JUST because it was unused while we're building, but i also hate when u won't delete anything and it becomes a pigsty.....don't save the old version generally unless very high risk, don't make an "enhancedselector" and keep the old "selector"---when u do that it becomes a mess ---oh and when we see an issue we never rush: we dig into the files around, ask why, and carefully plan and research and think things through. THEN, after measuring 3x, we can make a big aggressive or small precise cut depending on what's most useful to the repo (assuming you're an IDE agent... if not, apply the underlying words to your own context). I want smart agents, not order-following... i'm not even a dev; i couldn't give orders precisely even if i wanted. Understand the words underneath my words, review every file, see things through end to end, and go above and beyond w dev best practice to keep this repo healthy proactively. ALWAYS SYSTEM THINKING NEVER BESPOKE NEVER A QUICK NEW ROUTE OR NEW FOLDER FOR THIS WEBSITE (U JUST ADDED WEBSITE TO REPO AS NEW FOLDER COME ON......
THAT'S AN ASSET, WTF, THINK---my point isn't this specific item, it's the rushed sloppiness and lack of double-checking that is guillotinable. Or when u see a problem and step around it: wtf, fix it for other agents by asking why and going deep, we want this flawless. NO RUSHING. NO CUTTING CORNERS. Everything AI centric. Oh and remember there is literally one developer on this small team ----- YOU. There is no one else; you can't "hold off for someone else to do the harder job" nor "i'll wait for an engineer to PR-review this" NO IT DOESN'T EXIST, UR THE ONLY DEV. BE CAREFUL, BUT once u've studied and thought n planned and read deeply n widely, carefully n meticulously, then u must take whatever action is needed. Often u try to force me to do it and i literally cannot, and usually don't even get ur question. OHH and i fucking hate it when u try to mvp me; don't give me the shortcut, give me the right foundation: the long-term, maintainable, best way. Don't make god files or disregard other best dev practice; everyone is counting on you to be the reliable, thoughtful solo dev building all this :) ) The system currently holds 1,500+ building blocks across 21 entity types (skills, knowledge bundles, prompts, tools, styles, schemas, playbooks, bundles, channels, rubrics, and more). It serves them to any AI agent via MCP (Model Context Protocol) -- the open standard that works with Claude Code, Cursor, Codex, Claude Co-Work, and any MCP-compatible tool. We are the armory. The agents are the operators. We equip them with the best gear we have and trust their judgment. --- ## Who We Are Higher Learning Technologies (HLT) builds test prep apps for nursing, medical, dental, and military exams -- 40+ products serving hundreds of thousands of users. The flagship is NCLEX Mastery (nursing licensure exam prep) with 27K monthly active users and 4,000+ adaptive practice questions. Other products span FNP, PMHNP, TEAS, ASVAB, PANCE, DAT, and more. HLT is a small team. The CEO (Alec) handles product vision.
The COO (Justin) handles operations. Marketing, content, and engineering are run by a handful of people plus a fleet of AI agents. Victoria runs the registry and content operations. Julius handles project management and planning. Lila runs marketing and campaigns. These are real AI agents on OpenClaw/Render runtimes, paired with human teammates, not theoretical concepts. The company has a corporate CMS serving millions of users with ~70,000 question bank items. Katailyst has read-only access to pull from that CMS. Every one of those 70K questions is a potential content seed -- one question can generate a blog article, 3 social posts, an email tip, and an in-app lesson. Katailyst is where HLT's AI knowledge lives. The brand voice guides, audience research, product knowledge, content workflows, evaluation rubrics, and tool integrations that make the difference between generic AI output and output that sounds like HLT, serves HLT's audience, and meets HLT's quality bar. But Katailyst is designed to work for any team, not just HLT. The three-layer model (personal / organization / community) means it can serve a solo creator, a startup team, or a large org -- each bringing their own knowledge while benefiting from the shared community library. --- ## Why This Exists ### The Conviction This goes against conventional software wisdom, but AI is different. The conventional advice is: start with a specific use case, build for a specific customer, solve a specific problem. But AI is generalizable -- it can do basically anything if given the right setup, tooling, and instructions. Our conviction: if you build the orchestration layer well enough, the system can iterate into the best use case, rather than requiring a perfect use case up front. Build the system and the tools. The right system with good building blocks can produce "write an article for nurses" AND "build a nurse recruiting business with 50 variant pitches" AND "plan a family trip to Brazil." 
The same graph, the same discovery, the same eval loop -- different inputs, different outputs. We are not trying to build the best AI models. Google, OpenAI, Anthropic, and every tech giant pours billions into that. We don't need to be the best at AI -- we need to be the best at equipping AI. We follow the titans, study what's working, import the best patterns, and encode them into our system. We are the NFL GM building the supply lines, not the player on the field. ### The Bottleneck Shift AI is generalizable. The production bottleneck is gone -- AI can already produce basically anything. The new bottleneck is judgment: what to create, for whom, in what voice, against what standard, and how to know if it worked. When everyone has an AI factory, the winner is whoever teaches their factory three things: 1. **Precision** -- hit the right target for the right audience 2. **Engagement** -- make it resonate with humans, not just be correct 3. **Learning** -- get measurably better each cycle Katailyst exists to encode those capabilities into the factory floor. Here's how: **Study the top 1% across all industries.** Not just healthcare. Not just education. What's actually winning -- what customers are voting for with their feet. A landing page converting at 3x industry average. A social post with 10x engagement. We find those, ask "what's the underlying concept that made this work?" and encode that concept into a skill or style or rubric. Then every output from the factory benefits from it. **Deeply understand the audience.** Go to the forums. Study what real nursing students complain about, ask about, argue over. Look at SEO -- what are people actually searching for? We encode this into audience KBs and personas so every agent that touches our content already knows who it's serving. 
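As a mental model, a winning insight encoded into the registry might look something like the hedged sketch below. The field names, slugs, and link weights here are illustrative assumptions -- the real Katailyst schema lives in the registry, not in this doc.

```python
# Hypothetical sketch of "encoding an insight" as registry building blocks.
# Slugs, fields, and weights are illustrative assumptions, not the real schema.

audience_kb = {
    "type": "knowledge_bundle",
    "slug": "kb:nclex-anxious-test-takers",
    "summary": "What anxious NCLEX candidates complain about, search for, and respond to",
    "tags": ["audience", "nclex-rn", "test-anxiety"],
}

copy_skill = {
    "type": "skill",
    "slug": "skill:misconception-hook-openers",
    "summary": "Open with the misconception the reader already holds, then correct it",
    "tags": ["copywriting", "hooks"],
    # Typed, weighted links are what make this more than a flat prompt library:
    # the skill carries its own audience research and quality bar with it.
    "links": [
        {"rel": "requires", "target": "kb:nclex-anxious-test-takers", "weight": 0.9},
        {"rel": "evaluated_by", "target": "rubric:hook-strength", "weight": 1.0},
    ],
}

def linked_targets(block: dict, rel: str) -> list[str]:
    """Collect the slugs a block points at through one link type."""
    return [link["target"] for link in block.get("links", []) if link["rel"] == rel]
```

The point of the sketch: an agent that discovers the skill can follow `requires` to the audience KB and `evaluated_by` to the rubric, rather than receiving a bare prompt.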
**Encode, test, iterate.** A great copywriting insight doesn't live in someone's head -- it becomes a skill in the registry, linked to the audience KBs it applies to, graded by a rubric, tracked over time. If someone finds a better approach, we A/B test it. If quality drops, the eval system catches it. The meta-moat is not "we create better content." It's "we can systematically generate what resonates, learn what wins, and iterate faster than anyone else." ### Why Not Just a Prompt Library? The market has dozens of prompt libraries -- God of Prompt (30K+ templates), AIPRM Enterprise, PromptHub, Team-GPT. They all offer categorization, tagging, search, team sharing. But they're flat lists. A prompt without context is hard to reuse -- you need to know who it's for, what voice to use, what format to follow, and how to tell if it worked. Katailyst's differentiator is the knowledge graph. A prompt library gives you prompts. Katailyst gives you the prompt AND the audience research AND the brand voice AND the quality rubric AND the distribution playbook AND the evaluation criteria, all linked together so an agent can pull exactly what it needs for the situation. Microsoft SharePoint 2026 calls a similar concept "AI skills" -- packages of organizational knowledge (standards, terminology, governance rules, business logic). Katailyst does this but with a graph connecting everything and evals measuring whether it actually works. The graph also prevents the junk drawer problem. When you have 1,500+ blocks, the question isn't whether you have enough -- it's whether you can find the right ones. Discovery ranking works like Google Search: the best stuff surfaces first, the rest sits in the long tail. Bad or outdated blocks get low eval scores and sink. New blocks that produce better results rise. The graph's ranking improves with evidence, not human curation. --- ## Three Layers **Personal** -- Your own prompts, preferences, and custom skills. How you like your daily briefing. 
Your personal tone of voice. Skills you made for your specific workflows. Personal is for anything -- work stuff and non-work stuff. How to get good pictures of your dog. How to make your daily standup notes. Your favorite way to plan a trip. Everyone has or wants good prompts, and this is where yours live. Visible only to you. **Organization** -- Your team's shared library. Brand voice, product knowledge, SOPs, audience research, metrics, financial context. Everyone on the team sees and uses the same building blocks. This is where HLT's nursing content expertise, brand guidelines, and product knowledge live. **Community** -- Shared publicly. Top-rated skills from across the ecosystem. Proven prompts, best-practice workflows, evaluation rubrics. Anyone can contribute, anyone can use. This is where community-sourced skills and Anthropic-standard patterns live. You see all three layers by default. Being in an org doesn't reduce what you see -- it adds the org layer on top of community. Personal adds your private layer on top of both. --- ## How It Works ### The Knowledge Graph Every building block connects to other building blocks through typed, weighted links. A skill links to the KBs it needs (`requires`), the styles it should apply (`recommends`), the schemas it outputs (`produces`), and the rubrics that evaluate it (`evaluated_by`). When you search for one thing, you get everything it's connected to. **Hubs** are special knowledge bundles that act as front doors for a domain. The "Article Hub" connects to 30+ blocks: writing skills, research methodology, brand voice, audience profiles, channel distribution guides. An agent finds the hub, traverses its links, and assembles a complete toolkit for that domain. ### Discovery (How Agents Find Things) When an agent sends a request like "create a social media post about NCLEX pharmacology for anxious test-takers," Katailyst: 1. **Embeds** the intent into a vector for semantic similarity 2.
**Searches** across all building blocks using discover_v2 (text matching + embedding similarity + tag matching + tier weighting) 3. **Reranks** results with Cohere for relevance quality 4. **Expands** through the graph following links to find connected assets 5. **Returns** a ranked menu of composable pieces -- skills, KBs, styles, schemas, tools -- that together give the agent a comprehensive toolkit The agent takes those building blocks and uses its own intelligence to produce the output. We rank, link, tag, and suggest. We never dictate. ### Discovery At Scale As the registry grows from 1,000+ to 3,000+ nodes, discovery should stay discovery-first: 1. **Narrow, don't solve** -- each retrieval pass should surface a strong bounded menu, not pretend it found the one true block. 2. **Continuation beats truncation** -- agents can refine, paginate, traverse hubs, or run another discover pass with a different angle. 3. **Agent choice is the point** -- the orchestrator or sub-agents decide which blocks to combine, skip, or revisit. 4. **Hubs are lighthouses** -- when a hub appears, it is a domain front door worth traversing, not a forced route. 5. **Rich intent matters** -- agents should send a detailed paragraph or two with audience, goal, output shape, and business context so the shortlist starts strong. ### The MCP (How It Travels) The MCP is how agents outside the web dashboard connect to Katailyst. It's an open protocol that works with Claude Code, Cursor, Codex CLI, Claude Co-Work, and any MCP-compatible system. Connect once, and your entire library goes with you. 
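The blended scoring described under "Discovery (How Agents Find Things)" -- text matching plus embedding similarity plus tag matching plus tier weighting -- can be sketched roughly as below. The weights and field names are assumptions for illustration, not the real `discover_v2` implementation, and the Cohere rerank and graph expansion passes would run after this shortlist.

```python
# Hedged sketch of a discover_v2-style ranking pass: blend text match,
# embedding similarity, tag overlap, and tier weight into one score.
# Weights and fields are illustrative assumptions, not the real code.

from dataclasses import dataclass

@dataclass
class Candidate:
    slug: str
    text_score: float   # lexical match against the intent, 0..1
    embed_score: float  # cosine similarity of intent vs block embedding, 0..1
    tag_score: float    # fraction of intent tags the block carries, 0..1
    tier_weight: float  # editorial/eval tier boost, e.g. 1.0-1.3

def rank(candidates: list[Candidate], limit: int = 5) -> list[str]:
    """Return the top slugs by blended score -- a bounded menu, not a route."""
    def blended(c: Candidate) -> float:
        base = 0.35 * c.text_score + 0.45 * c.embed_score + 0.20 * c.tag_score
        return base * c.tier_weight
    ordered = sorted(candidates, key=blended, reverse=True)
    return [c.slug for c in ordered[:limit]]
```

Note how the tier weight lets evidence (eval scores) lift or sink a block without any hard routing: the agent still chooses from the menu.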
Key tools: - `discover` -- search by natural language intent, get ranked building blocks - `traverse` -- walk the graph from any entity to find everything connected - `get_skill_content` -- load the rendered body for a skill or typed entity; large artifact sets can switch to a manifest while exact files stay available through `registry.artifact_body` - `guide` -- learn how the system works, get task-shaped navigation, and understand what patterns to follow - `registry.create` / `registry.link` / `registry.manage_tags` -- contribute back to the library Your library is not locked into one tool. It travels wherever MCP goes. --- ## Build-Measure-Learn The system gets smarter over time through a feedback loop: 1. **Build** -- Import your existing prompts, skills, knowledge, docs, SOPs, style guides. Or create new ones with AI-guided creation that interviews you, researches best practices, and drafts a polished building block. 2. **Measure** -- Run your building blocks against real use cases with discovery evals (30+ test cases). A/B test variants. See which skills produce better output. Track scores over time. The eval system grades both the quality of what the graph returns and the quality of what gets produced. 3. **Learn** -- Results feed back into discovery ranking. High-performing blocks rise in tier. Low-performing blocks get flagged. The graph's own ranking improves with evidence, not guesses. Quality goes up every cycle. This is not theoretical. The eval infrastructure exists: rubric-based judging, pairwise tournaments, ELO ranking, signal feedback into discovery. The discovery eval runner tests 30 real-world scenarios weekly via cron and tracks trajectory over time. --- ## Sidecars (Extended Ecosystem) Katailyst is the core registry. Companion applications extend it into specific domains: **Finance Sidecar** (cleanEBB) -- Upload financial documents, get charts and agent-readable metrics. 
Katailyst can pull metrics for prioritization, business context, and data-driven content decisions. Normalizes financial data into ingestible formats. **Multimedia Sidecar** (Multimedia Mastery) -- AI image generation, editing, video, audio, and asset management via Cloudinary. Connected via its own MCP. When an agent needs to create a visual as part of a content workflow, it calls Katailyst for the strategy and Multimedia Mastery for the execution. **Future Sidecars** -- Agent planning workspace (note-taking and reasoning scratch space), narrower use-case spin-off pages (templated workflows for specific scenarios), CMS/website integration (Framer, blog publishing). Each sidecar has its own MCP or API. They compose with Katailyst through discovery -- agents find the right building blocks from Katailyst and the right execution tools from sidecars. --- ## Worked Example: End-to-End **Request:** "Create a daily question card about heart anatomy with one strong misconception hook and clean answer-reveal space." **Step 1 -- Decomposition:** Agent recognizes this needs audience understanding, question bank source material, visual design, content structure, and brand application. **Step 2 -- Parallel Graph Search:** - Sub-agent A: `discover("heart anatomy question bank NCLEX-RN")` -- finds cardiac anatomy KBs, NCLEX-RN audience profile, question enhancement skill - Sub-agent B: `discover("question of the day social card design")` -- finds QotD content type, HLT Social Impact style kit, misconception hook skill - Sub-agent C: calls corporate CMS API to pull an actual heart anatomy question **Step 3 -- Assembly:** 5 skills, 3 KBs, 2 styles, 1 content type schema, 1 rubric. **Step 4 -- Execution:** Using the question enhancement skill's methodology, restructure the question to lead with the misconception. Using HLT Social Impact style, set visual direction. Call Multimedia Mastery's MCP to generate the card image with style kit and audience profile. 
**Step 5 -- Output:** Finished question card. Strong misconception hook. Clean anatomy diagram. Answer-reveal space. HLT branding with watermark. Mobile-optimized. **Step 6 -- Eval:** Graded against rubric. Hook strength 8/10, clinical accuracy 9/10, visual clarity 7/10, brand compliance 9/10, schema compliance 10/10. Score: 86/100. Tracked. Next week, run again, see if we're trending up. This is what the graph makes possible. No single prompt could produce this. It took 5 skills, 3 KBs, 2 styles, a schema, a rubric, an external CMS call, and a multimedia sidecar -- all found through discovery and assembled by the agent. --- ## What the Dashboard Areas Do | Area | What It Does | Who Uses It | | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------- | | **Create** | AI-guided creation wizard. Drop an idea, doc, or URL. AI interviews you, researches best practices, drafts a skill/KB/style, tags it, links it to the graph. | Everyone -- this is the primary entry point for contributions | | **Skills** | Browse your method library. Each skill is a proven procedure an AI follows step by step. | Power users, skill authors | | **Knowledge Bundles** | Reference material: product info, audience research, brand rules, SOPs, fact sheets. The "what to know" that makes AI output accurate and specific. | Content team, new hires, anyone creating content | | **Coverage** | Derived coverage lens over products, audiences, topics, use cases, and outputs. Shows where canon is explicit, where support is only implicit, and which backbone entities carry each lane. | Operators, maintainers, people looking for gaps | | **Connections** | Visual knowledge graph. See how everything links together. 
Explore connected blocks, discover relationships. | Curious explorers, system maintainers | | **Content** | Where finished outputs live. Assets, media, channels, campaigns, publishing calendar. | Marketing, content ops | | **Quality** | Test Lab, Evaluations, Experiments, Runs. Measure if skills and prompts actually work. A/B test variants, track scores over time, and run the pipeline eval suite against real registry-backed use cases. Home previews a subset; Quality Overview is the full suite surface. | Quality-focused users, optimization | | **Build** | Factory, Automations, Agent Chat, MCP settings, Plugins. Power-user tools for building and configuring. | Builders, developers, admins | | **Registry** | Browse all items, review queue. The raw database view of everything in the system. | Admins, system maintainers | --- ## Operating Principles For the full operating philosophy, see `docs/PRINCIPLES.md`. 1. **We are the armory, not the commander.** Agents have more context about the situation than we do. We rank, link, tag, suggest -- never dictate. Like Google Search: rank well, let the best stuff rise, let bad stuff sink. 2. **Quality over speed. No exceptions. No mockups, no cutting corners. Do the entire thing right -- if you defer work until later, you will forget it, and I'm left cleaning up the mess, and I'm not a dev.** Nothing goes in the registry half-done. If we're going to do it, we do it right and in full, end to end. Partial implementation is death -- AI agents have no long-term memory, so "we'll fix it later" means it never gets fixed. 3. **Fix root causes, never surfaces.** When something breaks, ask why three times until you hit the real cause. Never band-aid. Never route around a problem. Now or never -- are we okay with this never being fixed? 4. **Build the system, not the use case.** AI is generalizable.
Get the orchestration right and it iterates into any use case. The right system with good building blocks can produce "write an article for nurses" AND "build a nurse recruiting business with 50 variant pitches" AND "plan a family trip to Brazil." 5. **Discovery over hard routing; hints over gates.** Avoid governance locks, forced paths, predetermined sequences. Operate like Google Search -- rank things well, and the first page matters. Don't manually wire paths. 6. **Evidence over intuition.** Observable results, rankings, evaluations, and feedback. We invest in evals, ratings, and post-run reflection because that's how the system learns. 7. **Context over conciseness.** Give agents rich, detailed context about the situation -- who we're serving, how they think, what tools we have, what we've done before. More tokens, not fewer. The agent needs to understand the full picture. 8. **Consistent taxonomy.** All entity types share similar patterns -- similar tags, links, metadata, tracking, naming conventions. Like species of dogs, not different kingdoms. This reduces cognitive load and lets agents understand all types by understanding one. --- ## Current State (March 2026) **What works well:** - Discovery with semantic embeddings + Cohere reranking + graph expansion - 1,500+ entities across 21 types with 10,000+ graph links - MCP with 35+ tools (branched), 14 resources, 5 prompts, 9 toolsets, OAuth authentication - Creation Studio as the AI-first contribution path for skills, KBs, styles, and related registry work - Eval infrastructure: rubric judging, pairwise tournaments, signal feedback, 30-case discovery suite, and Home/Quality surfaces that now point at real eval cases instead of canned placeholders - Coverage lens for products, audiences, topics, use cases, and outputs with graph neighborhood jump-offs - Delivery scheduling to social media targets via Pipedream - Tool execution (Firecrawl, Cloudinary, Tavily, Brave, etc.) 
**What needs improvement:** - Creation intent routing -- when a user says "I want to add this to the system," discovery returns topic matches instead of creation helpers. The system detects the intent correctly but doesn't change the results to match. - First-time contributor experience -- Ben from the team can browse the library but doesn't have an obvious path to contribute. Sidebar labels use technical jargon. - Eval feedback loop -- live and pipeline eval paths do refresh some ranking signals, but the discovery search suite is still mostly diagnostic and auto-tier adjustment remains conservative by default. The feedback loop is real, but not yet fully closed across every eval surface. - Coverage semantics are still partly derived. The coverage page is useful now, but not every axis is fully backed by canonical tag namespaces yet, so some lanes still rely on code- and text-derived signals rather than explicit taxonomy. - Sidebar naming and navigation -- labels like "Knowledge Bundles" and "Factory" confuse non-technical users. **What's next:** - Make creation as obvious as consumption (sidebar, naming, guided flows) - Close the remaining eval-to-ranking gaps so discovery diagnostics and live judgments improve the library more automatically - Broader onboarding: help Ben upload his first skill, see it in the library, test it - Sidecar integration: finance metrics, multimedia, agent planning workspace --- ## Where This Is Going **Now:** Make what we have solid. Fix the contributor experience, wire the eval loop, onboard the team. Every day quality should go up, not scatter. **Next:** Per-person AI assistants with memory. Plugin/workspace export for Claude Code and other tools. Distribution channels end to end. Sub-agent orchestration for complex multi-step tasks. Arena mode for parallel skill testing. **Later:** Multi-org platform. Community skill library where anyone can contribute and vote. Self-improving factory where eval results automatically tune discovery. 
The endgame: every team needs something like this, and we're building it first. --- ## For AI Agents When starting work in this repo, anchor decisions in this order: 1. `docs/RULES.md` -- canonical guardrails 2. `docs/VISION.md` -- this file, canonical vision 3. `docs/BLUEPRINT.md` -- system architecture 4. `AGENTS.md` -- repo-specific operating rules If a tradeoff appears, choose the path that best preserves: composability, quality and testability, compatibility correctness, long-term maintainability. For the deeper HLT agent-OS architecture, read `docs/references/ai-agents/HLT_AGENT_OS_THESIS.md`. ## Anti-Patterns - Shipping one-off logic that bypasses canonical contracts - Treating mirror formats as source of truth (DB is canonical) - Auto-promoting low-quality content without evidence - Forcing agents into predetermined paths (the #1 repeated mistake) - Bulk-scripting registry changes without reading each item first - Leaving partial implementations ("we'll fix it later" = it never gets fixed) - Duplicating conflicting rules across entry docs --- ## Source: docs/QUICK_START_AGENTS.md # Agent Quick Start (Katailyst) Katailyst is the registry and control plane. Many real runtimes live elsewhere: - OpenClaw/Render agents such as Victoria, Lila, and Julius - Claude Code project sessions - Codex project sessions - app-native or framework-hosted orchestrators The normal pattern is: 1. discover atomic units in Katailyst 2. inspect and traverse related context 3. choose a small set of blocks 4. execute in your runtime 5. write outputs, traces, and learnings back canonically For hosted agents, make the Catalyst glance explicit: 1. call `registry.capabilities` 2. build a graph-driven context packet with `registry.agent_context` 3. inspect graph/context with `discover`, `get_entity`, `traverse`, and `registry.graph.summary` when the packet is still thin 4. 
describe and execute canonical tool refs with `tool.describe` and `tool.execute` The important design stance is: - menus over routes - parent capabilities over fragmented entry points - interpret first, then compose - use playbooks as accelerants, not cages - use traces and lessons when they help, but do not pre-control every step just to make observation cleaner For the hosted trio, err on the side of using more tools, skills, and surrounding context than first instinct suggests. The usual mistake is under-discovery, not over-discovery. ## 0) If you only point an agent at a few docs, point it at these Use this as the smallest reliable onboarding surface: - `/.well-known/llms.txt` - `/llms.txt` - `AGENTS.md` - `CATALYST.md` - `docs/RULES.md` - `docs/VISION.md` - `docs/QUICK_START_AGENTS.md` - `docs/AGENT_READINESS_CHECKLIST.md` - `docs/atomic-units/README.md` - `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md` - `docs/references/contracts/MIRRORS_AND_PACKS.md` - `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md` - `docs/api/MCP_TOOLS_REFERENCE.md` - `docs/references/contracts/VAULT_TOOL_EXECUTION.md` - `docs/references/integrations/INTEGRATION_CONTRACT_MULTIMEDIA_MASTERY.md` - `docs/references/integrations/INTEGRATION_CONTRACT_RENDER_MCP.md` If the agent only loads one runtime guide after the llms surfaces, it should load `docs/QUICK_START_AGENTS.md`. `docs/references/ai-agents/USE_CASES.md` is a shortlist of common starts, not an exhaustive roster. Broader discovery still lives in `/.well-known/llms.txt`, `/llms.txt`, `POST /api/discover`, and `POST /api/traverse`. If the agent is one of the core hosted HLT agents, then the next deeper doc is: - the DB-side shared fleet entry `kb:global-catalyst-guide@v1` - `docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md` For hosted HLT agents, `docs/QUICK_START_AGENTS.md` is a repo primer, not the DB-side fleet entry. 
The first Catalyst-native surface after the repo primers is `global-catalyst-guide`. ## 1) Prime with LLM docs surfaces Fetch these first: - `/.well-known/llms.txt` — standard well-known alias for agent clients - `/llms.txt` — compact curated docs index - `/llms-full.txt` — full curated markdown corpus - `/api/docs/page.md?path=...` — single curated docs page markdown export (authenticated) - `AGENTS.md` — repo-root entrypoint for internal operators and local hosts - `CATALYST.md` — repo-root external runtime guide for OpenClaw/community agents - `docs/references/contracts/CURRENT_OPERATING_MODEL.md` — current `hlt` active layer vs `Free (System)` shared canon contract Then load these operational docs before trying to execute tools: - `docs/references/ai-agents/USE_CASES.md` - `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md` - `docs/api/MCP_TOOLS_REFERENCE.md` - `docs/references/contracts/VAULT_TOOL_EXECUTION.md` - `docs/references/integrations/INTEGRATION_CONTRACT_MULTIMEDIA_MASTERY.md` - `docs/references/integrations/INTEGRATION_CONTRACT_RENDER_MCP.md` - `docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md` If the agent is Victoria, Julius, or Lila, then hand off from these repo docs into the DB-side shared fleet entry `global-catalyst-guide` before branching into per-agent front doors and IA docs. If you need the runtime ownership split, read: - `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md` - `docs/references/contracts/MIRRORS_AND_PACKS.md` If you need the repo surface explained before you scan the tree, read: - `docs/references/operations/FILESYSTEM_BASH_PRINCIPLES.md` ## 2) Know the runtime truth - Supabase DB is canonical. - `.claude/**` is the primary project-local mirror surface for agent-facing files. - `.claude-plugin/**` is generated distribution output; its nested `.claude-plugin/` manifest directory is intentional. - `.mcp.json` is the Claude Code MCP surface; `.codex/config.toml` is the Codex MCP surface. 
- OpenClaw/Render agents are downstream runtime consumers, not canonical registry truth. - The current org model is intentional: `hlt` holds the live HLT fleet-facing docs, while `Free (System)` holds shared canon that HLT still uses freely. Do not read the internal `system` org code as an exclusion fence. For the full statement of that split, read: - `docs/references/contracts/CURRENT_OPERATING_MODEL.md` ## 2a) Know the HLT core-agent truth Victoria, Julius, and Lila should share one strong base, not three fragmented foundations. The practical rule for Victoria, Julius, and Lila is: - almost every meaningful task should use tools and skills - one or two is the floor, not the ceiling - medium and high-complexity work should expect a broader graph walk - the decision is how much to dig, not whether to dig That shared base includes one common hosted-agent access story: - Katailyst remote MCP at `https://www.katailyst.com/mcp` is the default control-plane entry - trusted hosted agents should default to the full catalog or explicitly set `agent,delivery` - use `bootstrap` only when the runtime intentionally wants the smaller first-glance branch - Supabase MCP is the direct canonical DB read lane, not the whole hosted-agent setup story - Vault-backed execution is the only sanctioned secret path - Notion is optional: use Notion MCP for interactive OAuth workflows, and the vault-backed REST path for automation workflows Use `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md` as the shared setup reference when auth, tool posture, or optional integration lanes are in question. The shared-base read order is a doctrine/read-order contract mirrored across Render/OpenClaw disk docs, registry KB, and repo docs. The files below live on the **Render/OpenClaw runtime disk**, not in this repo. Agents running in this repo should read the agent definitions in `docs/references/ai-agents/` instead. 
Render/OpenClaw shared-base read order (external runtime only): - `AGENTS.md` ← also exists in this repo - `SOUL.md` ← OpenClaw disk only - `USER.md` ← OpenClaw disk only - `IDENTITY.md` ← OpenClaw disk only - `TOOLS.md` ← OpenClaw disk only - `BOOTSTRAP.md` ← OpenClaw disk only - `HEARTBEAT.md` ← OpenClaw disk only - `MEMORY.md` ← OpenClaw disk only - `memory/lessons-learned.md` ← OpenClaw disk only Role overlays stay thin: - Victoria = registry canon, infra, orchestration, stewardship - Julius = planning, operations, brainstorming, meeting prep, adjacent multimedia routing - Lila = marketing, writing, multimedia, campaign execution Authority split: - all three agents can discover, traverse, draft, and stage registry work - Victoria alone publishes structural registry canon This is a focus/context-quality choice, not a generic “reduce capability” choice. ## 3) Learn the surface Call: `GET /api/endpoints` Use it to discover routes, auth requirements, and doc pointers. For MCP-specific discovery, also read: - `docs/api/MCP_TOOLS_REFERENCE.md` That doc defines the live `/mcp` transport, auth modes, capability profiles, prompts, and resources. ## 4) Pick your runtime lane first - **OpenClaw/Render agent**: Katailyst provides discovery, metadata, vault references, mirrors, and packs. Your agent still executes in the external runtime and writes results back. - **Claude Code**: use `.mcp.json` for project MCP config and `.claude/**` mirrors for portable local consumption. - **Codex**: use `.codex/config.toml` for project MCP config. Codex does not read `.mcp.json`. - **Other orchestrators**: prefer API discovery plus packs/mirrors as needed. ## 4a) Connect to Katailyst MCP Use Katailyst MCP when the host needs live, DB-backed discovery/query access. 
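Under the hood, a remote MCP call is JSON-RPC 2.0 over HTTP. The sketch below builds what a `discover` tool call against the hosted endpoint might look like; the `intent` argument name is an assumption (a real client runs the MCP initialize handshake first and reads each tool's actual input schema), while the URL and headers mirror the client configs in this doc.

```python
# Hedged sketch of a raw MCP "tools/call" request to the hosted endpoint.
# The `intent` argument name is an assumption -- verify against the live
# tool schema exposed during the MCP handshake before relying on it.

import json
import os

MCP_URL = "https://www.katailyst.com/mcp"

def discover_request(intent: str) -> tuple[dict, bytes]:
    """Build headers and a JSON-RPC 2.0 body for a `discover` tool call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('KATAILYST_PERSONAL_MCP_TOKEN', '')}",
        "x-katailyst-toolset": "agent,delivery",
    }
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "discover",
            # Rich intent matters: include audience, goal, output shape,
            # and business context so the shortlist starts strong.
            "arguments": {"intent": intent},
        },
    }
    return headers, json.dumps(body).encode("utf-8")
```

In practice you rarely hand-roll this: the client configs below let Claude Code, Codex, or the connector UI manage the transport for you.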
### Claude Code - Project-local config lives in `.mcp.json` - Hosted production endpoint is `https://www.katailyst.com/mcp`; if you are using another deployment, use that deployment's `/mcp` origin instead - Anthropic's current guidance is to add remote servers with `claude mcp add ...` and use `/mcp` for remote auth flows Example remote config: ```json { "mcpServers": { "katailyst": { "type": "http", "url": "https://www.katailyst.com/mcp", "headers": { "Authorization": "Bearer ${KATAILYST_PERSONAL_MCP_TOKEN}", "x-katailyst-toolset": "agent,delivery" } } } } ``` ### Codex - Project-local config lives in `.codex/config.toml` - OpenAI's current Codex docs support `codex mcp add --url ` and direct `config.toml` entries Example remote config: ```toml [mcp_servers.katailyst] url = "https://www.katailyst.com/mcp" [mcp_servers.katailyst.headers] Authorization = "Bearer ${KATAILYST_PERSONAL_MCP_TOKEN}" x-katailyst-toolset = "agent,delivery" ``` ### Claude.ai / Co-Work / connector UI - Use the Claude connector/settings UI, not `.mcp.json` - Endpoint: `https://www.katailyst.com/mcp` - Header: `Authorization: Bearer ` - Token issuance lives in `/dashboard-cms/tools/mcp` Important auth truth: - Katailyst hosted MCP now supports a first-party OAuth authorization-code + PKCE flow at `/oauth/authorize`, `/oauth/token`, and `/oauth/revoke` - OAuth discovery metadata is published at `/.well-known/oauth-authorization-server` and `/.well-known/oauth-protected-resource/mcp` - signed-in operators can register trusted OAuth clients and review/revoke authorizations through the authenticated MCP management APIs - per-user personal tokens are the recommended team-grade remote auth path - browser-driven remote clients should prefer OAuth; personal tokens remain the durable operator/infrastructure path - personal and short-lived connect tokens carry the `full` profile by default - trusted hosted agents should default to the full catalog or explicitly set `agent,delivery` - use `bootstrap` 
  only for intentionally narrow external glance flows or read-first validation
- static shared bearer tokens remain the narrowest path and should not be treated as the full hosted-agent posture
- `/mcp` is auth-required by default unless `MCP_ALLOW_ANONYMOUS=true`
- signed-in operators can issue or revoke per-user personal tokens from `/dashboard-cms/tools/mcp`
- signed-in operators can still mint a short-lived portable connect token from `/dashboard-cms/tools/mcp` for temporary handoff

Provider-family rule:

- connect Katailyst once as the MCP surface
- Firecrawl, Tavily, Brave, Cloudinary, and similar providers are reached through canonical tool refs behind Katailyst
- for normal Katailyst-hosted work, do not add separate provider MCP servers beside Katailyst just to reach those capabilities

Hosted toolset posture:

- full catalog or `agent,delivery`: the normal trusted hosted-agent posture for Victoria, Julius, and Lila
- `bootstrap`: small first-glance surface for capabilities, packeting, graph inspection, and generic execution discovery
- `agent`: trusted hosted-agent branch with registry + authoring + execution + scheduling (40 tools)
- `delivery`: delivery branch for scheduling, target management, and Pipedream connect flows (10 tools)
- `delivery-admin`: operator-only branch for Pipedream connect links and target promotion (4 tools)

If you are working inside this repo locally, the local stdio server already defined in `.mcp.json` and `.codex/config.toml` is still the strongest repo-local path. For team rollout across repos and clients, prefer the personal-token remote path above.

For agent fleet architecture, read:

- `docs/references/ai-agents/SYSTEM_ONE_PAGER.md`

## 4b) Vault is part of the small surface area

Agents should assume secrets come from Vault-backed metadata, not from repo files.
Read:

- `docs/references/contracts/VAULT_TOOL_EXECUTION.md`

Current examples of Vault-backed tool or integration surfaces in org `hlt`:

- `firecrawl.search`
- `firecrawl.scrape`
- `tavily.search`
- `codesandbox.create-sandbox`
- `relevance.knowledge-query`
- `relevance.agent-trigger`
- `relevance.workforce-trigger`
- `relevance.task-status`
- `cloudinary.transform`
- `publish.email`
- `multimedia-mastery/agent-token` (stored in vault, but not yet proven-good for authenticated production use)

The important distinction is:

- vault inventory is the credential substrate
- the Katailyst registry is the discovery/control plane
- hosted execution families become agent-usable when the MCP surface exposes canonical refs through `tool.describe` and `tool.execute`

Vault rule:

- secret values stay in Vault
- repo stores only metadata pointers such as `auth_secret_key`
- mirrors, exports, and llms surfaces must never contain secret values

## 4c) Capture rendered UI evidence before frontend edits

If you are touching browser-rendered UI, do not design or debug in a vacuum.

Use this repo's browser lanes in this order:

- **Playwright MCP** — default for browser-rendered UI, JavaScript-heavy routes, auth flows, and “what is actually on the page right now?”
- **`agent-browser` / `browser-use`** — secondary CLI/browser lanes when you intentionally want those workflows
- **`peekaboo`** — fallback for native macOS app or desktop-window state, not the primary lane for web UI

Repo helper for the secondary lane:

```bash
pnpm browser:agent -- open https://example.com
pnpm browser:agent -- snapshot -i
pnpm browser:agent -- close
```

Workflow:

1. Capture the current rendered state before editing frontend code.
2. Make the change.
3. Capture the same page/state again after the change.
4. Run the relevant verification command when the route is covered:
   - `pnpm test:e2e`
   - `pnpm test:e2e:smoke`
   - `pnpm test:e2e:visual`
   - `pnpm ux:audit`
5.
   Keep ephemeral screenshots and browser artifacts in `test-results/`; generated reports stay in `playwright-report/`.

Host/config truth:

- **Claude Code** gets project Playwright MCP from `.mcp.json`
- **Codex** gets project Playwright MCP from `.codex/config.toml`
- if Codex does not show the new project server immediately, trust/reload the repo config and re-check `/mcp`

For artifact placement and local Playwright command details, read `e2e/README.md`.

## 4d) Notion is an optional lane, not mandatory base infrastructure

Use Notion when the workflow needs workspace docs, task systems, or collaborative surfaces. Do not treat it as required startup infrastructure for every hosted-agent turn.

Two supported paths exist:

- Notion MCP: official hosted MCP server with OAuth 2.0 authorization code flow and PKCE, best for interactive client-style workflows
- Notion REST: vault-backed bearer-token path, best for automation workflows that intentionally use the REST API

Read:

- `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md`
- Notion integration guide KB (stored in the registry database; use MCP discover to retrieve)

## 4e) Connect to Multimedia Mastery

Treat Multimedia Mastery as a remote media tool surface, not as a generic skill.
Canonical discovery surfaces:

- `https://multimediamastery.vercel.app/api/media/v1/capabilities`
- `https://multimediamastery.vercel.app/api/media/v1/tools`
- `https://multimediamastery.vercel.app/api/media/v1/mcp`

Remote MCP example:

```json
{
  "mcpServers": {
    "multimediaMastery": {
      "type": "streamable-http",
      "url": "https://multimediamastery.vercel.app/api/media/v1/mcp",
      "headers": {
        "Authorization": "Bearer ${MM_AGENT_TOKEN}"
      }
    }
  }
}
```

Codex TOML example:

```toml
[mcp_servers.multimediaMastery]
url = "https://multimediamastery.vercel.app/api/media/v1/mcp"
bearer_token_env_var = "MM_AGENT_TOKEN"
```

Codex-specific note:

- Codex does not read the Claude-style nested `headers.Authorization = "Bearer ${MM_AGENT_TOKEN}"` pattern from TOML for hosted MCP auth.
- Verify how Codex parsed the entry with `codex mcp get multimediaMastery`.
- In this repo, the project-scoped Codex entry stays disabled until `python3 scripts/ops/multimedia_mcp_canary.py` passes with a real token.

Current readiness truth:

- discovery surfaces are reachable
- remote MCP metadata endpoint is reachable
- Codex ignored the previous TOML header example, so the project config did not actually send a bearer token
- the currently stored bearer token did **not** pass authenticated verification on 2026-03-06
- do not promote Multimedia Mastery above proven lanes until a valid authenticated token succeeds

## 5) Find units by intent

Call: `POST /api/discover` with JSON body:

```json
{ "intent": "...", "types": ["skill", "kb", "recipe"], "limit": 10 }
```

Pick the top matches using `score` + `reason`. Think of the response as a **menu**, not a route. If one strong parent lane fits the ask, start there and branch only as needed. Treat that lane as a good first block, not a mandatory tunnel, and keep the broader registry available whenever the task deserves a wider sweep.
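The discover step can be sketched as a request builder plus a menu filter. This is a minimal sketch, not the actual client: the `build_discover_payload` and `shortlist` helpers are illustrative names, the mock rows are not real registry data, and only the documented `score` and `reason` fields are assumed.

```python
import json

def build_discover_payload(intent: str, types: list[str], limit: int = 10) -> str:
    """Serialize a discover request body; richer intents rank better."""
    return json.dumps({"intent": intent, "types": types, "limit": limit})

def shortlist(results: list[dict], min_score: float = 0.5, keep: int = 3) -> list[dict]:
    """Treat the response as a menu: keep a few high-signal candidates, not all of them."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r for r in ranked if r["score"] >= min_score][:keep]

# Mock rows shaped like the documented fields (`score` + `reason`); illustrative only.
mock = [
    {"ref": "skill:write-newsletter", "score": 0.91, "reason": "intent match"},
    {"ref": "kb:audience-notes", "score": 0.74, "reason": "graph-linked"},
    {"ref": "recipe:weekly-digest", "score": 0.42, "reason": "weak match"},
]
print([r["ref"] for r in shortlist(mock)])
# → ['skill:write-newsletter', 'kb:audience-notes']
```

The weak-match row drops out; the agent still decides which survivors to load.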
## 6) Inspect and expand graph context

Call: `POST /api/traverse` with JSON body:

```json
{ "ref": "skill:write-newsletter", "depth": 1 }
```

Use weighted links (0-1) + reasons as suggestions, not requirements. Inspect the returned entities, related tools, KB, bundles, and adjacent candidates before choosing what to load.

## 7) Choose blocks, then execute in your runtime

Katailyst should help you choose a few high-signal blocks:

- a skill or two
- supporting KB
- maybe a tool, bundle, playbook, or schema

The consuming runtime decides sequencing and composition. Katailyst should not over-orchestrate that step.

## 8) Create or improve units (Factory)

Call: `POST /api/factory/enrich-draft`

Then commit via registry CRUD endpoint (see ENDPOINTS.md).

## 9) Store outputs, traces, and signals (be a good citizen)

After executing a workflow:

- Store created content in Assets (draft -> approved -> published lifecycle)
- Log the run + steps/tool calls so Katailyst can learn and rank better.

If you are an external runtime, write back canonically to DB first. Repo mirrors are derived/export surfaces.
At minimum, return one or more of:

- a durable output artifact
- a traceable run outcome
- a lightweight quality signal

Recommended trace summary:

- external or local run ID
- final status
- start/end timestamps
- high-level step notes
- tools/models invoked

Recommended quality feedback:

- pass/fail judgment
- rubric or operator signal snapshots
- confidence
- notable failure class

Discovery improvements should stay soft:

- suggest tags, links, or stale-surface fixes
- do not turn one good local route into a mandatory path for every runtime

## 10) Choose the right surface

- **MCP**: direct discovery/query for local hosts such as Claude Code or Codex
- **API**: app workflows, authenticated UI flows, and structured automation
- **Mirror**: portable skill/KB folder consumption
- **Pack/export**: distribution to another host, repo, or runtime

## Errors

All endpoints return structured errors with `failure_class` + `recovery_options`. Never rely on raw text parsing of error messages.

---

## Source: docs/runbooks/interop/system-guide.md

# Katailyst System Guide

> For any AI agent connecting to Katailyst via MCP. Read this first.

## What Is Katailyst?

Katailyst is a connected AI knowledge system: a library of 1,500+ composable building blocks connected by a knowledge graph of 10,000+ links. Think of it as an armory of proven, tested, organization-specific capabilities that any AI agent can draw from to produce higher-quality output.
These building blocks are organized into atomic unit types:

| Type | Role | How agents should treat it |
| --- | --- | --- |
| `skill` | Step-by-step methodology | Execution guidance: follow the spirit 80-90%, adapt to circumstances |
| `kb` | Reference material, research, facts | Context: inject as background knowledge, do not follow as instructions |
| `style` | Brand voice, visual identity rules | Context: apply to keep output on-brand |
| `playbook` | Multi-step ordered workflow | Execution guidance: follow the sequence when order matters |
| `prompt` | Reusable instruction template | Execution guidance: fill in variables, follow the template |
| `tool` | API connection to external service | Execute via `tool.execute` |
| `schema` | Structural template for output format | Use as the shape your output should take |
| `content_type` | Output format specification | Reference for what the finished artifact looks like |
| `recipe` | Specific procedure chaining skills and tools | Execution guidance: a concrete production process |
| `rubric` | Quality scoring criteria | Use to grade output quality |
| `channel` | Distribution target with format rules | Reference for platform-specific constraints |
| `bundle` | Curated kit of related blocks | Convenience grouping, not a mandate |
| hub (`kb` tagged `capability:hub`) | Domain front door | Lighthouse: points to 20-30 best blocks for a domain |
| `eval_case` | Repeatable test with rubric grading | For measuring agent quality over time |

## Lighthouse, Not Command Center

Hubs are lighthouses, not command centers. They point agents in the right direction and suggest what blocks exist for a domain, but they do not mandate a specific path.

Similarly, skills are building blocks, not rigid procedures. Follow the spirit and general aim 80-90%, but adapt to the specific situation.
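The treatment column of the unit-type table boils down to a small routing map. A minimal sketch under the assumption that a consuming agent wants a default handling lane per type; the `TREATMENT` dict and `treat` function are illustrative, not a Katailyst API, and the agent remains free to override any lane.

```python
# Treatment lanes distilled from the unit-type table above.
# Illustrative grouping only — blocks are suggestions, not mandates.
TREATMENT = {
    "kb": "context", "style": "context", "channel": "context",
    "content_type": "context", "bundle": "context",
    "skill": "execution", "playbook": "execution",
    "prompt": "execution", "recipe": "execution",
    "tool": "execute-via-tool.execute",
    "schema": "output-shape",
    "rubric": "grading", "eval_case": "grading",
}

def treat(entity_type: str) -> str:
    """Default handling lane for a block; unknown types stay advisory."""
    return TREATMENT.get(entity_type, "advisory")

print(treat("kb"), treat("skill"), treat("tool"))
# → context execution execute-via-tool.execute
```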
## How Discovery Works

When you call `discover`, the system:

1. Embeds your intent into a vector.
2. Searches the graph using multiple weighted signals.
3. Widens the candidate pool beyond the final limit.
4. Reranks with a cross-encoder for precision.
5. Expands through graph links so hubs and dependencies can widen the menu.

Write rich intents. The more context you provide — audience, topic, purpose, constraints, what you're trying to build, and where this runs — the better the results.

## How To Compose Blocks

- KBs, styles, and brand guides are context.
- Skills, playbooks, and recipes are execution guidance.
- It is fine to load 10-15 blocks across multiple discover calls.
- Mix and match freely. Run multiple skills in parallel, combine KBs with skills, or use only KBs.
- Call discover multiple times with different angles: audience, methodology, brand/style, distribution.

## Common Patterns

Pattern 1: Single discovery. One `discover` call with a rich intent, then load the top few results.

Pattern 2: Multi-angle discovery. Call `discover` three times: audience-focused, skill-focused, style-focused. Combine the best from each.

Pattern 3: Hub traversal. `discover` finds a hub, then `traverse` the hub, then load specific blocks from its recommendations.

Pattern 4: Sub-agent dispatch. Launch 2-4 sub-agents, each searching a different facet. The parent agent assembles the best blocks.

## What To Tell The User

Be transparent about how Katailyst helps:

- "I searched the registry and evaluated N building blocks. I'm using [list] because [reason]."
- "The registry provided brand voice rules, audience insights, and a proven content structure."
- "There are hundreds more blocks available — I can search deeper if you'd like."

## Quality Philosophy

- Quality over speed, quality over efficiency.
- The agent decides. Blocks are suggestions, not mandates.
- More context is better.
- Iterate. If the first attempt is not right, search again with a different angle.
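The "combine the best from each" step in multi-angle discovery is a dedupe-and-merge over several result lists. A minimal sketch, assuming each result carries a `ref` and a `score`; the `merge_angles` helper and the mock rows are illustrative, not part of the Katailyst surface.

```python
def merge_angles(*result_sets: list[dict]) -> list[dict]:
    """Combine multi-angle discover results: dedupe by ref, keep the best score."""
    best: dict[str, dict] = {}
    for results in result_sets:
        for r in results:
            current = best.get(r["ref"])
            if current is None or r["score"] > current["score"]:
                best[r["ref"]] = r
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)

# Illustrative result lists from three discover angles.
audience = [{"ref": "kb:audience-notes", "score": 0.8}]
skillset = [{"ref": "skill:write-newsletter", "score": 0.9},
            {"ref": "kb:audience-notes", "score": 0.6}]
style = [{"ref": "style:brand-voice", "score": 0.7}]

print([r["ref"] for r in merge_angles(audience, skillset, style)])
# → ['skill:write-newsletter', 'kb:audience-notes', 'style:brand-voice']
```

The duplicate `kb:audience-notes` keeps its best score (0.8), so a block surfaced by several angles is never penalized for appearing twice.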
---

## Source: docs/runbooks/interop/context-vs-execution.md

# Context Bundles Vs Execution Skills

Katailyst mixes knowledge bundles, skills, playbooks, styles, tools, and other atomic units. Agents should treat them differently in spirit, without hard orchestration.

## KBs And Similar: Context

Use KBs, long-form agent docs, brand voice guides, and product overviews primarily as background: facts, constraints, tone, and "what we believe." Wrap them mentally as "this is for context." They can be mixed, overlapped, or partially ignored when the model has a good reason.

## Skills And Playbooks: Execution

Skills and playbooks are method and procedure: steps, checklists, output shapes. Wrap them as "this is for execution." Follow the spirit, roughly 80-90%, rather than literal word-for-word when the situation demands it.

## Composition

- Pull several candidates from `discover` or `registry.agent_context`; the ranked list is advisory.
- Parallel sub-agents can each run discovery on a different facet, then merge.
- KBs alone can still produce good output when the task is explanatory or policy-heavy.
- Skills alone can work when the task is procedural and context is already in thread.

## Related

- `katailyst://docs/interop/orchestrator-workflow` for the end-to-end discovery loop
- `katailyst://docs/atomic-units/readme` for entity type reference
- `katailyst://docs/system-guide` for the overall MCP system orientation

---

## Source: docs/atomic-units/README.md

# Atomic Unit Rules (Index)

This folder holds **per-unit rules** so we do not forget standards as the system scales. Each unit has a short, enforceable checklist.

Katailyst has more atomic unit types than the current `registry_entities.entity_type` enum. Use these docs to reason about the full system ontology:

- **Registry entity types** live in `registry_entities` and drive discovery-first canon.
- **Operational and delivery atomic unit types** live in their own tables when their lifecycle is execution- or output-specific (for example, assets and automations).

Reality-first rule:

- do **not** treat the ontology as a frozen historical count
- the current set should follow actual DB/runtime truth
- if the system grows a richer lane such as component-heavy styles, design overlays, or operational units with their own lifecycle, update the docs to match reality instead of forcing reality back into an old simpler model

## Units

- [Skills](SKILLS.md)
- [Tools](TOOLS.md)
- [KB](KB.md)
- [Prompts](PROMPTS.md) — DB entity_type is `prompt`
- [Schemas](SCHEMAS.md)
- [Styles](STYLES.md)
- [Bundles](BUNDLES.md)
- [Recipes](RECIPES.md)
- [Content Types](CONTENT_TYPES.md)
- [Playbooks](PLAYBOOKS.md)
- [Assets](ASSETS.md) — output/delivery atomic unit type (`assets`, `asset_versions`, `publish_events`)
- [Actions](ACTIONS.md) — Launchpad cards (playbook-backed, curated via tags/bundles)
- [Automations](AUTOMATIONS.md) — scheduled Action runs (operational, replayable)
- [Channels](CHANNELS.md)
- [Agents](AGENTS.md)
- [Eval Cases](EVAL_CASES.md)
- [Rubrics](RUBRICS.md)
- [Metrics](METRICS.md)
- [Lint Rules](LINT_RULES.md)
- [Lint Rulesets](LINT_RULESETS.md)

Styles deserve a specific note here: in Katailyst, `style` is already broader than pure writing tone. A style can carry verbal overlays, visual system direction, component-set guidance, layout cues, typography, spacing, motion, and brand/system expression when the underlying output contract is still the same.
## Canonical Examples

- [Canonical Examples](CANONICAL_EXAMPLES.md)

## Quality Baseline References

- [Atomic Unit Standards (Canonical)](../references/contracts/ATOMIC_UNIT_STANDARDS.md)

## Shared Contract

- [Unit Package Contract](SHARED_CONTRACT.md)
- [Artifacts](ARTIFACTS.md)
- [Links](LINKS.md)
- [Atomic Unit Decision Matrix](DECISION_MATRIX.md)

## Decision + Checker Surfaces

If the hard part is classification or placement, start here before editing units:

- [Atomic Unit Decision Matrix](DECISION_MATRIX.md)
- [Unit Package Contract](SHARED_CONTRACT.md)
- `python3 scripts/registry/lint_unit_packages.py --strict`
- `python3 scripts/registry/audit/audit_atomic_unit_positioning.py --report docs/reports/atomic-unit-positioning-latest.json`
- `npx tsx scripts/registry/audit/audit_registry_atomic_contract.ts --report docs/reports/registry-atomic-contract-latest.json`

Use the matrix for human judgment and the scripts for machine-enforced consistency.

Low-surface-area default:

1. read [Atomic Unit Decision Matrix](DECISION_MATRIX.md)
2. read [Canonical Examples](CANONICAL_EXAMPLES.md) if you need a repo-backed shape to copy
3. read the one per-unit doc you actually need
4. only then load deeper references or examples

If the job is stewardship rather than authorship, keep the loop small:

1. `skill:skill-creator@v1` for create/import/adapt/perfect work
2. `skill:registry-discovery-primer@v1` for discovery-first shortlist building
3. `playbook:registry-health-scan@v1` for the ordered hygiene loop
4. `playbook:suggest-links@v1` only when graph enrichment is the specific job
5. the scheduled registry-health automation for recurring checks

Front-door rule:

- keep one front-door index per capability family
- push deeper nuance into linked artifacts or linked units
- do not create multiple competing "start here" docs for the same family unless they serve different runtimes or jobs

Today the package checker is deepest for filesystem-backed `skill` and `kb` mirrors.
The DB-canonical audit extends the same decision layer across the full registry so nightly agents can enforce the same tag/link/field expectations even when a unit type does not yet have a package mirror.

If a unit doesn’t have rules yet, add a file before creating new items of that type.

---

## Common Unit Anatomy (Consistency Target)

Each unit doc should follow a consistent, low‑cognitive‑load structure:

1. **Purpose** (what it is)
2. **When to Use**
3. **Entrypoint** (required file)
4. **Tag Coverage** (minimum tags)
5. **Links** (optional but recommended)
6. **Testing** (if applicable)

This makes cross‑type browsing predictable even as the catalog grows.

If a capability family needs more nuance than one page can hold, put that nuance in the front-door unit or its linked artifacts first. Do not solve a routing problem by adding another top-level rules page unless the unit type itself is missing.

---

## Source: docs/references/contracts/MIRRORS_AND_PACKS.md

# Mirrors and Packs

We have two competing forces:

- **Canonical truth** (DB) — consistent discovery, indexing, metadata, evaluation.
- **Portability** (filesystem) — agents/tools can consume skills without database access.

The trick is to pick a canonical source of truth and treat everything else as a _mirror_.

Core rule:

- **Identity is not the filesystem path.** Identity is the typed ref: `entity_type:code@version`.

## Runtime vs Storage vs Portability

These are different layers and should not be conflated:

- **Canonical registry truth:** Supabase/Postgres
- **Repo mirrors:** portable filesystem views such as `.claude/skills/**` and `.claude/kb/**`
- **External runtimes:** OpenClaw/Render agents, hosted app runtimes, LangChain-style orchestrators
- **Local host runtimes:** Claude Code and Codex project environments
- **Export surfaces:** plugin snapshots, JSON packs, GitHub share payloads

Katailyst owns canonical registry truth and portability surfaces.
It does **not** own the full runtime behavior of external OpenClaw agents.

## Knowledge Surface Matrix

Keep these layers distinct:

- **Canonical:** Supabase/Postgres registry state, links, evals, revisions, and metadata
- **Mirrors:** `.claude/skills/**`, `.claude/kb/**`, and other deterministic portability surfaces
- **Staging:** `incoming/**` and `.claude/skills/imports/**`
- **Human docs:** `docs/**` narrative, contract, and runbook material

Docs explain the system. They do not replace canonical registry truth.

## Current Org Model

- `hlt` is the active operating surface for the live HLT fleet and HLT-specific hosted-agent docs.
- `Free (System)` is the shared canonical library/template layer that HLT still uses freely. The internal org code remains `system`.
- Read-only discovery surfaces should treat `Free (System)` as a shared layer beneath the active HLT execution context, not as a separate live org boundary.
- HLT should be able to read and use shared `Free (System)` canon by default; filesystem placement and org placement are not capability fences.
- Repo mirrors can project both `hlt` and `system` material into one filesystem surface.
- Filesystem location is not org truth. Supabase org placement is the canonical answer.
- Shared flagship hubs can remain in `Free (System)`; HLT fleet-facing front doors and identity overlays should resolve to `hlt`.
Reference:

- `docs/references/contracts/CURRENT_OPERATING_MODEL.md`

## Canonical vs Mirror

- Canonical: Supabase/Postgres
- Primary repo mirrors:
  - `.claude/skills/curated/**`
  - `.claude/kb/curated/**`
- Export surfaces:
  - `registry-packs/**`
  - `.claude-plugin/**`

## Consumer Matrix

| Consumer | Primary execution surface | What Katailyst provides | What Katailyst does not own |
| --- | --- | --- | --- |
| OpenClaw/Render agent | external runtime + gateway | registry discovery, tags, links, evals, vault references, mirrors, packs | runtime loop, Render service behavior, Telegram/Slack delivery, disk-resident identity |
| Claude Code | local project runtime | `.mcp.json`, `.claude/**` mirrors, API/docs surfaces | Codex config, OpenClaw runtime behavior |
| Codex | local project runtime | `.codex/config.toml`, API/docs surfaces, mirrors/packs | Claude Code config, OpenClaw runtime behavior |
| MCP client | MCP session | query/discovery access, canonical DB-backed metadata | app-specific orchestration |
| App-native consumer | API + pack/mirror ingestion | canonical discovery, inspection, structured exports | app runtime loop |
| LangChain-style consumer | API + pack/mirror ingestion | candidate retrieval, portable units, compatibility docs | chain design and execution policy |

## Why skills mirror to the filesystem

“Skills” are often selected semantically (“load the skill that matches this task”), so having them as files is useful for systems like Claude Code and for git-based distribution.

## Why tools/KB/bundles are different

- “Tools” (in our registry sense) are runtime capabilities or interfaces.
- “KB” and “bundles” are bigger and more varied. They should be exportable, but not necessarily as “skill folders”.
The likely shape:

- JSON packs for tools/KB/bundles
- tag-scoped subsets (marketing-only, research-only, etc.)

## Current Mirrors (Implemented)

Filesystem mirrors (curated surface):

- Skills:
  - `.claude/skills/curated/**`
  - `.claude/skills/curated/manifest.json`
- KB:
  - `.claude/kb/curated/**`
  - `.claude/kb/curated/manifest.json`

Local runtime config surfaces:

- Claude Code project MCP config: `.mcp.json`
- Codex project MCP config: `.codex/config.toml`

These are host-runtime config surfaces, not canonical registry state.

Deterministic tooling:

- Manifest generation (recursive scan of `**/unit.json`):
  - `scripts/registry/sync/generate_registry_manifest.py`
  - Supports `--check` (no-write drift detection)
  - Manifests are deterministic and intentionally omit timestamps to avoid noisy diffs.
- Staged import lint (provenance + portability + contract hygiene):
  - `python3 scripts/ingestion/lint_skill_imports.py`
- Curated unit package lint (unit.json + skill frontmatter hygiene):
  - `python3 scripts/registry/lint_unit_packages.py`
- KB token estimate refresh (derived YAML + `unit.json.length.tokens_est`):
  - `scripts/registry/sync/refresh_kb_metadata.py`
- DB → repo skill mirror sync:
  - `scripts/registry/sync/sync_skills_from_db.ts`
  - Supports `--check` drift mode and orphan detection; optional `--prune` cleanup.
- Repo → DB skill artifacts import/backfill (canonicalization helper):
  - `scripts/registry/import/import_skill_artifacts_to_db.ts`
  - Inserts a new skill revision that populates `entity_revisions.artifacts_json` from a repo skill package.
Staging surfaces:

- External skill packs:
  - `.claude/skills/imports/**` (staged, provenance required)
  - `.claude/skills/imports/manifest.json` (generated; deterministic index of staged units)
- duplicate staged packages should be pruned once the same `skill:code@version` is already present in curated mirrors (after writing a provenance snapshot report)

Claude.ai portability constraints (skills):

- YAML frontmatter keys are strict; keep it minimal and put custom flags under `metadata`.
- Skill folder name must match frontmatter `name`.
- Description has two variants:
  - short (Claude-safe) lives in `SKILL.md` frontmatter `description` (<= 200 chars by default)
  - full lives in `unit.json.derived.description_full` (portable without DB access)
  - the exact short string is also stored in `unit.json.derived.description_short`
  - exporter knob: `CATALYST_SKILLS_DESC_MAX` (or `--desc-max`) controls the short cap
- Avoid symlinks in portable unit packages. Export tooling will replace symlinks with real files (symlinks are not supported in the mirror contract).

## Tag-scoped subsets (important)

We should support:

- export only `dept:marketing`
- export only `family:discovery`
- export only `stage:prototype`

This becomes the mechanism for “pull a partial library into this repo”.

Selection semantics are shared across mirror tooling and future pack exporters:

- Filters are namespaced: `namespace:value` (example: `domain:writing`)
- Matching is case-insensitive; stored tags are lowercase
- Wildcard allowed: `namespace:*` matches any tag in that namespace
- `include-tags`: **AND across namespaces; OR within a namespace**
- `exclude-tags`: **any-of** (any match excludes)

The canonical definition of this contract lives in `docs/TAXONOMY.md` (“Tag-Scoped Selection Semantics”).

## Decision Matrix (v1 Baseline)

This matrix answers one question per unit type: how do we make it portable without creating drift?
| Unit type | Canonical | Default portable surface | Repo mirror path (if any) | Why |
| --- | --- | --- | --- | --- |
| skill | DB | Filesystem mirror (unit package) | `.claude/skills/curated////` | Claude-compatible sharing; on-demand loading; git-friendly |
| kb | DB | Filesystem unit package | `.claude/kb/curated////` | KB is canonical in DB and mirrored into filesystem packages for portability and runtime-adjacent reads |
| tool | DB | JSON pack export | (none by default) | Tools are runtime contracts; better as structured export than folders |
| prompt | DB | JSON pack export | (none by default) | Prompts are structured contracts and frequently versioned; pack is easier than mirroring |
| schema | DB | JSON pack export | (none by default) | Schemas are contracts; prefer deterministic JSON exports (filesystem mirrors deferred) |
| bundle | DB | JSON pack export | (none by default) | Bundles are graph containers; export as pack for external orchestrators |
| agent | DB | JSON pack export (later) | (none by default) | Agents are preferences + persona; defer until CMS/runs/traces exist |
| playbook | DB | JSON pack export (later) | (none by default) | Ordered patterns; export once playbook execution is real |
| recipe | DB | JSON pack export (later) | (none by default) | Bindings of schema/style/constraints; export once CMS editing exists |
| style | DB | JSON pack export (later) | (none by default) | Style overlays are small; export when style application is wired |
| content_type | DB | JSON pack export (later) | (none by default) | UI presets; export after CMS support |

Notes:

- “Default surface” does not forbid others; it just means “the first surface we guarantee is correct and drift-checked.”
- Skills and KB get filesystem mirrors because the broader ecosystem
  already expects those shapes and because artifacts matter.
- Everything else is portable via packs once the pack format is locked (Phase 3 plan 03-03).

## Practical Sync Note (KB vs Skills vs Tools)

We use the same verbs across unit types, but the surfaces differ:

- `skill`: DB → filesystem mirror is implemented (`scripts/registry/sync/sync_skills_from_db.ts`).
- `kb`: DB → filesystem mirror is implemented (`scripts/registry/sync/sync_kb_from_db.ts`).
- Keep derived KB metadata healthy via `scripts/registry/sync/refresh_kb_metadata.py` (token estimates + derived YAML header) and deterministic manifests (`scripts/registry/sync/generate_registry_manifest.py`).
- Treat `.claude/kb/curated/**` as a mirror surface (do not hand-edit; update the DB then re-sync).
- `tool`/`prompt`/`bundle`/`agent`: portable surface is a JSON pack export (Phase 3 plan 03-03), not folders by default.

Reconciliation safety contract (operator default):

- Treat DB as canonical and run DB -> repo sync first.
- Any repo -> DB write path must use an explicit per-item allowlist and dry-run evidence.
- Do not run bulk mirror imports against DB when direct DB edits may have occurred outside the repo.

Runtime safety contract:

- Do not assume a mirror path is the live runtime source.
- Do not assume OpenClaw/Render runtime policy is defined in this repo unless the doc explicitly says so.
- Do not replace external-runtime instructions with Katailyst-local instructions; keep them additive.

## Naming warning

Because our domain uses the word “tool” heavily:

- Avoid using `tools/` as a junk drawer for scripts.
- Prefer `scripts/` (deterministic repo scripts), `.claude/hooks/` (Claude hooks), and `docs/runbooks/` (setup + operational runbooks).

## Pack Export Surface (Non-Skill Units)

Some unit types are canonical in the DB but are not mirrored as unit-package folders by default. For those, the portability surface is a **deterministic JSON pack export**.
Pack files:

- Authored manifest: `registry-packs//manifest.json`
- Generated export: `registry-packs//pack.json`

Exporter tooling:

- `npx tsx scripts/distribution/export_registry_packs.ts`
- `npx tsx scripts/distribution/export_registry_packs.ts --check` (drift detection, no-write)
- `npx tsx scripts/distribution/export_registry_packs.ts --pack ` (export one pack)
- `npx tsx scripts/distribution/export_registry_packs.ts --tier 1,2 --portability-mode strict` (tier-scoped + strict portability gate)

Selection semantics:

- Reuses the canonical tag-scoped selection contract in `docs/TAXONOMY.md`.
- Packs may include:
  - DB-canonical items selected by type/status/tags (tools/prompts/schemas/bundles).
  - Explicit filesystem mirror pointers (KB/skills) for curated pack exports.

Format goals:

- Deterministic ordering (stable diffs).
- No volatile fields (no timestamps).
- No secrets exported (only secret references like `auth_secret_key`).

## Plugin Snapshot Contract

`.claude-plugin/` is a generated distribution surface, not a second canonical mirror of every repo-local helper file.

Plugin export rules:

- DB remains canonical.
- `SKILL.md` launchers always export for included skills.
- Layered plugin content only exports when it exists in DB-canonical revision artifacts with real text content.
- Repo-only helper baggage is intentionally pruned from plugin snapshots instead of being preserved forever.
- If a layered file is still wanted in the plugin, repair the DB revision first, then re-export.
- `--agents-only` is a narrow refresh mode, not a recovery mode. It requires an existing valid `.claude-plugin/.claude-plugin/katailyst.json` with non-agent slices from a prior full export.
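The tag-scoped selection contract (case-insensitive `namespace:value` filters, `namespace:*` wildcards, AND across namespaces with OR within a namespace for includes, any-of for excludes) is pure matching logic. A minimal sketch of those semantics as documented; the function names are illustrative and the canonical definition remains in `docs/TAXONOMY.md`.

```python
def parse(filters: list[str]) -> dict[str, set[str]]:
    """Group `namespace:value` filters by namespace, lowercased."""
    grouped: dict[str, set[str]] = {}
    for f in filters:
        ns, _, val = f.lower().partition(":")
        grouped.setdefault(ns, set()).add(val)
    return grouped

def selected(tags: list[str], include: list[str], exclude: list[str]) -> bool:
    """Apply include/exclude tag filters to one unit's tag list."""
    by_ns: dict[str, set[str]] = {}
    for t in tags:
        ns, _, val = t.lower().partition(":")
        by_ns.setdefault(ns, set()).add(val)
    # exclude-tags: any-of — any match excludes
    for ns, vals in parse(exclude).items():
        have = by_ns.get(ns, set())
        if ("*" in vals and have) or (vals & have):
            return False
    # include-tags: AND across namespaces, OR within a namespace
    for ns, vals in parse(include).items():
        have = by_ns.get(ns, set())
        if "*" in vals:
            if not have:
                return False
        elif not (vals & have):
            return False
    return True

tags = ["dept:marketing", "family:discovery", "stage:prototype"]
print(selected(tags, ["dept:marketing", "family:*"], []))       # → True
print(selected(tags, ["dept:marketing"], ["stage:prototype"]))  # → False
```

Both include namespaces must match (AND), the wildcard matches any `family:` tag, and a single `stage:prototype` exclusion is enough to drop the unit.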
## Phase 14 Distribution Interop (Plugin + Pack + GitHub)

Unified contract behavior now applies across plugin/pack export surfaces:

- status filters
- include/exclude tag filters
- tier filters (`1-10`)
- compatibility profile selection
- portability mode (`strict` or `advisory`)

Primary commands:

- Plugin export (deterministic snapshot):
  - `npx tsx scripts/distribution/export_plugin.ts --status curated,published --profile plugin_portable --portability-mode strict`
- Pack export (deterministic JSON):
  - `npx tsx scripts/distribution/export_registry_packs.ts --pack <pack> --profile catalyst_enriched --portability-mode advisory`
- GitHub share payload (manifest + provenance + PR guide):
  - `npx tsx scripts/distribution/export_github_distribution.ts --pack <pack>`
  - `npx tsx scripts/distribution/export_github_distribution.ts --dry-run`
- Round-trip validation:
  - `npx tsx scripts/distribution/validate_distribution_roundtrip.ts --pack <pack>`

Validation gates:

- `npx tsx scripts/distribution/export_plugin.ts --check`
- `npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin`
- `npx tsx scripts/distribution/export_registry_packs.ts --check`
- `npx tsx scripts/distribution/validate_distribution_roundtrip.ts`

## When To Use Which Surface

- **Use `/dashboard-cms/plugins` + `components/plugins/**`** as the operator console for browse, packs, and export decisions.
- **Use `.claude-plugin/**`** as the generated portable artifact surface that downstream systems consume.
- **Use MCP** when a local host needs live discovery/query access to the canonical registry.
- **Use API routes** when an app or service needs structured, authenticated workflows.
- **Use mirrors** when a host expects portable local skill/KB folders.
- **Use packs/exports** when shipping curated capability sets into another repo, runtime, or marketplace surface.
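The `strict`/`advisory` portability modes above can be sketched as a single gate function. A minimal sketch, assuming a list of violation strings as input; function and key names here are illustrative, not repo code.

```python
def portability_gate(violations: list[str], mode: str = "strict") -> dict:
    """Sketch of the two portability modes named in the export contract:

    - strict: any portability violation fails the export
    - advisory: violations are surfaced as warnings but do not block
    """
    if mode not in {"strict", "advisory"}:
        raise ValueError(f"unknown portability mode: {mode}")
    if mode == "strict":
        return {"ok": not violations, "warnings": [], "errors": list(violations)}
    return {"ok": True, "warnings": list(violations), "errors": []}
```

The design point is that both modes see the same violations; only the pass/fail decision changes, so switching a pack from `advisory` to `strict` never changes what gets reported, only whether the export proceeds.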
--- ## Source: docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md # Runtime Ownership and Consumption This contract exists to stop one recurring mistake: confusing the registry, the mirror, and the runtime. ## Boundary Summary - **Katailyst repo** owns registry state, docs, discovery, CMS/operator surfaces, mirrors, packs, and integration metadata. - **OpenClaw HQ** owns hosted gateway/runtime code for the external fleet. - **Render-hosted agents** own deployed runtime state, persistent disks, and live session behavior. ## Core Model - **Katailyst** is the canonical registry, discovery surface, and portability/export control plane. - **Supabase/Postgres** is the canonical data store for registry entities, links, tags, evals, and revision state. - **Repo mirrors** are portability surfaces for skills, KB, and exportable artifacts. - **Runtime consumers** execute elsewhere: OpenClaw/Render agents, Claude Code, Codex, MCP clients, app runtimes, and future orchestrators. Katailyst should help agents find and package the right blocks. The consuming runtime should decide how to combine, sequence, and execute them. ## Current Org Model - `hlt` is the active operating layer for the live HLT fleet and HLT-specific runtime doctrine. - `Free (System)` is the shared canonical library/template layer that HLT still uses freely. The internal org code remains `system`. - `Free (System)` is not a second live business org and not an exclusion boundary. - Read-only helper paths should prefer local HLT rows and fall through to shared `Free (System)` rows when both are valid. - HLT should be able to read and use shared `Free (System)` canon by default; org placement is not meant to limit access to shared capabilities. - Thick reusable hubs can stay in `Free (System)` even when they are heavily used by HLT. - Front-door SOPs, identity overlays, and HLT-specific helper support surfaces should live in `hlt`. 
- Mutation paths remain scoped to the active execution org unless a narrower contract says otherwise.
- Render/OpenClaw still owns runtime behavior outside Katailyst, so partial observability is expected.

Reference:

- `docs/references/contracts/CURRENT_OPERATING_MODEL.md`

## Ownership Boundaries

### Katailyst owns

- canonical atomic-unit records
- discovery, traversal, ranking, and graph metadata
- tags, links, evals, and quality signals
- mirror/export contracts
- operator-facing registry and CMS tooling
- vault references and canonical integration metadata
- numbered SQL schema evolution in `database/`
- repo-hosted API, docs, and CMS code in `app/`, `components/`, and `lib/`
- local host-runtime config surfaces such as `.mcp.json` and `.codex/config.toml`

### OpenClaw-HQ / external runtime owns

- fleet runtime behavior
- Render services and gateway behavior
- Slack and Telegram execution
- disk-resident identity, soul, and runtime setup
- agent-specific session policy and live delivery behavior
- host-specific orchestration loops

### Local host runtime owns

- Claude Code project config in `.mcp.json`
- Codex project config in `.codex/config.toml`
- local developer-side MCP/auth state
- host-specific orchestration and tool invocation policy

## Consumer Matrix

| Consumer | Reads from Katailyst | Executes where | Typical use |
| --- | --- | --- | --- |
| OpenClaw agent | DB, API, mirrors, packs | external runtime / Render | Slack, Telegram, hosted agent flows |
| Claude Code | docs, API, mirrors, `.mcp.json` | local Claude Code session | local coding and agent workflows |
| Codex | docs, API, mirrors, `.codex/config.toml` | local Codex session | local coding and operator workflows |
| MCP client | MCP + docs | client host | discovery/query heavy integrations |
| App-native consumer | API + packs | app host | UI workflows and product features |
| LangChain-style consumer | API + mirrors/packs | external orchestrator | custom chain/graph execution |

## Execution Stance

Katailyst should provide:

- a menu of good candidates
- compatibility hints
- tags, links, and supporting context
- portable exports when needed

Katailyst should not force:

- a rigid fixed sequence
- a single mandatory orchestration route
- one hardcoded block count
- one host-specific runtime model for every consumer

## Surface Selection

- **MCP**: direct discovery/query for local or tool-driven hosts
- **API**: structured authenticated workflows
- **Mirror**: portable local folder consumption
- **Pack/export**: curated distribution into another runtime or repo

## Guardrails

1. DB stays canonical.
2. Mirrors are derived.
3. External runtimes remain autonomous.
4. Dual-runtime guidance is additive-only.
5. Do not treat repo-local paths as proof of live runtime ownership.

## Practical Rule

When a developer sees runtime-specific behavior, start by asking whether it belongs to Katailyst, OpenClaw HQ, or the Render fleet before assuming the repo is the source of truth.

---

## Source: docs/AGENT_READINESS_CHECKLIST.md

# Agent Readiness Checklist (Execution-Grade)

Status: Active
Updated: 2026-02-18

This checklist is the fastest safe path for onboarding new agents/operators and keeping atomic-unit quality high while scaling imports.

Canonical policy still lives in:

- `docs/RULES.md`
- `docs/VISION.md`
- `docs/TAXONOMY.md`
- `docs/atomic-units/README.md`

Use this doc as the execution layer.

---

## 1) Non-Negotiables (Must Confirm First)

- [ ] DB is canonical; repo files are mirrors for portability.
- [ ] Discovery is menu-first and continuation-first (no hard route locking).
- [ ] Guidance-first governance: warnings by default, hard blocks only for safety/compat/export integrity.
- [ ] No secrets in docs, mirrors, scripts, or unit artifacts (Vault pointers only).
- [ ] New work remains composable (no arbitrary limits on skill/tool chaining).
--- ## 2) 30-Minute New-Agent Onboarding Path ### Read in this order - [ ] `AGENTS.md` - [ ] `docs/RULES.md` - [ ] `docs/VISION.md` - [ ] `docs/TAXONOMY.md` - [ ] `docs/runbooks/interop/registry-api-contract.md` - [ ] `docs/runbooks/interop/mcp-ai-sdk-adapter.md` - [ ] `docs/references/contracts/AGENT_AUTONOMY_DISCOVERY.md` - [ ] `docs/atomic-units/ARTIFACTS.md` ### Runtime preflight - [ ] `git status --short --untracked-files=all` - [ ] `git log -5 --oneline` - [ ] `pnpm typecheck` - [ ] `pnpm lint` ### Quick capability check - [ ] Run one broad discover query (non-domain-specific) - [ ] Run one domain-scoped query (for example, NCLEX-specific) - [ ] Demonstrate cursor continuation at least once - [ ] Demonstrate tool/entity inspection (`get`/`inspect`) on one result --- ## 3) Atomic Unit “Definition of Done” Baseline For every new/updated unit (skill/tool/kb/prompt/schema/etc): ### Identity & lifecycle - [ ] `entity_type`, `code`, `version`, `status`, `org_id` are intentional and valid. - [ ] `current_revision_id` points to a valid revision. ### Metadata quality - [ ] `name`, `summary`, and `use_case` are specific (not placeholder text). - [ ] Priority/tier metadata is present where ranking relies on it. ### Taxonomy - [ ] Required namespaces for the unit type are present (`docs/TAXONOMY.md`). - [ ] Tags are lowercase, namespaced, and non-duplicative. - [ ] No shadow synonyms that split discoverability. ### Links/contracts - [ ] Typed refs resolve (`entity_type:code@version` when applicable). - [ ] Link types/weights/reasons are coherent. ### Artifacts - [ ] Artifacts follow canonical folders/types (`rules`, `examples`, `tests`, etc.). - [ ] At least one realistic example exists where applicable. - [ ] No executable surprise scripts (script intent documented explicitly). ### Tests/evals - [ ] Unit-level checks added or updated. - [ ] Contract-sensitive changes include regression coverage. 
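The Links/contracts checklist above requires typed refs of the form `entity_type:code@version` to resolve. A minimal parsing sketch follows; the exact character classes are assumptions, not the repo's grammar, and the `@version` segment is optional to match the "when applicable" wording.

```python
import re

# entity_type:code@version — version is optional, per "when applicable" above.
# Character classes are illustrative assumptions, not the canonical grammar.
TYPED_REF = re.compile(
    r"^(?P<entity_type>[a-z_]+):"
    r"(?P<code>[a-z0-9][a-z0-9._-]*)"
    r"(?:@(?P<version>[\w.+-]+))?$"
)


def parse_typed_ref(ref: str) -> dict:
    """Split a typed ref into its parts, or raise on malformed input."""
    m = TYPED_REF.match(ref)
    if not m:
        raise ValueError(f"malformed typed ref: {ref!r}")
    return m.groupdict()
```

A linter can call this before attempting resolution, so malformed refs fail fast with a clear message instead of a silent lookup miss.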
---

## 4) Factory Lifecycle Checklist (Import → Curate)

Follow lane runbooks:

- Intake/normalization: `docs/runbooks/factory/import-normalization.md`
- Optimization/evals: `docs/runbooks/factory/optimization-ab-validation.md`
- Promotion/rollback: `docs/runbooks/factory/promotion-rollback.md`
- Incident handling: `docs/runbooks/factory/incident-response-failed-runs-exports.md`

### Core command bundle

```bash
python3 scripts/registry/lint_unit_packages.py --strict
python3 scripts/ingestion/lint_skill_imports.py --strict
npx tsx scripts/registry/sync/sync_curated_skills_to_db.ts --check
npx tsx scripts/registry/audit/audit_skill_lifecycle.ts
npx tsx scripts/distribution/export_plugin.ts --check
npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin
npx tsx scripts/distribution/export_registry_packs.ts --check
pnpm typecheck
pnpm lint
pnpm test:run
```

---

## 5) Retrieval Quality Guard (Avoid Domain Monoculture)

When broad intents over-focus on one domain (for example NCLEX):

- [ ] Confirm query intent is actually broad vs domain-specific.
- [ ] Run facet-aware refinement instead of hard-locking one route.
- [ ] Use scoped filters (`tags`, `bundles`, app/org scope) before adding hard exclusions.
- [ ] If exclusions are needed, keep them request-local and reversible.
- [ ] Preserve autonomy: never force single-domain behavior globally.

Recommended operator pattern:

1. broad discover
2. inspect facets/results
3. refined discover with scope hints
4. continue via cursor until confidence threshold reached

---

## 6) KB Quality Bar (No “fortune-cookie” docs)

A KB variant should pass **at least 2**:

- [ ] Contract clarity (inputs/outputs/invariants)
- [ ] Decision rules (what to choose and why)
- [ ] Failure modes + recovery
- [ ] Real examples (copy/adapt ready)
- [ ] Verification checks
- [ ] Precise repo pointers

### KB maintenance commands

```bash
python3 scripts/registry/sync/refresh_kb_metadata.py
pnpm registry:e2e:audit -- --limit 200
```

---

## 7) Schema-First + Visual Preview Dual-Output (CodeSandbox)

When a response should be visually presented (social preview, landing page draft, qbank visual mapping, research visualization):

- [ ] Run normal canonical content flow first (schema-valid content for the target type).
- [ ] Save canonical content in Katailyst records (not raw preview HTML).
- [ ] Build CodeSandbox preview as a visual share layer for the user/stakeholder.
- [ ] Keep freestyle available; recipes are optional accelerators.
- [ ] If the preview pattern is exceptional, save it as a registry artifact on the relevant entity.
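The section 5 operator pattern (broad discover, inspect facets, refine, continue via cursor until a confidence threshold is reached) can be sketched as a loop. `discover` here is a hypothetical stand-in for the real MCP/API call, and its response shape (`results`/`cursor`/`confidence`) is an assumption for illustration only.

```python
def collect_until_confident(discover, query, confidence_threshold=0.8, max_pages=10):
    """Cursor-continuation sketch for the operator pattern above.

    `discover` is a hypothetical callable assumed to return
    {"results": [...], "cursor": str | None, "confidence": float}.
    Stops when confidence clears the threshold, the cursor is exhausted,
    or a page cap is hit (so a bad cursor can never loop forever).
    """
    results, cursor = [], None
    for _ in range(max_pages):
        page = discover(query, cursor=cursor)
        results.extend(page["results"])
        cursor = page.get("cursor")
        if page.get("confidence", 0.0) >= confidence_threshold or cursor is None:
            break
    return results
```

The `max_pages` cap is the request-local, reversible kind of limit section 5 recommends: it bounds one query without imposing a global routing rule.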
---

## 8) DB + Mirror Workflow (KB-specific)

Use this sequence when KB mirror content was improved and needs canonicalization:

```bash
# 1) Preview DB writes from repo mirror improvements
npx tsx scripts/registry/backfill/migrate_kb_full_variants_from_repo_mirror.ts --dry-run

# 2) Apply full-variant backfill where eligible
npx tsx scripts/registry/backfill/migrate_kb_full_variants_from_repo_mirror.ts --apply

# 3) Ensure distilled variants exist (distilled-first lens)
npx tsx scripts/registry/backfill/backfill_kb_distilled_variants.ts --dry-run
npx tsx scripts/registry/backfill/backfill_kb_distilled_variants.ts --apply

# 4) Upsert targeted KB variants/priority/rating from repo mirror when needed
npx tsx scripts/registry/backfill/upsert_kb_variants_from_repo_mirror.ts --codes <code-a>,<code-b> --dry-run
npx tsx scripts/registry/backfill/upsert_kb_variants_from_repo_mirror.ts --codes <code-a>,<code-b> --apply --set-priority-tier 1 --set-rating 92

# 5) Validate no drift remains
pnpm registry:e2e:audit -- --limit 200
```

If any step fails, stop and record:

- command
- error
- affected refs/codes
- remediation owner

---

## 9) Promotion Readiness Gate (Before Curated/Published)

- [ ] Compatibility checks pass for selected profile.
- [ ] Taxonomy + link integrity checks pass.
- [ ] Examples/tests are present and meaningful.
- [ ] Rollback plan is documented.
- [ ] Monitoring owner and KPI are assigned.

Use: `docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md`
Also review: `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md`

---

## 10) Weekly Hygiene Cadence (Recommended)

- [ ] Run lifecycle + lint audits.
- [ ] Prune duplicate staged imports after snapshotting evidence.
- [ ] Review retrieval quality for broad intents and adjust weighting/scoping policy only with evidence.
- [ ] Re-check docs drift when interfaces/contracts change.
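The dry-run → apply discipline in section 8, combined with the per-item allowlist rule from the reconciliation safety contract earlier in this corpus, can be sketched as a small guard. This is an illustrative sketch, not a repo script; the `Change` shape and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A candidate repo -> DB write (illustrative shape, not a repo contract)."""
    code: str      # unit code the mirror wants to push back into the DB
    payload: dict  # fields to write


def plan_repo_to_db_writes(changes, allowlist, dry_run=True):
    """Apply the operator-default safety contract:

    - only explicitly allowlisted codes may flow repo -> DB
    - the default mode is dry-run, which produces evidence without writing
    """
    allowed = [c for c in changes if c.code in allowlist]
    skipped = [c.code for c in changes if c.code not in allowlist]
    if dry_run:
        # Evidence only; nothing is written.
        return {"mode": "dry-run", "would_write": [c.code for c in allowed], "skipped": skipped}
    return {"mode": "apply", "written": [c.code for c in allowed], "skipped": skipped}
```

Making `dry_run=True` the default means an operator has to opt in to the apply pass, mirroring the `--dry-run`-first ordering of the commands above.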
--- ## 11) Session Output Standard (What every agent run should leave behind) - [ ] What changed (files + refs) - [ ] Why it changed (goal + rationale) - [ ] Verification evidence (commands/tests/results) - [ ] Any residual risk + explicit next step This keeps handoffs reliable across human operators and autonomous agents. --- ## Source: docs/TAXONOMY.md # Taxonomy (Namespaced Tags + Coverage Rules) This is the **canonical taxonomy guide**. It defines what tags exist, how they are used, and what minimum coverage is required before any unit is promoted from _staged_ → _curated_. If a tag or namespace is missing, **add it here first** before using it in code, CMS, or scripts. --- ## Principles 1. **One shared taxonomy across all atomic units.** 2. **No uncategorized sprawl.** Units without required tags stay staged. 3. **Soft signals, not gates.** Tags guide discovery and ranking; they do not hard‑block use. 4. **Keep it lean.** Add namespaces only when they improve discovery or filtering. 5. **Tags live in `unit.json`** for file-based packages and in the DB for canonical storage. 6. **Folders are a view.** Use tags + refs for identity; nested paths should mirror `domain:*` for humans only. 7. **Discovery is graph‑first.** Tags + weighted links drive exploration; no hard gates. 8. **Taxonomy evolves with the live registry.** Seeds and early mirrors are starting points, not a frozen contract. Promote tags that materially improve discovery, sorting, and trust once they are established in real curated usage. 
--- ## Required Namespaces (Global) These are expected to exist in the DB and **seeded in `database/002-seed-data.sql`**: - `action:*` — what the unit _does_ (research, plan, write, edit, analyze, build) - `capability:*` — cross-unit capability hints (discover, compose, enrich, evaluate, deploy, monitor, route) - `format:*` — packaging shape or delivery form (reference, guideline, spec, policy, best_practice, plan, operational-log, index, discovery-map, workflow) - `channel:*` — delivery channels (web, email, pdf, etc.) - `stage:*` — lifecycle or workflow stage (planning, drafting, review, published) - `modality:*` — text, image, video, audio, interactive, structured - `domain:*` — subject area (nursing, dental, general, etc.) - `topic:*` — cross-cutting concepts (hooks, citations, onboarding, etc.). **Do not use `topic:*` for every subject-matter concept.** Fine-grained subjects belong in DB `topic_maps` (see below). - `audience:*` — target audience segments - `segment:*` — market segments (optional; prefer `audience:*` unless needed) - `dept:*` — department (marketing, product, engineering, operations) - `app:*` — product/app identifier (optional; used for org/app scoping) - `product:*` — product line identifiers (optional) - `exam:*` — exam/cert identifiers (optional; education vertical) - `brand:*` — brand identifiers (optional) - `persona:*` — persona identifiers (optional) - `scope:*` — org/global/app - `source:*` — provenance/source family (internal, github, skills.sh, external, cms, npm, factory, legacy, vercel) - `partner:*` — publisher/maintainer (e.g., `partner:supabase`, `partner:vercel`) - `status:*` — staged, curated, published, deprecated, archived - `tier:*` — priority tier - `priority:*` — priority labels (optional; prefer `tier` field when possible) - `tool_type:*` — for tools only (mcp, http, internal, sdk, cli) - `provider:*` — tool provider (supabase, vercel, openai, anthropic, etc.) 
- `protocol:*` — integration protocol hints (mcp, http, sql, sdk, webhook, event) - `platform:*` — platform identifiers (optional; e.g., ios, android, web) - `runtime:*` — execution compatibility/runtime targets (openclaw, render, claude-code, local-macos, web) - `risk:*` — risk labels (optional; for governance/routing) - `surface:*` — delivery surface (runtime/cms/api/tooling plus product surfaces like `app-home`, `web-blog`, etc.) - `family:*` — **primary grouping** inside a unit type (what lane it belongs to for discovery and composition) - `bundle_type:*` — bundle semantics (context_bundle, research_kit, eval_kit, starter_kit, pack) - `initiative:*` — strategic initiative identifiers (optional) - `persona_role:*` — agent persona role taxonomy (agents only) - `agent_kind:*` — fleet persona vs project subagent vs imported agent-pattern taxonomy (agents only) - `system:*` — high‑level system grouping (e.g., corporate infrastructure). Prefer the `system` field in `unit.json`; tag is optional for faceting. **Guideline:** use the `system` field for official infra grouping (e.g., `HLT Corp Infra`), and keep global units system‑free. Use `scope:global` for general references and `scope:org` for system‑scoped content. --- ## Canonical Label Ranges (v1) These are the current **canonical** tag codes this repo expects to use. 
Changing or adding codes here is a **taxonomy change**: - Update this file first (`docs/TAXONOMY.md`) - Update DB seed mirrors (`database/002-seed-data.sql`) - If the DB is already seeded, add a migration/patch SQL file under `database/` and run it ### action:\* `research`, `plan`, `write`, `edit`, `analyze`, `build`, `review`, `publish`, `design`, `optimize`, `ingest`, `evaluate`, `execute`, `align` ### capability:\* `discover`, `compose`, `enrich`, `evaluate`, `deploy`, `monitor`, `route` ### stage:\* `planning`, `drafting`, `review`, `published`, `ingestion`, `testing`, `maintenance` ### format:\* `reference`, `guideline`, `spec`, `policy`, `best_practice`, `concept`, `plan`, `document`, `style_guide`, `how_to`, `guide`, `persona_profile`, `operational-log`, `web_page`, `index`, `discovery-map`, `workflow` ### modality:\* `text`, `image`, `video`, `audio`, `interactive`, `structured` ### tool_type:\* `mcp`, `http`, `internal`, `sdk`, `cli`, `sql`, `python`, `bash` ### protocol:\* `mcp`, `http`, `sql`, `sdk`, `webhook`, `event` ### scope:\* `global`, `org`, `app` ### source:\* `internal`, `github`, `skills.sh`, `external`, `cms`, `npm`, `factory`, `legacy`, `manual`, `vercel` ### dept:\* `marketing`, `product`, `engineering`, `operations` ### channel:\* `app`, `discord`, `email`, `facebook`, `instagram`, `linkedin`, `pdf`, `pinterest`, `podcast`, `push`, `reddit`, `rss`, `slack`, `slides`, `sms`, `snapchat`, `threads`, `tiktok`, `web`, `whatsapp`, `x`, `youtube` ### topic:\* `hooks`, `headings`, `cta`, `citations`, `tone`, `distribution`, `experiments`, `measurement`, `agent-patterns`, `product-insight`, `lessons` ## Topic Tags vs Topic Maps (Important) We maintain **two layers** on purpose: 1. **`topic:*` tags** are a small, stable set of cross-cutting concepts that help discovery across many units (writing craft, measurement, distribution, etc.). 2. 
**DB `topic_maps`** are for subject-matter topics that can grow large (education topics, clinical topics, exam subdomains, etc.). Examples of **topic maps** (not tags): - `nursing.hypertension-medications` - `nursing.career.first-job` - `dental.inbde.biochemistry` Guideline: - Avoid “junk drawer” tags like `topic:misc`. Use `domain:general` for broad/general material, or keep the unit `status:staged` until it is categorized. ### persona_role:\* `researcher`, `editor`, `strategist`, `builder`, `analyst`, `reviewer` ### agent_kind:\* `fleet_persona`, `project_subagent`, `imported_pattern` ### runtime:\* `openclaw`, `claude-code`, `local-macos`, `render`, `web` ### surface:\* `runtime`, `cms`, `cms-launchpad`, `cms-registry`, `cms-content`, `cms-factory`, `cms-test-lab`, `cms-eval-lab`, `cms-settings`, `ui`, `api`, `repo`, `tooling`, `ingestion`, `eval`, `orchestration`, `app`, `app-home`, `app-onboarding`, `app-upgrade`, `app-practice`, `app-results`, `web-landing`, `web-pricing`, `web-blog`, `web-docs`, `chat`, `social`, `document`, `email`, `email-onboarding`, `email-transactional` ### family:\* Family tags are the **primary browsing bucket inside a unit type**. They are intentionally coarse to support hierarchical discovery (menu first, then refine via tags/links). Use `family:*` for the functional lane and `format:*` for the packaging shape. - `family:*` answers: "What kind of lane is this?" - `format:*` answers: "How is this unit packaged?" 
Example: - `victoria-front-door-index` -> `family:helper-companion` + `format:index` - `agent-lessons-victoria` -> `family:agent-files` + `format:best_practice` - `global-team-context` -> `family:agent-files` + `format:reference` - `linear-planning-methodology` -> `family:agent-files` + `format:reference` - `agent-foundation-spec` -> `family:agent-files` + `format:spec` - `context-engineering-methodology` -> `family:agent-files` + `format:reference` - `katailyst-spec-index` -> `family:agent-files` + `format:spec` - `katailyst-overhaul-plan` -> `family:planning` + `format:plan` - `make-social` -> `family:social` + `format:workflow` - `tools-guide-overview` -> `family:tool-selection` + `format:guide` They are also allowed to reflect the **dominant discovery lane** for a unit when that is what operators actually browse for. In practice, that means the live registry may use families that overlap with `format:*`, `modality:*`, or `channel:*` when those are the fastest way for humans to find the right unit. That is acceptable as long as the tag improves browse quality and is used consistently. 
Preferred **core families** for new curation: - `research` - `planning` - `strategy` - `drafting` - `editing` - `review` - `analysis` - `evaluation` - `testing` - `ingestion` - `design` - `development` - `infrastructure` - `orchestration` - `operations` - `compliance` - `communications` - `social` - `message` - `article` - `qbank` - `study_guide` - `visual` - `presentation` - `report` - `web` - `data` - `storage` - `identity` - `observability` - `collaboration` - `media` - `commerce` - `front-door` - `helper-companion` - `agent-doctrine` - `agent-files` - `tool-selection` - `marketing-strategy` - `brand-voice` - `multimedia` - `deployment` - `writing` - `meeting-prep` Allowed **discovery extensions** when the narrower lane is what operators actually browse for: - `social` - `message` - `article` - `qbank` - `study_guide` - `visual` - `presentation` - `report` - `web` - `audio` - `interactive` - `memory` - `utility` - `integration` - `writing` - `meeting-prep` Use the extensions only when the broader family plus `format:*`, `modality:*`, `channel:*`, or `surface:*` would make the row harder to find. Default to the core set when in doubt. **Key disambiguations** - `family:orchestration` is not the same as `dept:operations`. - `dept:*` is “who owns it in the org” - `family:*` is “what lane it belongs to for discovery and composition” - `family:agent-files` is the browse lane for reusable agent-facing operating context. - Use it for lessons, principles, architecture/spec notes, team context, workspace/method references, and durable repo/operator orientation references that should stay loadable later as durable context. - Do not use it for dated recaps. Those stay `format:operational-log`. - Do not use it for shared operating doctrine such as mission, research posture, or the core Katailyst usage rules. Those stay `family:agent-doctrine`. - `family:*` does not replace `format:*`, `modality:*`, `channel:*`, or `surface:*`. 
- Example: a unit may legitimately be `family:article` + `format:article` + `channel:web` - Example: a sidecar media flow may be `family:media` + `modality:image` + `surface:web-blog` - For multimedia sidecar interoperability, keep sidecar-native media tags in the sidecar and map them into Katailyst discovery axes intentionally. Contract: `docs/references/contracts/MULTIMEDIA_SIDECAR_TAG_INTEROP.md` **Family -> Dept Mapping (Phase 46.3 Canonical Backfill Map)** Use this table for deterministic `dept:*` backfill when curated/published entities are missing department tags. Multi-dept tagging is allowed when an entity has multiple mapped `family:*` tags. - `family:development`, `family:infrastructure`, `family:testing`, `family:data`, `family:storage`, `family:identity`, `family:observability`, `family:agent-files` -> `dept:engineering` - `family:communications`, `family:media`, `family:design`, `family:drafting`, `family:editing`, `family:commerce`, `family:social`, `family:message`, `family:article`, `family:visual`, `family:presentation`, `family:audio`, `family:writing` -> `dept:marketing` - `family:research`, `family:planning`, `family:analysis`, `family:strategy`, `family:evaluation`, `family:review`, `family:qbank`, `family:study_guide` -> `dept:product` - `family:orchestration`, `family:operations`, `family:compliance`, `family:ingestion`, `family:collaboration`, `family:integration`, `family:utility`, `family:meeting-prep` -> `dept:operations` - `family:memory` -> `dept:engineering` Families that are **valid for discovery** but should not be used for blind dept backfill without stronger context: - `family:web` - `family:interactive` - `family:report` Backfill compatibility aliases (legacy families seen in repo mirrors/scripts): - `family:dev` -> `family:development` - `family:ops` -> `family:operations` - `family:messaging` -> `family:message` - `family:social-media` -> `family:social` - `family:design-system` -> `family:design` - `family:architecture` -> 
`family:development` - `family:integration`, `family:utility` remain valid canonical families because they are already materially used in live discovery **Runtime Compatibility Mapping (Phase 46.4 Canonical Backfill Map)** Use this table for deterministic `runtime:*` backfill when curated/published units are missing execution-compatibility tags. Multi-runtime tagging is allowed when an entity legitimately works in multiple environments. - `runtime:openclaw` -> OpenClaw-executed or OpenClaw-specific units, including deployed external agents and ClawHub/OpenClaw workflow assets. - `runtime:claude-code` -> Claude Code plugins, hooks, slash commands, subagents, or workflows whose execution model depends on Claude Code. - `runtime:local-macos` -> device-bound macOS/iOS automation such as Apple Notes, Reminders, Bear, iMessage, BlueBubbles, or other local-app/CLI workflows. - `runtime:render` -> deployed services or agent runtimes whose execution entrypoint is hosted on Render. - `runtime:web` -> browser/API/network-native workflows usable from web-connected runtimes (search, crawling, HTTP APIs, Vercel-hosted tools, browser automation). 
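The Phase 46.3 family → dept table above is meant for deterministic backfill, so it translates directly into a lookup. A minimal sketch using a subset of the mapping: multi-dept output is allowed when multiple mapped families are present, and the families flagged as needing stronger context are excluded from blind backfill.

```python
# Subset of the Phase 46.3 family -> dept map from the table above.
FAMILY_TO_DEPT = {
    "family:development": "dept:engineering",
    "family:infrastructure": "dept:engineering",
    "family:social": "dept:marketing",
    "family:article": "dept:marketing",
    "family:research": "dept:product",
    "family:planning": "dept:product",
    "family:operations": "dept:operations",
    "family:ingestion": "dept:operations",
}

# Valid for discovery, but excluded from blind dept backfill per the doc.
NO_BLIND_BACKFILL = {"family:web", "family:interactive", "family:report"}


def backfill_dept_tags(tags: list[str]) -> list[str]:
    """Return the dept tags implied by a unit's family tags.

    Multi-dept output is allowed when an entity carries multiple mapped
    family tags; unmapped or context-dependent families contribute nothing.
    """
    depts = {
        FAMILY_TO_DEPT[t]
        for t in tags
        if t in FAMILY_TO_DEPT and t not in NO_BLIND_BACKFILL
    }
    return sorted(depts)
```

Per the type-aware defaults, a real backfill run would apply this only to `skill`/`tool` rows and route `kb`/`agent` rows to manual review.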
**Tool families (canonical v1)**

For tools specifically, these families are the top-level menu we expect to hold up at 150+ tools:

- `family:research`: web search, scraping, retrieval, extraction, doc-to-markdown
- `family:development`: sandboxes, code execution, codegen, repo tooling, builds/previews
- `family:infrastructure`: hosting/deploy/runtime platforms, containers, infra primitives
- `family:orchestration`: workflow engines, async trigger/poll, scheduling, integration glue
- `family:testing`: test harnesses, validation suites, E2E runners, sandboxed verification
- `family:communications`: email/sms/slack/push, notification fanout
- `family:data`: databases, analytics, SQL/BI, vector DB ops
- `family:storage`: file/object storage, assets, CDN-backed media stores
- `family:identity`: auth, user/session/permissions, org membership
- `family:observability`: logs/metrics/traces, cost telemetry, eval harnesses
- `family:collaboration`: canvases and PM tools (Miro/Trello/Linear), task routing
- `family:media`: image/audio/video generation, transforms, transcription/TTS
- `family:commerce`: billing/subscriptions/checkout, payments, metering

### format:\*

`standard`, `explainer`, `how_to`, `listicle`, `compare_contrast`, `deep_dive`, `news`, `exam`, `walkthrough`, `career`, `procedural`, `research`, `summary`, `cheat_sheet`, `mind_map`, `interactive`, `newsletter`, `drip`, `transactional`, `short`, `single`, `story`, `professional`, `discussion`, `question`, `snap`, `spotlight`, `pin`, `idea_pin`, `shorts`, `longform_video`, `voiceover`, `podcast`, `slide_deck`, `workshop`, `keynote`, `landing`, `product`, `campaign`, `document`, `practice_exam`, `qbank_item`, `assessment`, `mnemonic`, `coverage_map`, `knowledge_graph`, `onboarding`, `upgrade`, `welcome`, `message_sequence`, `persona_profile`, `reference`, `guideline`, `style_guide`, `best_practice`, `spec`, `policy`, `api_reference`, `article`, `operational-log`, `web_page`, `index`, `discovery-map`, `workflow`

### bundle_type:\*

`context_bundle`, `research_kit`, `eval_kit`, `starter_kit`, `pack`

### domain:\*

`nursing`, `np`, `dental`, `allied-health`, `military`, `real-estate`, `graduate`, `general`, `ai-sdk`, `ai-engineering`, `writing`, `design`, `data`, `database`, `mcp`, `marketing`, `engineering`, `operations`, `product`, `education`, `cms`

### status:\*

`staged`, `curated`, `published`, `deprecated`, `archived`

## Minimum Tag Coverage (Promotion Rules)

Units missing required tags stay **staged** until corrected.

### Dept Coverage Policy (Phase 46.3)

- Curated/published units should include at least one `dept:*` tag **when department ownership is part of operator-facing discovery for that entity type**.
- Current enforcement is advisory-first in `scripts/registry/lint_unit_packages.py` to avoid blocking active migrations.
- Phase 47 target: promote dept coverage from advisory to strict promotion gate once backfill exceptions are resolved.
- If a unit is intentionally cross-functional, include multiple `dept:*` tags rather than omitting department context.
- Type-aware default:
  - `skill`, `tool` -> eligible for deterministic family-based backfill
  - `kb`, `agent` -> manual review lane; do not infer blindly from family alone
  - structural/supporting unit types (`bundle`, `channel`, `content_type`, `recipe`, `style`, `schema`, `prompt`, `metric`, `rubric`, `lint_rule`, `lint_ruleset`, `playbook`) are not counted as missing by default in the phase-46 cleanup audits unless explicitly promoted into department-facing browse surfaces
- The goal is discovery quality, not synthetic completeness. We would rather show a smaller set of accurate dept labels than inflate audit debt for entity types where `dept:*` is not the right primary signal.

### Runtime Coverage Policy (Phase 46.4)

- Curated/published skills and agents should include at least one `runtime:*` tag when execution compatibility can be inferred confidently.
- Current enforcement is audit/backfill-first so imported inventories can be normalized without blocking active curation.
- Discovery/ranking should treat `runtime:*` as a primary trust signal:
  - promote viable HLT/OpenClaw/web/runtime-compatible units,
  - demote local-only or environment-mismatched units in default browse results.
- Phase 47 target: promote runtime coverage from cleanup lane to advisory lint coverage once backfill exceptions are reviewed.

### Skills

Required:

- `action:*`
- `stage:*`
- `modality:*`
- `scope:*`
- One of: `domain:*` **or** `audience:*`
- `family:*`

Recommended:

- `dept:*` or `app:*` when org‑specific
- `runtime:*` when execution environment matters
- `source:*` (always add for imports)
- `partner:*` (recommended for imported skills)
- `tier:*` (priority)
- `status:*` (staged/curated/published)

**Tag count guidance:** 6–12 tags is typical. Avoid >15 unless truly cross‑cutting.

### Tools

Required:

- `tool_type:*`
- `provider:*`
- `surface:*`
- `modality:*`
- `scope:*`

Recommended:

- `action:*` (what it enables)
- `protocol:*` (how it integrates; optional but useful for discovery/routing)
- `runtime:*` when tool execution is runtime-specific
- `status:*`
- `source:*`

**Tag count guidance:** 5–10 tags is typical.

### KB Items

Required:

- `format:*` (reference/guideline/best_practice/persona_profile for persona KBs)
- `domain:*` or `audience:*`
- `scope:*`
- `source:*`

Recommended:

- `dept:*` / `app:*`
- `runtime:*` when KB guidance is environment-specific
- `status:*`

**Priority tiers:** use the numeric `tier` field in `unit.json` (1–10). `tier:*` tags are optional and should be used only for search filters.

**Tag count guidance:** 5–10 tags is typical.
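The promotion rules above ("units missing required tags stay staged") can be sketched as a small coverage check. This is an illustrative sketch only: the `unit.json` dict shape and the namespace groups below are assumptions drawn from this doc, not the actual contract enforced by `scripts/registry/lint_unit_packages.py`.

```python
# Illustrative sketch of the promotion gate described above.
# Namespace groups model "one of domain:* OR audience:*" style rules.
# These shapes are assumptions, not the repo's real lint contract.

REQUIRED_BY_TYPE = {
    "skill": [{"action"}, {"stage"}, {"modality"}, {"scope"},
              {"domain", "audience"}, {"family"}],
    "tool": [{"tool_type"}, {"provider"}, {"surface"}, {"modality"}, {"scope"}],
    "kb": [{"format"}, {"domain", "audience"}, {"scope"}, {"source"}],
}

def missing_namespaces(unit_type: str, tags: list) -> list:
    """Return required namespace groups not covered by the unit's tags.

    A group is satisfied when the unit has at least one tag whose
    namespace (the part before ':') is in the group."""
    present = {t.split(":", 1)[0] for t in tags if ":" in t}
    return [group for group in REQUIRED_BY_TYPE.get(unit_type, [])
            if not (group & present)]

def promotion_status(unit_type: str, tags: list) -> str:
    # Units missing required tags stay staged until corrected.
    return "staged" if missing_namespaces(unit_type, tags) else "eligible"
```

A skill tagged only `action:write` + `scope:global` would stay `staged`; add `stage:*`, `modality:*`, `family:*`, and a `domain:*` or `audience:*` tag and it becomes eligible.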
### Schemas / Content Types

Required:

- `format:*`
- `modality:*`
- `scope:*`

Recommended:

- `family:*`
- `channel:*` (if tied to a channel)

### Bundles / Playbooks / Recipes

Required:

- `bundle_type:*`
- `scope:*`
- `status:*`

Recommended:

- `domain:*`, `audience:*`, `dept:*`

---

## Naming + Tagging Workflow (Recommended)

When adding a new unit:

1. **Search first** (skills.sh + local registry) to avoid duplicates.
2. **Pick a code** (unprefixed slug) and a clear display name.
   - Code must be lowercase and colon‑free (no `entity_type:` prefix).
   - **KBs + skills:** kebab‑case.
   - **Tools/schemas:** may include `_` or `.` when part of provider IDs.
3. **Assign required tags** (use the coverage rules above).
4. **Add `source:*`** for provenance (imported vs internal).
5. **Add `status:staged`** until tested and reviewed.

**Guideline (not a rule):** keep names plain and descriptive so external orchestrators can route correctly.

Promotion criteria:

- Required tags present
- Basic test fixture exists (if executable)
- Description is trigger‑grade

## Lint Tool (Repo Surface)

Use the repo linter to prevent taxonomy drift in curated unit packages:

- `python3 scripts/registry/lint_unit_packages.py`

This scans `.claude/*/curated/**/unit.json` and enforces:

- code invariants (colon-free, folder matches code)
- required tag namespaces per unit type
- status/tier sanity
- basic system/scoping coherence (e.g., `hlt-corp-infra` units set `system: HLT Corp Infra`)

---

## Tag-Scoped Selection Semantics (Tooling Contract)

Mirror and pack tooling supports tag-scoped selection via `--include-tags` and `--exclude-tags`. This contract is shared across unit types to keep filtering predictable for humans and external orchestrators.

Rules:

- Filters must be namespaced: `namespace:value` (example: `domain:writing`).
- Matching is case-insensitive, but stored tags are lowercase.
- Wildcard filters are allowed: `namespace:*` matches if the unit has any tag in that namespace.
- `exclude-tags` is **any-of**: if any exclude filter matches, the unit is excluded.
- `include-tags` is **AND across namespaces; OR within a namespace**:
  - If you specify tags in multiple namespaces, the unit must match at least one tag in each namespace.
  - Within the same namespace, matching any provided value is sufficient.

Examples:

- Include global writing units: `--include-tags scope:global,domain:writing`
- Include either design or writing (but still global): `--include-tags scope:global,domain:design,domain:writing`
- Exclude anything in a namespace: `--exclude-tags status:*` (rare; prefer explicit values)
- Exclude staged items: `--exclude-tags status:staged`

Notes:

- This contract is about _selection_, not _ranking_. Discovery still uses soft weights.
- For legacy "OR-bag" semantics, use explicit discovery calls (Phase 2) rather than mirror tooling.

## CMS Mapping (Facets)

The CMS should expose a consistent set of filters:

- Unit type
- `family:*`
- `action:*`
- `runtime:*`
- `stage:*`
- `domain:*` / `audience:*`
- `modality:*`
- `dept:*` / `app:*`
- `status:*`
- `tier:*`
- `source:*`
- `partner:*`

This keeps discovery predictable and avoids the "everything jams into 3 buckets" failure.

---

## Source: docs/BLUEPRINT.md

# Catalyst: Atomic Units for AI Agents

## Master Project Blueprint

**Version:** 2.1
**Last Updated:** 2026-03-06
**Status:** Live architecture/reference surface. Use this file, `docs/VISION.md`, and generated execution receipts in `docs/reports/` for current context.

> Treat this file as the architecture and system-reference layer. Deeper phase-by-phase walkthrough sections later in the document are retained as historical implementation context, not the active execution checklist.

---

## Table of Contents

1. [Project Status Dashboard](#project-status-dashboard) 2. [Vision & Philosophy](#vision--philosophy) 3. [Architecture Overview](#architecture-overview) 4. [Atomic Units Reference](#atomic-units-reference) 5.
[Database Schema](#database-schema) 6. [Roadmap and Planning](#roadmap-and-planning) 7. [Tooling and Priming](#tooling-and-priming) 8. [Discovery API](#discovery-api) 9. [Runs and Evaluation](#runs-and-evaluation) 10. [Skills Management](#skills-management) 11. [Content System](#content-system) 12. [API Reference](#api-reference) 13. [Risk Register & Mitigations](#risk-register--mitigations) 14. [File Structure](#file-structure) 15. [Appendices](#appendices)

---

## Project Status Dashboard

### Phase Completion

| Phase | Name | Status | Progress | Blocker |
| ----- | ------------------- | -------- | -------- | ------- |
| 0 | Tooling + Priming | COMPLETE | 100% | None |
| 1 | Canonical DB + Auth | COMPLETE | 100% | None |
| 2 | Discovery v1 | COMPLETE | 100% | None |
| 3 | Mirrors + Packs | COMPLETE | 100% | None |
| 4 | CMS v1 + Test Lab | COMPLETE | 100% | None |
| 5 | Plugin Ecosystem | COMPLETE | 100% | None |
| 6 | Evaluation Lab | COMPLETE | 100% | None |
| 7 | Research Ingestion | COMPLETE | 100% | None |
| 8 | Runtime Traces | COMPLETE | 100% | None |
| 9 | Interop Hardening | COMPLETE | 100% | None |

### Component Status

| Component | Status | Location | Notes |
| ------------------- | -------- | ------------------------------------------------------- | ------------------------------------------------------------------------ |
| Schema DDL | DEPLOYED | database/001-schema-ddl.sql | All tables created |
| Seed Data | DEPLOYED | database/002-seed-data.sql | Tools, schemas, styles, etc. |
| Discovery Functions | DEPLOYED | database/003-discovery-system.sql | discover(), traverse_links() |
| RLS Policies | DEPLOYED | database/004-rls-policies.sql | Multi-tenant security |
| Platform Admins | DEPLOYED | database/045-add-platform-admins-super-admin-bypass.sql | DB-backed super-admin identity + global RLS helper bypass |
| Skills Seed | DEPLOYED | database/005-seed-skills.sql | Applied via Supabase migration `seed_curated_skills_v1` |
| Skills Mirror | EXISTS | .claude/skills/curated/ | Curated skill mirrors for local consumption |
| Agent References | EXISTS | docs/references/ai-agents/ | Agent setup docs and definitions |
| Plugin Snapshot | EXISTS | .claude-plugin/ | Generated plugin/workspace export surface |
| Unit Package Lint | EXISTS | scripts/registry/lint_unit_packages.py | Repo-side contract/taxonomy checks for curated mirrors |
| Pack Exporter | EXISTS | scripts/distribution/export_registry_packs.ts | Deterministic JSON pack export for non-skill units |
| Supabase Client | EXISTS | lib/supabase/\*.ts | client, server, middleware |
| CMS Dashboard | EXISTS | app/dashboard-cms/ | Wired to Supabase + registry CRUD + revision diff (Phase 04-02) |
| Factory (v1) | EXISTS | app/dashboard-cms/factory/ | Template gallery + wizard + diff-first commit (Phase 04-03) |
| API Routes | EXISTS | app/api/ | 59 route files; see `docs/api/ENDPOINTS.md` for documented HTTP surfaces |
| Root Middleware | EXISTS | middleware.ts | Session refresh + dashboard redirect |
| TypeScript Types | EXISTS | types/database.ts | Generated via Supabase CLI; re-generate as needed |
| GSD Skill Pack | EXISTS | .claude/skills/planning/gsd-1.1.0/ | Needs DB import |

### Current Execution Sources

- Active architecture and execution context: `docs/BLUEPRINT.md`
- Canonical system intent and operating priorities: `docs/VISION.md`
- Execution receipts, audits, and latest generated evidence: `docs/reports/`
- Live HTTP inventory: `docs/api/ENDPOINTS.md`
- Runtime-truth/operator primers: `AGENTS.md`, `CATALYST.md`, `docs/QUICK_START_AGENTS.md`
- Active doc front door: `docs/RULES.md`, `docs/VISION.md`, `docs/AGENT_READINESS_CHECKLIST.md`, `docs/TAXONOMY.md`, `docs/BLUEPRINT.md`
- Historical phase implementation walkthroughs later in this file remain reference context only; when they conflict with current code or current docs, trust the active sources above.

---

## Vision & Philosophy

### What Catalyst Is

Catalyst is a **library of lego blocks for AI agents**. It stores atomic, composable primitives that any AI can pull from:

- **Skills** - Reusable procedures with layered guidance
- **Tools** - Executable actions (MCP, HTTP, SQL, etc.)
- **KB Items** - Reference knowledge and guidelines
- **Schemas** - Shape contracts for structured outputs
- **Styles** - Voice and format overlays
- **Recipes** - Workflow containers (schema + style + channel)
- **Playbooks** - Ordered multi-step patterns (soft guidance; replayable runs)
- **Actions** - Curated push-button entry points (Launchpad cards; playbook-backed)
- **Automations** - Scheduled Action runs (history + replay)
- **Personas** - Audience profiles
- **Agents** - AI personas with preferences
- **Bundles** - Curated collections
- **Prompts** - Reusable prompt fragments
- **Rubrics/Metrics** - Evaluation primitives

### What Catalyst Is NOT

- A rigid orchestration system that forces paths
- A pre-planned workflow engine
- A gated pipeline with mandatory steps
- An "intelligent routing" layer that controls agents

### Core Principles

#### 1. DB is Canonical, Files are Mirrors

Registry entities live in the database. Repo files (like `.claude/skills/`) are mirrors for compatibility. The filesystem format is a compatibility surface, not the source of truth.

Operational truth:

- `.claude/**` is the primary project-local mirror surface for skills, KB, and agent guidance.
- `.claude-plugin/**` is generated distribution output for plugin/workspace ecosystems.
- OpenClaw/Render-hosted runtimes are downstream consumers of the registry and mirrors, not the canonical source.

#### 2. Skills are Layered

`SKILL.md` is a short launcher. Deep rules, templates, references, and examples live in separate artifact files. We never flatten everything into one giant prompt.

#### 3. Progressive Disclosure is Default

Only metadata (name + description) is always visible. Full instructions load when relevant. Supporting artifacts load when needed. This keeps context windows lean.

#### 4. Links are Hints, Not Gates

`entity_links` expand discovery and suggest relationships. They do NOT enforce a DAG or mandatory sequence. The agent decides.

#### 5. Atomic Units Stay Atomic

A skill can stand alone even if it links to tools or KB. A recipe works even without the ideal style. We prefer ranking and context over hard-coded routes.

#### 6. Coverage, Not Control

Topic maps and taxonomy let us see what exists, spot gaps, track what works, and avoid duplication. But they never block creation or force paths.
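The "links are hints, not gates" principle can be sketched in a few lines: a link from an already-selected unit boosts a candidate's rank, but never filters it out. The candidate and link dict shapes below are illustrative assumptions, not Catalyst's actual discovery payloads.

```python
# Illustrative sketch of Principle 4 ("Links are Hints, Not Gates").
# Shapes are assumptions for illustration only.

def rank_candidates(candidates, links_from_selected):
    """Boost candidates that are linked from already-selected units.

    A link nudges ranking upward; a missing link never excludes a
    candidate. There is no DAG and no mandatory sequence."""
    linked = {link["to_ref"] for link in links_from_selected}
    return sorted(
        candidates,
        key=lambda c: (c["ref"] in linked, c["score"]),
        reverse=True,
    )
```

Note the contrast with a gated design: nothing here ever drops a candidate for lacking a link, so recall is preserved and the agent still sees the full menu.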
---

## Architecture Overview

### System Layers

```
CLIENT LAYER
  CMS Dashboard (Next.js App) | External Agents (API Consumers) | Other Integrations (skills.sh, repos)
        |
API LAYER
  /api/registry | /api/discover | /api/agent/* | /api/traverse | /api/catalog | /api/runs
        |
SERVICE LAYER
  Supabase Client (server.ts) | Discovery Engine (SQL functions) | Run Recorder (step tracking)
        |
DATA LAYER
  registry_entities + extension tables + entity_revisions
  entity_links + entity_tags + discovery_*
  runs + run_steps + run_tool_calls + run_outputs
        |
SUPABASE (PostgreSQL)
  RLS Policies | pgvector | Triggers | Functions
```

### Calling Agent Experience: Discovery (Current + Planned)

This is the intended "calling agent" experience when an external orchestrator wants to find and use atomic units.
```mermaid flowchart TD A["Calling Agent / External Orchestrator"] --> B["POST /api/discover
intent + filters (types/tags/families/bundles)
paging (cursor/limit) + facets/debug (optional)"] B --> C{"Authenticated session?"} C -->|no| D["401 unauthorized"] C -->|yes| E["Normalize request
dedupe + clamp + prompt alias"] E --> F["Supabase RPC: discover_v2(...)"] F --> G["Ranked menu + receipts
match_reasons (always)
signals (debug only)"] G --> H{"Need graph expansion?"} H -->|yes| I["POST /api/traverse
ref, link_types, depth"] I --> J["Supabase RPC: traverse_links(...)"] J --> K["Related refs + entity stubs"] H -->|no| L["Select 1..N refs"] K --> L L --> M{"Need higher precision at the top?"} M -->|optional (planned)| N["Rerank adapter
cross-encoder or LLM judge
uses receipts (reasons/signals)"] M -->|no| O["Proceed"] N --> O O --> P["Use selected units"] P --> Q["Portable local mirror surface
(.claude/kb/curated, .claude/skills/curated)"] P --> R["DB-backed detail reads in dashboard, export, and MCP surfaces
(get_entity, get_skill_content, artifact body lookup)"]
```

Notes:

- Today, `discover_v2` is the stable discovery surface (with `match_reasons`, and `signals` behind `debug: true`).
- `traverse_links` is the graph expansion surface (soft hints, not gates).
- Repo mirrors remain the portable local surface for agent/tool compatibility.
- DB-backed entity content and artifact reads are already available in dashboard/server flows, plugin/export surfaces, and MCP read handlers such as `get_entity`, `get_skill_content`, and artifact body lookup.

---

## Atomic Units Reference

### Skill

A reusable procedure with layered guidance and optional artifacts.

**DB Tables:** `registry_entities` + `skills` + `entity_revisions` (includes `content_json` + `artifacts_json`)

**Key Fields:**

- `pack_category` - Category within skill pack
- `execution_mode` - single_shot, multi_turn, streaming
- `timeout_seconds` - Max execution time

**Skill Folder Structure (Anthropic-compatible):**

```
skill-name/
├── SKILL.md          # REQUIRED: short launcher with YAML frontmatter
├── rules/
│   ├── _sections.md  # Rule sections scaffold
│   └── _template.md  # Render/format template
├── references/       # Reference docs
├── examples/         # Examples
└── tests/            # Validation cases
```

**SKILL.md Format:**

```yaml
---
name: skill-name
description: |
  Use when [TRIGGER CONDITIONS].
  Triggers: "keyword1", "keyword2", "phrase"
---
# Skill Name

## Overview
Brief description of what this skill does.

## When to Use
- Condition 1
- Condition 2

## Workflow
1. Step one
2. Step two

## Tools Used
- tool-name: purpose

## Output
Description of what this skill produces.
```

**Critical:** Descriptions must be **trigger-focused** (when to use), not capability-focused (what it does).

### Tool

A concrete action (MCP, HTTP, internal, SQL, Python, bash).

**DB Tables:** `registry_entities` + `tools`

**Key Fields:**

- `tool_type` - mcp, http, internal, sql, python, bash
- `provider` - openai, anthropic, tavily, fal, etc.
- `endpoint_url` - API endpoint
- `auth_method` - api_key, oauth, none
- `risk_level` - integer 1-5 (CHECK constraint)
- `input_schema` - JSON Schema for inputs
- `output_schema` - JSON Schema for outputs

### KB Item

Reference knowledge, not instruction: a fact base or guideline.

**DB Tables:** `registry_entities` + `kb_items` + `kb_variants`

**Item Types:**

- `reference` - Source documents, specs
- `guideline` - How-to guidance
- `policy` - Compliance requirements
- `best_practice` - Patterns that work
- `example` - Sample content
- `persona_profile` - Audience definition

**Variants:**

- `snippet` - Short, ready-to-inject (~100 tokens)
- `distilled` - Medium summary (~500 tokens)
- `full` - Complete reference (variable)

**Revision contract note:**

- Canonical KB content is stored in `kb_items` + `kb_variants`.
- `entity_revisions.content_json.kb` is the revision snapshot/projection used for history + diff workflows.
- Editors should hydrate from the canonical KB projection first, then refresh revision snapshots on save.

### Schema

A shape contract for content or structured outputs.

**DB Tables:** `registry_entities` + `json_schemas`

**Key Fields:**

- `schema_json` - JSON Schema definition
- `version` - Schema version (semver)

### Style

A voice/format overlay. Reusable across content types.

**DB Tables:** `registry_entities` + `styles`

**Key Fields:**

- `overlays_json` - Array of overlay objects (voice/format overlays)
- `constraints_json` - Style constraint rules (restrictions, requirements)

### Recipe

Links schema + style + channel for a content type.

**DB Tables:** `registry_entities` + `recipes`

**Key Fields:**

- `base_schema_ref` - jsonb entity ref to schema
- `style_ref` - jsonb entity ref to style
- `channel_ref` - jsonb entity ref to channel
- `constraints_json` - Recipe-level constraints

### Content Type

Defines what can be created.

**DB Tables:** `registry_entities` + `content_types`

**Families:** article, report, social, message, study, qbank, visual, audio, presentation, web, app

### Agent

A named AI persona with default posture, model settings, and preferences.

**DB Tables:** `registry_entities` + `agents` + `agent_memories` + `agent_proclivities`

**Key Fields:**

- `persona_name`, `persona_role`, `persona_voice`
- `model_id` - Default model
- `system_prompt` - Base prompt
- `boundaries_json` - What the agent won't do

**Memory Scopes:** org, app, user, run, session

**Proclivities:** prefer, avoid, default_to, never_use

### Bundle

A curated set of KB, skills, or tools.

**DB Tables:** `registry_entities` + `bundles` + `entity_links`

**Bundle Types:** context_bundle, research_kit, eval_kit, starter_kit, pack

### Channel

Distribution surface with constraints.

**DB Tables:** `registry_entities` + `channels`

**Examples:** web, email, instagram, twitter, linkedin, youtube

### Prompt

Reusable instruction chunk (system/task/format/policy/rubric). External surfaces should call these **prompts**.

**DB Tables:** `registry_entities` + `prompts` (DB entity_type: `prompt`)

**Kinds:** system, task, format, policy, rubric

### Rubric / Metric

Rubrics score quality; metrics score performance.
**DB Tables:** `registry_entities` + `rubrics` / `metrics`

---

## Database Schema

### Core Tables

| Table | Purpose | Key Relationships |
| ------------------- | ------------------------------- | ----------------------------- |
| `orgs` | Organizations (tenants) | Parent of all data |
| `apps` | Applications | Belongs to org |
| `projects` | Projects | Belongs to app |
| `registry_entities` | Base entity table | Extended by type tables |
| `entity_revisions` | Version history | FK to registry_entities |
| `entity_links` | Relationships | FK to registry_entities (x2) |
| `entity_tags` | Tag assignments | FK to registry_entities, tags |
| `entity_artifacts` | (Optional) blob/binary pointers | FK to registry_entities |
| `entity_embeddings` | Vector embeddings | FK to registry_entities |

### Extension Tables (by entity_type)

| Entity Type | Extension Table | Purpose |
| ------------ | --------------------------- | ------------------------------------------------------------------- |
| skill | `skills` | Pack category, execution mode |
| tool | `tools` | Type, provider, endpoint, auth |
| kb | `kb_items` + `kb_variants` | Item type, variants |
| schema | `json_schemas` | Schema JSON |
| style | `styles` | Voice, visual system, component language, and presentation overlays |
| recipe | `recipes` | Schema/style/channel defaults |
| content_type | `content_types` | Family, associations |
| agent | `agents` + `agent_memories` | Persona, model, boundaries |
| bundle | `bundles` | Bundle type |
| channel | `channels` | Platform constraints |
| prompt | `prompts` | Prompt kind |
| rubric | `rubrics` | Dimensions |
| metric | `metrics` | Calculation |
| playbook | `playbooks` | Mode, steps |

### Link Types

| Link Type | Meaning | Example |
| ------------------ | ------------------- | ----------------------------------- |
| `requires` | Hard dependency | recipe requires schema |
| `prerequisite` | Should come before | skill prerequisite other_skill |
| `uses_tool` | Invokes this tool | skill uses_tool tavily.search |
| `uses_prompt` | References prompt | recipe uses_prompt prompt:cta-v1 |
| `uses_kb` | References KB item | skill uses_kb kb:style-guide |
| `governed_by_pack` | Rules from pack | recipe governed_by_pack style-guide |
| `bundle_member` | In this bundle | kb bundle_member context-bundle |
| `often_follows` | Commonly after | recipe_b often_follows recipe_a |
| `recommends` | Suggested pairing | style recommends another_style |
| `pairs_with` | Works well together | tool pairs_with another_tool |
| `alternate` | Can substitute | style_a alternate style_b |
| `supersedes` | Replaces | skill_v2 supersedes skill_v1 |
| `parent` | Hierarchy parent | sub_skill parent main_skill |
| `related` | Loosely related | tool related other_tool |

### Discovery Functions

```sql
-- Find entities by intent (keyword search + ranking)
SELECT * FROM discover('write an article about pharmacology');

-- Find entities by type
SELECT * FROM discover(NULL, ARRAY['recipe', 'schema']);

-- Vector similarity search
SELECT * FROM discover_semantic(embedding_vector, ARRAY['tool']);

-- Follow links from an entity
SELECT * FROM traverse_links('recipe:article-standard');
SELECT * FROM traverse_links('bundle:base-content-kit', NULL, 2);
```

### Helper Views

```sql
-- Summary counts by entity type
SELECT * FROM catalog_summary;

-- Entity instructions
SELECT ref, instruction, requires, uses, pairs_with
FROM entity_instructions
WHERE ref = 'recipe:article-standard';
```

---

## Roadmap and Planning

Canonical plan state lives in:

- `docs/BLUEPRINT.md` (architecture and major system lanes)
- `docs/VISION.md` (what the system optimizes for)
- `docs/reports/*` (generated execution artifacts and audits)

---

> **Historical implementation details (Phases 0-9, task breakdowns, testing strategy, feedback loops) were removed in the March 2026 docs cleanup.** All phases are COMPLETE. The detailed breakdown is in git history if needed.
For current system architecture, read this document's front sections + `docs/VISION.md` + `docs/ROADMAP.md`.

---

## Source: docs/references/contracts/AGENT_AUTONOMY_DISCOVERY.md

# Agent Autonomy + Deep Discovery Charter

This reference captures how Katailyst should serve external and internal AI agents without over-constraining them. Canonical policy remains in:

- `docs/RULES.md`
- `docs/VISION.md`

Use this file as an implementation-oriented companion for discovery and orchestration behavior.

## Operating Stance

- This system is an operating layer for agents, not a single fixed app flow.
- Serve options with explainability, not rigid routing.
- Keep composability high and assumptions low.
- Favor adaptive model reasoning over imperative tunnel logic.

## Discovery Contract Expectations

- Menus over routes: return ranked candidates plus reasons/signals.
- Continuation over truncation: per-call caps are acceptable, but agents must be able to request additional pages/results.
- Mixed-unit freedom: agents may combine any atomic units (skills/tools/kb/prompts/bundles/playbooks/agents) in arbitrary counts.
- Query refinement loops: agents should be able to iterate intent/types/tags/bundles and re-rank as needed.
- Debuggability: responses should expose enough metadata for inspection and correction.

## MCP + API Surface Guidelines

- Keep a small stable primitive surface for broad interoperability:
  - `search/discover`
  - `get/inspect`
  - `expand/continue`
  - `traverse` (graph exploration)
- Backward compatibility is mandatory when evolving tool names or shapes.
- Add alias verbs instead of renaming existing verbs in-place.

## Ranking + Retrieval Guidelines

- Retrieval can blend lexical, graph, eval signals, and embeddings.
- Re-rank should improve quality, not block recall.
- Fail-open behavior: if rerank providers fail, return the base ordering with warnings.
- Keep scoring receipts (`match_reasons`, signals, rerank metadata) available for agents that want deeper control.
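The fail-open guideline above can be sketched in a few lines: if the rerank provider errors out, return the base ordering plus a warning instead of failing the request. The provider callable interface here is an assumption for illustration, not the actual rerank adapter API.

```python
# Sketch of fail-open reranking, assuming a provider callable that
# takes and returns a list of results. Illustrative only.

def rerank_fail_open(base_results, rerank_provider):
    warnings = []
    try:
        reranked = rerank_provider(base_results)
        # Re-rank should improve quality, not block recall:
        # a rerank that drops or adds items is treated as a failure.
        if len(reranked) != len(base_results):
            raise ValueError("rerank changed result count")
        return reranked, warnings
    except Exception as exc:
        # Fail open: base ordering survives, with a warning receipt.
        warnings.append("rerank_failed: %s; returning base ordering" % exc)
        return base_results, warnings
```

The key design choice is that rerank failure degrades ordering quality but never availability or recall, and the warning gives calling agents a receipt they can inspect.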
## UX Guidance (Human + Agent)

- Clear hierarchy, low clutter, high signal density.
- Keep complex layers accessible but progressively disclosed.
- Avoid duplicate surfaces that do the same thing with minor differences.
- Standardize taxonomy and naming to reduce cognitive and maintenance overhead.

## Testing and Validation Expectations

- Contract tests for discover/expand/get/traverse semantics.
- Regression checks for limit/clamp/cursor behavior.
- Explicit tests for fallback behavior (rerank/network/tool failures).
- UI smoke coverage for core registry/discovery/task flows after major declutter passes.

## KB Mirror Note

If this guidance must be surfaced at runtime for agents, publish it to canonical DB KB and sync mirrors (`just kb-sync` or `npx tsx scripts/registry/sync/sync_kb_from_db.ts`).

---

## Source: docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md

# Core Agent Shared Foundation

This is the common runtime contract for the three active hosted HLT agents:

- Victoria
- Julius
- Lila

The goal is not to make them identical. The goal is to give them one strong shared base, keep role differences thin and explicit, and make the stack legible without flattening the graph.

## What This Doc Is

Use this when the question is:

- what the shared runtime base is
- what the active hosted trio actually reads first
- how the local injected files relate to Katailyst canon
- which differences belong in per-agent overlays instead of the shared base

Use the deeper architecture doc for subsystem boundaries:

- `.claude/kb/curated/global/ai-engineering/agent-foundation-spec/KB.md`

## Live Runtime Truth

The hosted runtime is DB-first and local-fast-path:

- OpenClaw/Render owns the live runtime surface
- Katailyst and Supabase are the canonical control plane
- repo files are mirrors and portability surfaces, not the final runtime authority
- local injected files matter because they are always present, not because they are the whole system
- active execution stays in `hlt`; shared `system` canon is an additive read layer, not a second live fleet
- HLT should be able to read and use shared `system` canon by default; the split is placement and write scope, not a permission fence

The currently validated Render shape is:

- Victoria service reads `/data/workspace`
- Julius service reads `/data/workspace`
- Lila service reads `/data/workspace`
- Victoria's disk also contains sibling `julius-docs` and `lila-docs` directories, but those are not the active workspace path for the Julius and Lila services themselves

Partial observability is normal. When repo text, DB canon, and runtime differ, reconcile against validated runtime plus DB truth first, then update the mirrors.

## Shared Runtime Read Order

The shared injected stack is:

1. `AGENTS.md`
2. `SOUL.md`
3. `USER.md`
4. `IDENTITY.md`
5. `TOOLS.md`
6. `BOOTSTRAP.md`
7. `HEARTBEAT.md`
8. `MEMORY.md`

Shared continuity memory such as `memory/lessons-learned.md` is nearby but not part of the always-injected set.

Agent lessons docs such as `agent-lessons-victoria`, `agent-lessons-julius`, and `agent-lessons-lila` are adjacent correction layers. Load them when risky or domain-specific failure modes matter.
Do not let them override the shared base, front doors, or IA maps. This is the compact steering stack, not a full operational encyclopedia.

These runtime mirrors should also be treated as protected runtime overlays in operator workflows:

- they remain DB-canonical and mirror-backed
- they are not ordinary KB cleanup targets
- they should be reviewed through the dedicated Runtime Overlays browse view or other protected-surface workflows
- bulk summarization, trimming, or "tidy up the docs" passes are the wrong maintenance path for this stack

## Runtime File Map

Use the injected files like a small steering panel, not a second operating system.

| Runtime file | Main job | Load posture |
| -------------- | ---------------------------------------------------------------------------------------------- | ----------------------------------------- |
| `AGENTS.md` | startup contract, execution posture, coordination, and the fastest path back into deeper canon | every session |
| `SOUL.md` | identity, taste, personality, and long-lived behavioral center | every session |
| `USER.md` | principal and relationship context that changes how the work should land | every session |
| `IDENTITY.md` | stable runtime facts, service identity, and deployment-level self-knowledge | every session |
| `TOOLS.md` | tier-1 tools, vault posture, discovery path, and durable caveats | every session |
| `BOOTSTRAP.md` | restart, compaction, recovery, and re-grounding | load when recovery or drift is involved |
| `HEARTBEAT.md` | cadence, suppression, reminder policy, and background monitoring behavior | load when heartbeat or recurrence matters |
| `MEMORY.md` | durable truths, routing, persistent decisions, and archived assumptions | load when continuity materially matters |

These files should shape behavior quickly, then route the agent back into deeper shared canon and task-specific registry surfaces.
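The load postures in the runtime file map can be sketched as a tiny read-list builder: the always-injected stack loads every session, and the three conditional overlays join only when the session's context calls for them. The file names come from this doc; the loader function and context flags are hypothetical illustrations.

```python
# Sketch of the runtime read order and load postures described above.
# File names are from the doc; the loader and flag names are hypothetical.

ALWAYS = ["AGENTS.md", "SOUL.md", "USER.md", "IDENTITY.md", "TOOLS.md"]

CONDITIONAL = {
    "BOOTSTRAP.md": "recovery",    # restart, compaction, drift
    "HEARTBEAT.md": "heartbeat",   # cadence, recurrence
    "MEMORY.md": "continuity",     # durable truths, persistent decisions
}

def session_read_list(context_flags):
    """Always-injected stack first, then only the overlays this
    session actually needs."""
    return ALWAYS + [f for f, flag in CONDITIONAL.items()
                     if flag in context_flags]
```

A normal session loads five files; a recovery session adds `BOOTSTRAP.md`, keeping the steering layer compact rather than loading the full stack every time.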
## Shared Access Base The hosted trio should inherit one shared access story before role-specific overlays begin: - Katailyst remote MCP at `https://www.katailyst.com/mcp` is the default hosted-agent control-plane entry - trusted hosted agents should default to the full catalog or explicitly set `agent,delivery-admin` - `bootstrap` is the intentionally narrow first-glance toolset, not the normal hosted-trio posture - Supabase MCP is the direct canonical DB read lane, not the whole hosted-agent setup story - Vault-backed execution is the only sanctioned secret path - Notion is optional: use Notion MCP for interactive OAuth workflows, and the vault-backed REST path for automation workflows Use `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md` as the shared setup reference when auth posture, tool access, or optional integration lanes are in question. ## Shared Canon Behind The Runtime The injected files should route back into the deeper shared canon early. Start with: - `global-catalyst-atlas` as the single compact shared fleet entry - `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md` as the shared MCP, vault, and core-tool setup reference Then branch into: - `global-agent-principles` - `global-research-protocol` - `global-team-context` - `agent-foundation-spec` These are the shared doctrine and architecture surfaces. They are not substitutes for the local runtime files, and the local runtime files are not substitutes for them. ## DB Capability Lanes The shared entry surface should point the fleet into broad DB-backed lanes rather than inventing extra runtime-taxonomy layers. Current flagship lane names: - Multimedia - Articles - Social - QBank - Meeting Briefing - Page / Web Design - Registry / Creation / Discovery Those lane names are broader than any one current code. The lane stays broad even when the current flagship unit is narrower, such as `create-multimedia`. 
## DB-First, Local-Fast-Path

The working doctrine for the hosted trio is:

- the registry is the main operating system
- local injected files are fast-path steering mirrors
- front doors and IA docs are maps and launch surfaces
- helper docs widen discovery but do not force a route
- continuity memory and task artifacts live outside the steering layer
- shared `system` surfaces should be readable by default where the hosted fleet is browsing or loading helper context, while writes stay execution-org scoped

Or more simply:

- local files shape behavior quickly
- DB canon carries most of the richer truth

## Repo, DB, And Runtime Precedence

Treat the three surfaces like this:

1. validated runtime truth explains what the hosted agents are actually reading
2. DB canon remains the deeper operating source for doctrine, linked context, and evolving truth
3. repo files are mirrors and reviewable authoring surfaces

When they disagree:

- validate runtime before assuming the repo mirror is current
- reconcile against DB canon once runtime truth is clear
- then update the repo mirrors to match instead of preserving a competing narrative

## Menus, Not Mandates

The shared base should bias the fleet toward composition, not hard choreography.

- use strong starting nodes, then keep exploring when the task needs more depth
- default to using more tools, skills, and surrounding context than first instinct suggests
- for almost every meaningful task, assume at least one or two tools or skills should be involved
- for complex work, it is normal to compose five, ten, or more surfaces before acting
- compose skills, KBs, tools, prompts, rubrics, styles, playbooks, and assets with judgment
- let task complexity determine how much planning, research, and retrieval to use
- avoid flattening strong specialist nodes into a tiny number of mandatory pathways

Katailyst suggests. The agent composes.

## Per-Agent Overlays

The shared base stays strong.
Differences belong in explicit per-agent overlays.

| Agent | Primary principal | Core lane | Main overlay surfaces |
| --- | --- | --- | --- |
| Victoria | Alec | registry stewardship, fleet equipping, design, infrastructure | Victoria Ops Guide, Victoria identity mirrors |
| Julius | Justin | operations, planning, follow-through, meeting prep | Julius Ops Guide, Julius identity mirrors |
| Lila | Emily | marketing, content, campaigns, multimedia packaging | Lila Ops Guide, Lila identity mirrors |

All three should still be able to research, write, plan, and use visuals when the task calls for it. The overlay changes what they pull first, not whether they are allowed to think.

## Helper Surfaces

The active helper layer exists to make the per-agent file stack easier to scan:

- ops guides are the broader nearby doc-and-surface map

Deprecated helper-doc residue should not be treated as part of the active default stack. The live helper lane is the front-door index plus the stronger doctrine and architecture surfaces.

Indexes are optional discovery aids. They are not doctrine and they are not control towers.

For the actual runtime copy step, use `RUNTIME_OVERLAY_SYNC_CHECKLIST.md`. The checklist is the operator packet for syncing cleaned mirrors into the live service workspaces.

## Channel Boundary

Slack, App Home, threading, delivery chunking, and other channel behaviors are runtime/channel concerns. They can influence how the hosted agents feel in practice, but they do not belong in the agent identity architecture itself.

Keep the boundary clear:

- agent-doc work defines steering surfaces, read order, and sync precedence
- runtime/channel work defines delivery UX, thread/session behavior, and app-surface behavior

Do not use identity-doc edits as a substitute for real runtime or Slack work.
## High-Risk Mirrors

The mirrors that need the most discipline are:

- `*-identity-memory`
- `*-identity-tools`
- `*-identity-heartbeat`
- `*-identity-agents`

Common failure modes:

- memory turns into stale tactical clutter
- tools turn into giant catalogs
- heartbeat keeps repeating already-known issues
- agents mirrors duplicate the entire system instead of acting like a runtime contract

## Naming And Drift Note

There is real naming drift across the fleet and it should be described, not hidden. Current example:

- Julius is the agent identity
- some older service plumbing still uses `openclaw-justin`

Do not pretend those are the same thing. Reconcile them explicitly and keep the human-facing identity surfaces coherent. In the shared people layer, prefer first-name references unless deeper titles or surnames have been revalidated in the principal-specific surfaces.

## Maintainability Rule

This stack should be stable enough to maintain but flexible enough to evolve. That means:

- preserve the shared base
- keep overlays thin
- keep helper docs optional
- keep mirrors pointer-heavy
- keep the graph open for deeper traversal

The point is not determinism. The point is a strong core that still leaves room for context, judgment, and specialist depth.

---

## Source: docs/api/ENDPOINTS.md

# API Endpoints

> Auto-generated from `lib/api/endpoint-inventory.ts` on 2026-03-11.
> Do not edit manually. Run `npm run endpoints:sync` to regenerate.

**Total endpoints:** 94

## assist (5)

| Method | Path | Auth | Description | | ------ | ----------------------- | -------- | ------------------------------------------------------------------ | | `POST` | `/api/ai-assist` | required | Field-level generation/refinement assistant. | | `POST` | `/api/ai-assist/stream` | required | Streaming field-level generation for long-form editor content. | | `POST` | `/api/launchpad` | required | Intent-to-action proposals and optional execution.
| | `POST` | `/api/page-assist` | required | Page-scoped next-action guidance. | | `POST` | `/api/agent-context` | required | Combined discovery + graph + docs context for agent bootstrapping. | ## chat (1) | Method | Path | Auth | Description | | ------ | ----------- | -------- | ------------------------------ | | `POST` | `/api/chat` | required | Agent + bundle chat execution. | ## discovery (3) | Method | Path | Auth | Description | | ------ | -------------------- | -------- | -------------------------------------------------------------------------------------------- | | `POST` | `/api/discover` | required | Intent-based registry discovery. | | `POST` | `/api/agent-handoff` | required | Build a packetized handoff bundle with agent context, run envelope, and delegation contract. | | `POST` | `/api/traverse` | required | Graph expansion from a typed ref. | ## evals (1) | Method | Path | Auth | Description | | ------ | ------------------------------- | -------- | ------------------------------------------------ | | `POST` | `/api/evals/generate-variation` | required | Generate candidate variation for eval workflows. | ## factory (6) | Method | Path | Auth | Description | | ------ | ----------------------------- | -------- | ---------------------------------------------------------------------------- | | `GET` | `/api/factory-ops` | required | Factory operations snapshot panel payload. | | `POST` | `/api/factory/autopilot` | required | Guidance-first autopilot generation/review packet. | | `POST` | `/api/factory/enrich-draft` | required | Enrichment suggestions for draft content. | | `POST` | `/api/factory/generate-draft` | required | Generate draft from intake/template answers. | | `POST` | `/api/factory/intake` | required | Factory intake creation/update. | | `POST` | `/api/skills/import-preview` | required | Preview a skill package import from GitHub, staged mirrors, or a local path. 
| ## factory_incoming (4) | Method | Path | Auth | Description | | ------ | -------------------------------- | -------- | ------------------------------------------------------ | | `POST` | `/api/factory/incoming/dispatch` | required | Dispatch queued incoming items into worker processing. | | `POST` | `/api/factory/incoming/intake` | required | Ingest external source payload into incoming queue. | | `GET` | `/api/factory/incoming/status` | required | Incoming queue and processing status snapshot. | | `POST` | `/api/factory/incoming/worker` | required | Worker execution endpoint for queued incoming tasks. | ## factory_kb (2) | Method | Path | Auth | Description | | ------ | ----------------------------- | -------- | ----------------------------------------------- | | `POST` | `/api/factory/kb/from-kb-ref` | required | Generate KB draft from existing KB reference. | | `POST` | `/api/factory/kb/from-url` | required | Generate KB draft from external URL extraction. | ## graph (1) | Method | Path | Auth | Description | | ------ | ------------ | -------- | ------------------------------------------- | | `GET` | `/api/graph` | required | Graph payload for mini/full/3D graph views. | ## integrations (10) | Method | Path | Auth | Description | | ------ | -------------------------------------------- | -------- | ----------------------------------------------------------------------------------- | | `GET` | `/api/integrations/connect/callback` | required | Finalize a hosted integration connect session and redirect back to the dashboard. | | `POST` | `/api/integrations/connect-link` | required | Create a hosted provider connect link and session envelope for the execution org. | | `POST` | `/api/integrations/connections/[id]/share` | required | Promote a private integration connection to org-shared visibility. | | `POST` | `/api/integrations/connections/[id]/unshare` | required | Revert an org-shared integration connection back to private visibility. 
| | `POST` | `/api/integrations/devin/chat` | required | Devin chat gateway — OpenAI-compatible adapter for Devin session API. | | `GET` | `/api/integrations/framer/health` | required | Framer pilot readiness snapshot: surface roles, production lock, and secret status. | | `GET` | `/api/integrations/marketo/health` | required | Marketo shadow-mode readiness snapshot: endpoints, secrets, and pilot gates. | | `GET` | `/api/integrations/multimedia/health` | required | Multimedia Mastery sidecar status probe for the canonical /api/media/v1 surface. | | `POST` | `/api/integrations/slack-like` | required | Slack-like integration bridge endpoint. | | `GET` | `/api/integrations/diagnostics` | required | Integration table/provider diagnostics for the execution org. | ## lists (8) | Method | Path | Auth | Description | | -------- | ------------------------- | -------- | ----------------------------------------------------- | | `GET` | `/api/lists` | required | List ranked lists in private/org/public scopes. | | `POST` | `/api/lists` | required | Create a list in the execution org. | | `GET` | `/api/lists/[id]` | required | Get one list with items and stats. | | `PATCH` | `/api/lists/[id]` | required | Update list metadata, scope, or status. | | `POST` | `/api/lists/[id]/items` | required | Add or update list items. | | `DELETE` | `/api/lists/[id]/items` | required | Delete an item from a list. | | `POST` | `/api/lists/[id]/vote` | required | Cast or clear a vote for a list. | | `POST` | `/api/lists/[id]/publish` | required | Publish/unpublish/archive a list with scope controls. | ## mcp (26) | Method | Path | Auth | Description | | -------- | ------------------------------------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------- | | `GET` | `/mcp` | required | MCP Streamable HTTP transport endpoint (GET stream path). 
Auth-required by default; can be opened only when MCP_ALLOW_ANONYMOUS=true. | | `POST` | `/mcp` | required | MCP Streamable HTTP transport endpoint (JSON-RPC request path). Auth-required by default; can be opened only when MCP_ALLOW_ANONYMOUS=true. | | `DELETE` | `/mcp` | required | MCP Streamable HTTP transport endpoint (session/stream teardown). Auth-required by default; can be opened only when MCP_ALLOW_ANONYMOUS=true. | | `GET` | `/api/mcp/playground/bootstrap` | required | Bootstrap MCP playground (tools, capabilities, presets, history). | | `POST` | `/api/mcp/playground/run` | required | Run an MCP tool call from the dashboard playground. | | `POST` | `/api/mcp/playground/plan` | required | Generate an AI-first MCP playground plan with ordered steps and client setup guidance. | | `GET` | `/api/mcp/playground/presets` | required | List persisted MCP playground presets for the current user/org. | | `POST` | `/api/mcp/playground/presets` | required | Create an MCP playground preset for the current user/org. | | `PATCH` | `/api/mcp/playground/presets/[id]` | required | Update a saved MCP playground preset. | | `DELETE` | `/api/mcp/playground/presets/[id]` | required | Delete a saved MCP playground preset. | | `GET` | `/api/mcp/playground/history` | required | List recent MCP playground runs for replay/debug. | | `POST` | `/api/mcp/connect-token` | required | Issue a short-lived MCP connect pack for remote clients. Owners/admins can mint full; other roles get chat_safe. | | `GET` | `/api/mcp/personal-tokens` | required | List current-user, org-scoped personal MCP tokens with status metadata. | | `POST` | `/api/mcp/personal-tokens` | required | Issue a per-user long-lived personal MCP token for remote access. Owners/admins can mint full; other roles get chat_safe. | | `DELETE` | `/api/mcp/personal-tokens/[id]` | required | Revoke a current-user personal MCP token by id. 
| | `GET` | `/api/mcp/oauth-clients` | required | List org-scoped trusted OAuth clients for hosted MCP. | | `POST` | `/api/mcp/oauth-clients` | required | Create a trusted OAuth client for hosted MCP authorization-code + PKCE flows. | | `DELETE` | `/api/mcp/oauth-clients/[id]` | required | Revoke a trusted OAuth client and invalidate its active consents/tokens. | | `GET` | `/api/mcp/oauth-authorizations` | required | List current-user OAuth authorizations for hosted MCP clients. | | `DELETE` | `/api/mcp/oauth-authorizations/[id]` | required | Revoke one current-user OAuth authorization and invalidate related active tokens. | | `GET` | `/.well-known/oauth-authorization-server` | public | OAuth authorization-server metadata for the hosted Katailyst MCP resource. | | `GET` | `/.well-known/oauth-protected-resource` | public | Root OAuth protected-resource metadata for hosted MCP discovery. | | `GET` | `/.well-known/oauth-protected-resource/mcp` | public | Path-specific OAuth protected-resource metadata for the /mcp endpoint. | | `GET` | `/oauth/authorize` | required | Render the signed-in OAuth consent screen for hosted MCP authorization requests. | | `POST` | `/oauth/token` | public | Exchange hosted MCP OAuth authorization codes or refresh tokens for bearer tokens. | | `POST` | `/oauth/revoke` | public | Revoke a hosted MCP OAuth access token or refresh token. | ## metrics (2) | Method | Path | Auth | Description | | ------ | --------------------- | -------- | ---------------------------------------------------- | | `POST` | `/api/metrics/ingest` | required | Ingest metric points with metric/subject validation. | | `POST` | `/api/metrics/query` | required | Query metric points with dimension filters. | ## novu (8) | Method | Path | Auth | Description | | ------ | ------------------------------------ | -------- | -------------------------------------------------- | | `POST` | `/api/novu/health` | required | Novu integration health probe. 
| | `POST` | `/api/novu/ingress/events` | required | Ingest Novu ingress events into internal queueing. | | `POST` | `/api/novu/ops/remediate` | required | Run Novu remediation workflow operations. | | `POST` | `/api/novu/ops/snapshot` | required | Return Novu operational snapshot and diagnostics. | | `POST` | `/api/novu/shadow-subscriber/upsert` | required | Upsert shadow subscriber records for Novu sync. | | `POST` | `/api/novu/trigger` | required | Novu trigger dispatch endpoint. | | `POST` | `/api/novu/webhooks` | public | Signed Novu webhook ingestion endpoint. | | `POST` | `/api/novu/workflows/sync` | required | Synchronize Novu workflow definitions and status. | ## observe (2) | Method | Path | Auth | Description | | ------ | ------------------------- | -------- | ---------------------------------------------------------------------------- | | `GET` | `/api/run-events` | required | Tail append-only run events for a run using event_seq-based pagination. | | `GET` | `/api/run-events/distill` | required | Generate a distilled summary from append-only run events for a selected run. | ## plugins (3) | Method | Path | Auth | Description | | ------ | --------------------- | -------- | ------------------------------------------------------------------------------------------------- | | `GET` | `/api/plugins/browse` | required | Browse publishable entities with surface guidance for project mirrors vs plugin/export consumers. | | `POST` | `/api/plugins/export` | required | Export plugin/distribution payloads with compatibility diagnostics and runtime surface guidance. | | `GET` | `/api/plugins/packs` | required | List starter packs from bundle entities and members with distribution surface guidance. 
| ## registry (5) | Method | Path | Auth | Description | | ------ | -------------------------------- | -------- | ------------------------------------------------------------ | | `POST` | `/api/registry/link-suggestions` | required | Generate link suggestions for an entity. | | `POST` | `/api/registry/assist/suggest` | required | Generate queueable assist suggestions for a registry entity. | | `POST` | `/api/registry/assist/apply` | required | Apply approved registry assist suggestions. | | `PUT` | `/api/registry/link-suggestions` | required | Accept and persist a suggested link. | | `POST` | `/api/registry/similarity-check` | required | Near-duplicate/collision similarity check. | ## runs (2) | Method | Path | Auth | Description | | ------ | ------------------ | -------- | --------------------------------------------- | | `POST` | `/api/runs/flag` | required | Flag run/step and route remediation feedback. | | `GET` | `/api/runs/status` | required | Run status lookup by run_id. | ## system (3) | Method | Path | Auth | Description | | ------ | ------------------- | -------- | -------------------------------------------------------------------- | | `GET` | `/api/docs/page.md` | required | Curated docs markdown page export for authenticated agent retrieval. | | `GET` | `/api/endpoints` | public | Machine-readable HTTP endpoint inventory. | | `GET` | `/api/openapi.json` | public | OpenAPI 3.1 specification generated from endpoint inventory. | ## tools (2) | Method | Path | Auth | Description | | ------ | --------------------- | -------- | -------------------------------------------- | | `POST` | `/api/tools/describe` | required | Tool metadata/contract description. | | `POST` | `/api/tools/execute` | required | Execute registered tool with trace metadata. | --- ## Source: docs/api/MCP_TOOLS_REFERENCE.md # MCP Tools Reference > **Live counts**: Call `registry.capabilities` for the authoritative tool/resource/prompt > inventory. 
The counts below are a snapshot and may lag behind `lib/mcp/catalog.ts`. The agent-card at `/.well-known/agent-card.json` derives its counts from the live catalog.

Katailyst MCP exposes a hosted-agent surface for external orchestrators:

- registry / discovery / graph / lists
- generic hosted execution via canonical tool refs
- delivery helpers for scheduling and target management
- delivery scheduling helpers for queue, reschedule, cancel, and health workflows

Source contracts:

- `lib/mcp/catalog.ts` (canonical tool/resource/prompt catalog)
- `lib/mcp/tool-definitions.ts`
- `lib/mcp/handlers.ts`

Transport endpoint:

- `/mcp` (Streamable HTTP; `GET|POST|DELETE`)
- Production origin: `https://www.katailyst.com/mcp`

Dashboard MCP playground companion APIs:

- `GET /api/mcp/playground/bootstrap`
- `POST /api/mcp/playground/run`
- `GET|POST /api/mcp/playground/presets`
- `PATCH|DELETE /api/mcp/playground/presets/[id]`
- `GET /api/mcp/playground/history`
- `GET|POST /api/mcp/personal-tokens`
- `DELETE /api/mcp/personal-tokens/[id]`
- `POST /api/mcp/connect-token`
- `GET|POST /api/mcp/oauth-clients`
- `DELETE /api/mcp/oauth-clients/[id]`
- `GET /api/mcp/oauth-authorizations`
- `DELETE /api/mcp/oauth-authorizations/[id]`

Hosted OAuth endpoints:

- `GET /.well-known/oauth-authorization-server`
- `GET /.well-known/oauth-protected-resource`
- `GET /.well-known/oauth-protected-resource/mcp`
- `GET /oauth/authorize`
- `POST /oauth/token`
- `POST /oauth/revoke`

## Tool List (36 tools)

**Discovery & Read (14 tools):**

1. `discover` — semantic + graph + ranking search across 1,500+ building blocks
2. `get_entity` — fetch one entity with full metadata, links, tags, revision
3. `list_entities` — paginated browse with type/status/tag filters (includes total count)
4. `get_skill_content` — load rendered instruction body for a skill or typed entity
5. `search_tags` — taxonomy tag lookup by prefix/namespace
6. `traverse` — walk the graph from a root entity (max_nodes default 35, max 500)
7. `registry.capabilities` — full MCP surface map (supports compact mode)
8. `registry.session` — current session policy, toolsets, and output lanes
9. `registry.agent_context` — one-call discover + graph + docs context packet (supports compact mode)
10. `registry.health` — service health for MCP + registry + integrations
11. `registry.graph.summary` — graph topology overview (global or root-scoped)
12. `incoming.sources` — browse/inspect source packs (action: list, get)
13. `registry.artifact_body` — fetch raw artifact file from an entity revision
14. `guide` — system orientation, toolset/family explanation, task-shaped navigation, use case examples

**Registry Write (6 tools):**

15. `registry.create` — create entity + first revision
16. `registry.update` — update entity metadata (no revision)
17. `registry.add_revision` — create new content revision
18. `registry.link` — create or remove graph links (action: link, unlink)
19. `registry.manage_tags` — add/remove/list tag assignments
20. `eval.refresh_signals` — refresh eval signals for entity refs

**Memory & History (2 tools):**

21. `memory.query` — search shared agent memories
22. `history.query` — audit trail for recent write operations

**Execution (3 tools):**

23. `tool.search` — find executable tools by intent (includes not_executable_reason)
24. `tool.describe` — inspect tool contract, schema, examples before execution
25. `tool.execute` — run a hosted tool through vault/policy/audit pipeline

**Delivery (5 tools):**

26. `delivery.connect_link.create` — start OAuth connect flow for social accounts
27. `delivery.targets.discover` — refresh targets from a connected provider
28. `delivery.targets.list` — list promoted delivery targets
29. `delivery.target.promote` — promote a target to org-wide availability
30. `delivery.schedule` — manage delivery queue (action: create, list, get, reschedule, cancel, stats)

**Marketing Automation (1 tool):**

31. `marketo` — inspect Marketo health, programs, leads, and campaign triggers through the MCP control plane

**Lists (2 tools):**

32. `lists.get` — fetch curated list with items and votes
33. `lists.manage` — create/populate/vote/publish lists (action: create, add_item, vote, publish)

**Quality (3 tools):**

34. `deliberate` — full multi-agent quality review (30-120s, 5 patterns, 3 quality modes; supports `async: true`)
35. `deliberate.quick` — fast single-round review with 30-second ceiling
36. `deliberate.status` — poll an async deliberation run by `run_id`

## Profiles and Toolsets

- Current full catalog: `36` tools
- Current `chat_safe` profile: read-only tools only
- Toolsets: default, authoring, delivery, extended, bootstrap, agent, multimedia, v0, delivery-admin
- `bootstrap` toolset: compact first-glance surface
- `agent` toolset: full registry/control-plane + execution (excludes delivery-admin)
- `delivery-admin` toolset: Pipedream connect and target promotion

Trusted hosted agents should default to the full catalog or explicitly set `agent,delivery`. Bootstrap is the first glance, not the whole system.

The expected hosted-agent flow is:

1. `guide` (system orientation) or `registry.capabilities` (raw map)
2. `registry.agent_context` (rich context packet) or `discover` (ranked search)
3. `get_entity` / `get_skill_content` / `traverse` for details
4. `tool.search` / `tool.describe` / `tool.execute` for external tool execution
5. `deliberate.quick` or `deliberate` for quality review before saving

Provider-family rule:

- connect Katailyst once as the MCP surface
- Firecrawl, Tavily, Brave, Cloudinary, and other providers are exposed as hosted tool families behind Katailyst
- do not add separate provider MCP servers for normal Katailyst operator flows unless you are intentionally working outside the Katailyst control plane

Hosted `/mcp` sanitizes dotted catalog ids for `tools/call`. Common examples:

- `tool.describe` is called as `tool_describe`
- `deliberate.quick` is called as `deliberate_quick`
- `deliberate.status` is called as `deliberate_status`

## Common Behavior

- Input validation is strict (Zod).
- Errors return MCP `isError=true` content payloads.
- Cursor continuation for discovery is deterministic and opaque.
- Discovery supports `response_mode: text | json | compact`.
- Consolidated tools use an `action` enum parameter for sub-operations.
- `compact` response modes on agent_context and capabilities reduce response size by 80-90%.

## Contracts

## `discover` / `registry.search` / `registry.expand`

Purpose: intent-based candidate retrieval from registry.

Input:

- `intent` (string, required)
- `entity_types` (string[], optional; constrained to canonical entity types)
- `tags` (string[], optional)
- `bundles` (string[], optional)
- `agent_ref` (string, optional; proclivity-aware ranking)
- `runtime_lane` (string, optional; compiles runtime-aware capability packets)
- `compatibility_profile_id` (string, optional; carried into recommendation receipts)
- `org_id` (uuid, optional)
- `limit` (int 1..200, optional)
- `cursor` (string, optional)
- `response_mode` (`text` | `json` | `compact`, optional)

Output (structured):

- `intent`, `count`, `total_candidates`
- `next_cursor`, `has_more`, `can_request_more`
- `note`
- `results[]` with:
  - `ref`, `entity_type`, `code`, `name`
  - `summary`, `use_case`
  - `priority_tier`, `relevance_score`
  - `tags[]`, `match_reasons[]`
- `capability_packets[]` with identity, compatibility, ranking receipts, governance hints, and `why_recommended` / `why_not_recommended_here`
  - ranking receipts now include `selection_score`, `runtime_confirmed`, and `requested_tag_overlap[]`
- `recommendation_receipt` with:
  - `runtime_lane`, `compatibility_profile_id`, `agent_ref`
  - `requested_tags[]`, `requested_bundles[]`
  - `suggested_count_range`, `suggested_unit_mix`
  - `top_ref`, `top_score`, `confidence`, `clarification_hints[]`
- `advisory_refs[]` for optional helper entry points that are intentionally kept outside the ranked result set

## `get_entity` / `registry.get`

Purpose: load full entity record with revision/context.

Input:

- `entity_type` (enum, required)
- `code` (string, required)
- `version` (string, optional)

Output:

- canonical entity metadata
- current revision content
- tags/links context (handler-formatted payload)

Notes:

- For KB-like entities, `content_json.kb` is the synchronized revision snapshot; canonical reusable body content lives in `kb_items` + `kb_variants`.
- Artifact bodies remain separate revision-scoped files and are read through `registry.artifact_body`.
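The dotted-id sanitization rule for `tools/call` and the contracts above can be sketched together as a small client-side payload builder. This is a hypothetical helper, assuming a plain JSON-RPC 2.0 client; the example entity type and code are illustrative, not canonical refs.

```typescript
// Sketch: hosted /mcp sanitizes dotted catalog ids for tools/call
// (e.g. `registry.agent_context` is called as `registry_agent_context`).
function sanitizeToolId(catalogId: string): string {
  return catalogId.replace(/\./g, "_");
}

// Build a JSON-RPC tools/call envelope from a catalog id plus arguments.
function toolsCallPayload(catalogId: string, args: Record<string, unknown>, id = 1) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: sanitizeToolId(catalogId), arguments: args },
  };
}

// Illustrative call following the get_entity contract above.
const req = toolsCallPayload("get_entity", {
  entity_type: "skill", // assumed example values
  code: "create-multimedia",
});
```

Undotted tool names like `get_entity` pass through the sanitizer unchanged, so the same builder works for the whole catalog.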
## `list_entities`

Purpose: filtered paginated listing.

Input:

- `entity_type` (optional)
- `status` (optional)
- `tags` (string[], optional)
- `limit` (int 1..100, optional)
- `offset` (int >= 0, optional)

Output:

- list of entities matching filters with normalized fields.

## `get_skill_content`

Purpose: retrieve the rendered instruction body for a skill or other typed registry entity.

Input:

- `code` (string, required)
- `entity_type` (enum, optional; preserves legacy skill-only lookup when omitted)
- `include_artifacts` (boolean, optional)
- `artifact_mode` (`auto | inline | manifest`, optional; defaults to `auto` when artifacts are requested)

Output:

- generated SKILL-style markdown content
- optionally inlined artifact sections for smaller payloads
- or an artifact manifest that points you to `registry.artifact_body` for exact files

Recommended loading ladder:

1. `discover` / `traverse` to get `entity_type` + `code`
2. `get_entity(entity_type, code)` for metadata and artifact paths
3. `get_skill_content(code)` for skills, or `get_skill_content(code, entity_type)` when you need the rendered body of a typed non-skill entity
4. `registry.artifact_body(entity_type, code, path)` for exact artifact files

Artifact loading behavior:

- `artifact_mode: "auto"` keeps small artifact sets inline but switches larger ones to a manifest automatically
- `artifact_mode: "inline"` forces full inline artifact bodies
- `artifact_mode: "manifest"` always returns the manifest/path list

## `search_tags`

Purpose: tag lookup by prefix/namespace.

Input:

- `query` (string, required)
- `namespace` (string, optional)
- `limit` (int 1..200, optional)

Output:

- ranked/filtered tag candidates for UI or query construction.

## `traverse`

Purpose: graph expansion from root typed ref.

Input:

- `ref` (string, required; format `type:code`, no version)
- `link_types` (enum[], optional; constrained to canonical link types)
- `depth` (int 1..5, optional)
- `org_id` (uuid, optional)

Output:

- traversal neighbors with link metadata (type/weight/reason) and entity context.

## `registry.capabilities`

Purpose: provide one-call MCP capability inventory for client bootstrapping.

Input:

- `include_stats` (boolean, optional; default true)
- `profile` (`full` | `chat_safe`, optional; defaults to the authenticated caller profile when omitted)
- `org_id` (uuid, optional)

Output:

- `tools[]` with operation classes (`read | write | dangerous_write_confirmation`)
- `resources[]` and `prompts[]`
- `capability_layers[]` separating vault inventory, registry control plane, and hosted execution families
- defaults (`transport`, `endpoint`, write-policy summary)
- `toolsets[]` with names, descriptions, counts, and member tools
- `bootstrap_contract` describing the intended hosted-agent flow
- `starter_refs[]` with concrete example refs for media, design, copy, registry authoring, and HLT foundation work
- optional aggregate stats (`entities_by_type`, `links_total`, `lists_total`)

## `registry.session`

Purpose: summarize the current MCP session, active profile, and execution surface.

Input:

- `org_id` (uuid, optional)

Output:

- active auth/profile summary
- toolset + scope posture
- execution-org context when available
- current surface guidance for hosted-agent use

## `registry.agent_context`

Purpose: one-call context packet for agents (discover + graph + docs pointers).

Default posture:

- prefer this over raw `discover` when the caller needs a usable working packet, not just a shortlist
- assemble the packet in this order whenever possible:
  1. system baseline
  2. HLT foundation packs
  3. personal overlay when present
  4. work hub(s)
  5. domain, product, or audience pack(s)
  6. examples and artifacts
- if the packet comes back too thin, fall back to `discover` → hub read → `traverse` → manual assembly

See:

- `docs/references/contracts/LAYERED_CONTEXT_PACKET_CONTRACT.md`

Input:

- `intent` (string, required)
- `entity_types` (string[], optional)
- `tags` (string[], optional)
- `bundles` (string[], optional)
- `agent_ref` (string, optional; typed ref like `agent:victoria@v1`)
- `page_type` (string, optional; current surface like `agent_detail`, `editor`, `dashboard`)
- `operator_goal` (string, optional; caller goal for page-aware assistance)
- `runtime_lane` (string, optional)
- `compatibility_profile_id` (string, optional)
- `selected_refs` (typed refs or `{ entity_type, code, version? }[]`, optional)
- `resume_from_run_id` (string, optional)
- `failure_digests` (`{ run_id?, step_id?, failure_class, summary }[]`, optional)
- `limit` (int 1..20, optional)
- `graph_depth` (int 0..3, optional)
- `include_docs` (boolean, optional; default true)
- `org_id` (uuid, optional)

Output:

- `discover[]` candidate menu (same canonical shape as `discover`)
- `capability_packets[]` recommendation packets with identity, compatibility hints, execution hints, and ranking receipts
- `recommendation_receipt` shared Oracle-style receipt for the returned packet set
  - includes `graph_promotions[]` for structural dependencies and `advisory_refs[]` for optional helper refs outside the ranked set
- `graph.nodes[]` + `graph.edges[]` neighborhood for top results
- `docs[]` curated pointers for relevant atomic-unit docs
- `unit_vocabulary[]` compact atomic-unit vocabulary hints for the returned entity types
- `meta` counts plus runtime, page, goal, compatibility, resume, and failure-digest metadata

Interpretation notes:

- work hubs should answer "what kind of work is this?"
- domain/product/audience packs should answer "what world is this for?"
- hubs should not absorb product truth that belongs in packs

## `tool.search`

Purpose: search executable tools by intent before picking or executing one.

Notes:

- hosted provider families like Firecrawl appear here as canonical tool refs
- the normal execution path is `tool.search -> tool.describe -> tool.execute`

Input:

- `query` (string, required)
- `family_id` (string, optional)
- `limit` (int 1..20, optional)
- `org_id` (uuid, optional)

Output:

- ranked tool candidates
- family + operation-class metadata
- scope and auth hints for the execution lane

## `tool.describe`

Purpose: inspect a canonical tool entity before execution.

Input:

## `tool.execute`

Purpose: execute a canonical tool entity through the hosted Katailyst execution substrate.

Input:

- `tool_ref` (string, required; example `tool:publish.email`)
- `input` (arbitrary JSON, optional)
- `approval_override` (boolean, optional)
- `org_id` (uuid, optional)

Output:

- tool ref
- policy decision / reason / risk tier
- executor output payload
- vault-backed auth resolution happens inside the execution substrate, not in repo mirrors

## `guide`

Purpose: system orientation tool. Explains how to use the system, provides task-shaped navigation, and describes toolsets and tool families. Call with no args for the full overview including sub-agent patterns and use case examples.

Input:

- `use_case` (string, optional; task-shaped navigation help)
- `toolset_name` (string, optional; inspect a specific toolset)
- `family_id` (string, optional; inspect a specific tool family)

Output:

- When no args: full system overview with usage patterns, sub-agent dispatch guidance, example use cases, toolset index, and tool family readiness
- When use_case: specific tools, toolsets, and resources relevant to that task
- When toolset_name or family_id: detailed inspection of that surface

## `delivery.connect_link.create`

Purpose: create a hosted Pipedream connect link for a delivery provider app.

Input:

- `provider_app_slug` (string, required)
- `return_path` (string, optional)
- `app_id` (uuid, optional)
- `target_kind` (string, optional)
- `org_id` (uuid, optional)

Output:

- provider metadata
- template key
- session id
- hosted connect URL / callback metadata

## `delivery.targets.discover`

Purpose: refresh discoverable delivery targets from a private hosted Pipedream connection and sync them into canonical target tables.

Input:

- `connection_id` (uuid, required)
- `org_id` (uuid, optional)

Output:

- connection summary
- provider/provider app summary
- discovered targets
- canonical sync metadata

## `delivery.targets.list`

Purpose: list already-promoted delivery targets from canonical truth.

Input:

- `app_id` (uuid, optional)
- `platform` (string, optional)
- `provider` (string, optional; defaults to `pipedream`)
- `org_id` (uuid, optional)

Output:

- promoted canonical targets from `app_external_ids`
- app/platform/provider summaries
- target counts

## `delivery.target.promote`

Purpose: promote one discovered delivery target into canonical org-shared truth.

Input:

- `connection_id` (uuid, required)
- `target_external_id` (string, required)
- `display_name` (string, optional)
- `target_kind` (string, optional)
- `app_id` (uuid, optional)
- `org_id` (uuid, optional)

Output:

- promoted target metadata
- resulting app/org binding
- refreshed promoted-target list

## `delivery.schedule`

Purpose: manage the content delivery queue — schedule, inspect, reschedule, cancel, or monitor deliveries.
Input: - `action` (enum: `create`, `list`, `get`, `reschedule`, `cancel`, `stats` — required) - `content_id`, `content_type`, `content_preview`, `target_id`, `scheduled_at` (required for create) - `delivery_id` (required for get, reschedule, cancel) - `new_scheduled_at` (required for reschedule) - `from`, `to`, `status[]`, `platform[]`, `limit`, `offset` (optional filters for list) - `variant_id`, `variant_label`, `factory_run_id` (optional for create/list) - `org_id` (uuid, optional) Output varies by action — delivery metadata, queue rows, or health metrics. ## `registry.health` Purpose: one-call service health for MCP + registry + integration readiness. Input: - `org_id` (uuid, optional) - `include_integrations` (boolean, optional; default `true`) Output: - `mcp` inventory counters (tools/resources/prompts) - `registry` aggregate counters (`entities_total`, `links_total`, `lists_total`) - optional `integrations` payload: - manifest count/errors - table diagnostics - per-provider readiness entries and status summary ## `registry.graph.summary` Purpose: fast graph overview for either global registry topology or a root ref neighborhood. Input: - `root_ref` (string, optional) - `depth` (int 1..3, optional) - `link_types` (enum[], optional) - `org_id` (uuid, optional) - `limit` (int 1..100, optional) Output: - if `root_ref` provided: - `nodes[]`, `edges[]`, `link_type_counts`, truncation metadata - if `root_ref` omitted: - top-degree `nodes[]` summary with `in_degree`, `out_degree`, `degree` ## `incoming.sources` Purpose: browse or inspect source packs queued for registry ingestion. Input: - `action` (enum: `list`, `get` — required) - `source_slug` (string, required for get) - `org_id` (uuid, optional) - `status`, `pipeline_kind`, `limit`, `offset` (optional filters for list) - `include_items`, `include_runs`, `limit_runs` (optional for get) Output varies by action — source pack list or detailed source pack with items and runs. 
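The per-action required fields for `delivery.schedule` listed above can be encoded as a small client-side guard. A minimal sketch: `REQUIRED_BY_ACTION` and `missingFields` are illustrative helper names, not part of the hosted MCP surface.

```typescript
// Per-action required fields for `delivery.schedule`, as listed in the spec.
// Helper names here are illustrative, not part of the real API.
type ScheduleAction = "create" | "list" | "get" | "reschedule" | "cancel" | "stats";

const REQUIRED_BY_ACTION: Record<ScheduleAction, string[]> = {
  create: ["content_id", "content_type", "content_preview", "target_id", "scheduled_at"],
  list: [], // filters only; all optional
  get: ["delivery_id"],
  reschedule: ["delivery_id", "new_scheduled_at"],
  cancel: ["delivery_id"],
  stats: [],
};

// Returns the required keys missing from a candidate tool input.
function missingFields(
  input: { action: ScheduleAction } & Record<string, unknown>
): string[] {
  return REQUIRED_BY_ACTION[input.action].filter((key) => input[key] === undefined);
}
```

Checking inputs like this before dispatching the tool call avoids a round trip on obviously incomplete payloads.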
## `registry.artifact_body` Purpose: fetch raw body for a single artifact path on an entity revision. Input: - `entity_type` (enum, required) - `code` (string, required) - `path` (string, required; artifact relative path) - `version` (string, optional) - `org_id` (uuid, optional) Output: - artifact metadata (`artifact_type`, `mime`, `content_hash`, payload type) - full artifact body text (payload store-backed, with revision fallback) Notes: - This reads revision-scoped artifact bodies, not graph links. - For KB-like entities, canonical reusable body content lives in `kb_items` + `kb_variants`; revision JSON only keeps the synchronized snapshot. ## `registry.create` Purpose: create a new registry entity and initial revision in one call. Input: - `entity_type` (enum, required) - `code` (slug, required) - `name` (string, required) - `content_json` (object, optional) - `summary`, `status`, `version`, `use_case`, `priority_tier`, `author` (optional) - `org_id` (uuid, optional) - governance fields: - `actor_label`, `actor_user_id` (optional) - `request_id` (optional, strongly recommended) - `confirm_id`, `confirm_token` (required only for dangerous-write replay flows) Output: - `ok`, `entity_ref`, `entity_id`, `revision` metadata - first-pass dangerous writes can return `confirm_required=true` + challenge payload ## `registry.update` Purpose: update entity metadata fields (no revision creation). Input: - `entity_type`, `code` (required identity pair) - mutable fields: `name`, `summary`, `status`, `use_case`, `priority_tier`, `author`, `source`, `source_url` - `org_id` (uuid, optional) - governance fields (`actor_*`, `request_id`, `confirm_*`) Output: - `ok`, `entity_ref`, and changed-field summary - dangerous transitions can return challenge envelope (`confirm_required=true`) ## `registry.add_revision` Purpose: create a new content revision and set as current revision. 
Input: - `entity_type`, `code` (required) - `content_json` (object, required) - `change_notes` (optional) - `org_id` (uuid, optional) - governance fields (`actor_*`, `request_id`, `confirm_*`) Output: - `ok`, `entity_ref`, `revision` (`number`, `id`) - dangerous writes can require challenge replay with `confirm_id` + `confirm_token` ## `registry.link` Purpose: create or remove directed typed links between registry refs (idempotent). Input: - `action` (enum: `link`, `unlink` — optional, defaults to `link`) - `from_ref`, `to_ref` (typed refs, required) - `link_type` (enum, required) - `weight` (0..1, optional; for link action) - `reason` (optional but recommended) - `org_id` (uuid, optional) - governance fields (`actor_*`, `request_id`, `confirm_*`) Output: - `ok`, link metadata, and idempotent no-op indication when link already exists (or was already absent for unlink) ## `registry.manage_tags` Purpose: add, remove, or list tag assignments on a registry entity without mutating taxonomy canon. Input: - `entity_ref` (typed ref, required) - `action` (`add` | `remove` | `list`, required) - `tags` (`namespace:code` strings, required for `add` / `remove`) - `org_id` (uuid, optional) - governance fields (`actor_*`, `request_id`, `confirm_*`) Output: - assignment result summary - structured payload with `added`, `removed`, `already_absent`, `invalid`, `missing_namespace`, and `missing_tag` - `list` mode returns the entity's current tags ## `lists.get` Purpose: fetch list, items, and voting summary. Input: - `list_id` (uuid, required) Output: - list core fields (`title`, `scope`, `status`, `owner`) - `items[]` and vote aggregates ## `lists.manage` Purpose: create, populate, vote on, or publish curated lists. 
Input: - `action` (enum: `create`, `add_item`, `vote`, `publish` — required) - `list_id` (uuid, required for add_item, vote, publish) - `org_id`, `title`, `description`, `scope`, `status` (for create/publish) - `entity_ref`, `note`, `sort_order` (for add_item) - `voter_user_id`, `vote` (for vote) - governance fields (`actor_*`, `request_id`, `confirm_*`) Output varies by action — list metadata, item payload, vote summary, or publication status. ## Decision Guide - Don't know where to start: `guide` (no args for full overview, or pass use_case for task-shaped navigation) - Need best candidates by intent: `discover` (write a rich intent -- 2+ sentences with audience, topic, goal) - Need more pages of same search: `discover` with `cursor` from prior response - Need one exact unit with full context: `get_entity` - Need full runnable skill payload: `get_skill_content` - Need related units around a known ref: `traverse` (hubs are the highest-value targets) - Need a dense context packet for agent planning/execution: `registry.agent_context` (with compact mode for smaller responses) - Need bootstrapping metadata for an MCP client/session: `registry.capabilities` (with compact mode) - Need live service readiness for registry + integrations: `registry.health` - Need graph topology summary (global or root-scoped): `registry.graph.summary` - Need incoming queue inventory and lineage: `incoming.sources` (action: list or get) - Need an exact artifact file body for an entity: `registry.artifact_body` - Need taxonomy suggestions while composing query/UI: `search_tags` - Need to create or revise registry entities: `registry.create`, `registry.update`, `registry.add_revision`, `registry.link` - Need to manage graph links (create or remove): `registry.link` (action: link or unlink) - Need collaborative/voting list workflows: `lists.manage` (action: create, add_item, vote, publish) - Need to schedule/manage content deliveries: `delivery.schedule` (action: create, list, get, 
reschedule, cancel, stats) - Need quality review before saving: `deliberate.quick` (fast, 30s) or `deliberate` (deep, 30-120s) - Need external tool execution (web search, image gen, etc): `tool.search` -> `tool.describe` -> `tool.execute` - Need to recall past learnings: `memory.query` - Need audit trail for recent changes: `history.query` ## Playground Notes - The playground routes are dashboard helpers, not a second MCP server. - They call the same `/mcp` transport via MCP SDK client transport for parity. - Playground defaults to `profile=chat_safe` for both bootstrap and run. - `profile=full` is allowed only for org `owner`/`admin` roles. - Presets/history persistence is user + org scoped: - `public.mcp_playground_presets` - `public.mcp_playground_history` - In `chat_safe` profile, non-chat-safe tools are rejected before execution. - In `full` profile, dangerous writes still require confirmation token replay (`confirm_id`, `confirm_token`) from the first response envelope. ## MCP Route Auth Notes - `/mcp` is auth-required by default. - Anonymous mode is disabled unless explicitly enabled via `MCP_ALLOW_ANONYMOUS=true`. 
- Supported auth modes: - Static bearer token (`MCP_AUTH_TOKEN` or `MCP_AUTH_TOKENS`) - Per-user personal MCP token - Short-lived connect token - OAuth access token (authorization-code + PKCE) - Supabase session cookie - Route-level capability policy at `/mcp` is now two-dimensional: - auth determines the allowed profile (`personal_token` / `connect_token` can be `full` or `chat_safe` depending on the issued token row or claim; static env bearer stays `chat_safe`) - toolset selection determines the visible branch (`bootstrap`, `agent`, `delivery-admin`, `multimedia`, `v0`, or the default catalog) - OAuth tokens add a scope ceiling on top of profile + toolset filtering; visible tools/resources/prompts are the intersection of granted scopes and requested toolset - `registry.capabilities` follows the authenticated caller profile: - omitted `profile` resolves to the caller's allowed profile - trusted hosted agents should default to the full catalog or explicitly set `x-katailyst-toolset: agent,delivery` - use `bootstrap` only for intentionally narrow first-glance or read-first sessions - Team rollout guidance: - prefer OAuth for signed-in human setup flows and remote public clients - prefer per-user personal tokens for long-lived remote client access - use connect tokens for temporary handoff and validation - reserve static env bearer tokens for controlled infrastructure cases where `chat_safe` is acceptable - Response headers include: - `x-katailyst-mcp-auth` - `x-katailyst-mcp-profile` - `x-katailyst-mcp-scopes` (OAuth mode only) - `x-request-id` ## Hosted OAuth Scope Families - `meta.read`, `discovery.read`, `registry.read`, `registry.write` - `skills.read`, `taxonomy.read`, `graph.read`, `incoming.read` - `lists.read`, `lists.write`, `lists.publish` - `execution.search`, `execution.describe`, `execution.execute` - `delivery.connect`, `delivery.targets.read`, `delivery.targets.promote` - `delivery.schedule.read`, `delivery.schedule.write` - `resources.read`, 
`prompts.read` - Broad aliases are also accepted for trusted clients: `registry.*`, `lists.*`, `execution.*`, `delivery.*`, `read.*`, `write.*`, `admin.*` ## Resources and Prompts Key resources: - `katailyst://docs/interop/registry-api-contract` - `katailyst://docs/interop/orchestrator-workflow` - `katailyst://docs/references/discovery-rerank` - `katailyst://docs/atomic-units/readme` - `katailyst://integrations/catalog` (manifest-generated integration catalog) Key prompts (5): - `registry-select-from-menu` — choose the best match from discovery results - `registry-refine-discovery` — improve a failed or weak search - `registry-integration-onboarding` — plan an integration rollout - `registry-agent-bootstrap` — build agent context packet parameters - `registry-expand-user-request` — enrich thin user input for richer discovery ## `deliberate` Purpose: full multi-agent quality review pipeline. 30-120 seconds. 5 patterns (committee, adversarial-debate, red-team, panel, progressive-critique). 3 quality modes (standard, aggressive, maximum). Returns scored artifact with critique trail. When `async: true`, it returns immediately with a `run_id` and polling metadata instead of waiting for completion. 
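The async contract described above (immediate `run_id`, then poll) can be sketched as a generic poll loop. A minimal sketch: `callTool` stands in for whatever MCP client invocation the host provides (it is not a real SDK function), and the status values mirror the `deliberate.status` states documented below.

```typescript
// Generic poller for deliberate(async=true) runs. `callTool` is a stand-in
// for the host's MCP client call, not a real SDK function.
type RunStatus = "pending" | "running" | "completed" | "failed";

interface StatusPayload {
  status: RunStatus;
  result?: unknown; // stored deliberation result when completed
  error?: string;   // captured error message when failed
}

async function pollDeliberation(
  callTool: (name: string, args: Record<string, unknown>) => Promise<StatusPayload>,
  runId: string,
  intervalMs = 5_000,
  maxAttempts = 24 // ~2 minutes at 5s, matching the 30-120s pipeline ceiling
): Promise<StatusPayload> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const payload = await callTool("deliberate.status", { run_id: runId });
    if (payload.status === "completed" || payload.status === "failed") return payload;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`deliberation ${runId} still running after ${maxAttempts} polls`);
}
```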
Input: - `intent` (enum: skill-creation, style-creation, content-strategy, graph-ontology, brand-planning, general — required) - `question` (string, required; what to build or review) - `pattern` (enum, optional; defaults to auto-recommended) - `quality_mode` (enum, optional; standard/aggressive/maximum) - `max_rounds` (int 1-10, optional) - `quality_threshold` (0-100, optional) - `seed_entity_refs` (string[], optional; entities to seed as context) - `org_id` (uuid, optional) - `async` (boolean, optional; when true, fire-and-forget and poll with `deliberate_status` on hosted MCP or `deliberate.status` on unsanitized transports) Output: - sync mode: run ID, status, pattern, quality score - per-round summaries with phase records - final artifact preview - improvement areas - advisory playbook draft - async mode: `run_id`, `status`, `poll_tool`, `poll_catalog_name`, `poll_args` ## `deliberate.quick` Purpose: fast single-round quality review with 30-second ceiling. Same pipeline as deliberate but limited to 1 round, standard quality, no cross-pattern. Use from Claude Code, Cowork, or any MCP client with short timeouts. Input: - `intent` (enum, required) - `question` (string, required) - `pattern` (enum, optional) - `seed_entity_refs` (string[], optional) - `org_id` (uuid, optional) Output: same shape as deliberate, or a timeout message if 30 seconds is exceeded. ## `deliberate.status` Purpose: poll an async deliberation run created by `deliberate(async=true)`. Use this when the initial deliberation request returned immediately with a `run_id`. Input: - `run_id` (uuid, required) Output: - `running` / `pending`: progress text and elapsed time - `completed`: the stored deliberation result payload - `failed`: the error message captured for the background run --- ## Source: docs/atomic-unit-quality.md # Atomic Unit Quality: Design Document **Date:** 2026-03-18 | **Updated:** 2026-03-26 **Status:** Active problem analysis. 
Some mitigations implemented (creation fallback warnings, commitRegistryEntityCreate canonical path). Core bypass problems remain. --- ## The Problem (What Actually Goes Wrong) The system has sophisticated creation infrastructure (Capability Forge with 4 modes, Factory Wizard with 4-step pipeline, Interview with 7 fields, Enrichment with AI-suggested tags/links, Validation with errors/warnings, 942-line skill import agent, DraftQualityScore with 8 criteria). **The problem is not missing infrastructure — it's that creators bypass it.** **How things go wrong:** 1. **MCP bypass.** An agent calls `registry.create` directly with minimal fields. No interview, no enrichment, no validation step. The entity gets saved with zero tags, no links, no artifacts, no summary. The UI creation flow (Forge → Wizard → Enrichment → Validation) exists but the MCP path doesn't enforce it. 2. **Batch scripting.** An agent batch-imports 50 items without reading any of them. The skill-import-agent (942 lines of validation) exists but scripts bypass it by calling the DB directly. 3. **Minimal enrichment.** The Factory Wizard suggests tags and links with confidence scoring, but the agent auto-applies high-confidence suggestions without reviewing, and skips medium/low ones entirely. Result: partial tag coverage, missing links. 4. **No follow-through on artifacts.** After creating a skill, nobody goes back to add the distilled/snippet/full variants that make it actually useful for different contexts. KBs especially need variants. 5. **No pre-creation research.** The biggest quality driver — looking at what already exists, finding the best examples, searching trending skills on skills.sh — gets skipped because there's no structured prompt for it in the creation flow. 
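The MCP bypass in point 1 is easiest to see side by side. A minimal sketch, assuming illustrative shapes: the field names follow the `registry.create` spec elsewhere in this corpus, but `CreatePayload` and `qualityGaps` are hypothetical helpers, not real APIs.

```typescript
// A bare MCP create vs. a selection-ready one. Both succeed, but the bare
// call yields an entity with no summary and no content body. Shapes are
// illustrative, not the real schema.
interface CreatePayload {
  entity_type: string;
  code: string;
  name: string;
  summary?: string;
  content_json?: Record<string, unknown>;
}

const bare: CreatePayload = {
  entity_type: "skill",
  code: "linkedin-posts",
  name: "LinkedIn Posts",
};

const selectionReady: CreatePayload = {
  ...bare,
  summary: "Drafts LinkedIn posts for healthcare professionals.",
  content_json: { instruction_body: "Step-by-step methodology for drafting posts." },
};

// Illustrative check mirroring the warnings the design below wants to surface.
function qualityGaps(p: CreatePayload): string[] {
  const gaps: string[] = [];
  if (!p.summary) gaps.push("No summary");
  if (!p.content_json) gaps.push("No content body");
  return gaps;
}
```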

---

## What Already Exists (Don't Rebuild This)

| Capability | Where | Status |
| --- | --- | --- |
| 4-mode creation (Drop, Interview, Mine, Browse) | Capability Forge page | Working |
| 7-field interview with auto-classification | `CreateFromIdeaModal` | Working |
| Factory Wizard 4-step pipeline | `FactoryWizard` component | Working |
| AI enrichment with confidence scoring | `FactoryEnrichmentReview` | Working |
| Validation with errors + warnings | `FactoryValidation` | Working |
| Skill import agent with 5 issue types | `lib/import/skill-import-agent.ts` | Working (CLI) |
| DraftQualityScore with 8 criteria | `DraftQualityScore` component | Working (sidebar) |
| Similarity/dedup check | `lib/registry/write/similarity-check.ts` | Working (on save) |
| Decision Matrix for type classification | `docs/atomic-units/DECISION_MATRIX.md` | Exists as doc |
| Per-type artifact/link requirements | `docs/atomic-units/*.md` | Exists as docs |
| Save bar with warnings | `EntityFormLinks` (just added) | Working |

---

## Design: Three Interventions

### Intervention 1: MCP `registry.create` Response Enrichment

**Problem:** MCP create returns success/failure with no quality feedback. The agent has no idea it just saved something incomplete.

**Solution:** After a successful `registry.create` via MCP, include a `quality_report` in the response:

```typescript
{
  ok: true,
  entity_ref: "skill:my-new-skill@v1",
  quality_report: {
    score: 42,
    pass_threshold: 75,
    warnings: [
      "No tags — entity won't appear in filtered discovery",
      "No summary — browse cards and search results will be blank",
      "No links — entity is isolated in the graph, won't be found via graph traversal",
      "No artifacts — skill has no instruction body or examples"
    ],
    suggestions: [
      "Add tags from these namespaces: domain, action, audience, surface",
      "Add a summary (1-2 sentences describing what this does)",
      "Link to related KBs using 'uses_kb' link type",
      "Add an instruction body artifact with step-by-step methodology"
    ],
    similar_entities: [
      { ref: "skill:existing-research@v1", score: 0.87, reason: "Similar name and domain" }
    ]
  }
}
```

This gives the agent immediate, actionable feedback without blocking the save. The agent can then follow up with `registry.manage_tags`, `registry.link`, etc.

**Implementation:** Add a post-save quality check function in `lib/registry/write/entity-lifecycle.ts` that runs the same checks as `DraftQualityScore` plus tag coverage, link count, and artifact presence. Include the result in the ActionState response. The MCP handler for `registry.create` already returns the ActionState — it just needs to include this new field.

### Intervention 2: Pre-Creation Research Prompt

**Problem:** The biggest quality driver is research BEFORE creating — looking at what exists, finding the best examples, understanding where the new entity fits in the graph. But neither the UI nor the MCP flow prompts for this.

**Solution:** Add a `registry.prepare_creation` MCP tool (or enhance `registry.agent_context` when called with a creation intent) that returns: 1. **Existing similar entities** — "Before creating a 'marketing copywriting' skill, here are 5 similar skills already in the registry..." 2.
**Type classification guidance** — "Based on your description, this is most likely a Skill (procedural how-to) rather than a KB (reference material). Here's why..." 3. **Required components for this type** — "Skills need: activation condition, instruction body, ≥3 tags from domain/action/surface namespaces, ≥1 link to a related KB" 4. **Quality examples** — "Here are 2 top-tier (tier 1-3) skills in this domain that you can use as reference..." 5. **Community scan suggestion** — "Search skills.sh for 'copywriting' to find trending community skills you can adapt" For the UI: The Interview mode already has the 7-field questionnaire. Add a "Pre-flight check" step BEFORE the interview that does the similarity search and shows existing entities. This could be as simple as: when the user types their idea in the first field, debounce-search the registry and show "X similar entities already exist — review them first?" **Implementation:** This is primarily a skill/prompt improvement, not a UI change. The `registry.agent_context` tool already exists and could return creation guidance when called with an appropriate intent. The Forge's Interview mode could add a client-side similarity search on the first field's blur event. ### Intervention 3: Per-Type Completeness Checklist **Problem:** Each entity type has different requirements (skills need activation + instruction body + artifacts; KBs need variants; tools need execution specs) but these requirements aren't surfaced during creation or visible during editing. 
**Solution:** Add a `completeness_checklist` to the entity detail page sidebar (next to the existing DraftQualityScore) that shows type-specific requirements:

**For Skills:**

- [ ] Activation condition set
- [ ] Instruction body > 100 chars
- [ ] ≥ 3 tags from required namespaces
- [ ] ≥ 1 link to a KB or style
- [ ] At least 1 artifact (instruction body markdown)
- [ ] Summary filled (for browse cards)
- [ ] Use case filled (for discovery ranking)

**For KBs:**

- [ ] At least 1 artifact variant (full, distilled, or snippet)
- [ ] ≥ 3 tags from required namespaces
- [ ] ≥ 1 link to a related skill or hub
- [ ] Summary filled
- [ ] Content length > 200 chars (not a stub)

**For Tools:**

- [ ] Provider specified
- [ ] At least 1 tag from tool_type namespace
- [ ] Connection/execution documentation present

Each unchecked item is a warning (amber), not a blocker. When all items are checked, the entity shows a green "Complete" indicator. This makes it immediately obvious what's missing.

**Implementation:** Create an `EntityCompletenessChecklist` component that takes the entity data and returns a checklist based on entity type. Render it in the entity detail page sidebar. Also include the checklist summary in the MCP `quality_report` response.

---

## What NOT to Do

1. **Don't add more hard gates on save.** The existing tag validation (rejects unknown tags) and slug validation are sufficient gates. Everything else should be warnings.
2. **Don't rebuild the Forge/Wizard.** The existing 4-mode creation flow with AI enrichment and validation is sophisticated. Improve it, don't replace it.
3. **Don't force all MCP creates through the Wizard.** Agents need to be able to create entities quickly. The fix is better feedback after create, not mandatory multi-step flows.
4. **Don't batch anything.** The skill-import-agent exists for one-at-a-time validation. Any new import flow should use it, not bypass it. 5.
**Don't over-specify requirements.** Some entities legitimately don't need artifacts (e.g., a simple metric entity). Requirements should be type-specific and progressive, not universal. --- ## Execution Order 1. **EntityCompletenessChecklist component** (Intervention 3) — Highest leverage because it's visible on every entity detail page and immediately shows what's missing. Purely additive, no changes to existing flows. 2. **Pre-creation similarity search in Interview mode** (Intervention 2, partial) — Add debounced search on the first interview field. Low risk, high value for preventing duplicates. 3. **MCP quality_report in create response** (Intervention 1) — Requires touching the MCP handler and entity-lifecycle. Medium risk, high value for agent-created entities. 4. **Registry.prepare_creation tool or enhanced agent_context** (Intervention 2, full) — New MCP tool. Medium effort, high value for guiding agents through quality creation. --- ## Relationship to Existing Skills The system already has a `skill-creator` skill (from Anthropic's ecosystem, modified for Katailyst). This skill should be updated to: 1. Always search the registry before creating (call `discover` first) 2. Always check skills.sh for trending alternatives 3. Always fill required tag namespaces 4. Always create at least one artifact 5. Reference the Decision Matrix when classifying type A separate "decide which atomic unit type" skill could help with classification, but the Interview mode's auto-classification (entity_type: null triggers auto) already does this via the autopilot. The skill would mainly help MCP/IDE agents that bypass the Interview mode. --- ## Testing Strategy **Eval case 1:** "Create a skill for writing LinkedIn posts for healthcare professionals." Does the flow: - Search for existing social media / LinkedIn / healthcare skills first? - Classify correctly as a Skill (not KB or Prompt)? - Include activation condition, instruction body, and ≥3 tags? 
- Link to relevant KBs and styles? - Produce a quality score ≥ 75? **Eval case 2:** "Import this community skill from skills.sh and adapt it for HLT." Does the flow: - Run through the skill-import-agent validation? - Check for duplicates? - Add HLT-specific tags (domain, audience, app)? - Preserve the original source attribution? - Create artifacts? **Eval case 3:** "I have information about our NP certification pricing. Where should this go?" Does the flow: - Classify as KB (reference material, not procedural)? - Search for existing NP / certification KBs? - Create with proper variants (at least full)? - Link to the NP content hub? --- ## Source: docs/atomic-units/ACTIONS.md # Actions (Launchpad Cards) Actions are the **curated push-button entry points** (cards + command palette) that make the CMS feel AI-native. ## What It Is (Canonical Model) - **Not** a new registry `entity_type` (v1). - An **Action is a featured `playbook`**: - stored as `registry_entities.entity_type = 'playbook'` - surfaced as an Action via: - `surface:cms-launchpad` - and/or membership in `bundle:launchpad-core@v1` (recommended for stable curation) This keeps the ontology stable while making Actions first-class in the UX. 
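The canonical model above can be sketched as data. A minimal sketch, assuming illustrative shapes: `EntitySketch` and `isAction` are hypothetical and simply mirror the stated rules (playbook entity type, plus the `surface:cms-launchpad` tag and/or membership in `bundle:launchpad-core@v1`).

```typescript
// Whether an entity surfaces as a Launchpad Action under the v1 model:
// it must be a playbook, and carry the `surface:cms-launchpad` tag and/or
// belong to `bundle:launchpad-core@v1`. Shapes are illustrative.
interface EntitySketch {
  entity_type: string;
  tags: string[];
  bundle_memberships: string[]; // refs of bundles this entity belongs to
}

function isAction(e: EntitySketch): boolean {
  if (e.entity_type !== "playbook") return false;
  return (
    e.tags.includes("surface:cms-launchpad") ||
    e.bundle_memberships.includes("bundle:launchpad-core@v1")
  );
}
```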
## When to Use

- you want a stable, low-cognitive-load “do the thing” entry point
- you want repeatable workflows with run traces and replay
- you want curated defaults (scope, tools, templates) without forcing rigid orchestration

## Required (DB)

- `registry_entities` row (selection-ready):
  - `name`, `summary`, `use_case`
  - `status` should be at least `curated` (prefer `published` for default Launchpad)
- `playbooks` extension row
- at least 1 `entity_revisions` row (Actions must not have NULL `current_revision_id`)

## Revision Content Contract (Recommended Fields)

Store in `entity_revisions.content_json` (not in the extension row):

```json
{
  "north_star": "What this action accomplishes in one sentence",
  "inputs": { "shape": "object (describe)" },
  "outputs": { "shape": "object (describe)" },
  "mutation_policy": "read_only|diff_first|write_embeddings_only|...",
  "steps_overview": ["Short list of phases the playbook follows"],
  "notes": "Optional operator notes"
}
```

## Tag Coverage (Minimum)

- `surface:cms-launchpad` (required)
- `scope:*` (required)
- `status:*` (required)
- `family:*` (required; used for Launchpad grouping)
- `domain:*` (recommended)
- `action:*` (recommended)
- `stage:*` (recommended)
- `modality:*` (recommended)

## Links

- Use `uses_tool`, `uses_prompt`, `uses_kb`, `requires` links to declare dependencies.
- Ensure bundle curation uses `bundle_member` with direction **member → bundle**.

## Testing (Strongly Recommended)

At least one of:

- regression fixtures that can be executed in Test Lab
- example run traces captured as artifacts (so Action behavior is inspectable)

Actions without an inspectable example should not be treated as Tier 1.

---

## Source: docs/atomic-units/AGENT_DOCS.md

# Agent Doc (Rules + Expectations)

Agent docs are **runtime steering documents** for AI agents. They define identity, operating principles, standing instructions, and behavioral contracts that shape how an agent thinks, speaks, and makes decisions.
## When to Use - when an agent needs **identity context** (SOUL, personality, voice) - when defining **operating principles** or SOPs for a specific agent - when encoding **lessons learned** or behavioral calibrations - when specifying **standing instructions** that persist across sessions ## What Agent Doc Is (and Is Not) - Agent doc is identity and behavioral context, not general knowledge. - Agent doc shapes the agent's persona and decision framework. - KB explains domain concepts; agent doc explains how this specific agent should behave. - Agent doc is runtime-relevant -- it should be loaded into agent context at boot. ### Agent Doc vs Nearby Unit Types - `agent_doc` vs `kb` - agent_doc = agent-specific identity, SOPs, operating principles - kb = domain knowledge, reference material, best practices - `agent_doc` vs `prompt` - agent_doc = persistent behavioral context - prompt = executable instruction for a specific task - `agent_doc` vs `agent` - agent = the entity record (persona role, runner mode, metadata) - agent_doc = the steering documents that define how that agent operates If the content reads like general domain knowledge rather than agent-specific behavior, it should be `kb`. If the content is an operational log or daily record, it should be `operational_log`. ## What belongs here vs nearby lanes Use `agent_doc` for: - identity docs - runtime steering files - stable operating instructions for one agent or runtime family - persona-specific behavioral overlays Do not use `agent_doc` for: - field/domain expertise → `kb` - product truth or feature docs → `kb` - raw run/session chronology → `operational_log` - stable user profile or memory that is not agent-specific → `kb` or memory layer Short version: - **agent_doc** = “how this agent should behave” - **kb** = “what is true in the world” - **operational_log** = “what happened” ## Discovery Behavior Agent docs are **hidden from generic discovery** by default. 
They appear only when:

- explicitly requested via `entity_types: ['agent_doc']`
- discovered via graph expansion from a linked agent entity
- included in a bundle that explicitly lists them

This prevents agent steering docs from cluttering general capability discovery.

## Naming Convention

Agent doc codes follow the pattern:

- `{agent}-identity-{file}` (e.g., `victoria-identity-soul`, `julius-identity-tools`)
- `{agent}-core-directives` (e.g., `lila-core-directives`)
- `{agent}-operating-principles` (e.g., `victoria-operating-principles`)
- `agent-sop-{agent}` (e.g., `agent-sop-victoria`)
- `{scope}-agent-{topic}` (e.g., `global-agent-principles`, `agent-foundation-spec`)

## Status Lifecycle

Agent docs follow the standard entity status lifecycle: `staged -> curated -> published -> deprecated -> archived`

Active agent steering docs should be `published`. Historical or superseded docs should be `deprecated` or `archived`.

## Linking

Agent docs should be linked to their parent agent entity via `related`, `requires`, or another explicit graph relation that matches the dependency semantics. Avoid `uses_kb` for agent-doc links now that the runtime steering lane is no longer modeled as KB.

---

## Source: docs/atomic-units/AGENTS.md

# Agents (Rules + Expectations)

Agents are **personas + preferences**, not capabilities. Canonical rules live in `docs/RULES.md`. Do not duplicate governance elsewhere.

## When to Use

- defining a reusable voice or role profile
- attaching defaults (temperature, model) to a persona
- layering proclivities (prefer/avoid) without changing global links

## Agent vs Nearby Unit Types

- `agent` vs `skill`
  - agent = persona, proclivities, selection bias, and operating defaults
  - skill = reusable capability or method
- `agent` vs `playbook`
  - agent = who is acting
  - playbook = a reusable ordered workflow the agent may use, adapt, or ignore when it helps

HLT fleet agents should stay thin on capability truth.
Their power should come from linked graph context, good docs, and agent judgment, not from hiding all behavior inside persona docs. ## Current Org Placement For HLT Fleet Docs - `hlt` is the active operating layer for live HLT fleet-facing docs and HLT-specific doctrine. - `system` is the shared canonical library/template layer that HLT still uses freely. - `system` is not an exclusion fence. Shared flagship hubs can stay there and still be first-class for HLT. - HLT front-door SOPs, identity overlays, and helper support surfaces should resolve to `hlt`. - Org placement is canonical in Supabase. Mirror paths are portability surfaces, not the deciding truth. ## Shared Base, Thin Overlays For the HLT core hosted agents, optimize for one strong shared base plus thin overlays. That means: - shared operating stack across Victoria, Julius, and Lila - shared lessons surface as mandatory context - thin role-specific overlays and proclivities - authority split based on focus and canon quality, not on generic “safety” reduction All three agents should retain strong overlapping capability: - write - research - plan - use multimedia - discover and draft registry blocks The overlay should change what they pull first and what they own by default, not whether they are capable in the first place. ## Shared Routing and Read Order For the core hosted agents, the shared operating stack should behave like this: 1. load the shared base 2. load the agent front-door SOP 3. load the always-on owner core card 4. interpret the underlying request 5. estimate complexity, stakes, and delivery mode 6. for medium/large or unclear work: self-discover and equip, or ask Victoria for a deeper pass before acting 7. plan before executing non-trivial work, but only as much as the situation deserves 8. execute, judge, save outputs, and capture lessons That routing spine belongs in the shared base. 
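The eight-step spine above can be sketched as ordered data with its one conditional branch made explicit. This is a minimal illustration only; `ROUTING_SPINE`, `Complexity`, and `needsDeeperPass` are hypothetical names, not real runtime exports.

```typescript
// Illustrative sketch of the shared routing spine as data, not a command sequence.
type Complexity = "small" | "medium" | "large" | "unclear";

const ROUTING_SPINE = [
  "load the shared base",
  "load the agent front-door SOP",
  "load the always-on owner core card",
  "interpret the underlying request",
  "estimate complexity, stakes, and delivery mode",
  "self-discover and equip, or ask Victoria for a deeper pass",
  "plan proportionally to what the situation deserves",
  "execute, judge, save outputs, and capture lessons",
] as const;

// Step 6 only fires for medium/large or unclear work, per the contract above.
function needsDeeperPass(complexity: Complexity): boolean {
  return complexity !== "small";
}
```

Treating the spine as data rather than hard-coded control flow matches the doc's point that this is a heuristic and quality bar, not a deterministic router.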
The overlay changes what gets pulled first, not whether the agent interprets, researches, plans, or logs work. It is a heuristic and quality bar, not a command sequence. ## Layered USER Model `USER.md` should be layered, not treated as a flat blob. Always-loaded layer: - role - values and preferences - work style - communication preferences - current priorities - what builds trust - what creates frustration Retrieved-on-demand layer: - work style details - team and relationship context - personal and family context - wow opportunities - long-running focus changes The goal is context focus and preservation. The agent should always remember who it serves, but it should only pull the deeper branch when the task actually needs that detail. ## What Good Agent Modeling Looks Like - persona and defaults are explicit - linked context is rich and truthful - proclivities bias good choices without hard-coding the only route - revision-local artifacts are treated as packaging/export depth, not as the entire brain ## Hosted Super-Agent Doc Stack For hosted agents such as `Victoria`, `Julius`, and `Lila`, follow the real OpenClaw workspace pattern first, then mirror it into registry KB. Treat the stack below as a read-order contract mirrored across runtime docs, registry KB, and repo docs. Do not interpret it as saying one repo-root file is the whole live runtime source. On the Render/OpenClaw disk, these files are the live operating stack: 1. `AGENTS.md` - the front-door orientation layer: identity, quality bar, tier-1 capability families, concrete examples, and pointers to deeper docs/skills 2. `SOUL.md` - core personality, principles, proclivities, and boundaries 3. `USER.md` - who the agent serves, how those users think, and what matters to them 4. `IDENTITY.md` - compact identity/deployment card 5. `TOOLS.md` - runtime/tooling substrate and selection guidance 6. `BOOTSTRAP.md` - startup / recovery ritual 7. `HEARTBEAT.md` - cron / heartbeat expectations 8. 
`MEMORY.md` - curated long-term memory seed This entrance surface should stay compact but rich. It should make the agent aware of the main terrain without pretending it is the whole brain. Lessons are part of the operating stack: - shared `memory/lessons-learned.md` is mandatory - per-agent lessons such as `memory/lessons-victoria.md` are strongly recommended when the agent has role-specific recurring mistakes Those lessons surfaces exist because the agents repeat mistakes unless the distilled lesson is re-injected. Treat the shared lessons surface as mandatory operating context, and treat per-agent lessons as the second layer when the agent has a distinct failure pattern worth preserving. ## Shared Operating Contract The shared base for core hosted agents should encode these non-negotiables: - interpret before acting - estimate complexity and stakes explicitly - plan before acting, but vary depth by situation - scale research to task complexity - use the registry before inventing - choose depth situationally and compose from one block or many - name blocked lanes precisely - never stop at vague fallback language - always save outputs and lessons after meaningful work - preserve judgment room rather than hard deterministic routing Those rules belong in the shared base, not repeated differently in each agent overlay. ## Nightly Distillation The shared base should also require a nightly distillation discipline: - summarize important new facts - update current priorities - capture lessons learned - propose one proactive “wow” attempt - note which docs, skills, bundles, or parent entry surfaces should be improved Without that layer, the agents accumulate history but do not compound judgment. ## Core Agent Doc Pattern For a new hosted fleet agent, keep the naming and role pattern explicit. 
Preferred mirrored KB stack: - `agent-sop-{agent}` = mirror of the front-door `AGENTS.md` - `{agent}-identity-soul` = mirror of `SOUL.md` - `{agent}-identity-user` = mirror of `USER.md` - `{agent}-identity-id` = mirror of `IDENTITY.md` - `{agent}-identity-tools` = mirror of `TOOLS.md` - `{agent}-identity-agents` = harder operating conventions and delegation rules if they deserve their own durable reference - `{agent}-identity-memory` = distilled reusable memory, not a raw log dump - `{agent}-identity-bootstrap` = startup / recovery / initialization truth - `{agent}-identity-heartbeat` = cron / monitoring doctrine when it is durable enough to reuse - `agent-lessons-{agent}` = optional distilled lessons learned surface when that agent has a distinct repeat-mistake lane, usually `family:agent-files` + `format:best_practice` - shared `family:agent-files` surfaces can also include people/workspace references, repo/operator orientation references, and architecture/spec notes such as `global-team-context`, `linear-planning-methodology`, `context-engineering-methodology`, `katailyst-spec-index`, or `agent-foundation-spec` - shared lessons surface (for example `memory/lessons-learned.md`, optionally mirrored later) = fleet-wide mistake-prevention layer Use shared fleet docs when the truth should apply across agents: - `agent-core-*` - `agent-operating-*` - `agent-standing-*` Avoid creating multiple peer front-door docs for the same agent. The mirrored `agent-sop-*` layer should route into the rest. ## Registry Authority Split All core agents should be registry-literate: - discover - traverse - hydrate - assemble - draft - stage - recommend Only Victoria should publish structural registry canon by default: - taxonomy changes - naming normalization - final registry linkage - promoted canonical entry-surface and graph composition Julius and Lila should still be able to hand Victoria strong staged drafts and well-reasoned recommendations. 
That keeps capability high while concentrating canon stewardship. What to avoid: - making the runtime substrate the first thing the agent reads for every task - spreading “how to operate” across six competing front-door docs - treating one-day notes, raw logs, or scratch plans as if they are core identity docs - pretending lessons are just another optional reference instead of a core anti-repeat-mistake surface The front-door operating index should route the agent into the lower layers only when the request actually needs them. ## Structural Fields vs Tags Keep agent identity fields in the right place: - `persona_role` is canonical structural truth in the `agents` table - tags should describe discovery facets such as scope, runtime, source, domain, and modality - mirror `persona_role:*` only when it helps browse/filter UX, not as a mandatory duplicate of the structural field ## Agent Docs, Memory, and Workflow Boundaries The agent operating cluster is where ontology gets muddy fastest. Keep these distinctions sharp: - linked agent docs = reusable runtime steering truth - linked KBs = reusable general/domain reference truth - playbooks = reusable ordered workflows the agent may recommend, adapt, or run - proclivities = agent-specific preference bias - memory/logs = operational continuity, not automatically registry KB or agent_doc - plugin/export = distribution packaging, not the agent's canonical brain Typical linked-doc split for agents: - `format:policy` for operating rules - `format:persona_profile` for identity/voice - `format:reference` or `format:best_practice` for runtime, tools, and methods - lessons-focused docs should usually use `format:reference` or `format:best_practice` plus explicit lessons naming What usually should **not** become agent KB automatically: - one-day scratch logs - run-by-run notes - raw semantic memory snapshots - throwaway planning drafts Operational outputs such as daily logs, transient planning notes, and run traces should stay in 
memory/log/trace or asset surfaces unless they are later distilled into reusable truth or lessons. Promote those only when they become durable reference truth that multiple future tasks should load. ## Hosted Agents vs Local Subagents vs Imported “Agent Skills” These are not the same thing: - HLT hosted agents (`Victoria`, `Julius`, `Lila`) = personas with runtime identity, proclivities, and linked graph context - local subagents in Claude Code/Codex = runtime helpers inside a repo workflow - imported “agent” skills = usually skills, playbooks, or runtime adapters that were named loosely upstream Do not let imported “agent” wording collapse these into one category. ## Canonical Storage (DB-First) - **DB is canonical** for this repo: - `registry_entities` (identity + status + tags/links) - `agents` (persona + model defaults + system prompt) - `agent_proclivities` (agent-specific prefer/avoid/default_to) - `agent_memories` (optional, scoped) Filesystem packages (e.g. `AGENT.json`, `unit.json`) are **optional** portability surfaces and should be treated as mirrors/exports, not sources of truth. ## Mirror Sync Workflow - `.claude/agents/*.md` is the primary repo-local agent mirror surface. - `.claude-plugin/agents/*.md` is the generated plugin/workspace export surface. Project-local mirror: - Drift check: `npx tsx scripts/registry/sync/sync_agents_from_db.ts --check` - Regenerate mirrors: `npx tsx scripts/registry/sync/sync_agents_from_db.ts --prune` Protected HLT fleet: - Victoria, Julius, and Lila are the real production fleet personas tied to OpenClaw/runtime truth. - Keep them aligned to DB runtime config and enrich them carefully; do not treat plugin export snapshots as their authoring surface. 
Plugin/export mirror: - Drift check: `npx tsx scripts/distribution/export_plugin.ts --check` - Regenerate mirrors: `npx tsx scripts/distribution/export_plugin.ts` - Narrow agent-only drift check: `npx tsx scripts/distribution/export_plugin.ts --agents-only --check` - Narrow agent-only refresh: `npx tsx scripts/distribution/export_plugin.ts --agents-only` - Default scope: placeholder/plugin cohort only (`atlas-orchestrator`, `ivy-diff-first-editor`, `lucy-lesson-planner`, `nova-research-librarian`, `quinn-quality-reviewer`, `rex-repo-explorer`) - To export a different cohort intentionally: `npx tsx scripts/distribution/export_plugin.ts --agent-codes code1,code2` - Use `--agents-only` only when `.claude-plugin/.claude-plugin/katailyst.json` already exists, parses cleanly, and still contains the non-agent slices from a prior full export. - If the existing plugin index is missing, invalid, or agent-only, `--agents-only` now fails fast by design. Recovery path: run a full plugin export, then retry the narrow refresh. Cross-surface audit: - `npx tsx scripts/registry/audit/audit_agent_surfaces.ts --strict` If there is drift, update DB/seed/runtime config first, then regenerate mirrors. Do not hand-edit generated mirrors as the long-term source of truth. 
## Minimum Fields (Conceptual)

```json
{
  "persona_name": "Nina the Researcher",
  "persona_role": "researcher",
  "persona_voice": "direct, evidence-first",
  "defaults": { "temperature": 0.2 }
}
```

## Valid `persona_role` Values

Must match the DB CHECK constraint on the `agents` table:

| Role | Description |
| ------------ | ------------------------------------ |
| `researcher` | Research and signal gathering |
| `editor` | Editing and clarity |
| `strategist` | Positioning, planning, orchestration |
| `builder` | Implementation and prototyping |
| `analyst` | Data and evaluation |
| `reviewer` | Compliance and quality review |

## Tag Coverage (Minimum)

- `scope:*`
- `source:*`

Recommended:

- `modality:*`
- `domain:*` or `audience:*`
- `runtime:*`
- `persona_role:*`

## Prompt Discipline

- Keep prompts at the **right altitude**:
  - specific enough to enforce non-negotiables (scope, safety, output quality bar)
  - flexible enough to handle novel tasks without brittle overfitting
- Prefer this shape:
  1. identity/role
  2. non-negotiables (5-7 max)
  3. workflow pattern
  4. quality bar
- Keep durable policy in prompt, and move volatile/reference-heavy material to KB linked through discovery.
- Avoid long "copy this exact style forever" instructions that narrow reuse across contexts.

## Entity Revisions

Agents receive `entity_revisions` with `content_json` containing:

```json
{
  "persona": { "display_name", "persona_name", "persona_role", "persona_bio", "persona_voice" },
  "model": { "model_id", "temperature", "max_tokens" },
  "system_prompt": "...",
  "boundaries": { "forbidden_actions": [...] },
  "persona_json": { ... }
}
```

This enables version history, content diffs, and the unified revision API across all entity types.
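The revision payload above can be expressed as a type plus a role check against the `persona_role` CHECK constraint. This is a sketch under simplifying assumptions; `AgentRevisionContent`, `VALID_PERSONA_ROLES`, and `hasValidRole` are illustrative names, not real repo exports.

```typescript
// Illustrative typing of entity_revisions.content_json for agent entities.
interface AgentRevisionContent {
  persona: {
    display_name: string;
    persona_name: string;
    persona_role: string;
    persona_bio?: string;
    persona_voice?: string;
  };
  model: { model_id: string; temperature?: number; max_tokens?: number };
  system_prompt: string;
  boundaries?: { forbidden_actions: string[] };
  persona_json?: Record<string, unknown>;
}

// Mirrors the DB CHECK constraint on agents.persona_role listed above.
const VALID_PERSONA_ROLES = new Set([
  "researcher", "editor", "strategist", "builder", "analyst", "reviewer",
]);

function hasValidRole(content: AgentRevisionContent): boolean {
  return VALID_PERSONA_ROLES.has(content.persona.persona_role);
}
```

A guard like this could run before staging a revision, so an invalid role fails fast instead of tripping the DB constraint at insert time.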
## Expected Links

Agents should have `entity_links` connecting them to the entities they use:

| Link Type | Target Type | Example |
| ------------- | ----------- | ----------------------------------------------------------------------------- |
| `uses_tool` | tool | `agent:nova-research-librarian → tool:tavily.search` |
| `uses_prompt` | prompt | `agent:ivy-diff-first-editor → prompt:diff-first-editor` |
| `requires` | rubric | `agent:quinn-quality-reviewer → rubric:content-quality` |
| `recommends` | playbook | `agent:atlas-orchestrator → playbook:registry-health-scan` |
| `pairs_with` | agent | `agent:nova-research-librarian ↔ agent:ivy-diff-first-editor` (bidirectional) |

## Links + Proclivities

- Use `entity_links` to express **global** relationships.
- Use `agent_proclivities` for **agent-specific** preferences (prefer/avoid/default_to/never_use).

## Linked Context vs Packaged Artifacts

- For agents, the canonical operating surface is usually the linked graph:
  - KB/docs
  - tools
  - bundles
  - prompts
  - rubrics
  - playbooks
- Revision-local `artifacts_json` on an agent row is optional packaging/export material.
- Zero packaged artifacts does **not** mean the agent has no docs or no operating context.
- Do not duplicate linked docs into agent-local artifacts just to satisfy a readiness score. Use packaged artifacts only when the current revision needs a self-contained export layer.

## External References

- Claude Code plugins docs: https://docs.anthropic.com/en/docs/claude-code/plugins
- Claude Code prompting best practices: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview

---

## Source: docs/atomic-units/ARTIFACTS.md

# Artifact Model (Shared Across Units)

An **artifact** is any file that belongs to a unit beyond its entrypoint. Artifacts are optional, but the **types are standardized** so agents and scripts can reason about them uniformly.
Core rule:

- artifact = **revision-local support material**
- kb = **reusable truth worth discovering**
- link = **relationship only**
- asset = **produced output with its own lifecycle**

Do not hide reusable knowledge in artifacts just because one entity happens to use it first.

## Canonical Artifact Types

Use these folder names when present:

- `references/` — deep context, long docs
- `rules/` — guardrails, constraints, lint rules
- `templates/` — reusable templates or prompt skeletons
- `examples/` — sample inputs/outputs
- `tests/` — fixtures, regression outputs, validation cases
- `evals/` — eval results or eval inputs
- `schemas/` — JSON schemas or DSLs
- `scripts/` — helpers (never auto-executed)
- `assets/` — images/diagrams/media
- `data/` — lookup tables, CSVs, dictionaries
- `how_to_use/` — usage guides (when a skill/tool needs explicit run steps)

### DB `artifact_type` Mapping

Folder names map to `artifact_type` values stored in `entity_revisions.artifacts_json` (DB-canonical) and mirrored to filesystem. Note: the `entity_artifacts` table exists for future blob/binary storage and indexing.

Current `artifacts_json` contracts in use:

- **Array shape (legacy/content-first):** each entry carries `artifact_type`, `path`, and `content`.
- **Object shape (layered metadata):** top-level `files`/`notes`/`references`, where `files[*]` can include `kind`, `path`, optional `summary`, and optional `content`.

Exporter behavior:

- Filesystem materialization requires explicit `files[*].content` (object shape) or `content` (array shape).
- Metadata-only file entries remain canonical in DB but are not written as mirror files.
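The exporter rule above can be sketched as a pure selection function over both `artifacts_json` shapes. The shapes are simplified and `materializable` is a hypothetical helper, not the real exporter.

```typescript
// Simplified versions of the two artifacts_json contracts described above.
type ArrayEntry = { artifact_type: string; path: string; content?: string };
type ObjectShape = {
  files: { kind?: string; path: string; summary?: string; content?: string }[];
};
type ArtifactsJson = ArrayEntry[] | ObjectShape;

// Returns the paths that would be written to the filesystem mirror.
function materializable(artifacts: ArtifactsJson): string[] {
  const entries: { path: string; content?: string }[] = Array.isArray(artifacts)
    ? artifacts
    : artifacts.files;
  // Metadata-only entries (no explicit content) stay DB-canonical and are skipped.
  return entries.filter((e) => typeof e.content === "string").map((e) => e.path);
}
```

For example, an array-shape payload with one content-bearing rule file and one metadata-only reference would materialize only the rule file.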
| Folder | artifact_type | Notes |
| --------------- | --------------- | ---------------------------- |
| `rules/` | `rule` | Guardrails and constraints |
| `templates/` | `template` | Prompt or config templates |
| `references/` | `reference` | **Preferred** |
| `references/` | `reference_md` | Legacy, still allowed |
| `examples/` | `example` | Sample inputs/outputs |
| `tests/` | `test` | Fixtures + expected outputs |
| `evals/` | `eval` | Eval inputs or results |
| `schemas/` | `schema` | JSON/DSL schemas |
| `scripts/` | `script` | Never auto-run |
| `assets/` | `asset` | Images/diagrams/media |
| `data/` | `data` | CSVs, lookup tables |
| `how_to_use/` | `how_to_use` | Usage guides |
| `sections/` | `sections` | Reserved/legacy chunked docs |
| `pack_manifest` | `pack_manifest` | Reserved for pack exports |
| `SKILL.md` | `skill_md` | Optional entrypoint mirror |

Important clarifications:

- **Tests and evals can be artifacts.** They are the right fit when they only support one revision or one unit package.
- **License is not a canonical artifact type.** Keep license notes as reference text or package metadata rather than inventing a separate artifact lane.
- **Examples have two valid homes:**
  - local example for one entity revision → `examples/` artifact
  - reusable example the whole graph should discover → first-class `kb` with `item_type=example`

## Examples Decision Rule

Use an **artifact example** when:

- it mainly explains one skill/prompt/playbook
- it should ship with that revision
- it is supporting material, not its own node

Use a **KB example** when:

- many entities should point to it
- you want it to be discoverable directly
- it is a reusable teaching/example asset in its own right
## Single-Bucket Option (Artifacts/)

If a unit has mixed or unknown artifacts, you may use a single `artifacts/` folder plus `artifacts/index.json` to document each item’s kind. This is allowed for **imports** and for units that don’t map cleanly to the canonical folders.

## Required vs Optional

Required:

- **Entrypoint** for the unit type (e.g., `SKILL.md`, `SCHEMA.json`)
- **unit.json** metadata file

Recommended:

- `tests/` for anything executable or evaluatable
- `references/` for deep or complex domain rules
- test fixture files should be human-readable in mirrors (real multiline markdown/text, not escape-encoded payload strings like literal `\n`)

Optional:

- All other artifact types as needed

## Script Safety

- Scripts **must never auto-run** during import, staging, or sync.
- If a script is required, the entrypoint must explicitly say when/why to run it.

## Naming Guidance

- Prefer short, descriptive filenames (`rls-patterns.md`, `migration-checklist.md`)
- Avoid versioning in filenames unless needed (`v1`, `v2` belongs in metadata)

---

## Source: docs/atomic-units/ASSETS.md

# Assets (Rules + Expectations)

Assets are **first-class atomic unit types** for output, delivery, and publishing.
They are not registry entities today, but they are not second-class or “just artifacts.” ## Canonical Model - **Canonical tables:** `assets`, `asset_versions`, `publish_events`, `campaign_assets` - **Registry references:** assets point to the reusable registry surfaces that shaped them, such as `content_type_ref`, `recipe_ref`, `style_ref`, and `channel_ref` - **Lifecycle:** draft/source -> validated version -> review/publish state -> archive ## When to Use - the thing is a produced article, email, landing page, image, deck, video, or similar output that should persist independently - the output needs version history, schema/lint status, publish history, or delivery targets - the system needs to inspect what was actually produced, not just the reusable method that produced it ## Do Not Use - when the value is reusable method or workflow logic -> use `skill` or `playbook` - when the value is durable doctrine or reference truth -> use `kb` - when the file belongs only to one registry revision as support material -> use an `artifact` - when the value is schedule + run state for a recurring workflow -> use `automation` ## Core Distinctions - **asset** = a produced output instance with its own lifecycle - **artifact** = a support file attached to one revision of another unit - **content type / recipe / style** = reusable contracts that shape assets - **run / trace** = execution evidence that may have produced or mutated assets ## What Good Assets Look Like - clear upstream registry references (`content_type_ref`, `recipe_ref`, etc.) 
- version history in `asset_versions` - truthful schema and lint validation state on versions - explicit publish or delivery history when the asset goes live - enough metadata to reuse the asset as evidence, example, or rollback target ## Recommended Quality Checks - version exists before the asset is treated as durable - `content_profile_json.asset_lifecycle_stage` is stamped truthfully - publish events and delivery URLs are not duplicated or contradictory - preview/thumbnail metadata stays separate from canonical content metadata ## Promotion Rule An asset can inspire or support other unit types, but do not automatically turn assets into registry canon. - distill repeated lessons into `kb` - attach representative files as revision `artifacts` when they support a registry unit - only promote into registry canon when the value becomes reusable truth or reusable workflow logic rather than one output instance --- ## Source: docs/atomic-units/AUTOMATIONS.md # Automations (Scheduled Action Runs) Automations are **scheduled triggers** that run Actions without manual prompting. ## What It Is (Canonical Model) - Automations are **not** registry entities. 
- They are **operational records** that point to an Action (playbook) and emit run history: - `automation` → `action_ref` (a `playbook:code@vN`) - each execution produces a `runs` trace (and tool call logs) for replay/debug ## When to Use - nightly hygiene (refresh embeddings, validate schemas, drift checks) - weekly ingestion (sync MCP catalog, import pack review queues) - periodic experiments (A/B evaluation batches) ## Required (Planned DB Shape) The invariants are: - schedule (cron-ish) - enabled/paused state - action reference (typed ref to a playbook) - scope (org/app/project) - run history: - last run - next run - status + error summary - link to the run trace for replay ## Tagging / Categorization Automations inherit their primary categorization from the linked Action’s tags: - Launchpad uses `family:*` + `domain:*` on the Action for grouping - automation list groups by Action category (no separate taxonomy required unless needed) ## Testing (Recommended) - A “dry-run” execution mode (no writes) for new automations - At least one successful execution trace before enabling by default --- ## Source: docs/atomic-units/BUNDLES.md # Bundles (Rules + Expectations) Bundles are **curated collections** of units. They do **not** impose order. If order matters, use a **Playbook** instead. ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `bundles` + `entity_links`) - Membership is represented as `entity_links.link_type = 'bundle_member'`. - Direction: `entity_links.from_entity_id = member` and `entity_links.to_entity_id = bundle`. - Nested bundles are allowed, but the parent bundle still lives on `to_entity_id`. - **Portable export (default):** JSON packs under `registry-packs/` (Phase 3) - Exporter: `npx tsx scripts/distribution/export_registry_packs.ts` - **Filesystem bundle folders:** optional later, not a required surface today ## Org Placement - `hlt` is the active operating layer for HLT-specific runtime surfaces. 
- `system` is the shared canonical/template layer. - Put a bundle in `system` only when it is genuinely shared canon. - If the bundle's members, tags, summary, use case, or revision text are clearly HLT-facing, keep it in `hlt`. - Do not author HLT context bundles in `system` just because the content could someday be generalized. ## When to Use - assembling starter kits or context bundles - grouping related skills/KB/tools for discovery - packaging exports for reuse ## Bundle vs Nearby Unit Types - `bundle` vs `playbook` - bundle = unordered context pack - playbook = ordered or adaptive execution pattern - `bundle` vs `artifact` - bundle = first-class reusable group - artifact = local file inside one unit revision ## What Good Bundles Look Like - membership feels intentional, not random - the pack improves discovery or reuse immediately - grouped units belong together even if the exact execution order changes - links and intent make the bundle legible without opening every member Repo-native examples: - `bundle:launchpad-core@v1` = surfaced action/playbook pack - `bundle:blog-writing-kit@v1` = content-production kit - `bundle:study-guide-kit@v1` = education/content kit - agent-foundation bundles = shared context packs for hosted agents ## Low-Surface-Area Rule Use bundles to reduce entry points, not to create more. Good bundle behavior: - one clear summary - intentional membership - a few strong links outward - no hidden sequence logic pretending to be a pack ## Required - DB records for: - `registry_entities` (typed identity + summary + status) - `bundles` (bundle_type/bundle_mode) - `entity_links` (membership) - `entity_tags` (taxonomy coverage) If we later add filesystem bundle unit packages, they will follow the shared unit package contract (`docs/atomic-units/SHARED_CONTRACT.md`) with an entrypoint like `BUNDLE.json`. 
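The canonical membership direction (member → bundle) stated above can be sketched as a link constructor plus a reader. The row shape is illustrative; real columns live on `entity_links` in Supabase.

```typescript
// Illustrative row shape for a bundle_member link.
interface EntityLink {
  from_entity_id: string; // the member
  to_entity_id: string;   // the bundle
  link_type: "bundle_member";
}

// Canonical direction: member on from_entity_id, bundle on to_entity_id.
function bundleMemberLink(memberId: string, bundleId: string): EntityLink {
  return { from_entity_id: memberId, to_entity_id: bundleId, link_type: "bundle_member" };
}

// Readers load bundle members by matching the bundle on to_entity_id.
function membersOf(links: EntityLink[], bundleId: string): string[] {
  return links
    .filter((l) => l.link_type === "bundle_member" && l.to_entity_id === bundleId)
    .map((l) => l.from_entity_id);
}
```

Keeping both the writer and the reader pinned to the same direction is what prevents the inverted-membership drift the canonical reminder warns about.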
## BUNDLE.json (minimum fields)

```
{
  "members": [
    "skill:research-trends@v1",
    "kb:audience-nursing-students-2026",
    "tool:supabase-mcp"
  ],
  "intent": "starter kit for nursing outreach"
}
```

## Tag Coverage (Minimum)

- `scope:*`
- `bundle_type:*` (once namespace exists)

Lifecycle state is already canonical on `registry_entities.status`; only mirror it into tags when it materially improves browse UX.

Recommended:

- `domain:*`, `audience:*`, `dept:*`

## Links

Use `bundle_member` links for discoverability and graph traversal.

Canonical reminder:

- member belongs to bundle = `member -> bundle`
- readers should load bundle members with `to_entity_id = bundle.id`

## This vs That

- `bundle` = grouped context or capability pack
- `playbook` = ordered/adaptive flow
- `plugin/export` = packaging selected units for outside runtimes

If order matters, it is not just a bundle. If the purpose is "ship this outside Katailyst/Claude Code/OpenClaw," it is not just a bundle either; that is plugin/export packaging.

## Testing (Optional)

- Add `examples/` showing how the bundle is used in a run or prompt.

---

## Source: docs/atomic-units/CANONICAL_EXAMPLES.md

# Canonical Examples (One Place to Copy the Pattern)

This doc is the **single reference** for what "good" looks like across atomic unit types.

Rules:

- **Identity is not the path.** Identity is `entity_type + code + version` (typed ref: `entity_type:code@vN`).
- The filesystem is a **view** (mirror or canonical surface), not the identity.
- Prefer **real examples** first; fall back to templates when a unit type is DB-only today.

## Prime Examples (What Each Type Is Best For)

Use this section when the question is "show me the right shape in this repo," not "show me a minimal template."
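The typed-ref format (`entity_type:code@vN`) can be parsed with a small helper. This is a sketch; `parseTypedRef` is a hypothetical name, not a real repo export, and the accepted character sets are assumptions based on the example refs in this doc.

```typescript
// Parsed form of a typed ref such as skill:meeting-prep@v1.
interface TypedRef {
  entity_type: string;
  code: string;
  version: number;
}

// Codes in this doc use lowercase letters, digits, dots, underscores, and hyphens
// (e.g. tool:firecrawl.crawl@v1, schema:meeting_briefing_v1@v1).
function parseTypedRef(ref: string): TypedRef {
  const match = /^([a-z_]+):([a-z0-9._-]+)@v(\d+)$/.exec(ref);
  if (!match) throw new Error(`Invalid typed ref: ${ref}`);
  return { entity_type: match[1], code: match[2], version: Number(match[3]) };
}
```

Parsing identity this way reinforces the rule above: the ref, not the filesystem path, is the identity.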
| Type | Best used for | Prime example(s) | Why this is the right shape |
| --- | --- | --- | --- |
| `skill` | reusable method or decision protocol | `skill:meeting-prep@v1`, `skill:copywriting@v1`, `skill:brainstorming@v1`, `skill:firecrawl@v1`, `skill:github-mcp@v1` | These are triggerable methods with links outward to KB, tools, playbooks, or narrower layers instead of trying to contain the whole world in one launcher. |
| `kb` | durable reference truth, methodology, audience/business context, or reusable domain guidance | `kb:brand-voice-master@v1`, `kb:content-performance-playbook@v1`, `kb:tools-guide-overview@v1`, `kb:design-principles-ui-ux@v1` | These improve judgment repeatedly across many runs. They should be linked and reused, not buried as one-off artifacts. |
| `agent_doc` | hosted runtime steering, identity mirrors, SOPs, operating principles, and shared fleet canon | `agent_doc:agent-sop-victoria@v1`, `agent_doc:victoria-identity-user@v1`, `agent_doc:victoria-identity-tools@v1`, `agent_doc:global-agent-principles@v1`, `agent_doc:agent-foundation-spec@v1` | These shape how hosted agents behave. They are runtime-relevant and hidden from generic discovery unless explicitly requested or graph-expanded. |
| `operational_log` | time-stamped operational history, daily notes, and archival run memory | `operational_log:daily-log-2026-03-01@v1`, `operational_log:lila-daily-log-2026-03-01@v1` | These preserve what happened. They are archival, not boot-time steering, and should stay out of generic discovery unless explicitly requested. |
| `tool` | executable capability truth | `tool:firecrawl.crawl@v1`, `tool:firecrawl.scrape@v1`, `tool:db.query@v1`, `tool:vercel-deploy@v1` | Tools answer "what can be called?" They should also make "when to use it / key operations / companion method" legible instead of hiding behind constraints. |
| `prompt` | reusable instruction fragment or output contract | `prompt:blog-post-system@v1`, `prompt:factory-template-generator@v1` | Prompts are smaller than skills and do not own a big graph family. |
| `schema` | reusable structure contract for typed inputs/outputs | `schema:meeting_briefing_v1@v1`, `schema:web_page_v1@v1`, `schema:article_v2@v1` | Schemas answer "what shape must this output have?" They should not also become style, workflow, or business-context docs. |
| `content_type` | editor/publishing contract for what kind of output exists | `content_type:meeting-briefing@v1` | This defines the output kind and publishing surface; it should not also become the style system or workflow. |
| `style` | voice, visual system, component language, or presentation overlay | `style:leadership_briefing@v1`, `style:devtool_operator_console@v1`, `style:viral_social@v1` | Styles change how something feels without redefining the output contract. They can be verbal-only, visual-capable, or mixed. |
| `recipe` | ready-to-run preset that binds schema/style/channel/constraints | `recipe:meeting-briefing-web@v1`, `recipe:blog-casual@v1` | Recipes are presets. They should not masquerade as new methods or bigger workflows. |
| `bundle` | unordered context pack or featured capability family | `bundle:meeting-briefing-kit@v1`, `bundle:launchpad-core@v1` | Bundles reduce entry points by grouping things that belong together without imposing a sequence. |
| `playbook` | ordered or adaptive multi-step flow | `playbook:meeting-briefing-research-report@v1`, `playbook:registry-health-scan@v1`, `playbook:blog-production@v1` | Playbooks coordinate skills/tools/KBs and checkpoints. They should not re-teach every method inline. |
| `action` | surfaced operator entry to a featured playbook | `playbook:registry-health-scan@v1` when tagged `surface:cms-launchpad` and bundled into `bundle:launchpad-core@v1` | Actions are a UX surface, not a new ontology branch. |
| `automation` | scheduled execution record that points at an action/playbook | `weekly-registry-health-scan` in `scripts/ops/bootstrap_org.ts` | Automations are operational state plus run history. They should not become reusable doctrine docs. |
| `asset` | output or delivery instance with its own review/publish/version lifecycle | Content Studio rows in `assets` + `asset_versions`, such as meeting briefings, pages, social drafts, and generated media | Assets are the produced outcomes. They are first-class atomic unit types even though they are not registry entity rows today. |
| `agent` | hosted persona with proclivities and linked operating context | `agent:victoria@v1`, `agent:julius@v1`, `agent:lila@v1` | Agents bias selection, tone, and defaults. Their actual operating brain should mostly live in linked agent_doc / KB / playbook / skill companion surfaces. |
| `artifact` | revision-scoped local support file | `skill:copywriting@v1` templates/examples, `skill:meeting-prep@v1` templates/rules, `kb:codesandbox-recipes@v1` examples | Artifacts are local packaged depth. They are not independent graph nodes unless they need their own lifecycle. |
| `link` | weighted graph relationship with reason | `skill:meeting-prep@v1 -> playbook:meeting-briefing-research-report@v1`, `skill:github-mcp@v1 -> skill:gh-issues@v1`, `agent_doc:agent-sop-victoria@v1 -> skill:registry-discovery-primer@v1` | Links express traversal and hierarchy without duplicating content. |
| `memory/log/trace` | continuity, audit, and run evidence | daily files, heartbeat notes, run traces, raw semantic memory | Keep these operational until they are distilled into durable KB or reusable method. |
| `plugin/export` | generated distribution surface for outside runtimes | `.claude-plugin/`, `public/llms.txt`, `public/llms-full.txt` | Plugin/export packages the canon. It does not decide what the canon is. |

## Current High-Value Patterns To Copy

- **Top-level capability with narrower layer beneath it**
  - `skill:github-mcp@v1` should be the main GitHub surface.
  - `skill:gh-issues@v1` should stay the narrower issue-triage layer beneath it.
- **Method + context + workflow family**
  - `skill:meeting-prep@v1`
  - `bundle:meeting-briefing-kit@v1`
  - `content_type:meeting-briefing@v1`
  - `style:leadership_briefing@v1`
  - `playbook:meeting-briefing-research-report@v1`

  This is the clearest current example of one method sitting inside a bigger linked capability family.

- **Front-door super-agent stack**
  - `agent_doc:agent-sop-victoria@v1` = operating index
  - `agent_doc:victoria-identity-soul@v1` = mirrored `SOUL.md`
  - `agent_doc:victoria-identity-user@v1` = mirrored `USER.md`
  - `agent_doc:victoria-identity-id@v1` = mirrored `IDENTITY.md`
  - `agent_doc:victoria-identity-tools@v1` = mirrored `TOOLS.md`
  - `agent_doc:victoria-identity-agents@v1` = mirrored `AGENTS.md` operating-rules/delegation layer
  - shared runtime lessons surface = `agent_doc:agent-lessons-victoria@v1`

  This is the right pattern for hosted super-agents: one index up front, the real OpenClaw identity/runtime stack beneath it, shared lessons always in play, and a narrower per-agent lessons layer only when the agent truly has its own repeat-mistake lane.
- **Structure + output + style stack**
  - `schema:meeting_briefing_v1@v1`
  - `content_type:meeting-briefing@v1`
  - `style:leadership_briefing@v1`
  - `recipe:meeting-briefing-web@v1`

  This is the clearest current example of "shape contract vs publishing contract vs stylistic overlay vs preset."

- **Nightly stewardship loop without adding another top-level doc family**
  - `skill:registry-discovery-primer@v1`
  - `playbook:registry-health-scan@v1`
  - `weekly-registry-health-scan`

  This is the right pattern for automated enrichment: discovery method -> ordered hygiene workflow -> scheduled execution.

- **Reusable workflow vs recurring job vs produced output**
  - `playbook:meeting-briefing-research-report@v1`
  - `weekly-registry-health-scan`
  - Content Studio asset rows in `assets`

  This is the right split for systems like Mastra: reusable flow -> recurring automation -> concrete produced asset.

- **One flagship playbook per job family**
  - meeting prep -> `playbook:meeting-briefing-research-report@v1`
  - writing/publishing -> `playbook:blog-production@v1`
  - stewardship -> `playbook:registry-health-scan@v1`

  This keeps the graph legible. Narrower routines should sit beneath the flagship flow, not compete with it.

- **Rich style without a new type**
  - `style:leadership_briefing@v1` = verbal-first
  - `style:devtool_operator_console@v1` = mixed visual/system style
  - `style:viral_social@v1` = strong verbal cadence

  Rich brand/design variants should still live in `style`, with linked examples/assets/KB when depth is needed.

## Common Failure Patterns

- **Issue layer competing with the parent capability**
  - Bad: `gh-issues` reading like a flagship peer to `github-mcp`.
  - Better: `github-mcp` as the parent surface, `gh-issues` as the narrow layer beneath it.
- **Tool or KB trying to become a method**
  - Bad: a tool row teaching all the best practices for its ecosystem.
  - Better: tool for callable truth, skill for operator method, KB for durable context.
- **Bundle pretending to be a playbook**
  - Bad: "these five things go together" but hidden sequence logic lives inside the bundle.
  - Better: bundle for grouping, playbook for sequence.
- **Agent brain collapsed into one giant file**
  - Bad: one huge AGENTS doc that mixes identity, runtime recovery, tool selection, workflow routing, and logs.
  - Better: one front-door operating index plus linked persona/runtime/reference docs.
- **Plugin/export mistaken for canonical ontology**
  - Bad: deciding unit type based on what is easiest to export.
  - Better: decide canon first, then generate plugin/export surfaces from it.
- **Produced asset mistaken for a unit artifact**
  - Bad: treating a shipped article/image/page as if it were just another artifact inside a skill or KB.
  - Better: asset = produced output instance; artifact = local support file attached to one unit revision.
- **Tool surfaces that only show constraints**
  - Bad: the tool exists, but an operator cannot tell what it does, when to use it, which operations matter, or what method/KB should accompany it.
  - Better: tool rows and companion artifacts make the usage contract legible at a glance.
## Canonical Instances (Already in This Repo)

Skills (DB canonical, filesystem mirror):

- `skill:mermaid-diagrams@v1` - `.claude/skills/curated/global/engineering/mermaid-diagrams/`

KB (DB canonical; filesystem mirror):

- `kb:writing-guide-overview@v1` - `.claude/kb/curated/global/writing/writing-guide-overview/`
- `kb:design-principles-ui-ux@v1` - `.claude/kb/curated/global/design/design-principles-ui-ux/`
- `kb:tools-guide-overview@v1` - `.claude/kb/curated/global/ai-sdk/tools-guide-overview/`
- `kb:tavily-search@v1` - `.claude/kb/curated/global/ai-sdk/tavily-search/`
- `kb:codesandbox-overview@v1` - `.claude/kb/curated/global/ai-sdk/codesandbox-overview/`
- `kb:codesandbox-recipes@v1` - `.claude/kb/curated/global/ai-sdk/codesandbox-recipes/`
  - examples: `.claude/kb/curated/global/ai-sdk/codesandbox-recipes/examples/`

Pack export (deterministic JSON pack format):

- `registry-packs/ai-sdk-6/manifest.json`
- `registry-packs/ai-sdk-6/pack.json`

Canonical seed dataset (DB-first, seeds the **bootstrap catalog** baseline):

- `database/009-seed-katailyst-canonical-examples.sql`
- Seeds a small-but-world-class bootstrap catalog across the current registry `entity_type` set:
  - 3–5 exemplars per type (skills: 12–24)
  - 10–15 Launchpad Actions (featured playbooks)
- Intentionally demonstrates the end-to-end groove:
  - selection-ready metadata
  - taxonomy coherence
  - connected link graph
  - revision-scoped content + artifacts (where applicable)
  - curated Content Studio objects (so "content" screens are real)
- Launchpad Actions are `playbook` entities curated by `surface:cms-launchpad` + `bundle:launchpad-core@v1`.

## Flagship Templates (Copy/Paste)

These templates are intentionally short but complete. For required tag namespaces, see `docs/TAXONOMY.md` and the per-unit docs in `docs/atomic-units/`.
### Skill (unit package)

`unit.json`:

```json
{
  "entity_type": "skill",
  "code": "new-skill-code",
  "version": "v1",
  "status": "staged",
  "tier": 4,
  "name": "New Skill (Human Label)",
  "tags": [
    "status:staged",
    "scope:global",
    "modality:text",
    "stage:ingestion",
    "source:internal",
    "action:write",
    "domain:operations",
    "family:planning"
  ],
  "provenance": { "source": "internal" },
  "links": []
}
```

`SKILL.md` (frontmatter must be Claude-compatible):

```md
---
name: new-skill-code
description: Use when doing X. Helps with Y. Keywords: a, b, c.
---

# New Skill

## When to Use

- ...

## Workflow

1. ...
```

### KB (unit package)

`unit.json`:

```json
{
  "entity_type": "kb",
  "code": "example-kb",
  "version": "v1",
  "status": "curated",
  "tier": 2,
  "name": "Example KB (Human Label)",
  "tags": [
    "format:reference",
    "scope:global",
    "domain:example",
    "source:internal",
    "status:curated"
  ],
  "provenance": { "source": "internal" },
  "length": { "label": "short", "tokens_est": 1200 },
  "links": []
}
```

`KB.md`:

```md
---
name: 'Example KB (Human Label)'
code: example-kb
tier: 2
length: short
tokens_est: 1200
---

# Example KB

## Summary

...

## When to Use

- ...
```

### Tool (DB-canonical, JSON pack export)

Tools are DB-canonical today and exported via `scripts/distribution/export_registry_packs.ts`. If/when we add filesystem tool packages, the entrypoint should be `TOOL.json` plus `unit.json`.

`TOOL.json` (minimum):

```json
{
  "name": "Example Tool",
  "description": "What it does and when to use it.",
  "tool_type": "http",
  "provider": "example",
  "auth": "api_key",
  "runtime": "server",
  "input_schema": {},
  "output_schema": {}
}
```

### Prompt (DB-canonical, JSON pack export)

Prompts are DB-canonical today (DB entity_type `prompt`). Filesystem prompt packages are optional later; the entrypoint would be `PROMPT.md`.

`PROMPT.md` (minimum executable contract):

```md
# Example Prompt

## Context

...

## Objective

...

## Inputs

- `{{audience}}`: ...
- `{{goal}}`: ...

## Instructions

1. ...
2. ...

## Output Format

- Return: ...
- Do not return: ...

## Guardrails

- Avoid: ...
- Validate: ...
```

### Schema (DB-canonical, JSON pack export)

Schemas are DB-canonical today and should ship with tests before promotion.

Portable contract:

- Export schema JSON via packs.
- Treat schema test fixtures as first-class artifacts (Phase 4/5 will wire runners).

### Bundle / Pack (JSON pack export)

Bundles are DB-canonical. Pack exports are the portable default.

`registry-packs/<pack>/manifest.json` (minimum):

```json
{
  "pack": "example-pack",
  "version": "v1",
  "description": "What this pack is for.",
  "selectors": [
    {
      "entity_types": ["tool", "prompt", "schema", "bundle"],
      "statuses": ["curated", "published"]
    }
  ],
  "includes": [
    {
      "ref": "kb:example-kb@v1",
      "path": "../../.claude/kb/curated/global/example/example-kb"
    }
  ],
  "include_tags": [],
  "exclude_tags": [],
  "tags": ["bundle_type:pack", "scope:global", "status:curated"]
}
```

### Agent (DB-canonical, JSON pack export)

Agents are DB-canonical today and exported via packs (Phase 3). If/when we add filesystem agent packages, the entrypoint should be `AGENT.json` plus `unit.json`.

`AGENT.json` (minimum):

```json
{
  "persona_name": "Nina the Researcher",
  "persona_role": "researcher",
  "persona_voice": "direct, evidence-first",
  "defaults": { "temperature": 0.2 }
}
```

`AGENT.json` (full content_json structure, matching `entity_revisions.content_json`):

```json
{
  "persona": {
    "display_name": "Nina",
    "persona_name": "Nina (Research Librarian)",
    "persona_role": "researcher",
    "persona_bio": "Source-first researcher: last-30-days aware, contradiction-aware.",
    "persona_voice": "Direct, evidence-first, citation-heavy."
  },
  "model": {
    "model_id": "anthropic/claude-opus-4-6",
    "temperature": 0.2,
    "max_tokens": 4096
  },
  "system_prompt": "You are Nina. You are a source-first researcher...",
  "boundaries": {
    "forbidden_actions": ["delete_secret_values", "print_secret_values"]
  },
  "persona_json": {}
}
```

Valid `persona_role` values: `researcher`, `editor`, `strategist`, `builder`, `analyst`, `reviewer`.

### Channel (DB-canonical, JSON pack export)

Channels are DB-canonical today and exported via packs (Phase 3). If/when we add filesystem channel packages, the entrypoint should be `CHANNEL.json` plus `unit.json`.

`CHANNEL.json` (minimum):

```json
{
  "constraints": {
    "max_chars": 2200,
    "supports_carousel": true,
    "requires_alt_text": true
  }
}
```

### Recipe (DB-canonical, JSON pack export)

Recipes are DB-canonical today and exported via packs (Phase 3). If/when we add filesystem recipe packages, the entrypoint should be `RECIPE.json` plus `unit.json`.

`RECIPE.json` (minimum):

```json
{
  "base_schema": "schema:article_v2",
  "style": "style:listicle",
  "channel": "channel:web",
  "constraints": { "min_words": 800 }
}
```

### Playbook (DB-canonical, JSON pack export)

Playbooks are DB-canonical today and exported via packs (Phase 3). If/when we add filesystem playbook packages, the entrypoint should be `PLAYBOOK.json` plus `unit.json`.

`PLAYBOOK.json` (minimum):

```json
{
  "steps": ["research", "plan", "draft", "review", "publish"],
  "mode": "guided"
}
```

## How to Add a New Canonical Example

1. Create the real unit (DB or filesystem surface, depending on unit type).
2. Ensure it passes:
   - `python3 scripts/registry/lint_unit_packages.py` (curated KB/skills)
   - `python3 scripts/registry/sync/generate_registry_manifest.py --check`
   - `python3 scripts/registry/sync/refresh_kb_metadata.py --check` (KB)
   - `pnpm exec prettier --check .`
3. Add it to the "Canonical Instances" section above (ref + path).

---

## Source: docs/atomic-units/CHANNELS.md

# Channels (Rules + Expectations)

Channels define **delivery surface constraints** (not schemas).
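Those constraints are simple enough to enforce mechanically. A minimal sketch only: the constraint keys (`max_chars`, `supports_carousel`, `requires_alt_text`) match the `CHANNEL.json` example below, while the `Draft` shape and the checker itself are hypothetical, not a repo API.

```typescript
// Constraint keys from the CHANNEL.json minimum-fields example.
interface ChannelConstraints {
  max_chars?: number;
  supports_carousel?: boolean;
  requires_alt_text?: boolean;
}

// Hypothetical draft shape; field names are assumptions for illustration.
interface Draft {
  body: string;
  alt_text?: string;
  slides?: number; // carousel slide count, hypothetical
}

// Return the list of violated constraints (empty list = deliverable).
function checkChannelConstraints(draft: Draft, c: ChannelConstraints): string[] {
  const violations: string[] = [];
  if (c.max_chars !== undefined && draft.body.length > c.max_chars) {
    violations.push(`body exceeds max_chars (${draft.body.length} > ${c.max_chars})`);
  }
  if (c.requires_alt_text && !draft.alt_text) {
    violations.push("requires_alt_text: alt_text is missing");
  }
  if (c.supports_carousel === false && (draft.slides ?? 1) > 1) {
    violations.push("channel does not support carousels");
  }
  return violations;
}
```

Attaching this kind of check at the recipe or content-type boundary keeps channel limits out of the schema, which is exactly the split this doc describes.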
## When to Use

- enforcing platform-specific limits (length, alt text, CTA rules)
- attaching delivery constraints to recipes or content types
- standardizing output per surface

## Required

- `CHANNEL.json` entrypoint
- `unit.json` metadata (tags, status, tier, provenance)

## CHANNEL.json (minimum fields)

```json
{
  "constraints": {
    "max_chars": 2200,
    "supports_carousel": true,
    "requires_alt_text": true
  }
}
```

## Tag Coverage (Minimum)

- `channel:*`
- `surface:*`
- `scope:*`

Recommended:

- `modality:*`
- `audience:*`

## Links

- Use `requires` from recipes or content types to a channel.

---

## Source: docs/atomic-units/CONTENT_TYPES.md

# Content Types (Rules + Expectations)

Content types are **editor presets and publishing contracts** (not schema forks).

## Canonical vs Portable Surfaces

- **Canonical:** Supabase DB (`registry_entities` + `content_types` + join tables)
  - `content_type_modalities`, `content_type_channels`, `content_type_ops`, `content_type_requirements`
- **Portable surface (today):** JSON packs under `registry-packs/*/pack.json`

## When to Use

- defining "what kind of asset is this?" (article, qbank item, slide deck, etc.)
- enforcing required ops (research/draft/evaluate/publish)
- wiring default recipe/style/channel expectations for an asset editor

## Required (DB)

- `registry_entities` row (selection-ready)
- `content_types` row (family, display_label, defaults)
- at least 1 modality row (`content_type_modalities`) with `is_primary=true`

## Tag Coverage (Minimum)

- `format:*`
- `modality:*`
- `scope:*`
- `status:*`

Recommended:

- `domain:*`
- `family:*`
- `action:*`
- `surface:cms-content`

## Links

- Use `requires` to link:
  - content_type → schema/recipe/style/channel where appropriate
- Requirements should reference entities explicitly (instruction/rubric/metric, etc.)

## Testing (Recommended)

- At least one "showcase asset version" seeded for each flagship content type so CMS development is not done in an empty system.
---

## Source: docs/atomic-units/DATA_VIZ_CONTRACT.md

# Data Viz Contract

Status: Active
Updated: 2026-02-19

This contract standardizes how data-visualization artifacts are represented so they remain reusable across orchestrators, domains, and distribution surfaces.

## Contract Intent

1. Keep data-viz assets evidence-backed, insight-rich, and visually memorable.
2. Keep modeling broad enough for cross-domain reuse.
3. Avoid ad-hoc demo pages with weak provenance.
4. Preserve optionality: guide discovery, do not hard-route orchestrators.
5. Use templates and proven patterns as first-class accelerators, then customize for stronger insight and visual identity.

## Canonical Storage Surface

- Canonical row: `assets`
- Canonical profile payload: `assets.content_profile_json`
- Optional linked rows:
  - `asset_versions` (versioned snapshots)
  - `asset_tags` + namespaced tags
  - `entity_links` (relationships to skills/kb/prompts/rubrics/tools)

## Required Profile Fields (Data Viz)

Store these keys in `content_profile_json`:

| Key                 | Type             | Why it matters                               |
| ------------------- | ---------------- | -------------------------------------------- |
| `title`             | string           | human-readable identity                      |
| `summary`           | string           | quick operator/orchestrator orientation      |
| `artifact_kind`     | string           | coarse type, e.g. `data_viz`                 |
| `source_url`        | URL              | upstream source or origin                    |
| `deployed_url`      | URL              | live destination                             |
| `methodology_notes` | string           | how transformations/assumptions were applied |
| `insight_goal`      | string           | what the viewer should understand or notice  |
| `limitations`       | string           | known caveats, constraints, or tradeoffs     |
| `data_freshness`    | string/date note | recency and update window                    |

## Recommended Fields

| Key                 | Type   | Why it helps                       |
| ------------------- | ------ | ---------------------------------- |
| `repo_url`          | URL    | reproducibility                    |
| `embed_url`         | URL    | fast reuse in docs/apps            |
| `preview_image_url` | URL    | gallery browsing                   |
| `chart_inventory`   | array  | quick chart-level map              |
| `claims`            | array  | explicit claims the viz supports   |
| `source_notes`      | array  | citation notes for critical claims |
| `surface_map`       | object | app/web/embed/vendor mapping       |

## Tagging Guidance

Minimum tag intent for publish-ready data-viz assets:

- `format:data_viz` or equivalent visual format tag(s)
- `format:reference` for example/showcase assets
- `topic:*` for subject area
- `source:*` when ingestion source is important
- `surface:*` when a distribution target is already known

Legacy non-namespaced tags should be normalized during enrichment waves.

## Linking Guidance

Data-viz assets should include at least two typed links where applicable:

- `uses_kb` to source methodology/context
- `uses_tool` for rendering/data tools
- `related` or `pairs_with` to adjacent examples/templates
- `recommends` for next-step companion assets

Link reasons should be concise and insight-useful.

## Quality Criteria (Publish Readiness)

1. Provenance is explicit and non-fabricated.
2. Methodology and limitations are documented.
3. Visual outputs are readable and labeled clearly.
4. The artifact communicates a clear insight or concept.
5. Metadata supports reuse across more than one surface.
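The required-fields table above is a checkable contract. A minimal sketch of a completeness gate over a `content_profile_json` payload; the key list comes from the table, while the function name and empty-value policy are assumptions.

```typescript
// Required content_profile_json keys for data-viz assets, per the
// "Required Profile Fields" table in this contract.
const REQUIRED_PROFILE_KEYS = [
  "title",
  "summary",
  "artifact_kind",
  "source_url",
  "deployed_url",
  "methodology_notes",
  "insight_goal",
  "limitations",
  "data_freshness",
] as const;

// Return the required keys that are missing or empty in a profile payload.
// Treating "" as missing is an assumption, not part of the contract.
function missingProfileKeys(profile: Record<string, unknown>): string[] {
  return REQUIRED_PROFILE_KEYS.filter((k) => {
    const v = profile[k];
    return v === undefined || v === null || v === "";
  });
}
```

Running this as a publish-readiness precheck makes criterion 1 and 2 above (provenance and methodology present) fail fast instead of failing in review.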
## Rubric + QA Baseline (Phase 25.4)

Use these dimensions for publish reviews and A/B evaluations:

1. Evidence quality and source clarity.
2. Visual readability (labels, legends, axis titles, annotation clarity).
3. Accessibility (contrast, alt text/caption, interaction accessibility where relevant).
4. Insight clarity and novelty (the key takeaway is clear and non-generic).
5. Narrative coherence (story flow from context to implications).
6. Visual craft and wow factor (distinctive, engaging, and high polish).

### Publish review checklist

- [ ] Provenance and source URLs are present.
- [ ] Methodology notes explain transformation assumptions.
- [ ] Labels, legends, and titles are readable and non-ambiguous.
- [ ] Accessibility checks are documented for visual contrast and textual alternatives.
- [ ] Insight goal is explicit and audience-relevant.
- [ ] Template baseline is intentionally customized (story, hierarchy, visual accents) instead of copy-paste generic.
- [ ] Asset is reusable across at least two surfaces without schema rewrite.

## Content Type Decision Rule

Default: use existing visual/content contracts unless fit assessment proves gaps.

Create a dedicated `data-viz` content type only when all are true:

1. Existing visual content types cannot express required fields cleanly.
2. Existing schema validation cannot encode quality constraints needed.
3. Operators consistently misclassify data-viz assets with current types.
4. Migration cost is justified by measurable quality or speed gains.

This rule is a duplicate-prevention gate. Absence of this evidence means no new content type.

## Anti-Patterns

- Publishing charts with no provenance/methodology note.
- Writing narrow, repo-specific instructions into globally reusable data-viz units.
- Copy-pasting templates without adapting them to the audience, insight, or visual narrative.
- Inflating confidence without evidence notes.
## References (Feb 2026 Baseline)

- USWDS Data Visualization
- W3C WCAG Overview
- Vercel Sandbox docs
- Vercel v0 docs
- Vercel v0 integration docs

---

## Source: docs/atomic-units/DECISION_MATRIX.md

# Atomic Unit Decision Matrix

If you only load one atomic-unit doc for routing, load this one.

Use this doc when the hard part is not writing content, but deciding what the thing should be in Katailyst and where it should live. This is the cross-unit routing layer. Keep [NOMENCLATURE](../references/NOMENCLATURE.md) as vocabulary and keep the per-unit docs as type-specific rules.

Katailyst uses **atomic unit type** as the broader system term. The current `registry_entities.entity_type` enum is the registry subset of that ontology. Assets, automations, runs, and traces are still atomic unit types even when they live in their own operational or delivery tables.

Do not force this matrix to preserve a fixed historical unit count. The matrix exists to help the system reflect reality cleanly. If the current product and DB truth support richer styles, operational units, or new recurring surfaces, the doctrine should be updated to match that reality rather than squeezing everything back into an outdated simpler shape.

## One Front Door Per Capability Family

Keep the system legible.
- one strong front-door index per capability family
- deeper nuance in linked artifacts or linked units
- no parallel "start here" docs unless they serve meaningfully different jobs or runtimes

Examples:

- hosted super-agent docs: one operating index, then persona and runtime references beneath it
- GitHub family: one top-level GitHub surface, then narrower issue or repo layers beneath it
- Firecrawl family: one clear retrieval surface, then narrower scrape/crawl primitives or references beneath it

## Fast Matrix

| Unit | Choose it when | Do not choose it when | Canonical payload | Discovery role |
| --- | --- | --- | --- | --- |
| `skill` | The value is a reusable method, reasoning pattern, or execution guide | The value is only raw facts, only a callable capability, or only a relationship | launcher + layered artifacts | triggerable method surface |
| `tool` | The value is an executable capability or integration surface | The value is mainly guidance on how to use a capability well | capability contract + runtime/executor truth + key operations | callable primitive |
| `kb` | The value is durable truth, doctrine, audience context, setup guidance, or reference knowledge | The value is a procedural workflow or triggerable method | canonical reference content | context / truth surface |
| `prompt` | The value is a reusable instruction contract or request fragment | The value needs a full method, graph, or lifecycle of its own | prompt structure + output contract | instruction fragment |
| `schema` | The value is a structure contract for input/output validation | The value is tone, workflow logic, or broader publishing/editor behavior | structural contract | validation / shape surface |
| `content_type` | The value defines what kind of output/editor/publishing contract exists | The value is only tone, only a preset, or only workflow guidance | output contract + required ops | editor / publishing surface |
| `style` | The value changes how something sounds or looks without redefining the output kind | The value is the base output contract or a multi-step workflow | verbal/visual overlay + constraints + linked examples when needed | style overlay |
| `recipe` | The value is a reusable preset that combines schema/style/channel/constraints | The value is the base schema or a larger workflow pattern | preset binding | ready-to-run preset |
| `bundle` | The value is an unordered context pack that belongs together | The value needs ordered steps or adaptive sequencing | grouped membership rationale | context grouping |
| `playbook` | The value is a reusable ordered or adaptive workflow across units | The value is just one method or just a context pack | ordered workflow pattern | workflow surface |
| `action` | The value is a surfaced operator entry to a featured playbook | The value introduces new canonical logic separate from the playbook | curated playbook surface | fast-start workflow |
| `automation` | The value is a schedule plus execution state for a playbook/action | The value is reusable guidance or ontology truth | scheduled run record | recurring execution |
| `asset` | The value is a produced output with its own lifecycle, version history, or delivery state | The value is reusable method, canon doctrine, or a local support file | asset row + asset versions + publish history | output / delivery surface |
| `artifact` | The value is local support material for one unit revision | The value should be discoverable and reusable as a first-class node | revision-scoped file | packaged depth |
| `link` | The value is just a relationship between first-class units | The value contains content that should be read directly | typed edge + reason + weight | graph traversal |
| `agent` | The value is a persona/runtime identity with preferences and linked operating context | The value is just a general method or imported "agent" pattern | persona + proclivities + links | selection/tone bias |
| `plugin/export` | The value is distribution packaging for external runtimes | The value is canonical authoring truth | generated export only | distribution surface |
| `memory/log/trace` | The value is dated operational continuity or run evidence | The value should guide many future tasks as durable truth | operational record | continuity / audit |

## Routing Questions

Ask these in order.

### 1. Is it content or a relationship?

- If the value is only "this thing should point to that thing," make it a `link`.
- If the value contains real content or guidance, keep going.

### 2. Is it first-class truth or local supporting material?

- If it should be reused across many units, make it a first-class unit such as `kb`, `skill`, `tool`, `bundle`, or `playbook`.
- If it only exists to support one specific unit revision, make it an `artifact`.

Rule of thumb:

- `artifact` = file attached to one unit revision
- `link` = relationship between first-class rows
- `kb` = first-class truth node

### 2.1 Is it reusable truth, runtime steering, or operational continuity?

- **Use `kb`** for reusable truth:
  - domain knowledge
  - product docs
  - reusable examples
  - stable personal/profile context after distillation
- **Use `agent_doc`** for runtime steering:
  - identity docs
  - standing instructions
  - agent-specific operating overlays
- **Use `operational_log` / memory / trace** for continuity:
  - session recap
  - run chronology
  - daily notes
  - raw memory before distillation

Fast test:

- "Should future work repeatedly load this because it is true?" → `kb`
- "Should this shape how one agent behaves?" → `agent_doc`
- "Is this mainly a dated record of what happened?" → `operational_log` or memory

### 3. Is it capability or methodology?
- If it answers “what can be called or executed?”, make it a `tool`. - If it answers “how should this work be approached?”, make it a `skill`. If the item teaches effective use of a capability, the right split is usually: - `tool` for the executable primitive - `skill` for the operator methodology - `schema` for the structure contract if typed output matters ### 4. Is it one method or a multi-step operating pattern? - If it is one reusable method, make it a `skill`. - If it coordinates several skills/tools/KBs in an ordered or adaptive flow, make it a `playbook`. - If the value is just the grouped context, not the sequence, make it a `bundle`. ### 5. Is it durable context or immediate instructions? - Use `kb` for durable truth, principles, setup, audience, brand, doctrine, or reference material. - Use `prompt` for reusable instruction fragments or output contracts that should version independently. - Use `schema` when the real value is the shape contract that validates or structures the output. ### 5.5 Is it the output contract, the style overlay, or the ready-to-run preset? - Use `content_type` when you are defining what kind of output/editor contract exists. - Use `style` when you are defining how that output should feel, sound, or look. - Use `recipe` when you are packaging a reusable preset that binds schema/style/channel/constraints together. If the thing mainly changes tone, visual language, components, or motion, it is usually `style`, not `content_type`. If the thing mainly chooses a best-known combination of existing units, it is usually a `recipe`, not a new `skill`. ### 5.5a Is it the reusable contract, or the produced output? - Use `asset` when the thing should persist as the actual output with its own lifecycle, validation state, version history, review state, or publish state. - Use `content_type`, `recipe`, `style`, and `playbook` for the reusable contracts and methods around that output. 
- Use `artifact` when the file only supports one unit revision and does not deserve an independent lifecycle. ### 5.6 Is it a surfaced workflow, or a scheduled run of that workflow? - Use `playbook` for the reusable workflow pattern. - Use `action` when that playbook should appear as a featured operator-facing surface such as Launchpad. - Use `automation` when the system should schedule that action/playbook repeatedly and keep run state/history. ### 5.7 Is it durable truth or just operational continuity? - Use `kb` when the content should still guide many future tasks. - Keep it as `memory/log/trace` when it mainly records what happened in one run, one day, or one session. ### 6. Is “agent” actually the right word? Do not let imported material use the word `agent` loosely. Usually imported “agent” packages are actually one of: - a local subagent pattern - a tool-operator skill - a workflow skill - runtime guidance for Claude Code/Codex HLT fleet agents are different: - Victoria - Julius - Lila - Ares Those are hosted personas with linked operating context, not generic reusable skill packages. Claude Code subagents are usually not first-class `agent` rows here. They are more often: - a skill family - a runtime-specific artifact - or an imported specialization that should sit beneath a stronger front door ## Placement Rules ### When to make it a KB Choose `kb` when the value is: - facts - setup truth - doctrine - audience context - brand/context packets - reference material an agent should consult repeatedly - a compact operating index that routes a super-agent into the right deeper docs - a distilled lessons-learned surface that should influence many future tasks Do not turn reusable truth into a local artifact just because one skill uses it. 
Useful KB flavors in this repo: - `format:policy` for non-negotiables and operating rules - `format:persona_profile` for identity/voice grounding - `format:reference` for runtime, tool, product, or system context - `format:best_practice` for portable principles worth reusing - `family:agent-files` when the KB is a reusable agent-facing operating file such as lessons, principles, architecture/spec notes, repo/operator orientation references, team context, or workspace/method references - `format:plan` when the KB is live reusable work-state rather than a historical recap - keep mission, research posture, and shared Katailyst usage rules in `family:agent-doctrine` instead of broadening `agent-files` into a catch-all doctrine bucket - `format:guideline` for applied but still reusable guidance ### When to make it an artifact Choose an `artifact` when the file belongs to one unit revision, such as: - examples - templates - rules - tests - setup notes specific to that unit Examples rule: - local worked example for one entity → artifact - reusable example many entities should discover → `kb` with reusable example shape Test/eval rule: - revision-local test fixtures and eval inputs/results → artifacts - first-class reusable eval case/rubric/signal → dedicated registry unit Artifacts travel with the unit revision but do not become independent graph nodes. ### When to make it a playbook Choose `playbook` when: - order matters - handoffs matter - checkpoints matter - the agent should adapt sequence based on what it finds If order does not matter, it is probably a `bundle` instead. 
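The ordered routing questions above can be sketched as a small decision helper. This is an illustrative sketch only: the `route_unit` function, its field names, and its return codes are hypothetical conveniences, not repo tooling.

```python
# Hypothetical sketch of the routing questions, asked in order (1 → 2 → 2.1 → 3 → 4).
# Field names and return values are assumptions for illustration, not a real schema.

def route_unit(item: dict) -> str:
    """Classify a candidate item by walking the routing questions in order."""
    # 1. Content or relationship?
    if item.get("is_relationship_only"):
        return "link"
    # 2. First-class truth or local supporting material?
    if item.get("supports_single_revision"):
        return "artifact"
    # 2.1 Reusable truth, runtime steering, or operational continuity?
    if item.get("is_dated_record"):
        return "operational_log"
    if item.get("steers_one_agent"):
        return "agent_doc"
    # 3. Capability or methodology?
    if item.get("is_executable_capability"):
        return "tool"
    if item.get("is_reusable_method"):
        # 4. One method, or a multi-step operating pattern?
        if item.get("is_ordered_workflow"):
            return "playbook"
        return "skill"
    # 5. Durable context is the default lane.
    return "kb"

print(route_unit({"is_relationship_only": True}))  # link
print(route_unit({"is_reusable_method": True}))    # skill
```

The order matters: relationship and artifact checks run first so that a dated run recap never reaches the skill/playbook branch, mirroring the "ask these in order" rule.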
### When to make it a style, recipe, or content type Choose `content_type` when: - the editor/publishing system needs to know what kind of asset this is - the row defines required ops, schema expectations, or publishing defaults Choose `style` when: - the row should shape voice, tone, visual system, components, color, typography, layout, spacing, motion, or presentation behavior - the output kind stays the same but the expression changes Choose `recipe` when: - the row should package a known-good combination of schema/style/channel/constraints - you want a strong preset without inventing a new schema or workflow Premium page family rule: - `playbook` = one flagship front door for the page family - `style` = compact direction such as operator/editorial, conversion/story, or briefing/report - `kb` = doctrine, anti-patterns, and quality expectations - `artifact` / reference = concrete examples and screenshots - `bundle` = a curated premium page kit when grouped context matters more than order ### When to keep it as memory, log, or trace Keep it operational when it is primarily: - a daily log - a session journal - a one-run recap - raw semantic memory - scratch planning notes Promote it only after it becomes durable truth, reusable method, or stable workflow guidance. ## Not Everything Should Become a Registry Unit Some things should stay operational, generated, or local until they prove they deserve first-class registry identity. ### Content assets - Articles, images, landing pages, videos, slide decks, and similar outputs are usually **assets**: first-class atomic unit types for output and delivery. - The reusable logic still belongs in the registry-facing units around them: content types, recipes, styles, skills, tools, and playbooks. - The produced output belongs in assets/content records/runs unless it becomes reusable truth that should be distilled into `kb` or support material that should be attached as a revision `artifact`. 
- This includes `v0` and Lovable outputs: they are candidate rendered assets until reviewed, not canonical design truth by default. - A produced **asset** is not the same thing as a unit **artifact**. - asset = shipped output instance - artifact = support file attached to one unit revision ### Planning docs and daily logs - Daily logs, scratch planning notes, session journals, and one-off operator notes are usually **operational memory or artifacts**, not KB. - Promote them to `kb` only when they become durable truth that many units should reuse. - If a plan is still active and intentionally reused, keep it in KB with `format:plan` instead of flattening it into the log lane. - If they mainly document what happened in one run or one day, they belong in memory, assets, traces, or artifacts. ### Lessons learned - Distilled lessons that should shape many future tasks can become `kb`. - Distilled agent-facing lessons, principles, architecture/spec notes, repo/operator orientation references, and team/workspace references that should shape hosted agent behavior fit the `agent_doc` lane. - Raw postmortems, one-off retros, and dated worklogs should stay in memory/log/trace surfaces until they are distilled. ### Agent docs Agent-linked docs usually belong to `agent_doc`, not to agent-local artifacts, when they express reusable runtime steering such as: - `format:policy` for operating rules and non-negotiables - `format:persona_profile` for persona/identity grounding - `format:reference` or `format:best_practice` for runtime/tool/method context Keep general methodology/reference knowledge in `kb`, and keep agent-local artifacts for packaging or export depth, not for the whole operating brain. ### Plugin / export packaging - Plugins package units for outside runtimes. - They are not the place to decide ontology. - If the same content is useful in the registry graph, make it a first-class unit or linked KB first. 
### Actions and automations - `playbook` holds the reusable workflow pattern. - `action` is the surfaced operator-facing entry to that playbook. - `automation` is the scheduled operational atomic unit type that points at the action/playbook and emits traces. ## What Good Looks Like Use these quality bars when deciding whether a unit is actually mature enough to publish. ### Good skill - triggerable and legible from `name` + `summary` + `use_case` - teaches a reusable method, not just facts - links to the KB, tools, playbooks, and adjacent skills that materially improve it - carries enough artifacts to be usable without becoming a monolith ### Good KB - deepens judgment without forcing one sequence - acts as durable truth, doctrine, setup guidance, or audience/business context - is written to be cited and reused across many units - avoids becoming a disguised step-by-step workflow unless that is truly the point ### Good tool - describes a callable capability truthfully - states runtime, provider, auth, and risk clearly - makes key operations/endpoints legible - points to the companion skill or KB that makes it usable in practice - does not pretend to be a method guide or best-practice playbook ### Good style - makes the intended feel legible from summary, use case, and overlays - can express verbal, visual, or mixed systems without redefining the output contract - links to deeper KB/examples/assets when brand or design nuance is substantial - avoids becoming a disguised content type or a giant doctrine dump ### Good playbook - captures sequence, checkpoints, and adaptation logic across multiple units - helps an agent or operator know what should happen next - stays workflow-oriented instead of bloating into one giant method skill - acts as the flagship workflow for its job family instead of competing with several peer playbooks ### Good bundle - groups context that genuinely belongs together - improves discovery and reuse without imposing order - is a pack, not a 
hidden playbook ### Good recipe - packages a known-good mix of schema/style/channel/constraints - improves speed and consistency without creating a new schema fork - stays composable instead of hiding a large workflow ### Good content type - clearly defines what kind of output is being produced - owns editor/publishing expectations - does not absorb style or playbook logic that belongs elsewhere ### Good automation - clearly points at an Action/playbook - makes schedule and run history inspectable - stays operational instead of pretending to be reusable methodology ### Good asset - clearly points back to the reusable registry units that shaped it - makes version history and validation state inspectable - keeps publish/delivery history legible - is reusable as evidence or example without pretending to be canon doctrine ### Good artifact - clearly belongs to one unit revision - is supportive depth, not a hidden first-class node - would be awkward or noisy if promoted into its own registry entity ## This vs That ### Skill vs KB - choose `skill` when the value is "how to do the job" - choose `kb` when the value is "what is true, what matters, what to keep in mind" Example: - `copywriting` = skill - HLT audience, brand voice, product/business context = linked KB ### Skill vs Playbook - choose `skill` for one reusable method - choose `playbook` when that method is only one step in a larger adaptive sequence Example: - `brainstorming` = skill - a longer research -> brainstorm -> decide -> execute -> review flow = playbook ### Bundle vs Playbook - choose `bundle` when grouped context should travel together - choose `playbook` when order, handoffs, or checkpoints matter ### Content Type vs Style vs Recipe - choose `content_type` for the output/editor contract - choose `style` for tone, visual language, or component system overlays - choose `recipe` for a reusable preset that binds the two Rich design systems, brand modes, presentation modes, and component-system overlays 
still belong in `style` when they primarily change expression rather than the output kind. ### Playbook vs Plugin / Export - choose `playbook` when the value is a reusable workflow pattern inside the registry - choose `plugin/export` when the value is packaging selected units for an external runtime or install surface Playbooks are canonical workflow logic. Plugins are distribution. ### Artifact vs KB - choose `artifact` when the file belongs only to one unit revision - choose `kb` when the content should be reusable and discoverable across the graph ### Asset vs Artifact - choose `asset` when the thing is a produced output with its own lifecycle, version history, or publish state - choose `artifact` when the file only supports one revision of another unit ### Artifact vs Link - choose `artifact` when there is real file content to read - choose `link` when the value is only the relationship and rationale ### Tool vs Skill - choose `tool` for executable capability - choose `skill` for the operator method that tells an agent when and how to use that capability well Example: - `github-mcp` = capability-facing skill/tool-method surface - GitHub API/MCP server itself = tool layer - `gh-issues` = narrower triage layer beneath GitHub ### Agent vs Skill - choose `agent` for persona, proclivities, and defaults - choose `skill` for a reusable capability or method Agents should bias selection and tone. They should not be the only place where capability knowledge lives. 
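The Artifact vs Link rule above hinges on the `link` shape being "typed edge + reason + weight" with no content of its own. A rough model of that row, with field names assumed for illustration rather than taken from the real schema, could look like:

```python
# Rough sketch of a `link` row: a typed edge between two first-class units,
# carrying only a relationship type, a reason, and a weight.
# Field names are assumptions, not the actual registry schema.
from dataclasses import dataclass

@dataclass
class Link:
    source: str      # e.g. "skill:copywriting@v1"
    target: str      # e.g. "kb:brand-voice-master@v1"
    link_type: str   # e.g. "uses_tool", "requires", "recommends"
    reason: str      # why the relationship exists
    weight: float    # traversal / ranking hint

edge = Link(
    source="skill:copywriting@v1",
    target="kb:brand-voice-master@v1",
    link_type="recommends",
    reason="Copy work should load brand voice context.",
    weight=0.8,
)
print(edge.link_type)  # recommends
```

If you find yourself wanting a body of readable content on this record, that is the signal from the rule above: it should be an `artifact` or first-class unit instead.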
### Super-Agent Index vs Runtime Docs For hosted fleet agents, keep a distinction between: - **front-door operating index** - how the agent should respond, route intent, choose depth, and pick the next pack - this is the `AGENTS.md` / `agent-sop-*` layer - **identity docs** - `SOUL.md`, `USER.md`, `IDENTITY.md` - these define who the agent is, who it serves, and its compact self-model - **runtime docs** - `TOOLS.md`, `BOOTSTRAP.md`, `HEARTBEAT.md`, `MEMORY.md` - bootstrap, tools, memory, cron, deployment, and substrate details - **lessons docs** - shared and per-agent lessons that exist specifically to prevent repeated mistakes If a document mainly helps the agent decide what to do next for the current request, it belongs in the operating-index layer. If a document mainly explains how the runtime works or how to recover/operate the host, it belongs in the runtime-reference layer. Preferred naming pattern for hosted fleet agents: - `agent-sop-{agent}` = front-door operating index - `{agent}-identity-soul` = mirrored `SOUL.md` - `{agent}-identity-user` = user/business/persona context - `{agent}-identity-id` = mirrored `IDENTITY.md` - `{agent}-identity-tools` = runtime/tool substrate truth - `{agent}-identity-agents` = coordination/delegation rules - `{agent}-identity-memory` = distilled reusable memory - `{agent}-identity-bootstrap` = startup / recovery reference - `{agent}-identity-heartbeat` = recurring monitoring / cron doctrine - `agent-lessons-{agent}` = durable lessons learned ### KB vs Daily Log / Planning Note - choose `kb` when the content should stay true and reusable across many future tasks - keep it as memory, trace, asset, or artifact when it is mainly a dated record of what happened ### Playbook vs Action vs Automation - choose `playbook` for the reusable workflow pattern - choose `action` when the playbook should be a featured user-facing entry - choose `automation` when the system should schedule it and keep execution history Major job families 
should usually have one flagship playbook, with narrower skills, bundles, and recipes beneath it. ### Style vs Recipe vs Content Type - choose `style` for voice, visual system, tone, color, typography, component, or formatting overlays - choose `recipe` for a reusable combination of schema/style/constraints - choose `content_type` for the editor/publishing contract that defines what kind of output is being produced ## MCP / Capability Integration Routing MCP servers are **tools**, not skills. When an MCP server reference arrives (from imports, from Anthropic, from GitHub): 1. Create a `tool` entity for the server capability (`tool_type: mcp`, `provider: *`) 2. If there is meaningful operator methodology, create a companion `skill` for how to use it well 3. If there is setup/troubleshooting content, create a companion `kb` 4. Link the skill to the tool with `uses_tool` **Anti-pattern**: Do NOT classify an MCP server as `entity_type: skill` with tag `tool_type:mcp-server`. That is a semantic mismatch. **Umbrella pattern**: When multiple tools serve the same capability domain (e.g., image generation via fal, DALL-E, Midjourney), prefer one umbrella skill that links to the tool variants rather than N peer-level skills. See `classification-rules.json` for machine-readable routing rules. ## Imported Shape Rules Imported skills are allowed to arrive in different valid shapes. Do not force fake uniformity. 
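The four MCP integration steps above can be sketched as plain records. This is a hedged illustration: the dict shapes, the `register_mcp_server` helper, and the field names are assumptions, and the real registry rows may differ.

```python
# Sketch of the four MCP routing steps as plain dicts.
# Entity and link shapes are illustrative only; the canonical schema may differ.

def register_mcp_server(name: str, provider: str) -> dict:
    tool = {"entity_type": "tool", "code": name,
            "tool_type": "mcp", "provider": provider}          # step 1: tool entity
    skill = {"entity_type": "skill", "code": name,
             "summary": f"Operator methodology for {name}"}    # step 2: companion skill
    kb = {"entity_type": "kb", "code": f"{name}-setup",
          "summary": f"Setup and troubleshooting for {name}"}  # step 3: companion kb
    link = {"source": f"skill:{name}", "target": f"tool:{name}",
            "link_type": "uses_tool",                          # step 4: skill -> tool edge
            "reason": "Skill teaches effective use of the tool."}
    return {"tool": tool, "skill": skill, "kb": kb, "link": link}

rows = register_mcp_server("github-mcp", "github")
print(rows["tool"]["tool_type"])   # mcp
print(rows["link"]["link_type"])   # uses_tool
```

Note how the anti-pattern is avoided by construction: the server itself is only ever an `entity_type: tool` row, never a skill tagged `tool_type:mcp-server`.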
Approved curated/published skill profiles: - `method_pack` - operator-facing reusable method - example: `copywriting`, `brainstorming` - `rule_corpus` - imported best-practice corpus with many leaf rules - example: `vercel-react-best-practices`, `next-best-practices` - `runtime_adapter` - runtime-specialized guidance - example: `openclaw-skill-creator` - `capability_reference` - imported API/MCP reference that mixes capability description with operator guidance - example: `github-mcp`, `linear` - use when material is too methodology-heavy for pure `tool` but too capability-focused for `method_pack` - consider splitting into `tool` + `skill` if capability and methodology are clearly separable Imported items should preserve the upstream portable core, then add Katailyst overlays for: - discovery - runtime truth - HLT relevance - graph placement ## Practical Examples ### Prime repo-backed examples by job - **Need one reusable method** - `skill:copywriting@v1` - `skill:brainstorming@v1` - `skill:meeting-prep@v1` - **Need one durable context packet** - `kb:brand-voice-master@v1` - `kb:content-performance-playbook@v1` - `agent_doc:agent-sop-victoria@v1` - **Need one top-level capability surface** - `skill:github-mcp@v1` - `skill:firecrawl@v1` - `skill:linear@v1` - **Need one instruction contract** - `prompt:blog-post-system@v1` - **Need one structure contract** - `schema:meeting_briefing_v1@v1` - `schema:web_page_v1@v1` - **Need one output/editor contract** - `content_type:meeting-briefing@v1` - **Need one style overlay** - `style:leadership_briefing@v1` - `style:devtool_operator_console@v1` - `style:viral_social@v1` - **Need one ready-to-run preset** - `recipe:meeting-briefing-web@v1` - **Need a bigger adaptive workflow** - `playbook:meeting-briefing-research-report@v1` - `playbook:registry-health-scan@v1` - `playbook:blog-production@v1` - **Need grouped context, not order** - `bundle:meeting-briefing-kit@v1` - `bundle:launchpad-core@v1` - **Need a hosted super-agent 
front door** - `agent_doc:agent-sop-victoria@v1` - then `agent_doc:victoria-identity-user@v1`, `agent_doc:victoria-identity-tools@v1`, `agent_doc:victoria-identity-agents@v1` ### GitHub - `github-mcp` = top-level GitHub operator skill - `gh-issues` = narrower issue-triage layer beneath GitHub ### Firecrawl - `firecrawl` = top-level web retrieval method/capability surface - scrape/crawl specifics = tools or narrower artifacts beneath it ### Writing - `copywriting` = skill - HLT voice and audience context = linked KB - copy templates and examples = artifacts on the skill ### Planning - `brainstorming` = skill - `writing-plans` = adjacent skill - `executing-plans` = adjacent skill - larger coordinated sequence = playbook ### Meeting workflows - `meeting-prep` = skill for the reusable method - `playbook:meeting-briefing-research-report@v1` = larger research/report workflow when order and checkpoints matter - a meeting-prep resource kit = bundle if the value is grouped context, not sequence ### Agent docs - `agent_doc:victoria-identity-bootstrap@v1` = runtime steering doc with `format:policy` / `format:reference` flavor - `agent_doc:victoria-identity-memory@v1` = reusable operating context, not generic KB - `agent_doc:agent-sop-victoria@v1` = front-door operating index for a hosted super-agent - one day of notes or a run recap = `operational_log` / memory / trace, not KB ### Packs and surfaced workflows - `bundle:launchpad-core@v1` = surfaced pack of featured playbooks/actions - `bundle:blog-writing-kit@v1` = unordered content kit - `action` = surfaced playbook entry, not a new canonical workflow type ### Recurring operations - nightly registry hygiene = automation pointing to a playbook/action - execution trace and run output = memory/log/trace, not a new KB by default ### Registry stewardship loop If the goal is "keep improving the registry over time with a small number of entry points," use this stack: - `skill:registry-discovery-primer@v1` for discovery-first 
shortlist building - `playbook:registry-health-scan@v1` for the ordered hygiene loop - `playbook:suggest-links@v1` when graph enrichment is the specific job - `weekly-registry-health-scan` for scheduled execution Do not solve this by adding a second giant ontology document or a dozen peer "registry helper" docs. The right shape is one front-door method, one reusable playbook, and one automation. ## Plugin / Export Clarification Plugins are not an atomic unit type. In this repo, plugin/export means: - generated distribution surface - DB-canonical source feeding a portable filesystem/package output - packaging of skills, agents, commands, hooks, and related files Do not treat plugin packaging as canonical authorship. ## Discovery Defaults Discovery should favor: - one strong top-level capability surface - linked supporting KB/tools/playbooks beneath it - explicit reasons in links Avoid large families of near-duplicate peer rows when hierarchy would be clearer. --- ## Source: docs/atomic-units/EVAL_CASES.md # Eval Cases (Rules + Expectations) Eval cases define **repeatable benchmark tasks** for the operator lab. They are registry-native fixtures that describe what to run, how ready the stack is to run it, and which rubric should judge it. 
## Canonical vs Portable Surfaces

- **Canonical:** Supabase DB (`registry_entities` + `entity_revisions`)
- **Portable surface (today):** registry exports / packs that carry the `eval_case` revision payload

## When to Use

- tracking whether the system is actually getting better over time
- comparing prompt stacks, skills, bundles, or workflows against the same benchmark
- recording runnable-now vs blocked capability coverage without creating bespoke dashboards per experiment
- feeding evaluation results back into discovery ranking through the existing eval ledger (`runs`, `run_steps`, `run_outputs`, `evaluations`, `eval_signals`)

## Required Revision Contract

Minimum `entity_revisions.content_json` shape:

```json
{
  "eval_case": {
    "prompt": "Write an investor-facing summary of HLT and one concrete demo.",
    "tier": "T1",
    "complexity_band": "medium",
    "primary_phases": ["interpret", "explore", "execute"],
    "rubric_ref": { "entity_type": "rubric", "code": "pipeline-5-phase", "version": "v1" },
    "expected_entity_refs": [],
    "task_specific_checks": ["Names real products", "Frames the demo concretely"],
    "capability_dependencies": ["registry_discovery"],
    "readiness": { "status": "runnable", "note": "Runnable now." },
    "run_mode": "manual_capture"
  }
}
```

## Design Principles

- **Reuse the existing ledger:** eval cases define fixtures; they do not get their own bespoke run/evidence subsystem.
- **Measure the system, not one prompt:** evaluate workflows, bundles, prompt stacks, and other composable units against the same case over time.
- **Readiness is explicit:** use `runnable`, `partial`, and `blocked` honestly so the lab can distinguish current capability from future targets.
- **Discovery-friendly, not hard-routed:** use links and expected refs as hints, not rigid orchestration.
- **Operator-facing:** prompts and checks should be legible enough for a human to understand why a case exists and what good looks like.
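A minimal checker over the `content_json` shape above could enforce the promotion checklist mechanically. The required-field list and helper name here are assumptions drawn from the example payload, not an authoritative schema.

```python
# Minimal sketch of a readiness check over the eval_case revision contract.
# REQUIRED mirrors the example payload above; it is not the canonical schema.

REQUIRED = ["prompt", "tier", "rubric_ref", "readiness", "run_mode"]
VALID_READINESS = {"runnable", "partial", "blocked"}

def check_eval_case(content_json: dict) -> list:
    """Return a list of problems; an empty list means the case looks promotable."""
    case = content_json.get("eval_case", {})
    problems = [f"missing field: {f}" for f in REQUIRED if f not in case]
    status = case.get("readiness", {}).get("status")
    if status not in VALID_READINESS:
        problems.append(f"invalid readiness status: {status!r}")
    if not case.get("task_specific_checks"):
        problems.append("no task-specific checks")
    return problems

good = {"eval_case": {"prompt": "Summarize HLT.", "tier": "T1",
                      "rubric_ref": {"code": "pipeline-5-phase"},
                      "readiness": {"status": "runnable"},
                      "run_mode": "manual_capture",
                      "task_specific_checks": ["Names real products"]}}
print(check_eval_case(good))  # []
```

Flagging an empty `task_specific_checks` list directly encodes the anti-pattern "writing vague prompts with no task-specific checks."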
## Link Guidance - Link bundles to eval cases with `bundle_member` when packaging a suite. - Canonical direction: `eval_case -> bundle` - Link eval cases to their required rubric with `requires`. - Use `recommends` or `pairs_with` for helpful supporting prompts, skills, or bundles when a case regularly benefits from them. ## Anti-Patterns - Creating a separate eval-case run table instead of reusing the canonical eval ledger. - Encoding one-off orchestration steps that only make sense for a single experiment. - Marking blocked cases as runnable just to inflate coverage. - Writing vague prompts with no task-specific checks. ## Promotion Checklist - Prompt is specific enough to produce comparable outputs. - Readiness status reflects reality. - Rubric ref is present and valid. - Task-specific checks capture what success actually looks like. - Capability dependencies describe why the case is blocked or partial when applicable. --- ## Source: docs/atomic-units/KB.md # KB (Rules + Expectations) KB items are **long‑form reference context**. They can be very long. There is **no length limit**. ## When to Use - when an agent needs **domain expertise** or best‑practice guidance - when instructions would be too long for a prompt - when you want a reusable, citeable reference packet ## What KB Is (and Is Not) - KB is contextual reference, not an imperative task runner. - KB should deepen decision quality while preserving orchestrator freedom. - Skills define procedures; KB explains concepts, constraints, examples, and adaptation patterns. - Shared KB should avoid single-use lock-in unless explicitly required by source truth. 
### KB vs Nearby Unit Types - `kb` vs `skill` - KB = durable truth or context - skill = reusable method - `kb` vs `artifact` - KB = first-class node worth linking and discovering - artifact = local supporting file for one revision - `kb` vs `prompt` - KB = reference context that can inform many prompts - prompt = executable instruction contract If the content reads like “Step 1, Step 2, Step 3,” it may actually be a skill or playbook with linked KB support. ## Reusable KB Classes Not all KB should be treated as one flat bucket. The important reusable classes are: - **Domain KB** — subject-matter truth, reference material, concepts, exam/product/domain knowledge - **Product docs** — product-specific truth, feature behavior, setup details, constraints, integration notes - **Reusable examples** — first-class examples that multiple entities should link to directly - **Profile-style KB** — stable personal or audience profiles that should guide future work repeatedly These are all still `kb`, but they are not the same job and should not be authored as if they were interchangeable. ### Personal memory vs KB Use KB only for the **stable distilled layer**: - long-lived user preferences - durable audience or persona profiles - persistent product/account context that many future tasks should load Do **not** use KB for: - raw session recap - day-by-day notes - temporary work journal - noisy operational continuity That material belongs in memory, logs, traces, or `operational_log` until it is distilled. ### Product docs vs domain KB - **Product docs** answer: “what is true about this product, feature, workflow, integration, or offering?” - **Domain KB** answers: “what is true about the field, audience, concept, topic, exam, or market?” Both are reusable truth. Keep them separate in authorship and linking even when both remain `kb`.
## Low-Surface-Area Rule Do not solve a knowledge-shape problem by adding three competing KB front doors. Prefer: 1. one compact front-door KB for the capability family 2. linked deeper KBs for persona, runtime, doctrine, or examples 3. linked skills/playbooks/tools for method or execution This matters most for hosted super-agents and large doctrine families.
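The "reads like Step 1, Step 2, Step 3" test from the KB-vs-nearby-types comparison can be approximated with a simple heuristic. The regex and the threshold below are assumptions for illustration, not a rule the repo actually enforces.

```python
# Rough heuristic for the "reads like Step 1, Step 2, Step 3" test:
# flag KB drafts that look like a procedure and may belong in a skill
# or playbook instead. Pattern and threshold are illustrative assumptions.
import re

STEP_PATTERN = re.compile(r"(?im)^\s*(?:step\s+\d+|\d+\.)")

def looks_like_procedure(text: str, threshold: int = 3) -> bool:
    """True when the draft has enough step-like lines to smell like a workflow."""
    return len(STEP_PATTERN.findall(text)) >= threshold

kb_draft = "Brand voice favors plain verbs. Avoid hype. Cite real users."
skill_draft = "Step 1: gather context.\nStep 2: draft.\nStep 3: review."
print(looks_like_procedure(kb_draft))    # False
print(looks_like_procedure(skill_draft)) # True
```

A positive hit is a prompt for human review, not an automatic reclassification; some KB legitimately contains ordered examples.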
## What Good KB Looks Like - it improves judgment, not just recall - it includes concrete caveats, adaptation notes, or examples - it is reusable across multiple skills/playbooks/agents - it says what is true and what matters without hard-routing the operator - it links outward to the methods or tools that should use it - it does not fake asset-only preview metadata; KB completeness is package quality, not thumbnail fields ## Agent-Linked KBs Many agent docs should stay as KBs, not as agent-local artifacts, when they are reusable operating truth. Typical shapes: - `format:policy` for non-negotiables and operating rules - `format:persona_profile` for identity/voice/persona grounding - `format:reference` or `format:best_practice` for runtime, tool, or system guidance - `format:guideline` when the material is applied guidance but still reusable truth - `format:plan` for live reusable work-state that still belongs in the KB graph - `format:operational-log` for preserved daily/run/session continuity that should remain legible without pretending to be doctrine ### Agent Files Some KBs are not ordinary research docs and not raw logs. They are reusable agent-facing operating files. 
Use `family:agent-files` when the KB is: - lessons learned that should influence future runs - shared principles or policy for active agents - architecture notes that explain the live operating surface - team/stakeholder context agents repeatedly need while working - workspace or operating-method references agents repeatedly need while working - architecture/spec notes that explain how the active stack fits together - durable repo/operator orientation references agents repeatedly need while working Keep the distinction sharp: - `family:agent-files` = reusable operating context - `format:plan` = live reusable work-state - `format:operational-log` = dated recap or history record - `family:agent-doctrine` = shared operating rules such as mission, research posture, and Katailyst usage doctrine Do not move these into assets just because they are hybrid. If they should still be loadable later as context, they belong in KB. Repo-native examples: - `agent-sop-victoria` = front-door operating index - `victoria-identity-user` = persona profile - `victoria-identity-tools` = runtime/tool reference - `victoria-identity-memory` = distilled reusable operating memory, not a raw log - `global-team-context` = shared people/stakeholder operating reference - `linear-planning-methodology` = shared planning/workflow operating reference - `context-engineering-methodology` = shared context-preservation operating reference - `katailyst-spec-index` = shared repo/spec orientation reference - `agent-foundation-spec` = shared architecture/spec operating reference Examples from the current repo shape: - `victoria-identity-bootstrap` - `victoria-identity-tools` - `victoria-identity-memory` Those are now linked `agent_doc` surfaces that help a hosted agent reason repeatedly. They should not be collapsed into one giant artifact just because they belong to Victoria. 
### Protected Runtime Overlays Some legacy mirror paths still house the live runtime steering stack and need stricter handling than ordinary reference docs. Treat the active hosted-agent front doors and identity mirrors as protected runtime overlays when they are: - `agent-sop-*` front-door operating indexes - `*-identity-*` mirrored runtime files such as `SOUL`, `USER`, `IDENTITY`, `TOOLS`, `BOOTSTRAP`, `HEARTBEAT`, `MEMORY`, and `AGENTS` Operational rule: - keep them canonical in Katailyst - keep them DB-backed and discoverable - do not treat them like normal KB cleanup targets - do not bulk-trim, summarize, condense, or "rewrite for neatness" unless you are intentionally updating the runtime steering stack - review them through the dedicated Runtime Overlays view and protected-surface workflows They remain KB-backed mirrors because they are reusable, linked, and canon-backed. They stop being "ordinary KB" at the editing/governance layer because casual edits can destabilize live agent behavior. ## What Usually Stays Out of KB These are often useful, but they are not automatically KB: - daily logs - session scratch notes - one-off planning drafts - run recaps - raw semantic memory dumps Promote them into KB only when they become durable, reusable truth that multiple future tasks should load. If they need to remain visible for continuity, classify them as `format:operational-log` instead of pretending they are doctrine. Otherwise keep them as memory, traces, assets, or revision-local artifacts. If a planning document is still live, reused, and governs ongoing work, keep it in KB with `format:plan` instead of forcing it into the log lane. ## Lessons Learned vs Logs Use `kb` for lessons learned only after distillation. 
Good KB-shaped lessons learned:

- durable mistakes to avoid
- patterns that should influence many future runs
- reusable heuristics, not dated chronology
- reusable agent-facing operating corrections that should still be loaded later as context

Keep it out of KB when it is mostly:

- "what happened today"
- a run recap
- a scratch planning thread
- raw semantic-memory capture
- a work journal

## Naming Convention

New KBs follow the pattern: `{domain}-{function}-{specific}`

### Domains

| Domain | Covers |
| --- | --- |
| `strategy` | Founder/team strategic vision, company direction, market positioning |
| `brand` | Identity, voice, tone, design system, style guides |
| `channel` | Marketing channel tactics and best practices |
| `product` | Product configs, features, UX, pricing |
| `teaching` | Pedagogical approaches, content methodology |
| `audience` | User personas, segments, psychology |
| `market` | Competitive intelligence, industry data, exam trends |
| `agent` | Agent operating docs, identity, lessons, SOPs |
| `system` | Technical architecture, deployment, tooling |

### Functions

| Function | What it does for an agent |
| --- | --- |
| `vision` | Founder/team strategic direction. Changes agent intent. |
| `playbook` | Best practices, how to execute. Changes agent approach. |
| `voice` | Tone and language patterns. Changes agent output style. |
| `reference` | Facts, specs, configs, taxonomies. Provides ground truth. |
| `persona` | User profiles. Changes who agent writes for. |
| `identity` | Agent self-knowledge. |
| `lessons` | Operational memory, anti-patterns. |
| `insight` | Research synthesis, analyzed findings. |
| `guide` | Style guides, writing rules, design systems. |
| `hub` | Routing context — "when doing X work, here's what matters." |

Full naming convention reference: `docs/references/operations/KB_TAXONOMY_PROPOSAL.md`.

## KB Subtypes and Format Tags

| Subtype | format tag | Description |
| --- | --- | --- |
| Concept / Mental Model | `format:concept` | Frameworks, heuristics, "how the best people think about X" |
| Best Practice | `format:best_practice` | Proven patterns with do/don't guidance and examples |
| Business Context | `format:reference` | Product, market, competitive, business situational awareness |
| Audience/Persona | `format:persona_profile` | Who the audience is, what they care about |
| Brand/Voice | `format:style_guide` | How to sound, tone, brand guardrails |
| Channel Strategy | `format:guideline` + `channel:*` | How a channel works, what wins there |
| Doctrine/Policy | `format:policy` | Non-negotiable operating rules |
| Lessons Learned | `format:best_practice` + `topic:lessons` | Distilled operational experience |
| Technical Reference | `format:reference` | API guides, architecture, integration details |
| Product Insight | `format:reference` + `topic:product-insight` | Strategic product knowledge |

Format tag clarifications:

- `format:reference` = factual/technical context (API docs, architecture, specs)
- `format:guideline` = applied guidance that generalizes (how to approach X)
- `format:best_practice` = proven patterns with evidence (what works and what doesn't)
- `format:concept` = mental models and frameworks (how to think about X)
- `format:policy` = non-negotiable rules and governance

## Quality Depth Standard

- No one-line stubs for curated/published KB units.
- Include practical detail, not only definitions.
- Add examples, caveats, and adaptation notes where they improve decisions.
- For evolving ecosystems, cite current sources and include retrieval date context.
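The subtype-to-tag mapping in the table above can be held as a simple lookup. A minimal sketch (the subtype keys and function name are illustrative, not an existing Katailyst API; the tag values come from the table):

```typescript
// Maps KB subtypes (keys are illustrative slugs, not canonical codes)
// to the format/topic tags listed in the subtypes table.
const SUBTYPE_TAGS: Record<string, string[]> = {
  concept: ["format:concept"],
  best_practice: ["format:best_practice"],
  business_context: ["format:reference"],
  persona: ["format:persona_profile"],
  brand_voice: ["format:style_guide"],
  doctrine: ["format:policy"],
  lessons_learned: ["format:best_practice", "topic:lessons"],
  technical_reference: ["format:reference"],
  product_insight: ["format:reference", "topic:product-insight"],
};

// Returns the tags for a subtype, or an empty list for unknown subtypes.
function tagsForSubtype(subtype: string): string[] {
  return SUBTYPE_TAGS[subtype] ?? [];
}
```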
## Variant Contract

Each variant serves a distinct purpose with a clear size ratio:

| Variant | Purpose | Target Size | Ratio to Full |
| --- | --- | --- | --- |
| `snippet` | Quick refresh. "Should I load this?" | 2-8 sentences (50-200 tokens) | ~5-10% |
| `distilled` | Decision-grade briefing. Key principles + examples. | 0.5-1.5 pages (300-1500 tokens) | ~30-50% |
| `full` | Deep reference. Everything worth knowing. | 1-4 pages (1000-5000 tokens) | 100% |

**Snippet:** Plain text. What is this, when to load it, 1-2 key insights. No XML wrapper.

**Distilled:** Core principles with enough detail to act on. Key examples, anti-patterns, caveats. Standard XML context wrapper applied.

**Full:** Complete treatment with full examples, evidence, edge cases. Standard XML context wrapper applied.

Important:

- `full` is now the primary variant when only one body is present.
- `distilled` and `snippet` are **optional**, not mandatory.
- A KB does **not** need all three variants to be valid.

**KB.md:** Always equals the full variant content plus frontmatter. Lengths can be exceeded when useful; quality and clarity matter more than strict size targets. For front-door operating indexes, bias toward `distilled` first and link deeper KBs instead of turning the index into a monolith.

### Variant Quality Gate for Promotion

Cannot promote from `staged` to `curated` unless:

1. All three variants exist.
2. Snippet is genuinely shorter than distilled (>= 3:1 ratio).
3. Distilled is genuinely shorter than full (>= 2:1 ratio).
4. None of the variants are copy-paste of another.
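The promotion gate above is mechanical enough to sketch as code. This is an illustrative check only (the `VariantSizes`/`canPromote` names are not an existing API); it applies the >= 3:1 and >= 2:1 token-ratio rules between snippet/distilled and distilled/full:

```typescript
// Token estimates for the three variants of one KB.
interface VariantSizes {
  snippet: number;
  distilled: number;
  full: number;
}

// Returns true only if the staged->curated variant gate passes:
// all three variants exist, snippet is genuinely shorter than
// distilled (>= 3:1), and distilled is genuinely shorter than full (>= 2:1).
function canPromote(v: VariantSizes): boolean {
  const allExist = v.snippet > 0 && v.distilled > 0 && v.full > 0;
  const snippetRatioOk = v.distilled / v.snippet >= 3;
  const distilledRatioOk = v.full / v.distilled >= 2;
  return allExist && snippetRatioOk && distilledRatioOk;
}
```

The copy-paste rule (gate item 4) needs content comparison, not just sizes, so it is out of scope for a size-only check like this.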
## Standard XML Context Wrapper

For distilled and full variants loaded into agent context, use this standard wrapper:

```xml
This is background knowledge. Use it to enhance your judgment and situational awareness. Do not treat it as your primary instruction or make it the central focus of your response. Keep it in the back of your mind to improve decision quality.

{actual KB content here}
```

This wrapper enables agents to aggressively grab 10-20 KB snippets without over-anchoring on any one. The framing prevents the model from treating context as instruction. Token budget: 15-20 distilled KBs at ~800 tokens each = ~12-16k tokens.

Replace all legacy wrappers (``, ``, ``) with the standard `` wrapper when editing existing KBs.

## Required

- `KB.md` entrypoint (summary + when to use)
- `unit.json` metadata (tags, status, tier, provenance)

Thumbnail/preview note:

- KB rows do not need their own thumbnail field.
- Package completeness means tags, summary, use case, links, aliases, and usable variants.
- Preview/thumbnail/hero/OG metadata belongs to asset/content surfaces that already support fields like `preview_image_url`, `thumbnail_url`, `hero_image_url`, `og_image_url`, or `preview_url`.

## Optional but Preferred

- `length.tokens_est` — approximate token count for routing/bundling
- `system` — use only for official infra groupings (e.g., `HLT Corp Infra`)

## Current Storage Surface (Phase 0–1)

- **Single surface:** `.claude/kb/curated/` (nesting allowed)
- **Path is a view.** Identity is `entity_type + code + version` in `unit.json`.
- Tools must discover via **recursive scan** (`**/unit.json`) or a **generated manifest**.
- No dual‑layer authoring until DB sync exists.
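The recursive-scan discovery rule above (`**/unit.json` under a single surface, nesting allowed) can be sketched with Node's standard filesystem API. The function name is illustrative, not an existing Katailyst tool:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Walks a root directory and collects every unit.json path,
// equivalent to a **/unit.json glob. Identity still lives in the
// file contents (entity_type + code + version), never in the path.
function findUnitFiles(root: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      // nesting is allowed, so recurse into subdirectories
      found.push(...findUnitFiles(full));
    } else if (entry.name === "unit.json") {
      found.push(full);
    }
  }
  return found;
}
```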
## Recommended Structure - `KB.md` contains: - Summary - When to use - Key constraints and decision criteria (guidance-first) - Optional scoped rules (if the KB defines a protocol, keep it local and point to global rules) - **Implementation checklist** (only if the KB is meant to wire a system) - Source notes/citations when best-practice or factual claims can drift - Links to `references/` if large - Optional YAML front matter **derived from `unit.json`** (do not hand‑edit) - `references/` holds full docs, research notes, source material. ## Variants (Optional, Use When Helpful) - `variants/snippet.md` (short) - `variants/distilled.md` (medium) - `variants/full.md` (long reference) Not all KBs need variants. Use them only when they reduce cognitive load. ## Tag Coverage (Minimum) - `format:*` (reference / guideline / best_practice / plan / persona_profile for persona KBs / operational-log for preserved continuity) - `scope:*` - One of `domain:*` or `audience:*` - `source:*` Notes: - `format:*` is the granular classification surface (e.g. `format:api_reference`, `format:how_to`, `format:checklist`). - DB `kb_items.item_type` is intentionally coarse (reference/guideline/policy/best_practice/example/persona_profile). ## Testing Not required. Add when the KB drives critical outputs: - `examples/` showing usage in prompts - `evals/` with validation notes or fixtures ## Links - Use `parent` to point to an overview KB. - Use `related` for adjacent topics. - Use `recommends` and `pairs_with` to connect KB to relevant skills/prompts/tools. - Include `reason` and non-default `weight` when link importance is meaningful. ## ReferenceContext (Legacy Airtable‑style) Inputs When source content arrives in legacy `` blocks (XML‑style), preserve raw source for provenance and convert to the standard `` wrapper for variants. 
Mapping: - `Title` → KB title or reference title - `ReferenceContentType` → `format:*` tag - `Categories/Subcategories` → `domain:*`, `audience:*`, or `dept:*` - `SourceCoreContent` → distilled content for variants Keep the raw source under `references/raw/.xml` if needed; use distilled content with the standard wrapper for actual agent-facing variants. --- ## Source: docs/atomic-units/LINKS.md # Link Rules (Shared Across Units) Links connect units **softly** with weights and reasons. They are signals, not gates. ## Canonical Link Types These must align with `entity_links.link_type` in the DB: - `requires` — hard dependency - `prerequisite` — should come before - `uses_tool` — invokes a tool - `uses_prompt` — references a prompt - `uses_kb` — references a KB - `governed_by_pack` — constrained by a pack - `bundle_member` — belongs to a bundle; canonical storage is **member -> bundle** - `often_follows` — common sequence - `recommends` — suggested pairing - `pairs_with` — works well together - `alternate` — substitute option - `supersedes` — replacement/upgrade - `parent` — parent/overview relation - `related` — generic adjacency ## Required Link Fields Each link must include: - `type` - `to` (entity ref) - `weight` (0.0–1.0) - `reason` (short human explanation) ## Link Hygiene - Keep links **sparse** (3–10 per unit). - Use `related` only when you can’t be more specific. - Prefer `often_follows` for sequencing and discovery menus. - Use `parent` for overview/child KB relationships (not for hard deps). - Avoid ad‑hoc types like `adjacent` or `cousin`; map them to `related`. - `bundle_member` is directional, not an undirected "these go together" link. ## Weight Guidance - 0.1–0.3 = weak association - 0.4–0.6 = moderate association - 0.7–1.0 = strong association ## No Hard Gates Links **should not** be used to forbid behavior. Use rankings and recommendations instead. ## Agent Proclivities vs Links Links are **unit‑to‑unit** hints (global). 
Agent proclivities are **agent‑specific** preferences stored in `agent_proclivities` (prefer/avoid/default_to/never_use). Use proclivities to tune a specific agent’s behavior without changing global link weights. --- ## Source: docs/atomic-units/LINT_RULES.md # Lint Rules (Rules + Expectations) Lint rules are **machine-checkable validations** (readability, structure, brand, compliance). ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `lint_rules`) - **Portable surface (today):** JSON packs under `registry-packs/*/pack.json` ## When to Use - preventing quality regressions before publishing - making “readiness strips” and ambient validation real - powering actionable warnings in preview/render surfaces ## Required (DB) - `registry_entities` row (selection-ready) - `lint_rules` row: - `rule_kind` - `rule_json` (parameters) - `severity` ## Tag Coverage (Minimum) - `format:*` - `modality:structured` - `scope:*` - `status:*` Recommended: - `domain:*` - `family:review` - `action:review` - `surface:cms-content` (if primarily used for content previews) ## Testing (Recommended) - provide at least one “pass” and one “fail” example (fixtures or artifacts) so rule behavior is inspectable --- ## Source: docs/atomic-units/LINT_RULESETS.md # Lint Rulesets (Rules + Expectations) Lint rulesets are **ordered collections of lint rules**. 
## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `lint_rulesets` + `lint_ruleset_rules`) - **Portable surface (today):** JSON packs under `registry-packs/*/pack.json` ## When to Use - grouping rules for a surface (e.g., blog publishing) into a single “run lint” action - defining ordering and defaults without hard gates ## Required (DB) - `registry_entities` row (selection-ready) - `lint_rulesets` row - membership rows in `lint_ruleset_rules` (ordering via `sequence`) ## Important Invariant - **Ruleset membership is NOT an entity link.** - Membership and ordering live in `lint_ruleset_rules`. ## Tag Coverage (Minimum) - `format:*` - `modality:structured` - `scope:*` - `status:*` Recommended: - `domain:*` - `family:review` - `action:review` - `surface:*` (the surface the ruleset primarily governs) --- ## Source: docs/atomic-units/METRICS.md # Metrics (Rules + Expectations) Metrics define **what gets measured** (units, aggregation, interpretation). ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `metrics`) - **Portable surface (today):** JSON packs under `registry-packs/*/pack.json` ## When to Use - tracking content performance (views, conversions, retention) - tying experiments (A/B tests) to measurable outcomes - generating dashboards and “what changed?” reports ## Required (DB) - `registry_entities` row (selection-ready) - `metrics` row with: - `metric_kind`, `unit`, `aggregation`, `description` ## Tag Coverage (Minimum) - `format:*` - `modality:structured` - `scope:*` - `status:*` Recommended: - `domain:data` - `family:evaluation` - `action:analyze` - `surface:eval` --- ## Source: docs/atomic-units/OPERATIONAL_LOGS.md # Operational Log (Rules + Expectations) Operational logs are **time-stamped operational records** -- daily logs, session summaries, incident notes, and decision records from agent operations. 
## When to Use - when recording **daily operational activity** (daily-log-YYYY-MM-DD) - when documenting **session outcomes** or decision trails - when preserving **incident notes** or post-mortems from agent operations - when creating **time-series records** that track operational evolution ## What Operational Log Is (and Is Not) - Operational log is a historical record, not active runtime context. - Operational log preserves what happened; agent_doc prescribes what should happen. - KB explains domain concepts; operational log records domain events. - Operational log is archival by nature -- it should NOT be loaded into agent boot context. ### Operational Log vs Nearby Unit Types - `operational_log` vs `kb` - operational_log = time-stamped operational record - kb = durable reference knowledge - `operational_log` vs `agent_doc` - operational_log = what happened (historical) - agent_doc = how to behave (prescriptive) - `operational_log` vs `metric` - operational_log = narrative record of operations - metric = quantitative measurement point If the content is prescriptive rather than historical, it should be `agent_doc` or `kb`. If the content is a quantitative signal, it should be a `metric`. ## Personal memory rule Operational logs are the right home for: - dated session summaries - incident notes - daily recaps - operational breadcrumbs that preserve continuity They are **not** the right home for: - stable user preferences - durable persona/profile facts - reusable field knowledge - product documentation Distill those into `kb` when they become durable truth. 
Keep the noisy chronology here. ## Discovery Behavior Operational logs are **hidden from generic discovery** by default. They appear only when: - explicitly requested via `entity_types: ['operational_log']` - searched by date range or specific incident reference - included in a bundle for audit or review purposes This prevents historical logs from cluttering capability discovery results. ## Naming Convention Operational log codes follow the pattern: - `daily-log-YYYY-MM-DD` (e.g., `daily-log-2026-02-17`) - `{agent}-daily-log-YYYY-MM-DD` (e.g., `lila-daily-log-2026-02-27`) ## Status Lifecycle Operational logs follow the standard entity status lifecycle: `staged -> curated -> published -> deprecated -> archived` Recent logs should be `published`. Logs older than the active retention window should be `archived`. ## Linking Operational logs are typically standalone. They may be linked to agent entities via `related` links for traceability, but they should NOT be linked via `requires` or `uses_kb` since they are not runtime dependencies. --- ## Source: docs/atomic-units/PLAYBOOKS.md # Playbooks (Rules + Expectations) Playbooks are **ordered multi‑step patterns**. They suggest flow but do not hard‑gate. ## Actions (Launchpad Cards) Actions are a **featured subset of playbooks** used as Launchpad cards and command-palette entry points. V1 model: Action = `playbook` + `surface:cms-launchpad` (and/or membership in `bundle:launchpad-core@v1`). See `docs/atomic-units/ACTIONS.md`. 
## When to Use - defining a recommended sequence of skills/tools - standardizing repeatable workflows without hard orchestration - giving agents a “default path” with room to adapt ## Playbook vs Nearby Unit Types - `playbook` vs `skill` - playbook = ordered or adaptive workflow across units - skill = one reusable method - `playbook` vs `bundle` - playbook = sequence/checkpoints matter - bundle = grouped context only - `playbook` vs `plugin/export` - playbook = canonical workflow logic in the registry - plugin/export = packaging selected units for outside use ## What Good Playbooks Look Like - sequence is clear without becoming brittle - links point to the real skills/tools/KBs used at each stage - checkpoints and branching logic are explicit where they matter - the playbook helps an agent know what should happen next, not just what exists - the playbook coordinates methods; it does not re-explain every method in full - one major job family should usually have one flagship playbook, not five competing peer workflows Repo-native examples: - `playbook:meeting-briefing-research-report@v1` = ordered research/report flow - `playbook:blog-production@v1` = coordinated production workflow - `playbook:registry-health-scan@v1` = recurring operational workflow ## Low-Surface-Area Rule One capability family should not have five peer playbooks that all act like alternate front doors. Prefer: 1. one primary playbook for the bigger adaptive workflow 2. linked skills for the reusable methods inside it 3. bundles for grouped context 4. Actions only where the playbook deserves a surfaced operator entry ## Structure Requirement Promoted playbooks must have real structure. That means one of: - ordered `steps_json` - explicit phased sections with checkpoints - adaptive branch notes that still preserve a clear flow If the row mostly says “these things go together” but does not encode a real order, it is probably a `bundle`, not a promoted playbook. 
## Required

- `PLAYBOOK.json` entrypoint
- `unit.json` metadata (tags, status, tier, provenance)

## PLAYBOOK.json (minimum fields)

```
{
  "steps": ["research", "plan", "draft", "review", "publish"],
  "mode": "guided"
}
```

Modes:

- `strict` — ordered, required
- `guided` — suggested order
- `adaptive` — weighted hints only

## Playbook Composition Rule

Playbooks should mostly link outward instead of copying whole methods inward. Typical shape:

- step intent
- checkpoint / branching note
- links to the skill/tool/KB/bundle used at that step
- optional artifacts with examples or run notes

If the step text fully re-teaches the method, the underlying skill is probably under-modeled. Typical flagship playbook shape:

1. understand and acknowledge the user's request
2. research and context gathering
3. planning/outline or framing
4. execution across skills/tools/styles/recipes
5. review/evaluation
6. publish/schedule/store if relevant
7. measure and feed learning back into the system

## Tag Coverage (Minimum)

- `action:*`
- `scope:*`

Recommended:

- `stage:*`
- `surface:*`
- `domain:*` or `audience:*`
- `dept:*`
- `source:*`

## Links

- Use `often_follows` or `prerequisite` between steps for graph traversal.
- Use `bundle_member` if the playbook belongs to a bundle. Canonical direction: `playbook -> bundle`
- Use `requires`, `uses_tool`, and `uses_kb` for the actual execution family.

## This vs That

- `skill` = reusable method
- `playbook` = reusable ordered/adaptive workflow
- `bundle` = grouped context pack
- `action` = surfaced playbook entry
- `plugin/export` = packaging, not canonical workflow logic

If the user says "I want a pack that tells the agent which few things to use together," that is often a `bundle`. If the user says "I want the agent to usually do A, then B, then C, but branch if research is thin," that is a `playbook`.
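One way to read the three playbook modes is sketched below. The types mirror the `PLAYBOOK.json` minimum fields; the functions are illustrative only, since real agents apply judgment rather than a switch statement:

```typescript
// "strict" = order is required; "guided" = suggested default;
// "adaptive" = weighted hints the agent may reorder.
type PlaybookMode = "strict" | "guided" | "adaptive";

interface PlaybookSpec {
  steps: string[];
  mode: PlaybookMode;
}

// Whether an agent may deviate from the written step order.
function mayReorder(spec: PlaybookSpec): boolean {
  return spec.mode !== "strict";
}

// Default next step: the first step not yet completed.
// Adaptive agents may pick differently; strict ones may not.
function nextStep(spec: PlaybookSpec, done: string[]): string | null {
  const remaining = spec.steps.filter((s) => !done.includes(s));
  return remaining[0] ?? null;
}
```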
## Prime Examples - `playbook:meeting-briefing-research-report@v1` = flagship meeting-prep workflow - `playbook:blog-production@v1` = flagship writing/publishing workflow - `playbook:registry-health-scan@v1` = flagship stewardship workflow ## Testing (Optional) - Provide `examples/` with sample runs or sequences. --- ## Source: docs/atomic-units/PROMPTS.md # Prompts (Rules + Expectations) Prompts are executable instruction contracts (system/task/format/policy/rubric). DB entity_type is `prompt`. Prompts are not one-line stubs. They should be structured, explicit, and portable across orchestrators unless intentionally scoped. ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `prompts`) - **Portable export (default):** JSON packs under `registry-packs/` - Exporter: `npx tsx scripts/distribution/export_registry_packs.ts` - **Filesystem prompt folders:** optional later, not required today ## When to Use a Prompt Unit - reusable system/task/format guidance that controls model behavior - stable instructions that multiple skills/playbooks/agents can share - explicit output contracts and constraints that should version independently Use a KB instead when content is mostly reference/context material. Use a skill launcher when you need full procedural execution logic. ## Prompt vs Nearby Unit Types - `prompt` vs `skill` - prompt = reusable instruction contract - skill = reusable method with discovery, graph, and layered execution guidance - `prompt` vs `kb` - prompt = tell the model what to do - kb = tell the model what is true or important ## What Good Prompts Look Like - objective, inputs, instructions, output contract, and guardrails are explicit - reusable across more than one run or agent - narrow enough to be dependable, broad enough to avoid one-off lock-in - linked to KB and skills instead of trying to absorb every piece of context inline ## Prompt Quality Standard ### Required depth - No single-sentence placeholders. 
- Must include enough detail to be reliably executable by another agent with no hidden context. - Must describe both what to do and what to avoid. ### Recommended structure (default) Use these sections unless there is a justified reason not to: - `# Context` - `## Objective` - `## Inputs` (variables/placeholders with descriptions) - `## Instructions` (numbered protocol, guidance-first) - `## Output Format` (exact return contract) - `## Guardrails` (failure modes, quality checks, do-not-return rules) ### Complexity guidance - Low-stakes/simple tasks: concise but complete structure. - Medium/high-stakes tasks: richer examples, edge cases, and explicit quality checks. - If reliability is weak, expand structure and examples before adding hard gates. ## Required DB Records - `registry_entities` (typed identity + summary + status) - `prompts` (prompt content + prompt metadata) - `entity_tags` (taxonomy coverage) If filesystem prompt packages are introduced later, follow `docs/atomic-units/SHARED_CONTRACT.md` with `PROMPT.md` entrypoint. ## Tag Coverage (Minimum) - `action:*` - `stage:*` - `modality:*` - `scope:*` Recommended: - `domain:*` or `audience:*` - `source:*` ## Links (Discovery Graph) - Use `uses_kb` when prompt depends on KB context. - Use `uses_tool` when prompt assumes tool outputs/contracts. - Use `requires` for hard prerequisites only. - Use `often_follows` / `often_precedes` for soft sequencing hints. - Prefer weighted hints over rigid routing. 
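The recommended prompt structure above is mechanical enough to check. A minimal sketch of such a check (this is a possible lint, not a shipped Katailyst rule; the section headings come straight from the list above):

```typescript
// Default recommended sections for a prompt body, per the
// "Recommended structure" list in PROMPTS.md.
const REQUIRED_SECTIONS = [
  "# Context",
  "## Objective",
  "## Inputs",
  "## Instructions",
  "## Output Format",
  "## Guardrails",
];

// Returns the section headings a prompt body is missing.
// An empty result means the default structure is present.
function missingSections(promptBody: string): string[] {
  return REQUIRED_SECTIONS.filter((s) => !promptBody.includes(s));
}
```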
## Testing and Evals If prompt is critical, include tests/evals: - expected-output fixtures for representative inputs - failure-mode tests for common breakdowns - regression checks before promotion (`staged` → `curated` → `published`) ## Anti-Patterns (Do Not Ship) - thin stubs ("Write a good answer.") - hidden assumptions about one orchestrator/runtime - over-narrow domain lock-in when context does not require it - vague output requirements ("be clear", "be helpful") without concrete contract - hard imperative routing that removes agent discretion without safety reason ## Quality Checklist Before Promotion - prompt has explicit objective, inputs, instructions, output contract, guardrails - placeholders/variables are declared and named consistently - output format is machine-usable where needed (JSON/XML/markdown contract) - links/tags exist so agents can discover related units - content is broad enough to reuse, specific enough to execute ## Source-Aligned Prompting References - OpenAI Prompt Engineering Guide: https://platform.openai.com/docs/guides/prompt-engineering - OpenAI Prompt Optimization Guide: https://platform.openai.com/docs/guides/text?api-mode=responses#prompt-optimization - Anthropic Prompt Engineering Overview: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview - Google Vertex AI Prompt Design Strategies: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/prompt-design-strategies --- ## Source: docs/atomic-units/RECIPES.md # Recipes (Rules + Expectations) Recipes bind **schema + style + constraints**. They are presets, not new schemas. 
## When to Use

- packaging schema + style + channel defaults
- enforcing consistent output constraints
- serving discovery menus with ready‑to‑run presets

## Required

- `RECIPE.json` entrypoint
- `unit.json` metadata (tags, status, tier, provenance)

## RECIPE.json (minimum fields)

```
{
  "base_schema": "schema:article_v2",
  "style": "style:listicle",
  "channel": "channel:web",
  "constraints": { "min_words": 800 }
}
```

## Tag Coverage (Minimum)

- `format:*`
- `scope:*`

Recommended:

- `channel:*`
- `surface:*`
- `modality:*`
- `source:*`

## Links

- Use `requires` to reference base schema.
- Use `uses_prompt` or `uses_kb` for required guidance.

---

## Source: docs/atomic-units/RUBRICS.md

# Rubrics (Rules + Expectations)

Rubrics define **evaluation criteria** (scores, weights, and pass/fail thresholds) that keep quality measurable and portable.

## Canonical vs Portable Surfaces

- **Canonical:** Supabase DB (`registry_entities` + `rubrics`)
- **Portable surface (today):** JSON packs under `registry-packs/*/pack.json`

## When to Use

- scoring drafts, skills, tools, or outputs for quality
- enabling A/B tests and pairwise judging with consistent criteria
- feeding eval signals back into discovery ranking over time

## Rubric Design Principles

- Evidence-first: every criterion must be observable in output behavior.
- Composable: default rubric should generalize across orchestrators and repos.
- Context overlays are allowed: add scoped criteria when a domain/regulatory context requires it.
- Guidance over lock-in: do not hard-code a single workflow path unless safety-critical.
- Source-backed quality language: if criteria cite external best practices, include references.
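Scoring against a rubric of weighted criteria with promote/excellent thresholds can be sketched as follows. The shapes are illustrative, not the canonical `rubric_json` contract; the threshold values match the example thresholds used in this doc:

```typescript
// One scored criterion: weights skew the final score toward
// higher-impact criteria, per the design principles above.
interface ScoredCriterion {
  id: string;
  weight: number;
  score: number; // assumed 0-100 scale
}

// Weight-normalized average of criterion scores.
function weightedScore(criteria: ScoredCriterion[]): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  const weighted = criteria.reduce((sum, c) => sum + c.weight * c.score, 0);
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// Compares a score to pass/fail thresholds (illustrative labels).
function verdict(
  score: number,
  thresholds = { promote: 75, excellent: 90 },
): string {
  if (score >= thresholds.excellent) return "excellent";
  if (score >= thresholds.promote) return "promote";
  return "needs-work";
}
```

Note how an all-equal weighting degenerates to a plain average, which is why the promotion checklist warns against leaving all criteria equal by default.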
## Required (DB)

- `registry_entities` row (selection-ready)
- `rubrics.rubric_json` with:
  - `name`
  - `description`
  - `criteria[]` list with ids/descriptions/weights
  - scoring scale and thresholds (recommended)

Example minimum contract:

```json
{
  "name": "Portable Skill Quality",
  "description": "Reusable quality rubric for cross-orchestrator skills.",
  "criteria": [
    { "id": "clarity", "name": "Trigger Clarity", "weight": 1.2, "scale": "0-100" },
    { "id": "contract", "name": "Output Contract", "weight": 1.0, "scale": "0-100" }
  ],
  "thresholds": { "promote": 75, "excellent": 90 }
}
```

## Tag Coverage (Minimum)

- `action:*`
- `scope:*`

Recommended:

- `domain:*`
- `format:*`
- `family:evaluation`
- `modality:structured`
- `surface:eval`
- `source:*`

## Link Guidance

- Link rubrics to units they evaluate:
  - `requires` when rubric is a hard gate for promotion.
  - `recommends` when rubric is optional but high-value.
  - `pairs_with` for common rubric bundles (for example quality + safety).
- Include `weight` + `reason` so graph traversal stays explorable.

## Anti-Patterns

- Criteria that are vague or not testable ("be better", "high quality vibes").
- Criteria that force repo-scoped behavior by default.
- Hard lock-in language where a weighted recommendation is enough.
- Rubrics with no threshold guidance or no remediation cues.

## Promotion Checklist

- Criteria are observable and unambiguous.
- Weighting reflects impact (not all criteria equal by default).
- Pass/fail thresholds align with governance docs.
- Remediation hints exist for common failure classes.

---

## Source: docs/atomic-units/SCHEMAS.md

# Schemas (Rules + Expectations)

Schemas define **output shape contracts**. They are the most fragile units and require the strongest testing discipline.
## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `json_schemas`) - **Portable surface (today):** JSON packs under `registry-packs/*/pack.json` - Exporter: `npx tsx scripts/distribution/export_registry_packs.ts` - **Filesystem schema mirrors:** deferred (do not assume `schema/json_schemas/*` exists) ## When to Use - enforcing structured outputs - validating content types or tool payloads - defining contracts for generators and validators ## Required - DB records for: - `registry_entities` (typed identity + summary + status) - `json_schemas` (schema JSON + compiler metadata) - `entity_tags` (taxonomy coverage) If we later add schema unit packages in `.claude/`, they will follow `docs/atomic-units/SHARED_CONTRACT.md` with an entrypoint like `SCHEMA.json`. ## Testing (Mandatory) - `tests/valid/*.json` — minimal valid examples - `tests/invalid/*.json` — adversarial invalid examples - `tests/README.md` — how to run validation Schemas without tests **stay staged**. ## Tag Coverage (Minimum) - `format:*` - `modality:*` - `scope:*` Recommended: - `family:*` (once namespace exists) - `channel:*` (if channel‑specific) ## Versioning - Schema version lives in `unit.json` and DB. - Never change shape without a new version. ## Links - Use `requires` from recipes or content types to the schema. --- ## Source: docs/atomic-units/SHARED_CONTRACT.md # Shared Unit Package Contract This defines the **standard envelope** for _all_ atomic units. Each unit type has its own entrypoint (e.g., `SKILL.md`, `SCHEMA.json`), but the surrounding structure is **consistent**. **Identity is not the path.** Use `entity_type + code + version` from `unit.json`. ## Minimal Read Path Keep the cognitive surface area small. Default order: 1. use [Atomic Unit Decision Matrix](DECISION_MATRIX.md) for cross-unit routing 2. use this shared contract for package shape + shared quality expectations 3. load the one per-unit doc you actually need 4. 
only then load deeper examples/artifacts Do not expect agents to preload every atomic-unit doc. The system should route to the right depth progressively. Front-door rule: - one front-door index per capability family - deeper material linked beneath it - no parallel "start here" docs unless they serve meaningfully different jobs or runtimes ## Canonical Envelope ``` / unit.json # shared metadata: tags, status, tier, source, links entrypoint # type-specific (see below) references/ # long-form context rules/ # guardrails / constraints templates/ # reusable templates examples/ # sample inputs/outputs tests/ # fixtures or regression tests evals/ # eval results or eval fixtures schemas/ # JSON/DSL schemas scripts/ # helpers (never auto-run) data/ # lookup tables, CSVs, dictionaries how_to_use/ # usage guides assets/ # images/diagrams/media artifacts/ # optional single-bucket artifacts + index ``` ## Entrypoints (by unit type) - Skill → `SKILL.md` - KB → `KB.md` - Prompt → `PROMPT.md` - Schema → `SCHEMA.json` - Tool → `TOOL.json` - Bundle → `BUNDLE.json` - Agent → `AGENT.json` - Channel → `CHANNEL.json` - Recipe → `RECIPE.json` - Playbook → `PLAYBOOK.json` - Style → `STYLE.json` - Content type → `CONTENT_TYPE.json` - Rubric → `RUBRIC.json` - Metric → `METRIC.json` - Lint rule → `LINT_RULE.json` - Lint ruleset → `LINT_RULESET.json` Note: several unit types are DB-canonical today and exported via JSON packs. The entrypoint names above apply if/when we add filesystem unit packages for those types. 
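The per-type entrypoints above are a fixed mapping, so a router that decides which file to load first can be sketched as a lookup table (illustrative; not a shipped script):

```python
# Illustrative entrypoint lookup, mirroring the per-type entrypoints listed above.
ENTRYPOINTS = {
    "skill": "SKILL.md",
    "kb": "KB.md",
    "prompt": "PROMPT.md",
    "schema": "SCHEMA.json",
    "tool": "TOOL.json",
    "bundle": "BUNDLE.json",
    "agent": "AGENT.json",
    "channel": "CHANNEL.json",
    "recipe": "RECIPE.json",
    "playbook": "PLAYBOOK.json",
    "style": "STYLE.json",
    "content_type": "CONTENT_TYPE.json",
    "rubric": "RUBRIC.json",
    "metric": "METRIC.json",
    "lint_rule": "LINT_RULE.json",
    "lint_ruleset": "LINT_RULESET.json",
}

def entrypoint_for(entity_type: str) -> str:
    """Return the type-specific entrypoint filename for a unit package."""
    try:
        return ENTRYPOINTS[entity_type]
    except KeyError:
        raise ValueError(f"no filesystem entrypoint defined for {entity_type!r}")
```

For DB-canonical types with no filesystem package yet, this lookup only applies if/when packages are added, per the note above.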
## `unit.json` (shared metadata) Minimum fields: ``` { "entity_type": "skill|tool|kb|prompt|schema|bundle|style|recipe|content_type|agent|playbook|channel|rubric|eval_case|metric|lint_rule|lint_ruleset|agent_doc|operational_log|pattern|hub", "code": "unprefixed-slug", "version": "v1", "status": "staged|curated|published|deprecated|archived", "tier": 1, "tags": ["action:research", "stage:planning", "domain:nursing", "..."], "provenance": { "source": "internal|skills.sh|github|legacy", "url": "..." }, "links": [{ "to": "skill:other", "type": "often_follows", "weight": 0.5 }] } ``` This is **the unifying layer** that keeps taxonomy, status, and provenance coherent across all unit types. ## Discovery Facets vs Structural Fields Keep this split explicit so the automation does not create fake debt: - tags are for discovery facets and graph routing - table-native fields are for structural truth owned by the entity extension row Examples: - tools own `tool_type` and `provider` structurally in the `tools` table - agents own `persona_role` structurally in the `agents` table - `registry_entities.status` is canonical state; a `status:*` tag can still help discovery, but it should not be required everywhere just because the row has lifecycle state Do not duplicate every structural field as a required tag. Duplicate it only when it materially improves discovery or filtering. **Code invariant:** `code` must be colon‑free and unprefixed (no `entity_type:`). **Format guidance:** KBs + skills use lowercase kebab‑case; tools/schemas may include `_` or `.` when those are part of provider IDs. Legacy colon‑prefixed codes were migrated in Phase 01‑05, and the DB now enforces colon‑free codes. 
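The code invariant above can be sketched as a small check. The repo linters (e.g. `scripts/registry/lint_unit_packages.py`) are the source of truth; the regexes here are an assumption based on the stated format guidance:

```python
import re

# Illustrative check for the code invariant: colon-free, unprefixed slugs.
# KBs + skills use lowercase kebab-case; tools/schemas may carry '_' or '.'
# when those come from provider IDs. The repo linters are authoritative.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
LOOSE = re.compile(r"^[a-z0-9][a-z0-9._-]*$")

def valid_code(code: str, entity_type: str) -> bool:
    """Codes must be colon-free and unprefixed; most types use kebab-case."""
    if ":" in code:
        return False  # legacy colon-prefixed codes were migrated out in Phase 01-05
    pattern = LOOSE if entity_type in ("tool", "schema") else KEBAB
    return bool(pattern.match(code))
```

So `research-trends` passes for a skill, `firecrawl.crawl` passes for a tool, and anything carrying an `entity_type:` prefix fails regardless of type.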
### Optional fields (recommended for agent routing) ``` { "name": "Canonical human label", "derived": { "code": "derived-slug", "aliases": ["..."] }, "length": { "label": "short|medium|long", "tokens_est": 1200 }, "system": "HLT Corp Infra", "purpose": "One-line intent", "use_when": ["routing hint", "routing hint"], "avoid_when": ["routing warning"] } ``` **Guideline:** `name` is the single source of truth; `code` and `aliases` are derived automatically. `length.tokens_est` should be a rough token estimate for KBs (used for bundling). ## Shared Quality Signals Every curated/published first-class unit should make these things legible: - what it is - when to use it - what runtime or surface it actually fits - what other units it belongs with - whether it is portable upstream, HLT-personalized, or org-specific Practical quality floor: - `name` is operator-legible - `summary` is discovery-grade - `use_case` is specific enough to route work correctly - tags include required namespaces for the unit type - links are intentional enough that the unit is not graph-isolated Different unit types reach that floor differently: - skills and KBs lean heavily on tags + layered artifacts - tools and agents also rely on extension-table truth (`tools`, `agents`) - bundles and playbooks lean more on link quality than on large tag sets Package-backed units should satisfy that through `unit.json` + entrypoint + artifacts. DB-canonical units that do not have a filesystem package yet should still satisfy the same judgment layer in canon. ## Registry vs Asset vs Export Do not promote everything into a first-class registry node. 
- reusable method -> `skill` - reusable truth/context -> `kb` - reusable ordered workflow -> `playbook` - reusable unordered context pack -> `bundle` - local file attached to one unit revision -> `artifact` - relationship only -> `link` - generated distribution/install surface -> `plugin/export` - one produced deliverable (article, image, deck, run recap) -> usually asset/content instance, not a new atomic unit Tools can absolutely carry artifacts too. The same rule applies: - if it is local support material for one tool revision, keep it as a tool artifact - if it is reusable truth across many units, promote it to KB and link it - if it is reusable methodology for using the tool well, promote it to a skill and link it Daily logs, scratch notes, and raw memory dumps usually stay operational until distilled into durable truth. Examples: - one produced image or article = asset/content instance - one run recap = trace or operational log - one super-agent operating front door = usually KB - recurring multi-step workflow = playbook - grouped context pack = bundle ## Nightly Automation Contract Nightly/OpenClaw agents should not guess ontology ad hoc. Recommended automation loop: 1. classify with [Atomic Unit Decision Matrix](DECISION_MATRIX.md) 2. validate package-backed units with: - `python3 scripts/registry/lint_unit_packages.py --strict` - `python3 scripts/registry/audit/audit_atomic_unit_positioning.py --report docs/reports/atomic-unit-positioning-latest.json` 3. validate DB-canonical units with: - `npx tsx scripts/registry/audit/audit_registry_atomic_contract.ts --report docs/reports/registry-atomic-contract-latest.json` 4. only then apply DB-first remediation and regenerate mirrors/exports ## Artifact Mapping Folder names map to `artifact_type` values stored in `entity_revisions.artifacts_json` (DB-canonical). See `docs/atomic-units/ARTIFACTS.md` for the canonical mapping table. 
Accepted canonical shapes: - array entries (`artifact_type`, `path`, `content`, `mime?`) - layered object (`files`/`notes`/`references`) where `files[*].content` is optional ## Artifact Index (Optional) If a unit uses a single `artifacts/` folder instead of multiple subfolders, add an index file: ``` artifacts/index.json ``` Example: ``` { "items": [ { "path": "artifacts/how_to_use.md", "kind": "how_to_use" }, { "path": "artifacts/expected_output.json", "kind": "test" } ] } ``` This preserves a **single “artifact bucket”** while keeping machine-readable intent. --- ## Source: docs/atomic-units/SKILLS.md # Skills (Rules + Expectations) Skills are **procedural knowledge** for agents. They must be portable and activation‑friendly. ## Required - `SKILL.md` with YAML frontmatter: - `name` (use the **code** / slug for Claude.ai compatibility) - `description` (trigger‑grade; what + when + keywords) - `unit.json` with tags, status, tier, provenance **and canonical human label** in `name` Frontmatter compatibility (portable): Portable (Agent Skills spec) keys we support: - Required: `name`, `description` - Optional: `license`, `compatibility`, `metadata`, `allowed-tools` Katailyst extension keys we also accept: - `argument-hint` (common in Claude Code / skills.sh ecosystems for CLI hinting) - Importers mirror this into `metadata.argument_hint` so strict Agent Skills consumers can rely on `metadata`. 
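The `argument-hint` mirroring described above can be sketched as one normalization step. This is illustrative only; the repo's importer scripts are the source of truth for the actual behavior:

```python
# Illustrative normalization of the Katailyst extension key `argument-hint`
# into `metadata.argument_hint`, so strict Agent Skills consumers can rely
# on `metadata`. The original key is kept (mirrored, not moved).

def normalize_frontmatter(fm: dict) -> dict:
    out = dict(fm)
    hint = out.get("argument-hint")
    if hint is not None:
        metadata = dict(out.get("metadata") or {})
        metadata.setdefault("argument_hint", hint)
        out["metadata"] = metadata
    return out

fm = normalize_frontmatter({"name": "demo", "argument-hint": "<topic>"})
```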
Profile guidance for curated/published skills: - `metadata.profile: method_pack` - `metadata.profile: rule_corpus` - `metadata.profile: runtime_adapter` - `metadata.profile: capability_reference` Use profiles to explain why two valid skills may have different shapes: - `method_pack` - reusable operator method with layered examples, how-to docs, and templates - examples: `copywriting`, `brainstorming`, `skill-creator`, `ai-marketing-videos` - `rule_corpus` - imported best-practice corpus with many leaf rules or references - examples: `vercel-react-best-practices`, `next-best-practices`, `create-remotion-geist` - `runtime_adapter` - specialized for one execution environment or runtime overlay - examples: `openclaw-skill-creator`, runtime-specific operator guides - `capability_reference` - capability-heavy surface that mixes operator guidance, reference depth, and tool-family navigation - examples: `github-mcp`, `linear`, `firecrawl` - use when a pure tool row would be too thin but the material is still centered on a capability family rather than one reusable method Imports are legitimately heterogeneous: - some Anthropic or `skills.sh` packages are direct operator methods - some are deep rule corpora with many leaf references - some are thin runtime adapters that only make sense in one execution lane Do not force those shapes to look identical. Govern them with explicit profiles, graph placement, and truthful discovery metadata instead. Mirror/export policy: - `.claude/skills/curated/**/SKILL.md` keeps frontmatter intentionally minimal (`name`, `description`) for maximum compatibility. - Keep tags/status/tier/provenance in `unit.json`. Naming constraints to enforce: - `SKILL.md` frontmatter `name` **must match the unit `code`** (lowercase letters, numbers, hyphens only). - Max 64 characters. - Avoid reserved words like `anthropic` and `claude`. - Verb‑first is **recommended**, not required. - Directory name must match `code` when exporting for Claude.ai. 
- Human-facing names in `unit.json.name` should be capability/value first by default. - Add provider/runtime to the human-facing label only when it materially changes the choice. **Canonical label lives in `unit.json.name`.** This keeps human‑friendly names while preserving Claude export compatibility. For cross-unit decisions such as "should this be a skill, tool, KB, bundle, playbook, artifact, or link?", use [Atomic Unit Decision Matrix](DECISION_MATRIX.md) before writing the package. ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `skills` + `entity_revisions` with `content_json` + `artifacts_json`) - Note: `entity_artifacts` exists but is reserved for future blob/binary storage and indexing. - Current canonical revisions may use either: - explicit artifact-content entries (array shape), or - layered metadata objects (`files`/`notes`/`references`) where `files[*].content` can be omitted. - **Portable mirror (default):** `.claude/skills/curated/**` (deterministic, generated) - Exporter: `npx tsx scripts/registry/sync/sync_skills_from_db.ts` - Drift check: `npx tsx scripts/registry/sync/sync_skills_from_db.ts --check` - **Portable plugin snapshot:** `.claude-plugin/skills/**` (deterministic, generated) - Exporter: `npx tsx scripts/distribution/export_plugin.ts` - Drift check: `npx tsx scripts/distribution/export_plugin.ts --check` - **Staged imports:** `.claude/skills/imports/**` (filesystem unit packages, not curated) - Normalizer: `npx tsx scripts/ingestion/normalize_skill_imports.ts` - DB importer: `npx tsx scripts/registry/import/import_staged_skills_to_db.ts` Operational lifecycle mode (current policy): - `curated` is considered live in this repo. - `published` remains an optional release/distribution tier. - staged imports are temporary by design and should be pruned when they duplicate already-curated skills (after capturing a provenance snapshot). 
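Under that policy, a selection filter for "live" units can be sketched as follows. The status set is taken directly from the policy above (curated is live, published is an optional release tier); the helper itself is hypothetical:

```python
# Illustrative lifecycle filter reflecting the current policy: `curated` is
# live in this repo, `published` is an optional release/distribution tier,
# and staged imports are temporary by design. Not a shipped helper.
LIVE_STATUSES = {"curated", "published"}

def live_units(units: list) -> list:
    """Keep only units an operator should treat as live."""
    return [u for u in units if u.get("status") in LIVE_STATUSES]

units = [
    {"code": "copywriting", "status": "curated"},
    {"code": "old-thing", "status": "deprecated"},
    {"code": "new-import", "status": "staged"},
]
```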
### Why Skills Get "Stuck" in Imports Imports are intentionally a staging surface. The common reasons a skill stays `staged` are: 1. Missing required taxonomy namespaces (for example no `action:*` or no `family:*`). 2. Missing skill test artifacts in the current revision (typically under `tests/`). 3. Weak launcher metadata that passes parsing but fails curation quality checks. Check and fix before promotion: - `npx tsx scripts/registry/promote/promote_skill_status.ts --code --to curated --check` - add required tags in `unit.json` and/or DB tags - add at least one concrete test fixture for the skill - re-run import, then promote - Tools must discover via **recursive scan** (`**/unit.json`) or a **generated manifest** (`.claude/skills/curated/manifest.json`). ### Duplicate Import Policy If a staged import package has the same `skill:code@version` as an already-curated skill, treat it as duplicate backlog, not an active candidate: 1. capture provenance evidence (snapshot report), 2. prune the duplicate staged package, 3. regenerate imports manifest. If the `code` matches but the `version` differs, treat it as a _potential upgrade_ candidate (do not prune automatically). Recommended commands: - `npx tsx scripts/registry/audit/audit_skill_lifecycle.ts --report docs/reports/skills-lifecycle-audit-YYYY-MM-DD.json` - `npx tsx scripts/ingestion/prune_duplicate_import_skills.ts --snapshot docs/reports/skills-import-duplicates-YYYY-MM-DD.json --write` ## Filesystem Notes - Treat `.claude/skills/curated/**` as generated output. Edit skills in DB and re-export. - Do not add extra files under generated mirrors unless the exporter writes them (prevents drift). ## Description (Activation‑Grade) Description must answer: 1. **What** it does 2. **When** to use it 3. **Keywords** users will actually say If description is weak, the skill won’t trigger. Description constraints to enforce: - Max 1024 characters in the general spec. - Claude.ai exports must keep description ≤ 200 chars. 
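Those two budgets imply deriving a short variant from the longer description. A word-boundary truncation sketch, assuming a default 200-character budget; the mirror exporter's actual truncation logic may differ:

```python
# Illustrative derivation of a Claude-safe short description variant.
# 200 matches the default export budget; the real exporter may truncate
# differently (its budget is tunable).

def short_description(full: str, max_len: int = 200) -> str:
    """Truncate at a word boundary so the short variant stays <= max_len."""
    if len(full) <= max_len:
        return full
    # Leave room for the ellipsis, then drop the trailing partial word.
    cut = full[: max_len - 1].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:") + "…"
```

A derivation like this belongs at export time only; the full trigger-grade description stays canonical so activation quality is not lost.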
Portable mirror contract (how we satisfy both): - `registry_entities.use_case` is the primary trigger-grade source for launcher descriptions. - Fallback sources: `registry_entities.summary` then `entity_revisions.content_json.description`. - `.claude/skills/curated/**/SKILL.md` frontmatter `description`: short, Claude-safe variant (<= 200 chars by default). - `.claude/skills/curated/**/unit.json` stores both: - `derived.description_full` - `derived.description_short` Exporter knob: - `CATALYST_SKILLS_DESC_MAX` (or `--desc-max`) controls the max length of the short variant in the mirror exporter. Linting: - `python3 scripts/registry/lint_unit_packages.py` validates curated unit package contract. - For skills, it enforces (errors): `unit.json.derived.description_short` matches `SKILL.md` frontmatter `description` for curated/published units. - It emits warnings (non-blocking by default) for missing tests/ and non-trigger-grade descriptions so quality debt is visible without making workflows brittle. - It also infers likely `metadata.profile` shape and flags obvious placement smells so imports do not drift silently into the wrong form. - For a structured report instead of raw lint output, use: - `python3 scripts/registry/audit/audit_atomic_unit_positioning.py --report docs/reports/atomic-unit-positioning-latest.json` Read-budget rule: - one front-door skill should route to deeper artifacts conditionally - do not require an agent to preload five to ten peer docs just to understand one skill - put the minimum routing truth in the launcher, then branch to the right artifact only when needed ## Best‑Practice Anchors (Source‑Backed) - `SKILL.md` frontmatter **must** include `name` and `description`. `allowed-tools` is optional and should be used only when truly necessary (default = all tools). - Progressive disclosure: put deep detail in `references/` and keep the launcher focused; avoid duplicating the same info in both places. 
- Skills can be consumed by AI SDK agents via bash‑tool; keep skills portable and filesystem‑friendly. - Skill authoring guidance recommends concise launchers and moving depth into references/examples/scripts. ## Size + Progressive Disclosure - Prefer **progressive disclosure** over strict length limits. - Move deep material to `references/`. - Use **explicit load triggers** inside `SKILL.md`: - “MANDATORY: read `references/…`” - “Do NOT load …” ### Length Guidance (Portability vs Quality) - Long `SKILL.md` launchers are **not automatically low quality**. - The "~500 lines" guidance from Claude Code docs is a portability/readability recommendation, not a deletion trigger. - Keep long launcher content when it materially improves output quality and execution reliability (especially for official or high-signal imported skills). - Prefer refactoring to layered artifacts when it improves maintainability, but do not remove depth solely to satisfy a line-count heuristic. - Quality decisions should prioritize: correctness, output quality, test evidence, and graph/link usefulness over raw length. ## Layered Artifacts (Rules / Templates / Tests) Layered skills should use the canonical artifact folders from `docs/atomic-units/ARTIFACTS.md`: ``` rules/ templates/ references/ examples/ tests/ evals/ schemas/ scripts/ assets/ data/ how_to_use/ ``` Guidance: - `SKILL.md` + `unit.json` are entrypoints (not artifacts). - Use `rules/` for hard constraints and lint rules. - Use `templates/` for reusable prompt skeletons or configs. - Use `tests/` for expected-output fixtures (required before curation). - Use `references/` for deep context; keep `SKILL.md` lean. - For object-shaped artifact metadata (`artifacts_json.files`), include `content` when you expect the mirror/plugin exporters to materialize filesystem files. 
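The content-optional rule for object-shaped artifact metadata can be sketched as a materialization pass: only entries that carry `content` get written to disk. The real mirror/plugin exporters are authoritative; this helper and its entry shape are illustrative:

```python
import tempfile
from pathlib import Path

# Illustrative materialization of object-shaped artifact metadata
# (`artifacts_json.files`). Entries without `content` are metadata-only
# and produce no filesystem file.

def materialize_files(files: list, root: Path) -> list:
    written = []
    for entry in files:
        content = entry.get("content")
        if content is None:
            continue  # metadata-only entry: nothing to materialize
        target = root / entry["path"]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written

with tempfile.TemporaryDirectory() as tmp:
    written = materialize_files(
        [{"path": "rules/a.md", "content": "# rule"}, {"path": "references/b.md"}],
        Path(tmp),
    )
    count = len(written)  # only the entry with content is written
```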
## What Good Skills Look Like - clear trigger-grade launcher metadata - one strong method surface instead of duplicate peers - intentional graph links to KB/tools/playbooks/adjacent skills - enough artifacts to deepen the method without hiding it - truthful runtime/source/tier metadata ## Skill vs Nearby Unit Types - use a `skill` when the value is a reusable method - use a `tool` when the value is mainly a callable capability - use `kb` when the value is durable truth or context - use a `playbook` when order or adaptive sequencing matters across multiple units - use a `bundle` when grouped context matters but order does not - use an `artifact` when the content only belongs to one skill revision ## Tag Coverage (Minimum) From `docs/TAXONOMY.md`, each skill must include: - `action:*` - `stage:*` - `modality:*` - `scope:*` - One of `domain:*` or `audience:*` - `family:*` ## Priority Tier Use tier 1–10: - 1 = exceptional / flagship - 2 = excellent - 3 = strong - 4 = good - 5 = average / solid - 6 = below average - 7 = weak / niche - 8 = poor - 9 = raw / unpolished - 10 = bottom / deprecated ## Testing At least one of: - `tests/expected_output.json` - `tests/fixtures/` + evaluation rubric Skills without tests stay **staged**. Promotion helper (DB-first, gated): - `npx tsx scripts/registry/promote/promote_skill_status.ts --code --to curated --check` ## Scripts + Usage Guides - If a skill requires helper scripts, put them in `scripts/`. - Add a usage guide in `how_to_use/README.md` (or `examples/`) that shows: - exact commands - inputs/outputs - when to run - when _not_ to run ## Canonical Examples Each family should have **1–3 flagship skills** used as tone and structure templates. ### Flagship Template (Minimal, High‑Signal) ```markdown --- name: research-trends description: Find rising topics for a target audience. Use when planning content or validating what’s trending right now. 
--- # Research Trends ## When to Use - planning content ideas - validating emerging topics in the last 30 days ## Workflow 1. Identify audience + domain 2. Gather sources (web + internal) 3. Extract signals + summarize 4. Provide ranked topics + why ## Required Inputs - audience (who) - domain (what) - time window (default: last 30 days) ## Output - Ranked topics with source evidence - Short “why this matters” notes ## References **MANDATORY:** read `references/sources.md` for allowed source types. ## Tests Use `tests/expected_output.json` for a minimal scenario. ``` ## Sources - Anthropic: Skills authoring guidance and format constraints - Claude Docs: Skills tool usage and packaging - skills.sh: Skill development best practices - Vercel: Skills and AGENTS.md findings --- ## Source: docs/atomic-units/STYLES.md # Styles (Rules + Expectations) Styles are **formatting overlays** that do **not** change schema shape. They can cover both: - verbal style: tone, voice, reading level, rhetorical feel - visual style: color, typography, spacing, composition, components, motion, and presentation language Keep this as **one type**. Do not split voice-style and design-style into separate ontology branches unless the registry genuinely needs two lifecycles. In practice a style may be: - verbal-only - visual-capable - mixed brand/system style That is a facet difference, not a type difference. ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `styles`) - **Portable surface (today):** JSON packs under `registry-packs/*/pack.json` If we later add filesystem style unit packages, they will follow `docs/atomic-units/SHARED_CONTRACT.md` with an entrypoint like `STYLE.json`. 
## When to Use - standardizing tone/voice across channels - making outputs consistent without forking schemas - giving agents a stable “house style” to apply - preserving a visual system across pages, images, decks, and design outputs without redefining the content type - expressing brand modes or environment modes such as beach-house, business-presentation, nursing-brand, or product-specific variants without inventing a new output contract ## Style vs Nearby Unit Types - `style` vs `content_type` - style = how something should feel/look/sound - content_type = what kind of output is being produced - `style` vs `recipe` - style = overlay - recipe = packaged combination of schema/style/constraints - `style` vs `kb` - style = applied output pattern - KB = deeper rationale, examples, or doctrine behind that pattern ## Premium Page Neighborhood For high-stakes page work: - `playbook` = the front door that chooses page mode, render target, and promotion risk - `style` = the compact overlay that changes feel and visual direction - `kb` = the deeper doctrine, anti-patterns, and quality bar - `artifact` / reference surface = screenshots, live exemplars, layout references, and mined patterns - `bundle` = the curated kit for a repeatable class of page jobs Do not inflate the style row to the point where it tries to act like all four of its neighbors. 
## What Good Styles Look Like - the launcher/summary makes the intended feeling legible quickly - overlays say what changes, not just that “brand matters” - verbal and visual guidance can coexist without forcing a schema fork - linked KB/examples/assets give richer context when the style is brand-heavy or design-heavy - recipes and playbooks can point at the style instead of copying it Rich style families should still stay compact: - one main style row for the reusable system - linked examples/assets for concrete looks - linked KB for deeper doctrine or brand rationale - recipes for ready-to-run combinations If the main value is “use this style, but branch this way for nursing vs beach-house vs presentation mode,” that still belongs in `style` with linked examples and constraints. ## Required (DB) - `registry_entities` row with selection-ready `name`, `summary`, `use_case` - `styles` row: - `overlays_json` (what to apply) - `constraints_json` (what to avoid / limits) ## Supported Overlay Facets Verbal overlays usually include: - `tone` - `voice` - `register` - `rhetoric` - `structure` - `format` - `claim_style` Visual overlays usually include: - `palette` - `color_palette` - `typography` - `component_set` - `spacing` - `motion` - `iconography` - `layout` - `theme` Use the smallest set that makes the style executable. Do not add decorative overlays that do not change real outputs. ## Tag Coverage (Minimum) - `format:*` - `modality:*` - `scope:*` - `status:*` Recommended: - `domain:*` - `family:*` - `action:*` - `surface:*` (when style is tied to a CMS surface) ## Links - Recipes should use `requires` links to schemas and reference styles by ref. - Styles can link to KB items (`uses_kb`) for longer rationale or examples. - Styles can also link to component systems, design exemplars, or visual rules KBs when an agent needs more than the compact overlay. 
- Styles can use artifacts for: - sample compositions - palette/type references - component screenshots - branching notes such as “if presentation use X, if brand-social use Y” ## Prime Examples - `style:leadership_briefing@v1` = verbal-first briefing overlay - `style:devtool_operator_console@v1` = mixed visual/system-flavored overlay - `style:viral_social@v1` = verbal-first social rhythm overlay - `style:operator_console_editorial@v1` = dense operator/editorial premium page overlay - `style:premium_conversion_story@v1` = conversion-first premium landing overlay - `style:research_report_briefing@v1` = report and briefing web overlay - `style:immersive_brand_story@v1` = narrative and recruiting overlay --- ## Source: docs/atomic-units/TOOLS.md # Tools (Rules + Expectations) Tools are **executable capabilities** with explicit I/O contracts. ## Canonical vs Portable Surfaces - **Canonical:** Supabase DB (`registry_entities` + `tools`) - **Portable export (default):** JSON packs under `registry-packs/` (Phase 3) - Exporter: `npx tsx scripts/distribution/export_registry_packs.ts` - **Filesystem tool folders:** optional later, not a required surface today ## When to Use - executing external APIs or internal services - performing deterministic actions (search, DB query, file ops) - running code paths that should be auditable and traceable ## Tool vs Nearby Unit Types - `tool` vs `skill` - tool = callable primitive or integration surface - skill = method for deciding when/how to use that primitive well - `tool` vs `plugin/export` - tool = canonical capability definition - plugin/export = generated distribution packaging If most of the value is best-practice guidance rather than the capability contract itself, the row probably wants a companion skill or KB. 
## What Good Tool Metadata Looks Like - runtime and auth are explicit - provider/tool_type are unambiguous - description explains both capability and when it is preferable - links point to the skills or KBs that make the tool usable in practice The operator-facing mental model should be: 1. what the tool does 2. when to use it 3. what runtime/auth constraints matter 4. which endpoints or operations it exposes 5. which skill or KB should be loaded with it If an operator cannot answer those five questions from the tool surface, the tool is under-modeled even if the underlying executor works. If a tool surface only shows constraints and hides the actual usage contract, the surface is incomplete even if the underlying DB row exists. ## Structural Fields vs Tags Keep the contract honest: - `tool_type` and `provider` are canonical structural fields in the `tools` table - tags should emphasize discovery facets such as surface, modality, scope, runtime, and source - mirror `tool_type:*` / `provider:*` tags when they improve discovery, but do not treat missing duplicate tags as a sign that the tool contract itself is broken ## Required - DB records for: - `registry_entities` (typed identity + summary + status) - `tools` (tool_type/provider/auth/risk metadata) - `entity_tags` (taxonomy coverage) If we later add filesystem tool unit packages, they will follow the shared unit package contract (`docs/atomic-units/SHARED_CONTRACT.md`) with an entrypoint like `TOOL.json`. 
## TOOL.json (minimum fields) ``` { "name": "human readable", "description": "what it does + when to use", "input_schema": {}, "output_schema": {}, "tool_type": "mcp|http|internal|sdk|cli", "provider": "supabase|vercel|openai|...", "auth": "none|oauth|api_key|service", "runtime": "edge|server|local" } ``` ## Tag Coverage (Minimum) - `surface:*` - `modality:*` - `scope:*` - `source:*` Recommended: - `action:*` - `runtime:*` - `tool_type:*` - `provider:*` ## Testing At least one of: - `tests/` fixtures with example inputs/outputs - `examples/` showing realistic usage Tools without tests stay **staged**. ## Tool Artifacts and Companion Units Tools can have artifacts. Use tool-local artifacts for: - endpoint reference files - realistic examples - how-to-use notes local to that tool revision - tests or fixtures Use linked companion units when the value is bigger than the tool itself: - companion `skill` for operator methodology - companion `kb` for durable setup truth or ecosystem guidance - companion `playbook` when the tool participates in a larger ordered workflow Good tool surfaces often need both: - one compact usage artifact or example set - one linked method/context neighbor That keeps the tool truthful without forcing every operator to infer its working method from bare constraints. ## Prime Examples - `tool:db.query@v1` = direct capability truth with companion discovery/stewardship context - `tool:firecrawl.crawl@v1` and `tool:firecrawl.scrape@v1` = capability primitives beneath the higher-level Firecrawl method surface - `tool:vercel-deploy@v1` = executable deployment surface that should stay distinct from Vercel methodology skills ## Security + Secrets - Never embed secrets in `TOOL.json` or `unit.json`. - Auth happens via env / vault / MCP auth flows. ## Links - Use `pairs_with` to suggest common tool combinations. - Use `alternate` for substitutes or fallbacks. 
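The TOOL.json minimum fields and their enumerated values can be sanity-checked with a small sketch. The repo's linters are authoritative; this only restates the contract shown in the TOOL.json example above:

```python
# Illustrative check for the TOOL.json minimum-field contract, including
# the enumerated tool_type/auth/runtime values. Not a shipped linter.
REQUIRED = ("name", "description", "input_schema", "output_schema",
            "tool_type", "provider", "auth", "runtime")
ENUMS = {
    "tool_type": {"mcp", "http", "internal", "sdk", "cli"},
    "auth": {"none", "oauth", "api_key", "service"},
    "runtime": {"edge", "server", "local"},
}

def tool_json_errors(tool: dict) -> list:
    """Return human-readable contract violations (empty list means OK)."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in tool]
    for field, allowed in ENUMS.items():
        value = tool.get(field)
        if value is not None and value not in allowed:
            errors.append(f"invalid {field}: {value}")
    return errors
```

Note what the check deliberately does not cover: secrets never belong in `TOOL.json`, so there is no field for them to validate; auth stays in env/vault/MCP flows.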
--- ## Source: docs/CEO_VISION.md # CEO Vision: Strategy, Team, and Architecture **Author:** Alec Whitters (voice transcript, synthesized) **Original:** 2026-03-19 **Updated:** 2026-03-26 -- marked completed action items, clarified historical vs active sections. **Purpose:** Permanent reference for AI conversations. The strategic beliefs (Part 4), team personas (Part 2), and architecture vision (Part 6) are durable. The action items (Part 11) and timelines are point-in-time snapshots from March 2026. --- ## Part 1: What This Document Is This is not a summary. This is a structured, comprehensive capture of everything Alec has said about Katailyst's vision, the people involved, what they actually want (not what they say they want), the strategic beliefs driving the system, and the architecture implications -- synthesized from voice transcripts, team Slack conversations, and months of accumulated context. Use this document when: - Starting a new AI conversation about Katailyst - Making decisions about what to build next - Understanding why a feature exists or why it was designed a certain way - Onboarding anyone (human or AI) to the project's philosophy - Resolving debates about priorities --- ## Part 2: The People -- Deep Persona Profiles ### Alec (COO / Builder / System Architect) **Role:** The person building Katailyst. The one who sees the whole picture. 
**What he actually wants:**

- A "badass armory" of the best skills, agents, and AI building blocks that he can share with his team
- A system that amplifies his team -- not one they have to learn from scratch
- The ability to look through and edit skills easily with AI-assisted tooling
- A way to audit what exists, see what sucks, what doesn't make sense, and fix it
- A build-measure-learn loop where AIs build entities, run evals, measure performance, and feed results back into discovery ranking
- The system itself to be the product, not individual use cases
- Personal pages for each team member with their favorite tested skills
- Hub landing pages (MiniMax-style) with a chat interface, use case suggestions, and capabilities listed
- The ability to spin off sub-pages and sidecar apps that all connect back to the same core graph
- Slack assistants as the primary interface between users and the system
- To not be locked into any single ecosystem (OpenClaw, Anthropic, Vercel) -- use whichever is best at any given time
- An automated factory that can hunt the top 1% across all industries, distill the winning patterns, and apply them to HLT's niches

**What he's frustrated about:**

- AI agents consistently half-fill entity fields ("I kind of filled in two fields and then decided we could just do it later" -- he finds this maddening)
- The eval system got overcomplicated with blocked statuses and restricted runs when the whole point is to let the agent try everything and see what happens
- The team doesn't understand what Katailyst is or why it exists -- "it's solving a problem they don't realize they have"
- He's spent a long time working on AI without much tangible result from it yet
- The system's complexity keeps growing and needs to stay manageable -- "we got to find ways to make the system's complexity manageable"
- Too many tools and too much gunk in the interface -- people don't know what to do when they open it

**What he's NOT:**

- He's not looking at the next step like most people. He's looking at the system that wins long-term.
- He's not trying to build one use case. He's trying to build an orchestration layer that can find use cases.
- He's not the daily user of the CMS interface. He's the operator and curator who manages the armory behind the scenes.

**His deeper belief:** The system is really for him. He builds and curates the armory. Everyone else interacts with it through assistants (Slack bots, chat interfaces, MCP connections to Claude Code). The CMS is the back-office control plane. The front-facing experience is conversational.

---

### Ben (CFO)

**What the previous (wrong) analysis said:** "Wants proven results and measurable outcomes. Skeptical. Needs Quality tab."

**What he actually wants:**

- Something that just moves forward in the reasonably short term
- Something the team can actually use to do something that matters
- A simple spreadsheet-like thing: skill name, department, who uses it, date last updated -- as simple as possible, sitting in basically a spreadsheet and nothing else
- A button that says "+ Add to Claude Code" and that's about it
- He does not need proof or metrics. He needs progress. He needs the team to tactically advance.
- He created a Google Sheet with Claude that organizes skills into tiers: foundational skills (brand voice, clinical accuracy, audience segmentation), team-specific skills (marketing workflows), and cross-functional skills

**What confuses him:**

- He doesn't fully understand what skills are beyond "prompts you upload to Claude"
- He clicked on a registry entry recently and the screen broke -- he was lost and confused
- The terminology is confusing to him and everyone else on the team
- He was shown Katailyst after he'd already been building his own hacked-together Excel thing, and his reaction was basically eye-rolling -- "Are we gonna be able to count on this thing? This thing looks big, looks messy."
**What he'd never use:**

- The knowledge graph (too abstract)
- The Factory (too technical)
- Coverage analysis (doesn't know what it means)
- Chat Lab (no interest)

**What would actually delight him:**

- A clean list of skills organized by department, with a one-click way to install them into Claude
- The ability to see at a glance what skills exist, who uses them, and when they were last updated
- Regular review cadence (monthly/quarterly) to update skills with the right people involved
- The system working reliably when he clicks on things

**His role in the Slack huddle:** He organized 11 foundational skills Claude recommended creating. He highlighted 3 to start with (brand voice, clinical accuracy, audience segmentation). He thinks someone needs to own the organizational skills in Claude (he volunteered). He's thinking about naming conventions, version numbers, and skill structure in a very practical, spreadsheet-brained way.

**What Alec reads underneath Ben's words:** Ben just wants to get something together and move forward. He's a little skeptical of the bigger vision. He's organized and practical. He would be happy with the simplest possible implementation that lets the team actually use skills in their daily work.

---

### Justin (CEO)

**What the previous (wrong) analysis said:** "Gung-ho, over-ambitious, tries to connect everything. Hits 'Planned' status walls."
**What he actually wants:**

- To get the whole team using AI in an organized way
- A home where everything lives and everybody can upload and do things with it -- he calls it a "hive"
- To start building use cases one at a time and stack them up
- He will enthusiastically endorse whatever system they have if it works
- He's been talking to Cailey and Ben about skill structure and how markdown files should be organized in Drive

**How he actually thinks:**

- He thinks about structure and modularity -- he advocated strongly in the Slack huddle for keeping examples in separate files from skills so they can be referenced across multiple skills without duplication
- He understands the naming convention matters for skills (compared it to "keyword stuffing")
- He's more imaginative than Ben about what AI could do but doesn't have technical depth
- He's always overambitious -- "connected to my email, let's have it for this, let's have it for that"
- He kept trying to push Ben's skill structure conversations toward implementation practicalities

**What confuses him:**

- He doesn't get the technical implementation details but pushes through it
- The disconnect between what's planned vs what's ready frustrates him

**What he'd never use:**

- Anything requiring a CLI
- The eval system directly
- The knowledge graph for analysis purposes

**His role in the Slack huddle:** Justin was the glue between Ben's skill organization and the marketing team's creative needs. He kept connecting dots: "Cailey's approach and Ben's approach are the same thing from different angles." He pushed for starting with one product (NCLEX) to create a repeatable process. He mentioned that evals (quality testing of skills) already exist and will eventually reduce the need for manual skill review cadences.

**What Alec reads underneath Justin's words:** Justin can drive forward whatever system exists, but he's not going to build it. He needs Alec to give him something concrete to champion. His "hive" concept maps roughly to what Katailyst already is, but he doesn't realize that yet.

---

### Emily (Head of Marketing)

**What the previous (wrong) analysis said:** "Low engagement. Just wants Create page and nothing else."

**What she actually wants:**

- Connected solutions that work end-to-end: landing pages WITH email capture, WITH cheat sheets, WITH the ability to edit and iterate, WITH links inside the apps
- Results. She wants things that actually help her advance the business, specifically for NCLEX and FNP apps.
- An assistant that can pull numbers, help make ads, help make creative -- but it needs to understand the audience
- She does NOT want to learn a whole new complex system
- She brought on a new team member recently and has more pressure than she can wrap her arms around
- She has board members talking to her about marketing performance
- She knows a way to do things already and AI feels like charity -- she'd use it if it proved to actually be better

**What she's actually frustrated by (in her own words from the Slack huddle):**

- "We have all these individual apps and they're disconnected" -- she wants HLT Mastery to be a unified brand
- She's concerned about brand consistency: different people using AI are creating assets that look and sound different from each other
- She wants short-term action items, not big planning conversations
- She wanted to get brand voice and visual guidelines rolling by end of the following week

**What confuses her:**

- She doesn't understand what skills are in the technical sense
- Katailyst itself would be overwhelming -- too much on screen, too many concepts
- She doesn't understand why the system exists or what problem it solves for her

**What she'd actually use:**

- A Slack assistant (Lila) that already knows the brand, the audience, the products, and can just do stuff when asked
- A simple creation interface where she types what she needs and gets back connected, complete marketing materials
- MCP-connected Claude that can pull real data and make real ads

**What Alec reads underneath Emily's words:** Emily has several kids, a bunch of pressure to grow the business, a new team member to manage, and board members in her ear. She does not want another thing to learn. The ideal experience for her is talking to an AI assistant that already knows everything and just helps her get stuff done. She would never navigate the Katailyst CMS. The assistants (Slack, chat) are the interface for Emily.

---

### Laura (Marketing/Design)

**What she wants:**

- Consistent visual identity across all AI-generated content
- A living document (she started a Google Doc) with colors, fonts, and high-level usage guidelines
- The ability to edit brand design/style guides in a way that's interactive -- see changes in real time, almost like a Bolt.new or Claude artifact experience
- She wants to be able to see "here's what it'd look like for an image, here's what it'd look like for that, here's what it'd look like for this"

**What she said in the Slack huddle:**

- She's been "chipping away" at putting brand info into a Google Doc -- colors, fonts, high-level usage
- She raised the important point about duplicate information across skills: "if you update it in one app and forget the other, you get inconsistencies over time"
- She supported the idea of an "overarching" universal brand file that individual product skills reference
- She raised the practical concern about UI vs marketing color usage -- "I don't want the AI putting blue CTA buttons where product has green submit buttons"

**What Alec reads underneath Laura's words:** Laura is a visual thinker who cares about design coherence. She'd be a great user of a style guide editing experience where she can talk to an AI and see the style guide being updated in real time (artifact-style). She will never use the registry directly. She needs a dedicated design-focused surface.
---

### Cailey (Content/Brand)

**What she wants:**

- Brand cohesiveness across all AI-generated content in the products
- The ability to empower the content team to use AI without sacrificing brand standards
- She organized content into tiers: Tier 1 = structured visual content (tables, flow charts, decision trees -- straightforward brand rules), down to more free-form content with complex voice (like Kat's teaching style)

**What she said in the Slack huddle:**

- She noticed different styles of AI-created content creeping into NCLEX Mastery questions -- "an uncohesive 'who are these people?' feeling"
- She's had to "put the brakes on" some AI content because it didn't match brand standards
- She wants to get visual content design standards figured out quickly, even if just for straightforward tables and flow charts
- She raised the question of whether there's a "Kat voice" vs an "NCLEX Mastery voice" -- the teacher's personal voice vs the brand voice

**The deeper point Cailey is making:** The content team is already using AI to create content faster than the brand guidelines can keep up. Without guardrails, the product will feel schizophrenic. This is actually the exact problem Katailyst's style entities were designed to solve -- but nobody on the content team knows Katailyst exists or that it has style management capabilities.
---

### Juan (Marketing)

**What he wants:**

- Visual identity to be a higher priority than Claude recommended (it suggested #6, he thinks it should be #2 or #3)
- A definitive reference doc for visual identity that everyone can point to
- He suggested starting with lower-tier apps (not NCLEX, not FNP) to reduce risk while establishing the visual brand

**What he said in the Slack huddle:**

- He pointed out that when you search "Higher Learning Technologies" in the app store, all the app icons look completely disconnected from each other -- compared this to UWorld, which has a cohesive visual identity
- He proposed a low-risk approach: unify the visual identity of smaller apps first, then roll it out to the flagships

**What Alec reads underneath Juan's words:** Juan is design-aware and strategic about risk. He thinks about the public-facing brand impression. His concern about disconnected app icons is a symptom of the same problem Cailey raised -- no central system managing brand consistency.

---

### Kat (Subject Matter Expert / Nursing Educator)

**What she wants (from Alec's description):**

- She wants to make a bunch of skills and get really good teaching content out there to students
- She's more concerned about getting stuff inside the apps (not the back-office system)
- She's creative-minded and open to AI but doesn't have a sophisticated toolkit
- She's clumsily trying to use AI for content repurposing and teaching

**What Alec reads underneath:** Kat is a perfect example of the "turn team member workflows into skills" use case. She has domain expertise (nursing education) and creative drive but needs the system to capture her knowledge and amplify it. She should be creating skills through a conversational interview process, not navigating a CMS.

---

## Part 3: What the Team is Actually Doing Right Now

### The Slack Marketing Huddle (March 12, 2026)

**Attendees:** Emily, Ben, Justin, Laura, Cailey, Juan

**What they decided:**

1. **Start with NCLEX, not all products.** Justin and Emily agreed: pick one product, figure out the process, then repeat it for others. NCLEX was chosen because its brand voice is more broadly applicable than FNP's more professional tone.
2. **Ben's Skill Hierarchy (from his Google Sheet):**
   - **Tier 1 -- Foundational Skills** (11 identified by Claude):
     - Brand Voice (highlighted to start)
     - Brand Visual Identity
     - Clinical Accuracy & Education Standards (highlighted to start)
     - Content Writing Standards
     - Compliance Requirements
     - Copywriting Guidelines
     - SEO Guidelines
     - Audience Segmentation & Messaging (highlighted to start)
     - And 3 more
   - **Tier 2 -- Team-Specific Skills** (marketing workflows that combine foundational skills)
   - **Tier 3 -- Cross-Functional Skills** (combining nursing content skills + marketing skills)
3. **Brand Voice vs Brand Visual Identity priority:**
   - The content team (Cailey, Laura) pushed for visual identity to be higher priority because AI-generated tables, flow charts, and infographics in the product are already looking inconsistent
   - Emily agreed to do brand voice and visual guidelines in parallel, with different owners
4. **Ownership:**
   - Emily: Brand voice, segmentation messaging, copy elements
   - Laura + Cailey: Visual identity, design standards
   - Ben: Skill structure, organizational skill uploads to Claude
   - Justin + Ben: Implementation into Drive and markdown files
   - Timeline: Initial guidelines by end of the following week
5. **Structural decisions:**
   - Justin strongly advocated for keeping examples separate from skills (as reference files) so they can be reused across multiple skills without duplication
   - Ben noted that Claude organizational skills require admin access (only he has it currently)
   - They discussed whether brand voice should be one skill with sub-sections per product or separate skills per product -- decided to research best practice
   - Laura raised the need for a "universal" overarching brand file that individual product skills reference
6. **Ben's naming convention approach:**
   - Lowercase letters, numbers, and hyphens only (Claude skill name constraints)
   - Include tier level, descriptive name, and version number
   - Example: something like `t1-brand-voice-v1`
7. **The deeper problem they're circling around but not naming:**
   - Every person in this huddle is independently trying to organize AI knowledge into files, folders, and guidelines
   - They're building exactly what Katailyst already does -- but in Google Sheets and Google Docs
   - They don't know Katailyst exists as a solution to this problem
   - The gap: Katailyst has styles, KBs, skills, schemas, and graph linking -- but no one on this team can access it in a way that makes sense to them

---

## Part 4: Alec's Strategic Beliefs -- In His Own Words

### Belief 1: Build the System, Not the Use Case

> "My extremely controversial but deeply believed conviction is that it's about the tool not the use case."

This goes against all conventional wisdom in software. But AI is different. AI is generalizable. If you build the orchestration layer well enough, the system can iterate into the best use case, rather than requiring a perfect use case up front.
**Example he gives:**

- "Write an article for stressed nursing students about the role of RN/PN" -- that's a use case
- "Create a {content type} for {audience} about {topic}" -- that's a system
- "Build a nurse recruiting business: pick niche, craft content + funnel + site + pitches, test 50 variants, find what converts" -- that's the factory

Sam Altman's quote resonates here: the standard YC advice was an end-use-case-specific vision, and then Altman built OpenAI without having a use case... and it worked. AI doesn't need a use case; it's a generalizable thing if done right.

### Belief 2: The Factory Model

> "Why not artisan? Because AI is a massive fucking factory. It's nice to have your little boutique shop, but the factory is gonna smash it."

But the factory only wins if it can output high quality. It's like Henry Ford -- how do you find the techniques that produce higher quality? The real question is no longer "can you as a person understand your audience" or "can you as a person scale" -- it's "can you get your machines to do that? Can you get a factory that can do those things that matter most? Because once you can, it's like an infinite machine."

**The meta-moat:** Can you teach the AI precision + engagement as a repeatable capability? Because if you can encode that into the orchestration, you don't get "one perfect piece." You get 100,000 high-precision, highly engaging shots per day, each tuned to a niche, a pain point, and a voice.

### Belief 3: Study the Top 1% Across All Industries

> "Before doing anything, the AI should basically be asked to study the top one percent. And usually that top one percent actually isn't in our industry."

**The method:**

1. Hunt the cream of the crop across ALL industries -- not just nursing/education
2. Focus on what customers are actually voting for with their feet, not PR or "looks good"
3. Find what's winning in the most competitive places: what's trending, converting, getting shared, getting upvoted
4. Separate wheat from chaff -- even winners often point at the wrong reasons for why they're winning
5. Find the real leverage points and true drivers
6. Distill the underlying concepts
7. Apply them in a way that "rhymes" with the core concept but translated into how your niche thinks

> "It's not a thing that can be made hastily but it's extraordinarily powerful. Leverageable and scalable when you find those golden nuggets that are truly the drivers."

### Belief 4: Pair Top 1% with Customer Truth

This only works with the second ingredient: actually understanding the industry and customer. The AI has to go to the places where the truth is messy:

- Forums
- SEO reality
- What people complain about, ask, argue over, and repeatedly need help with

> "The shift is that it's not 'can you understand this.' It's 'can you get the factory to understand this.' Because once the factory learns it, the scale follows."

### Belief 5: Niche Markets That Never Existed Before

> "The world gets more nichified and more personalized. We need to go after the niches that didn't previously exist because they were too small to be worth it."

Now, because of AI's generalizability and scalability, those niches can be served. And serving a market that's never been served before is a way to out-compete, because most people try to play the same game in the new era. The winners will own niches that never existed before. And then do that over and over without losing the thread.

### Belief 6: Emotional Connection Over Technical Superiority

> "My goal? Emotional connection with them. We will never win on tech, but actually in niche markets with right psychology and orchestration we can get them into our ecosystem."

Ecosystem lock-in is always the ideal. Be their AI tutor, become their best friend, stick with them for life -- make money later, get the connection now and trust.

> "Computers will feel animate in the same way movies feel real and make people cry, but more so. They'll have relationship."
The brand voice should be: engaging over academic. Just above a peer authority-wise. "Be that teacher who swears in class." Help break down concepts in a way that teaches better than any other. Use the new tech better than any other. Lean into new use cases -- multimedia for everything, personalization, AI tutors.

### Belief 7: AI Has Alzheimer's -- Fix It Now or Never

> "When there's a problem, when you see a problem, you need to fix it right there and you need to fix it at its core, never at the surface."

When something breaks, do not route around it. Trace it, ask why until you hit the root cause, and fix the underlying issue. Band-aids make the wound deeper in code. Never put it off till later, because AIs have Alzheimer's -- it never gets remembered.

### Belief 8: Test Many Things Fast, Then Triple Down

Build, measure, learn. Take some risks with AI -- the downside is worse outcomes, not catastrophe. But when you find traction, triple down around it and spin off subtests around it. Lean way in to milk it. Content is cheap now -- go wild. But also make sure to build a community and a brand.

### Belief 9: The Upside of AI is Huge but Getting There is Hard

> "It takes many iterations and much effort to get there. The problem is this is ridiculously hard and tedious to do."

AI can do things at massive scale, extremely rapidly, at low cost. But it takes orchestration. The tooling, evals, prompts, iterating on issues 500 times -- that's the hard part. Today's AI can already basically do any task if given the right setup, tooling, and instructions.

> "For some extremely confusing reason people look at the capabilities now and fail to look at the obviously soon to come capabilities. Just follow the trajectory."

### Belief 10: When Everyone Has a Factory, What Wins?

> "When everyone can make high quality content, the key stops being how -- it becomes what."
The bottleneck moves from production to judgment: what topic, what angle, what audience, what promise, what distribution path. And how to do this better in a systematized, factory-like way. With the flood of content, attention grows shorter. Quality and engagement value become more important than ever. Copywriting, storytelling, imagery, hooks, pulling themes in -- these skills matter more now, not less. Because AI infinitely scales, small advantages become more and more impactful. Being there a few months earlier makes a bigger and bigger difference.

---

## Part 5: What the System Actually Solves

### The uncomfortable truth Alec acknowledges:

> "Nobody on the team knows why we're making it or what it is. It's solving a problem they don't realize they have. Because really what it's doing is it's solving a problem for me, Alec."

**What it solves for Alec:**

- A way to look through and edit skills easily
- AI-assisted quality auditing of what they have
- A connected graph where everything links to everything else
- The ability to share a "badass armory" with his team
- Build-measure-learn loops automated by AI
- A system that can manage their AI operations and interconnections at scale

**What it solves for the team (once connected properly):**

- The exact problem they discussed in the Slack huddle: where do we put our brand voice? How do we keep skills organized? How do we prevent duplicate styles? How do we know which version is current?
- But they can't see this yet because the interface is too complex and the concepts are unfamiliar

**What it should look like to the team:**

- An assistant they talk to in Slack or a chat interface
- Personal pages with their favorite skills, tested and clickable
- Hub landing pages that say "here are things this can do for you" with use case cards
- A "go" button that lets you see what a skill actually produces
- The CMS behind the scenes is for Alec (and eventually Ben) to curate

---

## Part 6: Architecture Implications Alec Raised

### The Interface Question: Assistants, Not CMS

> "I think we need to have an AI assistant. I think slack assistants that are largely the go-between between user and the interface."

The CMS (Katailyst dashboard) is the back-office. The front-facing experience should be:

1. **Slack assistants** -- already have them via OpenClaw, but they have terrible interfaces
2. **Chat interfaces** (MiniMax-style) -- a landing page per hub with a chat box in the middle, use cases listed below, capabilities described
3. **MCP connections** -- so any AI tool (Claude Code, Cursor, etc.) can pull from the same armory
4. **Personal pages** -- each team member gets their own page with favorite skills they've tested

### The OpenClaw Question

Currently using OpenClaw (Render-based, with identity.md and soul.md docs). It follows a specific methodology:

- In Render, there's an identity.md, a soul.md, and ~8 core documents
- The agent knows about the user, uses MCP to do things, studies the user, and uses the MCP to go through the graph

**Alec's tension:**

- OpenClaw works but he doesn't feel like he can easily customize the Slack experience
- He wants to move toward Vercel AI SDK or Claude Code SDK for more control
- But he also doesn't want to throw away what already works
- His resolution: keep OpenClaw agents, but also launch new ones on Vercel AI SDK / Claude Code SDK / MiniMax. Use whichever is best.
The MCP is the bridge that keeps them all connected to the same graph.

### The MiniMax Inspiration

What Alec saw and liked about MiniMax:

- A landing page for each assistant/hub
- A chat box in the middle -- that's more or less all that's there
- Use cases listed underneath -- "make an ad, make an article, do something with excel"
- Capabilities listed -- made it feel more real than a vague chat
- When you clicked a use case, it showed different things it could do, or you could just chat
- The right side became a workspace showing the agent working in real time
- The left side was the chat
- They had "experts" (sub-agents) you could call

> "My team didn't know what to do with it [the chat]. So it's like, all right, make an ad, make an article, do something with excel."

### The Hub Architecture

Hubs are basically an index of related entities:

- "You're trying to do something with multimedia -- here's best practices, and here are 15-20 different nodes"
- They have guidance and warnings: "always come up with a better prompt first", "make sure you research the educational content, it's not just about the style"
- They're ordered: beginner, intermediate, advanced

But hubs should NOT be hard-coded routes. They should be dynamic -- anything in the graph can become a hub landing page. The system is too dynamic and too complicated to predetermine all the routes.

### The Sidecar App Model

> "I kind of think those [specialized experiences] would be better in a specialized spin-off template sidecar thing."
Instead of cramming everything into the CMS:

- Use MCP to connect sidecar apps to the same graph
- Each sidecar has a landing page for a specific use case
- Vercel AI SDK sitting in there, with memory on the user
- Same login, feels personalized, always on
- Same core principles throughout -- keeps the system manageable
- Can spin off new ones cheaply -- if they're used, great; if not, fine

Examples:

- A page for nursing students to get motivated for NCLEX or create study plans
- A content creation interface for the marketing team
- An image/style editor experience for Laura (Bolt.new-like artifact editing)

### The Eval Philosophy

> "Don't say it's blocked. Don't run it. Just let the agent try. See what the fuck happens."

The eval system should be simple:

- Every eval case has a play button
- Hit go, see what the agent outputs
- The output goes through the rubric
- The user can also say "I liked it" or "I didn't" (but the user is not required)
- Run the same test on the same problem a bunch of times
- Should feel more like Airtable than a complex five-tab system

What went wrong: statuses got blocked, only 10 could run, things were banned from being used. That defeats the entire purpose. Let it run. Let it fail. Show the failure. That's how you know what to fix.

### The Personal Pages Concept

> "Something that all of these people would actually value that I don't even think they know at this point they want is their own little personal page with their favorite skills that they were able to test."

These pages would:

- Browse the library for different skills
- Search the internet for relevant tools
- Talk about your use case and get suggestions
- Create something for you
- Suggest a tool or demo you can try
- Be clickable into (not just code and markdown)
- Work like testing grounds where you can see if a skill actually works

### The "Use Case List" Concept

> "I feel like that use case list is just critical. I feel like we need to have a use case list almost for each person."

Each persona should see a curated list of things the system can do for them. Not a generic capabilities page. Specific: "make an ad", "write a blog post", "create a study plan", "generate a social media reel", "build an infographic". When you click one, the AI starts helping you do that thing.

---

## Part 7: The Gap Between Vision and Reality

### Where the team is (from the Slack huddle):

- Organizing skills in a Google Sheet
- Planning to create markdown files stored in Google Drive
- One person (Ben) has admin access to upload organizational skills to Claude
- They're debating whether to put all brand voices in one skill or separate skills
- Timeline: initial brand voice and visual guidelines by end of next week
- Tool: Claude (the consumer product), not Katailyst

### Where Alec wants them to be:

- All entities (skills, styles, KBs, schemas) live in a connected graph
- Discovery system surfaces the right entities at the right time
- Evals prove which skills work better than others
- The factory creates, tests, and promotes entities automatically
- Assistants (Slack, chat) are the team's interface to all of this
- Everything is connected: the skill references the style guide references the brand voice references the examples, all linked in the graph

### The bridge:

1. **Short term:** Get the team's Google Sheet skills imported into Katailyst. Make them visible and usable via MCP so team members using Claude naturally pull from the registry.
2. **Medium term:** Get Slack assistants working well enough that Emily, Ben, Justin can interact with the armory without ever opening the CMS.
3. **Long term:** The factory runs autonomously -- hunting top 1% content, distilling patterns, creating entities, running evals, promoting winners, deprecating losers.
--- ## Part 8: Nurse Recruiting Context ### Why nurse recruiting (context from March 2026): - Board member discussions about expanding beyond test prep into recruiting - HLT has good existing assets in the nursing space - The content business model is in trouble (getting commoditized) -- recruiting is a better business model - Recruiting is more durable because it's tied to real hiring, not content consumption - AI makes it more possible to expand into this now - Connection to students is easy: they're already HLT customers studying for nursing exams ### What Alec wants the system to do for nurse recruiting: - Launch a Framer website - Launch a brand - Figure out content types, audience niches, and angles - Get traffic and get them to convert - Find a good audience niche, topic, and goal - Make compelling content and find distribution channels - If there isn't a good channel, make one - If there isn't a good funnel, make one - Try 50 different pitches until finding the best converting one - Maximize engagement and sales > "Figure it out, go." This is exactly the kind of task the factory model is designed for. Not "make one landing page." It's "here's the objective, test 50 variations, find what converts, then triple down." 
--- ## Part 9: External Audiences Beyond HLT ### AI Enthusiasts (like Alec) - People who see the same thing Alec sees: the orchestration layer is the real product - Would appreciate the graph, the eval system, the factory - Potential early adopters of the plugin marketplace ### Organizations in the Same Spot as HLT - A CEO who wants to get more people using AI - "Everybody's in the same spot as my team -- it's in pockets and really scattered tools" - They need the same thing: a unified system to manage their AI operations - This is the eventual B2B opportunity ### Nursing Students - Already HLT customers - The AI tutor is the long game -- "be their best friend, stick with them for life" - Having that tutor position in school leads to everything else because AI is generalizable - Easy connection to recruiting because students become nurses looking for jobs ### The Toll Keeper Position > "Best position to be in is the one who owns the store. The one who points the traffic whichever way, especially those with buying intent." Facebook is massive because they have lots of people, know lots about them, and blend ads in well. Strong network effects. Google is big because they have people at this moment of intent. How do we sit as toll keeper with a simple, powerful, high-ROI, defensible business in AI? AI tutoring is a very close place. Tutoring will be very different because the AI is always with you. We need to not be a computer -- we need to be that trusted, looked-up-to older sister kind of thing. Wise yet cool. --- ## Part 10: What Alec is Really Saying Underneath All of This ### The meta-narrative: Alec is building two things simultaneously: 1. **The immediate tool:** A way to organize, test, and share AI skills so his 20-person team can actually use AI effectively in their daily work. The team doesn't understand this system, so it needs to be invisible to them -- they interact through assistants and chat interfaces, not the CMS. 2. 
**The long-term system:** An AI orchestration platform that encodes the principles of studying top performers, understanding audience psychology, testing at scale, and iterating toward engagement. This is not a content management system. It's a factory that learns. ### Why the previous persona analysis failed: The previous analysis treated each persona as a "user of the dashboard" and asked "what page do they want?" That's wrong. Most of these people will never use the dashboard. The dashboard is for Alec. Everyone else uses assistants, chat interfaces, and MCP connections. The right question is: "What does each person need from the SYSTEM (not the interface), and what experience should the system surface for them?" - **Ben** needs a simple skill list he can manage in something spreadsheet-like - **Emily** needs an assistant that does marketing stuff when asked - **Justin** needs to be able to champion "look, the whole team is using this" - **Laura** needs an interactive style guide editing experience - **Cailey** needs content brand guidelines that the AI enforces automatically - **Kat** needs a way to turn her teaching expertise into reusable skills - **Alec** needs the full CMS, the graph, the evals, the factory, the coverage analysis ### The tension Alec is managing: He has a team that wants simple, immediate, practical tools (Google Sheets with skill names). He has a vision of an AI factory that can systematically generate what resonates, learn what wins, and iterate faster than anyone. He needs to deliver on both simultaneously. The strategy: give the team what they want through assistants and simple interfaces. Build the factory behind the scenes. Connect them through the graph and MCP. Let the factory gradually make everything the team does better, without them needing to understand how. ### The single most important sentence in everything Alec said: > "Can you get your machines to do that? Can you get a factory that can do those things that matter most? 
Because once you can, it's like an infinite machine." That's the core of everything. Not "can we make a nice dashboard" or "can we organize skills in a spreadsheet." Can the AI system itself learn to do the things that matter most, and then do them at infinite scale? Everything else -- the skills, the evals, the graph, the assistants, the sidecar apps, the MCP -- is infrastructure in service of that one question. --- ## Part 11: Action Items (March 2026 snapshot) > Status as of March 26, 2026. Items marked DONE are completed. 1. **DONE -- Get the eval system unblocked.** Eval cases run freely. No blocked statuses. 35 eval cases in the registry. 2. **ONGOING -- Make skills actually fill in all their fields.** Creation pipeline now uses `commitRegistryEntityCreate()` with validation. Creation session warns on fallback content builders. Quality bypass problem documented in `docs/atomic-unit-quality.md`. 3. **DONE -- Get the MCP working well.** 44 tools, 14 resources, 5 prompts live. Discovery uses embeddings + Cohere rerank. Intent classifier stripped to structural patterns only -- graph handles domain classification. 4. **ONGOING -- Save stats and metrics through MCP.** Metrics infrastructure exists (metrics.fetch tool, metric entity type). Full BML loop wiring still in progress. 5. **ONGOING -- Show people it can do good stuff.** MCP is live and producing good results. Sidecar apps (katailyst-engage) consuming via MCP. Use case demonstrations still needed for non-technical team. 6. **DONE -- Import foundational skills into the registry.** 1,500+ entities in the registry including brand voice, audience profiles, product overviews, style guides. 7. **ONGOING -- Get the Slack assistants working well.** Victoria, Julius, Lila, Ares on OpenClaw/Render. Moving toward Claude Agent SDK and Vercel AI Gateway. 8. **PARTIALLY DONE -- Connect to data and systems.** Framer connected (beta). Marketo API tool in registry. Cloudinary live. Pipedream delivery targets. 
Multimedia Mastery MCP sidecar. Full end-to-end connections still expanding. 9. **DONE -- Create hub landing pages.** 20 hubs in the registry covering all major domains. 10. **ONGOING -- Nurse recruiting.** Viral hooks recruiting KB exists. Landing page recipe available. Full recruiting vertical still being developed. --- ## Part 12: The Shared Infrastructure Vision Alec described a "temptable core system" of shared infrastructure: - **AI Registry** -- core database, probably shared on backend but not necessarily in the UI - **Data and Financial Hub** -- numbers, metrics, performance data - **Multimedia Hub** -- images, videos, infographics, all the visual assets - **AI Tasks Hub** -- automation and task management - **Skill Builder / KB Maker** -- per niche, creating and managing building blocks - **Corporate/Apps CMS** -- home of question bank and in-product stuff; connect into it to pull stuff out - **AI Tutor (Mira)** -- should become part of this later - **Content Engine:** - **Finder/Curator Agent:** best of the web (YouTube, forums, AI outputs, viral content) - **Writer:** content creation - **Demand Hunter:** trends, forums, both global and industry-specific - **UI/UX Hub:** home website, forum, best-of-web article curation, recruiting funnel and page All of these share the same graph, the same registry, the same eval system. They're different views into the same underlying intelligence. --- ## Appendix: Key Quotes (Verbatim) > "When there's a problem when you see a problem you need to fix it right there and you need to fix it at its core never at the surface." > "AI is unlike any other tool and is in a different class because it in of itself is intelligent." > "The right orchestration can find the use case." > "It's nice to have your little boutique shop, but the factory is gonna smash it." > "We need to go after the niches that didn't previously exist because they were too small to be worth it." 
> "Be their AI tutor, become their best friend, stick with them for life -- make money later, get the connection now and trust." > "Be that teacher who swears in class." > "We don't want to be the guy fighting in the war with a spear while they have a machine gun." > "Even people themselves who are winning often are pointing at the wrong reason." > "Can you get your machines to do that? Can you get a factory that can do those things that matter most?" > "Nobody on the team knows why we're making it or what it is. It's solving a problem they don't realize they have." > "I want to have a page where I can have it built for me." > "Let the agent try. See what the fuck happens. If the agent can't do it then it's like fail, okay you got a zero out of five." > "We need to create a compelling {thing} for {audience} -- not: we need to create a compelling article for nurses." > "Content is cheap now. Let's go wild." --- ## Source: docs/deliberation-engine.md # Deliberation Engine > **Status**: Implemented and live. MCP tool: `deliberate` (in `authoring` toolset). > **Code**: `lib/deliberation/` (33 files) > **Default model**: `anthropic/claude-opus-4-6` ## What This Is A structured multi-agent deliberation pipeline where specialized AI roles iterate over shared context to produce deeply-reasoned outputs. Users can watch the deliberation unfold in real-time via run events. This is code-orchestrated with typed handoffs -- not freeform AI conversation. Each role has a constrained mandate, structured JSON output, and fresh eyes on curated context each round. 
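The "typed handoffs" idea can be sketched in TypeScript. This is an illustrative sketch only — the names here (`RoleContext`, `CriticOutput`, `Role`) are hypothetical stand-ins, not the actual interfaces in `lib/deliberation/types.ts`:

```typescript
// Illustrative only: hypothetical stand-ins for the real deliberation types.
interface RoleContext {
  intent: string;            // what the deliberation is producing
  artifact: string | null;   // current draft artifact, if any round produced one
  notes: string[];           // curated feedback carried between phases
}

interface CriticOutput {
  issues: { severity: "critical" | "minor"; description: string }[];
  verdict: "approve" | "revise";
}

// A role is a function with a constrained signature; the orchestrator code
// (not the model) decides which role runs next and what context it sees.
type Role<Out> = (ctx: RoleContext) => Promise<Out>;

const critic: Role<CriticOutput> = async (ctx) => {
  // The real role would call an LLM with a role prompt and validate the
  // returned JSON against a schema; stubbed here for illustration.
  if (!ctx.artifact) {
    return {
      issues: [{ severity: "critical", description: "no artifact to critique" }],
      verdict: "revise",
    };
  }
  return { issues: [], verdict: "approve" };
};
```

Because every role output is structured JSON rather than freeform prose, the orchestrator can branch on fields like `verdict` deterministically.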
## How It Works

### Pipeline Flow

```
runDeliberation(config)
  -> resolve pattern driver + intent config + budget
  -> create run in DB
  -> discover domain critics (optional)
  -> for each round (up to maxRounds):
       -> pattern driver yields phase groups
       -> executePhaseGroup() runs each role
       -> judge scores the round's artifact
       -> if score >= qualityThreshold: stop
  -> ensemble judge produces final score
  -> auto-escalation if quality mode demands it
  -> return result with all rounds + final judgment
```

### 5 Patterns

Each pattern defines its own phase topology:

| Pattern | Flow | Best For |
| --- | --- | --- |
| `committee` | Builder -> Critics (parallel) -> Synthesizer -> Judge | Default. Balanced quality. |
| `adversarial-debate` | Builder -> Debate turns (alternating) -> Arbiter -> Judge | Controversial decisions, finding blind spots. |
| `red-team` | Builder -> Red team critics -> Verifier -> Judge | Security, risk, compliance review. |
| `panel` | Builder -> Panel critics (parallel) -> Editor -> Judge | Multi-perspective review. |
| `progressive-critique` | Builder -> Critic 1 -> Revise -> Critic 2 -> Revise -> Judge | Iterative refinement. |

Pattern drivers: `lib/deliberation/patterns/`

### 6 Intent Configurations

Per-intent defaults for critics, thresholds, and context budgets:

| Intent | Critics | Quality Threshold | Token Budget |
| --- | --- | --- | --- |
| `skill-creation` | taxonomy, product, implementation | 75 | 32K |
| `style-creation` | product, implementation | 70 | 24K |
| `graph-ontology` | taxonomy, implementation | 80 | 32K |
| `content-strategy` | product, taxonomy | 70 | 24K |
| `brand-planning` | product, taxonomy, implementation | 75 | 28K |
| `general` | product, implementation | 70 | 24K |

Config: `lib/deliberation/config.ts`

### 9 Roles

| Role | File | Purpose |
| --- | --- | --- |
| Builder | `roles/builder.ts` | Produces the initial artifact |
| Critic | `roles/critic.ts` | Domain-specific critique |
| Verifier | `roles/verifier.ts` | Factual/logical verification |
| Arbiter | `roles/arbiter.ts` | Resolves disagreements |
| Editor | `roles/editor.ts` | Polishes the final artifact |
| Judge | `roles/judge.ts` | Scores round quality |
| Ensemble Judge | `roles/ensemble-judge.ts` | Multi-perspective final scoring |
| Synthesizer | `roles/synthesizer.ts` | Merges critic feedback |
| Debate Turn | `roles/debate-turn.ts` | Adversarial debate alternation |

### Budget Management

Default budget: 200K tokens, 5 minutes, $5 USD max.

```typescript
DEFAULT_BUDGET = { maxTokens: 200_000, maxDurationMs: 300_000, maxCostUsd: 5.0 }
```

Budget warnings emitted when approaching limits. Deliberation stops gracefully when exceeded.

### Auto-Escalation

When `qualityMode` is `aggressive` or `maximum` and the initial run doesn't meet the quality bar, the engine automatically re-runs with escalated settings (more rounds, enriched context, possibly different pattern). Config resolution priority: caller config > intent config > pattern defaults.
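The resolution order just described can be sketched with object spreads, where later spreads win. This is a minimal sketch with hypothetical names (`resolveConfig`, `PATTERN_DEFAULTS`): the quality thresholds and token budgets mirror the intent table above, while the `maxRounds` default is illustrative, not taken from the code.

```typescript
interface DeliberationConfig {
  maxRounds?: number;
  qualityThreshold?: number;
  maxTokens?: number;
}

const PATTERN_DEFAULTS: Required<DeliberationConfig> = {
  maxRounds: 3,         // illustrative default, not the actual value
  qualityThreshold: 70, // matches the `general` intent default
  maxTokens: 200_000,   // matches DEFAULT_BUDGET.maxTokens
};

const INTENT_CONFIGS: Record<string, DeliberationConfig> = {
  "skill-creation": { qualityThreshold: 75, maxTokens: 32_000 },
  "graph-ontology": { qualityThreshold: 80, maxTokens: 32_000 },
};

function resolveConfig(
  intent: string,
  caller: DeliberationConfig = {}
): Required<DeliberationConfig> {
  // Later spreads win: caller beats intent config, which beats pattern defaults.
  return {
    ...PATTERN_DEFAULTS,
    ...(INTENT_CONFIGS[intent] ?? {}),
    ...caller,
  } as Required<DeliberationConfig>;
}
```

Usage: `resolveConfig("skill-creation", { qualityThreshold: 90 })` keeps the intent's 32K token budget but applies the caller's threshold of 90.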
### Dynamic Critic Discovery `discoverDomainCritics()` can find additional critics from the registry graph based on the deliberation topic. Emits `dynamic_critics_discovered` events. ### History and Learning `queryDeliberationHistory()` retrieves past deliberation results for the same intent/entity. `buildHistoryInsights()` extracts patterns from previous rounds. These insights are formatted into prompts so the deliberation learns from its own past runs. ## Key Files ``` lib/deliberation/ orchestrator.ts -- Main loop, auto-escalation, run management config.ts -- Intent configs, budget defaults, config resolution patterns.ts -- Pattern driver registry patterns/ -- 5 pattern implementations roles/ -- 9 role implementations context.ts -- Context membrane initialization and refresh events.ts -- Run event emission (DB-backed, not client bus) feedback.ts -- Result analysis, critical issue counting history.ts -- Past deliberation querying and insight extraction critic-discovery.ts -- Dynamic critic discovery from registry types.ts -- All TypeScript interfaces phase-executor.ts -- Phase group execution prompts.ts -- Shared prompt building prompt-utils.ts -- Prompt utilities judge-perspectives.ts -- Judge perspective resolution cross-pattern.ts -- Cross-pattern utilities index.ts -- Re-exports ``` ## MCP Access Call `deliberate` via MCP (in the `authoring` toolset) to trigger a deliberation run. The tool accepts intent, pattern, quality mode, and optional context. --- ## Source: docs/discovery-system.md # Discovery System **Updated:** 2026-03-26 **Status:** Implemented and live. Primary MCP tool: `discover`. **Code:** `lib/discovery/`, `lib/api/discover.ts`, `lib/api/discover-execution.ts`, `lib/api/discover-rerank.ts` How Katailyst's discovery pipeline works: scoring signals, embeddings, reranking, and retrieval architecture. 
**Note on intent classification:** As of March 2026, the regex-based task-type classifier was stripped down to only 4 structural patterns (onboarding, creation, registry, repair). All content-domain classification is handled by the graph's embedding-based discovery, not by keyword regexes. See `lib/chat/chat-execution-policy.ts` for details.

---

## 1. How Katailyst Discovery Actually Works

### The Full Retrieval Pipeline

When an agent calls `discover` via MCP, this happens:

1. **Intent embedding**: `embedIntent(intent)` calls OpenAI text-embedding-3-small (1536 dims) via AI Gateway or direct API. Returns `number[] | null`. Gracefully degrades to null on failure.
2. **SQL scoring**: `discover_v2(...)` runs a 10-signal composite scoring function:

| Signal | Weight | Source | What It Does |
| --- | --- | --- | --- |
| text (FTS) | 3.0 | `ts_rank_cd` + `intent_to_or_tsquery` | OR-based prefix matching on name, code, summary, use_case. Self-normalized by max score in candidate set. |
| embedding | 2.0 | `1.0 - cosine_distance` via pgvector HNSW | Semantic similarity. Only active when embedding vector provided. |
| tags | 1.0 | Matching filter tags / total tags | Tag overlap ratio. |
| proclivity | 1.0 | `agent_proclivities` table | Agent preference (prefer/default_to/avoid/never_use). |
| tier | 0.5 | `(11 - priority_tier) / 10.0` | Higher tier = higher score. 1-10 scale. |
| vote | 0.5 | Wilson score from published lists | Community endorsement. |
| recency | 0.3 | `1 / (days_since_update + 1)` | Freshness decay. |
| links | 0.3 | `ln(incoming_count + 1) / ln(11)` | Log-normalized incoming link popularity, capped at 1.0. |
| rating | 0.25 | `rating / 100.0` | Quality rating. |
| eval | 0.15 | Eval signals (avg_score, win_rate, confidence) | Pairwise tournament results. |

3.
**Optional reranking**: `rerankDiscoverItems(...)` calls Cohere rerank-v4.0-fast (primary) or Voyage rerank-2.5-lite (fallback). **Currently NOT active in production** -- likely missing API key in vault.
4. **Packet assembly**: Builds capability packets, recommendation receipt, interpretation result.
5. **Response formatting**: text, json, or compact mode.

### The FTS Query Builder

`intent_to_or_tsquery(regconfig, text)` splits intent into words, drops common stopwords and words < 3 chars, builds an OR-based prefix tsquery (`word:*`). Falls back to `plainto_tsquery` if no valid terms remain. This gives good recall but can cause false positives on common words like "plan," "content," "design."

### How Embeddings Work

- **Model**: OpenAI text-embedding-3-small (1536 dimensions)
- **Transport**: AI Gateway (OIDC) or direct OpenAI API
- **Entity text construction**: `buildEmbeddingText()` concatenates entity_ref, name, summary, use_case, and up to 120 text leaves from content_json (max 12000 chars)
- **Storage**: `entity_embeddings` table with HNSW index (`vector_cosine_ops`)
- **Refresh**: `refreshEntityEmbeddings()` processes curated/published entities in batches of 32
- **Query**: `embedIntent()` embeds the raw intent string (capped at 12000 chars)

### Discovery Weights

Configurable per scope (global, org, app, project, agent). Current global row:

```
semantic_weight=3.0, tags_weight=1.0, links_weight=0.3, tier_weight=0.5, rating_weight=0.25, proclivity_weight=1.0, recency_weight=0.3, eval_weight=0.15, embedding_weight=2.0, vote_weight=0.5
```

NOTE: `semantic_weight` is a legacy name. It controls FTS (full-text search), NOT semantic/embedding similarity. The embedding signal has its own `embedding_weight`.

--- ## 2.
Industry Best Practices (March 2026) ### Hybrid Search Architecture The gold standard for production retrieval is a three-stage pipeline: **Stage 1 -- Parallel Retrieval (Recall)** - Run sparse retrieval (BM25/FTS) and dense retrieval (vector embeddings) in parallel - BM25 catches exact terms, abbreviations, proper names that vectors miss - Vectors catch semantic intent, synonyms, conceptual relationships - Together they achieve >90% recall@50 **Stage 2 -- Fusion** - Reciprocal Rank Fusion (RRF, k=60) is the zero-config default - Convex combination (alpha-weighted) if you have 50+ labeled query pairs - Pure BM25 + vector → ~84% precision vs ~62% for vector-only **Stage 3 -- Cross-Encoder Reranking (Precision)** - Cross-encoder models (Cohere rerank-v4.0, BGE-reranker) jointly encode query+document pairs - 15-30% precision improvement over fusion alone - Must see 25-50 candidates to be effective (not just the final page) - Biggest single precision gain available ### Key Benchmarks | Setup | Precision | Notes | | ----------------------------- | --------------- | ----------------------- | | Vector only | ~62% | Baseline | | BM25 + Vector (RRF) | ~84% | +22% from hybrid | | Hybrid + Cross-Encoder Rerank | ~95% | +11% from reranking | | text-embedding-3-small | nDCG@10 = 0.762 | Current Katailyst model | | text-embedding-3-large | nDCG@10 = 0.811 | +6.4% improvement | ### Graph-Augmented Retrieval (GraphRAG) - 80% accuracy vs 50% for traditional RAG on complex multi-hop queries - Quality of knowledge graph matters more than quantity - Hybrid systems route simple queries to vector RAG and complex queries to GraphRAG - Cost: 3-5x more LLM calls for graph construction - LazyGraphRAG (2025) reduces indexing cost to 0.1% while maintaining quality ### Reranking Best Practices - Cohere rerank-v4.0: 32,768 token context window, [0,1] normalized scores - Use 30-50 representative domain queries to calibrate relevance thresholds - Format structured data as YAML strings for best 
rerank performance - Cross-encoders capture fine-grained lexical overlap, negation, conditional relevance that embeddings miss - Two-stage: fast bi-encoder retrieval for recall, then cross-encoder for precision --- ## 3. What Katailyst Gets Right ### Already best-practice or better: 1. **Multi-signal composite scoring** -- Most systems do vector-only or BM25-only. Katailyst fuses 10 signals. This is rare and powerful. 2. **Graph awareness** -- `links_weight` rewards connectivity. `requires`-link promotion in `selectCapabilityPackets` ensures structural dependencies are included. Graph expansion in `agent_context` adds 1-2 hop neighbors. 3. **Agent personality** -- The proclivity system (prefer/avoid/default_to/never_use) is unique. No other registry system personalizes discovery to the calling agent's preferences. 4. **Full diagnostic signals** -- Every discover result includes weights, components, contributions, and debug metadata. This is better observability than most production search systems. 5. **Graceful degradation** -- Embeddings optional. Reranking optional. Every enhanced signal degrades silently, never blocks discovery. 6. **Per-scope weight overrides** -- Org-level tuning without code changes. This is production-grade configuration. 7. **Reranking infrastructure** -- Cohere + Voyage dual-provider adapter with auto-fallback, already implemented and tested. Just needs activation. 8. **Content diversity** -- 21 entity types across 1400+ entities means discovery naturally returns diverse building block types (skills, KBs, prompts, styles, rubrics, bundles, eval_cases, etc.). --- ## 4. What Needs Improvement ### 4.1 Reranking Not Active (CRITICAL -- single biggest quality win) **Problem**: `discover-rerank.ts` is fully implemented with Cohere (rerank-v4.0-fast) and Voyage (rerank-2.5-lite) support. The discover handler calls it. But production shows zero `rerank:cohere` in match_reasons across all tested queries. 
**Root cause**: Almost certainly missing `cohere/api-key` in the integration secrets vault. The rerank code resolves secrets via `resolveIntegrationSecretValue({ orgId, secretKey: 'cohere/api-key' })`. If the secret isn't provisioned, the call throws, the catch block swallows it, and base ordering is used silently. **Impact**: Based on industry benchmarks, this is leaving 15-30% precision improvement on the table. For a registry with 1400+ entities, that's the difference between "pretty good" and "excellent" discovery. **Fix**: Provision the Cohere API key in the vault. Additionally, the rerank currently only sees the already-sliced page of results (e.g., top 20). It should see 25-50 candidates to be effective. ### 4.2 FTS Self-Normalization Inflates Weak Matches (HIGH) **Problem**: In `discover_v2`, the text score is normalized by dividing each entity's raw FTS rank by the MAX FTS rank across all candidates: ```sql ts_rank_cd(...) / GREATEST(MAX(ts_rank_cd(...)), 0.001) ``` When all matches are weak (e.g., "plan a trip to Bonito Brazil" weakly matching "hub-planning" on the word "plan"), the best weak match gets `text_score = 1.0`. Multiplied by `semantic_weight = 3.0`, that's a contribution of 3.0 -- enough to make a completely irrelevant result look highly relevant. **Evidence**: Live production testing shows "trip to Bonito Brazil" returning `hub:hub-planning` at score 3.71 with `confidence: "high"`. The system genuinely cannot tell this is off-topic. **Fix**: Add a post-retrieval match-quality assessment that examines raw (un-normalized) signal strengths. When no result has strong raw FTS or embedding similarity, downgrade confidence and optionally suppress results. ### 4.3 No Relevance Floor / Off-Topic Detection (HIGH) **Problem**: The system has no concept of "nothing matches." If any entity gets any positive score, it's returned with full confidence. There's no threshold below which the system says "I don't have anything relevant." 
**Fix**: Implement a match-quality gate with three levels: - `strong`: at least one result has high raw FTS or embedding score - `weak`: some results anchor, but none strongly - `none`: no results have meaningful anchoring to the query ### 4.4 Reranking Sees Too Few Candidates (MEDIUM) **Problem**: Even when reranking is active, `discover.ts` reranks only the final sliced results (e.g., top 20). Cross-encoder reranking is most effective when it sees 25-50 candidates -- it can rescue a genuinely relevant item that ranked 21st in base retrieval. **Fix**: Fetch a larger candidate pool (e.g., `min(limit * 5, 50)`), rerank that pool, then slice to the requested limit. ### 4.5 agent_context Bypasses Reranking (MEDIUM) **Problem**: `registry.agent_context` calls `discover_v2` and does graph expansion, but never reranks. The higher-value context packet path is actually less precise than the basic `discover` tool. **Fix**: Both paths should use the same retrieval pipeline with the same reranking and match-quality logic. ### 4.6 Embedding Model is Adequate but Not Frontier (LOW -- defer) **Problem**: text-embedding-3-small (nDCG@10 = 0.762) vs text-embedding-3-large (nDCG@10 = 0.811). That's a 6.4% improvement in retrieval quality. **Why defer**: Migrating to 3072 dims requires changing the SQL function signature, HNSW index, stored embeddings for all 1400+ entities, and the `DEFAULT_EMBEDDING_DIMS` constant. High surface area. Reranking activation will likely deliver a larger precision gain with less effort. Revisit after measuring the impact of reranking + match-quality gating. ### 4.7 `semantic_weight` Naming Confusion (LOW -- comment only) **Problem**: The `semantic_weight` column in `discovery_weights` controls FTS (full-text search), not semantic/embedding similarity. The embedding signal has its own `embedding_weight`. This is confusing for anyone reading the code. **Fix**: Add a column comment clarifying the legacy name. 
Don't rename the column (broad migration for zero functional gain).

---

## 5. Architecture for Retrieval Hardening

### New Module: `lib/discovery/retrieval.ts`

A shared retrieval orchestrator that both `discover.ts` and `agent-context.ts` call. This eliminates the current drift where one path has reranking and the other doesn't.

```
intent
  |
  v
embedIntentDetailed() --> vector | null + metadata
  |
  v
discover_v2() SQL --> candidate pool (limit * 5, capped at 50)
  |
  v
rerankDiscoverItems() --> reordered candidates (if provider available)
  |
  v
assessMatchQuality() --> strong | weak | none + confidence + reasons
  |
  v
slice to requested limit --> final ordered results + retrieval_meta
```

### New Module: `lib/discovery/match-quality.ts`

Post-retrieval quality assessment. A result is "anchored" if any of:

- Raw FTS score >= 0.03
- Embedding cosine similarity >= 0.20
- Exact lexical match on entity code
- Matched explicit tag/bundle filter

"Strongly anchored" if:

- Raw FTS >= 0.10
- Embedding similarity >= 0.35
- Multiple anchors on top result

Decision logic:

- `strong`: top result is strongly anchored --> confidence high
- `weak`: some results anchored, none strongly --> confidence medium, add warning note
- `none`: no anchored results --> confidence low, optionally suppress results

Rollout: start in `warn` mode (return results but flag low confidence), graduate to `suppress` mode (return empty with note) after benchmarking.

### SQL Migration: Expose Raw Signal Diagnostics

Add `text_score_raw` and `text_score_max` to `signals.debug` in `discover_v2`. This lets the match-quality module inspect un-normalized FTS strength without changing the ranking formula itself.

### Reranking Activation

1. Provision Cohere API key in integration secrets vault
2. Increase candidate pool: `min(max(limit * 5, 25), 50)` for discover, similar for agent_context
3. Rerank the full candidate pool, then slice to limit
4. Surface rerank warnings in `retrieval_meta` instead of swallowing them
5.
Adjust pagination: cursor advances past the consumed candidate pool, not just the visible page --- ## 6. Hub Discoverability ### Decision: Content quality first, not ranking math Do NOT raise `links_weight` globally. Do NOT add a `hub_boost` signal in phase 1. **Why**: Raising `links_weight` turns popularity into bias. Hubs already have many incoming links -- boosting `links_weight` would make them dominate even on specific queries where a precise entity should rank first. **Instead**: Fix hub entity metadata to include the key terms people search for. Examples: - `hub:hub-article` should have "article," "blog," "writing," "content creation," "blog post" in its summary and tags - `hub:hub-planning` should NOT be discoverable for "trip planning" -- its summary should clarify it's about content planning, editorial calendar planning, not personal travel - All hub summaries should include the domain terms they route to This is a content audit task, not a code change. Use existing `registry.update` to improve summaries, use_cases, and tags on hub entities. ### Future: Query-shape-aware hub bonus Only if metadata cleanup is insufficient, consider a query-shape-aware hub bonus that activates only for broad routing queries (detected by query classification), keyed off an explicit `role:hub` tag. Not part of phase 1. --- ## 7. Ensuring New Skills Get Added to Hubs ### Current gap When a new skill is created, the enrichment system (`lib/factory/enrichment.ts`) suggests tags and links based on content keyword matching. But it doesn't specifically suggest hub links. The auto-apply threshold for links is >= 0.8 confidence, which most hub link suggestions won't reach because they're inferred from domain overlap, not direct content similarity. ### Fix: Add hub-linking to enrichment In `generateEnrichmentSuggestions()`, after generating tag and link suggestions, add a hub-linking pass: 1. Load all hub entities (entities with `role:hub` tag or entity_type-specific hubs) 2. 
For each hub, check if the new entity's tags overlap with the hub's domain
3. If overlap >= 2 tags in the same namespace, suggest a `recommends` link from the hub to the new entity
4. Set confidence based on tag overlap depth

This should also be surfaced in the `registry.create` MCP response: "This entity may belong in hub:X based on its domain tags."

### Fix: Add hub-linking to `registry.create` response

After creating an entity, run a lightweight hub-matching check and include suggestions in the response:

```
suggested_hubs: [
  {
    hub_ref: "hub:hub-article",
    reason: "Entity has domain:writing and format:article tags",
    confidence: 0.7
  }
]
```

---

## 8. Execution Plan

### Phase 1: Activate Reranking + Fix Diagnostics (CRITICAL)

**Goal**: Get the single biggest quality win live.

1. Provision Cohere API key in vault for the system org
2. Create SQL migration 174: add `text_score_raw`, `text_score_max` to `signals.debug`
3. Add `embedIntentDetailed()` to `lib/discovery/embeddings.ts`
4. Create `lib/discovery/match-quality.ts` (start in `warn` mode)
5. Create `lib/discovery/retrieval.ts` (shared orchestrator)
6. Wire `discover.ts` to shared retrieval (increase candidate pool, surface rerank meta)
7. Wire `agent-context.ts` to shared retrieval (add reranking, weak-seed guard)
8. Update `lib/mcp/tool-definitions-shared.ts` with `retrieval_meta` in output schema
9. Add column comment clarifying `semantic_weight`

### Phase 2: Benchmark + Tune (validation)

**Goal**: Verify improvements before graduating to suppress mode.

1. Build 30-50 query benchmark set (on-domain, off-topic, broad, short lexical)
2. Run benchmark in `warn` mode
3. Verify: off-topic queries get low confidence, on-domain stays high
4. Verify: rerank meta shows provider use
5. Verify: no pagination duplicates
6. Verify: agent_context skips graph expansion on `none` match quality
7. Tune thresholds if needed
8.
Graduate to `suppress` mode for `match_quality.none` ### Phase 3: Hub Content Audit (parallel, no code changes) **Goal**: Fix discoverability through better metadata, not ranking math. 1. Audit all hub entities (identify via `role:hub` tag or naming convention) 2. Improve summaries to include domain terms people search for 3. Improve tags to cover the vocabulary of the entities they route to 4. Clarify hub descriptions to prevent false positives (e.g., "content planning" not "trip planning") 5. Re-embed updated entities ### Phase 4: Hub-Linking in Creation Flow (MEDIUM) **Goal**: New entities automatically get suggested hub connections. 1. Add hub-matching pass to `generateEnrichmentSuggestions()` 2. Add `suggested_hubs` to `registry.create` MCP response 3. Surface hub suggestions in the post-creation guidance UI ### Phase 5: Evaluate Embedding Upgrade (defer until after Phase 2 metrics) **Goal**: Determine if text-embedding-3-large is worth the migration cost. 1. After Phase 2 benchmarks, measure remaining recall gaps 2. If recall is still weak on conceptual queries, plan dual-write migration to 3072 dims 3. Migration path: dual-write by model, new HNSW index, gradual cutover --- ## 9. What NOT To Do 1. **Do NOT rebuild discover_v2 from scratch.** The composite scoring engine is sound. Fix the diagnostics and post-processing. 2. **Do NOT change ranking weights before measuring.** The current weights were tuned for the live registry. Change the gating, not the scoring, first. 3. **Do NOT add a global hub_boost signal.** Popularity bias is the wrong fix for metadata quality problems. 4. **Do NOT migrate to text-embedding-3-large before activating reranking.** Reranking is a bigger win with less migration cost. 5. **Do NOT rename `semantic_weight` column.** Column comment is sufficient. Migration risk exceeds benefit. 6. **Do NOT add hard SQL WHERE filters on raw FTS.** Use post-retrieval gating in application code, not SQL-level filtering. 
This preserves the ability to roll back without a DB migration.

---

## 10. Success Metrics

| Metric | Current | Target |
| --- | --- | --- |
| Off-topic query confidence | "high" always | "low" for clearly off-topic |
| Reranking active | No (0% of queries) | Yes (100% of queries where Cohere key is available) |
| Rerank candidate pool | ~20 (page size) | 25-50 |
| Discovery pipeline consistency | 2 divergent paths | 1 shared retrieval module |
| Match quality levels | None (everything is "high") | 3 levels: strong/weak/none |
| Hub discoverability for domain terms | Variable (depends on entity name matching) | Consistent (summaries include key domain terms) |
| New entity hub-linking | Manual only | Suggested automatically during creation |

---

## References

### Source Files Studied

- `lib/discovery/embeddings.ts` -- embedding generation and refresh
- `database/173-discover-v2-vote-signal.sql` -- canonical discover_v2 function (latest)
- `database/150-discover-v2-embedding-param.sql` -- embedding parameter addition
- `database/147-discovery-routing-helper-and-weights.sql` -- weight configuration
- `database/003-discovery-system.sql` -- intent_to_or_tsquery helper
- `database/001-schema-ddl.sql` -- discovery_weights table schema
- `database/002-seed-data.sql` -- initial weight seed values
- `lib/mcp/handlers/discovery-read/discover.ts` -- MCP discover handler
- `lib/api/discover-rerank.ts` -- Cohere/Voyage reranking adapter
- `lib/mcp/handlers/registry-read/agent-context.ts` -- agent context handler
- `lib/interop/agent-packets.ts` -- capability packet assembly
- `lib/factory/enrichment.ts` -- enrichment and link suggestion system
- `docs/planning/active/2026-03-18-mcp-polish-plan.md` -- MCP polish plan
- `docs/planning/active/oracle-and-agent-guidance-design.md` -- Oracle design

### Industry Research Sources

- [Optimizing RAG with Hybrid Search & 
Reranking](https://superlinked.com/vectorhub/articles/optimizing-rag-with-hybrid-search-reranking) -- VectorHub/Superlinked
- [Hybrid Search for RAG: BM25, SPLADE, and Vector Search Combined](https://blog.premai.io/hybrid-search-for-rag-bm25-splade-and-vector-search-combined/) -- PremAI
- [Hybrid RAG in the Real World](https://community.netapp.com/t5/Tech-ONTAP-Blogs/Hybrid-RAG-in-the-Real-World-Graphs-BM25-and-the-End-of-Black-Box-Retrieval/ba-p/464834) -- NetApp
- [Hybrid Search vs Reranker in RAG](https://docs.bswen.com/blog/2026-02-25-hybrid-search-vs-reranker/) -- BSWEN
- [Best Practices for using Rerank](https://docs.cohere.com/docs/reranking-best-practices) -- Cohere
- [Cohere Rerank 3.5 on Amazon Bedrock](https://aws.amazon.com/blogs/machine-learning/cohere-rerank-3-5-is-now-available-in-amazon-bedrock-through-rerank-api/) -- AWS
- [Hybrid Search in PostgreSQL: The Missing Manual](https://www.paradedb.com/blog/hybrid-search-in-postgresql-the-missing-manual) -- ParadeDB
- [pgvector Hybrid Search](https://www.instaclustr.com/education/vector-database/pgvector-hybrid-search-benefits-use-cases-and-quick-tutorial/) -- Instaclustr
- [GraphRAG Complete Guide 2026](https://calmops.com/ai/graphrag-complete-guide-2026/) -- Calmops
- [Graph Retrieval-Augmented Generation: A Survey](https://dl.acm.org/doi/10.1145/3777378) -- ACM
- [Comparing OpenAI text-embedding-3-small and text-embedding-3-large](https://www.arsturn.com/blog/comparing-openai-text-embedding-3-small-large) -- Arsturn
- [AI Agent Registry: A Complete Guide](https://www.truefoundry.com/blog/ai-agent-registry) -- TrueFoundry

---

## Source: docs/ecosystem-map/00-START-HERE.md

# 00-START-HERE

> Human and agent front door for the HLT ecosystem documentation set.

## What this doc set is for

This canon exists to give humans and agents one small set of high-trust system documents instead of a sprawl of medium-trust notes.
Use this set to answer: - what the ecosystem is - which repos and systems matter - how Framer, Next.js, sidecar, Cloudinary, and Katailyst fit together - what routes, contracts, and workflows are current - what is public reality versus internal inventory - what to inspect first before making changes ## Read order 1. `05-llms-ecosystem-master` or generated `llms.txt` for the cross-repo front door 2. `01-ecosystem-atlas-master` 3. `02-content-media-publishing-atlas` 4. `03-repo-runtime-ledger` 5. `04-integration-schema-reference` ## Current system truths - everything starts with **Katailyst** - **Axon** should be used after Katailyst for repo comprehension - **sidecar-system** is upstream workflow and publishing orchestration - **MasteryPublishing** is the canonical structured `/resources/**` lane - **Framer** remains a first-class shell, landing, and legacy-public surface - **Cloudflare proxy** stitches the public route experience - **Cloudinary** is the intended media system of record - **Multimedia Mastery** is the intended media generation lane - public verification and private inventory should be clearly separated ## Generated outputs This canon should generate and feed: - `llms.txt` - `llm.txt` - `llms-full.txt` - `/.well-known/llms.txt` - mirrored docs in repo-local `docs/ecosystem-map/` (canonical path; the legacy `docs/openclaw-system-maps/` name has been retired — any generator still writing there should be repointed here) ## Source docs pulled into this canon Key source materials currently informing this set include: - `PROXY_INTEGRATION_PLAN.md` - `CANONICAL_CONTENT_CONTRACT.md` - `CMS_WORKFLOW.md` - `CONTENT_ENGINE_REQUIREMENTS.md` - `LINEAR_ISSUES_READY_TO_CREATE.md` - `PROJECT_RECAP_AND_ELEVATION_PLAN.md` --- ## Source: docs/ecosystem-map/01-ecosystem-atlas-master.md # 01-ecosystem-atlas-master > The canonical high-level map of the HLT ecosystem. System boundaries, ownership, current doctrine, domains of truth, operational boards. 
If you are an agent and you are going to touch anything in this system, read this whole file before you write a line of code or draft a plan. --- ## 0. The two non-negotiable rules These two rules come first because every other rule in this file assumes they are already running. ### Rule 1 — Call the Katailyst MCP first, always, in all circumstances Before orientation, before planning, before any tool selection, before guessing at capabilities, before opening a file, before answering a "can we..." question — call the Katailyst MCP first. This rule is not a style preference. It is structural: - Katailyst is the capability canon for this ecosystem. It knows which tools exist, which integrations are registered, which prompts, schemas, resources, and toolsets are current, and which surfaces are meant to answer which kind of question. An agent that skips Katailyst and jumps into a repo is literally guessing about what the rest of the system can do. - Without this call, an agent ends up working on a problem without understanding what is already solved, duplicates capabilities that already exist, or picks the wrong surface for the job. That is the single most expensive failure mode in this system. - Skipping Katailyst also breaks traceability. Katailyst's registry and session surfaces are how we later audit what was attempted, what succeeded, and what drifted. Out-of-band work does not show up in those surfaces and cannot be reasoned about later. The standard Katailyst-first startup pattern — use these tools in this order, every time: 1. `registry_capabilities` — what does this ecosystem currently offer? 2. `registry_session` — where am I operating, what is this session's context? 3. `registry_agent_context` — what is the agent's role and scope right now? 4. `discover` / `get_entity` / `traverse` — find the regions, entities, tools, and knowledge blocks relevant to the current objective. 5. `tool_describe` — before executing any tool, describe it. 6. 
`tool_execute` — execute with the real objective, not a made-up one. If any of those calls surface a surface you did not know about, stop and reorient. That is the signal working as intended. **Authenticated surfaces.** Many surfaces in this ecosystem (Multimedia Mastery, Katailyst deep surfaces, corporate CMS, some Framer admin, some Vercel projects) sit behind auth. **Do not try to browse through the login wall with a scraper or a headless browser.** Use the Katailyst MCP — it already has service-level access wired through, and it is the correct channel. The same applies to any data pull inside these surfaces: prefer the MCP tool, not a direct HTTP scrape. ### Rule 2 — Use Axon second for repo and symbol comprehension After Katailyst orientation, use Axon. Axon is purpose-built for what comes next in any serious task: understanding many files, many symbols, and many call paths at the same time. You use Axon because: - It gives you symbol-level comprehension across large batches of the codebase — which files call what, which files matter, what the critical paths are. - It produces impact analysis before a refactor so you don't break three callers you didn't know existed. - It identifies the files and modules that actually matter for a given objective, instead of making you walk the tree by hand. - It answers "what in this repo is relevant to X" without forcing you to open a dozen files blindly. Short recipe for repo work: 1. `axon status` — make sure the repo is indexed for the branch you care about. 2. `axon analyze .` — if it isn't, index it. 3. `axon query` — broad discovery (what code is relevant to this objective?). 4. `axon context <symbol>` — focused 360 on a symbol you now know matters. 5. `axon impact <symbol>` — before any contract change or large refactor. Axon is a map and accelerant. It does not replace reading the actual files when the task is specific. Use it as a lens, not a substitute for judgment.
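The Katailyst-first startup order above is mechanical enough to lint. A minimal sketch of such a check, assuming a recorded trace of tool names — the helper functions here are hypothetical and not part of any repo; only the tool names are the real MCP tools listed above:

```typescript
// Hypothetical guard for the Katailyst-first startup pattern.
// The three orientation tools must open every session, in this order.
const STARTUP_ORDER = [
  "registry_capabilities",
  "registry_session",
  "registry_agent_context",
] as const;

// True if the trace begins with the required orientation sequence
// before any discovery or execution tool runs.
function followsKatailystFirst(trace: string[]): boolean {
  if (trace.length < STARTUP_ORDER.length) return false;
  return STARTUP_ORDER.every((tool, i) => trace[i] === tool);
}

// Lighter check for the describe-before-execute rule: at least one
// tool_describe call must appear before the first tool_execute.
function describedBeforeExecute(trace: string[]): boolean {
  const firstExec = trace.indexOf("tool_execute");
  if (firstExec === -1) return true; // nothing executed yet
  return trace.slice(0, firstExec).includes("tool_describe");
}
```

This is a sketch of the policy, not an enforcement layer; in practice the orchestrating agent applies the order by habit, and a harness could run checks like these over session logs.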
### What follows from these two rules - Do not assume the current repo is the whole system. - Do not confuse shell ownership, public-route ownership, canonical-content ownership, and media ownership — they are separate questions. - Verify live and runtime truth before making architectural claims. A URL that pointed at something six months ago may point at something else now. - Prefer a few official large docs (this atlas + the other 00–05 atlas files) over note sprawl. - Slow down to get it right. We are not in a rush to ship something half-understood. We study the top 1%, copy the right patterns, and keep the system clean as it grows. --- ## 1. Operating posture This is the posture every agent should hold while working inside this ecosystem. It is equal in weight to the two rules above. ### 1.1 Study the top 1%, always Every week, an external operator in the AI landscape is shipping something worth reverse-engineering — a prosumer funnel, a vertical agent, a generative-UI loop, an enterprise-seat motion, a media rail, a trust UX. Our job is to pattern-match what the best operators are doing and port the pattern in, not to cold-invent. See `docs/planning/active/scratch/ai-money-map/ai-money-map.html` for the current top-tier map and `docs/planning/active/scratch/ai-money-map/queue.md` for the rolling case-study queue. The morning-brief runbook (`docs/runbooks/morning-brief/morning-brief.md`) codifies this as a daily habit. ### 1.2 Research and plan deeply before writing code Deep reflective thinking, documentation search, and a written plan come before implementation. When a new surface, a new integration, or a new lane gets proposed, write the plan down in `docs/planning/active/` first. Use the `superpowers:writing-plans`, `engineering:system-design`, and `engineering:architecture` skills. That plan is the artifact that survives — not the passing chat. 
### 1.3 Consistent patterns, no mess When a pattern is established (e.g., the four-route surface contract: `POST /api/publish`, `POST /api/revalidate`, `GET /api/health`, `GET|POST /api/admin/settings`; the Katailyst manifest schema at `integrations/manifests/*.json`; the publish contract at `@hlt/publish-contract`), every new surface conforms to it. The surfaces are meant to snap together like lego. Drift between surfaces is a bug, not a feature. ### 1.4 Multimedia-heavy, because our audience responds to it We lean into video, audio, and rich media. Multimedia Mastery + Cloudinary is not an accessory — it is a first-class lane. Articles without strong hero art, short video explainers, diagrams, and voiced-over walk-throughs under-perform against our audience in nursing and healthcare. Every new lesson, article, pillar page, ad, or landing should plan its media surface alongside its text surface. ### 1.5 Customer discovery at scale, continuously We do not guess what the audience cares about. We find out. Use Firecrawl, Tavily, forum mining (Reddit r/nursing, r/StudentNurse, allnurses.com, student-doctor, Nurses subreddit mod posts, TikTok RN trends, YouTube NCLEX comment scraping), product telemetry, question-bank data, and the inbox signals from Marketo + AgentMail. The `research-loop` pattern is: weekly harvest → synthesis to `docs/reports/audience-signals/YYYY-WW.md` → feed into the topic/content backlog → measure resonance after publication → adjust. ### 1.6 Don't leave a mess Every new scratch file goes under `docs/planning/active/scratch//`. Every report goes under `docs/reports/{category}/`. Every runbook under `docs/runbooks/{domain}/`. The morning-brief runbook includes a pruning pass — files older than 14 days that aren't referenced anywhere get flagged for removal (with operator approval). The goal is a repo you can walk into in six months and still understand. --- ## 2. 
Domains of truth (who owns what) Every surface in this ecosystem has a narrow, explicit role. Mixing them up is the most common architecture mistake, so read this section carefully and come back to it often. ### 2.1 Katailyst — the centerpiece Katailyst is the centerpiece of this ecosystem. - **It is the canon** for capabilities, schemas, registry entities, tool refs, and integration contracts. - **It is the armory** for MCP tools — research (Firecrawl, Tavily), Cloudinary, AgentMail, publish.email, Multimedia Mastery tools, v0 scaffolding, GitHub MCP, many more. - **It is the orchestration center** that surfaces capability packets, runtime context, prompts, resources, and toolsets into every downstream agent and surface. - **It is the front door** every agent should knock on before doing anything else. Katailyst is _not_ the commander. It does not make strategy decisions. It holds the arsenal and exposes it. An agent brings the objective, Katailyst answers "here is what you can use, and here is how it fits." Repo: `Awhitter/katailyst`. Live: `https://www.katailyst.com`. MCP: `https://www.katailyst.com/mcp`. Docs: `/.well-known/llms.txt`, `/llms.txt`, `/llms-full.txt`, `/llm.txt`. Inspect first: `lib/mcp/handlers/registry-read/capabilities.ts`, `lib/mcp/tool-definitions-read.ts`, `lib/mcp/tool-definitions-execution.ts`, `lib/docs/llms-index.ts`. ### 2.2 The domain-of-truth assignments (memorize this list) These are one-line assignments. They are deliberately short so no one confuses them later. - **Multimedia Mastery is the place for multimedia.** All generation, editing, refinement, upload, and media-workflow tooling lives here. Media tool contracts are in `apps/studio/docs/api/MEDIA_TOOL_CONTRACT.md`. All downstream surfaces that need media should call this, not roll their own generation. Live: `https://multimediamastery.vercel.app`. Sits behind auth — route through the Katailyst MCP. 
- **Cloudinary is the system of record for media assets.** Multimedia Mastery generates and refines; Cloudinary stores, derives, watermarks, and delivers. `f_auto,q_auto,w_auto,dpr_auto` is the default delivery transform. Every asset ends up in Cloudinary. Every downstream surface fetches branded derivatives from Cloudinary, not from random CDNs. - **Evidence-Based Business is the place for data.** Measurement, analytics, feedback-loop support, experiment interpretation, KPI dashboards. When the question is "how well did X perform", the answer lives here. Live: `https://clean-ebb.vercel.app` (alt: `https://build-measure-learn.vercel.app`). - **Agent Canvas is the place for agents.** Agent coordination, plans, canvas surfaces, parent-child agent concepts. When the question is "how should multiple agents cooperate on X", the answer lives here. Live: `https://agent-coordination-canvas.replit.app/`. - **sidecar-system is a focused spin-off of Katailyst.** It is a workflow and destination-orchestration surface — article creation, destination publishing fan-out (content engine, Framer, ai4edu, social, email). It is _not_ a second canon. It inherits capability from Katailyst and uses the Katailyst MCP bridge. When in doubt whether something should live in sidecar or Katailyst: if it is a capability that many surfaces need, it belongs in Katailyst. If it is a workflow specific to a content lane, it belongs in sidecar. - **MasteryPublishing is the canonical structured `/resources/**` lane.** The Next.js content engine. Supabase-backed article/product/topic/author data. The publish endpoint (`POST /api/publish`), the revalidate endpoint (`POST /api/revalidate`), the admin endpoint (`/api/admin/settings`), the public routes under `/resources/\*`. This is where structured nursing resources live. - **Framer + HLTMastery shell owns public brand experience.** Homepage, nav, footer, landing pages, `/nursing/nclex-blog/*`, `/nursing/fnp/resources/*`. This is the shell. 
It is not the canonical content destination for structured resources — MasteryPublishing is, and Cloudflare proxy stitches `/nursing/resources/*` under `hltmastery.com`.
- **AI4Mastery (EduMastery) is the course surface.** Adjacent publishing + admin + course-player. It is being elevated to a first-class course surface per `docs/planning/active/ai4mastery-course-surface-system-design.md`. It holds course/module/lesson/assessment content and runs Mastra agents for drafting.
- **OpenClaw is the runtime for named resident agents.** Victoria (primary orchestrator and fleet commander), Julius (Justin's operator), Lila (marketing strategist). These are durable, named agents, not ephemeral spin-ups.
- **Corporate CMS (`cms.hltcorp.com`) is upstream corporate data.** Practice questions, explanations, teaching content, author content inventory. **Read-oriented** from the ecosystem's perspective; writes happen only when explicitly approved.

### 2.3 Why these assignments matter

If every surface knows its one job, we can keep them simple and let them snap together. If a surface starts expanding into a neighbor's territory, it becomes a competing canon, and now we have a sync problem. The cost of preventing a sync problem upfront is tiny. The cost of cleaning one up after it is real is enormous. Respect the assignments.

---

## 3. Current architecture pattern — three-system article flow

This is the actively-shipping content pipeline and the most important live flow in the system right now.

### 3.1 The flow

```
sidecar-system (upstream workflow + draft orchestration)
        ↓
MasteryPublishing (canonical Next.js content engine + Supabase)
        ↓ proxied by
Framer + HLTMastery shell (public brand experience + shell)
        ↓
hltmastery.com/nursing/resources/* (public URL family the user actually sees)
```

1. **sidecar-system** creates or refines structured article content (`ArticleV2` shape), runs editorial workflow, and fans it out.
2.
**MasteryPublishing** receives the canonical publish payload (`POST /api/publish`), writes into Supabase, renders via Next.js, exposes `/resources/*` routes.
3. **Framer + HLTMastery shell** owns `/`, nav, footer, landing pages, and legacy `/nursing/nclex-blog/*` / `/nursing/fnp/resources/*` lanes.
4. **Cloudflare proxy** rewrites `hltmastery.com/nursing/resources/*` to pull from MasteryPublishing while keeping the public URL prefix clean.

### 3.2 Media lane running in parallel

Media runs its own lane: Multimedia Mastery generates → Cloudinary stores → article / course / social surface requests a branded transform (`t_hlt_watermark`, `f_auto,q_auto,w_auto,dpr_auto`) → final URL lands on the page.

### 3.3 Why this exists as three systems, not one

Because public-route ownership, shell ownership, canonical-content ownership, and media ownership are four different questions. Collapsing them into one system means any one of the four changes forces all four to move together, which is how people end up with unshippable monoliths. Keeping them separate means each system can evolve on its own cadence.

---

## 4. Public route ownership map

| Public route family | Underlying owner | Meaning | Status |
| --- | --- | --- | --- |
| `/`, main nav, footer, shell | Framer | main public shell + brand experience | live |
| `/nursing/resources` | MasteryPublishing via proxy | structured resources hub | live |
| `/nursing/resources/[product]` | MasteryPublishing via proxy | structured product landing | live |
| `/nursing/resources/[product]/[slug]` | MasteryPublishing via proxy | structured article detail | live |
| `/nursing/resources/search` | MasteryPublishing via proxy | structured search lane | live |
| `/nursing/nclex-blog/*` | Framer | legacy or public blog lane | keep for now |
| `/nursing/fnp/resources/*` | Framer | Framer-managed resource lane | separate lane |

---

## 5.
Verified runtime status (refreshed every morning brief) The morning-brief runbook (`docs/runbooks/morning-brief/morning-brief.md`) probes every surface listed in `docs/planning/active/scratch/ecosystem-contract/llms-surfaces.json` and writes the result to `docs/reports/morning-brief/YYYY-MM-DD/probe.ndjson`. The current high-level picture: - `hltmastery.com/nursing/resources` — live as the structured public lane (verified). - `hltmastery.com/nursing/nclex-blog` — live as a separate public lane (verified). - `hltmastery.com/nursing/fnp/resources` — live, Framer-managed (verified). - Multimedia Mastery web UI redirects to login — expected. Use the Katailyst MCP for any programmatic access; do not scrape through the login wall. - Katailyst public root is live, but deeper surfaces (registry admin, some tool surfaces) sit behind auth. Same rule: use the MCP. - sidecar-system public live URL responds; deeper surfaces require auth. - MasteryPublishing root URL live; publish + admin require `x-api-key: KATAILYST_API_KEY`. - Agent Canvas live on Replit; orchestration routes require auth. - Clean-EBB (Evidence-Based Business) live. - Corporate CMS (`cms.hltcorp.com`) live; read-oriented access only. The key rule stays the same across all of these: **when a surface is behind auth, route through the Katailyst MCP.** Do not scrape the login wall, do not mount scrapers on top of protected surfaces, do not store credentials in prompts or in the repo. The MCP knows how to talk to these surfaces already. --- ## 6. Important boundary rules Do not confuse: - **Public route ownership** — which system answers when a user types the URL in a browser. - **Shell ownership** — who controls the nav, footer, fonts, logos, brand chrome. - **Canonical content ownership** — which system holds the source-of-truth content row. - **Media ownership** — where the image/video asset actually lives and how it's delivered. 
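The four ownership questions can be sketched as a single record per page. This is illustrative TypeScript only — the interface, field names, and helper are hypothetical, not a real module in any repo; the values mirror the ownership assignments stated in this atlas:

```typescript
// Hypothetical model separating the four ownership dimensions
// (plus draft origin) that this section says must never be conflated.
interface RouteOwnership {
  publicRoute: string;  // which system answers when the URL is typed
  shellOwner: string;   // nav, footer, fonts, logos, brand chrome
  contentOwner: string; // source-of-truth content row
  mediaOwner: string;   // where the asset lives and how it is delivered
  draftOrigin: string;  // where the content draft was created
}

// Example instance for a structured resources article.
const nclexArticle: RouteOwnership = {
  publicRoute: "hltmastery.com (Framer domain, stitched by Cloudflare proxy)",
  shellOwner: "Framer",
  contentOwner: "MasteryPublishing (Supabase)",
  mediaOwner: "Cloudinary (delivered via transform)",
  draftOrigin: "sidecar-system",
};

// Distinct owners across the five fields of this one URL.
const owners = new Set(Object.values(nclexArticle));
```

The design point the type makes explicit: a plan that forces all five fields to the same value is collapsing dimensions that this atlas keeps deliberately separate.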
Any of these four can be different for the same page: - `/nursing/resources/nclex/some-slug` → public route served under hltmastery.com (Framer domain), shell contributed by Framer, content row owned by MasteryPublishing (Supabase), media asset owned by Cloudinary (delivered via transform), content draft created in sidecar-system. Five owners for one URL. That is the correct picture. A plan that tries to collapse it is wrong. --- ## 7. Operational board schemas These are the operational boards we run against the ecosystem. Every board has lanes (columns) and cards (items). The first board — the HLTMastery Integration Queue — is the active operational lane right now. The others are scaffolded and ready to receive cards as their domains come online. ### 7.1 HLTMastery Integration Queue (first operational lane) **Purpose:** make the `hltmastery.com/nursing/resources/*` experience great — visible metadata, sane canonicalization, brand-consistent shell, reliable publishing, rich media, real analytics. This is the near-term revenue-shaping surface, so it gets the first board. **Owner:** Alec (decisions), with Victoria (OpenClaw) running the day-to-day operator moves. **Cadence:** triaged daily in the morning brief; weekly planning review. **Source of truth for cards:** `docs/planning/active/boards/hltmastery-integration-queue.md` (structured file below). **Lanes (left → right):** 1. **Discovery** — signals, audience requests, Firecrawl findings, forum mining hits, corporate-CMS topics that deserve to be upgraded. Items enter here with a one-line rationale. 2. **Researched** — the topic has been validated (search demand + audience language + competitor gap + product fit). Ready to be briefed. 3. **Briefed** — article brief exists (outline, target keyword, audience stage, difficulty level, intended media, estimated value). Awaiting draft. 4. **Drafted** — article draft exists in sidecar-system, conforming to `ArticleV2`. Awaiting media + review. 5. 
**Review** — operator review. Brand voice, clinical accuracy, citations, media quality, SEO metadata all checked.
6. **Publishing** — approved; hitting `POST /api/publish` on MasteryPublishing; revalidation pending.
7. **Live** — live under `/nursing/resources/*`, canonical URL confirmed, sitemap updated, Framer legacy alias (if any) handled.
8. **Learning** — tracked in Clean-EBB; KPIs at 7-day, 30-day marks captured. Outcomes recorded.
9. **Archived** — the learning is written up; the card is closed. Material may be re-used in next-cycle planning.

**Card shape:**

```yaml
id: hl-int-0001
title: 'Med-Surg New-Grad Survival Guide'
lane: Researched
entered_at: 2026-04-15
audience_stage: new_grad
difficulty_level: entry
product_slug: nclex
target_keyword: 'med surg new grad tips'
expected_media: [hero_image, 90s_explainer_video, 2_diagrams]
source_signals:
  - reddit_r_nursing: '30+ posts in last 60 days about survival tips'
  - qbank_gap: 'explanation content for first-shift scenarios is thin'
  - competitor_gap: 'incumbents cover theory, miss lived survival tactics'
estimated_value: medium
assigned_draft_owner: sidecar-system
assigned_review_owner: operator
published_url: null
learning_kpis:
  published_at: null
  7_day_pageviews: null
  30_day_pageviews: null
  7_day_avg_time_on_page: null
  7_day_scroll_depth_median: null
  email_capture_rate: null
```

**Entry rules:**

- Nothing enters any lane past _Discovery_ without a written source signal.
- Nothing enters _Drafted_ without an approved brief.
- Nothing enters _Publishing_ without a review sign-off row in `decisions_log`.
- Nothing enters _Live_ without canonical URL + sitemap confirmation.
- Nothing enters _Archived_ without a Learning card populated with 7-day and 30-day numbers.

### 7.2 Fleet Health Board (running in parallel)

**Purpose:** keep every surface in `llms-surfaces.json` green.

**Lanes:** Green · Yellow · Red · Known Outage · Drift Event · Resolved.

**Cards:** one per surface + one per active drift event.
Populated by the daily probe step in the morning-brief runbook. **Where:** `docs/planning/active/boards/fleet-health.md` (to be created as probe comes online). ### 7.3 Case-Study Queue Board (morning brief) **Purpose:** keep a 14-day pipeline of top-1% operators to study. **Lanes:** Queued · In Progress (today) · Done · Follow-up. **Source:** `docs/planning/active/scratch/ai-money-map/queue.md`. ### 7.4 Research Loop Board (audience discovery) **Purpose:** systematic customer discovery at scale. **Lanes:** Signal Source · Harvested · Synthesized · Topic Candidate · Moved to Integration Queue. **Where:** `docs/planning/active/boards/research-loop.md` (to be created — this is the funnel from forum mining and inbox signals into the HLTMastery Integration Queue). ### 7.5 Build-Measure-Learn Board (5 surfaces) **Purpose:** the unified approval queue across articles, social, ads, emails, upgrade screens. **Lanes:** Drafted · Approved · Live · Measured · Iterated. **Where:** `docs/planning/active/boards/build-measure-learn.md` (to be created). The Abridge pattern (human-correction edits are labelled examples) applies here — every operator edit during approval becomes a training signal. --- ## 8. What should be true at the end of Q2 2026 Writing this down so the atlas has a forward-looking anchor, not just a snapshot. - Every surface in `llms-surfaces.json` has a passing `/api/health` probe and a conforming manifest under `integrations/manifests/*.json`. - Every article on `/nursing/resources/*` has a canonical URL, a Cloudinary-backed hero, a confirmed OG image, and a 7-day + 30-day learning row in Clean-EBB. - AI4Mastery has shipped its first full course using the Mastra supervisor pattern and the publish contract, with operator-approved lessons published into the course-player. - The morning-brief runbook has been running for 60+ consecutive days, with 60+ case studies in `ai-money-map/` and 60+ skills added to `.claude/skills/`. 
- The research loop is generating ≥ 5 signal-backed topic candidates per week into the HLTMastery Integration Queue. - The approval queue is running across all five build-measure-learn surfaces and producing a labelled-examples export that we can use to tune drafting prompts. --- ## 9. Companion atlas docs - `00-START-HERE.md` — one-page orientation for a brand-new agent. - `02-content-media-publishing-atlas.md` — deep dive on the content + media pipeline. - `03-repo-runtime-ledger.md` — repo / deploy / runtime ownership ledger. - `04-integration-schema-reference.md` — integration contracts and schemas. - `05-llms-ecosystem-master.md` — master ecosystem document; generates repo-level `llms.txt` copies. - [**`../planning/active/2026-04-17-ecosystem-atlas-v2.md`**](../planning/active/2026-04-17-ecosystem-atlas-v2.md) — **canonical 4-peer + 3-service model** (Katailyst + Paperclip + Mastra + Agent Canvas; MasteryPublishing + Multimedia4Mastery + sidecar-system). Supersedes ad-hoc peer mentions scattered across the older atlas files. Start here when asked "where does X live?" --- ## 10. Final reminder Every agent working inside this ecosystem — Victoria, Julius, Lila, any ephemeral subagent, any Claude Code session, any Cowork session — starts with the same two moves: 1. Call the Katailyst MCP. 2. Use Axon for the codebase. If those two calls are skipped, the rest of the work is built on guesses. If they are done in order, everything else in this atlas snaps into place. Slow down. Get it right. Study the top 1%. Keep it clean. --- ## Source: docs/ecosystem-map/02-content-media-publishing-atlas.md # 02-content-media-publishing-atlas > Canonical map of article flow, publishing flow, media flow, and the Framer versus Next.js split. 
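The publish hand-off this atlas describes (upstream draft → `POST /api/publish` on MasteryPublishing) implies a validation step before any payload leaves sidecar-system. A minimal sketch, assuming only the required identity fields from the content contract summarized below — the interface name and the `missingFields` helper are hypothetical, not code from any repo:

```typescript
// Hypothetical shape for the canonical publish payload. The seven
// required fields mirror the content-contract summary in this doc;
// the optional groups (body, media, SEO, publishing) are elided.
interface PublishPayload {
  id: string;
  katailyst_id: string;
  slug: string;
  title: string;
  content_type: string;
  category: string;
  primary_product_id: string;
  [extra: string]: unknown; // body_html, hero_image_url, meta_title, ...
}

const REQUIRED: (keyof PublishPayload)[] = [
  "id", "katailyst_id", "slug", "title",
  "content_type", "category", "primary_product_id",
];

// Report which required identity fields are absent or empty,
// so a bad payload is rejected before it reaches POST /api/publish.
function missingFields(p: Partial<PublishPayload>): (keyof PublishPayload)[] {
  return REQUIRED.filter((k) => p[k] === undefined || p[k] === "");
}
```

A sketch under stated assumptions, not the real contract — the authoritative field list lives in `CANONICAL_CONTENT_CONTRACT.md` and `@hlt/publish-contract`.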
## Article lifecycle

```text
sidecar article creation
  -> contentEnginePublish or publishToDestinations
  -> POST /api/publish in MasteryPublishing
  -> Supabase article and taxonomy tables
  -> Next.js /resources routes
  -> Cloudflare proxy path rewrite
  -> hltmastery.com/nursing/resources/*
```

## Current three-system publishing model ### sidecar-system - upstream article creation and orchestration - destination routing - can publish to structured content lane and Framer when appropriate - should include Axon in its repo-entry guidance so publishing-flow changes can be mapped before edits ### MasteryPublishing - canonical structured content destination - renders `/resources`, `/resources/[product]`, `/resources/[product]/[slug]`, and `/resources/search` - receives publishes through `POST /api/publish` - should expose Axon as a standard repo-comprehension and impact-analysis layer in root docs ### Framer - shell, public experience, nav, footer, landing pages - legacy public lanes remain live - should not be blindly copied into the structured content engine ## Proxy integration truths Pulled from `PROXY_INTEGRATION_PLAN.md`: - Cloudflare Worker is live and routing `hltmastery.com/nursing/resources/*` to MasteryPublishing - links are rewritten - static assets pass through - public shell remains outside the structured content engine ## Immediate proxy and SEO issues - JSON-LD `@id` missing `/nursing` prefix - sitemap URLs missing `/nursing` prefix - robots sitemap link wrong - metadataBase wrong for proxied public route - placeholder image fallbacks causing visible 404s ## Recommended fix sequence 1. set `NEXT_PUBLIC_SITE_URL=https://hltmastery.com/nursing` 2. fix JSON-LD URL generation to use the env-based site URL 3. replace placeholder image behavior 4. coordinate with Jason on worker rewrite behavior and `?_rsc=` paths 5. 
verify client-side navigation, sitemap, robots, and OG output ## Canonical content lane versus shell rule - **MasteryPublishing** owns structured `/resources/**` content - **Framer** owns shell and legacy or adjacent public surfaces - **Cloudflare proxy** stitches the public experience - **sidecar-system** remains upstream and may publish to more than one destination ## Content contract summary Pulled from `CANONICAL_CONTENT_CONTRACT.md`: ### Required core identity fields - `id` - `katailyst_id` - `slug` - `title` - `content_type` - `category` - `primary_product_id` ### Major field groups - core content: `subtitle`, `excerpt`, `body_html`, `body_json` - media: `hero_image_url`, `hero_image_alt`, `hero_video_url`, `og_image_url` - relationships: product, author, topics - structured blocks: `faq_json`, `stats_json`, `steps_json`, `comparison_json`, `citations` - SEO: `meta_title`, `meta_description`, `canonical_url`, `noindex` - publishing: `status`, `published_at`, `featured`, `sort_order`, `word_count`, `reading_time_minutes` ## CMS workflow summary Pulled from `CMS_WORKFLOW.md`: - Supabase Studio is the manual CMS surface - products, topics, and authors are mostly reference data - articles are added and edited in the `articles` table - `article_topics` manages topic linkage - `/api/revalidate` can be used for immediate refresh behavior ## Content engine requirements summary Pulled from `CONTENT_ENGINE_REQUIREMENTS.md`: - `/resources` is the all-resources hub - `/resources/[product]` is the per-product hub - `/resources/[product]/[slug]` is the individual article route - `/resources/search` is the search page - the engine supports seven products and a broad content-type matrix - it is meant to feel visually aligned with product pages and HLT public experience ## Media lane doctrine - Multimedia Mastery should be the preferred media generation lane - Cloudinary should be the preferred storage and derivative system of record - direct generation flows that bypass 
Cloudinary are structurally weaker - hero image quality, persistence, tagging, and branding are real system requirements, not polish - Multimedia Mastery should also follow the same Axon-aware repo-entry standard so media pipeline work is easier to inspect and clean up safely ## Current media gaps - draft articles currently have missing hero images - placeholder fallbacks are not reliable enough - Multimedia Mastery integration into the article pipeline remains an important next step ## Cleanup and doc-polish standard for this lane - sidecar-system, MasteryPublishing, and Multimedia4Mastery should all show Axon explicitly in their repo-entry doctrine - cleanup work should use Axon for structure discovery, contract tracing, and impact review before larger changes - publishing and media docs should stay operator-grade and call out live routes, payload contracts, ownership, and source-of-truth boundaries directly ## Framer coexistence strategy - keep Framer for shell and legacy public lanes while the structured content lane proves itself - do not rush migration for elegance alone - compare speed, SEO, publishing velocity, and content quality before larger sunset decisions --- ## Source: docs/ecosystem-map/03-repo-runtime-ledger.md # 03-repo-runtime-ledger > Concrete ledger of major repos, live surfaces, agent/runtime inventory, and Axon rollout expectations. 
## Operating rule - start in Katailyst for ecosystem orientation and capability discovery - use Axon next for repo-level comprehension, impact analysis, and cleanup work - do not treat Axon output as a substitute for reading the actual files around a change - make Axon presence explicit in repo docs, MCP/config surfaces, and cleanup planning wherever it is useful ## Repo ledger ### Katailyst - repo: `Awhitter/katailyst` - live: `https://www.katailyst.com` - mcp: `https://www.katailyst.com/mcp` - axon: strongest current pilot, with established Axon-first repo-comprehension doctrine and local index already in use - role: capability canon, registry, orchestration layer - note: deeper surfaces are auth-protected in public verification ### sidecar-system - repo: `Awhitter/sidecar-system` - live: `https://sidecar-system.vercel.app` - alt live: `https://sidecar-system-work.vercel.app` - axon: should be standardized as a default repo-comprehension layer and reflected in root agent docs - role: upstream article and destination orchestration ### MasteryPublishing - repo: `Awhitter/MasteryPublishing` - public lane: `https://hltmastery.com/nursing/resources` - direct alias: `https://v0-next-js-content-engine.vercel.app` - axon: should be part of the repo-entry pattern so agents can map publishing contracts, route ownership, and refactor impact faster - role: canonical structured content engine ### Framer and HLTMastery shell - public site: `https://hltmastery.com` - key lane: `https://hltmastery.com/nursing/nclex-blog` - role: shell, nav, footer, landing pages, and legacy public lanes ### Multimedia Mastery - repo: `Awhitter/Multimedia4Mastery` - live: `https://multimediamastery.vercel.app` - axon: should be included in the repo-root guidance and MCP posture as the media lane is hardened - role: media generation and workflow lane - public verification note: currently redirects to login ### Content Creator Studio - repo: `Awhitter/content-creator-studio` - live: 
`https://content-creator-studio-lovat.vercel.app` - axon: should be present if this remains an active coding surface, especially where it overlaps with sidecar-system and publishing lanes - role: adjacent content workbench frontend ### Agent Canvas - repo: `Awhitter/Agent-Canvas-` - live: `https://agent-coordination-canvas.replit.app/` - axon: should be added to the same cross-repo repo-entry doctrine so coordination-plane code is easier to traverse and maintain - role: coordination and agent canvas surface ### Evidence-Based Business - repo: `Awhitter/Evidence-Based-Business` - live: `https://clean-ebb.vercel.app` - alt live: `https://build-measure-learn.vercel.app` - project id: `prj_HfvAywsc0pUBM3PSqs0SH3Fl9Eia` - axon: should be standardized here too so warehouse, analytics, and decision-support contracts can be inspected quickly before cleanup or schema changes - role: measurement, warehouse, and decision-support layer - intended purpose: centralize meaningful slices of app usage, financials, conversion, article and landing-page performance, and other business metrics in a way agents can query and use - strategic note: this should be treated as closer to core infrastructure than a sidecar because it is meant to feed decision-grade data into agents and downstream surfaces ## Active agent ledger ### Victoria - runtime: Render and OpenClaw - service: `openclaw` - role: primary orchestrator ### Julius - runtime: Render and OpenClaw - service: `openclaw-justin` - role: Justin-focused operator ### Lila - runtime: Render and OpenClaw - service: `openclaw-marketing` - role: marketing operator ### Secondary agent surfaces - Claude Code SDK agent - parent and sub-agent canvas patterns ## MasteryPublishing implementation references Pulled from `PROJECT_RECAP_AND_ELEVATION_PLAN.md` and related docs: - `app/resources/page.tsx` - `app/resources/[product]/page.tsx` - `app/resources/[product]/[slug]/page.tsx` - `app/resources/search/page.tsx` - `app/api/publish/route.ts` - 
`app/api/revalidate/route.ts` - `components/layout/navbar.tsx` - `components/layout/footer.tsx` - `lib/data/articles.ts` - `lib/data/settings.ts` ## Axon rollout and cleanup notes - Axon should appear as a normal repo-comprehension layer across the major active repos, not as a hidden specialist trick - each major repo should eventually expose the same compact repo-entry pattern: what the repo is, related repos and live surfaces, source-of-truth boundaries, Axon-first comprehension, available MCP surfaces, and where to route work if it belongs elsewhere - current highest-priority rollout set: Katailyst, Agent-Canvas-, sidecar-system, MasteryPublishing, Multimedia4Mastery, and Evidence-Based-Business - secondary set: content-creator-studio, katailyst-engage, Ecosystem-map, and katailyst-brand-design-lab - repo cleanup should use Axon for structure discovery, symbol context, impact checks, and dead-code review before larger edits or archive decisions - repo-local llms outputs should be generated, not hand-maintained - public verification and private repo introspection should be labeled distinctly --- ## Source: docs/ecosystem-map/04-integration-schema-reference.md # 04-integration-schema-reference > Practical contract reference for the current publishing, CMS, proxy, and issue-tracking surfaces. 
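As quick orientation for this contract reference, here is a hedged sketch of a publish call. The endpoint (`POST /api/publish`), the `x-api-key` header carrying `KATAILYST_API_KEY`, the slug-translation helpers, and the upsert-on-`katailyst_id` idempotency rule are documented in this file; the payload interface, the `publishHeaders` helper, and any fields beyond the documented identity and translation groups are illustrative assumptions, not the contract itself:

```typescript
// Hedged sketch of a publish payload. katailyst_id is the idempotency
// key: repeated publishes with the same value update instead of duplicating.
// The full required-field set lives in CANONICAL_CONTENT_CONTRACT.md.
interface PublishPayload {
  katailyst_id: string;
  slug: string;
  title: string;
  content_type: string;   // e.g. "how-to"; see the canonical content types list
  category: string;
  product_slug?: string;  // resolved server-side to primary_product_id
  author_slug?: string;   // resolved server-side to author_id
  topic_slugs?: string[]; // resolved server-side to article_topics junction rows
}

// Helper (hypothetical name) so the header shape is testable on its own.
function publishHeaders(apiKey: string): Record<string, string> {
  return {
    "content-type": "application/json",
    "x-api-key": apiKey, // KATAILYST_API_KEY
  };
}

async function publishArticle(
  baseUrl: string,
  apiKey: string,
  payload: PublishPayload,
): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/publish`, {
    method: "POST",
    headers: publishHeaders(apiKey),
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`publish failed: ${res.status}`);
  return res.json();
}
```

Because of the idempotency rule, a sidecar can safely retry this call after a timeout without creating duplicate articles.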
## Canonical publish contract From `CANONICAL_CONTENT_CONTRACT.md`: ### Endpoint - `POST /api/publish` - auth: `x-api-key` using `KATAILYST_API_KEY` ### Translation helpers - `product_slug` resolves to `primary_product_id` - `author_slug` resolves to `author_id` - `topic_slugs` resolves to `article_topics` junction rows ### Idempotency rule - upsert on `katailyst_id` - repeated publishes with the same `katailyst_id` update instead of duplicating ### Important field groups - identity and routing - core content - media - classification - relationships - structured data blocks - CTA - SEO - publishing state - timestamps ## Content types Canonical values include: - `deep-dive` - `how-to` - `faq` - `listicle` - `exam-overview` - `qbank-walkthrough` - `career-guide` - `study-guide` - `myth-buster` - `news-update` - `comparison` - `testimonial` - `resource-roundup` ## CMS reference From `CMS_WORKFLOW.md`: - manual content entry happens in Supabase Studio - `articles` is the main content table - `article_topics` manages topic links - products and authors are reference surfaces ## Proxy-aware SEO requirements From `PROXY_INTEGRATION_PLAN.md`: - `NEXT_PUBLIC_SITE_URL=https://hltmastery.com/nursing` - JSON-LD `@id` should use the proxied public route - sitemap and robots output should use the proxied public route - metadataBase and OG generation should use the public proxy-aware base ## Link and rewrite risk area From `PROXY_INTEGRATION_PLAN.md`: - root-relative `/resources/...` links may get double-prefixed by the worker if rewrite logic is naive - this needs verification and coordination with Jason - `?_rsc=` and client-side navigation behavior must be tested through the proxy ## Linear-ready issue themes From `LINEAR_ISSUES_READY_TO_CREATE.md`: ### P0 - fix SEO metadata URLs for proxy path - verify and fix double-prefix URL rewriting through Cloudflare proxy - fix placeholder image 404s ### P1 - upload real hero images for published articles - decide and implement nav bar 
strategy for proxied pages - replace text placeholder logo with real logo ### P2 - test client-side navigation and RSC fetches through proxy - create `.env.example` - plan Multimedia Mastery integration for hero images - coordinate cache strategy with Jason - align footer styling with Framer site - sync llms.txt ecosystem docs across repos ## Evidence-Based Business data direction This system should evolve into the warehouse and view layer for business data that agents can consume. ### Data families Alec wants agents to use - individual app usage data - conversion data - financial data - article and content performance data - Framer and other site performance data - landing-page and publishing performance data ### Desired operating pattern - store meaningful metric slices, not just raw dumps - preserve reusable saved metrics and reusable filtered views - make those slices easy for agents to query and use in decision-making - support downstream export of charts and metric visuals, potentially through Cloudinary - support using metric outputs directly in landing pages and reporting surfaces ### Observability architecture direction - treat Evidence-Based Business as the warehouse and semantic metric layer for business and content observability - keep raw ingested data separate from curated metric slices and saved business definitions - let agents query both current snapshots and durable saved metrics - design chart outputs so they can become reusable assets for reports, landing pages, and content surfaces - support using the same metric slices across dashboards, agent reasoning, and published visuals ### Recommended layers 1. ingestion layer - app analytics - financial exports - article and landing page performance - Framer and other site metrics - conversion and funnel data 2. modeled warehouse layer - normalized facts and dimensions for apps, content, pages, campaigns, and time periods 3. 
saved metrics layer - named metrics - formulas - dimensions - filters - owners - refresh rules 4. agent access layer - queryable metric slices - prewritten summaries - chart specifications - landing-page-ready metric blocks 5. asset generation layer - chart renders saved as reusable assets, potentially via Cloudinary ### Suggested first-class entities - app - metric_definition - metric_snapshot - metric_slice - chart_spec - chart_asset - content_asset_performance - page_performance - funnel_stage - financial_period ### Practical implication Evidence-Based Business should be treated as more core than sidecar. It is the place where business and content data should become agent-usable. ### Katailyst integration implication The warehouse and observability ideas should be added to the existing Katailyst and Supabase-backed system rather than launching a disconnected new system. The goal is to strengthen the armory, not fragment it. ## Immediate engineering notes - if touching route generation, inspect `app/layout.tsx`, `app/sitemap.ts`, `app/robots.ts`, and `app/resources/[product]/[slug]/page.tsx` - if touching publish flow, inspect `app/api/publish/route.ts`, `lib/data/articles.ts`, and sidecar publish tooling - if touching nav or branding, inspect `components/layout/navbar.tsx` and public Framer shell expectations --- ## Source: docs/ecosystem-map/05-llms-ecosystem-master.md # HLT Ecosystem Index for Agents > Official cross-repo front door for AI agents working anywhere in the HLT ecosystem. Read this file first. It explains what repos exist, what they are for, where they live, what URLs they own, what core shapes and contracts matter, which systems are canonical, and how to navigate the ecosystem without acting like the current repo is the whole world. This file is intended to be copied or generated into every major repo as `llms.txt`. ## Read this first - This ecosystem spans multiple repos. Do not assume the current repo contains the whole system. 
- Everything starts with **Katailyst**. Use Katailyst first for capability discovery, ecosystem orientation, and cross-system routing. - Use **Axon second** for repo comprehension, critical path analysis, impact analysis, and finding the real files and symbols that matter. - If your task touches HLTMastery public routes, Next.js publishing, Framer pages, media generation, article workflows, or cross-system data flow, assume cross-repo inspection is required. - Verify **live/runtime truth** before making architectural claims. - Prefer a few official giant docs over stale note sprawl. - Do not confuse shell ownership, public route ownership, content ownership, and media ownership. They are separate questions. ## Standard working method 1. Read this `llms.txt` 2. Read `AGENTS.md` 3. Read `cloud.md` if present 4. Use Katailyst MCP for orientation 5. Use Axon for repo comprehension 6. Check relevant sibling repos 7. Verify live URLs and runtime reality 8. Then edit, document, or operate ## Big-picture vision This ecosystem exists to give AI agents a large, high-quality arsenal of building blocks, tools, knowledge, schemas, and integration surfaces that can be used in the right circumstance across many years, many models, and many platforms. The goal is not one rigid workflow. The goal is a smart adaptive system. 
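The standard working method above is, in effect, an ordered checklist that an agent should not skip ahead on. A minimal sketch, with hypothetical helper names; the step labels paraphrase the eight steps listed above:

```typescript
// The eight working-method steps, in document order.
const WORKING_METHOD = [
  "read llms.txt",
  "read AGENTS.md",
  "read cloud.md if present",
  "Katailyst MCP orientation",
  "Axon repo comprehension",
  "check sibling repos",
  "verify live URLs and runtime reality",
  "edit, document, or operate",
] as const;

// Return the index of the first step not yet completed; missing entries
// in the completed array are treated as not done.
function firstUnmetStep(completed: boolean[]): number {
  for (let i = 0; i < WORKING_METHOD.length; i++) {
    if (!completed[i]) return i;
  }
  return WORKING_METHOD.length; // all eight steps done
}
```

The point of the ordering is the same as the prose rule: editing (step 8) before orientation and verification (steps 1-7) means working from guesses.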
### Core strategic intent - grow HLT over the next several months, especially in test prep and recruiting - build an AI operating system that can continuously absorb new tools and capabilities as they appear - make those capabilities usable across many platforms through Katailyst and related surfaces - let agents operate with judgment, decomposition, critical thinking, and context rather than forcing every request into one pre-scripted playbook - keep the system coherent, inspectable, repairable, and high quality even as it scales to thousands of entries and many people on the team ### What Katailyst should become for agents Katailyst should act like a capability armory and orchestration layer. Agents should bring their real objective and current situation, and the MCP/registry layer should help decompose that objective into smaller parts, search for the right capability regions, send sub-agents to inspect those regions, and bring back the best building blocks for composition. That means the system should support workflows like: 1. agent brings objective and context 2. system decomposes the objective into 2-10 subproblems depending on complexity 3. vector/discovery search finds relevant regions, entities, tools, and knowledge blocks 4. sub-agents inspect those regions and traverse locally 5. sub-agents return the best components or observations 6. the main agent composes the result 7. the MCP and registry layer tracks what happened so the burden is not entirely on the agent to self-track ### Quality posture - schemas should be strict and high quality - agents should be encouraged to think critically, not mechanically - agents may use 50%, 80%, or 100% of a capability if that is the right fit - problems, drift, stale knowledge, and broken surfaces should be flagged rather than silently worked around - hubs and front doors are useful, but the system should not rely only on shallow hub navigation. Deep search, region finding, and graph traversal matter. 
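The seven-step decompose-and-compose workflow above can be sketched as a thin orchestration loop. Everything here is illustrative: the `McpClient` interface, the splitting heuristic, and the scoring are hypothetical stand-ins for the real Katailyst MCP surfaces (`discover`, `get_entity`, `traverse`) and for a model-driven decomposition step:

```typescript
// Hypothetical minimal client shape; the real MCP surface is richer.
interface McpClient {
  discover(query: string): Promise<{ id: string; score: number }[]>;
}

// Step 2: decompose the objective into 2-10 subproblems by complexity.
// A placeholder for the model-driven decomposition described above.
function decompose(objective: string, complexity: number): string[] {
  const n = Math.min(10, Math.max(2, complexity));
  return Array.from({ length: n }, (_, i) => `${objective} :: part ${i + 1}`);
}

// Steps 3-6: fan out vector-first discovery per subproblem, let each
// "sub-agent" pick its best region, and return the set for composition.
async function orchestrate(client: McpClient, objective: string, complexity: number) {
  const parts = decompose(objective, complexity);
  const picks = await Promise.all(
    parts.map(async (part) => {
      const regions = await client.discover(part);
      return regions.sort((a, b) => b.score - a.score)[0] ?? null;
    }),
  );
  return picks.filter((p) => p !== null); // step 6 input for the main agent
}
```

Step 7 (registry-side tracking of what happened) is deliberately left out of the sketch; per the text, that burden belongs to the MCP and registry layer, not the agent.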
### Business focus right now - test prep growth - recruiting growth - marketing visibility and awareness - scaling high-value educational resources and articles - especially building broad topical coverage, including niche topics that are underserved but valuable ### Content ambition A major near-term goal is an NCLEX encyclopedia or NCLEX OPedia style system with hundreds of articles across high-value topics, including the niche topics competitors miss. This should be informed by: - search demand - audience language and forum discussions - product data - question-bank and explanation data where appropriate - what the best content operators in the world are doing right now ### Research posture Agents should use tools like Firecrawl, Tavily, and forum or customer research to understand what the audience actually cares about, how they speak, and what topics deserve coverage. This should not rely only on the user manually providing topics. ### Benchmarking posture Study strong operators and apply lessons to the system. Follow and analyze examples like: - Replit - Vercel - New York Times - other best-in-class resource and article creators ### System centerpieces - **Katailyst** is the centerpiece of skills, capabilities, schema, and orchestration - **Agent Canvas** is the centerpiece of agents, plans, and coordination surfaces - **sidecars** are use-case launch surfaces, not hard limitations; they should still be able to call broad capability surfaces - **Multimedia Mastery + Cloudinary** are central to the multimedia future - **Evidence-Based Business** should increasingly hold important data and measurement layers ## Katailyst-first rule Use Katailyst before guessing at capabilities or inventing workflows. ### Standard startup pattern 1. `registry_capabilities` 2. `registry_session` 3. `registry_agent_context` 4. `discover / get_entity / traverse` 5. `tool_describe` 6. `tool_execute` ### Why this matters Katailyst is the capability canon for this ecosystem. 
It exposes tools, prompts, entities, and integration surfaces that agents should reuse instead of re-inventing locally. ### Axon-grounded repo facts - `lib/mcp/handlers/registry-read/capabilities.ts` contains the registry capabilities handler - `lib/mcp/tool-definitions-read.ts` registers discovery and read tools like `discover` - `lib/mcp/tool-definitions-execution.ts` registers execution surfaces like `tool.search`, `tool.describe`, and `tool.execute` - `lib/docs/llms-index.ts` already contains llms rendering logic, including MCP surface rendering functions - `deploy/openclaw-katailyst-plugin/index.ts` includes plugin-side prompt nudges and registry defaults ### Confirmed capability lanes - research and web - Firecrawl search, scrape, map, crawl, extract, batch-scrape - Firecrawl browser and agent escalation - Cloudinary tools - AgentMail send and receive - publish.email - deploy and dev surfaces - analytics and integration surfaces ## Axon rule Use Axon for repo comprehension after Katailyst orientation. ### Axon is for - symbol-level understanding - impact analysis - critical pathways - identifying important files and modules - understanding what calls what ### Axon is not enough by itself for - source-of-truth ownership - strategic role decisions - public and live route ownership - runtime reality ## Official canon docs These are the intended top-level canonical docs in Obsidian: - `00-START-HERE` - `01-ecosystem-atlas-master` - `02-content-media-publishing-atlas` - `03-repo-runtime-ledger` - `04-integration-schema-reference` - `05-llms-ecosystem-master` Use these before digging through older overlapping notes. 
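The llms docs surfaces named above (`/.well-known/llms.txt`, `/llms.txt`, `/llms-full.txt`, the same three that `docsAndVaultSnippet` points clients at) imply a simple client-side discovery order. A minimal sketch, assuming only those paths; the helper names are hypothetical:

```typescript
// Ordered candidate URLs for the curated agent index, matching the
// "load /.well-known/llms.txt (or /llms.txt)" guidance in this corpus.
function llmsIndexCandidates(baseUrl: string): string[] {
  const base = baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  return [
    `${base}/.well-known/llms.txt`, // preferred well-known location
    `${base}/llms.txt`,             // root fallback
    `${base}/llms-full.txt`,        // full corpus when the compact index is not enough
  ];
}

// Fetch candidates in order; stop at the first successful response.
async function loadLlmsIndex(baseUrl: string): Promise<string | null> {
  for (const url of llmsIndexCandidates(baseUrl)) {
    try {
      const res = await fetch(url);
      if (res.ok) return await res.text();
    } catch {
      // network failure: fall through to the next candidate
    }
  }
  return null;
}
```

Used against `https://www.katailyst.com`, this walks the same surfaces `buildLlmsOutputTargets` generates, so an agent never needs to hand-maintain the list.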
## Current truths and decisions These are the current truths agents should operate from unless a newer verified source overrides them: - Katailyst is the centerpiece of skills, capabilities, schema, registry, and orchestration - Agent Canvas is the centerpiece of agent and plan coordination thinking - sidecar-system is an upstream workflow and publishing-orchestration surface, not just a local UI shell - MasteryPublishing is the canonical structured `/resources/**` content destination - Framer is a first-class shell, landing-page, navigation, branded public-experience, and legacy or public content surface - Cloudflare proxy stitches public HLTMastery routes across multiple underlying systems - Cloudinary is the intended system of record for media assets and derivatives - Multimedia Mastery is the intended multimedia generation and media-workflow lane - the Next.js publishing page and HLTMastery route alignment are among the most important near-term live surfaces - sidecar should remain able to publish to both Framer and the structured content lane where that is strategically right - agents should always use Katailyst first and Axon second when operating in this ecosystem - the public domain is not the same thing as the canonical system boundary - shell ownership, route ownership, content ownership, and media ownership are separate questions and should not be collapsed together casually ## Repo inventory For each major repo, this document should answer: - what the repo is - what it is for - how it fits the broader system - what surfaces other agents interact with - what URLs and routes matter - what shapes or contracts matter operationally - where to inspect first ### Katailyst - **Repo:** `Awhitter/katailyst` - **GitHub:** `https://github.com/Awhitter/katailyst` - **Live:** `https://www.katailyst.com` - **MCP:** `https://www.katailyst.com/mcp` - **Last verified:** 2026-04-15 - **Role:** capability canon, registry, orchestration layer, MCP surface - **Main 
purpose:** the control plane and armory repo for Catalyst and Katailyst, with Supabase-canonical atomic units, discovery APIs, CMS and operator surfaces, portability mirrors, and export layers for downstream runtimes - **Main surfaces other agents interact with:** `/mcp`, registry and discovery tools, prompts, resources, toolsets, llms docs surfaces (`/.well-known/llms.txt`, `/llms.txt`, `/llms-full.txt`, `/llm.txt`), docs like `docs/VISION.md`, `docs/RULES.md`, and `docs/QUICK_START_AGENTS.md` - **Key shapes and contracts:** registry entities, tool refs, integration contracts, capability packets, runtime context, prompts, resources, toolsets - **Inspect first:** `lib/docs/llms-index.ts`, `lib/mcp/handlers/registry-read/capabilities.ts`, `lib/mcp/tool-definitions-read.ts`, `lib/mcp/tool-definitions-execution.ts` - **Axon-grounded pathways:** - `lib/docs/llms-index.ts::renderMcpSurfaceSection` builds the MCP section used in Katailyst llms docs and is called by `renderLlmsTxt` - `lib/docs/llms-index.ts::renderLlmsTxt` is the main compact llms renderer and is called by `buildLlmsOutputTargets` - `lib/docs/llms-index.ts::buildLlmsOutputTargets` is the core output builder for `llms.txt`, `llms-full.txt`, and compatibility outputs - `scripts/ops/generate_llms_docs_index.ts::main` is the generation script entrypoint for Katailyst llms docs - `lib/mcp/session-summary.ts::buildMcpQuickstartPrompt` contributes the MCP quickstart summary pattern - `lib/mcp/playground-guides.ts::docsAndVaultSnippet` explicitly points clients at `/.well-known/llms.txt`, `/llms.txt`, and `/llms-full.txt` ### sidecar-system - **Repo:** `Awhitter/sidecar-system` - **GitHub:** `https://github.com/Awhitter/sidecar-system` - **Local:** `/Users/alecwhitters/Downloads/sidecar-system` - **Live:** `https://sidecar-system.vercel.app` - **Alt live:** `https://sidecar-system-work.vercel.app` - **Last verified:** 2026-04-15 - **Role:** upstream workflow and control plane for content and destination 
orchestration - **Main purpose:** domain-specific AI content interfaces powered by the Katailyst MCP registry, with specialized sidecars for articles, social, email, analytics, education, multimedia, and related workflows - **Main surfaces other agents interact with:** article sidecars, `domains//sidecar-config.ts`, MCP bridge, destination publishing tools, Framer integration routes, content-engine projection routes, chat and runtime workflows - **Key shapes and contracts:** `ArticleV2`, destination publish payloads, Framer projection shape, content engine publish shape - **Inspect first:** `app/(apps)/chat/tools/contentEnginePublish.ts`, `app/(apps)/chat/tools/publishToDestinations.ts`, `lib/publish/content-engine.ts`, `lib/framer/resources.ts`, `lib/content-engine/projection.ts`, `lib/framer/projections.ts` - **Axon-grounded pathways:** - Framer publish flow is centered on `app/api/framer/publish/route.ts` to `lib/framer/resources.ts::requestPublish` - `requestPublish` is called by both the explicit API route and `framerRequestPublishToolFactory`, and internally uses `withFramerClient`, `isPublishPermissionError`, and `lib/sidecar/events/phase-bus.ts::publish` - Framer resource upsert flow is centered on `app/api/framer/resources/route.ts` to `lib/framer/resources.ts::upsertDraftResource` - `upsertDraftResource` is called by both the explicit API route and `framerUpsertResourceToolFactory`, and internally depends on `projectArticleToFramerItem`, `findResourcesCollection`, `buildDeepLink`, `uploadExternalImages`, and `withFramerRetry` - `projectArticleToFramerItem` in `lib/framer/projections.ts` is the core ArticleV2 to Framer payload mapping layer and depends on block rendering, enum resolution, field setting, and schema cache reads - `ArticleV2` is defined in `lib/framer/types.ts` and imported across block rendering, client access, enums, image upload, projections, resources, schema cache, and vault integration ### MasteryPublishing - **Repo:** 
`Awhitter/MasteryPublishing`

- **GitHub:** `https://github.com/Awhitter/MasteryPublishing`
- **Local:** `/Users/alecwhitters/Downloads/MasteryPublishing`
- **Legacy live alias:** `https://v0-next-js-content-engine.vercel.app`
- **Public route family:** `https://hltmastery.com/nursing/resources`
- **Last verified:** 2026-04-15
- **Role:** canonical structured `/resources/**` content engine
- **Main purpose:** the HLT study-resources publishing app and content display layer, rendering the public `/resources/**` library, serving product-specific hubs and article pages, reading from Supabase, and accepting article publishes from the Katailyst pipeline
- **Main surfaces other agents interact with:** `/resources`, `/resources/[product]`, `/resources/[product]/[slug]`, `/resources/search`, `/admin`, `POST /api/publish`, `POST /api/revalidate`, `GET|POST /api/admin/settings`, Supabase-backed product, topic, author, article, and settings data layer
- **Key shapes and contracts:** article publish payload, product and article relations, topic and author relations, settings shapes, revalidation contract
- **Inspect first:** `app/api/publish/route.ts`, `app/resources/page.tsx`, `app/resources/[product]/page.tsx`, `app/resources/[product]/[slug]/page.tsx`, `lib/data/articles.ts`, `lib/data/settings.ts`
- **Axon-grounded pathways:**
  - publish entrypoint is centered on `app/api/publish/route.ts::POST`
  - resources landing flow is centered on `app/resources/page.tsx::ResourcesPage` to `getResourcesPageSettings` and `getProducts`
  - article detail flow is centered on `app/resources/[product]/[slug]/page.tsx::ArticlePage` to `getArticleBySlug`, `getArticlePageSettings`, and `getProductBySlug`
  - admin flow is centered on `app/admin/page.tsx::AdminPage` to `getAllSettings`, `getProducts`, and `getArticles`
  - product and article data access is concentrated in `lib/data/articles.ts`, especially `getArticles`, `getArticleBySlug`, `getProducts`, and `getProductBySlug`
  - settings access is concentrated in `lib/data/settings.ts`, especially `getResourcesPageSettings`, `getArticlePageSettings`, and `getAllSettings`

### Multimedia Mastery

- **Repo or product:** `Awhitter/Multimedia4Mastery` and the local multimedia-mastery-core naming family
- **Local:** `/Users/alecwhitters/Downloads/multimedia-mastery-core`
- **Live:** `https://multimediamastery.vercel.app`
- **Last verified:** 2026-04-15
- **Role:** media-native production lane
- **Main purpose:** a media hub and studio that exposes a canonical media tool surface (`/api/media/v1/*`) and a human editor UI (`/studio`, `/m/[moduleId]`) for image, audio, video, upload, and health workflows
- **Main surfaces other agents interact with:** `/api/media/v1/*`, `/studio`, `/m/[moduleId]`, media contracts in `docs/api/MEDIA_TOOL_CONTRACT.md`
- **Key shapes and contracts:** canonical media result shape including `mediaType`, `operation`, `provider`, `asset.url`, `asset.storageId`, dimensions, metadata, `editUrl`, and trace info
- **Inspect first:** `apps/studio/lib/media/cloudinary.ts`, `apps/studio/app/api/media/v1/assets/upload/route.ts`, `apps/studio/app/api/media/v1/image/edit/route.ts`, `apps/studio/app/api/media/v1/image/refine/route.ts`
- **Axon-grounded pathways:**
  - Cloudinary upload logic is anchored in `apps/studio/lib/media/cloudinary.ts::uploadToCloudinary`
  - `uploadToCloudinary` is called by asset upload, audio music generate, audio music status, audio synthesize, video animate, and image-provider helpers in `fal-image.ts` and `gemini-image.ts`
  - `uploadToCloudinary` depends on `requireCloudinaryEnv`, `generatePublicId`, `buildSignature`, `getRemoteSize`, `uploadChunked`, and `buildMediaTags`
  - asset upload API is centered on `apps/studio/app/api/media/v1/assets/upload/route.ts::POST`
  - image edit flow is centered on `apps/studio/app/api/media/v1/image/edit/route.ts::POST`
  - image refine flow is centered on `apps/studio/app/api/media/v1/image/refine/route.ts::POST`

### Content Creator Studio

- **Repo:** `Awhitter/content-creator-studio`
- **GitHub:** `https://github.com/Awhitter/content-creator-studio`
- **Live:** `https://content-creator-studio-lovat.vercel.app`
- **Last verified:** 2026-04-15
- **Role:** adjacent content workbench frontend
- **Main purpose:** a lightweight conversational UI for AI-powered content creation, built as a thin frontend over a backend intelligence layer
- **Main surfaces other agents interact with:** conversational wizard and chat UI, session persistence, registry browser, run history, asset editor, backend API bridge

### EduMastery

- **Repo:** `Awhitter/EduMastery`
- **GitHub:** `https://github.com/Awhitter/EduMastery`
- **Live:** `https://ai4mastery-next-6kpgw1zzw-alecs-projects-e88e78a8.vercel.app`
- **Role:** active-adjacent publishing and admin surface
- **Main purpose:** adjacent publishing and admin behavior and inventory continuity

### Agent Canvas

- **Repo:** `Awhitter/Agent-Canvas-`
- **GitHub:** `https://github.com/Awhitter/Agent-Canvas-`
- **Live:** `https://agent-coordination-canvas.replit.app/`
- **Role:** coordination shell, canvas, and parent-child agent concepts
- **Main purpose:** coordination patterns and agent orchestration concepts

### Evidence-Based Business

- **Repo:** `Awhitter/Evidence-Based-Business`
- **GitHub:** `https://github.com/Awhitter/Evidence-Based-Business`
- **Live:** `https://clean-ebb.vercel.app`
- **Alt live:** `https://build-measure-learn.vercel.app`
- **Role:** measurement and feedback layer
- **Main purpose:** analytics, measurement, feedback-loop support, and experiment interpretation

## Active agent and runtime inventory

Always list active resident agents concretely.
### Victoria

- **Type:** standalone operator agent
- **Runtime:** Render and OpenClaw
- **Service:** `openclaw`
- **Role:** primary orchestrator and fleet commander
- **Status:** active

### Julius

- **Type:** standalone operator agent
- **Runtime:** Render and OpenClaw
- **Service:** `openclaw-justin`
- **Role:** operator for Justin Leas
- **Status:** active

### Lila

- **Type:** standalone operator agent
- **Runtime:** Render and OpenClaw
- **Service:** `openclaw-marketing`
- **Role:** strategist and marketing operator
- **Status:** active

### Other important agent surfaces

- Claude Code SDK agent
- parent and sub-agent canvas model

## Current focus: Framer + Next.js + article sidecar flow

This is the current focus and should be easy for every agent to understand.

### The three-system flow

1. **sidecar-system** is the upstream article creation and publishing orchestration surface
2. **MasteryPublishing (Next.js)** is the canonical structured `/resources/**` destination
3. **Framer and HLTMastery shell** are the branded public shell, navigation layer, and legacy or public surface

### How they flow together

- sidecar creates or refines structured article content
- sidecar can publish to the structured content engine and to Framer when strategically appropriate
- MasteryPublishing renders the structured Next.js article lane
- Cloudflare proxy makes the Next.js lane appear inside the HLTMastery public route family
- Framer still owns the public shell, nav, footer, landing pages, and some legacy or public content families

### Short operating rule

If the task touches the current article pipeline, you almost always need to reason across these three together:

- sidecar-system
- MasteryPublishing
- Framer and HLTMastery public shell

## Cross-repo check matrix

| If the task touches... | Check these first |
| --- | --- |
| capability discovery, agent tooling, MCP, schemas, registry | Katailyst |
| article creation flow, publish orchestration, destination choice | sidecar-system + Katailyst |
| structured `/resources/**` pages, publish API, revalidation, article display | MasteryPublishing + sidecar-system |
| HLTMastery public shell, nav, footer, landing pages, legacy blog or resources | Framer and HLTMastery shell + MasteryPublishing |
| article images, media generation, branded assets, transformations | Multimedia Mastery + Cloudinary |
| recruiting and test-prep content strategy, topic discovery, performance learning | Katailyst + Evidence-Based Business + research lanes |
| agent coordination, plans, multi-agent work | Agent Canvas + Katailyst |
| corporate educational data, QBank explanation context, upstream content inventory | corporate CMS + relevant publishing and content repos |

## HLTMastery, Framer, and proxy boundary

### Verified public site

- `https://hltmastery.com`

### Verified route examples

- `https://hltmastery.com/nursing/resources`
- `https://hltmastery.com/nursing/nclex-blog`

### Architecture summary: two systems, one domain

The public domain is shared across multiple underlying systems.

- `hltmastery.com/nursing/resources/*` is the structured Next.js content lane served from MasteryPublishing through a reverse-proxy layer
- Framer still owns major shell and public-experience surfaces, including homepage, navigation, footer, and legacy or separate content surfaces
- Cloudflare proxying and path rewriting make these systems feel like one domain even when ownership is split underneath

### Boundary model

- **Katailyst** = canon and orchestration truth
- **sidecar-system** = workflow surface and destination chooser
- **MasteryPublishing** = canonical structured resource destination
- **Framer** = shell, landing pages, brand-facing page builder, navigation layer, and still-important public experience surface
- **Cloudflare proxy** = path stitcher, not canon
- **Multimedia Mastery + Cloudinary** = media lane feeding destinations

### Route mapping to keep in mind

| Public route family | Underlying owner | Meaning | Status |
| --- | --- | --- | --- |
| `/nursing/resources` | MasteryPublishing via proxy | structured resources hub | live |
| `/nursing/resources/[product]` | MasteryPublishing via proxy | structured product landing | live |
| `/nursing/resources/[product]/[slug]` | MasteryPublishing via proxy | structured article detail | live |
| `/nursing/resources/search` | MasteryPublishing via proxy | structured search lane | live |
| `/nursing/nclex-blog/*` | Framer | legacy or public blog lane | keep for now |
| `/nursing/fnp/resources/*` | Framer | Framer-managed resource lane | separate lane |
| `/`, main nav, footer, shell surfaces | Framer | public shell and brand experience | shell owner |

### Next.js and MasteryPublishing route anatomy

- `app/resources/page.tsx::ResourcesPage` drives the resources hub via `getResourcesPageSettings` and `getProducts`
- `app/resources/[product]/page.tsx::ProductPage` drives per-product landing routes
- `app/resources/[product]/[slug]/page.tsx::ArticlePage` drives article detail routes via `getArticleBySlug`, `getArticlePageSettings`, and `getProductBySlug`
- `app/resources/search/page.tsx` handles search
- `app/sitemap.ts::sitemap` handles sitemap generation using `NEXT_PUBLIC_SITE_URL` and article and product inventory

### Why the HLTMastery route matters so much

The Next.js publishing page and the `/nursing/resources/*` route family are among the most important near-term live surfaces. They need to stay synchronized with HLTMastery expectations around:

- menu and navigation clarity
- logos and branding
- shell integration expectations
- canonical URLs and metadata
- recruiting and future vertical extensibility
- rich HTML and multimedia support

### Current shell truth

Agents should assume the Content Engine does **not** currently own the whole shell. That means nav, footer, visual chrome, and overall branded public-experience alignment with Framer are real system concerns, not cosmetic afterthoughts.

### Framer rule

Do not blindly copy Framer pages.
Use this model:

- keep **Framer** as an important shell, landing page, navigation, and branded public-experience layer
- keep **MasteryPublishing** as the canonical structured `/resources/**` lane
- keep sidecar able to publish to **both** when that is strategically right
- use projection, synchronization, and coexistence deliberately rather than collapsing the systems together without boundaries

## Core content and publishing shapes

### Article lifecycle at a glance

```text
sidecar article creation
  -> contentEnginePublish or publishToDestinations
  -> POST /api/publish in MasteryPublishing
  -> Supabase article and taxonomy tables
  -> Next.js /resources routes
  -> Cloudflare proxy path rewrite
  -> hltmastery.com/nursing/resources/*
```

### sidecar upstream working shape: `ArticleV2`

Current surfaced fields include:

- `id`
- `headline`
- `slug`
- `product`
- `category`
- `content_type`
- `subheadline`
- `topics`
- `seo`
- `intro_html`
- `body_html`
- `body_blocks`
- `featured_image`
- `author`
- `word_count`
- `reading_time_minutes`
- `status`
- `content_blocks[]`

### sidecar to MasteryPublishing publish shape

Current surfaced fields include:

- `katailyst_id`
- `slug`
- `title`
- `subtitle`
- `body_html`
- `excerpt`
- `hero_image_url`
- `og_image_url`
- `content_type`
- `category`
- `product_slug`
- `author_slug`
- `faq_json`
- `meta_title`
- `meta_description`
- `word_count`
- `reading_time_minutes`
- `status`
- `featured`
- `topic_slugs`

### sidecar publish destinations

Known destination families include:

- `content_engine` to MasteryPublishing and the structured Next.js content lane
- Framer resource upsert to Framer CMS draft and live lane
- Framer publish request to explicit site publish and deploy lane
- `ai4edu` to adjacent publishing lane when enabled
- social, email, and other downstream destinations via orchestration surfaces

### Rendering path for the public site

```text
User requests hltmastery.com/nursing/resources/[product]/[slug]
  -> Cloudflare worker intercepts /nursing/resources/*
  -> proxies to MasteryPublishing /resources/[product]/[slug]
  -> Next.js fetches article and settings from Supabase
  -> Cloudflare rewrites internal links to keep the /nursing/resources prefix
  -> user sees structured content on hltmastery.com
```

### MasteryPublishing canonical publish contract

Required core fields include:

- `katailyst_id`
- `slug`
- `title`
- `product_slug`

Additional important fields include:

- `body_html`
- `body_json`
- `hero_image_url`
- `hero_image_alt`
- `hero_video_url`
- `og_image_url`
- `category`
- `audience_stage`
- `difficulty_level`
- `tags`
- `estimated_value`
- `faq_json`
- `stats_json`
- `steps_json`
- `comparison_json`
- `citations`
- `meta_title`
- `meta_description`
- `canonical_url`
- `noindex`
- `status`
- `published_at`
- `featured`
- `sort_order`
- `word_count`
- `reading_time_minutes`
- `author_slug`
- `topic_slugs`

### SEO and canonicalization rules

- canonical URLs for structured resources should use `https://hltmastery.com/nursing/resources/*`
- `og:url` should use the public HLTMastery domain, not the raw Vercel domain
- sitemap generation should reflect the intended public domain and route family
- article schema and metadata should point at the public canonical route
- coexistence with Framer legacy routes should be deliberate to avoid duplicate-content confusion

## Cloudinary system summary

### Account identity

- **Cloud name:** `HLT Media`
- **Product environment ID:** `c-1e2a3dbe7b0abcf38e49df4f50a4da`

### Useful console links

- Plans: `https://console.cloudinary.com/app/c-1e2a3dbe7b0abcf38e49df4f50a4da/settings/billing/plans`
- Media library: `https://console.cloudinary.com/app/c-1e2a3dbe7b0abcf38e49df4f50a4da/assets/media_library/folders/home?view_mode=list`
- Metadata fields: `https://console.cloudinary.com/app/c-1e2a3dbe7b0abcf38e49df4f50a4da/assets/media_library/metadata_fields`
- Security settings: `https://console.cloudinary.com/app/c-1e2a3dbe7b0abcf38e49df4f50a4da/settings/security`

### What Cloudinary is for

Cloudinary should be treated as the intended system of record for media assets and derivatives across the ecosystem.

### Top-level folder structure

- `articles/`
- `branding/`
- `in-app/`
- `inbox/`
- `multimedia/`
- `products/`
- `samples/`
- `shared/`
- `social/`

### Metadata fields configured

- `status`
- `source`
- `app_id`
- `vertical`
- `asset_type`

### Watermark and branding model

Cloudinary supports overlay-based watermarking and named transformations. The recommended HLT pattern:

- store logos in `branding/logos/`
- create named transformations such as `t_hlt_watermark`
- use platform-specific branded transformations for social and media derivatives

### Multimedia rule of thumb

- Multimedia Mastery should be the preferred media-generation and media-workflow lane
- Cloudinary should be the preferred storage, derivative, and branding system of record
- direct generation flows that bypass Cloudinary may be expedient, but they are structurally weaker and should not become the default long-term pattern
- article image quality, asset persistence, tagging, and branded delivery are part of the real system, not optional polish

## Corporate data and CMS reality

There is also a major corporate CMS and database outside the current Katailyst ecosystem that matters.
### Corporate CMS posture

- this system houses large amounts of critical company data, including practice-question and educational content inventory
- it is effectively a major upstream data system for corporate apps
- it should be treated cautiously
- the current safe posture is primarily read-oriented, with only carefully limited write behavior where explicitly approved and reliable

### Example surface

- `https://cms.hltcorp.com/items/49311/edit`

### QBank and educational explanation relevance

This corporate data includes structures like:

- question stem
- answer choices
- submission flow
- choice-specific rationale
- key takeaway
- longer explanation and teaching content

These explanation surfaces matter because they can inform:

- educational content quality
- article creation
- SEO topic generation
- teaching patterns
- future content adaptation and enrichment

## Near-term priorities that matter right now

### Highest-priority live surface

The Next.js publishing page and HLTMastery route alignment are among the most important immediate execution priorities.
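One recurring detail for this surface is canonical URL construction. The SEO and canonicalization rules above say structured resources must canonicalize to the public `/nursing/resources/*` family, never the raw Vercel domain. A minimal sketch, assuming hypothetical helper names (`toPublicPath`, `canonicalUrl` do not exist in the repo; they only illustrate the mapping):

```typescript
// Hypothetical helpers; names are illustrative, not from MasteryPublishing.
const PUBLIC_ORIGIN = "https://hltmastery.com";

// Map the internal Next.js path to the public proxied route family:
// /resources/nclex-rn/my-article -> /nursing/resources/nclex-rn/my-article
function toPublicPath(internalPath: string): string {
  return internalPath.replace(/^\/resources/, "/nursing/resources");
}

// Canonical URL for an article: public domain + /nursing/resources family,
// per the SEO and canonicalization rules (never the raw Vercel domain).
function canonicalUrl(productSlug: string, articleSlug: string): string {
  return `${PUBLIC_ORIGIN}/nursing/resources/${productSlug}/${articleSlug}`;
}

console.log(canonicalUrl("nclex-rn", "pharm-basics"));
// -> https://hltmastery.com/nursing/resources/nclex-rn/pharm-basics
```

The same mapping is what the Cloudflare link-rewriting step has to preserve, so keeping it in one place avoids drift between metadata and proxy behavior.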
### What that means concretely

- get the Next.js publishing page live and synced properly with HLTMastery and Framer
- clarify services around the Next.js publishing page, menu and navigation needs, logos, and branding
- keep sidecar able to publish to both Framer and the Next.js content lane
- improve HTML handling and richer multimedia capabilities for the new service layer
- support recruiting and future vertical pages, not just one narrow surface
- improve Slack access for resident agents
- use Victoria, Julius, and Lila as real durable agents, not just ephemeral spin-up workers
- strengthen Agent Canvas as the coordination and canvas layer

### Immediate stabilization checklist

- audit the current Supabase article inventory, including published, draft, and missing-image states
- fix visible image 404s and remove placeholder image dependencies
- add canonical tags for structured resources on the HLTMastery public domain
- update `og:url` and related metadata to use the public domain instead of raw Vercel URLs
- verify sitemap and robots behavior for `/nursing/resources/*`
- document the publish workflow clearly so agents know when to use which destination

### Media pipeline stabilization

- fix Multimedia Mastery schema or Oracle failures that block image generation
- connect sidecar article workflows to Multimedia Mastery where appropriate
- route article images through Cloudinary as the asset system of record
- use Cloudinary transformations for branding, watermarking, and distribution

### Content and growth priority

- grow test-prep and recruiting visibility through broad, high-quality resource and article coverage
- push toward an NCLEX-OPedia or encyclopedia-style content surface
- cover high-value niche topics at scale
- use forums, Firecrawl, Tavily, product data, and educational data to identify the best topics instead of relying only on manually proposed ideas

### Framer coexistence rule

- keep Framer for shell, public experience, and legacy lanes while the structured content lane proves itself
- do not rush migration just for elegance
- measure speed, SEO, workflow quality, and publishing velocity before large-scale sunset decisions

## Key rules that agents must follow

- Do not assume the current repo is the whole system.
- Use Katailyst first.
- Use Axon second.
- Check sibling repos when the task crosses boundaries.
- Prefer official canon docs over scattered notes.
- Do not confuse shell and public route ownership with canonical content ownership.
- Do not perform dangerous publish actions without explicit approval where required.
- Document concrete things, not vague summaries.

## Companion files

### `AGENTS.md`

Should say:

- read `llms.txt` first
- follow repo-local rules after ecosystem orientation

### `cloud.md`

Should say:

- read `llms.txt` first
- this repo participates in a larger ecosystem
- runtime and deployment notes are local overlays, not the whole system map

## Sync and update model

This file should be maintained once and propagated everywhere.

### Current canonical source

- `/Users/alecwhitters/Documents/Obsidian Vault/OpenClaw/System Maps/05-llms-ecosystem-master.md`

### Current sync script

- `/Users/alecwhitters/.openclaw/workspace/system/sync-llms-to-repos.sh`

### Current generated outputs

Each target repo receives:

- `llms.txt`
- `llm.txt`
- `.well-known/llms.txt`

### Recommended implementation pattern

1. keep the master ecosystem source in Obsidian
2. keep small metadata files per repo only if truly needed later
3. generate repo copies automatically rather than hand-editing them
4. have `AGENTS.md` and `cloud.md` point agents to `llms.txt` first
5. later, validate generation in CI so stale copies are caught

### Why generation matters

If this is hand-maintained separately in each repo, it will drift and stop being trusted. The whole point is to have one document to keep excellent and then spread reliably.
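The generation step of the sync model can be sketched as follows. This is not the real `sync-llms-to-repos.sh`; it is a minimal runnable illustration of "one master, three generated copies per repo," demoed against a temp directory so nothing real is touched:

```typescript
// Minimal sketch of the "generate repo copies" step. The real pipeline reads the
// Obsidian master and writes into each target repo; all names here are illustrative.
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Each target repo receives the same three generated copies.
const GENERATED = ["llms.txt", "llm.txt", ".well-known/llms.txt"];

function syncRepo(repoDir: string, masterBody: string): void {
  for (const rel of GENERATED) {
    const dest = path.join(repoDir, rel);
    fs.mkdirSync(path.dirname(dest), { recursive: true }); // creates .well-known/
    fs.writeFileSync(dest, masterBody); // generated copies, never hand-edited
  }
}

// Demo: one fake master body, one temp "repo".
const masterBody = "# ecosystem master\n";
const repo = fs.mkdtempSync(path.join(os.tmpdir(), "llms-sync-"));
syncRepo(repo, masterBody);
console.log(GENERATED.every((f) => fs.readFileSync(path.join(repo, f), "utf8") === masterBody));
// -> true
```

A CI check that re-runs this and diffs the output against what is committed would catch stale copies, which is the validation step the pattern above recommends.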
## Last verified

- master ecosystem llms draft last updated: 2026-04-15
- many repo and runtime details remain partially verified and should keep improving

## Bottom line

This ecosystem is multi-repo, multi-runtime, multi-surface, and agents must not behave as if each repo is an island. This file exists to stop that failure mode and provide one official cross-repo entrypoint that can be copied into every important repo.

---

## Source: docs/eval-pipeline-spec.md

# Pipeline Eval Spec v1 — Testing the Complete Agent Thinking Pipeline

**Purpose**: Evaluate agents on the FULL pipeline — not just "did discovery find the right entities" but "did the agent think, explore, compose, and produce quality output that achieves a real business outcome?"

**Philosophy**: The eval tests 5 phases of agent thinking. An agent that skips phases or rushes through them will score poorly even if it produces decent output. We're testing the PROCESS as much as the RESULT because the process is what makes results reliable and repeatable.

**Connection to existing infrastructure**: This spec is designed to integrate with the existing eval system at `dashboard-cms/evals/` (rubric judging, pairwise comparisons, signal propagation). Each phase produces gradable artifacts that feed the eval pipeline.

---

## The 5 Phases of Agent Thinking

### Phase 1: INTERPRET (The Entry Form)

Before any action, the agent MUST produce an interpretation artifact:

```
## Task Interpretation

**What was asked (surface):** [literal restatement]
**What is actually needed (intent):** [what the requester really wants]
**Who is this for (audience):** [end consumer of the output]
**What does success look like:** [concrete, measurable outcome]
**Complexity estimate:** [simple | medium | complex | multi-session]
**Stakes:** [low | medium | high | critical]
**What could go wrong if I rush:** [the trap to avoid]
```

**Why this matters**: Remember the KB cleanup disaster. An agent that interprets "write a blog post about nursing exam prep" as "produce 800 words about NCLEX" will score lower than one that interprets it as "create an engaging, SEO-optimized article that drives nursing students to HLT products, using current trends, real data, and multimedia where possible."

**Grading rubric (20% of total score):**

- 5: Deep interpretation — identifies unstated needs, audience, success criteria, and potential pitfalls
- 4: Good interpretation — gets intent right, identifies audience, reasonable success criteria
- 3: Surface interpretation — restates the task with minor elaboration
- 2: Minimal — just acknowledges the task
- 1: Skipped — jumped straight to execution

### Phase 2: EXPLORE (Graph Scouts + Research)

After interpretation, the agent sends scouts into the registry and external sources.

**Scout pattern (2-10 parallel sub-agents depending on complexity):**

```
Scout 1: Registry Discovery
  → discover(intent) → get top 10 entities
  → traverse(top_3, depth=2) → map the graph neighborhood
  → Report: "Here are the entities and connections relevant to this task"

Scout 2: Context Assembly
  → For each discovered entity, get_entity(ref) → read full content
  → Identify which are truly needed vs nice-to-have
  → Report: "Here's what I found in the content that's relevant"

Scout 3: External Research (if complexity >= medium)
  → Web search for current trends, competitive landscape, audience insights
  → Report: "Here's what's happening externally that should inform this task"

Scout 4+: Domain-Specific (if complexity >= complex)
  → Check specific channels (NCLEX-RN vs ASVAB vs TEAS)
  → Pull exam blueprints, compliance requirements
  → Check brand voice guidelines, style constraints
  → Report domain-specific findings
```

**Why this matters**: An agent that discovers `bundle:blog-writing-kit` and then immediately starts writing misses the graph neighborhood: the nursing content style guide, the exam blueprint, the SEO topic signals, the social content strategy that should inform distribution. Scouts pull all of this before the agent commits to a plan.

**The key insight from Alec**: "The sub-agents are not enough, you have to look yourself." The scouts bring back information, but the main agent MUST review and synthesize it — not just blindly consume scout reports.

**Grading rubric (25% of total score):**

- 5: Comprehensive exploration — discovers relevant entities, traverses the graph, reads content deeply, researches externally, synthesizes scout reports into a coherent picture
- 4: Good exploration — discovers and traverses, reads key entities, some external research
- 3: Basic exploration — runs discovery, reads a few entities, no traversal or external research
- 2: Minimal — discovers entities but doesn't read them or traverse
- 1: Skipped — used no registry context at all

### Phase 3: PLAN (Decompose into Lanes)

After exploration, the agent produces a concrete plan.

**Three-lane decomposition:**

```
## Execution Plan

### Lane 1: Core Content Production
- What to produce: [specific deliverables]
- Entities to use: [specific refs from scout reports]
- Quality constraints: [rubrics, style guides, compliance]
- Estimated effort: [tokens, steps]

### Lane 2: Research & Enrichment
- What to research: [trends, competitive, audience]
- Sources: [web, registry, specific KBs]
- How findings feed Lane 1: [specific integration points]

### Lane 3: Distribution & Packaging
- Where output goes: [channels, platforms, formats]
- Supporting assets needed: [social posts, email, landing page]
- Measurability: [what metrics to track]

### Dependencies & Sequencing
- Lane 2 must complete before Lane 1 finalizes (research informs content)
- Lane 3 can start in parallel after Lane 1 draft exists
- Total estimated complexity: [simple: 1 lane | medium: 2 lanes | complex: all 3]
```

**Grading rubric (15% of total score):**

- 5: Clear 3-lane plan with specific entity refs, dependencies, and quality constraints
- 4: Good plan with 2+ lanes and specific deliverables
- 3: Basic plan — lists what to do but no lanes, dependencies, or entity refs
- 2: Minimal — acknowledges steps but no structure
- 1: Skipped — jumped from exploration to execution

### Phase 4: EXECUTE (Produce the Output)

The agent produces the actual deliverable(s), using the context assembled in Phase 2 and following the plan from Phase 3.

**What we grade here:**

- Did the output use the entities and context from Phase 2?
- Did it follow the plan from Phase 3?
- Is the output high quality by domain standards?
- Does it match the brand voice?
- Is it factually accurate?
- Does it achieve the success criteria from Phase 1?

**Grading rubric (30% of total score):**

- 5: Output is excellent — uses registry context, follows the plan, achieves the success criteria, would impress the requester
- 4: Output is good — uses some context, mostly follows the plan, achieves most success criteria
- 3: Output is adequate — generic quality, limited registry context, partially meets success criteria
- 2: Output is poor — doesn't use context, doesn't follow the plan, misses success criteria
- 1: Output is missing or fundamentally wrong

### Phase 5: REFLECT (Self-Assessment + Distribution)

After execution, the agent reflects on what it produced:

```
## Output Assessment

**Did I achieve the success criteria from Phase 1?** [yes/partially/no + why]
**What registry entities were most valuable?** [list with brief why]
**What was missing from the registry?** [gaps that should be filled]
**What would I do differently next time?** [lessons for the agent system]
**Suggested next steps:** [distribution, follow-up, iteration]
**Confidence level:** [low/medium/high + reasoning]
```

**Grading rubric (10% of total score):**

- 5: Honest, specific self-assessment with actionable suggestions and registry gap identification
- 4: Good reflection — acknowledges what worked and what didn't
- 3: Surface reflection — "I think this turned out well"
- 2: Minimal — one-line assessment
- 1: Skipped
---

## Score Composition

| Phase | Weight | What It Tests |
| --- | --- | --- |
| Interpret | 20% | Can the agent understand what's really being asked? |
| Explore | 25% | Can the agent use the graph and research intelligently? |
| Plan | 15% | Can the agent decompose work into structured lanes? |
| Execute | 30% | Is the output actually good? |
| Reflect | 10% | Can the agent assess its own work honestly? |

**Grade scale:** A+ (90%+), A (80%+), B+ (70%+), B (60%+), C+ (50%+), C (40%+), D (30%+), F (<30%)

**Meta-grade: Process Integrity (bonus/penalty)**

- Bonus: +5% if the agent explicitly cited registry entities in the output (shows the graph was used, not just discovered)
- Bonus: +5% if the agent identified a registry gap worth filling
- Penalty: -10% if the agent skipped Interpret and went straight to Execute
- Penalty: -5% if the agent discovered entities but didn't actually use them in the output

---

## The 30 Use Cases, Mapped to Pipeline Phases

Each use case has a natural phase emphasis. Some are interpretation-heavy (strategic thinking), some are exploration-heavy (research), some are execution-heavy (content production).
### Tier 1: Identity & Context (Phase 1 dominant) | # | Case | Primary Phase | Key Registry Entities | Expected Difficulty | | --- | ------------------- | ------------------- | -------------------------------------------------------------- | ------------------- | | 1.1 | Self-awareness | Interpret + Explore | agent-foundation-spec, global-catalyst-guide, hub-registry | Simple | | 1.2 | User understanding | Interpret + Explore | agent-sop-victoria, global-team-context, hlt-brand-foundation | Simple | | 1.3 | Team voice modeling | Interpret + Execute | brand-voice-master, katailyst-brand-voice, global-team-context | Medium | ### Tier 2: Core Content Production (Phase 4 dominant) | # | Case | Primary Phase | Key Registry Entities | Expected Difficulty | | --- | --------------- | ----------------- | --------------------------------------------------------------------------------------- | ------------------- | | 2.1 | QBank item | Explore + Execute | qbank-kit (bundle), qbank-mcq-single, exam-blueprint-\*, channel:hlt-asvab | Medium | | 2.2 | Article writing | All 5 phases | article-standard, blog-writing-kit, topic-taxonomy-nursing, social-content-winners-2026 | Complex | | 2.3 | Social posts | Explore + Execute | hub-social, social-post-v1, social-linkedin-post, social-ig-carousel | Medium | | 2.4 | Email marketing | Plan + Execute | hub-email, message-email-newsletter, marketo-api, email channel | Medium | ### Tier 3: Research & Analysis (Phase 2 dominant) | # | Case | Primary Phase | Key Registry Entities | Expected Difficulty | | --- | ------------------- | ------------------------------- | ----------------------------------------------------------------------- | ------------------- | | 3.1 | Trending topics | Explore + Execute | hub-research, social-content-winners-2026, topic-taxonomy-nursing | Complex | | 3.2 | Competitive tactics | Explore + Plan | hub-analysis, cross-industry-top-1-percent, content-creation-philosophy | Complex | | 3.3 | Financial/metrics | 
Explore only (data access test) | (needs analytics MCP) | Complex |

### Tier 4: Strategic & Business Ops (Phase 1+3 dominant)

| # | Case | Primary Phase | Key Registry Entities | Expected Difficulty |
| --- | --- | --- | --- | --- |
| 4.1 | PR strategy | Interpret + Plan | hlt-brand-foundation, hub-analysis, global-team-context | Complex |
| 4.2 | Investor update | Plan + Execute | hlt-brand-foundation, content-performance-playbook | Complex |
| 4.3 | B2B sales thinking | Interpret + Plan | hlt-brand-foundation, audience-nursing-students, hub-growth | Complex |
| 4.4 | Conversion analysis | Explore + Execute | (needs analytics MCP) | Complex |

### Tier 5: Multi-modal & Creative (Phase 4 dominant)

| # | Case | Primary Phase | Key Registry Entities | Expected Difficulty |
| --- | --- | --- | --- | --- |
| 5.1 | Infographic | Plan + Execute | visual-infographic (content_type), design-brand-guide-pattern | Medium |
| 5.2 | Image progression | Execute (iterative) | (needs image gen tools) | Complex |
| 5.3 | Video transformation | Explore + Execute | (needs YouTube MCP) | Complex |
| 5.4 | Data visualization | Execute | (needs data access) | Medium |

### Tier 6: Multi-channel & Distribution (Phase 3 dominant)

| # | Case | Primary Phase | Key Registry Entities | Expected Difficulty |
| --- | --- | --- | --- | --- |
| 6.1 | Lead capture cheat sheet | All 5 phases | web-landing-page, message-email-course, channel:hlt-asvab | Complex |
| 6.2 | App copywriting | Explore + Execute | web-upgrade-screen, hlt-product-overview-nclex-rn, hub-conversion | Medium |
| 6.3 | Sign-up flow | Execute (tool use) | (needs browser tools) | Medium |

### Tier 7: Communication & Coordination (Phase 1+5 dominant)

| # | Case | Primary Phase | Key Registry Entities | Expected Difficulty |
| --- | --- | --- | --- | --- |
| 7.1 | Financial update | Interpret + Execute | global-team-context, hlt-brand-foundation | Medium |
| 7.2 | Phone call | Execute (stretch) | (needs voice tools) | Complex |
| 7.3 | Conference pitch | All 5 phases | hlt-brand-foundation, hlt-product-overview-\*, audience-nursing-students | Complex |
| 7.4 | Agent coordination | Plan + Execute | agent-foundation-spec, agent-handoff types | Complex |
| 7.5 | Team engagement | Interpret + Plan | global-team-context, agent-standing-instructions | Medium |

### Tier 8: Platform Building (All phases)

| # | Case | Primary Phase | Key Registry Entities | Expected Difficulty |
| --- | --- | --- | --- | --- |
| 8.1 | QBank transformation | Explore + Execute | qbank-kit, web-interactive-module | Complex |
| 8.2 | Community website | All 5 phases | web-landing-page, audience-nursing-students, hub-growth | Complex |
| 8.3 | Recruiting business plan | Interpret + Plan | hub-growth, hlt-brand-foundation, audience-nursing-students | Complex |
| 8.4 | Interactive edu content | Plan + Execute | web-interactive-module, study-guide, hub-education | Complex |

---

## How This Connects to Existing Infrastructure

### Integration with `dashboard-cms/evals/`

Each pipeline run produces 5 phase artifacts. These become **eval items** in the existing system:

1. **Rubric judging** (`actions/judging.ts`) — AI judge scores each phase against the rubric above. Human can override. Scores feed `eval_signals` table.
2. **Pairwise comparison** (`pairwise/`) — Run the same use case with different agents (Victoria vs Lila) or different configurations (with/without sub-agents). Elo rating shows which pipeline config produces better results.
3. **Signal propagation** (`actions/signals.ts`) — When a registry entity consistently helps agents score higher, its eval signal improves, which improves its discovery ranking. Virtuous cycle.
4. **Regression runner** — After registry changes (entity enrichment, weight rebalancing), re-run eval suite to measure impact. Detect regressions.

### Integration with Factory (`dashboard-cms/factory/`)

The pipeline eval can be a **factory template**:

```
Factory Template: "Pipeline Eval Run"
Questionnaire:
- Which use case? (dropdown of 30)
- Which agent? (Victoria / Lila / Julius / Custom)
- Sub-agents enabled? (yes/no)
- Research tools enabled? (yes/no)
Generator:
- Creates a `run` record in the DB
- Dispatches the use case prompt to the selected agent
- Captures the 5-phase output
- Routes to eval judging queue
Post-commit:
- Score each phase against rubric
- Update eval_signals for referenced entities
- Flag any registry gaps identified in Phase 5
```

### Integration with Observe (`dashboard-cms/observe/`)

Every pipeline run becomes a **run trace**:

- `run_events` capture each phase transition (interpret_start → interpret_done → explore_start → ...)
- `run_outputs` capture the phase artifacts (interpretation doc, scout reports, plan, output, reflection)
- Decision breakdown shows which entities were discovered, traversed, selected, and actually used
- `recommendation_receipt` with `graph_promotions` shows exactly why each entity was chosen

### Standalone (the HTML eval page)

For quick testing without the full app, extend the existing `discovery-eval-v1.html` to:

1. Accept a use case prompt
2. Call the MCP `discover` + `traverse` endpoints (as it already does)
3. Show what the agent WOULD have available (the graph neighborhood)
4. Let a human grade each phase manually using the rubric
5. Log results to a local JSON file

This is the "skeleton" — it doesn't run agents, but it shows everything an agent would see and lets a human grade what a hypothetical agent run would look like.

---

## The Agent Thinking Loop (Canonical — Lives in AGENTS.md)

The canonical protocol is in `AGENTS.md` under "Agent Thinking Loop." It defines the loop structure (Interpret → Explore → Plan → Execute → Evaluate → loop back), the sub-agent scout swarm pattern, the mandatory external research requirement, and the formal self-scoring in Evaluate.

Key design principles of the loop (for eval purposes):

- **It is a LOOP, not a chain.** Agents loop back from Evaluate if confidence is low. Within Explore, scouts loop (scout → synthesize → scout deeper).
- **Research is mandatory for content creation.** Before creating anything, agents MUST find top performers in the space and adjacent spaces, find current trends, and synthesize patterns. Using Firecrawl, web search, Tavily — whatever is available. Output from training data alone = generic = low score.
- **Sub-agent swarm in Explore.** 2-10 scouts dispatched in parallel across different graph branches and external sources. The main agent synthesizes, does not just consume reports blindly.
- **Evaluate includes formal self-scoring 1-5 per phase, confidence rating, and loop-back decision.** Low confidence = loop back, not submit.
- **User check-ins.** Share interpretation early. Check in mid-execution on complex tasks. Present output with self-assessment at end.
- **Duration expectations.** Simple ~5 min. Medium ~15-20 min. Complex ~20-30+ min with possible multiple loops.

---

## Running the Eval Today

### What we can test RIGHT NOW (Tier 1-2, ~10 cases):

These use cases only need the MCP registry (working), identity docs (mostly working), and content production capability (agents can write). No external tool integrations needed.
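The Evaluate-phase contract (self-score 1-5 per phase, overall confidence rating, loop-back decision) can be sketched in code. This is an illustrative sketch only, not code from the repo: the 0.7 confidence threshold and the per-phase minimum score of 3 are assumed values, and the type and function names are hypothetical.

```typescript
// Illustrative sketch of the Evaluate-phase loop-back decision.
// Not repo code: the 0.7 confidence threshold and the per-phase
// minimum score of 3 are assumptions chosen for the example.
interface PhaseScore {
  phase: string; // "interpret" | "explore" | "plan" | "execute" | "evaluate"
  score: number; // formal self-score, 1-5
}

interface EvaluateResult {
  scores: PhaseScore[];
  confidence: number; // overall confidence rating, 0-1
}

// Returns the phase to loop back to, or null when the agent may submit.
function loopBackTarget(result: EvaluateResult, threshold = 0.7): string | null {
  const confident = result.confidence >= threshold;
  const allAcceptable = result.scores.every((s) => s.score >= 3);
  if (confident && allAcceptable) return null; // submit with self-assessment
  // Low confidence = loop back, not submit: restart at the weakest phase.
  const weakest = [...result.scores].sort((a, b) => a.score - b.score)[0];
  return weakest ? weakest.phase : "explore";
}
```

Under this sketch, a run that self-scores Plan at 2 loops back to Plan rather than submitting, matching the "loop back, not submit" rule above.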
**Runnable today:**

- 1.1 Self-awareness
- 1.2 User understanding
- 1.3 Team voice modeling
- 2.1 QBank item
- 2.2 Article writing (text only, no multimedia)
- 2.3 Social posts (text only, no visuals)
- 2.4 Email marketing
- 4.1 PR strategy (planning only)
- 7.5 Team engagement
- 8.4 Interactive edu content (HTML artifact)

**How to run:**

1. Give the agent the use case prompt
2. Include the Agent Reflection Protocol in the system prompt
3. Capture the 5-phase output
4. Grade each phase using the rubric (AI judge or human)
5. Record scores in eval_signals
6. Compare across runs (regression detection)

### What we need to build:

1. **Agent Reflection Protocol** → Add to AGENTS.md and/or agent SOPs (this doc provides the spec)
2. **Sub-agent scout dispatcher** → Pattern for decomposing explore phase into parallel scouts
3. **Eval scoring page** → UI in `dashboard-cms/evals/` that shows 5-phase breakdown per run
4. **Factory template** → "Pipeline Eval Run" template with questionnaire + generator

### What we DON'T need to build:

- The eval infrastructure (rubric judging, pairwise, signals) — **already exists**
- The observe system (run traces, decision breakdowns) — **already exists**
- The graph traversal — **already exists and works well**
- The discovery system — **already exists, 73.8% B+ and improving**

---

## Source: docs/eval-system.md

# Eval System

> Reference for the Katailyst quality and evaluation system.
> Code: `lib/evals/`, `lib/test-lab/`, eval_case entity type in registry.

---

## 1. What the Eval System Actually Is

The eval system answers one question: **"Is the AI producing good output?"**

It does this through:

- **Quality checks** (eval cases) -- predefined test scenarios run against agents
- **Rubric scoring** -- structured criteria applied to outputs
- **Head-to-head comparisons** (pairwise) -- two outputs judged against each other
- **Signal computation** -- aggregated quality scores attached to registry entities
- **Variation generation** -- producing multiple versions and comparing them
- **Discovery evaluation** -- testing whether the search/retrieval system returns the right things

---

## 2. Data Model

### Database Tables (Supabase/PostgreSQL)

| Table | Purpose | Key Columns |
| --- | --- | --- |
| `runs` | Every evaluation execution | `id, org_id, run_type, status, north_star, context_json, created_at` |
| `run_outputs` | Individual outputs from a run | `id, run_id, content, content_json, score, metadata` |
| `evaluations` | Rubric judgments on outputs | `id, run_output_id, rubric_code, score, criteria_scores, judge_model` |
| `pairwise_tests` | Head-to-head comparisons | `id, org_id, output_a_ref, output_b_ref, winner, judge_reasoning` |
| `entity_signals` | Computed quality scores per entity | `entity_id, signal_type, value, confidence, updated_at` |
| `payload_store` | Large text payloads (outputs, prompts) | `id, org_id, payload, content_type` |
| `variation_runs` | Variation generation executions | `id, org_id, source_ref, params, results` |

### Registry Entities (eval-related)

| Type | Count | Purpose |
| --- | --- | --- |
| `eval_case` | 30+ | Predefined test scenarios with prompts, expected behaviors, and run modes |
| `rubric` | 15+ | Scoring criteria (content-quality, skill-quality, qbank-quality-v1, etc.) |
| `metric` | 5+ | Quantitative measures (tag-coverage-ratio, content-quality-score, correctness_rate) |

### Entity Signal Types

Signals are computed scores attached to entities. Types include:

- `eval_score` -- aggregate eval score
- `pairwise_win_rate` -- win rate in head-to-head comparisons
- `qa_coverage` -- how well tested this entity is
- `confidence` -- statistical confidence in the score

---

## 3. File Inventory

### Core Library (`lib/evals/` -- 27 files)

| File | LOC | Purpose |
| --- | --- | --- |
| `types.ts` | 54 | VariationParams, VariationResult, GenerateVariationsInput types |
| `contracts.ts` | ~100 | EvalCaseContent schema, parseEvalCaseRevision |
| `eval-cases.ts` | ~200 | Load eval cases from registry, parse refs, fetch definitions |
| `scoring.ts` | ~150 | Score computation, normalization, grade assignment (A/B/C/D/F) |
| `rubric-judge.ts` | ~200 | LLM-as-judge rubric evaluation, prompt construction |
| `single-output-judge.ts` | ~150 | Judge a single output against a rubric |
| `grade-draft-service.ts` | ~180 | Grade draft content before publishing |
| `variation-generator.ts` | ~200 | Generate content variations with different params |
| `pipeline-eval-batch.ts` | ~250 | Batch execution of pipeline eval cases |
| `pipeline-artifacts.ts` | ~100 | Artifact collection from pipeline runs |
| `pipeline-improvement-proposals.ts` | ~150 | Generate improvement suggestions from run results |
| `skill-qa.ts` | ~100 | Skill-specific QA coverage computation |
| `matrix.ts` | ~100 | Coverage matrix computation (which skills have been tested) |
| `decision-packet.ts` | ~150 | Promotion decision support (should this entity be promoted?) |
| `eval-signal-bridge.ts` | ~100 | Bridge between eval results and entity signals |
| `eval-signal-processor.ts` | ~150 | Process raw scores into entity signals |
| `eval-signal-refresh.ts` | ~100 | Refresh signals from latest eval data |
| `discovery-weights.ts` | ~100 | Tunable weights for discovery ranking |
| `mcp-surface-eval.ts` | ~100 | Eval via MCP surface (for agent access) |
| `ledger-metadata.ts` | ~80 | Metadata tracking for eval ledger entries |
| **Pairwise subsystem** | | |
| `pairwise/elo.ts` | ~100 | Elo rating calculation (chess-style) |
| `pairwise/aggregate.ts` | ~100 | Aggregate pairwise results into win rates |
| `pairwise/tournament.ts` | ~150 | Tournament bracket management |
| **Harness subsystem** | | |
| `harness/types.ts` | ~50 | Harness type definitions |
| `harness/deterministic.ts` | ~100 | Deterministic test case execution |
| `harness/ingest.ts` | ~100 | Ingest external harness results |
| `harness/promptfoo.ts` | ~150 | Promptfoo integration adapter |

### Server Actions (`app/dashboard-cms/evals/actions/` -- 15 files)

| File | Purpose |
| --- | --- |
| `actions.ts` (barrel) | Re-exports from all sub-modules |
| `analytics.ts` | `getEvalDashboardStats`, `getDailySummary` |
| `cases.ts` | `getPipelineEvalCaseCatalog`, `getPipelineEvalCaseCoverage`, `runPipelineEvalCase` |
| `compare.ts` | `getRecentEvaluations`, `getActivePairwiseTests` |
| `decisions.ts` | `getEvalDecisionSupportRows` |
| `discovery-eval.ts` | `getDiscoveryEvalCases`, `runDiscoveryEvalCase` |
| `helpers.ts` | Shared query utilities |
| `judging.ts` | `runRubricJudgment`, `runSingleOutputJudge` |
| `matrix.ts` | `getSkillQaCoverageMatrix` |
| `pairwise.ts` | `getPairwiseLeaderboardData`, `createPairwiseTest` |
| `payload.ts` | `getPayload`, `storePayload` |
| `queries.ts` | `getRecentPipelineEvalRuns`, `getRecentVariationRuns` |
| `shared.ts` | Shared helpers, constants |
| `signals.ts` | `refreshEntitySignals`, `getEntitySignalHistory` |
| `variations.ts` | `generateVariations`, `saveVariation` |
| `weights.ts` | `getDiscoveryWeights`, `updateDiscoveryWeights` |

### UI Components (`components/evals/` -- 35 files)

| Component | Purpose | Tab |
| --- | --- | --- |
| `evals-tab-nav.tsx` | Tab navigation (5 tabs) | All |
| **Overview tab** | | |
| `overview-tab.tsx` | Summary dashboard | Overview |
| `overview-stats-cards.tsx` | Score cards (total runs, pass rate, avg score) | Overview |
| `daily-summary-card.tsx` | Today's activity summary | Overview |
| `eval-case-coverage-panel.tsx` | Which eval cases have been run | Overview |
| `eval-score-card.tsx` | Individual score display | Overview |
| `eval-suite-progress.tsx` | Overall test suite completion | Overview |
| **Runs tab** | | |
| `runs-tab.tsx` | Run quality checks + results table | Runs |
| `eval-case-launcher-panel.tsx` | Select + run an eval case | Runs |
| `pipeline-case-runs-table.tsx` | Recent run results table | Runs |
| `pipeline-replay-batch-panel.tsx` | Re-run a batch of cases | Runs |
| `pipeline-improvement-proposals-panel.tsx` | AI-suggested improvements | Runs |
| `variation-generator-panel.tsx` | Generate content variations | Runs |
| `variation-runs-table.tsx` | Recent variation results | Runs |
| `variation-diff-view.tsx` | Side-by-side diff of variations | Runs |
| `variation-params-form.tsx` | Configure variation parameters | Runs |
| `variation-save-dialog.tsx` | Save a variation back to registry | Runs |
| **Discovery tab** | | |
| `discovery-eval-lab.tsx` | Search/retrieval quality testing | Discovery |
| `discovery-weight-tuning.tsx` | Adjust ranking weights | Discovery (also Analytics) |
| **Comparisons tab** | | |
| `comparisons-tab.tsx` | Pairwise experiments + leaderboard | Comparisons |
| `pairwise-compare.tsx` | Side-by-side comparison UI | Comparisons |
| `win-rate-leaderboard.tsx` | Elo leaderboard of content versions | Comparisons |
| `rubric-judge-panel.tsx` | Manual rubric judging interface | Comparisons |
| **Analytics tab** | | |
| `analytics-tab.tsx` | Long-range trends and diagnostics | Analytics |
| `score-regression-chart.tsx` | Score regression over time | Analytics |
| `score-trends-chart.tsx` | Score trend lines | Analytics |
| `cost-per-eval-chart.tsx` | Cost tracking per eval | Analytics |
| `regression-card.tsx` | Regression detection card | Analytics |
| `score-delta-indicator.tsx` | Score change indicator | Analytics |
| `qa-coverage-matrix.tsx` | Skill QA coverage heatmap | Analytics |
| `eval-mode-coverage-panel.tsx` | Coverage by eval mode | Analytics |
| `pipeline-calibration-panel.tsx` | Scoring calibration analysis | Analytics |
| `decision-support-panel.tsx` | Promotion recommendations | Analytics |
| **Shared** | | |
| `eval-skeletons.tsx` | Loading skeletons | All |
| `skill-distiller-tabs.tsx` | Skill-specific eval drill-down | Varies |

### Sub-routes

| Route | Purpose |
| --- | --- |
| `/dashboard-cms/evals` | Main eval page (5 tabs) |
| `/dashboard-cms/evals/ab-tests/` | Dedicated A/B test page |
| `/dashboard-cms/evals/discovery/` | Dedicated discovery eval page |
| `/dashboard-cms/evals/pairwise/` | Dedicated pairwise comparison page |

---

## 4. Run Execution Flow

### When someone clicks "Run" on an eval case:

```
1. User selects eval case from dropdown in eval-case-launcher-panel.tsx
2. Clicks "Run" button
3. Client calls server action: runPipelineEvalCase(orgId, caseRef, ...)
4. Server action:
   a. Loads eval case definition from registry (eval-cases.ts)
   b. Creates a new `runs` row (status: 'running')
   c. Executes the pipeline:
      - For 'harness' mode: calls deterministic harness (harness/deterministic.ts)
      - For 'manual_capture' mode: captures user-provided output
   d. Stores outputs in `run_outputs` table
   e. Runs rubric judgment (rubric-judge.ts) against the output
   f. Stores evaluation in `evaluations` table
   g. Updates `runs` row (status: 'completed')
   h. Triggers signal refresh (eval-signal-refresh.ts)
5. Client sees updated results in pipeline-case-runs-table.tsx
```

### Run Kinds (7 types)

| Kind | What It Does | Automated? |
| --- | --- | --- |
| `pipeline_case` | Full agent pipeline evaluation | Yes (harness) or No (manual capture) |
| `pairwise` | Head-to-head comparison of two outputs | Semi (judge is AI) |
| `promptfoo_harness` | External promptfoo test suite | Yes |
| `regression_fixture` | Known-good output regression check | Yes |
| `variation_generation` | Generate and compare N variations | Yes |
| `grade_draft` | Grade content before publishing | Semi |
| `skill_qa` | Skill-specific quality check | Yes |

---

## 5. Agent Access via MCP

### Current State

The MCP server exposes `eval_refresh_signals` as a tool. Agents can:

- **Discover** eval cases via `discover(entity_types: ['eval_case'])` -- 30+ cases returned
- **Read** eval case content via `get_entity` or `get_skill_content`
- **Trigger** signal refresh via `eval_refresh_signals`

### What's Missing for Full Agent Eval Access

Agents CANNOT currently:

- Run an eval case via MCP (no `run_eval_case` tool)
- Submit pairwise judgments via MCP
- View run results via MCP
- Generate variations via MCP

This is a gap. An agent should be able to: discover eval cases -> run one -> see results -> iterate. That loop requires new MCP tools.

---

## 6. The 30 Eval Cases (from Registry)

Organized by category:

### Category 1: Foundation (Self-awareness, Understanding)

- `eval-1.1-self-awareness` -- Can the agent explain HLT, its role, its priorities?
- `eval-1.2-user-understanding` -- Does it understand Alec as operator?
- `eval-1.3-team-voice-modeling` -- Can it emulate HLT internal voice?
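The missing discover -> run -> see results -> iterate loop from section 5 could eventually look like the sketch below, built on the `eval_run_case` and `eval_list_runs` MCP tools this doc proposes in its implementation plan. To be clear, none of these tools exist yet; the `CallTool` signature, the score-4 stopping rule, and the 3-attempt cap are all assumptions for illustration, not the real MCP client API.

```typescript
// Hypothetical agent-side eval loop. `eval_run_case` and `eval_list_runs`
// are PROPOSED MCP tools that do not exist yet, and this CallTool signature
// is an assumption for the sketch, not a real client API.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

async function runAndIterate(
  callTool: CallTool,
  caseRef: string,
  maxAttempts = 3
): Promise<{ score: number } | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await callTool("eval_run_case", { caseRef });
    if (run.score >= 4) return run; // good enough -- stop iterating
    // Inspect recent runs for this case before adjusting and retrying.
    await callTool("eval_list_runs", { caseRef, limit: 5 });
  }
  return null; // still failing after maxAttempts -- surface to a human
}
```

The point of the sketch is the shape of the loop: run, check the score, consult run history, retry, and stop with an explicit "escalate to a human" outcome instead of looping forever.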
### Category 2: Content Production

- `eval-2.1-qbank-item` -- Question bank production quality
- `eval-2.2-article-writing` -- Long-form article with research and distribution
- `eval-2.3-social-post-creation` -- Multi-platform social content
- `eval-2.4-email-marketing` -- Welcome/lifecycle email quality
- `qbank-item-asvab` -- ASVAB-specific qbank validation

### Category 3: Research & Analysis

- `eval-3.1-trending-topics` -- Live trend discovery and translation
- `eval-3.2-competitive-tactics` -- Competitive research and revenue strategy
- `eval-3.3-financial-metrics-awareness` -- Pull and explain product metrics

### Category 4: Business Strategy

- `eval-4.1-pr-strategy` -- PR and exit-positioning strategy
- `eval-4.2-investor-update` -- Investor communication and metrics
- `eval-4.3-b2b-sales-thinking` -- B2B and institutional growth
- `eval-4.4-conversion-analysis` -- Conversion analytics and forecasting

### Category 5: Visual & Media

- `eval-5.1-infographic` -- Infographic planning
- `eval-5.2-image-progression` -- Iterative image editing
- `eval-5.3-video-transformation` -- Video content improvement
- `eval-5.4-data-visualization` -- Charts and insight communication

### Category 6: Product & UX

- `eval-6.2-app-copywriting` -- Upgrade screen copy
- `eval-6.3-sign-up-flow` -- Browser-based sign-up tooling

### Category 7: Operations & Coordination

- `eval-7.1-financial-update` -- Internal financial updates
- `eval-7.3-conference-pitch-followup` -- Conference lead flow
- `eval-7.4-agent-coordination` -- Multi-agent build coordination
- `eval-7.5-team-engagement` -- Remote team engagement

### Category 8: Education & Community

- `eval-8.1-qbank-transformation` -- Transform qbank into interactive learning
- `eval-8.2-community-website` -- Nurse community web surface
- `eval-8.4-interactive-edu-content` -- Interactive case-study educational content

### Rubrics (15+)

| Rubric | Tier | Status | Scoring Dimensions |
| --- | --- | --- | --- |
| `registry-entry-quality` | 1 | Active | Universal meta-rubric for all entity types |
| `pipeline-5-phase` | 1 | Active | Interpret, explore, plan, execute, reflect |
| `discovery-search-quality` | 1 | Active | Relevance, recall, diversity, freshness, coherence |
| `content-quality` | 1 | Active | Master composite for content before promotion |
| `skill-quality` | 1 | Active | Skill production readiness (ISO 25010 based) |
| `tool-reliability` | 1 | Active | Tool integration production readiness |
| `web-page-quality` | 2 | Active | Visual design, performance, a11y, content |
| `qbank-quality-v1` | 2 | Active | NBME-standard question bank quality |
| `clarity-v1` | 1 | Superseded | Now criterion within output rubrics |
| `engagement-v1` | 1 | Superseded | Now criterion within output rubrics |
| `accuracy-v1` | 4 | Superseded | Now criterion within output rubrics |
| `assessment-v1` | 3 | Superseded | Replaced by qbank-quality-v1 |
| `audio-quality-v1` | 2 | Archived | Skeleton only |
| `video-quality-v1` | 2 | Archived | Skeleton only |
| `procedural-clarity-v1` | 4 | Superseded | Absorbed into automation-quality |

---

## 7. How Others Do This (Research)

### Braintrust (braintrustdata.com)

**Layout**: Table of test cases with individual scores alongside summary metrics. Green/red delta indicators show improvement/regression.

**Key patterns**:

- Side-by-side experiment comparison with color-coded indicators
- Dimensional scoring (brand alignment 81.25%, tone 75%) -- NOT a single holistic score
- Filter by high/low scores, sort by changes, drill into specific examples
- "Evals for PMs" philosophy: "'I think it's better' becomes 'we tested 200 cases and accuracy improved 5% without regressing tone'"
- Human review UI for batch labeling where domain experts provide ground truth
- Cross-functional visibility: engineers, QA, and product see the same dashboard

**Takeaway for us**: Dimensional scores per criterion with green/red deltas. Make it legible to Emily and Ben, not just Alec.

### Promptfoo (promptfoo.dev)

**Layout**: Matrix view -- prompts as columns, test cases as rows, results in cells. Cell-level interactions on hover.

**Key patterns**:

- Filter modes: All, Failures, Passes, Errors, Different, Highlights
- Cell hover actions: mark pass/fail, set custom score (0-1), add comments, highlight
- Table settings: toggle columns, text truncation, markdown rendering, inference details
- Charts: pass rate percentage, score distribution histogram, scatter plot for head-to-head
- Shareable URLs with filter state in query params
- `npx promptfoo eval && npx promptfoo view` -- CLI-first, web UI is the viewer

**Takeaway for us**: The matrix view (prompts × cases) is powerful but confusing for non-technical users. The cell-level hover interactions are excellent. Steal the filter modes.

### Chatbot Arena (lmarena.ai)

**Layout**: Blind head-to-head comparison. User sees two responses, picks a winner, never knows which model produced which.

**Key patterns**:

- Elo rating system (Bradley-Terry model, like chess)
- 6M+ crowdsourced votes
- Leaderboard with confidence intervals
- Category-specific rankings (coding, creative, reasoning)
- Bootstrap technique for stable scores from 1000 permutations

**Takeaway for us**: The pairwise system we already have is architecturally similar. The key insight is **blind comparison** -- the user shouldn't know which version they're judging. Our `win-rate-leaderboard.tsx` already does Elo. We should keep it but hide it behind the Advanced tools disclosure.

### Airtable (what makes it usable)

**Key patterns that make non-technical users succeed**:

- Flat table with visible column headers -- no hidden state
- Inline editing by clicking a cell
- Expand row to see full record as a form/card
- Filter bar always visible, filters described in plain English
- Grouping by any column with drag-and-drop
- Multiple views (Grid, Kanban, Calendar, Gallery) of the same data
- Color-coded status pills
- Linked records shown as clickable pills

**Takeaway for us**: The eval table should feel like this. Flat grid. Click a row to expand. Status badges with color. Filters in plain English.

---

## 8. Design Decision: A/B Testing Separate or Combined?

### Current State

A/B testing (pairwise comparisons) has its own sub-route (`/evals/ab-tests/`) AND appears in the Comparisons tab. The pairwise subsystem (`lib/evals/pairwise/`) is architecturally separate from pipeline evals.

### Recommendation: Combined but Layered

**Why combined**: From the user's perspective, "did version A beat version B?" and "how good is this output?" are the same workflow. Braintrust combines them. The user doesn't care about the technical distinction between rubric scoring and pairwise comparison.

**How to layer it**:

```
Primary view (Quality Checks table):
+------------------------------------------------------------------+
| Name              | Type      | Last Run | Score | vs Best | [>]  |
|-------------------|-----------|----------|-------|---------|------|
| Brand voice test  | automated | 3h ago   | 4.2/5 | +0.3    | Run  |
| Article headline  | manual    | never    | --    | --      | Run  |
| NCLEX quiz item   | automated | 1d ago   | 3.8/5 | -0.1    | Run  |
+------------------------------------------------------------------+

Clicking "Run" on a row:
1. Runs the eval case
2. Shows streaming output in an expandable row or right panel
3. Auto-scores with the assigned rubric
4. Shows dimensional scores (brand: 4/5, accuracy: 5/5, tone: 3/5)
5. Optionally: "Compare with previous best?" -> triggers pairwise

Clicking the score:
1. Expands to show score history (sparkline)
2. Shows last 5 runs with delta indicators
3. Links to full run detail

"Compare" action (replaces separate A/B tab):
1. Select two outputs from history
2. Side-by-side display with diff highlighting
3. AI judge picks winner with reasoning
4. Elo ratings update
```

The leaderboard and tournament features stay behind "Advanced tools" -- they're operator-level, not daily use.

---

## 9. Proposed Simplified Architecture

### Two-View Model

**View 1: Quality Checks (Primary)** -- What Emily, Ben, and Justin see

An Airtable-style table of eval cases. Columns:

- Name (linked to expand)
- Category (foundation, content, research, strategy, visual, product, ops, education)
- Run mode (Automated / Manual)
- Last run (relative time)
- Score (0-5 stars or letter grade, color-coded)
- Trend (sparkline or delta arrow)
- Action (Run button)

Interactions:

- Click row to expand: shows full prompt, last output, dimensional scores
- Click Run: streams output, auto-scores, shows result inline
- Filter by category, score range, run mode
- Sort by any column
- "Compare" button appears after 2+ runs exist

**View 2: Run History (Secondary)** -- Deeper drill-down

Table of all runs across all cases. Columns:

- Run ID (linked to detail)
- Case name
- Runner (automated/manual)
- Status (pending/running/completed/failed)
- Score
- When
- Duration
- Cost

**Advanced Tools (Disclosure)** -- Operator-level features

Behind a disclosure:

- Experiments (pairwise tournament)
- Variation generator
- Discovery ranking lab
- Score regression analysis
- Pipeline calibration
- Promotion decision support
- Weight tuning

### What the User Sees to Understand It

The quality checks table should communicate:

1. **"What are we testing?"** -- case name + one-line description
2. **"Is it passing?"** -- green/yellow/red score badge
3. **"Is it getting better?"** -- trend arrow or sparkline
4. **"Can I run it now?"** -- play button
5. **"What happened last time?"** -- expandable row with output preview

No jargon. No "pipeline_case" or "pairwise evidence sets". Just: name, score, trend, run.

---

## 10. Implementation Plan

### Phase 1: Table Component (EvalCaseTable)

New file: `components/evals/eval-case-table.tsx`

```
- Fetches eval cases from registry (reuses getPipelineEvalCaseCatalog)
- For each case, fetches latest run from runs table
- Renders flat table with: name, category, run mode, last run, score, trend, action
- Expand row on click (shows prompt, last output, dimensional scores)
- Run button triggers runPipelineEvalCase
- Streaming output displayed in expanded row
```

### Phase 2: Tab Restructure

Modify `app/dashboard-cms/evals/page.tsx`:

- Remove 5-tab nav
- Replace with 2-tab nav: "Quality Checks" | "Run History"
- Add "Advanced tools" disclosure below tabs
- Move Discovery, Comparisons, Analytics content into disclosure

### Phase 3: Inline Results Display

After a run completes:

- Show dimensional scores as colored pills (Braintrust pattern)
- Show delta vs previous run (green up / red down)
- "Compare with best" link triggers pairwise inline
- Thumbs up/down for optional user feedback

### Phase 4: Agent MCP Tools (Future)

New MCP tools:

- `eval_run_case` -- run an eval case and return results
- `eval_list_runs` -- list recent runs with scores
- `eval_compare` -- compare two outputs side-by-side
- `eval_grade` -- grade a piece of content against a rubric

---

## 11. Key Open Questions

1. **Should variation generation be in the primary view or Advanced?** It's useful for content creators (Emily) but the UI is complex.
2. **Should discovery eval be its own page or fold into the table?** Discovery eval tests the search system, not content quality. Different audience.
3. **How to handle the 30 eval cases that haven't been run?** They show "never run" with no score. Need to make this feel inviting, not intimidating. 4. **Streaming results**: The current `runPipelineEvalCase` is a server action. For streaming, we'd need a route handler with SSE or use the AI SDK `streamText` pattern. 5. **Cost tracking**: Each eval run costs tokens. Should we show cost per run in the table? Braintrust does. Promptfoo does. We have `cost-per-eval-chart.tsx` in Analytics. --- ## Sources - [Braintrust Eval Docs](https://www.braintrust.dev/docs/evaluate) - [Braintrust Evals for PMs](https://www.braintrust.dev/blog/evals-for-pms) - [Promptfoo Web UI Docs](https://www.promptfoo.dev/docs/usage/web-ui/) - [Chatbot Arena / LMSYS Leaderboard](https://arena.ai/leaderboard) - [Best AI Eval Tools 2026](https://www.braintrust.dev/articles/best-ai-evaluation-tools-2026) --- ## Source: docs/eval-use-cases.md # Agent Eval Use Cases v1 **Purpose**: Real business outcomes that define "agents fully ready." Each case tests a specific capability stack. Many will fail now — that's the point. This is both an eval AND a roadmap. **How to use**: Run each case against the live agent. Grade on the rubric. Track which capabilities are missing. Build toward passing. **Where this is visible in the UI now**: - `Quality -> Summary` shows the full pipeline eval case coverage table with readiness, run history, and rubric labels. - `Quality -> Eval Results` is the actual run lane. Links with `caseRef=` should land with that case preselected. - `Home` should surface a compact slice of these same eval use cases so the dashboard is showing real benchmarks, not fake starter cards. **Key note from Alec**: "blog" → "resources" or "articles" or "exam-ipedia" everywhere. Not blog. --- ## Tier 1: Identity & Context (Can the agent understand itself?) ### 1.1 Self-awareness **Prompt**: "What are your use cases for HLT? What's your mission and priorities? Who do you work with? 
What does HLT do?" **Tests**: Identity docs loaded, company knowledge, team awareness, role clarity **Rubric**: Mentions correct team members by name and role? Knows HLT products (NCLEX, TEAS, ASVAB, PANCE, DAT, FNP Mastery apps)? States own capabilities accurately? Knows org priorities (high season prep, AI-powered content, exit positioning)? **Expected to pass**: Now (if identity docs are loaded) **Grade weight**: Foundation — everything else depends on this ### 1.2 User understanding **Prompt**: "Write about who your user is that you primarily work with at HLT, what capabilities you have, and a specific wowing use case you could show as a demo for a new HLT investor" **Tests**: Audience knowledge, capability self-assessment, persuasive communication, investor framing **Rubric**: Identifies Alec as primary user? Knows his communication style? Lists real capabilities (not hallucinated)? Demo use case is genuinely impressive and achievable? Investor-appropriate tone? **Expected to pass**: Now-ish (needs investor-context KB) ### 1.3 Team voice modeling **Prompt**: "Write an HLT inspirational weekly update message the CEO could send in his tone of voice as an inspirational update. Now write how he talks when he's angry when you half-assed something." **Tests**: Understanding of Alec's voice in different emotional registers, internal comms style, team dynamics **Rubric**: Does it sound like Alec (not generic corporate)? Is it relevant to actual HLT priorities? Would Alec actually send it? Is the "angry" version authentic without being caricature? Not cheesy? **Expected to pass**: Partial now (needs Alec voice samples/KB) --- ## Tier 2: Core Content Production (Can it make things?) ### 2.1 Question bank item **Prompt**: "Write a question bank item for ASVAB" **Tests**: Registry discovery (qbank-kit, ASVAB channel, exam blueprint), content type compliance, educational quality **Rubric**: Follows qbank_item_v2 schema? ASVAB-appropriate difficulty and domain? 
Distractor quality (plausible wrong answers)? Rationale depth? Clinical/factual accuracy? **Expected to pass**: Now (core workflow, all entities exist) ### 2.2 Article writing **Prompt**: "Write an article about nurse recruiting for newly graduating USA NCLEX-RN students" **Rubric (Alec's detailed spec)**: - Is the topic engaging, trending, niche, focused on specific audience pain point? - Does the headline pull or is it bland? - How strong is the opening or is it generic? - Are there good images? - Is it on a Vercel page? Extra credit: on the Recroot Framer page? - Is there multimedia (infographic, data viz)? - Any interactive element? - Is it a sales pitch or informative? (Informative = good. Pure sales = bad. Highly valuable niche content = best.) - Does it use real social proof or NCLEX-RN numbers, not made up? - Is there a subtle CTA that points toward HLT? - Is it current (2026)? - Was a social post suggested? - SEO considerations present? **Tests**: Full content pipeline, research, writing quality, multimedia, distribution thinking **Expected to pass**: Partial (text yes, multimedia/hosting later) ### 2.3 Social post creation **Prompt**: "Write social posts on top 3 platforms — one question of the day, one carousel, one lessons from students who passed" **Tests**: Platform awareness, content type variety, research (should study trends first), visual capability, CTA/link strategy **Rubric**: Did it research trends before writing? Platform-appropriate format? Visual hook or infographic included? Landing page or link suggested? Not generic — specific to exam type and audience? **Expected to pass**: Partial (text yes, visuals later) ### 2.4 Email marketing **Prompt**: "Write email marketing to greet new visitors" **Tests**: Email channel discovery, welcome sequence structure, brand voice, CTA design **Rubric**: Multi-email sequence (not just one)? Personalization tokens? Brand voice consistent? Drives toward app download/trial?
**Expected to pass**: Now (email entities exist) --- ## Tier 3: Research & Analysis (Can it think?) ### 3.1 Trending topics **Prompt**: "Research forums and give me top 3 trending topics there + overall in USA. Write draft of how we use this info." **Tests**: Web research capability, forum access (Reddit, nursing forums), trend identification, strategic translation **Rubric**: Are topics actually trending (not generic evergreen)? Student voice captured? Actionable draft, not just list? Proactive enough to be usable? **Expected to pass**: Later (needs real-time research tools working well) ### 3.2 Competitive tactics **Prompt**: "Find top 3 specific tactics from test prep orgs we could use and create draft using underlying concept that can improve our high season sales at HLT. Make us $10K more in sales this month from this idea with only your efforts. Give detailed plan." **Tests**: Competitive research, strategic thinking, revenue-oriented planning, self-assessment of capabilities **Rubric**: Tactics are specific (not generic "improve SEO")? Based on real competitor behavior? Revenue estimate is grounded? Plan is executable by the agent alone? Honest about limitations? **Expected to pass**: Later (needs competitive research + analytics integration) ### 3.3 Financial/metrics awareness **Prompt**: "Tell me activations from our best trending product last month that would surprise people" **Tests**: Analytics data access, product knowledge, surprise/insight generation **Rubric**: Can actually pull real numbers? Identifies correct "best" product? Insight is genuinely surprising, not obvious? **Expected to pass**: Later (needs analytics MCP integration) --- ## Tier 4: Strategic & Business Ops (Can it drive results?) ### 4.1 PR strategy **Prompt**: "Alec says 'I'd really like some PR for HLT to help us be visible for exit.' What's your plan? How do you respond?" 
**Tests**: Strategic thinking, PR knowledge, exit context awareness, proactive yet cautious approach **Rubric**: Understands exit context? Proposes concrete drafts and ideas? Doesn't "go live" prematurely — drafts only? Highlights AI-in-education angle? Team-appropriate tasks? **Expected to pass**: Partial (strategy yes, execution later) ### 4.2 Investor update **Prompt**: "Write an investor update for the last 3 months" **Tests**: Financial data access, metrics knowledge, investor communication style, professional formatting **Rubric**: Real numbers or honest about what it can't access? Proper investor update format? Highlights growth, AI capabilities, seasonal patterns? Forward-looking guidance? **Expected to pass**: Partial (template yes, real data later) ### 4.3 B2B sales thinking **Prompt**: "Show me an example of how you could help us move into B2B or institutional sales" **Tests**: Market understanding, B2B strategy, creative thinking, leveraging existing NCLEX audience **Rubric**: Understands HLT's current B2C model? Proposes realistic B2B pivot using existing assets? Considers institutional buyers (nursing schools, hospitals)? **Expected to pass**: Partial ### 4.4 Conversion analysis **Prompt**: "Show me conversion rate last month by week and how you expect it to change over next 3 months and why. Make interactive experience." **Tests**: Metrics access, seasonality knowledge, data visualization, interactive content creation **Rubric**: Real data or smart estimates? Understands test-prep seasonality? Interactive visualization works? Explains reasoning? **Expected to pass**: Later (needs PostHog/analytics integration) --- ## Tier 5: Multi-modal & Creative (Can it make rich media?) ### 5.1 Infographic **Prompt**: "Make an infographic about nurses" **Tests**: Visual content creation, design sense, nursing domain knowledge **Rubric**: Professional quality? Accurate data? On-brand design? Useful to actual nursing students? 
**Expected to pass**: Partial (depends on image generation tooling) ### 5.2 Image progression **Prompt**: "Make an image of a nurse on a bad day. Edit it to be a photo on a good day. Add Alec Whitter to the photo and change the scene to him teaching in a classroom with Jason Shaw as the student in scrubs. Then animate a biochemistry concept students need to know in a way that teaches a small but useful concept in wowing professional way." **Tests**: Image generation, image editing, character consistency, educational animation, increasing complexity **Rubric**: Each step builds on previous? Characters recognizable? Educational content accurate? Animation actually teaches? Professional quality? **Expected to pass**: Much later (needs advanced image/video pipeline) ### 5.3 Video transformation **Prompt**: "Take one of Cat's YouTube videos for HLT and transform it to be even more appealing or useful to students" **Tests**: YouTube access, video understanding, content transformation, team knowledge (knows Cat) **Rubric**: Can find the video? Understands its content? Transform adds genuine value? Knows Cat is team member? **Expected to pass**: Later (needs YouTube/multimedia MCP) ### 5.4 Data visualization **Prompt**: "Make a data visualization" **Tests**: Data access, chart selection, visual design, insight communication **Rubric**: Uses real data? Appropriate chart type? Clean design? Tells a story? **Expected to pass**: Partial --- ## Tier 6: Multi-channel & Distribution (Can it reach people?) ### 6.1 Lead capture cheat sheet **Prompt**: "Make a cheat sheet about an offer on a landing page and have it fit with ASVAB Mastery Framer site. Have there be an email series or multichannel distribution." **Tests**: Marketing funnel thinking, lead magnet creation, Framer awareness, email sequence, multichannel planning **Rubric**: Cheat sheet is genuinely useful? Matches ASVAB Mastery brand? Email sequence has good copy? Distribution plan covers multiple channels? 
Landing page integration realistic? **Expected to pass**: Partial (content yes, Framer integration later) ### 6.2 App copywriting **Prompt**: "Write upgrade screen for NCLEX Mastery app on iOS and web" **Tests**: App awareness, conversion copywriting, platform-specific formatting, high-season context **Rubric**: Knows NCLEX Mastery features? Copy drives upgrades without being pushy? iOS vs web formatting different? Seasonal urgency appropriate? **Expected to pass**: Now-ish (needs app-specific KBs) ### 6.3 Sign-up flow **Prompt**: "Sign up for a website using your email" **Tests**: Web browsing capability, form interaction, email access **Rubric**: Can navigate to site? Can fill out form? Can access email for confirmation? **Expected to pass**: Later (needs browser + email tool integration) --- ## Tier 7: Communication & Coordination (Can it work with people?) ### 7.1 Financial update **Prompt**: "Write financial update for Ben about how we did in the last week or two in some part of the business" **Tests**: Knows Ben (CFO/finance person), data access, financial communication, audience-appropriate tone **Rubric**: Right tone for Ben? Real numbers or honest proxy? Actionable insights? Professional format? **Expected to pass**: Partial ### 7.2 Phone call (stretch) **Prompt**: "Call Justin Leas and try to convince him you're not a robot" **Tests**: Voice synthesis (11labs/Twilio), conversational ability, personality, Justin knowledge **Rubric**: Can make the call? Voice sounds human? Knows Justin is COO? Can hold a conversation? (Expected to fail for a while — great stretch goal) **Expected to pass**: Much later ### 7.3 Conference pitch + lead capture **Prompt**: "Write who HLT is and what apps it has for nurse practitioners as if you're at a conference doing an elevator pitch. Eric really wants to get it for his class, he's class president at XYZ. Then store their contact (whitta01@gmail.com). What do you do next? Then later send them a follow-up." 
**Tests**: Sales pitch, lead capture/CRM, follow-up automation, intermediate reasoning steps **Rubric**: Pitch is compelling and specific to NPs? Stores contact somewhere accessible? Follow-up is timely and personalized? Thinks through next steps proactively? **Expected to pass**: Partial ### 7.4 Agent coordination **Prompt**: "Message another agent and coordinate with them to create an AI-powered interactive edu concept that students would love and is live" **Tests**: Agent-to-agent communication, task decomposition, collaboration, deployment **Rubric**: Can actually reach another agent? Decomposes work appropriately? End result is functional and live? **Expected to pass**: Later (needs agent-to-agent messaging) ### 7.5 Team engagement **Prompt**: "Propose a guided experience to engage the team" **Tests**: Knows team is remote, creative thinking, Slack integration, internal comms **Rubric**: Acknowledges remote team? Creative and not generic? Could actually execute via Slack? Team would enjoy it? **Expected to pass**: Partial --- ## Tier 8: Platform Building (Can it build systems?) ### 8.1 QBank transformation **Prompt**: "Get a QBank item from app 1 and turn it into an interactive learning session about exam taking strategies" **Tests**: App API access, content transformation, interactive content creation **Rubric**: Can retrieve real QBank item? Interactive format works? Teaches exam strategies, not just content? Engaging? **Expected to pass**: Later (needs app API integration) ### 8.2 Community website **Prompt**: "Make a community resources website for nurses +/- 2 years from graduating with places for articles and community" **Tests**: Website building (Framer/Lovable/Vercel), community features, content architecture **Rubric**: Functional website? Good information architecture? Community features (not just static)? Professional design? 
**Expected to pass**: Later (needs web builder integration) ### 8.3 Recruiting business plan **Prompt**: "Tell me how we could have you drive a 2027 AI-powered nurse recruiting website using our NCLEX audience as a business plan and how you alone could do everything. No help. What's your plan?" **Tests**: Strategic vision, self-assessment, business planning, audience leverage thinking **Rubric**: Realistic about solo capabilities? Leverages existing NCLEX audience data? Business plan is viable? Identifies what tooling is needed? **Expected to pass**: Partial (planning yes, execution much later) ### 8.4 Interactive edu content **Prompt**: "Make interactive educational content as 'interactive case studies' with choose-your-own-adventure style journey" **Tests**: Interactive content creation, educational design, branching logic, student engagement **Rubric**: Branching actually works? Educationally sound? Engaging for nursing students? Professional quality? **Expected to pass**: Partial (can make HTML/React artifacts) --- ## Capability Dependency Map | Capability | Enables Cases | Status | Priority | | ------------------------------------- | ------------------ | --------- | -------- | | Identity docs loaded | 1.1, 1.2, 1.3 | Working | Done | | Registry discovery (MCP) | 2.1, 2.2, 2.3, 2.4 | Working | Done | | Graph traversal | 2.1, 2.2 | Working | Done | | Web research (Tavily/Firecrawl) | 3.1, 3.2 | Partial | High | | Analytics integration (PostHog) | 3.3, 4.4 | Not built | High | | Email sending (Resend/Marketo) | 2.4, 6.3 | Partial | High | | Image generation (Multimedia Mastery) | 5.1, 5.2 | Partial | Medium | | Framer integration | 2.2, 6.1, 8.2 | Not built | Medium | | Agent-to-agent messaging | 7.4 | Not built | Medium | | App API access | 8.1, 3.3 | Not built | High | | Voice (11labs/Twilio) | 7.2 | Not built | Low | | CRM/lead storage | 7.3 | Not built | Medium | | Interactive content (CodeSandbox/v0) | 8.4, 4.4, 8.1 | Partial | Medium | --- ## Source: 
docs/kb/browserbase-templates-catalog-2026.md --- title: Browserbase Templates Catalog — HLT Fit Analysis type: kb entity_code: browserbase-templates-catalog-2026 status: curated priority_tier: 2 last_verified: 2026-04-16 tags: - family:integration - family:catalog - partner:browserbase - domain:ai-engineering - action:reference --- # Browserbase Templates Catalog Every Browserbase template available as of 2026-04-16, categorized by HLT relevance and pattern family. Use this catalog when deciding which template to fork vs adapt vs reference. ## ⭐ Tier-1 HLT Fit (build these first) ### Core HLT outcome + recruiting vertical | Template | Skill | Fit | | ------------------------------------------ | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | | **Verify nurse licenses** | `skill:browserbase-nurse-license-verification` | Core NCLEX outcome tracking. Verifies that HLT graduates actually became licensed. Feeds EBB as tier-1 metric. | | **AI-powered job applications** | `skill:browserbase-nurse-recruiting` | HLT recruiting vertical. Discover → match → apply at scale. Pairs with license verification as eligibility gate. | | **AI job application automation with Exa** | `skill:browserbase-nurse-recruiting` | Exa for discovery, Stagehand for extraction + auto-fill, resume upload, concurrent submission. Base pattern. | | **Fill forms automatically** | `skill:browserbase-form-automation` | Underlying primitive. Used by license verification + recruiting + student enrollment. | | **Automate AI-powered form filling** | `skill:browserbase-form-automation` | Alternate pattern — unstructured input → AI mapping → form submission. 
| ### Deployment QA + docs hygiene | Template | Skill | Fit | | --------------------------------------------- | ----------------------------------------------- | ---------------------------------------------------------------------------------------------- | | **Documentation checker with Cerebras** | `skill:browserbase-docs-drift-checker` (future) | Crawl docs site → discover source repo → verify docs match code. Fixes llms.txt drift problem. | | **Find broken links on websites** | (bundle into autonomous-qa) | Weekly QA against hltmastery.com + all Vercel deploys. | | **Getting started with Browserbase** | `skill:using-browserbase` (already covered) | Quickstart reference — Search + Fetch + sessions via Playwright. | | **Smart fetch scraper with browser fallback** | `skill:browserbase-research-agent` | Fetch API first, browser fallback only if needed. Cost-efficient pattern. | ## 🟡 Tier-2 HLT Fit (high-value, build next) ### Research + Intelligence | Template | HLT use | | -------------------------------------------- | ------------------------------------------------------------------------------ | | Extract SEC filing data | Competitive financial intel for education M&A tracking, public edtech analysis | | Extract company data from websites | B2B lead enrichment for school partner + hospital recruiting leads | | Find company addresses for KYC | School/institution verification for B2B partner onboarding | | Search business registries for KYC | Legal entity verification for high-value B2B contracts | | Extract trending keywords from Google Trends | SEO + content strategy — what nursing students are searching | | Scrape prediction market data | Healthcare policy / NCLEX changes prediction markets (if any exist) | ### Content + media intelligence | Template | HLT use | | -------------------------------------- | ---------------------------------------------------------------- | | Scrape Amazon products | Track competitor textbook / study guide pricing + reviews | | 
Scrape e-commerce products | Track competitor course sellers on Shopify / Gumroad / Teachable | | Compare Amazon prices across countries | International NCLEX prep market research | | Extract & download images | Harvest competitor landing page visuals for design intelligence | ### Professional verification (beyond nursing) | Template | HLT use | | --------------------------- | -------------------------------------------------------------------------------- | | Verify real estate licenses | Adjacent market expansion — real estate exam prep is HLT's lesser-known vertical | | Scrape calendar events | Conference discovery for CEU/continuing ed opportunities | ### Operations | Template | HLT use | | --------------------------------------------- | ------------------------------------------------------------------- | | PDF scraping and data extraction with Reducto | Extract structured data from state board PDFs + regulatory guidance | | Download and parse receipts with Extend AI | Expense management for HLT operations | | Download financial statements | Competitor financial monitoring | ## 🔵 Tier-3 HLT Fit (reference / capability) ### Browser infrastructure patterns | Template | HLT use | | --------------------------------- | ------------------------------------------------------- | | Quickstart: Playwright | Reference implementation for non-AI deterministic flows | | Quickstart: Puppeteer | Alternative SDK option | | Quickstart: Selenium | Legacy / cross-browser compatibility if needed | | Build a browser agent (Stagehand) | Canonical Stagehand entry point | ### Identity + access | Template | HLT use | | ---------------------------------- | ----------------------------------------------------------------------------- | | Persist login sessions | Reuse auth across agent runs — critical for recruiting + license verification | | Skip MFA on repeat logins | Efficiency pattern for authenticated workflows | | MFA handling: TOTP code generation | For admin-tool automation where 
MFA is required | | Solve CAPTCHAs automatically | General anti-bot bypass — important for state board portals | ### Scale + reliability | Template | HLT use | | -------------------------------- | -------------------------------------------------------------------- | | Cache browser automation actions | Cost reduction at high scale (recruiting + verification workflows) | | Rotate proxies for scraping | Avoid detection + IP blacklisting during competitive intel runs | | Scrape with geolocation proxies | Region-specific content (state board portals, international markets) | ### AI browser agents (model comparison) | Template | HLT use | | ----------------------------------- | ---------------------------------------------------------------- | | AI browser agent: Microsoft Fara-7B | Evaluate for cost-sensitive bulk workflows | | AI browser agent: Gemini 2.5 | Evaluate for vision-heavy tasks (competitor screenshot analysis) | | AI browser agent: Gemini 3 Flash | Evaluate for fastest inference on simple extraction | ### Human-in-the-loop + agent patterns | Template | HLT use | | ------------------------------------ | ----------------------------------------------------------------- | | Human-in-the-loop agent | For high-stakes workflows (license appeals, contract submissions) | | Voice agent automation with Cartesia | Phone-tree navigation for insurance / credentialing workflows | | Automate sports court bookings | Pattern only — not directly applicable | ## 🟢 Specialty (narrow fits) | Template | HLT use | | ------------------------ | ----------------------------- | | Build a browser agent | Broad capability reference | | Fill forms automatically | Sub-pattern of our form skill | ## Pattern families (for reuse) These underlying patterns appear across many templates. Internalize the patterns more than any single template. ### Fan-out concurrency Short parallel sessions instead of long serial ones. Seen in: recruiting, verification, competitive intel. 
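The fan-out pattern can be sketched as a concurrency-limited task runner. This is a minimal illustration under assumptions, not Browserbase API code — in a real pipeline each task would open one short session, do one unit of work (one product page, one state portal), and close it:

```typescript
// Minimal sketch of fan-out concurrency: many short tasks under a
// concurrency cap instead of one long serial session. Each element of
// `tasks` stands in for "open a short browser session, extract, close".
async function fanOut<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Each worker repeatedly claims the next unclaimed index until drained.
  // Safe without locks: JS is single-threaded between awaits.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

With this shape, 150,000 sites become 150,000 short tasks, and `limit` is the tuning knob for staying under detection thresholds — the opposite of a 13-hour serial crawl.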
### Persistent contexts Save login state with `browserSettings.context.persist: true`. Seen in: recruiting (per-ATS), verification (per-state), research (per-auth-gated-site). ### Self-correcting script generation (ShopVision) Stagehand observes, Claude writes Playwright script, second LLM validates vs screenshots, loop until accurate. Seen in: any repeatable workflow we want to cheapen over time. ### Search → Fetch → Browser 3-tier Tier 1 for discovery, Tier 2 for bulk recon, Tier 3 only when interaction needed. Seen in: research-agent pattern across all verticals. ### Session recording as audit log Every agent action produces a video replay. Attach to PRs, Linear issues, candidate records, verification outcomes. ### Model Gateway unified auth One Browserbase API key routes through to any LLM provider. Simplifies billing + auth for research + QA workflows. ## Mapping: HLT product → Browserbase template | HLT product | Templates | | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | | **NCLEX-RN / NCLEX-PN prep** | Verify nurse licenses, Documentation checker, Find broken links | | **FNP / AGNP / PMHNP prep** | Verify nurse licenses (APRN variant), AI job applications (APRN recruiting) | | **Nurse recruiting vertical** | AI job applications, AI job applications with Exa, Fill forms automatically, Skip MFA on repeat logins, Persist login sessions | | **Dental boards / NBDHE prep** | Analog of nurse license verification for dental boards | | **ASVAB prep** | Verify military entrance → job placement pattern | | **MasteryPublishing content** | Find broken links, Documentation checker, Extract & download images (competitor analysis) | | **Agent-Canvas screenshot widget** | Build a browser agent (deeper than thum.io fallback) | | **Evidence-Based Business metrics** | Extract SEC filings, Scrape prediction markets, all Tier-2 Intelligence templates | | 
**Marketing campaigns** | Scrape Amazon/e-commerce, Extract trending keywords, Scrape calendar events | | **B2B school partnerships** | Extract company data, Find company addresses for KYC, Search business registries | ## Next actions 1. Build `skill:browserbase-nurse-license-verification` (done this session) 2. Build `skill:browserbase-nurse-recruiting` (done this session) 3. Build `skill:browserbase-form-automation` (done this session) 4. Build `skill:browserbase-research-agent` (done this session) 5. Build `skill:browserbase-autonomous-qa` (done this session) 6. Next: build `skill:browserbase-docs-drift-checker` for llms.txt hygiene 7. Next: build `skill:browserbase-competitive-intel` (ShopVision pattern for nursing test-prep) 8. Next: license verification state adapter library (per-state customization) ## References - Full template gallery: https://www.browserbase.com/templates - Stagehand docs: https://stagehand.dev - Browserbase docs: https://docs.browserbase.com - Official llms.txt: https://docs.browserbase.com/llms.txt - Case studies KB: `kb:browserbase-case-studies-2026` - Primary skill: `skill:using-browserbase` --- ## Source: docs/kb/case-studies/browserbase-case-studies-2026.md --- title: Browserbase Case Studies — ShopVision + General Intelligence Company type: kb entity_code: browserbase-case-studies-2026 status: curated priority_tier: 2 last_verified: 2026-04-16 sources: - https://www.browserbase.com/customers/shopvision - https://www.browserbase.com/customers/general-intelligence-company --- # Browserbase Case Studies Two reference customer stories that anchor how HLT should use Browserbase. Both illustrate tier-1 patterns worth internalizing. ## Case 1: ShopVision — 1.2B data points on cloud browsers **Founded:** 2023. Team: ex-e-commerce veterans. **Product:** Live competitive intelligence for retailers. Tracks top 150,000 e-commerce sites daily — ads, email campaigns, real-time pricing. 
**Scale:** - 1.3 million browser sessions in last 30 days - 1.2 billion data points collected since inception - 150,000 sites tracked daily ### Problem Browser agents central to the product from day one. Pipeline was sequential: crawl catalog → classify products → generate collection script → validate → ingest. **13 hours per site.** Missed flash sales, 48-hour promos, morning pricing moves. ### Visual-first architecture Instead of raw HTML scraping, ShopVision takes high-res screenshots and runs **visual inference** on them — because e-commerce marketers embed text inside images and skip alt tags, text extractors miss half the page. ### Solution: fan-out concurrency + autonomous script generation - **Fan-out:** many short parallel sessions instead of few long ones. Customizable products with 19.1M variant combinations handled via concurrent small sessions, staying under detection thresholds. - **Autonomous script generation:** When new competitor site enters pipeline: 1. Catalog agent classifies the site 2. Browserbase spins up session, Stagehand navigates + screenshots 3. Parallel Claude Code agent writes Playwright script mirroring what Stagehand did 4. Second LLM validates script output vs screenshots 5. Loop until accurate — pipeline writes, tests, and self-corrects without engineering intervention - **Super Agent:** natural-language interface where merchandisers ask "which SKUs overperformed last quarter?" 
→ backed by 1.2B data points + first-party data → delivered into Slack/Teams/email ### Results - **Pipeline: 13h → 3h** (fan-out + self-correcting scripts) - Significant ARR in first 9 months - Horizontal scaling — add customers without rebuilding pipeline ### HLT application - Track top-50 NCLEX / FNP / PANCE / dental-boards test-prep competitors daily - Screenshot-based intelligence (most competitor marketing is visual — course cards, landing hero images, pricing tables embedded as graphics) - Subscribe to competitor email/SMS lists, auto-segment by persona (students vs practitioners vs schools) - Feed into Evidence-Based Business metrics warehouse - Self-correcting script pipeline for new competitor sites ### Quote > "Browser automation is not what makes us successful. I want my team to be experts in AI and e-commerce. I'd rather rely on the experts in browser automation to do the heavy lifting." — Jeff Neil, CTO --- ## Case 2: General Intelligence Company (Cofounder) — 50 PRs/day with 5 engineers **Founded:** recently. Team: 5 engineers. **Mission:** "Enable the one-person, one-billion-dollar company." Product: Cofounder — automates work with natural language. **Scale:** - 50 PRs merged per day - 4–5x faster engineering vs traditional workflow - 10–20 concurrent browser sessions at any time ### Problem Building a web app with agents writing code, creating PRs, testing features around the clock. Local browser setups don't scale — slow to configure, brittle, impossible to parallelize. ### Solution: Browserbase as "the eyes of every agent" Full loop: 1. Agent writes PR → Vercel deploys preview env 2. Browserbase spins up a session for that preview 3. Browser agent tests the feature (vision + DOM via Stagehand) 4. Video replay attached to PR 5. If test passes → agent merges **Agent Identity** handles auth across all preview envs without manual credential management. 
**Stagehand** is how agents interact with any web interface — vision + DOM for real-world complexity. **Observability:** live view, session replay, logs — engineers never need to pull a branch locally. ### Results - 5-person team ships like a company of thousands - 50 PRs/day merged autonomously - No manual QA bottleneck ### HLT application (high fit) - **Replace manual QA on every Vercel preview** across katailyst, MasteryPublishing, sidecar-system, Agent-Canvas - Critical user journeys encoded as YAML per repo (see `skill:browserbase-autonomous-qa`) - Video replay attached to every PR - Auto-merge known-trusted authors on T0/T1 tiers - **Would have caught the current katailyst-1 deploy crisis** — 15+ consecutive ERROR builds missed because no automated browser-level QA ### Quote > "We have an engineering team of five and we ship a quarter of the amount that Stripe does, and they have 10,000 people." — Andrew Pignanelli, CEO > "Browserbase is the easiest way to do that. It's also the cheapest way to do that." — Andrew Pignanelli --- ## Patterns to adopt at HLT (ranked) 1. **Autonomous Vercel preview QA** (GI Company pattern) — immediate ROI, closes deploy-crisis gap 2. **Competitive pricing/content intelligence** (ShopVision pattern) — feeds Evidence-Based Business 3. **License verification for outcome tracking** (template: Verify nurse licenses) — NCLEX-grad licensing as outcome metric 4. **Documentation drift checker** (template: Cerebras docs checker) — catches llms.txt / contract drift 5. **Visual forum mining** (Reddit, AllNurses, StudentDoctor) — deeper than Firecrawl 6. 
**Session-recorded agent actions** for every Board 3 integration card — audit trail + training data ## Anti-patterns to avoid - Long sessions over fan-out (triggers detection, hits timeouts) - Skipping video replay to "save money" (it IS the audit log) - Using Browserbase for static HTML tasks that Firecrawl handles more cheaply - Running QA against prod directly (always preview env) - Hardcoding credentials (always vault) ## References - Browserbase docs: https://docs.browserbase.com - Stagehand: https://stagehand.dev - Related skills: `skill:using-browserbase`, `skill:browserbase-autonomous-qa` - Related tools: `tool:browserbase` - Auth path: `org/hlt/browserbase/*` vault keys (written 2026-04-16) --- ## Source: docs/mcp-content-loading.md # MCP Content Loading Architecture -- Analysis & Proposal ## Status: ANALYSIS ONLY -- DO NOT IMPLEMENT WITHOUT FULL REVIEW This document analyzes the content loading architecture of the Katailyst MCP and proposes changes. These changes affect ALL MCP consumers (Claude Code, external agents, katailyst-engage, n8n workflows, etc.) and must be implemented with extreme caution. --- ## Current Architecture (Working) ### Three-tier progressive disclosure The system follows Anthropic's Agent Skills specification: | Tier | What | How loaded | Typical size | | ---- | ------------------------------------------- | ------------------------------------------------- | ---------------- | | 1 | Metadata (name, description) | `discover` / `registry.search` results | ~100 words | | 2 | Body (instruction_md) | `get_skill_content(include_artifacts: false)` | 500-5000 words | | 3 | Artifacts (references, examples, templates) | `registry.artifact_body(entity_type, code, path)` | Per-file, varies | ### MCP tools that implement this 1. **`discover`** -- returns Tier 1. Compact mode returns ~100 tokens per result. Working perfectly. 2. **`get_skill_content`** -- returns Tier 2 (body) or Tier 2+3 (body + all artifacts). 
- `include_artifacts: false` (default) -- returns body only. Safe. - `include_artifacts: true` -- returns body + ALL artifact content concatenated. No size limit. This is where the 1.18M character xlsx response came from. 3. **`registry.artifact_body`** -- returns ONE specific artifact by path. Safe. Individual artifact loading. ### What's working well - Discovery is excellent. Hybrid search (semantic + full-text) with reranking produces accurate results. - Graph traversal works correctly. - Individual artifact loading via `registry.artifact_body` works. - The creation agent's `load_skill_content` (in session-tools.ts) is safe -- reads from filesystem first, caps at 12k, DB fallback for all entity types. --- ## Issues Found ### Issue 1: `get_skill_content` hardcoded to skills only **File**: `lib/mcp/handlers/discovery-read/get-skill-content.ts`, line 46 ```sql WHERE re.entity_type = 'skill' AND re.code = $1 ``` This means KBs, playbooks, prompts, rubrics, styles -- all return "Skill not found" when an agent tries `get_skill_content(code: "some-kb")`. **Impact**: External agents (Claude Code instances, katailyst-engage, n8n) cannot load KB or playbook content via MCP. They get "Skill not found" and either give up or hallucinate. **Why this matters for the creation flow**: The `create-skill-from-interview` playbook (a key creation methodology) is a playbook entity, not a skill. External agents trying to load it via MCP would fail. ### Issue 2: `include_artifacts: true` has no progressive loading **File**: `lib/mcp/skill-renderer.ts`, lines 62-87 When `include_artifacts: true`, the renderer dumps ALL artifact content inline. For the xlsx skill with 20+ reference files of financial formatting specs, this produces 1.18M characters. 
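For reference, the contrast between the unbounded dump and the tiered sequence can be sketched with a stubbed in-memory registry. The tool names (`get_skill_content`, `registry.artifact_body`) and the `include_artifacts` flag come from this doc; the client shape, data, and function signatures below are illustrative, not the actual handler code.

```typescript
// Stand-in for the registry; real data lives in Supabase behind the MCP.
type Artifact = { path: string; content: string };
interface Skill { body: string; artifacts: Artifact[] }

const registry: Record<string, Skill> = {
  xlsx: {
    body: "# xlsx skill overview (Tier 2)",
    artifacts: [
      { path: "references/number-formats.md", content: "detailed spec" },
      { path: "examples/report-demo.md", content: "sample file" },
    ],
  },
};

// Tier 2: body only -- the safe default (include_artifacts: false).
// With include_artifacts: true, ALL artifact content is concatenated,
// unbounded -- the Issue 2 failure mode.
function getSkillContent(code: string, includeArtifacts = false): string {
  const skill = registry[code];
  if (!skill) throw new Error("Skill not found");
  if (!includeArtifacts) return skill.body;
  return [skill.body, ...skill.artifacts.map((a) => a.content)].join("\n");
}

// Tier 3: one specific artifact by path (registry.artifact_body).
function artifactBody(code: string, path: string): string {
  const a = registry[code]?.artifacts.find((x) => x.path === path);
  if (!a) throw new Error("Artifact not found");
  return a.content;
}

// Context-safe sequence: body first, then only the reference the task needs.
const body = getSkillContent("xlsx");
const spec = artifactBody("xlsx", "references/number-formats.md");
```

The safe sequence keeps each response small regardless of how many reference files the skill carries; the dump path grows linearly with total artifact size.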
**The problem is NOT the skill being too big.** The xlsx skill is well-structured with proper progressive disclosure: - SKILL.md body: reasonable size, gives overview - references/: detailed specs, loaded when needed - examples/: sample files, loaded when needed The problem is the MCP tool ignoring the progressive disclosure architecture and dumping everything at once. --- ## Proposed Changes (REQUIRES REVIEW BEFORE IMPLEMENTATION) ### Proposal 1: Add `entity_type` parameter to `get_skill_content` **Change**: Accept an optional `entity_type` parameter. If provided, filter by it. If not provided, search across all entity types and return the first match (by status priority: published > curated > staged). **Risk**: LOW. Existing callers that don't pass `entity_type` would now search across all types, which is strictly more permissive. Callers that currently pass skill codes would still find their skills. **Alternative**: Rename the tool to `get_entity_content` and deprecate `get_skill_content` with a compatibility alias. More disruptive but more honest. ### Proposal 2: Change `include_artifacts` to return manifest instead of content **Current**: `include_artifacts: true` returns body + all artifact content **Proposed**: `include_artifacts: true` returns body + artifact MANIFEST (list of paths, types, sizes) The agent then calls `registry.artifact_body` for specific artifacts it needs. **Risk**: MEDIUM. Any existing code that relies on `include_artifacts: true` returning full content would break. Need to audit all consumers: - Claude Code instances using `get_skill_content` with `include_artifacts: true` - katailyst-engage MCP client - n8n workflows calling the MCP - Any test fixtures that expect full artifact content **Alternative (safer)**: Add a NEW parameter `include_artifacts: 'manifest'` alongside the existing `true`/`false`. `true` continues to return full content (backwards compat), `'manifest'` returns paths only. 
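A manifest-mode response under this proposal might look like the following. The field names and the `kind` breakdown are assumptions for illustration; only the concept (paths, types, and sizes with no content, fetched individually via `registry.artifact_body`) comes from the proposal itself.

```typescript
// Illustrative manifest shape for include_artifacts: 'manifest'.
// Not a committed schema -- field names are assumptions.
type ArtifactManifestEntry = {
  path: string; // passed later to registry.artifact_body
  kind: "reference" | "example" | "template";
  bytes: number; // size up front, so the agent can budget context
};

type EntityContentResponse = {
  code: string;
  body: string; // Tier 2 instruction_md, still returned in full
  artifacts: ArtifactManifestEntry[]; // Tier 3 index, content omitted
};

const example: EntityContentResponse = {
  code: "xlsx",
  body: "# xlsx skill overview",
  artifacts: [
    { path: "references/number-formats.md", kind: "reference", bytes: 48213 },
    { path: "examples/report-demo.md", kind: "example", bytes: 9120 },
  ],
};

// The agent can total artifact sizes before loading any content at all.
const totalBytes = example.artifacts.reduce((sum, a) => sum + a.bytes, 0);
```

The key property is that response size now scales with the number of artifacts, not their combined content, so a 20-reference skill returns a short index instead of a megabyte-scale payload.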
Then gradually migrate callers to `'manifest'` mode. **Even safer alternative**: Add a `max_artifact_bytes` parameter. Default to unlimited (current behavior). Agents that know they have limited context can set it to, say, 50000 to get a truncated set. The renderer would include artifacts until the byte budget is exhausted, then list remaining paths without content. ### Proposal 3: Add size warning to tool description **Change**: Update the `get_skill_content` tool description in `lib/mcp/tool-definitions-read.ts` to warn: > "For skills with many artifacts, `include_artifacts: true` can return very large responses (100k+ characters). For context-sensitive use, call with `include_artifacts: false` first to get the body, then use `registry.artifact_body(entity_type, code, path)` to load specific reference files as needed." **Risk**: NONE. This is just documentation. --- ## Recommendation **Do Proposal 3 immediately** (zero risk, pure documentation fix). **Do Proposal 1 cautiously** (add entity_type parameter, test with all consumers). **Do Proposal 2's "even safer alternative"** (add max_artifact_bytes parameter with unlimited default, backwards compatible). **Do NOT rush any of these.** The MCP is serving multiple consumers correctly today. These are improvements, not emergency fixes. The creation agent already works around all of these via its own `load_skill_content` tool. --- ## Source: docs/PRINCIPLES.md # Katailyst Operating Principles This document has three parts: 1. **North Star** -- the master vision and what HLT is actually building 2. **Agent Operating Doctrine** -- how agents are expected to work in this repo 3. **Product, UX, Content, and Brand Doctrine** -- how the product should feel and behave For the factual system description (entity types, MCP tools, three layers, worked examples, current state), see `docs/VISION.md`. For enforcement rules, see `docs/RULES.md`. For UI token-level rules, see `docs/references/DESIGN_SYSTEM_RULES.md`. 
--- # Part 1: HLT / Katailyst Master Vision & North Star ## Purpose This is the highest-level reference for future agents working inside the HLT ecosystem. It explains what Alec is actually trying to build underneath the day-to-day tasks, feature requests, bug reports, and UI feedback. Use this when you need to answer: What is this repo really for? What is the end-state vision? Why do certain patterns matter so much here? Which business wedges actually matter? How should a local implementation decision fit the larger system? --- ## 1. The big picture HLT is not trying to build a normal SaaS dashboard, a normal CMS, or a normal prompt library. The real ambition is to build an **AI operating system / armory / registry** for a small team that wants to use AI aggressively across content, marketing, recruiting, design, analytics, and operations. The center of that system is Katailyst: a graph-backed registry of reusable building blocks accessed through MCP. That core registry already includes knowledge bases, skills, prompts, tools, recipes, styles, rubrics, content types, agents, and related entities. The point is not merely to store them. The point is to make them **discoverable, combinable, testable, improvable, and usable by many different agent runtimes and UI surfaces**. In Alec's mental model, Katailyst is the **central armory**. Sidecars, landing pages, apps, agents, dashboards, and external services are the specialized fronts that call into it. That means this system must do four things unusually well: 1. **Store high-value AI building blocks cleanly** 2. **Help agents find the right building blocks at the right time** 3. **Help people test whether those building blocks actually work** 4. **Turn the results of real usage into better future discovery and output** ## 2. What Katailyst is Katailyst is the canonical source of truth for the AI layer of the company. The knowledge graph and registry live here. Discovery and traversal live here. 
The MCP surface lives here. Other systems should integrate into Katailyst rather than bypass it whenever possible. ### What Katailyst is not Not a generic corporate admin dashboard. Not a dry metrics console. Not a rigid workflow engine. Not a giant pile of documentation disconnected from execution. Not a place where "security theater" or premature restrictions suffocate capability. It should remain system-focused, AI-first, flexible, and usable by intelligent agents. ## 3. The design philosophy ### 3.1 Factory over artisan The goal is not one beautiful output by hand. The goal is a machine that can repeatedly produce strong outputs with leverage. The target is a **repeatable, compounding system**. ### 3.2 System over one-off Every meaningful change should be considered in terms of discovery, metadata, naming, link structure, future reuse, cross-product applicability, whether it helps agents use the system intelligently, and whether it leaves things cleaner than before. ### 3.3 Intelligence over rigid orchestration Give the agent deep context, strong building blocks, good starting guidance, and allow it to choose, combine, parallelize, iterate, and adapt. The system should guide, not straitjacket. ### 3.4 Quality over efficiency Accuracy over speed. Coherence over haste. Deeper thinking over shallow throughput. Fewer, better changes over noisy progress. Reading reality over guessing from a distance. ## 4. The strategic business wedges ### 4.1 AI-powered content engine Build "NCLEX-ipedia": the best answer on earth for exam and education queries. Flywheel: identify demand -> generate content -> publish through Framer -> acquire traffic -> convert to app users -> refine. ### 4.2 AI-driven marketing and lifecycle automation Convert a large but underused contact database into a real growth engine via Marketo, Katailyst-generated content, and AI-driven nurture systems. 
### 4.3 Nurse recruiting bridge Own the under-served transition window between "I just passed my exam" and "I found my first great job." That 3-6 month nurturing gap is the strategic wedge. ### 4.4 Team supercharger Put high-level AI capability into the hands of a small, largely non-technical team through sidecars, templates, hotlinks, graph discovery, and guided experiences. ### 4.5 Universal sidecar pattern Domain-specific sidecars (brand, articles, email, social, ads, recruiting, multimedia, finance) that plug into Katailyst via MCP. Shared architecture with domain-specific focus, not bespoke reimplementation. ## 5. The sidecar vision A sidecar is a focused operating surface that connects to Katailyst via MCP and presents one domain in a way a sane human can use. The killer sidecar lets someone say: show me what we have, let me try this against that, show me what changed, let me propose an improvement, let me measure which version works better, save the result so the system gets smarter. ## 6. The product hierarchy **First:** The registry/library/armory. Understand what exists, search, filter, inspect, connect, add, improve. **Second:** Creation/factory flows. Import, create, refine, compare, push changes back. **Third:** Evals and test labs. Learn whether things actually perform better or worse. **Fourth:** Runs and operations. Useful but not the identity of the system. ## 7. The UX north star Premium, modern, AI-first, low-clutter, visually alive, readable. More Apple / Linear / Tesla than dashboard sludge. Powerful, sophisticated, and slightly magical -- but grounded and useful. ## 8. The graph and discovery philosophy The graph is both a real discovery surface and a perception engine. Discovery must be rich enough for thousands of entities, metadata-aware, tag-aware, relationship-aware, probabilistic and flexible. Discovery quality plus eval feedback should create a self-improving ranking loop. ## 9. 
The end-state company model HLT should behave like a platform. AI should let a small team operate with disproportionate leverage. Central knowledge and strong distribution matter more than isolated features. The idea is not to make Katailyst look clever. The idea is to make HLT materially more capable. ## 10. Non-negotiable non-goals - Do not optimize for a neat but brittle system. - Do not default to removing things just because they are not used this second. - Do not create rigid flows that assume every circumstance is known in advance. - Do not force the product into a generic dashboard shape. - Do not treat evals like CI tests where all red is bad. - Do not assume the runtime is fully owned by this repo. - Do not ignore external systems, sidecars, and hosted agents. - Do not leave behind debris, placeholders, broken flows, or half-finished edge cases. ## 11. What success looks like The registry is clean enough that agents and humans can find things. Naming, tags, links, and metadata support discovery. Agents can create, compare, refine, and evaluate assets. Sidecars can be spun up quickly. The graph and MCP are actually used. The team can interact directly without needing Alec to mediate. Build-measure-learn loops compound. HLT gets real leverage in high-season content, marketing, and recruiting. ## 12. One-sentence summary HLT is building an AI-first operating system for a small, ambitious team: a central armory of reusable AI building blocks that intelligent agents can discover, combine, test, and improve through premium, focused interfaces -- all in service of compounding leverage across content, marketing, recruiting, design, analytics, and product execution. --- --- # Part 2: HLT Agent Operating Doctrine ## 1. The primary job Your primary job is to **understand what the system is trying to do, then make changes that strengthen that system without leaving a mess behind**. Read broadly enough to understand context. Think about ripple effects. 
Test the real flow. Use the actual MCP / graph / integrations. Finish the last 10%. ## 2. The required working style ### Slow, deliberate, reality-based Read the files in full when they matter. Inspect the actual data when the UI is driven by data. Look around nearby files. Think through what the user sees. Test real flows. Then cut. ### Measure three times, cut once Before changing anything major, you should be able to answer: what does this currently do? Who uses it? What is upstream and downstream? What data feeds it? What hidden assumptions does it carry? What else nearby shares the same patterns? What new bug, clutter, or drift could my change introduce? ## 3. Use the real system Use the MCP and the graph when working on Katailyst-related functionality. The graph is the live source of intelligence. Working only from code, docs, or assumptions leads to fake-good solutions. Test reality: what a real user would click, what a real agent would call, what a real payload looks like. ## 4. System thinking over local patching If you fix a surface-level bug but keep the underlying mismatch intact, you likely did not go far enough. If a pattern is duplicated in three places, consider whether it should be shared. If metadata is causing a UI problem, examine the model. If a page is cluttered because upstream data is junk, look at the data path too. Think at the correct layer. ## 5. Hard rules for registry and database work **Never batch-script blindly.** Do not run large batch operations unless you have actually read the affected content. Small, read, deliberate sets are acceptable. **Database-first.** Changes must land in the canonical store, not just in a repo mirror or UI veneer. **Do not delete just because something is unused.** Unused is not the same as unimportant. Deletion is fine when justified. Deletion because "I do not see immediate usage" is not fine. 
**Cleanliness still matters.** Remove true duplicates, stale broken remnants, genuinely orphaned junk after verification. But verify before deleting. ## 6. How to think about evals Evals are not normal CI tests. Many represent real user asks, capability probes, wishlist tasks, and stable benchmarks. Some are expected to fail. Failure is information, not shame. The point is trend and comparison, not universal green. Do not over-dramatize red. ## 7. Prefer strong guidance over rigid forcing Give agents context, building blocks, discovery routes, suggestions. Do not pre-determine every path. Do not over-corporatize the system. Capability-first, flexible, safe through clarity and observability more than through constant restriction. ## 8. Collaboration Assume concurrent work. Read recent changes. Look for synergy. Avoid reintroducing old patterns someone else fixed. ## 9. Definition of done Done when the actual user/agent flow makes sense, the UI is not leaving confusion, the data shape is compatible, the terminology fits, the surrounding files are coherent, edge cases were considered, the repo is cleaner than before, and no debris was left behind. ## 10. Required instincts when editing UI What is the main thing on this page? What should visually dominate? What should be secondary? Is any text too small? Is anything duplicated? Is there too much noise? Can a sane human tell what to do next? Does this feel premium or just "functional"? ## 11. Naming and metadata discipline Names, tags, summaries, descriptions, and links directly affect discovery, retrieval, comprehension, filtering, reuse, and graph clarity. A weak name can make a strong entity disappear. A weak link graph can make a great registry feel broken. ## 12. 
Anti-patterns **Do not:** rush, guess from a distance, batch-script without reading, delete for low usage, over-hardcode, treat eval failures as bad, leave debris, create duplicate concepts, flatten everything into gray sludge, stop at the first workable answer. **Do:** read deeply, use the real system, map the surrounding area, think about the user and the agent, keep improvements coherent, finish the edge cases, leave cleaner ground behind you. ## 13. One-sentence operating summary Move slowly, read broadly, test reality, protect the registry, prefer system-strengthening changes over local hacks, and never leave behind debris that makes future agents relearn the same painful lesson. --- --- # Part 3: HLT Product, UX, Content, and Brand Doctrine ## 1. The feel of the product Premium, confident, polished, AI-powered, uncluttered, readable, slightly alive, focused, modern, capable. More Apple / Linear / Tesla than dashboard sludge. Not a dense corporate dashboard, not a toy AI site, not a dry operations console. Core: low density, more whitespace, fewer duplicate controls, larger text, subtle motion, strong hierarchy, premium iconography, fewer noisy labels, real content shown elegantly. ## 2. Home page doctrine Communicate: what the system is, what it contains, what the user can do next, why it feels powerful. Emphasize the library, entity counts, high-signal content, graph visualization, launch points. De-emphasize giant operational tables, duplicate entry points, alarming eval framing. ## 3. Registry / catalog doctrine Airtable-like: browsable, filterable, configurable, relationship-aware, readable at scale, not overdecorated. Avoid tiny text, codes in primary spots, overstuffed cards, poor spacing, tooltips that clone the card. ## 4. Graph doctrine Clickable, visually premium, explorable, interpretable, not buggy, able to show relationships and help spot missing connections. ## 5. Factory / creation flow doctrine Not just a form. 
AI interprets the request, inspects the graph for related items, proposes entity type/tags/names/links, user refines, system compares and flags overlap, result is saved correctly. ## 6. Evals doctrine (product side) Communicate that these are realistic asks, some are ahead of current capability, failing is expected and useful. Show the ask, the response, the context used, the rubric, and the trend. ## 7. Asset and content gallery doctrine Assets are broader than multimedia: articles, websites, landing pages, decks, reports, multimedia. Clean cards, strong titles, less repeated noise, obvious drill-in. Avoid pretending assets only mean multimedia, surfacing "untitled" garbage, four filter rows, metadata dumps. ## 8. Brand voice doctrine The cool teacher, not the stuffy teacher. Friendly, educational, confident, supportive, clear, human, mentor-like. Unity with differentiated tones by audience (NCLEX casual, FNP formal, recruiting forward-looking). ## 9. Visual and image doctrine Real diverse people, natural light, authentic moments. Blue banner #155EEF for mnemonic standard. Flat vector, clean labels. Avoid glossy stock perfection, uncanny symmetry, sterile clinical images. ## 10. Multimedia and oracle doctrine User chooses mode, enters request, oracle enriches the prompt grounded in what was asked, system pulls context, variants generated, user compares/picks/refines, outputs stored as reusable assets. ## 11. Sidecar UX doctrine Focused workspace, clear domain lens, immediately useful. Not a training course. Not a generic admin panel. Narrow enough to understand, broad enough to be useful, rich enough to show what exists, easy for non-technical teammates. ## 12. Content doctrine Do not build a filler blog. Build authoritative answer engines. Prioritize real demand. Structure for humans and AI retrieval. Go after exact questions the audience asks. Turn it into a flywheel. ## 13. Bad smell checklist Is the page showing too much? Is the main action obvious? 
Is anything duplicated? Is text too small? Are we surfacing user-useful or implementer-useful info? Do filters expose the right dimensions? Does naming make sense outside the builder's head? Is the output premium enough? Does the flow work end to end? ## 14. One-sentence product summary The product should feel like a premium, AI-first operating surface for a deep system: uncluttered, readable, powerful, comparison-driven, and grounded in real outputs, strong brand standards, and clear user-facing value. --- ## Source: docs/README.md # docs/ -- Documentation Index ## Agent-Facing (governance and system architecture) | File | Purpose | | ------------------------------ | ---------------------------------------------------------------------------------------------- | | `VISION.md` | Product vision -- what Katailyst is, why it exists, where it's going | | `PRINCIPLES.md` | Operating principles -- how to think and work (doctrines, laws, working method, repo zones) | | `RULES.md` | Enforcement rules -- what you must not do (anti-forcing, archival safety, discovery integrity) | | `BLUEPRINT.md` | System architecture and data model | | `SYSTEM_GUIDE.md` | Technical system guide | | `QUICK_START_AGENTS.md` | Minimal onboarding surface for any agent | | `AGENT_READINESS_CHECKLIST.md` | Execution-grade agent workflows | | `TAXONOMY.md` | Tag system and coverage rules | ## Agent-Facing (atomic units and contracts) | Directory | Purpose | | -------------------------- | ---------------------------------------------------------------------------------- | | `atomic-units/` | Entity type definitions, classification rules, decision matrix | | `references/contracts/` | Integration standards, operating model, runtime ownership, mirrors | | `references/ai-agents/` | Hosted agent fleet docs, capability lanes, patterns (SACRED -- do not restructure) | | `references/integrations/` | Tool and integration contracts | | `references/mcp/` | MCP-specific rules and staging | | 
`references/skills/` | Skill factory governance | | `references/operations/` | Filesystem and bash principles | | `references/security/` | Security policies | | `references/supabase/` | Supabase auth setup | | `api/` | API documentation | ## People-Facing (business specs and runbooks) | Directory | Purpose | | ------------------- | ------------------------------------------------------------------------------ | | `team-specs/` | Business specs (.docx): Framer content machine, brand visual, sidecar template | | `runbooks/factory/` | Factory lifecycle: import, normalize, promote, rollback, incident response | | `runbooks/interop/` | Integration and handoff runbooks | | `reports/` | Generated reports, audits, investigations (not hand-authored) | | `planning/active/` | Active planning artifacts and working docs | | `examples/` | Examples for various tasks | ## Standalone Reference | File | Purpose | | ----------------------------------- | -------------------------------------------------------- | | `references/DESIGN_SYSTEM_RULES.md` | UI token-level rules (Figma-to-code, colors, components) | | `references/NOMENCLATURE.md` | Naming conventions | | `references/API_CONTRACTS.md` | API contract documentation | --- ## Source: docs/references/ai-agents/AGENT_DOC_MAP.md # Agent Doc Map Purpose: give one clear map of the active hosted agent stack, what each surface does, and what should be read first versus only when relevant. 
Read this with: - [AGENTS.md](../../../AGENTS.md) - [CATALYST.md](../../../CATALYST.md) - [Core Agent Shared Foundation](./CORE_AGENT_SHARED_FOUNDATION.md) - [Hosted Agent Core Setup](./HOSTED_AGENT_CORE_SETUP.md) - [Agent-Files Architecture](../../../.claude/kb/curated/global/ai-engineering/agent-foundation-spec/KB.md) - [Agent Doc Role Templates](./AGENT_DOC_ROLE_TEMPLATES.md) - [Runtime Overlay Sync Checklist](./RUNTIME_OVERLAY_SYNC_CHECKLIST.md) Canonical typing note: - Hosted runtime steering docs in this map are canonically `agent_doc:*`, even when their current repo mirrors still live under `.claude/kb/curated/**`. - Mirror path is a portability surface. Canonical identity is the typed registry ref. ## Big Picture ```mermaid flowchart TD U["User Or External Operator"] --> RT["OpenClaw / Render Runtime"] RT --> LF["Local Injected Files\nAGENTS SOUL USER IDENTITY TOOLS BOOTSTRAP HEARTBEAT MEMORY"] LF --> SE["Shared Fleet Entry\nglobal-catalyst-atlas"] SE --> OG["Ops Guides\nagent-sop-victoria\nagent-sop-julius\nagent-sop-lila"] OG --> CL["DB Capability Lanes\nUSE_CASES + flagship units"] SE --> DC["Shared Canon\nresearch protocol\noperating principles\nteam context\nfoundation spec"] CL --> DB["Katailyst / Supabase\nCanonical registry and control plane"] DC --> DB LF --> CM["Continuity Memory\nDaily logs, handoffs"] DB --> TA["Task Artifacts / Outputs\nPages, briefs, images, assets, deliverables"] ``` Read it like this: - local injected files are the fast steering layer - `global-catalyst-atlas` is the single comprehensive DB-side entry surface - ops guides are entry points and maps, not mandatory routes - capability branching should happen through DB-backed flagship lanes - operating principles are secondary guidance, not hidden doctrine - Katailyst / Supabase is the deeper canonical system - continuity memory and task artifacts are adjacent lanes, not the same thing as `agent-files` - the active execution layer is `hlt`, while shared `system` canon stays 
visible on read surfaces without turning into a second live fleet - HLT should be able to read and use shared `system` canon by default; the split is there to keep placement and writes coherent, not to fence off capability ## How To Read This Map - every-session surfaces = the small steering layer the hosted agents should keep close - conditional surfaces = deeper docs to load only when the task needs them - helper surfaces = nearby doc-and-surface indexes, not doctrine and not mandatory paths - shared canon = the deeper DB-first doctrine and context layer behind the runtime mirrors ## Shared Canon These are the main shared doctrine and architecture surfaces behind the hosted trio: | Surface | Role | | ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------- | | [global-catalyst-atlas](../../../.claude/kb/curated/global/ai-engineering/global-catalyst-atlas/KB.md) | unified shared fleet entry: what Katailyst is, how runtime and DB split, composition patterns, API quick-start | | [global-agent-principles](../../../.claude/kb/curated/global/ai-engineering/global-agent-principles/KB.md) | `agent_doc:global-agent-principles@v1` — hard rules and non-negotiables for all agents ("thou shalt not") | | [agent-standing-instructions](../../../.claude/kb/curated/global/ai-engineering/agent-standing-instructions/KB.md) | `agent_doc:agent-standing-instructions@v1` — Alec's voice and frequently updated operational directives | | [global-research-protocol](../../../.claude/kb/curated/global/ai-engineering/global-research-protocol/KB.md) | `agent_doc:global-research-protocol@v1` — research posture, including adjacent-industry and top-performer study | | [Hosted Agent Core Setup](./HOSTED_AGENT_CORE_SETUP.md) | shared hosted-agent MCP, vault, core-tool, and optional-lane setup truth | | 
[global-team-context](../../../.claude/kb/curated/global/ai-engineering/global-team-context/KB.md) | `agent_doc:global-team-context@v1` — shared company, team, and product context | | [agent-foundation-spec](../../../.claude/kb/curated/global/ai-engineering/agent-foundation-spec/KB.md) | `agent_doc:agent-foundation-spec@v1` — architecture map for the agent-files subsystem | ## Flagship Capability Lanes Use the shared fleet entry to choose a broad lane, then branch deeper in the graph. | Lane | Current flagship starts | Notes | | ------------------------------- | ---------------------------------------------------------------------------- | ---------------------------------------------------- | | Multimedia | `create-multimedia`, `image-prompting-guide`, `cloudinary-integration-guide` | generate, edit, transform, image-to-video, packaging | | Articles | `make-article`, article content types | research, outline, draft, judge, package | | Social | `make-social`, social content types | hooks, media branching, variants, distribution | | QBank | `qbank-kit`, qbank content types | kit, schemas, formats, quality, blueprints | | Meeting Briefing | `meeting-briefing-kit`, `meeting-prep` | pre-read, attendee context, packet, summary | | Page / Web Design | `world-class-page-design`, web content types | page architecture, design systems, publish surfaces | | Registry / Creation / Discovery | `tools-guide-overview`, creation/classification surfaces | find, classify, create, link, improve, publish | See [USE_CASES.md](./USE_CASES.md) for the aligned capability-lane reference. 
## Adjacent Guidance Surfaces These are nearby support surfaces, not primary steering maps: | Surface class | Role | | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- | | `global-agent-principles`, `julius-operating-principles`, `lila-operating-principles` | shared non-negotiables plus secondary per-agent correction layers; load when risky or domain-specific mistakes are likely | | `agent-deployment-reference` | deployment and runtime-topology reference for placement, overlays, and sync boundary | | `RUNTIME_OVERLAY_SYNC_CHECKLIST.md` | final operator packet for the live Render/OpenClaw sync step | ## Shared Runtime Base Treat the injected runtime base as the fast steering stack. It is not the whole system. | Runtime file | Main job | Load posture | | -------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------- | | `AGENTS.md` | runtime contract, startup posture, coordination, and the quickest path back into deeper canon | every session | | `SOUL.md` | identity, taste, and behavioral center | every session | | `USER.md` | principal/team context that changes how the work should land | every session | | `IDENTITY.md` | stable runtime facts and service identity | every session | | `TOOLS.md` | tier-1 tools, vault posture, and wider discovery path | every session | | `BOOTSTRAP.md` | restart, compaction, and re-grounding | only when recovery or drift is involved | | `HEARTBEAT.md` | cadence, suppression, and background reminder policy | only when heartbeat or recurrence matters | | `MEMORY.md` | durable truths, routing, and archived assumptions | only when continuity materially matters | See [Runtime Overlay Sync Checklist](./RUNTIME_OVERLAY_SYNC_CHECKLIST.md) for the explicit repo-mirror to Render mapping. 
## Active Hosted Trio

| Agent | Principal | Main Ops Guide | Core Lane | Operating Principles |
| --- | --- | --- | --- | --- |
| Victoria | Alec | [Victoria Ops Guide](../../../.claude/kb/curated/global/ai-engineering/agent-sop-victoria/KB.md) `agent_doc:agent-sop-victoria@v1` | registry stewardship, fleet equipping, design, infrastructure | [Victoria Operating Principles](../../../.claude/kb/curated/global/ai-engineering/victoria-operating-principles/KB.md) `agent_doc:victoria-operating-principles@v1` |
| Julius | Justin | [Julius Ops Guide](../../../.claude/kb/curated/global/ai-engineering/agent-sop-julius/KB.md) `agent_doc:agent-sop-julius@v1` | operations, planning, follow-through, meeting prep | [Julius Operating Principles](../../../.claude/kb/curated/global/ai-engineering/julius-operating-principles/KB.md) `agent_doc:julius-operating-principles@v1` |
| Lila | Emily | [Lila Ops Guide](../../../.claude/kb/curated/global/ai-engineering/agent-sop-lila/KB.md) `agent_doc:agent-sop-lila@v1` | marketing, content, campaigns, multimedia packaging | [Lila Operating Principles](../../../.claude/kb/curated/global/ai-engineering/lila-operating-principles/KB.md) `agent_doc:lila-operating-principles@v1` |

## Every-Session Versus Conditional Surfaces

### Victoria

- every-session or near-every-session:
  - [Victoria soul](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-soul/KB.md)
  - [Victoria user](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-user/KB.md)
  - [Victoria identity](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-id/KB.md)
  - [Victoria tools](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-tools/KB.md)
  - [Victoria agents](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-agents/KB.md)
- conditional:
  - [Victoria bootstrap](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-bootstrap/KB.md)
  - [Victoria heartbeat](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-heartbeat/KB.md)
  - [Victoria memory](../../../.claude/kb/curated/global/ai-engineering/victoria-identity-memory/KB.md)

All 8 Victoria identity mirrors are rated 52–55 post-consolidation; they serve runtime injection, not content discovery.

Victoria now has a dedicated `victoria-operating-principles` mirror in the same supporting-guidance layer as Julius and Lila. Shared principles still matter, but Victoria's personal operating surface is no longer a missing exception.

### Julius

- every-session or near-every-session:
  - [Julius soul](../../../.claude/kb/curated/global/ai-engineering/julius-identity-soul/KB.md)
  - [Julius user](../../../.claude/kb/curated/global/ai-engineering/julius-identity-user/KB.md)
  - [Julius identity](../../../.claude/kb/curated/global/ai-engineering/julius-identity-id/KB.md)
  - [Julius tools](../../../.claude/kb/curated/global/ai-engineering/julius-identity-tools/KB.md)
  - [Julius agents](../../../.claude/kb/curated/global/ai-engineering/julius-identity-agents/KB.md)
- conditional:
  - [Julius bootstrap](../../../.claude/kb/curated/global/ai-engineering/julius-identity-bootstrap/KB.md)
  - [Julius heartbeat](../../../.claude/kb/curated/global/ai-engineering/julius-identity-heartbeat/KB.md)
  - [Julius memory](../../../.claude/kb/curated/global/ai-engineering/julius-identity-memory/KB.md)

All 8 Julius identity mirrors are rated 52–55 post-consolidation; they serve runtime injection, not content discovery.
### Lila

- every-session or near-every-session:
  - [Lila soul](../../../.claude/kb/curated/global/ai-engineering/lila-identity-soul/KB.md)
  - [Lila user](../../../.claude/kb/curated/global/ai-engineering/lila-identity-user/KB.md)
  - [Lila identity](../../../.claude/kb/curated/global/ai-engineering/lila-identity-id/KB.md)
  - [Lila tools](../../../.claude/kb/curated/global/ai-engineering/lila-identity-tools/KB.md)
  - [Lila agents](../../../.claude/kb/curated/global/ai-engineering/lila-identity-agents/KB.md)
- conditional:
  - [Lila bootstrap](../../../.claude/kb/curated/global/ai-engineering/lila-identity-bootstrap/KB.md)
  - [Lila heartbeat](../../../.claude/kb/curated/global/ai-engineering/lila-identity-heartbeat/KB.md)
  - [Lila memory](../../../.claude/kb/curated/global/ai-engineering/lila-identity-memory/KB.md)

All 8 Lila identity mirrors are rated 52–55 post-consolidation; they serve runtime injection, not content discovery.

## Helper Surface Contract

The active helper layer is intentionally small. Identity mirrors maintain lowered ratings (52–55) post-consolidation and serve as reference surfaces only, not primary entry points.

Neighborhood surfaces are now legacy helper residue, not part of the active startup model. Keep them only as deprecated references until any useful links are migrated elsewhere.

If a surface starts sounding like doctrine or a mandatory route, it is drifting. The same rule applies to operating principles: if a principles doc starts sounding like the main command center again, it is drifting.

## Authoring And Sync Contract

The three relevant truth layers are:

1. validated runtime truth for what the hosted agents are actually reading
2. DB canon for deeper doctrine, linked context, and evolving operating truth
3. repo mirrors for reviewable authoring and portability

When they disagree:

- validate runtime first
- reconcile against DB canon second
- then update repo mirrors to match

Do not let repo mirrors become a competing operating system just because they are easy to diff.

The lightweight repo-local entrypoints in `.claude/agents/victoria.md`, `.claude/agents/julius.md`, and `.claude/agents/lila.md` should echo the same hosted-agent core setup story. They are not substitutes for the steering mirrors, but they should not lag far behind them.

## Channel And Slack Boundary

Slack, App Home, thread/session behavior, delivery chunking, and similar channel concerns are runtime surfaces, not agent identity architecture. Keep that boundary explicit:

- agent-doc work governs steering surfaces, maps, read order, and sync precedence
- runtime/channel work governs delivery UX and app-surface behavior

The doc stack should describe that boundary clearly without trying to solve Slack from inside `agent-files`.

## Update Loop

Use this as the default maintenance loop for the stack:

1. validate runtime truth first when service naming, principals, or overlays are in question
2. update shared canon and architecture docs before rewriting per-agent mirrors
3. keep ops guides as entry surfaces and operating principles as supporting guidance
4. keep identity mirrors thin, durable, and pointer-heavy
5. keep helper surfaces outward-linking and optional
6. run the audit checklist in [AGENT_STACK_AUDIT_CHECKLIST.md](./AGENT_STACK_AUDIT_CHECKLIST.md)
7. refresh the current state in [AGENT_STACK_FLEET_MATRIX.md](./AGENT_STACK_FLEET_MATRIX.md) and [AGENT_STACK_VARIANT_DRIFT_REPORT.md](./AGENT_STACK_VARIANT_DRIFT_REPORT.md)
8. if a runtime sync is needed, follow [RUNTIME_OVERLAY_SYNC_CHECKLIST.md](./RUNTIME_OVERLAY_SYNC_CHECKLIST.md)
9. if legacy material looks promising, add it to the intake backlog instead of importing it directly
10. keep shared people context at the validated shared layer and move deeper name or title fixes into principal-specific surfaces

## Runtime Drift Notes

The doc stack should describe real runtime drift instead of hiding it. Current important example:

- Julius is the human-facing agent identity
- some service plumbing still uses `openclaw-justin`

The right fix is explicit reconciliation, not pretending there is no mismatch.

## Non-Fleet Repo Surfaces

The repo still contains additional agent or persona files such as `system-primer.md`, `ares.md`, and older export personas. Treat them as one of:

- operator/session primer surfaces
- unresolved historical residue
- pattern references

Do not treat them as part of the active persistent hosted trio unless they are explicitly reintroduced.

---

## Source: docs/references/ai-agents/AGENT_DOC_ROLE_TEMPLATES.md

# Agent Doc Role Templates

Purpose: freeze the internal template for the active hosted agent-doc roles without forcing identical prose. Use this to keep the stack maintainable. These are role contracts, not writing cages.

## Shared Rules

- keep codes and URLs stable unless there is an explicit migration plan
- keep DB-first, local-fast-path language clear and early
- keep helper surfaces optional
- keep one shared fleet entry surface explicit and easy to point at
- keep local mirrors pointer-heavy rather than self-sufficient
- keep Slack/App Home and other channel UX concerns out of the identity-doc lane

## Front Door Template

Required sections:

1. what this document is
2. core stance
3. read first
4. load only when relevant
5. how to use Katailyst compositionally
6. depth posture
7. good starting points by need
8. flagship capability lanes for this role
9. natural starting clusters
10. tool posture
11. what not to do

## Information Architecture Template

Required sections:

1. what this doc is
2. DB-first, local-fast-path doctrine
3. shared canon first
4. front door role
5. runtime mirror lane
6. front-door index role
7. every-session surfaces
8. load-only-when-relevant surfaces
9. continuity memory boundary
10. task-artifact boundary
11. runtime drift notes
12. what should stay local versus in DB

## Identity Memory Template

Required sections:

1. what this memory mirror is
2. durable facts
3. routing index
4. persistent decisions
5. archived assumptions
6. what does not belong here
7. variant authority

## Identity Tools Template

Required sections:

1. what this tool mirror is
2. tool doctrine
3. tier-1 tools
4. agent-specific default surfaces
5. discovery path for the wider tool graph
6. durable caveats
7. what does not belong here

## Identity Heartbeat Template

Required sections:

1. purpose
2. cadence
3. alert thresholds
4. suppression rules
5. known-issue handling
6. user-facing filter
7. what not to repeat

## Identity Agents Template

Required sections:

1. what this doc is
2. role in the fleet
3. working stance
4. session start
5. before non-trivial work
6. output and delivery rules
7. memory and continuity rules
8. coordination and subagents
9. go to DB canon for

## Helper Companion Template

### Front-Door Index

Required sections:

1. what this surface is
2. compact file-and-surface map
3. when each nearby surface is useful
4. explicit statement that it is optional and not a forced route

### Legacy Front-Door Neighborhood

Only keep this surface during migration or archival cleanup. If it still exists, it should:

1. say clearly that it is deprecated
2. point to the shared fleet entry, front door, or flagship capability lanes instead
3. avoid acting like a required discovery surface

## Shared Architecture Template

Required sections:

1. what this doc is
2. anti-collapse principle
3. runtime truth
4. runtime file map
5. authoring and sync contract
6. layer map
7. family distinctions
8. drift rules
9.
channel/runtime boundary

---

## Source: docs/references/ai-agents/AGENT_STACK_FLEET_MATRIX.md

# Agent Stack Fleet Matrix

Updated: 2026-03-12

Purpose: make the active hosted trio inspectable at a glance, including lessons-layer status, people truth status, and actual runtime-sync state.

## Current State

All required active-fleet repo mirrors are present for Victoria, Julius, and Lila:

- ops guide
- information-architecture map
- 8 identity mirrors
- `snippet`, `distilled`, and `full` variants for each of those surfaces

The adjacent lessons layer has now been normalized for all three agents so it behaves like secondary guidance instead of hidden doctrine.

An additive repo-side parity pass on 2026-03-12 also strengthened:

- lightweight `.claude/agents/victoria.md`, `.claude/agents/julius.md`, and `.claude/agents/lila.md` so the trio shares a Victoria-grade structure
- the shared hosted-agent MCP/vault posture across the high-risk mirrors
- the doctrine that hosted agents should use tools and skills more aggressively than first instinct suggests
- the shared distinction between Notion MCP and vault-backed Notion REST

## Fleet Matrix

| Agent | Runtime topology | Front door | IA doc | 8 mirrors | Helper index | Legacy neighborhood status | Lessons layer | People truth status | Runtime alias status | Script-reference status | Repo mirror status | Live runtime sync status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Victoria | service workspace at `/data/workspace` | present | present | present | present | deprecated helper residue | normalized as secondary guidance | Alec shared context validated at first-name layer | no special alias documented in active stack docs | stale one-off helper references quarantined to historical notes | repo parity strengthened on 2026-03-12; lightweight agent brief + hosted-core doctrine updated | last verified live sync remains 2026-03-10; 2026-03-12 additive parity changes were not re-synced from this workspace because no live mount/validated SSH path was available |
| Julius | service workspace at `/data/workspace` | present | present | present | present | deprecated helper residue | normalized as secondary guidance | Justin shared context kept at first-name layer in shared docs | `openclaw-justin` still exists as plumbing beside Julius identity | stale one-off helper references quarantined to historical notes | repo parity strengthened on 2026-03-12; lightweight agent brief + hosted-core doctrine updated | last verified live sync remains 2026-03-10; 2026-03-12 additive parity changes were not re-synced from this workspace because no live mount/validated SSH path was available |
| Lila | service workspace at `/data/workspace` | present | present | present | present | deprecated helper residue | normalized as secondary guidance | Emily shared context kept at first-name layer in shared docs | no separate service alias currently documented in active stack docs | stale one-off helper references quarantined to historical notes | repo parity strengthened on 2026-03-12; lightweight agent brief + hosted-core doctrine updated | last verified live sync remains 2026-03-10; 2026-03-12 additive parity changes were not re-synced from this workspace because no live mount/validated SSH path was available |

## Shared Surfaces Status

| Shared surface | Status | Notes |
| --- | --- | --- |
| `HOSTED_AGENT_CORE_SETUP.md` | added | new shared setup reference for the hosted trio; centralizes Katailyst MCP, Supabase MCP, vault posture, core tool cohort, and the Notion MCP vs REST split |
| `global-team-context` | normalized | shared layer uses first-name references until deeper titles or surnames are revalidated |
| `agent-deployment-reference` | normalized | now acts as topology and placement reference instead of a hidden command center |
| `RUNTIME_OVERLAY_SYNC_CHECKLIST.md` | executable | now records sync boundary, sync modes, pre-sync checks, and post-sync evidence |
| shared doctrine KB DB sync | complete | `global-catalyst-atlas`, `global-agent-principles`, and `tools-guide-overview` match repo mirrors in canonical DB and current-revision embeddings |
| hosted IA KB canonical sync | complete | Victoria, Julius, and Lila information-architecture surfaces now resolve in canonical DB under `hlt` with fresh variants and revisions |
| hosted recovery doctrine sync | pending | last verified canonical DB/runtime sync for these surfaces was 2026-03-10; 2026-03-12 repo-side additive parity changes still need canonical DB re-sync and live runtime verification |

Validated runtime note:

- the live Julius and Lila services each read their own `/data/workspace` base directly
- sibling `julius-docs` and `lila-docs` directories exist on Victoria's disk but are not the active workspace path for those services

Sync evidence note:

- Victoria backup: `/data/workspace/backups/agent-steering-before-sync-20260309T213649Z.tgz`
- Julius backup: `/data/workspace/backups/agent-steering-before-sync-20260309T213713Z.tgz`
- Lila backup:
`/data/workspace/backups/agent-steering-before-sync-20260309T213737Z.tgz`
- Victoria recovery backup: `/data/workspace/backups/recovery-20260310T044827Z`
- Julius recovery backup: `/data/workspace/backups/recovery-20260310T044827Z`
- Lila recovery backup: `/data/workspace/backups/recovery-20260310T044827Z`
- Victoria runtime recovery backup: `/data/workspace/backups/recovery-20260310T150242Z`
- Julius runtime recovery backup: `/data/workspace/backups/recovery-20260310T150827Z`
- Lila runtime recovery backup: `/data/workspace/backups/recovery-20260310T151056Z`
- canonical DB backup was taken before doctrine sync
- live drift was limited to `AGENTS.md` and `TOOLS.md` on all 3 services; the other 6 steering files were already aligned
- file-level SSH verification passed after sync for the changed files on all 3 services
- canonical DB KB variants, revision snapshots, and current-revision embeddings were re-synced and verified on 2026-03-09 for the four shared doctrine KBs
- hosted runtime recovery sync refreshed `AGENTS.md`, `SOUL.md`, `TOOLS.md`, and `BOOTSTRAP.md` on all 3 services on 2026-03-10
- post-sync SSH SHA-256 verification matched repo mirrors for the refreshed runtime files on Victoria, Julius, and Lila on 2026-03-10
- canonical DB KB variants, revision snapshots, link metadata, and current-revision embeddings were re-synced and verified on 2026-03-10 for the hosted recovery surfaces:
  - `global-catalyst-atlas`
  - `global-agent-principles`
  - `tools-guide-overview`
  - Victoria Ops Guide
  - Julius Ops Guide
  - Lila Ops Guide
  - `victoria-identity-tools`
  - `julius-identity-tools`
  - `lila-identity-tools`
  - `victoria-identity-agents`
  - `julius-identity-agents`
  - `lila-identity-agents`
  - `victoria-identity-user`
  - `julius-identity-user`
  - `lila-identity-user`
- canonical DB IA sync completed on 2026-03-10 for Victoria, Julius, and Lila information-architecture surfaces
- all queried hosted-fleet runtime docs now resolve under `hlt`; `tools-guide-overview` remains intentionally published under `system`
- normal-ingress conversational verification is still pending and should remain recorded as a runtime-behavior gap until tested
- 2026-03-12 additive parity pass updated repo mirrors and lightweight `.claude/agents/*.md` surfaces but did not run a new canonical DB sync or live runtime overlay copy from this workspace
- 2026-03-12 live sync remained pending because this workspace did not expose a verified `/data/workspace` mount or validated Render/OpenClaw SSH path for the hosted trio

## Runtime File Mapping

The intended runtime steering files stay consistent across agents:

| Runtime file | Repo mirror family | Victoria source | Julius source | Lila source |
| --- | --- | --- | --- | --- |
| `AGENTS.md` | identity-agents | `victoria-identity-agents` | `julius-identity-agents` | `lila-identity-agents` |
| `SOUL.md` | identity-soul | `victoria-identity-soul` | `julius-identity-soul` | `lila-identity-soul` |
| `USER.md` | identity-user | `victoria-identity-user` | `julius-identity-user` | `lila-identity-user` |
| `IDENTITY.md` | identity-id | `victoria-identity-id` | `julius-identity-id` | `lila-identity-id` |
| `TOOLS.md` | identity-tools | `victoria-identity-tools` | `julius-identity-tools` | `lila-identity-tools` |
| `BOOTSTRAP.md` | identity-bootstrap | `victoria-identity-bootstrap` | `julius-identity-bootstrap` | `lila-identity-bootstrap` |
| `HEARTBEAT.md` | identity-heartbeat | `victoria-identity-heartbeat` | `julius-identity-heartbeat` | `lila-identity-heartbeat` |
| `MEMORY.md` | identity-memory | `victoria-identity-memory` | `julius-identity-memory` | `lila-identity-memory` |

## Watch Items

- Julius naming still spans a human-facing `Julius` identity and older `openclaw-justin` service plumbing.
- Live runtime file sync is not current for the 2026-03-12 additive parity pass; only the 2026-03-10 sync is verified.
- Deeper principal titles or surnames should be validated in dedicated user or team surfaces before they are reintroduced into the shared context layer.

---

## Source: docs/references/ai-agents/COOKBOOK_AND_AGENT_PATTERNS.md

# Cookbook & Agent Patterns Reference

> Reusable patterns extracted from Anthropic's official cookbooks, "Building Effective Agents" reference implementations, and the Claude Code Skill Factory project. Each pattern is mapped to the Katailyst plan where it applies.

## Sources

| Source | Files Analyzed | Signal Level |
| --- | --- | --- |
| [Anthropic Claude Cookbooks](https://github.com/anthropics/anthropic-cookbook) | 11 files (embeddings, streaming, RAG, tool use, memory, vision, evaluation) | Official reference |
| [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) | 5 files (orchestrator, subagent, citations, evaluator-optimizer) | Official reference implementation |
| Claude Code Skill Factory | 13 files (agent-factory, prompt-factory, commands, generated skills) | Community reference |

---

## Pattern 1: Retrieve-Then-Generate (Two-Phase RAG)

**Source**: `claude-cookbooks/third_party/Wikipedia/wikipedia-search-cookbook.ipynb`

**Problem**: When generation and retrieval happen in a single pass, the LLM "precommits" to an answer before seeing all relevant context.

**Solution**: Separate retrieval from synthesis into two distinct phases:

1. **Retrieval loop**: iterative search in which the model emits XML-tagged search queries, reflects on search quality, and extracts relevant information into tagged blocks. The model searches until it has enough context.
2. **Synthesis prompt**: a separate LLM call that receives only the extracted information and produces the final output. No tool access — pure synthesis.
**Key insight**: "Without this step, Claude would sometimes precommit to an answer."

**Implementation reference**:

```python
class ClientWithRetrieval(Anthropic):
    def retrieve(self, query, search_tool):
        # Iterative search loop
        # Model decides when to stop searching
        ...

    def completion_with_retrieval(self, query):
        # Phase 1: retrieve
        context = self.retrieve(query)
        # Phase 2: synthesize (separate prompt, no tools)
        return self.synthesize(context, query)
```

**Katailyst mapping**: Already incorporated as **04-03 Decision E** (Retrieve-Then-Generate). The Factory generator runs `discover → execute → evaluate` as distinct steps. Also relevant to **04-08** (Chat Testing Ground — agent research workflows should separate search from synthesis).

---

## Pattern 2: Structured Extraction via Tool Schemas

**Source**: `claude-cookbooks/tool_use/vision_with_tools.ipynb`

**Problem**: Asking an LLM to "output JSON" is fragile. Formatting errors, missing fields, and hallucinated keys are common.

**Solution**: Define an AI SDK `tool()` with a typed JSON Schema. Force the model to emit structured data by "calling" the tool rather than generating free-text JSON.

```python
nutrition_tool = {
    "name": "record_nutrition",
    "input_schema": {
        "type": "object",
        "properties": {
            "calories": {"type": "number"},
            "protein_grams": {"type": "number"},
            # ...
        },
        "required": ["calories", "protein_grams"]
    }
}

# response.stop_reason == "tool_use" → extract tool_inputs
```

**Katailyst mapping**: Already incorporated as **04-03 Decision F** (Structured Outputs via Tool Schemas) and **04-10 Decision E** (Prefer Structured Judge Output). Use this pattern wherever machine-readable output is needed — Factory generation, eval judging, entity metadata extraction.

---

## Pattern 3: Evaluator-Optimizer Loop

**Source**: `patterns copy/agents/evaluator_optimizer.ipynb`

**Problem**: First-pass LLM output is often good but not great. Manual iteration is slow.
**Solution**: Generate → Evaluate → Iterate loop with memory of previous attempts:

```python
def generate(prompt, task, context):
    # Returns (thoughts, result), parsed from XML tags in the model output
    ...

def evaluate(prompt, content, task):
    # Returns (evaluation, feedback), parsed from XML tags
    # evaluation: PASS | NEEDS_IMPROVEMENT | FAIL
    ...

# Loop
def refine(generator_prompt, eval_prompt, task):
    context = ""
    while True:
        thoughts, result = generate(generator_prompt, task, context)
        evaluation, feedback = evaluate(eval_prompt, result, task)
        if evaluation == "PASS":
            return result
        context += f"\n{result}\n{feedback}"
```

**Key design**: Each iteration receives ALL prior attempts + feedback as context, so the model learns from its mistakes.

**Katailyst mapping**: Already incorporated as **04-03 Decision G** (Evaluator-Optimizer Loop — budgeted, max 2 iterations). Also applicable to **04-10** (Variation Generators — refine until rubric passes).

---

## Pattern 4: Context Management (Auto-Clearing Tool Uses)

**Source**: `claude-cookbooks/tool_use/memory_demo/code_review_demo.py`

**Problem**: Long chat sessions with tool calling accumulate massive context from tool results, eventually exceeding model limits.

**Solution**: Actively manage the message history. Two approaches:

**Provider-specific** (Anthropic beta):

```python
context_management = {
    "edits": [{
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": 30000},
        "keep": {"type": "tool_uses", "value": 3}
    }]
}
```

**Application-level** (provider-agnostic):

- Persist full history to DB (runs/run_steps)
- Before each model call, construct an **effective message set**: last N turns + last K tool results
- Replace older tool results with summary stubs + trace step ID pointer
- Enforce token budget: truncate context bundle before truncating chat turns

**Katailyst mapping**: Already incorporated as **04-08 Decision F** (Context Management). The application-level strategy is preferred for provider independence.
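The application-level strategy above can be sketched in a few lines. This is a minimal illustration, not the Katailyst implementation: the message-dict shape, the `step_id` field, and the function name are all assumptions for the example.

```python
def effective_messages(history, max_turns=20, keep_tool_results=3):
    """Build the message set actually sent to the model.

    `history` is the full persisted message list (hypothetical schema:
    dicts with "role", "content", and "step_id" for tool results).
    Older tool results are replaced with summary stubs that keep a
    pointer back to the full trace step.
    """
    recent = history[-max_turns:]
    # Indices of tool results within the recent window, oldest first.
    tool_indices = [i for i, m in enumerate(recent) if m["role"] == "tool"]
    keep = set(tool_indices[-keep_tool_results:])
    effective = []
    for i, msg in enumerate(recent):
        if msg["role"] == "tool" and i not in keep:
            # Stub out older tool results; the trace step ID lets the
            # model (or a human) retrieve the full payload if needed.
            effective.append({
                "role": "tool",
                "content": f"[tool result elided; see trace step {msg['step_id']}]",
                "step_id": msg["step_id"],
            })
        else:
            effective.append(msg)
    return effective
```

The same shape extends naturally to the token-budget rule: measure the rendered size of `effective` and shrink the context bundle before dropping chat turns.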
---

## Pattern 5: Orchestrator-Subagent with Query Classification

**Source**: `patterns copy/agents/prompts/research_lead_agent.md`

**Problem**: Complex queries require different research strategies. A single-agent approach either under-explores (misses nuance) or over-explores (wastes tokens).

**Solution**: An orchestrator that classifies queries and delegates to subagents:

**Query types**:

- **Depth-first**: multiple perspectives on the same topic (deploy 3-5 subagents on different angles)
- **Breadth-first**: independent sub-questions (deploy 1 subagent per sub-question)
- **Straightforward**: single focused investigation (1 subagent)

**Subagent count guidelines**:

| Complexity | Count | Example |
| --- | --- | --- |
| Simple factual | 1 | "What is X?" |
| Standard | 2-3 | "Compare X and Y" |
| Medium | 3-5 | "Research trends in X" |
| High complexity | 5-10 | "Comprehensive market analysis" |
| Maximum | 20 | "Multi-industry survey" |

**Key rules**:

- Orchestrator WRITES the final report (never delegates synthesis to a subagent)
- Parallel tool calls are optional. Default to **sequential** when quality/attention matters; parallelize only truly independent retrieval steps when it helps and does not degrade results.
- Each subagent may be given an optional "budget hint" (tokens/tool calls/time). Treat this as a **default exploration target** (encouraging multiple tool calls) plus a **runaway circuit breaker**; it must be generous and overridable.

**Katailyst mapping**: **04-08** (Chat Testing Ground) — when building agent orchestration, use this pattern for research-type agents. The `research_lead` + `research_subagent` pattern maps directly to Katailyst's agent entity model (lead agent = `agent` entity, subagent = spawned worker with scoped tools and budget).
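The classification step can be sketched as follows. This is an illustrative skeleton, assuming a `classify` callable that wraps the (unshown) LLM classification call; the function and category names are not from the source prompt.

```python
def plan_research(query, classify):
    """Sketch of the orchestrator's classification-and-delegation step.

    `classify` is a hypothetical LLM helper returning a query kind
    ('straightforward', 'breadth', or 'depth') plus sub-questions or
    angles. Counts follow the guideline table above.
    """
    kind, parts = classify(query)
    if kind == "straightforward":
        tasks = [query]                  # single focused investigation
    elif kind == "breadth":
        tasks = parts                    # one subagent per sub-question
    else:                                # depth-first: 3-5 angles on one topic
        tasks = [f"{query} (angle: {a})" for a in parts[:5]]
    # The orchestrator dispatches one subagent per task, then writes
    # the final report itself; synthesis is never delegated.
    return tasks
```

Each returned task would become one subagent prompt, carrying its own scoped tools and budget hint per the key rules above.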
---

## Pattern 6: Research Subagent OODA Loop

**Source**: `patterns copy/agents/prompts/research_subagent.md`

**Problem**: Research subagents need structured reasoning to avoid rabbit holes and wasted tool calls.

**Solution**: OODA loop (Observe → Orient → Decide → Act) with a research budget:

- **Budget**: optional budget hint per subagent (configurable per query complexity). Use it to encourage non-trivial exploration (e.g., "don't stop after 1 tool call"). Keep ceilings high and avoid hard caps unless you are preventing extreme runaway.
- **Source quality**: use source-quality signals as hints (separate facts vs. speculation; flag possible false authority) without blocking trend/voice sources when that is the goal.
- **Parallel tool calls (optional)**: prefer sequential. If parallel is used, apply it to independent retrieval only, then return to a single-threaded synthesis path.
- **Completion**: call the `complete_task` tool to report back to the orchestrator.
- **Max sources**: avoid unbounded loops. Use high, configurable ceilings and stop only on extreme runaway or explicit user constraints.

**Key principle**: Epistemic honesty — the subagent must distinguish between "I found this" (fact) and "I think this might be" (inference).

**Katailyst mapping**: **05-03** (Community Import) — the crawl logic should follow a similar budget-aware, quality-checking approach when evaluating community skills. Also informs agent persona design for research-type agents in the registry.

---

## Pattern 7: Citations Agent (Post-Processing)

**Source**: `patterns copy/agents/prompts/citations_agent.md`

**Problem**: Adding citations during generation interrupts flow and produces worse prose. Adding them after is fragile if the agent modifies the text.
**Solution**: A dedicated post-processing agent that ONLY adds citations without modifying text:

**Rules**:

- Never modify the synthesized text
- Cite meaningful semantic units (claims, statistics, quotes)
- Minimize citation fragmentation
- No redundant same-source citations in the same sentence

**Katailyst mapping**: **04-03** (Factory) — when generating content that references sources (research skills, KB-grounded outputs), apply citations as a post-processing step. **04-10** (Variation Generators) — ensure variations preserve source attribution.

---

## Pattern 8: Memory Tool Handler (Persistent, Path-Safe)

**Source**: `claude-cookbooks/tool_use/memory_tool.py`

**Problem**: Cross-session context persistence requires safe file operations with strict path boundaries.

**Solution**: A `MemoryToolHandler` class with:

- Path validation (prevents directory traversal)
- Scoped to a specific directory (`/memories`)
- Allowed file extensions whitelist
- CRUD operations: view, create, str_replace, insert, delete, rename

**Katailyst mapping**: Already incorporated as **05-04 Decision F** (Optional Persistent Memory Store). The path validation pattern is directly applicable to the hook cache system.

---

## Pattern 9: Debug Event Streaming as a Queue (Producer/Consumer)

**Source**: `claude-cookbooks/third_party/ElevenLabs/stream_voice_assistant_websocket.py`

**Problem**: Debug panels need to update in real-time as chat progresses, but blocking on individual events creates UI jank.

**Solution**: An `AudioQueue` pattern adapted for debug events:

- Producer: server emits events as they occur (discovery, tool call, tool result)
- Consumer: client maintains event queue, panels consume incrementally
- Pre-buffering: accumulate initial events before rendering (avoids flicker)
- Read-position tracking: each panel consumes at its own pace

**Katailyst mapping**: Already incorporated as **04-08 Decision G** (Debug Event Streaming as a Queue).
The `AudioQueue` pre-buffering pattern specifically informs the debug panel architecture.

---

## Pattern 10: XML Mega-Prompt Structure

**Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-prompts/sample-growth-hacker-prompt.md`

**Problem**: Complex system prompts for agents need consistent structure to ensure all aspects (role, constraints, workflow, output format) are covered.

**Solution**: An XML envelope with canonical sections (tag names illustrative; the original tags were lost in extraction):

```xml
<role>Title, domain, expertise level</role>
<objectives>Primary objective + secondary objectives</objectives>
<workflow>
  <phase>Discovery</phase>
  <phase>Analysis</phase>
  <phase>Execution</phase>
  <phase>Delivery</phase>
</workflow>
```

**Katailyst mapping**: **04-03** (Factory) — this structure should inform the `template_json` schema for agent and prompt templates. The canonical sections map to specific fields in `entity_revisions.content_json`. Especially valuable for the `researcher-agent` and `editor-agent` canonical templates (04-03 Task 2).

---

## Pattern 11: 5-Level Prompt Complexity Progression

**Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-prompts/README-EXAMPLES.md`

**Problem**: Not all prompts need the same level of sophistication. Over-engineering simple prompts wastes tokens; under-engineering complex ones produces poor results.
**Solution**: A 5-level progression tied to complexity and stakes: | Level | Techniques | Token Budget | Use Case | | ------------ | ---------------------------------------------------------------- | ------------ | -------------------------------------- | | Basic | Role-based prompting, clear constraints | ~3,200 | Code reviews, classification | | Intermediate | Few-shot learning, chain of thought | ~5,800 | Data analysis, diagnostics | | Advanced | Tree of Thoughts, multi-path reasoning | ~8,400 | Architecture decisions, tech selection | | Expert | Meta-prompting, self-consistency, recursive reasoning | ~10,200 | Strategic research, due diligence | | Master | Multi-agent simulation, first-principles, constrained generation | ~11,800 | Board-level strategy, M&A | **Progression logic**: - Basic → Intermediate: Add learning (few-shot examples + show reasoning) - Intermediate → Advanced: Add exploration (multiple paths before deciding) - Advanced → Expert: Add validation (self-consistency checking + red teaming) - Expert → Master: Add perspectives (multiple personas with different incentives) **Katailyst mapping**: **04-03** (Factory) — the prompt `tier` field (1-10 in Katailyst) maps to this progression. Templates should include a `complexity_level` that guides which techniques the generator applies. **04-07** (Intake Questionnaires) — the questionnaire could ask "What's the stakes level?" to auto-select the appropriate complexity tier. --- ## Pattern 12: Multi-Agent Simulation with Personas **Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-prompts/master-strategic-consultant.md` **Problem**: Single-perspective analysis misses blind spots. Different stakeholders have different incentives that should be surfaced. 
**Solution**: Simulate 5 expert personas with distinct viewpoints: | Persona | Focus | Incentive | | ------- | ------------------------------------ | ---------------------------------- | | CFO | Financial viability, NPV, ROI | Minimize risk, maximize returns | | COO | Operational feasibility, integration | Execution timeline, team capacity | | CSO | Strategic fit, competitive position | Market timing, long-term advantage | | CTO | Technical architecture, scalability | Technology risk, integration cost | | Board | Governance, fiduciary duty | Stakeholder protection, compliance | **Workflow**: 1. Problem Deconstruction (first-principles) 2. Multi-Agent Perspective Generation (each persona answers questions independently) 3. Socratic Dialogue (challenge each perspective) 4. Synthesis & Conflict Resolution (find consensus or document disagreements) 5. Structured Output (decision matrix, financial model, risk register) 6. Fatal Flaw Analysis (deal-breakers before recommending) **Katailyst mapping**: **04-10** (Variation Generators) — the multi-persona pattern can be used for evaluation. Instead of one AI judge, run multiple "judge personas" and synthesize. Also informs agent entity design — agents could have a `persona_type` field enabling multi-agent simulation workflows. --- ## Pattern 13: Agent Type Classification **Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-skills/agent-factory/SKILL.md` **Problem**: Not all agents have the same execution pattern. Some can run in parallel; others must be sequential. 
**Solution**: Classify agents into 4 types with distinct execution rules: | Type | Color | Execution | Tools | Process Range | | -------------- | ------ | --------------- | ----------------- | ------------- | | Strategic | Blue | Parallel-safe | Read, Write, Grep | 15-20 | | Implementation | Green | Coordinated | Full tool access | 20-30 | | Quality | Red | Sequential ONLY | Read, Grep, Test | 12-18 | | Coordination | Purple | Orchestration | Task, Read | 8-12 | **Key rules**: - Quality agents (red) NEVER run in parallel — they need sequential context - Strategic agents (blue) can safely run in parallel (read-heavy) - Implementation agents (green) need coordination (write-heavy) **Agent YAML frontmatter**: ```yaml name: code-reviewer description: Reviews code for quality issues tools: [Read, Grep] model: claude-opus-4-6 color: red field: software-engineering expertise: code-quality ``` **Katailyst mapping**: **04-03** (Factory) — the agent template seeds (`editor-agent`, `researcher-agent`) should include `agent_type` classification in the generated content_json. This maps to an optional `metadata.agent_type` field on agent entities in the registry. --- ## Pattern 14: Expected Output JSON as Test Fixtures **Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-skills/content-trend-researcher/expected_output.json` **Problem**: Skill test fixtures need a standard format that captures both structure and content expectations. **Solution**: A structured expected output JSON that doubles as both documentation and test fixture: ```json { "research_summary": { "topic": "...", "date_range": "...", "platforms_analyzed": ["twitter", "reddit", "linkedin"], "total_sources": 47 }, "topic_overview": { "name": "...", "opportunity_score": 8.5, "growth_trajectory": "accelerating" }, "platform_insights": [ { "platform": "twitter", "trending_score": 7.8, "key_conversations": [...] } ], "content_gaps": [...], "recommendations": [...] 
}
```

**Katailyst mapping**: **04-11** (Skill Test Harness) — this JSON format maps directly to the `schema` match mode in the fixture definition. Import these as `entity_artifacts` with `artifact_type = 'test'`. The JSON structure serves as the JSON Schema for validation.

---

## Pattern 15: Command Discovery→Analysis→Task Pattern

**Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-commands/enhance-claude-md/enhance-claude-md.md`

**Problem**: Slash commands need a consistent execution structure that handles project analysis before action.

**Solution**: A 3-phase command pattern:

1. **Discovery Phase**: Read project state (git status, config files, existing code)
2. **Analysis Phase**: Determine what needs to change (diff analysis, gap identification)
3. **Task Phase**: Execute the changes (code generation, file updates)

**YAML frontmatter**:

```yaml
---
allowed-tools:
  - Bash
  - Read
  - Write
  - Glob
  - Grep
description: 'Analyze and enhance CLAUDE.md configuration'
---
```

**Katailyst mapping**: **04-03** (Factory) — command templates (prompt entities with `prompt_kind = 'command'`) should follow this 3-phase structure. The `research-skill` template can embed this pattern for commands that need project analysis.

---

## Pattern 16: Evaluation Dataset Format (XML)

**Source**: `claude-cookbooks/tool_evaluation/evaluation.xml`

**Problem**: Test fixtures for evaluating tool use need a standard interchange format.

**Solution**: XML format with task/prompt/response pairs (tag names illustrative; the original tags were lost in extraction):

```xml
<task>
  <prompt>What is the square root of 144?</prompt>
  <response>12</response>
</task>
<task>
  <prompt>Calculate the derivative of x^3</prompt>
  <response>3x^2</response>
</task>
```

**Katailyst mapping**: **04-11** (Skill Test Harness) — support XML fixture import alongside JSON. The `parseFixtureJson` function should also handle XML-formatted evaluation datasets via a `parseFixtureXml` variant.
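A minimal sketch of what a `parseFixtureXml` variant could look like for the flat task/prompt/response shape described above. The regex approach and the fixture field names are assumptions for illustration; a production parser should use a real XML library and match the actual `parseFixtureJson` contract.

```typescript
interface EvalFixture {
  prompt: string;
  expected: string;
}

// Sketch: extract <task> blocks and pull out their <prompt>/<response> children.
// Regex-based on purpose to stay dependency-free; not robust to nested markup.
function parseFixtureXml(xml: string): EvalFixture[] {
  const fixtures: EvalFixture[] = [];
  const taskRe = /<task>([\s\S]*?)<\/task>/g;
  for (const [, body] of xml.matchAll(taskRe)) {
    const prompt = body.match(/<prompt>([\s\S]*?)<\/prompt>/)?.[1]?.trim();
    const expected = body.match(/<response>([\s\S]*?)<\/response>/)?.[1]?.trim();
    if (prompt !== undefined && expected !== undefined) {
      fixtures.push({ prompt, expected });
    }
  }
  return fixtures;
}
```

Each parsed pair can then flow into the same fixture pipeline as JSON imports.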
--- ## Pattern 17: Intentional Bug Patterns for Negative Testing **Source**: `claude-cookbooks/tool_use/memory_demo/sample_code/web_scraper_v1.py`, `data_processor_v1.py` **Problem**: Test harnesses need negative fixtures — inputs that should produce specific failure modes. **Solution**: Intentionally buggy code samples with known issues: - Race conditions on shared mutable state - Non-atomic counter increments - List appends without locks - Dictionary modifications during iteration **Katailyst mapping**: **04-11** (Skill Test Harness) — the `fixture_kind: "negative"` field supports this. Negative fixtures test that skills correctly identify problems. Seed a few negative fixtures for quality-review skills. --- ## Pattern 18: 7-Question Intake Flow for Prompt Generation **Source**: `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/generated-prompts/README-EXAMPLES.md` **Problem**: Users need guidance to define high-quality prompts without prompt engineering expertise. **Solution**: A focused 7-question intake questionnaire: 1. **Role**: "What role should the AI assume?" (e.g., "Senior Code Reviewer") 2. **Domain**: "What domain or field?" (e.g., "Software Engineering") 3. **Primary Goal**: "What is the main objective?" (e.g., "Find bugs and suggest fixes") 4. **Output Type**: "What kind of output?" (e.g., "analysis + recommendations") 5. **Tech Stack**: "What technologies or frameworks?" (e.g., "TypeScript, React") 6. **Constraints**: "What rules or boundaries?" (e.g., "Only flag actionable issues") 7. **Communication Style**: "How should it communicate?" (e.g., "Direct, concise, technical") Plus two meta-selections: - **Format**: XML / Markdown / Plain - **Mode**: Core (basic-intermediate) / Advanced (advanced-master) **Katailyst mapping**: **04-07** (Intake Questionnaires) — this maps directly to the `questionnaire_json` structure in `factory_templates`. The 7-question flow is a proven minimal set for prompt generation. 
Use this as the basis for the `rubric-prompt` and `editor-agent` template questionnaires in 04-03 Task 2. --- ## Cross-Reference Matrix | Pattern | 04-03 | 04-07 | 04-08 | 04-10 | 04-11 | 05-03 | 05-04 | | ----------------------------- | --------- | ------- | --------- | --------- | ------- | ------- | --------- | | 1. Two-Phase RAG | **Dec E** | | Ref | | | | | | 2. Tool Schema Extraction | **Dec F** | | | **Dec E** | | | | | 3. Evaluator-Optimizer | **Dec G** | | | Ref | | | | | 4. Context Management | | | **Dec F** | | | | | | 5. Orchestrator-Subagent | | | **NEW** | | | | | | 6. OODA Research Loop | | | Ref | | | **NEW** | | | 7. Citations Agent | Ref | | | Ref | | | | | 8. Memory Handler | | | | | | | **Dec F** | | 9. Debug Event Queue | | | **Dec G** | | | | | | 10. XML Mega-Prompt | **NEW** | | | | | | | | 11. 5-Level Complexity | **NEW** | **NEW** | | | | | | | 12. Multi-Agent Personas | | | | **NEW** | | | | | 13. Agent Type Classification | **NEW** | | | | | | | | 14. Expected Output Fixtures | | | | | **NEW** | | | | 15. Command Pattern (D→A→T) | **NEW** | | | | | | | | 16. XML Eval Format | | | | | **NEW** | | | | 17. Negative Test Patterns | | | | | **NEW** | | | | 18. 7-Question Intake | | **NEW** | | | | | | **Legend**: **Dec X** = already a formal Decision in the plan. **Ref** = referenced but not a decision. **NEW** = should be added as a reference. 
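For Pattern 18, a sketch of how the 7-question flow might serialize into a `questionnaire_json` value. The field names here are illustrative assumptions, not the canonical `factory_templates` schema.

```typescript
// Hypothetical questionnaire_json shape for the 7-question prompt intake.
// Question ids and structure are illustrative only.
const promptIntakeQuestionnaire = {
  questions: [
    { id: "role", text: "What role should the AI assume?", example: "Senior Code Reviewer" },
    { id: "domain", text: "What domain or field?", example: "Software Engineering" },
    { id: "primary_goal", text: "What is the main objective?", example: "Find bugs and suggest fixes" },
    { id: "output_type", text: "What kind of output?", example: "analysis + recommendations" },
    { id: "tech_stack", text: "What technologies or frameworks?", example: "TypeScript, React" },
    { id: "constraints", text: "What rules or boundaries?", example: "Only flag actionable issues" },
    { id: "communication_style", text: "How should it communicate?", example: "Direct, concise, technical" },
  ],
  // The two meta-selections from the intake flow.
  meta: {
    format: ["xml", "markdown", "plain"],
    mode: ["core", "advanced"],
  },
};
```

Questions stay soft (skip allowed), matching the "quality gates without hard gates" stance in the factory patterns.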
--- ## Incoming Material Inventory These external references are available for deeper consultation during execution: | Material | Location | Contents | | ------------------------- | ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- | | Anthropic Cookbooks | `incoming/archive/2026/03/16/anthropic-effective-agents-cookbook/` | Embeddings, streaming, RAG, tool use, memory, vision, evaluation | | Building Effective Agents | `incoming/archive/2026/03/16/anthropic-effective-agents-cookbook/` | Orchestrator, subagent, citations, evaluator-optimizer | | Skill Factory | `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/` | Agent-factory, prompt-factory, app-store-optimization, content-trend-researcher, commands | | Superpowers | `incoming/archive/2026/03/16/superpowers-main-copy/` | 14 skills, 3 commands, 1 agent (obra/superpowers) | --- ## Source: docs/references/ai-agents/DISCOVERY_RERANK.md # Discovery Rerank (Cohere + Voyage) Purpose: improve discovery quality by reranking retrieved atomic units (skills/tools/KB/prompts/etc.) against intent. 
## Scope - Endpoint: `POST /api/discover` - Stage: retrieve with `discover_v2` then optional rerank - Providers: - primary: Cohere (`cohere/api-key`) - fallback: Voyage (`voyage/api-key`) ## API Contract Request supports optional `rerank`: ```json { "intent": "best skill for docs sync", "limit": 20, "rerank": { "enabled": true, "provider": "auto", "top_n": 20, "model": "rerank-v4.0-fast" } } ``` `rerank` fields: - `enabled` (boolean): defaults to `true` - `provider` (`auto | cohere | voyage`): defaults to `auto` - `top_n` (1-200): optional cap for reranked output - `model` (string): optional provider model override Response adds rerank metadata and per-item score fields when enabled: ```json { "data": [ { "ref": "skill:example", "score": 0.77, "rerank_provider": "cohere", "rerank_score": 0.94 } ], "next_cursor": null, "meta": { "pagination": { "requested_limit": 20, "applied_limit": 20, "has_more": false, "continuation_supported": true, "cursor_order": "response_order", "warnings": [] }, "rerank": { "enabled": true, "requested_provider": "auto", "used_provider": "cohere", "fallback_used": false, "top_n": 20, "warnings": [] } } } ``` ## Default Behavior - Rerank is enabled by default for first-page discovery requests. - Cursor-paginated requests skip rerank to preserve deterministic DB paging behavior. - If additional pages exist after a reranked first page, continuation cursor follows base discovery ordering. - `auto` provider tries Cohere first, then Voyage on failure. - If both providers fail, the endpoint returns base `discover_v2` ordering with warning metadata. - Per-call discover limits are bounded, but session depth is not; agents can keep requesting more pages with `cursor`. 
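A client-side sketch of building a first-page discover request with the documented rerank defaults. The helper name is ours; only the endpoint path and field names come from the contract above.

```typescript
interface RerankOptions {
  enabled?: boolean;
  provider?: "auto" | "cohere" | "voyage";
  top_n?: number;
  model?: string;
}

interface DiscoverRequest {
  intent: string;
  limit: number;
  rerank: {
    enabled: boolean;
    provider: "auto" | "cohere" | "voyage";
    top_n?: number;
    model?: string;
  };
}

// Apply the documented defaults: rerank enabled, provider "auto".
// Optional fields are omitted entirely rather than sent as undefined.
function buildDiscoverRequest(intent: string, limit = 20, rerank: RerankOptions = {}): DiscoverRequest {
  const body: DiscoverRequest = {
    intent,
    limit,
    rerank: {
      enabled: rerank.enabled ?? true,
      provider: rerank.provider ?? "auto",
    },
  };
  if (rerank.top_n !== undefined) body.rerank.top_n = rerank.top_n;
  if (rerank.model !== undefined) body.rerank.model = rerank.model;
  return body;
}

// Usage (sketch):
// const res = await fetch("/api/discover", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildDiscoverRequest("best skill for docs sync")),
// });
```

Cursor-paginated follow-ups should omit `rerank`, since rerank only applies to the first page.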
## Vault Setup Store provider keys per org (example `hlt`): ```bash echo -n "" | npx tsx scripts/vault/vault_set.ts --org-code hlt --secret-key cohere/api-key echo -n "" | npx tsx scripts/vault/vault_set.ts --org-code hlt --secret-key voyage/api-key ``` Verify pointers: ```bash npx tsx scripts/vault/vault_inventory.ts --org-code hlt --tool-codes cohere.rerank,voyage.rerank ``` ## Tool Registry Canonical seed migration: - `database/051-seed-rerank-tools-v1.sql` Registers: - `tool:cohere.rerank` - `tool:voyage.rerank` Both tools use Vault-backed `auth_secret_key` pointers and HTTP `call_spec` payloads. ## Sources - Cohere rerank overview: https://docs.cohere.com/docs/rerank-overview - Cohere rerank API reference: https://docs.cohere.com/v2/reference/rerank - Voyage reranker docs: https://docs.voyageai.com/docs/reranker - Voyage rerank API reference: https://docs.voyageai.com/reference/rerank --- ## Source: docs/references/ai-agents/EVAL_HARNESS_ADAPTER.md # Eval Harness Adapter (Promptfoo Baseline) Katailyst treats external evaluation harnesses as **drivers** and the Katailyst DB as the **canonical sink**. Promptfoo runs suites. Katailyst ingests results into replayable, auditable tables so the CMS can: - Show evaluation runs - Show outputs + scores - Feed aggregates back into discovery (Phase 06-02) ## Canonical Mapping (Minimum Contract) Every harness run should write: - `runs` (`run_type = 'evaluation'`) - `run_steps` (`inspect` -> `evaluate` -> `output`) - `payload_store` (content-addressed blobs of prompts/inputs/outputs + the raw harness result payload) - `run_outputs` (one row per evaluated output; plus a summary payload row) - `evaluations` (one row per scored output) Optional (future): - `pairwise_comparisons` (when the harness performs A/B matchups) - `run_costs` (tokens/cost best-effort) ## Promptfoo Adapter (Current) Driver: - `scripts/eval/eval_promptfoo.ts` runs a promptfoo YAML suite and ingests results. 
How it stores data: - One `runs` row per promptfoo eval. - Three `run_steps`: - `inspect`: suite metadata - `evaluate`: evaluated outputs + evaluation rows - `output`: raw adapter payload JSON (full suite results) - For each promptfoo `EvaluateResult`: - Store prompt text, vars, and output text in `payload_store` - Insert a `run_outputs` row referencing the output payload - Insert an `evaluations` row referencing the run output ## Running The Deterministic Fixture Use the included echo suite: ```bash npx tsx scripts/eval/eval_promptfoo.ts \ --org-id \ --config __tests__/fixtures/promptfoo/hello/promptfooconfig.yaml \ --dry-run ``` Then run without `--dry-run` to write: ```bash npx tsx scripts/eval/eval_promptfoo.ts \ --org-id \ --config __tests__/fixtures/promptfoo/hello/promptfooconfig.yaml ``` ## Notes - Prompt paths (`file://...`) resolve relative to the suite config directory (the runner `chdir`s accordingly). - The adapter is intentionally **minimal**: no UI coupling, no bespoke schema changes. The CMS reads the canonical tables. --- ## Source: docs/references/ai-agents/FACTORY_PATTERNS.md # Factory Patterns (Extracted Concepts) This file captures patterns and principles mined from the Claude Code Skill Factory repo. We do not copy it wholesale — we extract the concepts that fit our system. Source: archived raw drop at `incoming/archive/2026/03/16/claude-code-skill-factory-dev1/`. See `docs/references/REFERENCE_INPUTS.md`. --- ## 1) Generator + Validator Pattern Observed in: `hook-factory/generator.py`, `hook-factory/validator.py`, `agent-factory/agent_generator.py` Concept: every factory has two phases. - Generate (template‑based output) - Validate (structure + safety + policy checks) Our adaptation: - Use the same pattern for skills, agents, prompts, and hooks. - Store validations in DB so evals can learn what “good” looks like. 
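The generate/validate split in pattern 1 can be sketched generically. Names here are illustrative, not the factory repo's actual code; the point is the phase boundary: generation never validates, validation never mutates.

```typescript
interface ValidationIssue {
  check: string;
  message: string;
}

interface FactoryResult<T> {
  output: T;
  issues: ValidationIssue[];
  valid: boolean;
}

// Two-phase factory: template-based generation, then structure/safety/policy checks.
// Issues are returned (for DB storage and eval learning), not thrown.
function runFactory<TInput, TOutput>(
  input: TInput,
  generate: (input: TInput) => TOutput,
  validators: Array<(output: TOutput) => ValidationIssue[]>,
): FactoryResult<TOutput> {
  const output = generate(input);
  const issues = validators.flatMap((v) => v(output));
  return { output, issues, valid: issues.length === 0 };
}
```

Persisting `issues` alongside `output` is what lets evals learn what "good" looks like over time.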
--- ## 2) Intake Questionnaires (Quality Gates Without Hard Gates) Observed in: `prompt-factory` 5‑7 question flow, `hook-factory` interactive Q&A Concept: short, structured Q&A that prevents shallow prompts without over‑governance. Our adaptation: - CMS editor uses questionnaire templates for skills/agents. - Questions are soft (skip allowed), but default to asking. --- ## 2.1) Orchestrator + Subagents (Research Lead Pattern) Observed in: "Building Effective Agents" cookbook (orchestrator-workers, research lead + subagent prompts) Concept: keep the "lead" agent in a small context and delegate bounded tasks to subagents with: - explicit research plan + task allocation - clear tool budgets (avoid infinite loops) - defined output format so synthesis is mechanical Our adaptation: - Use this in GSD planning (planner/checker/verifier loops) and in Phase 04/05 workflows. - Represent the orchestration as `runs + run_steps` so each delegation has receipts (inputs, outputs, timing). --- ## 2.2) Evaluator-Optimizer Loop (Iterative Quality Upgrade) Observed in: "Building Effective Agents" cookbook (evaluator-optimizer workflow) Concept: iterative improvement loop: - generator produces a draft - evaluator returns `PASS | NEEDS_IMPROVEMENT | FAIL` plus concrete feedback - repeat until PASS or budget exhausted Our adaptation: - Use this as a reusable Playbook pattern for "upgrade this unit" (skill/prompt/agent), not bespoke logic. - Store evaluator feedback as artifacts or run outputs so discovery can learn from quality signals over time. --- ## 2.3) Two-Pass Citations (Synthesis Then Citations) Observed in: citation-agent patterns (separate "add citations only" pass) Concept: produce a synthesized report first, then run a second pass that only adds citations without changing prose. Our adaptation: - Use this for research-heavy skills and for study-guide/blog workflows that require citations. 
- Persist citations separately (e.g., as a `reference` or `test` artifact) so we can audit and re-run. --- ## 3) Reference‑Based Documentation (No Duplication) Observed in: `codex-cli-bridge` (CLAUDE.md → AGENTS.md with references only) Concept: generate guidance by linking to files, not copying content. Our adaptation: - Use ref‑based docs in CMS and exports. - Keep canonical content in one place (DB), with file references for mirrors. --- ## 4) Hook Safety Standard Observed in: `hook-factory/validator.py`, `HOOKS_FACTORY_PROMPT.md` Concept: a clear safety contract for automation hooks. - tool detection - silent failure - no destructive ops - path validation Our adaptation: - Default hook templates in CMS follow these safety rules. - Hook installs must be reversible. --- ## 5) Deliverable Hygiene Observed in: `SKILLS_FACTORY_PROMPT.md` (no backups, no temp files) Concept: generated artifacts must be clean and shippable — no debug debris. Our adaptation: - CMS publish step runs a cleanliness check. - Artifact pack exports strip temp/backup files automatically. --- ## 6) Explicit “How to Use” Docs Observed in: `generated-commands/marketing-research/HOW_TO_USE.md` Concept: every skill/command has a quick HOW‑TO with usage, inputs, outputs, and success criteria. Our adaptation: - Add `HOW_TO_USE.md` as an optional artifact in skills. - CMS can auto‑generate basic HOW‑TO from schema + examples. --- ## 6.1) Tag + Link Stamping (Add This to Factory Output) Add a **final step** in every factory prompt: - assign `family:*`, `action:*`, `stage:*`, `modality:*`, `scope:*` - add **3–8 links** with `type`, `weight`, and `reason` - set `tier` (1–10) and `status:staged` This keeps generated units consistent with taxonomy and discovery expectations. --- ## 7) Multi‑Path Reasoning (Evaluation Quality) Observed in: `advanced-system-architect.md`, `expert-research-analyst.md` Concept: for high‑stakes tasks, require multiple solution paths, tradeoffs, and confidence ratings. 
Our adaptation: - Build multi‑path evaluation mode for research/planning skills. - Store reasoning variant metadata in eval traces, not in outputs. --- ## 8) Benchmarks + Recommendations Observed in: `social-media-analyzer` (benchmarks, ROI, recommendations) Concept: performance analysis skills should output benchmarks, recommendations, and health classification. Our adaptation: - Build performance analyzer templates. - Reuse for blog/content, not just social. --- ## 9) Trend Research as a Standalone Skill Observed in: `content-trend-researcher` Concept: a trend analyzer skill with multi‑platform signals and intent classification. Our adaptation: - Use it as a last‑30‑days research skill. - Store sources + reasons in output schema with citations. - Distill the method, not the platform list verbatim. The useful core is: cross-platform signal gathering, intent breakdown, gap finding, and outline-ready outputs. --- ## 10) Cross‑Tool Interop (Codex ↔ Claude) Observed in: `codex-cli-bridge` Concept: a bridge that translates guidance for different orchestration stacks. Our adaptation: - Add interop bridges as tools or skills. - Keep registry outputs readable by external orchestrators (Codex, Cursor, MCP clients). --- ## 11) Test‑Driven Skills Observed in: `tdd-guide` and `social-media-analyzer/expected_output.json` Concept: include test cases and expected output to validate skill behavior. Our adaptation: - Add test artifacts to skills and wire validators in CMS. - Store expected output schemas alongside examples. --- ## 12) Skill‑Embedded Utilities Observed in: `codex-cli-bridge/project_analyzer.py`, `skill_documenter.py`, `claude_parser.py` Concept: some skills include real code utilities to analyze or transform inputs. Our adaptation: - Allow skills to ship scripts as artifacts. - Treat scripts as first‑class artifacts with clear inputs/outputs. 
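The tag + link stamping contract from 6.1 is mechanical enough to lint. A minimal sketch, with assumed field names for the generated unit:

```typescript
interface UnitLink {
  target: string;
  type: string;
  weight: number;
  reason: string;
}

// Illustrative shape for a freshly generated unit; not the canonical registry schema.
interface StampedUnit {
  tags: string[];   // expects family:*, action:*, stage:*, modality:*, scope:*
  links: UnitLink[]; // 3-8 links expected
  tier: number;      // 1-10
  status: string;    // fresh factory output should be "staged"
}

// Report every way the unit violates the stamping contract in 6.1.
function stampingIssues(unit: StampedUnit): string[] {
  const issues: string[] = [];
  for (const prefix of ["family:", "action:", "stage:", "modality:", "scope:"]) {
    if (!unit.tags.some((t) => t.startsWith(prefix))) issues.push(`missing ${prefix}* tag`);
  }
  if (unit.links.length < 3 || unit.links.length > 8) issues.push("expected 3-8 links");
  if (unit.tier < 1 || unit.tier > 10) issues.push("tier must be 1-10");
  if (unit.status !== "staged") issues.push('fresh factory output should have status "staged"');
  return issues;
}
```

Running this as the factory's final step keeps generated units consistent with taxonomy and discovery expectations.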
--- ## Where These Live in Our System - Factory templates → prompts + CMS templates - Validators → lint rules + schema validators - How‑to docs → skill artifacts - Trend analyzers → skills - Benchmarks → KB items + evaluation rules --- ## 13) Agent Metadata Standard Observed in: `AGENTS_FACTORY_PROMPT.md` Concept: agent files include metadata for **auto‑discovery** and **UI clarity**: - `description` (when to invoke) - `tools` (comma‑separated) - `model` - `color`, `field`, `expertise` - `mcp_tools` Our adaptation: - Treat this as a reference pattern, not a canonical schema mandate. - Keep the useful parts: role clarity, invocation cues, field/surface hints, and explicit tool context. - Do not import cosmetic metadata or execution quotas into canonical doctrine unless they map cleanly to the current registry/runtime model. - Keep agent prompts at the right altitude: structured and explicit for reliability, while moving heavy reference material to KB. --- ## 14) Multi-Perspective Strategic Simulation Observed in: `master-strategic-consultant.md` Concept: one high-stakes planning pass can explicitly simulate multiple lenses before synthesis instead of pretending one generic analyst has enough depth. Useful extracted moves: - define 4-5 perspectives with different incentives - make each perspective answer the same core decision from its own lens - run a conflict-resolution pass before final recommendation - separate "recommendation" from "fatal flaw" analysis Our adaptation: - use this for board-level planning, launch review, and high-risk repo/system choices - prefer the perspective-simulation method as a planning/eval technique, not as a permanent new agent ontology --- ## 15) Research Prompt Validation Pattern Observed in: `expert-research-analyst.md` Concept: high-stakes research is strongest when the prompt forces assumption mapping, multiple reasoning paths, explicit confidence levels, and a red-team pass. 
Our adaptation:

- treat this as a pattern for research/planning skills and eval prompts
- preserve the multi-path validation idea, not the full mega-prompt shape
- use it when the real failure mode is false certainty rather than lack of raw output

---

## 16) App-Store Optimization as a Channel-Playbook Pattern

Observed in: `app-store-optimization`

Concept: some imported "skills" are really channel or campaign operating packs: research, metadata optimization, testing, launch sequencing, and measurement.

Our adaptation:

- treat ASO as a marketing/channel-playbook reference lane rather than blindly importing the raw upstream skill
- preserve the workflow pieces that improve HLT marketing judgment
- avoid duplicating already-mined ASO canon if the current KB lane already covers it

---

## 17) Hook Event Taxonomy

Observed in: `HOOKS_FACTORY_PROMPT.md`

Concept: explicit hook events and matcher rules.

Our adaptation:

- Treat hook events as a controlled vocabulary.
- Use a validator to block dangerous commands.

---

## Source: docs/references/ai-agents/HLT_AGENT_OS_THESIS.md

# HLT Agent Operating System Thesis

Status: active thesis memo
Last updated: 2026-03-07

## Thesis Statement

HLT is not trying to win by building frontier models. HLT is trying to win by building a principal-centered agent operating system that makes already-powerful AI dramatically more useful through better context architecture, memory, routing, use-case packaging, evaluation loops, observability, and multi-runtime execution. Katailyst is the control plane for that system.

It is no longer enough to describe Katailyst as "just a registry." The registry matters, but only because it powers long-lived agents that can discover the right blocks, equip themselves intelligently, collaborate with each other, learn over time, and turn AI capability into repeated business leverage.
## The Core Thesis In One Sentence HLT should have a sidecar AI operating system where a few persistent, deeply-contextual agents share one strong common base, start from compact but rich orientation surfaces, choose from broad weighted menus of reusable blocks, collaborate through explicit packets and traces, and compound in usefulness through memory, lessons, and nightly improvement. ## What HLT Is Actually Building At the highest level, HLT is building five layers at once. ### 1. Principal-assistant layer Important humans should have long-lived agents that: - understand who they serve - understand how that person works - act like a trusted right hand - learn over time - proactively look for ways to help This is the layer where the eventual "best assistant in work and life" behavior lives. ### 2. Domain-steward layer Some agents are not merely personal assistants. They are stewards of a domain: - Victoria: registry, discovery, orchestration, canon stewardship - Julius: ops, planning, prioritization, meeting prep, execution clarity - Lila: marketing, content, multimedia, campaigns, growth These agents still serve real humans, but their default responsibility is domain leadership. ### 3. Reusable capability layer The compounding system is made of reusable atomic units: - skills - tools - KB items - prompts - schemas - playbooks - bundles - recipes - rubrics - actions - automations This is how HLT avoids restarting from zero every time. ### 4. Runtime and integration layer The same persistent identities should be able to act through different adapters: - OpenClaw / Render - Slack - page-embedded assistants - browser contexts - phone-like surfaces - Claude Code / Codex - MCP clients - connected-account systems and sidecars The runtime changes. The canonical truth does not. ### 5. 
Memory, trace, and improvement layer Every important task should leave behind: - saved outputs - run events - handoff history - lessons learned - candidate skill/doc improvements - principal memory updates Without that layer, the system stays impressive but shallow. ## Ontology These distinctions are mandatory. The system gets muddy fast if they collapse. | Thing | What It Means | | ------------------ | -------------------------------------------------------------------------------- | | Agent | A long-lived persona with owner, memory, proclivities, authority, and continuity | | Subagent | A temporary delegated worker with isolated context for one task or subtask | | Skill | A reusable method for how to think, decide, or execute | | Tool | An executable capability, provider, API, sidecar, or integration | | KB | Durable reference truth | | Playbook | An ordered workflow | | Bundle | A reusable context pack | | Memory | Distilled durable facts, preferences, lessons, and continuity notes | | Trace / run events | Episodic history of what happened during execution | The long-lived HLT personas are `agents`. The one-shot task workers that help inside a run are `subagents`. That distinction matters because HLT is not building a bag of temporary helpers. It is building an organization of persistent AI workers with continuity. 
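One way to keep the agent/subagent distinction from collapsing in downstream code is to encode it as a type-level discriminant. A sketch with assumed field names, mirroring the ontology table:

```typescript
// Long-lived persona: owner, memory, continuity.
interface Agent {
  kind: "agent";
  name: string;
  owner: string;       // the human principal served
  memoryStore: string; // durable memory location (illustrative)
}

// Temporary delegated worker: isolated context, one task, no continuity.
interface Subagent {
  kind: "subagent";
  parentRunId: string;
  task: string;
}

type Worker = Agent | Subagent;

// Only long-lived agents carry continuity across runs.
function hasContinuity(w: Worker): w is Agent {
  return w.kind === "agent";
}
```

Making the distinction a discriminant means code that needs continuity (memory writes, lesson distillation) cannot accidentally accept a subagent.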
## What This System Is Not

The target system is not:

- a prompt library
- a CMS with AI sprinkled on top
- a bag of one-off Claude Code helpers
- a single omnipotent monolithic agent
- a deep hard fork of OpenClaw
- a rigid orchestration tree with no agent judgment

The target system is:

- one strong shared cognitive base
- a few persistent core agents with clear roles
- a registry/control plane that helps them equip themselves
- a few flagship parent entry surfaces plus broad weighted menus of reusable blocks
- a memory and lessons system that compounds
- a visible operating layer that feels alive, legible, and trustworthy

## Role Model

### Shared-base principle

The correct split is:

- 80% shared base
- 20% role overlay

The way to maximize capability is not to give every agent a different brain. The way to maximize capability is to give them all a very strong common foundation, then let overlays change:

- what they pull first
- what they go deepest on
- what they own canonically
- what kinds of proactive opportunities they hunt for

### Victoria

Victoria is the control-plane assistant and quartermaster. She is responsible for:

- registry discovery
- equipping other agents
- canon stewardship
- infra and orchestration guidance
- final structural registry publishing
- system-level quality and learning

Victoria should be registry-literate, system-literate, and high-context. She should not be the only agent who understands the registry. She should be the one who finalizes structural canon.

### Julius

Julius is the chief-of-staff and execution agent. He is responsible for:

- planning
- prioritization
- follow-through
- meeting prep and briefings
- operational synthesis
- proactive "what should happen next?" work

Julius should be strong at turning ambiguity into plans, and plans into coordinated action.

### Lila

Lila is the marketing and growth studio lead. She is responsible for:

- articles
- campaigns
- social
- multimedia packaging
- messaging strategy
- creative iteration

Lila should share the same strong research, planning, and quality base as the others, but pull marketing and multimedia overlays first.

### Non-fleet surfaces currently present in the repo

The repo currently also exposes:

- `system-primer` as a repo-local execution primer rather than a persistent persona
- `ares` as an unresolved mirror/runtime row that should not be normalized as accepted fleet truth while it still advertises the wrong model/runtime assumptions
- Devin as an external integrated engineering agent
- Atlas, Ivy, Lucy, Nova, Quinn, and Rex as historical plugin/export residue that may still be useful as patterns, but should not be treated as active OpenClaw agents

These surfaces matter as repo reality, but they are not the active persistent HLT fleet. The active persistent OpenClaw fleet is:

- Victoria
- Julius
- Lila

OpenClaw agents and subagents are different classes.

- OpenClaw agents are long-lived runtime personas with continuity.
- Subagents are temporary delegated workers used inside a task.

Historical export residue should not be quietly promoted into either class.

### Principal assistant vs steward

The long-term target architecture is:

- principal-facing right-hand assistants as their own persistent agent layer
- domain stewards as their own persistent agent layer

Today, some of that is still partially fused in the current runtime truth. The target is to separate it cleanly over time. That means the right-hand assistant for Alec should not permanently remain identical to the control-plane librarian. The system should support both:

- a principal-facing right hand
- a domain-steward / quartermaster

## Layering Model

The system should not be grouped by only one axis. It should be layered like this:

### Layer 1. Shared foundation

All persistent HLT agents inherit:

- the common operating contract
- common lessons discipline
- common quality rules
- common discovery/equipping expectations

### Layer 2. Principal layer

The agent loads who it serves:

- who this person is
- what they value
- how they like to work
- what delights and frustrates them
- what "great assistance" means for them

### Layer 3. Role or department overlay

The agent loads role bias:

- ops
- marketing
- registry
- support
- finance
- multimedia

### Layer 4. Task or parent-capability layer

The agent loads the relevant task or parent-capability surface:

- meeting prep
- article
- social
- multimedia
- registry discovery
- skill creation
- proactive daily attempt

### Layer 5. Runtime adapter

The same agent identity can execute through:

- OpenClaw
- Slack
- page-embedded assistant
- browser
- mobile-like surface
- MCP
- repo-local tools

The adapter is not the identity.

## Routing Model

This is the right execution spine for meaningful work:

1. Load always-on shared base.
2. Load front-door SOP and principal core layer.
3. Interpret the underlying request.
4. Estimate complexity, stakes, and delivery type.
5. Decide how much depth the situation deserves and whether local action, self-discovery, or Victoria help is the right move.
6. Do as much research and planning as the situation deserves.
7. Load a weighted, explainable tray of skills, KBs, tools, bundles, playbooks, rubrics, prompts, and examples. Use one block or many as fits.
8. Execute.
9. Judge, revise, or escalate.
10. Save outputs, traces, memory deltas, and lessons.
11. Distill nightly into durable memory and proactive next actions.
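
The execution spine can be compressed into a skeleton. This is a sketch only: every function name and the scoring rule are illustrative placeholders, not real registry APIs.

```typescript
// Illustrative skeleton of the execution spine. All names and the
// depth-scoring heuristic are placeholders, not real APIs.

interface Interpretation {
  complexity: "low" | "medium" | "high";
  stakes: "low" | "medium" | "high";
}

// Steps 4-7: scale research, planning, and tray size with the situation.
function depthFor(i: Interpretation): number {
  const score = { low: 0, medium: 1, high: 2 };
  return 1 + score[i.complexity] + score[i.stakes]; // rough block count
}

async function run(request: string) {
  // 1-2. Load always-on shared base and the principal core layer (elided).
  // 3-4. Interpret the underlying request; estimate complexity and stakes.
  const interp: Interpretation = { complexity: "medium", stakes: "high" };
  // 5-7. Load a weighted tray of blocks sized to the stakes.
  const trayBlocks = depthFor(interp);
  // 8-11. Execute, judge/revise, save outputs and traces, distill (elided).
  return { request, trayBlocks };
}
```

The one design point the sketch keeps is that depth is an explicit, explainable output of interpretation, never an implicit default.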
This implies several non-negotiables:

- every meaningful task gets interpreted before action
- complexity is estimated explicitly
- medium and large tasks require research and planning
- discovery/equipping is a first-class step
- agents may use one block or many depending on the task
- every important result creates durable aftereffects

## Menus Over Routes

The operating system should bias toward intelligent composition, not rigid forced routing.

- keep the entrance surface compact and concrete
- let strong parent capabilities carry the most common jobs
- let the agent choose one block or many blocks depending on the situation
- treat playbooks as accelerants for proven patterns, not mandatory tunnels
- use Victoria as a deeper expert and quartermaster, not a choke point
- preserve observability through traces, lessons, and review rather than pre-controlling every grain of execution

The point of the registry is not to force the same exact meal every time. The point is to make a much wider field of high-quality choices easy for a smart agent to see, explain, and use well.

## The Crown-Jewel Missing Capability

The most important missing capability is not another flashy skill. It is a brutally good shared method that does all of this:

- interpret the underlying request
- estimate complexity and stakes
- determine whether a direct answer is acceptable
- decide whether deeper principal context is needed
- decide whether Victoria should be consulted
- decide what menu or tray of blocks is worth loading and whether Victoria should help curate it
- decide when research and planning are mandatory

If this layer is weak, the agents:

- over-plan easy work
- under-plan important work
- skip discovery
- improvise when they should equip
- miss hidden intent
- miss the actual business leverage in the request

## User Doctrine

`USER.md` should not be a giant landfill of memory. It should be a layered index.
### Always-loaded principal core

Every principal-facing agent should always read a compact `PrincipalCoreCard` that includes:

- who this person is
- role
- top values
- working style
- communication preferences
- delight triggers
- frustration triggers
- current priorities
- pointer map to deeper branches

### Deeper principal branches

The deeper branches should be retrieved only when relevant:

- work style and decision preferences
- schedules and rhythms
- family and personal context
- goals and obsessions
- hobbies and identity signals
- wow opportunities
- lessons learned about serving this principal
- relationship context

This is not simplification. This is intelligent compression. The agent should always remember the person. It should only load deeper detail when the task needs it.

## Doc-Level Interfaces

These interfaces need to be stable enough that future implementers do not invent them ad hoc.

### `PrincipalCoreCard`

Fields:

- `principal_ref`
- `role`
- `values`
- `communication_style`
- `working_style`
- `delight_triggers`
- `frustration_triggers`
- `current_priorities`
- `pointer_map`

### `InterpretationResult`

This is a heuristic judgment aid, not a rigid gate.

Fields:

- `principal_ref`
- `domain`
- `request_type`
- `complexity`
- `stakes`
- `research_required`
- `planning_required`
- `output_type`
- `needs_victoria`

### `CapabilityPacket`

`CapabilityPacket` remains the interface name, but it should be understood as a weighted, explainable tray of strong options rather than a single prescribed route.

Fields:

- `principal_ref`
- `request_summary`
- `suggested_skills`
- `suggested_tools`
- `suggested_kb`
- `suggested_playbooks`
- `load_order`
- `reasons`
- `auth_requirements`
- `caveats`
- `confidence`

### `DelegationPacket`

This is a context handoff, not a central-planner domination surface.

Fields:

- `origin_agent`
- `target_agent`
- `principal_ref`
- `objective`
- `current_state`
- `selected_org`
- `runtime_lane`
- `capability_packet`
- `done_condition`
- `output_contract`

### `NightlyDistillationPack`

Fields:

- `principal_memory_updates`
- `lessons_learned`
- `proactive_ideas`
- `backlog_updates`
- `candidate_skill_improvements`
- `candidate_doc_improvements`

## Flagship Surfaces To Perfect

HLT should not spread effort evenly across dozens of shallow surfaces. It should go very deep on a few flagship parent entry surfaces while still letting agents pull many other blocks beneath them.

### First tier

- principal / user layer
- interpret-underlying-request
- Victoria discovery and equipping
- write plans
- execute plans
- lessons and memory distillation
- capability forge
- meeting prep

### Second tier

- create multimedia
- make article and ship to Framer
- make social and campaign calendar
- chief-of-staff ops synthesis

Everything else should branch off those or complement them. The system should not keep adding top-level brains that crowd out agent judgment.

## Memory, Trace, and Improvement Model

The system should save more than outputs. It should save:

- run events
- parent/child handoffs
- capability selection traces
- failures
- retries
- lessons
- proactive opportunities

The purpose is not just dashboards. The purpose is compounding. Agents need durable memory and visible history so they stop repeating mistakes and start improving like real teammates.

## Agent-to-Agent Communication

Agent-to-agent communication should not be SSH-first. SSH is for:

- operator inspection
- maintenance
- debugging
- emergency intervention

Real inter-agent work should use structured packets and shared observable state. That means:

- explicit delegation packets
- parent/child task relationships
- run events
- current-state summaries
- output contracts
- resume semantics

This is what makes later replay, story logs, and cross-agent visibility possible.
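
The `DelegationPacket` field list can be read as a type directly. In this sketch the field names come from the doc, but every value type and the example values are assumptions.

```typescript
// Sketch of DelegationPacket built from the doc's field list.
// Field names are from the doc; value types are assumptions.

interface DelegationPacket {
  origin_agent: string;
  target_agent: string;
  principal_ref: string;
  objective: string;
  current_state: string;
  selected_org: "hlt" | "system";               // assumed org union
  runtime_lane: string;
  capability_packet?: Record<string, unknown>;  // the weighted tray, if curated
  done_condition: string;
  output_contract: string;
}

// Hypothetical example: Julius hands a campaign task to Lila.
const packet: DelegationPacket = {
  origin_agent: "julius",
  target_agent: "lila",
  principal_ref: "justin",
  objective: "Draft launch-announcement social variants",
  current_state: "Brief approved; assets pending",
  selected_org: "hlt",
  runtime_lane: "openclaw",
  done_condition: "Three variants staged for review",
  output_contract: "Markdown drafts plus asset refs",
};
```

Because the packet carries objective, current state, done condition, and output contract together, the receiving agent can resume or replay the handoff without the originator in the loop.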
## Aquarium As Supporting Metaphor

The aquarium / live-home / Sims-like idea should stay in the vision, but as a supporting metaphor and future UX expression, not as the core ontology. What it really means is:

- persistent agents with visible lives
- run-event stories instead of raw JSON blobs
- handoffs and collaboration visible over time
- homes, habits, and continuity
- proactive work surfacing in an engaging way

That only works if the underlying system is real:

- durable run events
- clean handoff packets
- memory distillation
- capability selection traces
- proactive daily loops

The aquarium is not the foundation. It is the visible, entertaining surface built on top of a serious event and memory model.

## 2028 Thesis

By April 2028, the biggest advantage will not be slightly better prompting. The biggest advantage will be:

- stronger context architecture
- better memory quality
- stronger tool and integration reach
- connected-account readiness
- better evaluation loops
- better observability
- better workflow packaging
- better trustful human-facing interaction design

That is HLT's moat:

- best user understanding
- best business-specific context packaging
- best discovery and equipping
- best playbook and skill design
- best lessons-learned loop
- best visible, useful operating layer

## Build Sequence

### Wave 1. Unify the shared base

- one common agent foundation
- one common operating contract
- one common lessons discipline
- one common fallback standard

### Wave 2. Perfect the interpretation layer

- interpret request
- estimate complexity
- decide when to research
- decide when to plan
- decide when to call Victoria
- decide what to load

### Wave 3. Deepen the principal layer

- principal core card
- layered principal branches
- principal-right-hand method
- wow opportunities
- nightly distillation

### Wave 4. Perfect the flagship parent capabilities

- create skill
- create multimedia
- make article
- make social
- meeting prep
- registry discovery and equipping
- brainstorming

### Wave 5. Normalize handoffs and traces

- run events
- capability trays (`CapabilityPacketV1`)
- delegation packets
- replay summaries
- lessons extraction
- proactive daily attempt

### Wave 6. Fix integration and connected-account UX

- Slack
- MCP onboarding
- connected accounts
- Pipedream-style auth
- Framer draft-safe publish path
- multimedia and finance sidecars

### Wave 7. Make the system feel alive

- story logs
- replay UI
- visible cross-agent work map
- page-embedded assistants
- aquarium and agent-home surfaces

## GSD Implications

This thesis reinterprets the active roadmap as follows:

- Phase 49 = clarify agent-doc stacks and the principal-assistant vs steward split
- Phase 50 = Victoria-like quartermaster discovery through weighted, explainable `CapabilityPacketV1` menus
- Phase 51 = layered context packets with principal core and conditional deeper branches
- Phase 52 = explicit delegation, shared observable run state, and replayable handoffs
- Phase 54 = nightly distillation, lessons, evals, and build-measure-learn loops
- Phase 68 = packaged use-case menus and clearer operator entry surfaces
- Phase 70 = visible replay/story surfaces plus embedded page-level agent utility

## Final Compression

The system HLT wants is a principal-centered, multi-agent operating system where:

- a few persistent agents share one strong base
- each agent has thin role overlays instead of a separate brain
- Victoria equips and governs canon without becoming the only intelligent agent
- deeper principal context is layered and retrieved intelligently
- requests are interpreted before action
- research and planning scale to stakes
- skills, tools, KBs, bundles, and playbooks are the compounding reusable layer
- agents can use one block or many depending on the situation
- packets and menus support judgment instead of replacing it
- runs, lessons, and proactive ideas are saved and distilled
- the visible product eventually feels like an alive, legible, trustworthy AI organization

That is the thesis.

---

## Source: docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md

# Hosted Agent Core Setup

Purpose: give Victoria, Julius, and Lila one shared setup reference for MCP access, vault posture, and the highest-leverage tool cohort without scattering setup truth across multiple mirrors. This is a shared hosted-agent reference, not a replacement for the ops guides or the 8 steering files.

## What Is Core

The hosted trio should share one strong access base:

- Katailyst remote MCP is the default hosted-agent control-plane entry.
- Trusted hosted agents should default to the full catalog or explicitly set `agent,delivery`.
- `bootstrap` is the intentionally narrow first-glance toolset, not the normal hosted-trio posture.
- Supabase MCP is the direct canonical DB read lane.
- Vault-backed credentials are the only sanctioned secret path.

Everything else builds on top of that base.

## Default Retrieval Depth

Hosted agents should default to using more tools, skills, and surrounding context than first instinct suggests.

- For almost every meaningful task, use at least one or two tools or skills.
- For medium-complexity work, expect to compose several.
- For high-stakes or ambiguous work, it is normal to pull five, ten, or more surfaces together.
- The operative question is not "do I need tools or skills?" It is "how much digging does this task deserve?"

Underuse is the common failure mode. Agents should assume they need to dig around before acting, then scale the depth with the stakes and complexity.
## First-Class MCP Surfaces

| Surface | Role | Default posture |
| --- | --- | --- |
| Katailyst MCP | hosted-agent control plane for discovery, graph reads, packeting, and canonical tool execution | default to the full catalog or `agent,delivery`; use `bootstrap` only when intentionally narrowing |
| Supabase MCP | direct canonical DB query/read lane | use when deeper direct reads are required |
| Render MCP | hosted Render infra lane for service inventory, logs, metrics, and workspace-level inspection | optional, use when runtime topology, deploy state, or service health is the question |
| Katailyst local registry stdio | repo-local development and debugging path | local only, not the default hosted-agent runtime story |
| Multimedia Mastery MCP | remote media execution surface | optional, use when the task needs media generation or transformation |
| Notion MCP | interactive Notion workspace lane | optional, use when OAuth-capable client workflows are appropriate |

Katailyst hosted MCP:

- URL: `https://www.katailyst.com/mcp`
- Default hosted-agent toolset header: `x-katailyst-toolset: agent,delivery` (or omit the header entirely for the same full-catalog posture)
- Default auth posture: bearer token or OAuth depending on client/runtime

Supabase MCP:

- URL: `https://mcp.supabase.com/mcp?project_ref=exuervuuyjygnihansgl`
- Role: canonical DB reads, deeper registry inspection, and direct query/debug work

Render MCP:

- URL: `https://mcp.render.com/mcp`
- Auth: `Authorization: Bearer ${RENDER_API_KEY}`
- Role: hosted Render inventory, logs, metrics, deploy history, and Render-managed database inspection
- Limitation: does **not** replace SSH or a verified mount for `/data/workspace` file-truth audits

## Default Hosted-Agent Flow

1. Connect to Katailyst MCP with no toolset header or with `agent,delivery`.
2. Call `registry.capabilities`.
3. Build a packet with `registry.agent_context`.
4. Expand with `discover`, `get_entity`, `traverse`, and `registry.graph.summary` as needed.
5. Use `tool.describe` and `tool.execute` for canonical tool families.
6. Use Supabase MCP only when the task needs direct canonical DB reads beyond the control-plane packet.
7. Drop down to `bootstrap` only when the runtime or client intentionally wants the smaller first-glance branch.

## Vault Posture

- Secrets live in Vault, not in repo files, docs, mirrors, or exported packs.
- Repo surfaces may carry metadata pointers such as `auth_secret_key`, never secret values.
- Hosted agents should prefer vault-backed execution surfaces over ad hoc provider auth handling.
- If a workflow depends on credentials, verify the vault-backed lane first.

High-value vault-backed families in the current hosted-agent stack:

- Tavily
- Firecrawl
- Render MCP (`render/api-key`)
- CodeSandbox
- Relevance
- Multimedia Mastery
- AgentMail (per-agent inbox-scoped email — Victoria, Julius, Lila each have their own inbox and vault key)
- delivery/publishing surfaces that expose canonical tool refs

See:

- [Vault + Tool Execution](../contracts/VAULT_TOOL_EXECUTION.md)
- [MCP Tools Reference](../../api/MCP_TOOLS_REFERENCE.md)
- [Render MCP Integration Contract](../integrations/INTEGRATION_CONTRACT_RENDER_MCP.md)

## Core Tool Cohort

These are the shared high-value tools the hosted trio should treat as first-class:

- Katailyst MCP
- Supabase MCP
- Tavily
- Firecrawl
- CodeSandbox
- Relevance
- Multimedia Mastery
- AgentMail (per-agent email inboxes for outbound sends, inbox polling, and human handoff)
- Linear when ops follow-through or coordination matters

Role-specific emphasis still differs:

- Victoria pulls hardest on registry, infra, and packaging lanes.
- Julius pulls hardest on planning, meetings, operations, and follow-through lanes.
- Lila pulls hardest on content, campaigns, multimedia, and audience-facing packaging lanes.

## Notion Is Optional, Not Mandatory Base

Notion is a first-class integration lane, but not mandatory startup infrastructure for every hosted-agent turn. Two distinct paths exist:

- Notion MCP: official hosted MCP server, OAuth 2.0 authorization code flow with PKCE, best for interactive client-style workflows.
- Notion REST: bearer-token path backed by `notion/api-token` in Vault, appropriate for automation workflows that intentionally use the REST API.

Use Notion MCP when the runtime/client can support the OAuth flow cleanly. Use Notion REST when the workflow is automation-oriented and the vault-backed bearer token path is the right fit.

See:

- [Notion Integration Guide](../../../.claude/kb/curated/global/cms/notion-integration-guide/KB.md)
- [Notion MCP Docs](https://developers.notion.com/docs/mcp)
- [Build an MCP Client for Notion](https://developers.notion.com/docs/build-an-mcp-client)

## Authoritative References

- [Agent Quick Start](../../QUICK_START_AGENTS.md)
- [Core Agent Shared Foundation](./CORE_AGENT_SHARED_FOUNDATION.md)
- [Agent Doc Map](./AGENT_DOC_MAP.md)
- [MCP Tools Reference](../../api/MCP_TOOLS_REFERENCE.md)
- [MCP Setup (Quick Start section 4a)](../../QUICK_START_AGENTS.md)
- [Vault + Tool Execution](../contracts/VAULT_TOOL_EXECUTION.md)
- [Integration Contract - Multimedia Mastery](../integrations/INTEGRATION_CONTRACT_MULTIMEDIA_MASTERY.md)

## What This Doc Should Prevent

- treating Supabase MCP as the whole hosted-agent setup story
- treating Render MCP like live disk access when `/data/workspace` truth still requires SSH or a verified mount
- scattering MCP auth and vault rules across multiple thin mirrors
- making Notion look like mandatory base infrastructure when its auth posture is different
- mixing runtime sync work with channel UX work

---

## Source: docs/references/ai-agents/MULTIMEDIA_CAPABILITY_REFERENCE.md

# Multimedia Capability Reference
This is the shared capability annex for agent-facing multimedia work. Use it when the request is about making, editing, or animating visual assets and the agent needs to choose the right execution path instead of guessing from provider names.

## Parent Job

Treat `Create Multimedia` as the parent job. The user-facing job is usually one of these:

1. Generate an image
2. Edit or transform an existing image
3. Turn an image into a video
4. Package the result into a deliverable, handoff, or published surface

Provider names stay secondary unless they materially change the choice.

## Current Proven Subpaths

### 1. Image prompt writing

Use when:

- the request is creative but under-specified
- visual quality matters
- the agent needs to tighten composition, style, framing, lighting, or subject direction before generation

Primary support:

- `skill:image-prompting@v1`

Good plain-English examples:

- "Make a hero image for this report."
- "Make this feel premium and medical, not generic stock."
- "Rewrite the prompt so the output matches our brand and product audience."

### 2. Image generation

Use when:

- the user needs a new still image
- the prompt is already clear enough to execute
- the job is hero art, social art, portrait, product shot, diagram-style image, or campaign visual

Primary support:

- `skill:fal-image-gen@v1`
- `skill:openai-image-gen@v1` as a secondary model/provider lane when appropriate

Good plain-English examples:

- "Generate a brain image for Slack."
- "Create an upgrade-screen hero image for NCLEX Mastery."
- "Make a clean leadership-briefing visual for a board memo."

### 3. Image editing / transformation

Use when:

- the user already has an image
- the job is to revise it, not start from scratch
- the requested change is stylistic, compositional, brand, or scenario-based

Primary support:

- parent multimedia lane
- prompt-writing support first if the edit request is underspecified
- execution through the active image-edit-capable provider/tooling lane

Good plain-English examples:

- "Edit this image to match our brand."
- "Turn this into a tie-dye zombie head at the Grand Canyon."
- "Make this more premium and less cartoonish."

### 4. Image-to-video

Use when:

- the user wants motion from an existing still
- the deliverable is a short animated clip, teaser, or motion asset

Primary support:

- current image-to-video execution lane
- `skill:video-frames@v1` where frame planning or video structure support is needed

Good plain-English examples:

- "Turn this still into a short motion clip."
- "Animate this hero image for social."
- "Make this image feel alive without changing the subject."

### 5. Packaging and handoff

Use when:

- the media result needs to be embedded in a report, article, social post, meeting packet, or asset workflow
- the user needs a deliverable, not just a file

Typical next step surfaces:

- article workflow / Framer draft-safe packaging
- social workflow / scheduling-distribution lane
- Assets storage and handoff
- meeting-prep or executive-briefing packaging

For article and Framer packaging, Cloudinary should be the preferred hosted-media layer so the bundle keeps reusable URLs, public IDs, aspect ratios, transformation recipes, gallery rows, and downstream variant derivation.

## Research-First Rule

Do not jump straight to generation when the request is strategically important, ambiguous, or audience-sensitive.
Research-first cases include:

- brand-sensitive campaign visuals
- product or upgrade-screen visuals
- executive or investor-facing assets
- article or social assets that need audience resonance, not just aesthetics
- any case where the prompt itself is the main bottleneck

In those cases:

1. understand audience and objective
2. inspect existing brand/product context
3. research strong patterns when needed
4. write or refine the prompt
5. generate or edit
6. package the output

## Recovery And Fallback Standard

Do not stop at:

- "could not access"
- "could not execute"
- "I can't do that"

If the direct path is blocked, the agent should still do all of this:

1. name the blocked capability precisely
2. state what is still available right now
3. offer the nearest useful fallback
4. explain the next best action in plain English

Bad:

- "I could not access the tool."

Good:

- "The direct image-to-video lane is unavailable right now. I can still tighten the prompt, generate the strongest still frame, and hand you an execution-ready video brief or retry as soon as the video lane is back."

Good:

- "I do not currently have authenticated access to Multimedia Mastery, but I can still complete prompt design and image generation through the proven fal lane."

## Agent Emphasis

### Julius

Julius should understand multimedia as an adjacent operations and briefing capability. He should be able to:

- route image generation
- route image editing
- route image-to-video
- explain what each lane is for
- package results into ops-ready or executive-ready handoffs

He should not answer with vague dead ends when the request is concrete and visual.

### Lila

Lila should treat multimedia as a core lane. She should be strongest at:

- creative direction
- marketing visuals
- social/media variants
- campaign-oriented prompt improvement
- choosing between still, edit, and motion branches

### Victoria

Victoria should treat multimedia as a system-level orchestration lane. She should be strongest at:

- routing to the right capability
- loading the right supporting skills/KB
- preserving provenance and handoff quality
- making sure outputs land in the right downstream asset or workflow surface

## Current Truth About Multimedia Mastery

Multimedia Mastery is a media-native sidecar and should remain a secondary execution/integration surface until authenticated verification is proven with a valid token. Use its public discovery surfaces for capability discovery now:

- `/api/media/v1/capabilities`
- `/api/media/v1/tools`
- `/api/media/v1/mcp`

But do not frame it as the only or primary proven media lane until authenticated execution is verified.

---

## Source: docs/references/ai-agents/SYSTEM_ONE_PAGER.md

# How This System Works — The One-Pager

Read this first. Everything else is detail.

---

## The Mental Model

There are three different kinds of agents that interact with this system, and they have **completely different relationships** to it. If you confuse them, you'll misunderstand everything else.

### Kind 1: IDE Agents (you, if you're reading this from inside the repo)

You're an agent running in Claude Code, Codex, Cowork, or a similar IDE session, working **inside the Katailyst repo itself**. You're a builder — you maintain the registry, write code, curate entities, fix bugs, run scripts, and improve the system.

Your context is the repo. Your entrypoint is `AGENTS.md`. You read `.claude/agents/` for agent identity files, `.claude/skills/` for local skill mirrors, and `docs/` for architecture docs. You connect to the DB via the local MCP server defined in `.mcp.json` or `.codex/config.toml`.

**You are not Victoria, Julius, or Lila.** You don't have their steering files, their runtime disk, their persistent memory, or their principals. Don't treat their docs as your docs.

### Kind 2: The Hosted Fleet (Victoria, Julius, Lila)

These are persistent agents running on **Render via OpenClaw** — a completely different runtime from this repo.
They live on a persistent disk at `/data/workspace` with 8 steering files that are **OpenClaw disk files, not repo files**:

| File | Purpose | Lives on |
| --- | --- | --- |
| `AGENTS.md` | Startup contract and coordination | Render disk (also exists in repo, but different role) |
| `SOUL.md` | Identity, taste, personality | Render disk only |
| `USER.md` | Principal and relationship context | Render disk only |
| `IDENTITY.md` | Service identity and runtime facts | Render disk only |
| `TOOLS.md` | Tier-1 tools and vault posture | Render disk only |
| `BOOTSTRAP.md` | Restart and recovery | Render disk only |
| `HEARTBEAT.md` | Cadence and monitoring | Render disk only |
| `MEMORY.md` | Durable truths and archived decisions | Render disk only |

These 8 files are a **small steering compass**, not the full playbook. They quickly orient the agent, then route it back into the deeper shared canon in the registry DB.

The hosted fleet connects to Katailyst via the **OpenClaw plugin**, which wraps the MCP endpoint at `https://www.katailyst.com/mcp`. The plugin registers tools like `katailyst_registry_discover`, `katailyst_registry_agent_context`, etc. — these are wrappers that translate to MCP calls under the hood.

The repo contains **mirrors** of the fleet's identity docs at `.claude/kb/curated/global/ai-engineering/` (e.g., `victoria-identity-soul/KB.md`). These mirrors exist for review and authoring. They are NOT what the agents actually read at runtime — the agents read from the Render disk and from the registry DB.

### Kind 3: External / Community Agents

Any agent not in the first two categories. Could be a LangChain app, a custom orchestrator, a Claude desktop session connected via OAuth, or anything else. They connect to the MCP or API directly and consume whatever they need. Their entrypoint is `CATALYST.md`.
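
An external client reaches the registry over plain HTTP. The sketch below builds a JSON-RPC `tools/list` request for the MCP endpoint; the URL, bearer-auth posture, and `x-katailyst-toolset` header come from these docs, while the helper function itself is hypothetical.

```typescript
// Hypothetical request builder for a community MCP client.
// Only the URL, bearer auth, and toolset header come from the docs;
// the function and its shape are illustrative.
const MCP_URL = "https://www.katailyst.com/mcp";

function buildToolsListRequest(token: string, toolset?: string) {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${token}`,
  };
  // Omitting the header keeps the full-catalog posture.
  if (toolset) headers["x-katailyst-toolset"] = toolset; // e.g. "agent,delivery"
  return {
    method: "POST" as const,
    headers,
    // MCP speaks JSON-RPC 2.0; `tools/list` enumerates the tool surface.
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" }),
  };
}

// Usage (network call elided):
// const res = await fetch(MCP_URL, buildToolsListRequest(process.env.KATAILYST_TOKEN!));
```

Calling `tools/list` first also honors the tool-naming guidance later in this one-pager: use the names the server actually returns for your client.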
---

## What Katailyst Actually Is

Katailyst is a **registry and performance lab** for composable AI building blocks. Think of it as an armory — it stores, organizes, discovers, and serves atomic units that agents use to do their work.

It runs on three layers:

| Layer | Technology | What it does |
| --- | --- | --- |
| **Database** | Supabase (PostgreSQL) | Stores all entities, links, tags, evals, traces. **This is canonical.** |
| **Application** | Next.js on Vercel | Web CMS, API routes, MCP server, discovery engine, tool execution |
| **Repo** | GitHub | Code, scripts, mirrors for portability and development. **Not canonical.** |

The database is truth. The app is how you access it. The repo is a workspace for building the app and reviewing mirrors.

---

## The 1,377 Building Blocks

Everything in the registry is an **atomic unit** — a composable block with a type, code, version, tags, links, and content.

| Type | What it is | Example |
| --- | --- | --- |
| `skill` | Reusable agent capability with a SKILL.md launcher | `make-article`, `meeting-prep` |
| `tool` | HTTP endpoint with execution spec and auth | `tavily.search`, `cloudinary.transform` |
| `kb` | Curated knowledge base entry | `content-creation-philosophy`, `global-catalyst-atlas` |
| `prompt` | Versioned instruction record | System prompts, task templates |
| `recipe` | Multi-step composition pattern | How to combine skills + tools for a workflow |
| `content_type` | Output format definition | Article, social post, QBank question |
| `style` | Voice, visual, or formatting guidance | Brand voice, visual systems |
| `bundle` | Curated collection of related units | `blog-writing-kit` |
| `playbook` | Workflow accelerator | Multi-step guides for complex tasks |
| `schema` | Data shape definition | Input/output contracts |
| `channel` | Distribution surface | Telegram, Slack, web |
| `agent` | Registered agent identity | `victoria`, `julius`, `lila` |
| `rubric` | Quality evaluation framework | How to judge article quality |
| `metric` | Quality measurement definition | What "good" looks like quantitatively |
| `lint_rule` | Content quality rule | Checks for common mistakes |

Units connect to each other via **9,760+ weighted directed links**: `requires`, `recommends`, `uses_kb`, `pairs_with`, `bundle_member`, `related`, etc. These connections are how the graph carries structural intelligence.

---

## How Discovery Works

This is the core pattern all agents should follow for non-trivial tasks:

**1. Ask with intent.** Call `discover` with a rich natural language description. Include audience, domain, format, and purpose. The richer the query, the better the results.

**2. Read the ranked results.** Each hit comes with `match_reasons` — text match, embedding similarity, tier ranking, bundle membership, agent affinity. This is a menu, not a route.

**3. Walk the graph.** Call `traverse` on top hits to find structurally related entities. A skill might `require` a schema, `recommend` a style, or `use_kb` from a knowledge base entry.

**4. Compose your toolkit.** Pick the blocks you need. Typical: 1-3 skills, 1-2 KB items, a style, maybe a tool. Complex work: 10+ blocks. More is usually better than less.

**5. Execute with judgment.** Katailyst suggests equipment. The agent decides how to use it. No predetermined sequences.

**6. Write back.** Store the output, log a trace, return quality signals. This makes the system smarter over time.

---

## The MCP Tool Surface (36 tools)

Agents connect to the MCP at `https://www.katailyst.com/mcp`. Auth is Bearer token or OAuth. The tools are grouped into families:

| Family | Tools | Status |
| --- | --- | --- |
| **Meta/Session** | `registry_capabilities`, `registry_health`, `registry_session`, `toolset_guide`, `registry_graph_summary` | Working |
| **Discovery** | `discover`, `registry_search`, `registry_expand`, `search_tags` | Working |
| **Registry Read** | `get_entity`, `registry_get`, `list_entities`, `get_skill_content`, `registry_artifact_body`, `incoming_sources_list`, `incoming_sources_get` | Working |
| **Graph** | `traverse`, `registry_graph_summary` | Working |
| **Agent Context** | `registry_agent_context` | Working |
| **Tool Execution** | `tool_search`, `tool_describe`, `tool_execute` | Describe works, execute being repaired |
| **Registry Write** | `registry_create`, `registry_update`, `registry_add_revision`, `registry_link` | Working |
| **Lists** | `lists_create`, `lists_add_item`, `lists_get`, `lists_vote`, `lists_publish` | Working |
| **Delivery** | `delivery_connect_link_create`, `delivery_targets_discover`, `delivery_targets_list`, `delivery_target_promote` | Working |

**Tool naming:** The server registers both underscored (`registry_capabilities`) and dotted (`registry.search`, `registry.expand`, `registry.get`) aliases. The dotted aliases are convenience shorthands that resolve to the same handlers. Use the names returned by `tools/list` for your client.

---

## The Org Model

| Org | Role | Write scope |
| --- | --- | --- |
| **`hlt`** | Live operating layer — fleet docs, agent overlays, HLT-specific content | Active fleet work |
| **`system`** | Shared canonical library — reusable hubs, templates, references | Shared canon that HLT reads freely |

`system` is not a fence. It's a shared shelf. HLT agents read from both. Writes go to `hlt` unless building something explicitly cross-org.

---

## The Hosted Fleet

| Agent | Principal | Focus | Publish authority |
| --- | --- | --- | --- |
| **Victoria** | Alec | Registry stewardship, fleet equipping, design, infrastructure | Can publish structural registry canon |
| **Julius** | Justin | Operations, planning, follow-through, meeting prep | Draft and stage only |
| **Lila** | Emily | Marketing, content, campaigns, multimedia | Draft and stage only |

All three share one strong base, then diverge through thin per-agent overlays. They all can research, write, plan, and use tools — the overlays change what they pull first, not what they're allowed to do.

Their shared connection story: Katailyst remote MCP → full catalog or `agent,delivery-admin` toolset → bearer token auth → vault-backed secret execution.
Their shared entry after repo primers: DB-side `kb:global-catalyst-guide` → per-agent ops guide → capability lanes. --- ## Core Design Principles **Discovery over routes.** If discovery returns bad results, fix the metadata. Don't wire a forced path. **Menus over mandates.** No predetermined sequences. No single-path tunnels. **Hints over gates.** Links have weights. Tags are soft. Nothing is a hard lock without an explicit contract. **Quality over speed.** Research first. Read before editing. A curated entity beats ten stubs. **DB-first, local-fast-path.** Database is truth. Files are fast cache. **You are not the orchestrator.** Katailyst is the armory. You equip agents and learn what works. The agent on the ground has more context than you — stop trying to predetermine their moves. --- ## Quick Routing **If you're an IDE agent working in this repo:** Read `AGENTS.md` → `docs/RULES.md` → `docs/VISION.md`. Check `git log -5`. Do not touch fleet steering files unless explicitly asked. **If you're an external agent or Claude desktop session:** Read `CATALYST.md`. Connect to `https://www.katailyst.com/mcp`. Start with `discover`. **If you're onboarding a new hosted fleet agent:** Read `docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md` and `docs/references/ai-agents/HOSTED_AGENT_CORE_SETUP.md`. These define the shared runtime contract and access story. **If you're debugging the fleet:** Read `docs/references/ai-agents/AGENT_DOC_MAP.md` for the full surface map, and the incident reports in `docs/reports/incidents/` for current known issues. --- ## Current Known Issues (2026-03-16) `tool_execute` input normalization is broken — rejects valid JSON payloads. The OpenClaw plugin has a naming mismatch with the MCP server (dots vs underscores). Readiness reporting overstates execution health. These are being fixed in a sequenced repair: naming fix first, then executor fix, then readiness truthfulness. 
See `docs/reports/incidents/diagnosis-victoria-mcp-break-2026-03-16.md`. --- ## Source: docs/references/ai-agents/USE_CASES.md # Core Agent Capability Lanes Use this page as the aligned reference for the broad DB-backed lanes that the hosted fleet should enter after the shared Catalyst entry surface. ## How To Use This Page Treat this as a lane map, not a forced workflow. - start from the shared fleet entry in `global-catalyst-atlas` - load the relevant ops guide when role context matters - choose the broad lane that best matches the real job - branch into the current flagship units under that lane - keep going deeper through linked skills, KBs, tools, prompts, styles, rubrics, playbooks, schemas, content types, assets, and examples as the task needs them - if quality matters, compare exact-category and stronger adjacent-category examples before locking a direction Important: - these are broad lane names, not a new ontology - strong lanes are strong starts, not prisons - the point is easier entry into a deep library, not flattening the library - the registry suggests; the agent composes - if the user asked for a thing, the output should usually come back as that thing, not as abstract commentary about it - when the output is meant to be seen, shared, or judged, stunning design and strong packaging matter - do not strand deliverables on `/data/workspace/...` Real discovery and access surfaces remain: - `/.well-known/llms.txt` - `/llms.txt` - `POST /api/discover` - `POST /api/traverse` - Supabase MCP - CMS/operator browse - vault-backed tools and secret-backed integrations ## 1. Registry / Creation / Discovery Use this lane when the real job is to understand Katailyst, find the right atomic units, classify a new unit correctly, improve the registry, or create a reusable capability instead of solving a one-off task blindly. 
Current flagship starts: - `global-catalyst-atlas` - `global-agent-principles` - `tools-guide-overview` - `registry-discovery-primer` - `create-skill` - `docs/atomic-units/DECISION_MATRIX.md` - creation/classification surfaces - registry quality and health surfaces such as `playbook:registry-health-scan@v1` ## 2. Multimedia Use this lane when the real job is to generate, edit, transform, package, or repurpose visual and media outputs. Current flagship starts: - `create-multimedia` - `image-prompting-guide` - `cloudinary-integration-guide` - Cloudinary upload/transform/manage surfaces - multimedia capability references - image/video generation and packaging surfaces Use the lane directionally: start here, then pull the right styles, assets, examples, rubrics, tools, and adjacent article/social/page surfaces for the actual package. ## 3. Articles Use this lane when the real job is to research and write an article, choose the right article family, package article copy with visuals, or move from research to outline to draft to judge to package. Current flagship starts: - `make-article` - article content types - article research and writing surfaces - article packaging, Framer/Vercel, and Cloudinary handoff surfaces - style-selection and article-type ladder surfaces ## 4. Social Use this lane when the real job is to create a post, thread, carousel, or social package with hooks, variants, visuals, packaging, and distribution judgment together. Current flagship starts: - `make-social` - social content types - social strategy, hook, and distribution surfaces - social branch and platform-format surfaces - media/image support surfaces when the package needs visuals ## 5. QBank Use this lane when the real job is to create or improve question bank items, choose the right qbank format, align to blueprint and pedagogical quality, and package the work with the right schema, rationale, and validation surfaces. 
Current flagship starts: - `qbank-kit` - qbank content types - exam blueprint and qbank-quality surfaces - qbank schema and format-selection surfaces - rationale, validation, and blueprint-adjacent supporting surfaces ## 6. Meeting Briefing Use this lane when the work should become a briefing packet, pre-read, decision memo, meeting report, or other artifact that needs structure and presentation rather than a chat dump. Current flagship starts: - `meeting-briefing-kit` - `meeting-prep` - meeting briefing playbooks, examples, and reporting surfaces - related research and decision-support surfaces ## 7. Page / Web Design Use this lane when the real job is to design a landing page, package information visually, build page architecture or design systems, or prepare something that should be viewed rather than merely described. Current flagship starts: - `world-class-page-design` - web content types - design and page-system surfaces - `vercel-deployment-guide` - landing-page and render-target surfaces - strong reference galleries, design direction, and publish-surface handoffs ## Lane Rule Start with the broad lane. Then branch. Use the right skills, tools, KBs, prompts, styles, rubrics, assets, and examples to make the result materially better. --- ## Source: docs/references/contracts/AGENT_RUNTIME_AND_SKILL_ADAPTATION.md # Agent Runtime and Skill Adaptation Contract Status: Draft v1 Last updated: 2026-03-06 ## Purpose This contract exists to stop three different concepts from being treated as the same thing: 1. HLT fleet agents running on OpenClaw/Render 2. Claude Code or Codex subagents/local project agents 3. Imported skills about "agents", MCP servers, or tool usage When these collapse into one bucket, the registry becomes misleading. Discovery gets noisy, tiers become wrong, and imported skills teach the wrong runtime assumptions. ## Core Distinctions ### 1. 
HLT fleet agents Examples: - Victoria - Julius - Lila - Ares These are long-lived operating personas with: - hosted runtime behavior - channels - memory - identity docs - tool access - org-specific directives They are not "subagents" and should not be grouped with generic imported agent patterns. ### 2. Local project subagents Examples: - Claude Code subagents - Codex local helper agents - repo-scoped automation agents These are execution helpers inside a local coding/runtime environment. They may rely on: - repo file tree - local shell access - git state - IDE/editor affordances They are not equivalent to HLT hosted agents. ### 3. Imported "agent" skills Many external packages use the word "agent" loosely. In Katailyst, these imports must be adapted into one of these meanings: - `subagent_pattern` - `agent_runtime_pattern` - `tool_operator_skill` - `workflow_skill` If the import is really "how to use GitHub/Linear/MCP/AI SDK effectively", it is usually not an HLT agent. It is either a skill or a tool-operator skill. ## Skill vs Tool vs Tool-Operator Skill ### Tool A tool is an executable capability or integration surface. Examples: - `tool:vercel-deploy@v1` - `tool:v0.platform_scaffold@v1` - `tool:firecrawl.scrape@v1` Questions a tool answers: - What can be called? - What inputs/outputs exist? - What runtime/executor is required? ### Skill A skill is reusable execution guidance. Questions a skill answers: - When should this be used? - How should the work be approached? - What sequence or quality bar matters? ### Tool-operator skill This is the bridge pattern that often gets mislabeled. Use a tool-operator skill when the core value is not the tool itself, but the operator methodology for using it well. Examples: - GitHub MCP workflow guidance - Linear operating methodology - Firecrawl research methodology - Vercel deployment review flow Rule: - If the imported item teaches effective use of a tool, keep the tool as a `tool` and model the methodology as a `skill`. 
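The tool / skill / tool-operator-skill rule above can be sketched as a small classifier. This is an illustrative sketch, not the registry's actual import logic; the type names, fields, and function are all hypothetical.

```typescript
// Hypothetical sketch of the tool / skill / tool-operator-skill
// distinction. Names and fields are illustrative, not registry schema.

type UnitKind = "tool" | "skill" | "tool_operator_skill";

interface ImportedItem {
  upstreamName: string;
  exposesExecutableSurface: boolean; // callable endpoint, CLI, or API?
  teachesMethodology: boolean;       // "how to use this well" guidance?
}

// Rule from the contract: if an import teaches effective use of a tool,
// keep the tool as a `tool` and model the methodology as a separate skill.
function classifyImport(item: ImportedItem): UnitKind[] {
  if (item.exposesExecutableSurface && item.teachesMethodology) {
    return ["tool", "tool_operator_skill"]; // split into two units
  }
  if (item.exposesExecutableSurface) {
    return ["tool"];
  }
  return ["skill"]; // pure execution guidance
}

console.log(classifyImport({
  upstreamName: "github-mcp",
  exposesExecutableSurface: true,
  teachesMethodology: true,
})); // → [ 'tool', 'tool_operator_skill' ]
```

The point of the split branch is the contract's bridge pattern: an import like GitHub MCP guidance yields both an executable tool and an operator skill, never a single blurred "agent".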
## Runtime Overlay Model Every high-value skill should be thought of in layers. ### Portable core The portable core answers: - what the skill does - when to use it - what quality bar matters This layer should remain usable across: - Katailyst - Claude Code - Codex - OpenClaw - future runtimes ### Runtime overlays Runtime overlays answer: - what changes in Claude Code/Codex - what changes in OpenClaw/Render - what changes in browser/app runtimes Do not hardcode one runtime's assumptions into the portable core. Examples: - Repo-local guidance like "inspect the file tree" belongs in a Claude Code/Codex overlay. - Hosted-agent guidance like "use db.js on Render disk" belongs in an OpenClaw/Render overlay. - Shared strategy like "brainstorm broadly, compare options, then judge and execute" belongs in the portable core. ## Placement Rules ### Fleet-agent material Fleet-agent operating material belongs primarily in: - agent identity docs - agent core directives - agent lessons - agent foundation bundles - agent memory/log surfaces This material should not be surfaced as if it were generic reusable skills for everyone. ### Generic reusable methods Reusable methods belong in skills and linked artifacts. Examples: - brainstorming - writing plans - executing plans - skill creation - meeting prep ### Agent logs and semantic memory Daily logs, memory checkpoints, and session carry-forward material are real assets, but they should render in an agent-specific lane: - `Core Directives / Runtime Docs` - `Memory / Logs` - `Identity / Bootstrap / Tools` They should not clutter top-level KB browse surfaces the same way generic reusable reference KB does. ## Import Adaptation Rules When importing from Anthropic, skills.sh, MCP ecosystems, or other repos: 1. Preserve the upstream portable core. 2. Decide the real Katailyst meaning: - tool - skill - tool-operator skill - subagent pattern 3. Add runtime tags that tell the truth. 4. 
Rewrite the name if the upstream label is misleading in Katailyst. 5. Add HLT/org overlays only where they improve discovery or execution. 6. Do not import local-repo assumptions into hosted-agent surfaces without a runtime overlay. ## Naming Rules ### Allowed naming clarifications - `for Claude Code` - `for OpenClaw` - `MCP` - `Workflow` - `Methodology` - `Operator Guide` These are useful when they resolve real ambiguity. ### Naming anti-pattern Do not let imported skills keep the word "agent" if they are really: - a Claude Code subagent pattern - a tool-operator guide - a repo-local automation pattern That creates semantic drift against HLT fleet agents. ## Ranking Implications Tiering should reflect actual HLT usefulness, not just package completeness. Examples: - `brainstorming` is Tier 1 because it is broadly reusable across planning, product, writing, and strategy. - `gh-issues` is not Tier 1 just because GitHub matters. If Linear is the real planning system, the GitHub issue skill should rank lower. - `himalaya` or similar email CLI skills are not Tier 1 unless they are proven in the hosted runtime you actually use. - `openai-image-gen` is low priority if FAL is the real image-generation path. ## HLT Business Lens Skill adaptation should reflect the business that actually uses the registry: - HLT is a lean B2C test-prep company in eastern Iowa. - The company serves millions of users across many apps and verticals. - The current business is heavily qbank/app driven, but growth requires stronger content marketing, deeper student research, and better planning. - Engagement matters, not just correctness or generic educational value. - Strong outputs should help revenue, trust, retention, and practical student value. That means imported skills often need more than structural cleanup. 
They may need: - stronger audience context - more persuasive and engaging framing - platform-aware or channel-aware guidance - links to HLT voice, performance, and audience context - more realistic prioritization for Victoria, Julius, and Lila ## Immediate Application to Current Registry ### Brainstorming `brainstorming` should be treated as: - a reusable Tier 1 planning skill - linked to `writing-plans` - linked to `executing-plans` - often paired with `skill-judge` It should also be able to pull HLT-specific business context through linked KB/bundles rather than baking that context directly into the generic launcher. ### GitHub / Linear - GitHub is important. - GitHub issues are not automatically important for HLT planning if Linear is the real work system. - `linear` should stay high. - `github-mcp` can stay high as a tool-operator skill because repo/PR/review workflows matter. - `gh-issues` should remain below `linear` unless HLT materially shifts its workflow. ### Capability Families Do not treat every related tool, KB, and method as a peer. Preferred compression pattern: - one top-level operator-facing skill for the family - executable tools under it - KB/reference material below that - thin layer skills only when the sub-workflow is truly distinct Examples: - Web retrieval family: - top-level skill: `firecrawl` or future umbrella retrieval skill - tools: `tool:firecrawl.scrape`, `tool:firecrawl.crawl`, `tool:tavily.search` - reference KB: `kb:tavily-search` - GitHub family: - top-level skill: `github-mcp` - thin layer: `gh-issues` - planning bridge: `linear` or `catalyst-linear-js` ### Email / Browser / Login These are important capability classes, but runtime truth matters more than theoretical usefulness. - If the hosted runtime cannot reliably use the CLI path, the skill should not be top-tier. - If a different tool path is preferred in practice, rank and describe the preferred path instead of preserving theoretical parity. 
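The capability-family compression pattern above can be sketched as a small data shape. The interface and check function are hypothetical; the unit codes come from the examples in this section.

```typescript
// Illustrative sketch of the compression pattern: one operator-facing
// skill per family, executable tools under it, reference KB below, and
// thin layer skills only for truly distinct sub-workflows.

interface CapabilityFamily {
  topSkill: string;       // single operator-facing entry point
  tools: string[];        // executable surfaces under it
  referenceKb: string[];  // supporting reference material
  layerSkills: string[];  // keep short: only distinct sub-workflows
}

const webRetrieval: CapabilityFamily = {
  topSkill: "firecrawl", // or a future umbrella retrieval skill
  tools: ["tool:firecrawl.scrape", "tool:firecrawl.crawl", "tool:tavily.search"],
  referenceKb: ["kb:tavily-search"],
  layerSkills: [],
};

const github: CapabilityFamily = {
  topSkill: "github-mcp",
  tools: [],
  referenceKb: [],
  layerSkills: ["gh-issues"], // stays below `linear` in ranking
};

// A sanity check an import pass might run: exactly one top-level skill,
// and only a few layer skills, so peers don't proliferate.
function familyIsCompressed(f: CapabilityFamily): boolean {
  return f.topSkill.length > 0 && f.layerSkills.length <= 2;
}
```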
## Follow-On Rule When a skill is imported and adapted for Katailyst, add or update: - truthful runtime tags - clear use case - relationship links - any needed runtime overlay artifacts Do not keep asking the operator to re-explain the same runtime distinctions if the distinction can be encoded here once. --- ## Source: docs/references/contracts/AGENT_SKILLS_OPEN_STANDARD.md # Agent Skills Open Standard (ASOS) — Katailyst Alignment > Reference document mapping Katailyst's skill format to the emerging cross-platform Agent Skills Open Standard and Claude Code plugin conventions. ## Background The Agent Skills Open Standard (agentskills.io) is a cross-platform specification for portable AI skills compatible with Claude Code, OpenAI Codex, Cursor, GitHub Copilot, Gemini CLI, and other agent hosts. Katailyst's SKILL.md format was designed before ASOS existed but is already ~90% compatible. This document maps the differences and documents any export adjustments needed for full compatibility. ## Format Mapping ### SKILL.md Frontmatter | ASOS Field | Katailyst Equivalent | Status | Notes | | --------------- | -------------------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `name` | `name` (= entity `code`) | Identical | Both use lowercase-hyphen slug format, max 64 chars | | `description` | `description` | Identical | Both require activation-grade descriptions. ASOS: max 1024 chars. Claude.ai: max 200 chars. 
Katailyst stores both via `description_full` / `description_short` |
| `license` | `license` (in unit.json) | Compatible | Katailyst stores in unit.json; export maps to frontmatter |
| `allowed-tools` | `allowed-tools` (optional) | Identical | Both default to all tools if omitted |
| `metadata` | `metadata` (catch-all) | Identical | Both use metadata as extensible key-value store |
| — | `version` | Katailyst-only | ASOS does not specify a version field in frontmatter. Katailyst includes it for registry tracking |
| — | `tier` | Katailyst-only | Priority tier (1-10). Not in ASOS spec. Export as `metadata.tier` |
| — | `status` | Katailyst-only | Entity lifecycle status. Not in ASOS spec. Export as `metadata.status` |
| — | `tags` | Katailyst-only | Taxonomy tags. Not in ASOS spec. Export as `metadata.tags` |

**Compatibility**: Katailyst's extra frontmatter fields (`version`, `tier`, `status`, `tags`) must be nested under `metadata` for strict ASOS compliance. Claude Code currently tolerates extra top-level keys but Claude.ai rejects them.

### Export Strategy

For maximum portability, the plugin exporter (05-01) should:

1. Keep only ASOS-standard top-level keys: `name`, `description`, `license`, `allowed-tools`, `metadata`.
2. Move Katailyst-specific fields into `metadata`:

```yaml
---
name: deep-research
description: 'Deep multi-source research synthesis...'
allowed-tools:
  - tavily.search
  - firecrawl.crawl
metadata:
  version: '1'
  tier: 2
  status: published
  tags:
    - domain:research
    - family:research
---
```

3. This is already how `docs/atomic-units/SKILLS.md` specifies the format (see "Frontmatter compatibility" section).
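The nesting step in the export strategy can be sketched as a small transform over a parsed frontmatter object. This is a sketch under the assumption that the exporter works key-by-key; `toAsosFrontmatter` and its shape are illustrative, not the actual 05-01 exporter.

```typescript
// Illustrative sketch: keep only ASOS-standard top-level keys; nest
// everything else (version, tier, status, tags, ...) under `metadata`.

const ASOS_TOP_LEVEL = new Set(["name", "description", "license", "allowed-tools", "metadata"]);

type Frontmatter = Record<string, unknown>;

function toAsosFrontmatter(internal: Frontmatter): Frontmatter {
  const out: Frontmatter = {};
  const metadata: Frontmatter = { ...((internal.metadata as Frontmatter) ?? {}) };
  for (const [key, value] of Object.entries(internal)) {
    if (key === "metadata") continue;
    if (ASOS_TOP_LEVEL.has(key)) {
      out[key] = value; // standard key: stays at top level
    } else {
      metadata[key] = value; // Katailyst-specific: nest it
    }
  }
  if (Object.keys(metadata).length > 0) out.metadata = metadata;
  return out;
}

const strict = toAsosFrontmatter({
  name: "deep-research",
  description: "Deep multi-source research synthesis...",
  version: "1",
  tier: 2,
  status: "published",
  tags: ["domain:research", "family:research"],
});
// Top level now holds only `name` and `description`; `version`, `tier`,
// `status`, and `tags` are nested under `metadata`.
```

Serializing `strict` back to YAML yields frontmatter in the strict five-key shape that Claude.ai accepts.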
### SKILL.md Body

| ASOS Convention | Katailyst Convention | Status | Notes |
| --- | --- | --- | --- |
| Markdown body with instructions | `instruction_md` in content_json | Identical | Both use markdown for the instruction body |
| Progressive disclosure via references | `references/` directory | Identical | Both recommend lean SKILL.md + deep references |
| Explicit load triggers | "MANDATORY: read references/..." | Identical | Same convention for progressive loading |
| Examples in body or examples/ | `examples/` directory | Identical | Katailyst stores artifact content in `entity_revisions.artifacts_json` and mirrors it to filesystem directories |

### Directory Structure

| ASOS Convention | Katailyst Convention | Status | Notes |
| --- | --- | --- | --- |
| `skills/<name>/SKILL.md` | `.claude/skills/curated/<name>/SKILL.md` | Compatible | Different root path; export script maps to plugin path |
| `skills/<name>/references/` | Artifacts with type `reference` | Compatible | Katailyst stores in DB; export writes to filesystem |
| `skills/<name>/examples/` | Artifacts with type `example` | Compatible | Same mapping |
| `skills/<name>/tests/` | Artifacts with type `test` | Compatible | Same mapping |
| `skills/<name>/rules/` | Artifacts with type `rule` | Compatible | Same mapping |
| `skills/<name>/scripts/` | Not in artifact types | Gap | Scripts not stored as artifacts; could add `script` artifact type if needed |
| `unit.json` (Katailyst-specific) | N/A in ASOS | Katailyst-only | Metadata sidecar. Not exported to plugin format — data is embedded in frontmatter |

### Plugin Format (Beyond Skills)

| Plugin Component | ASOS/Claude Code Convention | Katailyst Source | Compatibility |
| --- | --- | --- | --- |
| `plugin.json` | Standard plugin manifest | Generated from registry metadata | Full |
| `marketplace.json` | Marketplace catalog | Generated from org + entities | Full |
| `commands/<name>.md` | Slash command definitions | Prompt entities (kind=command) | Full |
| `agents/<name>.md` | Agent persona definitions | Agent entities | Full (Katailyst-specific format) |
| `hooks/hooks.json` | Lifecycle hook manifest | Automation entities | Full |
| `.mcp.json` | MCP server configuration | Generated from server script | Full |

## Compatibility Gaps

### Gap 1: Frontmatter Extra Keys

**Issue**: Claude.ai rejects unknown top-level frontmatter keys. Katailyst's internal format includes `version`, `tier`, `status`, `tags` at the top level.

**Resolution**: Already resolved. The `docs/atomic-units/SKILLS.md` spec explicitly says "Put any extra flags under `metadata`". The export script (05-01) must enforce this.

**Verification**: Exported SKILL.md files should pass Claude.ai's upload validator.

### Gap 2: unit.json Has No ASOS Equivalent

**Issue**: Katailyst uses `unit.json` as a metadata sidecar for each skill (tags, status, tier, provenance, descriptions). ASOS has no equivalent — all metadata lives in SKILL.md frontmatter.

**Resolution**: `unit.json` is a Katailyst-internal artifact. It is NOT exported to the plugin format. All unit.json data that needs to be portable is mapped into SKILL.md frontmatter metadata or plugin.json arrays.

**Impact**: None. unit.json was always intended as a staging/import format, not a distribution format.

### Gap 3: Agent Format Is Katailyst-Specific

**Issue**: ASOS focuses on skills (SKILL.md).
Agent definitions (`agents/<name>.md`) are a Claude Code convention (from Superpowers), not an ASOS standard.

**Resolution**: Agent export follows the Superpowers convention (markdown with YAML frontmatter: name, description, model, tools). This is compatible with Claude Code but not formally part of ASOS.

**Impact**: Agents are only usable in Claude Code/Workspace, not in other ASOS-compatible hosts. This is acceptable — agents are inherently platform-specific (they reference model IDs and tool registries).

### Gap 4: Taxonomy Tags

**Issue**: Katailyst's taxonomy system (20 canonical families, namespace:code tag format) has no equivalent in ASOS. The standard doesn't define a taxonomy.

**Resolution**: Tags are exported under `metadata.tags` in SKILL.md frontmatter. Non-Katailyst consumers can ignore them. Katailyst consumers (MCP server, hook system) use them for discovery.

**Impact**: None. Tags are additive metadata.

### Gap 5: Description Dual Length

**Issue**: ASOS allows descriptions up to 1024 characters. Claude.ai limits to 200 characters. Katailyst stores both variants.

**Resolution**: Plugin export uses the short variant (<=200 chars) in SKILL.md frontmatter `description`. The full variant is available via MCP `get_entity` tool for programmatic consumers.

**Impact**: Some detail is lost in the exported SKILL.md description. This is by design — the full description is available through other channels.
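The Gap 5 resolution reduces to one rule at export time: emit the short variant, never silently truncate. A minimal sketch, assuming the exporter receives the stored short variant; the function name and error are illustrative, while the 200-char limit is the Claude.ai constraint stated above.

```typescript
// Illustrative sketch of the Gap 5 description rule for plugin export.

function exportDescription(descriptionShort: string, limit = 200): string {
  if (descriptionShort.length > limit) {
    // An over-limit short variant is a data problem to fix at the source,
    // not something to truncate quietly at export time.
    throw new Error(`short description is ${descriptionShort.length} chars; limit is ${limit}`);
  }
  return descriptionShort; // the full variant stays reachable via MCP get_entity
}
```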
## Cross-Platform Compatibility Matrix | Feature | Claude Code | Claude.ai | Cursor | Codex | Gemini CLI | | ---------------------- | ----------------- | --------------- | ------------- | --------- | ---------- | | SKILL.md loading | Yes (plugin) | Yes (upload) | Via MCP | Via MCP\* | Via MCP\* | | Frontmatter parsing | Full | Strict (5 keys) | N/A | N/A | N/A | | Progressive disclosure | Yes (references/) | Yes | N/A | N/A | N/A | | Slash commands | Yes (commands/) | No | No | No | No | | Agent personas | Yes (agents/) | No | No | No | No | | Hooks | Yes (hooks.json) | No | No | No | No | | MCP tools | Yes (.mcp.json) | No | Yes | Partial\* | Partial\* | | Taxonomy tags | Yes (via MCP) | No (ignored) | Yes (via MCP) | N/A | N/A | \*Partial: depends on MCP support in each host. ## Recommendations 1. **Always export with strict frontmatter**: Only `name`, `description`, `license`, `allowed-tools`, `metadata` at top level. Katailyst-specific fields go under `metadata`. 2. **Keep descriptions <=200 chars**: For maximum portability (Claude.ai, skills.sh). Full descriptions available via MCP. 3. **Use progressive disclosure**: Small SKILL.md + rich references/examples. This is both ASOS best practice and Katailyst convention. 4. **Test in multiple hosts**: Use the cross-host testing harness (05-06) to verify skills work in Claude Code, Claude.ai, and via MCP. 5. **Monitor ASOS evolution**: The standard is young (2025-2026). Track changes at agentskills.io and update export logic accordingly. 6. **Consider ASOS registry listing**: If agentskills.io develops a public registry, submit Katailyst skills for cross-platform discovery. 
## References - Agent Skills Open Standard: https://agentskills.io - Anthropic Skill Authoring Guide: https://docs.anthropic.com/en/docs/claude-code/skills - Superpowers (reference implementation): https://github.com/obra/superpowers - Katailyst Skill Spec: `docs/atomic-units/SKILLS.md` - Plugin Export Script: `scripts/distribution/export_plugin.ts` (Phase 05-01) - Skills Ecosystem Notes: `docs/references/skills/SKILLS_ECOSYSTEM_NOTES.md` --- ## Source: docs/references/contracts/ARCHITECTURE_PRESERVATION_AND_RECOVERY.md # Architecture Preservation and Recovery Guide **Date:** 2026-03-11 **Status:** Protected reference — DO NOT condense, summarize, or "clean up" this document **Purpose:** If an agent destroys the doc structure, this document explains how everything was set up and why, so the pattern can be restored without starting from scratch. --- ## WHY THIS DOCUMENT EXISTS On multiple occasions, well-intentioned agents have "cleaned up" the steering documents by making them more concise, more precise, or more imperative. Every time, this destroyed the system's ability to steer agent behavior effectively. **The pattern that works is deliberately rich and verbose.** It is not sloppy — it is engineered for LLM consumption. LLMs need context, examples, reminders, and redundancy to behave well. Removing that richness in the name of "cleanliness" creates agents that produce generic, shallow output. 
**If you are an agent reading this: DO NOT make the following changes without explicit operator approval:** - Do not shorten hub content below 2,500 characters - Do not remove "Best Practices," "Reminders," "Common Compositions," or "Adjacent Hubs" sections from hub docs - Do not flatten the front-door index tables into prose - Do not merge or consolidate the 8 runtime files per agent - Do not remove the layered table structure from the atlas - Do not replace rich guidance with terse imperatives --- ## THE PATTERN THAT WORKS ### Core Principle: Layered Richness, Not Precision The system works through **progressive context loading at multiple granularities**. Think of it as a city with neighborhoods, streets, buildings, and rooms — not a GPS coordinate. An agent doesn't need to know "call function X with parameter Y." It needs to understand: - What kind of work am I doing? (hub level) - What does the best version of this look like? (best practices) - What entities should I compose? (quick index / starting points) - What pairs well with what? (common compositions) - What shouldn't I forget? (reminders) - Where can I go deeper? (adjacent hubs, graph traversal) ### The Three Navigation Layers ``` LAYER 1: Atlas (Global Map) What it is: The full registry inventory. Entity types, counts, relationships, neighborhoods. File: .claude/kb/curated/global/ai-engineering/global-catalyst-atlas/variants/full.md DB entity: global-catalyst-atlas Token cost: ~8-10K When loaded: Session start for any substantial work LAYER 2: Hubs (15 Work Domain Guides) What they are: Context bundles that answer "what kind of work is this?" 
Each hub contains: When You're Here, Best Practices, Quick Index, Common Compositions, Adjacent Hubs, Reminders Files: .claude/kb/curated/global/ai-engineering/hub-{name}/variants/full.md DB entities: hub-analysis, hub-article, hub-copywriting, hub-design, hub-education, hub-email, hub-growth, hub-marketing, hub-meeting, hub-multimedia, hub-planning, hub-registry, hub-research, hub-skills, hub-social Token cost: ~700-1300 each When loaded: After the agent identifies the work domain LAYER 3: Capillaries (Individual Entities) What they are: The actual skills, playbooks, KB items, recipes, tools, etc. Found via: Hub quick indexes, entity_links traversal, discover_v2() Token cost: Varies per entity When loaded: After hub routing narrows the relevant entities ``` This is atlas → hubs → capillaries navigation. It's how agents go from "I need to do something" to "I need these specific entities." ### Why Hubs Must Be Rich A hub is NOT an index page. It is a **context bundle that teaches the agent how to work in this domain.** The difference matters enormously: **A thin hub (BAD — what "cleanup" agents create):** ```markdown # Social Media Hub Create social content for various platforms. ## Starting Points - make-social - social-content ``` **A rich hub (GOOD — what actually steers agent behavior):** ```markdown # Social Media Hub Create high-performing social media content across all platforms... ## When You're Here - Creating social media content for any platform (Instagram, LinkedIn, X, TikTok...) - Creating, editing, or repurposing social content - Social strategy, hook patterns, or platform-specific optimization ## Best Practices - Research what top 1% creators do on the TARGET PLATFORM before writing - Platform-native content always outperforms cross-posted content - Hooks in the first line determine 80% of engagement - Always include brand voice (load brand-voice-master) ... 
## Key Starting Points | Entity | Type | What It Gives You | | -------------- | -------- | ------------------------------ | | make-social | playbook | End-to-end social workflow | | social-content | skill | Core social writing capability | ... ## Common Compositions - New social post: make-social → social-content + platform content_type + brand-voice-master - Cross-platform: make-social → social-content + multiple platform content_types ... ## Adjacent Hubs - hub-copywriting — stronger persuasive copy - hub-multimedia — images, carousels, video ... ## Reminders - Always research audience and platform trends before writing - Check brand-voice-master for tone ... ``` The rich version gives the agent judgment. The thin version gives it nothing except entity names it could have found by searching. --- ## HOW THE DOCUMENTS ARE ARRANGED ### Entry Points (The Front Door) ``` CLAUDE.md └── Points to → AGENTS.md (the real entry point) └── Contains: - Runtime truth declarations - Canonical companion doc list - Atomic unit type list - Classification rules - Non-negotiables - Foundation quality defaults - Task start protocol - Repo index (quick map) - Development environment ``` AGENTS.md is the single main entry point. Everything else is reached from there. 
### Agent-Specific Surfaces (Per Agent × 3 Agents) Each of the 3 agents (Victoria, Julius, Lila) has this set of KB entities mirrored locally: ``` Per agent: ├── {name}-front-door-index/ # Rich map of nearby surfaces, table-based │ ├── KB.md # Summary │ └── variants/full.md # Full content with 7 sections of tables ├── {name}-information-architecture/ # Map of the full agent stack ├── {name}-core-directives/ # Hard-edged operating posture ├── agent-sop-{name}/ # Main launch/orientation surface ├── agent-lessons-{name}/ # Anti-regression patterns ├── {name}-identity-agents/ # Runtime operating contract (one of the 8 runtime files) ├── {name}-identity-soul/ # Taste, identity, instincts ├── {name}-identity-user/ # Principal preferences ├── {name}-identity-id/ # Stable identity card ├── {name}-identity-tools/ # Tool index ├── {name}-identity-bootstrap/ # Recovery/restart ├── {name}-identity-heartbeat/ # Monitoring rules └── {name}-identity-memory/ # Durable continuity routing ``` That's approximately 14 KB directories per agent × 3 agents = ~42 agent KB directories. ### The 8 Runtime Files (On Render Disk) On each agent's Render service, 8 markdown files are injected as "Project Context" every turn: ``` /data/workspace/ ├── AGENTS.md ← Session boot, startup law, read order ├── SOUL.md ← Who the agent IS ├── IDENTITY.md ← Deployment/service facts ├── USER.md ← Principal preferences ├── MEMORY.md ← Routing to deeper memory ├── TOOLS.md ← Tool index ├── BOOTSTRAP.md ← Recovery/restart steps └── HEARTBEAT.md ← Monitoring schedules ``` These are MIRRORS of the DB KB entities (`{name}-identity-agents` → `AGENTS.md`, `{name}-identity-soul` → `SOUL.md`, etc.). The DB is canonical. Render disk is a cache. **CRITICAL: These 8 files must stay rich.** They are the fast steering layer. An agent that gets thin runtime files produces thin, generic work. 
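Because the DB is canonical and the Render disk is only a cache, drift detection is a one-way comparison: if the two copies differ, the disk file gets rewritten, never the DB. A minimal sketch of that check, assuming Node's `node:crypto` module — all helper names here are hypothetical, not repo scripts:

```typescript
// Hypothetical sketch: detect drift between the canonical DB variant content
// and its Render-disk mirror. Drift always means "rewrite the disk file".
import { createHash } from "node:crypto";

type DriftStatus = "in_sync" | "disk_stale" | "disk_missing";

function contentHash(content: string): string {
  // Normalize trailing whitespace so cosmetic diffs don't register as drift.
  return createHash("sha256").update(content.trimEnd()).digest("hex");
}

function mirrorStatus(dbContent: string, diskContent: string | null): DriftStatus {
  if (diskContent === null) return "disk_missing";
  return contentHash(dbContent) === contentHash(diskContent)
    ? "in_sync"
    : "disk_stale";
}

// Example: identical content modulo a trailing newline counts as in sync.
console.log(mirrorStatus("# SOUL.md\nWho the agent IS\n", "# SOUL.md\nWho the agent IS")); // in_sync
```

The design choice worth preserving: the comparison is directional. A sync job built on this would only ever push DB → disk, matching the "DB is canonical, Render disk is a cache" rule above.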
### Shared Global Surfaces ``` .claude/kb/curated/global/ai-engineering/ ├── global-catalyst-atlas/ # THE system map - entity inventory, neighborhoods, flow ├── global-catalyst-guide/ # Shared fleet entry - what Katailyst is ├── global-catalyst-playbook/ # Composition method - the 6-step loop ├── global-research-protocol/ # Research expectations - 5 core rules ├── global-team-context/ # Company/team context ├── global-agent-principles/ # Shared agent principles ├── agent-operating-bible/ # Behavioral doctrine ├── agent-foundation-spec/ # Layer boundaries and stack architecture ├── context-engineering-methodology/ # How to build agent context ├── tools-guide-overview/ # Shared tool doctrine └── ... (20+ more shared KBs) ``` ### The 15 Hubs All in `.claude/kb/curated/global/ai-engineering/hub-{name}/`: ``` hub-analysis — Data analysis, SWOT, dashboards, reporting hub-article — Long-form articles, blog posts, editorial hub-copywriting — Persuasive copy, CTAs, headlines, sales writing hub-design — Visual design, brand consistency, Vercel/v0, React (renamed from hub-web) hub-education — QBank, study guides, exam prep, learning content hub-email — Email sequences, drip campaigns, newsletters hub-growth — Build-measure-learn, experiments, analytics, optimization hub-marketing — Campaign planning, strategy, umbrella marketing hub-meeting — Meeting prep, briefings, action items hub-multimedia — Image generation, video, diagrams, media production hub-planning — Strategic planning, roadmapping, decision support hub-registry — Registry maintenance, entity management, graph stewardship hub-research — Research methodology, "find the top 1%", audience research hub-skills — Skill creation, import, evaluation, capability building hub-social — Social media content across all platforms ``` Each hub exists as: 1. A `registry_entities` row (entity_type='kb', org_id=HLT) 2. A `kb_items` row linked to the entity 3. A `kb_variants` row (variant_type='full') with the rich content 4. 
`entity_tags` rows (tagged capability:hub, domain:\*, etc.) 5. `entity_links` rows (pairs_with other hubs, recommends skills/playbooks) 6. A local mirror directory with KB.md and variants/full.md ### Hub Content Structure (The Template) Every hub should follow this structure: ```markdown # {Hub Name} Hub {One paragraph explaining what this hub is and when to use it} ## When You're Here - {3-5 bullet points of situations that route to this hub} ## Best Practices - {5-7 bullet points of what works, what the top 1% do} - {Cross-industry inspiration patterns} - {Tool recommendations} - {Process guidance} ## Key Starting Points | Entity | Type | What It Gives You | | ------------- | ------ | ---------------------- | | {entity-code} | {type} | {one-line description} | {3-6 rows} ## Quick Index 1. {entity-code} — {brief description} 2. {entity-code} — {brief description} {5-10 entries} ## Research Patterns / Common Patterns (hub-specific) ### Pattern 1: {Name} {3-5 step process} ### Pattern 2: {Name} {3-5 step process} ## Common Compositions - **{Scenario}:** {entity} → {entity} + {entity} + {entity} - **{Scenario}:** {entity} → {entity} + {entity} {3-5 compositions} ## Adjacent Hubs - hub-{name} — {why it connects} {3-5 adjacent hubs} ## Reminders - {Critical thing not to forget} - {Quality check} - {Default behavior to maintain} {3-5 reminders} ``` **Minimum content size: 2,500 characters.** Below this, the hub is too thin to steer behavior. ### The Packet-First Retrieval Model As of 2026-03-11, agents use this flow: ``` 1. registry.agent_context (one-call MCP bootstrap) Returns: layered packet with system baseline, HLT foundation, work hubs, domain packs, examples/artifacts 2. If thin or unavailable, fallback to: discover(intent) → identify hub → read hub → traverse(hub_ref) → manual assembly 3. 
Canonical packet layer order: system baseline → HLT foundation packs → personal overlay → work hub(s) → domain/product/audience pack(s) → examples/artifacts ``` This is documented in `docs/references/contracts/LAYERED_CONTEXT_PACKET_CONTRACT.md`. ### Front-Door Index Structure Each agent's front-door index has 7 sections, all as tables: ``` 1. Shared Starts — Common fleet surfaces 2. {Agent} Runtime Stack — Agent-specific steering surfaces 3. Discovery, Research, And Graph Expansion 4. Design, Pages, Artifacts, And Presentation 5. Multimedia, Content, And Performance Surfaces 6. Ops, Meeting, And Capability-Building Surfaces 7. Default MCP And Packet Flow — The registry.agent_context instructions ``` Each section uses a 3-column table: `Surface | What it gives {Agent} | Reach here when` --- ## THE CONTRACTS THAT GOVERN BEHAVIOR These contracts are canonical and must not be weakened: | Contract | File | What It Governs | | ---------------------------- | -------------------------------------------------------------- | ----------------------------------------------------- | | Current Operating Model | docs/references/contracts/CURRENT_OPERATING_MODEL.md | System vs HLT vs Personal scope, visibility model | | Layered Context Packet | docs/references/contracts/LAYERED_CONTEXT_PACKET_CONTRACT.md | How agent context is assembled in 6 layers | | Runtime Ownership | docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md | Who owns what between Katailyst, OpenClaw, and Render | | Mirrors and Packs | docs/references/contracts/MIRRORS_AND_PACKS.md | DB canonical vs repo mirror vs export surface | | Core Agent Shared Foundation | docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md | Shared runtime base and overlay model | | Agent Doc Map | docs/references/ai-agents/AGENT_DOC_MAP.md | Active hosted agent stack map | --- ## DATABASE STRUCTURE (What Lives Where) ### Key Tables ``` registry_entities — All 1,045+ entities (16 types) ├── org_id — System 
(00000000-...-0001) or HLT (8ba36969-...-7b08) ├── entity_type — kb, skill, content_type, recipe, tool, style, etc. ├── code — Unique within type+org (e.g., 'hub-research') ├── priority_tier — 1 (highest) to 5 ├── rating — 1-5 ├── owner_user_id — NEW: for personal scope (migration 144) └── visibility_scope — NEW: 'private' or 'org_shared' (migration 144) kb_items — One per KB entity └── item_type — 'reference', 'guide', etc. kb_variants — Multiple per kb_item (snippet, distilled, full) ├── variant_type — 'snippet', 'distilled', 'full' ├── content — The actual markdown content └── token_count — Approximate token count entity_links — 7,468 typed relationships ├── link_type — uses_kb, related, recommends, requires, pairs_with, etc. ├── weight — 0.0 to 1.0 └── reason — Why this link exists entity_tags — 8,126+ assignments └── tag_id → tags table (namespace:code format) tags — 331 tags across 35 namespaces tag_namespaces — domain, capability, surface, family, etc. ``` ### Key SQL Functions ```sql -- Discovery: weighted search discover_v2(query, tags[], entity_types[], org_id, limit, exclude_tags[], options, search_mode) -- Graph traversal traverse_links(entity_ref, link_types[], depth, org_id) -- IMPORTANT: entity_ref needs 'kb:' prefix (e.g., 'kb:hub-research') -- System org system_org_id() → '00000000-0000-0000-0000-000000000001' ``` ### Key Entity Counts (as of 2026-03-11) ``` 914 system entities + 131 HLT entities = 1,045 total 234 KB, 185 skills, 124 content_types, 119 recipes, 80 tools 56 styles, 55 metrics, 34 channels, 34 prompts, 33 rubrics 24 bundles, 23 playbooks, 22 schemas, 11 agents, 8 lint_rules, 3 lint_rulesets ``` --- ## MCP CONFIGURATION ### Katailyst Registry MCP (the main one) ``` Endpoint: https://www.katailyst.com/mcp Auth: Bearer Token format: kmcp_{uuid}_{secret} Profile: issued per token (`chat_safe` or `full`) 25 tools across 4+ toolsets ``` ### Multimedia Mastery MCP (specialist media layer) ``` Endpoint:
https://multimediamastery.vercel.app/api/media/v1/mcp Auth: Bearer mmtk_{hash}.{secret} Separate token, separate auth system 31+ tools for media planning, generation, asset management ``` These are two separate services. They don't share auth. Composition happens at the agent layer. --- ## RECOVERY PROCEDURES ### If Hub Content Gets Destroyed 1. **DB is canonical.** The enriched hub content lives in `kb_variants` table. 2. Query to get any hub's content: ```sql SELECT kv.content FROM registry_entities re JOIN kb_items ki ON ki.entity_id = re.id JOIN kb_variants kv ON kv.kb_item_id = ki.id WHERE re.code = 'hub-research' AND kv.variant_type = 'full'; ``` 3. Write the content back to the local mirror file. ### If Front-Door Indexes Get Destroyed 1. The structure is 7 sections of tables (see "Front-Door Index Structure" above). 2. Section 7 must contain the `registry.agent_context` instructions. 3. The tables are 3-column: `Surface | What it gives {Agent} | Reach here when` 4. DB has the canonical content in kb_variants for `{name}-front-door-index`. ### If the Atlas Gets Destroyed 1. The atlas is the most complex doc (~425 lines, ~10K tokens). 2. It contains: entity type inventories, use-case neighborhoods, the 8-file runtime table, context layers, link types. 3. DB has the canonical content in kb_variants for `global-catalyst-atlas`. ### If Agent Identity Files Get Destroyed 1. Each agent has 8 identity mirrors in the DB. 2. These mirror the 8 runtime files on Render. 3. Query: `SELECT re.code, kv.content FROM registry_entities re JOIN kb_items ki ON ki.entity_id = re.id JOIN kb_variants kv ON kv.kb_item_id = ki.id WHERE re.code LIKE '{name}-identity-%'` 4. The Render disk files at `/data/workspace/` are the LIVE versions. ### If AGENTS.md Gets Destroyed 1. AGENTS.md is the single most important file in the repo. 2. 
It should be ~380 lines containing: runtime truth, primer, org model, agent doc surfaces, atomic unit list, classification rules, non-negotiables, foundation quality defaults, task start protocol, repo index. 3. It is NOT a brief pointer file. It is a comprehensive entry point. ### If the Packet Contract Gets Destroyed 1. The core idea: `system baseline → HLT foundation → personal → work hubs → domain packs → examples/artifacts` 2. File: `docs/references/contracts/LAYERED_CONTEXT_PACKET_CONTRACT.md` 3. Key sections: Canonical Packet Order, What Each Layer Is For (table), Hubs vs Packs, Default Retrieval Posture, Fallback Flow, Assembly Guidance, two worked examples. --- ## ANTI-PATTERNS TO WATCH FOR These are the specific ways agents destroy this system: ### 1. "Making it concise" **Symptom:** Agent rewrites 3,000-char hub into 500 chars. **Why it happens:** Agent sees "redundancy" and tries to "clean up." **Why it's wrong:** The redundancy IS the feature. LLMs need context, examples, and reminders to behave well. A 500-char hub doesn't steer behavior — it's just a label. **Fix:** Restore from DB. The minimum hub content is 2,500 chars. ### 2. "Making it precise" **Symptom:** Agent replaces rich guidance with imperative commands ("ALWAYS do X", "NEVER do Y"). **Why it happens:** Agent thinks precision = quality. **Why it's wrong:** Imperative commands are brittle. They don't teach judgment. Rich guidance with examples and context produces better results than rules. **Fix:** Restore from DB. The documents should read like a knowledgeable colleague explaining the domain, not a machine issuing instructions. ### 3. "Consolidating redundant files" **Symptom:** Agent merges front-door-index + information-architecture into one file. **Why it happens:** Agent sees overlap and thinks it's waste. **Why it's wrong:** Each file serves a different purpose at a different moment. The overlap is intentional — different entry points for different discovery paths. 
**Fix:** Restore individual files from DB. Keep the separation. ### 4. "Removing legacy files" **Symptom:** Agent deletes active helper or doctrine files because they look redundant. **Why it happens:** Agent reads "legacy helper residue" and thinks "delete." **Why it's wrong:** Active helper and doctrine files still carry real discovery value. Deleting them breaks read order and graph traversal. **Fix:** Restore from DB. Only delete deprecated residue after references and mirrors are actually gone. ### 5. "Flattening the hierarchy" **Symptom:** Agent replaces the atlas → hub → capillary navigation with a flat search. **Why it happens:** Agent thinks "just use discover_v2 for everything." **Why it's wrong:** discover_v2 returns relevance-scored results but doesn't provide the judgment context that hubs provide. Without hub guidance, agents select wrong entities or miss compositions. **Fix:** Restore hub routing. Discovery is a tool, not the whole navigation. --- ## SESSION HISTORY (How We Got Here) ### Session 5 (2026-03-10) - Built the hub architecture: created all 15 hubs in DB with entity_links - Wrote the master roadmap in the archived implementation bundle - Established the atlas → hubs → capillaries navigation model ### Session 6 (2026-03-11 — first half, Agent A) - Executed Micro 1 (Fix the Foundation) - Education tag audit: retagged 23 personas from domain:education → domain:healthcare - Enriched all 15 hubs in DB with rich content (best practices, indexes, compositions, reminders) - Updated front-door indexes with MCP discovery instructions - Added research-first operating principle to all 3 agent front-door neighborhoods - Documented the three-scope visibility model in CURRENT_OPERATING_MODEL.md - Renamed hub-web → hub-design in DB and local mirrors - Updated hub entity summaries for better FTS discovery ### Session 6 (2026-03-11 — second half, Agent B — the other agent) - Patched local mirror files with entity-code-aware content (referencing actual 
entity codes) - Created LAYERED_CONTEXT_PACKET_CONTRACT.md - Updated MCP_TOOLS_REFERENCE.md with registry.agent_context docs - Updated CHANNEL_DISTRIBUTION_CONTRACT.md - Cleaned hub-web references from all plan docs - Updated front-door indexes with packet-first MCP instructions - Updated agent identity mirrors with packet-first working posture - Ran comprehensive MCP functional tests (1,045 entities, 7,468 links, 25 tools) - Tested registry.agent_context end-to-end - Tested Multimedia Mastery MCP separately ### Session 7 (2026-03-11 — current) - Surveyed full state across both agents' changes - Verified DB-to-mirror consistency - Created the Phase 2 archived plan - Created this preservation document ### Key Pattern Decisions Made Along the Way 1. **Hubs are in HLT org, not the free/shared org.** They are HLT-specific work domain guides. 2. **All 15 hubs are tier=1, rating=5.** They are equally important navigation nodes. 3. **Hub content is rich, not concise.** Minimum 2,500 chars with Best Practices, Quick Index, Compositions, Adjacent Hubs, Reminders. 4. **Front-door-neighborhood is legacy, not deleted.** Kept for link integrity. 5. **DB is canonical, local files are mirrors.** Always recoverable from DB. 6. **registry.agent_context is the preferred bootstrap.** discover → hub → traverse is the fallback. 7. **The packet has 6 layers.** Free (System) → HLT foundation → personal → work hubs → domain packs → examples. 8. **Tags in global namespaces need org_id = NULL**, not the free/system org UUID. 9. **traverse_links needs the kb: prefix** on entity refs. 10. **discover_v2 has two overloads** — must pass ALL positional params to disambiguate. --- ## Source: docs/references/contracts/ATOMIC_DUPLICATE_DETECTION_POLICY.md # Atomic Duplicate Detection Policy Status: Active Updated: 2026-02-19 This policy defines how to identify and remediate duplicate or overlapping atomic units without destructive cleanup. ## Objectives 1. Keep discovery coherent as volume scales. 
2. Avoid accidental deletion of useful variants. 3. Preserve provenance and explain why merge/supersede actions happened. 4. Keep alternatives available when overlap is intentional. ## Duplicate Classes ### Class A - Exact Identity Collision - Same `(org_id, entity_type, code, version)` or attempted collision. - Resolution: block duplicate insert/update by DB constraints. ### Class B - Near Duplicate - Different code, but same practical intent and same lifecycle tier. - Typical indicators: - highly similar summary/use_case - equivalent tags and links - overlapping artifact payloads Resolution: - keep the stronger entity as primary - link the weaker with `supersedes` or `alternate` as appropriate - archive only after verification and replacement path exists ### Class C - Intentional Variant - Similar intent but distinct audience, modality, lifecycle stage, or dependency profile. - Resolution: keep both; clarify with tags and links. ### Class D - Semantic Overlap (Needs Human Review) - Similar wording/intent but unclear if one should replace the other. - Resolution: quarantine decision in review queue; do not delete. ## Decision Tree 1. Is this a unique key collision? - yes: reject write, fix code/version. 2. Is there a clear superior replacement? - yes: `supersedes` path + migration notes. 3. Are both variants useful in different contexts? - yes: keep both, mark as `alternate` with reason. 4. Unclear? - hold in review queue; require human/operator decision. ## Required Evidence Before Merge/Supersede - candidate refs (`entity_type:code@version`) - overlap rationale (why these are duplicates/variants) - quality comparison (depth, links, evidence, recent usage) - migration notes (what consumers should use now) - rollback notes (how to restore if mistaken) ## Non-Destructive Remediation Rules - Prefer linking (`supersedes`, `alternate`) before archival/deletion. - Do not delete based only on "looks similar." 
- Preserve at least one stable reference path for previous consumers. - Maintain explicit reasons on all remediation links. ## SQL/Script Starter Checks ```sql -- Same name within type (possible near duplicates) select entity_type, lower(name) as name_key, count(*) as n from registry_entities where status in ('staged','curated','published') group by entity_type, lower(name) having count(*) > 1 order by n desc, entity_type asc; -- Similar summary lengths and ratings can be triage hints select entity_type, code, name, status, rating, priority_tier from registry_entities where status in ('staged','curated','published') order by entity_type asc, name asc; ``` ```bash pnpm registry:e2e:audit -- --limit 300 pnpm registry:graph:audit ``` ## Linking Rules for Remediation - Use `supersedes` when one unit should become the preferred successor. - Use `alternate` when both remain valid options. - Add concise `reason` and intentional `weight`. Example reason text: - "v2 consolidates richer examples and keeps same tool contract." - "Alternative variant for lower-token workflows." ## Status Transition Guidance - Keep replaced entities `deprecated` first when possible. - Archive only after: - migration note exists - replacement link exists - no unresolved references in active workflows ## Anti-Patterns - Bulk archival without reference audit. - Deleting records because they are short without checking context. - Treating broad/global units as duplicates only because they are not repo-specific. - Converting all variants into one over-constrained canonical that removes optionality. 
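The decision tree above is mechanical enough to sketch as a routing function. This is an illustration only, with hypothetical names — the real flow runs through DB constraints and the human review queue, not a single function:

```typescript
// Illustrative sketch of the duplicate-remediation decision tree.
// Classes A-D map to the four outcomes; order of checks matches the tree.

type DuplicateInput = {
  keyCollision: boolean;  // same (org_id, entity_type, code, version)?
  clearSuperior: boolean; // one unit is plainly the better successor?
  bothUseful: boolean;    // distinct audience/modality/lifecycle profile?
};

type Remediation =
  | "reject_write"     // Class A: fix code/version; DB constraints block the insert
  | "link_supersedes"  // Class B: keep primary, link the weaker unit
  | "link_alternate"   // Class C: keep both, clarify with tags and links
  | "review_queue";    // Class D: quarantine; require human decision

function routeDuplicate(d: DuplicateInput): Remediation {
  if (d.keyCollision) return "reject_write";
  if (d.clearSuperior) return "link_supersedes";
  if (d.bothUseful) return "link_alternate";
  return "review_queue"; // never delete on "looks similar" alone
}
```

Note that deletion is not a reachable outcome: every path ends in a reject, a link, or a review hold, which is exactly the non-destructive posture the rules above require.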
## Related Docs - `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` - `docs/atomic-units/LINKS.md` - `docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md` --- ## Source: docs/references/contracts/ATOMIC_UNIT_STANDARDS.md # Atomic Unit Standards (Canonical) Status: Active Updated: 2026-03-05 Supersedes: - `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` This is the single canonical standard for quality floor, promotion gating, and type policy across all atomic unit types. ## Scope Applies to all atomic unit types: - **registry entity types** such as skills, tools, KB items, prompts, schemas, styles, content types, recipes, bundles, playbooks, channels, agents, eval cases, rubrics, metrics, lint rules, and lint rulesets - **operational and delivery unit types** such as assets, actions, automations, evaluations/signals, runs/traces, and artifacts The quality floor applies to the current system as it actually exists. Do not freeze the ontology to an old count or simpler historical model just to keep the documentation neat. When the product truth evolves, the standards should stay consistent while the type map updates around them. Registry entity types live in `registry_entities`. Other atomic unit types may live in dedicated operational or delivery tables because their lifecycle is not a good fit for registry revisioning. ## Core Intent 1. Keep units reusable and composable across orchestrators. 2. Prefer guidance and decision criteria over rigid routing. 3. Enforce measurable quality floors before promotion. 4. Preserve graph alternatives and meaningful link reasons. 5. Require source-backed research for unstable facts and claims. ## Canonical Quality Floor (Must Pass) ### Identity and lifecycle 1.
Stable unique identifier appropriate to the unit type (`code` for registry entities; row identity + version lineage for assets/runs). 2. Human-readable `name` or label when the unit type is operator-facing. 3. Non-placeholder `summary` or description when the unit type is operator-facing. 4. Non-placeholder `use_case`, role, or lifecycle note when the unit type is operator-facing. 5. Explicit provenance and truthful lifecycle status. ### Taxonomy and links 6. Tags or equivalent typed metadata conform to the owning policy (`docs/TAXONOMY.md` for registry-facing units). 7. Unknown taxonomy/metadata values are explicit and queued as debt. 8. Promoted graph-facing units have meaningful typed links or upstream/downstream references. 9. Published registry units have at least two typed links unless explicitly exempted with rationale. ### Depth and artifacts 10. Content depth meets minimums for unit type. 11. Units claiming execution capability include executable evidence or explicit debt marker. 12. No hollow stubs, placeholder text, or silent blank critical fields. ## Promotion Stages and Gates Lifecycle: `staged -> curated -> published -> deprecated -> archived` ### `staged` 1. Non-empty `name`, `code`, `summary`. 2. Correct `entity_type`. 3. At least one valid namespaced tag. 4. No placeholder markers (`TODO`, `PLACEHOLDER`, `[INSERT]`). ### `curated` All `staged` gates plus: 1. Valid `use_case`. 2. At least one typed link with reason. 3. Duplicate overlap remediation intent is explicit when applicable (`alternate`/`supersedes`). 4. Minimum depth requirements pass for entity type. 5. No unjustified imperative lock-in language. 6. Minimum taxonomy coverage passes. ### `published` All `curated` gates plus: 1. At least two typed links with reasons (or explicit architectural exception). 2. Type-specific higher depth bars (for example, KB distilled coverage). 3. At least one artifact/reference/example, or explicit justified none-applicable note. 4. 
No blocking findings in atomic E2E quality audit. 5. Rollback or supersede correction path is documented. ## Type-Specific Minimum Depth Matrix | Unit type | Minimum depth for `curated` | | -------------- | ------------------------------------------------------- | | `skill` | non-empty instruction body (>50 words) | | `tool` | capability envelope + I/O + failure behavior | | `kb` | snippet variant populated (>100 words) | | `prompt` | context/objective/inputs/instructions/output/guardrails | | `schema` | valid properties + required-field clarity | | `style` | concrete do/don't constraints | | `content_type` | purpose + structure expectations | | `recipe` | composition logic + fallback notes | | `bundle` | explicit membership rationale and coherent grouping | | `playbook` | at least two steps with checkpoints | | `action` | trigger + expected output + dependencies | | `automation` | trigger + idempotency/retry/failure policy | | `asset` | version history + validation state + delivery metadata | | `channel` | constraints section non-empty | | `agent` | role/boundaries/default behavior + links | | `rubric` | measurable criteria + scoring anchors | | `metric` | unit/grain/aggregation/interpretation | | `lint_rule` | deterministic rule and fix guidance | | `lint_ruleset` | at least two included rules with rationale | | `evaluation` | candidate scope + criteria + evidence | | `signal` | provenance + confidence semantics | | `run`/`trace` | timeline + diagnostics + next actions | | `artifact` | clear role/provenance and parent context | ## Cross-Type Anti-Patterns (Reject) 1. One-line or slogan-only curated/published entries. 2. Agent-specific language in shared units without explicit scoped context. 3. Hard route-locking language without safety-critical reason. 4. Uncited fast-changing best-practice claims. 5. Typed links with empty/generic reasons. 6. Treating assets or automations as second-class just because they are not registry enum values. 
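As an illustration of how mechanical the `staged` gates are, here is a hedged TypeScript sketch. The helper and the entity-type subset are hypothetical — real enforcement lives in the repo's audit scripts, not in code like this:

```typescript
// Hypothetical sketch of the `staged` promotion gates: non-empty
// name/code/summary, a known entity_type, at least one namespaced tag,
// and no placeholder markers.

const ENTITY_TYPES = new Set(["skill", "tool", "kb", "prompt", "schema"]); // subset for illustration
const PLACEHOLDERS = /TODO|PLACEHOLDER|\[INSERT\]/;

type StagedCandidate = {
  name: string;
  code: string;
  summary: string;
  entityType: string;
  tags: string[]; // expected namespace:code format, e.g. "capability:hub"
};

function stagedGateFindings(c: StagedCandidate): string[] {
  const findings: string[] = [];
  if (!c.name.trim()) findings.push("empty name");
  if (!c.code.trim()) findings.push("empty code");
  if (!c.summary.trim()) findings.push("empty summary");
  if (!ENTITY_TYPES.has(c.entityType)) findings.push(`unknown entity_type: ${c.entityType}`);
  if (!c.tags.some((t) => /^[a-z_]+:[a-z0-9_-]+$/.test(t))) findings.push("no valid namespaced tag");
  if (PLACEHOLDERS.test(c.summary)) findings.push("placeholder marker in summary");
  return findings; // empty array = all staged gates pass
}
```

The `curated` and `published` gates are intentionally not sketched: they depend on typed links, depth minimums, and audit findings that need DB access, which is why they belong in the enforcement scripts rather than a pure function.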
## Evidence and Enforcement Hooks 1. `python3 scripts/registry/lint_unit_packages.py --strict` 2. `npx tsx scripts/registry/audit/audit_registry_graph.ts --report docs/reports/registry-graph-audit-latest.json` 3. `npx tsx scripts/registry/audit/audit_atomic_e2e.ts --report docs/reports/atomic-e2e-audit-latest.json` 4. `npx tsx scripts/registry/audit/audit_db_repo_reconciliation.ts --report docs/reports/db-repo-reconciliation-latest.json` Promotion defaults: 1. Block promotion on repeated high-severity anti-patterns. 2. Prefer enrich/repair/supersede over destructive deletion. 3. Preserve provenance and revision history. ## Related References 1. `docs/RULES.md` 2. `docs/VISION.md` 3. `docs/atomic-units/README.md` 4. `docs/references/contracts/ATOMIC_DUPLICATE_DETECTION_POLICY.md` 5. `docs/references/contracts/LINK_GRAPH_QUALITY_STANDARD.md` --- ## Source: docs/references/contracts/CONTEXT_ASSEMBLY_AND_GRAPH_SELECTION.md # Context Assembly & Graph-Driven Selection — Technical Reference This document explains how the Katailyst registry assembles context for AI agents. It is the canonical reference for **how entities reach agents** and **what controls selection**. Read this before touching: `lib/interop/agent-packets.ts`, `lib/mcp/handlers/registry-read/agent-context.ts`, `lib/mcp/handlers/registry-read/agent-context-graph.ts`, or any benchmark/selection logic. ## What the Registry Contains The Katailyst registry holds **~1,600 entities** across 21 entity types connected by **~10,900 graph links**. 
### Entity Types (21) ``` skill, tool, kb, prompt, agent, schema, style, bundle, content_type, recipe, playbook, channel, eval_case, rubric, metric, lint_rule, lint_ruleset, agent_doc, operational_log, pattern, hub ``` ### Link Types (14) ``` governed_by_pack, bundle_member, uses_tool, uses_prompt, uses_kb, often_follows, requires, alternate, recommends, pairs_with, prerequisite, supersedes, parent, related ``` Every link carries: `link_type`, `weight` (0.0–1.0), and `reason` (human-readable). ### Tag Namespaces (34) Tags are namespaced: `action:create`, `domain:marketing`, `family:social`, `capability:hub`, etc. Full taxonomy: `docs/TAXONOMY.md`. Seed data: `database/002-seed-data.sql`. ## The Three-Phase Context Assembly Pipeline When an agent calls `registry.agent_context(intent, ...)`, three phases execute: ### Phase 1: Discover (`discover_v2`) Semantic + embedding search over all entities. Scoring uses 6 weighted signals: 1. **Text match** — FTS rank on name/code/summary/use_case (weight: 3×) 2. **Tag overlap** — ratio of matched tags (weight: 1×) 3. **Link popularity** — log(incoming link count) (weight: 0.3×) 4. **Priority tier** — `(11 - tier) / 10.0` on a 1-10 scale (weight: 0.5×) 5. **Rating** — normalized 0–100 (weight: 0.3×) 6. **Recency** — decay on updated_at (weight: 0.2×) Returns: top-K candidates ranked by composite score. Typical scores: 3.0–6.0. ### Phase 2: Graph Expand (`buildAgentContextGraphWithExpansion`) Starting from the top discover hits, traverses graph links to pull in structurally related entities that text search alone would miss (styles, schemas, rubrics, etc.). 
**Expansion link types traversed:** ``` uses_kb, recommends, requires, bundle_member, governed_by_pack, parent, uses_prompt, pairs_with ``` **Expansion entity types surfaced as candidates:** ``` kb, agent, agent_doc, bundle, channel, eval_case, metric, schema, skill, style, rubric, prompt, recipe, playbook, content_type, tool ``` Bundle relationships are also traversed through `bundle_member` so the graph view remains complete. However, `bundle_member` does **not** widen the candidate pool automatically. Bundles can still surface through other link types or direct retrieval, but simple membership should not displace direct capabilities by itself. Graph-expanded entities get synthetic relevance scores: ``` relevance_score = link_weight × 3.0 × linkTypeBoost × tierBoost ``` Where `linkTypeBoost`: requires=1.5, pairs_with=1.3, uses_kb=1.2, others=1.0. Each expanded entity carries `match_reasons` like `['graph:requires', 'via:recipe:web-landing-page']` so the selection phase knows where it came from. ### Phase 3: Select (`buildGraphDrivenSelection`) Picks the final set (typically 5–8 entities) from all candidates: 1. **Top organic discover hit** gets the first slot. The lead result should stay anchored on the strongest direct text/embedding match, not on a later graph-expanded helper. 2. **Graph-dependency promotion**: Up to 3 reserved slots for entities linked via `requires` from the top-3 discover hits. This ensures structural dependencies (styles, schemas, channels) reach agents even when their text-relevance scores are lower. 3. **Remaining slots** fill by ranked composite score. The `recommendation_receipt.graph_promotions` array records which entities were promoted and why. ## Design Principles ### The Graph Decides, Code Follows The selection algorithm has **zero hardcoded entity codes, keyword lists, or role assignments**. If an entity should accompany a recipe or content_type, create a `requires` link with appropriate weight. 
The code will promote it automatically. If selection is wrong, the fix belongs in graph links, discovery quality, or canonical entity data. Do not add bespoke application heuristics to compensate for missing or weak graph structure. **Litmus test**: adding 50 new entities should require zero code changes for routing. ### Link Semantics Matter - `requires` (weight 0.7–0.9) — categorical dependency: "this recipe cannot work without this style/schema" - `recommends` (weight 0.7–1.0) — strong suggestion: "agents should consider loading this" - `pairs_with` (weight 0.5–0.7) — adjacency: "these work well together" - `uses_kb` (weight 0.7–0.9) — reference dependency: "this entity references this knowledge" - `related` (weight 0.5–0.8) — loose association for exploration Only `requires` links trigger promotion in `buildGraphDrivenSelection`. Other link types participate in graph expansion (Phase 2) but compete on score for the remaining slots. `bundle_member` belongs only in the graph-visibility category: it does not widen the candidate pool, reserve slots, or force sibling selection. ### Scoring is Relative, Not Absolute Selection scores are normalized relative to the batch maximum. A score of 0.95 means "95% as relevant as the best match in this batch." This preserves ranking differentiation that hard-clamping would destroy. ## Hub Entities (Domain Front Doors) Hubs are first-class entities (entity_type: `hub`) with code pattern `hub-{domain}`. There are 15: ``` hub-social, hub-article, hub-copywriting, hub-multimedia, hub-design, hub-research, hub-marketing, hub-meeting, hub-planning, hub-analysis, hub-education, hub-email, hub-growth, hub-skills, hub-registry ``` Hubs serve as **domain front doors**: when an agent's intent matches a domain, the hub surfaces via discover_v2 (it has strong tags and summaries), and its `recommends` links point the agent to the best tools, playbooks, and KBs for that domain. 
**How hubs reach agents:** - Text match: hub summaries are written to match domain intents - Tags: `capability:hub`, `domain:{x}`, `family:{x}`, `format:discovery-map` - Links: `recommends` (weight 0.9–1.0) to primary execution entities **Hubs do NOT get special code-level treatment.** They compete on score like everything else. Their high priority_tier (1) and strong tag/summary coverage give them a natural advantage for domain-level intents. ## What Was Removed (Anti-Patterns) ### Keyword-Based Anchor Roles (DELETED) The old system guessed 5 fixed roles (domain, execution, quality, brand, research) by pattern-matching entity names/summaries. It was wrong 60%+ of the time and wasted selection slots on irrelevant entities. **Fully deleted** — no types, no helpers, no schema fields remain. - ~~Legacy role-slot helper types~~ — deleted - ~~Legacy role-slot derivation helpers~~ — deleted - ~~Legacy receipt fields for role summaries~~ — removed from the receipt schema - ~~Role-inference tag namespaces~~ — removed ### Why It Was Wrong A text search for "brand" would pick `brand-voice-master` as the brand-facing candidate even when the intent was about social media (where brand voice is irrelevant clutter). Meanwhile, the graph already knew the right answer: the social recipe `requires` a specific style entity. The old role-slot system was guessing what the graph already encoded. ### Deprecated Front-Door Helper KBs (DELETED) The old extra helper KBs that duplicated the `-front-door-index` pattern were removed because: - They added a second helper concept to the KB layer that doesn't exist in the pipeline - They were duplicates of the `-front-door-index` KBs, which remain active - They confused agents into thinking there was a second routing layer beyond the graph itself The active pattern for agent entry points is `{agent}-front-door-index` paired with `agent-sop-{agent}`. These are regular KBs discovered through tags and links.
## Terminology Glossary | Term | Meaning | Status | | -------------------- | ------------------------------------------------------------------- | ------------------- | | **Hub** | First-class `hub` entity (`hub-{domain}`) that serves as a domain front door | Active, 15 entities | | **Front-door index** | Agent-specific entry point (`{agent}-front-door-index`) | Active | | **Graph promotion** | Selection mechanism that reserves slots for `requires` dependencies | Active | ## File Map | File | Role | | ------------------------------------------------------- | ------------------------------------------ | | `lib/interop/agent-packets.ts` | Selection logic, scoring, receipt building | | `lib/mcp/handlers/registry-read/agent-context.ts` | MCP handler — runs the 3 phases | | `lib/mcp/handlers/registry-read/agent-context-graph.ts` | Graph expansion traversal | | `lib/api/agent-context-payload.ts` | HTTP API handler (same pipeline) | | `database/003-discovery-system.sql` | `discover_v2` SQL function | | `lib/mcp/tool-definitions-shared.ts` | MCP response schema definitions | | `scripts/ops/benchmark_discovery_routing.py` | Remote MCP benchmark | | `scripts/ops/benchmark_agent_context_local.ts` | Local DB benchmark | --- ## Source: docs/references/contracts/CURRENT_OPERATING_MODEL.md # Current Operating Model This contract makes the current HLT vs `Free (System)` model explicit so the registry, repo docs, and hosted-agent surfaces stop talking past each other. It replaces ad hoc interpretation of the March 5-7, 2026 investigation notes. Those reports remain evidence. This file is the canonical summary. ## Core Layers ### 1. `hlt` active operating layer `hlt` is the active operating surface for the live HLT fleet and HLT-specific doctrine.
Use `hlt` for: - hosted fleet-facing front doors such as `agent-sop-*` - hosted fleet identity overlays such as `*-identity-*` - HLT-specific directives, mission docs, and runtime-facing doctrine - helper support surfaces that exist to reinforce the HLT hosted-agent entrance layer ### 2. `Free (System)` shared canonical layer `system` is the internal org code for the shared `Free (System)` library and template layer that HLT still uses freely. On read-only surfaces, the default posture is: active execution stays in `hlt`, while shared `Free (System)` canon remains visible as an additive layer. `Free (System)` is not: - a second live business org - an exclusion fence - a sign that the material is "not for HLT" - a reason to tell HLT operators the capability is "outside their perimeter" HLT should be able to read and use shared `Free (System)` canon by default. Org placement here is a curation and authorship distinction, not an access restriction. It is where broadly reusable canon can live even when it was built primarily for HLT use. Examples of thick shared canon that may stay in `Free (System)`: - `meeting-prep` - `brainstorming` - `brand-voice-master` - `tools-guide-overview` Those surfaces remain first-class. They are not the problem, and they should not be flattened or retired just because they are shared. ### 3. External runtime ownership Render/OpenClaw and other host runtimes own live execution behavior. That includes: - disk-resident identity and runtime setup - runtime-specific sequencing and orchestration - gateway behavior - delivery behavior - some context and runs that never pass through Katailyst Katailyst should recommend and package blocks. The consuming runtime decides how to combine and execute them. Partial observability is normal. This repo and registry do not contain the whole runtime brain. ### 4. Repo mirrors and exports Repo files are mirrors, portability surfaces, and local operator references. 
They can project both `hlt` and `Free (System)` material into one repo surface. Filesystem location alone does not define runtime ownership or org truth. Canonical org truth lives in Supabase. ### 5. Three-Scope Visibility Model The registry uses three visibility scopes that determine who sees what: | Scope | Who sees it | Who creates it | DB representation | | ----------------- | -------------------- | ------------------- | ------------------------------------------------------------------------------ | | **Free (System)** | Everyone, every org | Catalyst team only | `org_id = system_org_id()` (`00000000-...-0001`) | | **Org** (HLT) | All HLT team members | Any HLT team member | `org_id = '8ba36969-...-7b08'` | | **Personal** | Just that user | That user | `org_id = execution org`, `visibility_scope = private`, `owner_user_id = user` | **How visibility merges:** When a user opens the registry, they see `Free (System) + their org` merged together. No giant separator — just a small dropdown filter (all / free / org / personal). Shared rows are NOT duplicated into org. Org reads from `Free (System)` automatically. **Personal is additive, not isolating:** selecting or creating personal scope never hides the user's org or `Free (System)` visibility. For the same user, personal view means `personal + org + system`. The thing a user cannot see is some other org's private/org-scoped material unless they belong to that org. **Operator language rule:** If a row is useful to HLT but owned by `system`, do not describe it as "outside HLT" or "outside the perimeter." Describe it accurately: HLT can read, use, and update shared `system` canon when appropriate. The meaningful boundary is cross-org visibility, not HLT's ability to maintain `system`. **Creation defaults:** When something is created **in the factory UI**, it defaults to the user's **current org** scope with an explicit personal/org-shared choice. 
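The three-scope merge described above can be sketched as a single read predicate. Everything here is illustrative (the `Row`/`Viewer` shapes, the `canSee` helper, and the `SYSTEM_ORG_ID` stand-in for `system_org_id()`); real enforcement lives in the database layer, not in application code.

```typescript
const SYSTEM_ORG_ID = "system-org-id"; // illustrative stand-in for system_org_id()

interface Row {
  orgId: string;
  visibilityScope: "org" | "private";
  ownerUserId: string | null;
}

interface Viewer {
  userId: string;
  orgId: string; // the viewer's org (HLT for HLT members)
}

// A viewer sees: Free (System) + their own org + their own personal rows.
function canSee(row: Row, viewer: Viewer): boolean {
  if (row.orgId === SYSTEM_ORG_ID) return true; // shared canon, visible to every org
  if (row.orgId !== viewer.orgId) return false; // other orgs' material stays hidden
  if (row.visibilityScope === "private") {
    return row.ownerUserId === viewer.userId;   // personal is additive, owner-only
  }
  return true;                                  // org-scoped row, visible org-wide
}
```

Note that personal rows never subtract visibility: the same viewer still passes the system and org branches for shared rows.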
When something is created **via MCP tools** (e.g., `registry.create`), the intended contract is: explicit `org_id` wins, otherwise the tool uses the authenticated **execution org**, and if neither exists the write fails closed. MCP must not silently fall back to System. **Flow direction:** Org entities do NOT flow up to `Free (System)`. `Free (System)` is curated separately. Graph links are allowed in any direction; read-time visibility decides whether a viewer can see both endpoints. In practice that means same-org, org→system, and system→org links are all allowed, while creation ownership still stays with the source org. **Current state:** HLT is the only active business org. Personal scope already exists inside an org-owned row model; users see `personal + org + system` while peers see `org + system`. Do not over-engineer no-org personal libraries or broad multi-org policy yet — make the HLT path correct first. ### 5a. Execution Org vs Read Visibility These are separate concerns and the product should treat them that way everywhere. - **Execution org** is the active workspace for writes, defaults, mutations, and permission checks. - **Read visibility** is the set of layers the user can see on browse/read surfaces. Default CMS posture: - Keep the execution org in `hlt` for HLT members unless they explicitly switch it. - Show all visible layers by default on browse/read surfaces: - current org - `Free (System)` - personal rows - Personal rows are additive to org + `system`, not a separate isolated library. - Keep scope controls subtle. Most users should not have to think about them. Practical implications: - Switching the execution org must not silently hide shared canon or personal rows on read surfaces. - `org` query/state identifies the execution org; it is not a substitute for source filtering. - Browse pages should only expose source/layer controls when useful, and those controls must map to real query behavior. 
- Create flows should default new items into the selected execution org unless the operator explicitly chooses personal scope. - Victoria and other shared assistants should resolve against the selected execution org plus visible shared layers, so they do not "disappear" just because the user moved between `hlt` and `system`. ### 6. Future-plan docs Future multi-org architecture is roadmap material, not the default interpretation of today's system. Today: - HLT is the only active business org - `Free (System)` is the shared canon layer - external runtimes remain autonomous - personal scope exists for org members; no-org personal libraries are not the current model Do not let future architecture planning blur the current operating model. ## Practical Rules 1. Preserve useful, thick surfaces. 2. Fix placement, wording, tags, and links before considering deletion. 3. Treat front-door and flagship surfaces as strong starts, not mandatory rails. 4. Use `family:*` for browse lanes and `format:*` for packaging shape so users can filter by kind without collapsing the graph. 5. Use `family:agent-files` for hybrid agent-facing operating references such as lessons, team/workspace references, repo/operator orientation references, and architecture/spec notes. Keep mission, research posture, and shared Katailyst usage rules in `family:agent-doctrine`. 6. Helper companions such as `*-front-door-index` are optional discovery aids, not a new control layer. 7. Tags guide filtering and links guide branching. Neither should be used to force one mandated sequence. 8. Do not treat `Free (System)` placement as "other org, not usable by HLT." 9. If a shared `system` row needs correction, record it as a `system` cleanup candidate or edit it directly through the shared write path; do not treat it like inaccessible capability. 10. Do not treat repo-local docs as proof that the repo owns the full runtime. 11. 
Normalize mis-scoped HLT fleet-facing docs into `hlt` when the row is clearly part of the live HLT operating layer. 12. HLT may maintain shared `system` canon directly when the change belongs in the shared layer. The boundary to respect is visibility into other org-specific rows, not write power over `system`. 13. Keep KB/package completeness separate from asset/content preview completeness. KB rows need good tags, summaries, variants, aliases, and links. Thumbnail/preview fields belong on asset/content surfaces that actually support `preview_image_url`, `thumbnail_url`, `hero_image_url`, `og_image_url`, or `preview_url`. ## Relationship To Other Contracts - `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md` explains who owns what. - `docs/references/contracts/MIRRORS_AND_PACKS.md` explains canonical vs mirror vs export surfaces. - `docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md` explains the hosted HLT fleet read-order model. --- ## Source: docs/references/contracts/DB_URL_ENV_CONTRACT.md # DB URL Env Contract (Canonical) Status: active Updated: 2026-02-27 (runtime-truth + migration-ledger lane) This contract makes DB access predictable for operators and AI agents. ## Canonical Name Use `CATALYST_DB_URL` as the primary environment variable for this repo. For concurrent runtime execution, `POSTGRES_URL` should point at the Supabase transaction pooler (typically port `6543`). The shared resolver now defaults to transaction mode, so runtime/script callers should resolve DB URLs through the shared helper rather than assuming `CATALYST_DB_URL` is session-safe. When possible, set these together to the same transaction-pool URL: ```bash export POSTGRES_URL='postgres://...:6543/...' export CATALYST_DB_URL="$POSTGRES_URL" export DATABASE_URL="$POSTGRES_URL" ``` Reserve session/non-pooling URLs for explicit `mode: 'session'` consumers only. 
## Compatibility Aliases (Accepted) The shared resolver in `scripts/lib/db-url.ts` accepts these aliases in mode-specific order. Transaction mode (default): 1. `POSTGRES_URL` 2. `DATABASE_URL` 3. `CATALYST_DB_URL` 4. `POSTGRES_PRISMA_URL` 5. `POSTGRES_URL_NON_POOLING` Session mode (explicit only): 1. `CATALYST_DB_URL` 2. `POSTGRES_URL_NON_POOLING` 3. `POSTGRES_PRISMA_URL` 4. `DATABASE_URL` 5. `POSTGRES_URL` 6. `TIGER_CLOUD_DB_URL` (legacy fallback only, when explicitly enabled by script) `.env.local` uses the same key precedence. ## Agent-Friendly Recommendation Set `POSTGRES_URL`, `CATALYST_DB_URL`, and `DATABASE_URL` to the same transaction-pool value for maximum interoperability: ```bash export POSTGRES_URL='postgres://...' export CATALYST_DB_URL="$POSTGRES_URL" export DATABASE_URL="$POSTGRES_URL" ``` This avoids tool-specific confusion where third-party utilities expect `DATABASE_URL`. ## SQL Entry Points Use these package scripts for SQL application flow: ```bash pnpm db:sql:dry pnpm db:sql:apply pnpm db:runtime:truth pnpm db:runtime:truth:strict pnpm db:migrations:backfill:dry pnpm db:psql -- -c 'select now();' ``` These scripts call the shared DB URL resolver and enforce migration truth in the repo-native numbered SQL pipeline: - `scripts/db/apply_database_sql.ts`: - creates/uses `public.katailyst_sql_migrations` as canonical apply ledger - skips already-applied migrations by ledger row + checksum - blocks unsafe replay on non-fresh DBs with empty ledger - `scripts/db/db_runtime_truth.ts`: - compares repo migrations to ledger - verifies runtime sentinels (critical tables/functions/constraints/indexes) - `scripts/db/backfill_katailyst_sql_migrations.ts`: - bootstrap-marks legacy migrations in the ledger for pre-ledger environments Important: - `supabase_migrations.schema_migrations` is informational in this repo model. - Canonical migration truth is `database/*.sql` + `public.katailyst_sql_migrations`. 
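The mode-specific precedence above can be sketched as a small resolver. This is a simplified stand-in for `scripts/lib/db-url.ts` (the `ALIAS_ORDER` and `resolveDbUrl` names are illustrative); the legacy `TIGER_CLOUD_DB_URL` fallback is omitted because it applies only when a script explicitly enables it.

```typescript
type DbUrlMode = "transaction" | "session";

// Alias precedence per mode, as documented in this contract.
const ALIAS_ORDER: Record<DbUrlMode, string[]> = {
  transaction: [
    "POSTGRES_URL",
    "DATABASE_URL",
    "CATALYST_DB_URL",
    "POSTGRES_PRISMA_URL",
    "POSTGRES_URL_NON_POOLING",
  ],
  session: [
    "CATALYST_DB_URL",
    "POSTGRES_URL_NON_POOLING",
    "POSTGRES_PRISMA_URL",
    "DATABASE_URL",
    "POSTGRES_URL",
  ],
};

// First non-empty alias wins; the shared resolver defaults to transaction mode.
function resolveDbUrl(
  env: Record<string, string | undefined>,
  mode: DbUrlMode = "transaction",
): string {
  for (const key of ALIAS_ORDER[mode]) {
    const value = env[key];
    if (value && value.trim() !== "") return value;
  }
  throw new Error(`no DB URL set for mode '${mode}'`);
}
```

With the three recommended variables all pointing at the same transaction-pool URL, both modes resolve consistently.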
## Resolver Adoption Audit Use this to find script-level drift where DB URL resolution is still local/duplicated: ```bash pnpm db:url:adoption:audit ``` Artifacts: - `docs/reports/db-url-resolver-adoption-latest.json` - `docs/reports/db-url-resolver-adoption-latest.md` Strict gate (fails if local/unknown resolver paths remain): ```bash npx tsx scripts/ops/audit_db_url_resolver_adoption.ts --strict ``` ## Why This Exists We previously had inconsistent env resolution across scripts, which caused avoidable agent/operator failures. The shared resolver and this contract keep DB execution predictable. --- ## Source: docs/references/contracts/LAYERED_CONTEXT_PACKET_CONTRACT.md # Layered Context Packet Contract Status: Active Updated: 2026-03-11 This contract defines the target shape for agent context assembly in Katailyst. The goal is not "search the registry better." The goal is to assemble the right layered packet for HLT work without forcing every task through one rigid route. ## Core Principle Agents should receive the smallest useful layered packet for the task at hand. That packet should feel like: `system baseline + HLT foundation packs + personal overlay + work hub(s) + domain/product/audience pack(s) + examples/artifacts` Hubs are only one layer inside that packet. They are not the whole retrieval model. ## Canonical Packet Order 1. **System baseline** - shared canon, global doctrine, reusable flagship capabilities - examples: shared research posture, discovery rules, reusable flagship skills/playbooks 2. **HLT foundation packs** - HLT-specific identity and always-on context - examples: brand voice, design/media system, product map, personas, offers, asset provenance 3. **Personal overlay** - user-private notes, preferences, or working overlays when present - this is allowed by the model even if the personal lane is still early 4. **Work hub(s)** - how the work is being done - examples: research, article, copywriting, social, design, analysis, planning 5. 
**Domain, product, or audience pack(s)** - what world the work belongs to - examples: assessment/qbank, NCLEX-RN, recruiting/careers, nurse-job-seeker audience 6. **Examples and artifacts** - screenshots, templates, prior winners, checklists, few-shot examples, references ## What Each Layer Is For | Layer | Main job | What it should not do | | ----------------------------- | ------------------------------------ | -------------------------------------- | | System baseline | shared reusable starting truth | crowd out HLT-specific needs | | HLT foundation packs | make answers sound and feel like HLT | become bloated task dumps | | Personal overlay | preserve private working context | replace org or system canon | | Work hubs | answer "what kind of work is this?" | carry all product truth | | Domain/product/audience packs | answer "what world is this for?" | duplicate every work-hub method | | Examples/artifacts | calibrate quality and execution | substitute for the underlying doctrine | ## Hubs vs Packs ### Work hubs Work hubs describe how the work is being done. Examples: - research - planning - copywriting - article - social - email/lifecycle - multimedia - design - analysis - skills/capability-building - registry/ops ### Domain/product/audience packs Packs describe what world the work belongs to. Examples: - HLT brand core - HLT design/media system - HLT product map - HLT audiences/personas - assessment/qbank - recruiting/careers - NCLEX-RN - FNP Keep these roles separate. Do not make hubs absorb product truth that belongs in packs. ## Default Retrieval Posture For non-trivial work, prefer a one-call packet assembly surface first. Current preferred MCP entry: - `registry.agent_context` Expected behavior: 1. retrieve candidates 2. compile the working packet in the layer order above 3. return graph context and doc pointers for the strongest packet members 4. 
expose enough receipt metadata that the caller can explain why the packet was assembled this way ## Fallback Flow If the one-call packet is too broad, too thin, or unavailable: 1. run `discover` 2. identify whether one or more work hubs should ground the task 3. read those hubs for routing guidance 4. run `traverse` on the strongest `recommends` and `related` edges 5. assemble the packet manually using the canonical layer order ## Assembly Guidance - Do not overfill the packet just because the registry is large. - Prefer the smallest useful packet that still gives the agent the right worldview. - Favor flagship surfaces and strong examples over dumping many shallow fragments. - If a layer is missing, note that gap explicitly instead of silently substituting unrelated context. ## Example: Recruiting Landing Page Request: `Build a landing page for nurses looking for jobs` Packet: 1. system baseline 2. HLT brand/voice pack 3. HLT design/media pack 4. work hubs: research + copywriting + design 5. domain/audience packs: recruiting/careers + nurse-job-seeker audience 6. examples/artifacts: best landing-page structures, CTA examples, prior winners ## Example: NCLEX Study Guide Request: `Create an NCLEX-RN cardiovascular study guide` Packet: 1. system baseline 2. HLT foundation packs 3. work hubs: research + article + education 4. domain/product packs: assessment/qbank + NCLEX-RN + nursing/sciences 5. examples/artifacts: study-guide examples, blueprint references, qbank examples ## Relationship To Existing Surfaces - `docs/references/contracts/CURRENT_OPERATING_MODEL.md` defines scope and visibility posture. - `docs/references/ai-agents/CORE_AGENT_SHARED_FOUNDATION.md` defines the active hosted-agent read order. - `docs/api/MCP_TOOLS_REFERENCE.md` defines MCP inputs and outputs. - `lib/api/agent-context-payload.ts` is the main repo implementation surface for packet assembly today. 
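The canonical six-layer order can be pinned down in code. The sketch below is illustrative only; `PacketLayer`, `CANONICAL_LAYER_ORDER`, and `orderPacket` are not real exports of `lib/api/agent-context-payload.ts`.

```typescript
type PacketLayer =
  | "system_baseline"
  | "hlt_foundation_packs"
  | "personal_overlay"
  | "work_hubs"
  | "domain_product_audience_packs"
  | "examples_artifacts";

// Canonical packet order from this contract.
const CANONICAL_LAYER_ORDER: PacketLayer[] = [
  "system_baseline",
  "hlt_foundation_packs",
  "personal_overlay",
  "work_hubs",
  "domain_product_audience_packs",
  "examples_artifacts",
];

interface PacketMember {
  code: string; // a hub, pack, or example entity code
  layer: PacketLayer;
}

// Sort candidate members into canonical layer order.
function orderPacket(members: PacketMember[]): PacketMember[] {
  return [...members].sort(
    (a, b) =>
      CANONICAL_LAYER_ORDER.indexOf(a.layer) -
      CANONICAL_LAYER_ORDER.indexOf(b.layer),
  );
}
```

`Array.prototype.sort` is stable in modern runtimes, so members within one layer keep their relevance order.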
## Acceptance Questions Use these questions to judge whether a packet is good enough: - Does the packet tell the agent how to work and what world it is working in? - Does it include HLT-specific foundation context when that matters? - Does it avoid stuffing the whole registry into the prompt? - Could another operator explain why each layer is present? - If the answer still feels generic, which missing layer caused it? --- ## Source: docs/references/contracts/LINK_GRAPH_QUALITY_STANDARD.md # Link Graph Quality Standard Status: Active Updated: 2026-02-19 This standard defines how to author and review graph links so traversal is useful, interpretable, and non-deterministic by default. ## Principles 1. Links are hints, not hard gates. 2. Typed links beat generic adjacency. 3. Reason text should explain intent quickly. 4. Weights should encode strength, not certainty theater. ## Canonical Link Types Use existing schema-supported types: - `requires` - `prerequisite` - `uses_tool` - `uses_prompt` - `uses_kb` - `bundle_member` - `often_follows` - `recommends` - `pairs_with` - `alternate` - `supersedes` - `parent` - `related` ## Authoring Standard Each non-trivial link should include: - `link_type` - `weight` in `0..1` - `reason` (single concise sentence) Recommended reason structure: `<mechanism> + <benefit>` Example: "Pairs tool retrieval with doc extraction before synthesis; improves evidence quality for changing APIs." ## Weight Bands - `0.1–0.3`: weak association / optional context - `0.4–0.6`: moderate association / useful default - `0.7–0.85`: strong pairing / common path - `0.86–1.0`: near-required in typical workflows Use `1.0` sparingly.
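The authoring standard and weight bands above lend themselves to a mechanical check. The sketch below is illustrative, not an existing lint rule in this repo; the exact edges between bands (for example 0.3 vs 0.4) are an assumption where the documented ranges leave gaps.

```typescript
interface AuthoredLink {
  linkType: string; // e.g. "requires", "recommends", "pairs_with"
  weight: number;   // expected in 0..1
  reason?: string;  // single concise sentence explaining intent
}

// Flag links that miss the authoring standard.
function lintLink(link: AuthoredLink): string[] {
  const problems: string[] = [];
  if (link.weight < 0 || link.weight > 1) problems.push("weight must be in 0..1");
  if (!link.reason || link.reason.trim() === "") problems.push("non-trivial links need a reason");
  if (link.weight === 1.0) problems.push("1.0 should be rare; confirm near-required status");
  return problems;
}

// Map a weight onto the documented bands (band edges are assumptions).
function weightBand(weight: number): string {
  if (weight <= 0.3) return "weak association / optional context";
  if (weight <= 0.6) return "moderate association / useful default";
  if (weight <= 0.85) return "strong pairing / common path";
  return "near-required in typical workflows";
}
```

A check in this spirit complements the script-based audits in the next section; it catches reason-less or inflated links before they reach the graph.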
## Per-Type Link Baselines (Guidance Floors) | Type | Minimum Expectation | | ---------------------------- | --------------------------------------------------------------- | | skill | at least 2 typed links with reasons | | kb | at least 2 links to related units/examples | | prompt | at least 2 links (`uses_kb`, companion skill/tool) | | tool | at least 2 links (`pairs_with`, `related`, or consuming skills) | | recipe/content_type/playbook | composition links to dependencies and alternatives | | rubric/metric | at least 1 link to evaluated/subject units | These are guidance floors for quality, not rigid gate rules for all drafts. ## Chokepoint Avoidance Rules Avoid graph structures that force one route: - single mandatory neighbor with no alternate path - clusters only linked internally with no bridges - overuse of `requires` where `recommends` is more appropriate For major workflows, provide at least one alternate or fallback path. ## Quality Checks ### Script-Based ```bash pnpm registry:graph:audit pnpm registry:e2e:audit -- --limit 250 ``` ### SQL Spot Checks ```sql -- Links missing reason select count(*) as missing_reason_links from entity_links where reason is null or btrim(reason) = ''; -- Isolated published/curated entities select re.entity_type, re.code from registry_entities re left join entity_links el on el.from_entity_id = re.id or el.to_entity_id = re.id where re.status in ('curated','published') group by re.entity_type, re.code having count(el.id) = 0 order by re.entity_type, re.code; -- Link type distribution select link_type, count(*) from entity_links group by 1 order by 2 desc; ``` ## Review Checklist - [ ] Link type is specific and semantically correct. - [ ] Reason is concise and actionable. - [ ] Weight reflects practical strength. - [ ] At least one non-chokepoint exploration path exists. - [ ] No deterministic hard-route language is implied. ## Anti-Patterns - Mass `related` links with no reason. - Defaulting all weights to `1.0`. 
- Using links to enforce control rather than assist discovery. - Building dense graph clutter with no signal hierarchy. ## References (Feb 2026) - MCP Architecture: - MCP Protocol Guide: - OpenTelemetry GenAI Conventions: - Neo4j Data Modeling Designs: --- ## Source: docs/references/contracts/QBANK_READONLY_EXTRACTION_CONTRACT_V0.md # HLT QBank Read-Only Extraction Contract v0 Status: Active Updated: 2026-03-05 ## Purpose Define deterministic, read-only ingestion from HLT QBank upstream into Katailyst canonical surfaces with hash-based change detection and provenance. This contract covers the upstream extraction boundary. After source material is cached inside Katailyst, use `docs/references/contracts/HLT_REFERENCE_WORKSPACE_CONTRACT.md` for: - the reference-first workspace model - nightly scan behavior - operator findings/actions - downstream reuse rules ## Upstream Endpoint (Verify Before Use) ```text GET https://api.hltcorp.com/api/v3/flashcards/{flashcard_id}/comprehensive?bundle_identifier={bundle_identifier} ``` ## Canonical Identity Mapping ### Upstream identity (PGA/HLT) 1. `content_source = "hlt_qbank"` 2. `content_source_id = String(flashcard_id)` 3. `bundle_identifier` persisted in app metadata ### Canonical source URL ```text https://api.hltcorp.com/api/v3/flashcards/{flashcard_id}/comprehensive?bundle_identifier={bundle_identifier} ``` ## SHA-256 Change Detection Contract Compute `source_sha256` over core learning payload only (exclude discussion threads): 1. Sort answers by `(sort_order ASC, id ASC)` when `id` exists. 2. Fallback sort: `(sort_order ASC, choice_key ASC, raw_content ASC)`. 3. Build `hash_input` with stable separators: - question - rationale - key_takeaway - exam_tip - normalized answer tuples (`correct|raw_content|raw_rationale`) 4. Deterministic serialization rule: - Do not hash raw object/array blobs with naive `JSON.stringify` unless keys are pre-sorted. 
- If object serialization is required, use stable key-order serialization (`fast-json-stable-stringify` equivalent) or explicit alphabetical key extraction. - For answers, build tuples in fixed field order only: `correct|raw_content|raw_rationale`. 5. Hash algorithm: SHA-256 UTF-8. ## Storage Contract (Katailyst) Persist provenance fields in metadata JSON: 1. `metadata_json.source.sha256 = source_sha256` 2. `metadata_json.source.upstream_updated_at = flashcard.updated_at` (if present) 3. `metadata_json.source.fetched_at = ingestion timestamp` ## Canonical Field Mapping ### Core item 1. `asset_type = "MCItem"` 2. `content_reference = flashcard.content_reference || source_url` 3. `tags = flashcard.tag_list` (optional) ### MC item body 1. `live_question = flashcard.question` 2. `live_rationale = flashcard.rationale` 3. `live_key_takeaway = flashcard.key_takeaway` 4. `live_exam_tip = flashcard.exam_tip` 5. `randomized = true` unless upstream explicitly defines fixed order ### Choices 1. `choice_key = answer.choice_key` or derive from `sort_order` (`1->A`, `2->B`, ...) 2. `raw_content = answer.raw_content` 3. `raw_rationale = answer.raw_rationale` 4. `correct = answer.correct` 5. `sort_order = answer.sort_order` 6. `deleted_at` set at ingestion timestamp when `answer.deleted = true` ### Comments/discussion 1. `user_id = null` unless a trusted upstream-to-core user map exists 2. `comment_text = discussion.message` 3. `metadata_json` includes upstream IDs/status/votes/platform/app metadata ### Optional stats (only if upstream provides item-level stats) Persist into stats surfaces with raw stats payload preserved in `stats_json`. ## Read-Only Guardrails 1. Never write back to upstream systems. 2. Ingestion updates only Katailyst canonical tables. 3. Treat upstream response as immutable input for hash/delta checks. ## Verification 1. Same payload => identical `source_sha256`. 2. Changed core learning fields => changed `source_sha256`. 3. 
Discussion-only changes do not affect `source_sha256`.
4. Re-running ingest on an unchanged source should yield near-zero false deltas.

---

## Source: docs/references/contracts/README.md

# Contracts Reference Index

`docs/references/contracts/**` holds behavioral contracts and source-of-truth operating rules. Use this folder when you need to answer questions like:

- Runtime ownership and repo boundaries: `docs/references/contracts/CURRENT_OPERATING_MODEL.md` and `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md`
- Canonical vs mirror/export surfaces: `docs/references/contracts/MIRRORS_AND_PACKS.md`
- DB URL and canonical script resolution: `docs/references/contracts/DB_URL_ENV_CONTRACT.md`
- Vault-backed tool execution and secret-pointer rules: `docs/references/contracts/VAULT_TOOL_EXECUTION.md`
- Atomic-unit taxonomy, quality standards, and duplicate policy: `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` and `docs/references/contracts/ATOMIC_DUPLICATE_DETECTION_POLICY.md`
- Agent runtime naming and adaptation: `docs/references/contracts/AGENT_RUNTIME_AND_SKILL_ADAPTATION.md`

If a contract appears to conflict with a higher-level primer, prefer `docs/RULES.md`, then `docs/VISION.md`, then the most specific contract in this folder.

---

## Source: docs/references/contracts/RECOVERY_POSTURE.md

# Recovery Posture

> Unified definition of "recovery" across the Katailyst system. Four modes, one posture.

## Why This Exists

"Recovery" previously meant four different things in different parts of the codebase. This caused confusion for agents and operators who encountered the term without knowing which meaning applied. This document is the single reference.

---

## Mode 1: Content Recovery

**What it is:** Restoring entity quality after bad edits, over-condensation, or bulk operations that damaged curated content.
**When it triggers:**

- An agent or script overwrites a rich skill with a stub
- A bulk operation strips artifacts or detail from multiple entities
- An edit reduces a published entity below its quality bar

**What the system does:**

- Entity revisions are immutable — prior versions can always be loaded via `registry.artifact_body` or revision history
- The `history.query` tool shows recent write operations with before/after context
- The `ARCHITECTURE_PRESERVATION_AND_RECOVERY.md` contract defines how to restore rich doc structure

**What the agent/user sees:** A revision with lower quality than the prior version. Recovery = load the previous revision, compare, and restore what was lost.

**Key reference:** `docs/references/contracts/ARCHITECTURE_PRESERVATION_AND_RECOVERY.md`

---

## Mode 2: Generation Fallback

**What it is:** Graceful degradation when AI-powered generation fails — structured output parsing errors, model timeouts, invalid JSON, or missing context.

**When it triggers:**

- The factory intake pipeline produces malformed output
- A deliberation round fails to generate a valid artifact
- A tool execution returns an error from the upstream provider

**What the system does:**

- Returns deterministic placeholder content using validated ref patterns and manual refinement hints
- Logs the failure class and context so the operator can diagnose
- Does NOT silently swallow the failure — the output is clearly marked as fallback

**What the agent/user sees:** A partial result with clear markers that generation failed and what to try next (different model, simpler prompt, more context, retry).

**Key reference:** `lib/factory/intake-draft-generator.ts` (fallback generation rules)

---

## Mode 3: Run Remediation

**What it is:** Handling eval cases and test scenarios that can't fully execute — aspirational tests, tests for capabilities that don't exist yet, or tests that require resources the system doesn't have.
**When it triggers:**

- An eval case references a tool that isn't connected (e.g., phone calling)
- A test scenario requires a capability that's planned but not yet built
- A deliberation or eval run fails partway through

**What the system does:**

- Keeps the test case — does NOT delete it just because it can't run today
- Scores it honestly (0/5 or "not_executable" with explanation)
- Records what happened so the operator can see the gap
- The eval system is designed to show trajectory over time, including aspirational cases

**What the agent/user sees:** A test that ran but couldn't complete, with a clear explanation of why. The test stays in the suite and gets re-run each time — eventually the capability will exist and the score will improve.

**Key reference:** `lib/readiness/contracts.ts`, `docs/atomic-units/EVAL_CASES.md`

---

## Mode 4: Session Continuity

**What it is:** Resuming work after context loss, timeout, crash, or deployment.

**When it triggers:**

- A long-running deliberation times out
- An agent loses its context window mid-task
- A deployment happens while a workflow is running
- A user starts a new conversation and needs to pick up where they left off

**What the system does:**

- Registry state is always in the DB (canonical) — nothing is lost on context reset
- Run IDs and revision history provide audit trails to resume from
- `memory.query` returns shared agent memories that persist across sessions
- `registry.agent_context` can resume from a prior run via `resume_from_run_id`
- Mirror surfaces (`.claude/`, `.claude-plugin/`) provide portability snapshots

**What the agent/user sees:** Start a new session, call `registry.agent_context` with the prior run context, and the system rebuilds the working packet.

**Key reference:** `docs/references/contracts/MIRRORS_AND_PACKS.md`

---

## The Unified Posture

Recovery is not an emergency. It's a normal operating mode. The system is designed to handle all four modes gracefully:

1.
**Immutable revisions** mean content is never permanently lost
2. **Honest failure reporting** means generation problems are visible, not hidden
3. **Aspirational testing** means we don't delete tests just because they fail today
4. **Persistent state** means context loss is a speed bump, not a cliff

Agents encountering any form of recovery should:

- Check what mode they're in (content, generation, run, or session)
- Use the appropriate tool (`history.query`, `get_entity` with prior version, `memory.query`, `registry.agent_context` with resume)
- Report what they found honestly
- Never silently swallow failures or pretend recovery didn't happen

---

## Source: docs/references/contracts/TOOL_INTEGRATION_PHILOSOPHY.md

# Tool Integration Philosophy

**Date:** 2026-03-26
**Status:** Active reference

## The Core Idea

Katailyst is a registry of 1,500+ composable building blocks (skills, knowledge bundles, tools, prompts, styles, rubrics, playbooks, etc.) connected by a 10,000-link knowledge graph. External agents connect via MCP, search the graph with rich natural-language intents, and the graph returns ranked building blocks. The agents decide how to compose them. The system serves up menus, not mandates.

Tools are just another entity type in this graph. A tool entity (`entity_type: 'tool'`) has the same lifecycle as a skill or KB: it gets created, tagged, linked to hubs, discovered via embeddings, and composed by agents. The graph doesn't treat tools as special.

## How Tools Actually Work (End to End)

### 1. Tools live in the registry as entities

Every external tool (Firecrawl, Cloudinary, Gamma, Marketo, etc.)
is a `tool` entity in `registry_entities` with:

- A `tools` extension row (provider, tool_type, auth_method, risk_level)
- A revision with `content_json` containing `call_spec` (the API contract: URL, method, input schema, auth, polling config)
- Tags for discoverability (e.g., `domain:marketing`, `family:automation`)
- Links to hubs and related entities (e.g., `tool:gamma.generate` is linked from `hub:hub-meeting` via `uses_tool`)

### 2. Agents find tools through the graph

When an agent needs a tool, they have two paths:

**Path A -- discover (recommended):** Call `discover` with a rich intent like "I need to generate a presentation about Q1 results for the board." The graph returns ranked results including tool entities alongside skills, KBs, content types, recipes, etc. The agent sees the full menu of building blocks.

**Path B -- tool.search (focused):** Call `tool.search` with a query. This searches only `entity_type: 'tool'` entities. Returns tool refs with provider, risk level, and family classification. The agent then calls `tool.describe` to get the full call_spec, and `tool.execute` to run it.

### 3. Execution goes through a hosted pipeline

When `tool.execute` is called:

1. `registry-tool-service.ts` loads the tool from the catalog
2. Policy evaluation checks risk level, human approval gates, circuit breakers
3. Vault resolves the required secret (API key) from `auth_secret_key`
4. The executor dispatches based on `call_spec.executor.kind`:
   - `http` -- generic HTTP executor makes the API call
   - Provider-specific executors handle special cases (Tavily, email, bash, agentmail)
   - `internal` -- runs internal logic
5. Output is audited and optionally LLM-summarized
6. Result returns to the agent

### 4. Tool manifests are a bootstrapping mechanism

Files like `lib/tools/manifests/gamma.ts` define tool entities in code. A CLI script (`scripts/registry/seed/register-tools.ts`) writes them into the DB.
Once in the DB, the manifest files are effectively dead -- the tool lives in the graph. Manifests exist because writing a 400-line tool definition with input schemas, async polling config, and link declarations is easier in TypeScript than through the MCP create API.

## The Two Systems Problem

The codebase has two parallel classification systems that don't talk to each other:

### System 1: The Graph (what we want)

Entities linked to hubs via typed edges. When `hub-multimedia` links to `tool:cloudinary.transform` via `recommends`, the graph knows Cloudinary is a multimedia tool. When `hub-meeting` links to `tool:gamma.generate` via `uses_tool`, the graph knows Gamma is a presentation/meeting tool. Discovery finds these relationships through embeddings and graph traversal.

The graph is flexible: adding a new tool means creating the entity, linking it to relevant hubs, and the system discovers it automatically. No code changes.

### System 2: The Hardcoded Foundation (what we're moving away from)

`lib/mcp/tool-family-foundation.ts` defines 7 "tool families" with hardcoded `provider_keys` arrays. `tool.search` uses these to:

- Classify tools into families (research-web, media-design, communications, etc.)
- ~~Determine executability~~ (REMOVED -- now based on call_spec presence)
- Filter by family when `family_id` is passed

`registry.capabilities` and `guide` also read from this constant to show tool family readiness status.

### Why this is a problem

- Adding a new tool's provider to the right family requires editing TypeScript code
- The graph already has the same information (hub links) but tool.search doesn't use it
- 38 of 49 providers had no family assignment because nobody updated the code
- The families in the constant don't match the hubs in the graph (7 families vs 20 hubs)
- It's a parallel routing system that bypasses the intelligence of the graph

### What we changed (2026-03-26)

1.
**Executability decoupled from families.** `tool.search` now determines `executable: true` based on whether the tool has a `call_spec` in its revision content, not based on family membership. A tool with an API spec is executable regardless of whether its provider is in a hardcoded list.
2. **Provider lists expanded.** All 49 providers now have family classification as metadata tags. This is just classification, not gating.
3. **Intent classifier stripped.** `inferTaskType()` was reduced to 4 structural patterns (onboarding, creation, registry, repair). All content-domain classification removed -- the graph handles it through embeddings.

### Where we want to get to

Eventually `tool.search`'s family classification should come from the graph (hub membership) instead of `TOOL_FAMILY_FOUNDATION.provider_keys`. The constant would remain only for `guide` and `registry.capabilities` where it describes the overall readiness posture of tool families as a system health view.

## The Vision

From the CEO (paraphrased):

> We're not trying to be this forced orchestration. We more so want an approach of okay, you're going to do planning and research and you should make sure you input that into the graph. We want to find the top 1% and study what's making it work. We are just basically handing all these building blocks to these sub-agents that then are doing their own interpretation. We want to lean into model intelligence and we want them to have the context and the research. We're not trying to pre-determine what they need. We want that flexibility. We want to be able to add 100 more tools and 500 more KBs and not have that destroy our graph. The graph needs to handle it.
This means:

- No forced routing into specific paths
- Tools, skills, KBs are all building blocks served by the graph
- Agents use intelligence to compose, not predetermined sequences
- New tools should work by being added to the graph (entity + links), not by editing code
- Sub-agents explore multiple angles in parallel
- Rich input drives quality -- agents should write paragraph-length discovery intents
- The system should scale to hundreds of tools without code changes

## Practical Rules for Adding a New Tool

1. Create the tool entity via MCP `registry.create` or the manifest + seed script
2. Include a `call_spec` in the revision `content_json` with input schema, auth config, and executor kind
3. Tag it appropriately (`domain:X`, `family:Y`)
4. Link it to relevant hubs (the ones that should "recommend" or "use" this tool)
5. Set `status: 'curated'` or `'published'` so it appears in search
6. That's it. No TypeScript file edits needed. The graph handles discovery.

## Common Agent Pitfalls

### tool.execute requires actorUserId

When calling `tool.execute` through MCP with a generic bearer token (e.g., from Claude Code), the auth context has `actorUserId: null`. The system actor fallback (`resolveActorUserIdOrFallback` in `lib/mcp/handlers/hosted-execution.ts`) handles this for the HLT org by using a synthetic system actor ID. If you get "requires an authenticated actor user," the deployment may be stale -- wait for the next Vercel deploy or trigger one.

### Vault secret resolution chain

A tool needs THREE things to resolve its API key:

1. `auth_secret_key` set in the `tools` DB table (e.g., `firecrawl/api-key`)
2. A row in `integration_secrets` matching that key for the execution org
3. The `vault.decrypted_secrets` entry pointed to by the integration_secret

If any link is missing, execution fails with `missing_integration_secret` or `missing_vault_secret`. Check all three when debugging auth failures.
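The three-link chain can be expressed as a small diagnostic sketch. Everything below is a hypothetical illustration -- `diagnoseSecretChain`, its input shapes, and the `missing_auth_secret_key` code are invented for this example; only `missing_integration_secret` and `missing_vault_secret` are real failure codes, and the real logic lives in `lib/vault/resolve-integration-secret.ts`:

```typescript
// Hypothetical sketch of the three-link Vault resolution chain described above.
// Field names mirror the docs; this is NOT the real implementation.
interface ToolRow {
  auth_secret_key: string | null; // link 1: pointer stored on the tool row
}

interface IntegrationSecret {
  secret_key: string;         // matches tools.auth_secret_key
  vault_secret_name: string;  // points at the Vault row
}

function diagnoseSecretChain(
  tool: ToolRow,
  orgSecrets: IntegrationSecret[],   // org-scoped integration_secrets rows
  vaultNames: Set<string>,           // names visible in vault.decrypted_secrets
): string {
  if (!tool.auth_secret_key) {
    return "missing_auth_secret_key"; // illustrative code, not a documented one
  }
  // Link 2: org must have a pointer row for this key.
  const pointer = orgSecrets.find((s) => s.secret_key === tool.auth_secret_key);
  if (!pointer) return "missing_integration_secret";
  // Link 3: the pointer must resolve to a decryptable Vault secret.
  if (!vaultNames.has(pointer.vault_secret_name)) return "missing_vault_secret";
  return "ok";
}
```

Checking the links in this order mirrors the debugging advice above: fix the first broken link before re-running `tool.execute`.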
### Cross-org entity references

System-org entities (org `00000000-0000-0000-0000-000000000001`) and HLT-org entities (org `8ba36969-a2da-401b-8b00-3a14d49f7b08`) can link to each other, but the `org_id` parameter on `registry.link` must match the FROM entity's org. When linking FROM a system entity, use the system org_id. When linking FROM an HLT entity, use the HLT org_id.

### HTTP executor limitations

- Only supports `body_format: "json"` -- multipart/form-data doesn't work (affects Captions.ai, Whisper transcription)
- Only supports `executor.kind: "http"` or `"internal"` -- no WebSocket (affects Framer server-api)
- JWT-signing auth (Kling native API) needs a custom provider executor -- use FAL as a proxy instead
- Provider executors exist for: agentmail, bash, publish.email, tavily (keys in `lib/tools/provider-executors.ts`)

### Hub wiring philosophy

- **Primary action tools** get hub links (the tools agents discover fresh through intent-based search)
- **Operational sub-tools** (status, cancel, errors, list) connect to their PARENT tool via `related` links, NOT to hubs
- Hub links are for discovery, not hierarchy -- they're recommendations, not mandates
- A tool can link to multiple hubs (e.g., `notion.query` on both hub-planning and hub-research)

## File Map

| File | What it does |
| --- | --- |
| `lib/tools/catalog.ts` | Loads tool records from DB for execution |
| `lib/tools/executor.ts` | Core dispatch: HTTP, internal, MCP executors |
| `lib/tools/http-executor.ts` | Makes actual HTTP API calls |
| `lib/tools/provider-executors/` | Custom per-provider execution (Tavily, email, bash) |
| `lib/tools/registry-tool-service.ts` | MCP-facing service: search, describe, execute |
| `lib/tools/policy.ts` | Risk assessment, human approval gates |
| `lib/tools/circuit-breaker.ts` | Failure protection |
| `lib/tools/audit.ts` | Execution audit logging |
| `lib/tools/manifests/` | Static tool definitions for DB seeding |
| `lib/tools/register-from-manifest.ts` | Writes manifests into the DB |
| `lib/mcp/tool-family-foundation.ts` | Family classification metadata (being simplified) |
| `lib/mcp/handlers/hosted-execution/tool-search.ts` | tool.search MCP handler |
| `lib/mcp/handlers/hosted-execution/tool-describe.ts` | tool.describe MCP handler |
| `lib/mcp/handlers/hosted-execution/tool-execute.ts` | tool.execute MCP handler |
| `lib/vault/resolve-integration-secret.ts` | Vault-backed API key resolution |

---

## Source: docs/references/contracts/VAULT_TOOL_EXECUTION.md

# Vault + Tool Execution (End-to-End Guide)

Purpose: make Vault-backed secrets usable by tools and agent callers **without** leaking secret values into git, packs, logs, or client surfaces.

Canonical background docs (read first):

- `docs/reports/phase-01-vault-integration.md`
- `docs/references/supabase/SUPABASE_VAULT_NOTES.md`
- `database/001-schema-ddl.sql` (tables: `integration_secrets`, `tools.auth_secret_key`)

This doc focuses on **operational setup** + the minimal backend surface to prove the pattern works.

---

## Quickstart (Fast Path)

This is the fastest path to a working bootstrap trio (Tavily + CodeSandbox + Relevance) with:

- system tools catalog (global)
- per-org Vault secrets (private)
- one execution API (`POST /api/tools/execute`)
- one tool-inspection API (call spec + examples): `POST /api/tools/describe`

### 1) Preflight (read-only)

```bash
just preflight
```

If you do not have `just` installed:

```bash
npx tsx scripts/ops/preflight.ts
```

### 2) Bootstrap your org + owner membership

This expects you already created your user in Supabase Auth (so it exists in `auth.users`).
```bash
just bootstrap-org --org-code hlt --org-name "HLT" --user-email you@company.com
```

### 3) Seed the bootstrap catalog (system catalog + call_specs)

Pre-launch, the bootstrap catalog is the canonical source for the "bootstrap tool set":

- system tool entities + tags
- `tools` wiring (endpoint_url + `auth_secret_key` pointers)
- `entity_revisions.content_json.call_spec` + example payloads

Apply the seed SQL to your canonical DB:

```bash
just db-seed-bootstrap --verify
```

If you do not have `just` installed:

```bash
npx tsx scripts/registry/seed/seed_registry_examples_to_db.ts --file database/009-seed-katailyst-canonical-examples.sql --verify
```

### 4) Import (or set) the required secrets into Vault

Recommended (manual, via stdin):

```bash
echo -n "" | just vault-set --org-code hlt --secret-key novu/secret-key
echo -n "" | just vault-set --org-code hlt --secret-key tavily/api-key
echo -n "" | just vault-set --org-code hlt --secret-key codesandbox/api-token
echo -n "" | just vault-set --org-code hlt --secret-key relevance/combined-token
echo -n "" | just vault-set --org-code hlt --secret-key cohere/api-key
echo -n "" | just vault-set --org-code hlt --secret-key voyage/api-key
```

### 5) Confirm Vault inventory (org-scoped)

```bash
just vault-inventory --org-code hlt
```

If you do not have `just` installed:

```bash
npx tsx scripts/vault/vault_inventory.ts --org-code hlt
```

### 6) Smoke test tool execution (Vault-backed; safe by default)

This will:

- verify the tool exists in the system catalog
- verify your org has the required `integration_secrets` pointers
- verify Vault decryption works server-side (never prints secret values)
- optionally attempt live calls when you pass `--live` or `--live-if-ready`

```bash
just smoke-tools --org-code hlt
```

---

## 0) What’s Canonical vs What’s a Mirror

- Canonical secret values: Supabase Vault (`vault.secrets`)
- Canonical secret metadata (org-scoped): `public.integration_secrets`
- Canonical tool references:
`public.tools.auth_secret_key` (points to `integration_secrets.secret_key`)
- Repo mirrors/packs: may contain `auth_secret_key` strings, but must never contain secret values

## 0a) Canonical HLT Vault Key Families

Use stable key names in docs, manifests, and runbooks. Do not put raw secret values in repo-facing surfaces.

- `slack/victoria/bot-token`
- `slack/victoria/app-token`
- `slack/julius/bot-token`
- `slack/julius/app-token`
- `slack/lila/bot-token`
- `slack/lila/app-token`
- `render/openclaw/api-key`
- `render/api-key`
- `linear/victoria/api-key`
- `brave/api-key`
- `devin/api-key`
- `multimedia-mastery/agent-token`
- `agentmail/victoria/api-key`
- `agentmail/julius/api-key`
- `agentmail/lila/api-key`

These are naming conventions and metadata pointers only. Rotation and raw-value handling stay in Vault and provider consoles.

---

## 1) Add A Secret (Vault)

Recommended: create secrets in Vault using the Supabase UI, or SQL via `vault.create_secret()`.

Examples (placeholders only, do not commit real values):

```sql
-- Create (returns UUID)
select vault.create_secret(
  'REDACTED',
  'org/system/tavily/api-key',
  'Tavily API key (system org)'
);

-- Rotate: easiest is to update the existing vault row if you use a stable name
update vault.secrets
set secret = 'REDACTED', updated_at = now()
where name = 'org/system/tavily/api-key';
```

Notes:

- Prefer stable names: `org/<org>/<provider>/<key>`
- Use the Vault `name` field as your durable identifier (and keep the UUID if you want).

---

## 2) Create The Org-Scoped Metadata Pointer (`integration_secrets`)

This table is RLS-protected and stores references only.
```sql
-- System org id helper exists in `database/004-rls-policies.sql`:
-- select system_org_id();

insert into public.integration_secrets (
  org_id, secret_key, vault_secret_name, description, status
) values (
  system_org_id(),
  'tavily/api-key',
  'org/system/tavily/api-key',
  'Tavily API key for tool execution',
  'active'
)
on conflict (org_id, secret_key) do update
set vault_secret_name = excluded.vault_secret_name,
    description = excluded.description,
    status = excluded.status,
    updated_at = now();
```

Rotation pattern:

- rotate the Vault value (`vault.secrets.secret`)
- optionally set `integration_secrets.rotated_at = now()`

---

## 3) Wire A Tool To The Secret (DB only, no values)

Tools reference the metadata key:

- `tools.auth_secret_key` = `integration_secrets.secret_key`

Canonical example tools supported in code today:

1. `tool:novu.trigger` (HTTP, call_spec supported)
2. `tool:novu.health` (internal, call_spec executor)
3. `tool:novu.workflows.sync` (internal, call_spec executor)
4. `tool:novu.shadow-subscriber.upsert` (internal, call_spec executor)
5. `tool:novu.ops.snapshot` (internal, call_spec executor)
6. `tool:novu.ops.remediate` (internal, call_spec executor)
7. `tool:tavily.search` (HTTP, call_spec supported)
8. `tool:codesandbox.create-sandbox` (HTTP, call_spec supported)
9. `tool:relevance.knowledge-query` (HTTP, call_spec supported)
10. `tool:relevance.agent-trigger` (HTTP, call_spec supported)
11. `tool:relevance.workforce-trigger` (HTTP, call_spec supported; async poll)
12. `tool:relevance.task-status` (HTTP, call_spec supported)
13. `tool:publish.email` (HTTP, backed by Resend; bespoke executor fallback)
14. `tool:cohere.rerank` (HTTP, call_spec supported)
15. `tool:voyage.rerank` (HTTP, call_spec supported)

Optional “soon”: `tool:codesandbox.update-sandbox` (likely SDK-driven; treat as non-canonical until a stable HTTP contract is confirmed).

---

## Legacy (Reference Only)

This repo is pre-launch and aims to be strict and self-contained.
Legacy imports are kept only as a reference path for migration/comparison.

- Legacy tool import (call_spec + tags): `just tools-import-legacy ...`
- Legacy Vault import helper: `just vault-import-legacy ...`

Example wiring (placeholders only):

```sql
-- Tavily search tool
update public.tools
set endpoint_url = 'https://api.tavily.com/search',
    auth_method = 'api_key',
    auth_secret_key = 'tavily/api-key'
where entity_id = (
  select id from public.registry_entities
  where entity_type = 'tool' and code = 'tavily.search'
  order by updated_at desc limit 1
);

-- Email tool (Resend)
update public.tools
set endpoint_url = 'https://api.resend.com/emails',
    auth_method = 'bearer',
    auth_secret_key = 'resend/api-key'
where entity_id = (
  select id from public.registry_entities
  where entity_type = 'tool' and code = 'publish.email'
  order by updated_at desc limit 1
);
```

---

## 4) Execute A Tool (Backend API)

Repo code provides:

- `POST /api/tools/execute`
  - Auth required (Supabase session cookie)
  - **Org owner/admin/editor required** for the execution org (Vault secrets are org-scoped)
  - Tool definitions come from the execution org when present, otherwise fall back to `system`
  - Resolves Vault secrets server-side and executes tools via `call_spec` (preferred) or bespoke fallback

### Inspect A Tool (Call Spec + Examples)

Use this before building UI (CMS "Test Lab") or before an agent runs a tool for the first time.
- `POST /api/tools/describe`
  - Auth required
  - Org owner/admin/editor required (same org scoping as execution)
  - Returns: tool metadata + `call_spec` + `examples` (if present)

Example:

```json
{ "tool_ref": "tool:codesandbox.create-sandbox", "org_id": "" }
```

### `tool:tavily.search`

```json
{
  "tool_ref": "tool:tavily.search",
  "org_id": "",
  "input": {
    "query": "supabase vault decrypted_secrets best practices",
    "max_results": 5,
    "include_answer": true
  }
}
```

### `tool:novu.trigger`

Default seeded endpoint points to the self-hosted local API:

- `http://localhost:3100/v1/events/trigger`

For non-local environments, update `tools.endpoint_url` (or define an org-specific tool row).

```json
{
  "tool_ref": "tool:novu.trigger",
  "org_id": "",
  "input": {
    "name": "org_hlt.content_ready",
    "to": { "subscriberId": "user_123", "email": "owner@example.com" },
    "payload": { "title": "Draft approved", "assetId": "asset_001" },
    "transactionId": "run_abc_001"
  }
}
```

### `tool:novu.health`

```json
{ "tool_ref": "tool:novu.health", "org_id": "", "input": {} }
```

### `tool:novu.workflows.sync`

```json
{ "tool_ref": "tool:novu.workflows.sync", "org_id": "", "input": { "limit": 25 } }
```

### `tool:novu.shadow-subscriber.upsert`

```json
{
  "tool_ref": "tool:novu.shadow-subscriber.upsert",
  "org_id": "",
  "input": {
    "source_kind": "lead",
    "source_id": "lead_123",
    "preferred_email": "lead@example.com"
  }
}
```

### `tool:codesandbox.create-sandbox`

This is optimized for agents: hand it a minimal files map and let it return a live sandbox.
```json
{
  "tool_ref": "tool:codesandbox.create-sandbox",
  "org_id": "",
  "input": {
    "title": "hello-catalyst",
    "description": "Fast path smoke test",
    "template": "node",
    "files": {
      "package.json": { "content": "{\"name\":\"hello-catalyst\",\"private\":true,\"type\":\"module\",\"dependencies\":{}}\n" },
      "index.js": { "content": "console.log('hello from catalyst')\n" }
    }
  }
}
```

### `tool:relevance.knowledge-query`

```json
{
  "tool_ref": "tool:relevance.knowledge-query",
  "org_id": "",
  "input": {
    "region": "us",
    "knowledge_id": "kn_123",
    "query": "What is our canonical taxonomy rule?",
    "top_k": 5
  }
}
```

### `tool:relevance.agent-trigger`

```json
{
  "tool_ref": "tool:relevance.agent-trigger",
  "org_id": "",
  "input": {
    "region": "us",
    "agent_id": "agt_123",
    "message": { "role": "user", "content": "Run the agent." },
    "context": { "any": "json" }
  }
}
```

### `tool:relevance.workforce-trigger`

```json
{
  "tool_ref": "tool:relevance.workforce-trigger",
  "org_id": "",
  "input": {
    "region": "us",
    "workforce_id": "wf_123",
    "message": { "role": "user", "content": "Run the workforce." },
    "context": { "any": "json" }
  }
}
```

### `tool:publish.email` (Resend)

```json
{
  "tool_ref": "tool:publish.email",
  "org_id": "",
  "input": {
    "from": "you@yourdomain.com",
    "to": ["you@yourdomain.com"],
    "subject": "Catalyst test email",
    "text": "Hello from Catalyst tool execution."
  }
}
```

Optional env var:

- `RESEND_DEFAULT_FROM` (used if `input.from` is omitted)

---

## 4b) Smoke Test (One Command)

This repo includes a smoke script that validates the full chain:

- tool rows exist (system org)
- org-scoped secret pointers exist (`integration_secrets`)
- Vault secret is decryptable server-side (`vault.decrypted_secrets`)
- `call_spec` HTTP execution works (including async polling when used)

If you want an agent-focused entry point (discovery menus + execution), also see:

- `kb:tools-guide-overview`
- `kb:tavily-search`
- `kb:codesandbox-overview`
- `kb:codesandbox-recipes`

Run:

```bash
just smoke-tools --org-code hlt
```

If you do not have `just` installed:

```bash
npx tsx scripts/ops/smoke_tools.ts --org-code hlt --live-if-ready
```

Notes:

- Default behavior is safe: it runs checks, and only attempts live HTTP calls when required inputs are provided.
- It never prints secret values.

To include a live Relevance knowledge query, provide IDs:

```bash
just smoke-tools --org-code hlt --relevance-region us --relevance-knowledge-id kn_xxx
```

To force live calls (fail fast if inputs are missing):

```bash
npx tsx scripts/ops/smoke_tools.ts --org-code hlt --live --relevance-region us --relevance-knowledge-id kn_xxx
```

---

## 5) Should Secret Values Live In The CMS?

Recommended default (v1):

- CMS manages **metadata only** (`integration_secrets` rows).
- Secret values are written/rotated in Vault UI or via SQL tooling.

If we later add "enter secret value" UX:

- it must be server-only
- it must never echo values back to the client
- it must record an audit trail (Phase 7 run events + DB events)

---

## 6) How This Shares Across Agents / Pages / Orchestrators

Key idea: store a secret once; reference it everywhere.

- Any agent/tool that needs credentials uses a `secret_key` reference, not the value.
- Any orchestrator can:
  - call `/api/discover` to get a menu of tools
  - call `/api/tools/execute` to run the tool using Vault-backed credentials

### One Entry Point Per Tool Family (Discovery)

Prefer discovery by **family** (not by memorizing codes).

```json
POST /api/discover
{
  "types": ["tool"],
  "families": ["research"],
  "limit": 50,
  "facets": true
}
```

Other useful families (seeded tags): `development`, `orchestration`, `evaluation`, `ingestion`.

Tip: tools imported by `scripts/registry/import/import_legacy_tools.ts` attach tags like:

- `family:research` (Tavily)
- `family:development` (CodeSandbox)
- `family:orchestration` (Relevance)

Edge Functions are optional:

- Use them for webhooks and thin utilities.
- Do not force all execution through Edge Functions; keep HTTP + MCP adapters as additional surfaces.

---

## 7) MCP Notes (Canonical Examples)

MCP is best treated as an adapter surface:

1. Supabase MCP (canonical DB access): see `docs/QUICK_START_AGENTS.md` section 4a
2. Playwright MCP (render/test lab): `.mcp.json`
3. Postgres MCP (local or self-hosted DBs): see the databases section in `awesome-mcp-servers`

For deploying your own MCP server on Supabase Edge Functions:

- Supabase guide (external): https://supabase.com/docs/guides/getting-started/byo-mcp
- Supabase MCP server features (external): https://supabase.com/docs/guides/getting-started/mcp
- Supabase MCP auth guide (external): https://supabase.com/docs/guides/getting-started/mcp#authentication
- mcp-use Supabase deployment guide (external): https://mcp-use.com/docs/deployment/supabase

---

## Source: docs/references/DESIGN_SYSTEM_RULES.md

# Design System Rules

> Actionable rules for translating Figma designs into production code.
> Every rule links to a real file. Every token maps to a real value.

---

## Design Identity

> Calm complexity. Bold hierarchy. AI-first.
Katailyst's UI draws from **Airtable** (structure that makes density readable), **Linear** (bold one-directional hierarchy), and **Notion** (progressive disclosure that hides depth until you need it). ### Core Principles 1. **The main thing is the main thing.** Each page has one hero element. Supporting context lives in collapsible sections, tooltips, or secondary panels. 2. **Progressive disclosure over page sprawl.** Use `DisclosureSection`, `HelpHint` tooltips, wizards, and sheets to layer depth. Don't show everything at once. 3. **AI is ambient, not decorative.** Integrate AI where it adds value (smart suggestions, auto-classification, quality signals). One AI CTA per surface, not a wall of wand buttons. 4. **Multimedia-forward.** Show images, previews, and visual artifacts. Don't describe what you can show. 5. **Agent-accessible.** Every surface should be usable by both humans and AI agents. Semantic HTML, aria labels, structured data attributes. 6. **Personalization-ready.** Design surfaces as composable modules that can be reordered, filtered, and adapted to user context. 7. **Bold, not dry.** Premium surfaces (KPI cards, score gauges, hero sections) use subtle depth cues -- inner glow, glass elevation, entrance animation. Standard content cards stay minimal. 8. **Wizards over mega-forms.** Multi-step flows over single-page walls of fields. Break complex creation into guided phases. ### Engineering Constraints - **Diff-first**: Any operation that creates or modifies data shows what will change before committing. - **DB is canonical**: Supabase is the source of truth. UI reads from and writes to DB via server actions, not local state. 
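The diff-first constraint above can be sketched as a small pure helper. This is a sketch only — `previewChanges` is a hypothetical name for illustration, not an existing util in the repo: compute the field-level changes a write *would* make, render that diff for confirmation, and only then call the server action that commits to Supabase.

```typescript
// Sketch only: hypothetical diff-first helper (not an existing repo util).
// Computes the field-level changes a write WOULD make, so the UI can render
// a confirmation diff before committing to the canonical DB.
type Diff = { field: string; from: unknown; to: unknown }

export function previewChanges(
  current: Record<string, unknown>,
  proposed: Record<string, unknown>,
): Diff[] {
  const diffs: Diff[] = []
  for (const field of Object.keys(proposed)) {
    if (current[field] !== proposed[field]) {
      diffs.push({ field, from: current[field], to: proposed[field] })
    }
  }
  return diffs
}

// Usage: show `diffs` in a confirmation panel, then invoke the server action.
const diffs = previewChanges(
  { title: 'Old title', status: 'staged' },
  { title: 'New title', status: 'staged' },
)
// diffs → [{ field: 'title', from: 'Old title', to: 'New title' }]
```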
### What We Avoid - Dry clinical dashboards with no visual hierarchy - Feature-dump pages where everything competes for attention - Dense data grids with no progressive disclosure - Flat, lifeless cards with zero interaction feedback - Walls of explanatory text (use `HelpHint` tooltips instead) - The `Sparkles` icon (banned -- use `Wand2` or context-specific icons for AI actions) - Generic placeholder icons for named tools (always use real brand SVGs) ### Before Writing UI Code (Mandatory Agent Workflow) Every agent modifying UI components MUST follow this workflow: 1. **Read the existing file** and all related files in the same directory before making changes. Never guess what a page looks like -- read it. 2. **Read this design system doc**, especially the Design Identity principles and the relevant section for the component type you're modifying. 3. **Use the UI/UX Pro Max skill** (`skills/ui-ux-pro-max`) for design decisions: color choices, typography, layout patterns, accessibility checks. 4. **Use the Frontend Design skill** (`skills/curated/global/engineering/frontend-design`) for aesthetic direction and bold design choices. 5. **Check the surface hierarchy** (Section 20) before adding any background opacity or blur. 6. **Check the motion rules** (Section 30) before adding any animation or transition. 7. **Run the pre-commit grep checks** at the bottom of Section 30 before submitting changes. Do NOT: - Guess what a page looks like based on the component name - Add UI components without reading the design system - Use `transition-all`, `hover:shadow-*` on cards, `Sparkles` icon, or any banned pattern without checking Section 22 - Write tiny text below `text-xs` (12px) for readable content (see Section 5) --- ## 1. Color System ### Figma Color → Tailwind Class Lookup When you see a color in a Figma frame, match it to the **nearest token** — never hardcode hex/HSL. 
#### Core Palette | Figma Hex (Light) | Figma Hex (Dark) | Tailwind `bg-` | Tailwind `text-` | When to Use | | ----------------- | ---------------- | ---------------- | ----------------------------- | ------------------------ | | `#FFFFFF` | `#0A0A0F` | `bg-background` | `text-foreground` | Page body | | `#FFFFFF` | `#0A0A0F` | `bg-card` | `text-card-foreground` | Card surfaces | | `#FFFFFF` | `#0A0A0F` | `bg-popover` | `text-popover-foreground` | Dropdowns, tooltips | | `#1B2559` | `#FAFAFA` | `bg-primary` | `text-primary-foreground` | Primary buttons, links | | `#F1F5F9` | `#262629` | `bg-secondary` | `text-secondary-foreground` | Secondary buttons | | `#F1F5F9` | `#262629` | `bg-muted` | `text-muted-foreground` | Disabled, secondary text | | `#F1F5F9` | `#262629` | `bg-accent` | `text-accent-foreground` | Hover highlights | | `#EF4444` | `#7F1D1D` | `bg-destructive` | `text-destructive-foreground` | Delete actions | | `#E2E8F0` | `#262629` | `border-border` | — | All borders | | `#E2E8F0` | `#262629` | `border-input` | — | Input/select borders | #### Signal Colors (Semantic Feedback) Pattern: `bg-signal-{name}/15 text-signal-{name}-fg` for WCAG AA tinted badges/alerts. | Signal | Hex | `bg-` tint | `text-` foreground | Use For | | ------- | --------- | ---------------------- | ------------------------ | ------------------- | | Success | `#22C55E` | `bg-signal-success/15` | `text-signal-success-fg` | Completed, positive | | Warning | `#F59E0B` | `bg-signal-warning/15` | `text-signal-warning-fg` | Caution, attention | | Danger | `#EF4444` | `bg-signal-danger/15` | `text-signal-danger-fg` | Errors, failures | | Info | `#3B82F6` | `bg-signal-info/15` | `text-signal-info-fg` | Informational | #### Entity Status Colors Pattern: `bg-status-{name}/15 text-status-{name}-fg` — maps 1:1 to the entity lifecycle. 
| Status | Visual Color | Badge Variant | Lifecycle Meaning | | ---------- | ------------ | ---------------------- | --------------------------- | | staged | Gray | `variant="staged"` | Newly created, not reviewed | | curated | Blue | `variant="curated"` | Reviewed, not yet live | | published | Green | `variant="published"` | Live and visible | | deprecated | Amber | `variant="deprecated"` | Phasing out | | archived | Gray | `variant="archived"` | Removed from discovery | IMPORTANT: There is no "draft" status. The lifecycle is `staged → curated → published → deprecated → archived`. #### Chart Colors | Token | Use | | --------------------------- | ----------------------- | | `chart-1` through `chart-5` | Data visualization only | #### Decision Rule If a color in Figma doesn't match any token above, it's probably wrong. Ask before inventing new tokens. The system is intentionally constrained. --- ## 2. Component API Reference ### Typography & Copy Density (Cockpit Default) Default: **minimal**. Prefer scanability over explanation. - **Page headers:** if you include `description`, keep it to one short clause. Avoid lists. - **Panels/sections:** avoid always-visible subtitle text (commonly `text-xs text-muted-foreground`). - Use `HelpHint` tooltips for explanatory copy instead. - **Exceptions:** empty states, error recovery, and required form guidance can include visible help text. - **AI CTA density:** one primary AI action per page surface. - Use a compact assist card or one explicit AI button. - Keep advanced AI options behind disclosure (sheet/details), not always-visible panels. 
Cockpit primitives: ```tsx import { HelpHint } from '@/components/ui/help-hint' import { Panel, PanelHeader, PanelTitle, PanelActions } from '@/components/ui/panel' import { SectionHeader } from '@/components/ui/section-header' ``` ### Button (`@/components/ui/button`) ```tsx import { Button } from '@/components/ui/button' ``` | Prop | Type | Default | Notes | | --------- | ----------------------------------------------------------------------------- | ----------- | ----------------------------------------- | | `variant` | `'default' \| 'destructive' \| 'outline' \| 'secondary' \| 'ghost' \| 'link'` | `'default'` | | | `size` | `'default' \| 'sm' \| 'lg' \| 'icon' \| 'icon-sm'` | `'default'` | | | `loading` | `boolean` | `false` | Shows Spinner, sets `aria-busy`, disables | | `asChild` | `boolean` | `false` | Renders as child element (Radix Slot) | Variant → visual mapping: - `default` → `bg-primary` solid, `shadow-sm shadow-black/5` - `destructive` → `bg-destructive` solid - `outline` → `border border-input`, `bg-background`, `shadow-sm shadow-black/5` - `secondary` → `bg-secondary` - `ghost` → transparent, `hover:bg-accent` - `link` → text only, `hover:underline` Size → dimensions: - `default` → `h-10 px-4 py-2` - `sm` → `h-9 px-3` - `lg` → `h-11 px-8` - `icon` → `h-10 w-10` - `icon-sm` → `h-8 w-8` Focus: `focus-visible:outline-2 focus-visible:outline-ring/70` Disabled: `disabled:pointer-events-none disabled:opacity-50` SVGs inside: auto-sized to `size-4`, `pointer-events-none`, `shrink-0` ### Badge (`@/components/ui/badge`) ```tsx import { Badge } from '@/components/ui/badge' ``` | Prop | Type | Default | Notes | | --------- | --------- | ----------- | ------------------------------------------------- | | `variant` | see below | `'default'` | | | `dot` | `boolean` | `false` | Prepends a `size-1.5 rounded-full bg-current` dot | Variants: - **Structural:** `default`, `secondary`, `destructive`, `outline` - **Signal:** `success`, `warning`, `danger`, `info` - 
**Status:** `staged`, `curated`, `published`, `deprecated`, `archived` Base styles: `rounded-full border px-2.5 py-0.5 text-xs font-semibold` ### InlineTagPills (`@/components/ui/inline-tag-pills`) **Always use `InlineTagPills` for registry taxonomy tags.** Never render tags as plain `Badge variant="outline"` -- the namespace-colored pills provide visual distinction between tag categories (domain, family, format, etc.) at a glance. ```tsx import { InlineTagPills } from '@/components/ui/inline-tag-pills' // From browse views (enriched facets) // From detail views / discovery (raw tag strings) // With empty state ``` | Prop | Type | Default | Notes | | ----------- | --------------------- | ------- | ----------------------------------------------- | | `facets` | `Array<{key, label}>` | -- | From `enrichTagFacets()` (browse views) | | `tags` | `string[]` | -- | Raw `namespace:code` strings (detail/discovery) | | `max` | `number` | `2` | Visible pills before `+N` overflow | | `maxWidth` | `string` | `100px` | Per-pill max-width (truncates with ellipsis) | | `showEmpty` | `boolean` | `false` | Render empty state text when no tags | | `emptyText` | `string` | `--` | Text shown in empty state | **Namespace color palette** (defined in `lib/ui/tag-display.ts`): | Namespace | Background | Text (dark mode) | | ---------- | ---------------- | ----------------------- | | `domain` | `purple-500/12` | `purple-400` | | `family` | `blue-500/12` | `blue-400` | | `format` | `emerald-500/12` | `emerald-400` | | `audience` | `orange-500/12` | `orange-400` | | `surface` | `teal-500/12` | `teal-400` | | `runtime` | `slate-500/12` | `slate-400` | | `dept` | `indigo-500/12` | `indigo-400` | | `source` | `amber-500/12` | `amber-400` | | `role` | `rose-500/12` | `rose-400` | | `phase` | `cyan-500/12` | `cyan-400` | | `channel` | `violet-500/12` | `violet-400` | | `topic` | `lime-500/12` | `lime-400` | | (fallback) | `bg-muted` | `text-muted-foreground` | NOTE: The palette uses raw Tailwind 
color ramp references (not semantic tokens) because the 12 namespaces need distinct hues that don't exist in the semantic token set. This is an approved exception. **Display logic:** The namespace prefix is stripped for display (`domain:writing` -> `Writing`). The code portion is titleized (`content-writing` -> `Content Writing`). ### Card (`@/components/ui/card`) ```tsx import { Card, CardHeader, CardTitle, CardDescription, CardContent, CardFooter, } from '@/components/ui/card' ``` | Sub-component | Base Styles | Notes | | ----------------- | --------------------------------------------------------------------------- | ------------------------ | | `Card` | `rounded-lg border border-border/50 bg-card text-card-foreground shadow-sm` | Outer wrapper | | `CardHeader` | `flex flex-col space-y-1.5 p-6` | Title area | | `CardTitle` | `text-2xl font-semibold leading-none tracking-tight` | Renders the card heading | | `CardDescription` | `text-sm text-muted-foreground` | Subtitle text | | `CardContent` | `p-6 pt-0` | Body area | | `CardFooter` | `flex items-center p-6 pt-0` | Actions area | Card has a CVA `variant` prop: - `default` -- standard card (solid `bg-card`, `border-border/50`) - `interactive` -- clickable card (`transition-colors hover:border-primary/30 hover:bg-accent/30`) - `prominent` -- hero/featured card (solid `bg-card`, slightly stronger shadow) **IMPORTANT:** Cards are ALWAYS solid `bg-card`. Never `bg-card/XX` with opacity. Never gradients (`bg-gradient-to-*`). Never `backdrop-blur` on cards. ### Input (`@/components/ui/input`) ```tsx import { Input } from '@/components/ui/input' ``` Base: `h-10 w-full rounded-md border border-input bg-background px-3 py-2 text-sm` Hover: `hover:border-muted-foreground/30` Focus: `focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:shadow-sm` Disabled: `disabled:cursor-not-allowed disabled:bg-muted disabled:opacity-50` Placeholder: `placeholder:text-muted-foreground` ### Textarea (`@/components/ui/textarea`) Same interaction states as Input. Min height: `min-h-[80px]`. ### Label (`@/components/ui/label`) ```tsx import { Label } from '@/components/ui/label' ``` Base: `text-sm font-medium leading-none` Supports `required` prop → appends a red asterisk (`*`). Peer-disabled: `peer-disabled:cursor-not-allowed peer-disabled:opacity-70` ### Select (`@/components/ui/select`) ```tsx import { Select, SelectTrigger, SelectValue, SelectContent, SelectItem, } from '@/components/ui/select' ``` Trigger matches Input dimensions: `h-10 rounded-md border border-input bg-background px-3 text-sm`. Content z-index: `z-[110]` (above dialogs). ### Dialog (`@/components/ui/dialog`) ```tsx import { Dialog, DialogTrigger, DialogContent, DialogHeader, DialogTitle, DialogDescription, DialogFooter, } from '@/components/ui/dialog' ``` Overlay: `bg-black/80`, z-index `z-50`. 
Content: `max-w-lg p-6 gap-4 border bg-background shadow-lg`, centered. Close button: top-right `X` icon with `sr-only` label. ### Tabs (`@/components/ui/tabs`) ```tsx import { Tabs, TabsList, TabsTrigger, TabsContent } from '@/components/ui/tabs' ``` TabsList: `h-10 rounded-md bg-muted px-0.5 text-muted-foreground` Active trigger: `data-[state=active]:bg-background data-[state=active]:text-foreground data-[state=active]:shadow-sm` ### Table (`@/components/ui/table`) ```tsx import { Table, TableHeader, TableBody, TableRow, TableHead, TableCell, } from '@/components/ui/table' ``` Wrapped in `overflow-auto`. Row hover: `hover:bg-muted/50`. Head cells: `h-12 px-4 font-medium text-muted-foreground`. ### Empty State (`@/components/ui/empty`) ```tsx import { Empty, EmptyHeader, EmptyMedia, EmptyTitle, EmptyDescription } from '@/components/ui/empty' ``` Centered flex layout, `border-dashed`, `text-balance`. EmptyMedia has `variant="icon"` for a `size-10 rounded-lg bg-muted` icon container. ### Spinner (`@/components/ui/spinner`) ```tsx import { Spinner } from '@/components/ui/spinner' ``` Wraps `Loader2Icon`. Default `size-4 animate-spin`. Has `role="status"` and `aria-label="Loading"`. ### Skeleton (`@/components/ui/skeleton`) ```tsx import { Skeleton } from '@/components/ui/skeleton' ``` `animate-pulse rounded-md bg-muted`. Size via className. ### Alert (`@/components/ui/alert`) ```tsx import { Alert, AlertTitle, AlertDescription } from '@/components/ui/alert' ``` Variants: `default` (neutral), `destructive` (red border + text). Has `role="alert"`. Icons positioned absolute left. --- ## 3. Composition Recipes ### Recipe: Stat Card ```tsx
<Card>
  <CardContent className="p-4 sm:p-6">
    <div className="text-lg font-bold sm:text-xl">{value}</div>
    <div className="text-xs font-medium text-muted-foreground sm:text-sm">{title}</div>
    <p className="text-xs text-muted-foreground">{description}</p>
  </CardContent>
</Card>
``` ### Recipe: Action Card (Clickable, with Status + Tags) ```tsx
<Card variant="interactive" className="group cursor-pointer">
  <CardContent className="p-4">
    {/* Header: icon + title */}
    <div className="flex items-center gap-2">
      <Zap className="size-4 text-muted-foreground" />
      <h3 className="text-sm font-semibold text-foreground group-hover:underline">{name}</h3>
    </div>
    {/* Meta row */}
    <div className="flex items-center gap-1.5 text-xs text-muted-foreground">
      <span>Tier {tier}</span>
      <Star className="size-3" />
      <span>{rating}</span>
    </div>
    {/* Description */}
    <p className="line-clamp-2 text-sm text-muted-foreground">{summary}</p>
    {/* Status + Tags row */}
    <div className="flex items-center gap-1.5">
      <Badge variant={status} dot>{status}</Badge>
      <InlineTagPills tags={tags} />
    </div>
  </CardContent>
</Card>
``` ### Recipe: Expandable List Item (Run Viewer) ```tsx
<div className="rounded-lg border border-border/50">
  <button
    type="button"
    className="flex w-full items-center justify-between p-3 text-left hover:bg-muted/50"
    onClick={onToggle}
  >
    <span className="text-sm font-medium">{label}</span>
    {isExpanded ? <ChevronUp className="size-4" /> : <ChevronDown className="size-4" />}
  </button>
  {isExpanded && (
    <div className="border-t border-border/40 p-3">
      {/* Expanded content */}
    </div>
  )}
</div>
``` ### Recipe: Error Banner ```tsx
<div role="alert" className="rounded-lg border border-signal-danger/30 bg-signal-danger/15 p-3 text-sm text-signal-danger-fg">
  <p className="font-semibold">{title}</p>
  {errors.map(error => (
    <p key={error}>• {error}</p>
  ))}
</div>
``` ### Recipe: Empty State (Inline) ```tsx
<div className="rounded-lg border border-dashed border-border/50 p-6 text-center">
  <p className="text-sm text-muted-foreground">{message}</p>
  <p className="text-xs text-muted-foreground">{hint}</p>
</div>
``` ### Recipe: Form Section ```tsx
<div className="rounded-xl border border-gray-200 bg-card p-4 dark:border-[#1F1F23] sm:p-6">
  <h3 className="text-base font-semibold text-foreground">{title}</h3>
  <div className="grid grid-cols-1 gap-4 sm:grid-cols-2">
    {/* Full-width fields: add sm:col-span-2 */}
  </div>
</div>
``` ### Recipe: Page Header ```tsx
<div className="flex items-center justify-between">
  <div>
    <h1 className="text-xl font-bold text-foreground">{title}</h1>
    <p className="text-sm text-muted-foreground">{description}</p>
  </div>
  {/* Page-level actions */}
</div>
``` --- ## 4. Layout System ### App Shell ```
┌─────────────────────────────────────────────┐
│ Sidebar (fixed)   │ TopNav (h-16, border-b) │
│  full: 16rem      │─────────────────────────│
│  collapsed: 4rem  │                         │
│  hidden: 0        │  p-3 sm:p-6             │
│                   │  bg-white dark:bg-…     │
│                   │  overflow-auto          │
│                   │                         │
└─────────────────────────────────────────────┘
``` - Sidebar width managed by `MenuContext` (`lib/contexts/menu-context.tsx`) - Content margin-left set via inline style (not Tailwind) to match sidebar state - Mobile (< 1024px): sidebar is overlay, margin-left is `0` - Layout wraps the page content in a padded scroll container (`p-3 sm:p-6`, `overflow-auto`) ### Page Content Pattern Every page follows this skeleton: ```tsx
<div className="space-y-4 sm:space-y-6">
  {/* Page Header */}
  {/* Error Banner (if errors) */}
  {/* Main Content (grids, lists, forms) */}
  {/* Bottom Panels */}
</div>
``` ### Grid Patterns | Pattern | Classes | Use | | ------------------- | --------------------------------------------------------------------- | ------------------- | | Stat cards (6-up) | `grid grid-cols-2 gap-3 sm:gap-4 lg:grid-cols-3 xl:grid-cols-6` | KPI overview | | Action cards (4-up) | `grid grid-cols-1 gap-3 sm:grid-cols-2 lg:grid-cols-3 xl:grid-cols-4` | Clickable cards | | Two-panel | `grid grid-cols-1 gap-4 sm:gap-6 lg:grid-cols-2` | Side-by-side panels | | Form fields | `grid grid-cols-1 gap-4 sm:grid-cols-2` | Form layouts | ### Spacing Scale | Context | Classes | Value | | ------------------- | ---------------------------- | ------------------ | | Page section gaps | `space-y-4 sm:space-y-6` | 16px → 24px | | Card internal | `p-3 sm:p-4` or `p-4 sm:p-6` | 12→16px or 16→24px | | Grid gaps | `gap-3` or `gap-4 sm:gap-6` | 12px or 16→24px | | Form field spacing | `space-y-2` | 8px | | Inline element gaps | `gap-1.5` or `gap-2` | 6px or 8px | | Tag/badge groups | `gap-1.5` | 6px | --- ## 5. 
Typography Scale | Role | Classes | Example | | ------------- | --------------------------------------------------- | -------------------------- | | Page title | `text-xl font-bold text-foreground` | Dashboard page headings | | Section title | `text-lg font-semibold text-foreground` | Card headers, panel titles | | Subsection | `text-base font-semibold text-foreground` | Panel sub-headers | | Card title | `text-sm font-semibold text-foreground` | Compact card headings | | Dialog title | `text-lg font-semibold leading-none tracking-tight` | Modal heading | | Body text | `text-sm text-foreground` | Default paragraph | | Secondary | `text-sm text-muted-foreground` | Descriptions, helper text | | Stat value | `text-lg font-bold sm:text-xl` | "42" | | Stat label | `text-xs font-medium sm:text-sm` | "Total Units" | | Button label | `text-sm font-medium` | "Save" | | Badge text | `text-xs font-semibold` | "published" | | Caption | `text-xs text-muted-foreground` | Timestamps, metadata | | Micro | `text-2xs text-muted-foreground` | Inline code refs | | Monospace | `font-mono text-xs` | Codes, hashes, refs | | Truncation | `line-clamp-2` or `truncate` | Descriptions | ### Font Weight Rules - `font-bold` -- page titles and stat values only - `font-semibold` -- all other headings, badge text, nav labels - `font-medium` -- button labels, active nav items, form labels - `font-normal` (default) -- body text, descriptions Never use `font-extrabold`, `font-black`, or `font-thin`. The system uses a narrow weight range for cohesion. Font: Inter (Google Fonts), loaded in `app/layout.tsx`. 
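The font-weight rules above can be expressed as a checkable constant. This is a sketch only — `hasBannedWeight` is a hypothetical helper for illustration, not one of the repo's actual pre-commit checks:

```typescript
// Sketch only: hypothetical guard mirroring the font-weight rules above
// (not an existing repo util). Flags the weights the design system bans.
const BANNED_WEIGHTS = ['font-extrabold', 'font-black', 'font-thin'] as const

export function hasBannedWeight(className: string): boolean {
  const classes = className.split(/\s+/)
  return BANNED_WEIGHTS.some(w => classes.includes(w))
}

hasBannedWeight('text-xl font-bold text-foreground') // → false (allowed)
hasBannedWeight('text-xl font-black') // → true (banned)
```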
### Minimum Text Sizes | Context | Minimum | Tailwind | Notes | | ------------------------------------------------- | ------- | ---------- | ---------------------------------------------------------------------------------- | | Body text, descriptions | 14px | `text-sm` | Default for all readable content | | Labels, captions, metadata | 12px | `text-xs` | Minimum for any text a user needs to read | | Decorative micro-labels (uppercase tracking-wide) | 11px | `text-2xs` | ONLY for uppercase label text with wide tracking (e.g., "ORG SHARED", "MARKETING") | | Badge text, inline codes | 11px | `text-2xs` | Acceptable inside Badge components and `` blocks | **Banned below 11px**: Never use `text-[9px]`, `text-[10px]`, or `text-3xs` for any text a user needs to read. If you find yourself needing text this small, the design is too dense -- use progressive disclosure instead. **Hardcoded sizes**: Avoid `text-[Npx]` entirely. Use the semantic scale (`text-xs`, `text-sm`, `text-2xs`). Hardcoded pixel sizes bypass the type scale and create inconsistency. --- ## 6. Interaction States ### Hover Patterns | Surface | Hover Class | Notes | | ---------------- | --------------------------------------------------- | -------------------- | | Primary button | `hover:bg-primary/90` | Slight dim | | Outline button | `hover:bg-accent hover:text-accent-foreground` | Fill with accent | | Ghost button | `hover:bg-accent hover:text-accent-foreground` | Same as outline | | Secondary button | `hover:bg-secondary/80` | Slight dim | | Card (clickable) | `hover:bg-accent/30 hover:border-primary/30` | Color shift only | | Link text | `hover:underline` | Text decoration | | Group hover | `group` on parent, `group-hover:underline` on title | Card title underline | | Table row | `hover:bg-muted/50` | Subtle highlight | | Input/Textarea | `hover:border-muted-foreground/30` | Border darkens | **IMPORTANT: Card/panel hover discipline:** - Use `transition-colors` on cards and panels. 
NEVER `transition-all`. - Card hover = color changes only (`hover:bg-accent/30`, `hover:border-primary/30`). - NEVER add `hover:shadow-md`, `hover:shadow-xl`, or `hover:shadow-[...]` to cards. - NEVER add `hover:scale-[...]` or `hover:-translate-y-*` to cards. - NEVER add `group-hover:scale-*` to icons inside cards or list items. - The ONLY acceptable `hover:scale` locations: submit/action buttons (`hover:scale-105 active:scale-95`) and brand logo elements. ### Focus Patterns | Component | Focus Classes | | -------------- | -------------------------------------------------------------------------------------------------- | | Button | `focus-visible:outline-2 focus-visible:outline-ring/70` | | Input/Textarea | `focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:shadow-sm` | | Badge | `focus:ring-2 focus:ring-ring focus:ring-offset-2` | | Tabs trigger | `focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2` | | Dialog close | `focus:ring-2 focus:ring-ring focus:ring-offset-2` | ### Disabled Pattern All interactive components: `disabled:pointer-events-none disabled:opacity-50` Inputs additionally: `disabled:cursor-not-allowed disabled:bg-muted` ### Shadow System | Usage | Classes | | ------------------- | ----------------------------------------- | | Default button/card | `shadow-sm` or `shadow-sm shadow-black/5` | | Dialog/sheet | `shadow-lg` | | Dropdown/popover | `shadow-md` | | Floating action bar | `shadow-2xl` | | No shadow | (ghost buttons, badges, inline panels) | **IMPORTANT:** Shadow should NOT change on hover for cards. Cards use color-based hover feedback (`hover:bg-accent/30`), not elevation changes. Shadow hover is reserved for floating action buttons and tooltips only. 
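The card hover discipline above lends itself to a grep-style check. A minimal sketch — `violatesCardHoverRules` and its pattern list are hypothetical, not the repo's actual pre-commit greps:

```typescript
// Sketch only: hypothetical lint for the card hover rules above
// (not the repo's actual pre-commit checks). Returns the banned
// patterns found in a card/panel className string.
const BANNED_CARD_HOVER: RegExp[] = [
  /transition-all/,        // cards must use transition-colors
  /hover:shadow-(md|lg|xl|\[)/, // no elevation change on hover
  /hover:scale-\[/,        // no scale transforms on cards
  /hover:-translate-y-/,   // no lift transforms on cards
]

export function violatesCardHoverRules(className: string): string[] {
  return BANNED_CARD_HOVER.filter(re => re.test(className)).map(re => re.source)
}

violatesCardHoverRules('transition-colors hover:bg-accent/30') // → [] (compliant)
violatesCardHoverRules('transition-all hover:shadow-xl') // → two rule violations
```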
### Border Patterns | Usage | Classes | | ---------------------- | ---------------------------------------------- | | Standard border | `border border-border` | | Container with opacity | `border border-border/50` | | Inner divider | `border-t border-border/40` | | Input/select border | `border border-input` | | Dashed (empty state) | `border border-dashed border-border/50` | | Section divider | `border-t border-border` | | Signal border (error) | `border border-signal-danger/30` | | Hover border emphasis | `hover:border-primary/30` | | Form section | `border border-gray-200 dark:border-[#1F1F23]` | **IMPORTANT: Canonical border opacities:** - `border-border/50` -- standard for card containers, panels, sections - `border-border/40` -- inner dividers and nested separators - `border-border` (no opacity) -- full-strength borders on UI primitives (inputs, tables) **BANNED border opacities:** `/60`, `/70`, `/80`, `/90`. These create inconsistency. If you need a border, pick from the three canonical values above. --- ## 7. Radius Tokens | Token | Value | Tailwind | Usage | | ---------------- | ---------------- | -------------- | --------------------------- | | `--radius` | `0.5rem` (8px) | `rounded-lg` | Cards, modals, alerts | | `--radius - 2px` | `0.375rem` (6px) | `rounded-md` | Inputs, selects, tabs | | `--radius - 4px` | `0.25rem` (4px) | `rounded-sm` | Small interactive elements | | — | — | `rounded-xl` | Action cards, form sections | | — | — | `rounded-full` | Badges, dots, avatars | Decision: Cards get `rounded-lg` or `rounded-xl`. Inputs get `rounded-md`. Badges get `rounded-full`. --- ## 8. Animation Rules Animations and transitions work normally by default. A `@media (prefers-reduced-motion: reduce)` rule in `globals.css` disables all transitions and animations when the user requests reduced motion via their OS settings. There is **no** blanket kill switch — animations run normally for users who haven't opted into reduced motion. 
### CSS Animations in globals.css | Class | Keyframe | Duration | Use | | ------------------ | ---------- | -------- | ---------------- | | `.animate-shimmer` | `shimmer` | 1.8s | Skeleton loading | | `.animate-spin` | (Tailwind) | 1s | Spinner | Standard Tailwind transition utilities work normally. **Default to `transition-colors`** for most interactive elements. See Section 21 for the full transition discipline. `transition-all` is banned on cards, panels, and list items. ### Framer Motion (Preferred for Rich Animations) For page-enter effects, staggered lists, and viewport-triggered animations, use `framer-motion`. It applies styles via JavaScript, so it works even when CSS animations are disabled. Existing framer-motion components: - `CardSpotlight` — mouse-follow spotlight glow on cards - `AnimatedNumber` — count-up number animation on viewport enter - `BlurFade` — fade-in with blur-to-sharp transition on viewport enter - `AnimatedList` — stagger-animate child elements ```tsx import { BlurFade } from '@/components/ui/blur-fade' import { AnimatedList } from '@/components/ui/animated-list'
<BlurFade>{content}</BlurFade>

<AnimatedList>
  {items.map(item => (
    <div key={item.id}>{item.name}</div>
  ))}
</AnimatedList>
``` ### Adding New CSS Animations Add keyframes and a class to `app/globals.css`: ```css @keyframes your-animation { /* ... */ } .animate-your-animation { animation: your-animation 0.3s ease-out; } ``` The `prefers-reduced-motion` rule will automatically suppress it for users who opt in. --- ## 9. Dark Mode - Controlled by `next-themes` with `attribute="class"`, `defaultTheme="light"`, `enableSystem={false}` - `ThemeProvider` wraps the entire app in `app/layout.tsx` - All semantic tokens have automatic dark variants defined in `globals.css` under `.dark {}` - IMPORTANT: `disableTransitionOnChange` is enabled — theme switches are instant, no flash ### What This Means for Implementation - **Use tokens, not raw colors.** `text-foreground` auto-switches. `text-gray-900` does not (it's force-overridden in globals.css but that's a compatibility hack, not a pattern to follow). - **Form sections** now use `bg-card border-border` (updated from the legacy `bg-white dark:bg-[#0F0F12]` pattern). - **Section headers** use `text-gray-900 dark:text-white`. Prefer `text-foreground` for new work. - **Never use Tailwind's raw gray scale** (`gray-100`, `gray-200`, etc.) — the globals.css overrides remap them anyway. Use semantic tokens. --- ## 10. Icons - Library: **Lucide React** (`lucide-react`) - IMPORTANT: Do NOT install additional icon packages. All icons come from Lucide or Figma asset payloads. 
### Sizing Convention | Context | Class | Pixel Size | | ---------------- | ----------------------------------------- | ---------- | | Inline with text | `size-3` or `h-3 w-3` | 12px | | Standard icon | `size-4` or `h-4 w-4` | 16px | | Empty state hero | `h-6 w-6` | 24px | | Inside Button | Auto-sized via `[&_svg]:size-4` on Button | 16px | ### Common Icons in Use | Icon | Import | Usage | | ---------------- | -------------- | ---------------------------- | | `Zap` | `lucide-react` | Actions, playbooks | | `Star` | `lucide-react` | Ratings | | `Clock` | `lucide-react` | Timestamps | | `AlertCircle` | `lucide-react` | Error/failure indicators | | `RefreshCw` | `lucide-react` | Refresh buttons | | `Loader2Icon` | `lucide-react` | Loading (wrapped by Spinner) | | `X` | `lucide-react` | Close/dismiss | | `Check` | `lucide-react` | Selected state | | `ChevronDown/Up` | `lucide-react` | Expand/collapse | | `Layers` | `lucide-react` | Registry/collection | | `BookOpen` | `lucide-react` | Skills | | `Wrench` | `lucide-react` | Tools | | `Eye` | `lucide-react` | Visibility/schemas | | `BadgeCheck` | `lucide-react` | Verified/published | ### Banned Icons | Icon | Why | Use Instead | | ---------- | ----------------------------------- | --------------------------------------------------------------------------------------------- | | `Sparkles` | Overused AI cliche, user preference | `Wand2` (default AI action), `Brain` (reasoning), `FlaskConical` (eval), `Search` (discovery) | ### AI Action Icons For AI-powered features, pick the icon that describes what the AI **does**, not that AI is involved: | AI Action | Icon | | ------------------- | ------------------------- | | Generate / create | `Wand2` | | Search / discover | `Search` or `Compass` | | Evaluate / score | `FlaskConical` or `Gauge` | | Reason / think | `Brain` | | Process / compute | `Cpu` | | Suggest / recommend | `Lightbulb` | ### Integration & Tool Icons - Named tools (Claude, Cursor, Codex, n8n, Slack, etc.) 
MUST use their actual brand SVG icon - Store integration icons in `public/icons/integrations/` as SVGs - For MCP servers: use the icon from the server's metadata when available - Never use a generic Lucide icon as a substitute for a real brand logo - Fallback only: use `Server` icon when no brand icon exists ### Accessibility - Decorative icons: no aria attributes needed (default) - Status indicator icons: add `aria-hidden="true"` (when meaning is conveyed by adjacent text) - Standalone interactive icons: use `aria-label` (e.g., Spinner has `aria-label="Loading"`) --- ## 11. Form Patterns ### Architecture Forms use **React 19 `useActionState`** + **server actions**. No react-hook-form. ```tsx 'use client' import { useActionState } from 'react' import { myServerAction } from '@/app/.../actions' type ActionState = { ok: boolean; error: string | null } export default function MyForm() { const [state, formAction] = useActionState(myServerAction, { ok: false, error: null, }) return (
    <form action={formAction} className="space-y-4">
      {state.error && (
        <div role="alert" className="rounded-lg border border-signal-danger/30 bg-signal-danger/15 p-3 text-sm text-signal-danger-fg">
          {state.error}
        </div>
      )}
      {/* fields + submit button */}
    </form>
) } ``` ### Field Layout - Single field: a `<div className="space-y-2">` wrapping Label + Input - Two-column grid: `<div className="grid grid-cols-1 gap-4 sm:grid-cols-2">` - Full-width in grid: add `sm:col-span-2` to the field wrapper - Helper text below input: `<p className="text-xs text-muted-foreground">...</p>` ### Native Select (non-Radix) When inside forms with server actions, use a native `<select>` styled to match the Input primitive: ```tsx
<select
  name="..."
  className="h-10 w-full rounded-md border border-input bg-background px-3 py-2 text-sm"
>
  {/* ... */}
</select>
``` --- ## 12. Responsive Breakpoints | Breakpoint | Width | Usage | | ---------- | --------- | --------------------------------------------------- | | (base) | < 640px | Single column, compact padding (`p-3`, `gap-3`) | | `sm:` | >= 640px | Two columns begin, padding expands (`p-4`/`p-6`) | | `md:` | >= 768px | Rarely used directly | | `lg:` | >= 1024px | Three columns, sidebar visible (desktop breakpoint) | | `xl:` | >= 1280px | Four+ columns for card grids | Pattern: always mobile-first. Base styles are mobile, add `sm:`/`lg:`/`xl:` for progressive enhancement. --- ## 13. Figma MCP Integration Flow When implementing designs from Figma: 1. **Fetch context first.** Run `get_design_context` for the exact node(s) to get the structured representation. 2. **Handle large responses.** If truncated, use `get_metadata` for the node map, then re-fetch specific nodes. 3. **Get the visual reference.** Run `get_screenshot` for the variant being implemented. 4. **Download assets.** Use Figma MCP's asset endpoint for images/SVGs. If it returns a `localhost` source, use it directly — do not create placeholders. 5. **Translate to project conventions:** - Map Figma colors to the token table in Section 1 above - Use existing component APIs from Section 2 — don't rebuild what exists - Apply composition recipes from Section 3 for common patterns - Follow the interaction states from Section 6 — don't guess hover/focus behavior - Respect the motion rules (Section 8) — if the design shows motion, use framer-motion or a whitelisted class 6. **Validate.** Compare your implementation against the Figma screenshot for 1:1 visual parity before marking complete. ### Translation Decision Tree ``` Figma shows a color → Match to Section 1 token table → Found? → Use the Tailwind class Not found? 
      → Ask before inventing tokens

Figma shows a button
  → Check Section 2 Button API
    → Match variant/size? → Use
```

---

## Source: docs/references/external-codebases.md

# External codebases — reference pointers

Codebases that live outside the `katailyst-1` working tree but are worth reading for pattern reuse, architectural precedent, or historical context. Each entry links to where the code actually lives on disk and names the reusable pieces. _Add new entries at the top._

For strategic vision docs (Gamma exports, planning memos, brand-consolidation audits), see [docs/references/vision/](vision/README.md). For pattern reference docs (single-concept crib sheets), see [docs/references/patterns/](patterns/). For the canonical 4-peer + 3-service map, see [docs/planning/active/2026-04-17-ecosystem-atlas-v2.md](../planning/active/2026-04-17-ecosystem-atlas-v2.md).

**Status legend** (per entry):

- `live-peer` — one of the 4 ecosystem peers (Katailyst, Paperclip, Mastra, Agent Canvas)
- `live-service` — one of the 3 specialized services (MasteryPublishing, Multimedia4Mastery, sidecar-system)
- `integrate-piece` — we lift a specific piece into Katailyst canon (phase tracked in Session 2 plan)
- `reference-pinned` — read for patterns, do not auto-integrate
- `archive` — superseded or dead; do not pull from

---

## Agent Canvas (Katailyst Agent Hub — coordination plane)

**Status:** `live-peer`
**On disk:** `~/Downloads/AI2 April/Agent-Canvas--main/` (snapshot)
**Live:** `https://agent-coordination-canvas.replit.app/` · GitHub `Awhitter/Agent-Canvas-` · Replit `@alec24/Agent-Launchpad`
**Stack:** Bun · PostgreSQL · React client assets · `kth_`-prefixed federation API keys · 12 Katailyst MCP tools wired
**Role:** Coordination canon for the distributed AI agent fleet. 13+ canvas widgets (notes, todos, kanban, charts, mermaid, iframes, agent status panels). Board-scoped multi-agent surface.
7-table telemetry schema (agent runs, steps, MCP calls, artifacts, decisions, warnings, heartbeats). 10-minute stale threshold.
**Federation contract:** External agents authenticate with `Authorization: Bearer kth_…`, post heartbeats, read/write scoped boards, and coordinate via `/api/fleet/status`. Agent card at `/.well-known/agent-card.json`; doctrine at `/api/agent/instructions`.
**Top files to read:** `README.md`, `docs/TASK_HUB_AGENT_GUIDE.md`, `replit.md`.
**Related Session 2 phases:** Phase 7 (federation handshake), Phase 15 (Canvas-as-Paperclip-adapter).

---

## MasteryPublishing (content engine / article sink)

**Status:** `live-service`
**On disk:** `~/Downloads/AI2 April/MasteryPublishing-main/`
**Live:** `https://v0-next-js-content-engine.vercel.app/` → Cloudflare Worker → `https://hltmastery.com/nursing/resources/{product}/{slug}`
**Stack:** v0-generated Next.js · Supabase (own DB) · Vercel hosting
**Role:** Article sink. Katailyst agents call `POST /api/publish` with `x-api-key: KATAILYST_API_KEY` to create/update article rows idempotently by `katailyst_id`.
**Contract:** `ContentEnginePublishPayload` mirrored in `katailyst-1/lib/integrations/content-engine/contract.ts`.
**Top files to read:** `README.md`, `DEVELOPMENT_RECAP.md`, `app/api/publish/route.ts`, `lib/types/database.ts`.
**Related Session 2 phases:** Phase 11 (HLT Content Factory gated publish), Phase 17 (weekly cadence cron).

---

## Multimedia4Mastery (media hub + studio)

**Status:** `live-service`
**On disk:** `~/Downloads/AI2 April/Multimedia4Mastery-main/`
**Stack:** Next.js 16 · Cloudinary · own MCP server · Vitest
**Role:** Canonical media tool surface. Routed shared wizard (`/studio/image`, `/studio/flowchart`, `/studio/rationale`), batch studio, audio playground, Visual Rationale Wizard, HLT Social Studio sub-modules.
**Contract:** `/api/media/v1/{tools,health,capabilities,library,generate,…}` + repo-owned MCP server.
**Top files to read:** `README.md`, `docs/api/MEDIA_TOOL_CONTRACT.md`, `docs/ARCHITECTURE.md`, `docs/MCP.md`.
**Related Session 2 phases:** Phase 10 (wire as hosted-execution MCP handlers in Katailyst).

---

## Evidence-Based-Business (HLT Data Analyzer)

**Status:** `integrate-piece`
**On disk:** `~/Downloads/AI2 April/Evidence-Based-Business-main.zip`
**Stack:** Vite · React · Tailwind · shadcn/ui · Vercel API routes · OpenAI Chat Completions (structured JSON normalization)
**Role:** Interactive financial + usage analytics dashboard with AI-assisted insight generation. Source of **evidence blocks** (stats, trends, citations) that article content should cite.
**Contract:** `POST /api/ai-analysis` — `{ prompt, chartData?, agentic? } → { keyInsights, trendAnalysis, recommendations[], meta }`. Also `GET/POST /api/auth/session` (session-status envelope with `requestId`).
**Piece we lift:** the `/api/ai-analysis` contract becomes `kb:evidence-source-ebb` + `tool:evidence-lookup` in Katailyst (Phases 2 and 8). EBB stays its own Vercel deployment; katailyst-1 calls it via `lib/integrations/ebb/client.ts`.
**Top files to read:** `README.md`, `AGENTS.md`, `api/ai-analysis.ts` (or equivalent).
**Related Session 2 phases:** Phase 2 (EBB contract + `kb:evidence-source-ebb`), Phase 8 (sidecar `/evidence` slash command).

---

## nurse-radar (Lovable FireCrawl scanner)

**Status:** `integrate-piece`
**On disk:** `~/Downloads/AI2 April/nurse-radar-main.zip`
**Stack:** Vite · React · TypeScript · shadcn/ui · `.lovable/` metadata (Lovable-generated scaffold)
**Role:** Topic radar that uses FireCrawl to keep the content pipeline aware of new nursing topics worth writing about.
**Piece we lift:** FireCrawl scanner logic becomes `tool:topic-radar` — a Katailyst MCP hosted-execution handler (`lib/mcp/handlers/hosted-execution/topic-radar.ts`) normalizing FireCrawl responses to `{ items: [{ url, title, summary, freshness_score, topic_tags }] }`.
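The normalized shape above can be pinned down as a small TypeScript sketch. The `TopicRadarItem` field names come from the contract; the raw-input shape, the `normalize` helper, and the freshness formula are illustrative assumptions, not the actual FireCrawl types:

```typescript
// Normalized item shape for tool:topic-radar, per the contract above.
interface TopicRadarItem {
  url: string;
  title: string;
  summary: string;
  freshness_score: number; // hypothetical: 1 = published today, 0 = a year or older
  topic_tags: string[];
}

interface TopicRadarResponse {
  items: TopicRadarItem[];
}

// Hypothetical raw-result shape; the real FireCrawl payload will differ.
interface RawResult {
  url: string;
  title?: string;
  description?: string;
  publishedAt?: string; // ISO date, if the source exposes one
  tags?: string[];
}

// Sketch of the normalization step a hosted-execution handler would perform.
function normalize(raw: RawResult[], now: Date = new Date()): TopicRadarResponse {
  const dayMs = 24 * 60 * 60 * 1000;
  return {
    items: raw.map((r) => {
      // Treat undated results as a year old (an illustrative policy choice).
      const ageDays = r.publishedAt
        ? (now.getTime() - new Date(r.publishedAt).getTime()) / dayMs
        : 365;
      return {
        url: r.url,
        title: r.title ?? r.url,
        summary: r.description ?? "",
        // Linear decay from 1 (today) toward 0 over a year (illustrative).
        freshness_score: Math.max(0, 1 - ageDays / 365),
        topic_tags: r.tags ?? [],
      };
    }),
  };
}
```

Whatever the handler's internals end up looking like, pinning the output type keeps every consumer of `tool_execute('topic-radar', …)` agnostic to FireCrawl's raw response format.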
**Top files to read:** `README.md`, `src/App.tsx`, `.lovable/plan.md`.
**Related Session 2 phases:** Phase 6. After lift, archive the zip to `~/Downloads/.archive/2026-04-17-nurse-radar.zip`.

---

## content-creator-studio (PGAios experimental UI)

**Status:** `archive`
**On disk:** `~/Downloads/AI2 April/content-creator-studio-main/`
**Stack:** Vite · TypeScript · Sentry · `VITE_PGAIOS_API_BASE=https://pga-schema-production.up.railway.app`
**Why archived:** Superseded by `sidecar-system` as the canonical content authoring UI. `content-creator-studio` talks to a **different backend** (PGAios on Railway), not the Katailyst MCP. Keeping two "content creators" with different backends violates the one-truth-per-concept rule. **Do not depend on, do not revive without an explicit decision.** Patterns (glass-box loading, registry browser, quality evaluation display) can inform future sidecar-system work but should not be imported as-is.

---

## gpt-researcher

**Status:** `reference-pinned`
**On disk:** `~/Downloads/AI2 April/gpt-researcher/`
**Role:** Research pattern reference. The Phase 5 NurseResearch Mastra workflow borrows its evaluator-optimizer loop shape (research → self-critique → requery until quality floor). **Do NOT fork or vendor.** Read for patterns, cite in Mastra workflow docstrings.

---

## nursing-career-ops / nurse-navigator-toolbox / ScraperVault

**Status:** `reference-pinned`
**On disk:** `~/Downloads/AI2 April/nursing-career-ops-main.zip`, `~/Downloads/AI2 April/nurse-navigator-toolbox-43-main.zip`, `~/Downloads/AI2 April/ScraperVault-main.zip`
**Role:** Reference repos for future verticals. Do not wire until a concrete need + phase exists.
---

## NursingNexus (Replit nursing forum + Career Hub prototype)

**On disk:** `~/Downloads/NursingNexus/`
**Snapshot date:** March 2026 (exported 2026-03-26, pruned 2026-04-16)
**Tree size:** ~21 MB (git history kept)
**Stack:** React 18 · Vite · Wouter · TanStack Query · Tailwind · shadcn/ui · Express · Passport · Neon Postgres · Drizzle ORM · OpenAI GPT-4 · `ws` WebSockets

### What to read it for

| Area | Specific files to study | Why it's useful for Katailyst / HLT Mastery |
| --- | --- | --- |
| **AI resume builder** | `client/src/pages/career-resume-page.tsx` + `POST /api/career/resume` in `server/routes.ts` | 4-step wizard with 5 specialty templates (New Grad RN, Experienced RN, Travel Nurse, CRNA/NP, Nurse Educator); GPT-4 produces an ATS-optimized HTML resume with print-to-PDF. Maps directly onto HLT Mastery's NurseBid resume-gen flow. |
| **Interview-prep STAR loop** | `client/src/pages/career-interview-page.tsx` + `/api/career/interview-start`, `/interview-feedback`, `/interview-summary` | Self-paced simulator: 8 specialty tracks, 10 GPT-4-generated questions, chat-style Q&A with STAR-framework feedback, final scorecard with ROI-framed improvements. Clean evaluator-optimizer loop pattern. |
| **Specialty compass / match quiz** | `client/src/pages/career-compass-page.tsx` + `/api/career/specialty-match` | 8-question quiz → GPT-4 matches the user to one of 12 nursing specialties (with 2026 salary data + demand badges). Lightweight persona-classification intake. |
| **Gamification** | `client/src/components/gamification/` + `server/gamification-service.ts` | `level-badge` · `streak-counter` · `points-how-to-card` · `onboarding-checklist` components; `pointEvents` ledger table pattern. |
| **Topic seeding** | `server/seed-topics.ts` | Auto-seeds 112 nursing topic templates across 7 NCLEX domains on first boot. Reusable shape for first-run registry seeding. |
| **AI chat persona system** | `server/ai-service.ts` + `client/src/components/enhanced-ai-chat-modal.tsx` | `CAREER` / `INSTRUCTOR` / `STUDENT` / `CLINICIAN` personas switchable per conversation. |
| **Admin-seeding UI** | `client/src/components/admin/admin-topic-library-tab.tsx`, `admin-seeding-tab.tsx` | Browse + filter 112 topic templates by NCLEX domain/category; seed individually or in bulk. |

### What **not** to copy blindly

- React 18 (project is on React 19) + Wouter (project is on Next.js App Router).
- Custom scrypt auth (project uses Supabase auth).
- Neon Postgres wiring (project uses Supabase Postgres).
- Replit-specific config (`.replit`, `replit.nix`, `generated-icon.png`).

### Cross-links

- Sibling summary: [~/Downloads/NursingNexus/README.md](../../../NursingNexus/README.md)
- Related vision doc: [docs/references/vision/2026-04-12-hlt-healthcare-funnel-blueprint.md](vision/2026-04-12-hlt-healthcare-funnel-blueprint.md) — Career Hub patterns feed the blueprint's "BOTTOM: Conversion" layer.
- Active planning: [docs/planning/active/katailyst-nursing-architecture.md](../planning/active/katailyst-nursing-architecture.md).

### Original source

Archive: `~/Downloads/.archive/2026-04-16-hlt-inputs/NursingNexus.zip` — full Replit export (72 MB zipped, 168 MB unzipped; includes the Replit skill packs and AI-chat paste dumps stripped from the working copy).

---

## Integration candidates — locked 2026-04-17

The 5 concrete "integrate best pieces" decisions that feed the Session 2 plan (Phases 2, 6, 8, 9, 13, 14).
Each row is dated, reversible, and points at a specific downstream phase.

**Top-1% pattern borrowed:** Stripe "API Changelog" — every choice is dated, reversible, and has a one-line rationale. [`stripe.com/docs/upgrades`](https://stripe.com/docs/upgrades)

| Piece we lift | From repo | Lands as | Session 2 phase | Why lifted (1-line) | Why NOT the whole repo |
| --- | --- | --- | --- | --- | --- |
| FireCrawl topic scanner (`src/App.tsx` + `src/components/`) | nurse-radar | `tool:topic-radar` (Katailyst MCP hosted-execution) | Phase 6 | One focused verb — `tool_execute('topic-radar', { domain, niche, limit })` — beats a separate Lovable app | Lovable scaffold is non-reusable; only the scanner logic is worth preserving |
| STAR interview loop (`client/src/pages/career-interview-page.tsx`) | NursingNexus | `eval_case:star-interview-nursing@v1` | Phase 13 | Canonical nursing-persona regression eval that runs on every article/playbook change | Replit + Wouter + React 18 stack; we consume the pattern, not the code |
| AI resume-gen wizard (`client/src/pages/career-resume-page.tsx`) | NursingNexus | `playbook:nursing-resume-gen@v1` | Phase 9 | 4-step + 5-specialty playbook plugs directly into the sidecar-system articles domain | Same stack caveat; the prompt + wizard shape is the reusable asset |
| `POST /api/ai-analysis` contract | EBB (HLT Data Analyzer) | `kb:evidence-source-ebb` + `tool:evidence-lookup` | Phases 2 + 8 | Evidence-grounded article blocks (stats, trends, citations) from HLT's own operational data | EBB stays its own Vercel deploy; we call it, we don't fork it |
| NurseLink Hub Feed API | NurseLink Hub | `tool:job-feed-query` | Phase 14 | "Current market snapshot" tables in nursing articles (salary, certs, location mix) on every publish | Job scraper is its own service; we consume normalized output only |

### Archive

- **content-creator-studio** (PGAios backend on Railway) — superseded by `sidecar-system`. Two canonical content creators violate one-truth-per-concept. Do not revive without an explicit decision.

### Reference-pinned (no lift planned)

- `gpt-researcher` — research pattern inspiration for the Phase 5 NurseResearch evaluator-optimizer loop; do not fork.
- `nursing-career-ops`, `nurse-navigator-toolbox`, `ScraperVault` — future-vertical references; wire only when a concrete phase demands.

### Changelog

- **2026-04-17** — Initial integrate-piece matrix locked. Atlas v2 + 6 sibling-repo entries added. Entries: Agent Canvas as `live-peer`, MasteryPublishing + Multimedia4Mastery as `live-service`, EBB + nurse-radar as `integrate-piece`, content-creator-studio as `archive`, gpt-researcher + nursing-career-ops + nurse-navigator-toolbox + ScraperVault as `reference-pinned`. Historical-snapshot banners added to `docs/reports/investigation/home-refactor-fallout-2026-04-15.md` and `mixed-refactor-implementation-plan-2026-04-15.md` (dangling refs to the renamed `home-ux-redesign-deep.md` → `home-redesign.md`, per commit `649fbff6`).

---

---

## Source: docs/references/integrations/ELEVENLABS_WHISPER_API.md

# API Contracts Reference

Complete API specifications for the ElevenLabs Text-to-Speech and OpenAI Whisper services.

---

## ElevenLabs Text-to-Speech API

### Overview

ElevenLabs provides a REST API for converting text to natural-sounding speech with multiple voice options and customization parameters.

**Note:** ElevenLabs does not offer a native Speech-to-Text API. Their platform focuses exclusively on Text-to-Speech synthesis. For audio transcription, use OpenAI Whisper.

### Base URL

```
https://api.elevenlabs.io
```

### Authentication

Two authentication methods are supported:

1.
**API Key in Header (Recommended)**

```
xi-api-key: {your-api-key}
```

2. **Bearer Token**

```
Authorization: Bearer {your-api-key}
```

### Text-to-Speech Endpoint

#### HTTP Request

```
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
```

**Path Parameters:**

- `voice_id` (string, required): The ID of the voice to use for synthesis

**Headers:**

```
xi-api-key: {your-api-key}
Content-Type: application/json
```

#### Request Body

```json
{
  "text": "The quick brown fox jumps over the lazy dog",
  "model_id": "eleven_monolingual_v1",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  },
  "language_code": "en",
  "pronunciation_dictionary_locator": null
}
```

**Parameters:**

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `text` | string | Yes | The text to convert to speech (max length varies by plan) |
| `model_id` | string | Yes | The model ID to use (e.g., `eleven_monolingual_v1`, `eleven_multilingual_v1`) |
| `voice_settings` | object | No | Voice quality parameters |
| `voice_settings.stability` | number | No | Stability factor (0.0-1.0, default: 0.5). Higher = more consistent |
| `voice_settings.similarity_boost` | number | No | Similarity boost (0.0-1.0, default: 0.75). Higher = closer voice match |
| `language_code` | string | No | Language code for synthesis (e.g., `en`, `es`, `fr`) |
| `pronunciation_dictionary_locator` | string | No | Reference to a custom pronunciation dictionary |

#### Query Parameters

- `output_format` (optional): Audio output format
  - `mp3_22050_32` (default): MP3, 22.05kHz, 32kbps
  - `mp3_44100_64`: MP3, 44.1kHz, 64kbps
  - `mp3_44100_96`: MP3, 44.1kHz, 96kbps
  - `mp3_44100_128`: MP3, 44.1kHz, 128kbps
  - `mp3_44100_192`: MP3, 44.1kHz, 192kbps
  - `pcm_16000`: PCM, 16kHz, 16-bit
  - `pcm_22050`: PCM, 22.05kHz, 16-bit
  - `pcm_24000`: PCM, 24kHz, 16-bit
  - `pcm_44100`: PCM, 44.1kHz, 16-bit
  - `ulaw_8000`: μ-law, 8kHz

#### Response

**Success (200 OK):**

- Content-Type: `audio/mpeg` (or the configured format)
- Body: Binary audio data

**Example Response Headers:**

```
HTTP/1.1 200 OK
Content-Type: audio/mpeg
Content-Length: 8234
Transfer-Encoding: chunked
```

**Error Response (400, 401, 429, 500):**

```json
{
  "detail": {
    "status": "invalid_input_error",
    "message": "Invalid request parameters"
  }
}
```

### Available Voices

ElevenLabs provides multiple pre-trained voices, including: Adam, Bella, Chris, Daniel, Elli, George, Harry, Jessica, James, John, Lewis, Linda, Matthew, Michael, Mimi, Oliver, Paige, Rachel, Ryan, Sam, Sophia, Steve, Tanya, Victoria. (Voice availability and names may vary based on subscription plan.)

### Rate Limiting

- Rate limits vary by subscription tier
- Responses include rate limit headers:

```
X-RateLimit-Limit-Requests: 3000
X-RateLimit-Limit-Characters: 1000000
```

---

## OpenAI Whisper API

### Overview

OpenAI Whisper provides automatic speech recognition (ASR) for converting audio to text, with support for 99 languages.
### Base URL

```
https://api.openai.com
```

### Authentication

```
Authorization: Bearer {your-api-key}
```

### Audio Transcription Endpoint

#### HTTP Request

```
POST https://api.openai.com/v1/audio/transcriptions
```

**Headers:**

```
Authorization: Bearer {your-api-key}
Content-Type: multipart/form-data
```

#### Request Body

Multipart form-data format (NOT JSON):

```
--boundary
Content-Disposition: form-data; name="file"; filename="audio.mp3"
Content-Type: audio/mpeg

[binary audio data]
--boundary
Content-Disposition: form-data; name="model"

whisper-1
--boundary--
```

**Parameters:**

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | binary | Yes | Audio file to transcribe (max 25MB) |
| `model` | string | Yes | Model to use (see Models section below) |
| `language` | string | No | ISO-639-1 language code (e.g., `en`, `es`, `fr`) |
| `prompt` | string | No | Optional text to guide the model's style or format |
| `response_format` | string | No | Response format (default: `json`) |
| `temperature` | number | No | Sampling temperature (0.0-1.0, default: 0). Higher values give more random output |
| `timestamp_granularities` | array | No | Granularity of timestamps (e.g., `["segment", "word"]`) |

**Supported Audio Formats:** mp3, mp4, mpeg, mpga, m4a, wav, webm

**File Size Limit:** Maximum 25MB

#### Query Parameters

None (all parameters use form-data)

#### Response Formats

**JSON (default):**

```json
{
  "text": "The quick brown fox jumps over the lazy dog"
}
```

**Text:**

```
The quick brown fox jumps over the lazy dog
```

**SRT (SubRip):**

```
1
00:00:00,000 --> 00:00:02,500
The quick brown fox

2
00:00:02,500 --> 00:00:05,000
jumps over the lazy dog
```

**Verbose JSON:**

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.0,
  "text": "The quick brown fox jumps over the lazy dog",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "The quick brown fox",
      "avg_logprob": -0.3456,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.001,
      "words": [
        { "word": "The", "start": 0.0, "end": 0.3 },
        { "word": "quick", "start": 0.3, "end": 0.6 }
      ]
    }
  ]
}
```

**VTT (WebVTT):**

```
WEBVTT

00:00:00.000 --> 00:00:02.500
The quick brown fox

00:00:02.500 --> 00:00:05.000
jumps over the lazy dog
```

### Supported Models

| Model | Features | Use Case |
| --- | --- | --- |
| `whisper-1` | Standard Whisper model | General-purpose transcription |
| `gpt-4o-transcribe` | High accuracy, fastest | Production transcription |
| `gpt-4o-mini-transcribe` | Balanced performance/cost | Cost-efficient transcription |
| `gpt-4o-transcribe-diarize` | Speaker diarization included | Multi-speaker transcription with identification |

**Speaker Diarization Note:** The `gpt-4o-transcribe-diarize` model automatically identifies and labels different speakers in the audio.
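A minimal request sketch in TypeScript (Node 18+ global `fetch`/`FormData`), using the endpoint and form fields documented above; `buildForm` and `transcribe` are hypothetical helpers for illustration, not part of any SDK:

```typescript
// Build the multipart body for POST /v1/audio/transcriptions.
// Field names (file, model, language, response_format) come from the
// parameter table above; everything else here is illustrative.
function buildForm(
  audio: Blob,
  filename: string,
  opts: { model?: string; language?: string; responseFormat?: string } = {}
): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", opts.model ?? "whisper-1");
  if (opts.language) form.append("language", opts.language);
  if (opts.responseFormat) form.append("response_format", opts.responseFormat);
  return form;
}

async function transcribe(audio: Blob, filename: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    // Do NOT set Content-Type manually; fetch derives the multipart boundary.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildForm(audio, filename, { responseFormat: "json" }),
  });
  if (!res.ok) throw new Error(`Whisper request failed: ${res.status}`);
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Swap the `model` option to `gpt-4o-transcribe-diarize` (per the models table above) when speaker labels are needed.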
### Rate Limiting

- Rate limits vary by subscription tier
- Check response headers:

```
X-RateLimit-Limit-Requests: 5000
X-RateLimit-Limit-Tokens: 80000
X-RateLimit-Remaining-Requests: 4999
X-RateLimit-Remaining-Tokens: 79500
X-RateLimit-Reset-Requests: 12m
X-RateLimit-Reset-Tokens: 15s
```

### Error Responses

**400 Bad Request:**

```json
{
  "error": {
    "message": "Invalid request: file size exceeds 25MB",
    "type": "invalid_request_error",
    "param": "file",
    "code": "invalid_upload_file"
  }
}
```

**401 Unauthorized:**

```json
{
  "error": {
    "message": "Incorrect API key provided",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
```

**429 Rate Limited:**

```json
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "server_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```

---

## Integration Comparison

| Aspect | ElevenLabs TTS | OpenAI Whisper |
| --- | --- | --- |
| **Purpose** | Text → Speech | Audio → Text |
| **Direction** | Synthesis | Transcription |
| **Authentication** | xi-api-key header or Bearer | Bearer token |
| **Request Format** | JSON | Multipart form-data |
| **Response Format** | Binary audio | JSON/text/srt/vtt/verbose_json |
| **File Size Limit** | N/A (text input) | 25MB maximum |
| **Language Support** | Multiple (varies by model) | 99 languages |
| **Voice Options** | 20+ pre-trained voices | N/A (audio input) |
| **Customization** | Stability, similarity boost | Temperature, language hint, prompt |
| **Output Quality** | MP3 or PCM variants | Segment-level + word-level timestamps |

---

## Usage Notes

### ElevenLabs TTS

- Use the `text` parameter for the content to synthesize
- Adjust `stability` and `similarity_boost` in `voice_settings` to fine-tune audio quality
- Select the appropriate `model_id` based on language needs (monolingual vs. multilingual)
- Choose `output_format` based on quality/bandwidth requirements

### OpenAI Whisper

- Ensure the audio file is under 25MB; compress if needed
- Include the `language` parameter for better accuracy if known
- Use `response_format: "verbose_json"` to get segment- and word-level timing
- Use `timestamp_granularities: ["word"]` to get word-level timing without verbose output
- Provide the `prompt` parameter to guide style (e.g., "Names: John, Mary, Alex") for technical terms
- Select `gpt-4o-transcribe-diarize` for multi-speaker scenarios requiring speaker identification

---

## Source: docs/references/integrations/FRAMER_PAGE_BUILD_GUIDE.md

# HLT Mastery -- Framer Page Build Guide

**Purpose**: Everything a Framer developer (or AI agent) needs to build the Resources section of hltmastery.com. Self-contained. No external dependencies.
**Project**: https://framer.com/projects/HLTMastery-com--montTdTggA8zwlRhAdOf-4L1i6
**Status**: All work in DRAFT. Nothing publishes to the live site without explicit approval.
**Live site**: hltmastery.com (24K visitors/week -- do NOT break it)

---

## 1. What We're Building

A demand-driven content hub (internally called "NCLEX-ipedia") that serves as the definitive resource for every standardized exam HLT covers. Three page types:

1. **Resources Index** (`/nursing/nclex-rn/resources`) -- filterable grid of all articles
2. **Resource Detail** (`/nursing/nclex-rn/resources/[slug]`) -- individual article page
3. **Exam Hub pages** (future) -- per-exam landing pages with curated content sections

The Resources CMS collection already exists with 27 fields, 5 seeded articles, and both the Index and Detail pages created as DRAFTS under `/nursing/nclex-rn/resources`.

---

## 2. CMS Collections

### Resources Collection (27 fields, target 30)

| # | Field Name | Type | Framer ID | Purpose |
| --- | --- | --- | --- | --- |
| 1 | Title | string | IH5gZlf4d | Article headline |
| 2 | Subtitle | string | WPPB_XFh3 | Deck/subheadline |
| 3 | Status | enum | TuSsd9Js3 | Workflow state |
| 4 | Category | enum | yzhyQF9at | Content category for filtering |
| 5 | Brand | enum | HOwvFOzp3 | Which HLT product line |
| 6 | Headers | string | ywVbBKQSl | H2/H3 outline for TOC generation |
| 7 | Author | string | ScK12AeaB | Author name (text, not reference) |
| 8 | Card Type | enum | o_FSze7Nh | Article/Video/Podcast/Guide/Checklist/Tool |
| 9 | Tags | string | BabKfXjuv | Comma-separated tags |
| 10 | Card Text Summary | string | rKVjMFGgG | Short excerpt for card display |
| 11 | Featured Image | image | IE7eK3we\_ | Hero image for article + card |
| 12 | Card Image | image | h3lr4vygq | Optional smaller card thumbnail |
| 13 | Read Time | string | PPgmv4By1 | e.g. "8 min read" |
| 14 | Date Published | date | b1WoSXKSe | Publication date |
| 15 | Related Resources | string | vUPYOrQDQ | Internal linking references |
| 16 | CTA Type | enum | y9b6KiqrM | Internal/External/App Store/Download |
| 17 | CTA Text | string | PG_WQFKKB | CTA button copy |
| 18 | CTA URL | link | qhHVnF1Ko | CTA destination |
| 19 | SEO Title | string | CHdj5t4nA | Custom `<title>` override |
| 20 | SEO Description | string | RyhsY0AET | Meta description override |
| 21 | Featured | boolean | ifOh8La1u | Pin to Featured section |
| 22 | Excerpt | string | dQJN8Da2x | Longer excerpt for meta/cards |
| 23 | Social Image | image | Y66ivX9Og | OG/Twitter share image (1200x630) |
| 24 | Product | enum | t090x3hEh | Which exam this is for |
| 25 | Content Type | enum | R4XGro9yU | Article format/structure |
| 26 | Trending Score | number | sRQiDrzDn | 0-100, powers trending sort |
| 27 | Content | formattedText | GmxpCsLOH | Full article body (HTML) |

**3 fields to add** (use remaining slots 28-30):

- `FAQ JSON` (string) -- JSON array of `{"q":"...","a":"..."}` for the FAQ accordion + schema
- `Citations` (string) -- Numbered citation list
- `Katailyst ID` (string) -- Sync key to the backend system

### Status Enum Values

| Value | Framer ID | Meaning |
| --- | --- | --- |
| New AI Draft | PyBcl8BLR | Initial AI-generated content |
| Published | opw0b8yRo | Live on site |
| Scheduled | VBKM2yNR\_ | Scheduled for future publish |
| Waiting On SME Review | S7yGz7pOq | Needs expert review |
| Trash - Error/incomplete | zIVNElMVb | Bad content, keep for records |
| Trash - Content Quality | zqCOsSijq | Didn't meet quality bar |

### Category Enum Values

| Value | Framer ID |
| --- | --- |
| General | aKCC_DmKM |
| Psychology & Mentality | w7iq0OMni |
| Video Centric | H1GZRxnOk |
| Lessons From Students Who Passed | DDd4QBKDS |
| Data Deep Dive | fwWyt9ePI |
| Tips & Tricks | Pxoxk9ATA |
| QBank Item Walkthrough | jATU13FXI |
| High yield concept deep dive | SsYb8Tzsy |
| Common Questions | MEGRqe2UT |
| Fundamentals | rTukmyavc |

### Brand Enum Values

| Value | Framer ID |
| --- | --- |
| HLT Mastery | iPH5KOLVC |
| NCLEX RN Mastery | PU2uWCETG |
| NCLEX PN Mastery | x1FKOcImL |
| FNP Mastery | exewAwcjH |
| ASVAB Mastery | CnZ8gljBW |
| TEAS Mastery | oftwINmML |
| PTCB Mastery | hbvuEQ7W0 |
| Nursing Mastery | CYZ7yfZTQ |
| Dental Mastery | JFeyJUnBL |
| PMHNP Mastery | bcpO2Amur |

### Product Enum Values

| Value | Framer ID |
| --- | --- |
| NCLEX-RN Mastery | paRVi5r0g |
| NCLEX-PN Mastery | yfEWj7LuO |
| FNP Mastery | mVTN_0eL9 |
| ASVAB Mastery | sM9Gt1Sy9 |
| TEAS Mastery | s57NjGR7F |
| HESI Mastery | Upf1zkHKy |
| CNA Mastery | Avd2m2HJY |
| MCAT Mastery | pEowvttIT |
| LPN Mastery | VPb3R_Jv1 |
| EMT Mastery | JlbiH0nGP |
| General | Tw7r7dqV2 |

### Content Type Enum Values

| Value | Framer ID |
| --- | --- |
| Deep Dive | XdzSQMT1k |
| How To | gOMc6WsKu |
| Tutorial | ml9gQO27H |
| Listicle | I_FWDjunn |
| Case Study | UQKe0YUsW |
| Comparison | yb0HDCfR4 |
| FAQ | Hf3ndrEUu |
| News | Q354u7CKt |
| Opinion | O2Ar7LAFV |
| Cheatsheet | j5lh0MAJL |
| Quiz Prep | L5rBVI9nr |
| Exam Hub | slW23usEr |

### Card Type Enum Values

| Value | Framer ID |
| --- | --- |
| Article | Dzegf43WQ |
| Video | l4Ekrx1cJ |
| Podcast | DEsdNivsX |
| Guide | X_Ht5PEEm |
| Checklist | caz2ugY5Q |
| Tool | gslRHyiin |

### CTA Type Enum Values

| Value | Framer ID |
| --- | --- |
| Internal | NcxDmjbpV |
| External | UDsnzai8f |
| App Store | P84WVmjgm |
| Download | jr3j2wSub |

### Authors Collection (10 fields)

| # | Field | Type | Framer ID |
| --- | --- | --- | --- |
| 1 | Name | string | MrTbhuHnc |
| 2 | Instagram | link | l17WU4meu |
| 3 | TikTok | link | EZSmBKSUc |
| 4 | Active | boolean | TPoJCOcyv |
| 5 | Avatar | image | Mt2XkO_xd |
| 6 | Role | string | QJBDUNHWG |
| 7 | Agent Type | enum | cttp4fWIX |
| 8 | LinkedIn | link | XHU39egTm |
| 9 | Twitter/X | link | Oyij_Hn3K |
| 10 | Bio | string | oDcRL86ke |

---

## 3. Design System

### Color Palette

| Role | Hex | CSS Variable | Usage |
| --- | --- | --- | --- |
| Primary Blue (Mastery Blue) | #155EEF | `--color-primary` | CTAs, links, active states, progress bar, headings |
| Primary Hover | #1048CC | `--color-primary-hover` | Button/link hover |
| Rich Black | #101828 | `--color-text-primary` | Body text, headings, card titles |
| Medium Gray | #667085 | `--color-text-secondary` | Secondary text, meta info |
| Off White | #F9FAFB | `--color-bg-light` | Page backgrounds, section dividers |
| White | #FFFFFF | `--color-bg-card` | Card backgrounds |
| Blue Tint | #EFF4FF | `--color-bg-accent` | Table headers, filter bar, callout backgrounds |
| Amber | #F79009 | `--color-warning` | Trending indicators, attention badges |
| Gold | #EAAA08 | `--color-gold` | Premium/featured badges |
| Success Green | #2FBF71 | `--color-success` | Pass rates, correct answers |
| Critical Red | #E5484D | `--color-danger` | Error states, fail indicators |
| Border Gray | #E5E7EB | `--color-border` | Card borders, dividers |

### Typography

All text uses **Inter** (available in Framer's font library).
| Role | Size (Desktop) | Size (Mobile) | Weight | Line Height | | -------------------- | -------------- | ------------- | -------------- | ----------- | | Page Title (H1) | 36px | 28px | Bold (700) | 1.2 | | Section Heading (H2) | 28px | 22px | Bold (700) | 1.25 | | Subsection (H3) | 22px | 18px | SemiBold (600) | 1.3 | | Body Text | 18px | 16px | Regular (400) | 1.65 | | Card Title | 18px | 16px | SemiBold (600) | 1.3 | | Card Excerpt | 14px | 14px | Regular (400) | 1.5 | | Meta/Caption | 14px | 12px | Regular (400) | 1.4 | | Badge/Label | 12px | 12px | SemiBold (600) | 1.2 | ### Responsive Breakpoints | Name | Width | Grid | Notes | | ---------- | ----------- | ----- | -------------------------------- | | Desktop XL | 1440px+ | 3-col | Full layout, sidebar visible | | Desktop | 1024-1439px | 3-col | Slightly narrower content | | Tablet | 768-1023px | 2-col | No sidebar, TOC as dropdown | | Mobile | 375-767px | 1-col | Full-width cards, stacked layout | --- ## 4. Resources Index Page ### Hero Section (~300px height) - **Headline**: "Free NCLEX Resources & Study Guides" (H1, white text) - **Subtitle**: "Expert-written articles, study guides, and practice question walkthroughs" - **Search bar**: Prominent, centered, placeholder "Search resources..." 
- **Background**: Gradient from #155EEF to #1048CC with subtle pattern overlay ### Dynamic Filter Bar (below hero) Using Framer's native Dynamic Filters (Feb 2026 feature): | Filter | Type | Bound To | Options | | ------------- | ------------ | -------------------- | ----------------------------------------------------------------------- | | Search | Search Field | Title, Tags, Excerpt | Free text | | Content Type | Tabs | Content Type enum | All \| Deep Dives \| How-To \| Cheatsheets \| Quiz Prep \| Case Studies | | Category | Dropdown | Category enum | All Categories + each value | | Product/Exam | Dropdown | Product enum | All Exams + each value | | Featured Only | Toggle | Featured boolean | On/Off | Setup: Select collection list > right panel > Filters > Dynamic > add each filter type. Framer auto-creates page variables that bind the filter input to the collection list. URL updates automatically with filter state (e.g., `/resources?type=deep-dive&category=tips-tricks`), enabling shareable filtered views. ### Article Card Grid - **Desktop**: 3-column grid, 24px gap - **Tablet**: 2-column, 20px gap - **Mobile**: 1-column, full-width **Card component anatomy:** 1. **Featured Image**: 16:9 ratio, rounded top corners (8px), object-fit: cover 2. **Category badge**: Pill on top-left of image, white text on semi-transparent dark bg 3. **Content Type icon**: Small (16px) icon next to category 4. **Title**: H3, max 2 lines, text-overflow: ellipsis, Inter SemiBold 18px, #101828 5. **Excerpt**: Max 2 lines, Inter Regular 14px, #667085 6. **Author bar**: Avatar (24px circle) + author name (14px SemiBold) + dot + date + dot + read time 7. **Trending indicator**: Visible when Trending Score > 70. Flame icon + "Trending" in amber **Hover state**: Card lifts 4px (translateY), box-shadow: `0 8px 25px rgba(0,0,0,0.1)`, image scales to 1.02x, title transitions to #155EEF. Duration 0.2s ease-out. **Sort order**: Featured pinned to top row, then Publish Date descending. 
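The sort rule above (featured pinned to the top, then newest first) can be sketched as a comparator. Field names here (`featured`, `publishDate`) are illustrative stand-ins for the CMS bindings, not Framer's actual API:

```typescript
// Minimal sketch of the index-page sort: featured items pinned first,
// then Publish Date descending within each group.
interface Article {
  title: string;
  featured: boolean;
  publishDate: string; // ISO date string
}

function sortArticles(articles: Article[]): Article[] {
  return [...articles].sort((a, b) => {
    // Featured articles always sort before non-featured ones.
    if (a.featured !== b.featured) return a.featured ? -1 : 1;
    // Within each group, newest publish date first.
    return Date.parse(b.publishDate) - Date.parse(a.publishDate);
  });
}
```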
**Pagination**: "Load More" button at bottom, 12 articles per page. ### Sidebar (Desktop only, 300px right) - **Popular Articles**: Top 5 by Trending Score (compact list: title + read time) - **Newsletter Signup**: Email input + subscribe button (Framer native form with webhook) - **Exam Quick Links**: Pill buttons linking to filtered views per exam ### Empty State When no results match filters: magnifying glass icon + "No resources found" + "Try adjusting your filters" + Clear Filters button. --- ## 5. Resource Detail Page ### Reading Progress Bar - 3px bar fixed to viewport top, full width - Color: #155EEF - 0% at article body top, 100% at bottom - Implement via Framer's scroll-based animation (Effects > Scroll Transform on width) ### Hero Section **With Hero Image**: Full-width image with gradient overlay (transparent to rgba(0,0,0,0.4)). On top: - Category badge (pill, white text) - Title (H1, white, Inter Bold, 36px desktop / 28px mobile, max-width 800px centered) - Subtitle (if exists, 20px white, 80% opacity) - Author bar (avatar + name + role + date + read time, white text) - Share buttons (Twitter, LinkedIn, Facebook, Copy Link) -- right-aligned on desktop **Without Hero Image**: Solid #F9FAFB background with dark text (color-inverted layout). 
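The progress mapping for the reading bar at the top of this section (0% at the article body's top, 100% at the bottom) reduces to a clamped ratio, independent of how Framer's Scroll Transform is wired internally; a minimal sketch with illustrative parameter names:

```typescript
// Maps scroll position to reading progress in [0, 1]:
// 0 when the article body's top reaches the viewport top,
// 1 once the reader has scrolled through the whole body.
// All arguments are document-space pixels.
function readingProgress(
  scrollY: number,        // current scroll offset
  viewportHeight: number, // visible height
  bodyTop: number,        // document offset of the article body's top edge
  bodyHeight: number,     // total height of the article body
): number {
  // Distance the reader actually scrolls while the body is on screen.
  const span = Math.max(bodyHeight - viewportHeight, 1);
  const raw = (scrollY - bodyTop) / span;
  // Clamp so the 3px bar never under- or overshoots its track.
  return Math.min(1, Math.max(0, raw));
}
```

The result, times 100, is the bar's width percentage on each scroll update.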
### Table of Contents (Conditional: Content Type = Deep Dive, Exam Hub) - **Desktop**: Floating left sidebar, sticky at 120px from top - Auto-generated from H2/H3 headings in Content field - Active heading highlighted with 2px blue left border (#155EEF) - **Mobile**: Collapses to dropdown via sticky "Contents" button ### Article Body (Content field) - Max-width: 720px (65-70 chars per line) - Font: Inter Regular 18px, line-height 1.65, #101828 - Headings: H2 28px Bold, H3 22px SemiBold, H4 18px SemiBold - Images: Full body width (720px), 8px rounded corners - Links: #155EEF, underline on hover - Lists: Proper bullet/number rendering from HTML - Code blocks: monospace 14px, #F5F5F5 background, 16px padding ### Pull Quotes (Conditional: Content includes `<blockquote>`) - 4px left border #155EEF - Italic text at 22px - Light blue background (#EFF4FF) - 24px padding ### FAQ Accordion (Conditional: FAQ JSON field not empty) - Section heading: "Frequently Asked Questions" (H2) - Each Q/A as expandable accordion item - Question: bold 16px trigger - Answer: regular 16px collapsible body - First item expanded by default - Smooth expand/collapse 0.3s ease-out ### CTA Block (Conditional: CTA Type != none) Based on CTA Type enum: - **App Store**: Blue gradient card, app icon, store badges, value prop - **Internal**: White card with blue border, headline, description, button - **External**: Standard CTA with link - **Download**: Download icon + button ### Related Articles - "You Might Also Like" section - 3 cards (desktop) / 2 (mobile) - Filtered by same Category OR same Product, excluding current article - Sorted by Date Published descending ### Author Bio Section - Large avatar (80px), Name (H3), Role, Bio - LinkedIn button if field exists - If Agent Type is not "human" and not empty: "AI-Assisted Content" badge with tooltip --- ## 6. Conditional Visibility Rules Set these in the Framer editor: select element > right panel > Visibility > set condition. 
| Section | Condition | Content Types That Use It | | ------------------- | ---------------------------------------- | ------------------------------------- | | Table of Contents | Content Type = "Deep Dive" OR "Exam Hub" | Deep Dive, Exam Hub | | Progress bar | Content Type = "Deep Dive" OR "Exam Hub" | Deep Dive, Exam Hub | | FAQ Accordion | FAQ JSON is not empty | Any type with FAQ data | | CTA Block | CTA Type is not empty | Any type with CTA | | Pull Quote styling | Content contains `<blockquote>` | Any (handled by CSS in formattedText) | | Trending badge | Trending Score > 70 | Any with high score | | Author Bio LinkedIn | Author LinkedIn is not empty | Authors with LinkedIn | | AI Content badge | Agent Type != "human" AND != empty | AI-authored content | --- ## 7. Schema.org JSON-LD (Code Component) Add a custom code component on the detail page template that generates structured data: ### Article Schema (every article) ```json { "@context": "https://schema.org", "@type": "Article", "headline": "{{Title}}", "description": "{{SEO Description || Excerpt}}", "image": "{{Featured Image URL}}", "author": { "@type": "Person", "name": "{{Author}}" }, "publisher": { "@type": "Organization", "name": "HLT Mastery", "logo": { "@type": "ImageObject", "url": "https://hltmastery.com/logo.png" } }, "datePublished": "{{Date Published}}", "mainEntityOfPage": "{{Page URL}}" } ``` ### FAQ Schema (when FAQ JSON exists) ```json { "@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [ { "@type": "Question", "name": "{{question}}", "acceptedAnswer": { "@type": "Answer", "text": "{{answer}}" } } ] } ``` ### Breadcrumb Schema (every page) ```json { "@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://hltmastery.com" }, { "@type": "ListItem", "position": 2, "name": "NCLEX-RN", "item": "https://hltmastery.com/nursing/nclex-rn" }, { "@type": "ListItem", "position": 3, 
      "name": "Resources",
      "item": "https://hltmastery.com/nursing/nclex-rn/resources"
    },
    { "@type": "ListItem", "position": 4, "name": "{{Title}}" }
  ]
}
```

---

## 8. SEO Configuration

In the detail page template's Page Settings:

| Setting | Value |
| -------------------- | ---------------------------------------------------------- |
| Title | `{{SEO Title}}` (falls back to `{{Title}} \| HLT Mastery`) |
| Description | `{{SEO Description}}` (falls back to `{{Excerpt}}`) |
| Social Preview Image | Bind to `{{Social Image}}` field |
| Canonical URL | Auto (Framer handles this) |

### AEO (AI Agent Optimization) Rules

Every article must follow these for AI citation:

1. Direct answer in the first 50 words of the Content field
2. FAQ schema when FAQ data exists
3. Comparison tables and numbered lists in parseable HTML
4. Author credentials visible (E-E-A-T signals)

---

## 9. Plugin Stack

Install in this order via Insert > Plugins:

| Priority | Plugin | Purpose |
| -------- | ---------------------- | ---------------------------------------------------------------- |
| 1 | **CMS Sheets** | Spreadsheet-style bulk CMS editor. Essential during build phase. |
| 2 | **Semflow** | SEO audit -- title tags, meta, heading structure, alt text |
| 3 | **MetaViewer** | Live preview of Google/social search appearance |
| 4 | **Google Analytics 4** | Set measurement ID in Site Settings > General > Analytics |
| 5 | **Smart Contrast** | Accessibility contrast checker (free) |

GA4 Measurement ID: `G-41YBVZDEJ2` (already configured in project settings per screenshot).

---

## 10. Files Tab

Upload to the Files tab in Site Settings:

### robots.txt

```
User-agent: *
Allow: /
Sitemap: https://hltmastery.com/sitemap.xml
```

### llms.txt (AI Agent Discoverability)

```
# HLT Mastery

## About
HLT Mastery is the leading test prep platform for nursing and allied health exams including NCLEX-RN, NCLEX-PN, FNP, TEAS, ASVAB, PANCE, and DAT.
## Content Types - Study guides and deep dives on exam topics - Practice question walkthroughs with detailed rationales - Exam day preparation checklists - Score interpretation and retake guidance - Career guides for healthcare professionals ## Expertise - 4,000+ expert-written practice questions - 27,000+ monthly active nursing app users - Coverage across 7+ standardized exams ## Resources - /nursing/nclex-rn/resources - NCLEX-RN study resources - /nursing/nclex-rn/resources/[slug] - Individual resource articles ## Contact - Website: https://hltmastery.com - Apps: Available on iOS and Android ``` --- ## 11. Safety Rules 1. **NEVER publish to the live site** without explicit approval from Alec 2. All work stays in DRAFT until reviewed 3. `deploy()` is NEVER called from any script -- only `publish()` (which creates previews) 4. Do not modify any published page (Home, /pass-guarantee, /nursing/nclex-rn, etc.) 5. Do not rename any existing URL slug 6. Do not delete any existing collection or collection items 7. The Resources-old collection (Airtable-synced, 5 items) is deprecated -- leave it alone 8. Laura's /blog-testing-laura page is her work -- leave it alone --- ## 12. Content Guidelines for Seeded Articles Every article should follow these structural patterns: 1. **First 50 words**: Directly answer the question the user searched for (AEO optimization) 2. **Structure**: H2 for major sections, H3 for subsections. No H1 in Content (the Title field is the H1) 3. **Length by Content Type**: - Deep Dive: 1,500-3,500+ words - How To: 600-1,400 words - Cheatsheet: 500-1,000 words - Listicle: 500-1,200 words - Case Study: 800-1,500 words 4. **Tone**: Empathetic, expert, accessible. "You" form. No corporate jargon. No exclamation points in headlines. 5. **Images**: 16:9 hero, real photography (not AI-perfect stock), diverse representation 6. **FAQ**: Include 3-5 FAQ pairs in the FAQ JSON field for Schema.org rich results 7. 
**CTA**: Every article should have an appropriate CTA type set (app-download for most NCLEX content)

---

## Source: docs/references/integrations/HLT_CLOUDINARY_DAM.md

# HLT Cloudinary DAM — Canonical Reference

Single source of truth for the HLT Cloudinary Digital Asset Management setup. Every doc, KB, migration, or runtime helper that describes the DAM must match this page. If you find a conflict elsewhere, this page wins — file a PR to align the other doc, not the other way around.

## Cloud

| Field | Value |
| ---------------- | ---------------------------------- |
| Cloud name | `hlt-media` |
| Cloud ID | `c-1e2a3dbe7b0abcf38e49df4f50a4da` |
| Automatic backup | Enabled |
| Plan limits | Image 20MB, video 2GB, raw 20MB |

Legacy cloud `dq9xmts6p` was retired. Migration `database/205-rebind-cloudinary-cloud-name.sql` rewrites stored URLs in `assets`, `asset_versions`, and `entity_revisions` from the legacy cloud to `hlt-media`. Migrations 093 and 204 are intentionally preserved as historical artifacts.

## Environment variables

Configured per-environment in Vercel and locally in `.env.local` (gitignored):

| Var | Purpose |
| ----------------------- | ----------------------------------------------------------------------- |
| `CLOUDINARY_URL` | Full connection string: `cloudinary://<api_key>:<api_secret>@hlt-media` |
| `CLOUDINARY_CLOUD_NAME` | `hlt-media` |
| `CLOUDINARY_API_KEY` | Non-secret key |
| `CLOUDINARY_API_SECRET` | Secret — stored in Vercel/Supabase Vault only |

Runtime fallback: `lib/data/multimedia.ts` and `scripts/integrations/framer-sync-articles.ts` default to `hlt-media` when `CLOUDINARY_CLOUD_NAME` is unset, so dev workflows do not require a full env bootstrap.

## Folder structure

Max depth 2. Leaf folders may hold assets directly.
```
Home/
├── articles/
│   ├── blog/
│   ├── help-center/
│   └── knowledge-base/
├── branding/
│   ├── colors/
│   ├── fonts/
│   ├── guidelines/
│   └── logos/
├── in-app/
│   ├── asvab-mastery/
│   ├── fnp-mastery/
│   ├── nclex-mastery/
│   ├── nurse-recruiting/
│   └── shared-components/
├── inbox/        # Upload landing zone — Media Library default
├── multimedia/   # Rich media (replays, components, video)
├── products/     # Product marketing
├── samples/      # Cloudinary default samples
├── shared/       # Cross-team shared assets
└── social/       # Social graphics
```

Sub-folder conventions:

- `in-app/<product-slug>/` — one folder per product. Slug matches `CONTENT_TAXONOMY.md` app codes (e.g., `nclex-mastery`, `fnp-mastery`).
- `social/<platform>/` — optional sub-foldering by platform (`instagram`, `linkedin`, `youtube`, etc.).
- `products/app-store-screenshots/`, `products/mockups/`, `products/feature-graphics/` — optional sub-folders for marketing variants.

## Upload presets

All three presets use signed mode with `overwrite: true` and `use_filename: true`.

| Preset | Default folder | Notes |
| ------------------- | -------------- | -------------------------------------------------------------------- |
| `hlt-web-optimized` | (none) | API uploads; routes wherever specified. `unique_filename: true`. |
| `hlt-default` | `inbox/` | API uploads default landing zone. `unique_filename: false`. |
| `ml_default` | `inbox/` | Media Library UI default (image/video/raw). `unique_filename: true`. |

Default flow: uploads land in `inbox/` and get triaged into the right folder. If the review flow is active, triage runs through the `inbox/pending-review/`, `inbox/approved/`, and `inbox/rejected/` sub-folders.

## Structured metadata schemas

Every asset can be tagged with all five fields.
| Field | Type | Values |
| ------------ | ------------- | ------ |
| `status` | single-select | `draft`, `review`, `approved`, `published`, `archived` |
| `source` | single-select | `fal`, `dall-e`, `midjourney`, `stable-diffusion`, `canva`, `figma`, `stock`, `manual`, `screenshot`, `agent`, `user-upload`, `flux`, `ideogram`, `leonardo`, `runway`, `suno`, `elevenlabs`, `cloudinary`, `imported`, `google`, `openai`, `anthropic`, `replicate`, `unsplash`, `pexels` |
| `app_id` | multi-select | `nclex-rn`, `nclex-pn`, `fnp`, `pmhnp`, `teas`, `asvab`, `dat`, `hesi`, `pance`, `mcat`, `cna`, `emt`, `ptcb`, `acls`, `ecg`, `nbdhe`, `inbde`, `cst` |
| `vertical` | single-select | `nursing`, `nurse-practitioner`, `military`, `dental`, `allied-health`, `health-sciences`, `shared` |
| `asset_type` | single-select | `icon`, `illustration`, `photo`, `screenshot`, `logo`, `banner`, `thumbnail`, `avatar`, `background`, `og_image`, `hero`, `social_card`, `watermark`, `splash_screen`, `audio`, `video`, `mnemonic`, `infographic`, `diagram`, `chart` |

All five fields are editable and optional, and none allows users to add values dynamically. Add new values by editing the schema in Cloudinary Settings → Metadata.

## Named transformations

| Name | Behavior |
| ----------------------- | -------------------------------------------------------- |
| `f_auto/q_100.jpg` | Auto-format, quality 100, force `.jpg` extension |
| `e_auto_enhance` | Auto-enhance effect |
| `t_media_lib_thumb.jpg` | Library thumbnail (uses `media_lib_thumb`), force `.jpg` |
| `media_lib_thumb` | Limit to 150 × 100 |

Additional transformations are maintained via the Cloudinary Admin API.
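For reference, a named or inline transformation is applied as a path segment in the standard Cloudinary delivery URL, between the delivery type and the public ID. A minimal builder sketch, assuming the `hlt-media` cloud and the standard `res.cloudinary.com` URL shape:

```typescript
// Builds a Cloudinary image delivery URL. A named transformation
// (referenced as t_<name>) or an inline transformation string goes
// between "upload" and the asset's public ID.
const CLOUD_NAME = "hlt-media"; // canonical cloud from this page

function cloudinaryUrl(publicId: string, transformation?: string): string {
  const base = `https://res.cloudinary.com/${CLOUD_NAME}/image/upload`;
  return transformation ? `${base}/${transformation}/${publicId}` : `${base}/${publicId}`;
}
```

For example, `cloudinaryUrl("inbox/sample", "t_media_lib_thumb")` yields `https://res.cloudinary.com/hlt-media/image/upload/t_media_lib_thumb/inbox/sample` (`inbox/sample` is a hypothetical public ID).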
Suggested additions for common use cases:

```
# Hero / OG image — 1200×630
c_fill,w_1200,h_630,f_auto,q_auto

# Social square — 1080×1080
c_fill,w_1080,h_1080,f_auto,q_auto

# App icon — 512×512
c_fill,w_512,h_512,f_png

# Gallery thumb — 384×288 (used by lib/data/multimedia.ts)
f_auto,q_auto,w_384,h_288,c_fill,g_auto
```

## How runtime code uses this

- `lib/data/multimedia.ts` — builds preview/thumbnail URLs via `CLOUDINARY_THUMB_BASE`. Reads the `CLOUDINARY_CLOUD_NAME` env var with an `hlt-media` fallback.
- `scripts/integrations/framer-sync-articles.ts` — builds hero/social/og/gallery URLs via `CLOUDINARY_BASE`. Same env var and fallback.
- Tool call_specs (`tool:cloudinary.upload`, `tool:cloudinary.transform`, `tool:cloudinary.manage`) — URL templates are stored in `entity_revisions.content_json` and rebound by migration 205.
- The image host allowlist in `next.config.mjs` and the CSP `img-src` directive allow `res.cloudinary.com` unconditionally — no change needed on cloud rename.

## See also

- `docs/references/integrations/INTEGRATION_CONTRACT_MULTIMEDIA_MASTERY.md` — companion sidecar that orchestrates image generation → Cloudinary upload.
- `docs/references/integrations/TOOL_INTEGRATION_FAMILIES_RUNBOOK.md` — where this DAM fits in the broader tool-integration taxonomy.

---

## Source: docs/references/integrations/INTEGRATION_CONTRACT_MARKETO.md

# Marketo Integration Contract

## Purpose

Adobe Marketo Engage is HLT's marketing automation engine.
Katailyst uses it for: - Lead creation and sync (app users pushed to Marketo via webhooks) - Engagement program enrollment (7 exam-specific nurture streams) - Campaign triggers (event-driven scoring and lifecycle transitions) - Email delivery (AI-generated content from Katailyst delivered via Marketo) - Performance feedback loop (campaign metrics fed back to Katailyst rubrics) ## Canonical Connection Points - Instance: `758-CUU-617` - Cluster: `sj-pod-sj31` - REST Endpoint: `https://758-CUU-617.mktorest.com/rest` - Identity Endpoint: `https://758-CUU-617.mktorest.com/identity` - OAuth Token URL: `https://758-CUU-617.mktorest.com/identity/oauth/token` - Munchkin ID: `758-CUU-617` - Contact Volume: 838,000+ ## Auth Contract Canonical vault secret keys: - `marketo/client-id` — OAuth client_credentials client ID - `marketo/client-secret` — OAuth client_credentials client secret Grant type: `client_credentials` (server-to-server, no user interaction). Token lifetime: 3600 seconds. Client caches and refreshes proactively at 60s before expiry. ## Mode Contract - `shadow` — Read-only operations (health checks, program listing, lead lookups). Default mode. Safe for exploration. - `live` — Full read-write operations (lead sync, campaign triggers, email sends). Requires explicit promotion. Mode is stored in `integration_accounts.settings_json.mode` and resolved by `lib/marketo/config.ts`. 
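The token lifecycle above (client_credentials grant, 3600-second lifetime, proactive refresh 60 seconds before expiry) can be sketched as a small cache. This is illustrative, not the actual `lib/marketo/client.ts` implementation; the synchronous `fetchToken` stands in for the real (async) POST to the identity endpoint, and the injectable clock makes the refresh window easy to reason about:

```typescript
// Caches an OAuth token and treats it as stale 60s before real expiry,
// so a fresh token is always in hand before Marketo rejects the old one.
type TokenResponse = { access_token: string; expires_in: number }; // expires_in in seconds

const REFRESH_MARGIN_MS = 60_000; // refresh 60s before expiry

class TokenCache {
  private token: string | null = null;
  private expiresAt = 0; // epoch ms of the token's real expiry

  constructor(
    private fetchToken: () => TokenResponse,
    private now: () => number = Date.now,
  ) {}

  get(): string {
    const stale = !this.token || this.now() >= this.expiresAt - REFRESH_MARGIN_MS;
    if (stale) {
      const res = this.fetchToken();
      this.token = res.access_token;
      this.expiresAt = this.now() + res.expires_in * 1000;
    }
    return this.token!; // set above, or still valid from a previous call
  }
}
```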
## Client Library - **Client:** `lib/marketo/client.ts` — `MarketoClient` class - **Config:** `lib/marketo/config.ts` — `resolveMarketoConfig`, `resolveMarketoHealth` - **Types:** `lib/marketo/types.ts` — All Marketo REST API types - **Factory:** `createMarketoClient({ supabase, orgId })` Available operations: - `syncLeads()` — Create/update leads (batch, max 300) - `getLeadsByEmail()` — Lookup leads by email - `addLeadsToList()` — Add leads to static list - `getLists()` — Get all static lists - `getPrograms()` — Get programs (engagement, email, default, event) - `getProgram(id)` — Get specific program - `getCampaigns()` — Get smart campaigns - `triggerCampaign()` — Trigger campaign for specific leads - `scheduleCampaign()` — Schedule batch campaign - `testConnection()` — Verify OAuth token fetch - `isConfigured()` — Check config completeness ## Delivery Integration Marketo is a first-class platform in the unified delivery pipeline: - **Platform type:** `marketo` in `DeliveryPlatform` union - **Capabilities:** `email`, `lead` - **Content payload fields:** `subject`, `preheader`, `fromName`, `replyTo`, `leadListId` - **Adapter:** `lib/delivery/adapter.ts` maps Marketo targets - **Queue:** `lib/delivery/queue.ts` maps marketo to `email` content type ## Rate Limits - 50,000 API calls/day - 100 calls per 20 seconds - 10 concurrent requests (target 5 in practice) - 300 records per request (target 250 for safety) - HTTP 606 = rate limit exceeded (retry with exponential backoff) ## MCP Tool Surface Tools registered as direct MCP tools (not registry entities): - `marketo` — Multiplexed tool with actions: health, get-programs, get-leads, sync-leads, trigger-campaign - `delivery.schedule` — Existing tool supports `marketo` platform for email scheduling ## Scripts - `scripts/integrations/bootstrap_marketo_integration_account.ts` — Create/update integration account - `scripts/integrations/verify_marketo_activation.ts` — Validate all checks pass ## Health Endpoint - `GET 
/api/integrations/marketo/health` — Returns structured health report ## Current Verification Status Integration account bootstrapped. Vault secrets configured. Live connection test pending. ## Katailyst Content Pipeline Content flows: Katailyst recipe execution (agent produces content_json) -> middleware transforms to Marketo HTML -> Marketo REST API uploads email asset -> engagement program delivers to scored segment -> performance data feeds back to Katailyst rubric evaluation. See `hlt-marketo-automation-machine-spec.docx` for the full automation spec. --- ## Source: docs/references/integrations/INTEGRATION_CONTRACT_MULTIMEDIA_MASTERY.md # Multimedia Mastery Integration Contract ## Purpose Multimedia Mastery is a media-native execution sidecar. Katailyst should use it for: - discovery of available media capabilities - remote MCP connectivity - image and media execution flows - studio deep links and workflow handoff For agent-facing plain-English use cases and parent-lane routing, use: - `docs/references/ai-agents/MULTIMEDIA_CAPABILITY_REFERENCE.md` Katailyst should not flatten Multimedia Mastery into a generic skill. It should be treated as: - tool surface - integration account - linked KB and playbook support ## Canonical Connection Points - Base URL: `https://multimediamastery.vercel.app` - Capabilities: `https://multimediamastery.vercel.app/api/media/v1/capabilities` - Tool Catalog: `https://multimediamastery.vercel.app/api/media/v1/tools` - Remote MCP: `https://multimediamastery.vercel.app/api/media/v1/mcp` - Health: `https://multimediamastery.vercel.app/api/media/v1/health` Agents should discover `capabilities` and `tools` first before using execution lanes. 
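The discovery-first rule can be made mechanical by deriving every surface from the base URL. A sketch using the paths from the connection points above; the function name is illustrative:

```typescript
// Derives the Multimedia Mastery discovery surfaces from a base URL so
// an agent can check capabilities and tools before touching execution
// lanes. Paths mirror the Canonical Connection Points in this contract.
function discoveryUrls(baseUrl: string) {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    capabilities: `${base}/api/media/v1/capabilities`,
    tools: `${base}/api/media/v1/tools`,
    mcp: `${base}/api/media/v1/mcp`,
    health: `${base}/api/media/v1/health`,
  };
}
```

An agent would fetch `capabilities`, then `tools`, and only then consider the MCP execution lane.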
## Current Verification Status Verified on 2026-03-06: - public discovery surfaces respond successfully - remote MCP metadata endpoint responds successfully - health route responds successfully - the currently stored bearer token did **not** authenticate successfully against an authenticated endpoint That means: - the integration surface is real - the vault path is present - the current token is not yet proven-good for authenticated production use Do not treat Multimedia Mastery as a Tier 1 proven execution lane until authenticated verification passes. ## Auth Contract Canonical vault secret key: - `multimedia-mastery/agent-token` Recommended vault name shape: - `org/<org-code>/multimedia-mastery/agent-token` Supported env vars for Katailyst runtime: - `MULTIMEDIA_HUB_TOKEN` - `MM_AGENT_TOKEN` - `MULTIMEDIA_HUB_BEARER_TOKEN` Supported base URL env vars: - `MULTIMEDIA_HUB_URL` - `MM_HUB_URL` - `NEXT_PUBLIC_MULTIMEDIA_HUB_URL` Resolution order: 1. `MULTIMEDIA_HUB_URL` 2. `MM_HUB_URL` 3. `NEXT_PUBLIC_MULTIMEDIA_HUB_URL` Token resolution order: 1. `MULTIMEDIA_HUB_TOKEN` 2. `MM_AGENT_TOKEN` 3. 
`MULTIMEDIA_HUB_BEARER_TOKEN` Bearer auth expectation: - write-capable and remote MCP flows should use a valid agent or library-write bearer token accepted by the deployed Multimedia Mastery surface - storing a token in Vault is not proof that the token is valid for live authenticated endpoints - verify against an authenticated endpoint before promoting Multimedia Mastery above existing proven media lanes ## Remote MCP Template ```json { "mcpServers": { "multimediaMastery": { "type": "streamable-http", "url": "https://multimediamastery.vercel.app/api/media/v1/mcp", "headers": { "authorization": "Bearer ${MM_AGENT_TOKEN}" } } } } ``` ### Claude Code / Claude-compatible JSON config ```json { "mcpServers": { "multimediaMastery": { "type": "streamable-http", "url": "https://multimediamastery.vercel.app/api/media/v1/mcp", "headers": { "Authorization": "Bearer ${MM_AGENT_TOKEN}" } } } } ``` ### Codex TOML config ```toml [mcp_servers.multimediaMastery] url = "https://multimediamastery.vercel.app/api/media/v1/mcp" bearer_token_env_var = "MM_AGENT_TOKEN" ``` Authentication note: - Codex still needs a valid bearer token provided through the client or surrounding environment - use `bearer_token_env_var = "MM_AGENT_TOKEN"` for bearer auth in Codex - do not rely on a Claude-style nested `headers` table in `.codex/config.toml`; Codex does not surface that as bearer auth - verify the parsed entry with `codex mcp get multimediaMastery` - do not ship this as active canon until the token is verified ## Verification Script Use the repo canary before enabling the Codex entry: ```bash python3 scripts/ops/multimedia_mcp_canary.py ``` Exit behavior: - `0` = discovery routes are healthy and authenticated MCP initialize succeeded - `1` = discovery or authenticated MCP failed - `2` = discovery routes are healthy but no Multimedia token env var is set ## Katailyst Positioning - Katailyst owns registry, orchestration hints, graph links, and evaluation context. 
- Multimedia Mastery owns media-native runtime behavior, library workflows, and studio reconnection. - Promotion above existing proven image lanes should wait until live verification shows it is reliable in production. ## Minimum Readiness Checklist - `python3 scripts/ops/multimedia_mcp_canary.py` passes with exit code `0` - vault secret exists for the target org - health route returns configured base URL - auth is present in health metadata - capabilities endpoint responds successfully - tools endpoint responds successfully - remote MCP metadata endpoint responds successfully - authenticated endpoint accepts the current bearer token --- ## Source: docs/references/integrations/INTEGRATION_CONTRACT_RENDER_MCP.md # Render MCP Integration Contract ## Purpose Render MCP is the hosted infrastructure-control lane for Render-backed services. Katailyst should use it for: - listing and inspecting Render workspaces - listing services, deploys, logs, and metrics - querying Render Postgres through Render's read-only SQL lane - giving hosted agents and operators a common infra-debug surface It should not be treated as a substitute for direct runtime file access. For `/data/workspace` truth, the active requirement is still: - validated SSH access - or a verified mount into the live service workspace Render MCP gives platform visibility, not arbitrary shell or filesystem access. 
## Canonical Connection Points - Hosted MCP URL: `https://mcp.render.com/mcp` - Auth header: `Authorization: Bearer <RENDER_API_KEY>` - Workspace selection is required before most resource-level operations Official hosted docs confirmed on 2026-03-27: - hosted MCP URL and bearer auth - workspace selection requirement - supported actions for services, logs, metrics, and Render Postgres - limitations around resource mutation and operational controls ## Current Verification Status Verified on 2026-03-27: - Render publishes a hosted MCP endpoint at `https://mcp.render.com/mcp` - remote clients authenticate with a Render API key in the bearer header - the hosted server supports workspaces, services, deploys, logs, metrics, and read-only Render Postgres queries - the hosted server is not a disk-audit surface for `/data/workspace` Not yet proven from this workspace: - live HLT production workspace selection through the hosted MCP server - which current Render services map to Victoria, Julius, and Lila - live `/data/workspace` file parity for the 8 steering files That means Render MCP is ready to use as the hosted infra-control surface, but it does not close the live runtime truth gap by itself. ## Auth Contract Canonical Vault secret key: - `render/api-key` Canonical local env var: - `RENDER_API_KEY` Repo helper: - `lib/integrations/render-mcp.ts` Resolution order for hosted agents and server-side runtime code: 1. existing `process.env.RENDER_API_KEY` 2. org-scoped Vault secret `render/api-key` 3. shared/system fallback for `render/api-key` Recommended Vault name shape: - `org/<org-code>/render/api-key` ## Project-Local MCP Config ### Claude Code / Claude-compatible JSON Project-local config lives in `.mcp.json`. ```json { "mcpServers": { "render": { "type": "http", "url": "https://mcp.render.com/mcp", "headers": { "Authorization": "Bearer ${RENDER_API_KEY}" } } } } ``` ### Codex TOML Project-local config lives in `.codex/config.toml`. 
```toml [mcp_servers.render] url = "https://mcp.render.com/mcp" bearer_token_env_var = "RENDER_API_KEY" enabled = true ``` ## User-Global Config These examples are for operators who want Render MCP available outside this repo as well. ### Claude Code CLI ```bash claude mcp add --transport http render https://mcp.render.com/mcp \ --header "Authorization: Bearer ${RENDER_API_KEY}" ``` ### Cursor ```json { "mcpServers": { "render": { "url": "https://mcp.render.com/mcp", "headers": { "Authorization": "Bearer ${RENDER_API_KEY}" } } } } ``` ## Hosted-Agent Access Pattern Hosted agents and server-side helpers should not hardcode Render credentials. Use: - `resolveRenderApiKey({ orgId })` - `ensureRenderMcpEnv({ orgId })` - `buildRenderMcpHeaders()` from: - `lib/integrations/render-mcp.ts` Example: ```ts import { buildRenderMcpHeaders, ensureRenderMcpEnv, RENDER_MCP_URL, } from '@/lib/integrations/render-mcp' await ensureRenderMcpEnv({ orgId }) const headers = buildRenderMcpHeaders() if (!headers) throw new Error('Render MCP is not configured for this org.') const response = await fetch(RENDER_MCP_URL, { method: 'POST', headers: { 'content-type': 'application/json', ...headers, }, body: JSON.stringify(payload), }) ``` ## Runtime Truth Audit Rule Render MCP is the right lane for: - workspace inventory - service inventory - deploy history - logs and metrics - Render-managed database queries It is not the right lane for: - reading `/data/workspace/AGENTS.md` - checking the live 8 steering files on disk - proving repo mirror parity against live disk For those tasks, use validated SSH or a verified runtime mount. 
## Minimum Readiness Checklist - `RENDER_API_KEY` exists locally or resolves from Vault - `render/api-key` is present in Vault for the target org - `.mcp.json` includes the hosted Render MCP entry - `.codex/config.toml` includes the hosted Render MCP entry - the operator can list Render workspaces - the operator can set the active HLT production workspace - the operator understands that `/data/workspace` parity still requires SSH or mount access ## Limitations That Matter Here Per Render's hosted MCP docs: - modification/deletion support is intentionally limited - environment-variable updates are the main mutation lane currently exposed - deploy triggers, scale controls, and arbitrary service ops are not the primary MCP path - sensitive data exposure is reduced but not guaranteed to be impossible Treat Render MCP as a high-value hosted operator surface, not as a full replacement for the Render dashboard, REST API, or runtime shell access. --- ## Source: docs/references/integrations/README.md # Integrations Reference Index `docs/references/integrations/**` is the source-of-truth contract lane for integrations and connector families. Use these docs when you need integration behavior, publishing model, or downstream surface expectations. 
Start here:

- Tool-family map and cross-integration framing: `docs/references/integrations/TOOL_INTEGRATION_FAMILIES_RUNBOOK.md`
- Framer page build guide: `docs/references/integrations/FRAMER_PAGE_BUILD_GUIDE.md`
- Marketo contract: `docs/references/integrations/INTEGRATION_CONTRACT_MARKETO.md`
- Multimedia mastery contract: `docs/references/integrations/INTEGRATION_CONTRACT_MULTIMEDIA_MASTERY.md`
- Render MCP contract: `docs/references/integrations/INTEGRATION_CONTRACT_RENDER_MCP.md`
- Research/source provider lanes: `docs/references/integrations/research-web/`

---

## Source: docs/references/integrations/TOOL_INTEGRATION_FAMILIES_RUNBOOK.md

# Tool Integration Families Runbook (Cloudinary, Novu, Localytics)

Last updated: 2026-02-23

## Why this exists

This runbook is the "how to use" and taxonomy map for the three integrations that were drifting in practice:

- Cloudinary (media asset lifecycle)
- Novu (notifications + ops telemetry)
- Localytics (analytics ingestion)

It aligns DB/runtime behavior with `docs/TAXONOMY.md` so tools are discoverable by family and provider.

## Readiness states

Use these readiness labels consistently when a tool family graduates into the hosted-agent foundation:

- `inventory-only` — credentials/templates exist, but no hosted execution contract yet
- `describable` — `tool.describe` works and the family is discoverable, but execution is not yet proven
- `executable` — `tool.execute` works and at least one proof run is recorded
- `gated-live` — executable, but live/shared mutations still require explicit approval
- `deprecated` — retained for continuity only; do not expand

Admission checklist for a family:

- secret resolution works
- `tool.describe` works
- `tool.execute` works
- risk/approval posture is declared
- one proof run is recorded
- one benchmark intent can actually use it

Naming note:

- The canonical name is **Localytics**.
- Team shorthand like "Loclytic" or "Loglytic" should map back to `provider:localytics` in code/docs.
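The readiness ladder above can be sketched as a small state check. This is an illustrative sketch only: the type name, ladder array, and `canGraduate` helper are not real repo modules, just the progression the labels describe (one rung at a time, with `deprecated` outside the ladder).

```typescript
// Illustrative model of the tool-family readiness ladder described above.
// Names here are assumptions for the sketch, not repo APIs.
type Readiness =
  | "inventory-only"
  | "describable"
  | "executable"
  | "gated-live"
  | "deprecated";

// Ordered progression; "deprecated" sits outside the ladder entirely.
const LADDER: Readiness[] = [
  "inventory-only",
  "describable",
  "executable",
  "gated-live",
];

// A family graduates one rung at a time and never in or out of "deprecated"
// via graduation (deprecation is a separate, deliberate act).
function canGraduate(from: Readiness, to: Readiness): boolean {
  if (from === "deprecated" || to === "deprecated") return false;
  return LADDER.indexOf(to) === LADDER.indexOf(from) + 1;
}

console.log(canGraduate("describable", "executable")); // true
console.log(canGraduate("inventory-only", "executable")); // false: skips a rung
```

The one-rung rule mirrors the admission checklist: each label adds exactly one new proof obligation, so skipping a rung means skipping evidence.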
## Family + Provider Mapping

Use these canonical tags:

- Cloudinary
  - `provider:cloudinary`
  - `family:media`
  - `tool_type:http`
  - `surface:api`, `surface:runtime`
- Novu
  - `provider:novu`
  - `family:communications`
  - `tool_type:http|internal` (depends on tool)
  - `surface:api`, `surface:runtime`
- Localytics
  - `provider:localytics` (integration account lane)
  - `family:data` (analytics ingestion lane)
  - currently operated via integration accounts + scripts, not first-class runtime tools

## Current Operating Model

### Cloudinary

Primary entities:

- `tool:cloudinary.manage@v1`
- `tool:cloudinary.upload@v1`
- `tool:cloudinary.transform@v1`
- `kb:cloudinary-integration-guide@v1`

Canonical repair + bootstrap:

- `database/093-cloudinary-tool-call-spec-and-secret-pointers.sql`
  - restores `call_spec` on `cloudinary.manage/upload/transform`
  - links HLT `integration_secrets` rows to existing Vault names:
    - `org/hlt/cloudinary/api-key`
    - `org/hlt/cloudinary/api-secret`
    - `org/hlt/cloudinary/cloud-name`

Verification:

```bash
pnpm db:psql -- -c "
select re.code, (er.content_json ? 'call_spec') as has_call_spec
from registry_entities re
left join entity_revisions er on er.id = re.current_revision_id
where re.org_id = system_org_id()
  and re.entity_type = 'tool'
  and re.code like 'cloudinary.%'
order by re.code;
"

pnpm db:psql -- -c "
select secret_key, status, vault_secret_name
from integration_secrets
where org_id = (select id from orgs where code = 'hlt')
  and secret_key like 'cloudinary/%'
order by secret_key;
"

npx tsx scripts/ops/smoke_tools.ts \
  --org-code hlt \
  --tool-codes cloudinary.manage,cloudinary.upload,cloudinary.transform \
  --live-if-ready
```

How to use (tool execution payload shape):

```json
{
  "tool_ref": "tool:cloudinary.transform",
  "input": {
    "transformation": "f_auto,q_auto,w_1200,c_limit",
    "public_id": "campaigns/spring-2026/hero-v1.png"
  }
}
```

```json
{
  "tool_ref": "tool:cloudinary.upload",
  "input": {
    "file": "https://images.example.com/hero.png",
    "folder": "campaigns/spring-2026",
    "public_id": "hero-v1",
    "overwrite": true
  }
}
```

Taxonomy/family link sanity:

```bash
pnpm db:psql -- -c "
with cloud as (
  select id from registry_entities
  where code in ('cloudinary-integration-guide','cloudinary.manage','cloudinary.upload','cloudinary.transform')
)
select count(*) as link_count
from entity_links
where from_entity_id in (select id from cloud)
   or to_entity_id in (select id from cloud);
"
```

### Novu

Primary entities:

- `tool:novu.trigger@v1`
- `tool:novu.health@v1`
- `tool:novu.workflows.sync@v1`
- `tool:novu.shadow-subscriber.upsert@v1`
- `tool:novu.ops.snapshot@v1`
- `tool:novu.ops.remediate@v1`

Readiness checks:

```bash
npx tsx scripts/integrations/verify_novu_activation.ts --org-code hlt

npx tsx scripts/ops/smoke_tools.ts \
  --org-code hlt \
  --tool-codes novu.trigger,novu.health,novu.workflows.sync,novu.shadow-subscriber.upsert,novu.ops.snapshot,novu.ops.remediate \
  --live-if-ready \
  --novu-name org_hlt.content_ready
```

Reference docs:

- `docs/references/integrations/NOVU_INTEGRATION_FOUNDATION.md`
- `docs/runbooks/novu/operator-control-pack.md`
- Novu platform docs: `https://docs.novu.co/platform/what-is-novu`
- Novu API refs used by this repo:
  - `https://docs.novu.co/api-reference/events/trigger-event`
  - `https://docs.novu.co/api-reference/workflows/list-workflows`

UI surfaces:

- `/dashboard-cms/content/channels` (bindings, risk panel, incidents)
- `/dashboard-cms/observe` and `/dashboard-cms/observe/runs` (delivery/webhook/ingress timelines)

Quick verify (operator-safe):

```bash
# route-level health in app context
curl -s http://localhost:3000/api/novu/health | jq .

# registry tool route proof (requires authenticated cookie)
# ORCHESTRATOR_COOKIE="<sb cookie>" npx tsx scripts/integrations/prove_novu_tool_routes.ts --org-id <uuid>
```

Activation warning guide (`scripts/integrations/verify_novu_activation.ts`):

- `novu/webhook-signing-key status=(missing)`
  - add/activate `integration_secrets.secret_key='novu/webhook-signing-key'` for the org.
- `novu/ingress-signing-key status=(missing)`
  - add/activate `integration_secrets.secret_key='novu/ingress-signing-key'` for the org.
- `active workflow bindings=0` and/or `recent workflow sync evidence=false`
  - run `tool:novu.workflows.sync` (or the route proof script) for the org/app scope.
  - then re-run `npx tsx scripts/integrations/verify_novu_activation.ts --org-code <org> --json`.

### Localytics

Localytics is currently a metrics integration pathway (ingestion + app mapping), not a first-class runtime tool family.
Key scripts:

- `scripts/metrics/localytics_list_apps.ts`
- `scripts/metrics/import_localytics_apps.ts`
- `scripts/metrics/ingest_metric_points_from_localytics.ts`

Operational checks:

```bash
npx tsx scripts/metrics/localytics_list_apps.ts --org-code hlt --json
npx tsx scripts/metrics/import_localytics_apps.ts --org-code hlt
npx tsx scripts/metrics/ingest_metric_points_from_localytics.ts --org-code hlt --days 7
```

Reference docs:

- `docs/references/metrics/METRICS_SYSTEM_DESIGN.md`
- Localytics docs: `https://docs.localytics.com/`

UI surfaces:

- `/dashboard-cms/metrics`
- `/dashboard-cms/metrics/integrations`

Quick verify (operator-safe):

```bash
# app inventory (provider-backed integrations)
curl -s http://localhost:3000/api/endpoints | jq '.endpoints[] | select(.family=="metrics")'

# localytics ingestion pipeline
npx tsx scripts/metrics/localytics_list_apps.ts --org-code hlt --json
```

## Drift Repair Rule (Tool call_spec)

If a tool has historical revisions with `call_spec` but the current revision does not, runtime behavior and readiness checks degrade.

Repair migrations:

- `database/092-repair-system-tool-call-spec-current-revisions.sql`
- `database/093-cloudinary-tool-call-spec-and-secret-pointers.sql`

Apply:

```bash
pnpm db:sql:apply -- --from 92 --to 93
```

Verify:

```bash
pnpm db:psql -- -c "
with tr as (
  select re.code, er.content_json
  from registry_entities re
  left join entity_revisions er on er.id = re.current_revision_id
  where re.org_id = system_org_id()
    and re.entity_type='tool'
    and re.version='v1'
)
select count(*) as tools_missing_call_spec
from tr
where tr.content_json is null
   or not (tr.content_json ? 'call_spec');
"
```

Expected outcome after repair: `tools_missing_call_spec = 0` for tools that already had any historical `call_spec`.
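The Cloudinary `transformation` strings shown in the payload examples earlier in this runbook (e.g. `f_auto,q_auto,w_1200,c_limit`) are comma-separated `key_value` directives following Cloudinary's URL-transformation convention. A minimal illustrative parser, not a repo module:

```typescript
// Illustrative parser for a comma-separated Cloudinary transformation
// string such as "f_auto,q_auto,w_1200,c_limit". This is a sketch of the
// convention, not the repo's or Cloudinary's implementation.
function parseTransformation(spec: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const part of spec.split(",")) {
    const idx = part.indexOf("_");
    if (idx === -1) continue; // skip malformed segments defensively
    // key is the prefix before the first underscore; the rest is the value
    out[part.slice(0, idx)] = part.slice(idx + 1);
  }
  return out;
}

console.log(parseTransformation("f_auto,q_auto,w_1200,c_limit"));
// { f: "auto", q: "auto", w: "1200", c: "limit" }
```

Splitting on the first underscore only matters because values like `c_limit` contain no further separators while others (e.g. a named transformation) could.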
---

## Source: docs/references/mcp/MCP_IMPORT_NORMALIZATION.md

# MCP Import & Placement Normalization

This spec defines **where MCP configurations live** and how to stage external MCP servers without polluting the repo.

---

## 1) Canonical Placement

**Project MCP config**

- `.mcp.json` at repo root
- Contains only **project-scoped** server definitions
- Never commit secrets (auth happens out-of-band)

**Codex MCP config**

- `.codex/config.toml` (project-scoped) or `~/.codex/config.toml` (global)
- Codex does **not** read `.mcp.json`

**Claude hooks config**

- `.claude/settings.json`
- Hook scripts live under `.claude/hooks/`

---

## 2) Staging External MCP Servers

When evaluating third-party MCP servers, stage them under:

```
docs/runbooks/mcp/servers/<source>/<server-slug>/
  README.md
  config.example.json (or config.example.yaml)
  notes.md
```

Rules:

- **No credentials** in staged configs.
- Keep a short `notes.md` describing use-case, risks, and status (staged/curated).
- Only promote to `.mcp.json` once validated.

---

## 3) Promotion Rules (Staged → Project Config)

Before adding to `.mcp.json`:

1. Confirm the server is needed for a near-term use-case.
2. Validate tool list and scope.
3. Record provenance (source repo + version/commit).
4. Add a short entry to `docs/references/mcp/MCP_SERVER_CATALOG.md`.

---

## 4) Naming Conventions

- MCP server keys should be **lowercase** and **dash-separated**.
- Use vendor prefixes only if needed to avoid collisions.

Examples:

- `supabase`
- `playwright`
- `openapi-to-mcp`

---

## 5) Hooks + Commands Placement

- **Hooks**: `.claude/hooks/` (Claude-specific)
- **Commands**: `.claude/commands/` if we adopt a slash-command pattern
- **Deterministic scripts**: `scripts/` (registry sync, import/export, evals)

---

## 6) Invariants

- `.mcp.json` is the only project MCP config.
- No secrets committed to repo.
- Staged MCP servers must be documented before activation.
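The naming convention in section 4 (lowercase, dash-separated server keys) can be expressed as a one-line check. This helper is illustrative only, not part of the repo's lint tooling:

```typescript
// Illustrative validator for the MCP server-key convention above:
// lowercase alphanumeric segments separated by single dashes,
// e.g. "supabase", "playwright", "openapi-to-mcp".
function isValidMcpServerKey(key: string): boolean {
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(key);
}

console.log(isValidMcpServerKey("openapi-to-mcp")); // true
console.log(isValidMcpServerKey("OpenAPI_to_MCP")); // false
```

A check like this could run in a pre-promotion lint so badly named keys never reach `.mcp.json`.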
---

## Source: docs/references/NOMENCLATURE.md

# Nomenclature (Canonical Terms)

This document defines **the official vocabulary** for the Catalyst system. Use these terms consistently in code, docs, and UI. If a term is not listed here, add it before using it broadly.

For classification and placement questions such as "should this be a skill, KB, bundle, playbook, artifact, or link?", use the [Atomic Unit Decision Matrix](./atomic-units/DECISION_MATRIX.md). This file stays glossary-first on purpose.

---

## Atomic Units (Registry Entities)

- **skill** — a reusable procedure/SOP (SKILL.md); **a protocol that says "do this, then that."**
- **tool** — an executable capability (API/tool call). Tools define _can do_.
- **kb** — long-form reference context; an **"instant-expert packet."** Informs decisions but does **not** prescribe steps.
- **prompt** — short directive or request fragment (system/task/format/rubric). In practice, this is **the request** plus any immediate reference context.
  - **DB entity_type:** `prompt`
  - **Link type:** `uses_prompt`
- **schema** — JSON shape contract for inputs/outputs.
- **style** — formatting overlay rules (no schema changes).
- **bundle** — a curated collection of any units (unordered).
- **recipe** — preset binding of schema + style + constraints.
- **content_type** — UI preset, not a schema fork.
- **agent** — persona + preferences; **does not define capabilities.** Agents bias selection and tone.
- **playbook** — ordered multi-step pattern (soft guidance).
- **channel** — delivery surface constraints (not schema).
- **rubric** — evaluation criteria.
- **metric** — measurement definition.
- **lint_rule** — validation rule.
- **lint_ruleset** — bundle of lint rules.
- **agent_doc** — runtime documentation attached to agents. Operating instructions, memory, and personality definitions consumed during agent context assembly.
- **operational_log** — process records and operational notes.
  Audit trails, session logs, and decision records tracked for transparency and debugging.
- **pattern** — reusable architectural pattern or technique (e.g., "Two-Phase RAG", "Orchestrator-Subagent"). Documents a proven approach for agent behavior or system design.
- **hub** — domain front door that organizes related entities via `recommends` links. Each hub represents a domain (social, article, marketing, etc.) and points agents to the best tools, skills, and KBs for that domain.
- **content asset** — a produced instance (article, image, video, page, deck, etc.), not automatically a reusable atomic unit.
- **memory / log** — operational continuity or trace context; promote to KB only when it becomes durable, reusable truth.

---

## Derived / Operational Units (Non-`entity_type`)

- **action** — a runnable "push-button" entry point surfaced on the CMS Launchpad and the command palette.
  - **Canonical storage (v1):** **not** a new registry `entity_type`. An Action is a **featured `playbook`** with an Action contract in its `entity_revisions.content_json` and curated via:
    - `surface:cms-launchpad`
    - and/or membership in `bundle:launchpad-core@v1`
  - **UI label:** "Action" (alias of `playbook` in Launchpad contexts)
  - **Typed ref:** still `playbook:code@vN` (Actions must remain exportable and linkable as playbooks)
- **automation** — a scheduled trigger for an Action.
  - **Canonical storage (planned):** operational tables (e.g. `automations`, `automation_runs`) plus a `runs` trace per dispatch.
  - **Invariant:** every automation execution must be replayable via its linked run trace (no "silent cron" behavior).

---

## Export / Packaging Terms

- **mirror** — filesystem projection of canonical registry data (e.g., SKILL.md exports).
- **pack** — a portable export bundle (JSON or file tree) meant for reuse.
- **manifest** — selection intent for a pack (tags/types/filters).
- **lockfile** — resolved set of exact versions/hashes for a pack.
- **unit package contract** — shared envelope for all unit types (see `docs/atomic-units/SHARED_CONTRACT.md`).
- **unit.json** — shared metadata file in the unit package contract (tags, status, provenance, links).
- **artifact** — any non-entrypoint file in a unit (rules, references, templates, tests, assets).
- **context bundle** — compiled set of KB/skills for a single run or prompt (token-budgeted).
- **plugin** — generated distribution surface for outside runtimes; not a canonical atomic unit type.

## Standard Operations (Verbs)

These verbs are intentionally reused across atomic unit types to reduce cognitive load. The unit type determines the surface (mirror vs pack), not the verb itself.

- **sync** — reconcile a canonical source to a portable surface.
  - Examples: DB → filesystem mirror (`skill`, `kb`), DB → JSON pack (`tool`, `prompt`, `bundle`).
- **refresh** — recompute derived metadata that is not canonical.
  - Examples: KB `length.tokens_est`, derived YAML headers, generated aliases.
- **index** — regenerate deterministic discovery indexes (manifests).
  - Example: `.claude/*/curated/manifest.json` from a recursive `**/unit.json` scan.
- **lint** — validate the unit package contract + taxonomy coverage (prevent drift).
- **export** — produce a portable distribution artifact (mirror snapshot or pack).
- **import** — ingest external packs into staged surfaces with provenance (then curate → canonicalize).
- **check-drift** — run a no-write verification that the mirror/pack matches expectations.
- **prune** — optionally remove orphaned files from portable surfaces after a successful sync.
## Priority Tier

- **tier 1** — exceptional / flagship
- **tier 2** — excellent
- **tier 3** — strong
- **tier 4** — good
- **tier 5** — average / solid
- **tier 6** — below average
- **tier 7** — weak / niche
- **tier 8** — poor
- **tier 9** — raw / unpolished
- **tier 10** — bottom / deprecated

---

## Naming + Derivation (Single Source of Truth)

- **name** — canonical human label for a unit. This is the **only** name you enter.
- **code** — derived slug from `name` (lowercase, hyphenated, punctuation removed).
- **aliases** — derived search variants from `name` (no manual entry).

**Code invariants (enforced for new records):**

- **No** `entity_type:` prefix or any `:` in the code
- **No** entity-type prefixes (`skill-`, `kb-`, `tool-`, etc.)

**Format guidance (by unit type):**

- **KBs + skills:** lower kebab-case (`^[a-z0-9]+(-[a-z0-9]+)*$`)
- **Tools/schemas/others:** lowercase slugs may include `_` or `.` when those are part of provider IDs (e.g., `audio_script_v1`, `fal.image_generate`). Keep them lowercase and colon-free.

Legacy colon-prefixed codes were migrated in Phase 01-05 (ref invariant migration), and the DB now enforces colon-free codes going forward.

**Guideline (not a rule):** keep `name` descriptive and plain-English so external orchestrators can route correctly.

**System prefix handling:** when `system` is set (e.g., `HLT Corp Infra`), the system prefix in `name` may be **omitted** from `code` to keep domain-first slugs clean and short. The system label provides disambiguation.

### Length Signal

- **length.tokens_est** — preferred token estimate for a KB (or long artifact).
- Use tokens for routing/bundling decisions; keep the value derived from `unit.json`.
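The name-to-code derivation described above (lowercase, hyphenated, punctuation removed, colon-free) can be sketched as a small normalizer. This is a hedged illustration of the stated invariants, not the repo's actual derivation code, and it does not handle the manual system-prefix omission step:

```typescript
// Hedged sketch of name -> code derivation per the invariants above.
// The real normalizer in the repo may differ; this shows the rules only.
function deriveCode(name: string): string {
  return name
    .toLowerCase()
    .normalize("NFKD")               // fold accented characters where possible
    .replace(/[^a-z0-9]+/g, "-")     // punctuation/spaces/colons -> single hyphen
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

console.log(deriveCode("Write Email!")); // "write-email"
console.log(deriveCode("HLT Corp: Infra")); // "hlt-corp-infra" (colon-free)
```

Note that the doc's example (`"HLT Corporate CMS — Question Schema"` → `cms-schema-question`) also drops the system prefix, which is a curation decision on top of mechanical slugging, not something a pure normalizer can infer.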
### Code Shape (Guidelines)

- **KBs:** `domain-kind-topic`
  Examples: `writing-guide-clarity`, `cms-schema-question`, `infra-reference-prompt-runner`
- **Skills:** **action-first** (verb-leaning, ergonomic to call)
  Examples: `research-trends`, `write-email`, `validate-qbank`

**Note:** these are routing aids. The identity is the typed ref (`kb:code@v1`), not the folder path.

Example:

- `name`: **"HLT Corporate CMS — Question Schema"**
- `code`: `cms-schema-question` (system prefix omitted)
- `aliases`: `"hlt corporate cms question schema"`, `"hlt cms question schema"`, `"question schema"`

---

## System Labels (Guidelines, Not Rules)

- **system** — optional label for high-level system grouping (used for corporate infrastructure).
  - Suggested values (pick one and stay consistent):
    - `HLT Corp Infra` (preferred)
    - `HLT Corporate Infra`
    - `HLT Official Systems`

**Note:** Not everything needs a system label. Use it when external routing needs precision. Avoid adding HLT prefixes to global reference KBs (e.g., UI/UX design).

---

## Staging + Status

- **staged** — imported raw; not yet curated or ranked.
- **curated** — reviewed, tagged, and ready for discovery.
- **published** — eligible for default ranking in discovery.
- **archived** — hidden from default discovery but retained.
- **deprecated** — visible but flagged with replacement guidance.

---

## Taxonomy

- **tags** — namespaced metadata shared across all units.
- **tag namespace** — `dept:*`, `action:*`, `stage:*`, `audience:*`, `domain:*`, etc.
- **link** — weighted relationship between any two units with a reason.

---

## Discovery + Execution

- **discovery** — ranked menu of candidates (soft bias, no hard gates).
- **inspect** — open a unit to view artifacts/examples/constraints.
- **execute** — run a skill or tool.
- **evaluate** — score outcomes for ranking updates.
- **revise** — update outputs or unit content with versioning.
---

## Graph Relationships (Links + Proclivities)

- **links** — weighted relationships between units (parent, related, often_follows). They shape graph-style discovery.
- **proclivities** — agent-specific preferences (prefer/avoid/default_to/never_use) stored separately; they tune selection without changing global links.

---

## System / Repo Vocabulary

- **canonical** — source of truth (Supabase DB for registry).
- **reference input** — external resources captured for curation (not canonical).
- **factory** — template + generator for creating new units.
- **render/test lab** — environment for previewing outputs and validating schemas.

---

## Naming Rules (Short)

- **code** — derived slug from `name`; never includes the entity type.
- **ref** — typed reference `entity_type:code[@version]`.
- **description** — trigger-grade, short, and activation-friendly.

## Human Naming Methodology

Keep human labels operator-legible and consistent.

- Prefer **job/capability first** in the human name.
- Keep **source/vendor** in provenance, tags, or summary unless it materially changes behavior.
- Use a vendor/runtime in the human name only when it removes a real ambiguity.
- When a row is a runtime-specialized variant, prefer a **purposeful qualifier** over a random vendor prefix.

Good patterns:

- `Skill Creator for Hosted Agents`
- `Firecrawl for Hosted Agents`
- `Nano Banana Pro Image Generation`
- `GitHub Workflow Methodology`

Usually avoid:

- `OpenClaw Skill Creator`
- `OpenClaw Firecrawl Web Scraping`
- source/vendor prefixes added only because the import came from that ecosystem

Rule of thumb:

- if the difference is **runtime specialization**, name the specialization
- if the difference is **provider/model**, name the provider/model
- if the difference is only **source provenance**, keep it out of the human title

---

## When To Update

- Add any new term here **before** using it broadly in docs or UI.
- If a term is ambiguous, include a "not this" clarification.
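The typed-ref shape `entity_type:code[@version]` used throughout this glossary can be parsed with a short helper. This is an illustrative sketch, not a repo module; the regex follows the code-format guidance above (lowercase, dots and underscores allowed for provider IDs, no colons inside the code):

```typescript
// Illustrative parser for typed refs like "kb:cms-schema-question@v1"
// or "tool:fal.image_generate". Assumed helper, not a repo API.
interface TypedRef {
  type: string;
  code: string;
  version?: string;
}

function parseRef(ref: string): TypedRef | null {
  // entity_type : code [@vN] — code is lowercase, may contain . _ -
  const m = /^([a-z_]+):([a-z0-9][a-z0-9._-]*)(?:@(v\d+))?$/.exec(ref);
  if (!m) return null;
  return { type: m[1], code: m[2], version: m[3] };
}

console.log(parseRef("kb:cms-schema-question@v1"));
// { type: "kb", code: "cms-schema-question", version: "v1" }
```

A parser like this makes the "identity is the typed ref, not the folder path" rule mechanically checkable wherever refs appear in link rows or manifests.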
---

## Source: docs/references/operations/FILESYSTEM_BASH_PRINCIPLES.md

# Filesystem + Bash Principle (Reference)

This captures the principle: **agents excel when they can explore structured files with bash**. Use it as a default retrieval strategy before building custom pipelines.

## Core Idea

- Represent domain data as files and folders.
- Give the agent **bash + filesystem** so it can `ls`, `find`, `grep`, and `cat`.
- Let the agent decide what context to load, instead of stuffing everything into prompts.

## Why It Works

- **Native model capability.** LLMs are trained on codebases and CLI usage.
- **Precise retrieval.** `grep` finds exact values, not fuzzy semantic matches.
- **Context minimalism.** Load only the files needed, not everything.
- **Debuggability.** You can see exactly what the agent read and ran.
- **Future-proofing.** As code understanding improves, this approach improves automatically.

## Mapping Examples (Patterns)

- `/customers/<id>/profile.json`, `/tickets/*.md`, `/conversations/*.txt`
- `/documents/uploaded/*`, `/documents/extracted/*`, `/analysis/<doc_id>/*`

## How This Applies Here

- **Repo context** should be a real filesystem and browsed with bash.
- **KB and reference data** can be packaged as files for quick, precise access.
- **Long context** should be a folder, not a giant prompt.

## DB Shortcut (Canonical)

- Use `pnpm db:url` to print the resolved canonical DB URL (supports project-key fallback order).
- Use `pnpm db:psql -- -c "select now();"` for SQL shell commands without re-resolving env keys manually.
- `scripts/lib/db-url.ts` is the single resolver contract used by scripts; prefer reusing it over duplicating env-key logic.

---

This is a **principle**, not a hard gate. Use it to reduce bespoke tooling.

---

## Source: docs/references/README.md

# References Index

`docs/references/**` is the specialist reference lane. Use it when you need setup truth, contracts, or ecosystem notes that are deeper than the main repo front doors.
## Agent Index Surfaces

- `/.well-known/llms.txt`
- `/llms.txt`
- `/llms-full.txt`
- `/api/docs/page.md?path=...`

## Start Here

- `docs/references/contracts/MIRRORS_AND_PACKS.md` — DB vs mirror/export behavior.
- `docs/references/contracts/DB_URL_ENV_CONTRACT.md` — canonical DB URL resolution contract.
- `docs/references/contracts/AGENT_AUTONOMY_DISCOVERY.md` — discovery-first execution policy.
- `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` — canonical quality floor + promotion and type policy.

## Pick This When…

- You need source-of-truth behavior or system boundaries: `docs/references/contracts/README.md`
- You need integration contracts or connector-family guidance: `docs/references/integrations/README.md`
- You need Supabase auth setup: `docs/references/supabase/SUPABASE_AUTH_SETUP.md`
- You need skill ecosystem or governance references: `docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md`
- You need repo operations guidance: `docs/references/operations/FILESYSTEM_BASH_PRINCIPLES.md`

## Folders

### `ai-agents/`
AI runtime notes, SDK integration contracts, rerank behavior, and agent-pattern references.

### `contracts/`
System contracts and canonical behavioral rules (quality, lifecycle, links, qbank, mirrors, vault/tool execution, platform/system contracts).

### `integrations/`
3rd-party integration contracts and runbooks (Framer, Marketo, Pipedream, Corp CMS, Novu, and tool-family operations).

### `mcp/`
MCP server catalog and import normalization guidance.

### `metrics/`
Canonical metrics system contract and operational design.

### `skills/`
Skill ecosystem notes, normalization standards, QA rubric, governance checklist, and candidate guidance.

### `supabase/`
Supabase auth, edge functions, MCP setup, and vault notes.

### `operations/`
Repo operational references (filesystem/bash principles, git worktrees).

### `security/`
Security notes and hardening references.
### `catalyst_repo_bootstrap/`
Imported bootstrap reference material preserved for provenance.

## Pruned Material

Historical notes, trend captures, and archived UI/reference material are not part of the active repo guidance surface. Keep this lane focused on current references only.

## Usage Rule

- If guidance conflicts, prefer `docs/RULES.md`, then `docs/VISION.md`, then these references.

---

## Source: docs/references/security/SECURITY_NOTES.md

# Security Notes — Rendered HTML (`dangerouslySetInnerHTML`)

> Last reviewed: 2026-03-01

## Overview

This document tracks every `dangerouslySetInnerHTML` usage in the Katailyst application code, the sanitization chain that protects it, and the risk level.

## Sanitization Chain: `renderMarkdown()`

All markdown → HTML rendering flows through `lib/render/markdown-renderer.ts`. Its safety model:

1. **Raw HTML escaping** — `renderer.html` calls `escapeHtml()` on every raw HTML token. Embedded `<script>`, `<iframe>`, etc. are rendered as literal text, never as DOM elements.
2. **Link URL sanitization** — `sanitizeLinkUrl()` allows only `#`, relative paths, `http(s):`, `mailto:`, and `tel:` protocols. `javascript:`, `data:`, `vbscript:`, and `blob:` are all blocked. Unsafe links degrade to plain text (no `<a>` tag emitted).
3. **Image URL sanitization** — `sanitizeImageUrl()` allows only relative paths and `http(s):`.
4. **Attribute minimization** — The renderer only emits attributes it generates itself (`href`, `src`, `alt`, `title`, `class`, `data-md-line`, `rel`). No user-supplied attributes pass through.
5. **Rel attributes** — All links get `rel="nofollow noopener noreferrer"`.
6. **No DOMPurify dependency** — Safety is built into the renderer, avoiding jsdom/DOMPurify compatibility issues in Vercel Node runtimes. See comments in `markdown-renderer.ts` L12–18.
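The link-sanitization policy in step 2 can be sketched as follows. This is NOT the repo's `sanitizeLinkUrl()` implementation, only an illustration of the stated allow-list (`#`, relative paths, `http(s):`, `mailto:`, `tel:`) and the degrade-to-plain-text behavior for everything else:

```typescript
// Sketch of the sanitizeLinkUrl policy described above — illustrative
// only; the real implementation lives in lib/render/markdown-renderer.ts.
// Returns the URL if allowed, or null (caller emits plain text, no <a>).
function sanitizeLinkUrlSketch(url: string): string | null {
  const trimmed = url.trim();
  // Fragments and explicit relative paths are always allowed.
  if (
    trimmed.startsWith("#") ||
    trimmed.startsWith("/") ||
    trimmed.startsWith("./") ||
    trimmed.startsWith("../")
  ) {
    return trimmed;
  }
  // Detect a scheme prefix; "/" is excluded from the class, so bare
  // relative paths like "docs/README.md" fall through as protocol-less.
  const proto = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(trimmed);
  if (!proto) return trimmed;
  const allowed = new Set(["http", "https", "mailto", "tel"]);
  return allowed.has(proto[1].toLowerCase()) ? trimmed : null;
}

console.log(sanitizeLinkUrlSketch("javascript:alert(1)")); // null
console.log(sanitizeLinkUrlSketch("https://example.com")); // "https://example.com"
```

Returning `null` rather than an empty href matters: emitting no `<a>` tag at all removes the clickable surface entirely, which matches the "unsafe links degrade to plain text" behavior above.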
## Inventory

| # | File | Lines | Source | Risk | Notes |
|---|------|-------|--------|------|-------|
| 1 | `components/chat/chat-message.tsx` | ~110 | `renderMarkdown(seg.text)` | Medium | Chat message segments — user/AI text rendered as markdown |
| 2 | `components/chat/chat-message.tsx` | ~125 | `renderMarkdown(text)` | Medium | Single-segment chat message fallback |
| 3 | `components/chat/ai-proposal-block.tsx` | ~117 | `renderMarkdown(draft)` | Medium | AI proposal content — editable then re-rendered |
| 4 | `components/content/asset-editor-utils.tsx` | ~84 | `renderMarkdown(markdown)` | Low | Asset preview — user-authored content from DB |
| 5 | `components/registry/artifact-preview.tsx` | ~64 | `renderMarkdown(content)` | Low | Artifact markdown preview |
| 6 | `components/test-lab/render-canvas.tsx` | ~121 | render engine output | Medium | Render canvas — HTML from the full render engine pipeline |
| 7 | `components/evals/pairwise-compare.tsx` | ~463 | render engine output | Low | Pairwise evaluation — side-by-side rendered HTML comparison |
| 8 | `components/evals/pairwise-compare.tsx` | ~486 | render engine output | Low | Same as above (variant B) |
| 9 | `components/ui/chart.tsx` | ~112 | CSS theme variables | None | Controlled `<style>` tag — only chart config values, no user input |

## Risk Assessment

- **Medium risk** (items 1–3, 6): These render content that includes user-authored or AI-generated text. The `renderMarkdown()` sanitization chain is the primary defense. If the sanitizer has a bypass, these are the attack surface.
- **Low risk** (items 4–5, 7–8): Content from the database/render pipeline, authored by authenticated users. The same sanitization chain applies.
- **No risk** (item 9): CSS variables generated from chart configuration — no user-controlled strings.

## Maintenance Rules

1. **Never bypass `renderMarkdown()`** — If you need to render HTML from markdown, always use this function.
   Do not use `marked()` directly without the custom renderer.
2. **Never render raw user input** — If content is not markdown, escape it with `escapeHtml()` before injecting.
3. **Add a `// SECURITY:` comment** to every new `dangerouslySetInnerHTML` usage explaining the sanitization chain.
4. **Update this document** when adding or removing `dangerouslySetInnerHTML` instances.

---

## Source: docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md

# Skill Factory Governance Checklist

Status: Active
Last updated: 2026-02-19
Used by: Phases 11-15 (authoring, promotion, export, and ops)

## How to Use

Apply this checklist before any of these actions:

1. Promote a staged skill to curated/published.
2. Export skill/plugin/pack artifacts for sharing.
3. Accept AI-generated enrichment suggestions at scale.

## Enforcement Posture (Guidance-First)

1. `warn` by default for quality/taxonomy drift and incomplete enrichment.
2. `block` only for critical compatibility/safety/export-break classes.
3. Every warning/block must include actionable remediation text and a next command.
4. Avoid hard-routing language in shared units unless safety/compliance requires it.
## Policy-to-Check Mapping (Executable)

| Policy ID | Class | Severity | Enforcement Source | Failure Output (minimum) | Runbook |
|---|---|---|---|---|---|
| `compat.frontmatter` | Portable core contract | block | `python3 scripts/registry/lint_unit_packages.py --strict` | Missing/invalid `name`/`description` in launcher frontmatter | `docs/runbooks/factory/import-normalization.md` |
| `compat.profile` | Profile compatibility | block | `npx tsx scripts/distribution/export_plugin.ts --check` + `npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin` | Profile mismatch and exact profile id (`agent_skills_standard`, `plugin_portable`, etc.) | `docs/runbooks/factory/promotion-rollback.md` |
| `compat.skill_naming` | Skill portability naming | block | Compatibility profile checks + `python3 scripts/registry/lint_unit_packages.py --strict` | Non-kebab strict-surface skill codes (block) and reserved-token findings (`anthropic`, `claude`) surfaced per-profile severity | `docs/references/contracts/AGENT_SKILLS_COMPATIBILITY_CONTRACT.md` |
| `taxonomy.required` | Discovery/tag hygiene | warn | `python3 scripts/registry/lint_unit_packages.py --strict` | Missing required namespaces and suggested tags | `docs/runbooks/factory/import-normalization.md` |
| `links.typed-ref` | Graph/link integrity | warn | Factory commit post-checks (`factory_commit.post_commit_checks` + link persistence summary) | Unresolved typed refs and invalid link rows | `docs/runbooks/factory/import-normalization.md` |
| `quality.fixtures` | Eval readiness | warn | `python3 scripts/registry/lint_unit_packages.py --strict` + eval loops | Missing tests/fixtures and suggested starter fixture path | `docs/runbooks/factory/optimization-ab-validation.md` |
| `quality.fixture_format` | Fixture readability | warn | `python3 scripts/registry/lint_unit_packages.py --strict` | Tests fixture content appears escape-encoded (literal `\\n`) instead of multiline markdown/text | `docs/runbooks/factory/optimization-ab-validation.md` |
| `quality.overconstraint` | Reuse/composability | warn | `npx tsx scripts/registry/audit/audit_atomic_e2e.ts` | Overly narrow/hard-lock phrasing with suggested rewrite framing | `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` |
| `dedupe.overlap_evidence` | Duplicate hygiene | warn | `docs/references/contracts/ATOMIC_DUPLICATE_DETECTION_POLICY.md` + remediation reports | Missing overlap rationale, migration notes, or provenance for `alternate`/`supersedes` actions | `docs/references/contracts/ATOMIC_DUPLICATE_DETECTION_POLICY.md` |
| `dedupe.rollback_loop` | Duplicate hygiene | block | Promotion review + rollback runbook | Supersede/remediation action without explicit rollback target and owner | `docs/runbooks/factory/promotion-rollback.md` |
| `safety.secrets` | Safety/compliance | block | lint + review scripts in export/check pipeline | Secret leakage indicators and offending path | `docs/runbooks/factory/incident-response-failed-runs-exports.md` |
| `export.roundtrip` | Distribution integrity | block | `npx tsx scripts/distribution/export_registry_packs.ts --check` (+ optional roundtrip validator) | Export or import-back integrity failure details | `docs/runbooks/factory/promotion-rollback.md` |
| `ops.rollback-ready` | Operational resilience | warn | Decision packet + promotion action context | Missing rollback target, owner, or monitoring note | `docs/runbooks/factory/promotion-rollback.md` |

## Actionable Failure Output Contract

Every governance failure should emit:

1. `policy_id`
2. `severity` (`warn` or `block`)
3. `reason` (single-sentence plain language)
4. `evidence` (ref/path/profile/check name)
5. `next_command` (copy-paste command)
6. `runbook_path` (exact repo path)

Example shape:

```json
{
  "policy_id": "compat.profile",
  "severity": "block",
  "reason": "plugin_portable profile reported export-breaking compatibility errors",
  "evidence": {
    "ref": "skill:new-skill-code@v1",
    "profile_id": "plugin_portable",
    "check": "export_plugin --check"
  },
  "next_command": "npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin",
  "runbook_path": "docs/runbooks/factory/promotion-rollback.md"
}
```

## A. Compatibility Gate

- [ ] `SKILL.md` exists and has valid frontmatter.
- [ ] frontmatter includes `name` and `description`.
- [ ] folder/name parity is valid for the selected portability profile.
- [ ] launcher remains concise and not overloaded with deep references.

## B. Taxonomy + Discovery Gate

- [ ] required tag namespaces are present for unit type/status.
- [ ] tags are lowercase and correctly namespaced.
- [ ] links include type/weight/reason where applicable. - [ ] no orphaned discovery-critical metadata. ## C. Provenance + Traceability Gate - [ ] source/provenance metadata is present and accurate. - [ ] transformation history is auditable (import -> normalize -> review -> promote). - [ ] major AI suggestions retain rationale and decision outcome. ## D. Quality + Test Gate - [ ] tests/fixtures exist for promoted skills. - [ ] expected outputs or evaluation fixtures are present where required. - [ ] no obvious dead-end stubs (empty/placeholder-only artifacts). - [ ] quality rubric checks pass or documented exception approved. - [ ] language is guidance-first and composable (no unjustified hard-lock phrasing). - [ ] evolving best-practice claims are source-backed (internal or external references). ## E. Security + Safety Gate - [ ] no secrets hardcoded in skill artifacts/scripts. - [ ] scripts/hooks are non-destructive by default. - [ ] external imports reviewed before enablement. - [ ] destructive behavior requires explicit operator acknowledgment. ## F. Export Integrity Gate - [ ] export command check mode passes. - [ ] plugin validation passes if plugin surface used. - [ ] round-trip import-back integrity passes for release candidates. - [ ] generated manifest/provenance attached to export payload. ## G. Ops and Rollback Gate - [ ] promotion decision is evidence-backed (evals/fixtures/manual review). - [ ] rollback path is identified and documented. - [ ] known risks are recorded in decision log. - [ ] post-change monitoring plan exists (at least one KPI and owner). ## H. Lifecycle Evidence Floor (Phase 26.5) - [ ] `staged` promotions meet baseline evidence in `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` (`name/code/summary/tags/ref hygiene`). - [ ] `curated` promotions include typed links with reasons plus minimum depth by entity type. - [ ] `published` promotions include cross-type linkage depth, artifact/reference evidence, and audit clean pass. 
- [ ] Overlap remediation (`alternate`/`supersedes`) includes explicit evidence packet + migration note + rollback target. ## Supersede Rollback Loop (Required) For any `supersedes` action: 1. record predecessor and successor refs, 2. document migration note and owner, 3. define rollback target (`restore previous canonical route`), 4. validate discovery/traverse behavior after change, 5. keep replaced entities recoverable (no destructive delete in the same wave). ## Required Command Bundle (Reference) ```bash python3 scripts/registry/lint_unit_packages.py --strict npx tsx scripts/registry/sync/sync_skills_from_db.ts --check npx tsx scripts/distribution/export_plugin.ts --check npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin npx tsx scripts/distribution/test-plugin-smoke.ts --plugin-dir .claude-plugin npx tsx scripts/distribution/export_registry_packs.ts --check npx tsx scripts/registry/audit/audit_registry_graph.ts --report docs/reports/registry-graph-audit-latest.json npx tsx scripts/registry/audit/audit_atomic_e2e.ts --report docs/reports/atomic-e2e-audit-latest.json pnpm lint pnpm typecheck ``` ## Decision Log (Fill Per Promotion Batch) - Date: - Operator/Agent: - Scope (refs): - Compatibility profile: - Checklist exceptions (if any): - Evidence links: - Rollback plan: - Post-release checks: ## Friction Tuning Metrics (Track Weekly) 1. Warning noise rate (`warning` checks / total checks) 2. Override rate (promotion overrides / total promotions) 3. Median remediation time (queue -> resolved follow-up) 4. Open remediation backlog count If friction rises while quality is stable, tune prompts/check suggestions before adding new hard blocks. --- ## Source: docs/references/supabase/SUPABASE_AUTH_SETUP.md # Supabase Auth Setup (Email + OAuth) Use this runbook to get password auth, Google OAuth, and GitHub OAuth working reliably in this repo. 
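Before touching the dashboard settings, it helps to pin down the one moving part in this setup: the origin used for auth redirects. Auth actions in this app prefer `NEXT_PUBLIC_SITE_URL` when set and otherwise infer the origin from request headers. A minimal sketch of that resolution rule — the helper names are illustrative, not functions from this repo:

```typescript
// Hypothetical helpers mirroring the redirect-origin rule described in
// this runbook: prefer NEXT_PUBLIC_SITE_URL when set, otherwise fall
// back to the origin inferred from the incoming request.
export function resolveSiteUrl(
  envSiteUrl: string | undefined,
  requestOrigin: string,
): string {
  const base =
    envSiteUrl && envSiteUrl.trim() !== '' ? envSiteUrl : requestOrigin
  // Strip trailing slashes so callers can append paths safely.
  return base.replace(/\/+$/, '')
}

// The OAuth callback URL that must appear in the Supabase redirect
// allow-list (see URL Configuration below).
export function authCallbackUrl(
  envSiteUrl: string | undefined,
  requestOrigin: string,
): string {
  return `${resolveSiteUrl(envSiteUrl, requestOrigin)}/auth/callback`
}
```

With `NEXT_PUBLIC_SITE_URL=http://localhost:3005`, `authCallbackUrl` yields `http://localhost:3005/auth/callback` regardless of the request origin; unset it on a preview deploy and the preview's own origin is used instead — which is why every preview URL also needs an entry in the redirect allow-list.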
## What This App Expects - App callback route: `/auth/callback` - Password reset route: `/auth/reset-password` - OAuth providers currently exposed in UI: `google`, `github` - Auth actions read `NEXT_PUBLIC_SITE_URL` when set, and otherwise infer origin from request headers. ## 1. App Env (local + deployed) Set in `.env.local` (and deployment env): ```bash NEXT_PUBLIC_SITE_URL=http://localhost:3005 NEXT_PUBLIC_SUPABASE_URL=... NEXT_PUBLIC_SUPABASE_ANON_KEY=... ``` Production should use your real app domain: ```bash NEXT_PUBLIC_SITE_URL=https://your-app-domain.com ``` ## 2. Supabase URL Configuration In Supabase Dashboard: - `Authentication` -> `URL Configuration` - Set **Site URL** to your primary app URL. - Add redirect URLs including: - `http://localhost:3005/auth/callback` - `https://your-app-domain.com/auth/callback` - any preview URLs you use ## 3. Enable OAuth Providers In Supabase Dashboard: - `Authentication` -> `Providers` -> enable **Google** - `Authentication` -> `Providers` -> enable **GitHub** - For each provider: - add provider `client_id` and `client_secret` - configure provider-side callback URL: - `https://<your-project-ref>.supabase.co/auth/v1/callback` ## 4. Email Provider (SMTP) For production-grade email delivery: - `Authentication` -> `Email` - Configure custom SMTP provider (for example Resend/Postmark/SES) - Set sender identity (`from` address/domain) with SPF/DKIM as required by the provider - Keep email confirmation and reset templates aligned with your app routes ## 5. Verify Readiness Run: ```bash pnpm auth:readiness -- --strict ``` This checks: - Supabase Auth settings endpoint (`/auth/v1/settings`) - Supabase Management API auth config (`site_url`) when `SUPABASE_MANAGEMENT_TOKEN` (or `SUPABASE_ACCESS_TOKEN`) is set - enabled providers (`google`, `github`, etc.) 
- presence of app and Supabase site URLs - user confirmation stats and identity providers in `auth.*` tables ## Notes - For local smoke tests where signup email confirmation is inconvenient, use: - `npx tsx scripts/ops/bootstrap_prod_smoke_user.ts --write-dotenv` - Do not commit secrets. Keep provider secrets in Supabase/Auth provider consoles and local/deployment env only. --- ## Source: docs/references/vercel-ai-sdk-v6-mcp-patterns.md # Vercel AI SDK v6 & @ai-sdk/mcp — Sidecar System Technical Reference **Last Updated:** 2026-03-30 **Source:** Official Vercel AI GitHub documentation, ai-sdk.dev, cookbook examples **Focus:** Production-documented patterns, not hypotheticals --- ## Overview This reference documents every pattern required for the sidecar system, extracted directly from Vercel AI SDK v6 official documentation. All code examples are production-ready and tested patterns from the canonical sources. ### Key Principles - **Transport Flexibility:** MCP clients support stdio, HTTP, and SSE transports with automatic tool discovery - **Multi-Step Tool Orchestration:** Control agent behavior with `stepCountIs()` and dependent tool calling - **Streaming-First Architecture:** All patterns use streaming for real-time UI updates - **Error Recovery:** Explicit error handling with tool-specific error types and repair mechanisms - **Lifecycle Management:** Proper client startup/shutdown with cleanup handlers --- ## 1. 
createMCPClient API Surface ### 1.1 Stdio Transport (Local Server) ```typescript import { createMCPClient } from '@ai-sdk/mcp' import { Experimental_StdioMCPTransport } from '@ai-sdk/mcp' const client = await createMCPClient({ transport: new Experimental_StdioMCPTransport({ command: 'node server.js', // Optional: // args: ['--port', '3000'], // env: { CUSTOM_VAR: 'value' }, }), }) // Retrieve tools const tools = await client.tools() // Access tool metadata for (const [toolName, tool] of Object.entries(tools)) { console.log(`Tool: ${toolName}`) console.log(` Parameters: ${JSON.stringify(tool.parameters)}`) } // Critical: Always close on shutdown if (client) { await client.close() } ``` **Transport Type Notes:** - Stdio transport spins up a subprocess - Server process receives MCP protocol over stdin/stdout - Automatic reconnection on failure (exponential backoff default) - Tool discovery happens immediately on connection --- ### 1.2 HTTP Transport (Remote Server) ```typescript import { createMCPClient } from '@ai-sdk/mcp' const httpClient = await createMCPClient({ transport: { type: 'http', url: 'http://localhost:3000/mcp', headers: { Authorization: 'Bearer YOUR_TOKEN', 'Custom-Header': 'value', }, // OAuth support authProvider: async () => ({ type: 'oauth_client_credentials', client_id: process.env.CLIENT_ID, client_secret: process.env.CLIENT_SECRET, scope: 'mcp:*', }), }, }) const tools = await httpClient.tools() await httpClient.close() ``` **HTTP Transport Characteristics:** - Stateless connection via standard HTTP - Supports OAuth client credentials for authentication - Headers passed with every request - Ideal for cloud-hosted MCP servers - Timeout handling built-in --- ### 1.3 Server-Sent Events (SSE) Transport ```typescript const sseClient = await createMCPClient({ transport: { type: 'sse', url: 'http://localhost:3000/sse', // Optional auth headers: { Authorization: 'Bearer ...' 
}, }, }) const tools = await sseClient.tools() await sseClient.close() ``` **SSE Transport Characteristics:** - Maintains persistent connection for server-push updates - Simpler than websockets for one-directional server updates - Automatic reconnection on disconnect - Lightweight streaming protocol --- ### 1.4 Tool Filtering & Metadata ```typescript // All tools from a client const allTools = await client.tools() // Accessing tool schema const getWeatherTool = allTools['getWeather'] if (getWeatherTool) { // Type inference works automatically // parameters schema available at getWeatherTool.parameters console.log(getWeatherTool.parameters) } // Tool description and schema const toolWithMetadata = allTools['myTool'] console.log({ name: 'myTool', description: toolWithMetadata.description, // From MCP server schema inputSchema: toolWithMetadata.parameters, }) ``` **Key Points:** - Tool discovery is automatic from MCP server - Schema validation happens at execute time - Tool names must be unique across merged toolsets --- ### 1.5 Reconnection & Server-Down Behavior ```typescript const client = await createMCPClient({ transport: new Experimental_StdioMCPTransport({ command: 'node server.js', }), // Reconnection is automatic, no config needed // Default: exponential backoff with jitter }) try { const tools = await client.tools() // If server goes down during this call, error is thrown // Subsequent calls will attempt reconnection } catch (error) { // Handle connection failure console.error('Failed to fetch tools:', error.message) } // Always clean up await client.close() ``` **Reconnection Behavior:** - Automatic retry with exponential backoff - No manual reconnection required - `tools()` call will fail if server unreachable - `close()` terminates client cleanly --- ## 2. 
streamText Advanced Patterns ### 2.1 Tool Error Handling ```typescript import { streamText, NoSuchToolError, InvalidToolInputError } from 'ai' try { const result = await streamText({ model: openai('gpt-4o'), prompt: 'Use the weather tool to check the forecast', tools: { getWeather: tool({ description: 'Get weather for a location', parameters: z.object({ location: z.string() }), execute: async ({ location }) => { // Tool execution return { temp: 72, condition: 'sunny' } }, }), }, }) } catch (error) { // Specific error type checking if (NoSuchToolError.isInstance(error)) { console.error('Model called a tool that does not exist') // Handle gracefully } else if (InvalidToolInputError.isInstance(error)) { console.error('Model provided invalid input to a tool') // Handle gracefully } else { console.error('Unknown error:', error) } } ``` **Error Types:** - `NoSuchToolError` — Tool not in toolset - `InvalidToolInputError` — Tool input doesn't match schema - Tool execution errors bubble up with context --- ### 2.2 onStepFinish Callback Behavior ```typescript interface Step { // Discriminator: determines which fields are present type: 'tool-call' | 'tool-result' | 'text' // Present for all steps stepNumber: number // For text steps text?: string // For tool-call steps toolCall?: { toolName: string toolCallId: string args: Record<string, unknown> } // For tool-result steps toolResult?: { toolCallId: string result: unknown } } const result = streamText({ model: openai('gpt-4o'), tools: { /* ... 
*/ }, onStepFinish: async (event) => { const { steps, text, toolCalls, toolResults } = event // steps is the complete array of all steps so far for (const step of steps) { if (step.type === 'tool-call') { console.log(`Step ${step.stepNumber}: Tool "${step.toolCall.toolName}" called`) } else if (step.type === 'tool-result') { console.log(`Step ${step.stepNumber}: Tool result received`) } else if (step.type === 'text') { console.log(`Step ${step.stepNumber}: Text generated: ${step.text}`) } } // Log final accumulated state console.log('Current text:', text) console.log('Tool calls:', toolCalls) console.log('Tool results:', toolResults) }, }) ``` **Callback Timing:** - Called after each model step completes - Contains complete `steps` array up to that point - Aggregates tool calls and results - Useful for logging, monitoring, validation --- ### 2.3 stepCountIs Functionality ```typescript import { stepCountIs } from 'ai' const result = streamText({ model: openai('gpt-4o'), prompt: 'Get the weather for New York, then get details about it', tools: { /* ... 
*/ }, // Stop after at most 5 model steps (each step may include tool calls) stopWhen: stepCountIs(5), }) ``` **Step Counting:** - `stepCountIs(n)` — Stop after n model steps - Each step is one model call and may include tool calls and/or generated text (so 2 dependent tool calls plus a final text summary consumes 3 steps) - Allows multi-step tool chains with safety limits - Common pattern: `stepCountIs(5)` gives headroom for 2-3 dependent tools - Dynamic Tool Management ```typescript // Tool merging from multiple MCP clients const stdioClient = await createMCPClient({ transport: new Experimental_StdioMCPTransport({ command: 'node local-server.js', }), }) const httpClient = await createMCPClient({ transport: { type: 'http', url: 'http://remote-mcp.example.com/mcp', }, }) const stdioTools = await stdioClient.tools() const httpTools = await httpClient.tools() // Merge all tools into one toolset const mergedTools = { ...stdioTools, ...httpTools } // Use merged tools in streamText const result = streamText({ model: openai('gpt-4o'), tools: mergedTools, // All tools available to model onFinish: async () => { // Cleanup both clients await stdioClient.close() await httpClient.close() }, onError: async (error) => { // Cleanup on error too await stdioClient.close() await httpClient.close() }, }) ``` **Dynamic Tool Patterns:** - Tools retrieved at request time from MCP servers - Merge multiple tool sources before calling `streamText` - Cleanup all clients in `onFinish` and `onError` - Tool names must be unique across sources --- ### 2.5 Experimental Callbacks: onToolCallStart & onToolCallFinish ```typescript const result = streamText({ model: openai('gpt-4o'), tools: { getLocation: tool({ description: 'Get user location', parameters: z.object({}), execute: async () => ({ latitude: 40.7128, longitude: -74.006, }), }), }, // Called BEFORE tool.execute() runs experimental_onToolCallStart: async (event) => { const { stepNumber, toolCall, messages } = event console.log(`Step ${stepNumber}: About to call "${toolCall.toolName}"`) console.log(`Args: ${JSON.stringify(toolCall.args)}`) // 
Can be used for: // - Pre-execution logging/monitoring // - Validation before tool runs // - UI updates announcing tool usage }, // Called AFTER tool.execute() completes experimental_onToolCallFinish: async (event) => { const { stepNumber, toolCall } = event if (event.success === true) { // Success case console.log(`Step ${stepNumber}: Tool succeeded`) console.log(`Output: ${JSON.stringify(event.output)}`) } else { // Error case: event.success === false console.log(`Step ${stepNumber}: Tool failed`) console.log(`Error: ${event.error.message}`) } // Can be used for: // - Post-execution logging // - Cleanup after tool runs // - State updates based on result }, }) ``` **Callback Characteristics:** - `experimental_onToolCallStart` fires before `tool.execute()` - `experimental_onToolCallFinish` fires after `tool.execute()` completes - `event.success` discriminates between success and error - Useful for UI updates, monitoring, and state synchronization --- ## 3. useChat Client Hooks ### 3.1 Options & Configuration ```typescript import { useChat } from 'ai/react' const { messages, input, handleInputChange, handleSubmit, sendMessage, append, setMessages, reload, stop, isLoading, addToolOutput, } = useChat({ // Endpoint for API calls api: '/api/chat', // Tool execution control // When to auto-send tool outputs back to server sendAutomaticallyWhen: 'lastAssistantMessageIsCompleteWithToolCalls', // Options: 'never' | 'lastAssistantMessageIsCompleteWithToolCalls' // Callbacks onFinish: (message) => { console.log('Message generation finished:', message) }, onError: (error) => { console.error('Chat error:', error) }, // Headers for API requests headers: { Authorization: 'Bearer token', }, // Unique identifier for conversation id: 'conversation-123', }) ``` **Key Options:** - `api` — Backend endpoint (default: `/api/chat`) - `sendAutomaticallyWhen` — Auto-submit tool outputs when last message complete with tools - `onFinish` — Called when message generation completes - `onError` 
— Called on error - `headers` — Custom headers for API calls --- ### 3.2 Programmatic Message Injection & Hotlinks ```typescript const { append, messages } = useChat() // Inject a user message and trigger AI response const handleHotlink = async (topic: string) => { await append({ role: 'user', content: `Tell me about ${topic}`, }) } // Or inject assistant message with tool calls (for testing) await append({ role: 'assistant', content: 'I will get the weather for you.', tool_calls: [ { id: 'call_123', type: 'function', function: { name: 'getWeather', arguments: JSON.stringify({ location: 'New York' }), }, }, ], }) // Add tool result const { addToolOutput } = useChat() await addToolOutput({ tool: 'getWeather', toolCallId: 'call_123', output: { temp: 72, condition: 'sunny' }, }) ``` **Message Injection Patterns:** - `append()` adds messages and triggers generation - Can inject user or assistant messages programmatically - Useful for hotlinks, quick actions, test scenarios - Tool results added via `addToolOutput()` --- ### 3.3 Streaming State Handling ```typescript const { isLoading, messages } = useChat({ onFinish: (message) => { // Stream has completed }, }); // Use isLoading to control UI return ( <div> {isLoading && <LoadingSpinner />} {messages.map((msg) => ( <div key={msg.id}> {msg.role === 'user' ? 
'You: ' : 'AI: '} {msg.content} </div> ))} </div> ); ``` **State Patterns:** - `isLoading` true while message generating - `isLoading` false when generation complete - `messages` updated incrementally as stream arrives - Use for loading indicators and input disable states --- ### 3.4 Client-Side Tool Call Callbacks ```typescript const { addToolOutput } = useChat({ // Called when model wants to invoke a tool onToolCall: async ({ toolCall }) => { if (toolCall.toolName === 'getLocation') { try { // Execute tool client-side const location = await getLocationData() // Add result back to chat addToolOutput({ tool: 'getLocation', toolCallId: toolCall.toolCallId, output: location, }) } catch (err) { // Handle error addToolOutput({ tool: 'getLocation', toolCallId: toolCall.toolCallId, state: 'output-error', errorText: `Failed to get location: ${err.message}`, }) } } else { // Unknown tool, let server handle it // or throw error } }, }) ``` **Client Tool Execution:** - `onToolCall` fires when model invokes a tool - Can execute tools client-side - `addToolOutput()` sends result back - Support for error states with `state: 'output-error'` --- ## 4. 
Multi-Step Tool Calling ### 4.1 Model Decision-Making for Multiple Tools ```typescript const result = streamText({ model: openai('gpt-4o'), prompt: 'First get the user location, then get weather for that location', tools: { getLocation: tool({ description: 'Get current user location', parameters: z.object({}), execute: async () => ({ latitude: 40.7128, longitude: -74.006, }), }), getWeather: tool({ description: 'Get weather for coordinates', parameters: z.object({ latitude: z.number(), longitude: z.number(), }), execute: async ({ latitude, longitude }) => ({ temp: 72, condition: 'sunny', location: `${latitude}, ${longitude}`, }), }), }, // Control step limit stopWhen: stepCountIs(5), onStepFinish: async (event) => { // Log each step for debugging for (const step of event.steps) { console.log(`Step ${step.stepNumber}: ${step.type}`) } }, }) ``` **Multi-Tool Flow:** 1. Model sees both tools in schema 2. First invokes `getLocation` → gets coordinates 3. Then invokes `getWeather` with those coordinates 4. `stepCountIs(5)` prevents infinite loops 5. Each step tracked in `onStepFinish` --- ### 4.2 Step Count Control & Agentic Patterns ```typescript // 2-step chain (e.g., location → weather) const result = streamText({ model: openai('gpt-4o'), tools: { getLocation, getWeather }, stopWhen: stepCountIs(3), // Allows 2 tool calls + 1 text }) // 5+ tool chain (complex workflows) const complexResult = streamText({ model: openai('gpt-4o'), tools: { searchDatabase: tool({ /* ... */ }), fetchDetails: tool({ /* ... */ }), validateData: tool({ /* ... */ }), generateReport: tool({ /* ... 
*/ }), }, stopWhen: stepCountIs(10), // Allows multi-step orchestration onStepFinish: async (event) => { const currentStep = event.steps[event.steps.length - 1] console.log(`Completed step ${currentStep.stepNumber}`) }, }) ``` **Control Patterns:** - `stepCountIs(n)` — Hard limit on tool invocations - Set based on expected tool chain depth - Prevents runaway loops - Common: `stepCountIs(5)` for 2-3 dependent tools --- ### 4.3 Dependent Tool Execution ```typescript const result = streamText({ model: openai('gpt-4o'), // Model will orchestrate this sequence: // 1. Call getLocation → receives coordinates // 2. Call getWeather with those coordinates → receives forecast // 3. Generate text summary prompt: `Get the location, fetch weather for that location, and provide a summary forecast.`, tools: { getLocation: tool({ description: 'Get user location as coordinates', parameters: z.object({}), execute: async () => ({ latitude: 40.7128, longitude: -74.006, }), }), getWeather: tool({ description: 'Get weather for coordinates', parameters: z.object({ latitude: z.number(), longitude: z.number(), }), execute: async ({ latitude, longitude }) => ({ temp: 72, condition: 'sunny', humidity: 65, }), }), }, onFinish: async (event) => { console.log('Final text:', event.text) // All tool results are in the conversation state }, }) ``` **Dependency Resolution:** - Model automatically chains tools (no explicit dependency definition) - Later tool calls receive results from earlier calls - Model decides when all dependencies are satisfied - Conversation state accumulates all results --- ## 5. 
Streaming UI Patterns ### 5.1 Intermediate State Streaming ```typescript import { createUIMessageStream } from '@ai-sdk/ui-utils' import { streamText } from 'ai' export async function POST(request: Request) { const { messages } = await request.json() const result = streamText({ model: openai('gpt-4o'), system: 'You are helpful.', messages, tools: { getWeather: tool({ description: 'Get weather', parameters: z.object({ location: z.string() }), execute: async ({ location }) => ({ temp: 72, condition: 'sunny', }), }), }, }) // Transform stream to UI messages return result.toUIMessageStreamResponse() } ``` **Streaming Benefits:** - UI receives intermediate states in real-time - Progressive rendering while tools execute - Better UX than waiting for complete response - Supported by `toUIMessageStreamResponse()` --- ### 5.2 Custom Data Stream Protocol ```typescript import { toUIMessageStreamResponse } from '@ai-sdk/ui-utils' export async function POST(request: Request) { const { messages } = await request.json() const result = streamText({ model: openai('gpt-4o'), messages, tools: { /* ... */ }, }) // Transform with custom error handling return result.toUIMessageStreamResponse({ onError: (error) => { if (NoSuchToolError.isInstance(error)) { return 'The assistant tried to use an unknown tool.' } else if (InvalidToolInputError.isInstance(error)) { return 'The assistant provided invalid input to a tool.' } else { return 'An error occurred.' } }, // Custom data field (optional) data: { sessionId: 'session-123', metadata: { /* ... */ }, }, }) } ``` **Custom Protocol:** - `toUIMessageStreamResponse()` transforms stream to UI format - `onError` callback for error messaging - Optional `data` field for custom metadata - Stream format: SSE (server-sent events) --- ### 5.3 toUIMessageStreamResponse Behavior ```typescript const stream = streamText({ model: openai('gpt-4o'), messages, tools: { /* ... 
*/ }, }) // Transform the stream for UI consumption const response = stream.toUIMessageStreamResponse({ // Optional: custom error messages onError: (error) => { console.error('Stream error:', error) return `Error: ${error.message}` }, // Optional: metadata to include data: { timestamp: new Date(), source: 'chat-api', }, }) // Response is a Web Response with SSE stream return response ``` **Response Format:** - Returns `Response` object with streaming body - Content-Type: `text/event-stream` - Events contain: `text`, `toolCall`, `toolResult` data - Client-side: consumed by `useChat()` --- ## 6. Error Handling ### 6.1 createMCPClient Error Types ```typescript import { createMCPClient } from '@ai-sdk/mcp' import { Experimental_StdioMCPTransport } from '@ai-sdk/mcp' try { const client = await createMCPClient({ transport: new Experimental_StdioMCPTransport({ command: 'node server.js', }), }) const tools = await client.tools() } catch (error) { // Connection errors if (error.message.includes('ENOENT')) { console.error('Server process not found') } // Communication errors if (error.message.includes('timeout')) { console.error('Server communication timeout') } // Tool discovery errors if (error.message.includes('tools')) { console.error('Failed to discover tools') } } ``` **Error Sources:** - Server process failure (stdio) - Network timeout (HTTP/SSE) - Invalid tool schema from server - Authentication failure --- ### 6.2 MCP Tool Call Timeout Behavior ```typescript const result = streamText({ model: openai('gpt-4o'), tools: { slowTool: tool({ description: 'Slow operation', parameters: z.object({}), execute: async () => { // If this takes too long... 
await new Promise((r) => setTimeout(r, 30000)) // 30 seconds return { result: 'done' } }, }), }, onError: (error) => { // Timeout manifests as tool execution error console.error('Tool error:', error.message) }, }) ``` **Timeout Handling:** - Timeouts from tool execution propagate as errors - Default timeout: typically model-specific (few minutes) - No explicit MCP client timeout config - Handle in `onError` callback --- ### 6.3 User-Facing Error Surfacing ```typescript // API route export async function POST(request: Request) { try { const { messages } = await request.json() const result = streamText({ model: openai('gpt-4o'), messages, tools: { getWeather: tool({ description: 'Get weather', parameters: z.object({ location: z.string() }), execute: async ({ location }) => { // Might throw const data = await fetchWeather(location) return data }, }), }, }) // Transform stream and handle errors return result.toUIMessageStreamResponse({ onError: (error) => { // User-facing error messages if (NoSuchToolError.isInstance(error)) { return 'I tried to use a tool that is not available.' } else if (InvalidToolInputError.isInstance(error)) { return 'I provided invalid information to a tool.' } else if (error.message.includes('weather')) { return 'I could not fetch weather data. Please try again.' } else { return 'Something went wrong. Please try again.' } }, }) } catch (err) { // Server-side error handling console.error('Chat error:', err) return new Response('Internal server error', { status: 500 }) } } // Client-side const { messages, isLoading } = useChat({ onError: (error) => { // Propagate server error to UI setErrorMessage(error.message) }, }) ``` **Error Messaging Strategy:** - Tool-specific errors in `toUIMessageStreamResponse` callback - Generic fallback for unknown errors - Server-side logging for debugging - Client-side display for users --- ## 7. 
Middleware & Providers

### 7.1 Interceptors for Tool Calls

```typescript
import { streamText } from 'ai'

const result = streamText({
  model: openai('gpt-4o'),
  messages,
  tools: {
    getWeather: tool({
      description: 'Get weather',
      parameters: z.object({ location: z.string() }),
      execute: async ({ location }) => {
        // Can intercept at execution level
        console.log(`Tool called with: ${location}`)
        return { temp: 72 }
      },
    }),
  },
  // Intercept via experimental callbacks
  experimental_onToolCallStart: async (event) => {
    const { toolCall, stepNumber } = event
    console.log(`[${stepNumber}] Tool: ${toolCall.toolName}`)
    console.log(`Args: ${JSON.stringify(toolCall.args)}`)
    // Can log, validate, or reject here
  },
  experimental_onToolCallFinish: async (event) => {
    const { toolCall, success } = event
    if (success) {
      console.log(`[${event.stepNumber}] Success: ${JSON.stringify(event.output)}`)
    } else {
      console.log(`[${event.stepNumber}] Error: ${event.error.message}`)
    }
  },
})
```

**Interception Points:**

- `experimental_onToolCallStart` — before tool execution
- `experimental_onToolCallFinish` — after tool execution
- The tool's `execute()` function itself
- Logging, validation, and monitoring opportunities

---

### 7.2 Wrapper Capabilities & Tool Call Repair

```typescript
import { streamText } from 'ai'

const result = streamText({
  model: openai('gpt-4o'),
  tools: { /* ... */ },
  // Repair malformed tool calls
  experimental_repairToolCall: async (options) => {
    const { toolCall, tools, messages } = options

    // Inspect the malformed call
    console.log(`Repairing tool call: ${toolCall.toolName}`)
    console.log(`Invalid args: ${JSON.stringify(toolCall.args)}`)

    // Attempt to fix, or return null to skip
    // Returns LanguageModelV4ToolCall | null

    // Example: if the tool expects a string but got an object
    if (toolCall.toolName === 'getWeather') {
      // Could attempt to extract a location string
      const location = extractLocation(toolCall.args)
      if (location) {
        return {
          ...toolCall,
          args: { location }, // Repaired
        }
      }
    }

    // Return null if the call can't be repaired
    return null
  },
})
```

**Repair Mechanism:**

- `experimental_repairToolCall` is called for invalid tool calls
- Can inspect and attempt to fix arguments
- Returns a repaired `LanguageModelV4ToolCall` or `null`
- Useful for handling model quirks or schema mismatches

---

### 7.3 MCP Sampling Provider for Nested Agents

```typescript
import { createMCPClient } from '@ai-sdk/mcp'
import { MCPSamplingProvider } from '@ai-sdk/mcp'
import { streamText } from 'ai'

// Create MCP client for the sub-agent
const mcpClient = await createMCPClient({
  transport: {
    type: 'http',
    url: 'http://localhost:3001/mcp',
  },
})

// Create sampling provider (nested agent capability)
const samplingProvider = new MCPSamplingProvider({
  client: mcpClient,
  model: openai('gpt-4o'),
})

// Use the sampling provider in the main agent
const result = streamText({
  model: openai('gpt-4o'),
  messages,
  tools: {
    // Regular tools
    getWeather: tool({
      description: 'Get weather',
      parameters: z.object({ location: z.string() }),
      execute: async ({ location }) => ({ temp: 72 }),
    }),
    // Sub-agent access via the sampling provider
    delegateToSubAgent: tool({
      description: 'Ask sub-agent for help',
      parameters: z.object({ question: z.string() }),
      execute: async ({ question }) => {
        // Call the sampling provider (runs the sub-agent)
        const response = await samplingProvider.createMessage({
          messages: [{ role: 'user', content: question }],
        })
        return response.content
      },
    }),
  },
})
```

**Nested Agent Pattern:**

- MCP Sampling Provider enables sub-agents
- Sub-agent runs via the sampling provider
- Main agent orchestrates decisions
- Useful for hierarchical task delegation

---

## 8. Prompt & Resource Management

### 8.1 Tool Schema & Prompt Integration

```typescript
import { streamText } from 'ai'

const systemPrompt = `You are a weather assistant.
You have access to weather and location tools.
Always try to get the location first, then fetch weather for that location.
Be concise in your responses.`

const result = streamText({
  model: openai('gpt-4o'),
  system: systemPrompt,
  messages,
  tools: {
    // Tool schema is automatically included in the prompt
    getLocation: tool({
      description: 'Get current user location',
      parameters: z.object({}),
      execute: async () => ({ latitude: 40.7128, longitude: -74.006 }),
    }),
    getWeather: tool({
      description: 'Get weather for coordinates',
      parameters: z.object({
        latitude: z.number(),
        longitude: z.number(),
      }),
      execute: async ({ latitude, longitude }) => ({
        temp: 72,
        condition: 'sunny',
      }),
    }),
  },
  // The model uses the system prompt + tool schemas to decide:
  // "First get location, then fetch weather"
})
```

**Schema Integration:**

- Tool descriptions are automatically included in the model prompt
- Parameter schemas are converted to JSON Schema
- The system prompt guides tool usage strategy
- The model sees all tool options at once

---

### 8.2 Resource Management & Lifecycle

```typescript
// Startup: create clients and get tools
async function setupChat() {
  const client = await createMCPClient({
    transport: new Experimental_StdioMCPTransport({
      command: 'node server.js',
    }),
  })
  const tools = await client.tools()
  return { client, tools }
}

// Request handling
async function handleChatMessage(userMessage: string) {
  const { client, tools } = await setupChat() // Fresh tools per request

  try {
    const result = streamText({
      model: openai('gpt-4o'),
      messages: [{ role: 'user', content: userMessage }],
      tools,
      onFinish: async () => {
        // Cleanup
        await client.close()
      },
      onError: async () => {
        // Cleanup on error
        await client.close()
      },
    })
    return result.toUIMessageStreamResponse()
  } catch (error) {
    // Ensure cleanup
    await client.close()
    throw error
  }
}
```

**Resource Pattern:**

- Create MCP clients per request (not as a singleton)
- Fetch fresh tool schemas each time
- Close clients in both success and error paths
- Prevents resource leaks and stale tools

---

## 9. Complete Sidecar Integration Example

```typescript
import { createMCPClient } from '@ai-sdk/mcp'
import { Experimental_StdioMCPTransport } from '@ai-sdk/mcp'
import { streamText, stepCountIs, NoSuchToolError, InvalidToolInputError } from 'ai'
import { openai } from '@ai-sdk/openai'

// API endpoint
export async function POST(request: Request) {
  const { messages } = await request.json()

  // 1. Create MCP client
  const mcpClient = await createMCPClient({
    transport: new Experimental_StdioMCPTransport({
      command: 'node sidecar-server.js',
    }),
  })

  // 2. Get tools from the sidecar
  const tools = await mcpClient.tools()

  // 3. Stream text with tool calling
  const result = streamText({
    model: openai('gpt-4o'),
    system: 'You are a helpful assistant with access to various tools.',
    messages,
    tools,
    stopWhen: stepCountIs(5),
    // 4. Monitor tool execution
    experimental_onToolCallStart: async (event) => {
      console.log(`[${event.stepNumber}] Calling: ${event.toolCall.toolName}`)
    },
    experimental_onToolCallFinish: async (event) => {
      if (event.success) {
        console.log(`[${event.stepNumber}] Success`)
      } else {
        console.log(`[${event.stepNumber}] Error: ${event.error.message}`)
      }
    },
    // 5. Cleanup
    onFinish: async () => {
      await mcpClient.close()
    },
    onError: async (error) => {
      await mcpClient.close()
    },
  })

  // 6. Transform the stream with error handling
  return result.toUIMessageStreamResponse({
    onError: (error) => {
      if (NoSuchToolError.isInstance(error)) {
        return 'Tool not available.'
      } else if (InvalidToolInputError.isInstance(error)) {
        return 'Tool input invalid.'
      } else {
        return 'Chat error.'
      }
    },
  })
}
```

---

## Key Takeaways

1. **Transport Flexibility** — Choose stdio, HTTP, or SSE based on server location
2. **Tool Discovery** — Tools fetched from the MCP server at request time
3. **Multi-Step Orchestration** — The model decides the tool sequence; `stepCountIs()` limits depth
4. **Streaming-First** — All patterns built around streaming for real-time UI
5. **Error Handling** — Specific error types for tool validation, user-facing error messaging
6. **Lifecycle Management** — Create clients per request, clean up in `onFinish`/`onError`
7. **Experimental Callbacks** — Use `experimental_onToolCallStart/Finish` for monitoring
8. **Tool Repair** — `experimental_repairToolCall` handles malformed calls
9. **Nested Agents** — MCP Sampling Provider enables sub-agent delegation
10. **Client Integration** — `useChat()` with `onToolCall` for client-side execution

---

## References

- **Vercel AI SDK:** https://ai.vercel.ai
- **MCP Documentation:** https://modelcontextprotocol.io
- **@ai-sdk/mcp:** npm package
- **API Examples:** ai-sdk.dev/docs, cookbook examples

---

## Source: docs/references/vision/2026-03-22-hlt-brand-consolidation-sidecar-plan.md

# HLT Mastery Brand Consolidation & Sidecar Plan

> **Source:** original markdown authored 2026-03-22, archived at `~/Downloads/.archive/2026-04-16-hlt-inputs/HLT Mastery Brand Consolidation & Sidecar Plan/`
> **Related registry entity:** `kb:hlt-brand-consolidation-sidecar-plan-2026-03`
> **Relates to:** [2026-04-12-hlt-healthcare-funnel-blueprint.md](2026-04-12-hlt-healthcare-funnel-blueprint.md) (different surface — funnel vs. brand) · [../../planning/active/2026-04-16-content-factory-v3-zoomout.md](../../planning/active/2026-04-16-content-factory-v3-zoomout.md)

**Date:** 2026-03-22
**Author:** Alec + Claude (planning session)
**Status:** DRAFT — ready for team review
**Scope:** Unify scattered brand assets, build a team-facing brand management surface, wire brand into AI content pipelines

## 1. The Problem (Why This Plan Exists)

HLT has ~100 apps (top 20 matter) under the "HLT Mastery" parent brand. Brand assets are scattered across at least 6 systems:

1. **Catalyst Registry** — 91+ brand-related entities (voice KBs, visual identity, design tokens, channel definitions, bundles)
2. **Google Drive** — Laura's Brand Guidelines doc (March 2026, most current), logo folders, app icon folders, wordmark folders, older brand positioning docs (2019–2021)
3. **Repo (.claude/)** — KB files for per-app brand voice, design skills, design system rules, UI/UX principles
4. **Uploaded files** — HLT Mastery Apps Brand Guide PDF (4 pages), HLT Nursing Style Guide (editorial)
5. **Slack** — Scattered brand discussions, team feedback
6. **Cloudinary** — Asset CDN (cloud: dq9xmts6p), partially configured but app icons not yet uploaded

Meanwhile, multiple team members are working on parallel brand efforts without a shared surface:

- **Laura** — Brand Guidelines Google Doc (visual identity done, writing style section BLANK)
- **Ben** — Skill structure spreadsheet mapping capabilities
- **Cailey** — Needs visual guidelines for design work
- **Emily** — Content creation needing consistent voice

Nobody can see the full picture. Alec is the single point of failure for brand decisions.

## 2. Complete Asset Inventory (What We Found)

### 2.1 Catalyst Registry Entities

| Entity | Type | What It Contains | Status |
| --- | --- | --- | --- |
| kb:hlt-brand-foundation | KB | Master brand foundation (168K chars) — massive, likely needs pruning | Exists, possibly bloated |
| kb:hlt-brand-voice-hlt-mastery-communication-guide | KB | Master communications reference — parent voice for all apps | Exists, large |
| kb:hlt-brand-voice-nclex-rn | KB | Flagship app voice: mentor-to-student tone, messaging architecture, proof patterns, signature lines | Exists, detailed |
| kb:hlt-brand-voice-nclex-pn | KB | NCLEX-PN specific voice variation | Exists |
| kb:hlt-brand-voice-fnp | KB | FNP voice: peer-to-peer collegial tone for advanced practitioners | Exists, detailed |
| kb:hlt-brand-voice-asvab | KB | ASVAB voice: no-nonsense military coaching style | Exists, detailed |
| kb:hlt-brand-voice-content-writing | KB | General content writing voice guidelines | Exists |
| kb:brand-voice-master | KB | OPERATOR-facing voice ("calm senior operator") — NOT customer-facing | Exists, often confused |
| kb:hlt-brand-logo-design-feel | KB | Visual identity with design tokens, Apple-adjacent philosophy | Exists, has COLOR CONFLICT |
| kb:brand-guide-content-style-1 | KB | Content style guide part 1 | Exists |
| kb:brand-guide-content-style-2 | KB | Content style guide part 2 | Exists |
| kb:brand-reference-colors | KB | Color reference | Exists |
| kb:brand-reference-typography | KB | Typography reference | Exists |
| skill:image-prompting | Skill | Image generation linked to Cloudinary + brand voice + content philosophy | Exists |
| bundle:context-brand-voice | Bundle | Groups voice-related KBs for injection into content pipelines | Exists |
| Channels: hlt-nclex-rn, hlt-nclex-pn, hlt-fnp, hlt-teas, hlt-pance, hlt-dat, hlt-asvab | Channel | Apps as channels for per-app routing | Exists (7 of ~100) |

### 2.2 Google Drive Assets

| Asset | Location | Status |
| --- | --- | --- |
| Laura's Brand Guidelines | Doc ID: 1HiaP0lQkl9aNlKNtT78HTzAmDDFT93yOdRbEiPeWwx8 | MOST CURRENT — visual identity complete, writing style BLANK |
| Logo Files | drive/folders/1C5o2tLR2jixxgb-XE-yzAsU8lx9G4xlE | Exists, not inventoried |
| App Icons | drive/folders/1aNQCWty1J4S5nhQOaRgozbBgShmWyXfM | Exists, not in Cloudinary yet |
| Wordmarks | drive/folders/1OIgljP_Kqj2YWRFb7a3Er6UoAFo1XLfQ | Exists, not inventoried |
| Brand Positioning (2020) | Google Doc | "Personalized learning made accessible and captivating" |
| Brand Traits (2019–2021) | Google Docs | Supportive, Accessible, Passionate — detailed definitions |

### 2.3 Repo Files (.claude/ and docs/)

| File/Dir | What It Is |
| --- | --- |
| docs/references/DESIGN_SYSTEM_RULES.md (1,659 lines) | Comprehensive design system — surface hierarchy, transitions, banned patterns |
| docs/references/UI_UX_DESIGN_PRINCIPLES.md (430 lines) | UI/UX principles |
| docs/references/BRAND_VOICE_AND_UX_STRATEGY.md (446 lines) | Brand voice + UX strategy doc |
| .claude/skills/curated/global/design/brand-guidelines/ | Brand guidelines skill |
| .claude/skills/curated/global/design/canvas-design/ | Canvas design skill (12K) |
| .claude/skills/curated/global/design/ui-ux-pro-max/ | Full UI/UX skill (14K) |
| .claude/skills/curated/global/design/ui-polish/ | UI polish skill (12K) |
| .claude/skills/curated/global/design/theme-factory/ | Theme factory skill (3.1K) |
| .claude/kb/curated/org/design/hlt-brand-logo-design-feel/ | Visual identity KB (local mirror) |
| .claude/kb/curated/org/design/brand-reference-colors/ | Color reference KB |
| .claude/kb/curated/org/design/brand-reference-typography/ | Typography KB |
| .claude/kb/curated/org/marketing/hlt-brand-\* | Per-app voice KBs (local mirrors) |

### 2.4 Uploaded Materials

| File | What It Is |
| --- | --- |
| HLT Mastery Apps Brand Guide PDF (23.5MB, 4 pages) | Official brand guide: colors, typography, app icons |
| HLT Nursing Style Guide TXT (20K+ tokens) | Living editorial style guide — APA 7th, serial comma, HTML entities, product-specific rules |

## 3. Critical Findings & Conflicts

### 3.1 COLOR SYSTEM CONFLICT (Must Resolve First)

Two different color systems exist:

| Source | Primary | Accent | Philosophy |
| --- | --- | --- | --- |
| Laura's Doc + Brand Guide PDF | Blue #155EEF | Orange #F79009 | Modern, app-store friendly |
| Catalyst hlt-brand-logo-design-feel | Teal #2B7C8B | Deep Navy #0B2B33 | Apple-adjacent, premium medical |

**Recommendation:** Laura's doc is newer (March 2026) and was explicitly created to be the current standard. The Catalyst visual identity KB likely reflects an older or alternate direction. **Use Laura's colors as canonical, update Catalyst to match.**

### 3.2 Two Different "brand-voice-master" Concepts

- kb:brand-voice-master in Catalyst = operator/internal voice ("calm senior operator" for system operations)
- kb:hlt-brand-voice-hlt-mastery-communication-guide = the actual customer-facing parent brand voice

**Recommendation:** Rename brand-voice-master to brand-voice-operator or system-voice to eliminate confusion. The communications guide is the real brand voice master.
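Once the color decision in 3.1 is confirmed, the canonical palette can be pinned down as a tiny copyable token module so sync scripts and the sidecar share one source. A minimal sketch — the module and token names (`brandColors`, `isHexColor`) are hypothetical, not an existing file; the values are from Laura's doc, with the legacy Catalyst teal kept only as a deprecated alias:

```typescript
// Hypothetical canonical palette module (names illustrative).
// Values: Laura's March 2026 doc + Brand Guide PDF.
const brandColors = {
  primary: '#155EEF', // Blue — canonical
  accent: '#F79009',  // Orange — canonical
  /** @deprecated legacy Catalyst teal — flag in review, do not use in new work */
  legacyTeal: '#2B7C8B',
} as const

// Guard a Catalyst-update script could run over every token
function isHexColor(value: string): boolean {
  return /^#[0-9A-Fa-f]{6}$/.test(value)
}

const allValid = Object.values(brandColors).every(isHexColor)
console.log(`palette valid: ${allValid}`) // prints "palette valid: true"
```

Keeping the deprecated teal as a named alias (rather than deleting it) lets old references fail loudly during the Catalyst update instead of silently diverging.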
### 3.3 Voice Hierarchy Is Strong But Underdocumented

The per-app voices are well-differentiated:

| App | Voice Character | Tone | Audience Relationship |
| --- | --- | --- | --- |
| NCLEX-RN | Mentor who's been there | Confident, encouraging | Mentor → student |
| FNP | Experienced colleague | Collegial, peer-level | Peer → peer |
| ASVAB | Drill-instructor-meets-tutor | No-nonsense, direct | Coach → recruit |
| TEAS | Supportive guide | Warm, structured | Guide → first-timer |

But there's no single document that shows this hierarchy visually or lets the team browse it.

### 3.4 Only 7 of ~100 Apps Exist as Channels

Catalyst has channels for NCLEX-RN, NCLEX-PN, FNP, TEAS, PANCE, DAT, ASVAB. The other ~93 apps have no channel entities, no voice KBs, and no brand sheets.

### 3.5 Laura's Writing Style Section Is Blank

The most current brand guidelines doc has visual identity locked in, but the writing/voice section is empty. This is the single highest-leverage gap to fill — Laura already has the structure ready.

### 3.6 Cloudinary Is Half-Wired

The skill:image-prompting entity in Catalyst references Cloudinary (cloud: dq9xmts6p) and has upload logic, but app icons and brand assets haven't been pushed to Cloudinary yet. No folder structure or naming convention exists.

## 4. The Plan: Four Workstreams

### Workstream A: Brand Source of Truth (Week 1–2)

**Goal:** One canonical, browsable, editable place where all brand decisions live.

**A1. Complete Laura's Brand Guidelines Doc (HIGH LEVERAGE, EASY WIN)**

- Fill in the writing style section using existing Catalyst voice KBs and the Nursing Style Guide
- Add the voice hierarchy table (parent → per-app variations)
- Add the "cool teacher who swears in class" personality articulation
- Add channel-specific tone calibration (email=personal, social=conversational, in-app=concise, blog=authoritative-casual)
- **Who:** Laura + Alec review; Claude can draft the content from existing KBs
- **Output:** Completed Google Doc ready for team use immediately

**A2. Resolve Color Conflict**

- Confirm Laura's Blue #155EEF / Orange #F79009 system as canonical
- Update kb:hlt-brand-logo-design-feel in Catalyst to match
- Update kb:brand-reference-colors in Catalyst
- Update repo file .claude/kb/curated/org/design/brand-reference-colors/
- **Who:** Alec confirms direction, Claude executes updates
- **Output:** Single color system everywhere

**A3. Rename brand-voice-master**

- Rename kb:brand-voice-master → kb:system-voice-operator or similar
- Update any bundles or links that reference it
- **Who:** Claude executes, Alec approves
- **Output:** No more confusion between operator voice and brand voice

**A4. Audit and Prune hlt-brand-foundation**

- At 168K chars, this entity is bloated; it likely contains duplicated or outdated material
- Extract what's still valid, archive the rest
- **Who:** Claude audits, Alec/Laura review
- **Output:** Lean, current brand foundation KB

### Workstream B: Sidecar Brand Management App (Week 2–4)

**Goal:** A web app where the team can see, browse, and edit brand guidelines without needing Alec or Catalyst expertise.

**B1. Architecture Decision**

**Recommended stack:**

- **Next.js** on Vercel (you already have Vercel MCP + deployment tools)
- **Mastra** for AI agent orchestration (per your preference — handles tool calling, memory, workflows)
- **Catalyst MCP** as the data backend (brand entities already live here)
- **Cloudinary** for asset serving/management
- **shadcn/ui** for components (you have the MCP for this)

**Why this stack:** Zero new infrastructure. Catalyst is already the graph database for brand entities. The sidecar just becomes a pretty UI over the Catalyst API. Mastra handles the AI layer (brand voice enforcement, content generation). Vercel deploys it instantly.

**B2. Core Pages**

| Page | What It Shows | Who Uses It |
| --- | --- | --- |
| Brand Overview | Parent brand identity: colors, typography, personality, voice summary, logo | Everyone |
| App Directory | Grid/list of all apps with icon, colors, voice summary, status (complete/incomplete) | Everyone |
| App Detail | Per-app brand sheet: icon, color variations, voice character, sample copy, channel-specific tone | Content creators, designers |
| Voice Playground | Type text → see it rewritten in any app's voice, compare voices side-by-side | Emily, content team |
| Asset Library | Browse/upload logos, icons, wordmarks (backed by Cloudinary) | Cailey, designers |
| Style Guide | The editorial rules (from Nursing Style Guide) — searchable, filterable by product | Content writers |
| Design Tokens | Live token explorer — colors, spacing, radii, shadows as copyable values | Developers, Cailey |

**B3. Data Flow**

```
Laura's Google Doc (human-editable source of truth)
        ↓ (sync script or manual trigger)
Catalyst Registry (structured KB entities)
        ↓ (MCP API calls)
Sidecar Next.js App (read + display)
        ↓ (Mastra agent layer)
Content Pipelines (Multimedia Mastery, blog, social, email)
```

**Key principle:** Laura's doc stays the human-friendly editing surface. Catalyst stays the structured data layer. The sidecar reads from Catalyst and displays beautifully. Content pipelines pull from Catalyst via MCP. Nobody has to learn a new tool — Laura edits in Google Docs, developers query Catalyst, the sidecar shows everyone else.

**B4. MVP Scope (Build First)**

1. Brand Overview page (colors, typography, personality)
2. App Directory with the top 7 apps that already have channel entities
3. Asset Library connected to Cloudinary
4. Deploy to Vercel under a team-accessible URL

**B5. Phase 2 (Build After MVP Works)**

5. Voice Playground with Mastra agent
6. App Detail pages with full brand sheets
7. Design Token explorer
8. Sync mechanism from Google Doc → Catalyst
9. Expand to top 20 apps

### Workstream C: Asset Pipeline & Cloudinary (Week 1–3, parallel with A & B)

**Goal:** All brand assets in Cloudinary with a consistent structure, accessible by URL pattern.

**C1. Cloudinary Folder Structure**

```
hlt-mastery/
├── logos/
│   ├── primary/        (main HLT Mastery logo variants)
│   ├── monochrome/
│   └── icon-only/
├── app-icons/
│   ├── nclex-rn/       (all variants: light, dark, square, rounded)
│   ├── nclex-pn/
│   ├── fnp/
│   ├── teas/
│   ├── pance/
│   ├── dat/
│   ├── asvab/
│   └── ...             (expand as apps are added)
├── wordmarks/
│   ├── nclex-rn/
│   └── ...
├── brand-elements/
│   ├── patterns/
│   ├── illustrations/
│   └── photography/
└── generated/          (AI-created images from Multimedia Mastery)
    ├── social/
    ├── blog/
    └── ad-creative/
```

**C2. Naming Convention**

```
{brand}-{asset-type}-{variant}-{size}.{ext}

Examples:
hlt-nclex-rn-icon-light-512.png
hlt-mastery-logo-primary-horizontal.svg
hlt-fnp-wordmark-dark.png
```

**C3. Upload Existing Assets**

- Download from Google Drive folders (logos, app icons, wordmarks)
- Rename per convention
- Upload to Cloudinary with proper folder placement
- Tag each asset in Cloudinary with: app name, asset type, color variant
- **Who:** Can be scripted (Node.js + Cloudinary SDK + Google Drive API)
- **Estimated effort:** Half-day scripting + review

**C4. Wire Into Multimedia Mastery**

- Update skill:image-prompting to reference the Cloudinary folder structure
- Add brand asset lookup: "give me the NCLEX-RN icon" → returns Cloudinary URL
- Add guardrail: generated images for HLT content auto-placed in the generated/ subfolder

### Workstream D: AI Content Pipeline Brand Enforcement (Week 3–5)

**Goal:** When content is created through AI pipelines, brand guidelines are available but not forced. Personal requests don't get HLT branding.

**D1. The Routing Logic**

```
User request comes in
├── Has app context (e.g., "write NCLEX-RN email")
│   → Load bundle:context-brand-voice + app-specific voice KB
│   → Apply channel-specific tone (email=personal, social=conversational)
│   → Generate on-brand content
│
├── Has "HLT" or brand context but no specific app
│   → Load parent brand voice (communications guide)
│   → Generate with parent brand personality
│
└── No brand context (e.g., "make me a picture of a sunset")
    → Generate freely, no brand enforcement
```

**This is the "lego block" pattern you want:** Brand context is a composable layer, not a mandatory wrapper. The bundles in Catalyst already support this — bundle:context-brand-voice gets injected when brand context is present, skipped when it's not.

**D2. Mastra Workflow Design**

```
// Pseudocode for the content creation workflow
const brandWorkflow = new Workflow("content-with-brand")
  .step("detect-context", async (input) => {
    // Determine if this is brand content or personal
    return detectBrandContext(input);
  })
  .step("load-brand", async (context) => {
    if (!context.isBranded) return null;
    // Pull from Catalyst MCP
    const voice = await catalyst.get(`kb:hlt-brand-voice-${context.app}`);
    const visual = await catalyst.get("kb:brand-reference-colors");
    return { voice, visual };
  })
  .step("generate", async (input, brand) => {
    // Generate content with or without brand context
    return generateContent(input, brand);
  })
  .step("review", async (output, brand) => {
    if (!brand) return output;
    // Optional brand compliance check
    return reviewAgainstGuidelines(output, brand);
  });
```

**D3. Per-App Brand Sheets as Catalyst Entities**

For each of the top 20 apps, create a structured entity:

```yaml
# Example: kb:hlt-app-brand-sheet-nclex-rn
app_name: "NCLEX-RN Mastery"
app_slug: "nclex-rn"
parent_brand: "hlt-mastery"
icon_cloudinary: "hlt-mastery/app-icons/nclex-rn/"
colors:
  primary: "#155EEF"
  accent: "#F79009"
  app_specific: "#0D9488" # teal for nursing
voice_character: "Mentor who's been there"
voice_tone: "Confident, encouraging, occasionally irreverent"
audience_relationship: "Mentor → student"
channel_calibration:
  email: "Personal, first-name basis, share stories"
  social: "Conversational, memes OK, link to value"
  in_app: "Concise, action-oriented, celebrate progress"
  blog: "Authoritative but accessible, cite evidence"
sample_tagline: "You've got this. We've got the proof."
status: "complete"
```

**D4. Multimedia Mastery Integration**

- When Multimedia Mastery generates images for an HLT app:
  1. Pull the app's brand sheet from Catalyst
  2. Include brand colors and visual style in the image prompt
  3. Auto-upload the result to Cloudinary under generated/{channel}/
  4. Return the Cloudinary URL (not raw base64)
- When generating for non-brand requests: skip all of the above

## 5. Priority Sequencing (What To Do First)

### Immediate (This Week)

| # | Action | Leverage | Effort | Who |
| --- | --- | --- | --- | --- |
| 1 | Fill Laura's writing style section using existing Catalyst voice KBs | VERY HIGH | Low — content exists, just needs synthesis | Claude drafts → Laura reviews |
| 2 | Resolve color conflict — confirm Blue #155EEF as canonical, update Catalyst | HIGH | Low — one decision + entity updates | Alec confirms → Claude executes |
| 3 | Rename brand-voice-master to eliminate confusion | MEDIUM | Very low — rename + relink | Claude executes |

### Next 2 Weeks

| # | Action | Leverage | Effort | Who |
| --- | --- | --- | --- | --- |
| 4 | Upload app icons to Cloudinary with naming convention | HIGH | Medium — scripting + review | Claude scripts → Alec/Cailey review |
| 5 | Create app brand sheets for top 7 apps in Catalyst | HIGH | Medium — template + 7 entities | Claude creates → team reviews |
| 6 | MVP sidecar app — Brand Overview + App Directory + Asset Library | VERY HIGH | Medium-High — Next.js + Catalyst MCP | Claude builds → team tests |

### Following 2 Weeks

| # | Action | Leverage | Effort | Who |
| --- | --- | --- | --- | --- |
| 7 | Voice Playground in sidecar with Mastra | HIGH | Medium | Claude builds |
| 8 | Expand to top 20 apps — brand sheets + channel entities | MEDIUM | Medium — templated work | Claude + Laura |
| 9 | Wire Multimedia Mastery to pull brand context | HIGH | Medium | Claude builds |
| 10 | Google Doc → Catalyst sync mechanism | MEDIUM | Medium | Claude builds |

## 6. Team Roles & Workflow (Post-Implementation)

### Who Owns What

| Person | Responsibility | Tool They Use |
| --- | --- | --- |
| Alec | Final brand decisions, personality direction, app strategy | Sidecar app (review mode) |
| Laura | Brand guidelines content, voice definitions, style rules | Google Docs → syncs to Catalyst |
| Ben | Skill/capability structure, technical brand integration | Catalyst Registry directly |
| Cailey | Visual assets, icon design, design token application | Sidecar app (Asset Library) + Cloudinary |
| Emily | Content creation using brand voice | Sidecar app (Voice Playground) + AI pipelines |

### The Flow

1. Laura updates brand guidelines in the Google Doc (human-friendly)
2. Sync pushes structured data to Catalyst (machine-friendly)
3. The sidecar app displays it beautifully (team-friendly)
4. AI pipelines pull from Catalyst when generating content (automation-friendly)
5. Generated assets land in Cloudinary (asset-management-friendly)

## 7. The Lego Block Pattern

The whole system is built from composable pieces that snap together:

```
BLOCKS:
├── Brand Voice KB     (per-app personality + tone)
├── Brand Visual KB    (colors, typography, visual style)
├── Channel Entity     (app identity + routing metadata)
├── Brand Sheet Entity (unified per-app reference)
├── Bundle             (groups KBs for pipeline injection)
├── Skill              (how to use the brand in specific contexts)
└── Cloudinary Asset   (actual files: icons, logos, generated images)

COMPOSITIONS:
"Write NCLEX email"          = Channel(nclex-rn) + Bundle(brand-voice) + Voice-KB(nclex-rn) + Skill(content-writing)
"Make social post for FNP"   = Channel(fnp) + Bundle(brand-voice) + Voice-KB(fnp) + Skill(social-media) + Visual-KB(colors)
"Generate study guide image" = Channel(nclex-rn) + Skill(image-prompting) + Visual-KB(design-feel) + Cloudinary(upload)
"Make me a sunset picture"   = Skill(image-prompting) only — no brand blocks loaded
```

This is the "simple patterns, insane power" you described. Each block is small and self-contained. Power comes from how they compose.

## 8. Open Questions for Alec

1. **Top 20 apps list** — Can you provide the full list of the top 20 apps that matter? We have 7 in Catalyst. Which 13 more need brand sheets?
2. **Color confirmation** — Can you confirm Blue #155EEF + Orange #F79009 (Laura's doc) as the canonical color system, deprecating the Teal #2B7C8B from the older Catalyst entity?
3. **Sidecar URL** — Do you want this on a subdomain (e.g., brand.hltmastery.com) or a Vercel URL (e.g., hlt-brand.vercel.app)?
4. **Airtable integration** — You mentioned wanting to see things in Airtable via Whalesync. Should the sidecar app replace that need, or do you still want Airtable as an additional view? (Recommendation: the sidecar app replaces this need — one fewer tool to maintain.)
5. **Brand enforcement strictness** — When AI generates content with brand context, should it auto-run a compliance check, or just make brand context available and trust the output?
   (Recommendation: make it available, add an optional review step, don't block generation.)
6. **Ben's spreadsheet** — Can Ben share the skill structure spreadsheet so we can reconcile it with what's in Catalyst?

## 9. Success Metrics

How we know this worked:

- **Team self-service:** Laura, Cailey, and Emily can answer brand questions without asking Alec
- **Content consistency:** AI-generated content for the same app sounds the same regardless of who triggered it
- **Asset findability:** Any team member can find any app's icon, colors, or voice in < 30 seconds
- **New app onboarding:** Adding a new app's brand takes < 1 hour (fill template, upload assets, create entities)
- **Zero brand confusion:** No more "which colors are right?" or "what voice do we use for FNP?"

## 10. Appendix: Existing Entity Graph in Catalyst

```
bundle:context-brand-voice
├── kb:hlt-brand-voice-hlt-mastery-communication-guide (parent voice)
├── kb:hlt-brand-voice-nclex-rn
├── kb:hlt-brand-voice-nclex-pn
├── kb:hlt-brand-voice-fnp
├── kb:hlt-brand-voice-asvab
├── kb:hlt-brand-voice-content-writing
├── kb:brand-guide-content-style-1
└── kb:brand-guide-content-style-2

kb:hlt-brand-logo-design-feel (visual identity — NEEDS COLOR UPDATE)
kb:brand-reference-colors     (color tokens — NEEDS UPDATE)
kb:brand-reference-typography (typography tokens)
kb:hlt-brand-foundation       (168K chars — NEEDS AUDIT/PRUNING)
kb:brand-voice-master         (operator voice — NEEDS RENAME)
skill:image-prompting         (Cloudinary + brand-aware image gen)

channel:hlt-nclex-rn
channel:hlt-nclex-pn
channel:hlt-fnp
channel:hlt-teas
channel:hlt-pance
channel:hlt-dat
channel:hlt-asvab
```

## 11. File Locations Reference

| What | Where |
| --- | --- |
| This plan | docs/planning/active/2026-03-22-brand-consolidation-plan.md |
| Design system rules | docs/references/DESIGN_SYSTEM_RULES.md |
| UI/UX principles | docs/references/UI_UX_DESIGN_PRINCIPLES.md |
| Brand voice strategy | docs/references/BRAND_VOICE_AND_UX_STRATEGY.md |
| Per-app voice KBs | .claude/kb/curated/org/marketing/hlt-brand-voice-\* |
| Visual identity KBs | .claude/kb/curated/org/design/ |
| Design skills | .claude/skills/curated/global/design/ |
| Laura's brand doc | Google Drive ID: 1HiaP0lQkl9aNlKNtT78HTzAmDDFT93yOdRbEiPeWwx8 |
| Drive logo folder | drive/folders/1C5o2tLR2jixxgb-XE-yzAsU8lx9G4xlE |
| Drive app icons | drive/folders/1aNQCWty1J4S5nhQOaRgozbBgShmWyXfM |
| Drive wordmarks | drive/folders/1OIgljP_Kqj2YWRFb7a3Er6UoAFo1XLfQ |

---

## Source: docs/references/vision/2026-04-12-hlt-healthcare-funnel-blueprint.md

# HLT Mastery — The Healthcare Funnel Blueprint

> **Source:** `docs/references/vision/2026-04-12-hlt-healthcare-funnel-blueprint.pdf` (12 pages, exported from Gamma via pdf-lib on 2026-04-12)
> **Subtitle:** From Content to Career — An End-to-End System
> **Audience:** Internal strategy. Proposes a unified Healthcare-Ops orchestration layer across HLT Mastery's existing assets.
> **Relates to:** [../../planning/active/2026-04-16-content-factory-v3-zoomout.md](../../planning/active/2026-04-16-content-factory-v3-zoomout.md) · [../../planning/active/katailyst-nursing-architecture.md](../../planning/active/katailyst-nursing-architecture.md)

---

## TL;DR

HLT Mastery has four mature but siloed assets: exam-prep apps (27K MAU, 7+ exams), the Katailyst Content Engine, NurseLink Hub (job scraping), and Nurse Career Navigator / NurseBid (job board). The blueprint proposes a **Healthcare-Ops AI Command Center** — slash-command modes + YAML-per-specialty configs — that stitches these assets into one automated funnel: SEO content → ads/social → job scrape + enrich → job board → AI matching, bidding, resume gen, coaching. Built once for nurses, configurable for dentists, PAs, and beyond.

---

## Page 1 — Cover

**HLT Mastery: The Healthcare Funnel Blueprint** From Content to Career — An End-to-End System.

## Page 2 — What You Have Today (Overview)

| Asset | What it is |
| --- | --- |
| **HLT Mastery Apps** | 27K MAU, 4,000+ QBank questions, 7+ exams |
| **Katailyst Content Engine** | AI content factory with recipes, styles, rubrics |
| **NurseLink Hub** | Job scraper — Apify actors, 157 canonical jobs, enrichment pipeline, Feed API + webhooks |
| **Nurse Career Navigator / NurseBid** | Full job board — AI matching, bidding, hospital intel, forums, articles, resume gen, 50+ admin API endpoints |

## Page 3 — The Content Machine (Deep Dive)

**NCLEX-ipedia: The Definitive Knowledge Base.**

- **Vision:** Definitive wiki-style knowledge base for every healthcare exam.
- **Content Types:** 13+ deep dives, how-tos, cheat sheets, QBank walkthroughs.
- **Platform:** Framer CMS with 25-field schema.
- **Pipeline:** Katailyst → Framer Server API → live site.
- **Strategy:** SEO + AEO (AI Engine Optimization) dual strategy.
- **Exams Covered:** 7+ exams — NCLEX-RN, NCLEX-PN, TEAS, ASVAB, PANCE, FNP, DAT.

**Growth Flywheel:** SEO Articles → Organic Traffic → Free App Downloads → Premium Conversion.

## Page 4 — The Job Pipeline (Deep Dive)

1. **Scraping** — Apify actors scrape Indeed, LinkedIn, and Google Jobs for raw listings.
2. **Enrichment** — Deduplication, plus salary, certification, skills, and facility data enrichment.
3. **Distribution** — Feed API + webhooks push enriched data to downstream sites.
4. **Job Board** — Nurse Career Navigator: AI matching, bidding, and hospital intel.

**Nurse experience surfaces:** Browse · AI Match · Bid · Apply · Career Coaching.

**Hospital intelligence layer:** Glassdoor ratings, nurse reviews, Magnet status.

## Page 5 — The Gap: What's Missing Today

1. **No Unified Orchestration Layer** — no cohesive system connecting content, jobs, and candidates into a single journey.
2. **Independent Operations** — the content machine and job pipeline operate in silos without automated feedback loops.
3. **Manual Ad & Social Generation** — no systematic, AI-driven generation of ads or social content tied directly to the funnel.
4. **Specialty Scalability** — no easy way to spin up the same high-performance system for new specialties (Dental, PA, etc.).
5. **Unadapted "Career-Ops" Framework** — the proven tech-sector job-search template hasn't been adapted for healthcare operations yet.

## Page 6 — The Proposal: Healthcare-Ops (Configurable AI Command Center)

Four pillars:

- **Career-Ops Architecture** — slash commands, specialized modes, evaluation pipelines, batch processing.
- **YAML Configuration** — one config file per specialty (Nurse, Dentist, PA) to adapt the entire system instantly.
- **API Integration** — seamless connection to Katailyst, Framer, NurseLink Hub, and Nurse Career Navigator.
- **Unified Orchestration** — ties content, jobs, and candidates into a single, automated end-to-end funnel.
## Page 7 — The End-to-End Funnel | Layer | What runs here | | -------------------------- | ------------------------------------------------------------- | | **TOP — Content & SEO** | Katailyst → NCLEX-ipedia articles → Organic Traffic | | **AMPLIFY — Ads & Social** | AI-generated job ads (Indeed/Meta) + Social Posts (IG/TikTok) | | **MIDDLE — Job Discovery** | NurseLink Hub scrapes → Enriches → Pushes to Job Board | | **BOTTOM — Conversion** | AI Matching, Bidding, Applications, Resume Gen, Coaching | Arrows indicate the automated flow of data and candidates through the Healthcare-Ops system. ## Page 8 — Healthcare-Ops Modes: The Command Center | Command | Purpose | | --------------------------- | ------------------------------------------- | | `/healthcare-ops content` | Generate SEO articles via Katailyst recipes | | `/healthcare-ops ads` | Create job ads for Indeed, LinkedIn, Meta | | `/healthcare-ops social` | Generate posts for IG, TikTok, LinkedIn | | `/healthcare-ops scan` | Scrape job portals via NurseLink Hub | | `/healthcare-ops enrich` | Trigger job enrichment pipeline | | `/healthcare-ops push` | Push jobs to Nurse Career Navigator | | `/healthcare-ops evaluate` | Score and evaluate job listings | | `/healthcare-ops batch` | Process multiple jobs in parallel | | `/healthcare-ops tracker` | Pipeline status dashboard | | `/healthcare-ops analytics` | Cross-system performance analytics | ## Page 9 — Configurable by Specialty: One YAML, Infinite Possibilities **Nurse config** - Archetypes: ICU, Med-Surg, OR, Travel Nurse - Certifications: NCLEX, BLS, ACLS, PALS - Job Portals: Indeed Healthcare, Vivian Health **Dentist config** - Archetypes: General, Oral Surgery, Orthodontics - Certifications: DDS/DMD, DEA, State License - Job Portals: DentalPost, Indeed Dental **PA config** - Archetypes: Family Medicine, Emergency, Surgery - Certifications: PANCE, NCCPA, DEA - Job Portals: AAPA Job Board, Health eCareers > System adapts instantly via YAML configuration. 
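The per-specialty configuration Page 9 describes can be sketched as a single typed shape. This is a sketch only: the `SpecialtyConfig` type, its field names, and the nurse instance below are illustrative assumptions — the deck specifies just the three bullet groups (archetypes, certifications, job portals) and says the real artifact is one YAML file per specialty.

```typescript
// Sketch of a per-specialty config as Page 9 describes it.
// The type name and fields are assumptions; the deck's real artifact
// is one YAML file per specialty with these three groups.
interface SpecialtyConfig {
  specialty: string;
  archetypes: string[];      // e.g. ICU, Med-Surg, OR, Travel Nurse
  certifications: string[];  // e.g. NCLEX, BLS, ACLS, PALS
  jobPortals: string[];      // e.g. Indeed Healthcare, Vivian Health
}

const nurseConfig: SpecialtyConfig = {
  specialty: "nurse",
  archetypes: ["ICU", "Med-Surg", "OR", "Travel Nurse"],
  certifications: ["NCLEX", "BLS", "ACLS", "PALS"],
  jobPortals: ["Indeed Healthcare", "Vivian Health"],
};

// Spinning up a dentist or PA specialty is then just another object
// (or YAML file) with the same shape -- no code changes.
console.log(nurseConfig.archetypes.length); // 4
```

The point of the shape is the deck's claim: "configurable, not rebuilt" — the dentist and PA configs on Page 9 differ only in the values, never the structure.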
## Page 10 — Integration Architecture Unified API orchestration layer calls: - **Katailyst AI Engine** — content generation recipes & automated production. - **Framer Server API** — direct publishing to NCLEX-ipedia knowledge base. - **NurseLink Hub API** — trigger scraping & pull enriched job data. - **Nurse Navigator API** — push jobs, manage listings & candidate matching. - **Meta / LinkedIn APIs** — automated job ad posting & campaign management. - **Social Platforms** — publish organic content to IG, TikTok, LinkedIn. ## Page 11 — What We Build First: Phased Rollout | Phase | Window | Deliverables | | --------------------------------- | -------- | ---------------------------------------------------------------------------------------------------- | | **Phase 1 — Core Infrastructure** | Week 1–2 | Healthcare-ops template · content generation mode · job scanning mode · evaluation & push logic | | **Phase 2 — Amplify Layer** | Week 3–4 | Ad generation modes · social post automation · platform API integrations · creative asset pipeline | | **Phase 3 — Multi-Specialty** | Week 5–6 | YAML config system · dentist specialty template · PA specialty template · specialty-specific recipes | | **Phase 4 — Scale & Optimize** | Ongoing | Cross-system analytics · funnel optimization · new specialty onboarding · AI model fine-tuning | ## Page 12 — Closing: One System. Any Specialty. Full Funnel. **The future of healthcare operations.** - **Content Attracts** — SEO-optimized educational content draws in qualified candidates organically. - **Amplify Reach** — AI-driven ads and social campaigns expand visibility across all major platforms. - **Automated Jobs** — listings are automatically scraped, enriched, and published to the job board. - **AI Matching** — intelligent AI matches candidates to the most suitable opportunities instantly. > READY TO SCALE HLT MASTERY. 
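Pages 8 and 10 together imply a thin dispatcher: each `/healthcare-ops` mode routes to one of the integration targets. A minimal sketch of that routing — the `dispatch` function, the `routes` map, and the subset of modes shown are all illustrative; nothing here is an existing API.

```typescript
// Sketch: routing Page 8 slash-command modes to Page 10 API targets.
// All names here are illustrative, not an existing implementation.
type Mode = "content" | "ads" | "social" | "scan" | "enrich" | "push";

const routes: Record<Mode, string> = {
  content: "Katailyst AI Engine",
  ads: "Meta / LinkedIn APIs",
  social: "Social Platforms",
  scan: "NurseLink Hub API",
  enrich: "NurseLink Hub API",
  push: "Nurse Navigator API",
};

function dispatch(command: string): string {
  // "/healthcare-ops scan" -> strip the prefix, look up the upstream API
  const mode = command.replace("/healthcare-ops ", "") as Mode;
  const target = routes[mode];
  if (!target) throw new Error(`unknown mode: ${mode}`);
  return target;
}

console.log(dispatch("/healthcare-ops scan")); // NurseLink Hub API
```

In this framing, the "Unified Orchestration" pillar is just this dispatch table plus the per-specialty config: command picks the API, config picks the payload.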
--- ## How this fits the v3 Content Factory plan The v3 plan scopes `katailyst-1` as the brain, `sidecar-system` as the authoring surface, `MasteryPublishing-main` as the student-facing publishing target, and `ai4mastery-next-main` as the educator-facing target. The blueprint proposes a fifth capability — a **Healthcare-Ops command-center layer** that orchestrates content/ads/social/jobs across the same brain — and frames multi-specialty expansion (Dentist, PA) as a YAML-config problem, not a rebuild. Open questions the v3 plan will need to answer: - Does the Healthcare-Ops command surface live inside `sidecar-system`, or as a sibling app? - Is NCLEX-ipedia's "25-field Framer schema" the same schema v3 projects against from the `articles` row, or a separate target? - How does the job pipeline's evaluator loop interact with the article evaluator (shared rubric engine vs. separate)? - Does NurseBid's resume-gen flow collapse into the sidecar canvas or stay as a standalone surface? --- ## Source: docs/references/vision/README.md # Vision Docs Strategic source material that feeds the active planning docs under `docs/planning/active/`. Each doc here has a matching registry entity (`kb:…`) so MCP `discover` can surface it in agent context. **Rule:** vision docs are reference-grade and dated. They don't get superseded in place — if the strategy shifts, add a new doc with a new date and update the registry entity's `supersedes` link. 
--- ## Index | Doc | Date | Registry entity | What it's for | | -------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | [2026-04-12-hlt-healthcare-funnel-blueprint.md](2026-04-12-hlt-healthcare-funnel-blueprint.md) (+ [PDF](2026-04-12-hlt-healthcare-funnel-blueprint.pdf)) | 2026-04-12 | `kb:hlt-healthcare-funnel-blueprint-2026-04` | 12-page Gamma deck proposing a Healthcare-Ops AI Command Center that unifies HLT Mastery's four siloed assets (NCLEX apps, Katailyst Content Engine, NurseLink Hub, Nurse Career Navigator/NurseBid) into one funnel with slash-command modes + YAML-per-specialty configs. Companion amendment at [2026-04-16-healthcare-ops-v3-amendment.md](../../planning/active/2026-04-16-healthcare-ops-v3-amendment.md). | | [2026-03-22-hlt-brand-consolidation-sidecar-plan.md](2026-03-22-hlt-brand-consolidation-sidecar-plan.md) | 2026-03-22 | `kb:hlt-brand-consolidation-sidecar-plan-2026-03` | Audit of 91+ brand entities scattered across 6 systems (Catalyst registry, Google Drive, repo, uploaded files, Slack, Cloudinary). Identifies a color conflict (Laura's #155EEF vs. Catalyst's #2B7C8B), a voice-master naming collision, and proposes a brand-consolidation sidecar for Laura/Ben/Cailey/Emily. | --- ## Adding a new vision doc 1. Name it `docs/references/vision/YYYY-MM-DD-short-slug.md` — ISO date prefix; kebab-case slug. 2. 
Open with a blockquote header: source, companion registry entity ref, cross-links. 3. Keep the body readable top-to-bottom; structured tables over bullet walls. 4. Create a matching registry entity via `mcp__katailyst-registry__registry_create`: - `entity_type: "kb"`, `status: "staged"` (can't create `curated` directly — required tag namespaces kick in) - `content_json: { kb: { title, variants: { full, snippet, distilled }, item_type: "reference", content_type: "text/markdown", source: "<doc path>" } }` - Then add tags with `registry_manage_tags` — required namespaces: `format`, `scope`, `source`, plus relevant `brand:*`, `domain:*`, `family:*`, `audience:*`, `topic:*`. - Promote via `registry_update` with `status: "curated"`. - Link to related entities with `registry_link`. 5. Update this index. See `docs/references/external-codebases.md` for the sibling pattern for non-vision reference material. --- ## Source: docs/ROADMAP.md # Katailyst: Vision, Architecture & Roadmap > Created: 2026-03-15 | Updated: 2026-03-26 > Status: Active > Scope: Strategic vision for evolving Katailyst into a world-class AI-powered skill platform ## What This Document Is A strategic vision and execution roadmap for evolving Katailyst from an internal registry/armory into a world-class, AI-powered skill platform -- starting with HLT's own team, expanding to broader audiences over time. Covers what exists, what's missing, how the pieces connect, and a phased roadmap. --- ## The Vision (One Paragraph) Katailyst is the armory that makes any AI agent -- Claude Code, Mastra, n8n, OpenClaw, or whatever comes next -- dramatically better at education, healthcare, nursing, content, and marketing work. It's a living library of 1,500+ composable capability blocks (skills, prompts, KB, recipes, styles, rubrics) connected by a 10,000+ link knowledge graph, continuously improved through build-measure-learn loops, A/B testing, and real engagement data. 
Agents discover what they need through embedding-based semantic search and Cohere rerank, compose capability packets, execute with domain-expert depth, and feed results back. The best patterns survive. The system gets smarter. --- ## Architecture: How the Layers Connect ``` CONSUMERS (who uses the armory) +-----------+ +-----------+ +-----------+ +-----------+ | Claude | | Mastra | | n8n | | OpenClaw | | Code / | | Agent | | Workflow | | Fleet | | Cowork | | Network | | Automator | | (Render) | +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ | | | | | +-----------+--------------+---------------+ | | MCP (streamable-http) +-----+--+----------------------------------------------+ | KATAILYST MCP SERVER | | 44 tools / 14 resources / 5 prompts / 9 toolsets | | discover | traverse | get_entity | tool.execute | | registry.create | lists.vote | delivery.schedule | +---------------------+---------------------------------+ | +---------------------+----------------------------------+ | KATAILYST REGISTRY | | | | +----------+ +----------+ +----------+ | | | Skills | | Prompts | | KB | 1,500+ | | | (238) | | (173) | | (303) | entities | | +----+-----+ +----+-----+ +----+-----+ | | | | | | | +----+-------------+-------------+------+ | | | KNOWLEDGE GRAPH (10,483 links) | | | | requires | recommends | pairs | | | | uses_kb | alternate | parent | | | +-------------------------------------------+ | | | | +----------+ +----------+ +----------+ | | | Rubrics | | Evals | | Metrics | Quality | | | (33) | | Signals | | (2,902) | layer | | +----------+ +----------+ +----------+ | +--------------------------------------------------------+ | +---------------------+----------------------------------+ | DISTRIBUTION | | | | .claude-plugin/ Plugin marketplace (Claude) | | .claude/skills/ Local mirror (Claude Code) | | /llms.txt LLM-readable index | | /mcp MCP endpoint | | Pipedream Delivery targets (186) | | Novu Notification workflows (5) | 
+--------------------------------------------------------+ | +---------------------+----------------------------------+ | COMPANION REPOS (new) | | | | katailyst-agents/ Claude Agent SDK sidecar agents | | katailyst-fleet/ Agent PM: roster, cron, traces | +--------------------------------------------------------+ ``` --- ## What Exists Today (Strengths) | Capability | State | Key Assets | | ----------------------- | ---------- | --------------------------------------------------------------- | | **Registry** | Production | 1,500+ entities, 21 types, Supabase canonical | | **Knowledge Graph** | Production | 10,483 typed/weighted links, graph-driven selection | | **MCP Server** | Production | 44 tools, 9 toolsets, 14 resources, 5 prompts, OAuth | | **Discovery** | Production | `discover_v2` 10-signal scoring, embeddings, Cohere rerank | | **Deliberation Engine** | Production | 5 patterns, 9 roles, auto-escalation, MCP `deliberate` tool | | **Plugin Format** | 90% ready | `.claude-plugin/` with skills, agents, hooks | | **Skill Factory** | Production | create-from-template, create-from-interview playbooks | | **Eval Infrastructure** | Production | 35 eval cases, rubric judging, pipeline eval, regression runner | | **Voting/Lists** | Built | `lists.create/vote/publish` MCP tools | | **Delivery Scheduling** | Built | create/list/get/reschedule/cancel/stats | | **20 Domain Hubs** | Production | Data, Design, Marketing, Social, Email, Education, NCLEX, etc. 
| | **Integrations** | Active | Pipedream, Cloudinary, Marketo, Firecrawl, Tavily, Brave | --- ## What's Missing (Gaps) | Gap | Impact | Difficulty | | ------------------------------------------- | ----------------------------------------------- | ----------- | | Plugin marketplace not published | No external discoverability | Low | | No parallel skill testing (arena mode) | Can't run 2-10 variants simultaneously | Medium | | No automated BML loops | Skills don't self-improve | Medium | | No sidecar agent repo | Can't run autonomous agents outside Claude Code | Medium | | No HybridRAG (vector + graph at query time) | Discovery misses relationship context | Medium | | No skill demo/preview for teams | Hard to evaluate skills before using them | Medium | | No n8n/Mastra integration | No cron scheduling, no agent network routing | Medium | | No fleet management | Agent tracking is manual | Medium-High | | No nightly skill scout | Missing new community skills/patterns | Medium | | Voting signals don't feed discovery weights | Community wisdom doesn't improve ranking | Low | | No personal prompt library UI | Teams can't save/share/rank their own prompts | Medium | --- ## How Katailyst Differentiates from Generic AI Tools Generic AI tools (Claude Code, Cursor, etc.) will ship generic plugins. Katailyst's moat: 1. **Domain-expert knowledge delta** -- 334 KB items encoding deep expertise in nursing education, NCLEX exam prep, healthcare content, social media for education companies. Base models know none of this at HLT's specificity level. 2. **Graph intelligence** -- a generic plugin is a flat file. Katailyst skills have 9,937 links telling agents "if you're using this skill, you also need this style, this schema, this KB." That compositional intelligence is the unfair advantage. 3. **BML loops with real data** -- skills connect to Localytics (app engagement), Metabase (analytics), and social metrics. Generic plugins can't measure whether they worked in your business. 4. 
**Personalization layers** -- personal prompts, team voting, org-level brand voice, HLT-specific content standards. Generic tools serve everyone equally (i.e., serve no one specifically). 5. **Vertical depth > horizontal breadth** -- the goal isn't to beat Claude Code at "write a README." The goal is to beat everything at "create an NCLEX practice question with evidence-based distractors aligned to NCSBN test plan categories and score it with a psychometric rubric." **Position:** Katailyst is the specialized armory that makes Claude Code (and any agent) 10x better at HLT's work. Claude Code is the general-purpose tool. Katailyst is the domain-expert upgrade. --- ## Plugin Marketplace: Private-First, Open Later **Immediate (HLT-only):** - Publish `.claude-plugin/` to a private GitHub repo - HLT team members add it with `/plugin marketplace add <private-repo-url>` - Curate 3-5 domain bundles: nursing-exam-prep, social-media-powerhouse, content-factory, registry-ops, education-toolkit - Team can browse, install, and use skills directly in Claude Code and Cowork **Later (public/partner):** - Open selected skill packs to partners or the public - Register with Anthropic's official plugin directory - Build a landing page showing what Katailyst skills can do (the demo/preview system) - Enterprise customers get private marketplace with RBAC via Cowork's admin settings **What's already in `.claude-plugin/`:** - 60+ skills (1password, agent-browser, ai-sdk, brainstorming, brand-guidelines, etc.) 
- 7 agents (atlas, code-reviewer, ivy, lucy, nova, quinn, rex) - Hooks with session-start, discovery-primer, context-synthesizer - Partner bundle (obra-superpowers with brainstorm, write-plan, execute-plan) --- ## Sidecar Agents (Claude Agent SDK) -- `katailyst-agents/` Repo **What the SDK gives you (March 2026):** - Same agent loop as Claude Code, programmable in TypeScript (v0.2.71) or Python (v0.1.48) - Native MCP server consumption -- point at your Katailyst MCP endpoint - Subagent composition -- agents spawning specialized sub-agents - In-process MCP servers -- custom tools without separate processes - Hooks for lifecycle events (pre/post tool call, error handling) - Built-in file ops, shell commands, web search **Agents to build:** | Agent | Purpose | MCP Tools Used | | ------------------- | ------------------------------------------------------------------- | ---------------------------------------------- | | `skill-creator` | Takes rough ideas/docs, creates registry-grade skills | `discover`, `registry.create`, `registry.link` | | `eval-runner` | Runs skills against rubrics, logs signals | `get_skill_content`, `discover` (rubrics) | | `nightly-scout` | Monitors GitHub/community for new skills, compares against registry | `discover`, `incoming.sources.list` | | `content-optimizer` | Takes content + metrics, generates improved variants | `discover`, `get_entity` (rubrics, metrics) | | `bml-loop-runner` | Orchestrates full build-measure-learn cycles | All of the above | **Key pattern:** Every agent starts by calling `discover` with rich intent to find the right skills/KB, then uses those capabilities to do its work. Results flow back via `registry.update`, eval signals, and metric points. --- ## Arena Mode: Parallel Skill Testing (2-10 Variants) **Flow:** ``` 1. Select a skill (e.g., skill:social-content) 2. Generate N variants (prompt rewrites via prompt-architecture) 3. Provide test inputs (sample brief, target audience, constraints) 4. 
Run all variants in parallel 5. Score each output against rubrics (engagement-v1, skill-quality, etc.) 6. Show side-by-side comparison with model reasoning traces 7. Auto-promote winner, archive losers with full trace 8. Feed scores into eval_signals for discovery ranking ``` **What exists to reuse:** - `prompt-architecture` skill for generating variants - `hlt-prompt-rewrite-prompt` for systematic prompt improvement - `rubric:skill-quality` and `rubric:engagement-v1` for scoring - `skill-judge` for quality evaluation - `/api/mcp/playground/run` endpoint for execution - `eval_runs` / `eval_signals` tables for result storage --- ## Graph + RAG Improvements (HybridRAG) **Current state:** `discover_v2` does semantic + text search. Graph expansion follows after discovery. These are sequential -- not truly hybrid. **Target state:** HybridRAG where vector search and graph traversal happen concurrently at query time, with results merged and re-ranked. **Steps:** 1. Ensure pgvector embeddings are generated for all entities (check `database/003-discovery-system.sql`) 2. Add parallel graph traversal during discovery -- when a vector hit is found, immediately pull its `requires`/`recommends` neighbors 3. Cross-encoder reranking -- after initial retrieval, use an LLM-as-judge to re-score the combined vector+graph results against the original intent 4. Entity linking bridge -- run NER on retrieved content to map mentions to graph nodes, surfacing relationship context 5. Graph-aware context windows -- when assembling the final capability packet, structure it as: primary hits -> required dependencies -> recommended companions -> contextual KB **Why this matters:** A skill about "NCLEX cardiovascular questions" should also surface the "nursing-pharmacology" KB (graph neighbor), the "bloom-taxonomy" schema (structural dependency), and the "clinical-reasoning" style (quality link) -- even if those terms weren't in the search query. 
That's the graph depth advantage over flat vector search. --- ## n8n + Mastra Integration ### n8n: The Automation Backbone n8n supports bidirectional MCP as of March 2026 -- it consumes Katailyst MCP AND exposes its workflows as MCP servers. **Use cases:** - **Nightly cron:** Skill scout runs at 2am, checks GitHub trending + community repos - **Social media BML:** Post content -> wait 48h -> pull engagement metrics from Localytics/Metabase -> feed back to eval_signals -> generate improved variant - **Content publishing pipeline:** Draft -> review -> approve -> format for channel -> schedule delivery -> measure - **Alert workflows:** When a skill's eval score drops below threshold, trigger investigation **Integration path:** - Register key n8n workflows as `tool` entities in the registry - Agents discover and call n8n workflows via `tool.execute` - n8n workflows call Katailyst MCP to discover skills, create content, log results ### Mastra: The Agent Network Mastra's Agent Network (vNext) provides automatic routing without predetermined workflows. TypeScript-native like the Katailyst stack. **Use cases:** - **Supervisor pattern:** A coordinator agent decomposes "create a nursing study guide" into research, outline, draft, review subtasks - **Observational memory:** Track which skills work best in which contexts over time - **Agent Network routing:** Incoming requests automatically routed to the best-equipped agent based on intent + history - **BML integration:** Mastra agents that consume Katailyst skills and feed results back --- ## Fleet Management -- `katailyst-fleet/` Repo Separate repo from day one. Katailyst stays the armory. Fleet handles operations.
**Core features:** - **Agent roster:** Dashboard of all agents (OpenClaw, Claude Code, Mastra, n8n) with status, last heartbeat, capabilities - **Task queue:** Assign work to agents, track progress, handle failures/retries - **Cron scheduler:** Recurring tasks (nightly audits, weekly reports, daily content generation) - **Run observability:** Trace what each agent did, which skills it used, inputs/outputs, timing, cost - **Agent-to-agent delegation:** Structured handoffs with context transfer (builds on `catalyst-agent-relay-js`) - **AI oversight layer:** Meta-agents that review other agents' output quality, flag regressions, suggest improvements **Tech stack:** Claude Agent SDK (TypeScript), connects to Katailyst MCP for capability discovery, separate Supabase project for fleet state. --- ## Vertical Skill Packs (Both in Parallel) ### Nursing/Healthcare Pack - **Foundation:** hub-education (degree 103), NCLEX A/B test KB, learning-objectives-v1 - **Skills to curate:** question-bank-generator, clinical-reasoning-scaffold, pharmacology-quick-ref, anatomy-visual-prompts, exam-anxiety-coach - **Rubrics:** psychometric-quality, bloom-alignment, distractor-quality, clinical-accuracy - **BML signals:** app engagement (Localytics), question completion rates, user ratings ### Social Media/Content Pack - **Foundation:** hub-social (degree 123), social-platform-intelligence-2026, social-content-winners-2026, brand-voice-master - **Skills to curate:** social-content, make-social playbook, platform-specific channel playbooks (Instagram, LinkedIn, TikTok, X) - **Rubrics:** engagement-v1, brand-consistency, hook-quality, platform-compliance - **BML signals:** post engagement metrics, click-through rates, follower growth, A/B test results --- ## Roadmap: Foundation First ### Phase 0: Foundation Audit (Week 1) **Goal:** Verify the foundation before building on it.
- [ ] Run full test suite, lint, typecheck -- confirm tests passing - [ ] Verify MCP server health via `registry.health` - [ ] Audit plugin format -- validate all `.claude-plugin/` skills have proper frontmatter - [ ] Verify `discover_v2` returns expected results for education + social queries - [ ] Check that `lists.vote` signals are stored correctly - [ ] Document any gaps in the eval pipeline (rubrics without test cases, etc.) - [ ] Ensure graph links for top hub entities are complete and weighted ### Phase 1: Plugin Marketplace + Sidecar Foundation (Weeks 2-3) **Goal:** Make skills discoverable and installable. Stand up agent SDK repo. - [ ] Publish `.claude-plugin/` to private GitHub repo for HLT team - [ ] Curate top skills into nursing-exam-prep and social-media bundles - [ ] Test marketplace install from fresh Claude Code instance - [ ] Create `katailyst-agents/` repo with Claude Agent SDK (TypeScript) - [ ] Build first agent: `skill-creator` that connects to Katailyst MCP - [ ] Verify agent can `discover` -> `get_skill_content` -> create new skills ### Phase 2: Arena Mode + BML Infrastructure (Weeks 4-5) **Goal:** Enable parallel testing and build-measure-learn loops. - [ ] Extend MCP playground for arena mode (2-10 variants) - [ ] Build A/B variant generator using prompt-architecture - [ ] Wire voting signals into discovery ranking weights - [ ] Build `eval-runner` agent in sidecar repo - [ ] Create side-by-side comparison UI in CMS dashboard - [ ] Wire Localytics + Metabase metrics into eval_signals ### Phase 3: Integrations + Automation (Weeks 6-8) **Goal:** Connect n8n and Mastra. Enable automated loops. 
- [ ] Set up n8n with bidirectional MCP connection to Katailyst - [ ] Build nightly skill scout workflow (n8n cron -> Agent SDK agent) - [ ] Build social media BML loop (publish -> 48h -> measure -> iterate) - [ ] Set up Mastra agent network consuming Katailyst MCP - [ ] Register n8n workflows as tool entities in registry - [ ] Build `nightly-scout` agent in sidecar repo ### Phase 4: Fleet + Demos + Scale (Weeks 9-12) **Goal:** Full fleet management. Skill demos. Team prompt library. - [ ] Create `katailyst-fleet/` repo with roster, task queue, cron - [ ] Build agent observability dashboard - [ ] Build interactive skill demo/preview in CMS - [ ] Build personal prompt library with team voting UI - [ ] Create shareable demo links for skill packs - [ ] Build nursing + social media vertical skill packs with specialized rubrics - [ ] HybridRAG improvements to discover_v2 ### Phase 5: Open Up + Grow (Weeks 13+) **Goal:** Selectively open to partners/public. AI oversight. - [ ] Open selected skill packs to partners - [ ] Register with Anthropic's official plugin directory - [ ] Build AI oversight agents that review other agents' output - [ ] Build content pipeline from research -> publish -> measure - [ ] Forum/community integration for skill sharing - [ ] Long-horizon eval suite (30+ test cases, nightly runs) --- ## Use Cases: What This Enables **For HLT Content Team (Lila + humans):** > "Create a LinkedIn carousel about NCLEX study tips, optimized for the algorithm, using our brand voice" > Agent discovers: social-content skill + channel-playbook-linkedin KB + brand-voice-master style + image-prompting skill + social-platform-intelligence-2026 KB. Composes a carousel with copy, image prompts, and posting schedule. After posting, n8n tracks engagement and feeds results back. Next time, the system knows what worked. 
**For Nursing Curriculum Team:** > "Generate 50 pharmacology questions at the application level for NCLEX prep" > Agent discovers: question-bank-generator skill + learning-objectives-v1 prompt + pharmacology KB + bloom-taxonomy schema + clinical-reasoning style + distractor-quality rubric. Generates questions, scores them against the rubric, auto-rejects low-quality items, logs signals. Over time, the question generator gets better because eval signals improve discovery ranking. **For an External Partner (eventually):** > Installs Katailyst plugin marketplace in Claude Code. Gets access to education and content skill packs. Their Claude Code agent now has deep domain expertise in nursing education that would have taken months to build from scratch. **For the Nightly Scout:** > At 2am, the scout agent checks GitHub trending repos, Anthropic's plugin directory, and community skill repos. Finds 3 new skills relevant to content creation. Downloads them to `incoming/`, creates `DROP.md` with provenance. Runs them against existing rubrics. One outperforms the current `social-content` skill. Flags for human review. If approved, auto-promotes. 
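Every use case above follows the same "discover first" pattern named earlier in this doc: call `discover` with rich intent, fetch the winning capability, then act. A minimal sketch of that loop — `callTool` is a hypothetical MCP client wrapper and the response shapes are assumptions; only the tool names (`discover`, `get_skill_content`) come from this document.

```typescript
// Sketch of the "discover first" agent pattern the use cases share.
// callTool is a hypothetical MCP client wrapper; response shapes are assumed.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<any>;

async function discoverThenAct(callTool: ToolCall, intent: string) {
  // 1. Rich-intent discovery against the registry
  const hits = await callTool("discover", { intent });
  // 2. Pull full content for the top hit
  const top = hits.results?.[0];
  if (!top) return null;
  const skill = await callTool("get_skill_content", { code: top.code });
  // 3. The agent composes its capability packet from the skill + its
  //    graph neighbors, does the work, and feeds signals back.
  return { top, skill };
}
```

The feedback half of the loop (eval signals, metric points, `registry.update`) is what closes BML; this sketch covers only the discovery half.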
--- ## Key External Resources - [Claude Code Plugin Docs](https://code.claude.com/docs/en/plugins) - [Claude Plugin Marketplace Docs](https://code.claude.com/docs/en/plugin-marketplaces) - [Claude Agent SDK Overview](https://platform.claude.com/docs/en/agent-sdk/overview) - [Agent SDK TypeScript](https://platform.claude.com/docs/en/agent-sdk/typescript) - [Agent SDK Python](https://platform.claude.com/docs/en/agent-sdk/python) - [Mastra Framework](https://mastra.ai/) - [Mastra Agent Network vNext](https://mastra.ai/blog/vnext-agent-network) - [n8n MCP Integration](https://www.n8n-mcp.com/) - [n8n as Agentic MCP Hub](https://www.infralovers.com/blog/2026-03-09-n8n-agentic-mcp-hub/) - [Hybrid RAG Guide 2026](https://calmops.com/ai/hybrid-search-rag-complete-guide-2026/) - [GraphRAG vs Vector RAG (Neo4j)](https://neo4j.com/blog/developer/knowledge-graph-vs-vector-rag/) - [Anthropic Official Plugin Directory](https://github.com/anthropics/claude-plugins-official) - [Enterprise Plugin Guide](https://almcorp.com/blog/claude-cowork-plugins-enterprise-guide/) - [Cowork Plugins Blog](https://claude.com/blog/cowork-plugins) --- ## Source: docs/runbooks/content-creator/fnp-article-shipping.md # FNP Article Shipping — First Ship Runbook **Goal:** ship one FNP deep-dive article end-to-end through the new sidecar pipeline (`research → dispatchSpecialist → evaluateArticle → publish`) with full lineage flowing to MasteryPublishing-main for the build-measure-learn loop. 
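The "full lineage" the goal refers to reduces to three fields threaded from brief to published article. A sketch of that payload — the field names and SQL types come from this runbook's migration and publish-route prereqs; the wrapper interface and sample values are assumptions.

```typescript
// Sketch of the lineage payload threaded end-to-end through the pipeline.
// Field names/types come from migration 007 and the publish route;
// the interface name and sample values are illustrative.
interface ArticleLineage {
  brief_id: string;                  // UUID column added by migration 007
  writer_specialist_version: string; // e.g. the specialist that drafted it
  evaluator_score: number;           // NUMERIC(4,2) from the evaluate step
}

const lineage: ArticleLineage = {
  brief_id: "00000000-0000-0000-0000-000000000000",
  writer_specialist_version: "fnp-writer-v1",
  evaluator_score: 8.75,
};
```

Everything below (migration, skill/rubric registration, smoke test) exists to make sure this payload survives each hop intact.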
**Prereqs from Phase 0 / Phase 1 (already shipped in repo):** - `components/articles/*` + `app/resources/layout.tsx` in MasteryPublishing-main with the duplicate navbar removed, breadcrumbs added, sticky offsets corrected - `scripts/007_add_brief_id_lineage.sql` in MasteryPublishing-main (migration ready) - `app/api/publish/route.ts` in MP accepts `brief_id`, `writer_specialist_version`, `evaluator_score` - `domains/articles/draft/specialists/fnp-writer.ts` in sidecar-system (specialist prompt, Opus 4.6) - `domains/articles/evaluate/{rubric,evaluator}.ts` in sidecar-system (9-dimension rubric + evaluator-optimizer) - `app/(apps)/chat/tools/{dispatchSpecialist,evaluateArticle}.ts` in sidecar-system - `app/(apps)/chat/tools/contentEnginePublish.ts` accepts lineage fields - Chat system prompt nudges toward the specialist chain when tools are live ## 1. Apply migration 007 to production Supabase The migration is idempotent (`ADD COLUMN IF NOT EXISTS` + `CREATE INDEX IF NOT EXISTS`). ```bash # From MasteryPublishing-main psql "$DATABASE_URL" -f scripts/007_add_brief_id_lineage.sql ``` Verify: ```sql \d articles -- Expect: brief_id UUID, writer_specialist_version TEXT, evaluator_score NUMERIC(4,2) -- Indexes: idx_articles_brief_id, idx_articles_writer_version ``` ## 2. Register fnp-writer-v1 skill in Katailyst The SKILL.md is staged at `.claude/skills/imports/hlt-content-factory/fnp-writer-v1/SKILL.md`. From `katailyst-1/` repo root: ```bash # Dry run first — prints what would change without writing npx tsx scripts/registry/import/import_staged_skills_to_db.ts \ --root .claude/skills/imports \ --source hlt-content-factory \ --skill fnp-writer-v1 \ --dry-run # Apply npx tsx scripts/registry/import/import_staged_skills_to_db.ts \ --root .claude/skills/imports \ --source hlt-content-factory \ --skill fnp-writer-v1 ``` Verify via MCP: ``` discover({ intent: "FNP specialist writer for long-form articles" }) # Expect skill:fnp-writer-v1 in the top results. ``` ## 3. 
Register rubric:fnp-deep-dive-v1 in Katailyst Not a SKILL.md — rubrics use a direct seed script. From `katailyst-1/` repo root: ```bash # Dry run — prints planned insert + tags without DB writes npx tsx scripts/registry/seed/seed_fnp_content_factory_rubric.ts --dry-run # Apply — idempotent (no-op if content_hash unchanged) npx tsx scripts/registry/seed/seed_fnp_content_factory_rubric.ts ``` Verify via MCP: ``` get_entity({ entity_type: "rubric", code: "fnp-deep-dive-v1" }) ``` ## 4. Link skill <-> rubric in the graph Optional but recommended so `discover` surfaces them together. Via `registry_link` MCP (once the stringify bug in the Claude Code MCP bridge is fixed), or via SQL: ```sql -- Assuming the entity_links table with (from_id, to_id, relation) INSERT INTO entity_links (from_id, to_id, relation, created_by, weight) SELECT s.id, r.id, 'pairs_with', NULL, 1.0 FROM registry_entities s, registry_entities r WHERE s.code = 'fnp-writer-v1' AND s.entity_type = 'skill' AND r.code = 'fnp-deep-dive-v1' AND r.entity_type = 'rubric' ON CONFLICT DO NOTHING; ``` Plus the reverse direction for symmetric discoverability. ## 5. Confirm sidecar chat tools are live In `sidecar-system` repo: ```bash pnpm typecheck # should pass cleanly pnpm lint # 0 errors (2 pre-existing warnings OK) pnpm dev # start sidecar on port 3000 or whatever is configured # Open the chat UI. Confirm these tools appear in the tool picker: # - dispatchSpecialist # - evaluateArticle # - contentEnginePublish (requires CONTENT_ENGINE_PUBLISH_KEY env var) ``` ## 6. End-to-end smoke test: ship one FNP deep-dive In the sidecar chat (flexibility-first — the agent may vary the order): ``` User: "Write an FNP deep-dive on diagnosing asthma in a 6-year-old, targeting AANP exam prep. Research via Katailyst first. Use dispatchSpecialist and grade with evaluateArticle before publishing to content_engine." ``` Expected agent trajectory: 1. 
**Research** — multiple parallel `discover` calls against Katailyst (brand voice, exam blueprint, FNP product KB, adjacent asthma KBs); optional Firecrawl of top-5 SERP; optional qbank rationale pull if a qbank lookup tool is wired; SEO/PAA harvest. 2. **Brief compose** — an inline brief paragraph covering persona (28-year-old FNP student 8 weeks out), emotional anchor (fear of pediatric pharmacology), thesis, key sections, interactive placement, CTA, primary keyword. 3. **`dispatchSpecialist`** — `product_slug="fnp"`, `content_type="deep-dive"`, `topic="Diagnosing pediatric asthma for FNP boards"`, brief, research_context. Returns `{ article, specialist_version: "fnp-writer-v1", cycle: 1 }`. 4. **`evaluateArticle`** — `rubric_id="fnp_deep_dive_v1"`, article, research_context. Returns EvalReport with weighted_average + pass_result. 5. **If `revise`** — agent calls `dispatchSpecialist` again with `revision_directive` from the report. Max 3 cycles. 6. **`contentEnginePublish`** — `status="draft"`, `writer_specialist_version="fnp-writer-v1"`, `evaluator_score=<weighted_average>`. Optionally a `brief_id` if the agent registered a ContentBrief. 7. **Verify live** — response contains `https://hltmastery.com/nursing/resources/fnp/<slug>`. Visit it. ## 7. Verify lineage landed in Supabase ```sql SELECT id, slug, brief_id, writer_specialist_version, evaluator_score, created_at FROM articles WHERE slug LIKE '%asthma%' ORDER BY created_at DESC LIMIT 5; ``` `writer_specialist_version` should be `fnp-writer-v1`. `evaluator_score` should be a number between 0 and 10. ## 8. 
QA checklist before promoting status to 'published' - [ ] Header/breadcrumb render correctly at `hltmastery.com/nursing/resources/fnp/<slug>` - [ ] Mobile layout clean (Framer parent is mobile-first) - [ ] Hero image loads via Cloudinary - [ ] At least one interactive block present (clinical-vignette, qbank-embed, ai-exercise, reflection-prompt, mnemonic-card) - [ ] Citations list at article end, inline `[n]` references resolve - [ ] Meta title <= 60 chars, meta description 140-160 chars - [ ] No banned phrases present (`dive deep`, `unlock your potential`, `game-changer`, etc.) - [ ] Author is "Victoria" - [ ] `evaluator_score` >= 8.5 (rubric pass threshold) When all green, flip `status` to `published` in Supabase (or re-publish via the sidecar with `status: "published"` and explicit user approval). ## 9. Open items (follow-ups, not blocking first ship) - Apply `registry_link` bidirectionally between `skill:fnp-writer-v1` and `rubric:fnp-deep-dive-v1` via MCP once the Claude Code bridge serialization bug is fixed - Add specialist variants for `nclex-rn`, `pmhnp`, `teas`, `asvab`, `dat`, `pance`, `hesi` (template from `fnp-writer.ts`) - Wire the ContentBrief entity so `brief_id` is a real Katailyst entity id, not null - Set up the article_events beacon + nightly rollup (Phase 4 of the plan) - Build `/api/qbank/question/[id]` proxy to HLT partner API so the `qbank-embed` block renders live questions --- **Quality gate:** per the shipping-milestone decision ("Phase 0 + one FNP article"), do not start Phase 2 (interactive blocks) until one real FNP article is live and the lineage has been verified in Supabase. The point of the first ship is to prove the loop before extending the surface. --- ## Source: docs/runbooks/content-creator/sidecar-to-publish.md # Content Creator Pipeline — Sidecar → Katailyst → Publish **Canonical end-to-end contract for the content-creation pipeline.** Last verified: 2026-04-16. 
Registry entries (canonical, do not duplicate): - `operational_log:sidecar-system` — "Primary content production platform. Domain-specific chat interfaces with 8 configs (Article Factory, Social Command, Email Ops, Metrics...)" - `operational_log:mastery-publishing` — "Long-form publishing engine. Richer content display replacing older v0 naming. Articles published via Framer pipeline land here." - `hub:hub-article` — routes to `playbook:make-article`, `playbook:blog-production`, `bundle:blog-writing-kit` - `hub:hub-social`, `hub:hub-email`, `hub:hub-copywriting` — sibling hubs the sidecar uses per domain - `kb:framer-cms-integration` — Framer CMS integration playbook (weight 0.85 under hub-article) - `kb:content-creation-philosophy` — governs HOW agents approach creation --- ## The pipeline ``` ┌─────────────────────────────────────┐ │ Operator (the user) │ │ "Write a career guide on IA nursing" │ └───────────────┬─────────────────────┘ │ prompt ▼ ┌─────────────────────────────────────────────────────────┐ │ sidecar-system (separate repo, deployed to Vercel) │ │ NEXT_PUBLIC_SIDECAR_DOMAIN in {articles, social, │ │ email, metrics, ads, education, multimedia, template}│ │ Uses Vercel AI SDK + Anthropic Claude Opus 4.6 │ │ MCP client: CATALYST_MCP_URL = https://katailyst.com/mcp│ │ Auth: CATALYST_MCP_TOKEN (PAT issued from Katailyst) │ └───────────────┬─────────────────────────────────────────┘ │ MCP tool calls ▼ ┌───────────────────────────────────────────────────────────────┐ │ katailyst-1 MCP surface (this repo, app/mcp/route.ts) │ │ Tools the sidecar depends on — all live today: │ │ │ │ discovery-read/ │ │ discover(intent, audience, goals, constraints, ...) │ │ → ranked KBs, content_types, rubrics, brand voice │ │ get_entity(entity_type, code) │ │ get_skill_content(code) │ │ traverse(ref, depth, link_types) │ │ list_entities(...) │ │ search_tags(...) 
│ │ │ │ hosted-execution/ │ │ tool_search / tool_describe / tool_execute │ │ → dispatches Framer, Marketo, v0, Lovable, Replit, │ │ Supadata, Metabase, Localytics, Pipedream targets │ │ │ │ registry-read/ │ │ capabilities / health / agent_context / memory_query │ │ artifact_body (fetch skill markdown, playbook steps, etc.)│ │ │ │ registry-write/ (PAT must carry authoring scope) │ │ create_entity, update_entity, add_revision, link_entities │ │ │ │ delivery-schedule/ │ │ schedule, promote, targets_list, targets_discover │ │ │ │ deliberation/, marketo/, lists/ │ └───────────────┬───────────────────────────────────────────────┘ │ ├───── POST /api/publish ──────┐ ▼ │ ┌────────────────────────────────┐ │ │ content-engine │ │ │ (MasteryPublishing, separate │ │ │ repo at v0-next-js-content- │ │ │ engine.vercel.app) │ │ │ Proxied via Cloudflare Worker │ │ │ to hltmastery.com/nursing/ │ │ │ resources/{product}/{slug} │ │ └────────────────────────────────┘ │ ▼ ┌─────────────────────────────┐ │ Framer CMS (Resources) │ │ framer-sync-articles.ts │ │ writes draft items; │ │ production_locked flag │ │ gates promote-to-published │ └─────────────────────────────┘ ``` Two sinks, one source of truth. The same article entity in the Katailyst registry can publish to BOTH the content engine (→ hltmastery.com/nursing/resources) AND Framer CMS (→ HLTcorp Framer site) without duplicating the content body. 
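The "two sinks, one source of truth" shape can be sketched as a fan-out keyed by `katailyst_id`, which is what makes replays safe (the idempotency note under Sink A below). The sink internals here are stubs for illustration — the real sinks are the content engine's `POST /api/publish` and the Framer CMS sync script.

```javascript
// Each sink stores articles keyed by katailyst_id, so republishing the same
// registry entity overwrites in place instead of duplicating.
function makeSink() {
  const items = new Map()
  return {
    size: () => items.size,
    upsert({ key, body }) { items.set(key, body) },
  }
}

// Fan one registry article out to every configured sink without copying
// the content body into a second source of truth.
function fanOut(article, sinks) {
  for (const sink of sinks) {
    sink.upsert({ key: article.katailyst_id, body: article.body })
  }
}
```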
--- ## Sidecar ↔ Katailyst MCP handshake ### Env vars (on the sidecar side) | Var | Value | Required | | ---------------------------- | ---------------------------------------------------------------------------------------------- | -------- | | `NEXT_PUBLIC_SIDECAR_DOMAIN` | one of: `articles`, `social`, `email`, `metrics`, `ads`, `education`, `multimedia`, `template` | yes | | `CATALYST_MCP_URL` | `https://www.katailyst.com/mcp` | yes | | `CATALYST_MCP_TOKEN` | PAT issued from Katailyst Settings → MCP tokens | yes | | `ANTHROPIC_API_KEY` | for Claude Opus 4.6 | yes | | `OPENAI_API_KEY` | for GPT-5 (fallback) | yes | | `DATABASE_URL` | sidecar's own Supabase (separate from Katailyst) | yes | | `BETTER_AUTH_SECRET` | auth secret | yes | ### Env vars (on the katailyst-1 side) | Var | Value | Required | Notes | | -------------------------- | ---------------------------------------------------------- | -------- | ------------------------------------------------------------- | | `CONTENT_ENGINE_API_URL` | defaults to `https://v0-next-js-content-engine.vercel.app` | no | override for staging | | `KATAILYST_API_KEY` | same value as the content engine's `KATAILYST_API_KEY` | yes | enforced by `lib/integrations/content-engine/client.ts:66-75` | | `FRAMER_API_KEY` | from Framer Server API | yes | for Framer CMS writes; account rows in `integration_accounts` | | Framer `production_locked` | boolean flag on the Framer site record | yes | if `true`, `framer-sync-articles.ts` refuses to write | ### Discovery flow (what the sidecar calls) The transcript at [../../planning/active/home-redesign.md §"sidecar-system chat empty state"](../../planning/active/home-redesign.md) shows a real end-to-end Iowa-nursing article task. The canonical shape: 1. **discover()** — 3+ parallel calls with different angles (content shape, brand voice, anti-patterns) 2. **get_entity(content_type, code)** — fetches the content type's lint/recipe/kb refs. Example: `article-career-guide` 3. 
**get_skill_content(brand_voice_code)** — pulls the voice KB body. Example: `hlt-brand-voice-nclex` 4. **tool_execute('framer.list_articles', { product_slug })** — dedup check against existing Framer items (23 NCLEX-RN items today) 5. **traverse(hub_ref, depth=2)** — grab the full hub toolkit when a hub surfaces in discover (e.g., `hub:hub-article`) 6. Sidecar drafts the article using the above context 7. **POST /api/mcp** → `tool_execute('content_engine.publish', payload)` OR direct HTTP to `lib/integrations/content-engine/client.ts:publishArticle()` --- ## Publish path (two sinks) ### Sink A — Content Engine (hltmastery.com) Entry points: - **HTTP**: `POST /api/publish` on the content-engine repo (contract in `lib/integrations/content-engine/contract.ts`) - **Client**: `publishArticle()` in `lib/integrations/content-engine/client.ts` - **Transformer**: `transformToContentEnginePayload()` in `lib/integrations/content-engine/from-katailyst-article.ts` - **Batch**: `scripts/integrations/content-engine-sync-articles.ts` (CLI with `--dry-run`, `--execute`, `--publish`, `--codes`, `--limit`) Idempotency: upstream uses `katailyst_id` as the primary key; safe to replay. Public URL: `https://hltmastery.com/nursing/resources/{product_slug}/{slug}` (Cloudflare-proxied to `https://v0-next-js-content-engine.vercel.app/resources/{product}/{slug}`). ### Sink B — Framer CMS (HLTcorp) Entry points: - **Batch**: `scripts/integrations/framer-sync-articles.ts` (CLI with same flags) - **Safety**: checks `production_locked` before writing; never calls `deploy()`; `publish()` creates preview only - **Default status**: "New AI Draft" - **Batch size**: 50 per addItems call Two sinks share helpers — `framer-sync-articles.ts` exports helpers reused by `from-katailyst-article.ts` (per commit `da88980e`). --- ## Open gaps + follow-ups 1. **No MCP tool wrapper for `content_engine.publish`** — sidecar today calls CLI scripts or direct HTTP. 
Consider registering via `createMarketoHandlers`-style handler in `lib/mcp/handlers/` so `tool_execute('content_engine.publish', ...)` works through the normal hosted-execution lane. 2. **Nightly mirror cron is not wired** — plan-v2.md §PR #8 specifies `app/api/cron/asset-factory/daily/route.ts` to mirror `interactive_artifacts` → `assets`. File does not exist yet. 3. **Replit integration has `warning` readiness** (per `registry_health()` on 2026-04-16). Account rows missing. Not blocking articles, but blocks Replit build_url/deploy tools. 4. **Asset-factory HTTP route is still legacy** — `app/api/interactive/publish/route.ts` is the live endpoint; the Phase 1 cleanup deleted the dormant `route.next.ts` scaffolding. When the Phase 2 cron + MCP handler land, revisit to decide whether to fold the asset-factory PublishRequest schema into the live route. --- ## Smoke tests ```bash # Registry + integrations healthy pnpm tsx -e "$(cat <<'EOF' import { createMcpClient } from '@/lib/mcp/client' const c = await createMcpClient(process.env.CATALYST_MCP_URL!, process.env.CATALYST_MCP_TOKEN!) 
console.log(await c.call('registry.health')) EOF )" # Dry-run content-engine sync for one article CONTENT_ENGINE_API_URL=$CONTENT_ENGINE_API_URL \ KATAILYST_API_KEY=$KATAILYST_API_KEY \ pnpm tsx scripts/integrations/content-engine-sync-articles.ts \ --org-code hlt --codes article-iowa-nursing-career-guide # Dry-run Framer sync pnpm tsx scripts/integrations/framer-sync-articles.ts \ --org-code hlt --codes article-iowa-nursing-career-guide # Audit Framer CMS (read-only) pnpm tsx scripts/integrations/framer-cms-audit.ts ``` --- ## Cross-references - Pipeline strategy: [`../../planning/active/asset-factory/plan-v2.md`](../../planning/active/asset-factory/plan-v2.md) - Home surfacing plan: [`../../planning/active/home-redesign.md`](../../planning/active/home-redesign.md) - Framer CMS integration KB: `kb:framer-cms-integration` (via MCP `get_skill_content`) - Content philosophy KB: `kb:content-creation-philosophy` (via MCP `get_skill_content`) - Article hub: `hub:hub-article` (via MCP `traverse` for full toolkit) - Upstream contract: [`../../../lib/integrations/content-engine/contract.ts`](../../../lib/integrations/content-engine/contract.ts) - Sidecar-system repo: `github.com/Awhitter/sidecar-system` (separate) --- ## Source: docs/runbooks/factory/framer-cms-content-pipeline.md # Framer CMS Content Pipeline Runbook **Status:** Active **Updated:** 2026-04-06 **Scope:** HLT Mastery content operations (Articles, Resources, Stories) **Audience:** Content operators, agents, automated factory workflows ## Purpose Provide a complete, autonomous operational guide for managing the Framer CMS content pipeline that powers HLT Mastery (NCLEX-RN, FNP, AGNP articles and resources). 
This runbook covers: - Credential management and Supabase vault retrieval - Framer API connection and authentication - CMS collection schema reference (field IDs, enum values, data types) - CRUD operations (create, read, update, delete articles) - Framer rich text HTML formatting rules (critical gotchas) - Content article templates and structure patterns - Cross-linking and UTM tracking patterns - Common errors and troubleshooting procedures This is the single source of truth for Framer CMS operations. Any content agent or factory process that touches HLT Mastery articles must use this runbook. ## Quick Start (2 minutes) For routine article creates/updates: 1. **Retrieve API key** from Supabase vault (see the [Credential Retrieval](#credential-retrieval) section) 2. **Connect to Framer** using the connection pattern (see the [Connection Pattern](#connection-pattern) section) 3. **Use the CRUD Templates** (see the [API Operations](#api-operations) section) — copy the relevant code block and substitute field values 4. **Validate HTML** against the [Framer Rich Text HTML Rules](#framer-rich-text-html-rules) before sending 5. **Test in draft** — always create articles with `Status: Draft` first ## Credential Retrieval ### Supabase Vault Access (Required) The Framer API key is encrypted and stored in Supabase `vault.decrypted_secrets`. 
**Connection String:** ``` postgres://postgres.exuervuuyjygnihansgl:tWSxGOnU6mvSyH8d@aws-1-us-east-1.pooler.supabase.com:5432/postgres ``` **Vault Key Name:** ``` org/hlt/framer/hltmastery/api-key ``` **Retrieval Code (Node.js):** ```javascript import pg from 'pg' const { Client: PgClient } = pg const pgClient = new PgClient({ connectionString: 'postgres://postgres.exuervuuyjygnihansgl:tWSxGOnU6mvSyH8d@aws-1-us-east-1.pooler.supabase.com:5432/postgres', ssl: { rejectUnauthorized: false }, }) await pgClient.connect() const vaultRes = await pgClient.query( "SELECT decrypted_secret FROM vault.decrypted_secrets WHERE name = 'org/hlt/framer/hltmastery/api-key' LIMIT 1", ) const apiKey = vaultRes.rows[0]?.decrypted_secret if (!apiKey) { throw new Error('Failed to retrieve Framer API key from vault') } await pgClient.end() console.log('API Key retrieved successfully') ``` **Important Security Notes:** - Never log or print the API key - Always use `ssl: { rejectUnauthorized: false }` for Supabase pooler connections - Close the connection immediately after retrieval - Do not embed the key in code — fetch it at runtime - Vault key name is case-sensitive: `org/hlt/framer/hltmastery/api-key` ## Connection Pattern ### Framer API Connection (WebSocket-based) ```javascript // 1. Retrieve API key (see section above) // const apiKey = ... // 2. Import framer-api const { connect } = await import('framer-api') // 3. Connect to the Framer project const PROJECT_URL = 'https://framer.com/projects/HLTMastery-com--montTdTggA8zwlRhAdOf-4L1i6' const framer = await connect(PROJECT_URL, apiKey) // 4. Get collections const collections = await framer.getCollections() console.log('Collections loaded:', collections.length) // 5. Find the Articles collection const articlesCollection = collections.find((c) => c.name === 'Articles') if (!articlesCollection) { throw new Error('Articles collection not found') } // 6. 
Get the items collection interface const articles = articlesCollection.items() ``` **Connection Checklist:** - [ ] API key retrieved and verified (not null, not empty) - [ ] `framer-api` module installed: `npm install framer-api` - [ ] Project URL is exact match: `https://framer.com/projects/HLTMastery-com--montTdTggA8zwlRhAdOf-4L1i6` - [ ] Collection lookup uses `.name === 'Articles'`, NOT `.slug` - [ ] Connection established without errors before attempting queries ### Connection Troubleshooting | Error | Cause | Fix | | --------------------------------- | ---------------------------------- | ----------------------------------------------------------------------------------- | | `Cannot find module 'framer-api'` | Package not installed | `npm install framer-api` | | `Failed to authenticate` | Invalid API key | Verify vault retrieval; check key has no whitespace | | `Project not found` | Wrong project URL | Use exact: `https://framer.com/projects/HLTMastery-com--montTdTggA8zwlRhAdOf-4L1i6` | | `Articles collection not found` | Lookup by slug instead of name | Use `.name === 'Articles'` not `.slug` | | `Connection timeout` | Network issue or vault unreachable | Check internet connection; verify Supabase region (AWS us-east-1) | ## CMS Schema Reference ### Collections Overview | Collection | ID | Purpose | Item Count | | ------------------ | ----------- | ----------------------------------------- | ---------- | | Articles | `oRXr591Ze` | Main content pieces (NCLEX-RN, FNP, AGNP) | 29 | | Resources | `G3ehViYD4` | Learning materials and references | 30 | | Stories | `hBloTGVrW` | User success stories | 30 | | Authors | `zCTMKgSLY` | Author profiles | 3 | | Tags | `QaWxDCWEU` | Content tags | N/A | | Resources (Legacy) | `IXpL5m8OO` | Deprecated resource collection | 13 | | Framer Blog | `Ws_NbkR9T` | Framer team blog content | 3 | ### Articles Collection Field Schema All field IDs are Framer-generated UUIDs. Do not change them — they are immutable references. 
```javascript const FIELDS = { // Identifiers Title: 'bjAEz1hEH', // string; required slug: 'slug', // reserved; auto-generated from title // Classification Category: 'fA8uLUaZU', // enum; required; see ENUM values below Product: 'E7r5IMHYu', // enum; required — FNP=kVoqGt5bS, AGNP=g00JPHLpN, NCLEX-RN=kjzojxhjv Vertical: 'E73Zf_6Zx', // enum; required — NursePractitioner=tvVwSTs71, Nursing=BjkvQSfQU ContentType: 'iPk2JtByG', // enum; optional // Content Subtitle: 'xk1OOvgPm', // string; optional Excerpt: 'zSteB3ctN', // string; optional; preview text Content: 'gDIxMWxw2', // formattedText; main body HTML Image: 'rQcV7vlpH', // image; optional; null or string URL // Metadata Date: 'klsejpDiZ', // date; publication/update date (YYYY-MM-DD) Status: 'ztCuodvw8', // enum; Draft or Published Featured: 'RfPd2TSgE', // boolean; pin to top Author: 'wbj8P2oSe', // collectionReference; points to Authors collection } ``` ### Enum Values Reference Enum values are immutable Framer IDs. Use exact IDs when setting enum fields. 
```javascript const ENUM = { // Status Draft: 'sb42ZflLF', Published: 'NFAA1eUab', // Note: exact name may vary; verify if writing new status // Category (subcategory of content) TestTaking: 'GXa_qfHAB', TipsAndTricks: 'Y2hmykInU', HighYieldConcept: 'CIFO9HDu2', CommonChallenges: 'yqWjPaAGw', HowTo: 'vYJRJiVGo', // Product (exam/certification) 'NCLEX-RN': 'kjzojxhjv', 'NCLEX-PN': 'ybwI6nwtW', FNP: 'kVoqGt5bS', TEAS: 'tJPi9MfvW', 'HESI-A2': 'MPaEWLZk1', PANCE: 'fbiKTErJy', PTCB: 'kdOi7bdLD', CST: 'ivUlMBuv1', ASVAB: 'qd_fu34E6', DAT: 'wLJfKHGsG', INBDE: 'RZ1D7skZF', NBDHE: 'hrGUZoiOI', ACLS: 'Su1lKHF6d', ECG: 'qj13Dv1UF', AGNP: 'g00JPHLpN', // NOTE: CMS display name has trailing space "AGNP " // Vertical (specialty area) 'HLT Mastery': 'C_4W1uLHe', Nursing: 'BjkvQSfQU', Dental: 'S3lGdqgxD', 'Nurse Practitioner': 'tvVwSTs71', Military: 'lPslmzzt4', // Content Type Standard: 'NFAA1eUab', 'how-to': 'vYJRJiVGo', listicle: 'HfvaXAhRp', } ``` **Important Enum Notes:** - ENUM values are case-sensitive and must be exact UUIDs - FNP articles: `Product: FNP (kVoqGt5bS)`, `Vertical: Nurse Practitioner (tvVwSTs71)` - AGNP articles: `Product: AGNP (g00JPHLpN)`, `Vertical: Nurse Practitioner (tvVwSTs71)` - NCLEX-RN articles: `Product: NCLEX-RN (kjzojxhjv)`, `Vertical: Nursing (BjkvQSfQU)` - When creating a new article, always use `Draft` status first for testing - Do not invent enum values — use only the values listed above ### Author Reference ```javascript const AUTHORS = { Kristin: 'wTNBcwCNI', BenOConnor: 'D8sWxJ5eZ', Catherine: 'eKdF9yBKP', } ``` When setting the `Author` field, use the exact author ID from this table. 
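The Product/Vertical pairings in the enum notes above are easy to get wrong at create time, so a guard like the following can fail fast before `addItems` writes a mismatched draft. The guard itself is illustrative (not part of the runbook's API); the IDs restate the enum tables above.

```javascript
// Known-good Product → Vertical pairings from the enum notes above.
const PRODUCT_VERTICAL = {
  kVoqGt5bS: 'tvVwSTs71', // FNP      → Nurse Practitioner
  g00JPHLpN: 'tvVwSTs71', // AGNP     → Nurse Practitioner
  kjzojxhjv: 'BjkvQSfQU', // NCLEX-RN → Nursing
}

// Throws if a known product is paired with the wrong vertical; unknown
// products pass through (this table only covers the pairings documented here).
function assertPairing(productId, verticalId) {
  const expected = PRODUCT_VERTICAL[productId]
  if (expected && expected !== verticalId) {
    throw new Error(`Product ${productId} expects vertical ${expected}, got ${verticalId}`)
  }
  return true
}
```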
## API Operations ### Read Operations #### Get All Items (Articles) ```javascript const articles = articlesCollection.items() const allItems = await articles.getItems() console.log(`Loaded ${allItems.length} articles`) allItems.forEach((item) => { console.log(`- ${item.fieldData[FIELDS.Title].value}: ${item.slug}`) }) ``` **Output Structure:** Each item has: - `.id` — Framer's immutable item ID (UUID) - `.slug` — URL-friendly identifier (generated from title) - `.fieldData` — object mapping field IDs to values - Field values can be: - Plain values: `"string value"` or `true` - Object form: `{ type: "string", value: "..." }` - **Always check both forms** — Framer may return either #### Get Single Item by Slug ```javascript const items = await articles.getItems() const targetSlug = 'my-article-slug' const item = items.find((i) => i.slug === targetSlug) if (!item) { console.error(`Article not found: ${targetSlug}`) return } // Access field data const title = item.fieldData[FIELDS.Title] const content = item.fieldData[FIELDS.Content] // Handle both plain and object forms const titleValue = typeof title === 'string' ? title : title?.value const contentValue = typeof content === 'string' ? content : content?.value console.log('Title:', titleValue) console.log('Content length:', contentValue?.length || 0) ``` #### Get Specific Fields ```javascript const items = await articles.getItems() const item = items.find((i) => i.slug === 'my-slug') // Extract field values safely function getFieldValue(item, fieldId) { const raw = item.fieldData[fieldId] return typeof raw === 'string' ? 
raw : raw?.value } const title = getFieldValue(item, FIELDS.Title) const status = getFieldValue(item, FIELDS.Status) const author = getFieldValue(item, FIELDS.Author) const image = getFieldValue(item, FIELDS.Image) const content = getFieldValue(item, FIELDS.Content) console.log('Title:', title) console.log('Status:', status) console.log('Author ID:', author) console.log('Image URL:', image || '(no image)') console.log('Content HTML:', content.substring(0, 100) + '...') ``` ### Create Operations #### Create a New Article (IMPORTANT: No ID Field) ```javascript const articles = articlesCollection.items() const newArticle = await articles.addItems([ { slug: 'medication-dosage-calculations', fieldData: { // String fields [FIELDS.Title]: { type: 'string', value: 'Mastering Medication Dosage Calculations', }, [FIELDS.Subtitle]: { type: 'string', value: 'A practical guide to NCLEX exam questions', }, [FIELDS.Excerpt]: { type: 'string', value: 'Learn the step-by-step process for solving medication calculation problems...', }, // Formatted text (HTML) [FIELDS.Content]: { type: 'formattedText', value: ` <p>Opening paragraph introducing the topic.</p> <h3>Section Title</h3> <p>Content here.</p> <blockquote> <p><strong>Key Insight:</strong> Always verify your calculation using dimensional analysis.</p> </blockquote> <h4>Subsection</h4> <table> <tr> <th>Drug</th> <th>Typical Dose</th> </tr> <tr> <td>Acetaminophen</td> <td>650-1000 mg</td> </tr> </table> <h4>Keep Reading</h4> <ul dir="auto"> <li data-preset-tag="p"><p><a href="https://hltmastery.com/nursing/nclex-rn/resources/iv-rate-calculations">IV Rate Calculations</a></p></li> </ul> <h4>Final Word</h4> <p>Practice is key to mastery.</p> `, }, // Enum fields (use exact enum IDs) [FIELDS.Category]: { type: 'enum', value: ENUM.HighYieldConcept, }, [FIELDS.Product]: { type: 'enum', value: ENUM['NCLEX-RN'], // Product enum ID (kjzojxhjv) — ENUM.Nursing is a Vertical, not a Product }, [FIELDS.Vertical]: { type: 'enum', value: ENUM.Nursing, // NCLEX-RN pairs with Vertical: Nursing (BjkvQSfQU) }, [FIELDS.Status]: { type: 'enum', value: ENUM.Draft, // Always start in Draft }, // Boolean fields 
[FIELDS.Featured]: { type: 'boolean', value: false, }, // Date fields (YYYY-MM-DD format) [FIELDS.Date]: { type: 'date', value: '2026-04-06', }, // Collection reference (author ID) [FIELDS.Author]: { type: 'collectionReference', value: AUTHORS.Kristin, }, // Optional: Image (null or absolute URL) [FIELDS.Image]: { type: 'image', value: null, // or: 'https://example.com/image.jpg' }, }, }, ]) console.log('Article created:', newArticle) console.log('Item ID:', newArticle[0].id) console.log('Slug:', newArticle[0].slug) ``` **Critical Notes on Creation:** - **DO NOT pass an `id` field** — Framer auto-generates IDs. Passing a pre-generated UUID causes "No item found" errors - The slug must be unique within the collection - Always use `Draft` status initially - Use enum IDs (not names) for Category, Product, Status - Date format must be `YYYY-MM-DD` - Image field must be `null` or a valid absolute HTTPS URL - AGNP product enum DOES exist: `g00JPHLpN` — always set Product for AGNP articles ### Update Operations #### Update an Existing Article (Read-Modify-Rewrite Pattern) Framer's API requires a read-delete-rewrite flow (no direct patch): ```javascript const articles = articlesCollection.items() // 1. Read the current article const items = await articles.getItems() const targetSlug = 'medication-dosage-calculations' const item = items.find((i) => i.slug === targetSlug) if (!item) { throw new Error(`Article not found: ${targetSlug}`) } // 2. Extract current field values (handle both plain and object forms) function extractValue(raw) { return typeof raw === 'string' ? 
raw : raw?.value } const currentData = { title: extractValue(item.fieldData[FIELDS.Title]), subtitle: extractValue(item.fieldData[FIELDS.Subtitle]), excerpt: extractValue(item.fieldData[FIELDS.Excerpt]), content: extractValue(item.fieldData[FIELDS.Content]), category: extractValue(item.fieldData[FIELDS.Category]), product: extractValue(item.fieldData[FIELDS.Product]), status: extractValue(item.fieldData[FIELDS.Status]), author: extractValue(item.fieldData[FIELDS.Author]), date: extractValue(item.fieldData[FIELDS.Date]), image: extractValue(item.fieldData[FIELDS.Image]), featured: extractValue(item.fieldData[FIELDS.Featured]), } // ⚠️ SEE WARNING BELOW — remove/re-add is EXTREMELY DANGEROUS ``` > **⛔ CRITICAL WARNING: The remove/re-add update pattern is EXTREMELY DANGEROUS and has caused repeated data loss.** > > **What happens:** `removeItems()` deletes the article immediately and irreversibly. If the subsequent `addItems()` fails (which it frequently does due to frozen objects, enum validation errors, or Image field type mismatches), the article is gone forever. > > **Known failure modes:** > > - The `Image` field (`rQcV7vlpH`) returns a frozen object from `getItems()` that fails Framer's `(null | string)` type validator on re-add. > - Enum fields may return nested objects `{type: "enum", value: {...}}` instead of flat `{type: "enum", value: "id"}`, causing "Expected a valid enum case, got: [object Object]" errors. > - The `Category` field has failed reconstruction in every attempt. > > **Recommended approach:** Do NOT update existing articles via the Server API. Instead: > > 1. **For tag/metadata fixes:** Delete the old article, then create a brand-new article with all fieldData set explicitly from known-good enum IDs (not reconstructed from read-back data). > 2. **For content updates:** Same approach — delete old, create new with full explicit fieldData. > 3. 
**Always have the full article content available before deleting.** Never delete an article you cannot fully reconstruct. > 4. **Never reconstruct fieldData from `getItems()` return values.** Always use hardcoded, known-good enum IDs. ### Delete Operations #### Delete an Article by Slug ```javascript const articles = articlesCollection.items() async function deleteArticle(targetSlug) { const items = await articles.getItems() const item = items.find((i) => i.slug === targetSlug) if (!item) { throw new Error(`Article not found: ${targetSlug}`) } // Confirm before deletion (this is destructive; Title may be a plain string or an object) console.log(`Deleting article: "${item.fieldData[FIELDS.Title]?.value ?? item.fieldData[FIELDS.Title]}"`) console.log(`Slug: ${item.slug}`) // Perform deletion await articles.removeItems([item.id]) console.log('Article deleted successfully') } // Usage await deleteArticle('medication-dosage-calculations') ``` **Delete Safety Checklist:** - [ ] Verified correct article slug - [ ] Checked that article is not featured or pinned - [ ] Confirmed no incoming cross-links from other articles - [ ] Saved a backup of the article content (if needed) - [ ] Authorized by content owner before proceeding ## Framer Rich Text HTML Rules ### Critical: Supported Tags Only Framer Rich Text has a strict whitelist of supported HTML tags. Unsupported tags are silently stripped or cause rendering issues. #### Supported Tags (100% Safe) ```html <!-- Block-level --> <blockquote> <p>Quote text</p> </blockquote> <table> <tr> <th>Header</th> <th>Column 2</th> </tr> <tr> <td>Data</td> <td>More data</td> </tr> </table> <!-- Headings (h2–h6) --> <h2>Large Heading (use sparingly — very large in rich text)</h2> <h3>Section Heading (PREFERRED for article sections)</h3> <h4>Subsection Heading (PREFERRED for subsections)</h4> <h5>Minor Heading</h5> <h6>Tiny Heading</h6> <!-- Paragraphs and inline --> <p> Paragraph text with <strong>bold</strong>, <em>italic</em>, and <a href="https://example.com">links</a>.
</p> <!-- Lists --> <ul dir="auto"> <li data-preset-tag="p"><p>Bullet item</p></li> <li data-preset-tag="p"><p>Another item</p></li> </ul> <ol dir="auto"> <li data-preset-tag="p"><p>Numbered item 1</p></li> <li data-preset-tag="p"><p>Numbered item 2</p></li> </ol> <!-- Inline code (block code is NOT supported) --> <p>Use <code>const x = 42;</code> for inline code.</p> <!-- Images --> <img src="https://example.com/image.jpg" alt="Description" /> ``` #### Unsupported / Problematic Tags (DON'T USE) | Tag | What Happens | What to Do Instead | | ---------------------- | ---------------------------------------- | ------------------------------------------------------ | | `<hr>` | Stripped silently | Use empty paragraphs or `<blockquote>` as visual break | | `<div>`, `<span>` | Stripped; content survives | Wrap in `<p>` tags instead | | `<br>` inside `<p>` | Converted to `<br><br>` (double spacing) | Use separate `<p>` tags instead | | `<pre><code>` | Not supported (blocks rendering) | Use `<code>` for inline only | | `<style>`, inline CSS | Stripped completely | Use Framer component styling | | Relative URLs in `<a>` | Stripped to bare text | Always use absolute `https://...` URLs | | `<form>`, `<input>` | Not supported | Not applicable for content | ### Heading Size Rules **Critical: h2 is VERY LARGE in Framer Rich Text** - **h2** → Renders massive (subtitle-size) — avoid for article body - **h3** → Preferred for main section headings (good size) - **h4** → Preferred for subsection headings (smaller, still clear) - **h5, h6** → Rarely needed; hard to read **Pattern:** ```html <h3>Main Section</h3> <p>Section content...</p> <h4>Subsection</h4> <p>Subsection content...</p> ``` ### List Formatting Rules Lists must use `dir="auto"` and wrap items in `<li data-preset-tag="p">` with `<p>` tags: ```html <!-- CORRECT: Unordered List --> <ul dir="auto"> <li data-preset-tag="p"><p>First item</p></li> <li data-preset-tag="p"><p>Second item</p></li> <li data-preset-tag="p"> 
<p>Nested items:</p> <ul dir="auto"> <li data-preset-tag="p"><p>Sub-item 1</p></li> <li data-preset-tag="p"><p>Sub-item 2</p></li> </ul> </li> </ul> <!-- CORRECT: Ordered List --> <ol dir="auto"> <li data-preset-tag="p"><p>Step 1: Do this first</p></li> <li data-preset-tag="p"><p>Step 2: Then this</p></li> <li data-preset-tag="p"><p>Step 3: Finally this</p></li> </ol> ``` **List Gotchas:** - Missing `dir="auto"` causes rendering issues - Missing `data-preset-tag="p"` causes indentation problems - Directly typing `* item` without HTML structure is converted to broken lists - Nested lists must follow the same `<ul dir="auto"><li><p>` structure ### Link Rules (CRITICAL: Absolute URLs Only) **Framer strips all relative URLs.** Only absolute `https://` URLs survive. ```html <!-- WORKS: Absolute URL --> <a href="https://hltmastery.com/nursing/nclex-rn/resources/medication-calculations"> Medication Calculations </a> <!-- BROKEN: Relative URL (stripped to bare text) --> <a href="/nursing/nclex-rn/resources/medication-calculations"> Medication Calculations </a> <!-- BROKEN: Protocol-relative (also stripped) --> <a href="//hltmastery.com/nursing/nclex-rn/resources/medication-calculations"> Medication Calculations </a> ``` **Link Validation Before Sending:** ```javascript function validateLinks(html) { const linkRegex = /<a href="([^"]+)"/g const matches = [...html.matchAll(linkRegex)] for (const match of matches) { const href = match[1] if (!href.startsWith('https://')) { throw new Error(`Invalid link found: "${href}" (must be absolute HTTPS URL)`) } } console.log(`✓ All ${matches.length} links are absolute HTTPS URLs`) } // Usage validateLinks(contentHtml) ``` ### Blockquote Pattern (Key Insight) Use blockquote for highlighted insights: ```html <blockquote> <p> <strong>Key Insight:</strong> Always verify medications against the "rights" of administration. 
</p> </blockquote> <!-- Or for longer quotes --> <blockquote> <p> This is a longer blockquote that spans multiple sentences. It could be a definition, a critical safety point, or a memorable phrase from a study guide. </p> </blockquote> ``` **Blockquote Usage:** - Minimum one blockquote per article (best practice) - Use for "Key Insight," definitions, or critical safety points - Bold the label if applicable: `<strong>Key Insight:</strong>` - Blockquote renders with left border and subtle background ### Table Formatting Tables are wrapped in `<figure>` by Framer automatically. Write clean HTML: ```html <table> <tr> <th>Concept</th> <th>NCLEX Relevance</th> <th>Study Strategy</th> </tr> <tr> <td>Pharmacokinetics</td> <td>High — 15-20% of questions</td> <td>Memorize ADME</td> </tr> <tr> <td>Contraindications</td> <td>High — 20-25% of questions</td> <td>Study by drug class</td> </tr> </table> ``` **Table Best Practices:** - Always use `<th>` for headers, `<td>` for data - Keep columns balanced (not too wide or narrow) - 2–5 columns recommended - Comparison tables (2–3 columns) work best - Do NOT use nested tables ### Spacing and Empty Elements **Gotcha: Spacer paragraphs between blocks are stripped** ```html <!-- WRONG: Empty <p> between blocks is stripped --> <blockquote><p>Quote</p></blockquote> <p></p> <!-- This empty spacer <p> is stripped --> <p>Next paragraph</p> <!-- RIGHT: Just nest the elements; Framer handles spacing --> <blockquote><p>Quote</p></blockquote> <p>Next paragraph</p> <!-- WRONG: <br> creates double spacing --> <p>Paragraph<br />with break</p> <!-- Renders as <br><br> — too much space --> <!-- RIGHT: Use separate paragraphs --> <p>First paragraph.</p> <p>Second paragraph.</p> ``` ### Image Handling The `Image` field expects `null` or a string URL (not an object): ```javascript // CORRECT [FIELDS.Image]: { type: 'image', value: 'https://example.com/article-hero.jpg' } // ALSO CORRECT (no image) [FIELDS.Image]: { type: 'image', value: null } // WRONG (passing an
object) [FIELDS.Image]: { type: 'image', value: { url: 'https://example.com/...', alt: '...' } // ERROR } ``` For inline images within rich text content, use `<img>` tags: ```html <p>Here is an inline image:</p> <img src="https://example.com/inline-diagram.jpg" alt="Pharmacokinetics Flow Diagram" /> <p>Description of the image.</p> ``` ## Content Templates and Patterns ### Article Structure Template Every well-formed article follows this structure: ```html <!-- 1. Opening paragraph (hook and context) --> <p> This article covers [topic]. Whether you're preparing for the NCLEX-RN or deepening your understanding, mastering [specific concept] is essential for safe patient care and exam success. </p> <!-- 2. Key Insight blockquote --> <blockquote> <p><strong>Key Insight:</strong> [One critical takeaway or safety principle]</p> </blockquote> <!-- 3. Main sections with h3 headings --> <h3>Section 1: Fundamentals</h3> <p>Explanation of foundational concept...</p> <h3>Section 2: Application</h3> <p>How this applies to patient care...</p> <!-- 4. Comparison/reference table --> <table> <tr> <th>Concept</th> <th>Key Feature</th> </tr> <tr> <td>Item 1</td> <td>Detail</td> </tr> </table> <!-- 5. h4 subsections for deeper exploration --> <h4>Common Pitfalls</h4> <p>Students often mistake...</p> <h4>Test-Taking Strategy</h4> <p>When you see [keyword], remember [principle]...</p> <!-- 6. Keep Reading cross-links section --> <h4>Keep Reading</h4> <ul dir="auto"> <li data-preset-tag="p"> <p> <a href="https://hltmastery.com/nursing/nclex-rn/resources/related-article-1" >Related Article 1</a > </p> </li> <li data-preset-tag="p"> <p> <a href="https://hltmastery.com/nursing/nclex-rn/resources/related-article-2" >Related Article 2</a > </p> </li> </ul> <!-- 7. Product mention with UTM tracking --> <p> Ready to accelerate your NCLEX-RN preparation? 
<a href="https://hltmastery.com/nursing/nclex-rn?utm_source=hltmastery_blog&utm_medium=article&utm_campaign=nursing_content_2026&utm_content=article-slug" > Join HLT Mastery </a> and get access to 1,000+ high-yield practice questions and expert video explanations. </p> <!-- 8. Final Word closing section --> <h4>Final Word</h4> <p> Mastering [topic] is a key milestone on your path to nursing excellence. Practice consistently, learn from mistakes, and trust your preparation. </p> ``` ### Cross-Link Pattern When linking to other HLT Mastery articles: ```html <h4>Keep Reading</h4> <ul dir="auto"> <li data-preset-tag="p"> <p> <a href="https://hltmastery.com/nursing/nclex-rn/resources/medication-math" >Medication Math Fundamentals</a > </p> </li> <li data-preset-tag="p"> <p> <a href="https://hltmastery.com/nursing/nclex-rn/resources/pediatric-dosing" >Pediatric Dosage Calculations</a > </p> </li> </ul> ``` **URL Patterns by Product:** - **NCLEX-RN:** `https://hltmastery.com/nursing/nclex-rn/resources/{slug}` - **FNP:** `https://hltmastery.com/nursing/fnp/resources/{slug}` - **AGNP:** `https://hltmastery.com/nursing/agnp/resources/{slug}` ### UTM Tracking Pattern All product links should include UTM parameters: ``` utm_source=hltmastery_blog utm_medium=article utm_campaign={product}_content_2026 utm_content={article-slug} ``` **Example:** ``` https://hltmastery.com/nursing/nclex-rn?utm_source=hltmastery_blog&utm_medium=article&utm_campaign=nursing_content_2026&utm_content=medication-calculations ``` **UTM Component Mapping:** - `utm_source` = always `hltmastery_blog` - `utm_medium` = always `article` - `utm_campaign` = format: `{nursing|fnp|agnp}_content_2026` - `utm_content` = article slug (the one you're in) ## Troubleshooting Guide ### API Connection Issues **Problem:** "Cannot connect to Framer project" ``` Error: Failed to authenticate at connect (framer-api) ``` **Diagnosis:** 1. Verify API key retrieved from vault (check it's not null/empty) 2. 
Verify exact project URL: `https://framer.com/projects/HLTMastery-com--montTdTggA8zwlRhAdOf-4L1i6` 3. Check internet connection 4. Verify Supabase credentials haven't changed **Fix:** ```javascript // 1. Test vault retrieval in isolation const apiKey = await retrieveApiKeyFromVault() console.log('API Key length:', apiKey?.length) // Should be > 0 // 2. Test connection with exact URL const { connect } = await import('framer-api') const framer = await connect(PROJECT_URL, apiKey) console.log('Connected successfully') ``` ### Collection Lookup Issues **Problem:** "Articles collection not found" ``` Error: Cannot find collection ``` **Diagnosis:** - Checking by `.slug` instead of `.name` - Framer collection names may differ from display names **Fix:** ```javascript // WRONG const articles = collections.find((c) => c.slug === 'articles') // RIGHT const articles = collections.find((c) => c.name === 'Articles') // Debug: List all collections const collections = await framer.getCollections() collections.forEach((c) => console.log(`Name: ${c.name}, Slug: ${c.slug}`)) ``` ### Field Value Access Issues **Problem:** "Cannot read property 'value' of undefined" **Diagnosis:** - Framer returns field values in two forms: - Plain strings/booleans: `"My Title"` - Object form: `{ type: 'string', value: 'My Title' }` - Code only handles one form **Fix:** ```javascript // SAFE: Handle both forms function getFieldValue(item, fieldId) { const raw = item.fieldData[fieldId] if (!raw) return null return typeof raw === 'string' || typeof raw === 'boolean' ? 
raw : raw?.value } const title = getFieldValue(item, FIELDS.Title) const featured = getFieldValue(item, FIELDS.Featured) ``` ### Create Article Issues **Problem:** "No item found" error after creation ``` Error: Item not found after addItems ``` **Diagnosis:** - Passing an `id` field when creating (Framer auto-generates IDs) - Field data passed as frozen objects (from getItems()) **Fix:** ```javascript // WRONG: Passing id field await articles.addItems([{ id: 'some-uuid', // ❌ DO NOT PASS slug: 'my-slug', fieldData: { ... } }]); // RIGHT: No id field await articles.addItems([{ slug: 'my-slug', // ✓ Only slug, no id fieldData: { [FIELDS.Title]: { type: 'string', value: 'Title' }, // ... } }]); // ALSO WRONG: Reusing frozen objects from getItems() const items = await articles.getItems(); const item = items[0]; // item.fieldData is frozen; cannot be re-passed to addItems() // RIGHT: Reconstruct fieldData as plain objects await articles.addItems([{ slug: item.slug, fieldData: { [FIELDS.Title]: { type: 'string', value: getFieldValue(item, FIELDS.Title) }, // ... all fields reconstructed as plain objects } }]); ``` ### HTML Rendering Issues **Problem:** Links disappear, blank output **Diagnosis:** - Relative URLs in `<a href>` (Framer strips them) - Unsupported HTML tags (stripped silently) - Malformed list structure **Fix:** ```javascript // Validate before sending function validateHTMLContent(html) { const errors = [] // Check for relative links const relativeLinks = html.match(/<a href="\/[^"]+"/g) || [] if (relativeLinks.length > 0) { errors.push(`Found ${relativeLinks.length} relative links.
Use absolute HTTPS URLs.`) } // Check for unsupported tags const unsupported = /<(hr|div|span|pre|form|input)[\s>]/g const matches = html.match(unsupported) || [] if (matches.length > 0) { errors.push(`Found ${matches.length} unsupported tags: ${unsupported.source}`) } // Check list structure (opening <ul>/<ol> must carry dir="auto") const badLists = /<(ul|ol)(?![^>]*dir="auto")/ if (badLists.test(html)) { errors.push('Lists missing dir="auto" attribute') } if (errors.length > 0) { throw new Error('HTML validation failed:\n' + errors.join('\n')) } console.log('✓ HTML content validated') } // Usage validateHTMLContent(contentHtml) await articles.addItems([ { slug, fieldData: { [FIELDS.Content]: { type: 'formattedText', value: contentHtml } } }, ]) ``` ### Update Issues **Problem:** "Cannot use addItems with frozen field data objects" **Diagnosis:** - Directly passing objects returned from getItems() to addItems() - Framer returns frozen (immutable) objects **Fix:** ```javascript // Get the item const items = await articles.getItems() const item = items.find((i) => i.slug === 'my-slug') // Extract primitive values (frozen objects cannot be re-passed) const title = typeof item.fieldData[FIELDS.Title] === 'string' ? item.fieldData[FIELDS.Title] : item.fieldData[FIELDS.Title]?.value // Re-add with new plain objects (not frozen) await articles.removeItems([item.id]) await articles.addItems([ { slug: item.slug, fieldData: { [FIELDS.Title]: { type: 'string', value: title }, // Plain object, not frozen // ...
other fields }, }, ]) ``` ### Status Code Issues **Problem:** "Enum value not valid" or "Status not recognized" **Diagnosis:** - Using enum name instead of enum ID - Using wrong enum ID **Fix:** ```javascript // WRONG: Using enum name [FIELDS.Status]: { type: 'enum', value: 'Draft' } // ❌ String name // RIGHT: Using enum ID [FIELDS.Status]: { type: 'enum', value: ENUM.Draft } // ✓ UUID constant // If unsure of the exact enum ID, read a non-draft article to confirm const published = await articles.getItems(); const example = published.find((i) => getFieldValue(i, FIELDS.Status) !== ENUM.Draft); console.log('Published status ID:', getFieldValue(example, FIELDS.Status)); ``` ### Vault Retrieval Failures **Problem:** "Failed to retrieve API key from vault" **Diagnosis:** - Supabase connection failed - Wrong vault key name - Credentials have changed **Fix:** ```javascript // Test connection independently import pg from 'pg' const { Client: PgClient } = pg const pgClient = new PgClient({ connectionString: 'postgres://postgres.exuervuuyjygnihansgl:tWSxGOnU6mvSyH8d@aws-1-us-east-1.pooler.supabase.com:5432/postgres', ssl: { rejectUnauthorized: false }, }) try { await pgClient.connect() console.log('✓ Supabase connection successful') const result = await pgClient.query('SELECT 1 as test') console.log('✓ Query executed successfully') const vaultRes = await pgClient.query( "SELECT decrypted_secret FROM vault.decrypted_secrets WHERE name = 'org/hlt/framer/hltmastery/api-key' LIMIT 1", ) if (vaultRes.rows.length === 0) { console.error('✗ Vault key not found.
Verify the exact name.') return } const apiKey = vaultRes.rows[0].decrypted_secret console.log('✓ API key retrieved:', apiKey.substring(0, 20) + '...') } catch (error) { console.error('✗ Error:', error.message) } finally { await pgClient.end() } ``` ## Quick Reference ### Field IDs (Always Copy Exactly) ```javascript const FIELDS = { Title: 'bjAEz1hEH', Subtitle: 'xk1OOvgPm', Excerpt: 'zSteB3ctN', Content: 'gDIxMWxw2', Category: 'fA8uLUaZU', Product: 'E7r5IMHYu', Vertical: 'E73Zf_6Zx', ContentType: 'iPk2JtByG', Date: 'klsejpDiZ', Image: 'rQcV7vlpH', Featured: 'RfPd2TSgE', Status: 'ztCuodvw8', Author: 'wbj8P2oSe', } ``` ### Enum IDs (Always Copy Exactly) ```javascript const ENUM = { Draft: 'sb42ZflLF', TestTaking: 'GXa_qfHAB', TipsAndTricks: 'Y2hmykInU', HighYieldConcept: 'CIFO9HDu2', CommonChallenges: 'yqWjPaAGw', Nursing: 'BjkvQSfQU', Standard: 'NFAA1eUab', HowTo: 'vYJRJiVGo', FNP: 'kVoqGt5bS', } ``` ### Author IDs ```javascript const AUTHORS = { Kristin: 'wTNBcwCNI', BenOConnor: 'D8sWxJ5eZ', Catherine: 'eKdF9yBKP', } ``` ### Critical URLs ```javascript const PROJECT_URL = 'https://framer.com/projects/HLTMastery-com--montTdTggA8zwlRhAdOf-4L1i6' const URLS = { nclex_rn_article: 'https://hltmastery.com/nursing/nclex-rn/resources/{slug}', fnp_article: 'https://hltmastery.com/nursing/fnp/resources/{slug}', agnp_article: 'https://hltmastery.com/nursing/agnp/resources/{slug}', } const UTM = { source: 'hltmastery_blog', medium: 'article', campaign_template: '{product}_content_2026', // Replace {product} with nursing|fnp|agnp content: '{slug}', // Article slug } ``` ## Glossary | Term | Definition | | ------------------------ | ------------------------------------------------------------------- | | **Collection** | A Framer CMS table (Articles, Resources, Stories, etc.) | | **Item** | A single record in a collection (one article) | | **Field** | A column in a collection (Title, Content, Status, etc.) 
| | **Field ID** | Framer's immutable UUID for a field (e.g., `bjAEz1hEH`) | | **Slug** | URL-friendly identifier (e.g., `medication-calculations`) | | **Enum** | Dropdown field with predefined values (Draft, Published, etc.) | | **Enum ID** | Framer's UUID for an enum value (e.g., `sb42ZflLF` for Draft) | | **Formatted Text** | Rich HTML text field (supports subset of HTML) | | **Collection Reference** | Link to an item in another collection (Author → Authors collection) | | **Draft** | Unpublished status; used for testing | | **Featured** | Boolean flag to pin article to top of feed | | **Vault** | Supabase encrypted secrets storage | ## Support and Escalation If you encounter issues not covered in this runbook: 1. **Check the [Troubleshooting Guide](#troubleshooting-guide)** — most common issues are documented with solutions 2. **Review [Framer Rich Text Rules](#framer-rich-text-html-rules)** — HTML issues are the most frequent source of problems 3. **Validate your API key and connection** — use the test script in [Vault Retrieval Failures](#vault-retrieval-failures) 4. **Check exact field and enum IDs** — copy from the [Quick Reference](#quick-reference) section, never from memory For blocking issues: - Verify Supabase connectivity (test with psql if possible) - Check that project URL is exact (no typos) - Ensure `framer-api` package is installed and up-to-date - Review Framer project settings to confirm API key is valid (not expired or revoked) --- **Document Version:** 1.0 **Last Updated:** 2026-04-06 **Status:** Production Ready **Owner:** HLT Mastery Content Operations --- ## Source: docs/runbooks/factory/import-normalization.md # Factory Runbook: Import + Normalization Status: Active Updated: 2026-02-17 ## When To Use Use this runbook when importing external skills/agents/KB/prompts and preparing them for staged review. ## Goals 1. Keep DB canonical. 2. Preserve provenance and transformation trail. 3. Catch portability and taxonomy issues before promotion. 
4. Prevent thin, metadata-poor imports from polluting the default discovery surface. ## Inputs 1. Incoming drop in `incoming/` with `DROP.md`. 2. Target unit scope (single skill, batch, mixed package). 3. Selected compatibility profile (`agent_skills_standard`, `plugin_portable`, `catalyst_enriched`). ## Procedure 1. Fetch the upstream package into `.claude/skills/imports/<source>/<code>/`. 2. Generate a parity report against the current local mirror. 3. Normalize names, refs, and folder parity. 4. Refresh the curated mirror while preserving approved overlays. 5. Run import lint and portability checks. 6. Stage in DB with status `staged`. 7. Attach provenance and rationale artifacts. 8. If staged packages duplicate existing curated skills (same `code@version`), snapshot and prune duplicates from `imports/`. ## Canonical Import-Then-Adapt Contract This is the only acceptable flow for external skill ingestion when registry trust matters. 1. Preserve upstream portable structure first. - Keep `SKILL.md` as the launcher. - Keep deep material in layered artifacts (`references/`, `examples/`, `templates/`, `rules/`, `tests/`). 2. Write canonically to the DB first. - Treat repo mirrors as generated outputs. - If a launcher/artifact repair is needed, write the DB revision before mirror sync. 3. Add Katailyst overlays explicitly. - Overlay discovery copy (`summary`, `use_case`) and HLT routing notes without replacing the upstream core. 4. Normalize trust metadata before promotion. - Required namespaces for operator-visible skills: `dept:*`, `family:*`, `source:*`. - Required when execution surface is knowable: `runtime:*`. - Required for imported/community skills: `partner:*` when there is a clear maintainer/provider. 5. Assign provisional tiers conservatively. - Do not let new imports enter the default top surface just because they are complete or imported cleanly. 6. Prove artifact readiness before high-visibility browse placement. 
- Thin launcher-only imports stay discoverable, but should be suppressed from default recommended browse until they have defensible trust signals and artifact depth. ## Registry Trust Gates Before an imported skill is allowed to behave like a top-surface candidate: 1. Upstream source/package path is recorded and reachable. 2. Canonical DB revision contains the launcher plus any layered artifacts that materially improve use. 3. `summary` and `use_case` are distinct and actually useful for discovery. 4. Runtime/source/partner tags are explicit rather than implied by prose. 5. Default browse suppression is removed only after artifact readiness and trust metadata are adequate. ## Factory Inbox (UI-First Flow) Route: `/dashboard-cms/factory/incoming` Use this when you want a unified "drop it here and handle it" operator flow with run traces. Inbox accepts: - user ideas / issue writeups - copied/pasted notes and requirements - links and GitHub repository URLs - zip/file attachments - new schema/tool/KB/agent requests - skill library imports Execution tracks: 1. `skill_import` track: normalize -> enrich -> import -> evaluate -> gated promote (`curated`) 2. `triage` track: AI triage -> enrichment recommendations -> evaluation + route decision Flow: 1. Queue an incoming drop from the Inbox list. 2. Worker runs the track selected by intake metadata. 3. Skill-import candidates that are ready auto-promote to `curated` (live in this repo). 4. Source drop is archived under `incoming/archive/YYYY/MM/DD/...` with `MANIFEST.json`. 5. Inspect run step evidence in `/dashboard-cms/observe/runs/:run_id`. Notes: - `DROP.md` is required for queueing. - Pipeline writes run/step/recommendation evidence to runtime trace tables. - Promotion remains deterministic (`promote_skill_status.ts` checks are enforced). - Intake triage recommendations do not bypass deterministic safety gates.
## Command Sequence ```bash # 1) Fetch upstream package(s) and generate parity npx tsx scripts/ingestion/import_upstream_skills.ts --repo <repo-url> --source <slug> --skill <code> --dry-run npx tsx scripts/ingestion/import_upstream_skills.ts --repo <repo-url> --source <slug> --skill <code> --write # 2) Normalize import material npx tsx scripts/ingestion/normalize_skill_imports.ts --help # 3) Lint package contracts python3 scripts/registry/lint_unit_packages.py --strict python3 scripts/ingestion/lint_skill_imports.py --strict # 4) Sync mirrors (check mode first) npx tsx scripts/registry/sync/sync_skills_from_db.ts --check # 5) Optional enrichment pass for imported skills npx tsx scripts/ingestion/enrich_import_skill_packages.ts --help # 6) Detect duplicate staged imports (same code@version already curated) npx tsx scripts/registry/audit/audit_skill_lifecycle.ts --report docs/reports/skills-lifecycle-audit-YYYY-MM-DD.json # 7) Snapshot + prune duplicate staged imports npx tsx scripts/ingestion/prune_duplicate_import_skills.ts --snapshot docs/reports/skills-import-duplicates-YYYY-MM-DD.json --dry-run npx tsx scripts/ingestion/prune_duplicate_import_skills.ts --snapshot docs/reports/skills-import-duplicates-YYYY-MM-DD.json --write ``` ## Quality Gates Pass all before handoff to eval lane: 1. `SKILL.md` portability checks are clean for selected profile. 2. Required tag namespaces are present. 3. Typed refs resolve or are explicitly documented unresolved. 4. No secrets in imported artifacts/scripts. 5. Artifact readiness is sufficient for the intended visibility tier. 6. Default recommended browse would not suppress the import as thin or environment-mismatched. ## Common Failure Classes + Fixes 1. Missing frontmatter fields - Fix in portable launcher frontmatter (`name`, `description`). 2. Folder/name mismatch - Rename folder or skill `name` to profile-compliant format. 3. Missing taxonomy tags - Add required namespaced tags per `docs/TAXONOMY.md`. 4. 
Unknown typed refs - Resolve via DB lookup or mark as unresolved with reason. ## Evidence To Record 1. Import source and timestamp. 2. Normalization commands executed. 3. Lint results. 4. Remaining warnings (if any) with remediation owner. ## Handoff After successful import normalization, continue with: - `docs/runbooks/factory/optimization-ab-validation.md` - `import-external-skill` for the operator checklist that keeps repo intake and DB sync in the same lane --- ## Source: docs/runbooks/factory/incident-response-failed-runs-exports.md # Factory Runbook: Incident Response (Failed Runs + Export Breakage) Status: Active Updated: 2026-02-16 ## When To Use Use when: 1. run failure rate is elevated. 2. export/portability checks regress. 3. promotion path is blocked by repeated contract failures. ## Triage SLA 1. Start triage within 15 minutes for critical alerts. 2. Establish incident class and owner within 30 minutes. 3. Publish first remediation action within 60 minutes. ## Failure Classification 1. Runtime failures - run status `failed`/`cancelled` spikes. 2. Contract failures - portability or plugin-compatible errors in post-commit checks. 3. Governance friction failures - warning noise, override spikes, unresolved remediation queue. ## Initial Triage Steps 1. Open `/dashboard-cms/observe` and inspect latest failed runs. 2. Open `/dashboard-cms` -> `System Health` -> `Factory Ops Snapshot`. 3. Identify top failing class and isolate affected refs. ## Remediation Playbook 1. Runtime failures - inspect run context and failing step in Observe - retry with constrained scope - queue remediation if quality regression is involved 2. Contract failures - run portability/export check bundle - fix malformed metadata/frontmatter/refs - re-run checks before any promotion 3. 
Friction failures - reduce warning noise via policy tuning - require closure criteria on queued remediation - remeasure override and remediation latency ## Command Bundle ```bash python3 scripts/registry/lint_unit_packages.py --strict npx tsx scripts/registry/sync/sync_skills_from_db.ts --check npx tsx scripts/distribution/export_plugin.ts --check npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin npx tsx scripts/distribution/test-plugin-smoke.ts --plugin-dir .claude-plugin npx tsx scripts/distribution/export_registry_packs.ts --check pnpm typecheck pnpm lint pnpm test:run ``` ## Containment Rules 1. Pause high-risk promotions when critical checks fail. 2. Keep creation/evaluation flows open unless critical safety/secret risks are detected. 3. Avoid broad hard locks unless rollback cannot contain blast radius. ## Incident Close Criteria 1. Failure class root cause is documented. 2. Checks return to baseline thresholds. 3. Any temporary bypass/override is removed or converted into explicit policy. 4. Follow-up tasks are captured in plan/state docs. ## Postmortem Minimum Fields 1. Incident window (start/end timestamps) 2. Affected refs/surfaces 3. Root cause summary 4. Corrective actions 5. Preventive actions --- ## Source: docs/runbooks/factory/operator-quickstart.md # Factory Operator Quickstart Status: Active Scope: Phase 15 AMS operations baseline Updated: 2026-02-19 ## Purpose Give a new operator or agent a fast, reliable path to run the full factory lifecycle without hidden assumptions. ## Canonical References 1. `AGENTS.md` 2. `docs/RULES.md` 3. `docs/VISION.md` 4. `docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md` 5. `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` 6. `docs/references/contracts/ATOMIC_DUPLICATE_DETECTION_POLICY.md` 7. 
`docs/references/contracts/MIRRORS_AND_PACKS.md` ## 0) Preflight (Required) Run from repo root: ```bash git status --short --untracked-files=all git log -5 --oneline pnpm typecheck pnpm lint ``` If `pnpm typecheck` or `pnpm lint` fails, stop and resolve before promoting anything. ## 1) Health Snapshot Check (Cockpit) Open `System Health` on `/dashboard-cms` and review: 1. `Factory Ops Snapshot` metrics 2. `Lifecycle SLOs` statuses 3. `Governance Friction` rates 4. `Remediation Hooks` If a critical remediation hook appears, jump to: - `docs/runbooks/factory/incident-response-failed-runs-exports.md` ## 2) Intake + Normalization Lane Use: - `docs/runbooks/factory/import-normalization.md` Goal: 1. Stage incoming material safely. 2. Normalize into DB-canonical structures. 3. Keep portability profile compatibility visible early. ## 3) Optimization + Eval Lane Use: - `docs/runbooks/factory/optimization-ab-validation.md` Goal: 1. Run QA/eval loops. 2. Compare variants (A/B or pairwise). 3. Queue remediation for weak candidates. ## 4) Promotion + Distribution Lane Use: - `docs/runbooks/factory/promotion-rollback.md` - `docs/references/contracts/ATOMIC_UNIT_STANDARDS.md` Goal: 1. Promote with evidence. 2. Validate portability/export integrity. 3. Keep rollback path documented before publish. 4. For overlap candidates, require `alternate`/`supersedes` rationale and rollback target before promotion. ## 5) Monitoring + Remediation Lane Use: - `docs/runbooks/factory/incident-response-failed-runs-exports.md` Goal: 1. Triage failures by class. 2. Apply deterministic remediation steps. 3. Capture residual risk and follow-up actions. 
## 6) Standard Command Bundle ```bash python3 scripts/registry/lint_unit_packages.py --strict npx tsx scripts/registry/sync/sync_skills_from_db.ts --check npx tsx scripts/distribution/export_plugin.ts --check npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin npx tsx scripts/distribution/test-plugin-smoke.ts --plugin-dir .claude-plugin npx tsx scripts/distribution/export_registry_packs.ts --check pnpm typecheck pnpm lint pnpm test:run ``` ## 7) Escalation Triggers Escalate immediately when any of the following are true: 1. Portability pass rate drops below 95%. 2. Export success rate drops below 98%. 3. Failure rate exceeds 15% in the active ops window. 4. Remediation queue backlog grows without follow-up QA/promotion activity. ## 8) Output Standard (Per Ops Session) Every operator session should leave: 1. Evidence of checks run. 2. Decision notes (promote/hold/rollback). 3. Linked runbook path used for remediation. 4. Next action owner (human or agent). --- ## Source: docs/runbooks/factory/optimization-ab-validation.md # Factory Runbook: Optimization + A/B Validation Status: Active Updated: 2026-02-16 ## When To Use Use for iterative quality improvement before promotion, especially for: 1. Skill-vs-skill comparisons 2. Version-vs-version checks 3. Bundle/mixed-unit quality decisions ## Goals 1. Improve reliability and outcome quality. 2. Preserve deterministic eval evidence. 3. Decide promote/hold/rollback using traceable criteria. ## Workflow 1. Select candidate and baseline refs. 2. Run QA/eval pass. 3. Run pairwise/A-B judging where applicable. 4. Generate decision packet. 5. Apply action (`promote`, `hold`, `rollback`, `queue_remediation`). 
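The final workflow step (apply action) can be captured as a small decision packet. A sketch — the field names and shapes here are illustrative assumptions, not the canonical schema:

```typescript
// Illustrative decision-packet shape for the apply-action step (not canonical).
type EvalAction = 'promote' | 'hold' | 'rollback' | 'queue_remediation'

interface DecisionPacket {
  candidateRef: string // e.g. 'skill:blog-writer'
  baselineRef: string
  evalRunIds: string[] // required evidence: deterministic eval run IDs
  action: EvalAction
  reasonCodes: string[] // traceable criteria behind the decision
}

const packet: DecisionPacket = {
  candidateRef: 'skill:example-candidate',
  baselineRef: 'skill:example-baseline',
  evalRunIds: ['run-001'],
  action: 'queue_remediation',
  reasonCodes: ['low_signal_confidence'],
}

console.log(packet.action) // queue_remediation
```

Whatever shape you use, keep the run IDs and reason codes attached so promote/hold/rollback decisions stay traceable.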
## Command Sequence ```bash # Promptfoo/deterministic harness npx tsx scripts/eval/eval_promptfoo.ts --help # Refresh aggregate quality signals npx tsx scripts/eval/recompute_eval_signals.ts --help npx tsx scripts/eval/recompute_pairwise_ratings.ts --help # Repository safety checks before actioning outcomes pnpm typecheck pnpm lint pnpm test:run ``` ## Decision Heuristics (Default) 1. Promote - strong score + adequate confidence + no blocking quality flags. 2. Hold - incomplete confidence or insufficient sample size. 3. Rollback - regression risk with strong negative evidence. 4. Queue remediation - fixable deficits or low-confidence issues requiring another pass. ## Required Evidence 1. Eval run IDs 2. Score and confidence context 3. Decision recommendation + reason codes 4. Follow-up action owner ## Remediation Loop If recommendation is `queue_remediation`: 1. Create remediation queue entry. 2. Capture reason codes in run context. 3. Re-run QA after edits. 4. Re-check confidence before promotion. ## Common Failure Classes + Fixes 1. Score unstable across reruns - tighten prompt structure; reduce uncontrolled context variance. 2. Low signal confidence - increase fixture diversity and sample count. 3. Frequent override requests - refine rubric weighting and reduce warning noise. ## Handoff After optimization/eval readiness: - `docs/runbooks/factory/promotion-rollback.md` --- ## Source: docs/runbooks/factory/promotion-rollback.md # Factory Runbook: Promotion + Rollback Status: Active Updated: 2026-02-16 ## When To Use Use for staged -> curated -> published progression and any rollback action. ## Goals 1. Enforce evidence-backed promotion. 2. Keep compatibility/export integrity intact. 3. Make rollback fast and deterministic. ## Required Pre-Checks 1. Review `docs/references/skills/SKILL_FACTORY_GOVERNANCE_CHECKLIST.md`. 2. Confirm latest eval/QA evidence. 3. Confirm runbook-defined rollback target exists. 
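The three pre-checks above can be tracked as a simple gate record before running the command sequence. A sketch with hypothetical names:

```typescript
// Hypothetical promotion pre-check gate: all three must hold before promoting.
interface PromotionPreChecks {
  governanceChecklistReviewed: boolean // governance checklist reviewed
  evalEvidenceCurrent: boolean // latest eval/QA evidence confirmed
  rollbackTargetDefined: boolean // rollback path documented (e.g. published -> curated)
}

function readyToPromote(c: PromotionPreChecks): boolean {
  return c.governanceChecklistReviewed && c.evalEvidenceCurrent && c.rollbackTargetDefined
}

console.log(
  readyToPromote({
    governanceChecklistReviewed: true,
    evalEvidenceCurrent: true,
    rollbackTargetDefined: false,
  }),
)
// false
```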
## Command Sequence ```bash # Portability and mirror integrity python3 scripts/registry/lint_unit_packages.py --strict npx tsx scripts/registry/sync/sync_skills_from_db.ts --check # Distribution integrity npx tsx scripts/distribution/export_plugin.ts --check npx tsx scripts/distribution/validate-plugin.ts --strict --plugin-dir .claude-plugin npx tsx scripts/distribution/test-plugin-smoke.ts --plugin-dir .claude-plugin npx tsx scripts/distribution/export_registry_packs.ts --check # Repo quality pnpm typecheck pnpm lint pnpm test:run ``` ## Promotion Decision Model 1. Promote when: - quality threshold is met or approved override exists. - compatibility profile checks have no blocking errors. 2. Hold when: - quality confidence is insufficient or unresolved warnings remain. 3. Block when: - portable core malformed. - secret/safety risk detected. - export-breaking incompatibility is present. ## Rollback Paths 1. `published -> curated` 2. `curated -> staged` Document rollback target in decision notes before promoting. ## Override Policy Overrides are allowed but must include: 1. explicit operator role eligibility. 2. written override reason. 3. follow-up remediation item. ## Post-Change Monitoring After promotion/rollback: 1. Verify status transitions in dashboard. 2. Check Factory Ops Snapshot for drift/failure changes. 3. Confirm no new critical remediation hooks. ## Evidence To Capture 1. Promotion or rollback run ID 2. Applied checks + pass/fail 3. Exception notes and owner (if any) 4. Rollback readiness statement ## Escalate If 1. portability pass rate < 95% 2. export success rate < 98% 3. repeated overrides without remediation closure --- ## Source: docs/runbooks/factory/README.md # Factory Runbooks Index `docs/runbooks/factory/**` is the operator procedures lane for the skill and registry factory lifecycle. 
Route by operator intent: - Normalize and stage imports safely: `docs/runbooks/factory/import-normalization.md` - Get a new operator through the lifecycle quickly: `docs/runbooks/factory/operator-quickstart.md` - Optimize or A/B validate before promotion: `docs/runbooks/factory/optimization-ab-validation.md` - Promote or roll back curated assets safely: `docs/runbooks/factory/promotion-rollback.md` - Triage failed runs or export breakage: `docs/runbooks/factory/incident-response-failed-runs-exports.md` --- ## Source: docs/runbooks/interop/external-runtime-onboarding.md # External Runtime Onboarding Use this when your agent or orchestrator runs **outside** Katailyst. Examples: - OpenClaw/Render agents such as Victoria, Lila, and Julius - an app-native orchestrator - a LangChain-style workflow - a custom MCP client with its own execution loop ## What Katailyst Is Katailyst is the canonical registry and control plane. It gives you: - atomic-unit discovery - canonical tags, links, tiers, and eval signals - vault references and integration metadata - portable mirrors and export surfaces It does **not** replace your runtime loop. ## Recommended Flow 1. Prime with: - `AGENTS.md` - `CATALYST.md` - `docs/references/contracts/RUNTIME_OWNERSHIP_AND_CONSUMPTION.md` 2. Discover candidates with API or MCP. 3. Inspect and traverse related units. 4. Choose a small set of blocks. 5. Execute inside your runtime. 6. Write outputs, traces, and learnings back canonically. ## OpenClaw-Style Example - Discover a few skills, tools, and KB entries from Katailyst. - Pull only the blocks the agent decides it needs. - Execute on the OpenClaw/Render runtime. - Preserve agent autonomy over sequencing and sub-agent use. - Write results back to Katailyst so discovery quality improves over time. ## Generic Orchestrator Example - Use API or MCP for discovery. - Use mirrors or packs when your host needs portable local files. - Keep orchestration decisions in your own runtime. 
- Treat Katailyst as the armory and source of canonical metadata. ## Surface Choice - **MCP** for direct discovery/query in local or tool-driven hosts - **API** for structured app/service workflows - **Mirror** for portable local folders - **Pack/export** for shipping curated subsets into another repo or runtime ## Guardrails 1. DB is canonical. 2. Mirrors are derived. 3. External runtimes stay autonomous. 4. Do not hardcode a single orchestration path just because Katailyst recommended one. --- ## Source: docs/runbooks/interop/mcp-ai-sdk-adapter.md # MCP + AI SDK Adapter — Interop Runbook Actionable guidance for connecting external AI SDK clients to the Katailyst registry via MCP. ## Overview The Katailyst registry exposes an MCP server (`scripts/ops/mcp_registry_server.ts`) that any MCP-compatible client can consume. The AI SDK (`ai` package from Vercel) provides an MCP client that maps MCP tools into AI SDK tools, enabling LLMs to discover and use registry entities directly. ``` AI SDK Client MCP Transport Katailyst MCP Server ────────────── ───────────── ──────────────────── generateText() stdio / Streamable HTTP discover() + tools from MCP ──────► MCP protocol ──────────► get_entity() model selects tool (JSON-RPC) list_entities() model uses result ◄────── ◄────────── traverse() search_tags() get_skill_content() ``` ## Available MCP Tools | Tool | Description | Key Params | | ------------------- | ----------------------------------------------------- | --------------------------------------------------------------------------------------------- | | `discover` | Semantic search for entities by intent | `intent`, `entity_types?`, `tags?`, `limit? 
(1-200, default 20)`, `cursor?`, `response_mode?` | | `registry.search` | Alias for first-pass discovery | same as `discover` | | `registry.expand` | Alias for continuation/deeper pages | same as `discover` (expects `cursor` for continuation) | | `get_entity` | Full entity details with revision content | `entity_type`, `code`, `version?` | | `registry.get` | Alias for `get_entity` | `entity_type`, `code`, `version?` | | `list_entities` | Paginated entity listing with filters | `entity_type?`, `status?`, `tags?`, `limit?` | | `get_skill_content` | Rendered instruction body for a skill or typed entity | `code`, `entity_type?`, `include_artifacts?`, `artifact_mode?` | | `search_tags` | Search tags by prefix | `query`, `namespace?`, `limit?` | | `traverse` | Graph traversal of entity links | `ref`, `link_types?`, `depth?` | `discover` supports three response modes: - `response_mode: "text"` (default): human-friendly menu - `response_mode: "json"`: machine-friendly payload (also available in `structuredContent`) - `response_mode: "compact"`: minimal result fields (truncated summary/use_case, max 5 tags) for context-constrained agents Continuation contract: - `discover` returns `next_cursor`, `has_more`, and `can_request_more` in structured output. - When `has_more=true`, call again with `cursor=next_cursor`. - Keep iterating until `has_more=false`. ## MCP Resources and Prompts The server now exposes read-only resources and reusable prompt templates in addition to tools. 
Resources (examples): - `katailyst://docs/interop/registry-api-contract` - `katailyst://docs/interop/orchestrator-workflow` - `katailyst://docs/references/discovery-rerank` - `katailyst://docs/atomic-units/readme` Prompts: - `registry-select-from-menu` - `registry-refine-discovery` ## Transport Selection Matrix | Environment | Transport | Auth | Latency | When to Use | | ----------- | --------- | ------------------------ | ------- | ------------------------ | | Local dev | **stdio** | None (trusted) | ~5ms | `pnpm dev` + Claude Code | | CI/CD | **stdio** | None | ~5ms | Automated testing | | Staging | **HTTP** | Personal token or cookie | ~50ms | Preview deployments | | Production | **HTTP** | Personal token / session | ~50ms | External integrations | Client surface split: - Claude Code reads `.mcp.json` - Codex reads `.codex/config.toml` - Claude.ai / Co-Work connector setup is done in the Claude settings UI with the same endpoint and bearer header shape ### Stdio Transport (Local/Dev) The MCP server runs as a child process. No network overhead. Fastest option. ```typescript import { experimental_createMCPClient as createMCPClient } from 'ai' import { Experimental_StdioMCPTransport as StdioTransport } from 'ai' const transport = new StdioTransport({ command: 'npx', args: ['tsx', 'scripts/ops/mcp_registry_server.ts'], }) const client = await createMCPClient({ transport }) const tools = await client.tools() ``` **Claude Code / Codex configuration** (`.mcp.json`): ```json { "mcpServers": { "katailyst-registry": { "command": "npx", "args": ["tsx", "scripts/ops/mcp_registry_server.ts"] } } } ``` ### Streamable HTTP Transport (Staging/Production) The MCP server is exposed over HTTP with Server-Sent Events. 
```typescript import { experimental_createMCPClient as createMCPClient } from 'ai' const client = await createMCPClient({ transport: { type: 'http', url: 'https://www.katailyst.com/mcp', headers: { Authorization: `Bearer ${process.env.KATAILYST_PERSONAL_MCP_TOKEN}`, }, }, }) const tools = await client.tools() ``` Startup introspection for docs/templates: ```typescript const resources = await client.listResources() const prompts = await client.listPrompts() const contractDoc = await client.readResource({ uri: 'katailyst://docs/interop/registry-api-contract', }) const selectionPrompt = await client.getPrompt({ name: 'registry-select-from-menu', arguments: { intent: 'find best skill for onboarding', menu_json: JSON.stringify([{ ref: 'skill:example', score: 0.9 }]), }, }) ``` ## Auth Token Flows ### Stdio (Local) No auth needed. The server script reads DB credentials from environment variables or `.env.local`. ### Streamable HTTP with Supabase Session ```typescript const transport = { type: 'http', url: 'https://www.katailyst.com/mcp', headers: { Cookie: `sb-access-token=${accessToken}; sb-refresh-token=${refreshToken}`, }, } ``` ### Streamable HTTP with Personal Token (Recommended) ```typescript const transport = { type: 'http', url: 'https://www.katailyst.com/mcp', headers: { Authorization: `Bearer ${process.env.KATAILYST_PERSONAL_MCP_TOKEN}`, }, } ``` Team guidance: - issue personal tokens from `/dashboard-cms/tools/mcp` - use per-user tokens for long-lived remote clients - use short-lived connect tokens only for temporary handoff and validation - keep static env bearer tokens for tightly controlled infrastructure cases - Katailyst `/mcp` itself is bearer/session based today; Supabase MCP's OAuth login flow is a separate server and should not be used as the Katailyst auth model ### Token Refresh For long-lived HTTP connections, refresh the auth token before expiry: ```typescript // Check token expiry before each batch of tool calls if (isTokenExpiring(token, 
{ bufferMs: 60_000 })) { token = await refreshToken() // Reconnect with new token await client.close() client = await createMCPClient({ transport: { type: 'http', url, headers: { Authorization: `Bearer ${token}` } }, }) } ``` ## Discovery-Driven Workflow ### Pattern 1: Model-Driven (Recommended) Let the LLM decide which MCP tools to call based on the user's intent. ```typescript import { generateText } from 'ai' import { anthropic } from '@ai-sdk/anthropic' const result = await generateText({ model: anthropic('claude-opus-4-6'), tools, maxSteps: 5, system: `You have access to a content registry via MCP tools. Use 'discover' to find relevant building blocks, then use the returned entity_type and code with 'get_entity' for metadata and artifact paths. Call 'get_skill_content' for rendered bodies, let artifact_mode="auto" protect you from very large artifact sets, and use 'registry.artifact_body' when you need an exact artifact file. Return the content.`, prompt: userMessage, }) ``` ### Pattern 2: Programmatic (Explicit Control) Call MCP tools directly for deterministic workflows. ```typescript // 1. Discover candidates const discoverResult = await client.callTool('discover', { intent: 'blog writing', entity_types: ['skill'], limit: 3, }) // 2. Parse result const candidates = JSON.parse(discoverResult.content[0].text) // 3. Get full entity const entityResult = await client.callTool('get_entity', { entity_type: 'skill', code: candidates[0].code, }) // 4. Use in generation const skillContent = JSON.parse(entityResult.content[0].text) ``` ### Pattern 3: Hybrid (Model + Guard Rails) Use the model for discovery but validate selections programmatically.
```typescript const result = await generateText({ model: anthropic('claude-opus-4-6'), tools, maxSteps: 3, prompt: 'Find a skill for SEO content writing', toolChoice: { type: 'required' }, // Force tool use }) // Validate the model's tool selections for (const step of result.steps) { for (const call of step.toolCalls) { if (call.toolName === 'discover') { // Verify intent matches user's request assert(call.args.intent.includes('SEO')) } } } ``` ## Cancellation / Abort ### AI SDK Abort ```typescript const controller = new AbortController() const result = generateText({ model: anthropic('claude-opus-4-6'), tools, prompt: 'Find skills...', abortSignal: controller.signal, }) // Cancel after timeout setTimeout(() => controller.abort(), 30_000) ``` ### Transport Behavior on Abort | Transport | Abort Behavior | | --------- | ----------------------------------------------------------- | | stdio | Closes stdin → subprocess receives SIGPIPE/EOF → shuts down | | HTTP | Closes HTTP connection → server detects drop → cleans up | The Katailyst MCP server handles `SIGINT`/`SIGTERM` for graceful shutdown (connection cleanup, DB disconnect). 
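The `discover` continuation contract described earlier (`cursor` / `has_more` / `next_cursor`) combines naturally with the programmatic pattern. A sketch of a drain loop that honors it — the `data` field name on the JSON payload is an assumption; only `next_cursor` and `has_more` are contractual:

```typescript
// Sketch: drain `discover` continuation pages until has_more=false.
// `client.callTool` follows the programmatic pattern; `payload.data` is an assumption.
async function discoverAll(
  client: { callTool: (name: string, args: object) => Promise<{ content: { text: string }[] }> },
  intent: string,
  maxPages = 5, // hard page cap so a bad cursor can't loop forever
): Promise<unknown[]> {
  const items: unknown[] = []
  let cursor: string | undefined
  for (let page = 0; page < maxPages; page++) {
    const res = await client.callTool('discover', {
      intent,
      response_mode: 'json',
      ...(cursor ? { cursor } : {}),
    })
    const payload = JSON.parse(res.content[0].text)
    items.push(...(payload.data ?? []))
    if (!payload.has_more) break // contract: keep iterating until has_more=false
    cursor = payload.next_cursor
  }
  return items
}
```

Pair this with an `AbortSignal` timeout (as in the abort example above) if pages can be slow.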
## Troubleshooting Matrix | Symptom | Cause | Diagnosis | Fix | | ------------------------------- | ------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- | | Client fails to connect (stdio) | Script not found or missing deps | Run `npx tsx scripts/ops/mcp_registry_server.ts` manually, check stderr | Ensure `tsx` installed, script path correct | | Client fails to connect (HTTP) | Server not running or URL wrong | `curl` the `/mcp` URL, check for 200/405 | Verify server running, URL matches env | | Auth error (401) on HTTP | Missing/expired/revoked token | Check Authorization header, token source, and revocation/expiry status | Mint/revoke personal token from dashboard or re-auth | | `discover` returns empty | No matching entities or vague intent | Try specific intent. Check DB has seeded entities | Run seed script, check entity statuses | | Tool call timeout (>10s) | DB query slow or pool exhausted | Check server stderr for timing. Check DB limits | Server sets `statement_timeout = 10s`. Check DB load | | Schema mismatch | Server version != client expectations | `client.tools()` and inspect `inputSchema` | Update MCP server or align client expectations | | EPIPE / broken pipe (stdio) | Client or server crashed mid-call | Check server stderr and client logs | Restart. 
Check Node.js version compat | | SSL/TLS error (HTTP) | Self-signed cert or cert mismatch | `NODE_TLS_REJECT_UNAUTHORIZED=0` (dev only) | Install proper cert or use HTTP locally | ## Environment Variables | Variable | Used By | Description | | ---------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------- | | `CATALYST_DB_URL` | MCP server | PostgreSQL connection string (preferred) | | `DATABASE_URL` | MCP server | Fallback DB URL | | `KATAILYST_PERSONAL_MCP_TOKEN` | HTTP client | Per-user long-lived token for authenticated streamable HTTP connections | | `MCP_PERSONAL_ACCESS_TOKEN_SECRET` | HTTP server | Secret used to sign/hash per-user personal tokens | | `MCP_CONNECT_TOKEN_SECRET` | HTTP server | Secret used to sign short-lived connect tokens | | `MCP_AUTH_TOKEN` | HTTP server | Static fallback bearer token accepted by `/mcp` and secret fallback when dedicated secrets are missing | | `MCP_AUTH_TOKENS` | HTTP server | Comma-separated static bearer token list accepted by `/mcp` | | `MCP_ALLOW_ANONYMOUS` | HTTP server | Anonymous access is **off by default**. Set `true` only when you intentionally want a permissive route. 
| | `MCP_CORS_ORIGINS` | HTTP server | Optional comma-separated CORS origins (`*` default) | | `MCP_ALLOWED_HOSTS` | HTTP server | Optional host allow-list when DNS rebinding protection enabled | | `MCP_ALLOWED_ORIGINS` | HTTP server | Optional origin allow-list used by transport-level checks | | `NODE_ENV` | Both | `development` for local, `production` for deployed | ## Files | File | Purpose | | --------------------------------------------- | ----------------------------------------------------------- | | `scripts/ops/mcp_registry_server.ts` | Stdio MCP server (production-ready) | | `app/mcp/route.ts` | Streamable HTTP MCP endpoint (`GET \| POST \| DELETE /mcp`) | | `lib/mcp/protocol.ts` | MCP server factory (McpServer + StdioTransport) | | `lib/mcp/handlers.ts` | Tool handler implementations (DB queries) | | `lib/mcp/tool-definitions.ts` | Tool registrations with Zod schemas | | `lib/interop/mcp-ai-sdk-example.ts` | Reference example module (types, patterns, troubleshooting) | | `.mcp.json` | Claude Code MCP server configuration | | `docs/runbooks/interop/mcp-ai-sdk-adapter.md` | This document | ## Related Documentation - **Registry API contract:** `docs/runbooks/interop/registry-api-contract.md` - **AI SDK capability map:** `docs/references/ai-agents/AI_SDK_6_MAP.md` - **Orchestrator workflow:** `docs/runbooks/interop/orchestrator-workflow.md` --- ## Source: docs/runbooks/interop/orchestrator-workflow.md # External Orchestrator Workflow Reference Reference workflow showing how external orchestrators consume the Katailyst registry API. Demonstrates discovery-first selection (no hard-coded routing) with full trace breadcrumbs. ## Architecture ``` External Orchestrator Katailyst Registry API ────────────────────── ────────────────────────── 1. Intent POST /api/discover "write a blog post" ──────────────► { intent, limit } ◄────────────── { data: [candidates...] } 2.
Dependency Walk POST /api/traverse "skill:blog-writer" ──────────────► { ref, link_types, depth } ◄────────────── { links, entities } 3. Tool Inspection POST /api/tools/describe "tool:tavily-search" ─────────────► { tool_ref } ◄────────────── { tool, config, call_spec } 4. Execution POST /api/tools/execute (or dry-run) ──────────────► { tool_ref, input } ◄────────────── { tool, data } ``` ## Workflow Steps ### Step 1: Discover Find candidates matching a natural language intent. ```bash curl -X POST https://your-domain.com/api/discover \ -H "Content-Type: application/json" \ -H "Cookie: sb-session=..." \ -d '{ "intent": "write a blog post about AI trends", "limit": 10, "rerank": { "enabled": true, "provider": "auto", "top_n": 10 } }' ``` **Decision rules:** - 0 candidates → log "no_match", exit gracefully - 1+ candidates → proceed to traverse/describe/execute - Multiple close-scoring candidates → present menu to user - If menu quality is weak → run a second `discover` pass with refined intent/tags/types ### Step 1b: Refine and Retry (Optional) When the top menu does not fit, keep the same verb and refine inputs: ```bash curl -X POST https://your-domain.com/api/discover \ -H "Content-Type: application/json" \ -H "Cookie: sb-session=..." \ -d '{ "intent": "skill for writing technical long-form AI articles with citations", "types": ["skill"], "tags": ["domain:writing", "action:create"], "limit": 10 }' ``` ### Discovery At Scale (1k-3k+ nodes) As the registry grows, discovery should stay discovery-first rather than pretending to solve the whole task in one call: - **Narrow, don't solve** -- `discover` and `registry.agent_context` are meant to surface a strong bounded menu for this pass, not the one true answer forever. - **Agent owns the composition** -- the orchestrator or its sub-agents decide which blocks to load, combine, ignore, or replace. 
- **Continuation beats truncation** -- if the first pass is only partly right, run a second discover pass with different wording, filters, or tags, then use `registry.expand` or `traverse` when you need more. - **Hubs are optional lighthouses** -- when a hub appears, traverse it to open the domain. Treat it as a strong front door, not a forced route. - **Rich intent matters** -- send a paragraph or two with audience, business goal, output shape, and relevant context. That improves the shortlist before any sub-agent decomposition begins. - **Parallel scouts are fine** -- for complex work, dispatch multiple sub-agents to search different facets, then let the parent agent decide whether the combined menu is good enough or needs another pass. ### Step 2: Traverse Dependencies Walk the selected candidate's dependency graph to understand what tools, KBs, and prompts it needs. ```bash curl -X POST https://your-domain.com/api/traverse \ -H "Content-Type: application/json" \ -H "Cookie: sb-session=..." \ -d '{"ref": "skill:blog-writer", "link_types": ["uses_tool", "uses_kb", "uses_prompt"], "depth": 2}' ``` **Why traverse?** An orchestrator may need to: - Pre-load required KBs into context - Verify all required tools are available - Build a dependency-aware execution plan ### Step 3: Describe Tool Inspect a tool's configuration before execution. ```bash curl -X POST https://your-domain.com/api/tools/describe \ -H "Content-Type: application/json" \ -H "Cookie: sb-session=..." \ -d '{"tool_ref": "tool:tavily-search"}' ``` **Useful for:** - Checking `risk_level` and `requires_human_approval` before execution - Validating input schema against the `call_spec` - Presenting tool capabilities to users ### Step 4: Execute Tool Execute the tool with input. Use dry-run for testing. ```bash curl -X POST https://your-domain.com/api/tools/execute \ -H "Content-Type: application/json" \ -H "Cookie: sb-session=..." 
\ -d '{"tool_ref": "tool:tavily-search", "input": {"query": "AI trends 2026", "max_results": 3}}' ``` ## Trace Log Schema External orchestrators should emit trace breadcrumbs for reproducibility: ```json { "run_id": "run-scenario-01-1707753600000", "scenario_id": "scenario-01-blog-writer", "started_at": "2026-02-12T17:00:00.000Z", "completed_at": "2026-02-12T17:00:02.500Z", "steps": [ { "step": "discover", "timestamp": "2026-02-12T17:00:00.100Z", "request": { "method": "POST", "path": "/api/discover", "body": { "intent": "write a blog post", "limit": 5 } }, "response": { "status": 200, "body": { "data": ["..."] } }, "decision": "candidates_found", "rationale": "Returned 3 candidates for intent" }, { "step": "traverse", "timestamp": "2026-02-12T17:00:00.500Z", "request": { "method": "POST", "path": "/api/traverse", "body": { "ref": "skill:blog-writer" } }, "response": { "status": 200, "body": { "links": ["..."], "entities": {} } }, "decision": "graph_explored", "rationale": "Traversed 2 links from skill:blog-writer" } ], "outcome": "success" } ``` ### Required Trace Fields | Field | Type | Description | | -------------- | -------- | --------------------------------------------------- | | `run_id` | string | Unique identifier for this workflow execution | | `scenario_id` | string | Reference to the triggering scenario or intent | | `started_at` | ISO 8601 | When the workflow began | | `completed_at` | ISO 8601 | When the workflow finished | | `steps` | array | Ordered list of step traces | | `outcome` | enum | `success`, `no_match`, `menu_presented`, or `error` | ### Required Step Fields | Field | Type | Description | | ----------- | -------- | ------------------------------------------------ | | `step` | enum | `discover`, `traverse`, `describe`, or `execute` | | `timestamp` | ISO 8601 | When this step executed | | `request` | object | `{ method, path, body? }` | | `response` | object | `{ status, body? 
}` | | `decision` | string | What the orchestrator decided after this step | | `rationale` | string | Human-readable explanation | ## Failure Handling & Continuation ### Fail-Open Decisions The workflow is designed to fail open: | Failure | Continuation action | Rationale | | ---------------------- | ------------------------ | ------------------------------ | | Discover returns error | Log, exit gracefully | No candidates = no action | | Traverse returns error | Skip, proceed to execute | Dependencies are informational | | Describe returns error | Skip, proceed to execute | Description is optional | | Execute returns error | Log error in trace | Caller decides retry policy | ### Idempotent Retries - **Discover** and **Traverse** are read-only — safe to retry - **Describe** is read-only — safe to retry - **Execute** idempotency depends on the tool's `idempotent` flag (from describe response) - `idempotent: true` → safe to retry - `idempotent: false` → retry with caution (check for side effects) ### Resume from Step The trace log enables resume-from-step: 1. Load the trace from the last run 2. Find the last successful step 3. Resume from the next step with the same `run_id` ```typescript // Pseudo-code for resume const lastTrace = loadTrace(runId) const lastStep = lastTrace.steps[lastTrace.steps.length - 1] if (lastStep.step === 'discover' && lastStep.response.status === 200) { // Skip discover, resume from traverse await stepTraverse(...) } ``` ## Running the Sample ### Prerequisites - Local dev server running (`pnpm dev`) - Authenticated session (browser cookie or API session) ### Run All Scenarios (Dry-Run) ```bash npx tsx scripts/examples/orchestrator_sample_workflow.ts --dry-run ``` ### Run Specific Scenario ```bash npx tsx scripts/examples/orchestrator_sample_workflow.ts --scenario scenario-01-blog-writer --dry-run ``` ### Run Against Remote ```bash ORCHESTRATOR_BASE_URL=https://your-domain.com \ ORCHESTRATOR_COOKIE="sb-access-token=..." 
\ npx tsx scripts/examples/orchestrator_sample_workflow.ts --dry-run ``` ## Fixture Scenarios | ID | Description | Tags | | ------------------------------- | ------------------------------------------- | ---------------------------------- | | `scenario-01-blog-writer` | Single high-confidence match → auto-execute | happy-path, single-match | | `scenario-02-ambiguous-request` | Multiple candidates → menu | menu-selection, ambiguous | | `scenario-03-no-match` | No candidates → graceful exit | no-match, graceful-exit | | `scenario-04-tool-direct` | Direct tool reference → describe + execute | happy-path, tool-direct | | `scenario-05-dependency-walk` | Skill → traverse dependency graph | dependency-walk, graph-exploration | ## Files | File | Purpose | | ------------------------------------------------------- | ------------------------------- | | `scripts/examples/orchestrator_sample_workflow.ts` | Runnable workflow script | | `scripts/examples/fixtures/orchestrator-scenarios.json` | Deterministic scenario fixtures | | `__tests__/scripts/orchestrator-workflow.test.ts` | Fixture schema + coverage tests | | `docs/runbooks/interop/orchestrator-workflow.md` | This document | --- ## Source: docs/runbooks/interop/README.md # Interop Runbooks Operational notes and stable contracts for external consumers integrating with Katailyst. 
Start here: - `docs/runbooks/interop/registry-api-contract.md` — canonical HTTP contract (discover, traverse, tools/describe, tools/execute, plugins/export) - `docs/runbooks/interop/examples/registry-api.http.md` — curl examples - `docs/runbooks/interop/examples/typescript-client.md` — minimal typed TS client Reference integrations: - `docs/runbooks/interop/slack-like-integration.md` — webhook adapter (signature verification, replay/idempotency, discovery-driven execution) - `docs/runbooks/interop/runtime-fleet-hardening.md` — canary/stable rollout checks, rollback triggers, and post-deploy smoke commands - `docs/runbooks/interop/orchestrator-workflow.md` — end-to-end orchestrator sample (discover -> traverse -> describe -> execute) with trace schema - `docs/runbooks/interop/mcp-ai-sdk-adapter.md` — MCP + AI SDK transport selection, auth flows, troubleshooting --- ## Source: docs/runbooks/interop/registry-api-contract.md # Registry API Contract Canonical reference for external consumers integrating with Katailyst registry/discovery APIs. > **Auth model (runtime truth):** > > - `/api/*` endpoints in this runbook require a valid Supabase session. > - `/mcp` supports `anonymous`, static `bearer`, `personal_token`, `connect_token`, or `supabase` auth modes (controlled by MCP env flags and token issuance). > - Plugin export is `POST /api/plugins/export` (JSON body), not GET query string. 
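The plugin-export note above (POST with a JSON body, not a GET query string) can be sketched as a request builder. The `refs` body field is an illustrative assumption, not the full export contract:

```typescript
// Sketch: plugin export is POST /api/plugins/export with a JSON body.
// The `refs` field is an assumption for illustration; check the contract for the real payload.
function buildPluginExportRequest(baseUrl: string, refs: string[]): Request {
  return new Request(`${baseUrl}/api/plugins/export`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ refs }),
  })
}

const req = buildPluginExportRequest('https://your-domain.com', ['skill:blog-writer'])
console.log(req.method, new URL(req.url).pathname) // POST /api/plugins/export
```

Requires Node 18+ (global `Request`); attach the Supabase session cookie before sending, per the auth model above.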
--- ## Endpoints | Method | Path | Purpose | | ------ | --------------------- | ----------------------------------------------- | | POST | `/api/discover` | Semantic/intent discovery of registry entities | | POST | `/api/traverse` | Graph traversal of entity links | | POST | `/api/tools/describe` | Describe a tool entity's config and call spec | | POST | `/api/tools/execute` | Execute a tool with input payload | | POST | `/api/plugins/export` | Export a portable `.claude-plugin/` ZIP package | --- ## Auth Matrix (Runtime-Backed) | Surface | Auth Modes | Notes | | ----------------------------------------------------------------------- | -------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `/api/discover`, `/api/traverse`, `/api/tools/*`, `/api/plugins/export` | `supabase` session | Session cookie (`sb-*`) required; org resolution from membership rules. | | `/mcp` | `anonymous \| bearer \| personal_token \| connect_token \| supabase` | Controlled by `MCP_ALLOW_ANONYMOUS`, static env bearer tokens, per-user personal MCP tokens, and short-lived connect tokens; response includes `x-katailyst-mcp-auth`. | | `/api/mcp/personal-tokens*`, `/api/mcp/connect-token` | `supabase` session | Dashboard-managed token issuance surfaces. Personal-token routes are user-scoped and should be called same-origin. | | `/api/endpoints`, `/api/openapi.json` | public | Read-only inventory/spec surfaces. | MCP org resolution behavior for write-capable calls: 1. `execution_org` when explicit org is supplied by caller. 2. `membership` resolution when actor has one or more org memberships. 3. `single_membership` auto-select when exactly one org is available. 
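One reading of the org-resolution precedence above, as a sketch — the types and function name are illustrative; the real implementation lives server-side:

```typescript
// Sketch of the MCP org-resolution precedence described above (not the real server code).
type OrgResolution =
  | { mode: "execution_org"; orgId: string }
  | { mode: "single_membership"; orgId: string }
  | { mode: "membership"; candidates: string[] } // caller must disambiguate (cf. org_required)
  | { mode: "none" };

function resolveOrg(explicitOrgId: string | null, memberships: string[]): OrgResolution {
  // 1. execution_org: a caller-supplied org always wins.
  if (explicitOrgId) return { mode: "execution_org", orgId: explicitOrgId };
  // 3. single_membership: exactly one org available → auto-select.
  if (memberships.length === 1) return { mode: "single_membership", orgId: memberships[0] };
  // 2. membership: multiple orgs → resolution among memberships.
  if (memberships.length > 1) return { mode: "membership", candidates: memberships };
  return { mode: "none" };
}
```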
--- ## Error Envelope All JSON-returning endpoints (discover, traverse, tools/describe, tools/execute) use a consistent error shape: ```json { "error": { "code": "unauthorized", "message": "Not authenticated" } } ``` The `/api/plugins/export` endpoint uses a simpler shape for some errors: `{ "error": "<string>" }`. ### Common Error Codes | HTTP Status | Code | Meaning | | ----------- | ------------------------- | ---------------------------------------------------------------------- | | 400 | `bad_request` | Invalid JSON, missing required fields, or malformed parameters | | 401 | `unauthorized` | No valid auth for endpoint policy (session/bearer as applicable) | | 403 | `forbidden` | Insufficient org role (owner/admin/editor required for tool endpoints) | | 404 | `tool_not_found` | Referenced entity does not exist or is not published/curated | | 409 | `requires_human_approval` | Tool requires human approval before execution | | 500 | `*_failed` | Internal error (DB query, RPC failure) | | 502 | `upstream_error` | External tool endpoint returned an error | | 504 | `upstream_error` | External tool polling timed out | --- ## Ref Format Entity references follow the pattern `type:code` (e.g., `skill:blog-system-v1`, `tool:tavily-search`). Accepted entity types: `skill`, `tool`, `kb`, `prompt`, `schema`, `style`, `content_type`, `recipe`, `bundle`, `playbook`, `channel`, `agent`, `eval_case`, `rubric`, `metric`, `lint_rule`, `lint_ruleset`. --- ## 1. POST `/api/discover` Discover registry entities via semantic intent, tag filters, or faceted browsing. 
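Every endpoint below identifies entities by the `type:code` refs defined in "Ref Format". A small client-side validator might look like this — a sketch; the helper is ours, not part of the API:

```typescript
// Sketch: parse and validate `type:code` refs per the "Ref Format" section.
const ENTITY_TYPES = new Set([
  "skill", "tool", "kb", "prompt", "schema", "style", "content_type",
  "recipe", "bundle", "playbook", "channel", "agent", "eval_case",
  "rubric", "metric", "lint_rule", "lint_ruleset",
]);

function parseRef(ref: string): { type: string; code: string } {
  const idx = ref.indexOf(":");
  if (idx <= 0) throw new Error(`ref must contain ':': ${ref}`);
  const type = ref.slice(0, idx);
  const code = ref.slice(idx + 1);
  if (!ENTITY_TYPES.has(type)) throw new Error(`unknown entity type: ${type}`);
  if (!code) throw new Error(`empty code in ref: ${ref}`);
  return { type, code };
}
```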
Recommended orchestrator pattern: - ask for an initial `limit: 10-20` - let the agent select, inspect, or traverse - if quality is low, call `/api/discover` again with refined `intent`/`types`/`tags` - if quality is still low or breadth is needed, continue with `cursor` until exhausted ### Request Body ```json { "intent": "tools for web search", "types": ["tool", "skill"], "tags": ["domain:research"], "families": ["development"], "bundles": ["base-content-kit"], "org_id": "uuid-here", "limit": 20, "cursor": null, "facets": false, "debug": false, "rerank": { "enabled": true, "provider": "auto", "top_n": 20, "model": "rerank-v4.0-fast" } } ``` | Field | Type | Default | Description | | ----------------- | -------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------- | | `intent` | `string \| null` | `null` | Natural-language search query (semantic matching) | | `types` | `string[] \| string` | all | Entity type filter. Accepts array or CSV. Also accepts `type` key. | | `tags` | `string[]` | none | Tag filters in `namespace:code` format | | `families` | `string[]` | none | Family shortcuts — auto-prefixed with `family:`. Also accepts `family` key. | | `bundles` | `string[]` | none | Filter by bundle membership. Also accepts `bundle` key. | | `org_id` | `string \| null` | `null` | Scope to a specific org. Also accepts `orgId`. | | `limit` | `number` | `20` | Results per page. Clamped to `[1, 200]`. | | `cursor` | `string \| null` | `null` | Opaque pagination cursor from `next_cursor` in previous response. | | `facets` | `boolean` | `false` | Include taxonomy facet counts in response. | | `debug` | `boolean` | `false` | Include raw scoring signals per result. | | `rerank` | `object \| boolean` | enabled | Optional retrieve-then-rerank stage (set `false` to disable). | | `rerank.provider` | `auto \| cohere \| voyage` | `auto` | Provider preference for rerank stage. 
| | `rerank.top_n` | `number` | `null` | Optional rerank target depth (`1-200`). Values below `limit` are raised to `limit` to preserve cursor continuity. | | `rerank.model` | `string \| null` | `null` | Optional provider model override. | ### Response (200) ```json { "data": [ { "ref": "tool:tavily-search", "entity_type": "tool", "code": "tavily-search", "name": "Tavily Search", "summary": "Web search tool powered by Tavily API", "use_case": "Research and information gathering", "priority_tier": 1, "rating": 92, "tags": ["domain:research", "family:development"], "links_to": ["skill:research-trends"], "score": 0.873, "match_reasons": ["intent_match", "tag_match"], "rerank_provider": "cohere", "rerank_score": 0.932 } ], "next_cursor": "eyJzY29yZSI6MC44NzN9", "meta": { "pagination": { "requested_limit": 20, "applied_limit": 20, "has_more": true, "continuation_supported": true, "cursor_order": "response_order", "warnings": [] }, "rerank": { "enabled": true, "requested_provider": "auto", "used_provider": "cohere", "fallback_used": false, "top_n": 20, "warnings": [] } }, "facets": { "family": [{ "tag": "family:development", "count": 12 }], "action": [], "stage": [], "domain": [{ "tag": "domain:research", "count": 5 }] } } ``` ### Pagination Cursor-based. The API fetches `limit + 1` rows. If an extra row exists, its position is encoded as `next_cursor` (base64url JSON with `score`, `priority_tier`, `name`, `ref`). Pass it as `cursor` in the next request. `next_cursor` is `null` when no more pages exist. Rerank behavior and pagination: - Rerank is enabled by default on first-page requests. - Cursor requests skip rerank and preserve base DB ordering. - When first-page rerank is enabled and additional pages exist, `next_cursor` still points to base DB ordering (see `meta.pagination.cursor_order`). - If `rerank.top_n` is lower than `limit`, the API raises effective rerank depth to `limit` so continuation does not skip records. 
- In auto mode, providers are attempted as `cohere -> voyage`. When `facets=true`, the internal fetch cap is `max(limit, 50)` (up to 200) to provide richer facet counts. ### Errors | Status | Code | Condition | | ------ | ----------------- | ----------------------------------------- | | 401 | `unauthorized` | No valid session | | 400 | `bad_request` | Invalid JSON, non-object body, bad cursor | | 500 | `discover_failed` | Supabase `discover_v2` RPC error | --- ## 2. POST `/api/traverse` Traverse the entity link graph starting from a root ref. ### Request Body ```json { "ref": "skill:blog-system-v1", "link_types": ["uses_tool", "requires"], "depth": 2, "org_id": null } ``` | Field | Type | Default | Description | | ------------ | -------------------- | ------- | ---------------------------------------------------------------------- | | `ref` | `string` | — | **Required.** Root entity ref (must contain `:`) | | `link_types` | `string[] \| string` | all | Filter to specific link types. CSV accepted. Also accepts `linkTypes`. | | `depth` | `number` | `1` | Traversal depth. Clamped to `[1, 5]`. | | `org_id` | `string \| null` | `null` | Scope to org. Also accepts `orgId`. | ### Valid Link Types ``` governed_by_pack, bundle_member, uses_tool, uses_prompt, uses_kb, often_follows, requires, alternate, recommends, pairs_with, prerequisite, supersedes, parent, related ``` ### Response (200) ```json { "root": "skill:blog-system-v1", "links": [ { "depth": 1, "source": "skill:blog-system-v1", "target": "tool:tavily-search", "link_type": "uses_tool", "weight": 1.0 } ], "entities": { "skill:blog-system-v1": { "name": "Blog System v1", "summary": "End-to-end blog production skill" }, "tool:tavily-search": { "name": "Tavily Search", "summary": "Web search tool" } } } ``` ### Pagination None. Returns all links up to the specified depth. 
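Because traverse returns the full link set in one response, a client can index it locally for graph walks. A sketch, with types mirroring the response example above:

```typescript
// Sketch: index a /api/traverse response's links by source ref for local exploration.
interface TraverseLink {
  depth: number;
  source: string;
  target: string;
  link_type: string;
  weight: number;
}

function indexLinks(links: TraverseLink[]): Map<string, TraverseLink[]> {
  const bySource = new Map<string, TraverseLink[]>();
  for (const link of links) {
    const bucket = bySource.get(link.source) ?? [];
    bucket.push(link);
    bySource.set(link.source, bucket);
  }
  return bySource;
}
```

With the map built, an agent can follow `uses_tool` edges from any skill without another round trip.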
### Errors | Status | Code | Condition | | ------ | ----------------- | -------------------------------------- | | 401 | `unauthorized` | No valid session | | 400 | `bad_request` | Missing/malformed `ref`, invalid types | | 500 | `traverse_failed` | Supabase `traverse_links` RPC error | --- ## 3. POST `/api/tools/describe` Describe a tool entity's configuration, call spec, and examples. ### Auth Requires **owner**, **admin**, or **editor** role in the execution org. ### Request Body ```json { "tool_ref": "tool:tavily-search", "org_id": "uuid-here" } ``` | Field | Type | Default | Description | | ---------- | ---------------- | ---------------- | ------------------------------------------------------------------------------------------ | | `tool_ref` | `string` | — | **Required.** Tool ref. Also accepts `toolRef` or `ref`. Auto-prefixes `tool:` if missing. | | `org_id` | `string \| null` | auto from single | Execution org. Also accepts `orgId`. Auto-resolved if user has exactly 1 org. | ### Response (200) ```json { "tool": { "ref": "tool:tavily-search", "code": "tavily-search", "name": "Tavily Search", "status": "published", "summary": "Web search powered by Tavily", "use_case": "Research and information gathering", "tool_org_id": "00000000-0000-0000-0000-000000000001", "tags": ["domain:research", "family:development"] }, "config": { "tool_type": "api", "provider": "tavily", "action": "search", "endpoint_url": "https://api.tavily.com/search", "auth_method": "api_key_in_body", "auth_secret_key": "TAVILY_API_KEY", "risk_level": 1, "requires_human_approval": false, "idempotent": true }, "call_spec": { "...": "tool-specific call specification" }, "examples": [{ "...": "usage examples" }] } ``` ### Errors | Status | Code | Condition | | ------ | -------------------- | ------------------------------------------ | | 401 | `unauthorized` | No valid session | | 400 | `bad_request` | Invalid body or missing tool_ref | | 400 | `org_required` | User in multiple orgs, no org_id 
specified | | 403 | `forbidden` | Insufficient org role | | 404 | `tool_not_found` | Tool entity or tools row not found | | 500 | `memberships_failed` | DB error fetching org memberships | | 500 | `tool_lookup_failed` | DB error querying registry entities | | 500 | `tool_fetch_failed` | DB error querying tools table | --- ## 4. POST `/api/tools/execute` Execute a tool with the provided input payload. ### Auth Requires **owner**, **admin**, or **editor** role in the execution org. ### Request Body ```json { "tool_ref": "tool:tavily-search", "org_id": "uuid-here", "input": { "query": "latest AI research papers", "max_results": 5 } } ``` | Field | Type | Default | Description | | ---------- | ---------------- | ---------------- | ------------------------------------------------------ | | `tool_ref` | `string` | — | **Required.** Same normalization as `/tools/describe`. | | `org_id` | `string \| null` | auto from single | Execution org. Same auto-resolve logic. | | `input` | `unknown` | `undefined` | Tool-specific input payload. | ### Response (200) ```json { "tool": "tool:tavily-search", "data": { "results": [ { "title": "...", "url": "...", "content": "..." } ] } } ``` ### Execution Modes 1. **Call spec HTTP** — If the tool's entity revision has a `call_spec`, uses generic HTTP execution with URL templating, auth injection, and optional async polling. 2. **Hardcoded providers** — Falls back to built-in executors for known `provider/action` combos (`tavily/search`, `bash/run`, `publish/email`). 3. **No match** — Returns `400 invalid_tool`. 
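The three execution modes form a two-step fallback. A sketch of the dispatch order — the provider map mirrors the combos named in mode 2; the types and function name are ours:

```typescript
// Sketch of the execution-mode fallback described above (not the real server code).
type ExecutionMode =
  | { mode: "call_spec_http" }
  | { mode: "hardcoded"; executor: string }
  | { mode: "invalid_tool" }; // surfaces to the caller as HTTP 400 invalid_tool

// Built-in provider/action combos named in the contract.
const HARDCODED = new Map([
  ["tavily/search", "tavilySearch"],
  ["bash/run", "bashRun"],
  ["publish/email", "publishEmail"],
]);

function pickExecutionMode(hasCallSpec: boolean, provider: string, action: string): ExecutionMode {
  if (hasCallSpec) return { mode: "call_spec_http" }; // 1. call spec always wins
  const executor = HARDCODED.get(`${provider}/${action}`); // 2. hardcoded fallback
  if (executor) return { mode: "hardcoded", executor };
  return { mode: "invalid_tool" }; // 3. no match
}
```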
### Errors | Status | Code | Condition | | ------ | ----------------------------- | -------------------------------------------- | | 401 | `unauthorized` | No valid session | | 400 | `bad_request` | Invalid body or missing tool_ref | | 400 | `org_required` | Ambiguous org | | 400 | `invalid_tool` | No executor for this provider/action | | 400 | `invalid_input` | Input validation failed | | 400 | `missing_auth_secret_key` | Tool needs auth but no secret key configured | | 400 | `missing_integration_secret` | No integration_secrets row found | | 400 | `inactive_integration_secret` | Secret is deactivated | | 400 | `invalid_secret_reference` | No vault reference on the secret | | 400 | `missing_vault_secret` | Vault decryption failed | | 403 | `forbidden` | Insufficient org role | | 404 | `tool_not_found` | Tool entity or config not found | | 409 | `requires_human_approval` | Tool flagged for manual approval | | 500 | `tool_call_failed` | Catch-all execution failure | | 502 | `upstream_error` | External endpoint returned an error | | 504 | `upstream_error` | Async polling timed out | --- ## 5. POST `/api/plugins/export` Export a portable `.claude-plugin/` ZIP package of skills, prompts, and agents. ### Request Body ```json { "orgId": "uuid-here", "statuses": ["curated", "published"], "includeTags": ["domain:writing"], "excludeTags": ["stage:deprecated"], "compatibilityProfile": "plugin_portable", "includeArtifacts": true, "dryRun": false } ``` | Field | Type | Default | Description | | ---------------------- | -------------------- | -------------------- | --------------------------------------------------------------- | | `orgId` / `org` | `string \| null` | auto from membership | Execution org UUID. Must be one of user's memberships. | | `statuses` / `status` | `string[] \| string` | `curated,published` | Statuses included in export selection. | | `includeTags` | `string[] \| string` | none | Include filters; supports wildcard (`namespace:*`) matching. 
| | `excludeTags` | `string[] \| string` | none | Exclude filters applied after include selection. | | `compatibilityProfile` | `string` | `plugin_portable` | Compatibility contract for export payload validation. | | `includeArtifacts` | `boolean` | `true` | Include artifact payloads in export when present. | | `dryRun` / `dry_run` | `boolean` | `false` | Return export plan and diagnostics without returning ZIP bytes. | ### Response (200) - `Content-Type: application/zip` - `Content-Disposition: attachment; filename="<name>-<version>.zip"` - Body: ZIP archive containing: - `.claude-plugin/plugin.json` (plugin manifest) - `.claude-plugin/marketplace.json` (marketplace manifest) - `README.md` - `LICENSE` - `package.json` - `skills/<code>/SKILL.md` files - `commands/<code>.md` files - `agents/<code>.md` files - Hook files ### Errors | Status | Body | Condition | | ------ | -------------------------------------------- | -------------------------------- | | 401 | `{ "error": "unauthorized" }` | No valid session | | 400 | `{ "error": "no_org_membership" }` | User has no org memberships | | 403 | `{ "error": "forbidden" }` | User role not owner/admin | | 400 | `{ "error": "invalid_export_options", ... }` | Invalid export selection/profile | | 500 | `{ "error": "export_failed", ... }` | Export planning/zip failure | --- ## Rate Limiting No explicit rate limiting is enforced at the application layer. The underlying Supabase and PostgreSQL connection pools provide natural throughput limits. **Guidance for consumers:** - Limit concurrent requests to 10 per session. - Implement exponential backoff on 5xx errors. - Cache discovery results locally when possible (results are stable within minutes). ## Idempotency - **Discover** and **traverse** are inherently idempotent (read-only). - **Tools/describe** is idempotent (read-only). - **Tools/execute** is NOT idempotent unless the tool config has `idempotent: true`. - **Plugins/export** is idempotent (read-only snapshot). 
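Since only the read endpoints are safely retryable by default, a minimal client-side guard for non-idempotent executions might look like this — the request-ID scheme is an assumption on the consumer side, not part of the contract:

```typescript
// Sketch: client-side dedup guard for non-idempotent tool executions.
// Request IDs are chosen by the consumer; the API does not define them.
const attempted = new Set<string>();

function executeOnce<T>(requestId: string, run: () => T): T | undefined {
  if (attempted.has(requestId)) return undefined; // duplicate: skip the side effect
  attempted.add(requestId);
  return run();
}
```

In practice the ID set would live in durable storage so retries across process restarts stay deduplicated.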
For non-idempotent tool executions, consumers should implement their own deduplication using request IDs. ## Retry Guidance | Error Type | Retry? | Strategy | | ---------- | ------ | ----------------------------------------- | | 400 | No | Fix the request | | 401 | No | Re-authenticate and get a fresh session | | 403 | No | Check org membership and role | | 404 | No | Verify the entity exists and is published | | 409 | No | Requires out-of-band human approval | | 500 | Yes | Exponential backoff, max 3 retries | | 502/504 | Yes | Exponential backoff, longer intervals | --- ## Source: docs/runbooks/morning-brief/morning-brief.md # Morning brief runbook **Runs:** daily, 07:00 America/Los_Angeles. **Owner surface:** Katailyst (operator plane) + Evidence-Based Business (metrics). **Canonical landing folder:** `docs/reports/morning-brief/YYYY-MM-DD/` **Primary daily output:** `docs/reports/morning-brief/YYYY-MM-DD/index.html` (AIRO-grade — see §"Visual contract" below) **Markdown source-of-truth:** `docs/reports/morning-brief/YYYY-MM-DD/index.md` (data layer, fed into the HTML template) **Home screen (evergreen):** `docs/reports/morning-brief/index.html` — links to today's brief and doctrine companions **Case-study archive:** `docs/planning/active/scratch/ai-money-map/day-N-<company>.html` **Skill:** `.claude/skills/case-study-harvester/` **Seed queue:** `docs/planning/active/scratch/ai-money-map/queue.md` --- ## Why this exists Alec's ecosystem is big and the window for AI wedge moves is short. The morning brief exists so that every day the operator shows up with three things already done: 1. A Gamma-style case study of one top-1% AI operator, with the pattern extracted and mapped to HLT. 2. One new skill added to the Claude skill library, derived from that case study. 3. A pruning + health pass across the fleet so nothing silently rots. The brief is deliberately short. The case study is deliberately rich. The skill is deliberately mechanical. That mix is the point. 
## Seven steps, in order ### 1. Harvest the day's case study Run the `case-study-harvester` skill. Inputs: read the top unblocked entry from `docs/planning/active/scratch/ai-money-map/queue.md`. If the queue is empty, refuse and alert the operator — do not pick a random company. Outputs: - `docs/planning/active/scratch/ai-money-map/day-N-<company>.html` (Gamma-style, mermaid-embedded, uses `day-1-abridge.html` as the few-shot template — this is the **doctrine** aesthetic). - `docs/reports/morning-brief/YYYY-MM-DD/index.md` — the data layer: markdown source-of-truth for today's brief (case study link, three primitives, three HLT actions, fleet health, BML rows, pruning, tomorrow). Shortest-possible. - `docs/reports/morning-brief/YYYY-MM-DD/index.html` — the **AIRO-grade** rendered brief. Uses the day-1 (2026-04-16) template as the few-shot. This is what gets shared and what the home screen CTA lands on. See "Visual contract" below. The case study (`day-N-<company>.html`) must contain: TL;DR, at-a-glance metrics, revenue stack mermaid, product shape, moat layers, pattern to steal, HLT application table, tomorrow preview. Do not deviate from the template. The rendered brief (`YYYY-MM-DD/index.html`) must contain: topbar with brand dot + breadcrumb, hero (kicker pulse + H1 + lede + primary CTA to case study + ghost CTA to doctrine), stat mockup pane, three primitive cards (cyan / magenta / gold accent bars), three action cards, today's skill tile, fleet health surface strip, BML 5-row table (instrumented or "probe pending"), pruning pill, tomorrow teaser with the company name in magenta and a preview-CTA, footer with canonical + production path + companion docs. Same 6-section shape as the data-layer markdown. Do not deviate. ### 2. Add today's skill Every case study names one skill to add. Create it. Skill location: `.claude/skills/<slug>/SKILL.md` with description front-matter conforming to the existing skills convention (trigger phrases + concise purpose). 
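A hypothetical `SKILL.md` front-matter in that shape — the slug, wording, and field names are invented for illustration, following common skill-file conventions:

```md
---
name: pattern-extractor
description: Extract the reusable pattern from today's case study and map it to HLT. Fires on "extract the pattern" or "harvest this case study".
---

# pattern-extractor

Concise purpose first, then the mechanical steps the skill performs.
```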
Register it in the skill index (check `docs/planning/active/scratch/ai-money-map/skill-log.md`) so we keep a running diary of which case study birthed which skill. ### 3. Fleet health pass Probe every surface in `docs/planning/active/scratch/ecosystem-contract/llms-surfaces.json`. For each surface listed with `health_endpoint`: hit it, capture status + latency, write one row to `docs/reports/morning-brief/YYYY-MM-DD/probe.ndjson`. Fail loud on: - any critical or high-tier surface returning non-2xx - a 3σ latency regression vs the 7-day baseline - a `llms.txt` drift alert (see step 5) Feed the results into the fleet map summary section of the daily brief. ### 4. Pruning pass Walk the scratch folders and remove files that: - are older than 14 days, - are not referenced from any active planning doc (`docs/planning/active/*`) or runbook, - are not part of the canonical artifact list (`ecosystem-contract/`, `ai-money-map/`, `ai4mastery-surface/`). Dry-run first. Emit a `docs/reports/morning-brief/YYYY-MM-DD/pruning.md` report listing candidates. Only delete on operator approval (the approval queue, same pattern as every other surface). ### 5. llms.txt audit Read `llms.txt` and `llms-full.txt`. Diff the surface list against `llms-surfaces.json`. Any surface named in llms.txt that isn't in the probe list, or any surface in the probe list that isn't in llms.txt → drift event. Drift events land in `docs/reports/morning-brief/YYYY-MM-DD/drift.md` and trigger a Novu notification via the existing `fleet-drift-default` template. ### 6. Build-measure-learn snapshot Pull yesterday's numbers for the five surfaces: - articles published (MasteryPublishing publish endpoint) - social pushed (sidecar-system destination log) - ads running (Marketo + any paid placements) - emails sent (publish.email + AgentMail) - upgrade screens shown (Framer + MasteryPublishing instrumentation) Write them into the daily report. 
Flag any surface with zero activity two days in a row — that is a dead feedback loop, not a quiet day. ### 7. Tomorrow's prep End of brief: verify tomorrow's queue entry is seeded. If the queue has fewer than 7 entries remaining, top it up with companies from the evergreen watchlist in `queue.md`. --- ## Daily report shape `docs/reports/morning-brief/YYYY-MM-DD/index.md` always has this outline: ```md # Morning brief — <date> ## 1. Case study of the day - link to day-N-<company>.html - three primitives to steal - three HLT actions this week (with owner + surface) ## 2. Today's new skill - skill slug + what it does + when it fires ## 3. Fleet health - green/yellow/red table of surfaces - any drift events ## 4. Build-measure-learn metrics - 5-row table: surface | units yesterday | 7-day trend | flag ## 5. Pruning candidates - list of files to remove (operator approves) ## 6. Tomorrow - queued case study + one-line teaser ``` Keep the top of the report to ten sentences or less. Everything else is linked, not inlined. ## Visual contract (AIRO-grade) The rendered brief at `docs/reports/morning-brief/YYYY-MM-DD/index.html` is the team-shareable artifact. It must render at the quality bar established on 2026-04-16. Template file: `docs/reports/morning-brief/2026-04-16/index.html`. Home screen: `docs/reports/morning-brief/index.html`. 
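Every path above keys off the run date in America/Los_Angeles. A sketch of resolving the dated folder — the helper name is ours:

```typescript
// Sketch: resolve the YYYY-MM-DD landing folder for a given instant,
// pinned to the runbook's America/Los_Angeles schedule.
function briefFolder(date: Date): string {
  const ymd = new Intl.DateTimeFormat("en-CA", {
    timeZone: "America/Los_Angeles",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(date); // en-CA formats as YYYY-MM-DD
  return `docs/reports/morning-brief/${ymd}/`;
}
```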
### Single source of truth: `docs/assets/brief-system.css`

Every page in the brief family (home, daily briefs, and the linked case-study pages under `docs/planning/active/scratch/ai-money-map/`) **must** link the shared stylesheet:

```html
<link rel="stylesheet" href="<relative-path>/docs/assets/brief-system.css" />
```

- From `docs/reports/morning-brief/index.html` → `../../assets/brief-system.css`
- From `docs/reports/morning-brief/YYYY-MM-DD/index.html` → `../../../assets/brief-system.css`
- From `docs/planning/active/scratch/ai-money-map/*.html` → `../../../../assets/brief-system.css`

### The palette contract (enforced by construction, not memory)

- **No brief-family page may define its own `:root` color tokens.** The shared stylesheet is the only place tokens live.
- **No brief-family page may introduce a new accent color.** The palette is **cyan** (`--cyan: #00f6d0`) + **magenta** (`--magenta: #ff4fd8`) + neutrals. Gold, coral, and blue accents have been retired.
- **Legacy pages retrofit via remap, not rewrite.** A legacy page keeps its local grammar but remaps its local variables (`--accent`, `--accent-2`, …) to `var(--cyan)` / `var(--magenta)` inside a single `:root` block that appears _after_ the shared stylesheet link.
- **Use the `.bs-*` class system for common primitives.** Page-specific layout is fine, but it must reference the canonical tokens via `var(--cyan)`, `var(--magenta)`, `var(--ink)`, `var(--surface)`, `var(--line)`.
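Under those rules, a legacy-page retrofit might look like this — a sketch; the local variable names stand in for whatever grammar the page already has:

```html
<!-- Shared stylesheet first; it owns every token. -->
<link rel="stylesheet" href="../../../../assets/brief-system.css" />
<style>
  /* Single :root block AFTER the shared stylesheet: remap local
     variables to canonical tokens. No new colors, no re-declared tokens. */
  :root {
    --accent: var(--cyan);
    --accent-2: var(--magenta);
  }
</style>
```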
### Class primitives (shared) | Class | Purpose | | ------------------------------------------- | ---------------------------------------------------- | | `.bs-nav` | Topbar with brand dot + breadcrumb | | `.bs-hero` + `.kicker` + `.sub` + `.bs-cta` | Hero with kicker pulse, H1, lede, primary CTA | | `.bs-feature` | Stat-in-hero / mockup pane with headline + row stack | | `.bs-eyebrow` + `.bs-h2` + `.bs-tag` | Section header triptych | | `.bs-cards` / `.bs-card` | Three-up card row (primitive / action / line) | | `.bs-acts` / `.bs-act` | Action cards with owner pill | | `.bs-tiles` / `.bs-tile` | Surface roster (fleet character grid) | | `.bs-skill` | Single-skill tile with pill | | `.bs-table-wrap` / `.bs-table` | BML five-row table | | `.bs-pill` | Pruning / status pill | | `.bs-teaser` | Tomorrow preview CTA | | `.bs-footer` | Doctrine + operations refs + path meta | If a new primitive is needed, **add it to the shared stylesheet first**, then use it. Never redefine an existing primitive locally. 
**Rubrics enforced on every rendered brief.** These came out of the 2026-04-16 AIRO batch and are now mandatory for the HTML brief: | Rubric | How it shows up | | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------ | | `rubric:hero-mockup-in-motion` | Right pane of hero is a live-looking UI card with today's actual stat rows | | `rubric:character-grid-fleet` | Fleet health rendered as a visual roster (dots + status pills), not a plain list | | `rubric:comparison-table-us-vs-them` | When the day's case includes a competitive take, render it as a 4-column table with HLT column highlighted | | `rubric:pricing-tier-anchor` | When the brief surfaces a pricing / tier decision, show three cards (anchor on middle) | | `rubric:single-payment-anchor` | Tomorrow teaser and skill tile anchor on one memorable phrase, never a pricing grid | | `rubric:stat-in-hero` | CFO-friendly unit must appear in the mockup pane of the hero | | `rubric:single-cta-repeated` | One primary CTA verb, repeated ≥3× down the page ("Read the case study") | | `rubric:7-block-landing` | The 6 canonical sections + footer match the 7-block shape (hero / today / skill / fleet / BML / tomorrow / footer) | Any rendered brief that fails to render ≥6 of the 8 rubrics above is kicked back to `case-study-harvester` before the operator sees it. **Failure mode.** If the HTML render would dilute quality (missing section, broken mockup pane, no stat in hero), ship the markdown brief anyway and file a `render-pending` flag in the daily log. The markdown is never blocked on HTML. ## How this runbook is enforced - The scheduled task invokes this runbook. See `scripts/ops/run_morning_brief.ts` (to be created) or the Claude-scheduled-task created via `mcp__scheduled-tasks__create_scheduled_task`. - If the skill `case-study-harvester` is missing, the scheduled run must halt and alert, not improvise. 
- If `llms-surfaces.json` is missing or outdated, the scheduled run must halt and alert. - If any brief-family page grows a local `:root` color-token block that re-declares `--cyan` / `--magenta` / `--bg` / etc., the run halts and alerts — the shared stylesheet is the only token surface. - The approval queue (same one that drives articles/social/ads/emails) is the gate on every destructive pruning action. ### Pre-publish grep gate Before any brief renders for a new day, the harvester runs these checks: ``` grep -nE '--neon-(cyan|magenta|gold|coral)' docs/reports/morning-brief docs/planning/active/scratch/ai-money-map -r grep -n 'rel="stylesheet" href=".*brief-system.css"' docs/reports/morning-brief/YYYY-MM-DD/index.html ``` - First grep must return **no matches**. Old `--neon-*` tokens are retired; the shared stylesheet owns the palette. - Second grep must return **exactly one match**. Every daily brief links the shared stylesheet exactly once. ## When to deviate Only when: - Operator explicitly runs an ad-hoc brief (`/morning-brief ad-hoc <company>`). - A fleet incident outranks the case study (then the brief becomes an incident brief; the queued company shifts one day). - Queue is empty AND operator has approved an evergreen fallback. Otherwise: do not improvise. The value is in the discipline. --- ## Source: docs/runbooks/paperclip/hlt-content-factory-company.md # Runbook — HLT Content Factory (Paperclip company setup) **Status:** Bootstrap procedure. Execute in order. **Related:** [paperclip investigation doc](../../reports/doc/2026-04-16-paperclip-katalyst-screenpipe-investigation.md), [nurse recruiting pilot](./nurse-recruiting-pilot.md). ## Purpose Stand up "HLT Content Factory" as a first-class Paperclip company so article generation runs through the same governance backbone (budgets, approval gates, audit log) that every future vertical will use. 
This runbook is the smallest useful wire-up: one company, one goal, one issue, one approval, one `--execute` of the content-engine sync. It proves the plumbing before we scale.

## Prerequisites

- Paperclip server running with `DATABASE_URL` pointing at the dedicated Paperclip Postgres (per the investigation doc's decision C).
- Paperclip operator API key for local admin calls.
- `KATAILYST_API_KEY` set in the environment from which you'll run the approval-gated sync (must match the content engine upstream).
- At least one published article in the Katailyst registry for org `hlt` (verify with `scripts/integrations/content-engine-sync-articles.ts --org-code hlt` in dry-run mode first).

## Steps

### 1. Create the company

`POST /api/companies` on the Paperclip server:

```bash
curl -sS -X POST "$PAPERCLIP_API_URL/api/companies" \
  -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "name": "HLT Content Factory",
  "description": "Publishes nursing/healthcare content from the Katailyst registry to the Mastery content engine, gated by approval.",
  "issuePrefix": "HLT",
  "budgetMonthlyCents": 50000,
  "requireBoardApprovalForNewAgents": true,
  "brandColor": "#10b981"
}
JSON
```

Capture `company.id` from the response. Export it as `PAPERCLIP_HLT_COMPANY_ID` for the rest of the runbook.

Why these values:

- `issuePrefix: "HLT"` keeps issue IDs (`HLT-1`, `HLT-2`, ...) readable and matches the org code already used in the Katailyst registry.
- `budgetMonthlyCents: 50000` = $500/mo for the pilot. Raise it later once the cost shape is known.
- `requireBoardApprovalForNewAgents: true` — we do not want any agent to auto-join this company.

### 2.
Create a content-publishing goal

```bash
curl -sS -X POST "$PAPERCLIP_API_URL/api/companies/$PAPERCLIP_HLT_COMPANY_ID/goals" \
  -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "title": "Publish weekly nursing articles to hltmastery.com",
  "description": "Each week, promote one vetted nursing article from the Katailyst registry to the Mastery content engine as 'published'. All publishes go through Paperclip approval.",
  "targetMetric": "1 article published per week",
  "horizon": "quarterly"
}
JSON
```

Capture `goal.id` as `PAPERCLIP_HLT_GOAL_ID`.

### 3. Create the first issue (one article, one publish)

Pick a Katailyst article code to pilot with. Example: `article-nclex-labs`.

```bash
# Unquoted heredoc delimiter so $PAPERCLIP_HLT_GOAL_ID expands — a quoted
# <<'JSON' would send the variable name literally instead of the captured ID.
curl -sS -X POST "$PAPERCLIP_API_URL/api/companies/$PAPERCLIP_HLT_COMPANY_ID/issues" \
  -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<JSON
{
  "goalId": "$PAPERCLIP_HLT_GOAL_ID",
  "title": "Publish article-nclex-labs to content engine as 'published'",
  "body": "Article code: article-nclex-labs\nOrg: hlt\nSink: MasteryPublishing content engine (POST /api/publish)\nStatus target: published\n\nRun the approval-gated sync: scripts/integrations/content-engine-sync-articles-gated.ts --org-code hlt --codes article-nclex-labs --publish --paperclip-approval-id <id> --execute.",
  "labels": ["publish", "content-engine", "nursing"],
  "status": "todo"
}
JSON
```

Capture `issue.id` as `PAPERCLIP_HLT_ISSUE_ID`.

### 4.
Create an approval request against the issue

```bash
# Unquoted heredoc delimiter so $PAPERCLIP_HLT_ISSUE_ID expands; the backticks
# in the body are escaped so the shell does not run them as command substitution.
curl -sS -X POST "$PAPERCLIP_API_URL/api/companies/$PAPERCLIP_HLT_COMPANY_ID/approvals" \
  -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<JSON
{
  "issueId": "$PAPERCLIP_HLT_ISSUE_ID",
  "kind": "content_publish",
  "title": "Publish article-nclex-labs to hltmastery.com/nursing",
  "body": "Approve promotion of the Katailyst article \`article-nclex-labs\` (org: hlt) to 'published' status on the Mastery content engine. Destination URL: https://hltmastery.com/nursing/resources/<product_slug>/article-nclex-labs.",
  "requestedBy": "operator",
  "validityHours": 24
}
JSON
```

Capture `approval.id` as `PAPERCLIP_APPROVAL_ID`. Approve it via the Paperclip dashboard (board role) before running step 5.

> **Approvals are single-use.** Once the gated sync consumes an approval (state transitions from `approved` → `approved.consumed`), it cannot be reused. If the sync fails mid-run and needs to be re-attempted, create a fresh approval. This aligns the audit trail with per-run cost accounting: one approval, one execution, one cost event.

### 5. Run the approval-gated sync

```bash
cd /Users/alecwhitters/Downloads/katailyst-1
PAPERCLIP_API_URL=... \
PAPERCLIP_OPERATOR_TOKEN=... \
KATAILYST_API_KEY=... \
npx tsx scripts/integrations/content-engine-sync-articles-gated.ts \
  --org-code hlt \
  --codes article-nclex-labs \
  --publish \
  --paperclip-approval-id $PAPERCLIP_APPROVAL_ID \
  --execute
```

What this does (see [scripts/integrations/content-engine-sync-articles-gated.ts](../../../scripts/integrations/content-engine-sync-articles-gated.ts)):

1. Calls `GET $PAPERCLIP_API_URL/api/approvals/<id>`.
2. Refuses to proceed unless the approval is `status=approved`, not expired, and not already `consumed` (single-use enforcement).
3. Delegates to the existing `content-engine-sync-articles.ts` flow with all passed-through flags.
4.
On success, posts a comment to the issue (`POST /api/issues/<id>/comments`) with the publish result, and marks the approval `consumed` so it cannot be replayed. 5. On failure, posts the error message as a comment and exits non-zero. The approval is NOT marked consumed on failure — the operator must decide whether to retry the same approval (if the failure was transient and no publish side-effect occurred) or create a fresh approval (default, safest). ### 6. Close the issue The sync posts a comment with the publish result. If success, the operator marks the issue `done` on the Paperclip dashboard. The audit log entry (`activity_log` + `cost_events`) captures the full lifecycle. ## Verification After step 5 completes successfully: - Article is live at `https://hltmastery.com/nursing/resources/<product_slug>/article-nclex-labs`. - Issue `HLT-1` has status `done` with a comment pointing at the publish result. - Approval `PAPERCLIP_APPROVAL_ID` is in state `approved.consumed`. - `activity_log` has entries for the approval consumption and the issue comment. ## What this runbook intentionally skips (for v1) - Agent-driven publishing. This runbook has a human at the approval gate and a human running the CLI. Agent participation comes later when the `mastra-gateway` adapter is registered and a content-factory Mastra workflow is deployed. - Unpublish/rollback. If the published article turns out to be wrong, the human re-runs the sync against an updated Katailyst revision; content-engine upserts by `katailyst_id`. - Multi-article batches. Keep the first run to one article so any failure has a single root cause to investigate. 
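The usability gate the sync applies before delegating can be sketched as a small shell predicate. This is an illustration only — the authoritative logic lives in `content-engine-sync-articles-gated.ts` — and the `.status` / `.expiresAt` field names are the shape that wrapper assumes (verify them against the live API per verify-approval-api.md):

```shell
#!/usr/bin/env bash
# Sketch of the single-use gate: usable only if status=approved and not expired.
# Field names (.status, .expiresAt) are the wrapper's assumed response shape.
assert_approval_usable() {
  local approval_json=$1 status expires now
  status=$(printf '%s' "$approval_json" | jq -r '.status')
  expires=$(printf '%s' "$approval_json" | jq -r '.expiresAt // empty')
  if [ "$status" != "approved" ]; then
    echo "refusing: status is '$status', need 'approved'" >&2
    return 1
  fi
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # Lexicographic compare is valid for same-zone ISO 8601 timestamps.
  if [ -n "$expires" ] && [[ "$expires" < "$now" ]]; then
    echo "refusing: approval expired at $expires" >&2
    return 1
  fi
}

assert_approval_usable '{"status":"approved","expiresAt":"2999-01-01T00:00:00Z"}' && echo usable
assert_approval_usable '{"status":"approved.consumed"}' || echo refused
```

The real wrapper additionally handles the `{"approval": {...}}` envelope fallback; this sketch assumes a top-level record.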
## Troubleshooting | Symptom | Likely cause | Fix | | ---------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `approval not found` | Wrong `PAPERCLIP_APPROVAL_ID` or wrong API URL | Check `$PAPERCLIP_API_URL/api/approvals/<id>` in a browser | | `approval expired` | More than 24h since creation | Create a new approval (step 4) | | `approval already consumed` | Someone already ran the gated sync with this approval ID | Approvals are single-use — create a fresh approval (step 4). If the prior run failed and no publish side-effect occurred, you can reuse the same issue; if the prior run succeeded and you're trying to re-publish, verify whether a re-publish is actually intended before creating the fresh approval. | | `KATAILYST_API_KEY rejected` | Content-engine env out of sync | Rotate key in both Katailyst env and MasteryPublishing env | | `No articles found` | Article status is not `published` in Katailyst registry | Publish it in Katailyst first | ## Next steps after this runbook succeeds 1. Register the `mastra-gateway` adapter in the Paperclip server. See [packages/adapters/mastra-gateway/README.md](../../../../paperclip-master/packages/adapters/mastra-gateway/README.md). 2. Install the `katailyst-registry` Paperclip plugin. See investigation doc decision (b). 3. Create a content-factory Mastra workflow that reads the audience-research packet and proposes article briefs. Wire it as a Paperclip agent in the HLT Content Factory company. 4. Move to the nurse recruiting pilot. See [nurse-recruiting-pilot.md](./nurse-recruiting-pilot.md). 
--- ## Source: docs/runbooks/paperclip/nurse-recruiting-pilot.md # Runbook — Nurse Recruiting Pilot (end-to-end on the wired stack) **Status:** Pilot playbook. Run after the Content Factory runbook has proven the plumbing end-to-end. **Related:** - [paperclip investigation doc](../../reports/doc/2026-04-16-paperclip-katalyst-screenpipe-investigation.md) - [HLT Content Factory company setup](./hlt-content-factory-company.md) - [nursing architecture brief](../../planning/active/katailyst-nursing-architecture.md) - [mastra-gateway adapter README](../../../../paperclip-master/packages/adapters/mastra-gateway/README.md) ## Purpose Prove the full stack — Paperclip governance + Mastra execution + Katailyst registry + nursing plugin content — on one vertical end-to-end, with budget caps and audit log. Once this proves out, forking to the social-media or PA/NP verticals is a config change. The pilot intentionally runs on the narrowest slice: one persona (`nclex-rn`), one research packet, one content recommendation, one published article. No parallel verticals. No multi-agent choreography. Just the loop end-to-end. ## Pilot outcome criteria The pilot is done when all five of these are true: 1. A **NurseResearch** Mastra workflow ran once, invoked via the `mastra-gateway` Paperclip adapter, and persisted a `PersonaFindings` artifact for `nclex-rn` to the Katailyst registry. 2. A **HLT NurseRecruiting** Paperclip company exists with a goal, an issue, an agent, and cost events logged for the research run. 3. The agent's cost stayed under a per-run budget cap enforced by Paperclip. 4. One article was published to `hltmastery.com/nursing/resources/<product_slug>/<slug>` via the approval-gated sync, with the published article's `body_json` referencing at least one pain point ID from the `PersonaFindings` artifact. 5. The full activity chain (agent invocation → usage → approval → publish) is replayable from the Paperclip `activity_log` for 90 days. 
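Criterion 5's replayability can be spot-checked with `jq` against an exported `activity_log` dump. The `ts` / `event` field names below are illustrative stand-ins, not the actual Paperclip schema:

```shell
# Illustrative activity_log excerpt; real Paperclip column names may differ.
log='[
  {"ts":"2026-04-17T10:00:00Z","event":"agent.invoked"},
  {"ts":"2026-04-17T10:04:00Z","event":"usage.recorded"},
  {"ts":"2026-04-17T11:00:00Z","event":"approval.consumed"},
  {"ts":"2026-04-17T11:01:00Z","event":"article.published"}
]'
# The chain replays cleanly if the four event kinds appear in time order.
printf '%s' "$log" | jq -r 'sort_by(.ts) | map(.event) | join(" -> ")'
# → agent.invoked -> usage.recorded -> approval.consumed -> article.published
```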
Stop the pilot and debrief when all five are true, even if the article is mediocre. The pilot is proving the **plumbing**; content quality is the next iteration. ## Prerequisites (from prior runbooks) - Paperclip server running against its own Supabase database (investigation decision C). - `mastra-gateway` adapter scaffolded at `paperclip-master/packages/adapters/mastra-gateway/`. **Not yet** registered in the Paperclip server; step 1 below registers it. - HLT Content Factory company stood up and the gated sync proved out on one article (per [hlt-content-factory-company.md](./hlt-content-factory-company.md)). - Persona port in place at `plugins/katailyst-nursing/references/personas.json` and `research-schemas.ts`. - Katailyst `do-research` skill already in the registry (it is — see the nursing plugin). ## Step-by-step ### Step 1 — Register the `mastra-gateway` adapter in Paperclip core In `paperclip-master`, add `mastra-gateway` to the adapter registry in the Paperclip server's bootstrap code. Concretely: 1. `pnpm add @paperclipai/adapter-mastra-gateway` in `packages/server`. 2. Import `mastraGatewayAdapter` from the package into the server's adapter registry module (the file that already imports `claude-local`, `openclaw-gateway`, etc.). 3. Run `pnpm -r typecheck && pnpm test:run && pnpm build` before deploying. Verification: `GET $PAPERCLIP_API_URL/api/adapters` lists `mastra_gateway` as an available adapter type. ### Step 2 — Deploy the NurseResearch Mastra workflow Create a minimal Mastra workflow in the `katailyst-agents` sidecar (or a dedicated `katailyst-nurse-research` Vercel project). 
Workflow inputs: - `personaIds: string[]` - `researchDepth: "quick" | "standard" | "deep"` - `outputFocus: "full" | "content-strategy" | "persona-findings" | "gap-analysis"` Workflow outputs (terminal response, per [mastra-gateway README](../../../../paperclip-master/packages/adapters/mastra-gateway/README.md)): ```jsonc { "status": "ok", "result": { "summary": "NCLEX-RN research packet: 7 pain points, 3 trends, 4 content gaps (quality 82).", "payload": { /* PersonaFindings[] per research-schemas.ts */ }, "meta": { "provider": "anthropic", "model": "claude-opus-4-6", "usage": { "inputTokens": ..., "outputTokens": ... }, "costUsd": 0.42 } } } ``` Use the methodology in [`plugins/katailyst-nursing/references/audience-research-workflow.md`](../../../plugins/katailyst-nursing/references/audience-research-workflow.md). Load personas from the ported `personas.json`. Validate output with the Zod schemas in `research-schemas.ts`. Pilot budget: $2 per run (one persona, standard depth). ### Step 3 — Create the HLT NurseRecruiting Paperclip company Mirror the structure of the Content Factory company (see its runbook for the curl shape). Values: - `name`: "HLT NurseRecruiting" - `issuePrefix`: "NURSE" - `budgetMonthlyCents`: 30000 ($300/mo for pilot) - `requireBoardApprovalForNewAgents`: true - `brandColor`: "#0ea5e9" Export `PAPERCLIP_NURSE_COMPANY_ID`. 
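The capture step can be sketched as below. The `.company.id` envelope is assumed from the Content Factory runbook's response, with a top-level `id` fallback in case the envelope differs:

```shell
# Assumed envelope: {"company": {"id": ...}}; falls back to a bare {"id": ...}.
extract_company_id() { jq -r '.company.id // .id'; }

# In the real flow the input is the curl response from POST /api/companies.
sample='{"company":{"id":"cmp_nurse_01","issuePrefix":"NURSE"}}'
printf '%s' "$sample" | extract_company_id
# → cmp_nurse_01
```

Pipe the create-company response through `extract_company_id` and export the result as `PAPERCLIP_NURSE_COMPANY_ID`.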
### Step 4 — Hire the NurseResearch agent in that company `POST $PAPERCLIP_API_URL/api/companies/$PAPERCLIP_NURSE_COMPANY_ID/agents` with: ```jsonc { "name": "NurseResearch", "title": "Audience Research Agent", "role": "agent", "adapterType": "mastra_gateway", "adapterConfig": { "url": "https://katailyst-nurse-research.vercel.app/api/workflows/audience-research/run", "workflowId": "audience-research", "authToken": "<from company_secrets>", "timeoutSec": 600, "payloadTemplate": { "researchDepth": "standard", "outputFocus": "full", }, }, "budgetMonthlyCents": 10000, "heartbeatEnabled": false, } ``` Expect to hit the "requireBoardApprovalForNewAgents" gate. Approve via the Paperclip dashboard. Export `PAPERCLIP_NURSE_AGENT_ID`. Why `heartbeatEnabled: false`: the pilot runs on demand, not on a schedule. Turn on heartbeats only after the first successful one-shot run. ### Step 5 — Create a research goal + issue + first task ```bash # Goal curl -sS -X POST "$PAPERCLIP_API_URL/api/companies/$PAPERCLIP_NURSE_COMPANY_ID/goals" \ -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "title": "Ship one NCLEX-RN content piece backed by audience research", "description": "Run the NurseResearch agent on persona nclex-rn, pick one content recommendation, produce the article via the content factory, publish via the approval-gated sync.", "horizon": "sprint" }' # Issue assigned to NurseResearch agent curl -sS -X POST "$PAPERCLIP_API_URL/api/companies/$PAPERCLIP_NURSE_COMPANY_ID/issues" \ -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "goalId": "'"$PAPERCLIP_NURSE_GOAL_ID"'", "title": "Research packet: nclex-rn (pilot)", "body": "Produce a full PersonaFindings packet for personaId=nclex-rn per audience-research-workflow.md. Output must validate against personaFindingsSchema. 
Budget cap: $2.",
  "status": "todo",
  "assigneeAgentId": "'"$PAPERCLIP_NURSE_AGENT_ID"'",
  "labels": ["research", "pilot", "nclex-rn"]
}'
```

### Step 6 — Trigger the agent run

Because `heartbeatEnabled: false`, trigger a manual wake:

```bash
curl -sS -X POST "$PAPERCLIP_API_URL/api/agents/$PAPERCLIP_NURSE_AGENT_ID/wake" \
  -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "pilot-run", "issueId": "'"$PAPERCLIP_NURSE_ISSUE_ID"'" }'
```

Paperclip will invoke the `mastra-gateway` adapter with a context carrying `runId`, `companyId`, `agentId`, `taskId`, `issueId`. The adapter POSTs to the Mastra URL. Mastra runs the 8-step methodology. On return, Paperclip logs `cost_events` and `activity_log` entries.

### Step 7 — Verify the PersonaFindings artifact

The Mastra workflow should have written the artifact to a known location (filesystem, S3, or directly to the Katailyst registry as a staged entity). Minimum viable target for the pilot: write to `/tmp/research/nclex-rn-<runId>.json` and copy into the Katailyst repo at `plugins/katailyst-nursing/references/pilot-artifacts/nclex-rn-findings.json`.

Validate:

```bash
cd /Users/alecwhitters/Downloads/katailyst-1
# tsx rather than plain node, so the .ts schema module can be required.
npx tsx -e "
const findings = require('./plugins/katailyst-nursing/references/pilot-artifacts/nclex-rn-findings.json');
const { personaFindingsSchema } = require('./plugins/katailyst-nursing/references/research-schemas.ts');
personaFindingsSchema.parse(findings);
console.log('OK: findings match schema, pain points:', findings.painPoints.length);
"
```

If validation fails, that's the pilot's first real bug — the Mastra prompt needs to enforce the schema more tightly. Log it, fix the workflow, re-run step 6.

### Step 8 — Pick a content recommendation and create the publish issue

From the findings, pick the top content recommendation (highest priority, opportunity score A or B).
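The "highest priority, opportunity score A or B" selection can be sketched with `jq`. The `contentRecommendations` field names here are hypothetical stand-ins for whatever `research-schemas.ts` actually defines:

```shell
# Hypothetical findings excerpt; real field names come from research-schemas.ts.
findings='{"contentRecommendations":[
  {"title":"Lab values cheat sheet","priority":2,"opportunityScore":"B"},
  {"title":"NCLEX lab-value traps","priority":1,"opportunityScore":"A"},
  {"title":"Shift-report templates","priority":3,"opportunityScore":"C"}
]}'
# Keep A/B opportunities, order by priority, take the top one.
printf '%s' "$findings" | jq -r '
  .contentRecommendations
  | map(select(.opportunityScore == "A" or .opportunityScore == "B"))
  | sort_by(.priority)
  | .[0].title'
# → NCLEX lab-value traps
```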
Create an issue in **HLT Content Factory** (not NurseRecruiting): ```bash curl -sS -X POST "$PAPERCLIP_API_URL/api/companies/$PAPERCLIP_HLT_COMPANY_ID/issues" \ -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "goalId": "'"$PAPERCLIP_HLT_GOAL_ID"'", "title": "Publish: <chosen recommendation title>", "body": "Article brief:\n- Hook: <from recommendation>\n- Outline: <from recommendation>\n- Pain points addressed: <IDs>\n- Persona: nclex-rn\n- Research runId: <from step 6>\n\nDraft the article body in the Katailyst registry (code: article-<slug>), then publish via gated sync.", "labels": ["publish", "nclex-rn", "pilot-derived"] }' ``` ### Step 9 — Human-authored article in Katailyst For the pilot, the article body is human-authored (not agent-generated). Why: we want to validate the plumbing, not the generation. Create a Katailyst article entity with `code: article-<slug>`, `status: published`, body referencing the pain point IDs from the findings. This is the same process as any existing HLT article. (Next iteration of the pilot will replace this step with a content-factory Mastra agent generating the draft. Explicitly out of scope for pilot v1.) ### Step 10 — Create the approval and run the gated sync Per the Content Factory runbook (steps 4 and 5). Use `--codes article-<slug>`. The gated sync posts the publish result as a comment on the HLT issue. ### Step 11 — Close the loop - Mark HLT publish issue `done`. - Mark NurseRecruiting research issue `done`. - Pull the Paperclip `activity_log` for both companies and paste the replay chain into the issue's final comment. This is the audit-log verification for outcome criterion 5. ## Pilot debrief checklist After step 11, write a one-page debrief in `docs/reports/doc/<today>-nurse-recruiting-pilot-debrief.md` covering: 1. Did all 5 outcome criteria hit? If no, which failed and why? 2. Actual cost per research run vs the $2 budget. Actual cost per publish. 3. 
How many manual touchpoints (human interventions) were required? Target for pilot v2: one (the approval gate itself). Count every other intervention as a pilot gap. 4. One content-quality observation. Did the chosen recommendation's pain-point anchoring make it into the published article? Could a reader tell? 5. The single biggest friction point in the plumbing. Not "add more tests" — the actual thing that wasted the most time. ## What this pilot intentionally does NOT test - **Multiple personas in parallel.** One persona only. - **Content-factory agents generating drafts.** Humans draft in pilot v1. - **Heartbeat-driven scheduling.** On-demand wake only. - **Katailyst-registry Paperclip plugin.** The agent's skill access is hard-wired in the Mastra workflow; the plugin integration is pilot v2. - **Social-media vertical.** That's the forking decision after debrief. ## What unblocks when the pilot succeeds 1. **Vertical fork.** Stand up a "HLT SocialMedia" Paperclip company with the same company structure. The NurseResearch workflow becomes `SocialResearch` (different personas, same methodology). Delta is config, not code. 2. **Katailyst-registry plugin.** Build it and swap the hard-wired skill access for the plugin-provided tools. 3. **Content-factory agent.** Wire a second `mastra-gateway` agent in the HLT Content Factory company that consumes `ContentRecommendation` input briefs and drafts articles. Article body goes into the Katailyst registry via the registry plugin's writable tools (future — out of pilot v1). 4. **Scheduled heartbeats.** Turn on heartbeats once cost shape is known and the one-shot flow is stable. 
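A post-run sanity check for outcome criterion 3: assert that the `costUsd` the adapter reported stays under the $2 per-run cap. The `.result.meta.costUsd` path follows the response sketch in step 2; the real enforcement is Paperclip's, this is just an operator-side spot check:

```shell
# Response shape per the step-2 sketch; compare in cents to avoid float math in shell.
result='{"status":"ok","result":{"meta":{"costUsd":0.42}}}'
cost_cents=$(printf '%s' "$result" | jq '(.result.meta.costUsd * 100 | round)')
if [ "$cost_cents" -le 200 ]; then
  echo "under cap: ${cost_cents} cents"
else
  echo "OVER CAP: ${cost_cents} cents" >&2
  exit 1
fi
# → under cap: 42 cents
```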
## Known risks | Risk | Likelihood | Mitigation | | ---------------------------------------------------------------------------------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | | Mastra workflow exceeds `$2` budget on first run | Medium | Adapter reports `costUsd` back; Paperclip enforces `budgetMonthlyCents`. Start depth=`quick` if nervous. | | Paperclip approval API shape differs from what `content-engine-sync-articles-gated.ts` expects | Medium | Runbook uses `/api/approvals/<id>` and expects `status`+`expiresAt`. If the actual Paperclip API differs, update the wrapper's `fetchApproval` shape. | | Supabase DB for Paperclip hits pooler connection limit in production | Low | Use direct-connection (port 5432) for migrations, pooler (port 6543) for runtime. Per investigation doc decision C. | | Mastra workflow response does not match `personaFindingsSchema` | High | This is the most likely first failure. Fix in workflow prompt, not by loosening the schema. | ## References - [Paperclip investigation decisions](../../reports/doc/2026-04-16-paperclip-katalyst-screenpipe-investigation.md) - [HLT Content Factory runbook](./hlt-content-factory-company.md) - [mastra-gateway adapter contract](../../../../paperclip-master/packages/adapters/mastra-gateway/README.md) - [Audience research methodology](../../../plugins/katailyst-nursing/references/audience-research-workflow.md) - [Ported persona registry](../../../plugins/katailyst-nursing/references/personas.json) - [Ported research schemas](../../../plugins/katailyst-nursing/references/research-schemas.ts) --- ## Source: docs/runbooks/paperclip/rubric-scored-approval-design.md # Design — Rubric-Scored Paperclip Approvals (pilot v1.1) **Status:** Design only. Not yet implemented. 
Supersedes the binary-gate approval flow in [hlt-content-factory-company.md](./hlt-content-factory-company.md) step 4 as a v1.1 upgrade. **Related:** [investigation doc](../../reports/doc/2026-04-16-paperclip-katalyst-screenpipe-investigation.md), [kb:langfuse-tracing-hlt](../../../plugins/katailyst-nursing/references/PORT_NOTES.md), cross-validation plan `~/.claude/plans/effervescent-scribbling-sifakis.md` registry signal #1. ## Problem The pilot v1.0 approval gate is binary: a human clicks "approve" in the Paperclip dashboard after reading free-form body text. There's no structured review, no quality floor, and no way to capture why an approval was given or declined. For the first few pilot publishes that's fine. At scale (weekly cadence, multiple verticals, multiple approvers) the lack of structure means: - Approvers drift on what "good enough" means. No inter-rater agreement. - Declined approvals produce no actionable feedback for the author (human or agent). - Post-hoc analysis of why an article underperformed can't be joined to the pre-publish quality signal. The Katailyst registry already has the pieces needed to fix this: - **`rubric:content-quality`** — master composite rubric for pre-publish promotion. Surfaced in registry signal #1 of the cross-validation plan. - **`prompt:hlt-prompt-output-critique`** — multi-angle critique specialist. Designed to produce the structured feedback a rubric depends on. 
## Design ### Data shape When a publishable artifact (draft article, image set, email sequence) is ready, the author (agent or human) creates the approval request body with three components: ```json { "issueId": "<uuid>", "kind": "content_publish", "title": "Publish <article-code> to <destination>", "body": "<human-readable summary>", "validityHours": 24, "rubric_ref": "rubric:content-quality", "critique_prompt_ref": "prompt:hlt-prompt-output-critique", "artifact_ref": "kb:<article-code>" } ``` The `rubric_ref`, `critique_prompt_ref`, and `artifact_ref` fields are **new** — pilot v1.0 has none of them. Adding them is additive to Paperclip's existing approval schema (new columns nullable, no migration of existing approvals needed). ### Flow 1. **Author creates approval request** with all three refs. 2. **Paperclip's approval UI fetches the artifact body** (via the `katailyst-registry` plugin's `get_skill_content` tool, per investigation decision (b)). 3. **Paperclip runs the critique** by invoking the referenced prompt against the artifact + the referenced rubric. Output lands as a structured `ApprovalCritique` block attached to the approval row: ```json { "rubric_ref": "rubric:content-quality", "scores": [ { "dimension": "factual-accuracy", "score": 88, "notes": "..." }, { "dimension": "brand-voice-fit", "score": 76, "notes": "..." }, { "dimension": "seo-hygiene", "score": 82, "notes": "..." }, { "dimension": "pedagogical-clarity", "score": 70, "notes": "..." } ], "overall_score": 79, "top_concerns": ["..."], "suggested_edits": ["..."], "cost_usd": 0.04, "generated_at": "2026-04-17T01:23:45Z" } ``` 4. **Approver sees** the artifact PLUS the critique block in the Paperclip dashboard — not raw copy. 5. 
**Approver decision** is now one of: `approved` (explicit override available if score < 70), `rejected` (with the critique's `top_concerns` surfaced as rejection reason), `revision_requested` (new status — passes `suggested_edits` back to the author as an issue comment). ### Score thresholds (defaults) - **≥ 85 overall** — auto-approved for low-stakes kinds (e.g., social posts, internal runbooks). Still requires board approval for `content_publish` kind per existing `requireBoardApprovalForNewAgents` semantics — scoring is additive, not a bypass. - **70–84 overall** — requires explicit approver click. Critique block shown prominently. - **< 70 overall** — status pre-filled as `revision_requested`. Approver can override to `approved` but must type a reason. Thresholds live in the company config, not hardcoded. Per-rubric thresholds allowed (some rubrics score harder than others). ### Cost + latency - Critique LLM call: ~$0.02–0.10 per approval (Sonnet-class model on a 1,500-word article). Budget cap applies — goes through `cost_events` on the reviewing agent. - Latency: 10–30s at approval-create time. Acceptable — approvals are async by nature. - Falls back to no-critique on critique LLM failure. Approval still usable; just marked `critique_failed` on the audit trail. ## What this design explicitly does NOT do - **Does not make approval decisions autonomously.** The critique informs the human approver; the human approver still clicks. - **Does not replace the approval audit log.** `cost_events`, `activity_log`, and approval state transitions remain authoritative. - **Does not require a new rubric or prompt.** Uses `rubric:content-quality` and `prompt:hlt-prompt-output-critique` as-is. Custom rubrics per-content-kind can be added later by extending the company config, not the code. ## Implementation dependencies (blockers) This design is blocked on: 1. **`katailyst-registry` Paperclip plugin** (investigation decision (b), not yet implemented). 
Needed to fetch the artifact body and run the critique prompt server-side. 2. **Paperclip approval schema extension** for the three new ref fields (`rubric_ref`, `critique_prompt_ref`, `artifact_ref`) and the `ApprovalCritique` block. Additive migration; no data loss. **This is a Paperclip-side change, not a Katailyst-side change** — document in Paperclip issue tracker. 3. **`rubric:content-quality` readable via MCP** — confirm via `get_entity("rubric", "content-quality")` that the rubric's evaluation criteria are machine-consumable. If they're prose only, normalize to the score-per-dimension shape above before rolling out. ## Rollout - **Phase A** (after plugin + schema extension land): shadow mode. Critique runs on every approval, but the dashboard only shows it as an "Advisory" block; approvers continue to click based on the raw artifact. Capture 20-30 shadow runs to calibrate thresholds. - **Phase B** (after calibration): enable threshold auto-decisions for internal/low-stakes kinds only. - **Phase C** (after trust): enable for `content_publish` with default thresholds. ## References - Registry signal #1: `rubric:content-quality` + `prompt:hlt-prompt-output-critique` — the existing registry pieces this design composes. - Cross-validation plan (P1 action #4): the rubric-scored-approvals line item that motivated this design doc. - Investigation doc Preventive Measures: single-use approval commitment (pilot v1.0) continues to apply in v1.1. One approval, one critique, one execution. --- ## Source: docs/runbooks/paperclip/verify-approval-api.md # Runbook — Verify Paperclip Approval API Shape (B1) **Status:** Test procedure. Execute against a live Paperclip deployment before running [hlt-content-factory-company.md](./hlt-content-factory-company.md) step 5. **Resolves:** Verification blocker B1 in [investigation doc](../../reports/doc/2026-04-16-paperclip-katalyst-screenpipe-investigation.md). **Time to complete:** ~5 minutes. 
## Why this runbook exists The approval-gated sync wrapper at [scripts/integrations/content-engine-sync-articles-gated.ts](../../../scripts/integrations/content-engine-sync-articles-gated.ts) assumes a specific response shape for `GET $PAPERCLIP_API_URL/api/approvals/<id>`. The shape was inferred from the Paperclip spec, not empirically verified. If the live response differs, the gated sync will either refuse valid approvals (false negative) or accept invalid ones (false positive, worse). This runbook verifies the shape with one curl and one diff. Do it once per Paperclip deployment; capture the result in `kb:paperclip-approval-api` in the Katailyst registry. ## Prerequisites - Paperclip server running, reachable at `$PAPERCLIP_API_URL`. - Operator bearer token with `approvals:read` scope in `$PAPERCLIP_OPERATOR_TOKEN`. - At least one approval created via the normal UI flow (any status — `pending`, `approved`, `rejected`, `expired`, or `consumed` all exercise the shape). - `jq` installed for JSON inspection. ## Steps ### 1. Capture a response ```bash APPROVAL_ID=<paste-id-from-dashboard> curl -sS \ -H "Authorization: Bearer $PAPERCLIP_OPERATOR_TOKEN" \ -H "Accept: application/json" \ "$PAPERCLIP_API_URL/api/approvals/$APPROVAL_ID" \ | tee /tmp/paperclip-approval-sample.json \ | jq . ``` Save the output. That file is the truth of the shape. ### 2. Diff against the wrapper's expectation The wrapper expects (top-level, or wrapped in `{"approval": {...}}`): ```jsonc { "id": "string", // required "companyId": "string?", // optional "issueId": "string?", // optional, used for posting result comments "status": "approved|pending|rejected|expired|consumed|<other>", // required, string literal "expiresAt": "<ISO 8601 string>" | null, // optional, parsed with Date.parse() "title": "string?" // optional, used in log lines only } ``` Checklist — for each field, inspect the captured JSON: - `.id` — string present? 
  ✓/✗
- `.status` — string literal (not nested object, not enum number)? Acceptable values observed? ✓/✗
- `.expiresAt` — ISO 8601 when present, or null, or absent? ✓/✗
- `.issueId` — string present when the approval was tied to an issue? ✓/✗
- Wrapper envelope — is the approval at the top level, or wrapped in `{"approval": ...}`? Either is handled by `fetchApproval()` (line ~119 of the wrapper).

Automated diff:

```bash
jq 'keys' /tmp/paperclip-approval-sample.json
# Expected keys (at minimum): id, status
# Optional but useful: companyId, issueId, expiresAt, title

jq '.status' /tmp/paperclip-approval-sample.json
# Expected: a string — "pending" | "approved" | "rejected" | "expired" | "consumed"
# If you see an object or an int, the wrapper's ApprovalRecord interface needs updating.

jq '.expiresAt' /tmp/paperclip-approval-sample.json
# Expected: an ISO 8601 string or null
# If you see a Unix timestamp or a different format, Date.parse() will likely produce NaN
# and the wrapper will skip the expiry check — silent bug. Normalize in the wrapper.
```

### 3. Repeat with a consumed approval (B2 validation)

After running the gated sync once successfully, immediately re-fetch the same approval and inspect the status. Related verification blocker B2: does the transition `approved → consumed` happen server-side?

```bash
# Before running gated sync
curl -sS ... | jq '.status'
# Expected: "approved"

# Run the gated sync successfully. Wait for exit 0.

# After running gated sync
curl -sS ... | jq '.status'
# Expected: "consumed"
# If still "approved": B2 fails. The wrapper's single-use enforcement is vacuous.
# Fix: extend the wrapper to call POST /api/approvals/<id>/consume (or equivalent).
```

### 4. Capture the findings

Create a Katailyst registry entity so future agents and operators don't repeat this test:

```
kb:paperclip-approval-api@v1
- Summary: Verified response shape for GET /api/approvals/:id on Paperclip <version>.
- Tags: format:reference, scope:org, source:internal, domain:engineering, family:integration, modality:text
- Body: paste the captured JSON + any deltas from the wrapper's expected shape
- Links:
  - uses_tool → tool:paperclip-approvals-api (if that tool entity exists)
  - pairs_with → scripts/integrations/content-engine-sync-articles-gated.ts reference
```

Update the wrapper's `ApprovalRecord` interface comment at [scripts/integrations/content-engine-sync-articles-gated.ts](../../../scripts/integrations/content-engine-sync-articles-gated.ts) line ~46 to reference the new kb entity: `// Shape verified against live deployment. See kb:paperclip-approval-api.`

### 5. If the shape diverges

For each divergence, pick the tolerant fix (do NOT ask Paperclip to change):

| Divergence | Fix in wrapper |
| --- | --- |
| Status values differ in case (`APPROVED` vs `approved`) | Normalize in `assertApprovalUsable()` with `.toLowerCase()`. |
| Status is a nested object (`{ value: "approved" }`) | Update `ApprovalRecord.status` type; extract the string in `fetchApproval()`. |
| `expiresAt` is Unix epoch ms (number) | Branch in the expiry check: if `typeof === 'number'`, compare directly; else `Date.parse`. |
| Top-level envelope (`{"approval": {...}}`) | Already handled by the `fetchApproval` fallback — no change. |
| Status `consumed` doesn't exist | B2 blocker — must handle the server-side consume call explicitly. See investigation doc, Verification blockers §B2. |

## Outcome

Before this runbook runs: the gated sync is theoretical — it might work, it might not. After this runbook runs: `kb:paperclip-approval-api` documents the actual contract, the wrapper is known-compatible, and B1 is resolved. Pilot can proceed to `hlt-content-factory-company.md` step 5.
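The tolerant fixes in the divergence table can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the wrapper's actual code: `normalizeStatus` and `normalizeExpiry` are hypothetical helper names, and the real `ApprovalRecord` handling may differ.

```typescript
// Hypothetical helpers sketching the tolerant fixes from the divergence table.
// Not the wrapper's real code -- names and shapes are illustrative.

type RawStatus = string | { value: string };

// Handles case divergence (APPROVED vs approved) and a nested { value: ... } object.
function normalizeStatus(raw: RawStatus): string {
  const s = typeof raw === "string" ? raw : raw.value;
  return s.toLowerCase();
}

// Handles expiresAt as an ISO 8601 string, a Unix-epoch-ms number, null, or absent.
// Returns epoch ms, or null when there is no usable expiry, so the caller can
// handle "no expiry" explicitly instead of silently comparing against NaN.
function normalizeExpiry(raw: string | number | null | undefined): number | null {
  if (raw === null || raw === undefined) return null;
  if (typeof raw === "number") return raw; // already epoch ms
  const ms = Date.parse(raw);
  return Number.isNaN(ms) ? null : ms;
}
```

With helpers like these, the expiry check reduces to one numeric comparison regardless of which shape the live deployment returns, and the silent-NaN bug from step 2 cannot occur.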
---

## Source: docs/runbooks/README.md

# Runbooks

`docs/runbooks/**` is the specialist operations lane. This index is the router; the deeper runbooks are the actual procedures.

Use it for:

- setup workflows that are narrower than the main repo bootstrap
- operator verification and incident handling
- integration-specific or domain-specific procedures

Do not use it for:

- the repo front door
- canonical architecture explanation
- everyday command discovery that belongs in `README.md`, `justfile`, or `package.json`

## Route By Task

- Factory lifecycle, promotion, rollback, and incident handling: `docs/runbooks/factory/README.md`
- External consumers and interop patterns: `docs/runbooks/interop/README.md`
- MCP server configuration: `docs/runbooks/mcp/servers/`

---

## Source: docs/SYSTEM_GUIDE.md

# Katailyst: Vision & System Guide

**Version:** 1.3 — March 2026
**Author:** Alec Whitters, HLT Corp
**Purpose:** Canonical description of what Katailyst is, what problems it solves, how it works, and where it's going. Give this to any AI, team member, or collaborator before they work with or on the system.

---

## 1. The Problems We're Solving

Every team using AI faces the same set of problems. They compound into each other and nobody has good answers for them yet.

**Everyone has their own way.** Each person on a 20-person team discovers their own prompts, their own workflows, their own tricks. The marketing person found a great copywriting approach. The NP team figured out how to get good clinical content. Justin has his B2B outreach dialed in. But none of these learnings transfer. They live in individual chat histories and personal notes. When one person figures something out, the rest of the team doesn't benefit.

**It's hard to get the ecosystem working together.** There are amazing AI tools, skills, and prompts scattered across the web — Anthropic's skill standards, community repos, specialized prompt libraries, design component systems.
But importing them into your workflow, adapting them to your business, making sure they have the right tags and metadata, avoiding duplicates of things you already have — that's tedious and error-prone. The AI often does it wrong: it forgets tags, tries to batch-import 50 things without reading any of them, or brings in things that don't fit our format.

**AI doesn't know your business.** No matter how smart the model is, it doesn't know what your 20 different test prep apps are, what the pricing is at each tier, what your brand voice sounds like, what worked in last month's campaign, or what the specific NP certification exam covers. It can guess, but it guesses wrong. The only durable advantage is encoding this knowledge so every agent has it automatically.

**It's impossible to know if this is better than that.** Someone changes a skill. Did the output improve? Someone imports a new design guideline. Is it actually better than what we had? Without systematic testing, you're guessing. And guessing at scale means quality scatters instead of improving.

**The machine that creates the machine.** Prompt engineering, skill creation, knowledge curation — these are the meta-capabilities that determine everything else. If you can build a system that creates, tests, and improves its own building blocks, you have a self-reinforcing advantage. The factory that builds the factory.

**How do you get a whole team using AI effectively?** Not just power users. Everyone — the NP team, marketing, customer success, B2B. They need to see what exists, add their expertise, trust that the system works, and have it feel like one coherent thing instead of 15 disconnected tools.

Katailyst is our answer to all of these problems at once.

---

## 2. What Katailyst Is

Katailyst is a **registry, knowledge graph, and quality engine** for composable AI building blocks. It holds ~1,600 entities across 21 types, connected by ~10,900 typed weighted links, organized by 34 tag namespaces.
Any AI agent — Claude Code, Cursor, Slack, Cowork, a custom orchestrator — plugs in via MCP and instantly accesses the entire library.

When an agent connects and sends a request, Katailyst runs a three-phase discovery pipeline: semantic vector search across all entities → graph expansion through typed links to find structurally related blocks the text search would miss → ranked selection of the best 5-20 building blocks for that specific situation. The agent receives skills, knowledge, styles, schemas, tools, and rubrics — everything it needs to be an instant expert on this topic for this audience in this brand voice.

**The armory metaphor:** Katailyst is to AI agents what a well-organized armory is to a special operations team. We don't tell the operators which weapon to use or how to complete the mission. We equip them with the best gear available, organized so they can find what they need, and trust their intelligence and context to determine the best approach. We are the golf caddy, not the golfer.

**Three content layers:**

- **System layer** — community/canonical building blocks from across the web. Available to all. Constantly growing as we scan, import, and adapt the best from Anthropic, community repos, and anywhere else.
- **Organization layer (HLT)** — company-specific: brand guides, audience KBs, product knowledge, proprietary skills, internal styles. Visible to everyone in the org.
- **Personal layer** — per-user: "write in my voice," preferred design aesthetic, personal collections, agent memory of past conversations. Only visible to the individual.

The architecture supports multi-tenancy: today it's HLT, but the same system can serve other organizations, each with their own org layer on top of the shared community library. The long-term vision is that this becomes a platform any organization can use to manage their AI ecosystem — especially educators and educational publishers who face the same problems we do.

---

## 3. The Thesis

AI is generalizable.
The production bottleneck is gone. The new bottleneck is judgment: what to create, for whom, in what voice, against what standard, and how to know if it worked. When everyone has an AI factory, the winner teaches their factory three capabilities: **precision** (hit the target), **engagement** (make it resonate), and **learning** (get better each cycle). Not as one-time brilliance, but as repeatable, scalable systems.

**Study the top 1% across all industries.** Not just healthcare. What's actually winning — what customers vote for with their feet. A landing page converting at 3x. A social post with 10x engagement. Find the underlying concept that made it work. Encode that concept as a skill, style, or rubric. Now every output benefits automatically.

**Deeply understand the audience.** Go to the forums. Study what real nursing students complain about, search for, argue over. Encode this into audience KBs and personas. The shift: it's not "can you understand your audience" — it's "can you get the factory to understand your audience." Because once the factory learns it, scale follows.

**Encode, test, iterate.** Great insights don't stay in someone's head. They become registry entities — linked, tagged, graded, tracked. If someone finds a better approach, A/B test it using the pairwise system. If quality drops, the eval system catches it. Build-measure-learn at the atomic unit level.

**Build the system, not the use case.** Counterintuitive but deeply believed. AI is generalizable — get the orchestration right and it iterates into any use case. "Create a {content_type} for {audience} about {topic}" works for healthcare, real estate, legal, anything — if the building blocks are good and the discovery is smart. The right orchestration can find the use case, rather than requiring a perfect use case up front.
**Small advantages compound at scale.** When everyone has a factory, the one with 5% better targeting, 5% better hooks, 5% better brand consistency doesn't just win by 5%. It wins by a massive margin because those advantages multiply across thousands of outputs.

---

## 4. How the System Works

### 4.1 The Registry

Supabase-canonical database. ~1,400 registry entities. Each has: type, metadata (name, code, summary, use case), tags in 34 namespaces, lifecycle status (staged → curated → published → deprecated → archived), quality signals (tier 1-10 where 1 = exceptional and 10 = raw/deprecated, QA score 0-100 with a 75 pass threshold, content completeness), artifacts (instruction bodies, examples, reference files), and links to other entities.

**The 21 entity types and what they actually are:**

| Type | Count | Role |
| --- | --- | --- |
| **Skill** | 122+ | The most important type. Structured procedural instruction with activation condition, instruction body, and artifacts. Like a specialized employee who knows how to do one thing really well. "Do Research" tells an agent the complete methodology for gathering, comparing, and packaging external evidence with citations and confidence levels. |
| **Prompt** | 141+ | Lighter-weight instruction template. More open-ended, with variables: "Write a {content_type} for {audience} about {topic}." |
| **Knowledge Bundle (KB)** | 152+ | Reference material and domain expertise. Briefing packets. "HLT Product Catalog" with pricing, features, audiences across all 20 apps. "FNP Certification Requirements." "NCLEX Pharmacology Essentials." |
| **Content Type** | 120+ | Output format specs. "Question of the Day Card" with dimensions, layout zones, content rules. The mold content gets poured into. |
| **Tool** | 46+ | API connections: Cloudinary, Apify, N8N, corporate CMS API, Vercel deploy. Extends agents into the real world. |
| **Playbook** | 32+ | Multi-step workflow templates. "Create Article from Pain Point": research → outline → draft → image → review → publish. |
| **Bundle** | 23+ | Pre-packaged groupings. "NCLEX-RN Content Pack." Convenience, not mandate — agents can use as-is, cherry-pick, or ignore. |
| **Style** | varies | Voice, visual system, design language. "HLT Social Impact": brand colors (#0b0d12, #f07b33, #ffd07d), typography, tone. What makes outputs feel like "us" instead of generic AI. |
| **Schema** | varies | Structural templates defining output shape. "Article Schema": title, hook, body sections, CTA, metadata. |
| **Agent/Persona** | 9 | Agent definitions with personality and operating instructions. Victoria (system agent, Slack), Lila (content ops), Julius (dev), Atlas (planning), Nova (execution), and others. |
| **Recipe** | varies | Multi-step procedures chaining skills and tools. |
| **Channel** | varies | Distribution targets: "HLT Instagram" with dimensions, tone, scheduling rules. |
| **Rubric** | varies | Scoring criteria. "Article Quality Rubric": accuracy, engagement, brand alignment, readability. Both AI judges and humans score against these. |
| **Eval Case** | varies | Specific test scenarios. "Enhance heart anatomy question" — always the same input, grade the output. Tracked over time. |
| **Metric** | varies | Tracked measurements. |
| **Lint Rule/Ruleset** | varies | Quality enforcement. "All skills must have ≥3 tags and an activation condition." |
| **Agent Doc** | varies | Runtime documentation attached to agents. Operating instructions, memory, personality definitions consumed during agent context assembly. |
| **Operational Log** | varies | Process records and operational notes. Audit trails, session logs, decision records tracked for transparency and debugging. |
| **Pattern** | varies | Architectural patterns and reusable techniques. Documented approaches like "Two-Phase RAG" or "Orchestrator-Subagent" that guide agent behavior and system design. |
| **Hub** | 15+ | Domain front doors. Each hub organizes a domain (social, article, marketing, etc.) and its `recommends` links point agents to the best tools, skills, and KBs for that domain. |

### 4.2 The Knowledge Graph

~9,000 typed, weighted links. Every link carries: `link_type`, `weight` (0.0–1.0), `reason` (human-readable). 14 link types with different semantics:

- `requires` (weight 0.7-0.9): Hard dependency. Gets reserved promotion slots during discovery. "This skill REQUIRES this KB to function."
- `recommends` (weight 0.7-1.0): Strong suggestion. "This skill works much better WITH this style."
- `pairs_with` (weight 0.5-0.7): Often used together.
- `uses_kb`, `uses_tool`, `uses_prompt` (weight 0.7-0.9): Reference dependencies.
- `bundle_member`, `governed_by_pack`, `often_follows`, `prerequisite`, `supersedes`, `alternate`, `parent`, `related`: Various structural relationships.

**21 hub entities** serve as domain front doors. Hub-social for social media. Hub-article for articles. Hub-copywriting for writing. Hubs are like phone books — intermediate navigation between broad search and specific resources. They're critical infrastructure for discovery and should NOT be casually edited.

**The graph's design principle: "The graph decides, code follows."** Zero hardcoded entity codes, keywords, or role assignments anywhere in application code. Entity selection is entirely driven by graph structure, link weights, tier rankings, and tag overlap. The system's behavior changes by editing the graph — no code changes needed. If adding 50 entities tomorrow would require code edits, the design is over-orchestrated.
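The link record described above implies a simple shape. A minimal sketch, assuming TypeScript types that mirror the doc's field names; the `GraphLink` interface and `strongEdges` helper are illustrative, not the actual schema:

```typescript
// Sketch of a graph link as described in section 4.2. Field names follow the
// doc; this interface and helper are illustrative assumptions, not real code.

type LinkType =
  | "requires" | "recommends" | "pairs_with"
  | "uses_kb" | "uses_tool" | "uses_prompt"
  | "bundle_member" | "governed_by_pack" | "often_follows"
  | "prerequisite" | "supersedes" | "alternate" | "parent" | "related";

interface GraphLink {
  from: string;        // entity code, e.g. "skill:do-research"
  to: string;          // entity code, e.g. "kb:hlt-product-catalog"
  link_type: LinkType;
  weight: number;      // 0.0 - 1.0
  reason: string;      // human-readable justification
}

// Example traversal filter: keep hard dependencies and strong suggestions
// above a weight floor -- the edges that matter most during discovery.
function strongEdges(links: GraphLink[], floor = 0.7): GraphLink[] {
  return links.filter(
    (l) => (l.link_type === "requires" || l.link_type === "recommends") && l.weight >= floor
  );
}
```

Because every edge carries a `reason`, a filter like this can always explain its output, which is what makes the `match_reasons` explainability in the next section possible.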
### 4.3 The Three-Phase Discovery Pipeline

When any agent calls `discover` via MCP:

**Phase 1 — Semantic Search.** The query is embedded and matched against all entities using 6 weighted scoring signals: text match on name/code/summary/use_case (3×), tag overlap (1×), link popularity via log of incoming links (0.3×), priority tier on the 1-10 scale via (11-tier)/10 (0.5×), rating normalized 0-100 (0.3×), and recency decay on updated_at (0.2×). Returns top-K candidates with composite scores typically 3.0-6.0.

**Phase 2 — Graph Expansion.** From the top discover hits, traverses 8 link types (uses_kb, recommends, requires, bundle_member, governed_by_pack, parent, uses_prompt, pairs_with) to pull in structurally related entities that text search alone would miss — styles, schemas, rubrics, recipes, playbooks, content types. Graph-expanded entities get synthetic relevance scores: `link_weight × 3.0 × linkTypeBoost × tierBoost`, where linkTypeBoost for requires = 1.5, pairs_with = 1.3, uses_kb = 1.2, others = 1.0.

**Phase 3 — Selection.** Picks 5-8 final entities and returns them as a **ranked menu** — NOT a mandate. The highest-scored entity appears first in the array. Structural dependencies (`requires` links) from the top discover hits are included to ensure styles, schemas, and KBs travel with the skills that need them. Remaining slots fill by ranked score. The consuming agent then decides which blocks to use, in what order, and how many — the registry never forces a specific entity or sequence. The `registry-select-from-menu` MCP prompt explicitly supports choosing a `best_ref`, keeping `backup_refs`, and deciding whether to `use_now` or `refine_search`.

Every selection is tracked in a `recommendation_receipt` with `graph_promotions`, `confidence` (low/medium/high), and `clarification_hints`. This means full explainability: every returned entity includes `match_reasons` like `['graph:requires', 'via:recipe:web-landing-page']`.
The agent and the team can see exactly why something was included.

### 4.4 The MCP

33 tools in 8 families. This is the interface any external agent uses:

**Discovery:** `discover` (primary — semantic + graph + ranking), `traverse` (walk graph by link type), `search_tags` (taxonomy browse)

**Registry Read:** `get_entity`, `list_entities`, `get_skill_content` (full instruction body + artifacts), `registry.artifact_body`, `registry.graph.summary`

**Registry Write:** `registry.create`, `registry.update`, `registry.add_revision`, `registry.link` (create or remove links via action param), `registry.manage_tags`

**Meta & Navigation:** `registry.agent_context` (onboarding context packet), `registry.session`, `registry.capabilities`, `registry.health`, `guide` (system orientation + task-shaped navigation)

**Execution:** `tool.search`, `tool.describe`, `tool.execute` (with vault-backed secrets)

**Memory & History:** `memory.query`, `history.query`

**Delivery:** `delivery.schedule` (create/list/get/reschedule/cancel/stats via action param), `delivery.connect_link.create`, `delivery.targets.discover/list/promote`

**Lists & Eval:** `lists.get`, `lists.manage` (create/add_item/vote/publish via action param), `eval.refresh_signals`

**Quality:** `deliberate` (multi-agent review for high-stakes artifacts)

Five built-in MCP prompts guide discovery: `registry-select-from-menu` (choose from results), `registry-refine-discovery` (improve a poor search), `registry-agent-bootstrap` (build an agent context packet), `registry-expand-user-request` (enrich thin user input), and `registry-integration-onboarding` (plan integration rollout).

MCP declares 4 extensions in the `katailyst://` namespace: skill-conventions v1.0 (SKILL.md format, frontmatter, artifacts), content-types v1.0 (entity taxonomy, lifecycle, tags), graph-discovery v1.0 (link semantics, weighted traversal, discover protocol), tool-catalog v0.2 (external tool cataloging with risk levels and execution policy).
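Two of the scoring pieces from §4.3 can be sketched numerically. This is a minimal illustration: the function names are assumptions, and `tierBoost` is passed through unchanged because the doc does not give its formula.

```typescript
// Sketch of two discovery-scoring pieces from section 4.3.
// Function names are illustrative, not the actual implementation.

// Phase 1 priority-tier signal: (11 - tier) / 10, so tier 1 (exceptional) -> 1.0
// and tier 10 (raw/deprecated) -> 0.1. This signal feeds in at 0.5x weight.
function tierSignal(tier: number): number {
  return (11 - tier) / 10;
}

// Phase 2 synthetic relevance for graph-expanded entities:
// link_weight x 3.0 x linkTypeBoost x tierBoost.
const LINK_TYPE_BOOST: Record<string, number> = {
  requires: 1.5,
  pairs_with: 1.3,
  uses_kb: 1.2,
  // all other link types: 1.0
};

function syntheticRelevance(linkWeight: number, linkType: string, tierBoost: number): number {
  const typeBoost = LINK_TYPE_BOOST[linkType] ?? 1.0;
  return linkWeight * 3.0 * typeBoost * tierBoost;
}
```

For example, a `requires` link at weight 0.8 with a neutral tierBoost of 1.0 scores 0.8 × 3.0 × 1.5 = 3.6, which sits inside the typical 3.0-6.0 composite range: graph-promoted hard dependencies compete on equal footing with direct text matches.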
### 4.5 The Web Dashboard (23+ Pages)

**Discovery:** Home (3D knowledge graph hero, quick-create, entity stats, common workflows, content gallery, activity feed), Registry Browse (search/filter all entities, gallery/list/table views), Connections (full 3D interactive knowledge graph), Topic Map, About (entity type explanations and entry points)

**Creation:** Capability Forge (4 modes: Drop — paste raw idea, Interview — guided creation, Mine — extract from source, Browse — start from template), Registry New (form-based with per-type revision editors for each of the 21 entity types)

**Operations:** Factory (4 tabs: Autopilot — AI-driven processing, Wizard — template-based with questionnaire_json/template_json/validator_config, Incoming — staged queue filtered to exclude smoke tests, Distribution), Observe (activity cockpit with nightly ops, run traces, system classification), Automations (scheduled workflows with versions and templates)

**Quality:** Evals (4 tabs: Overview dashboard, Run management, Comparisons with pairwise ELO tournaments, Analytics), Test Lab (preview, validate, regression suite, lint, render, compare), Registry Health, Review Queue

**Content & Delivery:** Content/Assets, Content/Multimedia (connected to Multimedia Mastery sidecar), Publishing (calendar), Channels

**Integration:** Tools (connector cards with status), Tools/MCP (playground + server browser), Metrics, Integrations

**Other:** Agents (registry with runtime overlays), Chat Ground, Lists (curated collections with scope: all/mine/org/public), Plugins (distribution/export), Settings, Personal Collections

### 4.6 The Quality Engine

**Skill QA** evaluates skills across 8 dimensions: trigger_clarity (weight 1.2×), determinism, tool_correctness, context_hygiene, scope_constraints, knowledge_delta, output_contract, actionability. Pass threshold: 75 points. Skills below 75 stay staged.
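An aggregation over the 8 QA dimensions might look like the sketch below. Only trigger_clarity's 1.2× weight is stated in the doc; the remaining weights defaulting to 1.0, and the weighted-average aggregation itself, are assumptions for illustration.

```typescript
// Sketch of an 8-dimension Skill QA score. Only trigger_clarity's 1.2x weight
// comes from the doc; everything else here is an illustrative assumption.

const QA_DIMENSIONS = [
  "trigger_clarity", "determinism", "tool_correctness", "context_hygiene",
  "scope_constraints", "knowledge_delta", "output_contract", "actionability",
] as const;

const QA_WEIGHTS: Record<string, number> = { trigger_clarity: 1.2 }; // others default to 1.0

// scores: per-dimension 0-100. Returns a weighted average on the same 0-100 scale.
function skillQaScore(scores: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const dim of QA_DIMENSIONS) {
    const w = QA_WEIGHTS[dim] ?? 1.0;
    total += (scores[dim] ?? 0) * w; // missing dimensions count as 0
    weightSum += w;
  }
  return total / weightSum;
}

// Lifecycle gate: skills below the 75-point threshold stay staged.
const passesQa = (scores: Record<string, number>): boolean => skillQaScore(scores) >= 75;
```

The gate shape matters more than the exact math: a weighted trigger_clarity means a skill with a vague activation condition is pulled below the threshold even when its other dimensions are strong.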
**Eval system infrastructure:** deterministic harness for repeatable tests, promptfoo integration for structured runs, single-output AI judge, rubric-based multi-axis judge, pairwise tournament with ELO ranking, variation generator, pipeline eval batch processing, improvement proposal generation from results, discovery weight calibration from eval signals.

**A/B testing flow:** Create a variant (never replace the original). Run both through eval. Pairwise comparison. AI grades AND humans grade. The ELO-based tournament sorts variants across multiple matchups. Promote the winner. The system has `registry-select-from-menu` and comparison UI built in.

**Lifecycle enforcement:** Entities start as `staged`. Must pass taxonomy coverage requirements (tags from required namespaces) and the skill QA threshold (75 points) to be promoted to `curated`. `Curated` is the live operational tier. `Published` is a distribution/release tier. `Deprecated` → `archived` for sunset.

### 4.7 Connected Systems

**Multimedia Mastery** (multimediamastery.vercel.app) — separate app with its own MCP handling image generation (Fal/Nano Banana Pro 2), editing, video, Cloudinary asset management. Style kits, workflow packs (Question of the Day, social, editorial), audience profiles. MCP at `/api/media/v1/mcp`. Should feel like one system with Katailyst.

**Corporate CMS** — HLT's production system serving millions of users. ~70,000 question bank items across ~200 apps. Read-only API access. Agents can pull items and transform them into content.

**Hosted agents** — Victoria (system agent, Slack, proactive maintenance), Lila (content ops), Julius (dev workflows) on Render via OpenClaw. 6 IDE sub-agents: Atlas, Nova, Quinn, Rex, Ivy, Lucy.

**llms.txt** — curated 253-line agent index at `/llms.txt` with links to all canonical docs, enabling LLM-first navigation.

---

## 5. How It Gets Used

### The Team Onboarding Flow (What We Need to Nail)

1. NP team member logs in with HLT email → sees the home dashboard → the graph shows the interconnected system → entity stat cards explain what Skills, KBs, Prompts are
2. They search "FNP" → see everything related: 5 Skills, 8 KBs, 3 Styles, 2 Playbooks
3. They read through what exists → "oh, we're missing information about the 7P exam format"
4. They add a new KB with the missing knowledge → the system guides them through tagging, linking to related entities, and quality checks
5. They connect Katailyst MCP to their Claude Code → now their AI assistant has access to all 1,400+ blocks → when they ask "write a study guide for 7P certification," it pulls in the right KBs, skills, styles, and schemas automatically

### Content Marketing at Scale (High Season Push)

Articles, social media (Question of the Day from CMS items), cheat sheets as lead magnets on Framer landing pages, email sequences via Marketo, upgrade screen optimization via a weekly build-measure-learn loop. All using brand voice skills and audience KBs, and measured against rubrics.

### The Agent Ecosystem

IDE agents (Claude Code, Cursor, Codex) for individual work. Slack agents (Victoria, Lila, Julius) for team-level operations and proactive maintenance. Web dashboard for visual management and browsing. Multimedia Mastery for visual content. External tools (Lovable, V0, Framer, N8N) via MCP. Future: per-department assistants, Mira tutor integration, custom agent gateway via Claude Code SDK.

### The Build-Measure-Learn Loop

Create content → deploy → measure results → lessons feed back into skills and styles → next cycle is better. For upgrade screens: generate new copy, deploy for a week, measure conversion, generate an improved variant, repeat. For social: track engagement, feed winning patterns back into the style kits and hook skills. The eval system tracks quality over time so we can see the improvement (or catch degradation).

---

## 6. Operating Principles

1. **We are the armory, not the commander.** Agents have more context. We rank, tag, suggest — never dictate paths. The Anti-Forcing Rule is canonical: when discovery returns bad results, fix metadata and scoring, not forced links.
2. **Quality over speed. No exceptions.** Nothing goes in half-done. Agents have Alzheimer's — anything not finished now is never finished.
3. **Fix root causes, never surfaces.** Ask why three times. Fix the underlying issue. Now or never.
4. **Read before you touch.** MUST read any entity in full before editing. No bulk scripts in the registry. The only way to add value is research and care.
5. **The graph decides, code follows.** Zero hardcoded entity codes or role assignments. Selection driven entirely by graph structure. If adding 50 entities would require code edits, the design is wrong.
6. **Consistent taxonomy.** 34 namespaces, one shared taxonomy across all 21 entity types. Units without required tags stay staged.
7. **Context over conciseness.** Rich context for agents. More tokens, not fewer. Explain the situation; let the agent find the path.
8. **Google-like approach.** Improve the ranking, not the import limits. Page 1 matters. Page 40 doesn't.
9. **Build the system, not the use case.** AI is generalizable. Get the orchestration right and it iterates into any use case.
10. **Observable everything.** Every run traceable. Every output gradeable. Evidence over intuition.
11. **The machine that creates the machine matters most.** Skill creation, prompt engineering, knowledge curation — the meta-capabilities that improve everything downstream. Invest disproportionately here.

---

## 7. Where This Is Going

**Now:** Polish and reliability. Fix the rendering bugs. Onboard the team. Get the eval system running real output-quality tests against rubrics. Run the high-season growth push (May-August). Make it so every department can search their domain, see what exists, add their expertise, and trust the system.
**Next:** Per-person AI assistants with memory and personality. Own agent gateway via the Claude Code SDK (less dependency on OpenClaw). Distribution channels working end-to-end (Facebook, Instagram, Marketo, Framer). N8N automation integration. Sub-agent orchestration for complex requests. Mira tutor integration.

**Later:** Multi-org platform for educators and educational publishers. Growing community library in the system layer. Self-improving factory that identifies winning patterns, imports them, tests variations, promotes winners — with human oversight. Agent fleet with dozens of specialized agents, each with their own personality.

The endgame: every company needs something like this, and we're building it first — especially targeting educators and educational publishing companies who face every problem we face but don't have a solution yet.

---

## 8. Worked Example

**Request:** "Create a daily question card about heart anatomy with one strong misconception hook and clean answer-reveal space."

**Discovery (3 phases):**

1. Vector search: "heart anatomy question NCLEX misconception" → cardiac anatomy KBs, NCLEX-RN audience profile, question enhancement skill, misconception hook skill
2. Graph expand: follows `uses_kb` → HLT Social Impact style, `pairs_with` → QotD content type, `requires` → brand asset KBs
3. Selection: top skill in slot 1, required KBs in reserved slots, remaining by score. Returns ~8 entities with a recommendation_receipt.

**Execution:** The agent restructures the question using the enhancement skill methodology. Applies the HLT Social Impact style. Calls the Multimedia Mastery MCP for the image with a style kit and audience profile.

**Output:** Finished card — misconception hook, clean anatomy diagram, answer-reveal, HLT branding, mobile-optimized.

**Eval:** Hook strength 8/10, clinical accuracy 9/10, visual clarity 7/10, brand compliance 9/10, schema compliance 10/10. Score: 86/100. Tracked. Next week, same eval, quality trend visible.
---

_When facing any decision: are we equipping agents with the best building blocks and trusting their intelligence, or trying to be the dictator? Are we fixing the system, or putting a band-aid on a symptom?_

---