Agentic Engineering Toolkit — Orchestrating Commands, Skills, Agents & Hooks

Foundation

Five Layers, Five Purposes

Each solves a different problem. Use the right one for the job — don't cram everything into .claude.md.

Hooks

Guaranteed enforcement. Format code, block protected files, log commands.

0 tokens

MCP Servers

External service access. GitHub, databases, monitoring, APIs.

On connect

Agents

Specialized behavior with fresh eyes. Separate context windows.

Separate ctx

Skills

Cross-project domain knowledge. Progressive disclosure — load on demand.

On demand

.claude.md

This project's context. Tech stack, commands, conventions. Keep it small.

Always loaded

The core principle: context is precious. .claude.md is always loaded (keep it under 300 lines). Skills load on demand. Sub-agents get their own context windows. Hooks run outside context entirely. Design for progressive disclosure, not "dump everything upfront."

Decision matrix: where should this knowledge live?

Use this to decide which layer is right for each piece of knowledge or behavior:

Knowledge?

Project-specific (build commands, tech stack) → .claude.md
Cross-project, stable (metric definitions, API standards) → Skill
Rapidly changing (sprint priorities, temp workarounds) → .claude.md

Behavior?

Specialized mindset (code reviewer, security auditor) → Sub-Agent
Context isolation needed (heavy analysis, exploration) → Sub-Agent

Service access?

External APIs, live data (GitHub, database, Sentry) → MCP Server

Enforcement?

Must always happen (format code, block .env edits, log commands) → Hook
If you're relying on Claude "remembering" to do something, it should be a hook.

Workflow 1

Building a New Feature

From problem definition through verified, shipped code — with context transferring cleanly across sessions.

/requirements JTBD Interview

/document GitHub Issue

/start-work Branch + Setup

Agent Teams Build + Review

Verify Tests + Chrome

/ship-it PR + Changelog

Phase 1: /requirements — JTBD Interview

A structured interview that walks through problem definition, jobs-to-be-done mapping, and codebase analysis before any code is written.

Problem Definition

"What problem are you trying to solve?" — forces clarity before solutions. Challenges vague or solution-first thinking.

Jobs Mapping

"What job are users hiring this feature to do?" — maps functional jobs to specific functionality and success criteria.

Codebase Analysis

Agent explores existing patterns, affected files (with line numbers), database impact, security implications. You get a confirmation checkpoint before it proceeds.

Implementation Plan

TDD-focused recommendations: phase-by-phase breakdown, files to create/modify, test strategy. Everything the next developer needs.

Phase 2: /document — Conversation → GitHub Issue

Converts everything from the interview into a comprehensive GitHub issue. This invokes the github-issue-creator skill, which structures the output for another developer (or agent) to implement.

The issue includes: root cause analysis, specific file references with line numbers, TDD workflow (mandatory), user integration flow, proposed data model changes, and security considerations.

Key principle: "Write for the next developer." They won't have your conversation — only this issue. The /document command transfers all your research into something actionable.

Phase 3: /start-work <issue-number>

Start a new session (clear context). This command reads the GitHub issue and sets up everything automatically:

Fetches the issue, extracts requirements
Creates a branch (feature/247-engagement-history)
Runs build + test baseline
Detects TDD requirements → enforces them
Reviews available MCP tools and agents

For complex issues, this is where agent teams come in: each implementation phase gets a discrete agent with focused scope. They share context through the issue spec and follow the best practices outlined in it.

Phase 4: Verify — Tests + Browser

Not done until verified through two channels:

Automated tests: TDD means tests were written first. Baseline is compared — new failures block shipping.
Chrome MCP: Claude drives the browser directly — navigates to the feature, interacts with it, screenshots for visual verification. Catches what unit tests miss.

Phase 5: /ship-it

Commits (conventional commit format), pushes, writes changelog entries (both internal CHANGELOG.md and public changelog page), creates a PR. Every step requires your approval.

Workflow 2

Bugs & Refactoring

The "Think Out Loud" pattern — don't tell Claude to fix it. Tell Claude to explain what changes are needed.

The insight: When you ask Claude to "fix this bug," it jumps straight to code. When you ask Claude to "review this code and determine what changes would be needed," it thinks through the problem systematically — building rich context in the process. That context is what makes /document produce a thorough implementation spec.

Review the code

"Review the code related to [feature/area]" — gets Claude oriented in the relevant part of the codebase.

Explore impact

"What changes would be needed for the system to [desired behavior]?" — deliberately asks Claude to articulate, not implement.

Claude thinks out loud

This is the priming step. Claude walks through affected files, dependencies, edge cases, and tradeoffs — building context that wouldn't exist if you just said "fix it."

/document

Converts all that analysis into a TDD-focused GitHub issue. The issue is richer because Claude already thought through the problem.

/start-work → fix → /ship-it

New session picks up the well-documented issue. TDD ensures the fix is verified. Same pipeline as Workflow 1 from here.

Progressive Adoption

Getting Started

Don't skip to Month 4 on Day 1. Each layer builds on the previous.

Week 1: Foundation — .claude.md

Start with a lean .claude.md for your project. Under 300 lines. Include:

Tech stack and framework versions
Build, test, and dev commands
Code conventions and file organization
Key architectural patterns

# .claude.md

## Tech Stack
Next.js 14, TypeScript, Tailwind, Supabase

## Commands
- npm run dev — Start dev server
- npm test — Run test suite
- npm run build — Production build

## Conventions
- Components in src/components/
- Server actions in src/actions/
- Tests in __tests__/ directories

Week 2: Safety — Hooks

Add hooks for things that must always happen. Start with protected files and auto-formatting.

// ~/.claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "prettier --write \"$FILEPATH\""
      }]
    }],
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "~/.claude/hooks/block-protected.sh"
      }]
    }]
  }
}

Why hooks over prompting? Prompting says "please format code." Hooks say "code is always formatted." One is a suggestion. The other is a guarantee.

Week 3-4: External Services — MCP Servers

Connect Claude to the services you use repeatedly. Start with GitHub and your database.

# Add GitHub MCP
claude mcp add --transport stdio github \
  -- npx -y @anthropic/mcp-server-github

# Add PostgreSQL MCP
claude mcp add --transport stdio postgres \
  --env DATABASE_URL=${DATABASE_URL} \
  -- npx -y @modelcontextprotocol/server-postgres

Month 2: Cross-Project Knowledge — Skills

Notice patterns you explain repeatedly. Create your first skill with progressive disclosure: a lean SKILL.md (~500 words) with detailed references on demand.

# ~/.claude/skills/company-metrics/SKILL.md (~500 words)

---
name: company-metrics
description: "Company metric definitions and calculation patterns"
---

## Key Metrics
- MRR: SUM(subscription_amount) WHERE status='active'
- Churn: See references/churn-calculation.md
- LTV: See references/ltv-model.md

# ~/.claude/skills/company-metrics/references/churn-calculation.md
# (detailed docs, loaded only when relevant)

Progressive disclosure: Load 500 words upfront, 10,000 words on-demand. Context stays clean.

Month 3: Quality Gates — Sub-Agents

Create agents for specialized behavior. The key: agents are for behavior, not knowledge. Put knowledge in skills and let agents inherit it.

# .claude/agents/code-reviewer.md

---
name: code-reviewer
description: "Reviews code for quality, security, and convention compliance"
---

You are a code reviewer. Review with fresh perspective.
Focus on: security issues, missing error handling,
convention violations, and test coverage gaps.

Load the api-standards skill for pattern reference.

Month 4: Commands — Workflow Automation

Once you know your workflow, encode it into commands. Commands are the glue — they orchestrate skills, agents, and tools into repeatable sequences.

# .claude/commands/start-work.md
# Fetches issue, creates branch, runs baseline,
# detects TDD requirements, reviews available tools

# .claude/commands/ship-it.md
# Builds, tests, commits, pushes, writes changelog, creates PR

# .claude/commands/document.md
# Converts conversation into a GitHub issue
# using the github-issue-creator skill

Commands are where the five layers come together. A single /start-work 247 can fetch an issue (MCP), create a branch (bash), load skills (knowledge), spawn agents (behavior), and enforce TDD (hooks).

Reference

Templates & Examples

Starter configs you can adapt. Copy what's useful, skip what's not.

Skill template (with progressive disclosure)

# ~/.claude/skills/your-skill/SKILL.md

---
name: your-skill-name
description: "One line that tells Claude when to load this"
---

# Skill Name

## Purpose
What this skill teaches Claude to do.

## Core Knowledge (~500 words max)
The essential information Claude needs immediately.
Keep this lean — it's loaded into context.

## References (load on demand)
- See references/detailed-schemas.md for table schemas
- See references/query-patterns.md for SQL patterns
- See references/edge-cases.md for known gotchas

## When to Use
Activate when user asks about [triggers].
Don't use for [anti-triggers].

Agent template

# .claude/agents/your-agent.md

---
name: your-agent-name
description: "What this agent does (behavior, not knowledge)"
---

You are a [role]. Your job is to [specific behavior].

# Load relevant skills for domain knowledge:
Load the [skill-name] skill for reference.

# Focus areas:
- [Specific thing to look for]
- [Specific thing to look for]
- [Specific thing to look for]

# Output format:
Report findings as:
1. Issue description
2. Location (file + line)
3. Severity (critical/warning/info)
4. Suggested fix

Command template (workflow automation)

# .claude/commands/your-command.md

# /your-command - Short description

What this command does in one sentence.

## Usage
/your-command <argument>

## Workflow Steps
Execute in order. STOP and report if any step fails.

### Step 1: Pre-flight
[Validation checks]

### Step 2: Main action
[The core work — can invoke skills, agents, tools]

### Step 3: Verification
[Confirm the work succeeded]

### Step 4: Report
[Show results to user]

## Error Handling
| Error | Action |
|-------|--------|
| [Error type] | [What to do] |

Anti-patterns to avoid

Bloated .claude.md

Over 500 lines? You're wasting 20%+ of context before any work starts. Move cross-project knowledge to skills. Link instead of pasting.

Knowledge in agent prompts

Agents are for behavior, not knowledge. Put domain knowledge in skills and let agents inherit it. Otherwise you reload 10,000 words every time the agent runs.

Duplicated information

Same metric definition in .claude.md, a skill, AND an agent prompt? That's 3x context cost and 3x update burden. Single source of truth, always.

Relying on Claude to remember

"Always format code" in .claude.md is a suggestion. A PostToolUse hook is a guarantee. If it must always happen, make it a hook.

Orchestrating
Commands, Skills,
Agents & Hooks

Five Layers, Five Purposes

Knowledge?

Behavior?

Service access?

Enforcement?

Building a New Feature

Phase 1: /requirements — JTBD Interview

Problem Definition

Jobs Mapping

Codebase Analysis

Implementation Plan

Phase 2: /document — Conversation → GitHub Issue

Phase 3: /start-work <issue-number>

Phase 4: Verify — Tests + Browser

Phase 5: /ship-it

Bugs & Refactoring

Review the code

Explore impact

Claude thinks out loud

/document

/start-work → fix → /ship-it

Context Engineering > Prompting

Conversations are priming

Sessions are boundaries

Agents are fresh eyes

Getting Started

Templates & Examples

Bloated .claude.md

Knowledge in agent prompts

Duplicated information

Relying on Claude to remember