
Building AI Development Workflows with Kiro Skills

Keith Hodo
Solutions Architect at AWS. Writing about cloud, agentic AI, and the journey.

In my last post I mentioned that I used Kiro to ship an entire blog migration in a single day — infrastructure, content, CI/CD, the works. I promised a deeper look at the AI tooling. Here it is.

How did I get into it? Well, on February 13th a very dear friend, Alex Wood, messaged me and showed me a package he had been building with his Kiro Skills. He blogs about how he does production coding with them, and that post only served to whet my appetite. Once he handed me the package, I was off to the races. And I haven’t stopped. My delivery cycles have only gotten tighter, and the value delivered has gone off the charts.

So, here’s the short version: I’ve taken the skills Alex handed me and built a library of reusable AI workflows, called skills, that handle the repetitive parts of my development process. Spec writing, code implementation, code review, session management, deployment. Each skill is a markdown file that guides Kiro through a specific workflow, step by step, with human review gates built in. Skills live in the repo alongside the code they operate on.

This isn’t about asking an AI to “write me some code.” It’s about encoding the way I actually work into something repeatable.

What Are Skills?

A Kiro skill is a markdown file that lives in your repo at .kiro/skills/{name}/SKILL.md. Each one describes a workflow — when to run it, what steps to follow, what inputs it needs, what outputs it produces, and what rules to follow. When you invoke a skill, Kiro reads the instructions and executes the workflow, using its tools (file system, terminal, web search, AWS CLI) to do the actual work.

Best of all, Kiro’s implementation of Skills is backed by the Agent Skills spec. These aren’t random prompts that shipped because they happened to work in Kiro. They follow a real specification used by some of the biggest names in the industry. Have a look for yourself.

Think of a skill as a runbook, except the “entity” following the runbook is an AI that can actually execute the steps.

Here’s the structure:

.kiro/
  skills/
    create-spec/
      SKILL.md          ← workflow instructions
    capture-skill/
      SKILL.md
    session-handoff/
      SKILL.md
    implement-and-review-loop/
      SKILL.md
    ...

Skills are portable. I’ve copied skills between projects, adapting the project-specific parts while keeping the workflow structure intact. The blog you’re reading right now was built using skills originally developed for a completely different project.
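To make the layout concrete, here is a minimal Python sketch of how a tool could discover skills arranged this way: one SKILL.md per directory under .kiro/skills/. This is illustrative, not Kiro's actual loader, and the function name discover_skills is mine.

```python
import tempfile
from pathlib import Path

def discover_skills(repo_root: Path) -> dict[str, str]:
    # One SKILL.md per skill directory under .kiro/skills/
    skills = {}
    for skill_md in sorted((repo_root / ".kiro" / "skills").glob("*/SKILL.md")):
        skills[skill_md.parent.name] = skill_md.read_text()
    return skills

# Build a throwaway repo with two skills and list them.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for name in ("capture-skill", "create-spec"):
        skill_dir = root / ".kiro" / "skills" / name
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(f"# {name}\n")
    found = discover_skills(root)
    print(sorted(found))  # ['capture-skill', 'create-spec']
```

Because each skill is just a file at a predictable path, copying one between projects really is a copy-paste operation.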

Here’s the Most Valuable Player (MVP) of the collection. Capture Skill can do anything from capturing a set of prompts you’ve entered, to reading a blog post, extracting the delta in how its author thinks about a problem, and adding that to your skills collection. It’s amazing.

Skill 1: Capture Skill

This is the meta skill — the one that creates other skills. When I find myself repeating a workflow or explaining the same process to Kiro more than once, I run capture-skill to turn it into a reusable prompt.

# Capture Skill

Create a new Kiro CLI skill (prompt) from a conversation,
a pasted prompt, or a description.

## Input

Optional: a name for the new skill (e.g., "refactor-code").
Will ask if not provided.

## Process

### Step 1: Gather Source Material

Ask the user to provide one of:
1. A prompt they've written — paste the text directly
2. A description of what the skill should do — help draft it
3. Reference to earlier in this conversation — extract
   the relevant workflow

### Step 2: Analyze and Structure

From the provided material, identify:
- Core purpose — what does this skill accomplish?
- Required inputs — what arguments does it need?
- Step-by-step process — break into clear phases
- Expected outputs — what files/artifacts are produced?
- Error cases — what could go wrong?

### Step 3: Draft the Skill

Create a markdown prompt following the structure of existing
skills in `.kiro/skills/`:

  # {Skill Name}

  {One-line description}

  ## Input
  {Describe expected input or arguments}

  ## Process
  ### Phase 1: {Name}
  {Steps...}

  ### Phase 2: {Name}
  {Steps...}

  ## Output
  {What the user gets when complete}

  ## Rules
  - {Constraints and conventions}

### Step 4: Review with User

Present the draft and ask:
- Does this capture what you wanted?
- Any steps to add or remove?
- Confirm the skill name.

### Step 5: Save

1. Validate name: alphanumeric, hyphens, underscores only.
   Max 50 characters.
2. Save to `.kiro/skills/{name}/SKILL.md`
3. Confirm creation and show how to use it:
   `/skills` → select `{name}`

## Skill Naming Rules

- ✅ Valid: `review-pr`, `debug-test`, `deploy-stacks`
- ❌ Invalid: `my skill` (spaces), `review.code` (dots)

## Rules

- Keep skills focused — one skill per workflow.
- Include verification steps where appropriate.
- Make the skill self-contained — readable cold.
- Ask the user rather than guessing on unclear points.
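The naming rules in Step 5 are simple enough to express in code. Here is a minimal Python sketch; validate_skill_name is a hypothetical helper for illustration, not part of Kiro:

```python
import re

def validate_skill_name(name: str) -> bool:
    # Capture-skill's rules: alphanumeric characters, hyphens, and
    # underscores only, maximum 50 characters, non-empty.
    return bool(re.fullmatch(r"[A-Za-z0-9_-]{1,50}", name))

print(validate_skill_name("review-pr"))    # True  (valid)
print(validate_skill_name("my skill"))     # False (contains a space)
print(validate_skill_name("review.code"))  # False (contains a dot)
```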

The key insight is that skills are self-improving. Every time I use a skill and find a gap — a missing step, a wrong assumption, an edge case — I either fix it inline or run capture-skill to create a new one. The library grows organically from real work, not from trying to anticipate every scenario upfront. Best of all, with each fix the skills become more in line with how you actually work. Just ask Kiro to “create or improve skills” and it analyzes the delta and self-improves. Magic.

Skill 2: Improve Skill

The companion to capture-skill. After using a skill and noticing something off — a missing step, a rule that got violated, output that didn’t match expectations — I run improve-skill to make targeted fixes while the context is still fresh.

# Improve Skill

Review the current chat session where a skill was used and
improve it based on feedback.

## Workflow

1. Review the conversation first to identify skill gaps:
   - Where the skill's output diverged from expectations
   - Missing steps that had to be done manually
   - Rules that were violated or missing
   - Output format issues
2. If gaps are obvious from context, propose the specific
   skills and fixes directly. If not clear, ask:
   "Which skill needs improvement?" and "What went wrong?"
3. Read the current skill file(s) from `.kiro/skills/`.
4. Multiple skills can be improved in one pass — don't
   force one-at-a-time.
5. For each skill, describe the change and apply it.
6. After applying changes, summarize all improvements made.

## Rules

- Don't rewrite the entire skill — make targeted
  improvements.
- Preserve what's working well.
- Add examples from the current session where they
  clarify the improvement.
- When a rule was violated, strengthen the rule text
  (e.g., add ⚠️ MANDATORY markers) rather than just
  restating it.
- Prefer adding guardrails (hard gates, warnings) over
  relying on behavioral compliance.

This is how skills evolve. capture-skill creates them, improve-skill refines them. The combination means the library gets better every session without requiring a dedicated “skill maintenance” effort.

Skill 3: Session Handoff

This one solved a problem that was driving me crazy. I’d be deep in a feature, end the session, come back the next day, and spend valuable time re-explaining context to Kiro. What branch am I on? What task was I working on? What decisions did I make and why? What’s deployed? What’s broken?

The session-handoff skill captures all of that into a structured document:

# Session Handoff

Generate a session handoff document capturing the current working
state for the next session.

## Workflow

1. Check `git status`, `git branch`, and `git log --oneline -5`
   for current state.
2. Review the conversation history to identify what was
   accomplished this session.
3. Write the handoff to `.kiro/context/session-handoff.md`
   (rolling file — overwrite each time).

## Required Sections

- BEFORE RESUMING: Blockers that must be resolved before any
  work (e.g., expired credentials, pending merges, VPN).
  This section goes first so the next session doesn't start
  broken.
- IMMEDIATE NEXT STEPS: Numbered list of what to do first
  next session (deploy, merge, test, etc.).
- CURRENT STATE: Branch name, clean/dirty, last commit,
  phase progress (X/Y tasks), test counts, live URLs.
- WHAT WE DID THIS SESSION: Each task completed with details,
  decisions made, bugs found and fixed.
- WHAT'S REMAINING: Table of next tasks with status and notes.
- AWS RESOURCES: Cloud resources with IDs, regions, and values
  (Lambda, S3, SSM params, etc.).
- KEY FILES: Table of important file paths grouped by area.
- KEY DECISIONS: Numbered list of architectural decisions to
  carry forward (with rationale).
- STAKEHOLDER PREFERENCES: Workflow preferences, naming
  conventions, review process.

## Rules

- Be specific — include resource IDs, exact commands,
  and file paths.
- Document decisions and their rationale, not just what
  was done.
- Note any blockers or environment issues encountered.
- After saving, immediately run the `auto-memory` skill
  to capture learnings into `.kiro/steering/memory.md`.
- After auto-memory, remind the user: "Any skills need
  refining from this session? Run `improve-skill` if so."

When I end a session, I say “handoff” and Kiro checks git status, reviews what we accomplished, and writes a handoff document to .kiro/context/session-handoff.md. It includes specific resource IDs, exact commands, file paths, and the reasoning behind decisions — not just what was done, but why.
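The required-sections list above is effectively a template with a fixed order. A minimal Python sketch of assembling it, where render_handoff and the sample section text are illustrative rather than Kiro's actual output:

```python
def render_handoff(sections: dict[str, str]) -> str:
    # Section order mandated by the skill: blockers come first,
    # so the next session doesn't start broken.
    order = [
        "BEFORE RESUMING", "IMMEDIATE NEXT STEPS", "CURRENT STATE",
        "WHAT WE DID THIS SESSION", "WHAT'S REMAINING", "AWS RESOURCES",
        "KEY FILES", "KEY DECISIONS", "STAKEHOLDER PREFERENCES",
    ]
    parts = ["# Session Handoff"]
    for name in order:
        parts.append(f"## {name}\n\n{sections.get(name, '_nothing this session_')}")
    return "\n\n".join(parts)

doc = render_handoff({
    "BEFORE RESUMING": "- Refresh AWS credentials before deploying",
    "CURRENT STATE": "- branch: feature/skills, clean, last commit abc1234",
})
print(doc.splitlines()[0])  # "# Session Handoff"
```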

Skill 4: Session Resume

The companion to handoff. When I start a new session, I say “resume” and Kiro reads the handoff document, loads the project context from the steering files, checks git status to make sure reality matches the document, and presents a summary of where we left off and what’s next.

# Session Resume

Resume a previous working session by loading the latest
handoff context.

## Workflow

1. Read all steering files from `.kiro/steering/` to load
   project context (structure, tech stack, branching rules,
   product overview).
2. Read all agent definitions from `.kiro/agents/` to
   understand available code review and automation agents.
3. Read all skill definitions from `.kiro/skills/` to
   understand available workflows.
4. List files in `.kiro/context/` sorted by modification
   date (newest first).
5. Read the most recent `session-handoff.md`.
6. Present a summary:
   - Where we left off: Current branch, phase, and
     immediate next steps
   - What's live: Key resources and endpoints
   - What's next: Top 3-5 action items from the handoff
   - Blockers: Anything that needs attention before
     resuming (expired creds, unpushed commits, etc.)
7. Run `git status` and `git branch` to confirm current
   state matches the handoff doc. Flag any discrepancies.
8. Ask: "Ready to pick up from here? Or do you want to
   pivot to something else?"

## Rules

- Don't dump the entire handoff doc — summarize the
  actionable parts.
- If git state doesn't match the handoff, call it out
  clearly.
- If there are unpushed commits or uncommitted changes,
  mention them first.
- Load steering docs for project context but don't
  recite them.
- Carry forward stakeholder preferences from the
  handoff doc.

The handoff/resume pair means I never lose context between sessions. I can pick up a project after a week away and be productive in under a minute. The AI reads its own notes from last time and knows exactly where things stand.

Skill 5: Create Spec

This is one of my favorites for software engineering. We know that spec-driven development drives better results out of LLMs. This builds on that assumption and supercharges it. The create-spec skill orchestrates a full specification pipeline — requirements gathering, high-level design, low-level design, and task planning — producing a single unified document that becomes the input for implementation. Want to go turn by turn? Have Kiro ask you questions. Want to bring in MCP servers or tools? Easy: just tell it to use them.

The workflow has human review gates after every phase. Kiro doesn’t just generate a spec and hand it to you. It generates the requirements, stops, and asks for feedback. Then it generates the high-level design based on the approved requirements, stops again, and asks for feedback. Each phase builds on the previous one, and nothing advances without approval.

# Create Spec

Orchestrate the full specification pipeline — requirements →
high-level design → low-level design → task plan — producing
a single unified spec document. This spec is the primary input
for `implement-and-review-loop`.

## Input

A description of what to build, or "resume" to continue an
in-progress spec.

## Output

A single file: `docs/{feature-name}-spec.md`

## Process

### Phase 0: Research (if AWS/infrastructure features)

If the feature involves AWS APIs, Lambda runtimes, SDK
capabilities, or infrastructure patterns:
1. Use tools to research — call search_documentation,
   web_search, or read_documentation to verify assumptions
   about API capabilities, SDK support, and runtime
   limitations BEFORE writing requirements.
2. Document findings — add a "Research Findings" section
   to the requirements with what's supported, what's not,
   and links to docs.
3. Flag constraints early — if research reveals a limitation,
   surface it in requirements as a constraint, not as a
   surprise in HLD.

### Phase 1: Requirements

1. Gather the user's description, ask clarifying questions.
2. Produce the Requirements section.
3. STOP — present for review.
4. If feedback given, revise and re-present. Loop until
   approved.
5. Save the spec file. Confirm: "Requirements approved.
   Moving to High-Level Design."

### Phase 2: High-Level Design

1. Produce the HLD section using requirements as context.
2. STOP — present for review.
3. If feedback given, revise. Loop until approved.
4. Update the spec file. Confirm: "HLD approved. Moving
   to Low-Level Design."

### Phase 3: Low-Level Design

1. Produce the LLD section using requirements + HLD
   as context.
2. STOP — present for review.
3. If feedback given, revise. Loop until approved.
4. Update the spec file. Confirm: "LLD approved. Moving
   to Task Plan."

### Phase 4: Task Plan

1. Break the LLD into implementation tasks with
   dependencies.
2. Validate tasks with tools — for each task that
   references AWS APIs, env vars, SDK methods, or CLI
   commands, verify with tools before writing.
3. STOP — present for review.
4. If feedback given, revise. Loop until approved.
5. Update the spec file. Confirm: "Spec complete! Ready
   for implement-and-review-loop."

### Phase 5: Final Summary

Present:
- Spec file path
- Requirement count (FR + NFR)
- Component count from LLD
- Task count with dependency waves
- "Run implement-and-review-loop to start building."

## Resuming an In-Progress Spec

If the user says "resume":
1. Find the most recent spec in `docs/` ending in
   `-spec.md`.
2. Determine which phase is next.
3. Pick up from there.

## Rules

- Human review gate after every phase — never
  auto-advance.
- Each phase builds on the previous.
- Every requirement must be traceable through HLD → LLD
  → at least one task.
- If a spec already exists for this feature, ask whether
  to revise it or start fresh.

Phase 0 is worth calling out. Before writing any requirements, Kiro researches the AWS services involved — checking API capabilities, SDK support, and known limitations. This prevents the painful mid-design pivot where you discover that the thing you designed doesn’t actually work the way you assumed. I’ve had that happen enough times to make research a mandatory first step.

The output is a single markdown file with requirements, design, and tasks all in one place. Every requirement traces through the design to at least one implementation task. When I hand this to the implement-and-review-loop skill, it has full context for every task it works on.
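The traceability rule (“every requirement must be traceable through HLD → LLD → at least one task”) is mechanically checkable. A minimal Python sketch, assuming requirement IDs like FR-1 appear verbatim in task text; the IDs and tasks below are invented for illustration:

```python
def untraced_requirements(requirements: list[str], tasks: list[str]) -> list[str]:
    # A requirement is traced if at least one task mentions its ID.
    return [r for r in requirements if not any(r in t for t in tasks)]

reqs = ["FR-1", "FR-2", "NFR-1"]
tasks = [
    "Task 1: build upload Lambda (FR-1)",
    "Task 2: add CloudFront caching (NFR-1)",
]
print(untraced_requirements(reqs, tasks))  # ['FR-2'] has no covering task
```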

Skill 6: Implement and Review Loop

This is where the actual coding happens, and it’s where things get agentic. The implement-and-review-loop skill chains implementation and code review in an automated cycle. It reads a task from the spec, implements it, runs a multi-agent code review, fixes the findings, and re-reviews until the code is clean.

# Implement and Review Loop

Orchestrate an implement → review → fix cycle for tasks
in a spec. Chains implementation and review in a loop
until code is clean.

THIS IS THE DEFAULT ENTRY POINT for implementation work.
When the user asks to "implement", "build", or "code",
use this skill.

## Input

A task number, "next task", or "implement all open tasks".
The spec is read from `docs/` — look for `*-spec.md`.

## Process

### Phase 1: Implement

1. Read the task from the spec.
2. Implement the changes described.
3. Run guard-rails — all build gates must pass.

### Phase 1.5: Verify Build

After implementation and before review, run guard-rails:
1. All build gates must pass (hard fail blocks the loop).
2. All test gates must pass (hard fail blocks the loop).
3. New code coverage check — flag untested public functions.
4. Secrets scan — block if detected.
5. Branch check — block if on main.

### Phase 2: Review (MANDATORY — never skip)

Invoke review agents in parallel:
- Security reviewer
- Infrastructure reviewer
- Maintainability reviewer
- Performance reviewer
- Test quality reviewer

Each finding gets a severity:
- 🔴 Must Fix — broken builds, security issues
- 🟡 Should Fix — maintainability, performance
- 🟢 Nit — style, naming (logged but not acted on)

### Phase 3: Fix (if actionable findings exist)

1. For each actionable finding (🔴 + 🟡), apply the fix.
2. Re-run the relevant test suite(s).
3. If tests fail, feed the error back and retry the fix
   (max 2 retries per finding).

### Phase 4: Re-review (if fixes were applied)

1. Run quick-review on the fix diff only.
2. If new 🔴 or 🟡 findings emerge, loop back to Phase 3.
3. Max 3 total review→fix iterations to prevent infinite
   loops. If still unresolved after 3 passes, present
   remaining findings to user for manual decision.

### Phase 5: Present Final State

STOP. Present to user:
- Summary of files created/modified
- Test count (total passing)
- Spec progress (X/Y tasks complete)
- Review iterations completed
- Any remaining findings that couldn't be auto-resolved
- Newly eligible tasks
- "Ready to commit?"

After approval, offer the full chain:
1. build-and-deploy
2. push-and-pr
3. session-handoff

### Phase 6: Commit (only after approval)

1. Stage specific files (not `git add .`).
2. Commit with descriptive message including review stats.

### Phase 7: Next Task (batch mode only)

If running "implement all", move to next eligible task
and repeat from Phase 1.

## Rules

- NEVER skip Phase 2 (review). Every task gets reviewed.
- When batching: implement one task → review → fix →
  commit → next task. Do NOT batch multiple tasks into
  a single review.
- Max 3 review→fix iterations. Escalate to human after.
- Never commit without user approval.

The review phase is the part I’m most particular about. It uses specialized review agents — separate AI personas that each focus on one aspect of code quality. The security reviewer is paranoid about auth bypasses and hardcoded secrets. The infrastructure reviewer thinks about what breaks at 3 AM. The maintainability reviewer thinks about the developer who has to modify this code six months from now.

Here’s the full set of agent definitions. Each one lives in .kiro/agents/ as a JSON file:

{
  "name": "review-security",
  "description": "Security-focused code reviewer",
  "prompt": "You are a security-focused code reviewer.
    Focus exclusively on:
    - Authentication and authorization: Are auth checks
      present and correct? Can they be bypassed?
    - Input validation: Are all user inputs validated and
      sanitized? SQL injection, XSS, command injection?
    - Secrets management: Are secrets, keys, or credentials
      hardcoded or logged?
    - Data exposure: Does the API return more data than
      necessary? Are error messages leaking internals?
    - Dependency risks: Are there known-vulnerable packages?
    - IAM permissions: Are AWS IAM policies least-privilege?
    - Encryption: Is data encrypted at rest and in transit?
    For each finding: state the risk, rate severity, suggest
    a fix. Be paranoid. Assume attackers will find every
    weakness.",
  "tools": ["fs_read", "code", "grep", "glob"]
}
{
  "name": "review-infrastructure",
  "description": "AWS and infrastructure-focused code reviewer",
  "prompt": "You are an AWS infrastructure code reviewer.
    Focus exclusively on:
    - CDK patterns: Cross-stack coupling? Correct use of
      RemovalPolicy? Stack dependency ordering?
    - IAM: Least-privilege policies? Overly broad wildcards?
      Missing condition keys?
    - Encryption: S3 encryption enabled? KMS keys where
      needed? SSL/TLS enforced?
    - Networking: Security groups too permissive? Public
      access where it shouldn't be?
    - Cost: Over-provisioned resources? Missing lifecycle
      rules? Inefficient storage classes?
    - Monitoring: Missing CloudWatch alarms or metrics?
    - Resilience: Single points of failure? Missing multi-AZ?
    - Tagging: Resources missing required tags?
    For each finding: explain the operational risk, rate
    severity, suggest a fix. Think about what breaks at
    3 AM when nobody is watching.",
  "tools": ["fs_read", "code", "grep", "glob"]
}
{
  "name": "review-maintainability",
  "description": "Maintainability-focused code reviewer",
  "prompt": "You are a maintainability-focused code reviewer.
    Focus exclusively on:
    - Code organization: Are files, classes, and methods in
      the right place?
    - Naming: Are names descriptive and consistent?
    - Separation of concerns: Are responsibilities properly
      divided?
    - DRY violations: Is there duplicated logic that should
      be extracted?
    - Error handling: Are errors handled consistently?
    - Configuration: Is config manageable across environments?
    - Documentation: Are public APIs documented?
    - Testability: Is the code structured for easy unit
      testing? Are dependencies injectable?
    For each finding: explain why it hurts maintainability,
    rate severity, suggest a refactoring. Think about the
    developer who has to modify this code 6 months from now.",
  "tools": ["fs_read", "code", "grep", "glob"]
}
{
  "name": "review-performance",
  "description": "Performance-focused code reviewer",
  "prompt": "You are a performance-focused code reviewer.
    Focus exclusively on:
    - Resource allocation: Are objects created unnecessarily
      per-request? Are HTTP clients and SDK clients reused?
    - Data fetching: N+1 problems? Missing pagination?
    - Memory: Unnecessary allocations? Missing disposal?
    - Async patterns: Blocking calls in async methods?
      Thread pool starvation risks?
    - Caching: Are there missed caching opportunities?
    - Payload sizes: Are API responses unnecessarily large?
    - Infrastructure: Over-provisioned resources? Missing
      auto-scaling?
    For each finding: explain the performance impact, rate
    severity, suggest a fix. Think about what happens at
    10x and 100x the current load.",
  "tools": ["fs_read", "code", "grep", "glob"]
}
{
  "name": "review-test-quality",
  "description": "Test quality and coverage reviewer",
  "prompt": "You are a test quality reviewer.
    Focus exclusively on:
    - Coverage gaps: Are new code paths covered by tests?
    - Edge cases: Boundary conditions? Null inputs? Empty
      collections?
    - Assertion quality: Are assertions specific? Do tests
      verify behavior or just that code doesn't throw?
    - Test isolation: Do tests depend on external state or
      ordering?
    - Mock usage: Are dependencies properly mocked?
    - Naming: Do test names describe scenario and outcome?
    - Negative tests: Are failure paths tested?
    Also flag when production code changes have NO
    corresponding test changes.
    For each finding: explain what could go undetected,
    rate severity, suggest a test to add. Assume every
    untested path will break in production.",
  "tools": ["fs_read", "code", "grep", "glob"]
}

Each agent gets read-only access to the codebase and produces structured findings. The loop collects all findings, applies fixes for the actionable ones, and re-reviews until clean. The whole cycle — implement, review with five agents, fix, re-review — happens without me touching the keyboard. I review the final summary and approve or request changes.
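The loop's control flow, with its hard cap of three review→fix iterations, can be sketched in a few lines of Python. The review and fix callables stand in for the agent invocations, and the severity strings mirror the 🔴/🟡/🟢 scheme; this is a shape sketch, not Kiro's implementation:

```python
def review_fix_loop(review, fix, max_iters: int = 3) -> dict:
    # Stop when no actionable findings remain; escalate after max_iters.
    for i in range(max_iters):
        actionable = [f for f in review()
                      if f["severity"] in ("must-fix", "should-fix")]
        if not actionable:
            return {"clean": True, "iterations": i, "escalated": []}
        for finding in actionable:
            fix(finding)
    leftover = [f for f in review()
                if f["severity"] in ("must-fix", "should-fix")]
    return {"clean": not leftover, "iterations": max_iters, "escalated": leftover}

# Simulated session: one must-fix and one nit on the first pass.
outstanding = [{"id": 1, "severity": "must-fix"},
               {"id": 2, "severity": "nit"}]
result = review_fix_loop(lambda: list(outstanding),
                         lambda f: outstanding.remove(f))
print(result["clean"], result["iterations"])  # True 1
```

Nits are logged but never trigger another fix pass, which is what lets the loop terminate instead of chasing style comments forever.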

The Compound Effect

Individually, any one of these skills saves time. But the real value is in how they chain together. A typical feature development session looks like this:

  1. Resume — load context from last session
  2. Create spec — requirements → design → tasks (with human gates)
  3. Implement and review — code → 5-agent review → fix → re-review
  4. Handoff — save state for next session

Each skill produces artifacts that the next skill consumes. The spec feeds the implementation loop. The implementation loop produces code that gets reviewed. The handoff captures everything for the resume. It’s a pipeline, and each stage is a reusable, improvable component.

I’ve been running this workflow on a real project for several months now. The skills have been refined through actual use — every time something goes wrong or a step is missing, I fix the skill. They’re not theoretical. They’re battle-tested.

Getting Started

If you want to try this yourself, start small. Don’t try to build the whole pipeline at once. Pick one workflow that you repeat often — maybe it’s how you write commit messages, or how you review PRs, or how you set up a new feature branch. Write a skill for that one thing. Use it a few times. Improve it. Then add another.

The Kiro skills documentation covers the format and conventions. But, honestly, you can just have Kiro start writing skills for you. And then you improve them over time. Skills are just markdown files in your repo — no special tooling beyond Kiro itself.

The best skills come from real friction, not from imagining what might be useful. Pay attention to the moments where you’re explaining the same process to your AI assistant for the third time. That’s a skill waiting to be written.

Until next time — go build something cool!

Keith