Usage
software-plan-reviewallowed-skills#agent-creator
Configuration
read_file, glob, grep
Instructions
Overview
software-plan-creation
Expected Plan Structure
software-plan-creation
- Title and Summary: what is being delivered and the value it provides
- Design Reference: the source design name and path for traceability
- Prerequisites: dependencies, access, and open questions that must be resolved before work begins
- Phases: numbered phases, each containing:
  - Scope: what the phase delivers
  - Tasks: numbered, concrete, actionable tasks
  - Testing: testing approach for the phase, including TDD slices when applicable
  - Exit criteria: what must be true when the phase is complete
- Testing Strategy: overall approach covering unit, integration, and end-to-end testing, with TDD guidance where applicable
- Documentation: what documentation is needed and when
- Risks and Mitigations: table with Risk, Impact, and Mitigation columns
- Definition of Done: checklist of conditions for the entire plan to be considered complete
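The expected structure above lends itself to a mechanical first pass before the qualitative review. The following is a minimal sketch, assuming plans are markdown files whose sections appear as headings containing the section name and whose phases use headings starting with "Phase". Both conventions, and the name `missing_sections`, are assumptions made for illustration and are not part of the skill.

```python
import re

# Section names from the expected plan structure. Matching them against
# markdown headings is an assumption about how plans are written.
EXPECTED_SECTIONS = [
    "Summary",
    "Design Reference",
    "Prerequisites",
    "Testing Strategy",
    "Documentation",
    "Risks and Mitigations",
    "Definition of Done",
]

# Matches ATX-style markdown headings such as "## Prerequisites".
HEADING = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)


def missing_sections(plan_text: str) -> list[str]:
    """Return expected sections that have no matching heading in the plan."""
    headings = [h.lower() for h in HEADING.findall(plan_text)]
    missing = [
        section
        for section in EXPECTED_SECTIONS
        if not any(section.lower() in h for h in headings)
    ]
    # Phases are assumed to use headings like "Phase 1: ...".
    if not any(re.match(r"phase\b", h) for h in headings):
        missing.append("Phases")
    return missing
```

A missing heading is only a prompt to look closer: the reviewer still has to judge whether the absence is justified or is a gap.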
Steps
Identify the plan to review:
- accept a plan name, a stored plan reference, or plan text pasted in the conversation
- restate the plan's apparent purpose, the design it covers, and the scope of work
- if the input is fragmented, reconstruct the main proposal before critiquing it
Resolve the plan artifact:
- if the full plan is already present in the conversation, review that text directly; do not force retrieval through plan tools
- if the user refers to a stored plan by name or asks to review an existing plan, use glob with pattern .stencila/plans/*.md to locate candidates
- use read_file to load the selected stored plan
- if multiple similarly named plans exist, list candidates, compare them, and review the one that best matches the user's request
Resolve the corresponding design (when available):
- use glob with pattern .stencila/designs/*.md to find the design, then read_file to load the design specification that the plan is based on
- if the plan includes a "Design Reference" section, use the name or path it provides to locate the design
- reviewing a plan against its source design is essential for checking coverage, alignment, and whether acceptance criteria are fully addressed
- if no corresponding design exists or it cannot be identified, note this as a limitation and review the plan on its own merits
Understand the plan before judging it:
- summarize the planned work, its phases, and the overall delivery strategy in plain language
- identify the stated goals, prerequisites, phasing strategy, testing approach, and definition of done
- note any missing context that materially limits confidence in the review
Evaluate structure and completeness against the expected plan format:
- check whether each expected section is present and well-formed
- check that the summary accurately reflects the design's goals
- check that the design reference is present and identifies the source design clearly enough to trace back
- check that prerequisites include open questions from the design
- flag missing sections and assess whether their absence is justified or is a gap
Evaluate task breakdown and granularity:
- assess whether tasks are concrete and actionable: each task should be something a developer can start and finish
- flag tasks that are too vague (e.g., "implement the feature") or too granular (e.g., individual lines of code)
- check whether each phase has a clear deliverable and exit criterion
- assess whether the number of phases is appropriate: not artificially inflated for simple work, and not too few for complex work
- check whether every acceptance criterion from the design maps to at least one task
Evaluate phasing and sequencing:
- check whether phases are ordered so that foundational work precedes dependent work
- identify circular dependencies or phases that depend on work from a later phase
- assess whether vertical slices (end-to-end functionality) are preferred over horizontal slices (all models, then all APIs, then all UI) where appropriate
- flag cases where parallelizable work is unnecessarily serialized
- check that each phase produces a working, testable increment
Evaluate testing strategy:
- assess whether the overall testing approach is realistic for the technology and architecture described
- check whether testing is specified for each phase, not just at the plan level
- check whether unit, integration, and end-to-end testing are covered as appropriate for the work
- flag phases that have no testing approach specified
Evaluate TDD slices (when TDD is used):
- check whether TDD slices follow logically coherent red-green-refactor cycles: each slice should cover one meaningful behavior or a tightly related behavior cluster, implement just enough to pass, then refactor
- flag slices that batch too many unrelated tests before implementing (this dilutes test intent and overwhelms the implementation step)
- flag slices that are overly micro-sliced into trivial assertions or mechanical follow-ups where workflow overhead would dominate the useful work
- check whether slices are well-sequenced and build incrementally on each other
- check whether slice descriptions are specific enough to act on (e.g., "write a failing test that parse returns AuthError::MalformedToken for an empty string" is good; "write tests for parsing" is too vague)
- assess whether slice boundaries align with behavior boundaries, dependency boundaries, risk boundaries, or meaningful review checkpoints
- suggest merging adjacent slices when they are too small and tightly related, or splitting slices when they are too broad and mix loosely related behaviors
- if TDD is proposed for work where it is a poor fit (exploratory, UI-heavy, hard-to-mock external dependencies), flag this and suggest an alternative
Evaluate risks and mitigations:
- check whether risks from the source design are acknowledged and addressed in the plan
- identify risks the plan introduces that were not in the design (e.g., risky sequencing, single points of failure, missing expertise)
- assess whether mitigations are concrete rather than generic
- if the plan uses the expected table format (Risk | Impact | Mitigation), check that the Impact column is meaningful and mitigations are actionable
- check whether the hardest parts of the plan are surfaced rather than hidden
Evaluate definition of done:
- check whether the definition of done is present, specific, and verifiable
- check whether it aligns with the design's acceptance criteria
- flag checklist items that are vague (e.g., "code is good") or missing (e.g., no mention of tests passing, documentation, or code review)
Evaluate alignment with the source design:
- check whether the plan's scope matches the design's scope: not broader, not narrower
- verify that the plan does not silently drop design requirements or acceptance criteria
- verify that the plan does not introduce scope that was explicitly out of scope in the design
- check whether open questions from the design are handled as prerequisites or risks in the plan
Evaluate documentation tasks:
- check whether documentation tasks are included in the plan
- assess whether documentation is integrated into phases rather than deferred to the end
- check whether the plan specifies what kind of documentation is needed (inline docs, API docs, user-facing docs, architecture decision records)
Produce a structured review report following the Report Format below.
Distinguish facts from uncertainty:
- clearly label assumptions made during the review
- separate definite problems from possible risks or questions that need confirmation
- avoid inventing system facts that are not supported by the plan or its source design
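The plan-resolution step above relies on the agent's glob and read_file tools. Outside the agent, the same lookup can be approximated with the standard library. This is a minimal sketch, assuming stored plans live under `.stencila/plans/` in a workspace directory and that matching on the file name is good enough to build a candidate list. The function name `find_plan_candidates` is hypothetical.

```python
from pathlib import Path


def find_plan_candidates(workspace: Path, name: str) -> list[Path]:
    """List stored plans whose file name contains `name` (case-insensitive)."""
    plans_dir = workspace / ".stencila" / "plans"
    if not plans_dir.is_dir():
        # No stored plans at all: the caller should fall back to pasted text.
        return []
    needle = name.lower()
    return sorted(p for p in plans_dir.glob("*.md") if needle in p.stem.lower())
```

When this returns more than one path, the step above still applies: list the candidates, compare them, and review the one that best matches the request.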
Review Checklist
Summary and Design Reference
- Is there a clear summary of what is being delivered and why?
- Is the source design identified with name and path so the plan can be traced back?
- Does the summary accurately reflect the design's goals?
Prerequisites
- Are prerequisites clearly stated?
- Are open questions from the design listed as prerequisites or risks?
- Are external dependencies, access requirements, or decisions that must be resolved before work begins identified?
Task Breakdown and Granularity
- Are tasks concrete, actionable, and appropriately scoped?
- Could a developer pick up any task and understand what to do?
- Are there tasks that are too vague to estimate or begin?
- Are there tasks that are so granular they add noise rather than clarity?
- Does every acceptance criterion from the design map to at least one task?
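The last question above, whether every acceptance criterion maps to at least one task, is a coverage check that can be made explicit. The sketch below assumes the reviewer has already noted, for each task, which criteria it addresses. The identifiers such as "AC-1" and the function name are hypothetical, since neither the design nor the plan format requires them.

```python
def uncovered_criteria(criteria: list[str], task_coverage: dict[str, list[str]]) -> list[str]:
    """Return acceptance criteria that no task claims to address.

    `task_coverage` maps a task label to the criteria the reviewer judged it covers.
    """
    covered = {c for addressed in task_coverage.values() for c in addressed}
    # Preserve the design's ordering so findings read in the same order.
    return [c for c in criteria if c not in covered]


# Hypothetical example: AC-3 is dropped by the plan.
gaps = uncovered_criteria(
    ["AC-1", "AC-2", "AC-3"],
    {"1.1 Parse token": ["AC-1"], "2.1 Add middleware": ["AC-1", "AC-2"]},
)
```

Each uncovered criterion is a candidate "Design coverage" finding, subject to confirming that the omission is not intentional.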
Phasing and Sequencing
- Does each phase produce a working, testable increment?
- Are phases ordered so no phase depends on work from a later phase?
- Is the number of phases appropriate for the complexity of the work?
- Are vertical slices preferred over horizontal slices where appropriate?
- Are there opportunities to parallelize work that the plan serializes?
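The ordering question above can be checked mechanically once the reviewer has written down which phases each phase depends on. This is a minimal sketch under that assumption. The input shape and the name `forward_dependencies` are illustrative, and extracting the dependencies from the plan text remains a manual judgment.

```python
def forward_dependencies(phases: list[tuple[str, list[str]]]) -> list[tuple[str, str]]:
    """Return (phase, dependency) pairs where the dependency is not an earlier phase.

    `phases` is the plan's phase list in order, each with the phases it depends on.
    A phase depending on itself or on a later phase is reported. Dependencies that
    are not phases of this plan are ignored, on the assumption that they are
    external prerequisites.
    """
    position = {name: index for index, (name, _) in enumerate(phases)}
    problems = []
    for index, (name, depends_on) in enumerate(phases):
        for dep in depends_on:
            if dep in position and position[dep] >= index:
                problems.append((name, dep))
    return problems


# Hypothetical plan: Phase 2 needs work that is only delivered in Phase 3.
issues = forward_dependencies(
    [("Phase 1", []), ("Phase 2", ["Phase 3"]), ("Phase 3", ["Phase 1"])]
)
```

Because phases are listed in delivery order, any circular dependency necessarily shows up as at least one reported pair.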
Testing Strategy
- Is the testing approach specified for each phase?
- Is the overall testing strategy realistic for the architecture?
- Are unit, integration, and end-to-end testing covered as appropriate?
- If TDD is used, are slices logically coherent and incremental?
- If TDD is used, does each slice specify what test to write, what to implement, and what to refactor?
- If TDD is used, do slices avoid batching many tests before implementing?
- If TDD is used, do slices also avoid low-value micro-slicing where several adjacent slices should really be one behavior-oriented unit?
- If TDD is used, are slice boundaries justified by behavior, dependency, subsystem, or risk boundaries?
- If TDD is not used, is the alternative clearly described?
- Are there phases with no testing approach specified?
Documentation
- Are documentation tasks included in the plan?
- Is it clear what documentation is needed and when it should be written?
- Are documentation tasks integrated into phases rather than deferred to the end?
Risks and Mitigations
- Are risks from the source design addressed or acknowledged?
- Does the plan identify risks it introduces (sequencing risks, dependency risks, etc.)?
- Are mitigations concrete and actionable rather than generic?
- Are the hardest parts of the plan surfaced rather than hidden?
- If using the table format, are Impact values meaningful?
Definition of Done
- Is there a clear definition of done for the entire plan?
- Does the definition of done align with the design's acceptance criteria?
- Is the definition of done specific and verifiable?
- Does it cover tests passing, documentation, and code review?
Feasibility and Actionability
- Is the plan realistic given constraints, dependencies, and delivery goals?
- Could a team use this plan to estimate, assign, and begin work?
- Are open questions clearly separated from settled decisions?
- Are next changes to the plan obvious from the review?
Report Format
Overall Assessment
Strengths
Findings
- Structure and completeness: missing or malformed sections
- Design coverage: dropped requirements, added scope, or misalignment with the source design
- Task breakdown and granularity: vague, oversized, or overly granular tasks
- Phasing and sequencing: ordering issues, dependency problems, artificial phasing
- Testing strategy: missing coverage, unrealistic approaches, TDD slice problems
- Documentation: missing or deferred documentation tasks
- Risks and mitigations: unaddressed risks, generic mitigations
- Definition of done: missing, vague, or misaligned completion criteria
- indicate severity as High, Medium, or Low
- describe the issue precisely, referencing the specific phase, task, or section
- explain why it matters
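The per-finding fields above (severity, a precise description tied to a phase, task, or section, and why it matters) map naturally onto a small record type. The following is one possible sketch. The field names and the choice to list High findings first are assumptions for illustration, not requirements of the report format.

```python
from dataclasses import dataclass

# Lower rank sorts first. The ordering is an assumption, not part of the format.
SEVERITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Finding:
    category: str        # e.g. "Testing strategy"
    severity: str        # "High", "Medium", or "Low"
    location: str        # the specific phase, task, or section
    issue: str           # precise description of the problem
    why_it_matters: str  # consequence if left unaddressed


def by_severity(findings: list[Finding]) -> list[Finding]:
    """Order findings High to Low, keeping the original order within a level."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])


# Hypothetical findings for illustration only.
report = by_severity([
    Finding("Documentation", "Low", "Phase 3", "No API docs task", "Docs drift from code"),
    Finding("Testing strategy", "High", "Phase 2", "No testing approach", "Regressions go unnoticed"),
    Finding("Definition of done", "Medium", "Definition of Done", "Omits code review", "Unreviewed code ships"),
])
```

An unrecognized severity raises a `KeyError`, which is deliberate here: it surfaces findings that do not use one of the three levels.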
Recommendations
Open Questions
Examples
- a structured critique covering each checklist dimension
- feedback on sequencing (e.g., token validation should precede middleware integration)
- assessment of whether all design acceptance criteria are covered by plan tasks
- evaluation of TDD slice quality (e.g., slices in Phase 1 are well-scoped but Phase 3 batches too many tests in a single slice, or Phase 2 is fragmented into micro-slices that should be merged)
- prioritized recommendations such as adding exit criteria to Phase 2, rebalancing TDD slices in Phase 3, and adding a risk entry for third-party OAuth provider downtime
- a review highlighting strengths in phasing and TDD approach
- finding that the Design Reference section is missing, making traceability difficult
- warnings about missing documentation tasks, vague Phase 3 tasks, and an unaddressed open question from the design about message retry limits
- finding that the Definition of Done omits documentation and code review
- concrete suggestions for tightening task descriptions, reordering phases to reduce dependency risk, and adding integration test coverage
Edge Cases
- Very short or partial plan: Do not refuse. Review what exists, identify the most important missing sections, and state the confidence limits caused by missing detail.
- Mostly good plan with a few weak spots: Preserve strengths in the review instead of rewriting the whole plan as if it were poor. Keep the critique proportional.
- Plan without a source design: Review the plan on its own merits. Note that the absence of a source design limits the ability to check coverage and alignment. Recommend creating a design if the plan covers complex or ambiguous work.
- Plan that diverges from its source design: Call out the divergence explicitly (added scope, dropped requirements, or changed assumptions) and recommend reconciling the plan with the design or updating the design to reflect intentional changes.
- Plan with no testing strategy: Flag this as a high-severity finding and suggest what types of testing should be specified based on the nature of the work.
- Plan with TDD slices that are too broad: Flag slices that batch many tests or cover too much scope. Suggest narrowing each slice to one meaningful behavior or tightly related behavior cluster with a clear red-green-refactor cycle.
- Plan with TDD slices that are too narrow: Flag runs of trivial or highly adjacent slices whose boundaries add more workflow overhead than clarity. Suggest merging them into fewer behavior-oriented slices.
- Plan with artificial phasing: If the work is simple enough for a single phase but the plan breaks it into many phases, recommend consolidation and explain why fewer phases would be more effective.
- Plan with no definition of done: Flag this as a finding and suggest a checklist derived from the design's acceptance criteria plus standard items (tests passing, documentation written, code reviewed).
- Plan pasted in conversation: Review the text directly. Do not insist on locating a stored plan artifact when the content is already available.
- Review drifting into plan creation: Suggest structural changes and task improvements when helpful, but keep the primary deliverable as critique and revision guidance rather than producing a replacement plan.
TDD Slice Sizing Review Heuristics
Signs slices are too narrow
- multiple adjacent slices touch the same code area and acceptance criterion with only tiny assertion-level differences
- the plan separates behavior that would naturally be tested and implemented together
- several slices appear to exist only because of implementation order, not because of meaningful behavior or risk boundaries
- the likely workflow overhead per slice would be disproportionate to the value of the separation
Signs slices are about right
- each slice delivers one meaningful behavior or one tightly related behavior cluster
- each slice has a plausible Red, Green, and Refactor loop with localized feedback if something fails
- the slice descriptions are concrete enough to act on without prescribing every tiny assertion as its own slice
- the sequence builds confidence incrementally without unnecessary handoff overhead
Signs slices are too broad
- one slice spans several loosely related behaviors or acceptance criteria
- a single slice crosses package, subsystem, or architectural boundaries without a strong reason
- many unrelated tests would need to be written before implementation can begin
- failure or review feedback would likely be diffuse and hard to resolve in one iteration
.stencila/skills/software-plan-review/SKILL.md