Agent-Based Coding Experiment 3

2025-08-03 (modified: 2025-09-16)
Author: AK

A continuation of Agent-Based Coding Experiment 1 and Experiment 2. Written up around August 2, 2025.

What to improve from the previous experiments

In Agent-Based Coding Experiment 2, I created a document called the “Business and Product Requirements Document”, or “BPRD”. I semi-formalized the document so that its traceability could be analyzed.

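Building a machine-analyzable document and analyzing its traceability worked well in itself, but I never managed to connect it all the way to the actual code. It lacked, for example, a check for whether an automated acceptance test exists for each item in the spec, and a check for code whose path does not lead back to any business objective (unjustified code).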

New name: S4

I defined a process called the “AI-Driven Software Development Process” and created a document that concisely captures the key information this process needs: the “Semi-Structured Software Specification”, or “S4” for short. It is an extended version of BPRD.

I also built s4, a CLI tool that analyzes the internal consistency of an S4 document, the traceability between S4 and the code, and so on.

Internal consistency of the spec and spec-code consistency

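The s4 validate command checks whether the spec is internally consistent. It checks that feature dependencies contain no cycles, that no business objective is left without a feature (and, conversely, that no feature exists without a business objective), and that every feature is covered by at least one acceptance test.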
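
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the actual s4 implementation: the `Spec` shape is a simplified assumption modeled on the S4 fields (`covers`, `prerequisites`), and `findCycle` and `validate` are hypothetical names.

```typescript
// Hypothetical, simplified shapes; the real S4 schema has more fields.
interface Feature {
  id: string;
  covers: string[]; // business objective IDs
  prerequisites?: string[]; // feature IDs
}
interface AcceptanceTest {
  id: string;
  covers: string; // feature ID
}
interface Spec {
  businessObjectives: { id: string }[];
  features: Feature[];
  acceptanceTests: AcceptanceTest[];
}

// Depth-first search; returns the first dependency cycle found, or null.
function findCycle(features: Feature[]): string[] | null {
  const byId = new Map<string, Feature>();
  for (const f of features) byId.set(f.id, f);
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (id: string): string[] | null => {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") {
      // We came back to a node on the current path: that is a cycle.
      return [...path.slice(path.indexOf(id)), id];
    }
    state.set(id, "visiting");
    path.push(id);
    for (const dep of byId.get(id)?.prerequisites ?? []) {
      if (!byId.has(dep)) continue; // unknown references are a separate check
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  };

  for (const f of features) {
    const cycle = visit(f.id);
    if (cycle) return cycle;
  }
  return null;
}

// Collects every issue instead of stopping at the first one.
function validate(spec: Spec): string[] {
  const issues: string[] = [];
  const cycle = findCycle(spec.features);
  if (cycle) issues.push(`Circular dependency: ${cycle.join(" -> ")}`);

  const coveredObjectives = new Set<string>();
  for (const f of spec.features) {
    for (const bo of f.covers) coveredObjectives.add(bo);
  }
  for (const bo of spec.businessObjectives) {
    if (!coveredObjectives.has(bo.id)) {
      issues.push(`${bo.id} is not covered by any feature`);
    }
  }

  const testedFeatures = new Set<string>();
  for (const t of spec.acceptanceTests) testedFeatures.add(t.covers);
  for (const f of spec.features) {
    if (f.covers.length === 0) issues.push(`${f.id} covers no business objective`);
    if (!testedFeatures.has(f.id)) issues.push(`${f.id} has no acceptance test`);
  }
  return issues;
}
```

Returning a list of issues rather than throwing on the first one matches the idea that a single run should report everything that needs fixing.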

Running s4 status reports whether the spec and the code agree (judged by the acceptance tests) and what problems remain after running all registered checks. When there are several problems, it says which one to fix first and what to do to fix it.
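
One way to picture the spec-code comparison: take the acceptance test IDs declared in the spec, take the IDs a test lister prints in the form "AT-0001: TITLE", and compute what is missing on each side. The sketch below assumes that line format; `compareSpecWithTests` and `SyncReport` are hypothetical names, not part of s4.

```typescript
interface SyncReport {
  missing: string[]; // in the spec, but no test file was listed
  dangling: string[]; // test file listed, but not in the spec
}

// `listing` is the text a test lister prints, one "AT-0001: TITLE" per line.
function compareSpecWithTests(specIds: string[], listing: string): SyncReport {
  const found = new Set<string>();
  for (const line of listing.split(/\r?\n/)) {
    const m = /^(AT-\d{4}):/.exec(line.trim());
    if (m && m[1]) found.add(m[1]);
  }
  const wanted = new Set(specIds);
  return {
    missing: specIds.filter((id) => !found.has(id)),
    dangling: Array.from(found).filter((id) => !wanted.has(id)),
  };
}
```

A missing entry means a test still has to be written; a dangling entry means a test file exists that the spec no longer justifies.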

Static checks and tests

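A problem that comes up often in agent-based coding is that code quality degrades. Giving the agent a range of tools improves this considerably. How smart the model becomes matters, but building a cognitive niche for the model matters a great deal too.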

For details, see the “Enforcing the quality of AI-generated code” section of Source Code Quality in the AI Era.

Calling external tools

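How these tools are used differs from project to project, so the s4 CLI cannot implement all of them itself. Instead, I added extension points to the spec. s4 can call an external tool and pass its result through as-is, or, where needed, an adapter can parse the output and make use of it. For example, finding out which acceptance tests in the spec actually pass requires parsing the acceptance test run output (standard protocols such as TAP make this easy).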
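
As an illustration of such an adapter, here is a minimal sketch that reads TAP test lines ("ok N - description" / "not ok N - description") and maps them to acceptance test IDs. It assumes each test description contains the ID in the AT-#### form (for example as part of the file path); `parseTap` and `AtResult` are hypothetical names, and TAP directives such as SKIP or TODO are ignored.

```typescript
type AtResult = { id: string; passed: boolean };

// Parses TAP test lines such as "ok 1 - ... AT-0001 ..." or "not ok 2 - ...".
// Assumption: each test description contains the acceptance test ID (AT-####).
function parseTap(output: string): AtResult[] {
  const results: AtResult[] = [];
  for (const line of output.split(/\r?\n/)) {
    const m = /^(not ok|ok)\b\s*\d*\s*(?:-\s*)?(.*)$/.exec(line.trim());
    if (!m) continue; // version line, plan line, diagnostics, etc.
    const id = /AT-\d{4}/.exec(m[2] ?? "");
    if (!id) continue; // a test line that does not name an acceptance test
    results.push({ id: id[0], passed: m[1] === "ok" });
  }
  return results;
}
```

With results in this shape, per-feature completion stats reduce to grouping the IDs by the feature each acceptance test covers.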

External tools that connect the spec to the code are registered in the connectors section.

connectors:
  listAcceptanceTests: |- # Prints ATs in the format: "AT-0001: TITLE"
    npx vitest list --project=acceptance | awk -F'> ' '{split($1, parts, "/"); id=parts[length(parts)]; print id ": " $2}' | sed 's/\.test\.ts : /: /'
  locateAcceptanceTest: |-
    echo "src/at/{ID}.test.ts"
  runAcceptanceTest: |-  # TAP-flat format
    npm run --silent test:acceptance -- src/at/{ID}.test.ts
  runAcceptanceTests: |-  # TAP-flat format
    npm run --silent test:acceptance

All other tools (the static checks and tests) are registered under tools.

tools:
  - id: biome
    command: npm run --silent check:biome
  - id: tsc
    command: npm run --silent check:tsc
  - id: eslint
    command: npm run --silent check:eslint
  - id: depcruise
    command: npm run --silent check:depcruise
  - id: knip
    command: npm run --silent check:knip
  - id: jscpd
    command: npm run --silent check:jscpd
  - id: unittest
    command: npm run --silent test:coverage
  - id: tloc
    command: npm run --silent check:tloc
    recommendedNextActions: |-
      Test-to-production code ratio is too
      high. Simplify tests while keeping test
      coverage about the same. Run the following
      command to check the longest test
      files:

      > tokei -f -s code | grep "test.ts" | head -n 10

      Note: Comments and empty lines are not counted
      in the code calculation, so there is no need
      to remove them.
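
A rough sketch of how a runner might consume a tools list like the one above: run each command in the declared order, attach the tool's `recommendedNextActions` when it fails, and optionally stop at the first failure. The `stopOnError` flag is taken from the acceptance tests in the S4 document further down; `runTools`, `ToolReport`, and the injected `exec` callback are hypothetical, and a real runner would execute the command in a shell and return its exit code.

```typescript
interface Tool {
  id: string;
  command: string;
  stopOnError?: boolean;
  recommendedNextActions?: string;
}
interface ToolReport {
  id: string;
  exitCode: number;
  advice: string | undefined; // only set when the tool failed
}

// `exec` is injected so the sketch stays independent of any process API.
function runTools(tools: Tool[], exec: (command: string) => number): ToolReport[] {
  const reports: ToolReport[] = [];
  for (const tool of tools) {
    const exitCode = exec(tool.command);
    const failed = exitCode !== 0;
    reports.push({
      id: tool.id,
      exitCode,
      advice: failed ? tool.recommendedNextActions : undefined,
    });
    if (failed && tool.stopOnError) break; // later tools are skipped
  }
  return reports;
}
```

Keeping the advice text in the spec rather than in the runner lets each project tell the agent what a failure of that particular tool means.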

Process

Running s4 status tells the AI agent what to do next. Roughly, it works like this:

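If any acceptance tests are failing, it instructs the agent to fix them.
(If the check above passes) If the spec has internal consistency problems, it instructs the agent to fix them.
(If the check above passes) If the spec and the code disagree, it instructs the agent to fix the code.
(If the check above passes) It runs the user-defined tools (linter, type checker, unit tests, dependency checks, dead code detection) and instructs the agent to fix whatever they report.
(If the check above passes) There is nothing left to do, so it tells the agent to call the human.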
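
The priority order above can be sketched as a simple cascade in which the first non-empty category wins. `ProjectState` and `recommendNextAction` are hypothetical names, and the message texts are made up for illustration; the actual output of s4 status is not reproduced here.

```typescript
interface ProjectState {
  failingAcceptanceTests: string[];
  specIssues: string[];
  specCodeMismatches: string[];
  failingTools: string[];
}

// Returns a single instruction: the highest-priority problem, or a hand-off.
function recommendNextAction(s: ProjectState): string {
  if (s.failingAcceptanceTests.length > 0) {
    return `Fix failing acceptance test: ${s.failingAcceptanceTests[0]}`;
  }
  if (s.specIssues.length > 0) {
    return `Fix spec inconsistency: ${s.specIssues[0]}`;
  }
  if (s.specCodeMismatches.length > 0) {
    return `Fix the code to match the spec: ${s.specCodeMismatches[0]}`;
  }
  if (s.failingTools.length > 0) {
    return `Fix the problem reported by tool: ${s.failingTools[0]}`;
  }
  return "Nothing left to do. Ask the human what to build next.";
}
```

Because the function is stateless, the agent can simply rerun it after every change and follow whatever it returns.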

The human, in consultation with the AI, adds features and acceptance tests to the spec. The AI then runs s4 status on its own, finding and carrying out the next task each time.

Every output includes a “Recommended Next Actions” section, which suggests, in context, which tool to call and how, or what to do next. This is similar to HATEOAS. (See: Stateless AI-human interaction)

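This approach can also be seen as a kind of recursive neuro-symbolic inversion of control pattern: the agent decides whether to call a tool, the tool performs the kind of work LLMs are poor at, and it then hands the kind of work tools are poor at back to the agent as concrete instructions, triggering the agent's next step.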

Dogfooding

Once s4 was far enough along, I was able to write a spec for this project itself and use s4 to drive its development. (See: Dogfooding)

Here is the S4 document for the s4 development project:

title: >-
  S4: A Framework for AI-Driven Software Development
mission: >-
  Enable an AI-driven yet human-steerable software development methodology by unifying specifications, tools, and processes into a cohesive framework.
vision: >-
  A world where the complexity of software creation is automated away, allowing innovators to translate their vision into working software effortlessly and reliably.
businessObjectives:
  - id: BO-0001
    description: >-
      Enable AI agents to work more effectively by providing a [[Semi-Structured Software Specification]] and tools optimized for [[AI-Driven Development]].
  - id: BO-0002
    description: >-
      Ensure consistent understanding between humans and AI agents through [[Specification as Single Source of Truth]] that eliminates ambiguity and misalignment.
  - id: BO-0003
    description: >-
      Maintain complete [[Traceability]] from business goals to implementation, ensuring every line of code serves a documented business purpose.
# MARK: Concepts
concepts:
  - id: Semi-Structured Software Specification
    description: >-
      A structured document format that describes software projects with clear [[Business Objective]]s, [[Feature]]s, and [[Acceptance Test]]s. It is designed to be both human-readable and machine-parseable. AI coding agents use this format to understand requirements and implement the [[Acceptance Test]]s and [[Feature]]s. Agents can also use CLI tools to validate the spec and keep it synchronized with the implementation.
  - id: Business Objective
    description: >-
      A high-level goal that the business wants to achieve, identified by a "BO-####" ID format, representing the "why" behind feature development and providing context to AI agents and human stakeholders.
  - id: Feature
    description: >-
      A concrete functionality that will be built to achieve business objectives, identified by a "FE-####" ID format, representing the "what" to be built and enabling AI agents and human stakeholders to understand the implementation scope.
  - id: Acceptance Test
    description: >-
      A specific, testable condition that a feature must meet to be considered complete, identified by an "AT-####" ID format, written in Given-When-Then style. AI agents and human stakeholders use this to understand the acceptance criteria for a feature.
  - id: Traceability
    description: >-
      The ability to track and verify the relationships between [[Business Objective]]s, [[Feature]]s, [[Acceptance Test]]s, and actual implementation code, ensuring that every business objective is covered by at least one feature, every feature is covered by at least one acceptance test, every acceptance test is implemented in code, and all dependencies are properly documented and validated throughout the entire AI-driven development lifecycle.
  - id: Human-in-the-Loop
    description: >-
      A development process in which AI agents autonomously execute development tasks while human stakeholders provide validation, oversight, and decision-making at critical junctures, ensuring business alignment and quality control.
  - id: AI-Driven Development
    description: >-
      A software development methodology where AI agents drive the entire development process from spec interpretation to implementation and testing, with humans providing strategic direction and validation in a [[Human-in-the-Loop]] manner.
  - id: Specification as Single Source of Truth
    description: >-
      The [[Semi-Structured Software Specification]] serves as the authoritative source for all project requirements, design decisions, and development guidance. Code is a derived artifact that must conform to and implement the spec. When discrepancies arise between the spec and the code, the spec takes precedence and the code must be modified to align with it.
# MARK: Features
features:
  - id: FE-0001
    title: Read spec from file
    description: >-
      The CLI accepts a `--spec` option that specifies the path to the spec file, defaulting to `s4.yaml` in the current directory. This lets users work with spec files directly instead of stdin.
    covers: [BO-0001, BO-0002]
  - id: FE-0002
    title: Check internal consistency of the spec
    description: >-
      The CLI performs internal consistency validation by checking that specifications provided via the `--spec` option follow required structural rules, including proper ID formats, complete [[Traceability]] across [[Business Objective]]s, [[Feature]]s, and [[Acceptance Test]]s, and the absence of circular dependencies. Error messages guide the reader (AI or human) to modify the spec to resolve inconsistencies.
    covers: [BO-0002, BO-0003]
    prerequisites: [FE-0001]
  - id: FE-0003
    title: Show overall status of the project with comprehensive context and recommended next actions
    description: >-
      The CLI analyzes the current state of the project and provides actionable guidance on the next actions. It first shows a summarized context containing the project title, mission, vision, business objectives, and a list of features with completion stats based on passing acceptance tests. After this summary, it recommends specific next steps, helping AI agents and humans prioritize their work by addressing validation, synchronization, tests, and code quality issues in order of priority.
    covers: [BO-0001, BO-0002, BO-0003]
    prerequisites: [FE-0002, FE-0010]
  - id: FE-0004
    title: Display detailed feature information
    description: >-
      The CLI can display detailed information about a specific feature including its title, description, covered business objectives, prerequisites, dependent features, and acceptance tests in a well-formatted markdown output.
    covers: [BO-0001, BO-0002]
    prerequisites: [FE-0002]
  - id: FE-0005
    title: Display detailed acceptance test information
    description: >-
      The CLI can display detailed information about a specific acceptance test including its description, covered features, and related business objectives in a well-formatted markdown output.
    covers: [BO-0001, BO-0002]
    prerequisites: [FE-0002]
  - id: FE-0008
    title: Locate acceptance test files
    description: >-
      The CLI can locate acceptance test files by ID using the tools configuration, helping developers quickly find their implementations.
    covers: [BO-0001]
    prerequisites: [FE-0002]
  - id: FE-0009
    title: Run individual acceptance tests
    description: >-
      The CLI can execute individual acceptance tests by ID using the tools configuration, enabling targeted test execution during development.
    covers: [BO-0001]
    prerequisites: [FE-0002]
  - id: FE-0010
    title: Run all acceptance tests
    description: >-
      The CLI can execute all acceptance tests defined in a spec using the tools configuration, enabling comprehensive test execution.
    covers: [BO-0001]
    prerequisites: [FE-0002]
  - id: FE-0013
    title: Run user-defined tools
    description: >-
      The CLI can execute tools defined in the `tools` configuration, enabling developers to run custom scripts or commands as part of the development process.
    covers: [BO-0003]
    prerequisites: [FE-0002]
  - id: FE-0014
    title: Guide spec authoring from the ground up
    description: >-
      The CLI can guide AI agent and human spec authors to write a spec that is easy to understand and maintain.
    covers: [BO-0001, BO-0002]
    prerequisites: [FE-0002]
# MARK: Acceptance Tests
acceptanceTests:
  - id: AT-0001
    covers: FE-0001
    given: a spec file
    when: the user runs "s4 validate --spec path-and-filename"
    then: the spec is read from the file
  - id: AT-0042
    covers: FE-0001
    given: a spec file "s4.yaml" exists in the current directory
    when: the user runs "s4 validate" without specifying --spec
    then: the system reads from "s4.yaml" by default
  - id: AT-0050
    covers: FE-0001
    given: a non-existent spec file
    when: the user runs "s4 validate --spec nonexistent.yaml"
    then: the command fails with an error message indicating the file does not exist
  - id: AT-0004
    covers: FE-0001
    given: a spec in YAML format
    when: the user runs "s4 validate --spec spec.yaml --format yaml"
    then: the spec is successfully parsed and validated
  - id: AT-0005
    covers: FE-0001
    given: a spec in JSON format
    when: the user runs "s4 validate --spec spec.json --format json"
    then: the spec is successfully parsed and validated
  - id: AT-0002
    covers: FE-0002
    given: a spec with multiple structural issues
    when: the user runs "s4 validate"
    then: all validation issues are detected and reported
  - id: AT-0003
    covers: FE-0002
    given: a spec with no structural issues
    when: the user runs "s4 validate"
    then: the command exits successfully with no errors
  - id: AT-0010
    covers: FE-0002
    given: a spec with no [[Business Objective]]s defined
    when: the user runs "s4 validate"
    then: error messages show that the spec must have at least one [[Business Objective]] defined and provide actionable guidance
  - id: AT-0006
    covers: FE-0002
    given: a spec with uncovered [[Business Objective]]s
    when: the user runs "s4 validate"
    then: error messages show which BO-#### IDs lack covering features and provide actionable guidance
  - id: AT-0007
    covers: FE-0002
    given: a spec with circular feature dependencies
    when: the user runs "s4 validate"
    then: circular dependency is detected and reported with feature IDs
  - id: AT-0017
    covers: FE-0002
    given: a spec with uncovered [[Feature]]s
    when: the user runs "s4 validate"
    then: error messages show which FE-#### IDs lack covering acceptance tests and provide actionable guidance
  - id: AT-0018
    covers: FE-0002
    given: a spec with invalid prerequisite references
    when: the user runs "s4 validate"
    then: error messages show which features reference unknown prerequisite IDs and provide actionable guidance
  - id: AT-0019
    covers: FE-0002
    given: a spec with invalid business objective references in features
    when: the user runs "s4 validate"
    then: error messages show which features reference unknown business objective IDs and provide actionable guidance
  - id: AT-0020
    covers: FE-0002
    given: a spec with invalid feature references in acceptance tests
    when: the user runs "s4 validate"
    then: error messages show which acceptance tests reference unknown feature IDs and provide actionable guidance
  - id: AT-0023
    covers: FE-0002
    given: a spec with invalid concept references in descriptions
    when: the user runs "s4 validate"
    then: error messages show which items reference undefined concepts and provide actionable guidance
  - id: AT-0021
    covers: FE-0002
    given: a spec with duplicate IDs across business objectives, features, and acceptance tests
    when: the user runs "s4 validate"
    then: error messages show which IDs are duplicated and provide actionable guidance
  - id: AT-0022
    covers: FE-0002
    given: a spec with duplicate concept labels
    when: the user runs "s4 validate"
    then: error messages show which concept labels are duplicated and provide actionable guidance
  - id: AT-0024
    covers: FE-0002
    given: a spec with unused concepts
    when: the user runs "s4 validate"
    then: error messages show which concepts are defined but never referenced and provide actionable guidance
  - id: AT-0043
    covers: FE-0003
    given: a spec
    when: the user runs "s4 status"
    then: the system displays the project title, mission, and vision
  - id: AT-0044
    covers: FE-0003
    given: a spec
    when: the user runs "s4 status"
    then: the system displays all business objectives
  - id: AT-0045
    covers: FE-0003
    given: a spec
    when: the user runs "s4 status"
    then: the system displays a list of all features with their completion stats
  - id: AT-0035
    covers: FE-0003
    given: a spec with no issues
    when: the user runs "s4 status"
    then: the system displays that the project is in a good state
  - id: AT-0032
    covers: FE-0003
    given: a spec with failing acceptance tests
    when: the user runs "s4 status"
    then: the system displays a few failing acceptance tests and picks the most important one to fix
  - id: AT-0008
    covers: FE-0003
    given: a spec with internal inconsistencies
    when: the user runs "s4 status"
    then: the system displays all detected issues along with actionable guidance for each
  - id: AT-0009
    covers: FE-0003
    given: a spec with mismatching acceptance tests
    when: the user runs "s4 status"
    then: the system displays a message that the files should be fixed to match the spec
  - id: AT-0030
    covers: FE-0003
    given: a spec with missing acceptance test files
    when: the user runs "s4 status"
    then: the system displays a message that missing acceptance tests need to be created first
  - id: AT-0031
    covers: FE-0003
    given: a spec with dangling acceptance test files
    when: the user runs "s4 status"
    then: the system displays a message that dangling acceptance tests need to be removed first
  - id: AT-0033
    covers: FE-0013
    given: a spec with a failing tool
    when: the user runs "s4 status"
    then: the system displays the tool failure
  - id: AT-0038
    covers: FE-0004
    given: a spec with a feature "FE-0001"
    when: the user runs "s4 info FE-0001"
    then: the system displays detailed information about the feature in markdown format
  - id: AT-0039
    covers: FE-0004
    given: a spec with no feature "FE-9999"
    when: the user runs "s4 info FE-9999"
    then: the system displays an error indicating that the feature does not exist
  - id: AT-0040
    covers: FE-0005
    given: a spec with an acceptance test "AT-0001"
    when: the user runs "s4 info AT-0001"
    then: the system displays detailed information about the acceptance test in markdown format
  - id: AT-0041
    covers: FE-0005
    given: a spec with no acceptance test "AT-9999"
    when: the user runs "s4 info AT-9999"
    then: the system displays an error indicating that the acceptance test does not exist
  - id: AT-0013
    covers: FE-0009
    given: a spec with a tools configuration
    when: the user runs "s4 run-at AT-####"
    then: the system executes the specified acceptance test and returns the results
  - id: AT-0014
    covers: FE-0008
    given: a spec with a tools configuration
    when: the user runs "s4 locate-at AT-####"
    then: the system returns the file path for the specified acceptance test
  - id: AT-0015
    covers: FE-0010
    given: a spec with a tools configuration
    when: the user runs "s4 run-ats"
    then: the system executes all acceptance tests and returns the results
  - id: AT-0046
    covers: FE-0013
    given: a spec with a tool "sometool"
    when: the user runs "s4 tool sometool" and the tool exits with code 0
    then: the system returns exit code 0
  - id: AT-0047
    covers: FE-0013
    given: a spec with a tool "sometool"
    when: the user runs "s4 tool sometool" and the tool exits with non-zero code
    then: the system returns exit code 1 and provides custom message defined in "recommendedNextActions" field
  - id: AT-0048
    covers: FE-0013
    given: a spec with multiple tools defined in order (e.g., "tool1", then "tool2")
    when: the user runs "s4 tools"
    then: the system executes all tools in the defined order
  - id: AT-0049
    covers: FE-0013
    given: a spec with tools where a preceding tool has stopOnError set to true and exits with non-zero
    when: the user runs "s4 tools"
    then: the system stops executing subsequent tools after the failing tool
  - id: AT-0051
    covers: FE-0013
    given: a spec with multiple tools
    when: the user runs "s4 status"
    then: the system runs all tools at the end of the process
  - id: AT-0062
    covers: FE-0014
    given: in any occasion
    when: the user runs "s4 guide"
    then: the system displays the brief from `guideline.yaml`
  - id: AT-0063
    covers: FE-0014
    given: no spec file exists in the current directory
    when: the user runs "s4 status"
    then: the system displays that the spec file has to be created first and suggests running "s4 guide"
  - id: AT-0064
    covers: FE-0014
    given: in any occasion
    when: the user runs "s4 guide SECTION-NAME"
    then: the system displays the content of the section from `guideline.yaml` with examples
# MARK: Connectors
connectors:
  listAcceptanceTests: |- # Prints ATs in the format: "AT-0001: TITLE"
    npx vitest list --project=acceptance | awk -F'> ' '{split($1, parts, "/"); id=parts[length(parts)]; print id ": " $2}' | sed 's/\.test\.ts : /: /'
  locateAcceptanceTest: |-
    echo "src/at/{ID}.test.ts"
  runAcceptanceTest: |-  # TAP-flat format
    npm run --silent test:acceptance -- src/at/{ID}.test.ts
  runAcceptanceTests: |-  # TAP-flat format
    npm run --silent test:acceptance
# MARK: Tools
tools:
  - id: biome
    command: npm run --silent check:biome
  - id: tsc
    command: npm run --silent check:tsc
  - id: eslint
    command: npm run --silent check:eslint
  - id: depcruise
    command: npm run --silent check:depcruise
  - id: knip
    command: npm run --silent check:knip
  - id: jscpd
    command: npm run --silent check:jscpd
  - id: unittest
    command: npm run --silent test
    recommendedNextActions: |-
      There are failing tests. Run `npm run test` to see the details.
  - id: coverage
    command: npm run --silent test:coverage
  - id: tloc
    command: npm run --silent check:tloc
    recommendedNextActions: |-
      Test-to-production code ratio is too high. Simplify tests while keeping test coverage about the same.

      Here are some tips:

      - Simplify assertions by defining DSLs using `expect.extend({})`
      - Comments and empty lines are not counted in the code calculation, so there's no need to remove them.
      - Run this command to check the longest test files: `tokei --type TypeScript -f -s code | grep "test.ts" | head -n 10`

Similar approaches

2025-07-14 - Kiro (software): AWS's coding agent. It bills itself as “spec-driven development”.
2025-06-02 - Prompts are code, .json/.md files are state