Agent-Driven Development: Supercharging Copilot Applied Science

Automating intellectual toil with AI agents

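In the fast-moving field of software engineering, the pursuit of efficiency often leads to real innovation. AI researcher Tyler McGoffin recently described an effort that embodies this spirit: automating his own intellectual toil through agent-driven development with GitHub Copilot. The point is not only to code faster. The approach shifts the developer's role from repetitive analysis toward creative problem-solving and strategic oversight. McGoffin's experience highlights a familiar pattern among engineers, building tools to eliminate tedious work, and takes it a step further by handing AI agents complex analytical tasks that previously could not be scaled by hand.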

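McGoffin's idea grew out of an important but overwhelming part of his job: analyzing how coding agents perform against benchmarks such as TerminalBench2 and SWEBench-Pro. That work involves dissecting "trajectories," detailed JSON logs of an agent's thought process and actions, which could run to hundreds of thousands of lines across many tasks and benchmark runs. GitHub Copilot already helped with pattern recognition, but the repetitive nature of this analysis loop called for full automation. The result was eval-agents, a system designed to automate this intellectual burden and to let his team in Copilot Applied Science achieve the same efficiency.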
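To make the idea of trajectory analysis concrete, here is a minimal sketch in Python. The article does not describe the trajectory schema, so the field names (`task_id`, `steps`, `action`, `status`) and the `summarize_trajectory` helper are hypothetical. This is an illustration of the kind of reduction involved, not the eval-agents implementation.

```python
import json
from collections import Counter
from typing import Any

# Hypothetical trajectory: the real log schema is not given in the article.
SAMPLE_TRAJECTORY = """
{
  "task_id": "example-task",
  "steps": [
    {"thought": "Inspect the repo", "action": "shell", "status": "ok"},
    {"thought": "Run the tests", "action": "shell", "status": "error"},
    {"thought": "Patch the bug", "action": "edit", "status": "ok"},
    {"thought": "Re-run the tests", "action": "shell", "status": "ok"}
  ]
}
"""


def summarize_trajectory(raw: str) -> dict[str, Any]:
    """Reduce one JSON trajectory to a few counts a reviewer can scan."""
    trajectory = json.loads(raw)
    steps = trajectory.get("steps", [])
    # Count how often each kind of action appears.
    actions = Counter(step.get("action", "unknown") for step in steps)
    # Record the indices of steps that reported an error.
    failed = [i for i, step in enumerate(steps) if step.get("status") == "error"]
    return {
        "task_id": trajectory.get("task_id"),
        "step_count": len(steps),
        "actions": dict(actions),
        "failed_steps": failed,
    }


summary = summarize_trajectory(SAMPLE_TRAJECTORY)
print(summary)
```

Running the same reduction over every trajectory in a benchmark run would give a table of step counts and failure points, which is the sort of repetitive pass that is hard to sustain by hand at hundreds of thousands of lines.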

A blueprint for agent-driven development

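The design of eval-agents was guided by a clear set of principles focused on collaboration and scalability. McGoffin set three goals: make these AI agents easy to share, make them easy to author, and make coding agents the primary vehicle for team contributions. These goals reflect core GitHub values, honed in particular through his experience as an OSS maintainer of the GitHub CLI. But it was the third goal, making coding agents the primary contributors, that truly shaped the project's direction and brought unexpected benefits to the first two.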

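The agentic coding setup relied on a few powerful tools to streamline development:

Coding agent: Copilot CLI, which provides direct interaction and control.
Model: Claude Opus 4.6, which provides advanced reasoning and code generation.
IDE: VSCode, which serves as the central development workspace.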

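The Copilot SDK was decisive. It provided access to existing tools, MCP servers, and a mechanism for registering new tools and skills. With that foundation, the team did not have to reinvent core agent functionality and could concentrate on application-specific logic. The integrated environment supported a fast development loop and showed that, with the right setup, AI agents can drive a substantial share of development work rather than merely assist with it.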

Core principles for effective agentic coding

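Moving to an agent-driven paradigm takes more than adopting tools; it requires a change in methodology. McGoffin identified three core principles that proved essential for accelerating development and fostering collaboration:

Prompting strategy: To work effectively with agents, be conversational and verbose, and plan first.
Architecture strategy: A clean, well-documented, well-factored codebase is essential for agents to navigate and contribute effectively.
Iteration strategy: As in a blameless culture, adopting a "blame process, not agents" mindset enables rapid experimentation and learning.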

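Applying these strategies consistently produced striking results. In less than three days, five new contributors together added 11 new agents and four new skills, and introduced the concept of "eval-agent workflows" to the project. This joint sprint produced a code change of **+28,858/-2,884 lines** across 345 files, a concrete measure of what agentic workflows on GitHub can deliver.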

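The core principles are summarized below.

Principle	Description	Benefit for agent-driven development
Prompting	Treat the agent like a junior engineer: guide its thinking, over-explain your assumptions, and use plan mode (`/plan`) before execution. Be conversational and detailed.	Leads to more accurate and relevant output and helps the agent solve complex problems effectively.
Architecture	Prioritize refactoring, comprehensive documentation, and robust tests. Keep the codebase clean, readable, and well structured. Clean up dead code aggressively.	Lets the agent understand the codebase, its patterns, and its existing functionality, which makes its contributions more accurate.
Iteration	Adopt a "blame process, not agents" mindset. Implement guardrails (strict typing, linters, extensive tests) to prevent mistakes. Learn from agent errors by strengthening the process and the guardrails.	Enables rapid iteration, builds trust in agent contributions, and continuously improves the development pipeline.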

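Accelerating development: the strategies in practice

The success of this agent-driven approach comes from applying these principles in practice.

Prompting strategy: guiding the AI engineer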

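AI coding agents are powerful, and they are at their best on clearly scoped problems. More complex tasks need guidance, much as they would for a junior engineer. McGoffin found that conversing with the agent, explaining assumptions, and using plan mode worked far better than terse commands. For example, when adding robust regression tests, a prompt like the following opened a productive dialogue: "/plan I recently saw Copilot happily updating tests to fit its new paradigm. But those tests should not be updated. How can I create a test space that Copilot cannot touch, or has to reserve, to prevent regressions?" Exchanges like this, often carried out with the Claude Opus 4.6 model, led to refined solutions such as contract-test guardrails that only human engineers can update, which keeps critical functionality protected.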
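One way such a protected test space could be enforced is with a digest manifest that a human approves and a check that flags any drift. The sketch below is an assumption for illustration only: the article does not say how the contract-test guardrail was built, and the directory name `contract_tests` and the helpers `build_manifest` and `find_violations` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(protected_dir: Path) -> dict[str, str]:
    """Record a digest for every protected test file (run by a human)."""
    return {
        str(p.relative_to(protected_dir)): digest(p)
        for p in sorted(protected_dir.rglob("*.py"))
    }


def find_violations(protected_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List protected files that were changed, added, or deleted."""
    current = build_manifest(protected_dir)
    # A deleted file has no current digest, so it also counts as changed.
    changed = [name for name in manifest if current.get(name) != manifest[name]]
    added = [name for name in current if name not in manifest]
    return sorted(changed + added)


# Demonstration in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    contracts = Path(tmp) / "contract_tests"
    contracts.mkdir()
    test_file = contracts / "test_api_contract.py"
    test_file.write_text("def test_status_field():\n    assert True\n")

    approved = build_manifest(contracts)          # human-approved baseline
    clean = find_violations(contracts, approved)  # nothing changed yet

    # Simulate an agent rewriting the protected test.
    test_file.write_text("def test_status_field():\n    assert False\n")
    violations = find_violations(contracts, approved)

print(clean, violations)
```

In a CI job, a non-empty result from `find_violations` would fail the build, and only a human-reviewed change to the manifest would clear it.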

Architecture strategy: the foundation of AI-assisted quality

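For human engineers, keeping a codebase clean, writing tests, and documenting features tend to be deprioritized under the pressure to ship. In agent-driven development they become the top priority. McGoffin found that time spent on refactoring, documentation, and additional test cases dramatically improved Copilot's ability to navigate the codebase and contribute to it. An agent-first repository prizes clarity. It even lets developers ask Copilot questions such as "Knowing what I know now, how would I design this differently?" and turn theoretical refactors into projects that are achievable with AI assistance. This continued attention to architectural health keeps new features easy to deliver.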

Iteration strategy: trust the process, not just the agent

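As AI models have improved, the mindset has moved from "trust but verify" toward a more trusting stance. This parallels how effective teams operate under a "blame process, not people" philosophy. In agent-driven development, this blameless culture means that when an AI agent makes a mistake, the response is to improve the underlying process and guardrails rather than to blame the agent. That includes rigorous CI/CD practices: strict typing to guarantee interface conformance, robust linters for code quality, and extensive integration, end-to-end, and contract tests. Such tests can be expensive to build by hand, but agent assistance sharply reduces the implementation cost and gives meaningful confidence in new changes. With these systems in place, developers let Copilot check its own work, much as a team sets up a junior engineer to succeed.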
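The guardrails described above can be pictured as a single gate that an agent runs against its own changes. The following is a minimal sketch under stated assumptions: the `Guardrail` class and `run_guardrails` function are hypothetical, and the commands are stand-ins that only exit with a fixed status so the example runs anywhere. A real project would substitute its own type checker, linter, and test runner.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """A named check backed by a command that exits non-zero on failure."""
    name: str
    command: list[str]


def run_guardrails(guardrails: list[Guardrail]) -> list[str]:
    """Run every guardrail and return the names of those that failed."""
    failures = []
    for guardrail in guardrails:
        result = subprocess.run(guardrail.command, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(guardrail.name)
    return failures


# Stand-in commands (assumptions): each just exits with a fixed status.
GUARDRAILS = [
    Guardrail("type-check", [sys.executable, "-c", "raise SystemExit(0)"]),
    Guardrail("lint", [sys.executable, "-c", "raise SystemExit(1)"]),
    Guardrail("tests", [sys.executable, "-c", "raise SystemExit(0)"]),
]

failed = run_guardrails(GUARDRAILS)
print(failed)
```

The sketch runs every check instead of stopping at the first failure, so an agent sees the full list of problems in one pass. When an agent still slips a mistake through, the blameless response is to add another entry to the list.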

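Mastering the agent-driven development loop

Combining these principles into a practical workflow yields a powerful, accelerated development loop:

Plan with Copilot: Start a new feature with /plan. Iterate on the plan and make sure it covers test and documentation updates, completed before the code is implemented. Documentation can serve as additional guidelines for the agent.
Implement with Autopilot: Have Copilot implement the feature with /autopilot, making use of its code generation ability.
Review with Copilot Code Review: Prompt Copilot to start a review cycle: request the Copilot Code Review agent, address its comments, and re-request review until the issues are resolved.
Human review: Perform a final human review to confirm that patterns are enforced and that complex decisions match the strategic intent.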

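Beyond the feature loop, ongoing optimization matters. McGoffin regularly prompts Copilot with commands such as "/plan Review the code for any missing tests, any tests that may be broken, and dead code" and "/plan Review the documentation and code to identify documentation gaps." These checks run weekly, or whenever a new feature is merged, and keep the agent-driven development environment healthy and efficient.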
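Part of a documentation-gap review can also be done deterministically, which gives the agent a concrete list to start from. The sketch below is not the method described in the article. It is a small illustration that uses Python's standard `ast` module to list public functions and classes without docstrings. The sample source and the `find_undocumented` helper are hypothetical.

```python
import ast

# Hypothetical module source used only for the demonstration.
SAMPLE_SOURCE = '''
def load_trajectory(path):
    """Load one trajectory file."""
    return path

def score_run(run):
    return len(run)

def _internal_helper():
    return None

class Report:
    def render(self):
        return ""
'''


def find_undocumented(source: str) -> list[str]:
    """Return names of public functions and classes that lack a docstring."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Names starting with an underscore are treated as private.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


gaps = find_undocumented(SAMPLE_SOURCE)
print(gaps)
```

A list like this could be pasted into a /plan prompt so that the review starts from known gaps rather than from a blank slate.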

The future of software engineering with AI

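What began as a personal quest to automate a frustrating analysis task has grown into a new paradigm for software development. Agent-driven development, powered by tools like GitHub Copilot and advanced models like Claude Opus, does more than make developers faster: it changes the nature of the work for AI researchers and software engineers alike. By offloading intellectual toil to intelligent agents, teams can reach unprecedented levels of productivity, collaboration, and innovation, and concentrate on the creative and strategic challenges that actually drive progress. The approach points to a future in which AI agents are not just tools but integral members of the development team, changing how software is built and maintained.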

Original source

https://github.blog/ai-and-ml/github-copilot/agent-driven-development-in-copilot-applied-science/

Frequently asked questions

What is agent-driven development in the context of GitHub Copilot?

Agent-driven development refers to a software engineering paradigm where AI agents, such as those powered by GitHub Copilot, become primary contributors and collaborators in the development process. Instead of merely suggesting code, these agents actively participate in planning, implementing, refactoring, testing, and documenting software. This approach leverages the AI's ability to automate repetitive intellectual tasks, allowing human engineers to focus on higher-level problem-solving, strategic design, and creative work, thereby accelerating development cycles and improving code quality through structured AI assistance and rigorous guardrails.

How did the 'eval-agents' project originate?

The 'eval-agents' project was born out of a common challenge faced by AI researchers: analyzing vast quantities of data. Tyler McGoffin, an AI researcher, found himself repeatedly poring over hundreds of thousands of lines of 'trajectories'—detailed logs of AI agent thought processes and actions during benchmark evaluations. Recognizing this as an intellectually toilsome and repetitive task, he sought to automate it. By applying agent-driven development principles with GitHub Copilot, he created 'eval-agents' to analyze these trajectories, significantly reducing the manual effort required and transforming a tedious analytical chore into an automated process.

What are the key components of an agentic coding setup for this approach?

An effective agentic coding setup, as demonstrated in this approach, typically includes a powerful AI coding agent like Copilot CLI, a robust underlying large language model such as Claude Opus 4.6, and a feature-rich Integrated Development Environment (IDE) like VSCode. Crucially, leveraging an SDK, such as the Copilot SDK, provides access to essential tools, servers, and mechanisms for registering new tools and skills, offering a foundational infrastructure for building and deploying agents without reinventing core functionalities. This integrated environment enables seamless interaction between the developer and the AI agent throughout the development lifecycle.

What prompting strategies are most effective when working with AI coding agents?

Effective prompting strategies for AI coding agents emphasize conversational, verbose, and planning-oriented interactions. Rather than terse problem statements, developers achieve better results by engaging agents in a dialogue, over-explaining assumptions, and leveraging the AI's speed for initial planning before committing to code changes. This involves using planning modes (e.g., '/plan') to collaboratively brainstorm solutions and refine ideas. Treating the AI agent like a junior engineer who benefits from clear guidance, context, and iterative feedback helps it to produce more accurate and relevant outputs, leading to superior problem-solving and feature implementation.

Why are architectural strategies like refactoring and documentation crucial for agent-driven development?

Architectural strategies like frequent refactoring, comprehensive documentation, and robust testing are paramount in agent-driven development because they create a clean, navigable codebase that AI agents can effectively understand and interact with. A well-maintained codebase, much like for human engineers, allows AI agents to contribute features more accurately and efficiently. By prioritizing readability, consistent patterns, and up-to-date documentation, developers ensure that Copilot can interpret the codebase's intent, identify opportunities for improvement, and implement changes with minimal errors, making feature delivery trivial and facilitating continuous re-architecture.

How does a 'blameless culture' apply to iteration strategies in agent-driven development?

Applying a 'blameless culture' to agent-driven development means shifting from a 'trust but verify' mindset to one that prioritizes 'blame process, not agents.' This philosophy acknowledges that AI agents, like human engineers, can make mistakes. The focus then shifts to implementing robust processes and guardrails—such as strict typing, comprehensive linters, and extensive integration and end-to-end tests—to prevent errors. When an agent does make a mistake, the response is to learn from it and introduce additional guardrails, refining the processes and prompts to ensure the same error isn't repeated, fostering a rapid and psychologically safe iteration pipeline.

What is the typical development loop when using agent-driven development?

The typical development loop in agent-driven development begins with planning a new feature collaboratively with Copilot using a '/plan' prompt, ensuring testing and documentation updates are integrated early. Next, Copilot implements the feature, often using an '/autopilot' command. Following implementation, a review loop is initiated with a Copilot Code Review agent, addressing comments iteratively. The final stage involves a human review to enforce patterns and standards. Outside this feature loop, Copilot is periodically prompted to review for missing tests, code duplication, or documentation gaps, maintaining a continuously optimized agent-driven environment.

What kind of impact did agent-driven development have on team productivity and collaboration?

The impact of agent-driven development on team productivity and collaboration was transformative, leading to an incredibly rapid iteration pipeline. In one instance, a team of five new contributors, using this methodology, created 11 new agents, four new skills, and implemented complex workflows in less than three days. This amounted to a staggering change of +28,858/-2,884 lines of code across 345 files. This dramatic increase in output highlights how agent-driven development, by automating routine tasks and providing intelligent assistance, significantly accelerates feature delivery, fosters deeper collaboration, and enables teams to achieve unprecedented levels of innovation and efficiency.