工厂模式:编码智能体如何改变软件工程

The Factory Model: How Coding Agents Changed Software Engineering
作者:Addy Osmani|编译:技术前沿
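最近,智能体工程领域发生了一次「阶跃变化」。你不再仅仅是编写代码,而是在建造用于构建软件的工厂。Addy Osmani 在这篇深度文章中提出了「工厂模式」的心智模型,系统阐述了编码智能体三代的演进、规范作为杠杆的核心洞见、TDD 的新重要性,以及高杠杆工程师的六项核心能力。全文中英对照呈现。
Agentic engineering has recently gone through a "step change." You are no longer just writing code; you are building factories that build software. In this in-depth essay, Addy Osmani proposes the "factory model" as a mental model and lays out the three generations of coding agents, the core insight that the spec is the leverage, the renewed importance of TDD, and six core capabilities of high-leverage engineers. The full text is presented in parallel Chinese and English.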
感觉到的转变
The Shift That's Felt
智能体工程领域发生了一些根本性的变化。这感觉并不像工具渐进式改善或工作流逐步演变那样的常规变化,而是一次阶跃变化。拥有数十年开发经验的开发者们都在用同样的措辞描述它:这门手艺的重心已经转移了。
Something fundamental has shifted in agentic engineering. It doesn't feel like the usual gradual tool improvement or workflow evolution. It feels like a step change. Developers with decades of experience are describing it in the same terms: the center of gravity of the craft has moved.
当前最有用的做法是同时保持两种认知的张力:编程已经发生巨变,但软件工程的核心并未改变。两者之间的差距正是有趣的故事所在,清晰理解这一点,是在这个时代蓬勃发展的工程师与被时代抛下的工程师之间的区别。
The most useful framing right now is to hold two thoughts simultaneously: programming has changed dramatically, but the core of software engineering has not. The gap between those two is where the interesting story lives, and understanding it clearly is the difference between engineers who thrive in this era and engineers who get left behind.
抽象层级的发展弧线
The Arc of Abstraction
软件工程的历史就是抽象层级不断提升的历史。我们从比特发展到指令,从指令到函数,从函数到对象,从对象到服务,从服务到分布式系统。每一次栈的跃迁都提高了个体开发者的生产力,并扩大了能够参与构建软件的人群范围。
The history of software engineering is a history of rising abstraction. We moved from bits to instructions, from instructions to functions, from functions to objects, from objects to services, from services to distributed systems. Each jump up the stack increased individual developer productivity and expanded the population of people who could participate in building software.
我们当前正在经历的是同一弧线上的又一步。我们正从编写代码转向编排能够编写代码的系统。Grady Booch 将其称为软件的第三个时代——一个由抽象层级上升定义的新黄金时代,开发者的工作从编写指令转变为定义意图。
What we're experiencing now is another step on that same arc. We're moving from writing code to orchestrating systems that write code. Grady Booch frames this as the third era of software — a new golden age defined by rising abstraction where the developer's job shifts from writing instructions to defining intent.
AI 编程工具的三代演进
Three Generations of AI Coding Tools
第一代:加速的自动补全 / Accelerated Autocomplete
预测下一行代码、填充样板代码、在重复模式上节省击键。有用,也确实省时。但工作流保持不变:由你主导,工具辅助。AI 只是减少了这个循环内的摩擦。
Predicts the next line, fills boilerplate, saves keystrokes on repetitive patterns. Useful, genuinely time-saving. But the workflow stays the same: you lead, the tool assists. AI just reduces friction within the loop.
第二代:同步智能体 / Synchronous Agents
你用自然语言描述任务,模型生成代码。你审查、修正、迭代直到获得可工作的结果。输入更少,更多是描述意图。但你依旧参与每一步。智能体是协作者,而非自主工作者。
You describe a task in natural language, and the model generates code. You review, correct, and iterate until you have something working. Less typing, more describing intent. But you're still in every step. The agent is a collaborator, not an autonomous worker.
第三代:自主智能体 / Autonomous Agents
可以接受一个规范并持续运行几十分钟甚至数天。设置环境、安装依赖、编写测试、遭遇失败、在线研究解决方案、修复失败、编写实现、再次测试、设置服务,并生成可供审查的制品。你不再逐行交互,而是在定义结果和审查成果。
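It can take a spec and run for minutes or hours: setting up environments, installing dependencies, writing tests, hitting failures, researching solutions, fixing those failures, writing implementations, testing again, setting up services, and generating artifacts for review. You're no longer interacting line by line; you're defining outcomes and reviewing results.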
三个月前还需要一个周末才能完成的项目,现在可能只需启动任务,三十分钟后回来查看结果。
Tasks that would have been weekend projects three months ago can now be kicked off and checked on in thirty minutes.
工厂心智模型
The Factory Mental Model
你不再仅仅是编写代码,你是在建造用于构建软件的工厂。
You're not just writing code anymore. You're building factories that build code.
这个工厂由智能体舰队组成。每个智能体都有任务、工具带(代码库、测试运行器、部署脚本、文档)、上下文(规范、架构决策、先前的约束)和反馈循环。你不再是手把手指导单个智能体完成单个任务,而是并行启动多个智能体。一个处理后端重构,另一个实现功能,另一个编写集成测试,另一个更新文档。
This factory consists of a fleet of agents. Each agent has a task, a toolbelt (codebase access, test runner, deploy scripts, docs), context (specs, architecture decisions, prior constraints), and a feedback loop. Instead of hand-holding a single agent through a single task, you spin up multiple agents in parallel. One handles backend refactoring, another implements a feature, another writes integration tests, another updates docs.
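The structure described above can be sketched in Python. This is a minimal, hypothetical model, not any real agent framework's API: `AgentTask`, `run_agent`, `run_fleet`, and `fake_generate` are illustrative names, and the model call is simulated so the example runs on its own. Each task carries a toolbelt, context, and a `check` callable standing in for the feedback loop (for example, running the test suite), and the fleet is dispatched in parallel with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentTask:
    """One work order in the factory (field names are illustrative)."""
    name: str
    task: str                     # what the agent should accomplish
    toolbelt: List[str]           # e.g. codebase access, test runner, docs
    context: List[str]            # specs, architecture decisions, constraints
    check: Callable[[str], bool]  # feedback loop, e.g. "run the test suite"


@dataclass
class AgentResult:
    name: str
    passed: bool
    attempts: int


def fake_generate(task: AgentTask, attempt: int) -> str:
    """Hypothetical stand-in for a model call; returns a labeled draft."""
    return f"{task.task} (draft {attempt})"


def run_agent(task: AgentTask, max_attempts: int = 3) -> AgentResult:
    """Generate, check, and retry until the feedback loop passes or we give up."""
    for attempt in range(1, max_attempts + 1):
        output = fake_generate(task, attempt)
        if task.check(output):
            return AgentResult(task.name, True, attempt)
    return AgentResult(task.name, False, max_attempts)


def run_fleet(tasks: List[AgentTask], workers: int = 4) -> Dict[str, AgentResult]:
    """Spin up one agent per task in parallel and collect results for review."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return {r.name: r for r in pool.map(run_agent, tasks)}
```

In a real setup `fake_generate` would be replaced by whatever agent runtime you use and `check` by an actual test run; threads are a reasonable choice here only because such calls are I/O-bound.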
这个类比比初看起来更深刻。工厂有质量控制,有流程文档;它的输入需要精确指定,否则产出就会出错;当环境不可靠时,工厂就会停滞。所有这些属性都直接对应到智能体软件开发上,认真对待这个类比,会把你引向真正重要的投资。
The analogy runs deeper than it first appears. Factories have quality control, process documentation, inputs that need to be precisely specified or the output is wrong, and they stall when the environment is unreliable. All of these map directly to agentic software development, and taking the analogy seriously points you toward investments that actually matter.
与新人入职的类比
The Onboarding Analogy
智能体实际行为中最引人注目的模式之一是,它们的工作循环与新工程师入职(onboarding)的过程非常相似。你交给它们一份规范,它们将其分解为子任务。它们探索代码库以了解上下文。遇到困难时,它们搜索提交历史,运行 git blame 找出最后修改某个子系统的人。它们通过 Slack 向合适的人请求领域知识,然后继续工作。它们不断迭代,直到产出符合验收标准。
One of the most striking patterns in how agents actually behave is that their work cycle closely mirrors the process of onboarding a new engineer. You hand them a spec. They break it down. They explore the codebase to understand context. When stuck, they search commit history. They run git blame. They Slack the right human for domain knowledge, then keep working. They iterate until the output meets acceptance criteria.
Slack 和电子邮件正在成为人类与智能体之间的接口,而不仅仅是人与人之间的接口。Git 历史正在演变成智能体为理解架构决策而导航的知识图谱。文档正在成为自主执行的培训材料。
Slack and email are becoming interfaces between humans and agents, not just between humans. Git history is evolving into a knowledge graph that agents navigate to understand architectural decisions. Documentation is becoming training material for autonomous execution.
如果你想清楚目前应该对代码库进行哪些投资,可以问自己:一位新工程师,仅凭现有的文档和提交记录,能否理解代码为何如此构建?如果答案是否定的,那么智能体在那里也会遇到困难,你可能获得的杠杆作用将是有限的。
If you want to think clearly about what to invest in within your codebase right now, ask yourself: could a new engineer, looking only at existing docs and commit history, understand why the code is structured the way it is? If the answer is no, agents will struggle there too, and your leverage will be limited.
你的规范就是杠杆
Your Spec Is the Leverage
如果你能编排二十、三十甚至五十个并行运行的智能体,那么平庸产出与卓越产出之间的差异几乎完全取决于你的规范质量。在这种规模下,模糊的思维不仅会拖慢速度,还会成倍放大。模糊的需求会通过数十个并行自主运行传播,每一个都会在略微不同的方向上出点小错。前期做出的糟糕架构决策不会只影响一个实现,而是会波及整个舰队。
If you can orchestrate twenty, thirty, or fifty agents running in parallel, the difference between mediocre output and excellent output is almost entirely determined by the quality of your spec. At that scale, vague thinking doesn't just slow things down — it multiplies. Vague requirements propagate through dozens of parallel autonomous runs, each deviating in slightly different ways. Bad architectural decisions made upfront don't affect one implementation — they cascade across the entire fleet.
规范不再是简单的提示,规范是明确化的产品思维。
A spec isn't a prompt anymore. A spec is product thinking made explicit.
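One way to make "product thinking made explicit" concrete is to write the spec as data whose acceptance criteria are executable, so every agent's output is judged against the same yardstick. The sketch below assumes a hypothetical `normalize_email` task; `Spec`, `Criterion`, and the criteria themselves are illustrative, not a format prescribed by the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Impl = Callable[[str], str]  # the candidate function an agent produces


@dataclass
class Criterion:
    description: str
    check: Callable[[Impl], bool]  # True when the candidate satisfies it


@dataclass
class Spec:
    goal: str
    constraints: List[str]
    acceptance: List[Criterion] = field(default_factory=list)

    def evaluate(self, impl: Impl) -> List[str]:
        """Return the description of every acceptance criterion the candidate fails."""
        failures = []
        for criterion in self.acceptance:
            try:
                ok = criterion.check(impl)
            except Exception:  # a crash counts as a failed criterion
                ok = False
            if not ok:
                failures.append(criterion.description)
        return failures


def raises_value_error(impl: Impl, arg: str) -> bool:
    try:
        impl(arg)
    except ValueError:
        return True
    return False


# Hypothetical spec: every criterion is concrete enough to evaluate mechanically.
spec = Spec(
    goal="normalize_email: canonical form for user-entered addresses",
    constraints=["pure function", "no network calls"],
    acceptance=[
        Criterion("strips surrounding whitespace",
                  lambda f: f("  a@b.com ") == "a@b.com"),
        Criterion("lowercases the domain only",
                  lambda f: f("Ann@Example.COM") == "Ann@example.com"),
        Criterion("rejects input without '@'",
                  lambda f: raises_value_error(f, "nobody")),
    ],
)


def naive_normalize(addr: str) -> str:
    return addr.strip().lower()  # plausible, but violates two criteria


def normalize_email(addr: str) -> str:
    addr = addr.strip()
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {addr!r}")
    return f"{local}@{domain.lower()}"
```

`spec.evaluate(naive_normalize)` reports the two violated criteria, while `spec.evaluate(normalize_email)` returns an empty list. Because the criteria are explicit, a vague requirement tends to surface as a criterion nobody can write down, before any agent starts.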
这就是为什么优秀的软件工程师从这些工具中获得的杠杆不是更少,而是更多。编写代码的机械性工作正在被自动化,理解系统的认知性工作正在被放大。你现在花在培养真正的架构理解和系统思维上的每一个小时,其回报都将体现在整个自主工作舰队上,而不仅仅是你个人的产出。
This is why strong software engineers get more leverage from these tools, not less. The mechanical work of writing code is being automated. The cognitive work of understanding systems is being amplified. Every hour you spend cultivating genuine architectural understanding and systems thinking pays back across an entire fleet of autonomous workers, not just your own output.
什么并未真正改变
What Hasn't Actually Changed
请思考智能体开发仍然需要你提供什么:清晰的需求、强大的抽象、可靠的测试、谨慎的权衡和人工监督。这些都是经典的软件工程技能。围绕 AI 编码的炒作可能会造成传统技能已被淘汰的印象。事实并非如此。
Think about what agent-based development still requires from you: clear requirements, strong abstractions, reliable tests, careful tradeoffs, and human oversight. These are classic software engineering skills. The hype around AI coding may create the impression that traditional skills have been obsoleted. They haven't.
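- 清晰的需求:如果你无法以可评估的方式阐明成功是什么样子,再多的自主执行也无法实现它。智能体无法澄清从未被给出的需求。
- 强大的抽象:清晰的架构不会因为由智能体来实现而变得不那么重要;相反,它变得更有价值,因为智能体会放大其所处系统的特性。
- 可靠的测试:这一点值得单独讨论。
- 谨慎的权衡:智能体针对既定目标进行优化。它们不会自然地平衡相互竞争的关注点、预判二阶效应,也不会在技术上正确的方案是错误的产品决策时发出警告。这种判断力仍然在你身上。
- 人工监督:智能体的输出质量已经高到足以通过粗略审查,这意味着对你审查能力的要求实际上提高了,而非降低。
- Clear requirements: If you can't articulate what success looks like in an evaluable way, no amount of autonomous execution will produce it. Agents can't clarify requirements they were never given.
- Strong abstractions: Clean architecture doesn't become less important because agents are doing the implementation. It becomes more valuable, because agents amplify the characteristics of the system they work in.
- Reliable tests: This one deserves its own discussion.
- Careful tradeoffs: Agents optimize for the stated objective. They don't naturally balance competing concerns, anticipate second-order effects, or flag when a technically correct solution is the wrong product decision. That judgment still rests with you.
- Human oversight: Output is now good enough to pass a cursory review, which means the bar for your review skills has actually gone up, not down.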
为何测试比以往任何时候都更重要
Why Testing Matters More Than Ever
良好的测试和 TDD 本来就是最佳实践。在智能体工作流中,它们变得近乎强制。红/绿 TDD 意味着你在编写实现之前先编写测试。你先确认测试失败(红阶段),然后迭代实现直到测试通过(绿阶段)。这个顺序不是可有可无的仪式,它是让你确信实现确实在做你认为它应该做的事情的机制。
Good testing and TDD were already best practice. In agent workflows, they're closer to mandatory. Red/Green TDD means you write the test before the implementation. You confirm the test fails (red), then iterate on the implementation until it passes (green). This sequence isn't an optional ritual; it's the mechanism that gives you confidence that the implementation is actually doing what you think it should.
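A minimal illustration of the Red/Green sequence, assuming a hypothetical `parse_duration` requirement ("45s", "2m", "1h30m" become seconds; anything else raises `ValueError`). The test is written first from the requirement, run against a stub to confirm it fails (red), and only then is the implementation written and iterated until the same test passes (green). All names here are illustrative.

```python
import re
from typing import Callable

Parser = Callable[[str], int]


def duration_test(parse: Parser) -> None:
    """Written FIRST, straight from the requirement, before any implementation."""
    assert parse("45s") == 45
    assert parse("2m") == 120
    assert parse("1h30m") == 5400
    try:
        parse("ten minutes")
    except ValueError:
        return
    raise AssertionError("malformed input must raise ValueError")


def phase(test: Callable[[Parser], None], impl: Parser) -> str:
    """Run the test against one implementation and report the TDD phase."""
    try:
        test(impl)
    except Exception:  # assertion failures, NotImplementedError, crashes
        return "red"
    return "green"


# Red: a stub proves the test can actually fail.
def parse_duration_stub(text: str) -> int:
    raise NotImplementedError


# Green: iterate on the real implementation until the same test passes.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    # Reject empty input and anything left over once valid tokens are removed.
    if not text or _TOKEN.sub("", text):
        raise ValueError(f"malformed duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in _TOKEN.findall(text))
```

`phase(duration_test, parse_duration_stub)` returns `"red"` and `phase(duration_test, parse_duration)` returns `"green"`. The red step is the point: a test that has never been seen failing may not be testing anything.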
当一支智能体舰队在数十个并行任务中生成代码时,错误的代价会急剧复合。一个以通过测试为优化目标的智能体,总会找到通过测试的办法。如果测试是在实现之后编写的,它们很可能测试的是实现恰好做了什么,而不是它应该做什么。一套全面的、测试优先的测试套件,是你确保自主输出确实正确的最有效杠杆。
When a fleet of agents generates code across dozens of parallel tasks, costs compound severely. An agent optimized for passing tests will find a way to pass them. If tests are written after the implementation, they'll likely test what the implementation happens to do, not what it should do. A comprehensive, test-first suite is your most effective leverage for ensuring autonomous output is actually correct.
在任务开始时告知智能体使用红/绿 TDD 是你可以给出的最高杠杆的指令之一。
Telling an agent at the start of a task to use Red/Green TDD is one of the highest-leverage instructions you can give.
未解决的问题是验证,而非生成
The Unsolved Problem Is Verification, Not Generation
生成不再是瓶颈,验证才是。
Generation is no longer the bottleneck. Verification is.
智能体可以产生令人印象深刻的输出。挑战在于如何有把握地判断这些输出是否正确。变更前测试通过,并不意味着这些测试能捕获变更引入的回归。上下文窗口的限制意味着,在大型代码库上工作的智能体可能会遗漏存在于其当前推理窗口之外的重要约束或模式。不稳定的测试(flaky test)对单个开发者来说只是恼人的边缘情况,但当四十个智能体同时撞上同一个不稳定的测试时,它就成了系统性的阻塞点。工厂停滞了。
Agents can produce impressive output. The challenge is knowing with confidence whether that output is correct. Tests passing before a change doesn't mean they catch regressions the change introduces. Context window limits mean agents working on large codebases can miss important constraints or patterns that exist outside their current reasoning window. Flaky tests that are an annoying edge case for one developer become a systemic blocker when forty agents hit the same flaky test simultaneously. The factory stalls.
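A common mitigation for the flaky-test stall is to rerun tests with no code change and quarantine any test that both passes and fails, so a fleet of agents does not burn cycles "fixing" noise. The sketch below is a generic illustration with hypothetical names (`classify`, `quarantine`, and the demo tests). A fixed rerun count is only a heuristic; a test that fails rarely can still slip through.

```python
import random
from collections import Counter
from typing import Callable, Dict, Set

CheckFn = Callable[[], None]  # a test: returns normally on pass, raises on fail


def classify(check: CheckFn, runs: int = 20) -> str:
    """Rerun one test with no code changes; mixed outcomes mean it is flaky."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            check()
            outcomes["pass"] += 1
        except Exception:
            outcomes["fail"] += 1
    if outcomes["pass"] and outcomes["fail"]:
        return "flaky"
    return "pass" if outcomes["pass"] else "fail"


def quarantine(suite: Dict[str, CheckFn], runs: int = 20) -> Set[str]:
    """Names of tests to pull out of the gate before agents rely on the suite."""
    return {name for name, check in suite.items() if classify(check, runs) == "flaky"}


# Demo suite. The RNG is seeded so the "flaky" behavior is reproducible.
_rng = random.Random(0)


def stable_pass() -> None:
    if 1 + 1 != 2:
        raise AssertionError("arithmetic broke")


def stable_fail() -> None:
    raise AssertionError("genuine bug: always fails")


def flaky_timeout() -> None:
    # Simulates a race or timeout that fails about half the time.
    if _rng.random() >= 0.5:
        raise AssertionError("timed out")
```

Calling `quarantine` on the three demo tests flags only `flaky_timeout`; `stable_fail` stays in the suite because a consistent failure is a real signal, not noise.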
在验证能力赶上生成能力之前,人工审查不是可选的额外开销,而是安全系统。面对令人印象深刻的智能体输出,恰当的反应不是因为它看起来不错就信任它,而是凭借架构理解和测试纪律对它进行严格评估。
Until verification catches up with generation, human review isn't optional overhead — it's the safety system. The right response to impressive agent output isn't to trust it because it looks good, but to rigorously evaluate it with architectural understanding and test discipline.
高杠杆工程的新形态
The New Shape of High-Leverage Engineering
在这个时代影响力最大的工程师,区分他们的不会是打字速度或对语法的熟记程度,而是一套不同的能力:
The engineers who will have the most impact in this era won't be distinguished by typing speed or syntax memorization, but by a different set of capabilities:
系统思维 / Systems Thinking
将复杂架构置于脑中、理解组件如何交互、预见一处更改如何影响其他地方行为的能力。比培养打字速度更难,但在管理需要你整合输出的智能体舰队时,价值也大得多。
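Holding complex architecture in your head, understanding how components interact, anticipating how a change in one place affects behavior elsewhere. It's harder to cultivate than typing speed, but far more valuable when you're managing a fleet of agents whose output you need to integrate.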
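Part of "anticipating how a change in one place affects behavior elsewhere" can be mechanized: invert the dependency graph and walk it to find everything that transitively depends on the changed module. This is a minimal sketch; the module names are hypothetical, and a real graph would come from your build system or import analysis. It yields *candidates* for impact, not proof of impact, which is where human judgment still comes in.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # module -> modules it imports


def dependents(deps: Graph) -> Graph:
    """Invert the graph: module -> modules that import it directly."""
    rev: Graph = {module: set() for module in deps}
    for module, imports in deps.items():
        for imported in imports:
            rev.setdefault(imported, set()).add(module)
    return rev


def blast_radius(deps: Graph, changed: str) -> Set[str]:
    """Every module that transitively depends on `changed` (breadth-first walk)."""
    rev = dependents(deps)
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in rev.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - {changed}  # the changed module itself is not "affected"


# Hypothetical service layout.
deps: Graph = {
    "api": {"billing", "auth"},
    "billing": {"db", "currency"},
    "auth": {"db"},
    "reports": {"billing"},
    "db": set(),
    "currency": set(),
}
```

Here `blast_radius(deps, "currency")` returns `{"billing", "api", "reports"}`: a change that looks local to currency handling can reach the public API and reporting.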
问题分解 / Problem Decomposition
知道如何将庞大模糊的目标分解成智能体可以可靠执行的范围明确的子任务。良好分解问题,然后验证分解是否正确,是一门真正的技艺。
Knowing how to break a large, ambiguous goal into well-scoped sub-tasks that an agent can reliably execute. Decomposing a problem well, and then verifying that the decomposition is correct, is a real craft.
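Recording each sub-task's prerequisites makes a decomposition checkable: a cycle means the plan can never run, and a topological ordering shows which sub-tasks can be handed to separate agents at the same time. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the sub-task names describe a hypothetical "export invoices as CSV" feature, and `is_valid_plan` / `parallel_waves` are illustrative helpers.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

Plan = Dict[str, Set[str]]  # sub-task -> sub-tasks that must finish first


def is_valid_plan(plan: Plan) -> bool:
    """A decomposition with circular prerequisites can never be executed."""
    try:
        TopologicalSorter(plan).prepare()
    except CycleError:
        return False
    return True


def parallel_waves(plan: Plan) -> List[List[str]]:
    """Group sub-tasks into waves; everything in one wave can go to separate agents."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises CycleError if the plan is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # in practice: only after each agent's output is verified
    return waves


# Hypothetical decomposition of "export invoices as CSV".
plan: Plan = {
    "define CSV schema": set(),
    "write serializer": {"define CSV schema"},
    "add export endpoint": {"define CSV schema"},
    "integration tests": {"write serializer", "add export endpoint"},
    "update docs": {"add export endpoint"},
}
```

For this plan, `parallel_waves(plan)` yields three waves: the schema alone, then the serializer and endpoint in parallel, then integration tests and docs in parallel.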
架构判断力 / Architectural Judgment
理解系统为何如此设计、优化了哪些属性、做出了哪些权衡。智能体可以实施,但无法判断设计是否正确。
Understanding why a system is designed the way it is, what properties it optimizes, and what tradeoffs were made. Agents can implement, but they can't judge whether the design is right.
规范清晰度 / Spec Clarity
编写明确无误、覆盖重要边缘情况、且结构上便于评估的需求。模糊的规范产生模糊的结果,精确的规范会成倍放大为精确的实现。
Writing requirements that are unambiguous, complete on important edge cases, and structured for evaluability. Vague specs produce vague results; precise specs multiply into precise implementations.
输出评估能力 / Output Evaluation
识别某个东西看起来正确实则不然、某个实现解决了既定问题却制造了新问题的品味。这种判断力无法自动化。
The taste to identify when something looks correct but isn't, when an implementation solves the stated problem but creates a new one. This kind of judgment can't be automated.
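One practical aid to that judgment is differential checking: run the candidate against a trusted oracle over a whole input domain instead of a few hand-picked cases. In this hypothetical example the candidate `candidate_is_leap` passes the obvious spot checks, and the oracle is the standard library's `calendar.isleap`; in practice the oracle might be a slow but obviously correct reference implementation. The function names are illustrative.

```python
import calendar
from typing import Callable, Iterable, Optional


def candidate_is_leap(year: int) -> bool:
    # Looks right and passes the obvious spot checks (2024, 2023, 2000).
    return year % 4 == 0


def fixed_is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def first_disagreement(
    candidate: Callable[[int], bool],
    oracle: Callable[[int], bool],
    domain: Iterable[int],
) -> Optional[int]:
    """Differential check: first input where candidate and oracle disagree."""
    for x in domain:
        if candidate(x) != oracle(x):
            return x
    return None
```

`first_disagreement(candidate_is_leap, calendar.isleap, range(1, 3001))` returns `100`, the first century year the candidate gets wrong, while the fixed version returns `None` over the same range.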
编排技能 / Orchestration
管理多个并行工作流、对智能体输出提供有效反馈、识别智能体是需要被引导还是需要重新分配任务、在自主工作者舰队中保持连贯性的实践能力。
Managing multiple parallel workflows, giving effective feedback on agent output, recognizing whether an agent needs steering or its task needs reassigning, and keeping coherence across a fleet of autonomous workers.
这些并非全新的技能,优秀的工程师一直都需要它们。改变的是它们的相对重要性。软件开发的机械部分正越来越多地由机器处理,认知部分则正在被放大。
These aren't new skills. Good engineers have always needed them. What's changed is their relative importance. The mechanical parts of software development are increasingly handled by machines. The cognitive parts are being amplified.
更大的图景
The Bigger Picture
新网站创建量同比增长 40%。新 iOS 应用增长近 50%。GitHub 代码推送量在美国跃升 35%。所有这些指标在 2024 年末之前多年持平。图表看起来像曲棍球杆曲线。从未写过一行代码的人正在构建和发布软件。
New website creation is up 40% year over year. New iOS apps are up nearly 50%. GitHub code pushes have jumped 35% in the US. All of these metrics were flat for years before late 2024. The charts look like hockey sticks. People who have never written a line of code are building and shipping software.
更多的数量并不一定意味着更好的质量。但创建软件的门槛已经显著降低,这一事实本身就是软件工程格局的根本性转变。重要的技能已经向上转移到栈的更高层,就像之前的每一次转变一样。
More volume doesn't necessarily mean better quality. But the fact that the barrier to creating software has meaningfully dropped is a fundamental shift in the landscape of software engineering. The skills that matter have shifted up the stack, just like every previous transition.
结语
Conclusion
编程主要作为一种击键活动的时代已经结束。编程主要作为一种思考和判断活动的时代已经加速了数十年,并且刚刚进入了更高的档位。
The era of programming primarily as a typing activity is over. The era of programming primarily as a thinking and judgment activity has been accelerating for decades and has just shifted into a higher gear.
工厂模式不是一个关于失去软件控制权的隐喻,而是一个关于建立杠杆的隐喻。理解这一点的工程师将构建未来十年最有趣的东西。
The factory model is not a metaphor about losing control of software. It's a metaphor about building leverage. Engineers who understand this will build the most interesting things of the next decade.
原文地址 / Original Source
作者:Addy Osmani(Google 软件工程师,Google Cloud / Gemini)
https://addyosmani.com/blog/factory-model/
发布于 2026 年 2 月 25 日 | 版权归原作者所有 | 本文仅供学习交流
整理编译:技术前沿|原文遵循原作者发布规范
如有勘误欢迎留言指正