網易首頁 > 網易號 > 正文 申請入駐

Anthropic 官方指南:怎么給 Agent 設計工具

0
分享至

BLOG

本文翻譯自 Anthropic 官方博客「Seeing like an agent: how we design tools in Claude Code」,作者 Thariq Shihipar,Claude Code 團隊工程師,今天發布

以下為逐段中英對照翻譯

構建 Agent 最難的部分之一:設計工具

One of the hardest parts about building an agent harness is constructing its tools.

構建 Agent harness 最困難的部分之一,是設計它的工具集

Claude acts completely through tool calling, but there are a number of ways tools can be constructed in the Claude API with primitives like bash, skills and code execution.

Claude 完全通過工具調用來行動。在 Claude API 中,工具可以用 bash、skills、代碼執行等基礎原語來構建

So how do you design your agents' tools? Do you give it one general-purpose tool like bash or code execution? Or fifty specialized tools, one for each use case?

那你該怎么給 Agent 設計工具?給它一個通用工具(比如 bash 或代碼執行)就夠了?還是做五十個專用工具,每個場景一個?

To put yourself in the mind of the model, imagine being given a difficult math problem. What tools would you want in order to solve it? It would depend on your own skill set!

要站在模型的角度想這個問題,可以想象你面前有一道很難的數學題。你想要什么工具來解決它?答案取決于你自己的能力

Paper would be the minimum, but you'd be limited by manual calculations. A calculator would be better, but you would need to know how to operate the more advanced options. The fastest and most powerful option would be a computer, but you would have to know how to use it to write and execute code.

一張紙是最低配,但你只能手算。計算器好一些,但你得知道怎么用高級功能。最快最強的選擇是電腦,但你得會用它來寫和執行代碼

This is a useful framework for designing your agent. You want to give it tools that are shaped to its own abilities. But how do you know what those abilities are? You pay attention, read its outputs, experiment. You learn to see like an agent.

這是一個很有用的設計框架。你要給 Agent 的工具,應該貼合它自身的能力形狀。但你怎么知道它的能力是什么?你觀察它,讀它的輸出,反復實驗。你學會「像 Agent 一樣看」

If you're building an agent, you'll face the same questions we did: when to add a tool, when to remove one, and how to tell the difference. Here's how we've answered them while building Claude Code, including where we got it wrong first.

如果你在做 Agent,你會面對和我們一樣的問題:什么時候加工具,什么時候刪工具,怎么區分這兩種情況。下面是我們在 Claude Code 的實際經驗,包括一開始做錯的地方

用 AskUserQuestion 工具改善提問能力


三種方案的光譜:從無結構到過度剛性,AskUserQuestion 工具落在中間

When building the AskUserQuestion tool, our goal was to improve Claude's ability to ask questions (often called elicitation).

設計 AskUserQuestion 工具時,我們的目標是提升 Claude 向用戶提問的能力(通常稱為 elicitation)

While Claude could just ask questions in plain text, we found answering those questions felt like they took an unnecessary amount of time. How could we lower this friction and increase the bandwidth of communication between the user and Claude?

雖然 Claude 可以用純文本提問,但我們發現回答這些問題的體驗很差,耗時太多。怎么降低這個摩擦,提升用戶和 Claude 之間的溝通帶寬?

第一次嘗試:修改 ExitPlanTool

The first approach we tried was adding a parameter to the ExitPlanTool to have an array of questions alongside the plan. This was the easiest fix to implement, but it confused Claude because we were simultaneously asking for a plan and a set of questions about the plan. What if the user's answers conflicted with what the plan said? Would Claude need to call the ExitPlanTool twice? We knew this tactic wouldn't work, so we went back to the drawing board.

我們第一個方案是給 ExitPlanTool 加一個參數,讓它在輸出計劃的同時輸出一組問題。這是最省事的改法,但它讓 Claude 很困惑:我們同時要求它做計劃和對計劃提問。如果用戶的回答和計劃矛盾怎么辦?Claude 是不是得調兩次這個工具?我們知道這個方案行不通,于是回到原點

第二次嘗試:改變輸出格式

Next, we tried updating Claude's output instructions to serve a slightly modified markdown format that it could use to ask questions. For example, we could ask it to output a list of bullet point questions with alternatives in brackets. We could then parse and format that question as UI for the user.

接下來,我們嘗試修改 Claude 的輸出指令,讓它用一種特殊的 Markdown 格式來提問。比如用 bullet point 列出問題,每個問題后面用方括號給出選項。然后前端解析這個格式,渲染成 UI

Claude could usually produce this format, but not reliably. It would append extra sentences, drop options, or abandon the structure altogether. Onto the next approach.

Claude 大部分時候能生成這個格式,但不穩定。它會在末尾多加一句話,漏掉選項,或者干脆不用這個格式。下一個方案

第三次嘗試:AskUserQuestion 工具


AskUserQuestion 工具的實際界面

Finally, we landed on creating a tool that Claude could call at any point, but it was particularly prompted to do so during plan mode. When the tool triggered we would show a modal to display the questions and block the agent's loop until the user answered.

最終方案是做一個獨立的工具,Claude 可以在任何時候調用,但在規劃模式中會被特別引導去使用。工具觸發后彈出一個模態框顯示問題,阻塞 Agent 循環直到用戶回答

This tool allowed us to prompt Claude for a structured output and it helped us ensure that Claude gave the user multiple options. It also gave users ways to compose this functionality, for example calling it in the Agent SDK or using referring to it in skills.

這個工具讓我們能引導 Claude 輸出結構化內容,確保給用戶多個選項。它也給了用戶組合使用的空間,比如在 Agent SDK 或 Skills 中引用它

Most importantly, Claude seemed to like calling this tool and we found its outputs worked well. After all, even the best designed tool doesn't work if Claude doesn't understand how to call it.

最關鍵的一點:Claude 喜歡調用這個工具,輸出質量也好。畢竟,再好的工具設計,如果模型不理解怎么調用,也是白搭

Is this the final form of elicitation in Claude Code? We doubt it. As Claude gets more capable, the tools that serve it have to evolve too. The next section shows a case where a tool that once helped started getting in the way.

這是 Claude Code 中 elicitation 的最終形態嗎?大概不是。隨著 Claude 能力提升,服務它的工具也必須跟著演進。下一節會展示一個曾經有用的工具后來開始礙事的案例

跟隨能力迭代:從 Todos 到 Tasks


從 Todos 到 Tasks:單 Agent 線性清單 → 多 Agent 協作任務圖

When we first launched Claude Code, we realized that the model needed a todo list to keep it on track. Todos could be written at the start and checked off as the model did work. To do this we gave Claude the TodoWrite tool, which would write or update Todos and display them to the user.

Claude Code 剛上線時,我們發現模型需要一個待辦清單來保持專注。開工前列好待辦,做完一項勾一項。我們做了 TodoWrite 工具來實現這個功能

But even then, we often saw Claude forgetting what it had to do. To adapt, we inserted system reminders every 5 turns that reminded Claude of its goal.

即便如此,Claude 還是經常忘記該干什么。我們于是每隔 5 輪對話就插一條系統提醒

As models improved, they found To-do lists limiting. Being sent reminders of the todo list made Claude think that it had to stick to the list instead of modifying it when it realized it needed to change course. We also saw Opus 4.5 also get much better at using subagents, but how could subagents coordinate on a shared todo list?

隨著模型迭代,Todo 列表開始礙事。系統提醒讓 Claude 覺得必須嚴格按清單執行,不敢中途調整方向。Opus 4.5 用子 Agent 的能力大幅提升,但多個子 Agent 怎么共享一個 Todo 列表?

Seeing this, we replaced the TodoWrite feature with the Task tool. Whereas todos are focused on keeping the model on track, tasks help agents communicate with each other. Tasks could include dependencies, share updates across subagents and the model could alter and delete them.

看到這些問題,我們把 TodoWrite 替換成了 Task 工具。Todo 的重點是讓模型保持方向,Task 的重點是讓 Agent 之間互相溝通。Task 支持依賴關系,可以跨子 Agent 共享狀態更新,模型可以隨時修改和刪除

模型能力提升之后,曾經需要的工具可能反過來限制它

As model capabilities increase, the tools that your models once needed might now be constraining them. It's important to constantly revisit previous assumptions on what tools are needed. This is also why it's useful to stick to a small set of models to support that have a fairly similar capabilities profile.

隨著模型能力提升,你的模型曾經需要的工具現在可能反過來在限制它。定期回頭審視「這些工具是否還有必要」很重要。這也是為什么建議只支持少量能力相近的模型,這樣工具設計可以聚焦

設計搜索界面

The most consequential tools we've built are the ones that let Claude find its own context.

我們做過的最有影響力的工具,是那些讓 Claude 自己尋找上下文的工具

When Claude Code was first released internally, we used RAG: a vector database would pre-index the codebase, and the harness would retrieve relevant snippets and hand them to Claude before each response. While RAG was powerful and fast, it required indexing and setup and could be fragile across a host of different environments. Most importantly, Claude was given this context instead of finding the context itself.

Claude Code 內部版本最早用的是 RAG:向量數據庫預先索引代碼庫,每次回復前自動檢索相關片段塞給 Claude。RAG 速度快、效果好,但需要預處理,環境兼容性脆弱。最根本的問題是:上下文是被塞給 Claude 的,不是 Claude 自己找的

But if Claude could search on the web, why couldn't it also search your codebase? By giving Claude a Grep tool, we could let it search for files and build context itself.

如果 Claude 能搜網頁,為什么不能搜代碼庫?給 Claude 一個 Grep 工具,就能讓它自己搜文件、自己構建上下文

As Claude gets smarter, it becomes increasingly good at building its context when given the right tools.

Claude 越聰明,給它合適的工具后它就越擅長自己構建上下文

When we introduced Agent Skills, we formalized the idea of progressive disclosure, which allows agents to incrementally discover relevant context through exploration.

Agent Skills 上線后,我們把這個思路正式化為漸進式披露(progressive disclosure):讓 Agent 通過探索逐步發現相關上下文

Claude could now read skill files and those files could then reference other files that the model could read recursively. In fact, a common use of skills is to add more search capabilities to Claude like giving it instructions on how to use an API or query a database.

Claude 現在可以讀 Skill 文件,Skill 文件可以引用其他文件,模型可以遞歸地發現和加載上下文。一個常見的 Skill 用法就是給 Claude 增加搜索能力:告訴它怎么調 API、怎么查數據庫

Over the course of a year, Claude went from not really being able to build its own context to being able to do nested search across several layers of files to find the exact context it needed.

一年時間,Claude 從幾乎不會自己構建上下文,到能在多層文件中嵌套搜索,精確找到需要的信息

Progressive disclosure is now a common technique we use to add new functionality without adding a tool. In the next section, we explain why.

漸進式披露現在是我們常用的一種技術:不加工具就能加功能。下一節解釋具體怎么做

漸進式披露:Claude Code Guide 子 Agent

Claude Code currently has ~20 tools, and our team frequently revisits if we need all of them for Claude to be most effective. The bar to add a new tool is high, because this gives the model one more option to think about.

Claude Code 目前有大約 20 個工具,團隊經常審視是否每個都有必要。加新工具的門檻很高,因為每多一個工具,模型就多一個需要思考的選項

For example, we noticed that Claude did not know enough about how to use Claude Code. If you asked it how to add a MCP or what a slash command did, it would not be able to reply.

比如,我們發現 Claude 不夠了解 Claude Code 自身的功能。你問它怎么加 MCP、某個斜杠命令是什么意思,它答不上來

We could have put all of this information in the system prompt, but given that users rarely asked about this, it would have added context rot and interfered with Claude Code's main job: writing code.

可以把這些信息全塞進 system prompt,但用戶很少問這類問題,塞進去會造成上下文腐蝕,干擾 Claude 的主要工作(寫代碼)

Instead, we tried progressive disclosure: we gave Claude a link to its docs that it could load and search when needed. This worked, but Claude would pull large chunks of documentation into context to find an answer the user could have gotten in one sentence.

我們嘗試漸進式披露:給 Claude 一個指向文檔的鏈接,需要時自己去查。能用,但 Claude 會把大段文檔拉進上下文,只為回答一個一句話就能搞定的問題

So we built the Claude Code Guide — a subagent Claude calls whenever a user asks about Claude Code itself. The subagent does the doc-searching in its own context, follows detailed instructions on how to search and what to extract, and hands back only the answer. The main agent's context stays clean.

最終我們做了一個Claude Code Guide子 Agent。當用戶問 Claude Code 自身的問題時,主 Agent 把請求轉給這個子 Agent。子 Agent 在自己的上下文里搜索文檔、提取答案,只把答案傳回來。主 Agent 的上下文保持干凈

While this isn't a perfect solution (Claude can still get confused when you ask it about how to set itself up), we were able to add things to Claude's action space without adding a new tool.

這個方案不完美(Claude 有時候還是會在自身配置問題上犯糊涂),但關鍵是:不用加新工具,就能擴展 Agent 的能力范圍

像 Agent 一樣看,是手藝活

Designing the tools for your models is as much an art as it is a science. It depends heavily on the model you're using, the goal of the agent and the environment it's operating in.

給模型設計工具,與其說是科學,更接近手藝。它取決于你用的模型、Agent 的目標、運行的環境

Our best advice? Experiment often, read your outputs, try new things. And most importantly, try to see like an agent.

我們最好的建議?多實驗,讀你的輸出,試新東西。最重要的是,學會像 Agent 一樣看

Experiment often, read your outputs, try new things. And most importantly, try to see like an agent.

https://claude.com/blog/seeing-like-an-agent

作者:Thariq Shihipar,Anthropic 工程師,Claude Code 團隊

特別聲明:以上內容(如有圖片或視頻亦包括在內)為自媒體平臺“網易號”用戶上傳并發布,本平臺僅提供信息存儲服務。

Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.

相關推薦
熱點推薦
河南一男子因病偏癱,覺得虧欠妻子主動離婚,女兒擺酒席慶祝:他們開心就好,離婚不離家,母親繼續照顧父親,房車等全部財產都在母親名下

河南一男子因病偏癱,覺得虧欠妻子主動離婚,女兒擺酒席慶祝:他們開心就好,離婚不離家,母親繼續照顧父親,房車等全部財產都在母親名下

洪觀新聞
2026-04-20 16:20:08
豆芽立大功!浙科大實證:豆芽可通過菌群代謝,減少84%腹部脂肪!

豆芽立大功!浙科大實證:豆芽可通過菌群代謝,減少84%腹部脂肪!

科學認識論
2026-04-20 14:45:02
浙江一男子稱花1.02元參加“魔鬼辣”挑戰,吃完半小時痙攣倒地送醫,商家朋友:他是個慣犯,涉嫌敲詐;市監所:商家食材索證索票完整

浙江一男子稱花1.02元參加“魔鬼辣”挑戰,吃完半小時痙攣倒地送醫,商家朋友:他是個慣犯,涉嫌敲詐;市監所:商家食材索證索票完整

中國能源網
2026-04-21 18:19:07
阿圭羅:剩下6輪曼城能全勝;巴黎有很大機會衛冕歐冠

阿圭羅:剩下6輪曼城能全勝;巴黎有很大機會衛冕歐冠

懂球帝
2026-04-22 00:31:07
黃山市一位副鄉長發了16條私信,把知名主播“磨”進大山里賣筍,知名演員鄧超也來了

黃山市一位副鄉長發了16條私信,把知名主播“磨”進大山里賣筍,知名演員鄧超也來了

揚子晚報
2026-04-21 07:26:40
林濤卸任國務院副秘書長,已任廈門市委書記(附簡歷)

林濤卸任國務院副秘書長,已任廈門市委書記(附簡歷)

中國城市報
2026-04-21 22:11:52
美媒:伊朗最高領袖穆杰塔巴已批準同美方進行談判,白宮20日一整天都在等德黑蘭,如出現進展跡象,特朗普也可能同意延長停火期限

美媒:伊朗最高領袖穆杰塔巴已批準同美方進行談判,白宮20日一整天都在等德黑蘭,如出現進展跡象,特朗普也可能同意延長停火期限

極目新聞
2026-04-21 09:44:58
男子網上偶遇多年前婚禮后消失的新娘 討還錢款不成將其殺害 一審被判死刑

男子網上偶遇多年前婚禮后消失的新娘 討還錢款不成將其殺害 一審被判死刑

紅星新聞
2024-08-04 15:19:07
賺錢的第一性原理,就是不干活。

賺錢的第一性原理,就是不干活。

流蘇晚晴
2026-04-20 20:08:05
26歲中國男子在日本札幌失聯10天 姐姐:失聯當天曾與家人通話商量婚禮事宜

26歲中國男子在日本札幌失聯10天 姐姐:失聯當天曾與家人通話商量婚禮事宜

封面新聞
2026-04-21 16:15:04
賽季打完,5位小角色打出身價:阿夫頂薪了,小里拒絕肥約賭對了

賽季打完,5位小角色打出身價:阿夫頂薪了,小里拒絕肥約賭對了

大西體育
2026-04-20 23:32:49
日本正式允許出口殺傷性武器,外交部:嚴重關切,高度警惕

日本正式允許出口殺傷性武器,外交部:嚴重關切,高度警惕

澎湃新聞
2026-04-21 15:34:26
直線拉升!美國、伊朗,傳來大消息!

直線拉升!美國、伊朗,傳來大消息!

數據寶
2026-04-21 21:46:32
太扎心了!上海男子年薪百萬失業引不滿,新婚3個月女子就想離婚

太扎心了!上海男子年薪百萬失業引不滿,新婚3個月女子就想離婚

火山詩話
2026-04-20 06:12:18
沙媒:馬寧在亞冠1/4決賽出現失誤,已被取消亞冠決賽執法資格

沙媒:馬寧在亞冠1/4決賽出現失誤,已被取消亞冠決賽執法資格

懂球帝
2026-04-21 12:40:40
美國發現一個“秘密”:每次對華加征關稅,中國就去找非洲,為何

美國發現一個“秘密”:每次對華加征關稅,中國就去找非洲,為何

泠泠說史
2026-04-21 21:59:17
正式告別!樊振東德甲首季收官,扣除費用后實拿薪資大曝光

正式告別!樊振東德甲首季收官,扣除費用后實拿薪資大曝光

老幡爆笑大聰明
2026-04-20 19:45:46
羽毛球女神淪為“生育工具”!韓景楓官宣二胎,距離1胎僅隔5個月

羽毛球女神淪為“生育工具”!韓景楓官宣二胎,距離1胎僅隔5個月

嫹筆牂牂
2026-04-21 07:15:44
鄭麗文成功了!國民黨3位元老出山,朱立倫的反撲計劃宣告失敗

鄭麗文成功了!國民黨3位元老出山,朱立倫的反撲計劃宣告失敗

米果說識
2026-04-21 16:58:00
貨車司機被瓷磚壓靠身亡,第二天清晨才被人發現

貨車司機被瓷磚壓靠身亡,第二天清晨才被人發現

映射生活的身影
2026-04-20 21:45:55
2026-04-22 01:07:00
賽博禪心
賽博禪心
拜AI古佛,修賽博禪心
396文章數 50關注度
往期回顧 全部

科技要聞

創造4萬億帝國、訪華20次,庫克留下了什么

頭條要聞

三國取消飛航許可 賴清德無法竄訪斯威士蘭

頭條要聞

三國取消飛航許可 賴清德無法竄訪斯威士蘭

體育要聞

一到NBA季后賽,四屆DPOY就成了主角

娛樂要聞

宋承炫曬寶寶B超照,宣布老婆懷孕

財經要聞

現實是最大的荒誕:千億平臺的沖突始末

汽車要聞

全新坦克700正式上市 售價42.8萬-50.8萬元

態度原創

游戲
旅游
家居
數碼
公開課

漲價兩周即回調!索尼官方PS5數字版定價重回399美元

旅游要聞

京城今春“濱水+”玩法迭代

家居要聞

詩意光影 窺見自然之境

數碼要聞

大疆發布Osmo Mobile 8P:售899元 分體式遙控器設計

公開課

李玫瑾:為什么性格比能力更重要?

無障礙瀏覽 進入關懷版