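Building with extended thinking

Extended thinking gives Claude enhanced reasoning capabilities for complex tasks, while providing varying levels of transparency into its step-by-step thought process before it delivers its final answer.

Supported models

Extended thinking is supported in the following models: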

Claude Opus 4.1 (claude-opus-4-1-20250805)
Claude Opus 4 (claude-opus-4-20250514)
Claude Sonnet 4 (claude-sonnet-4-20250514)
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219)

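API behavior differs between Claude Sonnet 3.7 and the Claude 4 models, but the API shapes remain exactly the same. For more information, see Differences in thinking across model versions.

How extended thinking works

When extended thinking is turned on, Claude creates thinking content blocks where it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting a final response. The API response will include thinking content blocks, followed by text content blocks. Here's an example of the default response format: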

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

For more information about the response format of extended thinking, see the Messages API reference.
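
As a minimal sketch of consuming this response shape, the helper below separates thinking text from answer text. It assumes the content list has already been parsed into plain Python dictionaries; the function name split_content_blocks is hypothetical and not part of any SDK, and the signature value is a placeholder.

```python
from typing import Any


def split_content_blocks(content: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (thinking_text, answer_text) from a parsed content list."""
    thinking_parts = []
    text_parts = []
    for block in content:
        block_type = block.get("type")
        if block_type == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block_type == "text":
            text_parts.append(block.get("text", ""))
        # Other block types (for example tool_use) are ignored by this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Hypothetical input mirroring the example response above.
example_content = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
thinking, answer = split_content_blocks(example_content)
print(answer)
```

Keeping the two streams separate makes it easier to decide later whether the thinking text is shown to end users at all.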

How to use extended thinking

Here is an example of using extended thinking in the Messages API:

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 16000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000
    },
    "messages": [
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
}'

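To turn on extended thinking, add a thinking object, with the type parameter set to enabled and budget_tokens set to a specified token budget for extended thinking.

The budget_tokens parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. In Claude 4 models, this limit applies to full thinking tokens, not to the summarized output. Larger budgets can improve response quality by enabling more thorough analysis of complex problems, although Claude may not use the entire budget allocated, especially at ranges above 32k.

budget_tokens must be set to a value less than max_tokens. However, when using interleaved thinking with tools, you can exceed this limit, because the token limit becomes your entire context window (200k tokens).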
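
The budget rules can be checked client-side before a request is sent. The following is a sketch under stated assumptions: build_thinking_params is a hypothetical helper, the 1,024-token minimum is the figure given under best practices in this guide, and the interleaved flag simply relaxes the budget_tokens < max_tokens rule as described for interleaved thinking.

```python
MIN_BUDGET_TOKENS = 1024  # minimum thinking budget stated in this guide


def build_thinking_params(max_tokens: int, budget_tokens: int,
                          interleaved: bool = False) -> dict:
    """Validate the budget locally and return request keyword arguments."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    # Without interleaved thinking, the budget must stay below max_tokens.
    if not interleaved and budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = build_thinking_params(max_tokens=16000, budget_tokens=10000)
print(params)
```

The returned dictionary could be unpacked into a request call, for example client.messages.create(model=..., messages=..., **params).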

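Summarized thinking

With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking while preventing misuse. Here are some important considerations for summarized thinking:

You're charged for the full thinking tokens generated by the original request, not the summary tokens.
The billed output token count will not match the count of tokens you see in the response.
The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 models.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

Claude Sonnet 3.7 continues to return full thinking output. In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

Streaming thinking

You can stream extended thinking responses using server-sent events (SSE). When streaming is enabled for extended thinking, you receive thinking content via thinking_delta events. For more documentation on streaming via the Messages API, see Streaming Messages. Here's how to handle streaming with thinking: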

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 16000,
    "stream": true,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000
    },
    "messages": [
        {
            "role": "user",
            "content": "What is 27 * 453?"
        }
    ]
}'

Example streaming output:

event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-sonnet-4-20250514", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}

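When using streaming with thinking enabled, you might notice that text sometimes arrives in larger chunks alternating with smaller, token-by-token delivery. This is expected behavior, especially for thinking content. The streaming system needs to process content in batches for optimal performance, which can result in this "chunky" delivery pattern, with possible delays between streaming events. We're continuously working to improve this experience, with future updates focused on making thinking content stream more smoothly.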
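
One way to consume the event stream shown above is to rebuild the content blocks from their deltas. This sketch works on raw SSE lines that have already been read from the connection; accumulate_stream is a hypothetical helper, the sample events are abbreviated, and the signature value is a placeholder. It assumes the signature arrives in a single signature_delta and stores that value as-is.

```python
import json


def accumulate_stream(sse_lines):
    """Rebuild content blocks from 'data:' lines of a Messages API stream."""
    blocks = {}
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip 'event:' lines and blank separators
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_start":
            blocks[event["index"]] = dict(event["content_block"])
        elif event.get("type") == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
            elif delta["type"] == "signature_delta":
                block["signature"] = delta["signature"]
    return [blocks[index] for index in sorted(blocks)]


sample_lines = [
    'event: content_block_start',
    'data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this"}}',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": " step by step"}}',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "placeholder-signature"}}',
    'data: {"type": "content_block_stop", "index": 0}',
    'data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}',
    'data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}',
    'data: {"type": "content_block_stop", "index": 1}',
]
rebuilt = accumulate_stream(sample_lines)
print(rebuilt[1]["text"])
```

Because deltas are applied per block index, the chunky delivery pattern described above does not affect the result; only the arrival timing differs.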

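Extended thinking with tool use

Extended thinking can be used alongside tool use, allowing Claude to reason through tool selection and results processing. When using extended thinking with tool use, be aware of the following limitations:

Tool choice limitation: Tool use with thinking only supports tool_choice: {"type": "auto"} (the default) or tool_choice: {"type": "none"}. Using tool_choice: {"type": "any"} or tool_choice: {"type": "tool", "name": "..."} will result in an error because these options force tool use, which is incompatible with extended thinking.
Preserving thinking blocks: During tool use, you must pass thinking blocks back to the API for the last assistant message. Include the complete unmodified block back to the API to maintain reasoning continuity.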
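
The tool choice limitation can be caught before a request leaves your application. The sketch below is a local pre-flight check only; check_tool_choice is a hypothetical name, and the API itself remains the authority on what it accepts.

```python
from typing import Optional

ALLOWED_WITH_THINKING = {"auto", "none"}


def check_tool_choice(tool_choice: Optional[dict], thinking_enabled: bool) -> None:
    """Raise ValueError if tool_choice would force tool use while thinking is on."""
    if not thinking_enabled or tool_choice is None:
        return  # omitted tool_choice defaults to auto
    choice_type = tool_choice.get("type")
    if choice_type not in ALLOWED_WITH_THINKING:
        raise ValueError(
            f"tool_choice type {choice_type!r} forces tool use and is "
            "incompatible with extended thinking"
        )


check_tool_choice({"type": "auto"}, thinking_enabled=True)  # passes silently
try:
    check_tool_choice({"type": "tool", "name": "get_weather"}, thinking_enabled=True)
except ValueError as error:
    print(error)
```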

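Example: Passing thinking blocks with tool results

Here's a practical example showing how to preserve thinking blocks when providing tool results: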

import anthropic

client = anthropic.Anthropic()

weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}

# First request - Claude responds with thinking and tool request
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    tools=[weather_tool],
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)

The API response will include thinking, text, and tool_use blocks:

{
    "content": [
        {
            "type": "thinking",
            "thinking": "The user wants to know the current weather in Paris. I have access to a function `get_weather`...",
            "signature": "BDaL4VrbR2Oj0hO4XpJxT28J5TILnCrrUXoKiiNBZW9P+nr8XSj1zuZzAl4egiCCpQNvfyUuFFJP5CncdYZEQPPmLxYsNrcs...."
        },
        {
            "type": "text",
            "text": "I can help you get the current weather information for Paris. Let me check that for you"
        },
        {
            "type": "tool_use",
            "id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
            "name": "get_weather",
            "input": {
                "location": "Paris"
            }
        }
    ]
}

Now let's continue the conversation and use the tool:

# Extract thinking block and tool use block
thinking_block = next((block for block in response.content
                      if block.type == 'thinking'), None)
tool_use_block = next((block for block in response.content
                      if block.type == 'tool_use'), None)

# Call your actual weather API, here is where your actual API call would go
# let's pretend this is what we get back
weather_data = {"temperature": 88}

# Second request - Include thinking block and tool result
# No new thinking blocks will be generated in the response
continuation = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    tools=[weather_tool],
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"},
        # notice that the thinking_block is passed in as well as the tool_use_block
        # if this is not passed in, an error is raised
        {"role": "assistant", "content": [thinking_block, tool_use_block]},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"Current temperature: {weather_data['temperature']}°F"
        }]}
    ]
)

The API response will now include only text:

{
    "content": [
        {
            "type": "text",
            "text": "Currently in Paris, the temperature is 88°F (31°C)"
        }
    ]
}

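Preserving thinking blocks

During tool use, you must pass thinking blocks back to the API, and you must include the complete unmodified block back to the API. This is critical for maintaining the model's reasoning flow and conversation integrity.

While you can omit thinking blocks from prior assistant role turns, we suggest always passing back all thinking blocks to the API for any multi-turn conversation. The API will:

Automatically filter the provided thinking blocks
Use the relevant thinking blocks necessary to preserve the model's reasoning
Only bill for the input tokens for the blocks shown to Claude

When Claude invokes tools, it is pausing its construction of a response to await external information. When tool results are returned, Claude will continue building that existing response. This necessitates preserving thinking blocks during tool use, for a couple of reasons:

Reasoning continuity: The thinking blocks capture Claude's step-by-step reasoning that led to tool requests. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
Context maintenance: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls. For more information on context management, see our guide on context windows.

Important: When providing thinking blocks, the entire sequence of consecutive thinking blocks must match the outputs generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks.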
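
A simple way to respect these rules is to echo the assistant turn back exactly as received and append the tool results after it. The sketch below operates on plain dictionaries; build_tool_followup and the block values (including the toolu_example ID) are hypothetical illustrations, not SDK functions or real identifiers.

```python
def build_tool_followup(prior_messages, assistant_content, tool_results):
    """Return the message list for the request that carries tool results.

    assistant_content is passed back in its original order and unmodified,
    so thinking blocks keep their position and signature.
    tool_results maps each tool_use ID to its result string.
    """
    tool_use_ids = [b["id"] for b in assistant_content if b.get("type") == "tool_use"]
    missing = [tool_id for tool_id in tool_use_ids if tool_id not in tool_results]
    if missing:
        raise ValueError(f"missing results for tool calls: {missing}")
    result_blocks = [
        {"type": "tool_result", "tool_use_id": tool_id, "content": tool_results[tool_id]}
        for tool_id in tool_use_ids
    ]
    return list(prior_messages) + [
        {"role": "assistant", "content": list(assistant_content)},
        {"role": "user", "content": result_blocks},
    ]


# Hypothetical first-turn data for illustration.
user_turn = {"role": "user", "content": "What's the weather in Paris?"}
assistant_blocks = [
    {"type": "thinking", "thinking": "The user wants the weather...", "signature": "..."},
    {"type": "text", "text": "Let me check that for you"},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"location": "Paris"}},
]
followup = build_tool_followup(
    [user_turn], assistant_blocks, {"toolu_example": "Current temperature: 88°F"}
)
print(len(followup))
```

Copying the whole content list, rather than picking out individual blocks, avoids accidentally reordering or dropping a thinking block.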

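Interleaved thinking

Extended thinking with tool use in Claude 4 models supports interleaved thinking, which enables Claude to think between tool calls and make more sophisticated reasoning after receiving tool results. With interleaved thinking, Claude can:

Reason about the results of a tool call before deciding what to do next
Chain multiple tool calls with reasoning steps in between
Make more nuanced decisions based on intermediate results

To enable interleaved thinking, add the beta header interleaved-thinking-2025-05-14 to your API request. Here are some important considerations for interleaved thinking:

With interleaved thinking, budget_tokens can exceed the max_tokens parameter, as it represents the total budget across all thinking blocks within one assistant turn.
Interleaved thinking is only supported for tools used via the Messages API.
Interleaved thinking is supported for Claude 4 models only, with the beta header interleaved-thinking-2025-05-14.
Direct calls to Anthropic's API allow you to pass interleaved-thinking-2025-05-14 in requests to any model, with no effect.
On third-party platforms (for example, Amazon Bedrock and Vertex AI), if you pass interleaved-thinking-2025-05-14 to any model other than Claude Opus 4.1, Opus 4, or Sonnet 4, your request will fail.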
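
Because the beta header can cause failures on third-party platforms when sent to an unsupported model, it can be useful to attach it conditionally. The sketch below is an assumption-laden illustration: betas_for is a hypothetical helper, and matching on the Anthropic model ID prefixes listed under Supported models is an assumption that will not hold for platform-specific model identifiers.

```python
INTERLEAVED_BETA = "interleaved-thinking-2025-05-14"

# Assumption: Anthropic-style model IDs for Claude Opus 4.1, Opus 4, and Sonnet 4.
# The "claude-opus-4" prefix also covers "claude-opus-4-1-...".
CLAUDE_4_PREFIXES = ("claude-opus-4", "claude-sonnet-4")


def betas_for(model: str, want_interleaved: bool) -> list:
    """Return the beta flags to send for this model."""
    if want_interleaved and model.startswith(CLAUDE_4_PREFIXES):
        return [INTERLEAVED_BETA]
    return []


print(betas_for("claude-sonnet-4-20250514", want_interleaved=True))
print(betas_for("claude-3-7-sonnet-20250219", want_interleaved=True))
```

On direct Anthropic API calls the gate is not strictly needed, since the header has no effect on other models, but using one code path for every platform keeps behavior predictable.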

Tool use without interleaved thinking

import anthropic

client = anthropic.Anthropic()

# Define tools
calculator_tool = {
    "name": "calculator",
    "description": "Perform mathematical calculations",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "Mathematical expression to evaluate"
            }
        },
        "required": ["expression"]
    }
}

database_tool = {
    "name": "database_query",
    "description": "Query product database",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "SQL query to execute"
            }
        },
        "required": ["query"]
    }
}

# First request - Claude thinks once before all tool calls
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    tools=[calculator_tool, database_tool],
    messages=[{
        "role": "user",
        "content": "What's the total revenue if we sold 150 units of product A at $50 each, and how does this compare to our average monthly revenue from the database?"
    }]
)

# Response includes thinking followed by tool uses
# Note: Claude thinks once at the beginning, then makes all tool decisions
print("First response:")
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking (summarized): {block.thinking}")
    elif block.type == "tool_use":
        print(f"Tool use: {block.name} with input {block.input}")
    elif block.type == "text":
        print(f"Text: {block.text}")

# You would execute the tools and return results...
# After getting both tool results back, Claude directly responds without additional thinking

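In this example without interleaved thinking:

Claude thinks once at the beginning to understand the task
Makes all tool use decisions upfront
When tool results are returned, Claude immediately provides a response without additional thinking

Tool use with interleaved thinking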

import anthropic

client = anthropic.Anthropic()

# Same tool definitions as before
calculator_tool = {
    "name": "calculator",
    "description": "Perform mathematical calculations",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "Mathematical expression to evaluate"
            }
        },
        "required": ["expression"]
    }
}

database_tool = {
    "name": "database_query",
    "description": "Query product database",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "SQL query to execute"
            }
        },
        "required": ["query"]
    }
}

# First request with interleaved thinking enabled
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    tools=[calculator_tool, database_tool],
    betas=["interleaved-thinking-2025-05-14"],
    messages=[{
        "role": "user",
        "content": "What's the total revenue if we sold 150 units of product A at $50 each, and how does this compare to our average monthly revenue from the database?"
    }]
)

print("Initial response:")
thinking_blocks = []
tool_use_blocks = []

for block in response.content:
    if block.type == "thinking":
        thinking_blocks.append(block)
        print(f"Thinking: {block.thinking}")
    elif block.type == "tool_use":
        tool_use_blocks.append(block)
        print(f"Tool use: {block.name} with input {block.input}")
    elif block.type == "text":
        print(f"Text: {block.text}")

# First tool result (calculator)
calculator_result = "7500"  # 150 * 50

# Continue with first tool result
response2 = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    tools=[calculator_tool, database_tool],
    betas=["interleaved-thinking-2025-05-14"],
    messages=[
        {
            "role":  "user",
            "content": "What's the total revenue if we sold 150 units of product A at $50 each, and how does this compare to our average monthly revenue from the database?"
        },
        {
            "role": "assistant",
            "content": [thinking_blocks[0], tool_use_blocks[0]]
        },
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_blocks[0].id,
                "content": calculator_result
            }]
        }
    ]
)

print("\nAfter calculator result:")
# With interleaved thinking, Claude can think about the calculator result
# before deciding to query the database
for block in response2.content:
    if block.type == "thinking":
        thinking_blocks.append(block)
        print(f"Interleaved thinking: {block.thinking}")
    elif block.type == "tool_use":
        tool_use_blocks.append(block)
        print(f"Tool use: {block.name} with input {block.input}")

# Second tool result (database)
database_result = "5200"  # Example average monthly revenue

# Continue with second tool result
response3 = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    tools=[calculator_tool, database_tool],
    betas=["interleaved-thinking-2025-05-14"],
    messages=[
        {
            "role": "user",
            "content": "What's the total revenue if we sold 150 units of product A at $50 each, and how does this compare to our average monthly revenue from the database?"
        },
        {
            "role": "assistant",
            "content": [thinking_blocks[0], tool_use_blocks[0]]
        },
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_blocks[0].id,
                "content": calculator_result
            }]
        },
        {
            "role": "assistant",
            "content": thinking_blocks[1:] + tool_use_blocks[1:]
        },
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_blocks[1].id,
                "content": database_result
            }]
        }
    ]
)

print("\nAfter database result:")
# With interleaved thinking, Claude can think about both results
# before formulating the final response
for block in response3.content:
    if block.type == "thinking":
        print(f"Final thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Final response: {block.text}")

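In this example with interleaved thinking:

Claude thinks about the task initially
After receiving the calculator result, Claude can think again about what that result means
Claude then decides how to query the database based on the first result
After receiving the database result, Claude thinks once more about both results before formulating a final response
The thinking budget is distributed across all thinking blocks within the turn

This pattern allows for more sophisticated reasoning chains where each tool's output informs the next decision.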

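Extended thinking with prompt caching

Prompt caching with thinking has several important considerations:

Extended thinking tasks often take longer than 5 minutes to complete. Consider using the 1-hour cache duration to maintain cache hits across longer thinking sessions and multi-step workflows.

Thinking block context removal

Thinking blocks from previous turns are removed from context, which can affect cache breakpoints
When continuing conversations with tool use, thinking blocks are cached and count as input tokens when read from cache
This creates a tradeoff: while thinking blocks don't consume context window space visually, they still count toward your input token usage when cached
If thinking becomes disabled, requests will fail if you pass thinking content in the current tool use turn. In other contexts, thinking content passed to the API is simply ignored

Cache invalidation patterns

Changes to thinking parameters (enabled/disabled or budget allocation) invalidate message cache breakpoints
Interleaved thinking amplifies cache invalidation, as thinking blocks can occur between multiple tool calls
System prompts and tools remain cached despite thinking parameter changes or block removal

While thinking blocks are removed for caching and context calculations, they must be preserved when continuing conversations with tool use, especially with interleaved thinking.

Understanding thinking block caching behavior

When using extended thinking with tool use, thinking blocks exhibit specific caching behavior that affects token counting.

How it works:

Caching only occurs when you make a subsequent request that includes tool results
When the subsequent request is made, the previous conversation history (including thinking blocks) can be cached
These cached thinking blocks count as input tokens in your usage metrics when read from the cache
When a non-tool-result user block is included, all previous thinking blocks are ignored and stripped from context

Detailed example flow:

Request 1: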

User: "What's the weather in Paris?"

Response 1:

[thinking_block_1] + [tool_use block 1]

Request 2:

User: ["What's the weather in Paris?"], 
Assistant: [thinking_block_1] + [tool_use block 1], 
User: [tool_result_1, cache=True]

Response 2:

[thinking_block_2] + [text block 2]

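Request 2 writes a cache of the request content (not the response). The cache includes the original user message, the first thinking block, the tool use block, and the tool result.

Request 3: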

User: ["What's the weather in Paris?"], 
Assistant: [thinking_block_1] + [tool_use block 1], 
User: [tool_result_1, cache=True], 
Assistant: [thinking_block_2] + [text block 2], 
User: [Text response, cache=True]

Because a non-tool-result user block was included, all previous thinking blocks are ignored. This request will be processed the same as:

User: ["What's the weather in Paris?"], 
Assistant: [tool_use block 1], 
User: [tool_result_1, cache=True], 
Assistant: [text block 2], 
User: [Text response, cache=True]

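Key points:

This caching behavior happens automatically, even without explicit cache_control markers
This behavior is consistent whether using regular thinking or interleaved thinking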
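
To reason about token counts locally, the stripping rule from the flow above can be approximated in a few lines. This is a simplified model of the described behavior, not the API's actual implementation: effective_history is a hypothetical helper, and it only distinguishes whether the final user message consists solely of tool_result blocks.

```python
THINKING_TYPES = {"thinking", "redacted_thinking"}


def is_tool_result_turn(message):
    """True if the message content is a non-empty list of tool_result blocks only."""
    content = message.get("content")
    return (
        isinstance(content, list)
        and len(content) > 0
        and all(isinstance(b, dict) and b.get("type") == "tool_result" for b in content)
    )


def effective_history(messages):
    """Approximate the history after prior thinking blocks are stripped."""
    if not messages or is_tool_result_turn(messages[-1]):
        return messages  # tool-use continuation: thinking blocks are kept
    processed = []
    for message in messages:
        content = message.get("content")
        if message.get("role") == "assistant" and isinstance(content, list):
            content = [
                b for b in content
                if not (isinstance(b, dict) and b.get("type") in THINKING_TYPES)
            ]
        processed.append({**message, "content": content})
    return processed


# Hypothetical data shaped like Request 3 in the flow above.
request_3 = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "...", "signature": "..."},
        {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
         "input": {"location": "Paris"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example", "content": "88°F"},
    ]},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "...", "signature": "..."},
        {"type": "text", "text": "It is 88°F in Paris."},
    ]},
    {"role": "user", "content": "Thanks! What about tomorrow?"},
]
stripped = effective_history(request_3)
print([block["type"] for block in stripped[1]["content"]])
```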

System prompt caching (preserved when thinking changes)

from anthropic import Anthropic
import requests
from bs4 import BeautifulSoup

client = Anthropic()

def fetch_article_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Remove script and style elements
    for script in soup(["script", "style"]):
        script.decompose()

    # Get text
    text = soup.get_text()

    # Break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # Break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    return text

# Fetch the content of the article
book_url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
book_content = fetch_article_content(book_url)
# Use just enough text for caching (first few chapters)
LARGE_TEXT = book_content[:5000]

SYSTEM_PROMPT=[
    {
        "type": "text",
        "text": "You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully.",
    },
    {
        "type": "text",
        "text": LARGE_TEXT,
        "cache_control": {"type": "ephemeral"}
    }
]

MESSAGES = [
    {
        "role": "user",
        "content": "Analyze the tone of this passage."
    }
]

# First request - establish cache
print("First request - establishing cache")
response1 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 4000
    },
    system=SYSTEM_PROMPT,
    messages=MESSAGES
)

print(f"First response usage: {response1.usage}")

MESSAGES.append({
    "role": "assistant",
    "content": response1.content
})
MESSAGES.append({
    "role": "user",
    "content": "Analyze the characters in this passage."
})
# Second request - same thinking parameters (cache hit expected)
print("\nSecond request - same thinking parameters (cache hit expected)")
response2 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 4000
    },
    system=SYSTEM_PROMPT,
    messages=MESSAGES
)

print(f"Second response usage: {response2.usage}")

# Third request - different thinking parameters (cache miss for messages)
print("\nThird request - different thinking parameters (cache miss for messages)")
response3 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # Changed thinking budget
    },
    system=SYSTEM_PROMPT,  # System prompt remains cached
    messages=MESSAGES  # Messages cache is invalidated
)

print(f"Third response usage: {response3.usage}")

Messages caching (invalidated when thinking changes)

from anthropic import Anthropic
import requests
from bs4 import BeautifulSoup

client = Anthropic()

def fetch_article_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Remove script and style elements
    for script in soup(["script", "style"]):
        script.decompose()

    # Get text
    text = soup.get_text()

    # Break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # Break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    return text

# Fetch the content of the article
book_url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
book_content = fetch_article_content(book_url)
# Use just enough text for caching (first few chapters)
LARGE_TEXT = book_content[:5000]

# No system prompt - caching in messages instead
MESSAGES = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": LARGE_TEXT,
                "cache_control": {"type": "ephemeral"},
            },
            {
                "type": "text",
                "text": "Analyze the tone of this passage."
            }
        ]
    }
]

# First request - establish cache
print("First request - establishing cache")
response1 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 4000
    },
    messages=MESSAGES
)

print(f"First response usage: {response1.usage}")

MESSAGES.append({
    "role": "assistant",
    "content": response1.content
})
MESSAGES.append({
    "role": "user",
    "content": "Analyze the characters in this passage."
})
# Second request - same thinking parameters (cache hit expected)
print("\nSecond request - same thinking parameters (cache hit expected)")
response2 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 4000  # Same thinking budget
    },
    messages=MESSAGES
)

print(f"Second response usage: {response2.usage}")

MESSAGES.append({
    "role": "assistant",
    "content": response2.content
})
MESSAGES.append({
    "role": "user",
    "content": "Analyze the setting in this passage."
})

# Third request - different thinking budget (cache miss expected)
print("\nThird request - different thinking budget (cache miss expected)")
response3 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # Different thinking budget breaks cache
    },
    messages=MESSAGES
)

print(f"Third response usage: {response3.usage}")

Here is the output of the script (you may see slightly different numbers):

First request - establishing cache
First response usage: { cache_creation_input_tokens: 1370, cache_read_input_tokens: 0, input_tokens: 17, output_tokens: 700 }

Second request - same thinking parameters (cache hit expected)
Second response usage: { cache_creation_input_tokens: 0, cache_read_input_tokens: 1370, input_tokens: 303, output_tokens: 874 }

Third request - different thinking budget (cache miss expected)
Third response usage: { cache_creation_input_tokens: 1370, cache_read_input_tokens: 0, input_tokens: 747, output_tokens: 619 }

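This example demonstrates that when caching is set up in the messages array, changing the thinking parameters (budget_tokens increased from 4000 to 8000) invalidates the cache. The third request shows no cache hit, with cache_creation_input_tokens=1370 and cache_read_input_tokens=0, demonstrating that message-based caching is invalidated when thinking parameters change.

Max tokens and context window size with extended thinking

In older Claude models (prior to Claude Sonnet 3.7), if the sum of prompt tokens and max_tokens exceeded the model's context window, the system would automatically adjust max_tokens to fit within the context limit. This meant you could set a large max_tokens value and the system would silently reduce it as needed. With Claude 3.7 and 4 models, max_tokens (which includes your thinking budget when thinking is enabled) is enforced as a strict limit. The system now returns a validation error if prompt tokens + max_tokens exceeds the context window size.

You can read through our guide on context windows for a more thorough deep dive.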
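
Since the limit is now strict, a client can check the sum before sending a request. The sketch below assumes the 200k-token context window mentioned earlier in this guide and a prompt token count obtained separately (for example from the token counting API); both function names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # assumption: the 200k figure cited in this guide


def fits_context_window(prompt_tokens: int, max_tokens: int,
                        context_window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if prompt tokens + max_tokens stays within the context window."""
    return prompt_tokens + max_tokens <= context_window


def largest_allowed_max_tokens(prompt_tokens: int,
                               context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Largest max_tokens that would still pass the check for this prompt."""
    return max(context_window - prompt_tokens, 0)


# Hypothetical numbers: a 190,000-token prompt with max_tokens=16000 is too large.
print(fits_context_window(190_000, 16_000))
print(largest_allowed_max_tokens(190_000))
```

Recomputing the cap as the prompt grows reproduces, on the client side, the adjustment that older models performed silently.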

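The context window with extended thinking

When calculating context window usage with thinking enabled, there are some considerations to be aware of:

Thinking blocks from previous turns are stripped and not counted towards your context window
Current turn thinking counts towards your max_tokens limit for that turn

[Diagram: specialized token management when extended thinking is enabled]

The effective context window is calculated as: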

context window =
  (current input tokens - previous thinking tokens) +
  (thinking tokens + encrypted thinking tokens + text output tokens)

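We recommend using the token counting API to get accurate token counts for your specific use case, especially when working with multi-turn conversations that include thinking.

The context window with extended thinking and tool use

When using extended thinking with tool use, thinking blocks must be explicitly preserved and returned with the tool results. The effective context window calculation for extended thinking with tool use becomes: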

context window =
  (current input tokens + previous thinking tokens + tool use tokens) +
  (thinking tokens + encrypted thinking tokens + text output tokens)

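[Diagram: token management for extended thinking with tool use]

Managing tokens with extended thinking

Given the context window and max_tokens behavior with extended thinking in Claude 3.7 and 4 models, you may need to:

More actively monitor and manage your token usage
Adjust max_tokens values as your prompt length changes
Potentially use the token counting endpoints more frequently
Be aware that previous thinking blocks don't accumulate in your context window

This change has been made to provide more predictable and transparent behavior, especially as maximum token limits have increased significantly.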

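Thinking encryption

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back. If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

Here are some important considerations on thinking encryption:

When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
signature values are significantly longer in Claude 4 models than in previous models.
The signature field is an opaque field and should not be interpreted or parsed; it exists solely for verification purposes.
signature values are compatible across platforms (Anthropic APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.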

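Thinking redaction

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. redacted_thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context. When building customer-facing applications that use extended thinking:

Be aware that redacted thinking blocks contain encrypted content that isn't human-readable
Consider providing a simple explanation like: "Some of Claude's internal reasoning has been automatically encrypted for safety reasons. This doesn't affect the quality of responses."
If showing thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks
Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted
Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI

Here's an example showing both normal and redacted thinking blocks: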

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
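
A display layer might treat a response like the one above as follows. This is a sketch only: render_for_display is a hypothetical function, the bracketed labels are arbitrary, and the data values are placeholders. It shows normal thinking, replaces any redacted blocks with a single notice, and never touches the blocks that would be sent back to the API.

```python
REDACTION_NOTICE = (
    "Some of Claude's internal reasoning has been automatically encrypted "
    "for safety reasons. This doesn't affect the quality of responses."
)


def render_for_display(content, show_thinking=True):
    """Return display lines for a content list without modifying the blocks."""
    lines = []
    notice_added = False
    for block in content:
        block_type = block.get("type")
        if block_type == "thinking":
            if show_thinking:
                lines.append("[thinking] " + block.get("thinking", ""))
        elif block_type == "redacted_thinking":
            # Encrypted data is not human-readable, so show one notice instead.
            if not notice_added:
                lines.append("[notice] " + REDACTION_NOTICE)
                notice_added = True
        elif block_type == "text":
            lines.append(block.get("text", ""))
    return lines


# Hypothetical content with placeholder signature and data values.
sample_content = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "redacted_thinking", "data": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
for line in render_for_display(sample_content):
    print(line)
```

Unknown block types fall through without raising, which is one way to keep a UI from breaking when new block types appear.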

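Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails. If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must include the complete unmodified block back to the API for the last assistant turn. This is critical for maintaining the model's reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see the Preserving thinking blocks section above.

Example: Working with redacted thinking blocks

This example demonstrates how to handle redacted_thinking blocks that may appear in responses when Claude's internal reasoning contains content flagged by safety systems: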

import anthropic

client = anthropic.Anthropic()

# Using a special prompt that triggers redacted thinking (for demonstration purposes only)
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB"
    }]
)

# Identify redacted thinking blocks
has_redacted_thinking = any(
    block.type == "redacted_thinking" for block in response.content
)

if has_redacted_thinking:
    print("Response contains redacted thinking blocks")
    # These blocks are still usable in subsequent requests

    # Extract all blocks (both redacted and non-redacted)
    all_thinking_blocks = [
        block for block in response.content
        if block.type in ["thinking", "redacted_thinking"]
    ]

    # When passing to subsequent requests, include all blocks without modification
    # This preserves the integrity of Claude's reasoning

    print(f"Found {len(all_thinking_blocks)} thinking blocks total")
    print(f"These blocks are still billable as output tokens")

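Differences in thinking across model versions

The Messages API handles thinking differently across Claude Sonnet 3.7 and Claude 4 models, primarily in redaction and summarization behavior. See the table below for a condensed comparison:

Feature	Claude Sonnet 3.7	Claude 4 models
Thinking output	Returns full thinking output	Returns summarized thinking
Interleaved thinking	Not supported	Supported with the interleaved-thinking-2025-05-14 beta header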

Pricing

Extended thinking uses the standard token pricing scheme:

Model	Base input tokens	Cache writes	Cache hits	Output tokens
Claude Opus 4.1	$15 / MTok	$18.75 / MTok	$1.50 / MTok	$75 / MTok
Claude Opus 4	$15 / MTok	$18.75 / MTok	$1.50 / MTok	$75 / MTok
Claude Sonnet 4	$3 / MTok	$3.75 / MTok	$0.30 / MTok	$15 / MTok
Claude Sonnet 3.7	$3 / MTok	$3.75 / MTok	$0.30 / MTok	$15 / MTok

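The thinking process incurs charges for:

Tokens used during thinking (output tokens)
Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
Standard text output tokens

When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

When using summarized thinking:

Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
Output tokens (billed): The original thinking tokens that Claude generated internally
Output tokens (visible): The summarized thinking tokens you see in the response
No charge: Tokens used to generate the summary

The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.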
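
To make the billing rule concrete, the sketch below estimates the cost of one request from the base input and output prices in the table above. The token counts are hypothetical, estimate_cost is an illustrative helper, and the estimate ignores cache pricing and the automatically included system prompt.

```python
# (input price, output price) in USD per million tokens, from the pricing table.
PRICES_PER_MTOK = {
    "Claude Opus 4.1": (15.00, 75.00),
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Sonnet 3.7": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int,
                  full_thinking_tokens: int, text_output_tokens: int) -> float:
    """Estimate request cost in USD.

    Billed output is the full thinking token count plus the text output,
    regardless of how short the visible thinking summary is.
    """
    input_price, output_price = PRICES_PER_MTOK[model]
    billed_output_tokens = full_thinking_tokens + text_output_tokens
    return (input_tokens * input_price + billed_output_tokens * output_price) / 1_000_000


# Hypothetical request: 1,000 input tokens, 8,000 thinking tokens, 2,000 text tokens.
cost = estimate_cost("Claude Sonnet 4", 1_000, 8_000, 2_000)
print(f"${cost:.3f}")
```

In this hypothetical case the input contributes $0.003 and the 10,000 billed output tokens contribute $0.150, even if the summary shown in the response is only a few hundred tokens long.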

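Best practices and considerations for extended thinking

Working with thinking budgets

Budget optimization: The minimum budget is 1,024 tokens. We suggest starting at the minimum and increasing the thinking budget incrementally to find the optimal range for your use case. Higher token counts enable more comprehensive reasoning, but with diminishing returns depending on the task. Increasing the budget can improve response quality at the cost of increased latency. For critical tasks, test different settings to find the optimal balance. Note that the thinking budget is a target rather than a strict limit; actual token usage may vary based on the task.
Starting points: Start with larger thinking budgets (16k+ tokens) for complex tasks and adjust based on your needs.
Large budgets: For thinking budgets above 32k, we recommend using batch processing to avoid networking issues. Requests that push the model to think above 32k tokens cause long-running requests that might run up against system timeouts and open connection limits.
Token usage tracking: Monitor thinking token usage to optimize costs and performance.

Performance considerations

Response times: Be prepared for potentially longer response times due to the additional processing required for the reasoning process. Factor in that generating thinking blocks may increase overall response time.
Streaming requirements: Streaming is required when max_tokens is greater than 21,333. When streaming, be prepared to handle both thinking and text content blocks as they arrive.

Feature compatibility

Thinking isn't compatible with temperature or top_k modifications, or with forced tool use.
When thinking is enabled, you can set top_p to values between 0.95 and 1.
You cannot pre-fill responses when thinking is enabled.
Changes to the thinking budget invalidate cached prompt prefixes that include messages. However, cached system prompts and tool definitions will continue to work when thinking parameters change.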
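
The compatibility rules above, together with the streaming threshold from the performance considerations, can be collected into a local pre-flight check. This is a conservative sketch: compatibility_issues is a hypothetical helper, it flags any explicit temperature or top_k value (an assumption, since the API may accept default values), and it treats a trailing assistant message as a pre-fill.

```python
from typing import Any

STREAMING_REQUIRED_ABOVE = 21_333
FORCED_TOOL_CHOICES = {"any", "tool"}


def compatibility_issues(params: dict[str, Any]) -> list[str]:
    """Return the names of request settings that conflict with thinking."""
    thinking = params.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return []
    issues = []
    if "temperature" in params:
        issues.append("temperature")
    if "top_k" in params:
        issues.append("top_k")
    top_p = params.get("top_p")
    if top_p is not None and not 0.95 <= top_p <= 1:
        issues.append("top_p")
    tool_choice = params.get("tool_choice") or {}
    if tool_choice.get("type") in FORCED_TOOL_CHOICES:
        issues.append("tool_choice")
    messages = params.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        issues.append("prefill")
    if params.get("max_tokens", 0) > STREAMING_REQUIRED_ABOVE and not params.get("stream"):
        issues.append("stream")
    return issues


# Hypothetical request parameters with three conflicts.
request_params = {
    "max_tokens": 32000,
    "thinking": {"type": "enabled", "budget_tokens": 16000},
    "temperature": 0.2,
    "top_p": 0.9,
    "messages": [{"role": "user", "content": "Hello"}],
}
print(compatibility_issues(request_params))
```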

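Usage guidelines

Task selection: Use extended thinking for particularly complex tasks that benefit from step-by-step reasoning, such as math, coding, and analysis.
Context handling: You do not need to remove previous thinking blocks yourself. The Anthropic API automatically ignores thinking blocks from previous turns, and they are not included when calculating context usage.
Prompt engineering: Review our extended thinking prompting tips if you want to maximize Claude's thinking capabilities.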

Next steps

Try the extended thinking cookbook

Extended thinking prompting tips
