Streaming Messages
When creating a Message, you can set "stream": true
to incrementally stream the response using server-sent events (SSE).
Streaming with SDKs
Our Python and TypeScript SDKs offer multiple ways of streaming. The Python SDK allows both sync and async streams. See the documentation in each SDK for details.
Event types
Each server-sent event includes a named event type and associated JSON data. Each event will use an SSE event name (e.g. event: message_stop
), and include the matching event type
in its data.
Each stream uses the following event flow:
message_start
: contains aMessage
object with emptycontent
.- A series of content blocks, each of which have a
content_block_start
, one or morecontent_block_delta
events, and acontent_block_stop
event. Each content block will have anindex
that corresponds to its index in the final Messagecontent
array. - One or more
message_delta
events, indicating top-level changes to the finalMessage
object. - A final
message_stop
event.
Ping events
Event streams may also include any number of ping
events.
Error events
We may occasionally send errors in the event stream. For example, during periods of high usage, you may receive an overloaded_error
, which would normally correspond to an HTTP 529 in a non-streaming context:
Other events
In accordance with our versioning policy, we may add new event types, and your code should handle unknown event types gracefully.
Delta types
Each content_block_delta
event contains a delta
of a type that updates the content
block at a given index
.
Text delta
A text
content block delta looks like:
Input JSON delta
The deltas for tool_use
content blocks correspond to updates for the input
field of the block. To support maximum granularity, the deltas are partial JSON strings, whereas the final tool_use.input
is always an object.
You can accumulate the string deltas and parse the JSON once you receive a content_block_stop
event, by using a library like Pydantic to do partial JSON parsing, or by using our SDKs, which provide helpers to access parsed incremental values.
A tool_use
content block delta looks like:
Note: Our current models only support emitting one complete key and value property from input
at a time. As such, when using tools, there may be delays between streaming events while the model is working. Once an input
key and value are accumulated, we emit them as multiple content_block_delta
events with chunked partial json so that the format can automatically support finer granularity in future models.
Raw HTTP Stream response
We strongly recommend that use our client SDKs when using streaming mode. However, if you are building a direct API integration, you will need to handle these events yourself.
A stream response is comprised of:
- A
message_start
event - Potentially multiple content blocks, each of which contains:
a. A
content_block_start
event b. Potentially multiplecontent_block_delta
events c. Acontent_block_stop
event - A
message_delta
event - A
message_stop
event
There may be ping
events dispersed throughout the response as well. See Event types for more details on the format.
Basic streaming request
Streaming request with tool use
In this request, we ask Claude to use a tool to tell us the weather.