The Claude 3 family of models comes with new vision capabilities that allow Claude to understand and analyze images, opening up exciting possibilities for multimodal interaction. With Claude, you can now provide both text and image inputs to enrich your conversations and enable powerful new use cases.


Vision-capable models

You do not need to use special versions of our Claude 3 models to access Claude's vision capabilities. All Claude 3 models are capable of understanding and analyzing images.

This guide will walk you through how to work with images in Claude, including best practices, code examples, and limitations to keep in mind.

Try chatting now with images at claude.ai!


Getting started

You can currently use Claude's vision capabilities in three ways:

  • Via claude.ai directly in the chat window. Simply upload an image like you would a file, or drag and drop an image directly into the window!
  • Via our Console Workbench. If you select a model that accepts images (Claude 3 models only), a button to add images will appear at the top right of every User message block.
  • Via API request - see instructions below.

For this guide, we'll be using the Anthropic Python SDK, and the following example variables. We'll fetch sample images from Wikipedia using the httpx library, but you can use whatever image sources work for you.

import anthropic
import base64
import httpx

# The client reads your API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Download each sample image and base64-encode it for use in an image content block.
image1_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image1_media_type = "image/jpeg"
image1_data = base64.b64encode(httpx.get(image1_url).content).decode("utf-8")

image2_url = "https://upload.wikimedia.org/wikipedia/commons/b/b5/Iridescent.green.sweat.bee1.jpg"
image2_media_type = "image/jpeg"
image2_data = base64.b64encode(httpx.get(image2_url).content).decode("utf-8")

To use images in an API request, provide each image to Claude as base64-encoded data in an image content block. Here is a simple example in Python showing how to include a base64-encoded image in a Messages API request:

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
)
print(message)

Supported image formats are JPEG, PNG, GIF, and WebP. See Messages API examples for more example code and parameter details.
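Because the media_type in each image block should match the image's actual format, it can help to detect the format from the file's leading bytes instead of trusting a file extension or URL. The following is a minimal sketch, not part of the Anthropic SDK: detect_media_type and image_block_from_bytes are hypothetical helpers, and the signature checks cover only the four supported formats listed above.

```python
import base64

# Leading "magic" bytes that identify JPEG, PNG, and GIF files.
_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def detect_media_type(data: bytes) -> str:
    """Return the media type of JPEG, PNG, GIF, or WebP bytes (hypothetical helper)."""
    for signature, media_type in _SIGNATURES:
        if data.startswith(signature):
            return media_type
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unsupported image format: expected JPEG, PNG, GIF, or WebP")


def image_block_from_bytes(data: bytes) -> dict:
    """Build a base64 image content block for the Messages API (hypothetical helper)."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": detect_media_type(data),
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

For example, image_block_from_bytes(httpx.get(image1_url).content) would build the same kind of block as the first dictionary in the request above, without hard-coding image1_media_type.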


Image size

For optimal performance, we recommend resizing images before uploading if they are likely to exceed size or token limits. If an image's long edge is more than 1568 pixels, or the image is more than ~1600 tokens, it will first be scaled down, preserving aspect ratio, until it is within the size limits. This resizing increases time-to-first-token latency without giving you any additional model performance. Very small images, under 200 pixels on any edge, may lead to degraded performance.

If you want to improve time-to-first-token, we recommend resizing your images to no more than 1.15 megapixels (and within 1568 pixels in both dimensions).
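The recommendation above can be turned into target dimensions before you upload. This is a minimal sketch under stated assumptions: target_size is a hypothetical helper, "1.15 megapixels" is read as 1,150,000 pixels, and it only computes the new width and height. You would pass the result to an image library of your choice (for example, Pillow's Image.resize) to do the actual resampling.

```python
import math


def target_size(width: int, height: int,
                max_edge: int = 1568,
                max_pixels: int = 1_150_000) -> tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, so that neither
    edge exceeds max_edge and the total pixel count stays within max_pixels.
    Images already within both limits are returned unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(
        1.0,                                        # never upscale
        max_edge / max(width, height),              # long-edge limit
        math.sqrt(max_pixels / (width * height)),   # area (megapixel) limit
    )
    # Floor to stay within the limits; the small epsilon keeps values such as
    # 1567.9999999 (floating-point error when the edge limit binds) at 1568.
    new_w = max(1, math.floor(width * scale + 1e-6))
    new_h = max(1, math.floor(height * scale + 1e-6))
    return new_w, new_h
```

Because both limits are expressed as a single scale factor, the aspect ratio is preserved up to the rounding of each dimension.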

The table below lists, for common aspect ratios, the largest image sizes our API accepts without resizing. Each of these works out to roughly 1600 tokens, or about $4.80 per 1K images (assuming the use of Claude 3 Sonnet):

Aspect ratio   Image size
1:1            1092x1092 px
3:4            951x1268 px
2:3            896x1344 px
9:16           819x1456 px
1:2            784x1568 px

Image best practices

When providing images to Claude, keep the following guidelines in mind for best results:

  • Image clarity: Ensure your images are clear and not too blurry or pixelated. Claude may struggle to accurately interpret unclear or low-quality images.

  • Image placement: Just as with document-query placement, Claude works best when images come before text. Images placed after text or interleaved with text will still perform well, but if your use case allows it, we recommend an image-then-text structure. See vision prompting tips for more details.

  • Text: If your image contains important text, make sure it is legible and not too small. However, avoid cropping out key visual context just to enlarge the text.

  • Multiple images: You can include multiple images in a single request (up to 5 for claude.ai and 20 for API requests). Claude will analyze all the provided images when formulating its response. This can be helpful for comparing or contrasting images.

See limitations for further details and guidelines.


Prompting tips

Many of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts. See our multimodal cookbook for a walkthrough of image processing techniques and use cases, complete with accompanying prompting techniques and strategies.

Below are a few example best-practice prompt structures involving images. In general, it's best to place images earlier in the prompt than any questions about them or instructions for tasks that use them. When there are multiple images, introduce each one with "Image 1:", "Image 2:", and so on. You do not need newlines between images or between images and the prompt.
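If you assemble prompts programmatically, a small helper can produce this labeled, image-then-question structure for you. This is a hypothetical helper, not an SDK function: it takes already-built image blocks (dictionaries like the "type": "image" entries in the examples that follow) and enforces the 20-images-per-request API limit mentioned under Image best practices.

```python
def labeled_image_content(image_blocks: list[dict], question: str,
                          max_images: int = 20) -> list[dict]:
    """Interleave "Image N:" labels with image blocks, then append the question.

    Images come first and the question last, matching the recommended
    image-then-text ordering. max_images reflects the documented API limit.
    """
    if len(image_blocks) > max_images:
        raise ValueError(f"At most {max_images} images are allowed per request")
    content: list[dict] = []
    if len(image_blocks) == 1:
        # A single image needs no label.
        content.append(image_blocks[0])
    else:
        for i, block in enumerate(image_blocks, start=1):
            content.append({"type": "text", "text": f"Image {i}:"})
            content.append(block)
    content.append({"type": "text", "text": question})
    return content
```

The returned list can be used directly as the "content" of a user message, for example {"role": "user", "content": labeled_image_content([block1, block2], "How are these images different?")}, where block1 and block2 are image blocks you have built.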

1. Example: One image

Here is the prompt structure:

Role   Content
User   [Image] Describe this image.

Here is the corresponding API call:

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
)

2. Example: Multiple images

Here is the prompt structure:

Role   Content
User   Image 1: [Image 1] Image 2: [Image 2] How are these images different?

Here is the corresponding API call:

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Image 1:"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Image 2:"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image2_media_type,
                        "data": image2_data,
                    },
                },
                {
                    "type": "text",
                    "text": "How are these images different?"
                }
            ],
        }
    ],
)

3. Example: Multiple images with a system prompt

Here is the prompt structure:

Role     Content
System   Respond only in Spanish.
User     Image 1: [Image 1] Image 2: [Image 2] How are these images different?

Here is the corresponding API call:

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="Respond only in Spanish.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Image 1:"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Image 2:"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image2_media_type,
                        "data": image2_data,
                    },
                },
                {
                    "type": "text",
                    "text": "How are these images different?"
                }
            ],
        }
    ],
)

4. Example: Four images across two conversation turns

Claude's vision capabilities really shine in multimodal conversations that mix both images and text. You can carry on extended back-and-forth exchanges with Claude, adding new images or follow-up questions at any point. This enables powerful workflows for iterative image analysis, comparison, or combining visuals with other knowledge.

Here is an example prompt structure:

Role        Content
User        Image 1: [Image 1] Image 2: [Image 2] How are these images different?
Assistant   [Claude's response]
User        Image 1: [Image 3] Image 2: [Image 4] Are these images similar to the first two?
Assistant   [Claude's response]

When using the API, add the new images to a user-role message in the messages array, just as you would in any standard multiturn conversation.
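As a sketch of that structure, the code below builds the messages list for the four-image conversation above. The helper names are hypothetical, the image strings are placeholders standing in for real base64 data (such as image1_data from the setup code), and the assistant text is a placeholder for the first reply, which in the Python SDK you can read from response.content[0].text. Each new request sends the entire history, including the earlier images.

```python
def image_block(data: str, media_type: str = "image/jpeg") -> dict:
    """Wrap base64 image data in an image content block (hypothetical helper)."""
    return {"type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data}}


def user_turn(images: list[str], question: str) -> dict:
    """Build a user message that labels each image, then asks the question."""
    content = []
    for i, data in enumerate(images, start=1):
        content.append({"type": "text", "text": f"Image {i}:"})
        content.append(image_block(data))
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


# Placeholder strings; substitute real base64 data (e.g. image1_data).
history = [user_turn(["<img1>", "<img2>"], "How are these images different?")]

# In real use:
#   response = client.messages.create(model=..., max_tokens=1024, messages=history)
#   first_reply = response.content[0].text
first_reply = "<Claude's first response>"
history.append({"role": "assistant", "content": first_reply})

# The second turn adds two new images; the first two stay in the history.
history.append(user_turn(["<img3>", "<img4>"],
                         "Are these images similar to the first two?"))
```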


Image costs

Each image you include in a request to Claude counts towards your token usage. To calculate the approximate cost, multiply the approximate number of image tokens by the per-token price of the model you're using. You can find model pricing details on our pricing page.

Assuming your image does not need to be resized, you can estimate the number of tokens it uses with this simple formula:

tokens = (width px * height px)/750
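The formula translates directly into code. In this minimal sketch, estimate_image_tokens is a hypothetical helper, and rounding up to a whole token is an assumption that happens to reproduce the ~54, ~1334, and ~1590 figures in the table that follows. It applies only to images that will not be resized.

```python
import math


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token count for an image that will not be resized:
    (width * height) / 750, rounded up to a whole token."""
    return math.ceil(width_px * height_px / 750)
```

For example, estimate_image_tokens(1000, 1000) gives 1334, matching the 1-megapixel row of the table.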

Here are a few examples of approximate tokenization and costs for different image sizes within our API's size constraints, assuming the use of Claude 3 Sonnet at $3 per million input tokens:

Image size                       # of tokens   Cost / image   Cost / 1K images
200x200 px (0.04 megapixels)     ~54           ~$0.00016      ~$0.16
1000x1000 px (1 megapixel)       ~1334         ~$0.004        ~$4.00
1092x1092 px (1.19 megapixels)   ~1590         ~$0.0048       ~$4.80
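Multiplying the token estimate by a per-token price reproduces the cost columns. This is a sketch, not an official calculator: image_cost_usd is a hypothetical helper, and the default of $3 per million input tokens is the Claude 3 Sonnet price assumed in this section, so check the pricing page for current rates for your model.

```python
import math


def image_cost_usd(width_px: int, height_px: int,
                   usd_per_million_tokens: float = 3.0) -> float:
    """Approximate input cost, in US dollars, of one image that is not resized."""
    tokens = math.ceil(width_px * height_px / 750)  # (width * height) / 750, rounded up
    return tokens * usd_per_million_tokens / 1_000_000


for w, h in [(200, 200), (1000, 1000), (1092, 1092)]:
    per_image = image_cost_usd(w, h)
    print(f"{w}x{h}: ${per_image:.5f}/image, ${per_image * 1000:.2f}/1K images")
```

The per-1K results (about $0.16, $4.00, and $4.77) agree with the rounded figures in the table.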

Limitations

While Claude's image understanding capabilities are cutting-edge, there are some limitations to be aware of:

  • People identification: Claude cannot be used to identify (i.e., name) people in images and will refuse to do so.
  • Accuracy: Claude may hallucinate or make mistakes when interpreting low-quality, rotated, or very small images under 200 pixels.
  • Spatial reasoning: Claude's spatial reasoning abilities are limited. It may struggle with tasks requiring precise localization or layouts, like reading an analog clock face or describing exact positions of chess pieces.
  • Counting: Claude can give approximate counts of objects in an image but may not always be precisely accurate, especially with large numbers of small objects.
  • AI-generated images: Claude does not know if an image is AI-generated and may be incorrect if asked. Do not rely on it to detect fake or synthetic images.
  • Inappropriate content: Claude will not process inappropriate or explicit images that violate our Acceptable Use Policy.
  • Healthcare applications: While Claude can analyze general medical images, it is not designed to interpret complex diagnostic scans such as CTs or MRIs. Claude's outputs should not be considered a substitute for professional medical advice or diagnosis.

Always carefully review and verify Claude's image interpretations, especially for high-stakes use cases. Do not use Claude for tasks requiring perfect precision or sensitive image analysis without human oversight.


FAQ

What image file types does Claude support?

Claude currently supports JPEG, PNG, GIF, and WebP image formats, specifically image/jpeg, image/png, image/gif, and image/webp.

Can Claude read image URLs?

Claude cannot read image URLs on any interface, including on claude.ai. Our API does not currently support adding URLs in either the text or image blocks. Adding image URLs (or URLs of any sort) in the text block might cause Claude to hallucinate, as Claude is currently unable to retrieve information from that URL.

Is there a limit to the image file size I can upload?

Yes, the maximum allowed image file size is 5MB per image (10MB per image on claude.ai). Images larger than 5MB will be rejected and return an error when using our API.

How many images can I include in one request?

You can include up to 20 images in a single request via the Messages API, and up to 5 images per turn on claude.ai. API requests that exceed the limit will be rejected and return an error.
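Since oversized images and excess image counts are rejected with an error, it can be worth validating a request before sending it. This sketch makes assumptions the documentation does not spell out: it treats "5MB" as 5 × 1024 × 1024 bytes and applies the limit to the raw image bytes before base64 encoding, so adjust the constant if your requests are still rejected. validate_images is a hypothetical helper.

```python
# Assumption: "5MB" means 5 * 1024 * 1024 bytes of raw (not base64) image data.
MAX_IMAGE_BYTES = 5 * 1024 * 1024
# Messages API limit per request (claude.ai allows 5 per turn).
MAX_IMAGES_PER_REQUEST = 20


def validate_images(images: list[bytes]) -> None:
    """Raise ValueError if a request would exceed the documented API image limits."""
    if len(images) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"{len(images)} images supplied; the API accepts at most "
            f"{MAX_IMAGES_PER_REQUEST} per request")
    for i, data in enumerate(images, start=1):
        if len(data) > MAX_IMAGE_BYTES:
            raise ValueError(
                f"Image {i} is {len(data)} bytes; the limit is {MAX_IMAGE_BYTES} bytes")
```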

Does Claude read image metadata?

No, Claude does not parse or receive any metadata from images passed to it.

Can I delete images I've uploaded?

No. However, image uploads are ephemeral and not stored beyond the duration of the API request; uploaded images are automatically deleted after they have been processed.

Where can I find more details on data privacy and security for image uploads?

Please refer to our privacy policy page for information on how we handle uploaded images and other data. We do not use uploaded images to train our models.

What should I do if Claude's image interpretation seems wrong?

If you get an image interpretation from Claude that seems incorrect, first double check that the image is clear, high-quality, and correctly oriented. If the issue persists, try to improve results by employing prompt engineering techniques. If the issue cannot be resolved, please let us know by flagging the concerning output directly in claude.ai via the thumbs up / down interface or contacting our support team. Your input helps us improve!

Can Claude generate, produce, edit, manipulate or create images?

No, Claude is an image understanding model only. It can interpret and analyze images, but it cannot generate, produce, edit, manipulate or create images.


Dive deeper into vision

Ready to start building with images using Claude? Here are a few helpful resources:

If you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.

We're excited to see what you create with Claude's powerful new vision capabilities!