Prompt engineering

Visit our prompt engineering tutorial to learn prompting via an interactive course.

Claude offers a high baseline of performance out of the box. However, prompt engineering can help you enhance its performance further and tailor its responses to your specific use case. These techniques are not necessary for achieving good results with Claude, but you may find them useful in upleveling your inputs & outputs.

To quickly get up and running with a prompt or get introduced to prompting as a concept, see intro to prompting.


What is prompt engineering?

Prompt engineering is an empirical science that involves iterating and testing prompts to optimize performance. Most of the effort spent in the prompt engineering cycle is not actually in writing prompts. Rather, the majority of prompt engineering time is spent developing a strong set of evaluations, followed by testing and iterating against those evals.
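
In practice, an eval set can start as nothing more than a handful of inputs paired with the behavior you expect, including edge cases. Here is a minimal sketch in Python; the task and every field below are hypothetical placeholders:

    # A tiny eval set for a hypothetical feedback-classification task.
    eval_set = [
        {
            "input": "The checkout page crashed when I entered my card details.",
            "expected_label": "bug report",
        },
        {
            "input": "It would be great if you supported dark mode.",
            "expected_label": "feature request",
        },
        {
            # Edge case: mixed praise and problem in one message.
            "input": "Love the app, but it drains my battery in an hour.",
            "expected_label": "bug report",
        },
    ]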

The prompt development lifecycle

We recommend a principled, test-driven development approach to ensure optimal prompt performance. Let's walk through the key high-level process we use when developing prompts for a task.

  1. Define the task and success criteria: The first and most crucial step is to clearly define the specific task you want Claude to perform. This could be anything from entity extraction, question answering, or text summarization to more complex tasks like code generation or creative writing. Once you have a well-defined task, establish the success criteria that will guide your evaluation and optimization process.

    Key success criteria to consider include:

    • Performance and accuracy: How well does the model need to perform on the task?
    • Latency: What is the acceptable response time for the model? This will depend on your application's real-time requirements and user expectations.
    • Price: What is your budget for running the model? Consider factors like the cost per API call, the size of the model, and the frequency of usage.

    Having clear, measurable success criteria from the outset will help you make informed decisions throughout the adoption process and ensure that you're optimizing for the right goals.

  2. Develop test cases: With your task and success criteria defined, the next step is to create a diverse set of test cases that cover the intended use cases for your application. These should include both typical examples and edge cases to ensure your prompts are robust. Having well-defined test cases upfront will enable you to objectively measure the performance of your prompts against your success criteria.

  3. Engineer the preliminary prompt: Next, craft an initial prompt that outlines the task definition, characteristics of a good response, and any necessary context for Claude. Ideally you should add some examples of canonical inputs and outputs for Claude to follow. This preliminary prompt will serve as the starting point for refinement.

  4. Test prompt against test cases: Feed your test cases into Claude using the preliminary prompt. Carefully evaluate the model's responses against your expected outputs and success criteria. Use a consistent grading rubric, whether it's human evaluation, comparison to an answer key, or even another instance of Claude grading against a rubric. The key is to have a systematic way to assess performance; a minimal code sketch of such a test loop appears after this list.

  5. Refine prompt: Based on the results from step 4, iteratively refine your prompt to improve performance on the test cases and better meet your success criteria. This may involve adding clarifications, examples, or constraints to guide Claude's behavior. Be cautious not to overly optimize for a narrow set of inputs, as this can lead to overfitting and poor generalization.

  6. Ship the polished prompt: Once you've arrived at a prompt that performs well across your test cases and meets your success criteria, it's time to deploy it in your application. Monitor the model's performance in the wild and be prepared to make further refinements as needed. Edge cases may crop up that weren't anticipated in your initial test set.
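
To make steps 2 through 5 concrete, here is a minimal sketch of a test loop built on the Anthropic Python SDK. The task, prompt template, test cases, and model name are all illustrative assumptions, and the exact-match rubric is the simplest possible grader; a real harness might instead compare against an answer key or use another instance of Claude as the judge:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Hypothetical task: classify customer feedback into one of two categories.
    PROMPT_TEMPLATE = """Classify the customer feedback into exactly one category:
    "bug report" or "feature request". Respond with only the category name.

    Feedback: {feedback}"""

    test_cases = [
        {"feedback": "The app crashes whenever I rotate my phone.",
         "expected": "bug report"},
        {"feedback": "Please add an option to export my data as CSV.",
         "expected": "feature request"},
    ]

    def grade(response_text: str, expected: str) -> bool:
        # Simplest possible rubric: normalized exact match.
        return response_text.strip().lower() == expected

    passed = 0
    for case in test_cases:
        message = client.messages.create(
            model="claude-3-opus-20240229",  # substitute your model of choice
            max_tokens=10,
            messages=[{
                "role": "user",
                "content": PROMPT_TEMPLATE.format(feedback=case["feedback"]),
            }],
        )
        response_text = message.content[0].text
        if grade(response_text, case["expected"]):
            passed += 1
        else:
            print(f"FAIL: {case['feedback']!r} -> {response_text!r}")

    print(f"{passed}/{len(test_cases)} test cases passed")

If the pass rate falls short of your success criteria, refine the prompt (step 5) and rerun the same loop; keeping the test set fixed is what makes one iteration comparable to the next.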

Throughout this process, it's worth starting with the most capable model and unconstrained prompt length to establish a performance ceiling. Once you've achieved the desired output quality, you can then experiment with optimizations like shorter prompts or smaller models to reduce latency and costs as needed.
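
For example, once a large model meets your quality bar, a quick comparison harness like the one below can indicate whether a smaller model still clears it at lower latency and cost. The prompt is a placeholder, the model names are examples from the Claude 3 family, and single-call wall-clock timing is only a rough signal, not a benchmark:

    import time

    import anthropic

    client = anthropic.Anthropic()

    prompt = "Summarize in one sentence: The quick brown fox jumps over the lazy dog."

    # Compare a larger and a smaller model on the same prompt.
    for model in ["claude-3-opus-20240229", "claude-3-haiku-20240307"]:
        start = time.perf_counter()
        message = client.messages.create(
            model=model,
            max_tokens=100,
            messages=[{"role": "user", "content": prompt}],
        )
        elapsed = time.perf_counter() - start
        usage = message.usage
        print(f"{model}: {elapsed:.2f}s, "
              f"{usage.input_tokens} input / {usage.output_tokens} output tokens")
        print(f"  {message.content[0].text}")

The token counts reported in the usage field can be multiplied by per-model pricing to estimate cost per call.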

By following this test-driven methodology and carefully defining your task and success criteria upfront, you'll be well on your way to harnessing the power of Claude for your specific use case. If you invest time in designing robust test cases and prompts, you'll reap the benefits in terms of model performance and maintainability.


Prompt engineering techniques

Across your prompt development cycle, there are some techniques you can use to enhance Claude's performance (several of them are combined in the code sketch after this list), such as:

  • Be clear & direct: Provide clear instructions and context to guide Claude's responses
  • Use examples: Include examples in your prompts to illustrate the desired output format or style
  • Give Claude a role: Prime Claude to inhabit a specific role (like that of an expert) in order to increase performance for your use case
  • Use XML tags: Incorporate XML tags to structure prompts and responses for greater clarity
  • Chain prompts: Divide complex tasks into smaller, manageable steps for better results
  • Let Claude think: Encourage step-by-step thinking to improve the quality of Claude's output
  • Prefill Claude's response: Start Claude's response with a few words to guide its output in the desired direction
  • Control output format: Specify the desired output format to ensure consistency and readability
  • Ask Claude for rewrites: Request revisions based on a rubric to get Claude to iterate and improve its output
  • Long context window tips: Optimize prompts that take advantage of Claude's longer context windows
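
Several of these techniques compose naturally in a single request. The sketch below is one hypothetical combination, not the canonical form of any of them: it gives Claude a role via the system prompt, structures the input with XML tags, includes one example, and prefills the assistant turn to control the output format:

    import anthropic

    client = anthropic.Anthropic()

    # Give Claude a role via the system prompt.
    system_prompt = "You are a senior support engineer who triages customer tickets."

    # Use XML tags to separate the example from the ticket to classify;
    # the ticket text and example here are hypothetical.
    user_prompt = """Classify the ticket below as "bug", "billing", or "how-to".
    Answer with a single JSON object of the form {"category": "..."}.

    <example>
    <ticket>I was charged twice this month.</ticket>
    <answer>{"category": "billing"}</answer>
    </example>

    <ticket>The export button does nothing when I click it.</ticket>"""

    message = client.messages.create(
        model="claude-3-opus-20240229",  # substitute your model of choice
        max_tokens=50,
        system=system_prompt,
        messages=[
            {"role": "user", "content": user_prompt},
            # Prefilling the assistant turn makes Claude continue from this
            # text, nudging it to emit only the JSON object.
            {"role": "assistant", "content": '{"category":'},
        ],
    )

    # The response continues the prefill, e.g. ' "bug"}'.
    print('{"category":' + message.content[0].text)

Prefilling the response is a lightweight way to skip conversational preambles and lock in an output format.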

We also provide an experimental helper metaprompt that prompts Claude to create a prompt for you based on guidelines you provide. The metaprompt is experimental, but may be helpful for drafting an initial prompt or quickly creating many prompt variations for testing.

Note: Models older than the Claude 3 family may require more prompt engineering. For more information, see our legacy model guide.


Additional Resources

To learn more about prompt engineering, check out these resources:

  • Anthropic cookbook: A set of recipes in the form of Jupyter notebooks with copyable code that demonstrates how to use Claude in a variety of effective ways in more advanced scenarios, such as uploading PDFs, tool use and function calling, embeddings, and more
  • Prompt engineering interactive tutorial: A hands-on step-by-step tutorial to make it easy to learn effective prompting strategies (requires an API key)
    • There is also an accompanying answer key if you would like to see example solutions.
  • Prompt library: A collection of pre-written prompts for common, fun, and helpful tasks for a variety of personal and professional use cases
  • Client SDKs: A set of tools to make it easier for you to build with and integrate Claude into your applications

Happy prompting!