In this guide we will walk through the process of determining the best approach for building a classifier with Claude and the essentials of end-to-end deployment for a Claude classifier — from use case exploration to back-end integration.

Visit our classification cookbook to see example classification implementations using Claude.

When to use Claude for classification

When should you consider using an LLM instead of a traditional ML approach for your classification tasks? Here are some key indicators:

  1. Rule-based classes: Use Claude when classes are defined by conditions rather than examples, as it can understand underlying rules.
  2. Evolving classes: Claude adapts well to new or changing domains with emerging classes and shifting boundaries.
  3. Unstructured inputs: Claude can handle large volumes of unstructured text inputs of varying lengths.
  4. Limited labeled examples: With few-shot learning capabilities, Claude learns accurately from limited labeled training data.
  5. Reasoning requirements: Claude excels at classification tasks requiring semantic understanding, context, and higher-level reasoning.

Common classification use cases

Claude is used by companies across industries for intelligent classification tasks. Below is a non-exhaustive list of common classification use cases where Claude excels, organized by industry.

  1. Tech & IT
    • Content moderation: Automatically identify and flag inappropriate, offensive, or harmful content in user-generated text, images, or videos.
    • Bug prioritization: Classify software bug reports based on their severity, impact, or complexity to prioritize development efforts and allocate resources effectively.
  2. Customer service
    • Intent analysis: Determine what the user wants to achieve or what action they want the system to perform based on their text inputs.
    • Support ticket routing: Analyze customer interactions, such as call center transcripts or support tickets, to route issues to the appropriate teams, prioritize critical cases, and identify recurring problems for proactive resolution.
  3. Healthcare:
    • Patient triaging: Classify customer intake conversations and data according to the urgency, topic, or required expertise for efficient triaging.
    • Clinical trial screening: Analyze patient data and medical records to identify and categorize eligible participants based on specified inclusion and exclusion criteria.
  4. Finance:
    • Fraud detection: Identify suspicious patterns or anomalies in financial transactions, insurance claims, or user behavior to prevent and mitigate fraudulent activities.
    • Credit risk assessment: Classify loan applicants based on their creditworthiness into risk categories to automate credit decisions and optimize lending processes.
  5. Legal:
    • Legal document categorization: Classify legal documents, such as pleadings, motions, briefs, or memoranda, based on their document type, purpose, or relevance to specific cases or clients.

Using Claude for classification

When deciding which Claude model to use, it is important to determine the intelligence, latency, and price requirements for your use case up front.

Learn more about how Opus, Sonnet, and Haiku compare.

For classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. For classification tasks where specialized knowledge or complex reasoning is required, Sonnet may be a better choice.

Evaluations are a good way to gauge whether a Claude model is performing well enough on its classification task to launch into production. See our page on empirical performance evaluations for an overview of the evaluation process. For guidance on how to use Claude to automate the evaluation process, check out the automated evaluations section of the Anthropic Cookbook.
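As a rough sketch, a classification request to a Haiku-class model via the Anthropic Python SDK might be assembled like this. The helper function and example text are our own illustrations, not part of the SDK:

```python
# Sketch: assembling a classification request for a fast, inexpensive model.
# The helper name and prompt text here are illustrative, not an official API.

def build_classification_request(text: str, model: str = "claude-3-haiku-20240307") -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": model,      # Haiku: fast and cheap; swap in a Sonnet model
                             # if the task needs deeper reasoning
        "max_tokens": 256,   # classification labels are short
        "temperature": 0.0,  # deterministic output helps consistency
        "messages": [
            {"role": "user", "content": f"Classify this text:\n\n{text}"}
        ],
    }

# To actually send the request (requires the anthropic package and an API key):
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_classification_request("My card was charged twice."))

params = build_classification_request("My card was charged twice.")
```

Setting `temperature` to 0 is a common choice for classifiers, where you want the same input to yield the same label on every run.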

1. Build a strong prompt

While Claude offers high-level baseline performance out of the box, a strong input prompt helps get the best results. Our prompt library provides a wide range of prompts to get you started, including several for classification use cases.

For a generic classifier that you can adapt to your specific use case, copy the starter prompt below into our developer Console.

User

You will be building a text classifier that can automatically categorize text into a set of predefined categories.

Here are the categories the classifier will use:

To help you understand how to classify text into these categories, here are some example texts that have already been labeled with their correct category:

Carefully study these examples to identify the key features and characteristics that define each category. Write out your analysis of each category inside <category_analysis> tags, explaining the main topics, themes, writing styles, etc. that seem to be associated with each one.

Once you feel you have a good grasp of the categories, your task is to take in new, unlabeled texts and output a prediction of which category each one most likely belongs to.

Before giving your final classification, show your step-by-step process and reasoning inside <classification_process> tags. Weigh the evidence for each potential category. Then output your final <classification> for which category you think the example text belongs to.

The goal is to accurately categorize new texts into the most appropriate category, as defined by the examples.

Our prompt generator and prompt engineering guide can also help you craft the most effective prompts to optimize Claude 3’s output.

2. Develop your test cases

To run your classification evaluation, you will need test cases to run it on. We recommend gathering a wide range of realistic data that cover all of your predefined categories. Don’t forget to add in edge cases to test the guardrails!

Manually building test cases can be quite time-consuming if you don’t have a dataset already available. Many customers use Claude to generate example test data. However, “golden answer” labels are best when manually generated, so as to limit model bias and establish a firm ground truth.
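As an illustration, a labeled test set might be structured as a list of input/golden-label pairs. The categories and texts below are hypothetical:

```python
# Hypothetical labeled test set: realistic inputs plus manually assigned
# "golden answer" labels, with edge cases included to probe guardrails.
test_cases = [
    {"text": "I was double charged on my last invoice.", "golden_label": "Billing"},
    {"text": "How do I reset my password?", "golden_label": "Account Access"},
    # Edge cases: empty and nonsense inputs that should not be forced
    # into a substantive category
    {"text": "", "golden_label": "Unknown"},
    {"text": "asdf qwerty 12345", "golden_label": "Unknown"},
]
```

Keeping the golden labels alongside the inputs in one structure makes it straightforward to feed the whole set through an evaluation script later.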

3. Run your eval

You can run an evaluation via a script as is shown in our classification cookbook. AWS Bedrock also provides a platform for model evaluation.
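A minimal evaluation loop can be sketched as follows. The helper name is our own, and a stub classifier stands in for a real Claude call so the structure is clear:

```python
def run_eval(classify, test_cases):
    """Run classify() over each test case and record predictions vs. golden labels."""
    results = []
    for case in test_cases:
        predicted = classify(case["text"])
        results.append({
            "text": case["text"],
            "golden_label": case["golden_label"],
            "predicted": predicted,
            "correct": predicted == case["golden_label"],
        })
    return results

# Stub classifier standing in for a real Claude call:
stub = lambda text: "Billing" if "charge" in text.lower() else "Other"
cases = [
    {"text": "I was charged twice.", "golden_label": "Billing"},
    {"text": "Reset my password.", "golden_label": "Account Access"},
]
results = run_eval(stub, cases)
accuracy = sum(r["correct"] for r in results) / len(results)  # → 0.5
```

In a real evaluation, `classify` would wrap a call to the Messages API and parse the label from the response; keeping it as a parameter makes the loop easy to test and to swap between models.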

Evaluation metrics

Some success metrics to consider when evaluating Claude’s classification performance include:

Accuracy: The model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).
F1 Score: The model’s output optimally balances precision and recall.
Consistency: The model’s output is consistent with its predictions for similar inputs or follows a logical pattern.
Structure: The model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.
Speed: The model provides a response within the acceptable time limit or latency threshold for the task.
Bias and Fairness: If classifying data about people, it is important that the model does not demonstrate biases based on gender, ethnicity, or other characteristics that could lead to misclassification.

Deploy your solution

To see code examples of how to use Claude for classification, check out our classification cookbook which contains a Jupyter notebook with ready-made code demonstrating how to use and evaluate Claude for classification scenarios.