Introduction

This guide explores how to leverage Claude to efficiently automate the routing of customer tickets at scale. By harnessing Claude’s advanced natural language understanding capabilities, organizations can analyze the content of each customer ticket and accurately determine the appropriate team or department best equipped to handle the issue. This guide walks through how to:

  • Frame intent categorization for ticket routing as a classification task.
  • Use Claude to understand and categorize customer inquiries accurately.
  • Evaluate the performance of your automated routing classification system.
  • Integrate Claude into your support workflow.

Benefits of Automated Ticket Routing

  • Reduced manual effort: Automating the routing process significantly reduces the time and manual effort required to triage tickets, allowing support teams to focus on resolving issues rather than sorting through requests.
  • Faster resolution times: By promptly directing customer inquiries to the right experts, automated routing ensures that issues are addressed quickly and efficiently, leading to faster resolution times.
  • Enhanced customer satisfaction: With tickets being routed to the appropriate teams from the outset, customers receive more targeted and effective support, resulting in improved satisfaction levels.
  • Open paths for future automation: Precise ticket routing lets you explore multi-agent approaches where one model determines the intent and then routes the ticket to a specialized virtual agent with a more defined workflow, easing the automation process.

Advantages of Using Claude

Traditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. However, these methods require a very large training dataset and complex ontology design, and they suffer from inflexible class definitions.

Using Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:

  1. Minimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.
  2. Adaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data.
  3. Simplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.
  4. Interpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allows you to easily adapt the approach if needed.

Defining the Task

Before diving into automation, it’s crucial to take a step back and thoroughly understand your existing ticketing system. Start by investigating how your support team currently handles ticket routing. Consider questions like:

  • What criteria are used to determine which team or department a ticket is assigned to?
  • Are there any automated rules or workflows already in place? In what cases do they fail?
  • How are edge cases or ambiguous tickets handled?
  • How does the team prioritize tickets?

The more you know about how humans handle these cases, the better you can work with Claude to automate the task.

Defining intent categories

Intent categories are a crucial aspect of support ticket classification and routing as they represent the primary purpose or goal behind a customer’s inquiry or issue. By identifying the intent category, support systems can route tickets to the most appropriate team or agent equipped to handle the specific type of request.

If your support team does not already have intent categories defined, you can use Claude to analyze a representative sample of tickets to identify common themes, such as product inquiries or billing questions.
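
For instance, here's a minimal sketch of such a discovery pass; the prompt wording and sample size are illustrative, not prescriptive:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def propose_intent_categories(sample_tickets: list[str]) -> str:
    # Bundle a representative sample of tickets into a single prompt.
    tickets_block = "\n".join(f"<ticket>{t}</ticket>" for t in sample_tickets)
    discovery_prompt = f"""Below is a sample of customer support tickets:

{tickets_block}

Propose a short list of mutually exclusive intent categories that covers these tickets.
For each category, give a descriptive name and a one-line definition."""
    message = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1000,
        temperature=0,
        messages=[{"role": "user", "content": discovery_prompt}],
    )
    return message.content[0].text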

Be sure that the intent categories:

  1. Have descriptive names that clearly convey the primary purpose of the tickets they encompass
  2. Are mutually exclusive and comprehensive, leaving little ambiguity about which category a ticket belongs to
  3. Align with your support team’s processes and expertise to ensure tickets are routed to the agents most capable of providing effective resolutions

Example Data

Let’s take a look at some example data from a hypothetical customer support ticket system:

| # | Request | Intent | Reasoning |
|---|---------|--------|-----------|
| 132 | Hello! I had high-speed fiber internet installed on Saturday and my installer, Kevin, was absolutely fantastic! Where can I send my positive review? Thanks for your help! | Support, Feedback, Complaint | The user seeks information in order to leave positive feedback. |
| 1646 | Have you guys sent my autographed print, yet? I am SO excited! My order was #12068. I haven’t received tracking information yet, but I’m anxiously waiting! | Order Tracking | Customer requests tracking information/status. |
| 3215 | I’m considering purchasing some of the cute clothes that y’all have on your website but I have a hard time finding clothes that fit my shape. If I don’t like the way the clothes fit, what is the policy for returning them? | Refund/Exchange | Asking about return policy (pre-order) |

In the example data above, each support ticket is assigned a single intent, which is then used to route the ticket to the appropriate team. Upon further analysis, we find that there are only three distinct intents in the dataset. Our automation task is now clear: given the request text, categorize it into one of the three intents while also providing the reasoning behind the classification.
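
To make the task concrete, the labeled data might be represented as a simple list of records; this is a hypothetical structure we'll reuse when evaluating the classifier later:

labeled_tickets = [
    {
        "request": "Have you guys sent my autographed print, yet? ...",
        "intent": "Order Tracking",
    },
    # ... one record per ticket in the dataset
]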


Prompting Claude for Ticket Routing

Ticket routing is a classification task. For more information about classification tasks, see our classification guide.

Here, we’ll focus on building and optimizing a prompt for ticket classification.

Start by defining the method signature that wraps our call to Claude. We’ll take ticket_contents: str as input and expect a tuple of reasoning: str and intent: str as output. If you have an existing automation built with traditional ML, follow that method signature instead.

from typing import Tuple
import anthropic

# Create an instance of the Anthropic API client
client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
DEFAULT_MODEL = "claude-3-haiku-20240307"

def classify_support_request(ticket_contents: str) -> Tuple[str, str]:
    # Define the prompt for the classification task (covered in the next section).
    classification_prompt = ...

    # Send the prompt to the API to classify the support request.
    message = client.messages.create(
        model=DEFAULT_MODEL,
        max_tokens=500,
        temperature=0,
        messages=[{"role": "user", "content": classification_prompt}],
        stream=False,
    )

    # Extract the reasoning and intent from the response (covered below).
    reasoning, intent = ..., ...
    return reasoning, intent

This code:

  1. Imports the Anthropic library and creates a client instance using your API key.
  2. Defines a classify_support_request function that takes a ticket_contents string.
  3. Sends the ticket_contents to Claude for classification using a specific classification_prompt (which we’ll discuss later).
  4. Returns the model’s reasoning and intent extracted from the response.

Since we need to wait for the entire reasoning and intent text to be generated before parsing, we set stream=False (the default).

Next, we work on the classification_prompt. The prompt should contain the contents of the user request and return both the reasoning and the intent. Asking the model to return its reasoning adds an implicit “think step-by-step” instruction to the prompt. We’ll then need to extract the reasoning and intent from the generated text. When creating the prompt, we’ll provide clear instructions and context, use examples to illustrate the desired output, and use XML tags to add structure.

Our Prompt Engineering guide covers these techniques in detail. To help you get started, you can also use the prompt generator on the Anthropic Console.

Here’s an example of how you can structure your classification prompt:

def classify_support_request(ticket_contents: str) -> Tuple[str, str]:
    # Define the prompt for the classification task
    classification_prompt = f"""You will be acting as a customer support ticket classification system. Your task is to analyze customer support requests and output the appropriate classification intent for each request, along with your reasoning. 

Here is the customer support request you need to classify:

<request>{ticket_contents}</request>

Please carefully analyze the above request to determine the customer's core intent and needs. Consider what the customer is asking for or complaining about.

Write out your reasoning and analysis of how to classify this request inside <reasoning> tags.

Then, output the appropriate classification label for the request inside an <intent> tag. The valid intents are:
<intents>
<intent>Support, Feedback, Complaint</intent>
<intent>Order Tracking</intent>
<intent>Refund/Exchange</intent>
</intents>

A request may have ONLY ONE applicable intent. Only include the intent that is most applicable to the request.

As an example, consider the following request:
<request>Hello! I had high-speed fiber internet installed on Saturday and my installer, Kevin, was absolutely fantastic! Where can I send my positive review? Thanks for your help!</request>

Here is an example of how your output should be formatted (for the above example request):
<reasoning>The user seeks information in order to leave positive feedback.</reasoning>
<intent>Support, Feedback, Complaint</intent>

Here are a few more examples:
---
Example 2 Input:
<request>I wanted to write and personally thank you for the compassion you showed towards my family during my father's funeral this past weekend. Your staff was so considerate and helpful throughout this whole process; it really took a load off our shoulders. The visitation brochures were beautiful. We'll never forget the kindness you showed us and we are so appreciative of how smoothly the proceedings went. Thank you, again, Amarantha Hill on behalf of the Hill Family.</request>

Example 2 Output:
<reasoning>User leaves a positive review of their experience.</reasoning>
<intent>Support, Feedback, Complaint</intent>

---

...

---
Example 9 Input:
<request>Your website keeps sending ad-popups that block the entire screen. It took me twenty minutes just to finally find the phone number to call and complain. How can I possibly access my account information with all of these popups? Can you access my account for me, since your website is broken? I need to know what the address is on file.</request>

Example 9 Output:
<reasoning>The user requests help accessing their web account information.</reasoning>
<intent>Support, Feedback, Complaint</intent>
---

Remember to always include your classification reasoning before your actual intent output. The reasoning should be enclosed in <reasoning> tags and the intent in <intent> tags. Return only the reasoning and the intent.
"""


Let’s break down the key components of this prompt:

  1. We use Python f-strings to create the prompt template, allowing the ticket_contents to be inserted into the <request> tags.
  2. We provide clear instructions on Claude’s role as a classification system that should carefully analyze the request to determine the customer’s core intent and needs.
  3. We ask Claude to provide its reasoning and analysis inside <reasoning> tags, followed by the appropriate classification label inside an <intent> tag.
  4. We specify the valid intents: “Support, Feedback, Complaint”, “Order Tracking”, and “Refund/Exchange”.
  5. We include a few examples to illustrate how the output should be formatted. These examples serve as a few-shot prompt to improve accuracy and consistency.

After generating Claude’s response, we use regular expressions to extract the reasoning and intent from the output. This allows us to separate the structured information from the generated text.

By crafting a clear and well-structured prompt, providing examples, and using XML tags, we can guide Claude to generate accurate and consistent classifications along with the underlying reasoning. This approach enhances the interpretability and reliability of the classification system.

The updated method looks like this:

import re

def classify_support_request(ticket_contents: str) -> Tuple[str, str]:
    # Define the prompt for the classification task
    classification_prompt = f"""You will be acting as a customer support ticket classification system. ... 
...
...The reasoning should be enclosed in <reasoning> tags and the intent in <intent> tags. Return only the reasoning and the intent.
"""
    # Send the prompt to the API to classify the support request.
    message = client.messages.create(
        model=DEFAULT_MODEL,
        max_tokens=500,
        temperature=0,
        messages=[{"role": "user", "content": classification_prompt}],
        stream=False,
    )
    reasoning_and_intent = message.content[0].text

    # Use Python's regular expressions library to extract `reasoning`.
    reasoning_match = re.search(
        r"<reasoning>(.*?)</reasoning>", reasoning_and_intent, re.DOTALL
    )
    reasoning = reasoning_match.group(1).strip() if reasoning_match else ""

    # Similarly, also extract the `intent`.
    intent_match = re.search(r"<intent>(.*?)</intent>", reasoning_and_intent, re.DOTALL)
    intent = intent_match.group(1).strip() if intent_match else ""

    return reasoning, intent
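
With parsing in place, the classifier can be called directly. A quick usage sketch, where the commented output is illustrative:

ticket = (
    "Have you guys sent my autographed print, yet? I am SO excited! "
    "My order was #12068."
)
reasoning, intent = classify_support_request(ticket)
print(f"Reasoning: {reasoning}")  # e.g., "Customer requests tracking information/status."
print(f"Intent: {intent}")        # e.g., "Order Tracking"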

Scaling to a large number of intent classes

While the above approach works well for a handful of classes, you might need to revisit the framing of the task if the number of intent classes is large (e.g., in the dozens). As the number of classes grows, the list of examples will also expand, potentially making the prompt unwieldy. In such cases, consider implementing a hierarchical classification system using a mixture of classifiers.

One effective strategy is to organize your intents into a taxonomic tree structure. You can then create a series of classifiers at every level of the tree, enabling a cascading routing approach. For example, you might have a top-level classifier that broadly categorizes tickets into “Technical Issues,” “Billing Questions,” and “General Inquiries.” Each of these categories can then have its own sub-classifiers to further refine the classification.

An advantage of this hierarchical approach is that it closely mimics human reasoning for top-down classification. You can encode this reasoning into different prompts for each parent path, allowing for more targeted and context-specific classification. This can lead to improved accuracy and more nuanced handling of customer requests. However, the disadvantage of using multiple classifiers is the potential for slower response times due to the need for multiple calls to Claude. To mitigate this, consider using Haiku, the fastest Claude model, for the sub-classifiers. This can help strike a balance between classification accuracy and system responsiveness.
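
Here is a minimal sketch of the cascading approach, reusing the client, DEFAULT_MODEL, and Tuple from earlier; the top-level categories come from the example above, while the sub-category names are hypothetical:

TOP_LEVEL_INTENTS = ["Technical Issues", "Billing Questions", "General Inquiries"]
SUB_INTENTS = {
    "Technical Issues": ["Connectivity", "Software Bug", "Hardware Fault"],
    "Billing Questions": ["Refund Request", "Invoice Question", "Payment Method"],
    "General Inquiries": ["Product Information", "Feedback", "Other"],
}

def classify_into(ticket_contents: str, intents: list[str]) -> str:
    """Ask Claude to pick exactly one intent from a short list."""
    prompt = f"""Classify the support request below into exactly one of these intents: {", ".join(intents)}.

<request>{ticket_contents}</request>

Respond with only the intent label."""
    message = client.messages.create(
        model=DEFAULT_MODEL,  # Haiku keeps the extra calls fast and cheap
        max_tokens=50,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text.strip()

def route_hierarchically(ticket_contents: str) -> Tuple[str, str]:
    top = classify_into(ticket_contents, TOP_LEVEL_INTENTS)
    sub = classify_into(ticket_contents, SUB_INTENTS.get(top, ["Other"]))
    return top, sub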


Evaluating the Performance of your Ticket Routing Classifier

Before deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.

Choosing the right model

Many customers have found claude-3-haiku-20240307 an ideal model for this use case. It delivers excellent results and is the fastest and most cost-effective model in the Claude 3 family as of this writing. The right choice depends on the trade-offs between cost, accuracy, and response time: if your classification problem requires deep subject matter expertise or highly complex reasoning, you may opt for the larger Sonnet model despite the higher cost.

Evaluation Methodology

To assess the classifier’s performance, we call the classifier function and compare the predicted intent with the actual intent. To maintain the integrity of the evaluation, first remove the tickets used as examples in the prompt. Accuracy is calculated as the percentage of correct predictions.

While more sophisticated metrics like the F1-score offer a better measure of the model’s performance, we’ll keep things simple for this evaluation. We’ll also focus on the predicted intent and ignore the returned reasoning for now, though the reasoning will help you better understand the results.

For details on how to build a more robust classifier evaluation, see this classification cookbook.

The code snippet below evaluates Claude using three key metrics: accuracy, 95th percentile response time, and average cost per classification. By modifying the classify_support_request function to return additional data, we can easily calculate these metrics and assess the model’s production-readiness.

import re
from time import perf_counter
from typing import Tuple

import anthropic
from anthropic.types import Usage

# Create an instance of the Anthropic API client
client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)


def classify_support_request(
    request: str, gt_intent: str, model: str = DEFAULT_MODEL
) -> Tuple[str, str, bool, Usage, float]:
    # Define the prompt for the classification task
    classification_prompt = f"""You will be acting as a customer support ticket classification system. ... 
...
...The reasoning should be enclosed in <reasoning> tags and the intent in <intent> tags. Return only the reasoning and the intent.
"""

    # Send the prompt to the API to classify the support request and time the entire processing.
    tic = perf_counter()

    message = client.messages.create(
        model=model,
        max_tokens=500,
        temperature=0,
        messages=[{"role": "user", "content": classification_prompt}],
    )
    usage = message.usage  # Get the usage statistics for the API call for how many input and output tokens were used.
    reasoning_and_intent = message.content[0].text

    # Use Python's regular expressions library to extract `reasoning`.
    reasoning_match = re.search(
        r"<reasoning>(.*?)</reasoning>", reasoning_and_intent, re.DOTALL
    )
    reasoning = reasoning_match.group(1).strip() if reasoning_match else ""

    # Similarly, also extract the `intent`.
    intent_match = re.search(r"<intent>(.*?)</intent>", reasoning_and_intent, re.DOTALL)
    intent = intent_match.group(1).strip() if intent_match else ""

    time_taken = (
        perf_counter() - tic
    )  # Calculate the time taken for the API call + parsing.
    correct = gt_intent.strip() == intent.strip()  # Check if the model's prediction is correct.

    # Return the reasoning, intent, correct, usage, and time taken.
    return reasoning, intent, correct, usage, time_taken
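
A driver loop over the held-out test set can then aggregate the three metrics. Below is a minimal sketch, assuming the labeled-record structure from earlier and Haiku pricing of $0.25 per million input tokens and $1.25 per million output tokens (verify current pricing before relying on these numbers):

import numpy as np

# Assumed Haiku pricing; adjust to the current price list.
INPUT_COST_PER_TOKEN = 0.25 / 1_000_000
OUTPUT_COST_PER_TOKEN = 1.25 / 1_000_000

def evaluate(test_set: list[dict], model: str = DEFAULT_MODEL) -> None:
    correct_count, times, costs = 0, [], []
    for sample in test_set:
        _, _, correct, usage, time_taken = classify_support_request(
            sample["request"], sample["intent"], model=model
        )
        correct_count += int(correct)
        times.append(time_taken)
        costs.append(
            usage.input_tokens * INPUT_COST_PER_TOKEN
            + usage.output_tokens * OUTPUT_COST_PER_TOKEN
        )
    print(f"Accuracy: {correct_count / len(test_set):.2%}")
    print(f"95th Percentile Time Taken: {np.percentile(times, 95):.2f} seconds")
    print(f"Average Cost per Request Routing: ${np.mean(costs):.4f}")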

For the given dataset, using the claude-3-haiku-20240307 model, we observe the following results:

  • For the 9 examples we use in the prompt:
    • Accuracy: 100.00%
    • 95th Percentile Time Taken: 1.29 seconds
    • Average Cost per Request Routing: $0.0004
  • For the remaining 91 samples in the test set:
    • Accuracy: 89.01%
    • 95th Percentile Time Taken: 1.61 seconds
    • Average Cost per Request Routing: $0.0004

In addition to measuring these core metrics, you may also consider:

  • Consistency and reliability of the model’s performance across different ticket types
  • Handling of edge cases and ambiguous tickets
  • Interpretability and usefulness of the classifications for human agents
  • Overall stability and maintainability of the system

Conducting further testing and implementing an incremental rollout can help build confidence before a full deployment.

Comparing the performance of different models on the remaining 91 samples in the test set:

  • claude-3-sonnet-20240229:
    • Accuracy: 92.31%
    • 95th Percentile Time Taken: 3.41 seconds
    • Average Cost per Request Routing: $0.0050
  • claude-3-opus-20240229:
    • Accuracy: 84.62%
    • 95th Percentile Time Taken: 8.21 seconds
    • Average Cost per Request Routing: $0.0256

Iterating your prompt for better performance

If the initial metrics indicate that improvements are necessary, you can refine your prompt to enhance the model’s performance. We encourage referencing our Prompt Engineering guide and prompt generator for more details on how to craft the most effective prompts to optimize Claude 3’s output.

One especially effective way to improve performance is to provide more targeted examples to Claude in the prompt. You could employ a vector database to run similarity searches over a sample dataset and retrieve the examples most relevant to a given query. By augmenting the LLM with retrieved examples, we can provide additional context and improve the accuracy of the generated classifications. This approach is outlined in this classification cookbook, which walks through how it improved performance from 71% accuracy to 93% accuracy.
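
Here is a minimal sketch of the retrieval step, assuming a hypothetical embed() function backed by your embedding provider and a precomputed matrix of vectors for a labeled example bank:

import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call; plug in your embedding provider here."""
    raise NotImplementedError

def most_similar_examples(
    query: str, example_bank: list[dict], example_vectors: np.ndarray, k: int = 5
) -> list[dict]:
    """Return the k labeled examples most similar to the query ticket."""
    q = embed(query)
    # Cosine similarity between the query and every stored example vector.
    sims = example_vectors @ q / (
        np.linalg.norm(example_vectors, axis=1) * np.linalg.norm(q) + 1e-9
    )
    top_idx = np.argsort(sims)[-k:][::-1]
    return [example_bank[i] for i in top_idx]

The retrieved examples are then formatted into the few-shot section of classification_prompt in place of the static examples.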

Adapting to common scenarios

In addition to this approach, performance can often be meaningfully improved by providing more edge-case examples to Claude in the prompt. Here are some scenarios where Claude may misclassify tickets; consider including examples in the prompt that show how to handle them (see the sketch after this list):

  • Implicit Requests: Customers often express needs indirectly. For example, “I’ve been waiting for my package for over two weeks now.” is an indirect request for order status.
  • Emotional Prioritization: When customers express dissatisfaction, Claude may prioritize addressing the emotion over solving the underlying problem. Providing Claude with directions on when to prioritize customer sentiment or not can be helpful.
  • Intent vs. Routing: Claude may correctly identify a customer intent, but route it incorrectly. Clarifying the appropriate routes of certain intents is important, especially when the routes may be more ambiguous.
  • Issue Prioritization: When customers present multiple issues in a single interaction, Claude may have difficulty identifying the primary concern. Clarifying the prioritization of intents can help Claude better identify the primary concern.
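
For instance, an implicit request (the first scenario above) could be added to the few-shot examples in the prompt; the label shown is illustrative:

Example 10 Input:
<request>I’ve been waiting for my package for over two weeks now.</request>

Example 10 Output:
<reasoning>Although phrased as a complaint, the customer is implicitly asking about the status of their order.</reasoning>
<intent>Order Tracking</intent>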

Remember, as your system evolves, it’s essential to regularly review and refine your prompts to ensure they remain effective and aligned with your changing needs. Continuously monitor the system’s performance, gather feedback from stakeholders, and make necessary adjustments to optimize its accuracy and efficiency.


Integrate Claude into your Support Workflow

When integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go about this:

  • Push-based: The support ticket system you use (e.g., Zendesk, an Anthropic partner) triggers your code by sending a webhook event to your routing service, which then classifies the intent and routes the ticket.
  • Pull-based: Your code polls for the latest tickets on a set schedule and then routes them.

While the bulk of the classification work discussed in the previous sections remains the same, you will need to wrap your code in a service for either approach. The choice depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more scalable, but it requires you to expose a public endpoint, which may have IT security implications. The pull-based approach is easier to implement but makes unnecessary calls to the support ticket system.
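
As a concrete illustration, here is a minimal sketch of the push-based approach using Flask; the payload fields and the update_ticket() helper are hypothetical and depend on your ticketing system’s API:

from flask import Flask, request

app = Flask(__name__)

def update_ticket(ticket_id: str, intent: str, reasoning: str) -> None:
    """Hypothetical helper: call your ticketing system's update API here."""
    ...

@app.route("/classify", methods=["POST"])
def classify_webhook():
    payload = request.get_json()
    ticket_id = payload["ticket_id"]       # hypothetical payload field
    ticket_contents = payload["contents"]  # hypothetical payload field

    reasoning, intent = classify_support_request(ticket_contents)

    # Write the routing decision back so the assigned team can pick it up.
    update_ticket(ticket_id, intent, reasoning)
    return {"intent": intent, "reasoning": reasoning}, 200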

[Diagram: the push-based ticket routing approach]

The diagram above shows the push-based approach in action:

  1. Support Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.
  2. Webhook Event Generation - Upon receiving the new support ticket, the Support Ticket System generates a “ticket created” webhook event. This event triggers the subsequent steps in the ticket routing process.
  3. Ticket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. This step ensures that the full details of the customer’s issue are available for analysis and classification.
  4. Support Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification identifies the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served behind a RESTful API that the webhook can call, and the endpoint must be reachable from the internet.
  5. Ticket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.

Note: While the classification method calls the Claude API, that extra call is omitted from the diagram for simplicity.


Additional Considerations

Before fully deploying to production, consider the following steps to ensure a smooth and reliable rollout of your solutions:

  • Implement retry logic: While Claude is a robust and highly available assistant, it’s crucial to add try/except logic to handle cases where Claude doesn’t return the expected formatted output or is temporarily unavailable. Implement back-off logic that retries after increasing intervals, or slightly adjust the temperature to generate output variations (see the sketch after this list).
  • Thorough staging testing: Conduct extensive testing in a staging environment that closely resembles your production setup. This will help identify any potential issues or incompatibilities before deployment.
  • Load testing: Perform load testing to verify that the system can handle the anticipated volume of tickets without performance degradation. This ensures that the system remains responsive and efficient under real-world conditions.
  • Error handling and logging: Implement comprehensive error handling and logging mechanisms to facilitate debugging and monitoring in production. This will help you quickly identify and resolve any issues that may arise.
  • Gradual rollout: Establish a phased rollout plan, starting with a small percentage of traffic and gradually increasing it while closely monitoring the system’s behavior. This approach minimizes risk and allows for a controlled deployment.
  • Documentation and training: Prepare detailed documentation and provide training to relevant stakeholders on how to use and maintain the new system effectively. This ensures a smooth transition and promotes adoption.
  • Monitoring and alerting: Set up robust monitoring and alerting mechanisms to proactively detect and address any issues that may arise in production. This enables your team to respond quickly and minimize downtime.
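
As an example of the first point, a minimal retry wrapper with exponential backoff might look like the following; note that the Anthropic SDK also retries some transient errors automatically, so adjust to your needs:

import random
import time
import anthropic

def create_with_retries(max_retries: int = 5, **kwargs):
    """Call the Messages API, backing off exponentially on transient errors."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except (anthropic.APIConnectionError, anthropic.APIStatusError):
            if attempt == max_retries - 1:
                raise  # Give up after the final attempt.
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(2**attempt + random.random())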

By following these steps, you can ensure a successful and reliable deployment of your automated ticket routing system, providing a seamless experience for your users.