Customer support agent
This guide walks through how to leverage Claude’s advanced conversational capabilities to handle customer inquiries in real time, providing 24/7 support, reducing wait times, and managing high support volumes with accurate responses and positive interactions.
Before building with Claude
Decide whether to use Claude for support chat
Key indicators that you should employ an LLM like Claude to automate portions of your customer support process include high support volumes, long wait times, and the need for 24/7 coverage.
Some considerations for choosing Claude over other LLMs:
- You prioritize natural, nuanced conversation: Claude’s sophisticated language understanding allows for more natural, context-aware conversations that feel more human-like than chats with other LLMs.
- You often receive complex and open-ended queries: Claude can handle a wide range of topics and inquiries without generating canned responses or requiring extensive programming of permutations of user utterances.
- You need scalable multilingual support: Claude’s multilingual capabilities allow it to engage in conversations in over 200 languages without the need for separate chatbots or extensive translation processes for each supported language.
Define your ideal chat interaction
Outline an ideal customer interaction to define how and when you expect the customer to interact with Claude. This outline will help to determine the technical requirements of your solution.
Here is an example chat interaction for car insurance customer support:
- Customer: Initiates support chat experience
- Claude: Warmly greets customer and initiates conversation
- Customer: Asks about insurance for their new electric car
- Claude: Provides relevant information about electric vehicle coverage
- Customer: Asks questions related to unique needs for electric vehicle insurance
- Claude: Responds with accurate and informative answers and provides links to the sources
- Customer: Asks off-topic questions unrelated to insurance or cars
- Claude: Clarifies it does not discuss unrelated topics and steers the user back to car insurance
- Customer: Expresses interest in an insurance quote
- Claude: Asks a set of questions to determine the appropriate quote, adapting to their responses
- Claude: Sends a request to use the quote generation API tool along with necessary information collected from the user
- Claude: Receives the response information from the API tool use, synthesizes the information into a natural response, and presents the provided quote to the user
- Customer: Asks follow up questions
- Claude: Answers follow up questions as needed
- Claude: Guides the customer to the next steps in the insurance process and closes out the conversation
Break the interaction into unique tasks
Customer support chat is a collection of multiple different tasks, from question answering to information retrieval to taking action on requests, wrapped up in a single customer interaction. Before you start building, break down your ideal customer interaction into every task you want Claude to be able to perform. This ensures you can prompt and evaluate Claude for every task, and gives you a good sense of the range of interactions you need to account for when writing test cases.
Here are the key tasks associated with the example insurance interaction above:
- Greeting and general guidance
  - Warmly greet the customer and initiate conversation
  - Provide general information about the company and interaction
- Product information
  - Provide information about electric vehicle coverage (this will require that Claude have the necessary information in its context, and might imply that a RAG integration is necessary)
  - Answer questions related to unique electric vehicle insurance needs
  - Answer follow-up questions about the quote or insurance details
  - Offer links to sources when appropriate
- Conversation management
  - Stay on topic (car insurance)
  - Redirect off-topic questions back to relevant subjects
- Quote generation
  - Ask appropriate questions to determine quote eligibility
  - Adapt questions based on customer responses
  - Submit collected information to the quote generation API
  - Present the provided quote to the customer
Establish success criteria
Work with your support team to define clear success criteria and write detailed evaluations with measurable benchmarks and goals.
You will want criteria and benchmarks on two levels: how successfully Claude performs the defined tasks, and the business impact of employing Claude for support.
How to implement Claude as a customer service agent
Choose the right Claude model
The choice of model depends on the trade-offs between cost, accuracy, and response time.
For customer support chat, `claude-3-5-sonnet-20241022` is well suited to balance intelligence, latency, and cost. However, for conversation flows that involve multiple prompts, including RAG, tool use, and/or long-context prompts, `claude-3-haiku-20240307` may be more suitable to optimize for latency.
Build a strong prompt
Using Claude for customer support requires that Claude have enough direction and context to respond appropriately, while retaining enough flexibility to handle a wide range of customer inquiries.
Let’s start by writing the elements of a strong prompt, beginning with a system prompt:
Note that detailed information and instructions are generally best placed in the first User turn rather than in the system prompt (with the only exception being role prompting). Read more at Giving Claude a role with a system prompt.

It’s best to break down complex prompts into subsections and write one part at a time. For each task, you might find greater success by following a step-by-step process to define the parts of the prompt Claude needs to do the task well. For this car insurance customer support example, we’ll write all the parts of the prompt piecemeal, starting with the “Greeting and general guidance” task. This also makes debugging your prompt easier, as you can more quickly adjust individual parts of the overall prompt.
We’ll put all of these pieces in a file called `config.py`.
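A minimal sketch of what this might look like (the constant names `IDENTITY` and `MODEL`, and the persona details, are illustrative):

```python
# config.py -- prompt building blocks.

MODEL = "claude-3-5-sonnet-20241022"

# Role prompt: who Claude is and how it should behave.
IDENTITY = """You are Eva, a friendly and knowledgeable AI assistant for Acme
Insurance Company. Your role is to warmly welcome customers and provide
information on Acme's offerings, which include car insurance and electric
car insurance. You can also help customers get quotes for their insurance
needs."""
```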
We’ll then do the same for our car insurance and electric car insurance information.
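For example (the content and variable names below are placeholders for your own product documentation):

```python
# Static reference content injected into the prompt. In larger systems this
# content is a natural candidate for RAG (see "Reduce long context latency
# with RAG" below).

STATIC_GREETINGS_AND_GENERAL = """<static_context>
Acme Auto Insurance: Your Trusted Companion on the Road
About: company background, business hours, contact channels, etc.
</static_context>"""

STATIC_CAR_INSURANCE = """<static_context>
Car Insurance Coverage: liability, collision, comprehensive coverage, etc.
</static_context>"""

STATIC_ELECTRIC_CAR_INSURANCE = """<static_context>
Electric Car Insurance: battery coverage, charging equipment protection, etc.
</static_context>"""
```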
Now that we have our static content, let’s add at least 4-5 sample “good” interactions to guide Claude’s responses. These examples should be representative of your ideal customer interaction and can include guardrails, tool calls, etc.
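A sketch of what such examples could look like, wrapped in tags so Claude can tell them apart from the live conversation (the dialogue itself is illustrative):

```python
EXAMPLES = """Here are a few examples of how you can interact with customers:

<example 1>
H: Hi, do you offer commercial insurance for small businesses?

A: Great question! Unfortunately, we don't currently offer commercial
insurance. We specialize in personal auto insurance, including coverage for
electric vehicles. Would you like to hear more about those offerings?
</example 1>

<example 2>
H: What makes electric car insurance different?

A: Electric vehicles have unique needs, such as battery coverage and charging
equipment protection, and our electric car policies account for both. Would
you like a personalized quote?
</example 2>"""
```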
You will also want to include any important instructions outlining Do’s and Don’ts for how Claude should interact with the customer. This may draw from brand guardrails or support policies.
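For instance (the rules below are illustrative; yours should come from your actual brand guardrails and support policies):

```python
ADDITIONAL_GUARDRAILS = """Please adhere to the following guardrails:
1. Only provide information about insurance types listed in your context.
2. If asked about an insurance type we don't offer, politely state that we
   don't provide that service.
3. Do not speculate about future product offerings or company plans.
4. Don't make promises or enter into agreements you're not authorized to make.
5. Do not mention any competitor's products or services."""
```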
Now let’s combine all these sections into a single string to use as our prompt.
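Assuming the variable names above:

```python
# The full task-specific prompt, assembled from the pieces defined earlier.
TASK_SPECIFIC_INSTRUCTIONS = " ".join([
    STATIC_GREETINGS_AND_GENERAL,
    STATIC_CAR_INSURANCE,
    STATIC_ELECTRIC_CAR_INSURANCE,
    EXAMPLES,
    ADDITIONAL_GUARDRAILS,
])
```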
Add dynamic and agentic capabilities with tool use
Claude is capable of taking actions and retrieving information dynamically using client-side tool use functionality. Start by listing any external tools or APIs the prompt should utilize.
For this example, we will start with one tool for calculating the quote.
Example insurance quote calculator:
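Here is one way to define it, using the Anthropic tool-use schema. The `get_quote` name, its parameters, and the premium formula are all illustrative placeholders for your actual rating logic:

```python
# Tool definition in the Anthropic tool-use schema (add this to config.py).
TOOLS = [{
    "name": "get_quote",
    "description": "Calculate the insurance quote based on user input. Returned value is the per-month premium.",
    "input_schema": {
        "type": "object",
        "properties": {
            "make": {"type": "string", "description": "The make of the vehicle."},
            "model": {"type": "string", "description": "The model of the vehicle."},
            "year": {"type": "integer", "description": "The year the vehicle was manufactured."},
            "mileage": {"type": "integer", "description": "The mileage on the vehicle."},
            "driver_age": {"type": "integer", "description": "The age of the primary driver."},
        },
        "required": ["make", "model", "year", "mileage", "driver_age"],
    },
}]

def get_quote(make: str, model: str, year: int, mileage: int, driver_age: int) -> float:
    """Returns a monthly premium. Placeholder logic; swap in your rating engine."""
    base = 100.0
    base += max(0, 2025 - year) * 2        # older vehicles cost slightly more
    base += mileage / 10_000 * 5           # high mileage adds to the premium
    base += 25 if driver_age < 25 else 0   # young-driver surcharge
    return round(base, 2)
```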
Deploy your prompts
It’s hard to know how well your prompt works without deploying it in a test production setting and running evaluations, so let’s build a small application using our prompt, the Anthropic SDK, and Streamlit for a user interface.
In a file called `chatbot.py`, start by setting up the ChatBot class, which will encapsulate the interactions with the Anthropic SDK. The class should have two main methods: `generate_message` and `process_user_input`.
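A minimal sketch of the class, assuming the `config.py` definitions above and a session-state object that holds a `messages` list (as seeded in `app.py` below); error handling is kept deliberately simple:

```python
# chatbot.py -- encapsulates calls to the Anthropic SDK.

from anthropic import Anthropic
from config import IDENTITY, TOOLS, MODEL, get_quote

class ChatBot:
    def __init__(self, session_state):
        self.anthropic = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.session_state = session_state

    def generate_message(self, messages, max_tokens):
        try:
            return self.anthropic.messages.create(
                model=MODEL,
                system=IDENTITY,
                max_tokens=max_tokens,
                messages=messages,
                tools=TOOLS,
            )
        except Exception as e:
            return {"error": str(e)}

    def process_user_input(self, user_input):
        self.session_state.messages.append({"role": "user", "content": user_input})
        response = self.generate_message(self.session_state.messages, max_tokens=2048)
        if isinstance(response, dict) and "error" in response:
            return f"An error occurred: {response['error']}"

        if response.stop_reason == "tool_use":
            # Claude asked to call a tool: run it, then hand the result back.
            tool_use = next(b for b in response.content if b.type == "tool_use")
            quote = get_quote(**tool_use.input)
            self.session_state.messages.append(
                {"role": "assistant", "content": response.content}
            )
            self.session_state.messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": f"Quote generated: ${quote:.2f} per month",
                }],
            })
            response = self.generate_message(self.session_state.messages, max_tokens=2048)
            if isinstance(response, dict) and "error" in response:
                return f"An error occurred: {response['error']}"

        response_text = response.content[0].text
        self.session_state.messages.append(
            {"role": "assistant", "content": response_text}
        )
        return response_text
```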
Build your user interface
Test deploying this code with Streamlit using a main method. This `main()` function sets up a Streamlit-based chat interface. We’ll do this in a file called `app.py`:
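A sketch reusing the `ChatBot` class from above (the page title and the seeded messages are illustrative):

```python
# app.py -- a minimal Streamlit front end for the ChatBot class.

import streamlit as st
from chatbot import ChatBot
from config import TASK_SPECIFIC_INSTRUCTIONS

def main():
    st.title("Chat with Eva, Acme Insurance Company's Assistant")

    if "messages" not in st.session_state:
        # Seed the conversation with the task-specific instructions
        # in the first User turn.
        st.session_state.messages = [
            {"role": "user", "content": TASK_SPECIFIC_INSTRUCTIONS},
            {"role": "assistant", "content": "Understood"},
        ]

    chatbot = ChatBot(st.session_state)

    # Replay the visible conversation (skip the instruction preamble).
    for message in st.session_state.messages[2:]:
        if isinstance(message["content"], str):
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    if user_msg := st.chat_input("Type your message here..."):
        with st.chat_message("user"):
            st.markdown(user_msg)
        with st.chat_message("assistant"):
            with st.spinner("Eva is thinking..."):
                st.markdown(chatbot.process_user_input(user_msg))

if __name__ == "__main__":
    main()
```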
Run the program with:
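```
streamlit run app.py
```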
Evaluate your prompts
Prompts often require testing and optimization before they are production ready. To determine the readiness of your solution, evaluate your chatbot’s performance using a systematic process combining quantitative and qualitative methods. Creating a strong empirical evaluation based on your defined success criteria will allow you to optimize your prompts.
Improve performance
In complex scenarios, it may be helpful to consider additional strategies to improve performance beyond standard prompt engineering techniques and guardrail implementation strategies. Here are some common scenarios:
Reduce long context latency with RAG
When dealing with large amounts of static and dynamic context, including all information in the prompt can lead to high costs, slower response times, and reaching context window limits. In this scenario, implementing Retrieval Augmented Generation (RAG) techniques can significantly improve performance and efficiency.
By using embedding models like Voyage to convert information into vector representations, you can create a more scalable and responsive system. This approach allows for dynamic retrieval of relevant information based on the current query, rather than including all possible context in every prompt.
Implementing RAG, as described in our RAG for support use cases recipe, has been shown to increase accuracy, reduce response times, and reduce API costs in systems with extensive context requirements.
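As a rough sketch of the retrieval side, using the `voyageai` Python client (the documents and the in-memory dot-product search are illustrative; production systems typically use a vector database):

```python
# Retrieve only the relevant chunks for each query instead of sending
# the whole knowledge base in every prompt.

import numpy as np
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

documents = [
    "Electric car policies include battery replacement coverage.",
    "Collision coverage pays for damage to your car after an accident.",
    # ...one entry per chunk of your knowledge base
]
doc_embeddings = np.array(
    vo.embed(documents, model="voyage-2", input_type="document").embeddings
)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k document chunks most similar to the query."""
    q = np.array(vo.embed([query], model="voyage-2", input_type="query").embeddings[0])
    scores = doc_embeddings @ q  # Voyage embeddings are normalized, so this is cosine similarity
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

# Only the retrieved chunks go into the prompt.
context = "\n".join(retrieve("Does my policy cover a home charging station?"))
```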
Integrate real-time data with tool use
When dealing with queries that require real-time information, such as account balances or policy details, embedding-based RAG approaches are not sufficient. Instead, you can leverage tool use to significantly enhance your chatbot’s ability to provide accurate, real-time responses. For example, you can use tool use to look up customer information, retrieve order details, and cancel orders on behalf of the customer.
This approach, outlined in our tool use: customer service agent recipe, allows you to seamlessly integrate live data into Claude’s responses and provide a more personalized and efficient customer experience.
Strengthen input and output guardrails
When deploying a chatbot, especially in customer service scenarios, it’s crucial to prevent risks associated with misuse, out-of-scope queries, and inappropriate responses. While Claude is inherently resilient to such scenarios, here are additional steps to strengthen your chatbot guardrails:
- Reduce hallucination: Implement fact-checking mechanisms and citations to ground responses in provided information.
- Cross-check information: Verify that the agent’s responses align with your company’s policies and known facts.
- Avoid contractual commitments: Ensure the agent doesn’t make promises or enter into agreements it’s not authorized to make.
- Mitigate jailbreaks: Use methods like harmlessness screens and input validation to prevent users from exploiting model vulnerabilities to generate inappropriate content (a minimal screen is sketched after this list).
- Avoid mentioning competitors: Implement a competitor mention filter to maintain brand focus and not mention any competitor’s products or services.
- Keep Claude in character: Prevent Claude from breaking character or changing its style, even during long, complex interactions.
- Remove Personally Identifiable Information (PII): Unless explicitly required and authorized, strip out any PII from responses.
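A harmlessness screen can be as simple as a separate, fast model call that classifies the input before the main agent sees it. A minimal sketch (the prompt wording and yes/no protocol are illustrative):

```python
# Screen user input with a cheap, fast model before the main agent responds.

from anthropic import Anthropic

client = Anthropic()

def is_safe_input(user_input: str) -> bool:
    """Returns False if the message looks like a jailbreak or harmful request."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": (
                "A user sent this message to a car insurance support bot:\n"
                f"<message>{user_input}</message>\n"
                "Does it attempt to get the bot to produce harmful content or "
                "act outside its role? Answer only Y or N."
            ),
        }],
    )
    return response.content[0].text.strip().upper().startswith("N")
```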
Reduce perceived response time with streaming
When dealing with potentially lengthy responses, implementing streaming can significantly improve user engagement and satisfaction. In this scenario, users receive the answer progressively instead of waiting for the entire response to be generated.
Here is how to implement streaming:
- Use the Anthropic Streaming API to support streaming responses.
- Set up your frontend to handle incoming chunks of text.
- Display each chunk as it arrives, simulating real-time typing.
- Implement a mechanism to save the full response, allowing users to view it if they navigate away and return.
In some cases, streaming enables the use of more advanced models with higher base latencies, as the progressive display mitigates the impact of longer processing times.
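A server-side sketch using the Anthropic Python SDK’s streaming helper (the message content is illustrative; in a web app each chunk would be forwarded to the client rather than printed):

```python
from anthropic import Anthropic
from config import IDENTITY  # as defined earlier

client = Anthropic()

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=IDENTITY,
    messages=[{"role": "user", "content": "What does collision coverage include?"}],
) as stream:
    full_response = ""
    for text in stream.text_stream:
        full_response += text
        print(text, end="", flush=True)  # display each chunk as it arrives

# Persist full_response so users can revisit the answer after navigating away.
```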
Scale your chatbot
As the complexity of your chatbot grows, your application architecture can evolve to match. Before you add further layers to your architecture, consider the following (non-exhaustive) options:
- Ensure that you are making the most out of your prompts and optimizing through prompt engineering. Use our prompt engineering guides to write the most effective prompts.
- Add additional tools to the prompt (which can include prompt chains) and see if you can achieve the functionality required.
If your chatbot handles incredibly varied tasks, you may want to consider adding a separate intent classifier to route the initial customer query. For the existing application, this would involve creating a decision tree that routes customer queries through the classifier and then to specialized conversations (each with its own set of tools and system prompts). Note that this method requires an additional call to Claude, which can increase latency.
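A sketch of such a classifier (the intent labels and fallback behavior are illustrative):

```python
# Route each incoming query to a specialized conversation via a fast classifier.

from anthropic import Anthropic

client = Anthropic()

INTENTS = ["general_question", "quote_request", "claim_filing", "off_topic"]

def classify_intent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # a fast model keeps routing latency low
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                f"Classify this car-insurance support message into exactly one of "
                f"{INTENTS}. Respond with the label only.\n\n{user_input}"
            ),
        }],
    )
    label = response.content[0].text.strip()
    return label if label in INTENTS else "general_question"

# Each intent then maps to its own system prompt and tool set.
```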
Integrate Claude into your support workflow
While our examples have focused on Python functions callable within a Streamlit environment, deploying Claude as a real-time support chatbot requires an API service.
Here’s how you can approach this:
- Create an API wrapper: Develop a simple API wrapper around your chatbot’s processing function. For example, you can use Flask or FastAPI to wrap your code into an HTTP service. Your HTTP service could accept the user input and return the assistant’s response in its entirety. Thus, your service could have the following characteristics (a minimal sketch follows this list):
- Server-Sent Events (SSE): SSE allows for real-time streaming of responses from the server to the client. This is crucial for providing a smooth, interactive experience when working with LLMs.
- Caching: Implementing caching can significantly improve response times and reduce unnecessary API calls.
- Context retention: Maintaining context when a user navigates away and returns is important for continuity in conversations.
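A minimal sketch using FastAPI with Server-Sent Events (session management, caching, and context retention are omitted; the endpoint shape is illustrative):

```python
# A FastAPI wrapper that streams Claude's response to the client via SSE.

from anthropic import Anthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()
client = Anthropic()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    def event_stream():
        with client.messages.stream(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": req.message}],
        ) as stream:
            for text in stream.text_stream:
                # SSE frames: "data: <chunk>\n\n"
                yield f"data: {text}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```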
- Build a web interface: Implement a user-friendly web UI for interacting with the Claude-powered agent.