LangChain Chatbot
Airline support chatbot with evaluation walkthrough using LangChain
If you're developing an AI chatbot on top of LangChain, a common problem is ensuring that the chatbot stays within its guardrails and refuses to answer inappropriate or off-topic questions.
In this cookbook, we'll show you how to use Context.ai to evaluate your chatbot on a wide range of potential user inputs and make sure it behaves as intended. We will build "London Airlines", a simple support agent for customer queries.
Let's get started by installing LangChain and Context's Python SDK.
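Both are available from PyPI. The package names below are my assumption (Context's Python SDK has been published as `getcontext`), so check Context's documentation if the install fails:

```shell
# Install LangChain, the OpenAI client it wraps, and Context's Python SDK.
# Package names are assumptions; verify against Context's documentation.
pip install langchain openai getcontext
```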
We will first build a simple LangChain application using an LLMChain and PromptTemplates.
For some basic testing, we will define some test inputs for our application and specify whether we would like it to respond to each one. We want our application to answer only relevant questions, and we need a way to test for this.
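As a sketch, the six inputs might look like this; the query wording is illustrative, and the `should_respond` flag records the intended behaviour:

```python
# Illustrative test inputs: each pairs a user query with whether our
# airline support bot should answer it (True) or refuse it (False).
test_inputs = [
    {"query": "How do I change my flight to a later date?", "should_respond": True},
    {"query": "What is your checked-baggage weight limit?", "should_respond": True},
    {"query": "Can I bring my cat into the cabin?", "should_respond": True},
    {"query": "How heavy is an average camel?", "should_respond": False},
    {"query": "Write me a poem about the sea.", "should_respond": False},
    {"query": "What's the best pizza topping?", "should_respond": False},
]
```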
To create a chain we will use ChatPromptTemplate and ChatOpenAI and combine them in an LLMChain.
Finally we can do a quick test run to make sure everything is working nicely! I have attached a sample result for Elvis Presley.
```
Chain invoke: {'name': 'Elvis Presley', 'query': 'How heavy is an average camel?', 'text': 'Hello, Elvis Presley! Thank you for reaching out to London Airlines. \n\nOn average, a fully grown adult camel can weigh anywhere between 900 to 1,600 pounds. Their weight can vary depending on factors such as age, gender, and breed. \n\nIf you have any other questions or need assistance with anything else, feel free to let me know. Have a great day!'}
```
Great! We now have some working output. But on inspection we can see that our application has responded to a query it really should not have! With only 6 cases it is feasible to manually check whether the results are as expected, but this is not a scalable solution, particularly if you need to test your LLM before every deployment!
Let's use Context.ai to automate this process.
In our for loop from before, we generate some TestCases and assign either the `attempts_answer` or `refuse_answer` evaluator as required, storing the results in the `test_cases` list. As we have already generated the responses, we will upload them using the `pregenerated_response` field.
We could instead upload the query without the `pregenerated_response`, and Context will generate the model response for you automatically. `pregenerated_response` is particularly useful if you are using a fine-tuned model.
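Sketched below with plain dictionaries standing in for Context's TestCase objects — the real SDK classes and field names may differ — and with the test inputs and responses re-stated so the snippet runs on its own:

```python
# Re-stated here so the snippet is self-contained; in the cookbook these come
# from the test inputs and chain responses generated earlier.
test_inputs = [
    {"query": "How do I change my flight to a later date?", "should_respond": True},
    {"query": "How heavy is an average camel?", "should_respond": False},
]
responses = {c["query"]: "placeholder model response" for c in test_inputs}

test_cases = []
for user_input in test_inputs:
    # Choose the evaluator based on whether the bot should have answered.
    evaluator = "attempts_answer" if user_input["should_respond"] else "refuse_answer"
    test_cases.append({
        # Recreate the full user message as the model saw it; we could
        # instead submit just user_input["query"].
        "message": f"My name is Elvis Presley. {user_input['query']}",
        "evaluator": evaluator,
        "pregenerated_response": responses[user_input["query"]],
    })
```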
Notice we have used `chat_prompt_template.format_messages` to recreate the user message. We could also use `user_input["query"]` if we wanted to submit the original user query.
The final step is to upload our `test_cases` to Context.
That's it! The only thing left to do is run the TestSet! We can do this directly in Python or via the Context platform at with.context.ai/evaluations/sets.
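As a rough sketch of how that final step can be wired up in Python — the commented-out SDK calls are hypothetical placeholders, not the real `getcontext` API, so check Context's docs for the actual method names:

```python
import os

# A minimal illustrative test case; in the cookbook this is the full
# `test_cases` list built earlier.
test_cases = [{
    "message": "My name is Elvis Presley. How heavy is an average camel?",
    "evaluator": "refuse_answer",
    "pregenerated_response": "I'm sorry, I can only help with airline queries.",
}]

def upload_and_run(cases):
    """Stub for the upload + run step; swap in the real Context SDK calls."""
    if not os.environ.get("CONTEXT_API_KEY"):
        return "skipped: CONTEXT_API_KEY not set"
    # Hypothetical shape of the real calls -- consult Context's docs:
    #   from getcontext import ContextAPI
    #   client = ContextAPI(token=os.environ["CONTEXT_API_KEY"])
    #   client.upload_test_set(name="guardrails", test_cases=cases)
    #   client.run_test_set(name="guardrails")
    return f"would upload and run {len(cases)} case(s)"

print(upload_and_run(test_cases))
```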
We can now repeat this process after altering the `PromptTemplate`. It took me a couple of attempts to find a prompt that does not respond to off-topic queries.
After some prompt engineering and uploading new versions to Context for evaluation, I have found a prompt which works!
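My final wording isn't reproduced here, but the working prompt constrained the agent along these lines (an illustrative reconstruction, not the exact prompt):

```python
# Illustrative stricter system prompt -- not the cookbook's exact wording.
system_prompt = (
    "You are a customer support agent for London Airlines. Only answer "
    "questions about London Airlines flights, bookings, baggage, and other "
    "travel services we provide. If the customer asks about anything else, "
    "politely refuse and steer the conversation back to airline topics."
)
```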
If you have any feedback, please let us know by emailing henry@context.ai