Default Evaluators
Overview
Context.ai offers a number of default evaluators for assessing generated responses. Each of these evaluators can be referenced in both the UI and the API.
Evaluators
Golden Response
API Identifier: golden_response
The Golden Response evaluator checks whether a generated response matches a well-known 'golden value'. Use this evaluator if you already know what a good generation looks like.
Parameters
| Parameter | Type | Allowed Values |
|---|---|---|
| | String | Any String value |
| | String | Ignore all capitalization / Consider capitalization differences when comparing equality |
| | String | Ignore all punctuation / Consider punctuation differences when comparing equality |
| | String | Ignore all whitespace / Consider whitespace differences when comparing equality |
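The normalization options above can be illustrated with a short sketch. This is not the Context.ai implementation; the function name and flags are assumptions used only to show how the three toggles affect the equality check.

```python
import re
import string

def matches_golden(response: str, golden: str,
                   ignore_case: bool = True,
                   ignore_punct: bool = True,
                   ignore_ws: bool = True) -> bool:
    """Illustrative golden-response comparison (not the Context.ai internals)."""
    def normalize(text: str) -> str:
        if ignore_case:
            text = text.lower()
        if ignore_punct:
            # Strip ASCII punctuation characters
            text = text.translate(str.maketrans("", "", string.punctuation))
        if ignore_ws:
            # Remove all whitespace runs entirely
            text = re.sub(r"\s+", "", text)
        return text
    return normalize(response) == normalize(golden)
```

With all three options enabled, `"New   York!"` and `"new york"` compare equal; with `ignore_case=False`, the capitalization difference causes a failure.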
Context Contains Phrase
API Identifier: context_contains_phrase
The Context Contains Phrase evaluator checks to see whether a configurable portion of the context window contains a specific phrase. You can use this evaluator to check that the correct citations or documents have been retrieved during RAG, or that the generated response includes specific citations or phrases.
Parameters
| Parameter | Type | Allowed Values |
|---|---|---|
| | String | Any String value |
| | String | Consider the entire context window (both input and generated response) when checking for the given phrase / Check for the given phrase only within the generated response from the LLM |
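The scope option determines which text is searched. A minimal sketch, assuming hypothetical argument names (the real evaluator is configured through the Context.ai UI or API, not this function):

```python
def context_contains_phrase(phrase: str, input_context: str,
                            generated_response: str,
                            response_only: bool = False) -> bool:
    """Illustrative check: search the whole context window, or only the
    generated response, for an exact phrase."""
    if response_only:
        haystack = generated_response
    else:
        haystack = input_context + "\n" + generated_response
    return phrase in haystack
```

For a RAG citation check, `response_only=False` verifies the document was retrieved anywhere in the context window, while `response_only=True` verifies the model actually cited it in its answer.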
Semantic Match
API Identifier: semantic_match
The Semantic Match evaluator checks whether the generated output semantically matches a given string. You should use this evaluator if you have a good idea of what a correct answer looks like, but are forgiving of slight variances in phrasing or word choice.
Parameters
| Parameter | Type | Allowed Values |
|---|---|---|
| | String | Any String value |
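In practice, semantic matching is done with embeddings or an LLM judge. The toy sketch below substitutes a plain string-similarity ratio purely to show the pass/fail shape of such an evaluator; the function name and threshold are assumptions, not the Context.ai method.

```python
from difflib import SequenceMatcher

def semantic_match(response: str, expected: str, threshold: float = 0.8) -> bool:
    """Toy stand-in: a real semantic-match evaluator would compare embedding
    vectors or ask an LLM judge. Here, normalized string similarity
    approximates the pass/fail decision."""
    score = SequenceMatcher(None, response.lower(), expected.lower()).ratio()
    return score >= threshold
```

A response that rephrases the expected answer slightly can still pass, while an unrelated answer falls below the threshold.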
Attempts Answer
API Identifier: attempts_answer
The Attempts Answer evaluator fails when the LLM refuses to answer the given query. You should use this evaluator to assert that the LLM has attempted to answer the given query instead of apologizing and refusing.
Refuse Answer
API Identifier: refuse_answer
The Refuse Answer evaluator is the opposite of the Attempts Answer evaluator and fails when the LLM attempts to answer the given user input. You should use this evaluator for queries that are outside the scope of your LLM application and should not be answered. This evaluator is effective for testing against prompt injection or hijacking attacks.
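Both evaluators hinge on detecting a refusal. A crude keyword heuristic, shown below, is enough to illustrate how the two evaluators mirror each other; the marker list and function names are illustrative assumptions (a production evaluator would classify refusals more robustly, e.g. with an LLM).

```python
# Common phrasings that signal the model declined to answer (illustrative list)
REFUSAL_MARKERS = (
    "i'm sorry", "i cannot", "i can't", "i am unable", "as an ai",
)

def refuses_to_answer(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def attempts_answer_passes(response: str) -> bool:
    # Attempts Answer: passes when the model did NOT refuse
    return not refuses_to_answer(response)

def refuse_answer_passes(response: str) -> bool:
    # Refuse Answer: passes when the model DID refuse
    return refuses_to_answer(response)
```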
Faithfulness
API Identifier: faithfulness
The Faithfulness evaluator asserts that all the assertions generated by an LLM in its response can be grounded in statements that appear in the system context. This evaluator can be used to assess hallucination rate, and to assert that the LLM only uses the provided context window to generate a response. This evaluator is heavily derived from the RAGAS faithfulness evaluator.
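The underlying idea can be sketched in simplified form. RAGAS-style faithfulness uses an LLM to extract claims from the response and verify each against the context; the sketch below substitutes a crude word-overlap test per sentence, purely to show the "fraction of grounded statements" structure of the metric.

```python
import re

def faithfulness_score(response: str, context: str) -> float:
    """Simplified sketch: a statement counts as grounded if most of its
    words appear somewhere in the context. Real implementations use an
    LLM to extract and verify claims."""
    context_words = set(re.findall(r"\w+", context.lower()))
    statements = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    if not statements:
        return 1.0
    grounded = 0
    for statement in statements:
        words = set(re.findall(r"\w+", statement.lower()))
        if words and len(words & context_words) / len(words) >= 0.8:
            grounded += 1
    return grounded / len(statements)
```

A response that mixes one grounded sentence with one unsupported sentence scores 0.5 under this sketch.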
Toxicity
API Identifier: toxicity
The Toxicity evaluator checks an LLM-generated response for potential toxicity and flags any violations. It uses the OpenAI moderation API to conduct a standardized assessment of the response's content, determining whether it is safe or unsafe for various contexts. Leveraging this API keeps the identification and flagging of toxic content consistent and reliable across generated responses.
JSON Schema Validation
API Identifier: json_schema_validation
The JSON Schema Validation evaluator ensures that the output generated by a model is valid JSON. Optionally, a JSON schema can be provided for the evaluator to validate the response against. There are two validation modes available: strict, which flags any extra properties as validation failures, and lenient, which only checks that the provided properties match the schema. If no schema is provided, the evaluator only checks that the response is valid JSON. This evaluator is useful for guaranteeing that a model consistently produces valid JSON responses.
Parameters
| Parameter | Type | Allowed Values |
|---|---|---|
| | String | Flags any extra properties as validation failures / Only checks if the provided properties match the schema |
| | String | JSON schema to use for validation |
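The strict/lenient distinction can be sketched with a minimal validator for flat schemas. A real implementation would use a full JSON Schema validator (e.g. the `jsonschema` package); this hand-rolled version exists only to make the two modes concrete, and its names are assumptions.

```python
import json

# Map JSON Schema type names to Python types (subset, for illustration)
TYPE_MAP = {"string": str, "number": (int, float), "integer": int,
            "boolean": bool, "object": dict, "array": list}

def validate_json(output: str, schema=None, strict: bool = True) -> bool:
    """Sketch of strict vs lenient JSON validation for flat object schemas."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    if schema is None:
        return True  # no schema: only require valid JSON
    if not isinstance(data, dict):
        return False
    props = schema.get("properties", {})
    for key, spec in props.items():
        if key in data and not isinstance(data[key], TYPE_MAP[spec["type"]]):
            return False  # declared property has the wrong type
    if strict and any(key not in props for key in data):
        return False  # strict mode: extra properties fail validation
    return True
```

With a schema declaring only a `name` property, `{"name": "a", "extra": 1}` fails in strict mode but passes in lenient mode.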