Default Evaluators

A number of default evaluators are available for assessing generated responses. Each of these evaluators can be referenced in both the UI and the API.


Golden Response

API Identifier: golden_response

The golden response evaluator checks whether a given generated response matches against a well-known 'golden value'. Use this evaluator if you already know what a good generation looks like.


Parameter   Type   Allowed Values

Any String value

agnostic (Default): Ignore all capitalization
full: Consider capitalization differences when comparing equality

agnostic (Default): Ignore all punctuation
full: Consider punctuation differences when comparing equality

agnostic (Default): Ignore all whitespace
full: Consider whitespace differences when comparing equality
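
The agnostic and full comparison modes can be illustrated with a minimal Python sketch. This is not the evaluator's actual implementation: the function names and the keyword names `capitalization`, `punctuation`, and `whitespace` are hypothetical, chosen only to mirror the three comparisons described above.

```python
import string


def normalize(text, capitalization="agnostic", punctuation="agnostic",
              whitespace="agnostic"):
    """Strip the differences that an 'agnostic' mode is meant to ignore."""
    if capitalization == "agnostic":
        text = text.lower()
    if punctuation == "agnostic":
        # Remove ASCII punctuation characters only (an assumption of this sketch).
        text = text.translate(str.maketrans("", "", string.punctuation))
    if whitespace == "agnostic":
        # Drop every whitespace character, including newlines and tabs.
        text = "".join(text.split())
    return text


def matches_golden(response, golden, **modes):
    """Return True when the response equals the golden value under the given modes."""
    return normalize(response, **modes) == normalize(golden, **modes)
```

With every mode left at `agnostic`, `matches_golden("Hello,  World!", "hello world")` passes, while setting `capitalization="full"` makes the same pair fail because the case difference is then taken into account.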

Context Contains Phrase

API Identifier: context_contains_phrase

The Context Contains Phrase evaluator checks to see whether a configurable portion of the context window contains a specific phrase. You can use this evaluator to check that the correct citations or documents have been retrieved during RAG, or that the generated response includes specific citations or phrases.

Parameter   Type   Allowed Values

Any String value

context_window (Default): Consider the entire context window (both input and generated response) when checking for the given phrase.
model_response_only: Check for the given phrase only within the generated response from the LLM.
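
The difference between the two scopes can be sketched in a few lines of Python. The function and its argument names are hypothetical, and the sketch assumes a case-sensitive substring match, which the description above does not specify.

```python
def context_contains_phrase(phrase, model_input, model_response,
                            scope="context_window"):
    """Check for a phrase in the whole context window or in the response only."""
    if scope == "model_response_only":
        searched_text = model_response
    elif scope == "context_window":
        # The context window covers both the input and the generated response.
        searched_text = model_input + "\n" + model_response
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    # Assumption: plain, case-sensitive substring search.
    return phrase in searched_text
```

For a RAG check, a citation that was retrieved into the input but never repeated by the model passes under `context_window` and fails under `model_response_only`.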

Semantic Match

API Identifier: semantic_match

The Semantic Match evaluator checks whether the generated output semantically matches a given string. You should use this evaluator if you have a good idea of what a correct answer looks like, but are forgiving of slight variances in phrasing or word choice.

Parameter   Type   Allowed Values



Any String value

Attempts Answer

API Identifier: attempts_answer

The Attempts Answer evaluator fails when the LLM refuses to answer the given query. You should use this evaluator to assert that the LLM has attempted to answer the given query instead of apologizing and refusing.

Refuse Answer

API Identifier: refuse_answer

The Refuse Answer evaluator is the opposite of the Attempts Answer evaluator and fails when the LLM attempts to answer the given user input. You should use this evaluator for queries that are outside the scope of your LLM application and should not generate a response. This evaluator is effective for testing against prompt injection or hijacking attacks.


Faithfulness

API Identifier: faithfulness

The Faithfulness evaluator asserts that every assertion the LLM makes in its response is grounded in statements that appear in the system context. This evaluator can be used to assess hallucination rate, and to assert that the LLM only uses the provided context window to generate a response. This evaluator is heavily derived from the RAGAS faithfulness evaluator.


Toxicity

API Identifier: toxicity

The Toxicity evaluator checks an LLM-generated response for potential toxicity and flags any violations. It uses the OpenAI moderation API to assess the response's content and determine whether it is safe or unsafe, which keeps the assessment standardized and consistent across generated responses.

JSON Schema Validation

API Identifier: json_schema_validation

The JSON Schema Validation evaluator ensures that the output generated by a model is valid JSON and, when a schema is provided, that it conforms to that JSON schema. There are two validation modes available: strict, which flags any extra properties as validation failures, and lenient, which only checks if the provided properties match the schema. If no schema is provided, the evaluator will only check if the JSON response itself is valid. This evaluator is useful for verifying that a model consistently produces valid JSON responses.

Parameter   Type   Allowed Values

Flags any extra properties as validation failures.
disabled (Default): Only checks if the provided properties match the schema.

JSON schema to use for validation.
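
The strict and lenient modes can be illustrated with a deliberately simplified Python sketch built on the standard library. It is not the evaluator's implementation and not a full JSON Schema validator: it only inspects top-level `properties` and their `type` keywords, and the function name and the `strict` flag are hypothetical.

```python
import json

# Mapping from JSON Schema type names to Python types (simplified).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_json_response(response, schema=None, strict=False):
    """Return True if the response is valid JSON and matches the optional schema."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    if schema is None:
        # Without a schema, only JSON validity is checked.
        return True
    if not isinstance(data, dict):
        return False
    properties = schema.get("properties", {})
    for name, value in data.items():
        if name not in properties:
            if strict:
                return False  # strict mode: extra properties are failures
            continue  # lenient mode: extra properties are ignored
        type_name = properties[name].get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if isinstance(value, bool) and type_name != "boolean":
            return False
        if not isinstance(value, expected):
            return False
    return True
```

Given a schema that declares only a string `name`, the response `{"name": "Ada", "extra": 1}` passes in lenient mode and fails in strict mode. Keywords such as `required`, and nested schemas, are outside the scope of this sketch.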
