Running Evaluations
API methods for running evaluations on a Test Set
The Evaluation Execution API allows you to programmatically run evaluations outside of the Context.ai UI, for example, in the context of a unit testing framework or CI/CD workflow.
The API has two methods: one to start a new evaluation run and one to fetch the results of a finished run.
An evaluation run request executes all the test cases within a previously uploaded Test Set Version. For each test case, the evaluation run does the following:
Generates the model response for the context window provided.
Runs each evaluator on the generated response and provides an evaluation verdict along with the chain-of-thought reasoning behind it. The evaluation verdict can be one of the following:
Passed: The evaluator condition was fulfilled.
Failed: The evaluator condition was not fulfilled.
Partially Passed: The evaluator condition was partially fulfilled. This can occur when only a subset of the generated response meets the evaluation criteria.
Inconclusive: The evaluator either failed to execute or the model response was inconclusive.
If the iteration field is set, this process is repeated 1, 3, 5, or 7 times, giving you more confidence in the results.
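The verdicts above can be tallied across iterations of a test case. A minimal sketch, assuming string values for the verdicts (the exact wire-format strings the API returns may differ):

```python
from enum import Enum
from collections import Counter

class Verdict(Enum):
    # The four verdicts described above; the string values here are
    # assumptions, not necessarily the API's exact response strings.
    PASSED = "Passed"
    FAILED = "Failed"
    PARTIALLY_PASSED = "Partially Passed"
    INCONCLUSIVE = "Inconclusive"

def summarize(verdicts):
    """Tally the verdicts returned across iterations of one test case."""
    counts = Counter(verdicts)
    return {v.value: counts.get(v, 0) for v in Verdict}

# e.g. three iterations of a single test case
summary = summarize([Verdict.PASSED, Verdict.PASSED, Verdict.PARTIALLY_PASSED])
```

With more iterations, a test case that passes in only some of them shows up clearly in the tally rather than as a single ambiguous verdict.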
You can run an evaluation through the Context.ai web app or by submitting a POST request through the API. In the latter case, you can access the results of the run either by polling the results endpoint or on the Context.ai web app.
Submitting an evaluation job
Submits an evaluation run request
POST
https://api.context.ai/api/v1/evaluations/run
Queues a job to run evaluations on the provided Test Set version, if that version has not been run yet. Returns a property data.run_id that can be used to poll the results of this run.
Request Body
Example Request
Example Response
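A sketch of submitting a run with Python's standard library. The request-body field name (version_id) and bearer-token authorization header are assumptions; consult the Request Body table above for the exact schema:

```python
import json
import urllib.request

API_BASE = "https://api.context.ai/api/v1"

def build_run_request(api_key: str, version_id: str) -> urllib.request.Request:
    """Build the POST request that queues an evaluation run.

    The body field "version_id" and Bearer auth are assumptions about
    the schema, used here only for illustration.
    """
    body = json.dumps({"version_id": version_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/evaluations/run",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_request("YOUR_API_KEY", "YOUR_TEST_SET_VERSION_ID")
# urllib.request.urlopen(req) would submit the job; the JSON response
# contains data.run_id, which the GET endpoint below accepts.
```

Separating request construction from sending keeps the call site easy to test and lets you swap in any HTTP client.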
Polling the results of a run
Get results of a run
GET
https://api.context.ai/api/v1/evaluations/run/{id}
Returns the results of the evaluation run.
Headers
Example Response
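Because the run executes asynchronously, clients typically poll this endpoint until the run finishes. A minimal polling loop, assuming the response JSON carries a "status" field with terminal values "completed" and "failed" (an assumption about the schema, not documented above):

```python
import time

API_BASE = "https://api.context.ai/api/v1"

def poll_run(api_key, run_id, fetch, interval_s=5.0, timeout_s=600.0):
    """Poll GET /evaluations/run/{id} until the run reaches a terminal state.

    `fetch(url, api_key)` performs the HTTP GET and returns the decoded
    JSON body; it is injected so the loop can be exercised without
    network access. The "status" values checked here are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch(f"{API_BASE}/evaluations/run/{run_id}", api_key)
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"run {run_id} did not finish within {timeout_s}s")
```

Pick a polling interval that matches your CI budget; a few seconds is usually enough, since runs over large Test Sets can take minutes.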