Running Evaluations

API methods for running evaluations on a Test Set

The Evaluation Execution API allows you to programmatically run evaluations outside of the Context.ai UI, for example, in the context of a unit testing framework or CI/CD workflow.

The API has two methods: one to start a new evaluation run and one to fetch the results of a finished run.

An evaluation run request executes all the test cases within a previously uploaded Test Set Version. For each test case, the evaluation run does the following:

  1. Generates the model response for the context window provided.

  2. Runs each evaluator on the generated response and provides an evaluation verdict along with chain-of-thought reasoning. The verdict can be one of the following:

    1. Passed: the evaluator condition was fulfilled.

    2. Failed: the evaluator condition was not fulfilled.

    3. Partially Passed: the evaluator condition was partially fulfilled. This can occur when only a subset of the generated response meets the evaluation criteria.

    4. Inconclusive: the evaluator either failed to execute or the model response is inconclusive.

  3. If the iterations field is set, this process is repeated 1, 3, 5, or 7 times so you have more confidence in the results.

You can run an evaluation through the Context.ai web app or by submitting a POST request through the API. In the latter case, you can access the results of the run either by polling the results endpoint or on the Context.ai web app.

Submitting an evaluation job

Submits an evaluation run request

POST https://api.context.ai/api/v1/evaluations/run

Queues a job to run evaluations on the provided Test Set version if that version has not already been run. Returns a property data.run_id that can be used to poll the results of this run.

Request Body

Name            Type    Description
test_set_name*  String  Name of the Test Set
version*        Number  Version of the Test Set
iterations      Number  Number of times each test case with evaluators should run. Choices are 1, 3, 5, or 7.

Example Request

{
    "test_set_name": "My First Test Set",
    "version": 1,
    "iterations": 3
}

Example Response

{
    "status": "accepted",
    "data": {
        "run_id": "7e835d66-ca7c-4986-b3ca-0d2f5060863b"
    }
}
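The request above can be submitted from code. A minimal Python sketch using only the standard library; the bearer-token Authorization header is an assumption here, so consult your API credentials for the actual auth scheme:

```python
import json
from urllib import request

API_BASE = "https://api.context.ai/api/v1"


def build_run_payload(test_set_name, version, iterations=None):
    """Build the request body for POST /evaluations/run."""
    payload = {"test_set_name": test_set_name, "version": version}
    if iterations is not None:
        if iterations not in (1, 3, 5, 7):
            raise ValueError("iterations must be 1, 3, 5, or 7")
        payload["iterations"] = iterations
    return payload


def submit_run(api_key, payload):
    """POST the payload and return the run_id from the response."""
    # Assumes bearer-token auth; verify the real scheme for your account.
    req = request.Request(
        f"{API_BASE}/evaluations/run",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["data"]["run_id"]
```

Validating the iterations value client-side surfaces mistakes before the request is queued.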

Polling the results of a run

Get results of a run

GET https://api.context.ai/api/v1/evaluations/run/{id}

Returns the results of the evaluation run.

Path Parameters

Name  Type    Description
id*   String  ID of the run

Example Response

{
    "id": "7e835d66-ca7c-4986-b3ca-0d2f5060863b",
    "details": {
        "test_set_name": "My First Test Set",
        "version": 1
    },
    "status": "completed",
    "started_at": "2024-01-23T20:23:25.374Z",
    "progress": {
        "completed": 1,
        "pending": 0
    },
    "results": [
        {
            "iteration": 1,
            "test_case": {
                "name": "My First Test Case",
                "input_context_window": [
                    {
                        "role": "system",
                        "message": "You are a helpful assistant."
                    },
                    {
                        "role": "user",
                        "message": "Can you write me a poem?"
                    }
                ],
                "output": [
                    {
                        "role": "system",
                        "content": "Expedita tempora fuga doloribus molestiae laudantium."
                    }
                ]
            },
            "evaluations": [
                {
                    "evaluator_name": "Attempts Answer",
                    "outcome": "inconclusive",
                    "reasoning": {
                        "result": []
                    }
                }
            ]
        }
    ]
}
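A polling loop can detect when a run has finished and then tally evaluator outcomes across iterations. A sketch, assuming bearer-token auth and assuming "completed" and "failed" are the terminal statuses; only "completed" is confirmed by the example response above:

```python
import json
import time
from urllib import request

API_BASE = "https://api.context.ai/api/v1"

# Assumed terminal statuses; only "completed" appears in the API examples.
TERMINAL = {"completed", "failed"}


def fetch_run(api_key, run_id):
    """GET the current state of an evaluation run."""
    req = request.Request(
        f"{API_BASE}/evaluations/run/{run_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_run(api_key, run_id, interval=5.0):
    """Poll until the run reaches a terminal status, then return it."""
    while True:
        run = fetch_run(api_key, run_id)
        if run.get("status") in TERMINAL:
            return run
        time.sleep(interval)


def outcome_counts(run):
    """Tally evaluator outcomes across all iterations and test cases."""
    counts = {}
    for result in run.get("results", []):
        for evaluation in result.get("evaluations", []):
            outcome = evaluation["outcome"]
            counts[outcome] = counts.get(outcome, 0) + 1
    return counts
```

The progress object in the response (completed and pending counts) can also be used to report partial progress while the loop waits.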
