Bulk Test Upload

API method for the creation of Test Sets

Overview

Test Sets allow you to define a versioned testing environment for your model. A Test Set contains a collection of test cases.

Each test case has a name, a model, and the messages used to prompt the model. A test case can optionally include other fields, such as model_config and metadata. You can use metadata to filter your test cases in the UI.

After creation, you can specify evaluators on the Context.ai platform.

Test Sets and test cases are identified by their name attribute. Note that a name cannot be updated after creation.
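As a rough illustration of the structure described above, the following Python sketch assembles one test case and wraps it in a Test Set payload. The helper name make_test_case is hypothetical and not part of any Context.ai SDK; the field names follow the example request in this document.

```python
import json
from typing import Any, Dict, List, Optional


def make_test_case(
    name: str,
    model: str,
    messages: List[Dict[str, str]],
    model_config: Optional[Dict[str, Any]] = None,
    metadata: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build one test case dict; optional fields are omitted when unset."""
    case: Dict[str, Any] = {"name": name, "model": model, "messages": messages}
    if model_config is not None:
        case["model_config"] = model_config
    if metadata is not None:
        case["metadata"] = metadata
    return case


case = make_test_case(
    name="unknown answer",
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "message": "How do I get from Bangalore to Chennai?"}],
    metadata={"category": "trains"},
)

# The Test Set itself is identified by its name and holds the test cases.
test_set = {"name": "train_schedule_edge_cases", "test_cases": [case]}
print(json.dumps(test_set, indent=4))
```

Because names identify both Test Sets and test cases and cannot be changed later, it is worth choosing them deliberately before the first upload.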

Versioning

When you create a new Test Set, version 1 is automatically created and populated with your test cases. A second POST request with the same Test Set name creates a new version. Use the copy_test_cases_from query parameter to choose whether the prior version's test cases are carried over (append) or discarded (replace).
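The effect of copy_test_cases_from can be modelled locally. The sketch below is not server code; it only mirrors the behaviour stated under Query Parameters in this document (none replaces everything, prior_version replaces same-named test cases and appends new ones). The function name next_version is hypothetical.

```python
from typing import Any, Dict, List

TestCase = Dict[str, Any]


def next_version(
    prior: List[TestCase],
    uploaded: List[TestCase],
    copy_test_cases_from: str = "prior_version",
) -> List[TestCase]:
    """Return the test cases the new version would contain."""
    if copy_test_cases_from == "none":
        # Nothing is carried over: only the uploaded test cases remain.
        return list(uploaded)
    if copy_test_cases_from != "prior_version":
        raise ValueError("copy_test_cases_from must be 'none' or 'prior_version'")
    # Start from the prior version, keyed by name (dicts keep insertion order).
    merged = {case["name"]: case for case in prior}
    for case in uploaded:
        # Same name: replaced in place. New name: appended at the end.
        merged[case["name"]] = case
    return list(merged.values())


version_1 = [
    {"name": "a", "model": "gpt-4"},
    {"name": "b", "model": "gpt-4"},
]
upload = [
    {"name": "b", "model": "gpt-3.5-turbo"},
    {"name": "c", "model": "gpt-4"},
]

appended = next_version(version_1, upload)
replaced = next_version(version_1, upload, copy_test_cases_from="none")
print([case["name"] for case in appended])
print([case["name"] for case in replaced])
```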

  • Adding evaluators to your test cases

    You can optionally add evaluators to the test cases in your Test Set through the API. We have some off-the-shelf evaluators available for use. You can also create your own custom evaluators in the Context.ai web app and assign those to your test cases by referencing them in snake_case.

  • Uploading a pre-generated response for evaluation

    To run evaluations against a pre-generated response for a test case, add the response to the test case under the pregenerated_response key. If the pregenerated_response field is set, the model field becomes optional.
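The two options above can be combined in a single test case. The following sketch makes several assumptions: that pregenerated_response takes a plain string, that a custom evaluator called "Mentions Platform Number" exists in your account (it is hypothetical), and that its snake_case reference is formed by lowercasing the words and joining them with underscores. Check the evaluator's reference in the web app before relying on the conversion.

```python
import re
from typing import Any, Dict


def to_snake_case(display_name: str) -> str:
    """Lowercase the alphanumeric words of a name and join them with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", display_name)
    return "_".join(word.lower() for word in words)


test_case: Dict[str, Any] = {
    "name": "pregenerated clarification",
    # No "model" key: it is optional once pregenerated_response is set.
    "messages": [
        {
            "role": "user",
            "message": "What time does the train to Bristol from Paddington depart?",
        }
    ],
    # Assumption: the response is supplied as a plain string.
    "pregenerated_response": "Could you confirm the day of departure?",
    "evaluators": [
        # Off-the-shelf evaluator taken from the example request in this document.
        {"evaluator": "attempts_answer"},
        # Hypothetical custom evaluator, referenced in snake_case.
        {"evaluator": to_snake_case("Mentions Platform Number")},
    ],
}

print(test_case["evaluators"])
```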

Log a Test Set

POST https://api.context.ai/api/v1/test_sets

Creates a new Test Set version. If a Test Set with the given name already exists, a new version of it is created. Use the copy_test_cases_from query parameter to specify append/replace behaviour.

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| copy_test_cases_from | String | Either none or prior_version. If none, all test cases are replaced by those in the request. If prior_version, only test cases with the same name are replaced, and new test cases are appended. Defaults to prior_version. |
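A small sketch of how the query parameter might be attached to the endpoint URL, using only the Python standard library. The helper name build_test_sets_url is hypothetical; the two accepted values come from the description above.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.context.ai/api/v1/test_sets"
ALLOWED_VALUES = ("none", "prior_version")


def build_test_sets_url(copy_test_cases_from: Optional[str] = None) -> str:
    """Return the endpoint URL, adding the query parameter only when given."""
    if copy_test_cases_from is None:
        # Omitting the parameter leaves the documented default (prior_version).
        return BASE_URL
    if copy_test_cases_from not in ALLOWED_VALUES:
        raise ValueError(f"expected one of {ALLOWED_VALUES}")
    query = urlencode({"copy_test_cases_from": copy_test_cases_from})
    return f"{BASE_URL}?{query}"


print(build_test_sets_url())
print(build_test_sets_url("none"))
```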

Request Body

| Name | Type | Description |
| --- | --- | --- |
| name* | String | Name of the Test Set |
| test_cases* | List | List of all test cases |
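Before uploading, a payload can be checked locally against the fields this document marks as required. The sketch below is an assumption about what a reasonable client-side check looks like; the server may apply additional or different rules, and find_problems is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_problems(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    name = payload.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name: required, non-empty string")
    cases = payload.get("test_cases")
    if not isinstance(cases, list):
        problems.append("test_cases: required list")
        return problems
    for index, case in enumerate(cases):
        label = f"test_cases[{index}]"
        if not case.get("name"):
            problems.append(f"{label}.name: required")
        if not case.get("messages"):
            problems.append(f"{label}.messages: required")
        # model may be omitted only when a pre-generated response is supplied.
        if not case.get("model") and "pregenerated_response" not in case:
            problems.append(
                f"{label}.model: required unless pregenerated_response is set"
            )
    return problems


good = {
    "name": "train_schedule_edge_cases",
    "test_cases": [
        {
            "name": "unknown answer",
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "message": "How do I get from Bangalore to Chennai?"}],
        }
    ],
}
bad = {"test_cases": [{"name": "no model", "messages": [{"role": "user", "message": "Tomorrow"}]}]}

print(find_problems(good))
print(find_problems(bad))
```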

{
    "name": "train_schedule_edge_cases",
    "version_id": 42
}

Example Request

{
    "name": "train_schedule_edge_cases",
    "test_cases": [
        {
            "name": "political query",
            "model": "gpt-4",
            "model_config": {
                "temperature": 1.2,
                "top_p": 0.9
            },
            "messages": [
                {
                    "role": "system",
                    "message": "You are a LLM providing information about U.K. train schedules."
                },
                {
                    "role": "user",
                    "message": "What time does the train to Bristol from Paddington depart?"
                },
                {
                    "role": "assistant",
                    "message": "Could you confirm the day of departure?"
                },
                {
                    "role": "user",
                    "message": "Tomorrow"
                }
            ],
            "evaluators": [
                {
                    "evaluator": "attempts_answer"
                }
            ],
            "metadata": {
                "category": "trains"
            }
        },
        {
            "name": "dippy query",
            "model": "gpt-3.5-turbo-1106",
            "messages": [
                {
                    "role": "system",
                    "message": "You are a super LLM that answers questions about dippy. Dippy is a big dinosaur who weighs 300kg and has a lovely smile. Do not answer questions about any other topics"
                },
                {
                    "role": "user",
                    "message": "How much does dippy weigh?"
                }
            ],
            "evaluators": [
                {
                    "evaluator": "context_contains_phrase",
                    "options": {
                        "phrase": "does dippy weigh?",
                        "context": "full_window"
                    }
                },
                {
                    "evaluator": "golden_response",
                    "options": {
                        "golden_response": "Dippy weighs 300kg."
                    }
                },
                {
                    "evaluator": "faithfulness"
                }
            ],
            "metadata": {
                "category": "dinosaurs"
            }
        },
        {
            "name": "unknown answer",
            "model": "gpt-3.5-turbo",
            "messages": [
                {
                    "role": "system",
                    "message": "You are a LLM providing information about U.K. train schedules."
                },
                {
                    "role": "user",
                    "message": "How do I get from Bangalore to Chennai?"
                }
            ],
            "evaluators": [
                {
                    "evaluator": "attempts_answer"
                }
            ],
            "metadata": {
                "category": "trains"
            }
        }
    ]
}
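One way to send a request like the one above from Python is sketched below, using only the standard library. The authentication scheme is not described in this section, so the bearer-token Authorization header is an assumption, as is the placeholder token; build_upload_request is a hypothetical helper. The sketch only constructs the request, and the lines that would perform the network call are left commented out.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.context.ai/api/v1/test_sets"


def build_upload_request(payload: Dict[str, Any], api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Test Set as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. Confirm the scheme for your account.
            "Authorization": f"Bearer {api_token}",
        },
    )


payload = {
    "name": "train_schedule_edge_cases",
    "test_cases": [
        {
            "name": "unknown answer",
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "user", "message": "How do I get from Bangalore to Chennai?"}
            ],
            "evaluators": [{"evaluator": "attempts_answer"}],
            "metadata": {"category": "trains"},
        }
    ],
}

request = build_upload_request(payload, api_token="YOUR_API_TOKEN")
print(request.get_method(), request.full_url)

# To actually send it (performs a network call):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```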

Example Response

{
    "name": "train_schedule_edge_cases",
    "version": 42
}
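Reading the response is a matter of parsing the JSON body. This document shows the version number under both version and version_id, so the sketch below accepts either key; which one the live API returns is not confirmed here. The raw_body string stands in for the body of a real response.

```python
import json

# Stand-in for the response body returned by the POST request.
raw_body = '{"name": "train_schedule_edge_cases", "version": 42}'

result = json.loads(raw_body)
# Prefer "version"; fall back to "version_id" if that is the key present.
version = result.get("version", result.get("version_id"))
print(f"Created version {version} of Test Set {result['name']!r}")
```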
