Multi-Call Chains [beta]

For more advanced LLM applications, Context.ai supports evaluating and debugging multi-call chains.

For more complex LLM-powered applications, particularly those with multi-step chains or agent-like workflows, evaluating single request-response pairs may not be sufficient to fully validate performance.

For these workflows, Context supports evaluating chains of multiple LLM calls as part of a larger trace. Our API implementation is fully compatible with LangSmith tracing, so the two can be used in parallel.

Trace evaluations are currently in beta and are supported through our Python SDK only. If you're interested in support for other programming languages, please reach out to henry@context.ai.

Get Started

Our SDK provides an interface to annotate specific sections of your code, capture them as traces, and evaluate them via Context.ai. Evaluating multi-call chains is an API-first feature that integrates directly with your existing unit tests.

There are two steps to start evaluating complex multi-call chains:

  1. Annotate relevant functions with tracing decorators and loggers.

  2. Write unit tests to evaluate these traces via Context, then receive results in the UI for inspection and debugging.

Creating Traces

Context builds upon the LangSmith tracing SDK, allowing both to be used in parallel.

There are two ways to annotate functions so that they are captured in traces for evaluation.

Tracing Code You Own

The easiest way to trace code you own is to use the @traceable decorator from the Context or LangSmith SDK.

from langsmith import traceable
# OR: from getcontext.tracing import traceable

@traceable
def create_document_store():
    # Implementation goes here; each call to this function is recorded as a span in the trace.
    ...
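
Traced functions can call other traced functions, and each nested call is recorded as a child span of its caller. This is how a multi-call chain is built up. A minimal sketch, with illustrative function names:

from langsmith import traceable

@traceable
def retrieve_documents(question):
    # Recorded as a child span of any traced function that calls it.
    return ["Mark lives in London."]

@traceable
def answer_question(question):
    # Calling another @traceable function from inside a traced function
    # nests its span beneath this one in the resulting trace.
    documents = retrieve_documents(question)
    return documents[0]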

Tracing Code from 3rd Party Libraries

For calls made to 3rd-party libraries (such as LLM SDKs), it isn't possible to annotate the calls with the @traceable decorator. Instead, we provide a helper function, dynamic_traceable, that wraps these calls at runtime and captures them as part of the trace. This function is also useful for dynamically overriding the name of the span in the trace on each call.

from getcontext.tracing import dynamic_traceable

# Wrap a 3rd-party callable (here, doc_write.run) so each call is captured
# in the trace as a "chain" span named "doc_writer_run".
run = dynamic_traceable(
    doc_write.run, run_type="chain", name="doc_writer_run"
)

# Call run as normal with its regular arguments:
run(...)

For specifically capturing OpenAI API calls in traces, you can also use LangSmith's wrap_openai helper, which handles this dynamic wrapping for you.
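
A minimal sketch of this approach, assuming the openai and langsmith packages are installed (the model name is illustrative):

from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap the OpenAI client so each chat completion call is captured as an "llm" span.
client = wrap_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where does Mark live?"}],
)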

Context.ai currently supports the llm and chain values for the run_type field. We will be adding support for other run types (and evaluating them!) in the next few weeks.
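
The run_type can also be set directly on the @traceable decorator for code you own. A brief sketch with illustrative function names (call_openai_llm matches the span name used in the evaluation example below):

from langsmith import traceable

# run_type defaults to "chain"; set it explicitly where it is meaningful.
@traceable(run_type="chain", name="answer_pipeline")
def answer_pipeline(question):
    ...

# Spans marked run_type="llm" are the ones Context.ai can evaluate as LLM calls,
# provided their messages follow the OpenAI schema (see below).
@traceable(run_type="llm", name="call_openai_llm")
def call_openai_llm(messages):
    ...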

Running Evaluations

The recommended way to run evaluations is as part of your existing unit testing suite, and Context.ai provides SDK helpers to make this process easy.

Once you have annotated the relevant sections of your code to capture traces, the next step is to attach evaluators to those traces.

import unittest

from main import ask_question
from getcontext.tracing import capture_trace, Evaluator


class TestAskQuestion(unittest.TestCase):
    def test_ask_question(self):
        # Capture the trace generated by calling `ask_question` with the arguments: "Where does Mark live?"
        # You can optionally override the top-level trace name by setting the trace_name argument value.
        trace = capture_trace(ask_question, 'ask_question', "Where does Mark live?")

        # Attach evaluators to specific nodes within the trace. You can add as many evaluators
        # as you want to as many nodes within the trace as is helpful.
        # Note: You cannot currently attach evaluators to multiple different nodes with the same span name.
        trace.add_evaluator(
            span_name="call_openai_llm",
            evaluator=Evaluator(
                evaluator="golden_response",
                options={"golden_response": "Mark lives in London."},
            ),
        )

        # Run the evaluation in Context.ai, throwing an exception if it fails.
        trace.evaluate()

We currently support evaluations for LLM spans (those with run_type == "llm") that use the OpenAI schema for message ingestion. We will be building support for evaluating additional span types and schemas in the coming weeks.

To evaluate LLM responses from foundation model providers other than OpenAI, we recommend using LiteLLM, which ensures a consistent, OpenAI-compatible response schema across providers.
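
As a sketch of how this might fit together, a LiteLLM completion call can be wrapped with dynamic_traceable so it is captured as an llm span with OpenAI-format messages (the model and span names here are illustrative):

import litellm
from getcontext.tracing import dynamic_traceable

# litellm.completion returns responses in the OpenAI schema regardless of provider.
traced_completion = dynamic_traceable(
    litellm.completion, run_type="llm", name="call_anthropic_llm"
)

response = traced_completion(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Where does Mark live?"}],
)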

Viewing Traces

All traces from unit tests are automatically logged in the Context.ai dashboard. To view recently executed traces, click on the "Traces" item in the left sidebar.

From here you can view all recently logged traces, hovering over each for a preview of its structure. Click through to see a detailed view of the trace.
