Conversation Cost Estimation

Context.ai can estimate the LLM API costs for ingested conversations. Each model has its own pricing method, tokenization configuration, and unit price, which makes it difficult to compare costs across models. Conversation cost estimation lets you view conversation transcript costs for multiple model providers in a single interface.

Integration Instructions

To start using conversation cost estimation, add a model metadata key-value pair to ingested transcripts. The model value must be one of the strings in the list of supported models.

{
  "conversation": {
    "messages": [ ... ],
    "metadata": {
      "model": "<supported model name>"
    }
  }
}

Once conversations are ingested, an estimated cost will be visible under the "Transcripts" tab within the Context UI.
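As a minimal sketch of the step above, the following Python snippet attaches the model metadata key to a conversation before it is ingested. The helper name build_conversation and the client-side check against the supported list are assumptions made for illustration; they are not part of the Context.ai API, and the snippet does not perform the ingestion call itself.

```python
import json

# Model names copied from the "Supported Models" list in this document.
SUPPORTED_MODELS = {
    "gpt-3.5-turbo-4k", "gpt-3.5-turbo-16k", "gpt-4-8k", "gpt-4-32k",
    "gpt-4-1106-preview", "gpt-3.5-turbo-1106",
    "davinci-002", "babbage-002", "curie", "chat-bison",
    "claude-1", "claude-2", "claude-instant",
}


def build_conversation(messages, model):
    """Hypothetical helper: wrap messages and attach the model metadata key."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model for cost estimation: {model!r}")
    return {
        "conversation": {
            "messages": messages,
            "metadata": {"model": model},
        }
    }


payload = build_conversation(
    [{"role": "user", "message": "When does the next train to Bristol depart?"}],
    "gpt-3.5-turbo-16k",
)
print(json.dumps(payload, indent=2))
```

Validating the model string before ingestion is a design choice in this sketch: it surfaces a typo early instead of leaving a transcript without an estimated cost.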

Pricing Methodology

Conversation Cost Estimation uses the model key in the metadata attached to ingested conversations to select the pricing configuration. The model value determines the pricing method (token or character count), the tokenization configuration, and the associated unit price.

When pricing a system or user message, we assume that the full preceding context (every previous message in that conversation) was also sent to the LLM. Each assistant message is priced individually.

As an example consider the following conversation.

{
  "conversation": {
    "messages": [
      {
        "role": "user",
        "message": "When does the next train to Bristol depart?"
      },
      {
        "role": "assistant",
        "message": "The next train to Bristol departs from London Paddington at 15:03."
      },
      {
        "role": "user",
        "message": "What is the earliest train tomorrow morning?"
      },
      {
        "role": "assistant",
        "message": "The first morning train to Bristol departs 05:32."
      }
    ],
    "metadata": {
      "model": "gpt-3.5-turbo-16k"
    }
  }
}

The first user and assistant messages will each be priced individually. The follow-up user question will be priced as a concatenation of the previous two messages and the question itself:

"When does the next train to Bristol depart? The next train to Bristol departs from London Paddington at 15:03. What is the earliest train tomorrow morning?"

The final assistant message will again be priced individually.
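The rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Context.ai's implementation: the function name priced_texts is hypothetical, and joining messages with a single space is assumed from the concatenated example shown above. The sketch only determines which text is priced for each message; it does not apply tokenization or unit prices.

```python
def priced_texts(messages):
    """Return (role, text) pairs, where text is what would be priced."""
    priced = []
    history = []  # every earlier message in the conversation, in order
    for msg in messages:
        if msg["role"] == "assistant":
            # Assistant messages are priced individually.
            text = msg["message"]
        else:
            # System and user messages are priced with all prior messages.
            # Assumption: messages are joined with a single space.
            text = " ".join(history + [msg["message"]])
        priced.append((msg["role"], text))
        history.append(msg["message"])
    return priced


messages = [
    {"role": "user", "message": "When does the next train to Bristol depart?"},
    {"role": "assistant", "message": "The next train to Bristol departs from London Paddington at 15:03."},
    {"role": "user", "message": "What is the earliest train tomorrow morning?"},
    {"role": "assistant", "message": "The first morning train to Bristol departs 05:32."},
]

for role, text in priced_texts(messages):
    print(f"{role}: {text}")
```

Because the first user message has no history, it is priced on its own text, which matches the description above.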

Supported Models & Context Window Sizes

gpt-3.5-turbo-4k    davinci-002         claude-1
gpt-3.5-turbo-16k   babbage-002         claude-2
gpt-4-8k            curie               claude-instant
gpt-4-32k           chat-bison        
gpt-4-1106-preview
gpt-3.5-turbo-1106
