Overview
The Statsig Python AI SDK lets you manage your prompts, run online and offline evals, and debug your LLM applications in production. It depends on the Statsig Python Server SDK, but provides convenient hooks for AI-specific functionality.
Install the SDK
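A typical install looks like the following; the package name shown is an assumption for illustration, so confirm the exact name in the Statsig console or on PyPI.

```bash
# Package name is illustrative -- confirm it in the Statsig console or on PyPI.
pip install statsig-python-ai
```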
Initialize the SDK
If you already have a Statsig instance, you can pass it into the SDK. Otherwise, we’ll create an instance for you internally.
Initialize the AI SDK with a Server Secret Key from the Statsig console.
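The minimal sketch below covers both paths: letting the AI SDK create the Statsig instance for you, and passing in an instance you already manage. The StatsigAI class name and its import path are assumptions used for illustration; the Statsig instance follows the Statsig Python Server SDK pattern. Check the SDK reference for the exact names.

```python
# Sketch only: StatsigAI and its import path are illustrative assumptions.
from statsig_python_core import Statsig   # Statsig Python Server SDK (core)
from statsig_ai import StatsigAI          # hypothetical AI SDK import

# Option 1: no existing Statsig instance -- the AI SDK creates one internally.
ai = StatsigAI("server-secret-key")

# Option 2: you already have a Statsig instance -- pass it in.
statsig = Statsig("server-secret-key")
statsig.initialize().wait()
ai = StatsigAI(statsig=statsig)
```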
Initializing With Options
Optionally, you can configure StatsigOptions for your Statsig instance:
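For example (the option shown is illustrative; see the StatsigOptions reference for the fields supported by your SDK version):

```python
from statsig_python_core import Statsig, StatsigOptions

# The option name below is an example -- check the StatsigOptions reference
# for the full set of supported fields.
options = StatsigOptions(environment="staging")

statsig = Statsig("server-secret-key", options)
statsig.initialize().wait()
```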
Using the SDK
Getting a Prompt
Statsig can act as the control plane for your LLM prompts, allowing you to version and change them without deploying code. For more information, see the Prompts documentation.
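A rough sketch, continuing from the initialization above; the get_prompt method and the attributes on the returned object are assumptions, so refer to the Prompts documentation for the actual API.

```python
# Illustrative only: get_prompt and the returned fields are assumptions.
prompt = ai.get_prompt("support_agent_system_prompt")

# Use the versioned prompt text when calling your model; edits made in the
# Statsig console take effect without a code deploy.
messages = [{"role": "system", "content": prompt.text}]
```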
Logging Eval Results
When running an online eval, you can log results back to Statsig for analysis. Provide a score between 0 and 1, along with the grader name and any useful metadata (e.g., session IDs). Currently, you must provide the grader manually; automated grading options are planned for future releases.
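A hedged sketch of what logging a result might look like; the method name and parameters are illustrative assumptions, not the documented signature.

```python
# Illustrative only: log_eval_result and its parameters are assumptions.
ai.log_eval_result(
    grader="relevance_grader",             # grader name (provided manually for now)
    score=0.85,                            # score between 0 and 1
    metadata={"session_id": "sess_1234"},  # any useful metadata
)
```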
Programmatic Evaluation
Programmatic evaluation lets you run evaluations on datasets in code, automatically scoring outputs and sending results to Statsig for analysis. With programmatic evaluation, you can do the following (a sketch follows the list):
- Run evaluations on datasets: Process arrays, iterators, or async generators of input/expected pairs
- Define custom tasks: Create functions that generate outputs from inputs (supports both sync and async)
- Score outputs: Use single or multiple named scorer functions to evaluate outputs (supports boolean, numeric, or metadata-rich scores)
- Use parameters: Pass dynamic parameters to tasks using Zod schemas (Node) or dictionaries (Python)
- Categorize data: Group evaluation records by categories for better analysis
- Compute summary scores: Aggregate results across all records with custom summary functions
- Handle errors gracefully: Task and scorer errors are caught and reported without stopping the evaluation
The `expected` field in data records is optional; scorers can evaluate outputs without expected values. Task and scorer errors are automatically caught and reported in the results.
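The sketch below shows the overall shape of a programmatic evaluation: a dataset of input/expected records with categories, a task, and named scorers. The run_eval entry point and its argument names are assumptions for illustration, not the documented signature; see the programmatic evaluation reference for the real API.

```python
# Illustrative sketch only: run_eval and its keyword arguments are assumptions.
data = [
    {"input": "2 + 2", "expected": "4", "category": "math"},
    {"input": "Capital of France?", "expected": "Paris", "category": "geography"},
]

def task(input, params=None):
    # Generate an output for each input; call your model here. A canned
    # lookup keeps the sketch self-contained.
    return {"2 + 2": "4", "Capital of France?": "Paris"}.get(input, "")

def exact_match(output, expected=None, **_):
    # Boolean scorer; expected may be missing from a record.
    return expected is not None and output.strip() == expected

def length_score(output, expected=None, **_):
    # Numeric scorer between 0 and 1; metadata-rich scores are also supported.
    return min(len(output), 100) / 100

results = ai.run_eval(
    name="smoke-test",
    data=data,                                        # list, iterator, or async generator
    task=task,                                        # sync or async
    scorers={"exact_match": exact_match, "length": length_score},
    params={"temperature": 0.2},                      # plain dict in Python (Zod schema in Node)
)
```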
OpenTelemetry (OTEL)
The AI SDK works with OpenTelemetry for sending telemetry to Statsig. You can enable OTel tracing by calling the initializeTracing function.
You can also provide a custom TracerProvider to the initializeTracing function if you want to customize the tracing behavior.
More advanced OTel configuration and exporter support are on the way.
OTel is not yet supported in the Python AI SDK. Coming soon!