Model Compatibility & Benchmarks

This page provides detailed benchmark scores and version compatibility information for all reasoning models supported by Sema4.AI.

Benchmark Scores

Reasoning models combined with Sema4's agent tuning deliver higher accuracy on complex, multi-step tasks, reducing hallucinations and improving the reliability of your agent outcomes.

These improvements have been validated with the τ2-telecom benchmark, a widely recognized evaluation of agent performance on long-running, multi-step tasks. OpenAI and Anthropic use this benchmark to validate their tool-calling accuracy because it tests an agent's ability to maintain context, follow complex instructions, and complete sequential operations without degradation.

[Figure: Sema4.AI agent accuracy on the τ2-telecom benchmark]

Full Model Compatibility Matrix

Use this table to determine which version of each Sema4.AI product you need for a specific model. The table shows the version in which each model was first supported; newer versions continue to support all previously introduced models unless otherwise noted.

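| Provider | Model | Type | Benchmark Score (τ2-telecom) | Build: Minimum Studio Version | Deploy: Minimum Agent Compute (Enterprise Edition) | Deploy: Minimum Snowflake App (Team Edition) |
|---|---|---|---|---|---|---|
| OpenAI | GPT-5.2 | Reasoning | 88% [1] | 1.6.5 | 1.6.4 | 1.4.32 |
| OpenAI | GPT-5.1-codex-max | Reasoning | 88% (medium) [1] | 1.6.5 | 1.6.4 | 1.4.32 |
| OpenAI | GPT-5 | Reasoning | 95% (high) [1]; 93% (medium); 90% (low) | 1.6 | 1.4.1 | 1.6* |
| OpenAI | o3 | Reasoning | 58% [2] | 1.4.6 | 1.4.1 | N/A |
| OpenAI | o4-mini | Reasoning | 41% [3] | 1.4.6 | 1.4.1 | N/A |
| OpenAI | gpt 4.1 | Non-reasoning | 34% [3] | 1.3 | 1.3 | 1.4.9 |
| OpenAI | gpt-4o | Non-reasoning | 24% [2] | 1.2 | 1.6 | N/A |
| Anthropic | Claude Sonnet 4.5 | Reasoning | 98% [4] | 1.6 | 1.6 | 1.6* |
| Anthropic | Claude Haiku 4.5 | Reasoning | 83% [6] | 1.6 | 1.6 | 1.6* |
| Anthropic | Claude Opus 4.1 | Reasoning | 71% [4] | 1.6 | 1.6 | 1.6* |
| Anthropic | Claude Opus 4 | Reasoning | 57% [5] | 1.6 | 1.6 | 1.6* |
| Anthropic | Claude Sonnet 4 | Reasoning | 49.6% [4] | 1.6 | 1.6 | 1.6* |
| Anthropic | Claude Sonnet 3.7 | Non-reasoning | 49% [3] | 1.4.6 | 1.4.1 | 1.4 |
| Google | Gemini 3.0 Pro | Reasoning | 87% [7] | 1.6.5 | 1.6.4 | 1.4.32 |
| Google | Gemini 3.0 Flash | Reasoning | 80% [8] | 1.6.5 | 1.6.4 | 1.4.32 |

Note: Anthropic calls reasoning "extended thinking". Bracketed numbers refer to the footnotes.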
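The lookup this table supports can be sketched in Python. This is a minimal illustration, not part of any Sema4.AI tooling: the dictionary layout, key names, and the functions `parse_version`, `meets_minimum`, and `supports` are hypothetical, and it assumes version strings compare numerically component by component, which the page does not state. Only three rows are copied in as sample data.

```python
# Sample rows copied from the compatibility matrix.
# The dictionary layout and key names are illustrative assumptions.
MINIMUM_VERSIONS = {
    "GPT-5.2": {"studio": "1.6.5", "agent_compute": "1.6.4", "snowflake_app": "1.4.32"},
    "GPT-5": {"studio": "1.6", "agent_compute": "1.4.1", "snowflake_app": "1.6*"},
    "o3": {"studio": "1.4.6", "agent_compute": "1.4.1", "snowflake_app": "N/A"},
}


def parse_version(text):
    """Turn '1.6.5' or '1.6*' into a tuple of ints; return None for 'N/A'."""
    cleaned = text.strip().rstrip("*")  # the asterisk is a table marker, not part of the version
    if cleaned.upper() == "N/A":
        return None
    return tuple(int(part) for part in cleaned.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (assumed numeric ordering)."""
    required = parse_version(minimum)
    current = parse_version(installed)
    if required is None or current is None:
        return False  # the model is not available for this product
    # Pad the shorter tuple with zeros so that 1.6 compares equal to 1.6.0.
    width = max(len(required), len(current))
    current += (0,) * (width - len(current))
    required += (0,) * (width - len(required))
    return current >= required


def supports(model, product, installed):
    """Check one model/product pair against the sample rows above."""
    return meets_minimum(installed, MINIMUM_VERSIONS[model][product])


if __name__ == "__main__":
    print(supports("GPT-5.2", "studio", "1.6.5"))
    print(supports("GPT-5.2", "snowflake_app", "1.4.9"))
```

Comparing integer tuples rather than raw strings matters for entries such as 1.4.32: as text, "1.4.9" sorts after "1.4.32", while numerically 1.4.9 is the older release. The sketch ignores the "unless otherwise noted" exceptions mentioned above, so any real check would need to account for them.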

For Team Edition users

  • Reasoning models marked with an asterisk (*) in the Minimum Snowflake App column are fully supported via AWS Bedrock, OpenAI, or Azure OpenAI. These models are not yet supported in the Sema4.AI platform through the Cortex API.
Footnotes