Model Compatibility & Benchmarks

This page provides benchmark scores and version compatibility information for the models supported by Sema4.AI, both reasoning and non-reasoning.

Benchmark Scores

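Reasoning models combined with Sema4.AI's agent tuning deliver markedly better accuracy on complex, multi-step tasks, reducing hallucinations and improving the reliability of your agent outcomes. In the compatibility matrix on this page, reasoning models score as high as 98.2% on τ2-telecom, while the non-reasoning models score between 24% and 49%.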

These improvements have been validated with the τ2-telecom benchmark, a widely recognized evaluation of agent performance on long-running, multi-step tasks. OpenAI and Anthropic use this benchmark to validate their tool-calling accuracy because it tests an agent's ability to maintain context, follow complex instructions, and complete sequential operations without degradation.

Figure: Sema4.AI agent accuracy on the τ2-telecom benchmark

Full Model Compatibility Matrix

Use this table to determine which version of each Sema4.AI product you need for a specific model. The table lists the version in which each model was first supported; newer versions continue to support all previously introduced models unless otherwise noted.

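| Provider | Model | Type | Benchmark Score (τ2-telecom) | Build: Minimum Studio Version | Deploy: Minimum Agent Compute (Enterprise Edition) | Deploy: Minimum Snowflake App (Team Edition) |
|---|---|---|---|---|---|---|
| OpenAI | GPT-5.2 | Reasoning | 88% [1] | 1.6.5 | 1.6.4 | 1.4.32 |
| OpenAI | GPT-5.1-codex-max | Reasoning | 88% (medium) [1] | 1.6.5 | 1.6.4 | 1.4.32 [11] |
| OpenAI | GPT-5 | Reasoning | 95% (high) [1]; 93% (medium); 90% (low) | 1.6 | 1.4.1 | 1.4.20 |
| OpenAI | o3 | Reasoning | 58% [2] | 1.4.6 | 1.4.1 | N/A |
| OpenAI | o4-mini | Reasoning | 41% [3] | 1.4.6 | 1.4.1 | N/A |
| OpenAI | GPT-4.1 | Non-reasoning | 34% [3] | 1.3 | 1.3 | 1.4.9 |
| OpenAI | GPT-4o | Non-reasoning | 24% [2] | 1.2 | 1.6 | N/A |
| Anthropic | Claude Opus 4.5 | Reasoning | 98.2% [9] | 1.6.6 | 1.6.5 | 1.4.44 |
| Anthropic | Claude Sonnet 4.5 | Reasoning | 98% [4] | 1.6 | 1.6 | 1.4.20 [10] |
| Anthropic | Claude Haiku 4.5 | Reasoning | 83% [6] | 1.6 | 1.6 | 1.4.20 [10] |
| Anthropic | Claude Opus 4.1 | Reasoning | 71% [4] | 1.6 | 1.6 | 1.4.20 |
| Anthropic | Claude Opus 4 | Reasoning | 57% [5] | 1.6 | 1.6 | 1.4.20 |
| Anthropic | Claude Sonnet 4 | Reasoning | 49.6% [4] | 1.6 | 1.6 | 1.4.20 |
| Anthropic | Claude Sonnet 3.7 | Non-reasoning | 49% [3] | 1.4.6 | 1.4.1 | 1.4 |
| Google | Gemini 3.0 Pro | Reasoning | 87% [7] | 1.6.5 | 1.6.4 | 1.4.32 |
| Google | Gemini 3.0 Flash | Reasoning | 80% [8] | 1.6.5 | 1.6.4 | 1.4.32 |

Numbers in square brackets refer to the footnotes. Anthropic calls reasoning "extended thinking".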
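
As an illustration of how the matrix can be read programmatically, the following is a minimal Python sketch that checks whether an installed product version meets the minimum listed for a model. The names `MIN_VERSIONS`, `parse_version`, and `is_supported` are hypothetical and not part of any Sema4.AI API. Only three rows are transcribed here, and the sketch assumes that version numbers are ordered numerically component by component (so 1.4.9 is older than 1.4.20) and that no "unless otherwise noted" exception applies.

```python
from typing import Dict, Optional, Tuple

# Minimum versions transcribed from three rows of the matrix (illustrative subset).
# None stands for "N/A": the model is not listed for that product.
MIN_VERSIONS: Dict[str, Dict[str, Optional[str]]] = {
    "GPT-5.2": {"studio": "1.6.5", "agent_compute": "1.6.4", "snowflake_app": "1.4.32"},
    "Claude Opus 4.5": {"studio": "1.6.6", "agent_compute": "1.6.5", "snowflake_app": "1.4.44"},
    "o3": {"studio": "1.4.6", "agent_compute": "1.4.1", "snowflake_app": None},
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.20' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_supported(model: str, product: str, installed: str) -> bool:
    """Return True if `installed` is at least the minimum listed for `model`."""
    minimum = MIN_VERSIONS[model][product]
    if minimum is None:
        return False  # N/A in the matrix
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.6' compares equal to '1.6.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    print(is_supported("GPT-5.2", "snowflake_app", "1.4.9"))
    print(is_supported("Claude Opus 4.5", "studio", "1.7"))
```

Comparing integer tuples rather than raw strings matters here: a plain string comparison would rank "1.4.9" above "1.4.32", which is the opposite of the intended ordering under the assumption stated above.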
Footnotes