Model Compatibility & Benchmarks
This page provides detailed benchmark scores and version compatibility information for all reasoning models supported by Sema4.AI.
Benchmark Scores
Reasoning models combined with Sema4.AI's agent tuning deliver dramatically better accuracy on complex, multi-step tasks, reducing hallucinations and improving the reliability of your agent outcomes.
These improvements have been validated with the τ2-telecom benchmark, a widely recognized evaluation for measuring agent performance on long-running, multi-step tasks. OpenAI and Anthropic use this benchmark to validate their tool-calling accuracy because it tests an agent's ability to maintain context, follow complex instructions, and complete sequential operations without degradation.
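Benchmarks in the τ-bench family typically report reliability as pass^k: the probability that k independently sampled runs of a task all succeed, estimated from n trials per task. As a rough illustration (the exact scoring used for the numbers on this page is Sema4.AI-internal; this is a generic sketch of the pass^k estimator), the per-task estimate with c successes out of n trials is C(c, k) / C(n, k), averaged across tasks:

```python
from math import comb
from statistics import mean

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Unbiased pass^k estimate for one task: probability that k
    randomly chosen trials (out of n, with c successes) all succeed."""
    if c < k:
        return 0.0
    return comb(c, k) / comb(n, k)

def benchmark_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass^k over tasks; each entry is (n_trials, n_successes)."""
    return mean(pass_hat_k(n, c, k) for n, c in results)

# Hypothetical trial counts, for illustration only:
score = benchmark_pass_hat_k([(8, 6), (8, 8), (8, 3)], k=2)
```

Note that pass^k penalizes inconsistency: an agent that succeeds on 6 of 8 trials scores 15/28 ≈ 0.54 at k=2, well below its 0.75 single-trial rate, which is why long-running multi-step benchmarks expose reliability gaps that one-shot accuracy hides.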
Sema4.AI agent accuracy with τ2-telecom benchmark
Full Model Compatibility Matrix
Use this table to determine which version of Sema4.AI products you need for a specific model. The table shows when each model was first supported; newer versions continue to support all previously introduced models unless otherwise noted.
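Because support is cumulative, checking compatibility reduces to comparing your installed version against the version in which a model was first supported. A minimal sketch of that rule (the model names and version strings below are placeholders, not actual Sema4.AI releases; consult the table above for real values):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Split a dotted version string like '2.3.0' into a comparable tuple.
    Assumes plain numeric components with the same number of parts."""
    return tuple(int(part) for part in v.split("."))

def is_supported(installed: str, first_supported: str) -> bool:
    # Newer versions keep supporting previously introduced models,
    # so any installed version >= the first-supported version works.
    return parse_version(installed) >= parse_version(first_supported)

# Placeholder matrix entry: model first supported in a hypothetical 2.1.0.
FIRST_SUPPORTED = {"example-reasoning-model": "2.1.0"}
ok = is_supported("2.3.0", FIRST_SUPPORTED["example-reasoning-model"])
```

A production check would use a real version library (e.g. one that handles pre-release suffixes) rather than naive tuple comparison, but the lookup logic is the same.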
For Team Edition users
- Reasoning models marked with an asterisk (*) are fully supported via AWS Bedrock, OpenAI, or Azure OpenAI. These models are not yet supported in the Sema4.AI platform through the Cortex API.
Footnotes
1 Internal Sema4.AI testing against the τ2-telecom benchmark
2 OpenAI GPT-5 Blog Post
3 τ2-telecom benchmark paper
4 Anthropic Claude Sonnet 4.5 announcement
5 Kimi K2 documentation
6 Anthropic Claude Haiku 4.5 announcement
7 Artificial Analysis: Gemini 3 Pro - Everything you need to know
8 Artificial Analysis: Gemini 3 Flash Intelligence, Performance & Price Analysis