LLM models and providers
Reasoning models improve how AI agents handle complex, multi-step tasks. Combined with Sema4's agent tuning, they deliver higher accuracy and reliability for enterprise workflows.
All models on this page are available on Sema4 Platform v2.5+.
Sema4.AI agent accuracy with τ2-telecom benchmark
These improvements have been validated through the τ2-telecom benchmark, a widely recognized evaluation of agent performance on long-running, multi-step tasks. OpenAI and Anthropic use this benchmark to validate their tool-calling accuracy because it tests an agent's ability to maintain context, follow complex instructions, and complete sequential operations without degradation.
Supported providers
You bring your own model provider account, and the models run under your provider contract. See Configure LLMs for credential setup and network egress requirements.
| Model provider | Recommended models | Available through |
|---|---|---|
| OpenAI | GPT-5.3 Codex High, GPT-5.4 High | OpenAI, Microsoft Azure OpenAI |
| Anthropic | Claude Opus 4.6 High | AWS Bedrock, Azure AI Foundry |
| Google | Gemini 3.1 Pro High | Google Gemini API, Vertex AI |
Supported models
Footnotes
1 τ2-telecom leaderboard
2 Internal Sema4.ai testing against τ2-telecom benchmark
3 Anthropic Claude Opus 4.5 announcement
4 Anthropic Claude Sonnet 4.5 announcement
5 Anthropic Claude Haiku 4.5 announcement
6 Artificial Analysis: Gemini 3 Flash Intelligence, Performance & Price Analysis
7 Artificial Analysis: τ2-bench (GPT-5.3 Codex)
8 Anthropic Claude Sonnet 4.6 announcement
9 Google Gemini 3.1 Pro announcement
Some models from earlier Sema4.AI 1.x releases — GPT-5.2, o3, o4-mini, GPT-4.1, GPT-4o, Claude Opus 4 and 4.1, Claude Sonnet 4 and 3.7, and Gemini 3 Pro — as well as the Snowflake Cortex provider, are not available on Sema4 Platform.
Choosing a model for SQL generation
You can configure a dedicated model for natural-language-to-SQL in semantic data models, separate from your agent model, under Configuration > LLMs > SQL Generation Model.
Based on our BIRD benchmark testing, non-reasoning variants (reasoning level none/off) score similarly to reasoning variants on SQL generation while delivering better latency — prefer them for SQL. For example, Codex models are reasoning-only, so they will be slower at SQL generation.
Currently recommended models for SQL generation:
- Claude Sonnet 4.6 None
- GPT-5.4 None
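The preference for non-reasoning variants can be written down as a simple selection rule. The Python below is only an illustrative sketch: `ModelOption` and `pick_sql_model` are hypothetical names, not part of any Sema4.ai API, and the candidate list simply mirrors models named on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    """Hypothetical description of one selectable model variant."""
    name: str
    reasoning_level: str          # e.g. "none", "off", "high"
    reasoning_only: bool = False  # True if the model cannot run without reasoning


def pick_sql_model(options: List[ModelOption]) -> Optional[ModelOption]:
    """Prefer the first non-reasoning variant; otherwise fall back to the first option."""
    for opt in options:
        if opt.reasoning_level.lower() in ("none", "off") and not opt.reasoning_only:
            return opt
    return options[0] if options else None


# Candidate list is an assumption for illustration only.
candidates = [
    ModelOption("GPT-5.3 Codex", "high", reasoning_only=True),
    ModelOption("Claude Sonnet 4.6", "none"),
    ModelOption("GPT-5.4", "none"),
]

chosen = pick_sql_model(candidates)
print(chosen.name if chosen else "no model available")
```

In practice the choice is made in the SQL Generation Model setting rather than in code; the sketch only makes the reasoning explicit.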
SQL generation quality is generally limited by context rather than model choice. Enriching your semantic data model with better business information — table and column descriptions, metrics, and verified queries — is the most effective way to improve results. See Semantic data models.
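To make the role of context concrete, the sketch below flattens a semantic data model into the text a SQL-generating model might see. Everything here is an assumption for illustration: the dictionary layout, the field names (`tables`, `columns`, `metrics`, `verified_queries`), and the `build_sql_context` helper are hypothetical and do not reflect the actual Sema4.ai semantic data model format.

```python
def build_sql_context(model: dict) -> str:
    """Render a (hypothetical) semantic data model as plain-text context lines."""
    lines = []
    for table in model.get("tables", []):
        lines.append(f"Table {table['name']}: {table.get('description', 'no description')}")
        for col in table.get("columns", []):
            lines.append(f"  Column {col['name']}: {col.get('description', 'no description')}")
    for metric in model.get("metrics", []):
        lines.append(f"Metric {metric['name']} = {metric['expression']}")
    for vq in model.get("verified_queries", []):
        lines.append(f"Verified query ({vq['question']}): {vq['sql']}")
    return "\n".join(lines)


# Hypothetical example model; all names and values are made up for this sketch.
example_model = {
    "tables": [
        {
            "name": "orders",
            "description": "One row per customer order",
            "columns": [
                {"name": "order_total", "description": "Order value in USD, tax included"},
                {"name": "created_at"},  # no description: the model has to guess its meaning
            ],
        }
    ],
    "metrics": [{"name": "revenue", "expression": "SUM(order_total)"}],
    "verified_queries": [
        {"question": "Total revenue", "sql": "SELECT SUM(order_total) FROM orders"}
    ],
}

print(build_sql_context(example_model))
```

Columns without a description surface as "no description", which shows where a model would have to guess and where enriching the semantic data model is likely to help most.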
Getting started
- Configure LLMs — Connect OpenAI, Azure, AWS Bedrock, or Google credentials and choose your workspace models
- Select a reasoning model — When adding an LLM configuration under Configuration > LLMs, choose the model and its reasoning level from the model dropdown
- Manage LLMs day-to-day — Default LLM and SQL Generation Model settings