LLMs and providers

Reasoning models are a significant step forward in how AI agents handle complex, multi-step tasks. Combined with Sema4.AI agent tuning, they deliver substantially higher accuracy and reliability on enterprise workflows, as the τ²-telecom results below show.

All models on this page are available on Sema4 Platform v2.5+.

Sema4.AI agent accuracy on the τ²-telecom benchmark

OpenAI: GPT-4.1 (34%) → GPT-5.4 (98.9%)

Anthropic: Claude Sonnet 3.7 (49%) → Claude Opus 4.6 (99.3%)

These results come from the τ²-telecom benchmark, a widely recognized evaluation of agent performance on long-running, multi-step tasks. OpenAI and Anthropic use this benchmark to validate their models' tool-calling accuracy because it tests whether an agent can maintain context, follow complex instructions, and complete sequential operations without degradation.
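To make the size of these jumps concrete, the worked arithmetic below converts the headline τ²-telecom figures into failure rates. It assumes each score can be read as the share of benchmark tasks the agent completes; the ratio is the old failure rate divided by the new one.

```latex
% Failure-rate ratio = (1 - old score) / (1 - new score),
% assuming each score is the fraction of tasks completed.
\begin{aligned}
\text{GPT-4.1} \to \text{GPT-5.4}: &\quad \frac{1 - 0.34}{1 - 0.989} = \frac{0.66}{0.011} = 60 \\
\text{Sonnet 3.7} \to \text{Opus 4.6}: &\quad \frac{1 - 0.49}{1 - 0.993} = \frac{0.51}{0.007} \approx 73
\end{aligned}
```

Under that reading, the failure rate drops by a factor of about 60 for the OpenAI pair and about 73 for the Anthropic pair.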

Supported providers

You bring your own model provider account, and the models run under your provider contract. See Configure LLMs for credential setup and network egress requirements.

Model provider	Recommended models (reasoning level)	Available through
OpenAI	GPT-5.3 Codex High, GPT-5.4 High	OpenAI, Microsoft Azure OpenAI
Anthropic	Claude Opus 4.6 High	AWS Bedrock, Azure AI Foundry
Google	Gemini 3.1 Pro High	Google Gemini API, Vertex AI

Supported models

Model	τ²-telecom score	Sema4 Platform version
OpenAI (also available through Microsoft Azure OpenAI)
GPT-5.4 (latest)	98.9%¹	v2.5+
GPT-5.3 Codex	86%⁷	v2.5+
GPT-5.1 Codex Max	88% (medium)²	v2.5+
GPT-5	95% (high)², 93% (medium), 90% (low)	v2.5+
Anthropic (available through AWS Bedrock and Azure AI Foundry)
Claude Opus 4.6 (adaptive thinking; 1M-token-context variant)	99.3%²	v2.5+
Claude Sonnet 4.6 (adaptive thinking; 1M-token-context variant)	97.9%⁸	v2.5+
Claude Opus 4.5	98.2%³	v2.5+
Claude Sonnet 4.5	98%⁴	v2.5+
Claude Haiku 4.5	83%⁵	v2.5+
Google (available through the Gemini API or Vertex AI)
Gemini 3.1 Pro	99.3%⁹	v2.5+
Gemini 3 Flash	80%⁶	v2.5+

Footnotes

Several models from earlier Sema4.AI 1.x releases are not available on Sema4 Platform: GPT-5.2, o3, o4-mini, GPT-4.1, GPT-4o, Claude Opus 4 and 4.1, Claude Sonnet 4 and 3.7, and Gemini 3 Pro. The Snowflake Cortex provider is also not available on Sema4 Platform.

Choosing a model for SQL generation

You can configure a dedicated model for natural-language-to-SQL generation in semantic data models, separate from your agent model, under Configuration > LLMs > SQL Generation Model.

In our BIRD benchmark testing, non-reasoning variants (reasoning level None/Off) scored about the same as reasoning variants on SQL generation while responding faster, so prefer them for SQL generation. Codex models are reasoning-only, so they will be slower at SQL generation.

Currently recommended models for SQL generation:

Claude Sonnet 4.6 (reasoning level: None)
GPT-5.4 (reasoning level: None)

SQL generation quality is generally limited by context rather than by model choice. The most effective way to improve results is to enrich your semantic data model with business context, such as table and column descriptions, metrics, and verified queries. See Semantic data models.

Getting started

Configure LLMs: connect OpenAI, Azure, AWS Bedrock, or Google credentials and choose your workspace models.
Select a reasoning model: when adding an LLM configuration under Configuration > LLMs, choose the model and its reasoning level from the model dropdown.
Manage LLMs day-to-day: set the Default LLM and the SQL Generation Model.
