Version 2.5
LLM models and providers

Reasoning models represent a significant advancement in how AI agents handle complex, multi-step tasks. When combined with Sema4's agent tuning, these models deliver dramatically improved accuracy and reliability for enterprise workflows.

All models on this page are available on Sema4 Platform v2.5+.

Sema4.AI agent accuracy on the τ2-telecom benchmark

These improvements have been validated on the τ2-telecom benchmark, a widely recognized evaluation of agent performance on long-running, multi-step tasks. OpenAI and Anthropic use this benchmark to validate their tool-calling accuracy because it tests an agent's ability to maintain context, follow complex instructions, and complete sequential operations without degradation.

Supported providers

You bring your own model provider account, and the models run under your provider contract. See Configure LLMs for credential setup and network egress requirements.

| Model provider | Recommended models | Available through |
| --- | --- | --- |
| OpenAI | GPT-5.3 Codex High, GPT-5.4 High | OpenAI, Microsoft Azure OpenAI |
| Anthropic | Claude Opus 4.6 High | AWS Bedrock, Azure AI Foundry |
| Google | Gemini 3.1 Pro High | Google Gemini API, Vertex AI |

Supported models

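| Provider | Model | τ2-telecom benchmark score | Sema4 Platform |
| --- | --- | --- | --- |
| OpenAI | GPT-5.4 (Latest) | 98.9% [1] | v2.5+ |
| OpenAI | GPT-5.3 Codex | 86% [7] | v2.5+ |
| OpenAI | GPT-5.1 Codex Max | 88% (medium) [2] | v2.5+ |
| OpenAI | GPT-5 | 95% (high) [2], 93% (medium), 90% (low) | v2.5+ |
| Anthropic | Claude Opus 4.6 (adaptive thinking · 1M-token-context variant) | 99.3% [2] | v2.5+ |
| Anthropic | Claude Sonnet 4.6 (adaptive thinking · 1M-token-context variant) | 97.9% [8] | v2.5+ |
| Anthropic | Claude Opus 4.5 | 98.2% [3] | v2.5+ |
| Anthropic | Claude Sonnet 4.5 | 98% [4] | v2.5+ |
| Anthropic | Claude Haiku 4.5 | 83% [5] | v2.5+ |
| Google | Gemini 3.1 Pro | 99.3% [9] | v2.5+ |
| Google | Gemini 3 Flash | 80% [6] | v2.5+ |

OpenAI models are also available through Microsoft Azure OpenAI. Anthropic models are available through AWS Bedrock and Azure AI Foundry. Google models are available through the Gemini API or Vertex AI.

Bracketed numbers refer to the footnotes.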
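
If you want to shortlist models by benchmark score, the table above can be treated as a simple lookup. The following is a minimal sketch, not part of the Sema4 Platform: the dictionary name, the function name, and the threshold are hypothetical, and the scores are copied from the table using the highest reported reasoning level for each model.

```python
# Hypothetical helper, not a Sema4 API: shortlist models by τ2-telecom score.
# Scores are copied from the table above; where several reasoning levels are
# listed, the highest reported score is used.
TAU2_TELECOM_SCORES = {
    "GPT-5.4": 98.9,
    "GPT-5.3 Codex": 86.0,
    "GPT-5.1 Codex Max": 88.0,  # reported at medium reasoning
    "GPT-5": 95.0,              # high reasoning; 93 at medium, 90 at low
    "Claude Opus 4.6": 99.3,
    "Claude Sonnet 4.6": 97.9,
    "Claude Opus 4.5": 98.2,
    "Claude Sonnet 4.5": 98.0,
    "Claude Haiku 4.5": 83.0,
    "Gemini 3.1 Pro": 99.3,
    "Gemini 3 Flash": 80.0,
}


def models_at_or_above(min_score: float) -> list[tuple[str, float]]:
    """Return (model, score) pairs with score >= min_score, best first.

    Ties are broken alphabetically by model name so the order is stable.
    """
    shortlisted = [
        (model, score)
        for model, score in TAU2_TELECOM_SCORES.items()
        if score >= min_score
    ]
    return sorted(shortlisted, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for model, score in models_at_or_above(98.0):
        print(f"{model}: {score}%")
```

A benchmark score is only one input. Latency, cost, and which provider accounts you already hold will usually narrow the shortlist further.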
Footnotes

Some models from earlier Sema4.AI 1.x releases — GPT-5.2, o3, o4-mini, GPT-4.1, GPT-4o, Claude Opus 4 and 4.1, Claude Sonnet 4 and 3.7, and Gemini 3 Pro — as well as the Snowflake Cortex provider, are not available on Sema4 Platform.

Choosing a model for SQL generation

You can configure a dedicated model for natural-language-to-SQL in semantic data models, separate from your agent model, under Configuration > LLMs > SQL Generation Model.

Based on our BIRD benchmark testing, non-reasoning variants (reasoning level none/off) score similarly to reasoning variants on SQL generation while responding faster, so prefer them for SQL. Codex models, by contrast, are reasoning-only and have no non-reasoning variant, so expect them to be slower at SQL generation.

Currently recommended models for SQL generation:

  • Claude Sonnet 4.6 None
  • GPT-5.4 None
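
The recommendation above can be summarized as a small decision rule: keep your agent model as it is, and pair it with the recommended non-reasoning model from the same provider for SQL generation. The sketch below illustrates that rule only. `ModelChoice`, `SQL_RECOMMENDATIONS`, and `suggest_sql_model` are hypothetical names and not Sema4 settings or APIs; the actual selection is made under Configuration > LLMs > SQL Generation Model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelChoice:
    """Hypothetical record of a model selection (not a Sema4 type)."""
    provider: str
    model: str
    reasoning_level: str


# Currently recommended SQL generation models from the list above,
# keyed by model provider. Google has no entry in that list.
SQL_RECOMMENDATIONS = {
    "Anthropic": "Claude Sonnet 4.6",
    "OpenAI": "GPT-5.4",
}


def suggest_sql_model(agent: ModelChoice) -> Optional[ModelChoice]:
    """Suggest a non-reasoning SQL model from the agent model's provider.

    Returns None when the list above has no recommendation for that provider.
    """
    model = SQL_RECOMMENDATIONS.get(agent.provider)
    if model is None:
        return None
    return ModelChoice(agent.provider, model, "None")


if __name__ == "__main__":
    agent = ModelChoice("OpenAI", "GPT-5.3 Codex", "High")
    print(suggest_sql_model(agent))
```

Staying within one provider is an assumption made here so that the SQL model can reuse the credentials you already configured. Nothing in the recommendation itself requires it.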

SQL generation quality is generally limited by context rather than model choice. Enriching your semantic data model with better business information — table and column descriptions, metrics, and verified queries — is the most effective way to improve results. See Semantic data models.
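
To make "context" concrete, the sketch below shows the kind of business information the paragraph above refers to, and how it might be flattened into text that a SQL generation model can read. This is an illustration under stated assumptions: the classes, field names, and output layout are hypothetical and do not reflect the actual Sema4 semantic data model format, and the `orders` table is invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    description: str = ""


@dataclass
class Table:
    name: str
    description: str = ""
    columns: list[Column] = field(default_factory=list)


@dataclass
class SemanticModel:
    """Hypothetical container; not the Sema4 semantic data model schema."""
    tables: list[Table]
    metrics: dict[str, str] = field(default_factory=dict)           # name -> SQL expression
    verified_queries: dict[str, str] = field(default_factory=dict)  # question -> SQL


def render_context(model: SemanticModel) -> str:
    """Flatten descriptions, metrics, and verified queries into plain text."""
    lines = []
    for table in model.tables:
        lines.append(f"TABLE {table.name}: {table.description or '(no description)'}")
        for col in table.columns:
            lines.append(f"  COLUMN {col.name}: {col.description or '(no description)'}")
    for name, expression in model.metrics.items():
        lines.append(f"METRIC {name} = {expression}")
    for question, sql in model.verified_queries.items():
        lines.append(f"VERIFIED QUESTION: {question}")
        lines.append(f"VERIFIED SQL: {sql}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = SemanticModel(
        tables=[
            Table(
                name="orders",
                description="One row per customer order",
                columns=[
                    Column("order_total", "Order value in USD, tax included"),
                    Column("status"),  # undescribed columns are easy to spot
                ],
            )
        ],
        metrics={"revenue": "SUM(order_total)"},
        verified_queries={
            "What is total revenue?": "SELECT SUM(order_total) FROM orders",
        },
    )
    print(render_context(sample))
```

Rendering missing descriptions as `(no description)` is a deliberate choice in this sketch: it makes gaps in the model visible, which is where enrichment effort is likely to pay off.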

Getting started

  • Configure LLMs — Connect OpenAI, Azure, AWS Bedrock, or Google credentials and choose your workspace models
  • Select a reasoning model — When adding an LLM configuration under Configuration > LLMs, choose the model and its reasoning level from the model dropdown
  • Manage LLMs day-to-day — Default LLM and SQL Generation Model settings