DataFrames
Sema4.ai DataFrames is the agent's data workspace: an in-memory database, scoped to a conversation, where agents create, transform, join, and analyze data from uploaded spreadsheets and Named Query results. Because every operation runs as SQL rather than as LLM arithmetic, calculations stay accurate across millions of rows.
The workspace also acts as the agent's analytical scratchpad. Agents show each step of their work, including intermediate DataFrames and the SQL behind them, so you can see how a result was produced.
Why use DataFrames?
Traditional approaches to data analysis have significant limitations:
LLM-based analysis limitations:
- Mathematical unreliability: Calculation errors and inconsistent numerical results
- Context window constraints: Can't process large datasets beyond token limits
- No persistent workspace: Can't build multi-step analytical workflows
- Token cost explosion: Every data interaction consumes expensive LLM tokens
Manual data work challenges:
- Time-consuming: Hours spent comparing data across Excel files and systems
- Error-prone: Manual reconciliation introduces human mistakes
- Non-repeatable: Each analysis requires starting from scratch
- Technical bottlenecks: Waiting for data teams to build pipelines for one-off tasks
Without DataFrames: Business analysts either struggle with unreliable LLM calculations that produce incorrect financial results, or spend hours manually reconciling data across spreadsheets—work that's tedious, error-prone, and impossible to automate reliably.
Key capabilities
- Mathematical accuracy: All calculations run as SQL, not as LLM arithmetic, so results are computed rather than estimated
- Unlimited scale: Process millions of rows without context window limitations, entirely within your secure environment
- Intelligent workspace: Dynamic data creation, transformation, and multi-step analysis where agents show their complete work process
- Automatic multi-source conversion: Spreadsheet files and Named Query results automatically become DataFrames
- Natural language interface: Ask questions in plain English, get sophisticated SQL automatically
- Intelligent data joining: Agents automatically understand relationships and join data across sources
- Complete transparency: See exactly how agents approach and solve data problems
- Seamless export: Native CSV export, or use action packages for Excel, Google Sheets, and Office 365/SharePoint
- Enterprise-scale cost efficiency: Dramatically reduce token costs on large datasets
What can you analyze with DataFrames?
DataFrames works with multiple data sources, automatically converting each into an analyzable format:
- Spreadsheet files: CSV and Excel files uploaded to conversations or Work Items
- Database results: Named Query outputs from database queries
- Combined sources: Join data from multiple spreadsheets or mix spreadsheet and database data
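To make the idea of a spreadsheet becoming a queryable table concrete, here is a minimal sketch using only Python's standard library. It is not the DataFrames implementation or API: SQLite stands in for the SQL engine, and the `load_csv` helper, the `expenses` table, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an uploaded CSV file.
CSV_TEXT = """vendor,category,amount_cents
Acme,Travel,12050
Globex,Software,99900
Acme,Travel,4950
"""

def load_csv(conn, table, text):
    """Create `table` from CSV text. Assumes a header row and trusted column names."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in rows],
    )

conn = sqlite3.connect(":memory:")
load_csv(conn, "expenses", CSV_TEXT)

# Amounts are kept as integer cents so the SQL sum is exact.
totals = conn.execute(
    """
    SELECT category, SUM(CAST(amount_cents AS INTEGER)) AS total_cents
    FROM expenses
    GROUP BY category
    ORDER BY category
    """
).fetchall()
print(totals)
```

Once the data sits in a table, every later question is a SQL query over it, which is what lets the arithmetic happen outside the LLM.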
When to use DataFrames
The table below compares DataFrames with LLM-based analysis and manual work across common use cases:
| Use Case | LLM-Based Analysis | Manual Work | DataFrames |
|---|---|---|---|
| Financial calculations | ❌ Unreliable | ⚠️ Slow, error-prone | ✅ 100% accurate |
| Large dataset analysis | ❌ Context limits | ❌ Impractical | ✅ Millions of rows |
| Multi-source reconciliation | ❌ Can't join reliably | ⚠️ Hours of work | ✅ Automatic joining |
| Ad-hoc data exploration | ⚠️ Limited scale | ⚠️ Tedious | ✅ Natural language |
| Repeatable workflows | ❌ Inconsistent | ❌ Manual each time | ✅ Runbook automation |
| Cost at enterprise scale | ❌ Token expensive | N/A | ✅ Dramatically reduced |
Common use cases
Here are practical scenarios where DataFrames delivers immediate value:
For business analysts:
- Financial reconciliation: Match invoice data against payment records to identify discrepancies
- Multi-file comparison: Find duplicates and inconsistencies across customer datasets from different systems
- Sales analysis: Compare performance across regions using data from multiple sources
- Budget variance: Reconcile actual spending against budgeted amounts from various departments
For AI developers:
- Automated data workflows: Build runbooks that perform complex data transformations automatically
- Exception handling: Process large datasets and flag outliers for human review
- Repeatable reconciliation: Turn one-off analysis tasks into automated agent processes
- Data quality checks: Validate data accuracy across multiple enterprise sources
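As one illustration of exception handling, flagging outliers can come down to a single SQL filter. The sketch below is an assumption-laden example, not Sema4.ai's method: it uses SQLite through Python's `sqlite3` module, a hypothetical `payments` table, and an arbitrary rule (flag anything above three times the average payment).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, vendor TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "Acme", 10000),
        (2, "Acme", 12000),
        (3, "Globex", 11000),
        (4, "Globex", 500000),  # unusually large payment
    ],
)

# Hypothetical rule: flag any payment more than 3x the average payment.
FLAG_MULTIPLIER = 3

flagged = conn.execute(
    """
    SELECT id, vendor, amount_cents
    FROM payments
    WHERE amount_cents > ? * (SELECT AVG(amount_cents) FROM payments)
    ORDER BY id
    """,
    (FLAG_MULTIPLIER,),
).fetchall()
print(flagged)
```

In a real workflow the threshold would come from the business rule in the runbook, and the flagged rows would be handed to a person for review.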
Deep dive: Multi-spreadsheet reconciliation
Scenario: A finance team needs to reconcile Q4 expense data across three Excel files from different departments—Sales, Marketing, and Operations—to identify missing receipts, duplicates, and category mismatches.
Traditional approach challenges:
- Hours of manual VLOOKUP formulas across files
- Prone to errors when categories don't match exactly
- Difficult to identify duplicate entries with slight variations
- Non-repeatable process that must be redone each quarter
DataFrames solution:
Upload all data sources
Attach the three Excel files to your conversation:
- sales_expenses_q4.xlsx
- marketing_expenses_q4.xlsx
- operations_expenses_q4.xlsx
Each file automatically becomes a DataFrame, ready for analysis.
Ask your reconciliation questions
In natural language, describe what you need:
"Analyze these three expense files and help me:
- Find any duplicate transactions across all departments
- Identify expenses that appear in only one file but should be in the consolidated report
- Flag any category mismatches where the same vendor appears under different expense categories
- Calculate total spending by category across all departments"
Agent performs intelligent analysis
Watch as the agent:
- Automatically creates intermediate DataFrames for each analysis step
- Joins data across all three files based on transaction dates, amounts, and vendors
- Uses fuzzy matching to identify duplicates with slight variations
- Shows its complete analytical reasoning and SQL queries
- Produces mathematically accurate calculations
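The kind of SQL behind these steps can be sketched as follows. This is a simplified illustration, not the queries an agent would actually generate: SQLite stands in for the engine, the table layout and sample rows (including the 2024 dates) are invented, and duplicates are found by exact match only, so the fuzzy matching mentioned above is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for dept in ("sales", "marketing", "operations"):
    conn.execute(
        f"CREATE TABLE {dept} "
        "(txn_date TEXT, vendor TEXT, category TEXT, amount_cents INTEGER)"
    )

# Hypothetical rows standing in for the three uploaded files.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2024-10-03", "Acme Travel", "Travel", 45000),
    ("2024-11-12", "Globex", "Software", 120000),
])
conn.executemany("INSERT INTO marketing VALUES (?, ?, ?, ?)", [
    ("2024-10-03", "Acme Travel", "Travel", 45000),    # also present in sales
    ("2024-12-01", "Globex", "Subscriptions", 30000),  # same vendor, other category
])
conn.executemany("INSERT INTO operations VALUES (?, ?, ?, ?)", [
    ("2024-12-15", "Initech", "Equipment", 80000),
])

# Intermediate step: one combined view that remembers each row's source.
conn.execute("""
    CREATE VIEW all_expenses AS
    SELECT 'sales' AS source, * FROM sales
    UNION ALL SELECT 'marketing', * FROM marketing
    UNION ALL SELECT 'operations', * FROM operations
""")

# Transactions that appear more than once across departments.
duplicates = conn.execute("""
    SELECT txn_date, vendor, amount_cents, COUNT(*) AS copies
    FROM all_expenses
    GROUP BY txn_date, vendor, amount_cents
    HAVING COUNT(*) > 1
""").fetchall()

# Vendors filed under more than one expense category.
mismatches = conn.execute("""
    SELECT vendor, COUNT(DISTINCT category) AS categories
    FROM all_expenses
    GROUP BY vendor
    HAVING COUNT(DISTINCT category) > 1
""").fetchall()

# Spending by category (still includes the duplicate found above).
totals = conn.execute("""
    SELECT category, SUM(amount_cents) AS total_cents
    FROM all_expenses
    GROUP BY category
    ORDER BY category
""").fetchall()

print(duplicates, mismatches, totals, sep="\n")
```

Keeping the `source` column in the combined view is what makes it possible to trace each flagged row back to the file it came from.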
Review and export results
- See the agent's step-by-step work process with transparent reasoning
- Review flagged issues with exact source file references
- Export the reconciliation report to CSV with all findings
- Share with stakeholders via SharePoint for review
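For readers who want to picture what a CSV export of findings might contain, here is a small sketch using Python's `csv` module. The column names, the `to_csv` helper, and the rows are hypothetical; the actual export format is whatever DataFrames produces.

```python
import csv
import io

# Hypothetical findings, shaped like rows a reconciliation query might return.
findings = [
    {"issue": "duplicate", "vendor": "Acme Travel",
     "amount_cents": 45000, "source_file": "marketing_expenses_q4.xlsx"},
    {"issue": "category_mismatch", "vendor": "Globex",
     "amount_cents": 30000, "source_file": "marketing_expenses_q4.xlsx"},
]

def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

report = to_csv(findings)
print(report)
```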
Business impact: What took 6-8 hours of manual Excel work now takes 15 minutes, with zero calculation errors and complete audit trails showing how every discrepancy was identified.
Deep dive: Combining spreadsheet and database data
Scenario: A sales operations analyst needs to compare uploaded customer order data from a partner portal (Excel file) against internal CRM records (accessed via Named Query) to identify orders that haven't been logged in the CRM system.
Traditional approach challenges:
- Exporting CRM data and manually comparing in Excel
- Building custom database queries for one-off analysis
- Waiting for IT to create data pipelines
- No easy way to handle mismatched customer identifiers
DataFrames solution:
Prepare your data sources
- Upload the partner file: Upload partner_orders_jan.xlsx, which contains order IDs, customer names, amounts, and dates
- Run your Named Query: Execute your existing Named Query to pull January CRM order records from your database
Both sources automatically become DataFrames.
Express your analysis need
Ask the agent in plain English:
"Compare the partner orders file against our CRM data and help me:
- Find orders in the partner file that don't exist in our CRM
- Match customers even if names are slightly different (like 'ABC Corp' vs 'ABC Corporation')
- Identify any orders where amounts don't match between systems
- Show me the total revenue gap from missing orders"
Agent performs cross-source analysis
The agent automatically:
- Identifies that "Customer Name" in the Excel file corresponds to "Account Name" in the database
- Performs intelligent fuzzy matching to handle name variations
- Joins data across both sources using multiple criteria
- Creates intermediate DataFrames showing matched vs. unmatched orders
- Calculates precise financial impact with SQL accuracy
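The cross-source comparison above can be sketched with a left join and an intermediate table. This is an illustration under stated assumptions, not the agent's actual queries: SQLite stands in for the engine, both systems are assumed to share an order ID, the table and column names are invented, and a simple suffix-stripping `normalize_name` function is used as a deterministic stand-in for fuzzy matching.

```python
import sqlite3

# Hypothetical list of company suffixes to ignore when comparing names.
SUFFIXES = {"corp", "corporation", "inc", "incorporated", "ltd", "llc", "co"}

def normalize_name(name):
    """Lowercase a company name and drop trailing legal suffixes."""
    words = name.lower().replace(".", "").replace(",", "").split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL so it can be used inside the join condition.
conn.create_function("normalize_name", 1, normalize_name)

conn.execute(
    "CREATE TABLE partner_orders (order_id TEXT, customer_name TEXT, amount_cents INTEGER)"
)
conn.execute(
    "CREATE TABLE crm_orders (order_id TEXT, account_name TEXT, amount_cents INTEGER)"
)
conn.executemany("INSERT INTO partner_orders VALUES (?, ?, ?)", [
    ("P-1001", "ABC Corp", 250000),
    ("P-1002", "Initech", 90000),      # not logged in the CRM
    ("P-1003", "Globex Inc.", 40000),  # logged with a different amount
])
conn.executemany("INSERT INTO crm_orders VALUES (?, ?, ?)", [
    ("P-1001", "ABC Corporation", 250000),
    ("P-1003", "Globex", 45000),
])

# Intermediate table: every partner order with its CRM match, if any.
conn.execute("""
    CREATE TEMP TABLE matched AS
    SELECT p.order_id, p.customer_name,
           p.amount_cents AS partner_cents,
           c.amount_cents AS crm_cents
    FROM partner_orders p
    LEFT JOIN crm_orders c
      ON c.order_id = p.order_id
     AND normalize_name(c.account_name) = normalize_name(p.customer_name)
""")

missing = conn.execute("""
    SELECT order_id, customer_name, partner_cents
    FROM matched WHERE crm_cents IS NULL ORDER BY order_id
""").fetchall()

mismatched = conn.execute("""
    SELECT order_id, partner_cents, crm_cents
    FROM matched
    WHERE crm_cents IS NOT NULL AND partner_cents <> crm_cents
    ORDER BY order_id
""").fetchall()

revenue_gap = conn.execute(
    "SELECT COALESCE(SUM(partner_cents), 0) FROM matched WHERE crm_cents IS NULL"
).fetchone()[0]

print(missing, mismatched, revenue_gap, sep="\n")
```

Real name matching would need something more tolerant than suffix stripping (for example, a similarity score with a reviewed threshold), but the join structure stays the same.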
Take action on insights
- Review the list of missing orders with customer details
- Export the discrepancy report to CSV
- Use the findings to update CRM or investigate with partners
- Save the analysis approach in a runbook for monthly automation
Business impact: Transform a recurring monthly task that required IT assistance and took multiple days into a self-service analysis completed in under an hour, with complete confidence in mathematical accuracy.
Tips for success
For business analysts:
- Start simple: Begin with basic queries to understand your data before building complex analysis
- Express intent clearly: Describe what you want to know rather than how to calculate it
- Upload multiple sources: Don't hesitate to combine data from files and databases
- Review agent work: Check intermediate DataFrames to understand the analytical process
- Export early: Save important results to CSV or Google Sheets as you work
For AI developers:
- Use in runbooks: Automate repetitive data tasks by expressing transformation logic in natural language
- Combine with other features: Integrate DataFrames with Named Queries for powerful workflows
- Trust the math: SQL-powered calculations ensure reliability for financial and compliance work
- Handle exceptions: Use DataFrames in Worker Agent workflows to flag outliers for human review
- Build reusable patterns: Document successful analysis approaches for team reuse
What's next?
Now that you understand what DataFrames is and when to use it, you're ready to start analyzing your data with mathematical precision and natural language simplicity.
Getting started:
- Upload your first spreadsheet and ask a simple question
- Try combining data from multiple sources
- Explore how agents show their analytical reasoning
- Export results to share with your team
Advanced workflows:
- Build runbooks that automate recurring analysis tasks
- Integrate DataFrames with Named Queries for dynamic data access
- Create repeatable reconciliation processes for your business