Step-by-step walkthrough

Prerequisites

Before starting this tutorial, make sure you have:

  1. Installed Sema4.ai Studio and connected it to Snowflake
  2. Installed the Sema4.ai SDK extensions
  3. Completed the Data setup guide to create the required database objects and load sample data
  4. The following Snowflake requirements:
    • Admin-level access to a Snowflake account
    • Key-pair authentication set up for your Snowflake user
    • Cortex AI LLM usage granted to your account/role (and cross-region inference enabled if your Snowflake account is in a region where it's needed; see the Snowflake documentation for details)

Part 1: Build your first agent

You'll learn how to:

  1. Build your first agent from scratch
  2. Work with a natural language runbook
  3. Connect with Snowflake Cortex Analyst
  4. Test your agent locally

Open Sema4.ai Studio

We'll get started using Sema4.ai Studio. Open it up and take a moment to look around the Studio. These are the main tabs you'll be using:

  • Agents - This is where you'll see all your agents, create new ones, and manage existing ones.
  • Actions - Home of the actions that your agents can use. We've included a gallery of prebuilt actions for common platforms (tagged with the prefix Sema4.ai), and you can also create your own (tagged with the prefix My Actions).
  • Data Sources - Connections to data sources that you can use in your agents.
Sema4.ai Studio - Ready to build agents!

Build a new agent

Now that you're all set up, let's build your first agent! We are going to build one from scratch (but worry not - all the details are provided in the tutorial).

  1. Navigate to the Agents screen from the main navigation (or click the Try Now button on the home screen)
  2. Under the My Agents tab, click the button to create a new agent.
Create new agent
  3. Choose Conversational as the agent type.
Choose agent type
  4. Give your agent a name, for example Oil and Gas Analyst.
  5. Give your agent a description, for example Oil and gas analyst's best friend - helps with data analysis and insights.
  6. Next, we'll create a runbook for your agent. Click the Create Runbook button to open the editor. By default, the editor shows an outline of a runbook that you can use to get started.
  7. We've made this easy for you: download the runbook.md file first.
  8. Then click the import button and select the runbook file you just downloaded with the file browser.
Import runbook
  9. Click the Save button in the top right corner to save your changes, and then close the runbook editor.
  10. Because you're connected to your Snowflake account, we've automatically configured Snowflake Cortex AI as your default LLM. Let's not change that.

OK, let's take a quick breather here. All the basic configuration of your agent is now done. This is what you should see:

Agent created

Add actions

Next up, we'll add actions to your agent. We've prepared a gallery of pre-built actions that make it easy to connect agents to your enterprise apps. We have covered creating your own custom actions with Python in other tutorials.

Adding the Snowflake Cortex Analyst action package

  1. Click Add Action.
  2. Scroll to or search for Sema4.ai / Snowflake Cortex Analyst action package.
Add action
  3. Click the button to add the action to your agent - it may take a moment as the action package and its Python environment are automatically loaded.
  4. Then click the Add to Agent button at the top.
Add to Agent
  5. You'll see that the action package is marked yellow, which means you need to provide some configuration details.
  6. Click the Configure button.
  7. Finally, we'll provide the values that let the action connect to your Snowflake Cortex services. Enter these values in the fields:

Configuration          Value
Database               OIL_AND_GAS
Schema                 PUBLIC
Semantic model file    @OIL_AND_GAS.PUBLIC.CORTEX_ANALYST/oil_gas.yaml
Warehouse              SEMA4_DEMO_WH

Configure actions
  8. Click Confirm.
  9. Click Deploy on the bottom left to bring your agent to life!
Deployed agent

Test your agent locally

Once agent deployment is complete, you'll see the chat interface along with the agent details on the right panel.

  1. Create a new chat from the left panel
  2. Ask the agent a question relevant to the data, for example:

What is the total oil production by company for year to date 2024?

Agent uses Cortex Analyst to generate a query that answers the question
  3. Observe how the agent calls the Cortex Analyst action to generate a query that answers your question, and then executes the query using the Snowflake Execute Query action.
  4. Finally, the agent produces a chart of the results when a chart makes sense - this is because we instructed the agent to do so in the runbook.
Agent produces a chart of the results

Now it's your turn!

Try asking the agent a few questions to see how it performs. Here are some examples:

Show me my top 5 producing wells

Can you show me the monthly production data for TEDDY FEDERAL 12X-5F

What other wells are operated by this same company?

Can you show me a pie chart of the distribution of well types for XTO?

Great job! You've completed the first part of the tutorial and built and tested your first agent all on your laptop. Now let's move on and make your agent more powerful by adding data actions!

Part 2: Connect to data

You'll learn how to:

  1. Connect to your Snowflake data sources
  2. Create and test data actions
  3. Use data actions with your agent

Open VSCode or Cursor

  1. Open VSCode or Cursor
  2. Create a new empty directory and open it in your IDE
  3. It should look something like this:
Empty directory

Create a new action package

Now let's build your first data action. The templates provided by our SDK are the easiest way to get started as they provide the basic structure and some example code to get you going quickly.

  1. Make sure your VS Code/Cursor is opened in an empty directory
  2. Open the SDK Extension
  3. Use Command Palette (Ctrl+Shift+P or ⌘+Shift+P)
  4. Start typing "Create Action Package" and you'll see a command Sema4.ai: Create Action Package appear. Click it.
Create Action Package command
  5. Choose the project location (typically your workspace folder)
  6. Give a name to your action package, for example oil-and-gas
  7. Select a template from the list - in this case, pick Data Access/Query
  8. The extension builds your Python environment and creates the project structure; this will take a few moments

Once your environment is ready, you'll see that the action list in the side panel gets updated to reflect the chosen template, and you can see the project structure in the PACKAGE CONTENT panel. Feel free to collapse the PACKAGE RESOURCES and CONTROL ROOM panels to get a better view of the project structure.

After Action Package Creation

Explore data action structure

Your new action project contains:

  • package.yaml - Configuration for dependencies and metadata
  • data_actions.py - Main Python code for your data actions
  • data_sources.py - Definition of your data sources and models
  • devdata/*.json - Test input files for local development
  • package.png - Icon for your action package

Exploring the data_actions.py file further, you'll see that it contains a few examples of data actions, giving you a sense of how to use Python functions, decorators, parameters, and the different return types.

@query
def get_customers_in_country(country: str) -> Response[Table]:
    """
    Get all customer names, their id's and account managers in a given country.
    ...

The @query decorator exposes this function to the agent. The docstring helps the agent understand how and when it should be used, and the parameters with their type hints (e.g. country: str) informs the agent what parameters are required to invoke this data action.
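To make that concrete, here is what a full version of that action might look like - a minimal sketch assuming a hypothetical customers table with Name, Id, Country, and Account_Manager columns (only the decorator, the Response/Table types, and get_connection() come from the SDK as shown in this tutorial):

from sema4ai.actions import Response, Table
from sema4ai.data import query, get_connection

@query
def get_customers_in_country(country: str) -> Response[Table]:
    """
    Get all customer names, their id's and account managers in a given country.

    Args:
        country: Name of the country to filter customers by.

    Returns:
        Table: Customer names, ids, and account managers in the country.
    """
    # $country is a placeholder that is bound safely from the params dict
    sql = """
        SELECT Name, Id, Account_Manager
        FROM customers
        WHERE Country = $country;
        """
    result = get_connection().query(query=sql, params={'country': country})
    return Response(result=result.to_table())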

Configure data sources

Now that our data action package is created, we need to configure our data sources.

  1. Open the data_sources.py file. This template file shows you an example of how to use a CSV file and a Postgres database, but in this example we'll focus on using a single Snowflake data source.
  2. Replace the contents of the data_sources.py file with the following:
"""
The data_sources.py is used to define both the data sources as well as the data server bootstrap.
"""
 
from typing import Annotated
from sema4ai.data import DataSource, DataSourceSpec
 
OilGasDataSource = Annotated[
    DataSource,
    DataSourceSpec(
        name="oil_and_gas",
        engine="snowflake",
        description="Oil & Gas Report Data",
    ),
]
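The Annotated alias does double duty: it tells the SDK how to provision the connection, and any data action can then receive the configured source simply by declaring a parameter of this type (you'll see this pattern in find_wells_within_radius later). A minimal sketch, using a hypothetical action name:

from data_sources import OilGasDataSource
from sema4ai.actions import Response, Table
from sema4ai.data import query

@query
def sample_native_query(datasource: OilGasDataSource) -> Response[Table]:
    """Illustration only: the SDK injects the configured data source at runtime."""
    # native_query sends SQL straight to the underlying Snowflake connection
    result = datasource.native_query(query="SELECT CURRENT_TIMESTAMP();")
    return Response(result=result.to_table())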

Setup Data Source

  1. Open up the SDK Extension
  2. Expand Data Sources
  3. Click the button next to oil_and_gas (snowflake) to set up the connection needed to access this data.
    Setup Data Source
  4. Now, fill in the details to connect to the warehouse, database, and schema you created in the previous step.
Setting            Value
Engine             Snowflake
Name               oil_and_gas
Credential Type    linked
Warehouse          SEMA4_DEMO_WH
Database           OIL_AND_GAS
Schema             PUBLIC

The 'linked' credential type means that the connection will use the same credentials provided when linking Studio to Snowflake.

Setup Data Source
  5. Click Add to complete the data source setup.

Browse tables

  1. Open the Sema4.ai Data Access extension
  2. Expand the oil_and_gas data source to see the tables available in the data source.
  3. Click on the PRODUCTION_REPORTS table to view its contents.
Browse tables

Adding data actions

  1. Open up the SDK Extension
  2. Open the data_actions.py file under PACKAGE CONTENT
  3. Replace the contents of the file with the following:
from data_sources import OilGasDataSource
from sema4ai.actions import Response, Table
from sema4ai.data import query, get_connection
import pandas
import numpy as np
import math
from scipy import stats
from datetime import datetime
 
@query
def get_production_for_well(wellName: str) -> Response[Table]:
    """
    Get production details for a specific well.
 
    Args:
        wellName: Name of the specific well.
 
    Returns:
        Table: The complete production results for the well.
    """
    query = """
        SELECT * FROM oil_and_gas.production_reports
        WHERE WELLNAME = $wellName;
        """
    params = {'wellName': wellName}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def get_wells_by_company(company: str) -> Response[Table]:
    """
    Get all wells operated by a specific company.
 
    Args:
        company: Name of the company.
 
    Returns:
        Table: List of wells operated by the company.
    """
    company = "%" + company + "%"
    query = """
        SELECT DISTINCT WellName, API_WELLNO, FieldName
        FROM oil_and_gas.production_reports
        WHERE Company LIKE $company
        ORDER BY WellName;
        """
    params = {'company': company}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def get_top_producing_wells(limit: int) -> Response[Table]:
    """
    Get the top producing wells based on oil production.
 
    Args:
        limit: Number of top wells to return.
 
    Returns:
        Table: List of top producing wells with their production figures.
    """
    query = """
    SELECT WellName, 
        SUM(Oil) as Oil, 
        SUM(Wtr) as Wtr, 
        SUM(Gas) as Gas
    FROM oil_and_gas.production_reports
    WHERE Oil > 0
    GROUP BY WellName
    ORDER BY Oil DESC
    LIMIT $limit;
    """
    params = {'limit': limit}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
 
@query
def get_well_location(wellName: str) -> Response[Table]:
    """
    Get the location of a specific well.
 
    Args:
        wellName: Name of the well.
 
    Returns:
        Table: Latitude and longitude for a specific well.
    """
    query = """
        SELECT DISTINCT WellName, Lat, Long
        FROM oil_and_gas.production_reports
        WHERE WellName = $wellname
        ORDER BY WellName;
        """
 
    params = {'wellname': wellName}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def find_wells_within_radius(datasource: OilGasDataSource, wellName: str, radius: int) -> Response[Table]:
    """
    Find wells within a specified radius of a given well.
 
    Args:
        datasource: The oil and gas data source (injected automatically by the SDK).
        wellName: Name of the well to use as the center point.
        radius: Radius in meters.

    Returns:
        Table: List of wells within the specified radius.
    """
    query = """
    SELECT DISTINCT p2.wellname, p2.company
    FROM
        production_reports AS p1
    JOIN production_reports AS p2 ON ST_DWITHIN (
        ST_MAKEPOINT (p1.long, p1.lat),
        ST_MAKEPOINT (p2.long, p2.lat),
        $radius
    )
    AND p1.wellname <> p2.wellname
    WHERE p1.wellname = $wellname;
  """
    params = {'wellname': wellName, 'radius': radius}
 
    result = datasource.native_query(query=query, params=params)
    return Response(result=result.to_table())
    
 
@query
def fit_exponential_decline_curve(
    wellName: str,
    startDate: str,
    endDate: str
) -> Response[str]:
    """
    Fit an exponential decline curve to the production data of a specific well.
 
    Args:
        wellName: Name of the well
        startDate: Start date of the analysis period (format: 'MM/DD/YYYY')
        endDate: End date of the analysis period (format: 'MM/DD/YYYY')
 
    Returns:
        Parameters of the fitted exponential decline curve.
    """
    # Convert input dates from MM/DD/YYYY to YYYY-MM-DD for SQL
    start_date_obj = datetime.strptime(startDate, '%m/%d/%Y')
    end_date_obj = datetime.strptime(endDate, '%m/%d/%Y')
    sql_start_date = start_date_obj.strftime('%Y-%m-%d')
    sql_end_date = end_date_obj.strftime('%Y-%m-%d')
 
    query = """
    SELECT 
        ReportDate,
        Oil
    FROM oil_and_gas.production_reports
    WHERE WellName = $well_name
    AND ReportDate BETWEEN $start_date AND $end_date
    AND Oil > 0
    ORDER BY ReportDate
    """
    params = {'well_name': wellName, 'start_date': sql_start_date, 'end_date': sql_end_date}
    result = get_connection().query(query=query, params=params).as_dataframe()
    
    if not result.empty:
        # Updated datetime parsing to handle timestamps
        dates = [datetime.strptime(str(date).split('.')[0], '%Y-%m-%d') for date in result['REPORTDATE']]
        production = result['OIL'].astype(float).tolist()
        
        # Calculate days since first production
        first_date = min(dates)
        days = [(date - first_date).days for date in dates]
        
        # Use numpy and scipy for curve fitting
        log_production = np.log(production)
        slope, intercept, r_value, p_value, std_err = stats.linregress(days, log_production)
        
        q_i = math.exp(intercept)
        D = -slope
        tau = 1 / D
        
        result_dict = {
            'WellName': wellName,
            'Initial Production (q_i)': round(q_i, 2),
            'Decline Rate (D)': round(D, 6),
            'Characteristic Time (tau)': round(tau, 2),
            'R-squared': round(r_value**2, 4)
        }
        return Response(result=pandas.DataFrame([result_dict]).to_markdown())
    
    return Response(result="No data available for the specified well and date range.")
 
@query
def get_full_company_name(company: str) -> Response[Table]:
    """
    Get the full company name that match the given search term.
 
    Args:
        company: Partial name of the company to search for.
 
    Returns:
        Table: List of distinct company names matching the search term.
    """
    company = "%" + company.upper() + "%"
    query = """
        SELECT DISTINCT Company
        FROM oil_and_gas.production_reports
        WHERE Company LIKE $company
        ORDER BY Company;
        """
    params = {'company': company}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def get_well_file_number(wellName: str) -> Response[Table]:
    """
    Get the file number for a specific well.
 
    Args:
        wellName: Name of the well to search for.
 
    Returns:
        Table: The file number for the specified well.
    """
    query = """
        SELECT DISTINCT WellName, FileNo
        FROM oil_and_gas.production_reports
        WHERE WellName = $wellName;
        """
    params = {'wellName': wellName}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())

These queries form the foundation of our oil and gas analyst, allowing the agent to quickly answer questions about well production data, locations, and company operations. All other questions are answered using Cortex Analyst.

Data Action                      Description
get_production_for_well          Retrieves all production data for a specific well by name
get_wells_by_company             Finds all wells operated by a given company, supporting partial name matches
get_top_producing_wells          Returns the top N wells based on total oil production, including oil, water, and gas totals
get_well_location                Gets the latitude and longitude coordinates for a specific well
find_wells_within_radius         Identifies other wells within a specified radius (in meters) of a given well using geospatial functions
fit_exponential_decline_curve    Analyzes production decline by fitting an exponential curve to a well's production data over a specified date range
get_full_company_name            Searches for full company names based on partial matches
get_well_file_number             Retrieves the file number associated with a specific well
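
A quick note on the math behind fit_exponential_decline_curve: it assumes the classic exponential decline model q(t) = q_i * e^(-D*t). Taking the natural log linearizes it to ln q(t) = ln q_i - D*t, a straight line in t, which is why the action can recover q_i and D with a simple linear regression on log production. The same idea in isolation, with made-up numbers purely for illustration:

import math
import numpy as np
from scipy import stats

# Synthetic production: q_i = 1000 bbl/day declining at D = 0.002 per day
days = np.arange(0, 365, 30)
production = 1000 * np.exp(-0.002 * days)

# Linear regression on log production recovers the decline parameters
slope, intercept, *_ = stats.linregress(days, np.log(production))
q_i = math.exp(intercept)   # initial production rate
D = -slope                  # nominal decline rate
print(f"q_i = {q_i:.1f}, D = {D:.4f}")  # ~1000.0 and ~0.0020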

The data actions above use two different methods for querying data:

  • get_connection().query() - The standard query method. It uses Sema4.ai's SQL dialect, provides basic SQL functionality with parameterization, and works consistently across all supported data sources.
  • datasource.native_query() - Used for advanced queries that need database-specific features, such as the Snowflake geospatial functions in find_wells_within_radius. See the sketch below.
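
For a compact side-by-side comparison, here's a sketch that contrasts the two call styles inside one hypothetical action (the table and column names come from the actions above; compare_query_methods itself is for illustration only):

from data_sources import OilGasDataSource
from sema4ai.actions import Response, Table
from sema4ai.data import query, get_connection

@query
def compare_query_methods(datasource: OilGasDataSource) -> Response[Table]:
    """Illustration only: contrasts the portable and native query methods."""
    # Portable SQL via Sema4.ai's dialect - works on any supported data source
    portable = get_connection().query(
        query="SELECT DISTINCT WellName FROM oil_and_gas.production_reports "
              "WHERE Company LIKE $company;",
        params={'company': '%XTO%'},
    )

    # Native SQL runs directly on Snowflake, so Snowflake-only functions
    # like ST_MAKEPOINT and ST_DWITHIN are available here
    native = datasource.native_query(
        query="SELECT DISTINCT WellName, ST_MAKEPOINT(Long, Lat) AS location "
              "FROM production_reports;"
    )
    return Response(result=native.to_table())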

Update package.yaml

Our data actions enable you to not only query data but also perform calculations and transformations using Python libraries. One of our data actions, fit_exponential_decline_curve, uses the scipy library for curve fitting. We need to update the package.yaml file to include this dependency.

  1. Open the package.yaml file
  2. Update the dependencies > pypi section and add the following:
- scipy=1.14.1
- tabulate=0.9.0
  3. Also update the description to 'Actions for working with oil and gas data'
  4. Save the file
  5. If prompted to rebuild the environment, click Yes
Update package.yaml

Test data actions

Our VS Code extension makes it easy to run your data actions locally using test data, so you can quickly test, debug, and optimize them before using them with your agent.

  1. Locate the get_production_for_well action in the list of actions.
  2. Click the arrow to expand the action, then click Run Action. Before we can run the action, we need to create an action input file that contains the parameters for the action.
  3. When prompted to create the file, click 'Create "devdata/input..."'. It will take a few seconds to create the file.

If the input file fails to create the first time, just try again.

  4. The input file will be automatically created in the devdata folder and opened in your editor.
  5. Update the inputValue and save the file.
"inputValue": {
    "wellName": "TEDDY FEDERAL 12X-5F"
}
Update action input data
  6. Click the Run Action button to run the action.
  7. View the results in the Terminal pane.

Publish data actions

At this point, you have a set of working data actions to use with your agent. All you need to do now is publish them to Studio.

  1. Scroll to the bottom of the action list and click Publish to Studio.
  2. Open up Studio and you'll notice the action is importing in the Actions tab.
Publish to Studio

Whenever you need to make changes to your actions, simply update the code and publish to Studio again. The updates will automatically be applied to your agent, and it will immediately start using the updated actions.

Part 3: Updating your agent to use your new actions

Edit your agent

  1. Open your Oil and Gas agent from the list of Agents.
  2. Click the Edit Agent button in the lower-right corner to open the agent editor.
  3. Click the Add Action button.
  4. Locate your MyActions / oil-and-gas action and click the button to add it.
  5. Click Add to Agent to add the action to your agent.
Add action to agent
  6. Click Update in the lower-left corner to update your agent.

Part 4: Using your agent

Try asking the agent a few questions to see how it performs. Here are some examples that use the new data actions you just published:

Can you show me the monthly production data for Porter 26-35 4H

Can you show me the exponential decline curve for this well using 2024 data?

Show me the map link for this well

Show me all the wells within 200 meters of this well

How many wells does Grayson operate?

Show me a heatmap of wells per field

Recommended resources

Congratulations on making it through this walkthrough! We hope you enjoyed it, and we look forward to your feedback!

Use the following resources to get to the next level with Sema4.ai:

Actions Gallery

Data Access