Step-by-step walkthrough

Prerequisites

Before starting this tutorial, make sure you have:

  1. Installed Sema4.ai Studio and connected it to Snowflake
  2. Installed the Sema4.ai SDK extensions
  3. Completed the Data setup guide to create the required database objects and load sample data
  4. The following Snowflake requirements:
    • Admin-level access to a Snowflake account
    • Key-pair authentication set up for your Snowflake user
    • Cortex AI LLM usage granted to your account/role (and cross-region inference enabled if your Snowflake account is in a region where it's needed; see the Snowflake documentation for details)

Part 1: Build your first agent

You'll learn how to:

  1. Build your first agent from scratch
  2. Work with a natural language runbook
  3. Connect with Snowflake Cortex Analyst
  4. Test your agent locally

Open Sema4.ai Studio

We'll get started using Sema4.ai Studio. Open it up and take a moment to look around the Studio. These are the main tabs you'll be using:

  • Agents - This is where you'll see all your agents, create new ones, and manage existing ones.
  • Actions - Home of the actions that your agents can use. We've included a gallery of prebuilt actions for common platforms (tagged with the prefix Sema4.ai), and you can also create your own (tagged with the prefix My Actions).
  • Data Sources - Connections to data sources that you can use in your agents.
Sema4.ai Studio - Ready to build agents!

Build a new agent

Now that you're all set up, let's build your first agent! We are going to build one from scratch (but worry not - all the details are provided in the tutorial).

  1. Navigate to the Agents screen from the main navigation (or click the Try Now button on the home screen)
  2. Under the My Agents tab, click the button to create a new agent.
Create new agent
  3. Choose Conversational as the agent type.
Choose agent type
  4. Give your agent a name, for example Oil and Gas Analyst.
  5. Give your agent a description, for example Oil and gas analyst's best friend - helps with data analysis and insights.
  6. Next, we'll create a runbook for your agent. Click the Create Runbook button to open the editor. By default, the editor shows an outline of a runbook that you can use to get started.
  7. We've made this easy for you: download the runbook.md file first.
  8. Then click the import button and select the runbook file you just downloaded with the file browser.
Import runbook
  9. Click the Save button in the top right corner to save your changes, and then close the runbook editor.
  10. Because you're connected to your Snowflake account, we've automatically configured Snowflake Cortex AI as your default LLM. Let's not change that.

OK, let's take a quick breather here. All the basic configuration of your agent is now done. This is what you should see:

Agent created

Add actions

Next up, we'll add actions to your agent. We've prepared a gallery of pre-built actions that make it easy to connect agents to your enterprise apps. We have covered creating your own custom actions with Python in other tutorials.

Adding the Snowflake Cortex Analyst action package

  1. Click Add Action.
  2. Scroll to or search for Sema4.ai / Snowflake Cortex Analyst action package.
Add action
  3. Click the button to add the action to your agent - it may take a moment as the action package and its Python environment are automatically loaded.
  4. Then click the Add to Agent button at the top.
Add to Agent
  5. You'll see that the action package is marked yellow, which means you need to provide some configuration details.
  6. Click the Configure button.
  7. Finally, we'll provide the values that let the action connect to your Snowflake Cortex services. Enter these values in the fields:

Configuration          Value
Database               OIL_AND_GAS
Schema                 PUBLIC
Semantic model file    @OIL_AND_GAS.PUBLIC.CORTEX_ANALYST/oil_gas.yaml
Warehouse              SEMA4_DEMO_WH

Configure actions
  8. Click Confirm.
  9. Click Deploy on the bottom left to bring your agent to life!
Deployed agent

Test your agent locally

Once agent deployment is complete, you'll see the chat interface along with the agent details on the right panel.

  1. Create a new chat from the left panel
  2. Ask the agent a question relevant to the data, for example:

What is the total oil production by company for year to date 2024?

Agent uses Cortex Analyst to generate a query that answers the question
  3. Observe how the agent calls the Cortex Analyst action to generate a query that answers your question, and then executes the query using the Snowflake Execute Query action.
  4. Finally, the agent produces a chart of the results when a chart makes sense - this is because we instructed the agent to do so in the runbook.
Agent produces a chart of the results

Now it's your turn!

Try asking the agent a few questions to see how it performs. Here are some examples:

Show me my top 5 producing wells

Can you show me the monthly production data for TEDDY FEDERAL 12X-5F

What other wells are operated by this same company?

Can you show me a pie chart of the distribution of well types for XTO?

Great job! You've completed the first part of the tutorial and built and tested your first agent all on your laptop. Now let's move on and make your agent more powerful by adding data actions!

Part 2: Connect to data

You'll learn how to:

  1. Connect to your Snowflake data sources
  2. Create and test data actions
  3. Use data actions with your agent

Open VSCode or Cursor

  1. Open VSCode or Cursor
  2. Create a new empty directory and open it in your IDE
  3. It should look something like this:
Empty directory

Create a new action package

Now let's build your first data action. The templates provided by our SDK are the easiest way to get started as they provide the basic structure and some example code to get you going quickly.

  1. Make sure your VS Code/Cursor is opened in an empty directory
  2. Open the SDK Extension
  3. Use Command Palette (Ctrl+Shift+P or ⌘+Shift+P)
  4. Start typing "Create Action Package" and you'll see a command Sema4.ai: Create Action Package appear. Click it.
Create Action Package command
  5. Choose the project location (typically your workspace folder)
  6. Give a name to your action package, for example oil-and-gas
  7. Select a template from the list - in this case, pick Data Access/Query
  8. The extension builds your Python environment and creates the project structure; this will take a few moments

Once your environment is ready, you'll see that the action list in the side panel gets updated to reflect the chosen template, and you can see the project structure in the PACKAGE CONTENT panel. Feel free to collapse the PACKAGE RESOURCES and CONTROL ROOM panels to get a better view of the project structure.

After Action Package Creation

Explore data action structure

Your new action project contains:

  • package.yaml - Configuration for dependencies and metadata
  • data_actions.py - Main Python code for your data actions
  • data_sources.py - Definition of your data sources and models
  • devdata/*.json - Test input files for local development
  • package.png - Icon for your action package

Exploring the data_actions.py file further, you'll see that it contains a few examples of data actions, giving you a sense of how to use Python functions, decorators, parameters, and the different return types.

@query
def get_customers_in_country(country: str) -> Response[Table]:
    """
    Get all customer names, their id's and account managers in a given country.
    ...

The @query decorator exposes this function to the agent. The docstring helps the agent understand how and when it should be used, and the parameters with their type hints (e.g. country: str) informs the agent what parameters are required to invoke this data action.
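To make that concrete, here is what a full version of that action might look like - a minimal sketch assuming a hypothetical customers table with Name, Id, Country, and Account_Manager columns (only the decorator, the Response/Table types, and get_connection() come from the SDK as shown in this tutorial):

from sema4ai.actions import Response, Table
from sema4ai.data import query, get_connection

@query
def get_customers_in_country(country: str) -> Response[Table]:
    """
    Get all customer names, their id's and account managers in a given country.

    Args:
        country: Name of the country to filter customers by.

    Returns:
        Table: Customer names, ids, and account managers in the country.
    """
    # $country is a placeholder that is bound safely from the params dict
    sql = """
        SELECT Name, Id, Account_Manager
        FROM customers
        WHERE Country = $country;
        """
    result = get_connection().query(query=sql, params={'country': country})
    return Response(result=result.to_table())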

Configure data sources

Now that our data action package is created, we need to configure our data sources.

  1. Open the data_sources.py file. This template file shows you an example of how to use a CSV file and a Postgres database, but in this example we'll focus on using a single Snowflake data source.
  2. Replace the contents of the data_sources.py file with the following:
"""
The data_sources.py is used to define both the data sources as well as the data server bootstrap.
"""
 
from typing import Annotated
from sema4ai.data import DataSource, DataSourceSpec
 
OilGasDataSource = Annotated[
    DataSource,
    DataSourceSpec(
        name="oil_and_gas",
        engine="snowflake",
        description="Oil & Gas Report Data",
    ),
]
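The Annotated alias does double duty: it tells the SDK how to provision the connection, and any data action can then receive the configured source simply by declaring a parameter of this type (you'll see this pattern in find_wells_within_radius later). A minimal sketch, using a hypothetical action name:

from data_sources import OilGasDataSource
from sema4ai.actions import Response, Table
from sema4ai.data import query

@query
def sample_native_query(datasource: OilGasDataSource) -> Response[Table]:
    """Illustration only: the SDK injects the configured data source at runtime."""
    # native_query sends SQL straight to the underlying Snowflake connection
    result = datasource.native_query(query="SELECT CURRENT_TIMESTAMP();")
    return Response(result=result.to_table())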

Setup Data Source

  1. Open up the SDK Extension
  2. Expand Data Sources
  3. Click the button next to oil_and_gas (snowflake) to set up the connection needed to access this data.
    Setup Data Source
  4. Now, fill in the details to connect to the warehouse, database, and schema you created in the previous step.
Setting            Value
Engine             Snowflake
Name               oil_and_gas
Credential Type    linked
Warehouse          SEMA4_DEMO_WH
Database           OIL_AND_GAS
Schema             PUBLIC

The 'linked' credential type means that the connection will use the same credentials provided when linking Studio to Snowflake.

Setup Data Source
  5. Click Add to complete the data source setup.

Browse tables

  1. Open the Sema4.ai Data Access extension
  2. Expand the oil_and_gas data source to see the tables available in the data source.
  3. Click on the PRODUCTION_REPORTS table to view its contents.
Browse tables

Adding data actions

  1. Open up the SDK Extension
  2. Open the data_actions.py file under PACKAGE CONTENT
  3. Replace the contents of the file with the following:
from data_sources import OilGasDataSource
from sema4ai.actions import Response, Table
from sema4ai.data import query, get_connection
import pandas
import numpy as np
import math
from scipy import stats
from datetime import datetime
 
@query
def get_production_for_well(wellName: str) -> Response[Table]:
    """
    Get production details for a specific well.
 
    Args:
        wellName: Name of the specific well.
 
    Returns:
        Table: The complete production results for the well.
    """
    query = """
        SELECT * FROM oil_and_gas.production_reports
        WHERE WELLNAME = $wellName;
        """
    params = {'wellName': wellName}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def get_wells_by_company(company: str) -> Response[Table]:
    """
    Get all wells operated by a specific company.
 
    Args:
        company: Name of the company.
 
    Returns:
        Table: List of wells operated by the company.
    """
    company = "%" + company + "%"
    query = """
        SELECT DISTINCT WellName, API_WELLNO, FieldName
        FROM oil_and_gas.production_reports
        WHERE Company LIKE $company
        ORDER BY WellName;
        """
    params = {'company': company}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def get_top_producing_wells(limit: int) -> Response[Table]:
    """
    Get the top producing wells based on oil production.
 
    Args:
        limit: Number of top wells to return.
 
    Returns:
        Table: List of top producing wells with their production figures.
    """
    query = """
    SELECT WellName, 
        SUM(Oil) as Oil, 
        SUM(Wtr) as Wtr, 
        SUM(Gas) as Gas
    FROM oil_and_gas.production_reports
    WHERE Oil > 0
    GROUP BY WellName
    ORDER BY Oil DESC
    LIMIT $limit;
    """
    params = {'limit': limit}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
 
@query
def get_well_location(wellName: str) -> Response[Table]:
    """
    Get the location of a specific well.
 
    Args:
        wellName: Name of the well.
 
    Returns:
        Table: Latitude and longitude for a specific well.
    """
    query = """
        SELECT DISTINCT WellName, Lat, Long
        FROM oil_and_gas.production_reports
        WHERE WellName = $wellname
        ORDER BY WellName;
        """
 
    params = {'wellname': wellName}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def find_wells_within_radius(datasource: OilGasDataSource, wellName: str, radius: int) -> Response[Table]:
    """
    Find wells within a specified radius of a given well.
 
    Args:
        datasource: The oil and gas data source (injected automatically by the SDK).
        wellName: Name of the well to use as the center point.
        radius: Radius in meters.

    Returns:
        Table: List of wells within the specified radius.
    """
    query = """
    SELECT DISTINCT p2.wellname, p2.company
    FROM
        production_reports AS p1
    JOIN production_reports AS p2 ON ST_DWITHIN (
        ST_MAKEPOINT (p1.long, p1.lat),
        ST_MAKEPOINT (p2.long, p2.lat),
        $radius
    )
    AND p1.wellname <> p2.wellname
    WHERE p1.wellname = $wellname;
  """
    params = {'wellname': wellName, 'radius': radius}
 
    result = datasource.native_query(query=query, params=params)
    return Response(result=result.to_table())
    
 
@query
def fit_exponential_decline_curve(
    wellName: str,
    startDate: str,
    endDate: str
) -> Response[str]:
    """
    Fit an exponential decline curve to the production data of a specific well.
 
    Args:
        wellName: Name of the well
        startDate: Start date of the analysis period (format: 'MM/DD/YYYY')
        endDate: End date of the analysis period (format: 'MM/DD/YYYY')
 
    Returns:
        Parameters of the fitted exponential decline curve.
    """
    # Convert input dates from MM/DD/YYYY to YYYY-MM-DD for SQL
    start_date_obj = datetime.strptime(startDate, '%m/%d/%Y')
    end_date_obj = datetime.strptime(endDate, '%m/%d/%Y')
    sql_start_date = start_date_obj.strftime('%Y-%m-%d')
    sql_end_date = end_date_obj.strftime('%Y-%m-%d')
 
    query = """
    SELECT 
        ReportDate,
        Oil
    FROM oil_and_gas.production_reports
    WHERE WellName = $well_name
    AND ReportDate BETWEEN $start_date AND $end_date
    AND Oil > 0
    ORDER BY ReportDate
    """
    params = {'well_name': wellName, 'start_date': sql_start_date, 'end_date': sql_end_date}
    result = get_connection().query(query=query, params=params).as_dataframe()
    
    if not result.empty:
        # Updated datetime parsing to handle timestamps
        dates = [datetime.strptime(str(date).split('.')[0], '%Y-%m-%d') for date in result['REPORTDATE']]
        production = result['OIL'].astype(float).tolist()
        
        # Calculate days since first production
        first_date = min(dates)
        days = [(date - first_date).days for date in dates]
        
        # Use numpy and scipy for curve fitting
        log_production = np.log(production)
        slope, intercept, r_value, p_value, std_err = stats.linregress(days, log_production)
        
        q_i = math.exp(intercept)
        D = -slope
        tau = 1 / D
        
        result_dict = {
            'WellName': wellName,
            'Initial Production (q_i)': round(q_i, 2),
            'Decline Rate (D)': round(D, 6),
            'Characteristic Time (tau)': round(tau, 2),
            'R-squared': round(r_value**2, 4)
        }
        return Response(result=pandas.DataFrame([result_dict]).to_markdown())
    
    return Response(result="No data available for the specified well and date range.")
 
@query
def get_full_company_name(company: str) -> Response[Table]:
    """
    Get the full company name that match the given search term.
 
    Args:
        company: Partial name of the company to search for.
 
    Returns:
        Table: List of distinct company names matching the search term.
    """
    company = "%" + company.upper() + "%"
    query = """
        SELECT DISTINCT Company
        FROM oil_and_gas.production_reports
        WHERE Company LIKE $company
        ORDER BY Company;
        """
    params = {'company': company}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())
 
@query
def get_well_file_number(wellName: str) -> Response[Table]:
    """
    Get the file number for a specific well.
 
    Args:
        wellName: Name of the well to search for.
 
    Returns:
        Table: The file number for the specified well.
    """
    query = """
        SELECT DISTINCT WellName, FileNo
        FROM oil_and_gas.production_reports
        WHERE WellName = $wellName;
        """
    params = {'wellName': wellName}
 
    result = get_connection().query(query=query, params=params)
    return Response(result=result.to_table())

These queries form the foundation of our oil and gas analyst, allowing the agent to quickly answer questions about well production data, locations, and company operations. All other questions are answered using Cortex Analyst.

Data Action                      Description
get_production_for_well          Retrieves all production data for a specific well by name
get_wells_by_company             Finds all wells operated by a given company, supporting partial name matches
get_top_producing_wells          Returns the top N wells based on total oil production, including oil, water, and gas totals
get_well_location                Gets the latitude and longitude coordinates for a specific well
find_wells_within_radius         Identifies other wells within a specified radius (in meters) of a given well using geospatial functions
fit_exponential_decline_curve    Analyzes production decline by fitting an exponential curve to a well's production data over a specified date range
get_full_company_name            Searches for full company names based on partial matches
get_well_file_number             Retrieves the file number associated with a specific well
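
A quick note on the math behind fit_exponential_decline_curve: it assumes the classic exponential decline model q(t) = q_i * e^(-D*t). Taking the natural log linearizes it to ln q(t) = ln q_i - D*t, a straight line in t, which is why the action can recover q_i and D with a simple linear regression on log production. The same idea in isolation, with made-up numbers purely for illustration:

import math
import numpy as np
from scipy import stats

# Synthetic production: q_i = 1000 bbl/day declining at D = 0.002 per day
days = np.arange(0, 365, 30)
production = 1000 * np.exp(-0.002 * days)

# Linear regression on log production recovers the decline parameters
slope, intercept, *_ = stats.linregress(days, np.log(production))
q_i = math.exp(intercept)   # initial production rate
D = -slope                  # nominal decline rate
print(f"q_i = {q_i:.1f}, D = {D:.4f}")  # ~1000.0 and ~0.0020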

The data actions above use two different methods for querying data:

  • get_connection().query() - The standard query method. It uses Sema4.ai's SQL dialect, provides basic SQL functionality with parameterization, and works consistently across all supported data sources.
  • datasource.native_query() - Used for advanced queries that need database-specific features, such as the Snowflake geospatial functions in find_wells_within_radius. See the sketch below.
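
For a compact side-by-side comparison, here's a sketch that contrasts the two call styles inside one hypothetical action (the table and column names come from the actions above; compare_query_methods itself is for illustration only):

from data_sources import OilGasDataSource
from sema4ai.actions import Response, Table
from sema4ai.data import query, get_connection

@query
def compare_query_methods(datasource: OilGasDataSource) -> Response[Table]:
    """Illustration only: contrasts the portable and native query methods."""
    # Portable SQL via Sema4.ai's dialect - works on any supported data source
    portable = get_connection().query(
        query="SELECT DISTINCT WellName FROM oil_and_gas.production_reports "
              "WHERE Company LIKE $company;",
        params={'company': '%XTO%'},
    )

    # Native SQL runs directly on Snowflake, so Snowflake-only functions
    # like ST_MAKEPOINT and ST_DWITHIN are available here
    native = datasource.native_query(
        query="SELECT DISTINCT WellName, ST_MAKEPOINT(Long, Lat) AS location "
              "FROM production_reports;"
    )
    return Response(result=native.to_table())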

Update package.yaml

Our data actions enable you to not only query data but also perform calculations and transformations using Python libraries. One of our data actions, fit_exponential_decline_curve, uses the scipy library for curve fitting. We need to update the package.yaml file to include this dependency.

  1. Open the package.yaml file
  2. Update the dependencies > pypi section and add the following:
- scipy=1.14.1
- tabulate=0.9.0
  3. Also update the description to 'Actions for working with oil and gas data'
  4. Save the file
  5. If prompted to rebuild the environment, click Yes
Update package.yaml

Test data actions

Our VS Code extension makes it easy to run your data actions locally using test data, so you can quickly test, debug, and optimize them before using them with your agent.

  1. Locate the get_production_for_well action in the list of actions.
  2. Click the arrow to expand the action, then click Run Action. Before we can run the action, we need to create an action input file that contains the parameters for the action.
  3. When prompted to create the file, click 'Create "devdata/input..."'. It will take a few seconds to create the file.

If the input file fails to create the first time, just try again.

  4. The input file will be automatically created in the devdata folder and opened in your editor.
  5. Update the inputValue and save the file.
"inputValue": {
    "wellName": "TEDDY FEDERAL 12X-5F"
}
Update action input data
  6. Click the Run Action button to run the action.
  7. View the results in the Terminal pane.

Publish data actions

At this point, you have a set of working data actions to use with your agent. All you need to do now is publish them to Studio.

  1. Scroll to the bottom of the action list and click Publish to Studio.
  2. Open up Studio and you'll notice the action is importing in the Actions tab.
Publish to Studio

Whenever you need to make changes to your actions, simply update the code and publish to Studio again. The updates will automatically be applied to your agent, and it will immediately start using the updated actions.

Part 3: Updating your agent to use your new actions

Edit your agent

  1. Open your Oil and Gas agent from the list of Agents.
  2. Click the Edit Agent button in the lower-right corner to open the agent editor.
  3. Click the Add Action button.
  4. Locate your MyActions / oil-and-gas action and click the button to add it.
  5. Click Add to Agent to add the action to your agent.
Add action to agent
  6. Click Update in the lower-left corner to update your agent.

Part 4: Using your agent

Try asking the agent a few questions to see how it performs. Here are some examples that use the new data actions you just published:

Can you show me the monthly production data for Porter 26-35 4H

Can you show me the exponential decline curve for this well using 2024 data?

Show me the map link for this well

Show me all the wells within 200 meters of this well

How many wells does Grayson operate?

Show me a heatmap of wells per field

Recommended resources

Congratulations on making it through this walkthrough! We hope you enjoyed it, and we look forward to your feedback!

Use the following resources to get to the next level with Sema4.ai:

Actions Gallery

Data Access