Define the document type

A document type is the structure definition for the data you're extracting from remittance documents for reconciliation. It resembles what you may know as a content model, wireframe, blueprint, or relational database architecture.

Remittance documents from various clients and for various payment types may have different structures. A document type defines the unified target structure into which Document Intelligence puts the data extracted from remittance documents. This unified structure lets you process data across different document formats and validate the extracted data before further processing.
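
To make the idea concrete, a document type can be pictured as a small schema of fields and tables. The sketch below is illustrative only: the class names (FieldSpec, TableSpec, DocumentType) and the example field and column names are hypothetical and are not part of the Document Intelligence product, where you define the type in the UI as described in the following sections.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One key:value item to extract (hypothetical structure)."""
    name: str
    required: bool = False


@dataclass
class TableSpec:
    """One table to extract, with a description and named columns."""
    name: str
    description: str
    columns: list[str] = field(default_factory=list)


@dataclass
class DocumentType:
    """Unified target structure shared by all incoming document formats."""
    name: str
    fields: list[FieldSpec] = field(default_factory=list)
    tables: list[TableSpec] = field(default_factory=list)


# Example values are made up for illustration.
remittance = DocumentType(
    name="Remittance reconciliation for Customer XYZ",
    fields=[
        FieldSpec("payment_reference", required=True),
        FieldSpec("payment_date"),
    ],
    tables=[
        TableSpec(
            name="invoices",
            description="Invoices covered by the payment",
            columns=["invoice_number", "facility", "amount_due"],
        ),
    ],
)
```

Whatever layout a client's remittance document uses, the extracted data would land in this one shape, which is what makes cross-format processing and validation possible.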

Create a document type

  1. In Work Room, click Documents in the left navigation menu.
  2. Type a descriptive name for the document type. The name should reflect the use case you have for the document type.
    • For example, Remittance reconciliation for Customer XYZ
  3. Define the fields and tables to extract as detailed below.

In the screen that opens, you see two sections:

  • Fields to extract
  • Tables to extract

Terminology reminder

  • Fields are lines of text typically structured as key:value pairs, even if they're sentences (such as The company name is ACME.).
  • Tables hold data in cells organized by rows and columns where the first row and the first column are typically used as headers for data labels.

Add fields

Take a look at the mapping sheet you created in the previous unit.

Fields in a remittance document

Create one row for each entry you found relevant in the textual data (red and blue rectangles). Use your internal names for the data. You'll map these labels to what your clients call them later on.

When you add an item, decide whether it's required or not. Required means that if Document Intelligence fails to retrieve this piece of data from an incoming document, the document fails validation right away, without further processing, and human intervention is required.
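
The effect of the required flag can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Document Intelligence implements validation: the function name missing_required, the field names, and the status strings are all hypothetical.

```python
def missing_required(extracted: dict, required_fields: list[str]) -> list[str]:
    """Return the required field names that are absent or empty."""
    return [
        name for name in required_fields
        if extracted.get(name) in (None, "")
    ]


# Hypothetical extraction result and field names.
extracted = {"payment_reference": "PR-1001", "payment_date": ""}
missing = missing_required(extracted, ["payment_reference", "payment_date"])

# A document with any missing required field stops here and goes to a person.
if missing:
    status = "needs_human_review"
else:
    status = "validated"
```

Marking only the fields you truly cannot reconcile without as required keeps the number of documents routed to manual review low.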

Fields in document type

You can save your work in progress by clicking Save Configuration. Changes on this screen don't auto-save. To get back to editing, select the Configuration tab at the top of the screen.

Add tables

When you're done, do the same with the tabular data. Since there may be multiple tables in one document, you need to name and describe each table before you define its data items.

  1. Next to the Tables to extract heading, click the  icon.
  2. Name the table.
  3. Describe the table so that its purpose is clear.
  4. Define the columns to extract by clicking the  icon beneath the table description.
Add table columns

Derived table for facility subtotals

In the example here, we actually have two tables to define:

  • The first is the main table with the invoices defined above.
  • The second has two columns: the facility name and its subtotal. This is a derived table based on the footer of each invoice table.

Facility subtotal table footer

From this table, we need to get the facility name and its subtotal. This will serve as a redundancy check because the number here needs to equal the sum of the numbers in the Amount Due column.
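
The redundancy check described above can be sketched as follows. This is a hedged example rather than product code: the function name subtotals_match, the key names (facility, amount_due, subtotal), and the sample values are assumptions made for illustration, and it assumes each invoice row carries the name of the facility it belongs to.

```python
from decimal import Decimal


def subtotals_match(invoice_rows, facility_subtotals):
    """Check each facility subtotal against the sum of its Amount Due values.

    Returns a dict mapping facility name to True (match) or False (mismatch).
    """
    sums = {}
    for row in invoice_rows:
        facility = row["facility"]
        sums[facility] = sums.get(facility, Decimal("0")) + Decimal(row["amount_due"])
    return {
        sub["facility"]: sums.get(sub["facility"], Decimal("0")) == Decimal(sub["subtotal"])
        for sub in facility_subtotals
    }


# Made-up sample data: one facility reconciles, the other does not.
invoice_rows = [
    {"facility": "North", "amount_due": "100.50"},
    {"facility": "North", "amount_due": "49.50"},
    {"facility": "South", "amount_due": "75.00"},
]
facility_subtotals = [
    {"facility": "North", "subtotal": "150.00"},
    {"facility": "South", "subtotal": "80.00"},
]
result = subtotals_match(invoice_rows, facility_subtotals)
```

Decimal is used instead of float so that currency amounts compare exactly; a mismatch for any facility suggests an extraction error in either the invoice rows or the footer.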

To create the second table, follow the same steps as for the first table.

Add facility subtotals table

Agent-based validation

Agent-based validation provides a way to separate the validation, and the data engineering it requires, from the actual reconciliation or processing use case. When you enable agent-based validation, you define the validation rules in a separate agent.

For this use case walkthrough, to keep things simple, we'll do the validation in the same agent that does the reconciliation. That means we do not enable agent-based validation for now.

Save the document type

This concludes the setup of the document type. Click Save configuration to make the type available for agents across your workspace.

Your document type is now accessible across the Sema4.ai toolset: in Work Room, Control Room, or Studio. However, before you can actually use it, you need to define the data mapping using document formats.