How does bank transaction categorization actually work?

Lending Tue 04 Jun 2024

Learn what happens behind the scenes of Codat’s bank transaction categorization engine.

One major hurdle in using new technology to boost decision-making and efficiency in business lending is explaining how it works, especially when AI is involved. Bank transaction categorization is a prime example.

Traditionally, this process has been laborious and error-prone, relying heavily on manual data entry and human judgment. However, thanks to recent advancements, lenders can now use categorization engines that not only mitigate these risks but also streamline operations, freeing their teams to focus on more strategic activities.

But how do they work, and how reliable are they in real-life scenarios? In this post, we explore what happens behind the scenes of Codat’s bank transaction categorization engine, shedding light on the technology that helps business lenders gain advanced insights and boost operational efficiency.

What is bank transaction categorization, and why is it important?

When considering a loan application, lenders must carefully review and categorize a business’s transactions, from travel expenses to high-value client payments. Underwriters use this information to project future cash flows and understand existing debt obligations, which together determine the business’s remaining cash at month-end and its capacity for further debt.

While pivotal for loan approvals, the categorization process often leads to delays and errors. After all, bank transactions can be cryptic, offering little detail, which makes accurate categorization challenging.

Codat’s approach

Codat’s bank transaction categorization, a key component of the Lending API, simplifies the process by streamlining banking data retrieval and automating transaction categorization. Here’s a closer look at the technology that powers our engine and how it works:

Step 1: Data is fed into the engine

Our bank transaction categorization engine can operate on banking data alone, or lenders have the option to enhance it by adding accounting data. This allows for more comprehensive insight into the applicant’s transactions.

Lenders also have a choice of how the data is fed into the engine. Codat clients can utilize our accounting integrations and partnerships with Plaid and TrueLayer to enable applicants to link their bank account and accounting system (if required) with a few clicks using our connection flow. Alternatively, they can share transaction data they’ve sourced independently via our API or file upload.

Next, our categorization engine gets to work analyzing the bank transaction data.

Step 2: Transaction data is enriched with additional information

When both banking and accounting data are connected, Codat searches the accounting data for a match to each bank transaction based on the date, amount, and counterparty associated with the transaction. Where a match is found, we use the account type associated with that transaction (e.g., income, expense, asset, liability, or equity) to offer lenders additional context.
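To make the matching step concrete, here is a minimal Python sketch of matching on date, amount, and counterparty. It is an illustration only, not Codat’s implementation: the class names, the `find_match` helper, the name normalization, and the three-day date tolerance are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BankTransaction:
    posted_on: date
    amount: float
    counterparty: str


@dataclass(frozen=True)
class AccountingEntry:
    recorded_on: date
    amount: float
    counterparty: str
    account_type: str  # e.g. "income", "expense", "asset", "liability", "equity"


def normalise(name: str) -> str:
    """Lower-case and drop non-alphanumerics so 'ACME Ltd.' and 'acme ltd' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def find_match(
    txn: BankTransaction,
    entries: List[AccountingEntry],
    max_days_apart: int = 3,  # assumed tolerance; bank and ledger dates often differ slightly
) -> Optional[AccountingEntry]:
    """Return the first accounting entry agreeing on amount, date window, and counterparty."""
    for entry in entries:
        same_amount = abs(entry.amount - txn.amount) < 0.005
        close_date = abs((entry.recorded_on - txn.posted_on).days) <= max_days_apart
        same_party = normalise(entry.counterparty) == normalise(txn.counterparty)
        if same_amount and close_date and same_party:
            return entry
    return None  # unmatched: the transaction falls through to model-based prediction
```

When `find_match` returns an entry, its `account_type` supplies the extra context described above; when it returns `None`, the transaction is handled as an unmatched transaction.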

In standard bookkeeping procedures, account transactions are recorded against one of the nominal accounts that appear on the Profit and Loss (Income Statement) or Balance Sheet. We use this information to categorize the nominal account under one of our 250+ financial categories. Because the result reflects how businesses actually keep their books, rather than the consumer spending patterns that other providers commonly rely on, the categorization is more accurate.

Step 3: Details for unmatched transactions are extracted

Where a bank transaction lacks a match in the accounting system or the accounting system has not been connected, we extract details like counterparty, payment type, and classification from the transaction.

Our model then analyzes these details to predict a transaction’s category based on what it’s learned during training. It categorizes transactions into a hierarchy of levels, from Level 1, which gives the most high-level category (e.g., expense), to Level 5, which provides the most granular detail (e.g., client entertainment). We also provide a confidence score for each level, which indicates how sure the model is about its prediction.

We then surface all of this information to the lender so that they can decide, depending on their risk appetite, which category level they want to use to build out further reporting.
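As a rough illustration of how a lender might act on per-level confidence scores, the Python sketch below picks the most granular level whose confidence, and that of every level above it, meets a chosen threshold. The `LevelPrediction` type, the `deepest_confident_level` helper, and the threshold are hypothetical and are not part of Codat’s API.

```python
from typing import List, NamedTuple, Optional


class LevelPrediction(NamedTuple):
    level: int         # 1 = most high-level, 5 = most granular
    category: str
    confidence: float  # model confidence for this level, between 0 and 1


def deepest_confident_level(
    predictions: List[LevelPrediction],
    min_confidence: float,
) -> Optional[LevelPrediction]:
    """Walk from Level 1 downwards and stop at the first level below the threshold."""
    chosen = None
    for prediction in sorted(predictions, key=lambda p: p.level):
        if prediction.confidence < min_confidence:
            break
        chosen = prediction
    return chosen  # None means even Level 1 was not confident enough
```

A lender with a low risk appetite could set a high `min_confidence` and report on broader categories, while a lender comfortable with more uncertainty could lower it to reach more granular levels.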

💡 A quick note on our training dataset

The strength of any machine learning engine lies in its training dataset. There are various methods for building one, but manually categorizing bank transactions is the most common approach.

At Codat, our infrastructure allows us to do things a little differently. Every month, tens of thousands of businesses connect their banking and accounting data via our APIs. This means we can source a wealth of bank records, match these records to the corresponding accounting transaction, and use this information to assign the correct label to each transaction.
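The labeling idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Codat’s pipeline: the `build_training_examples` helper, the shape of the matched pairs, and the account-to-category mapping are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_training_examples(
    matched_pairs: Iterable[Tuple[str, str]],
    account_to_category: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Turn (bank description, matched nominal account) pairs into (text, label) examples.

    Pairs whose nominal account has no known category are skipped rather than guessed.
    """
    examples = []
    for description, nominal_account in matched_pairs:
        label = account_to_category.get(nominal_account)
        if label is not None:
            examples.append((description, label))
    return examples
```

The point of the sketch is that the label comes from the business’s own bookkeeping, via the matched accounting record, rather than from a person categorizing each bank transaction by hand.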

This way, we can be sure that our training dataset and categorization logic are based on actual business practices, not guesswork. And even our lending clients who don’t use accounting data still reap the benefits of this approach.

The benefits of this approach

Codat’s approach to bank transaction categorization helps lenders improve the accuracy of the data they’re using, so they can more confidently automate their data processing to get insight into applicants’ cash flow. Here are a few things that set our solution apart:

🏪 Trained on actual business data: Unlike other categorization providers, our machine learning engine is trained on the behavior of real businesses. By steering clear of sandbox or consumer data, we help lenders minimize the risk of data misclassification impacting their overall risk assessment process. This, in turn, leads to more informed lending decisions.

🤸 Flexibility in how and what you use: Our categorization engine can operate on banking data alone or supplemented with accounting data. It’s also designed to integrate seamlessly with your existing processes, whether you collect banking data through an Open Banking aggregator, directly from customers, or opt to use our integrations and partnerships with Plaid and TrueLayer. This ensures quality categorization without disrupting your existing workflows.

🔍 Enhanced accuracy using diverse datasets: Our engine cross-references transactions with multiple data sources, such as accounting records. This not only enhances the reliability of data but also reduces the manual effort involved in transaction verification. Even lenders that don’t use accounting data in their underwriting still benefit from this feature, which offers an additional layer of context to an applicant’s transactions.

Codat transforms categorization into a powerful competitive advantage

Codat’s bank transaction categorization engine, a key component of the Lending API, empowers business lenders with the precision and efficiency needed to stay ahead in a competitive market. Lenders like Playter, Finance Fair, and Sprk Capital rely on this feature to gain the clearest insights and unlock optimizations during the underwriting and decisioning processes and beyond. Find out how you can do the same by completing the form below.
