How does bank transaction categorization actually work?

Learn what happens behind the scenes of Codat’s bank transaction categorization engine.

One major hurdle in adopting new technology to improve decision-making and efficiency in business lending is explaining how it works, especially when AI is involved. Bank transaction categorization is a prime example.

Traditionally, this process has been laborious and error-prone, relying heavily on manual data entry and human judgment. Thanks to recent advancements, lenders can now use categorization engines that not only reduce these risks but also streamline operations, freeing them to focus on more strategic work.

But how do they work, and how reliable are they in real-life scenarios? In this post, we explore what happens behind the scenes of Codat’s bank transaction categorization engine, shedding light on the technology that helps business lenders gain advanced insights and boost operational efficiency.


What is bank transaction categorization, and why is it important?

When considering a loan application, lenders must carefully review and categorize a business’s transactions, from travel expenses to high-value client payments. This information is crucial for underwriters to project future cash flows, understand existing debt obligations, and determine how much cash the business has left at month-end and how much more debt it can take on.

While pivotal to loan approvals, the categorization process often causes delays and errors. Bank transactions can be cryptic, often carrying little more than a date, an amount, and a terse description, which makes accurate categorization challenging.

Codat’s approach

Codat’s bank transaction categorization, a key component of the Lending API, simplifies the process by streamlining banking data retrieval and automating transaction categorization. Here’s a closer look at the technology that powers our engine and how it works:

Step 1: Data is fed into the engine

Our bank transaction categorization engine can operate on banking data alone, or lenders can enrich it with accounting data for more comprehensive insight into the applicant’s transactions.

Lenders also choose how data is fed into the engine. Codat clients can use our accounting integrations and our partnerships with Plaid and TrueLayer to let applicants link their bank account and, if required, their accounting system in a few clicks through our connection flow. Alternatively, they can share transaction data they’ve sourced independently via our API or file upload.

Next, our categorization engine gets to work analyzing the bank transaction data.

Step 2: Transaction data is enriched with additional information

When both banking and accounting data are connected, Codat searches the accounting data for a transaction that matches the bank transaction’s date, amount, and counterparty. Where a match is found, we use the account type of that accounting transaction (e.g., income, expense, asset, liability, or equity) to give lenders additional context.
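The matching step can be pictured with a short Python sketch. Codat hasn’t published its matching rules, so everything here is an assumption for illustration: the `BankTransaction` and `AccountingTransaction` shapes, the counterparty normalization, and the three-day date tolerance (chosen here to allow for posting delays) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class BankTransaction:  # hypothetical shape, not Codat's schema
    id: str
    txn_date: date
    amount: Decimal  # assumes both sources use the same sign convention
    counterparty: str


@dataclass(frozen=True)
class AccountingTransaction:  # hypothetical shape, not Codat's schema
    id: str
    txn_date: date
    amount: Decimal
    counterparty: str
    account_type: str  # e.g. "Income", "Expense", "Asset", "Liability", "Equity"


def normalize(name: str) -> str:
    """Lower-case and keep only letters/digits so 'ACME LTD.' equals 'Acme Ltd'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def find_match(
    bank_txn: BankTransaction,
    ledger: List[AccountingTransaction],
    max_days: int = 3,  # assumed tolerance for posting delays
) -> Optional[AccountingTransaction]:
    """Return the ledger entry with the same amount and counterparty whose date
    is closest to the bank transaction, or None if nothing is within max_days."""
    best, best_gap = None, None
    for entry in ledger:
        if entry.amount != bank_txn.amount:
            continue
        if normalize(entry.counterparty) != normalize(bank_txn.counterparty):
            continue
        gap = abs((entry.txn_date - bank_txn.txn_date).days)
        if gap <= max_days and (best_gap is None or gap < best_gap):
            best, best_gap = entry, gap
    return best


# Usage with made-up data
bank = BankTransaction("b1", date(2024, 3, 5), Decimal("-120.00"), "ACME LTD.")
ledger = [
    AccountingTransaction("a1", date(2024, 3, 4), Decimal("-120.00"), "Acme Ltd", "Expense"),
    AccountingTransaction("a2", date(2024, 3, 5), Decimal("-99.00"), "Acme Ltd", "Expense"),
]
match = find_match(bank, ledger)
print(match.id, match.account_type)  # a1 Expense
```

Requiring an exact amount but tolerating small date gaps reflects the assumption that amounts are reliable while booking and posting dates can drift; a real matcher would likely also handle fuzzier counterparty names.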

In standard bookkeeping, every transaction is recorded against one of the nominal accounts that appear on the Profit and Loss statement (Income Statement) or the Balance Sheet. We use this information to map the nominal account to one of our 250+ financial categories. Because the category reflects how businesses actually book their transactions, this is more accurate than categorization built on consumer spending behavior, which is common among other providers.
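To make the idea concrete, here is a minimal sketch of turning a matched nominal account into a hierarchical category path. Codat’s 250+ categories and its mapping logic aren’t published, so the keyword rules and category names below are invented placeholders; a production mapping would be far larger and need not be keyword-based.

```python
from typing import Dict, Tuple

# Hypothetical rules: (account type, keyword in nominal account name) -> category
# path, broadest level first. These are placeholders, not Codat's taxonomy.
RULES: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("Expense", "travel"): ("Expense", "Operating expenses", "Travel"),
    ("Expense", "entertainment"): ("Expense", "Operating expenses", "Entertainment"),
    ("Income", "sales"): ("Income", "Revenue", "Sales"),
    ("Liability", "loan"): ("Liability", "Debt", "Loan repayments"),
}


def categorize_nominal_account(account_type: str, account_name: str) -> Tuple[str, ...]:
    """Return the category path for a nominal account, falling back to the
    account type alone (the broadest level) when no rule applies."""
    name = account_name.lower()
    for (rule_type, keyword), path in RULES.items():  # dicts keep insertion order
        if rule_type == account_type and keyword in name:
            return path
    return (account_type,)


print(categorize_nominal_account("Expense", "Travel & Subsistence"))
# ('Expense', 'Operating expenses', 'Travel')
print(categorize_nominal_account("Equity", "Share capital"))
# ('Equity',)
```

Falling back to the account type means every matched transaction keeps at least a top-level label, even when its nominal account has an unusual name.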

Step 3: Details for unmatched transactions are extracted

Where a bank transaction lacks a match in the accounting system or the accounting system has not been connected, we extract details like counterparty, payment type, and classification from the transaction.
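A rule-based sketch of this extraction step follows. Codat doesn’t say how these fields are extracted, so the description formats, the `PAYMENT_TYPE_PREFIXES` table, and the `extract_details` function are hypothetical; the `direction` field is an extra illustrative feature and is not necessarily what Codat means by “classification.”

```python
import re
from decimal import Decimal
from typing import Dict

# Hypothetical description prefixes and the payment types they imply.
# Real bank descriptions vary widely by bank and country.
PAYMENT_TYPE_PREFIXES: Dict[str, str] = {
    "CARD PAYMENT TO": "card",
    "DIRECT DEBIT TO": "direct_debit",
    "STANDING ORDER TO": "standing_order",
    "FASTER PAYMENT FROM": "bank_transfer",
}

# Trailing fragments that identify the payment, not the counterparty.
TRAILING_NOISE = re.compile(r"\s+(REF\s+\S+|ON\s+\d{2}/\d{2})$")


def extract_details(description: str, amount: Decimal) -> Dict[str, str]:
    """Pull counterparty, payment type and direction out of a raw description."""
    text = " ".join(description.upper().split())  # collapse repeated whitespace
    payment_type, counterparty = "unknown", text
    for prefix, kind in PAYMENT_TYPE_PREFIXES.items():
        if text.startswith(prefix + " "):
            payment_type, counterparty = kind, text[len(prefix) + 1:]
            break
    counterparty = TRAILING_NOISE.sub("", counterparty)
    counterparty = counterparty.split("*")[0].strip()  # "UBER *TRIP" -> "UBER"
    direction = "outflow" if amount < 0 else "inflow"
    return {"counterparty": counterparty, "payment_type": payment_type, "direction": direction}


print(extract_details("Card payment to  UBER *TRIP on 05/03", Decimal("-18.40")))
# {'counterparty': 'UBER', 'payment_type': 'card', 'direction': 'outflow'}
```

Hand-written rules like these break as soon as a bank changes its format, which is one reason to feed the extracted fields into a trained model rather than map them to categories directly.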

Our model then analyzes these details to predict the transaction’s category based on what it learned during training. It categorizes transactions into a hierarchy of five levels, from Level 1, the broadest category (e.g., expense), to Level 5, the most granular (e.g., client entertainment). We also provide a confidence score for each level, indicating how sure the model is about its prediction at that level.

We then surface all of this information to the lender so that they can decide, depending on their risk appetite, which category level they want to use to build out further reporting.
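The shape of this output, and one way a lender might act on it, can be sketched in Python. The `LevelPrediction` type, the example scores, and the rule of taking the deepest level that clears a confidence threshold are all assumptions for illustration; Codat doesn’t prescribe how lenders should use the scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LevelPrediction:
    level: int          # 1 = broadest (e.g. expense) ... 5 = most granular
    category: str
    confidence: float   # model's confidence in this level, 0.0 to 1.0


def choose_category(
    predictions: List[LevelPrediction], min_confidence: float
) -> Optional[LevelPrediction]:
    """Return the most granular level whose confidence meets min_confidence.

    Walks from Level 1 downward and stops at the first level below the
    threshold, so a deeper level is never used on top of an uncertain parent.
    Returns None if even Level 1 falls short.
    """
    chosen = None
    for pred in sorted(predictions, key=lambda p: p.level):
        if pred.confidence < min_confidence:
            break
        chosen = pred
    return chosen


# Invented example output for one transaction (not real model scores)
levels = [
    LevelPrediction(1, "Expense", 0.99),
    LevelPrediction(2, "Operating expenses", 0.95),
    LevelPrediction(3, "Sales and marketing", 0.88),
    LevelPrediction(4, "Entertainment", 0.74),
    LevelPrediction(5, "Client entertainment", 0.61),
]
print(choose_category(levels, 0.9).category)  # cautious lender: Operating expenses
print(choose_category(levels, 0.6).category)  # higher risk appetite: Client entertainment
```

Lowering `min_confidence` trades certainty for granularity, which is the risk-appetite decision described above.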


💡 A quick note on our training dataset

The strength of any machine learning engine lies in its training dataset. There are various methods for building one, but manually categorizing bank transactions is the most common approach.

At Codat, our infrastructure allows us to do things a little differently. Every month, tens of thousands of businesses connect their banking and accounting data via our APIs. This means we can source a wealth of bank records, match these records to the corresponding accounting transaction, and use this information to assign the correct label to each transaction. 
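The labelling idea can be sketched in a few lines of Python. The `build_training_set` function, the record shapes, and the category strings are hypothetical; the point is only that each label is read off a matched accounting entry rather than assigned by hand.

```python
from typing import Dict, List, Optional, Tuple


def build_training_set(
    bank_records: List[Dict[str, str]],
    matched_accounts: Dict[str, Optional[str]],
    account_to_category: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair each bank description with the category of the nominal account its
    matched accounting transaction was booked to. Unmatched or unmapped records
    are skipped rather than guessed, so every label comes from a real ledger entry."""
    examples = []
    for record in bank_records:
        account = matched_accounts.get(record["id"])
        if account is None:
            continue  # no accounting match: no trustworthy label
        label = account_to_category.get(account)
        if label is None:
            continue  # nominal account not mapped to a category yet
        examples.append((record["description"], label))
    return examples


# Made-up records: b3 has no accounting match, so it is left out
bank_records = [
    {"id": "b1", "description": "CARD PAYMENT TO RAIL TICKETS LTD"},
    {"id": "b2", "description": "FASTER PAYMENT FROM ACME LTD"},
    {"id": "b3", "description": "CARD PAYMENT TO CAFE 21"},
]
matched_accounts = {"b1": "Travel & Subsistence", "b2": "Sales"}
account_to_category = {"Travel & Subsistence": "Expense > Travel", "Sales": "Income > Sales"}

for description, label in build_training_set(bank_records, matched_accounts, account_to_category):
    print(f"{description!r} -> {label}")
```

Dropping unmatched records shrinks the dataset but keeps noisy guesses out of the labels, which matters more when training data is plentiful.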

This way, our training dataset and categorization logic are grounded in actual business practices rather than guesswork. Lending clients who don’t use accounting data still benefit, because the model that categorizes their bank-only transactions was trained on these accounting-verified labels.

The benefits of this approach

Codat’s approach to bank transaction categorization helps lenders improve the accuracy of the data they’re using, so they can more confidently automate their data processing to get insight into applicants’ cash flow. Here are a few things that set our solution apart:

🏪 Trained on actual business data: Unlike other categorization providers, our machine learning engine is trained on the behavior of real businesses. By avoiding sandbox and consumer data, we help lenders minimize the risk of misclassified data skewing their overall risk assessment, which in turn leads to more informed lending decisions.

🤸 Flexibility in how and what you use: Our categorization engine can operate on banking data alone or supplemented with accounting data. It’s also designed to integrate seamlessly with your existing processes, whether you collect banking data through an Open Banking aggregator, directly from customers, or opt to use our integrations and partnerships with Plaid and TrueLayer. This ensures quality categorization without disrupting your existing workflows.

🔍 Enhanced accuracy using diverse datasets: Our engine cross-references transactions with multiple data sources, such as accounting records. This not only makes the data more reliable but also reduces the manual effort involved in verifying transactions. Even lenders that don’t use accounting data in their underwriting still benefit from this feature, which offers an additional layer of context to an applicant’s transactions.

Codat transforms categorization into a powerful competitive advantage

Codat’s bank transaction categorization engine, a key component of the Lending API, empowers business lenders with the precision and efficiency needed to stay ahead in a competitive market. Lenders like Playter, Finance Fair, and Sprk Capital rely on this feature to gain the clearest insights and unlock optimizations during the underwriting and decisioning processes and beyond.