A framework for data processing in supply chains with modular tasks
Sep 20, 2024
5 minutes to read
Introduction
Supply chains generate vast amounts of data from diverse sources such as suppliers, customers, logistics providers, and internal systems. Often, this data is unstructured, arriving as emails or PDF documents. Because there is no single “source of truth” in the supply chain, companies face the challenge of connecting disconnected systems to get a clear view of their operations.
At Loop, we use AI to resolve supply chain data differences, whether they stem from different taxonomies and terminologies or from different sources across the logistics industry. A core piece of how we resolve these differences is the atomic task system.
Atomic task system
The atomic task framework breaks an overall goal down into smaller, manageable atomic tasks. Each atomic task focuses on extracting or normalizing a specific piece of information: a task is a simple piece of work that can be performed, whether that is extracting the carrier name off an invoice or normalizing that carrier name to its entity representation. Here at Loop, our atomic task system is built internally with the core philosophy that we can design tasks to suit the supply chain domain. This means some of the simpler tasks can be completed by individuals without much domain knowledge, while the more difficult tasks leverage Loop’s competitive advantage and are completed exclusively by domain experts. The atomic task system is designed to be flexible: tasks can depend on other tasks, and we can craft tasks as granular as we’d like.
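To make this concrete, here is a minimal sketch of what an atomic task with dependencies might look like. The names (AtomicTask, depends_on, requires_domain_expert) are illustrative assumptions, not Loop’s actual internals.

```python
from dataclasses import dataclass, field

# Minimal sketch of an atomic task definition. Field names are
# illustrative, not Loop's real schema.
@dataclass
class AtomicTask:
    name: str                      # e.g. "extract_carrier_name"
    instructions: str              # what a human or model is asked to do
    requires_domain_expert: bool   # routes harder tasks to experts only
    depends_on: list["AtomicTask"] = field(default_factory=list)

# Tasks can be chained: normalization depends on the extraction output.
extract_carrier = AtomicTask(
    name="extract_carrier_name",
    instructions="Read the carrier name off the invoice header.",
    requires_domain_expert=False,
)
normalize_carrier = AtomicTask(
    name="normalize_carrier_name",
    instructions="Map the extracted name to its carrier entity.",
    requires_domain_expert=True,
    depends_on=[extract_carrier],
)
```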
Each task can be completed by humans or AI models. Each new task we design and roll out is initially completed by humans with high domain expertise; once enough training data has been collected, we build multiple machine learning models to mimic those humans (more on this in a subsequent blog post). We use a consensus mechanism to ensure that all tasks are performed with a high level of accuracy: a task is only considered complete if multiple instances, human or machine, arrive at the same answer. This wisdom-of-the-crowds approach mitigates the bias of any single method.
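As a hedged sketch, the consensus check might look something like this; the vote threshold and the exact-match comparison are simplifying assumptions (in practice answers often need fuzzier matching):

```python
from collections import Counter

# Sketch of a consensus check: a task completes only when enough
# independent answers (human or model) agree. Threshold is assumed.
def consensus(answers: list[str], min_votes: int = 2) -> str | None:
    """Return the agreed answer, or None if no answer has enough votes."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= min_votes else None

# Two instances agree on the carrier name -> the task completes.
print(consensus(["ABC Freight", "ABC Freight", "ABC Freight Inc."]))  # ABC Freight
print(consensus(["ABC Freight", "XYZ Logistics"]))                    # None -> escalate
```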
Designing atomic tasks
The atomic task system is a pillar of Loop’s product and operational efficiency, but without good task design it is far less effective. Empirically, we have found that machine learning models are actually more flexible than humans: with enough data, they can perform almost any task. Designing tasks that humans can complete, however, imposes a different set of requirements. Reaching consensus on more complex tasks is much harder and requires significantly more human training. As such, we have chosen to break tasks into two types: those that are granular and easy for humans to solve without much context, and those that require high context and are more complex in nature. Models and domain experts can solve these more complex tasks, and, if needed, each complex task can be broken down into smaller, simple tasks. This lets domain experts resolve complex tasks quickly without sacrificing accuracy, while non-domain experts complete the simpler ones.
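A sketch of the resulting two-tier routing, under the simplified assumption that a single flag captures task complexity; the worker pools named here are hypothetical:

```python
# Illustrative two-tier routing: granular tasks go to generalists,
# high-context tasks to domain experts plus models. The pools and the
# single complexity flag are simplifying assumptions.
def assign_workers(requires_domain_expert: bool) -> list[str]:
    """Pick who independently attempts a task before consensus."""
    if requires_domain_expert:
        return ["domain_expert_1", "domain_expert_2", "model_v2"]
    return ["generalist_1", "generalist_2", "model_v1"]
```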
Additionally, our approach to task design has evolved from extracting specific fields to a more tree-like structure of extraction. For example, we used to extract different reference numbers as separate atomic tasks: one for the invoice number, one for the PO number, and so on. However, the invoice number was often ambiguous or carrier-specific. Instead, we broke the work down into three tasks:
1. Extract all reference numbers from the page.
2. Find the associated key for each reference number.
3. Classify each of those into one of four types of reference numbers.
Each task becomes easier to complete, and the extra reference numbers feed useful data to downstream tasks like linking artifacts together.
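A minimal sketch of this chain, using a toy regex heuristic; the patterns, keys, and four type names are illustrative assumptions rather than Loop’s production logic (tasks 1 and 2 are merged here for brevity):

```python
import re

# Toy version of the three chained reference-number tasks. The regex
# and the four type names are assumptions for illustration.
REFERENCE_TYPES = {
    "invoice": "invoice_number",
    "po": "po_number",
    "bol": "bol_number",
    "pro": "pro_number",
}

def extract_reference_pairs(page_text: str) -> list[tuple[str, str]]:
    """Tasks 1 and 2: find each reference number and its adjacent key."""
    # Matches lines like "PO #: 123456" or "Invoice No 98765".
    pattern = re.compile(r"(?i)\b(invoice|po|bol|pro)\b[^\d]{0,10}(\d{4,})")
    return [(m.group(1).lower(), m.group(2)) for m in pattern.finditer(page_text)]

def classify_reference(key: str) -> str:
    """Task 3: map the key found next to a number to one of four types."""
    return REFERENCE_TYPES.get(key, "unknown")

page = "Invoice No: 98765\nPO #: 123456"
for key, number in extract_reference_pairs(page):
    print(number, "->", classify_reference(key))
# 98765 -> invoice_number
# 123456 -> po_number
```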
Atomic tasks in action: extraction and normalization
Two of the core problems we aim to solve at Loop are extraction and normalization.
We define extraction as the process of taking information off of “artifacts.” Artifacts are any data input, regardless of format or source (photos of BOLs, EDI feeds, spreadsheets, etc.), and are primarily unstructured documents. Normalization is the process of translating extraction outputs into our domain model, giving us a standardized taxonomy.
We extract all key data points, like the carrier name, addresses, and line items, from the invoice. In the following task, we extract the line items from the invoice. In the consensus flow, two Loop models arrive at the same linehaul amount, validating the output. We then normalize the line items by standardizing the amounts and categorizing each item, resolve the carrier name to our internal carrier entity, and break the address into its component parts.
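A sketch of what those normalization steps might look like; the category map, entity table, and function names are assumptions for illustration:

```python
from decimal import Decimal

# Hypothetical lookup tables standing in for Loop's domain model.
CARRIER_ENTITIES = {"abc freight": "carrier:001", "xyz logistics": "carrier:002"}
CATEGORIES = {"linehaul": "LINEHAUL", "fuel surcharge": "FUEL", "detention": "ACCESSORIAL"}

def normalize_amount(raw: str) -> Decimal:
    """Turn an extracted string like '$1,250.00' into a Decimal."""
    return Decimal(raw.replace("$", "").replace(",", ""))

def categorize(description: str) -> str:
    """Map a free-text charge description to a standard category."""
    return CATEGORIES.get(description.strip().lower(), "OTHER")

def resolve_carrier(name: str) -> str | None:
    """Resolve an extracted carrier name to its internal entity ID."""
    return CARRIER_ENTITIES.get(name.strip().lower())

line_item = {"description": "Linehaul", "amount": "$1,250.00"}
print(normalize_amount(line_item["amount"]), categorize(line_item["description"]))
# 1250.00 LINEHAUL
print(resolve_carrier("ABC Freight"))  # carrier:001
```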
Line items can take many different forms: some tables are horizontal, others are vertical, and some span multiple pages. Our atomic task system handles these different formats by first extracting the structure of the table and only then extracting the line items themselves.
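A simplified sketch of that structure-first approach, with a toy layout classifier and a horizontal-table parser; the heuristics here are assumptions, not Loop’s actual models:

```python
# Structure-first table extraction: classify the layout, then apply
# the parser for that layout. Both steps are toy heuristics.
def detect_table_structure(table_text: str) -> str:
    """Task 1: classify the layout (horizontal, vertical, multipage...)."""
    first_line = table_text.splitlines()[0].lower()
    # Toy heuristic: a header row of column names implies a horizontal table.
    return "horizontal" if "amount" in first_line else "vertical"

def extract_line_items(table_text: str) -> list[dict]:
    """Task 2: extract rows using the parser for the detected structure."""
    if detect_table_structure(table_text) == "horizontal":
        header, *rows = table_text.splitlines()
        keys = [h.strip().lower() for h in header.split("|")]
        return [dict(zip(keys, (c.strip() for c in row.split("|")))) for row in rows]
    # Vertical and multipage parsers would follow the same pattern.
    raise NotImplementedError("vertical and multipage parsers omitted")

table = "Description | Amount\nLinehaul | $1,250.00\nFuel Surcharge | $180.00"
print(extract_line_items(table))
# [{'description': 'Linehaul', 'amount': '$1,250.00'},
#  {'description': 'Fuel Surcharge', 'amount': '$180.00'}]
```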
Atomic tasks = dynamic future
The atomic task system is built to let us move fast and address market needs.
When Loop was first founded, we focused on serving shippers. But when the freight market dropped off in 2023, 3PLs and 4PLs were desperate for a cost-saving platform like Loop, and we were able to quickly spin up a net new product for 3PLs/4PLs in a matter of six months. One crucial gap between building a product for shippers and building one for 3PLs was our lack of support for factoring relationships. With the help of domain experts here at Loop, we went through several invoices to understand what a factored invoice looks like. Within a week, we had built a task to extract and normalize the factor, prototyped it internally, and released it to a beta 3PL client for testing.
The types of tasks will keep evolving with expanding business requirements, but at its core the atomic task system remains flexible and scalable. This foundation lets us adapt quickly to new types of data and different operational demands without overhauling the entire system. By focusing on modular task components, we can integrate additional functionality as needed, responding effectively to evolving market conditions and customer needs.