Try : Insurtech, Application Development

AgriTech(1)

Augmented Reality(20)

Clean Tech(8)

Customer Journey(17)

Design(43)

Solar Industry(8)

User Experience(66)

Edtech(10)

Events(34)

HR Tech(3)

Interviews(10)

Life@mantra(11)

Logistics(5)

Strategy(18)

Testing(9)

Android(48)

Backend(32)

Dev Ops(11)

Enterprise Solution(29)

Technology Modernization(7)

Frontend(29)

iOS(43)

Javascript(15)

AI in Insurance(38)

Insurtech(66)

Product Innovation(57)

Solutions(22)

E-health(12)

HealthTech(24)

mHealth(5)

Telehealth Care(4)

Telemedicine(5)

Artificial Intelligence(143)

Bitcoin(8)

Blockchain(19)

Cognitive Computing(7)

Computer Vision(8)

Data Science(19)

FinTech(51)

Banking(7)

Intelligent Automation(27)

Machine Learning(47)

Natural Language Processing(14)

expand Menu Filters

Tabular Data Extraction from Invoice Documents

5 minutes, 12 seconds read

The task of extracting information from tables is a long-running problem statement in the world of machine learning and image processing. Although the latest accomplishments in the field of deep learning have seen a lot of success, tabular data extraction still remains a challenge due to the vast amount of ways in which tables are represented both visually and structurally. Below are some of the examples: 

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Invoice Documents

Many companies process their bills in the form of invoices which contain tables that hold information about the items along with their prices and quantities. This information is generally required to be stored in databases while these invoices get processed.

Traditionally, this information is required to be hand filled into a database software however, this approach has some drawbacks:

1. The whole process is time consuming.

2. Certain errors might get induced during the data entry process.

3. Extra cost of manual data entry.

 An invoice automation system can be deployed to address these shortcomings. The idea is to upload the invoice document and the system will read and generate the tabular information in the digital format making the whole process faster and more cost-effective for companies.

Fig. 6

Fig. 6 shows a sample invoice that contains some regular invoice details such as Invoice No, Invoice Date, Company details, and two tables holding transaction information. Now, our goal is to extract the information present in the two tables.

Tabular Information

The problem of extracting tables from invoices can be condensed into 2 main subtasks.

1. Table Detection

2. Tabular Structure Extraction.

 What is Table Detection?

 Table Detection is the process of identifying and locating tables that are present in a document, usually an image. There are multiple ways to detect tables in an image. Some of the approaches make use of image processing toolkits like OpenCV while some of the other approaches use statistical models on features extracted from the documents such as Text Position and Text Characteristics. Recently more deep learning approaches have been used to detect tables using trained neural networks similar to the ones used in Object Detection.

What is Table Structure Extraction?

Table Structure Extraction is the process of extracting the tabular information once the boundaries of the table are detected through Table Detection. The information within the rows and columns is then extracted and transferred to the desired format, usually CSV or Excel file.

Table Detection using Faster RCNN

Faster RCNN is a neural network model that comes from the RCNN family. It is the successor of Fast RCNN created by Ross Girshick in 2015. The name Faster RCNN is to signify an improvement over the previous model both in terms of training speed and detection speed. 

To read more about the model framework, one can access the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

 There are many other object detection model architectures that are available for use today. Each model comes with certain advantages and disadvantages in terms of prediction accuracy, model parameter size, inference speed, etc.

For the task of detecting tables in invoice documents, we will select the Faster RCNN model with FPN(Feature Pyramid Network) as a feature extraction network. The model is pre-trained on the ImageNet corpus using ResNET 101 architecture. The ImageNet corpus is a public dataset that consists of more than 20,000 image categories of everyday objects.  We will therefore make use of a Pytorch framework to train and test the model.

The above mentioned model gives us a fast inference time and a high Mean Average Precision. It is preferred for cases where a quick real time detection is desired.

First, the model is to be trained using public datasets for Table Detection such as Marmot and UNLV datasets. Next, we further fine-tune the model with our custom labeled dataset. For the purpose of labeling, we will follow the COCO annotation format.

Once trained, the model displayed an accuracy close to 86% on our custom dataset. There are certain scenarios where the model fails to locate the tables such as cases containing watermarks and/or overlapping texts. Tables without borders are also missed in a few instances. However, the model has shown its ability to learn from examples and detect tables in multiple different invoice documents. 

Fig. 7

After running inference on the sample invoice from Fig 6, we can see two table boundaries being detected by the model in Fig 7. The first table gets detected with 100% accuracy and the second table is detected with 99% accuracy.

Table Structure Extraction

Once the boundaries of the table are detected by the model, an OCR (Optical Character Reader) mechanism is used to extract the text within the boundaries. The text is then processed using the information that is part of a unique table.

We were able to extract the correct structure of the table, including its headers and line items using logics derived from the invoices. The difficulty of this process depends on the type of invoice format at hand.

There are multiple challenges that one may encounter while building an algorithm to extract structure. Some of them are:

  1. The span of some table columns may overlap making it difficult to determine the boundaries between columns.
  2. The fonts and sizes present within tables may vary from one table to another. The algorithm should be able to accomodate for this variation.
  3. The tables might get split into two pages and detecting the continuation of a table might be challenging.

Certain deep learning approaches have also been published recently to determine the structure of a table. However, training them on custom datasets still remains a challenge. 

Fig 8

The final result is then stored in a CSV file and can be edited or stored according to one’s convenience as shown in Fig 8 which displays the first table information.

Conclusion

The deep learning approach to extracting information from structured documents is a step in the right direction. With high accuracy and low running time, the systems can only learn to perform better with more data. The recent and upcoming advancements in computer vision approaches have made processes such as invoice automation significantly accessible and robust.

About the author:

Prateek Sethi is a Data Scientist working at Mantra Labs. His work involves leveraging Artificial Intelligence to create data-driven solutions. Apart from his work he takes a keen interest in football and exploring the outdoors.

Further Reading:

Cancel

Knowledge thats worth delivered in your inbox

Why Netflix Broke Itself: Was It Success Rewritten Through Platform Engineering?

By :

Let’s take a trip back in time—2008. Netflix was nothing like the media juggernaut it is today. Back then, they were a DVD-rental-by-mail service trying to go digital. But here’s the kicker: they hit a major pitfall. The internet was booming, and people were binge-watching shows like never before, but Netflix’s infrastructure couldn’t handle the load. Their single, massive system—what techies call a “monolith”—was creaking under pressure. Slow load times and buffering wheels plagued the experience, a nightmare for any platform or app development company trying to scale

That’s when Netflix decided to do something wild—they broke their monolith into smaller pieces. It was microservices, the tech equivalent of turning one giant pizza into bite-sized slices. Instead of one colossal system doing everything from streaming to recommendations, each piece of Netflix’s architecture became a specialist—one service handled streaming, another handled recommendations, another managed user data, and so on.

But microservices alone weren’t enough. What if one slice of pizza burns? Would the rest of the meal be ruined? Netflix wasn’t about to let a burnt crust take down the whole operation. That’s when they introduced the Circuit Breaker Pattern—just like a home electrical circuit that prevents a total blackout when one fuse blows. Their famous Hystrix tool allowed services to fail without taking down the entire platform. 

Fast-forward to today: Netflix isn’t just serving you movie marathons, it’s a digital powerhouse, an icon in platform engineering; it’s deploying new code thousands of times per day without breaking a sweat. They handle 208 million subscribers streaming over 1 billion hours of content every week. Trends in Platform engineering transformed Netflix into an application dev platform with self-service capabilities, supporting app developers and fostering a culture of continuous deployment.

Did Netflix bring order to chaos?

Netflix didn’t just solve its own problem. They blazed the trail for a movement: platform engineering. Now, every company wants a piece of that action. What Netflix did was essentially build an internal platform that developers could innovate without dealing with infrastructure headaches, a dream scenario for any application developer or app development company seeking seamless workflows.

And it’s not just for the big players like Netflix anymore. Across industries, companies are using platform engineering to create Internal Developer Platforms (IDPs)—one-stop shops for mobile application developers to create, test, and deploy apps without waiting on traditional IT. According to Gartner, 80% of organizations will adopt platform engineering by 2025 because it makes everything faster and more efficient, a game-changer for any mobile app developer or development software firm.

All anybody has to do is to make sure the tools are actually connected and working together. To make the most of it. That’s where modern trends like self-service platforms and composable architectures come in. You build, you scale, you innovate.achieving what mobile app dev and web-based development needs And all without breaking a sweat.

Source: getport.io

Is Mantra Labs Redefining Platform Engineering?

We didn’t just learn from Netflix’s playbook; we’re writing our own chapters in platform engineering. One example of this? Our work with one of India’s leading private-sector general insurance companies.

Their existing DevOps system was like Netflix’s old monolith: complex, clunky, and slowing them down. Multiple teams, diverse workflows, and a lack of standardization were crippling their ability to innovate. Worse yet, they were stuck in a ticket-driven approach, which led to reactive fixes rather than proactive growth. Observability gaps meant they were often solving the wrong problems, without any real insight into what was happening under the hood.

That’s where Mantra Labs stepped in. Mantra Labs brought in the pillars of platform engineering:

Standardization: We unified their workflows, creating a single source of truth for teams across the board.

Customization:  Our tailored platform engineering approach addressed the unique demands of their various application development teams.

Traceability: With better observability tools, they could now track their workflows, giving them real-time insights into system health and potential bottlenecks—an essential feature for web and app development and agile software development.

We didn’t just slap a band-aid on the problem; we overhauled their entire infrastructure. By centralizing infrastructure management and removing the ticket-driven chaos, we gave them a self-service platform—where teams could deploy new code without waiting in line. The results? Faster workflows, better adoption of tools, and an infrastructure ready for future growth.

But we didn’t stop there. We solved the critical observability gaps—providing real-time data that helped the insurance giant avoid potential pitfalls before they happened. With our approach, they no longer had to “hope” that things would go right. They could see it happening in real-time which is a major advantage in cross-platform mobile application development and cloud-based web hosting.

The Future of Platform Engineering: What’s Next?

As we look forward, platform engineering will continue to drive innovation, enabling companies to build scalable, resilient systems that adapt to future challenges—whether it’s AI-driven automation or self-healing platforms.

If you’re ready to make the leap into platform engineering, Mantra Labs is here to guide you. Whether you’re aiming for smoother workflows, enhanced observability, or scalable infrastructure, we’ve got the tools and expertise to get you there.

Cancel

Knowledge thats worth delivered in your inbox

Loading More Posts ...
Go Top
ml floating chatbot