Tabular Data Extraction from Invoice Documents

5 minutes, 12 seconds read

Extracting information from tables is a long-standing problem in machine learning and image processing. Although recent advances in deep learning have been very successful, tabular data extraction remains a challenge because of the vast number of ways in which tables are represented, both visually and structurally. Some examples are shown below:

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Invoice Documents

Many companies process their bills in the form of invoices which contain tables that hold information about the items along with their prices and quantities. This information is generally required to be stored in databases while these invoices get processed.

Traditionally, this information has to be entered by hand into database software. However, this approach has some drawbacks:

1. The whole process is time consuming.

2. Errors may be introduced during the data entry process.

3. Extra cost of manual data entry.

An invoice automation system can be deployed to address these shortcomings. The idea is to upload the invoice document and have the system read it and generate the tabular information in digital format, making the whole process faster and more cost-effective for companies.

Fig. 6

Fig. 6 shows a sample invoice that contains some regular invoice details such as Invoice No, Invoice Date, Company details, and two tables holding transaction information. Now, our goal is to extract the information present in the two tables.

Tabular Information

The problem of extracting tables from invoices can be condensed into 2 main subtasks.

1. Table Detection

2. Table Structure Extraction

What is Table Detection?

Table Detection is the process of identifying and locating tables that are present in a document, usually an image. There are multiple ways to detect tables in an image. Some approaches use image processing toolkits like OpenCV, while others apply statistical models to features extracted from the documents, such as text position and text characteristics. More recently, deep learning approaches have been used to detect tables with trained neural networks similar to the ones used in object detection.

What is Table Structure Extraction?

Table Structure Extraction is the process of extracting the tabular information once the boundaries of the table have been detected through Table Detection. The information within the rows and columns is then extracted and transferred to the desired format, usually a CSV or Excel file.

Table Detection using Faster RCNN

Faster RCNN is a neural network model from the RCNN family. It is the successor of Fast RCNN, which was created by Ross Girshick in 2015. The name Faster RCNN signifies an improvement over the previous model in both training speed and detection speed.

To read more about the model framework, one can access the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

There are many other object detection model architectures available for use today. Each model comes with certain advantages and disadvantages in terms of prediction accuracy, model parameter size, inference speed, etc.

For the task of detecting tables in invoice documents, we select the Faster RCNN model with an FPN (Feature Pyramid Network) as the feature extraction network. The model is pre-trained on the ImageNet corpus using the ResNet-101 architecture. The ImageNet corpus is a public dataset that consists of more than 20,000 image categories of everyday objects. We use the PyTorch framework to train and test the model.

The above-mentioned model gives us a fast inference time and a high mean average precision. It is preferred for cases where quick, real-time detection is desired.

First, the model is trained on public datasets for table detection, such as the Marmot and UNLV datasets. Next, we fine-tune the model on our custom labeled dataset. For labeling, we follow the COCO annotation format.
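As an illustration of the COCO annotation format mentioned above, the sketch below builds a small annotation file with a single "table" category. The file name, image size, box coordinates, and the helper name make_coco_dataset are made up for the example and are not taken from the dataset described here; only the field layout follows the COCO object detection format.

```python
import json

def make_coco_dataset(images, table_boxes):
    """Build a COCO-style dict with a single 'table' category.

    images: list of (file_name, width, height)
    table_boxes: dict mapping file_name -> list of (x, y, w, h) boxes
    All names and values passed in are hypothetical sample data.
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": "table", "supercategory": "document"}],
    }
    ann_id = 1
    for image_id, (file_name, width, height) in enumerate(images, start=1):
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for x, y, w, h in table_boxes.get(file_name, []):
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,
                "bbox": [x, y, w, h],  # COCO boxes are [x_min, y_min, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco

# One invoice image with two labeled tables (sample values only).
dataset = make_coco_dataset(
    images=[("invoice_001.png", 1240, 1754)],
    table_boxes={"invoice_001.png": [(80, 620, 1080, 310), (80, 980, 1080, 200)]},
)
print(json.dumps(dataset["annotations"][0], indent=2))
```

Note that COCO stores boxes as top-left corner plus width and height, whereas many detection models expect corner coordinates, so a conversion step is usually needed when loading the data.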

Once trained, the model displayed an accuracy close to 86% on our custom dataset. There are certain scenarios where the model fails to locate the tables such as cases containing watermarks and/or overlapping texts. Tables without borders are also missed in a few instances. However, the model has shown its ability to learn from examples and detect tables in multiple different invoice documents. 
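The article does not say how the 86% figure was computed. One common way to score a table detector, shown here only as a sketch under that assumption, is to count a predicted box as correct when its intersection over union (IoU) with a labeled box reaches a threshold. The boxes, the 0.5 threshold, and the function names below are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_matches(predicted, ground_truth, threshold=0.5):
    """Greedily match each predicted box to at most one unmatched labeled box."""
    unmatched = list(ground_truth)
    matches = 0
    for p in predicted:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            matches += 1
    return matches

# Two labeled tables; the model finds one of them and raises one false alarm.
ground_truth = [(80, 620, 1160, 930), (80, 980, 1160, 1180)]
predicted = [(85, 615, 1150, 935), (400, 100, 600, 200)]

matches = count_matches(predicted, ground_truth)
print(f"{matches}/{len(ground_truth)} tables found")
```

Failure cases such as watermarks or borderless tables would show up here either as labeled boxes that stay unmatched or as predictions whose IoU falls below the threshold.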

Fig. 7

After running inference on the sample invoice from Fig. 6, we can see two table boundaries being detected by the model in Fig. 7. The first table is detected with a confidence score of 100% and the second with a confidence score of 99%.

Table Structure Extraction

Once the boundaries of the table are detected by the model, an OCR (Optical Character Recognition) mechanism is used to extract the text within the boundaries. The extracted text is then processed separately for each detected table.

We were able to extract the correct structure of the table, including its headers and line items, using rules derived from the invoices. The difficulty of this process depends on the invoice format at hand.
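The rules used in this work are not described, so the following is only a minimal sketch of one rule-based step that such a pipeline might contain: grouping OCR word boxes into table rows by their vertical position. The word tuples, the 8-pixel tolerance, and the function name group_into_rows are assumptions made for illustration.

```python
def group_into_rows(words, y_tolerance=8):
    """Group OCR words into table rows by the vertical centre of their boxes.

    words: list of (text, x1, y1, x2, y2) tuples (hypothetical OCR output).
    Returns a list of rows, each a list of texts ordered left to right.
    """
    rows = []  # each entry: [centre_y of the first word in the row, [words...]]
    for word in sorted(words, key=lambda w: (w[2] + w[4]) / 2):
        centre = (word[2] + word[4]) / 2
        if rows and abs(centre - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(word)
        else:
            rows.append([centre, [word]])
    # Within each row, order the words by their left edge.
    return [[w[0] for w in sorted(members, key=lambda w: w[1])] for _, members in rows]

ocr_words = [
    ("Item", 90, 630, 140, 650), ("Qty", 600, 631, 640, 651), ("Price", 900, 629, 960, 649),
    ("Widget", 90, 670, 170, 690), ("2", 610, 671, 620, 691), ("40.00", 900, 669, 960, 689),
]
table_rows = group_into_rows(ocr_words)
print(table_rows)  # [['Item', 'Qty', 'Price'], ['Widget', '2', '40.00']]
```

This simple version treats every OCR word as its own cell, so multi-word cells would still need to be merged using column information.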

There are multiple challenges that one may encounter while building an algorithm to extract structure. Some of them are:

  1. The spans of some table columns may overlap, making it difficult to determine the boundaries between columns.
  2. The fonts and sizes present within tables may vary from one table to another. The algorithm should be able to accommodate this variation.
  3. A table might be split across two pages, and detecting the continuation of a table might be challenging.
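To make the first challenge concrete, here is a hedged sketch of one simple way to infer column boundaries: merge the horizontal spans of all words and treat each merged interval as a column. This is an illustrative heuristic, not the method used in this work; the coordinates, the 10-pixel gap, and the function names are assumptions.

```python
def infer_columns(words, min_gap=10):
    """Merge horizontal word spans into column intervals.

    words: list of (text, x1, x2); spans separated by less than
    min_gap pixels are treated as belonging to the same column.
    """
    columns = []
    for _, x1, x2 in sorted(words, key=lambda w: w[1]):
        if columns and x1 - columns[-1][1] < min_gap:
            columns[-1][1] = max(columns[-1][1], x2)  # extend the current column
        else:
            columns.append([x1, x2])  # a wide enough gap starts a new column
    return [tuple(c) for c in columns]

def column_index(columns, x1, x2):
    """Return the index of the column whose interval contains the word's centre."""
    centre = (x1 + x2) / 2
    for i, (left, right) in enumerate(columns):
        if left <= centre <= right:
            return i
    return None

spans = [
    ("Item", 90, 140), ("Qty", 600, 640), ("Price", 900, 960),
    ("Widget", 90, 170), ("2", 610, 620), ("40.00", 900, 960),
]
columns = infer_columns(spans)
print(columns)                          # [(90, 170), (600, 640), (900, 960)]
print(column_index(columns, 610, 620))  # 1

# A long description that runs into the next column merges the two columns.
merged = infer_columns(spans + [("Extra long description", 90, 605)])
print(merged)                           # [(90, 640), (900, 960)]
```

The last call shows why overlapping spans are a problem: a single wide cell is enough to collapse two columns into one, so a practical system needs extra cues such as header positions or ruling lines.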

Certain deep learning approaches have also been published recently to determine the structure of a table. However, training them on custom datasets still remains a challenge. 

Fig. 8

The final result is then stored in a CSV file, where it can be edited or stored as needed. Fig. 8 shows the information extracted from the first table.
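Writing the extracted rows to CSV can be done with Python's standard csv module. The header and rows below are sample values, and rows_to_csv is a hypothetical helper name used only for this sketch.

```python
import csv
import io

def rows_to_csv(header, rows):
    """Serialise a header and a list of rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(
    ["Item", "Qty", "Price"],
    [["Widget", "2", "40.00"], ["Bolt, M8", "10", "5.50"]],
)
print(csv_text)
```

The csv writer quotes cells that contain commas, which matters for item descriptions such as the second sample row. To write to disk instead, the same writer can be attached to a file opened with newline="".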

Conclusion

The deep learning approach to extracting information from structured documents is a step in the right direction. These systems already offer high accuracy and low running time, and they should improve further as more data becomes available. Recent and upcoming advances in computer vision have made processes such as invoice automation significantly more accessible and robust.

About the author:

Prateek Sethi is a Data Scientist working at Mantra Labs. His work involves leveraging Artificial Intelligence to create data-driven solutions. Apart from his work he takes a keen interest in football and exploring the outdoors.

AI Code Assistants: Revolution Unveiled

AI code assistants are revolutionizing software development, with Gartner predicting that 75% of enterprise software engineers will use these tools by 2028, up from less than 10% in early 2023. This rapid adoption reflects the potential of AI to enhance coding efficiency and productivity, but also raises important questions about the maturity, benefits, and challenges of these emerging technologies.

Code Assistance Evolution

The evolution of code assistance has been rapid and transformative, progressing from simple autocomplete features to sophisticated AI-powered tools. GitHub Copilot, launched in 2021, marked a significant milestone by leveraging OpenAI’s Codex to generate entire code snippets. Amazon Q, introduced in 2023, further advanced the field with its deep integration into AWS services and impressive code acceptance rates of up to 50%. GPT (Generative Pre-trained Transformer) models have been instrumental in this evolution, with GPT-3 and its successors enabling more context-aware and nuanced code suggestions.

  • Adoption rates: By 2023, over 40% of developers reported using AI code assistants.
  • Productivity gains: Tools like Amazon Q have demonstrated up to 80% acceleration in coding tasks.
  • Language support: Modern AI assistants support dozens of programming languages, with GitHub Copilot covering over 20 languages and frameworks.
  • Error reduction: AI-powered code assistants have shown potential to reduce bugs by up to 30% in some studies.

These advancements have not only increased coding efficiency but also democratized software development, making it more accessible to novice programmers and non-professionals alike.

Current Adoption and Maturity: Metrics Defining the Landscape

The landscape of AI code assistants is rapidly evolving, with adoption rates and performance metrics showcasing their growing maturity. Here’s a tabular comparison of some popular AI coding tools, including Amazon Q:

Amazon Q stands out with its specialized capabilities for software developers and deep integration with AWS services. It offers a range of features designed to streamline development processes:

  • Highest reported code acceptance rates: Up to 50% for multi-line code suggestions
  • Built-in security: Secure and private by design, with robust data security measures
  • Extensive connectivity: Over 50 built-in, managed, and secure data connectors
  • Task automation: Amazon Q Apps allow users to create generative AI-powered apps for streamlining tasks

The tool’s impact is evident in its adoption and performance metrics. For instance, Amazon Q has helped save over 450,000 hours from manual technical investigations. Its integration with CloudWatch provides valuable insights into developer usage patterns and areas for improvement.

As these AI assistants continue to mature, they are increasingly becoming integral to modern software development workflows. However, it’s important to note that while these tools offer significant benefits, they should be used judiciously, with developers maintaining a critical eye on the generated code and understanding its implications for overall project architecture and security.

AI-Powered Collaborative Coding: Enhancing Team Productivity

AI code assistants are revolutionizing collaborative coding practices, offering real-time suggestions, conflict resolution, and personalized assistance to development teams. These tools integrate seamlessly with popular IDEs and version control systems, facilitating smoother teamwork and code quality improvements.

Key features of AI-enhanced collaborative coding:

  • Real-time code suggestions and auto-completion across team members
  • Automated conflict detection and resolution in merge requests
  • Personalized coding assistance based on individual developer styles
  • AI-driven code reviews and quality checks

Benefits for development teams:

  • Increased productivity: Teams report up to 30-50% faster code completion
  • Improved code consistency: AI ensures adherence to team coding standards
  • Reduced onboarding time: New team members can quickly adapt to project codebases
  • Enhanced knowledge sharing: AI suggestions expose developers to diverse coding patterns

While AI code assistants offer significant advantages, it’s crucial to maintain a balance between AI assistance and human expertise. Teams should establish guidelines for AI tool usage to ensure code quality, security, and maintainability.

Emerging trends in AI-powered collaborative coding:

  • Integration of natural language processing for code explanations and documentation
  • Advanced code refactoring suggestions based on team-wide code patterns
  • AI-assisted pair programming and mob programming sessions
  • Predictive analytics for project timelines and resource allocation

As AI continues to evolve, collaborative coding tools are expected to become more sophisticated, further streamlining team workflows and fostering innovation in software development practices.

Benefits and Risks Analyzed

AI code assistants offer significant benefits but also present notable challenges. Here’s an overview of the advantages driving adoption and the critical downsides:

Core Advantages Driving Adoption:

  1. Enhanced Productivity: AI coding tools can boost developer productivity by 30-50%. Google AI researchers estimate that these tools could save developers up to 30% of their coding time.
  2. Economic Impact: Generative AI, including code assistants, could potentially add $2.6 trillion to $4.4 trillion annually to the global economy across various use cases. In the software engineering sector alone, this technology could deliver substantial value.

     Industry          Potential Annual Value
     Banking           $200 billion – $340 billion
     Retail and CPG    $400 billion – $660 billion

  3. Democratization of Software Development: AI assistants enable individuals with less coding experience to build complex applications, potentially broadening the talent pool and fostering innovation.
  4. Instant Coding Support: AI provides real-time suggestions and generates code snippets, aiding developers in their coding journey.

Critical Downsides and Risks:

  1. Cognitive and Skill-Related Concerns:
    • Over-reliance on AI tools may lead to skill atrophy, especially for junior developers.
    • There’s a risk of developers losing the ability to write or deeply understand code independently.
  2. Technical and Ethical Limitations:
    • Quality of Results: AI-generated code may contain hidden issues, leading to bugs or security vulnerabilities.
    • Security Risks: AI tools might introduce insecure libraries or out-of-date dependencies.
    • Ethical Concerns: AI algorithms lack accountability for errors and may reinforce harmful stereotypes or promote misinformation.
  3. Copyright and Licensing Issues:
    • AI tools heavily rely on open-source code, which may lead to unintentional use of copyrighted material or introduction of insecure libraries.
  4. Limited Contextual Understanding:
    • AI-generated code may not always integrate seamlessly with the broader project context, potentially leading to fragmented code.
  5. Bias in Training Data:
    • AI outputs can reflect biases present in their training data, potentially leading to non-inclusive code practices.

While AI code assistants offer significant productivity gains and economic benefits, they also present challenges that need careful consideration. Developers and organizations must balance the advantages with the potential risks, ensuring responsible use of these powerful tools.

Future of Code Automation

The future of AI code assistants is poised for significant growth and evolution, with technological advancements and changing developer attitudes shaping their trajectory towards potential ubiquity or obsolescence.

Technological Advancements on the Horizon:

  1. Enhanced Contextual Understanding: Future AI assistants are expected to gain deeper comprehension of project structures, coding patterns, and business logic. This will enable more accurate and context-aware code suggestions, reducing the need for extensive human review.
  2. Multi-Modal AI: Integration of natural language processing, computer vision, and code analysis will allow AI assistants to understand and generate code based on diverse inputs, including voice commands, sketches, and high-level descriptions.
  3. Autonomous Code Generation: By 2027, we may see AI agents capable of handling entire segments of a project with minimal oversight, potentially scaffolding entire applications from natural language descriptions.
  4. Self-Improving AI: Machine learning models that continuously learn from developer interactions and feedback will lead to increasingly accurate and personalized code suggestions over time.

Adoption Barriers and Enablers:

Barriers:

  1. Data Privacy Concerns: Organizations remain cautious about sharing proprietary code with cloud-based AI services.
  2. Integration Challenges: Seamless integration with existing development workflows and tools is crucial for widespread adoption.
  3. Skill Erosion Fears: Concerns about over-reliance on AI leading to a decline in fundamental coding skills among developers.

Enablers:

  1. Open-Source Models: The development of powerful open-source AI models may address privacy concerns and increase accessibility.
  2. IDE Integration: Deeper integration with popular integrated development environments will streamline adoption.
  3. Demonstrable ROI: Clear evidence of productivity gains and cost savings will drive enterprise adoption.

Emerging Trends:

  1. AI-Driven Architecture Design: AI assistants may evolve to suggest optimal system architectures based on project requirements and best practices.
  2. Automated Code Refactoring: AI tools will increasingly offer intelligent refactoring suggestions to improve code quality and maintainability.
  3. Predictive Bug Detection: Advanced AI models will predict potential bugs and security vulnerabilities before they manifest in production environments.
  4. Cross-Language Translation: AI assistants will facilitate seamless translation between programming languages, enabling easier migration and interoperability.
  5. AI-Human Pair Programming: More sophisticated AI agents may act as virtual pair programming partners, offering real-time guidance and code reviews.
  6. Ethical AI Coding: Future AI assistants will incorporate ethical considerations, suggesting inclusive and bias-free code practices.

As these trends unfold, the role of human developers is likely to shift towards higher-level problem-solving, creative design, and AI oversight. By 2025, it’s projected that over 70% of professional software developers will regularly collaborate with AI agents in their coding workflows. However, the path to ubiquity will depend on addressing key challenges such as reliability, security, and maintaining a balance between AI assistance and human expertise.

The future outlook for AI code assistants is one of transformative potential, with the technology poised to become an integral part of the software development landscape. As these tools continue to evolve, they will likely reshape team structures, development methodologies, and the very nature of coding itself.

Conclusion: A Tool, Not a Panacea

AI code assistants have irrevocably altered software development, delivering measurable productivity gains but introducing new technical and societal challenges. Current metrics suggest they are transitioning from novel aids to essential utilities—63% of enterprises now mandate their use. However, their ascendancy as the de facto standard hinges on addressing security flaws, mitigating cognitive erosion, and fostering equitable upskilling. For organizations, the optimal path lies in balanced integration: harnessing AI’s speed while preserving human ingenuity. As generative models evolve, developers who master this symbiosis will define the next epoch of software engineering.
