
Pushing the Envelope on ICR Accuracy in Hand-written Forms


The need for automated solutions for reading hand-written forms, and consequently the number of such solutions, has been rising for as long as one can remember. Almost all businesses, to varying degrees, use paper-based forms that customers fill in by hand, and most, if not all, of them convert this handwritten information into a digital format. Depending on the technological sophistication or the size of the business, this digitization might be done manually by one or more data entry specialists or through an automated solution.

It’s easy to see how the manual route may not be an ideal solution for a medium- or large-sized business. Some of the apparent drawbacks of manual document processing are:

  1. The cost of data entry specialists quickly adds up as more documents need to be digitized, necessitating more resources.
  2. Manual data entry is a slow process.
  3. Manual data entry is error-prone and requires quality inspection, which is costly and not fail-proof.

Many businesses have realized this and have transitioned to some form of a partially or fully automated solution. However, it’s not all rosy for these businesses either: the problems they face are primarily related to the accuracy of the current solutions in the market.

Shortcomings of Existing Hand-written Document Processing Solutions

The industry average for ICR (Intelligent Character Recognition) accuracy at the character level is about 70%, and it drops significantly when measured at the word level, which is what ultimately matters. Such automation may reduce the number of data entry personnel, but with such a low level of accuracy there is a need for more quality-check resources, which are often more expensive than data entry resources, diluting the cost benefit of automation. Moreover, since quality checking is slower than data entry, this kind of automation doesn’t even address the speed problem.
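To see why the word-level figure falls so far below the character-level one, assume, as a rough simplification, that character errors are independent; a word is then read correctly only if every one of its characters is. A quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope estimate: word-level accuracy from character-level
# accuracy, assuming each character is recognized independently.
char_accuracy = 0.70  # industry-average character-level ICR accuracy

for word_length in (3, 5, 8):
    word_accuracy = char_accuracy ** word_length
    print(f"{word_length}-character word: {word_accuracy:.1%} chance of being read correctly")

# 3-character word: 34.3%
# 5-character word: 16.8%
# 8-character word: 5.8%
```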

Some of the reasons that result in a low level of accuracy among existing document processing solutions are:

  • Poor form design
  • User input not in line with the format
  • Noisy images
  • Misaligned documents
  • Low-quality scanning of documents
  • Spelling mistakes by the user
  • Overwriting or corrections by the user

While we may not have control over some of the above factors, such as form design and user input, we can definitely improve the data extraction models to account for the others, such as image noise, misalignment, and spelling mistakes.

Our ICR Solution

The Document Parser solution in FlowMagic provides an intuitive user interface where data can be extracted from any standard form in three easy steps:

Step 1:   The user annotates the form (a one-time exercise for each new form) using an easy and intuitive UI. During annotation, each input field can optionally be labelled as mandatory. The user can specify the datatype for each field as alphabetic, numeric, or checkbox, and can also set the context for the field (e.g., Name, PAN, City, Car Make, Date). Once done, the saved template can be used repeatedly for reading forms of the same type as long as there are no changes in the form design. In case of a change, the saved template can be easily modified.

Step 2:   The user uploads one or more forms and chooses the corresponding template (from previous annotations). The system automatically extracts data from the forms.

Step 3:  The system exports the output in CSV, XML or JSON as desired by the user. If any field was marked as mandatory during annotation, the system also outputs a list of all mandatory fields that are blank.
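A minimal sketch of what a saved template and the Step 3 mandatory-field check might look like; the form name, field names, and structure here are illustrative assumptions, not FlowMagic’s actual schema:

```python
import csv
import json

# Hypothetical annotation template; the field names, datatypes, and
# structure are illustrative assumptions, not FlowMagic's actual schema.
template = {
    "form_name": "motor_claim_v1",
    "fields": [
        {"name": "Name", "datatype": "alphabets", "context": "Name", "mandatory": True},
        {"name": "PAN", "datatype": "alphanumeric", "context": "PAN", "mandatory": True},
        {"name": "Car Make", "datatype": "alphabets", "context": "Car Make", "mandatory": False},
    ],
}

# Stand-in for the values the ICR extracted from one uploaded form.
extracted = {"Name": "RAHUL SHARMA", "PAN": "", "Car Make": "SWIFT"}

# Mandatory-field check from Step 3: list every required field left blank.
missing = [f["name"] for f in template["fields"] if f["mandatory"] and not extracted.get(f["name"])]
print("Blank mandatory fields:", missing)  # -> ['PAN']

# Export in the format the user asked for (JSON and CSV shown here).
with open("output.json", "w") as fp:
    json.dump(extracted, fp, indent=2)
with open("output.csv", "w", newline="") as fp:
    writer = csv.DictWriter(fp, fieldnames=extracted.keys())
    writer.writeheader()
    writer.writerow(extracted)
```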

Salient features of ICR Document Parser

  1. The standard form being annotated can have any number of pages, and the input form need not have the same number. If there is a mismatch between the pages in the input form and the template, the system matches the pages and runs data extraction on the matching pages only. This also means the pages of the input form need not be sorted correctly.
  2. The system can read handwritten as well as printed forms.
  3. The system corrects for minor misalignments during scanning of documents or documents scanned in the wrong orientation.
  4. The system has inbuilt dictionaries for various contexts such as Name, Cities, States, Countries, PAN, Profession, Marital Status, Relationship, Amount, Car Make, Date, Gender.
  5. The various data types supported by the system are alphabets, numeric, alphanumeric, checkboxes and special characters.
  6. The system corrects user errors or scanning issues by performing data type and dictionary checks (see examples below).
  7. The system checks for mandatory fields to make sure the form is completely filled.

Examples of Data Read/Corrections Made by an ICR
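As one illustration of the dictionary checks mentioned in point 6 above, the sketch below snaps a noisy ICR reading to the closest entry in a context dictionary; the dictionary contents and similarity cutoff are assumptions for the example:

```python
import difflib

# Hypothetical context dictionary for a "Car Make" field; a real system
# would ship much larger, curated dictionaries per context.
CAR_MAKES = ["MARUTI", "HYUNDAI", "TATA", "HONDA", "TOYOTA", "MAHINDRA"]

def correct_reading(raw: str, dictionary: list[str], cutoff: float = 0.6) -> str:
    """Snap a noisy ICR reading to the closest dictionary entry, if any."""
    matches = difflib.get_close_matches(raw.upper(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw

print(correct_reading("HYUNDA1", CAR_MAKES))  # -> HYUNDAI (digit 1 misread for I)
print(correct_reading("MARUT|", CAR_MAKES))   # -> MARUTI
print(correct_reading("UNKNOWN", CAR_MAKES))  # -> UNKNOWN (no close match; left as-is)
```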

Benefits of ICR

Flexibility – You can annotate a wide variety of forms with complex inputs and data formats using the multiple data types and contexts built into the system.

Speed – Both annotation and data extraction are very user-friendly and fast. The system can extract data from a five-page form in under 30 seconds.

Scalability – The system is highly extensible; once set up for one type of form, it can easily be scaled to multiple forms or to bulk processing of documents of the same format.

Accuracy – The character level accuracy of our model is over 90%. Word level accuracy depends on the form design and quality but in general, varies between 75% and 85%.

Workflow

ICR (Intelligent Character Recognizer) workflow

No matter what solution you use, you can always benefit from these best practices for form design to improve the accuracy of your ICR:

  1. Have all instructions in bold at the top of the form.
  2. Instruct the user to write clearly in block letters as the form will be processed by a machine.
  3. Provide examples of how to enter data wherever there is a scope for confusion.
  4. Instead of providing free-form space for data entry, provide a clearly marked space with a specific location to enter each character.
  5. Make the overall space large enough to contain the requisite data, to avoid the user writing outside of it.
  6. Have enough separation between the space for two fields to avoid overlap.

To learn more about how FlowMagic can improve the accuracy and speed of your document digitization/Intelligent Character Recognition (ICR), or to discuss your broader AI goals, please get in touch with us at hello@mantralabsglobal.com.


Lake, Lakehouse, or Warehouse? Picking the Perfect Data Playground


In 1997, the world watched in awe as IBM’s Deep Blue, a machine designed to play chess, defeated world champion Garry Kasparov. This moment wasn’t just a milestone for technology; it was a profound demonstration of data’s potential. Deep Blue analyzed millions of structured moves to anticipate outcomes. But imagine if it had access to unstructured data—Kasparov’s interviews, emotions, and instinctive reactions. Would the game have unfolded differently?

This historic clash mirrors today’s challenge in data architectures: leveraging structured, unstructured, and hybrid data systems to stay ahead. Let’s explore the nuances between Data Warehouses, Data Lakes, and Data Lakehouses—and uncover how they empower organizations to make game-changing decisions.

Deep Blue’s triumph was rooted in its ability to process structured data—moves on the chessboard, sequences of play, and pre-defined rules. Similarly, in the business world, structured data forms the backbone of decision-making. Customer transaction histories, financial ledgers, and inventory records are the “chess moves” of enterprises, neatly organized into rows and columns, ready for analysis. But as businesses grew, so did their need for a system that could not only store this structured data but also transform it into actionable insights efficiently. This need birthed the data warehouse.

Why was Data Warehouse the Best Move on the Board?

Data warehouses act as the strategic command centers for enterprises. By employing a schema-on-write approach, they ensure data is cleaned, validated, and formatted before storage. This guarantees high accuracy and consistency, making them indispensable for industries like finance and healthcare. For instance, global banks rely on data warehouses to calculate real-time risk assessments or detect fraud, a necessity when billions of transactions are processed daily; here, tools like Amazon Redshift, Snowflake Data Warehouse, and Azure Data Warehouse are vital. Similarly, hospitals use them to streamline patient care by integrating records, billing, and treatment plans into unified dashboards.
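To make schema-on-write concrete, here is a minimal sketch using SQLite as a stand-in for a warehouse; the table and columns are invented for illustration. The schema is declared before any data lands, and a record that violates it is rejected at load time:

```python
import sqlite3

# Schema-on-write: the structure is declared before any data lands.
# SQLite stands in for a real warehouse; table/columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        txn_id   INTEGER PRIMARY KEY,
        account  TEXT    NOT NULL,
        amount   REAL    NOT NULL CHECK (amount > 0)
    )
""")

# A clean record loads fine.
conn.execute("INSERT INTO transactions VALUES (1, 'ACC-42', 129.99)")

# A record that violates the declared schema is rejected at write time.
try:
    conn.execute("INSERT INTO transactions VALUES (2, 'ACC-43', -5.00)")
except sqlite3.IntegrityError as err:
    print("Rejected at load time:", err)
```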

The impact is evident: according to a report by Global Market Insights, the global data warehouse market is projected to reach $30.4 billion by 2025, driven by the growing demand for business intelligence and real-time analytics. Yet, much like Deep Blue’s limitations in analyzing Kasparov’s emotional state, data warehouses face challenges when encountering data that doesn’t fit neatly into predefined schemas.

The question remains—what happens when businesses need to explore data outside these structured confines? The next evolution takes us to the flexible and expansive realm of data lakes, designed to embrace unstructured chaos.

The True Depth of Data Lakes 

While structured data lays the foundation for traditional analytics, the modern business environment is far more complex, and organizations today recognize the untapped potential in unstructured and semi-structured data. Social media conversations, customer reviews, IoT sensor feeds, audio recordings, and video content—these are the modern equivalents of Kasparov’s instinctive reactions and emotional expressions. They hold valuable insights but exist in forms that defy the rigid schemas of data warehouses.

The data lake is the system designed to embrace this chaos. Unlike warehouses, which demand structure upfront, data lakes operate on a schema-on-read approach, storing raw data in its native format until it’s needed for analysis. This flexibility makes data lakes ideal for capturing unstructured and semi-structured information. For example, Netflix uses data lakes to ingest billions of daily streaming logs, combining semi-structured metadata with unstructured viewing behaviors to deliver hyper-personalized recommendations. Similarly, Tesla stores vast amounts of raw sensor data from its autonomous vehicles in data lakes to train machine learning models.
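Schema-on-read is the mirror image of the warehouse approach: raw records land in their native format, and structure is imposed only at query time. A minimal sketch, with JSON lines standing in for lake storage and invented event fields:

```python
import json

# Schema-on-read: raw events land in the lake untouched, in native format.
# These events and their fields are invented for illustration.
raw_events = [
    '{"device": "sensor-7", "temp_c": 21.4, "ts": "2024-01-01T00:00:00Z"}',
    '{"device": "sensor-9", "note": "door opened"}',  # different shape: still stored
]

# Structure is imposed only at read time, per question being asked.
for line in raw_events:
    event = json.loads(line)
    temp = event.get("temp_c")  # tolerate fields missing from some events
    if temp is not None:
        print(f"{event['device']}: {temp} °C")
```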

However, this openness comes with challenges. Without proper governance, data lakes risk devolving into “data swamps,” where valuable insights are buried under poorly cataloged, duplicated, or irrelevant information. Forrester analysts estimate that 60%-73% of enterprise data goes unused for analytics, highlighting the governance gap in traditional lake implementations.

Is the Data Lakehouse the Best of Both Worlds?

This gap gave rise to the data lakehouse, a hybrid approach that marries the flexibility of data lakes with the structure and governance of warehouses. The lakehouse supports both structured and unstructured data, enabling real-time querying for business intelligence (BI) while also accommodating AI/ML workloads. Tools like Databricks Lakehouse and Snowflake Lakehouse integrate features like ACID transactions and unified metadata layers, ensuring data remains clean, compliant, and accessible.
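As a sketch of the lakehouse pattern, the snippet below uses Delta Lake on Spark, which layers ACID transactions and a unified metadata layer over files in a data lake. It assumes a Spark session already configured with the Delta Lake extension (for example via the delta-spark package); the path and columns are illustrative:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with the Delta Lake extension configured
# (e.g. via the delta-spark package); path and columns are illustrative.
spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

events = spark.createDataFrame(
    [("u1", "play", "2024-01-01"), ("u2", "pause", "2024-01-01")],
    ["user_id", "action", "date"],
)

# ACID append: concurrent readers never see a half-written batch.
events.write.format("delta").mode("append").save("/lake/events")

# The same files serve BI-style SQL and ML feature pipelines alike.
df = spark.read.format("delta").load("/lake/events")
df.groupBy("action").count().show()
```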

Retailers, for instance, use lakehouses to analyze customer behavior in real time while simultaneously training AI models for predictive recommendations. Streaming services like Disney+ integrate structured subscriber data with unstructured viewing habits, enhancing personalization and engagement. In manufacturing, lakehouses process vast IoT sensor data alongside operational records, predicting maintenance needs and reducing downtime. According to a report by Databricks, organizations implementing lakehouse architectures have achieved up to 40% cost reductions and accelerated insights, proving their value as a future-ready data solution.

As businesses navigate this evolving data ecosystem, the choice between these architectures depends on their unique needs. Below is a comparison table highlighting the key attributes of data warehouses, data lakes, and data lakehouses:

| Feature | Data Warehouse | Data Lake | Data Lakehouse |
| --- | --- | --- | --- |
| Data Type | Structured | Structured, Semi-Structured, Unstructured | Both |
| Schema Approach | Schema-on-Write | Schema-on-Read | Both |
| Query Performance | Optimized for BI | Slower; requires specialized tools | High performance for both BI and AI |
| Accessibility | Easy for analysts with SQL tools | Requires technical expertise | Accessible to both analysts and data scientists |
| Cost Efficiency | High | Low | Moderate |
| Scalability | Limited | High | High |
| Governance | Strong | Weak | Strong |
| Use Cases | BI, Compliance | AI/ML, Data Exploration | Real-Time Analytics, Unified Workloads |
| Best Fit For | Finance, Healthcare | Media, IoT, Research | Retail, E-commerce, Multi-Industry |
Conclusion

The interplay between data warehouses, data lakes, and data lakehouses is a tale of adaptation and convergence. Just as IBM’s Deep Blue showcased the power of structured data but left questions about unstructured insights, businesses today must decide how to harness the vast potential of their data. From tools like Azure Data Lake, Amazon Redshift, and Snowflake Data Warehouse to advanced platforms like Databricks Lakehouse, the possibilities are limitless.

Ultimately, the path forward depends on an organization’s specific goals—whether optimizing BI, exploring AI/ML, or achieving unified analytics. The synergy of data engineering, data analytics, and database activity monitoring ensures that insights are not just generated but are actionable. To accelerate AI transformation journeys for evolving organizations, leveraging cutting-edge platforms like Snowflake combined with deep expertise is crucial.

At Mantra Labs, we specialize in crafting tailored data science and engineering solutions that empower businesses to achieve their analytics goals. Our experience with platforms like Snowflake and our deep domain expertise make us the ideal partner for driving data-driven innovation and unlocking the next wave of growth for your enterprise.
