IDP, The Killer App for OCR

A Brief History of the "Killer App"

When Steve Jobs unveiled the first iPhone in 2007, he emphasized that it was first a ‘really cool phone’, and also for music, photos, internet browsing, email, and texting. Steve was tired of his phone dropping calls. Since then, more than a billion iPhones have shipped.

When Hedy Lamarr and George Antheil filed a patent for their frequency-hopping technology in 1941, they wanted to keep the radio signals guiding Allied torpedoes from being jammed. That patent became a precursor of the technology behind secure Wi-Fi, GPS and Bluetooth, now used by billions of people.

In the 1960s, the Advanced Research Projects Agency was interested in time-sharing of computers, which ultimately led to one of the most transformative technologies ever seen: the internet.

These examples all have one thing in common: neither the initial technology nor the initial use case was the “killer app”, the ultimate core value of the new technology.

The first Optical Character Recognition (OCR) technologies (1910-1930) were used in telegraphy, microfiche, and reading devices for the blind. Omni-font OCR, which can recognize text in many fonts, was developed in the 1970s. OCR is the foundation for Intelligent Document Processing (IDP).

OCR is a broad term, covering a range of different technologies that detect and extract text from images. It has two parts: the OCR engine itself, which does the detection, and the processing software, which handles the extraction.

IDP can be thought of as “OCR + Artificial Intelligence”, or “Smart OCR.” Combining OCR technology with Computer Vision, Deep Neural Networks, Natural Language Processing and Transfer Learning makes it possible to build Machine Learning models that read and understand the document.

It wasn’t until 2018, with the advent of BERT (Bidirectional Encoder Representations from Transformers), that a language model could be pre-trained to use the full context of a word by looking at the words that come both before and after it. BERT is pre-trained with an objective called “masked language modeling,” in which some words are hidden and the model predicts them from the surrounding words on both sides. This approach is measurably more effective than unidirectional approaches for sentence-level NLP tasks like natural language inference and paraphrasing, and token-level tasks like named entity recognition and question answering.

(An example would be “I went to the store to swim” or “I went to the shore to buy bread.” Only a model that reads context in both directions, using what are known as contextual word embeddings, would recognize “store” and “shore” as errors, because the clue comes later in the sentence.)
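To make the store/shore example concrete, here is a toy sketch in Python. It is not BERT and involves no neural network: the small association table and the fill_mask function are hypothetical, hand-written stand-ins that only illustrate why seeing the words after a hidden word narrows the choice.

```python
# Toy illustration only: a hand-written association table stands in for
# what a real model would learn from a huge amount of text (assumption).
ASSOCIATED_ACTIVITIES = {
    "store": {"buy", "shop"},
    "shore": {"swim", "surf"},
}


def fill_mask(sentence, use_right_context):
    """Return the candidate words for the [MASK] slot, sorted alphabetically."""
    tokens = sentence.split()
    i = tokens.index("[MASK]")
    # A unidirectional reader only sees the words before the hidden word.
    visible = tokens[:i]
    if use_right_context:
        # A bidirectional reader also sees the words after it.
        visible = visible + tokens[i + 1:]
    matches = [
        place
        for place, activities in ASSOCIATED_ACTIVITIES.items()
        if activities & set(visible)
    ]
    # With no clue in view, every known place remains a candidate.
    return sorted(matches) if matches else sorted(ASSOCIATED_ACTIVITIES)


sentence = "I went to the [MASK] to swim"
left_only = fill_mask(sentence, use_right_context=False)
both_sides = fill_mask(sentence, use_right_context=True)
print(left_only)   # ['shore', 'store']
print(both_sides)  # ['shore']
```

Reading only from the left, both words look equally plausible; the word “swim” arrives too late to help. With both sides in view, only “shore” fits.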

Ways to Extract Data

There are three ways to extract data from documents: manual data entry, OCR, and IDP. Manual data entry is performed by ‘swivel-chair’ employees reading data from one device/screen/physical document, and manually typing it into a different screen/spreadsheet/database for further processing. This is slow, expensive, unscalable, and error-prone. (Sorry, humans – you’re not really great at this. You’re good at many other things, but not this.)

Comparing OCR to IDP

OCR is useful for data extraction only from simple, structured documents.

  • It requires fixed, expensive, hard-to-manage templates for each document type.
  • The user needs to train the OCR for each template, and for any subsequent variations/revisions.
  • It’s rule-based.
  • It requires font libraries.
  • It doesn’t learn as it goes, and it doesn’t become more accurate over time.
  • Every extracted document needs to be reviewed – no straight-through processing.
  • The value comes from reduced processing time.
  • The core technology is the OCR.
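To see why fixed templates are hard to manage, consider this minimal sketch of rule-based extraction. Everything in it is an assumption made for illustration: the invoice text, the field names, and the use of regular expressions as the "template" (real template-based OCR products often use page zones or coordinates instead).

```python
import re

# Hypothetical template: one hand-written rule per field, valid for
# exactly one document layout.
INVOICE_TEMPLATE = {
    "invoice_number": re.compile(r"^Invoice No: (\S+)$", re.MULTILINE),
    "total": re.compile(r"^Total: \$([\d,]+\.\d{2})$", re.MULTILINE),
}


def extract_with_template(text, template):
    """Apply each field's rule to the text; a field is None if its rule fails."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


# The layout the template was written for.
original_layout = "Invoice No: INV-1042\nTotal: $1,250.00"
# The same information after a small revision to the form's labels.
revised_layout = "Invoice #: INV-1042\nAmount Due: $1,250.00"

print(extract_with_template(original_layout, INVOICE_TEMPLATE))
print(extract_with_template(revised_layout, INVOICE_TEMPLATE))
```

The first call recovers both fields. The second returns None for both, even though a person would read the revised invoice without hesitation, which is why every variation needs its own template.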

IDP is useful for extracting data, understanding it, creating insight, and detecting context and sentiment in complex, unstructured documents that include images, tables, free-flowing text, and a wide range of formats.

  • It doesn’t require any templates – just a handful of sample documents.
  • Machine Learning models can adapt to changes in documents.
  • It’s content-based.
  • It leverages Natural Language Processing with machine learning models trained on a huge amount of data.
  • With minor human interaction in the training phase, it learns on its own and becomes more accurate over time.
  • Straight-through processing is possible.
  • The value comes from reduced processing time, as well as greater business insight, ROI, and contextual information (e.g., the sentiment of the writer).
  • The core technology is Machine Learning.


So, there you have it – OCR is one part of the “Pre-Extraction” technology of the IDP solution, along with auto-crop and noise reduction, classification, text mining, and machine learning. That’s followed by the data “Extraction” phase, using NLP and machine learning, and the “Post-Extraction” phase, using pre-defined taxonomies and business validation rules. IDP then passes the resulting structured data to other systems for further processing and, if any result falls below a defined confidence score, alerts a human-in-the-loop for guidance. (Humans – you do this very well!)
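The confidence check at the end of that pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0.85 threshold, the field names, and the ExtractedField and route_document names are all hypothetical, since the right values depend on the solution and the business rules in use.

```python
from dataclasses import dataclass

# Hypothetical threshold: each deployment would define its own
# confidence score cut-off.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # the model's confidence in this value, from 0 to 1


def route_document(fields, threshold=CONFIDENCE_THRESHOLD):
    """Decide whether a document can skip review.

    Returns ("straight_through", []) when every field meets the threshold,
    otherwise ("human_review", [names of the low-confidence fields]).
    """
    low_confidence = [f.name for f in fields if f.confidence < threshold]
    if low_confidence:
        return "human_review", low_confidence
    return "straight_through", []


invoice = [
    ExtractedField("invoice_number", "INV-1042", 0.99),
    ExtractedField("total_amount", "1,250.00", 0.62),
]
decision = route_document(invoice)
print(decision)  # ('human_review', ['total_amount'])
```

Only the uncertain field is flagged, so the human reviewer looks at one value rather than re-keying the whole document.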

The best modern OCR engines can be made more accurate by configuring them for particular image types, and they give users some configurable controls.

But it is the additional technologies powering the IDP solution (along with Intelligent Image Processing – more on this in the next blog post), which allow for straight-through processing, that form the killer app for OCR.
