Erhan OK / July 14, 2019 / BPA, Digital Transformation, IRPA

Target Audiences: Service Delivery Leaders, BPO/SSC Leaders, Digitalization Beginners, Digital Transformers or Managers, OCR Users, Data/Document Processors.

Computer Vision helping in Automated Document Processing

In this article, I introduce the concept of Computer Vision (CV), a new capability for enabling Automated Document Processing (ADP). It can also help to lower service delivery costs and increase service quality.

Computer Vision is one of the fastest-evolving fields within Artificial Intelligence.

Why so?

If we look at the history of document processing needs in the market, it all started when paper-based operations shifted to computers during the Industry 3.0 revolution. Together with computers, we also started using digital documents for data exchange. As a result, the majority of “transactional data” has been exchanged between a global company and its partners via digital documents.

Digital documents were sufficient for day-to-day operations until we had to process “scanned documents”. Processing those documents created a significant amount of manual work and data quality issues. That is why OCR (Optical Character Recognition) solutions were commercialized: to extract data from images or scanned documents instead of relying on manual data entry. This increased efficiency for low document volumes, but once we talk about more than a few million documents, people again faced a big manual workload to maintain the OCR outcomes.

Exactly at this point, Computer Vision came into the picture. It promises to handle large volumes of documents effectively while reducing the manual effort needed as much as possible.

Why Computer Vision instead of OCR solutions?

In current service delivery teams, OCR solutions exist either built into the ERP system or as a separate solution integrated with different internal systems. Most of them are offered in the market as a new standalone platform. The OCR process flow usually looks like this:

  1. Receive scanned documents in the inbox or directly in the OCR solution
  2. Process them through OCR and get the results in the system
  3. Verify, validate, and update the OCR outcomes, then submit the results
  4. Obtain the digital data from a database or interface as a CSV file
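To make the four steps more tangible, here is a minimal Python sketch of such a flow. It is an illustration only: `run_ocr` is a hypothetical stand-in for a real OCR engine, and the field names, confidence values, and the 0.90 review threshold are invented for the example.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OcrField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as a hypothetical OCR engine might report it


def run_ocr(document_id: str) -> List[OcrField]:
    """Steps 1-2 stand-in: a real engine would read the scanned image here."""
    return [
        OcrField("po_number", "PO-10023", 0.98),
        OcrField("vendor_name", "ACME Lld", 0.61),  # invented low-confidence misread
    ]


def needs_review(fields: List[OcrField], threshold: float = 0.90) -> List[OcrField]:
    """Step 3: select the fields a person has to verify manually."""
    return [f for f in fields if f.confidence < threshold]


def apply_corrections(fields: List[OcrField], corrections: Dict[str, str]) -> None:
    """Step 3: store the values a reviewer typed in and mark them as verified."""
    for f in fields:
        if f.name in corrections:
            f.value = corrections[f.name]
            f.confidence = 1.0


def export_csv(fields: List[OcrField]) -> str:
    """Step 4: hand the verified data over as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])
    for f in fields:
        writer.writerow([f.name, f.value])
    return buffer.getvalue()


fields = run_ocr("scan-001")
flagged = [f.name for f in needs_review(fields)]
apply_corrections(fields, {"vendor_name": "ACME Ltd"})
csv_text = export_csv(fields)
```

In this sketch, every field below the threshold lands on a reviewer's desk, which is where the manual effort in step 3 comes from.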

This workflow requires a significant amount of effort just to maintain data quality. There are many reasons behind this workload, but one of the major ones is the “learning” feature of OCR solutions, which does not improve as much as expected even after a few hundred million documents have been processed within a year.

Actually, the “real learning” started when Artificial Neural Networks, and with them so-called Deep Learning (a Machine Learning method), became available!

In summary, OCR algorithms (the core of OCR solutions) plus Deep Learning became major components of Computer Vision. But that is not all there is to it within the Industry 4.0 revolution.

Let’s move ahead with the details of “Computer Vision”.

The Terminology

Just to clarify, I will no longer use the term OCR for the overall solution; going forward, it is Computer Vision. This article also covers the ingredients that make it suitable for your custom needs.

Wikipedia definition: Computer vision deals with how computers can be made to gain a high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world, in order to produce numerical or symbolic information.

What does that mean for end users?

Basically, we can get meaningful insights from visual inputs! We can feed images or videos into a computer vision service, and it can generate digital content or structured data. This ability resembles the human visual system: by capturing a moment, we understand what is going on in our environment, and by reading books and newspapers, we capture knowledge from text.

One result of applying CV would be lower operational costs in service delivery. This comes from extracting the necessary data with high accuracy and from having data insights available for faster decisions and processing. It helps to increase the agility and quality of our operations, and it lowers costs because less manual effort is needed to maintain data quality.

Let’s look at the CV concept from three different angles:

  1. The inputs it takes
  2. How it processes them
  3. The outputs it can generate

What inputs does Computer Vision take?

Simply put, digital images and videos. Images in particular come in many variations, such as:

  • Photos from Instagram
  • Computer or camera pictures in the form of *.tiff, *.jpeg, *.png,…
  • Scanned documents or images, screenshots
  • Any type of digital document (e.g. Word and PowerPoint)

Why is it necessary to highlight the input variations? Because the type of image or video used as input directly impacts the set of technologies chosen for building the entire CV service. Actually, it’s not a service anymore, it’s a “CV product”.
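As a rough illustration of how the input type can steer the technology choice, the Python sketch below routes a file to a processing path based on its extension. The pipeline names and the extension groups are assumptions made up for this example, not part of any specific product.

```python
from pathlib import Path

# Hypothetical grouping of file types; a real product would define its own.
IMAGE_SUFFIXES = {".tiff", ".tif", ".jpeg", ".jpg", ".png"}
OFFICE_SUFFIXES = {".docx", ".pptx"}
VIDEO_SUFFIXES = {".mp4", ".avi"}


def choose_pipeline(filename: str) -> str:
    """Return the (hypothetical) processing path for a given input file."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image-recognition"       # scanned documents, photos, screenshots
    if suffix in OFFICE_SUFFIXES:
        return "native-text-extraction"  # text is already digital, no recognition needed
    if suffix in VIDEO_SUFFIXES:
        return "video-analysis"
    return "manual-triage"               # unknown input type, a person decides


routes = {name: choose_pipeline(name) for name in ["invoice_scan.TIFF", "offer.docx", "clip.mp4"]}
```

Even this toy version shows the point: a scanned form and a Word file may carry the same data, yet they call for different technologies.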

How does Computer Vision process inputs?

As mentioned above, Computer Vision includes a set of technologies or methods. Keeping the focus on images, and especially scanned documents (like a purchasing form), certain methods are required to process them. The service might therefore include a combination of the following methods, depending on the needs:

Contextual Image Classification: Helps to define patterns based on the contextual information in an image. In other words, it helps to obtain information about what is in the image and where it is located. Image Classification, Object Localization & Detection, Semantic Segmentation, and Instance Segmentation are some example methods that can be applied as needed.

Optical Character Recognition (OCR): A method to convert images of typed, handwritten or printed text into electronic data (machine-encoded text).

Intelligent Character Recognition (ICR): A method focusing on handwritten text recognition, including identifying the different fonts and styles of handwriting.

Intelligent Word Recognition (IWR): A method to increase the quality of extraction and eliminate a high percentage of the manual extraction effort needed by a human after OCR/ICR processing. This method focuses on words and phrases, instead of single characters, in complex and unstructured pages.
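To show why working at the word level can reduce manual corrections, here is a small Python sketch. It is not an IWR engine: it only imitates the idea on text that has already been recognized, using fuzzy matching from the standard `difflib` module against a made-up vocabulary. The vocabulary, the sample misreads, and the 0.75 cutoff are all assumptions for illustration.

```python
import difflib

# Hypothetical domain vocabulary; a real system would learn or configure this.
VOCABULARY = ["invoice", "purchase", "order", "vendor", "total", "amount"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Replace a word with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_line(line):
    """Apply the word-level correction to every word in a recognized line."""
    return " ".join(correct_word(word) for word in line.split())


fixed = correct_line("purchasc ordcr")
```

A character-level check would flag both misread letters for a human to fix, while the word-level view can resolve them from context. Words that match nothing in the vocabulary, such as numbers, are left untouched.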

Optical Mark Recognition (OMR): The process of capturing human-marked data, such as ticked boxes, from documents. It works with data field locations defined in a specific template.
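The template idea behind OMR can be sketched in a few lines of Python. The example below is a simplified illustration with invented values: the "scan" is a tiny grid of grayscale pixels, the template maps two hypothetical checkbox names to pixel regions, and a box counts as marked when at least half of its pixels are dark.

```python
# Template: field name -> (left, top, right, bottom) in pixels. Values are invented.
TEMPLATE = {"approved": (2, 2, 6, 6), "rejected": (10, 2, 14, 6)}


def make_page(width, height):
    """Create a blank grayscale page: 255 is white paper, 0 is black ink."""
    return [[255] * width for _ in range(height)]


def fill_box(page, box, value=0):
    """Simulate a person filling in a checkbox with ink."""
    left, top, right, bottom = box
    for y in range(top, bottom):
        for x in range(left, right):
            page[y][x] = value


def is_marked(page, box, dark_below=128, min_fill=0.5):
    """A box is marked when the share of dark pixels reaches min_fill."""
    left, top, right, bottom = box
    pixels = [page[y][x] for y in range(top, bottom) for x in range(left, right)]
    dark = sum(1 for p in pixels if p < dark_below)
    return dark / len(pixels) >= min_fill


def read_marks(page, template=TEMPLATE):
    """Read every field defined in the template from the page."""
    return {name: is_marked(page, box) for name, box in template.items()}


page = make_page(16, 8)
fill_box(page, TEMPLATE["approved"])
marks = read_marks(page)
```

The important part is that the template, not the image, tells the program where to look, which is why OMR works best on forms with a fixed layout.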

Other types of recognition: There are also dedicated methods for detecting barcodes, QR codes, magnetic ink characters, logos, and signatures.

The methods listed above are all about extracting expected information from a digital document with the highest accuracy possible.

What else is needed?

When it comes to supporting multiple languages, integration with Natural Language Processing (NLP) services is also needed.

In order to structure and correlate the extracted data and to continue supervised learning, machine learning methods are required as well.

As a whole, the Computer Vision product performs the end-to-end process of extraction, structuring and correlating, and continuous learning, with higher and higher accuracy.

What possible outcomes can Computer Vision generate?

CV can generate a structured set of data in order to trigger further actions within the process. Some examples of possible outcomes:

  • Purchase Order #
  • Document Type/Title
  • Vendor name, address,…
  • Requestor’s name
  • Customer’s personal information (from a driver’s license)
  • Email/ticket category

And many other data points.
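As a simple picture of what "structured data" can look like, the Python sketch below turns a block of already-recognized text into a record with named fields. The sample text, the field labels, and the regular expressions are invented for illustration; a real CV product would rely on trained models rather than fixed patterns.

```python
import re

# Invented sample of text as it might come out of the recognition step.
OCR_TEXT = """\
PURCHASE ORDER
PO Number: PO-10023
Vendor: ACME Ltd
Requestor: Jane Doe
"""

# Hypothetical label patterns, one per output field.
FIELD_PATTERNS = {
    "purchase_order_number": re.compile(r"^PO Number:\s*(\S+)", re.MULTILINE),
    "vendor_name": re.compile(r"^Vendor:\s*(.+)$", re.MULTILINE),
    "requestor_name": re.compile(r"^Requestor:\s*(.+)$", re.MULTILINE),
}


def extract_fields(text):
    """Build a structured record; fields that cannot be found are set to None."""
    lines = text.strip().splitlines()
    record = {"document_type": lines[0].title() if lines else None}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else None
    return record


record = extract_fields(OCR_TEXT)
```

A record like this is what downstream steps can act on, for example to match the purchase order number against an ERP system.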

In summary, Computer Vision capabilities can help in many areas of our lives by extracting useful insight from images and videos. This empowers computers to handle and facilitate most activities on our behalf. It’s only a matter of defining the right input set and expected outcomes. The rest comes down to selecting the right set of technologies to serve your needs.

As this is a large topic, it will continue in further articles covering technology assessments, technical solution descriptions, potential areas of use, and RPA + Computer Vision (intelligent automation).

Stay tuned!
