How AI Turns Pictures into Words: The Fun Science Behind OCR

Posted on July 4, 2024 by Lalit Kumar

Intro – What is OCR? How does it relate to AI?

Technology has come a long way since the inception of the digital age. Today, we have things like personal assistants that can talk to us and provide information in real-time.

Automation is the name of the game. Today, there are semi-automated factories with robot arms and assembly lines that manufacture goods with little human input.

OCR (Optical Character Recognition) helps automate many data entry and text recognition tasks. Common uses include traffic monitoring systems and document processing in airports, government offices, and banks.

OCR is a product of artificial intelligence. In this article, we are going to explore how OCR works and what role AI plays in it.

How Does AI Turn Pictures Into Words

The simplest definition of OCR is recognizing words and letters inside an image and converting them to a format that word processors can use. Without OCR, a computer cannot tell that an image contains text because, to it, the image is just a grid of pixels.

OCR technology enables computers to understand that specific configurations of pixels are actually letters and characters, i.e., text. Let’s see how that happens.

Image Input

The first step in optical character recognition is to obtain an image that has some text in it. There are several simple ways to do this.

Attach a camera to the computer and use it to snap a picture of some text.
Use your phone’s camera to snap a picture and transfer it to your computer via cloud, USB cable, Bluetooth, or email.
Use a scanner to scan a document and save it in image form.

Once the image is on the computer, make sure that it is stored somewhere easy to find and has a recognizable name. Then, when you open the OCR program, you can easily find the image and load it.

Preprocessing of Image

The preprocessing step of OCR deals with treating the image to make the actual text recognition easier.

Images can have a variety of problems depending on the method of input as well as the quality of the cameras. Some of these problems include:

Noise. The presence of unwanted artifacts in an image.
Skewed Image. The image itself, or the text inside it, is at an angle and not horizontally aligned.
Blurry Image. The characters in the image don’t have sharp outlines, which makes them difficult to recognize.

Preprocessing is the step where these issues are corrected as far as possible. Unwanted artifacts are removed, the image is deskewed, and the image is sharpened to reduce blur.
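To make the noise-removal part concrete, here is a minimal sketch in plain Python. It assumes the image is a small grayscale grid (a list of rows, 0 = black, 255 = white) and uses a 3x3 median filter, which is one common way to remove isolated specks. The function name and the sample image are made up for illustration; real OCR tools may use different filters.

```python
from statistics import median

def median_filter(image):
    """Replace each pixel with the median of its 3x3 neighborhood.

    Pixels on the border simply use the neighbors that exist.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(neighbors)))
        result.append(row)
    return result

# A white patch with a single black speck of noise in it.
noisy = [
    [255, 255, 255, 255],
    [255,   0, 255, 255],
    [255, 255, 255, 255],
]
cleaned = median_filter(noisy)
print(cleaned)  # the speck is gone: every pixel is 255
```

A median filter suits this kind of "salt and pepper" noise because a single odd pixel never becomes the median of its neighborhood.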

Additionally, one more important thing happens here: binarization. This is a technique where the image is turned into two colors only (often black and white). The aim is to make the characters stand out against the background and make it easier to recognize them.
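Binarization is easy to sketch in code. The example below assumes a grayscale image stored as a list of rows and a fixed cutoff of 128, which is an arbitrary choice for illustration. Real systems often pick the threshold automatically from the image (Otsu's method is one well-known option).

```python
def binarize(image, threshold=128):
    """Turn each grayscale pixel (0-255) into pure black (0) or pure white (255)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]

# Light background with a few dark "ink" pixels.
gray = [
    [200, 190,  60, 210],
    [ 40,  50, 220,  30],
]
binary = binarize(gray)
print(binary)  # [[255, 255, 0, 255], [0, 0, 255, 0]]
```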

These are the typical operations in the preprocessing stage.

Application of Text Recognition Techniques (Two Basic Techniques)

Text recognition is the step where the real stuff happens. This is where the “AI” kicks into gear and recognizes the text. You can make this process as simple or as convoluted as you want.

The simplest technique is pattern matching, where the OCR engine compares the shapes it finds in the image with known ones.

This requires a database of glyphs (letters, digits, and symbols). The OCR engine compares each pattern in the image with those in the glyph database. The closest match is considered the correct choice.

This approach works well with printed documents and digital fonts as they are very uniform and easy to recognize.
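Here is a toy version of pattern matching. The three-letter glyph database, the 3x5 bitmaps, and the function names are all invented for this sketch. "Closest match" is taken to mean "agrees on the most pixels", which is only one of several possible similarity measures.

```python
# Hypothetical glyph database: each glyph is a 3x5 bitmap, '1' = ink.
GLYPHS = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def similarity(a, b):
    """Count the pixels on which two same-sized bitmaps agree."""
    return sum(
        pa == pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )

def match_glyph(bitmap, glyphs=GLYPHS):
    """Return the character whose stored glyph agrees with the bitmap on the most pixels."""
    return max(glyphs, key=lambda ch: similarity(bitmap, glyphs[ch]))

# A scanned "T" with one stray ink pixel in the second row.
scanned = ["111", "011", "010", "010", "010"]
print(match_glyph(scanned))  # T
```

Because the comparison is pixel by pixel, this only works when the scanned character has the same size and shape as the stored glyph, which is why the approach suits uniform printed fonts.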

A more complex approach is feature extraction. This requires more sophisticated applications of AI. Instead of simply checking for matching patterns, the OCR engine uses “rules.” These rules check for “features” of a character. If all “features” of a particular letter, number, or sign are present, then the character is recognized as such.

To get a better idea of “features,” here is an example. The features of a capital “H” are two parallel vertical lines and a horizontal line joining their midpoints.

As long as these features are present in an unrecognized character, the OCR engine will recognize it as a capital H. Many online OCR tools, such as an Image to text converter, use feature extraction. That’s why they can often recognize handwriting and “graffiti” as well.
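The capital H example can be written as a few simple rules. This is a hand-made sketch, not how any particular OCR engine does it: the bitmap format, the 80% "mostly inked" cutoff, and the extra rule that the top and bottom rows must be open are all assumptions added so the check does not also accept a filled box.

```python
def column(bitmap, x):
    """Return the pixels of column x, top to bottom."""
    return [row[x] for row in bitmap]

def is_filled(pixels, min_ratio=0.8):
    """True when at least min_ratio of the pixels are ink ('1')."""
    return pixels.count("1") / len(pixels) >= min_ratio

def looks_like_capital_h(bitmap):
    """Check the features of a capital H on a bitmap of '0'/'1' strings."""
    height, width = len(bitmap), len(bitmap[0])
    left_stroke = is_filled(column(bitmap, 0))
    right_stroke = is_filled(column(bitmap, width - 1))
    # The crossbar must sit in the middle third of the character.
    middle_rows = bitmap[height // 3 : height - height // 3]
    crossbar = any(is_filled(row) for row in middle_rows)
    # Assumed extra rule: an H is open at the top and the bottom.
    open_ends = not is_filled(bitmap[0]) and not is_filled(bitmap[-1])
    return left_stroke and right_stroke and crossbar and open_ends

narrow_h = ["101", "101", "111", "101", "101"]
wide_h = ["10001", "10001", "11111", "10001", "10001"]
letter_u = ["101", "101", "101", "101", "111"]

print(looks_like_capital_h(narrow_h))  # True
print(looks_like_capital_h(wide_h))    # True
print(looks_like_capital_h(letter_u))  # False
```

Note that the narrow and the wide H both pass even though their bitmaps have different sizes. Rules about features do not care about exact pixel positions, which is what makes this approach more flexible than pattern matching.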

Post Processing of Text

Once text recognition is over, the OCR engine has to check whether the recognized characters form sensible words and sentences. OCR has come far, but it is not yet perfect. Even the most advanced systems do not achieve 100% accuracy, so there are occasional cases of misrecognition.

To address this, the post-processing step is added. This uses natural language processing to make sense of the output. If the output is sensible, then nothing is done, and if there are issues, they are rectified.
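A very reduced version of this idea is a dictionary check: keep words that are known, and swap unknown words for the closest known one. The sketch below uses `difflib.get_close_matches` from Python's standard library. The tiny vocabulary and the 0.8 similarity cutoff are assumptions for illustration, and a real post-processor would use far richer language models than a word list.

```python
import difflib

# Hypothetical vocabulary; a real system would use a full dictionary.
VOCABULARY = ["optical", "character", "recognition", "turns", "pictures", "into", "words"]

def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Keep known words, replace near-misses with the closest known word."""
    if word.lower() in vocabulary:
        return word
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    # If nothing is similar enough, leave the word as the OCR engine produced it.
    return matches[0] if matches else word

def correct_text(text):
    """Apply correct_word to every whitespace-separated word."""
    return " ".join(correct_word(word) for word in text.split())

# Typical OCR slips: "0" read instead of "o", "c" read instead of "e".
print(correct_text("0ptical charactcr recognition"))  # optical character recognition
```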

Finally, the output is encoded as ASCII or Unicode text and saved in DOC, PDF, TXT, or another document format. This is the output that users see when they use an OCR tool.

AI Technologies Used for OCR

We have discussed how OCR works from a high-level point of view. Now, let us discuss the various AI technologies that are working in the background.

NLP

NLP stands for natural language processing. This is an application of AI that enables computer systems to understand natural (human) language. A system using NLP can not only understand human language but also produce it, in writing or in speech (as in text-to-speech systems).

In OCR, we have seen that NLP is used in post-processing to make sure that the converted text is correct.

Computer Vision

Computer vision is the branch of artificial intelligence that enables computers to look at images and understand that different pixel configurations represent real objects and characters.

Before advancements in computer vision, a computer treated images as nothing more than 2D grids of pixel values. Now, systems can recognize faces, animals, objects, number plates, trees, and many of the other things a human can.

Without computer vision, OCR would be impossible as the OCR system would not be able to recognize the text in the image.

Machine Learning

Machine learning is an advanced branch of artificial intelligence. It enables computers to learn from existing data and recognize patterns/draw conclusions from new data without explicit programming.

Machine learning covers many different techniques. Deep learning, including convolutional neural networks, is among the best known.

OCR systems require machine learning because it enables them to recognize patterns (characters) in new data (your provided image).

Without machine learning, text recognition would be limited to fixed templates and hand-written rules, which struggle with unfamiliar fonts and handwriting.
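To see what "learning from existing data" means in the smallest possible case, here is a sketch of a nearest-centroid classifier. It is a deliberately simple stand-in for the neural networks mentioned above, not what production OCR uses. It averages a few labeled example bitmaps per character and then labels a new bitmap by the closest average. The training examples and function names are invented.

```python
def train_centroids(examples):
    """Average the pixel vectors of each label into one learned prototype."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(pixels, centroids):
    """Return the label whose prototype is closest (squared distance)."""
    def distance(label):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroids[label]))
    return min(centroids, key=distance)

# Flattened 3x3 bitmaps (row by row), 1 = ink. Hypothetical training data.
training = [
    ([0, 1, 0,  0, 1, 0,  0, 1, 0], "I"),
    ([0, 1, 0,  0, 1, 0,  0, 1, 1], "I"),  # slightly noisy example
    ([1, 0, 0,  1, 0, 0,  1, 1, 1], "L"),
    ([1, 0, 0,  1, 0, 0,  1, 1, 0], "L"),  # slightly noisy example
]
centroids = train_centroids(training)

# A bitmap that does not appear in the training data.
unseen = [0, 1, 0,  1, 1, 0,  0, 1, 0]
print(predict(unseen, centroids))  # I
```

Nobody wrote a rule saying what an "I" looks like here. The prototypes come from the examples, so adding more examples changes what the program recognizes without changing its code.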

Conclusion

In this way, different AI technologies come together to enable OCR. Now you know the fun science behind OCR and why AI is so important for its function.