How do I choose between Tesseract and OpenCV?

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service.

I tried Tesseract on some of my images and its accuracy seems decent. Later, I came across a very simple tutorial on performing OCR with OpenCV in Python and was impressed: within a few minutes I had finished training the system, and its accuracy was good. But of course, taking this approach means I would need to train my system extensively on a large training set.

My specific questions are the following:

1. How does one choose between Tesseract and using OpenCV to build a custom OCR app?
2. There are training datasets available for Tesseract for different languages. Does OpenCV have something similar, so that I don't have to start from the ground up to achieve OCR?
3. Which one is better for a would-be commercial application?

Any suggestions?

Note: I am 24 hours old in the area of Computer Vision but am willing to put in time and effort to learn the pre-requisites.

4 answers

#1

Tesseract is an OCR engine. It's used, worked on and funded by Google specifically to read text from images, perform basic document segmentation and operate on specific image inputs (a single word, line, paragraph, page, limited dictionaries, etc.).

OpenCV, on the other hand, is a computer vision library that includes features that let you perform some feature extraction and data classification. You can create a simple letter segmenter and classifier that performs basic OCR, but it is not a very good OCR engine (I've made one in Python before from scratch. It's really inaccurate for input that deviates from your training data).
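
As a toy illustration of that last point (a sketch under stated assumptions, not the classifier described above and not OpenCV's API), here is a nearest-neighbour classifier over made-up 3x3 glyph bitmaps, compared by Hamming distance. The glyphs and the names `TRAINING`, `hamming` and `classify` are all hypothetical.

```python
# Hypothetical training set: one 3x3 bitmap per letter, flattened row by row.
TRAINING = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
    "T": (1, 1, 1,
          0, 1, 0,
          0, 1, 0),
}


def hamming(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(glyph):
    """Return the label of the training bitmap closest to `glyph`."""
    return min(TRAINING, key=lambda label: hamming(TRAINING[label], glyph))


# An "I" drawn one pixel to the left of where the training "I" sits.
shifted_i = (1, 0, 0,
             1, 0, 0,
             1, 0, 0)

print(classify(TRAINING["T"]))  # T
print(classify(shifted_i))      # L
```

The clean "T" is recognized, but the shifted "I" differs from "L" in only two pixels and from the trained "I" in six, so it is labelled "L". A classifier this naive only works on input that looks like its training data.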

If you want to get a basic understanding of how hard OCR is, try OpenCV. Tesseract is for real OCR.

#2

I am the author of the digit recognition tutorial you mentioned, and I would say it is in no way a substitute for Tesseract.

Tesseract is a really good OCR engine, maybe the best open-source OCR engine.

The tutorial you mentioned is just an attempt to understand the simplest workings of OCR.

So, if you are building an OCR app, I would recommend using OpenCV to preprocess the image and then applying the Tesseract engine.
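
A minimal sketch of that preprocess-then-recognize pipeline, assuming the `opencv-python` and `pytesseract` packages plus a local Tesseract install. None of these are named in the answer, and the helper names `tesseract_config` and `ocr_image` are hypothetical. The `--psm` values are Tesseract's page segmentation modes for a uniform block of text (6), a single line (7) and a single word (8).

```python
# Page segmentation modes accepted by Tesseract's --psm option.
PSM = {"block": 6, "line": 7, "word": 8}


def tesseract_config(layout="block"):
    """Build the config string telling Tesseract what kind of input to expect."""
    return f"--psm {PSM[layout]}"


def ocr_image(path, layout="block", lang="eng"):
    """Binarize an image with OpenCV, then recognize its text with Tesseract."""
    # Imported here so the module still loads without the optional dependencies.
    import cv2          # assumption: pip install opencv-python
    import pytesseract  # assumption: pip install pytesseract, plus the tesseract binary

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically; the result is black/white.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    return pytesseract.image_to_string(
        binary, lang=lang, config=tesseract_config(layout)
    )
```

Calling `ocr_image("scan.png", layout="line")` (file name hypothetical) would binarize the image and ask Tesseract to treat it as one line of text. Real images usually need more preprocessing than a single threshold, such as deskewing or noise removal.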

#3

The two can be complementary. See the paper on Tesseract: http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf

It highlights that "Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined."

This type of task (binarizing the image and locating its text regions) can be performed by OpenCV, with the resulting image handed off to Tesseract. You can find sample code of this kind in the Git repo (https://github.com/Itseez/opencv_contrib/tree/master/modules/text/samples). The samples use the Tesseract API for the image-to-text conversion.
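
As a rough idea of what locating text regions means, here is a dependency-free toy sketch rather than OpenCV code. It scans an already-binarized image for horizontal bands that contain ink, which is about the simplest possible form of layout analysis. The function name `find_text_rows` and the sample `page` are invented for illustration; a real pipeline would run OpenCV routines on actual images and crop each band before handing it to Tesseract.

```python
def find_text_rows(binary):
    """Return (start, end) row ranges that contain at least one ink pixel.

    `binary` is a list of rows; 1 marks ink (text) and 0 marks background.
    `end` is exclusive, so binary[start:end] is one band of text.
    """
    bands = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                 # a band of text begins here
        elif not has_ink and start is not None:
            bands.append((start, y))  # the band ended on the previous row
            start = None
    if start is not None:             # text runs to the bottom edge
        bands.append((start, len(binary)))
    return bands


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]

print(find_text_rows(page))  # [(1, 3), (4, 5)]
```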

#4

OpenCV is a library for CV, used to analyze and process images in general. Tesseract is a library for OCR, which is a specialized subset of CV that's dedicated to extracting text from images.

From OpenCV.org

"... used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc."

From the Tesseract GitHub page:

"... can be used directly, or (for programmers) using an API to extract typed, handwritten or printed text from images. It supports a wide variety of languages."
