A friend and I are interested in training the Tesseract OCR engine for a CV project. We tried using some wrappers such as PyTesser and pyocr, but the results are currently not as accurate as we need them to be. As such, we want to try training Tesseract to perform better for our purposes (i.e. identifying text on food labels), but are having some trouble installing the training tools.
What we've tried:
Looking at the Google Code website, the 'Compiling' page on Tesseract's Google Code wiki says the training tools are only available in version 3.03. However, the Google Code 'Downloads' page for tesseract-ocr only has the materials for 3.02. The bottom of the 'Compiling' page also has some comments about installing version 3.03 on Windows and OS X, but no comments yet for Linux users.
There also appears to be some sort of 3.03 source package for Ubuntu, but we're not sure how to access it on our computers. The 'Compiling' page says we need to run these commands:
make training
sudo make training-install
We've also found a Google Groups thread about Tesseract 3.03, but again it seems these posts do not include advice for Linux users (unless we missed something during the initial read).
Is this actually a really simple command-line install problem? Or is there a way to train Tesseract with 3.02 (which we currently have installed)? Have we been looking in the wrong places for information?
Any advice or links to instructions for installing tesseract-ocr 3.03 for Linux distributions would be greatly appreciated! Thanks.
4 Solutions
#1
26
Tesseract can be installed directly on Ubuntu 14.04 using
sudo apt-get install tesseract-ocr
I don't know whether you can do this on older versions of Ubuntu, because the repo may only have been updated in later versions of Ubuntu.
#2
3
I had an AWS Ubuntu 14.04 instance. When I tried installing Tesseract with
sudo apt-get install tesseract-ocr
it returned a "package not found" error.
But this worked for me.
sudo apt-get update
sudo apt-get install tesseract-ocr
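To confirm the install actually took, something like the following could be run afterwards. This is only a minimal sketch: `check_tool` is a hypothetical helper name I made up for illustration (it is not part of Tesseract or apt), and the version reported will depend on what your Ubuntu release ships.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is on PATH.
# Prints "found: <name>" or "missing: <name>" (and returns 1 when missing).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# After `sudo apt-get update` and `sudo apt-get install tesseract-ocr`:
if check_tool tesseract; then
    # Show which Tesseract version the package installed.
    tesseract --version
fi

# Read-only query: which version the configured repositories offer.
if check_tool apt-cache >/dev/null; then
    apt-cache policy tesseract-ocr
fi
```

If the reported version is older than 3.03, the packaged build may not include the training tools the question asks about.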
#3
2
Ubuntu is a Debian-based Linux distribution. The tesseract package you find will most likely be a Debian package that contains tesseract and the default language files needed to run/train tesseract. You do NOT want the source package, unless you just want to compile it yourself; there is no need. You will not have to build tesseract, you just need to install the package. First, it appears you are new to Ubuntu, so please read InstallingSoftware. It can be as easy as opening up an xterm and issuing the command apt-get install tesseract-pkgname (note: tesseract-pkgname stands for whatever the package name is).
There is no shortcut. Take the time to understand whether you have a .deb package on your box that needs to be installed or whether you are installing from a remote repository. The link above explains how to handle both.
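Roughly, the two cases map to two different commands. Here is a minimal sketch of the distinction; `install_cmd` is a hypothetical helper of my own that only prints the command you would run (it installs nothing), and it assumes a local package file is recognizable by its .deb extension.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the install command for the argument.
#   local .deb file  -> dpkg, then let apt pull in any missing dependencies
#   anything else    -> treat it as a package name in a remote repository
install_cmd() {
    case "$1" in
        *.deb) echo "sudo dpkg -i $1 && sudo apt-get install -f" ;;
        *)     echo "sudo apt-get install $1" ;;
    esac
}

install_cmd tesseract-ocr
install_cmd ./some-package.deb   # placeholder file name
```

The `apt-get install -f` step after `dpkg -i` is there because dpkg does not fetch dependencies on its own.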
Here is a specific Ubuntu thread dealing with installing tesseract: "Tesseract 3.0 + Ubuntu 10.04 Installation Guide". Hope that helps. Tesseract is very good software.
#4
1
I don't have any instructions for building Tesseract 3.03 for Linux specifically (I'm on Mac), but here's a link to download the source code for the 3.03 release candidate: https://tesseract-ocr.googlecode.com/archive/3.03-rc1.tar.gz
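For what it's worth, on Linux I would expect the build to follow the usual autotools flow, ending with the two training commands quoted in the question. The sketch below is untested on my side and rests on several assumptions: the `autogen.sh`/`configure`/`make` steps are the standard sequence rather than something confirmed in this thread, the file and directory names are placeholders, and the build dependencies (Leptonica, plus ICU, Pango and Cairo development packages for the training tools, as far as I know) are assumed to be installed already. It only prints the steps unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each step; set RUN=1 to actually execute them.
URL="https://tesseract-ocr.googlecode.com/archive/3.03-rc1.tar.gz"  # link from this answer
TARBALL="tesseract-3.03-rc1.tar.gz"   # hypothetical local file name
SRC_DIR="tesseract-3.03-src"          # hypothetical directory name

# Echo a command, and run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Download, unpack, build, then build and install the training tools.
# Assumption: the archive unpacks directly into SRC_DIR; if it creates a
# subdirectory, cd into that subdirectory before running autogen.sh.
build_tesseract() {
    step wget -O "$TARBALL" "$URL" &&
    step mkdir -p "$SRC_DIR" &&
    step tar -xzf "$TARBALL" -C "$SRC_DIR" &&
    step cd "$SRC_DIR" &&
    step ./autogen.sh &&
    step ./configure &&
    step make &&
    step sudo make install &&
    step sudo ldconfig &&
    step make training &&
    step sudo make training-install
}
```

Calling `build_tesseract` prints the planned steps without touching the system; `RUN=1 build_tesseract` would execute them and stop at the first failure.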