I have run into a problem while setting up the font_properties file to train the Tesseract 3.01 OCR engine. According to the 3.01 documentation, you are required to set up a font_properties file. The format of the font_properties file is
and 0 or 1 flags must be used to indicate the properties. Does anyone know what fixed, serif, or fraktur means?
When I run it with my font_properties file, it throws the following error
Thank you
3 solutions
#1
1
Input files for Tesseract training should not have spaces in their names.
The entry in font_properties should match the fontname part of the name of the image file; e.g., if font_properties has uknumberplate, then the filename of your image should be eng.uknumberplate.exp0.tif.
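To make the naming rule concrete, here is a minimal Python sketch that checks whether an image filename matches an entry in font_properties. It assumes the `lang.fontname.expN.tif` naming pattern shown in the example above; the helper names (`font_names`, `fontname_from_image`, `image_matches`) are made up for illustration and are not part of Tesseract.

```python
from pathlib import Path


def font_names(font_properties_text):
    """Collect the first field (the font name) of every non-empty line."""
    names = set()
    for line in font_properties_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def fontname_from_image(filename):
    """Return the fontname part of lang.fontname.expN.tif, or None.

    Names containing spaces are rejected, since training input files
    should not have spaces in their names.
    """
    base = Path(filename).name
    parts = base.split(".")
    if " " in base or len(parts) != 4:
        return None
    return parts[1]


def image_matches(filename, font_properties_text):
    """True if the image's fontname appears in the font_properties text."""
    name = fontname_from_image(filename)
    return name is not None and name in font_names(font_properties_text)
```

For example, `image_matches("eng.uknumberplate.exp0.tif", "uknumberplate 0 0 0 0 0")` returns `True`; the all-zero flags in that line are placeholders, not a recommendation for that font.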
#2
1
Fixed (or monospaced), Serif, and Fraktur are standard font descriptors - you can look up what they mean on Wikipedia.
Regarding your error, make sure your font_properties file is formatted correctly, as outlined in the Training Tesseract 3 tutorial below. If you're only training one font, the file should contain one line, which in your case would be
times_new_roman 0 0 0 1 0
You haven't included what you've put in your font_properties file, but note that your font name should not have spaces!
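If it helps, here is a rough Python sketch for sanity-checking a font_properties line before training. It assumes the field order `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, which is how I read the Training Tesseract 3 wiki; the `FontProperties` type and `parse_font_properties_line` function are hypothetical names for this example, not anything Tesseract provides.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    # Assumed field order: fontname italic bold fixed serif fraktur
    name: str
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties_line(line):
    """Parse one font_properties line, raising ValueError if malformed."""
    fields = line.split()
    # A font name with spaces splits into extra fields and is caught here.
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, *flags = fields
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    return FontProperties(name, *(flag == "1" for flag in flags))
```

With the line above, `parse_font_properties_line("times_new_roman 0 0 0 1 0")` gives a record with only `serif` set, while `"Times New Roman 0 0 0 1 0"` raises `ValueError` because the spaces in the name produce too many fields.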
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
#3
0
You have to put font_properties.txt in the command. On Windows an exception is then thrown, even though it finds the font properties file.