Image processing to improve tesseract OCR accuracy

Time: 2022-01-13 23:27:47

I've been using tesseract to convert documents into text. The quality of the documents varies wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that highly pixellated text, for example text generated by fax machines, is especially difficult for tesseract to process; presumably all those jagged edges on the characters confound the shape-recognition algorithms.

What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and have seen some small improvement, but I'm hoping there is a more specific technique that would yield better results: say, a filter tuned to black-and-white images that smooths out irregular edges, followed by a filter that increases the contrast to make the characters more distinct.

Any general tips for someone who is a novice at image processing?

10 answers

#1


68  

  1. Fix the DPI if needed; 300 DPI is the minimum.
  2. Fix the text size (e.g. 12 pt should be OK).
  3. Try to fix the text lines (deskew and dewarp the text).
  4. Try to fix the illumination of the image (e.g. no dark parts in the image).
  5. Binarize and de-noise the image.
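
As a rough illustration of the binarize step in the list above, here is a minimal sketch of global Otsu thresholding in plain Java. It is only one possible way to binarize and is not what TEXTCLEANER or Tesseract do internally. The class `OtsuBinarizer` and its method names are made up for this example, and the input is assumed to be an 8-bit grayscale `BufferedImage` (`TYPE_BYTE_GRAY`).

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper (name invented for this sketch): global Otsu thresholding. */
final class OtsuBinarizer {

    /** Returns the threshold t that maximizes the between-class variance of {<= t} and {> t}. */
    static int otsuThreshold(BufferedImage gray) {
        WritableRaster r = gray.getRaster();
        int w = gray.getWidth(), h = gray.getHeight();
        long[] hist = new long[256];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                hist[r.getSample(x, y, 0)]++;

        long total = (long) w * h;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += i * (double) hist[i];

        double sumLow = 0, bestVar = -1;
        long countLow = 0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            countLow += hist[t];
            if (countLow == 0) continue;          // no pixels in the dark class yet
            long countHigh = total - countLow;
            if (countHigh == 0) break;            // no pixels left in the bright class
            sumLow += t * (double) hist[t];
            double meanLow = sumLow / countLow;
            double meanHigh = (sumAll - sumLow) / countHigh;
            double between = (double) countLow * countHigh
                    * (meanLow - meanHigh) * (meanLow - meanHigh);
            if (between > bestVar) { bestVar = between; best = t; }
        }
        return best;
    }

    /** Pixels <= threshold become black (0), all others white (255). */
    static BufferedImage binarize(BufferedImage gray) {
        int t = otsuThreshold(gray);
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster src = gray.getRaster(), dst = out.getRaster();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dst.setSample(x, y, 0, src.getSample(x, y, 0) <= t ? 0 : 255);
        return out;
    }
}
```

A global threshold like this tends to work only when the illumination is fairly even, which is why the illumination step comes before binarization in the list.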

There is no universal command line that fits all cases (sometimes you need to blur and sharpen the image). But you can try TEXTCLEANER from Fred's ImageMagick Scripts.

If you are not a fan of the command line, you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.

#2


58  

I am by no means an OCR expert, but this week I needed to extract text from a JPG.

I started with a color RGB JPG of 445x747 pixels. I immediately tried tesseract on it, and the program converted almost nothing. I then went into GIMP and did the following:

  1. Image > Mode > Grayscale
  2. Image > Scale Image > 1191x2000 pixels
  3. Filters > Enhance > Unsharp Mask with radius = 6.8, amount = 2.69, threshold = 0

I then saved the result as a new JPG at 100% quality.

Tesseract was then able to extract all the text into a .txt file.
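
For anyone who wants to script the sharpening step instead of using GIMP, the idea behind an unsharp mask can be sketched in plain Java. This is a rough approximation and not GIMP's implementation: it blurs a grayscale image with a Gaussian and adds back the scaled difference. The class `UnsharpMask`, the `sigma` parameter, and the border clamping are assumptions made for this sketch; in particular, `sigma` is not claimed to be equivalent to GIMP's radius setting.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper (name invented for this sketch): unsharp mask on 8-bit grayscale. */
final class UnsharpMask {

    /** Normalized 1-D Gaussian kernel covering about +/- 3 sigma. */
    static double[] gaussianKernel(double sigma) {
        int half = Math.max(1, (int) Math.ceil(3 * sigma));
        double[] k = new double[2 * half + 1];
        double sum = 0;
        for (int i = -half; i <= half; i++) {
            k[i + half] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += k[i + half];
        }
        for (int i = 0; i < k.length; i++) k[i] /= sum;
        return k;
    }

    /** Separable Gaussian blur; coordinates outside the image are clamped to the border. */
    static double[][] blur(double[][] px, double sigma) {
        int h = px.length, w = px[0].length;
        double[] k = gaussianKernel(sigma);
        int half = k.length / 2;
        double[][] tmp = new double[h][w], out = new double[h][w];
        for (int y = 0; y < h; y++)            // horizontal pass
            for (int x = 0; x < w; x++) {
                double s = 0;
                for (int i = -half; i <= half; i++)
                    s += k[i + half] * px[y][Math.min(w - 1, Math.max(0, x + i))];
                tmp[y][x] = s;
            }
        for (int y = 0; y < h; y++)            // vertical pass
            for (int x = 0; x < w; x++) {
                double s = 0;
                for (int i = -half; i <= half; i++)
                    s += k[i + half] * tmp[Math.min(h - 1, Math.max(0, y + i))][x];
                out[y][x] = s;
            }
        return out;
    }

    /** sharpened = original + amount * (original - blurred), clamped to 0..255. */
    static BufferedImage apply(BufferedImage gray, double sigma, double amount, int threshold) {
        int w = gray.getWidth(), h = gray.getHeight();
        WritableRaster src = gray.getRaster();
        double[][] px = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                px[y][x] = src.getSample(x, y, 0);
        double[][] blurred = blur(px, sigma);

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double diff = px[y][x] - blurred[y][x];
                // Only sharpen where the local difference reaches the threshold
                double v = Math.abs(diff) >= threshold ? px[y][x] + amount * diff : px[y][x];
                dst.setSample(x, y, 0, (int) Math.max(0, Math.min(255, Math.round(v))));
            }
        return out;
    }
}
```

A call such as `UnsharpMask.apply(gray, 1.0, 2.69, 0)` leaves flat regions unchanged and pushes the pixels on either side of an edge further apart, which is the effect that makes character outlines crisper.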

Gimp is your friend.

#3


24  

Three points to improve the readability of the image:

  1. Resize the image, varying the height and width (multiply the image height and width by 0.5, 1, and 2).
  2. Convert the image to grayscale format (black and white).
  3. Remove the noise pixels and make the image clearer (filter the image).

Refer to the code below:

// Resize (bilinear interpolation), then convert to grayscale and remove noise
public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
{
    Bitmap temp = (Bitmap)bmp;
    Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);

    double nWidthFactor = (double)temp.Width / (double)newWidth;
    double nHeightFactor = (double)temp.Height / (double)newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    Color color1 = new Color();
    Color color2 = new Color();
    Color color3 = new Color();
    Color color4 = new Color();
    byte nRed, nGreen, nBlue;
    byte bp1, bp2;

    for (int x = 0; x < bmap.Width; ++x)
    {
        for (int y = 0; y < bmap.Height; ++y)
        {
            // Source pixel and its right/bottom neighbours, clamped at the border
            fr_x = (int)Math.Floor(x * nWidthFactor);
            fr_y = (int)Math.Floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= temp.Width) cx = fr_x;
            cy = fr_y + 1;
            if (cy >= temp.Height) cy = fr_y;

            // Interpolation weights
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = temp.GetPixel(fr_x, fr_y);
            color2 = temp.GetPixel(cx, fr_y);
            color3 = temp.GetPixel(fr_x, cy);
            color4 = temp.GetPixel(cx, cy);

            // Blue
            bp1 = (byte)(nx * color1.B + fx * color2.B);
            bp2 = (byte)(nx * color3.B + fx * color4.B);
            nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            // Green
            bp1 = (byte)(nx * color1.G + fx * color2.G);
            bp2 = (byte)(nx * color3.G + fx * color4.G);
            nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            // Red
            bp1 = (byte)(nx * color1.R + fx * color2.R);
            bp2 = (byte)(nx * color3.R + fx * color4.R);
            nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

            bmap.SetPixel(x, y, System.Drawing.Color.FromArgb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = SetGrayscale(bmap);
    bmap = RemoveNoise(bmap);

    return bmap;
}

// SetGrayscale: weighted luminance of the R, G, and B channels
public Bitmap SetGrayscale(Bitmap img)
{
    Bitmap temp = (Bitmap)img;
    Bitmap bmap = (Bitmap)temp.Clone();
    Color c;
    for (int i = 0; i < bmap.Width; i++)
    {
        for (int j = 0; j < bmap.Height; j++)
        {
            c = bmap.GetPixel(i, j);
            byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);

            bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
        }
    }
    return (Bitmap)bmap.Clone();
}

// RemoveNoise: push pixels darker than 162 to black and lighter than 162 to white
public Bitmap RemoveNoise(Bitmap bmap)
{
    for (var x = 0; x < bmap.Width; x++)
    {
        for (var y = 0; y < bmap.Height; y++)
        {
            var pixel = bmap.GetPixel(x, y);
            if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
                bmap.SetPixel(x, y, Color.Black);
            else if (pixel.R > 162 && pixel.G > 162 && pixel.B > 162)
                bmap.SetPixel(x, y, Color.White);
        }
    }

    return bmap;
}

INPUT IMAGE: [image not included]

OUTPUT IMAGE: [image not included]

#4


16  

This question is a while old, but this might still be useful.

My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.

Try different modes of interpolation. The post https://*.com/a/4756906/146003 helped me a lot.
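
To make the interpolation suggestion concrete, here is a minimal Java sketch that rescales an image in memory with a selectable interpolation mode. It uses the standard `Graphics2D` rendering hints; the class `Rescaler` and its `scale` method are names invented for this example, and the output type `TYPE_INT_RGB` is an arbitrary choice.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper (name invented for this sketch): in-memory rescaling. */
final class Rescaler {

    /**
     * Scales src by the given factor. interpolationHint is one of
     * RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR,
     * RenderingHints.VALUE_INTERPOLATION_BILINEAR or
     * RenderingHints.VALUE_INTERPOLATION_BICUBIC.
     */
    static BufferedImage scale(BufferedImage src, double factor, Object interpolationHint) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, interpolationHint);
            g.drawImage(src, 0, 0, w, h, null);   // draws src stretched to w x h
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Running the same page through `scale` with each of the three hints and comparing the OCR output is a cheap way to find out which mode suits a given kind of document.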

#5


13  

What was EXTREMELY HELPFUL to me along the way was the source code of the Capture2Text project: http://sourceforge.net/projects/capture2text/files/Capture2Text/.

BTW: Kudos to its author for sharing such a painstaking algorithm.

Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c; that's the essence of the image preprocessing for this utility.

If you run the binaries, you can check the image before and after the transformation in the Capture2Text\Output\ folder.

P.S. The mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.

#6


12  

Java (Android) version of Sathyaraj's code above:

// Resize with bilinear interpolation, then convert to grayscale and remove noise.
// Uses android.graphics.Bitmap and android.graphics.Color.
public Bitmap resize(Bitmap img, int newWidth, int newHeight) {
    // Allocate the output at the target size (copying img would keep the old size)
    Bitmap bmap = Bitmap.createBitmap(newWidth, newHeight, Bitmap.Config.ARGB_8888);

    double nWidthFactor = (double) img.getWidth() / (double) newWidth;
    double nHeightFactor = (double) img.getHeight() / (double) newHeight;

    double fx, fy, nx, ny;
    int cx, cy, fr_x, fr_y;
    int color1;
    int color2;
    int color3;
    int color4;
    // int rather than byte: Java bytes are signed, so channel values above 127 would wrap
    int nRed, nGreen, nBlue;

    int bp1, bp2;

    for (int x = 0; x < bmap.getWidth(); ++x) {
        for (int y = 0; y < bmap.getHeight(); ++y) {

            // Source pixel and its right/bottom neighbours, clamped at the border
            fr_x = (int) Math.floor(x * nWidthFactor);
            fr_y = (int) Math.floor(y * nHeightFactor);
            cx = fr_x + 1;
            if (cx >= img.getWidth())
                cx = fr_x;
            cy = fr_y + 1;
            if (cy >= img.getHeight())
                cy = fr_y;
            fx = x * nWidthFactor - fr_x;
            fy = y * nHeightFactor - fr_y;
            nx = 1.0 - fx;
            ny = 1.0 - fy;

            color1 = img.getPixel(fr_x, fr_y);
            color2 = img.getPixel(cx, fr_y);
            color3 = img.getPixel(fr_x, cy);
            color4 = img.getPixel(cx, cy);

            // Blue
            bp1 = (int) (nx * Color.blue(color1) + fx * Color.blue(color2));
            bp2 = (int) (nx * Color.blue(color3) + fx * Color.blue(color4));
            nBlue = (int) (ny * bp1 + fy * bp2);

            // Green
            bp1 = (int) (nx * Color.green(color1) + fx * Color.green(color2));
            bp2 = (int) (nx * Color.green(color3) + fx * Color.green(color4));
            nGreen = (int) (ny * bp1 + fy * bp2);

            // Red
            bp1 = (int) (nx * Color.red(color1) + fx * Color.red(color2));
            bp2 = (int) (nx * Color.red(color3) + fx * Color.red(color4));
            nRed = (int) (ny * bp1 + fy * bp2);

            bmap.setPixel(x, y, Color.argb(255, nRed, nGreen, nBlue));
        }
    }

    bmap = setGrayscale(bmap);
    bmap = removeNoise(bmap);

    return bmap;
}

// SetGrayscale: weighted luminance of the R, G, and B channels
private Bitmap setGrayscale(Bitmap img) {
    Bitmap bmap = img.copy(Bitmap.Config.ARGB_8888, true);
    int c;
    for (int i = 0; i < bmap.getWidth(); i++) {
        for (int j = 0; j < bmap.getHeight(); j++) {
            c = bmap.getPixel(i, j);
            int gray = (int) (.299 * Color.red(c) + .587 * Color.green(c)
                    + .114 * Color.blue(c));

            bmap.setPixel(i, j, Color.argb(255, gray, gray, gray));
        }
    }
    return bmap;
}

// RemoveNoise: push pixels darker than 162 to black and lighter than 162 to white
private Bitmap removeNoise(Bitmap bmap) {
    for (int x = 0; x < bmap.getWidth(); x++) {
        for (int y = 0; y < bmap.getHeight(); y++) {
            int pixel = bmap.getPixel(x, y);
            if (Color.red(pixel) < 162 && Color.green(pixel) < 162 && Color.blue(pixel) < 162) {
                bmap.setPixel(x, y, Color.BLACK);
            } else if (Color.red(pixel) > 162 && Color.green(pixel) > 162 && Color.blue(pixel) > 162) {
                bmap.setPixel(x, y, Color.WHITE);
            }
        }
    }
    return bmap;
}

#7


6  

Adaptive thresholding is important if the lighting is uneven across the image. My preprocessing using GraphicsMagick is mentioned in this post: https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4

GraphicsMagick also has the -lat option for local adaptive thresholding, which I will try soon.

Another method of thresholding using OpenCV is described here: http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html
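
To show what adaptive thresholding does without depending on either tool, here is a minimal Java sketch of a local-mean threshold computed with an integral image. It is one simple variant of the idea and is not the algorithm used by GraphicsMagick's -lat or by OpenCV. The class `AdaptiveThreshold` and the parameters `radius` and `offset` are assumptions for this example, and the input is assumed to be 8-bit grayscale (`TYPE_BYTE_GRAY`).

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper (name invented for this sketch): local-mean adaptive threshold. */
final class AdaptiveThreshold {

    /**
     * A pixel becomes black (0) when it is more than `offset` below the mean of the
     * (2*radius+1) x (2*radius+1) window around it (clipped at the borders), else white (255).
     */
    static BufferedImage apply(BufferedImage gray, int radius, int offset) {
        int w = gray.getWidth(), h = gray.getHeight();
        WritableRaster src = gray.getRaster();

        // integral[y][x] = sum of all pixels above and to the left of (x, y), exclusive
        long[][] integral = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                rowSum += src.getSample(x, y, 0);
                integral[y + 1][x + 1] = integral[y][x + 1] + rowSum;
            }
        }

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - radius), y0 = Math.max(0, y - radius);
                int x1 = Math.min(w - 1, x + radius), y1 = Math.min(h - 1, y + radius);
                long sum = integral[y1 + 1][x1 + 1] - integral[y0][x1 + 1]
                         - integral[y1 + 1][x0] + integral[y0][x0];
                long area = (long) (x1 - x0 + 1) * (y1 - y0 + 1);
                long v = src.getSample(x, y, 0);
                // v < mean - offset, rewritten without division
                boolean dark = v * area < sum - (long) offset * area;
                dst.setSample(x, y, 0, dark ? 0 : 255);
            }
        }
        return out;
    }
}
```

Because each pixel is compared with its own neighbourhood, text on the dim side of a page and text on the bright side can both come out black, which a single global threshold cannot achieve.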

#8


5  

The Tesseract documentation contains some good details on how to improve the OCR quality via image processing steps.

To some degree, Tesseract automatically applies them. It is also possible to tell Tesseract to write an intermediate image for inspection, i.e. to check how well the internal image processing works (search for tessedit_write_images in the above reference).

More importantly, the new neural network system in Tesseract 4 yields much better OCR results - in general and especially for images with some noise. It is enabled with --oem 1, e.g. as in:

$ tesseract --oem 1 -l deu page.png result pdf

(This example selects the German language.)
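
If the OCR step is driven from a program rather than a shell, the same command line can be assembled and launched with `ProcessBuilder`. The sketch below is only an illustration: `TesseractRunner`, `buildCommand`, and `run` are invented names, and `run` assumes a `tesseract` binary is available on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (name invented for this sketch): runs the tesseract CLI. */
final class TesseractRunner {

    /** Builds: tesseract --oem 1 -l <lang> <input> <outputBase> [pdf] */
    static List<String> buildCommand(String input, String outputBase, String lang, boolean pdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add("--oem");
        cmd.add("1");            // LSTM engine, as in the command above
        cmd.add("-l");
        cmd.add(lang);
        cmd.add(input);
        cmd.add(outputBase);
        if (pdf) cmd.add("pdf"); // output format, as in the command above
        return cmd;
    }

    /** Runs the command, forwarding its output to this process, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }
}
```

For example, `TesseractRunner.run(TesseractRunner.buildCommand("page.png", "result", "deu", true))` would issue the same command as shown above.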

Thus, it makes sense to first test how far you get with the new Tesseract LSTM mode before applying custom image pre-processing steps.

(As of late 2017, Tesseract 4 isn't released as stable yet, but the development version is usable.)

#9


2  

I did the following to get good results from an image whose text is not very small.

  1. Apply blur to the original image.
  2. Apply an adaptive threshold.
  3. Apply a sharpening effect.

And if you are still not getting good results, scale the image to 150% or 200%.
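
The blur and sharpen steps above are both small neighbourhood filters, so they can be sketched with one 3x3 helper in Java. This is an assumption-laden illustration, not the exact filters used in this answer: the class `Convolve3x3` and the two kernels (a box blur and a common sharpening kernel) are choices made for this example, the thresholding step is not shown, and the input is assumed to be 8-bit grayscale (`TYPE_BYTE_GRAY`).

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper (name invented for this sketch): 3x3 filtering of a grayscale image. */
final class Convolve3x3 {

    /** Averages each pixel with its eight neighbours. */
    static final double[][] BOX_BLUR = {
        {1 / 9.0, 1 / 9.0, 1 / 9.0},
        {1 / 9.0, 1 / 9.0, 1 / 9.0},
        {1 / 9.0, 1 / 9.0, 1 / 9.0}
    };

    /** Boosts the centre pixel against its four direct neighbours. */
    static final double[][] SHARPEN = {
        { 0, -1,  0},
        {-1,  5, -1},
        { 0, -1,  0}
    };

    /**
     * Applies a 3x3 kernel; coordinates outside the image are clamped to the border.
     * The kernel is applied without flipping, which makes no difference for the
     * symmetric kernels defined above.
     */
    static BufferedImage apply(BufferedImage gray, double[][] kernel) {
        int w = gray.getWidth(), h = gray.getHeight();
        WritableRaster src = gray.getRaster();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double sum = 0;
                for (int ky = -1; ky <= 1; ky++) {
                    for (int kx = -1; kx <= 1; kx++) {
                        int sx = Math.min(w - 1, Math.max(0, x + kx));
                        int sy = Math.min(h - 1, Math.max(0, y + ky));
                        sum += kernel[ky + 1][kx + 1] * src.getSample(sx, sy, 0);
                    }
                }
                // Clamp the rounded result to the valid 0..255 range
                dst.setSample(x, y, 0, (int) Math.max(0, Math.min(255, Math.round(sum))));
            }
        }
        return out;
    }
}
```

Usage would be along the lines of `Convolve3x3.apply(img, Convolve3x3.BOX_BLUR)` before thresholding and `Convolve3x3.apply(img, Convolve3x3.SHARPEN)` afterwards.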

#10


2  

Reading text from image documents with any OCR engine raises many issues that affect accuracy. There is no fixed solution for all cases, but here are a few things that should be considered to improve OCR results.

1) Presence of noise due to poor image quality or unwanted elements/blobs in the background region. This requires pre-processing operations such as noise removal, which can easily be done using a Gaussian filter or a normal median filter. These are also available in OpenCV.

2) Wrong orientation of the image: because of the wrong orientation, the OCR engine fails to segment the lines and words in the image correctly, which gives very poor accuracy.

3) Presence of lines: while doing word or line segmentation, the OCR engine sometimes also tries to merge the words and lines together, thus processing the wrong content and giving wrong results. There are other issues too, but these are the basic ones.
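
For the noise-removal point in 1), a median filter is simple enough to sketch directly. The Java below is a minimal 3x3 version written for illustration; it is not OpenCV's implementation, the class `MedianFilter` is an invented name, and the input is assumed to be 8-bit grayscale (`TYPE_BYTE_GRAY`).

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.util.Arrays;

/** Hypothetical helper (name invented for this sketch): 3x3 median filter. */
final class MedianFilter {

    /** Replaces each pixel with the median of its 3x3 neighbourhood (borders clamped). */
    static BufferedImage apply3x3(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        WritableRaster src = gray.getRaster();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int sx = Math.min(w - 1, Math.max(0, x + dx));
                        int sy = Math.min(h - 1, Math.max(0, y + dy));
                        window[n++] = src.getSample(sx, sy, 0);
                    }
                }
                Arrays.sort(window);
                dst.setSample(x, y, 0, window[4]);   // middle of the 9 sorted values
            }
        }
        return out;
    }
}
```

Isolated specks disappear because a single outlier never reaches the middle of the sorted window, while solid strokes that fill most of the window survive. Very thin strokes can be eroded, so the filter size is a trade-off.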

This post (OCR application) is an example case where some image pre-processing and post-processing of the OCR result can be applied to get better OCR accuracy.
