How to crop the largest rectangle out of an image

Date: 2022-01-18 08:57:06

I have a few images of pages on a table. I would like to crop the pages out of the image. Generally, the page will be the biggest rectangle in the image; however, in some cases not all four sides of the rectangle are visible.

I am doing the following but not getting the desired results:

import cv2
import numpy as np

im = cv2.imread('images/img5.jpg')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# findContours returns 3 values in OpenCV 3.x and 2 values in 2.x/4.x;
# indexing with [-2] selects the contour list in either case.
contours = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
areas = [cv2.contourArea(c) for c in contours]
max_index = np.argmax(areas)
cnt = contours[max_index]
# Draw the axis-aligned bounding box of the largest contour.
x, y, w, h = cv2.boundingRect(cnt)
cv2.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow("Show", im)
cv2.imwrite("images/img5_rect.jpg", im)
cv2.waitKey(0)

Below are a few examples:

1st Example: I can find the rectangle in this image; however, I would like the remaining part of the wood to be cropped out as well.

2nd Example: Not finding the correct dimensions of the rectangle in this image.

3rd Example: Not able to find the correct dimensions in this image either.

4th Example: Same with this one as well.

2 Solutions

#1


27  

I have done something similar before and experimented with Hough transforms, but they were much harder to get right for my case than contours. Here are some suggestions to help you get started:

  1. Generally paper (the edges, at least) is white, so you may have better luck by going to a colorspace like YUV, which separates luminosity better:

    image_yuv = cv2.cvtColor(image,cv2.COLOR_BGR2YUV)
    image_y = np.zeros(image_yuv.shape[0:2],np.uint8)
    image_y[:,:] = image_yuv[:,:,0]
    
  2. The text on the paper is a problem. Use a blur to (hopefully) remove this high-frequency noise. You may also use morphological operations such as dilation.

    image_blurred = cv2.GaussianBlur(image_y,(3,3),0)
    
  3. You may try applying a Canny edge detector rather than a simple threshold. It is not strictly necessary, but it may help:

    edges = cv2.Canny(image_blurred, 100, 300, apertureSize=3)
    
  4. Then find the contours. In my case I only used the extreme outer contours. You may use the CHAIN_APPROX_SIMPLE flag to compress the contours:

    # [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
    contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
    
  5. Now you should have a bunch of contours. Time to find the right ones. For each contour cnt, first find the convex hull, then use approxPolyDP to simplify the contour as much as possible.

    hull = cv2.convexHull(cnt)
    simplified_cnt = cv2.approxPolyDP(hull,0.001*cv2.arcLength(hull,True),True)
    
  6. Now use this simplified contour to find the enclosing quadrilateral. You may experiment with whatever rules you come up with. The simplest method is to pick the four longest segments of the contour and create the enclosing quadrilateral by intersecting these four lines. Depending on your case, you can also choose these lines based on the contrast across each line, the angles between them, and similar properties.

  7. Now you have a bunch of quadrilaterals. You can use a two-step method to find the one you need. First, remove those that are probably wrong, for example any quadrilateral with an angle of more than 175 degrees. Then pick the one with the biggest area as the final result. You can see the orange contour as one of the results I got at this point:

  8. The final step, after finding (hopefully) the right quadrilateral, is transforming it back to a rectangle. For this you can use findHomography to compute a transformation matrix.

    (H,mask) = cv2.findHomography(cnt.astype('single'),np.array([[[0., 0.]],[[2150., 0.]],[[2150., 2800.]],[[0.,2800.]]],dtype=np.single))
    

    The numbers assume projecting to letter paper. You may come up with better and more clever numbers to use. You also need to reorder the contour points to match the order of the letter-paper coordinates. Then call warpPerspective to create the final image:

    final_image = cv2.warpPerspective(image,H,(2150, 2800))
    

    This warping should result in something like the following (from my earlier results):
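As a rough illustration of steps 6 and 7, here is a minimal sketch of the filtering stage. It assumes each candidate is already a convex quadrilateral given as four distinct (x, y) corners listed in order around the shape. The helper names (interior_angles, polygon_area, pick_best_quad) are made up for this example and are not OpenCV functions; cv2.contourArea could be used in place of the shoelace helper.

```python
import numpy as np


def interior_angles(quad):
    """Return the four corner angles (degrees) of a convex quadrilateral.

    `quad` holds four (x, y) points in order around the shape.
    """
    quad = np.asarray(quad, dtype=float)
    angles = []
    for i in range(4):
        prev_pt, pt, next_pt = quad[i - 1], quad[i], quad[(i + 1) % 4]
        v1, v2 = prev_pt - pt, next_pt - pt
        cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        # Clip guards against rounding slightly outside [-1, 1].
        angles.append(float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))))
    return angles


def polygon_area(quad):
    """Area of a simple polygon via the shoelace formula."""
    quad = np.asarray(quad, dtype=float)
    x, y = quad[:, 0], quad[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def pick_best_quad(candidates, max_angle=175.0):
    """Drop quads with a nearly flat corner, then return the largest one."""
    plausible = [q for q in candidates if max(interior_angles(q)) <= max_angle]
    if not plausible:
        return None
    return max(plausible, key=polygon_area)
```

The 175-degree cutoff is the value mentioned in step 7; other rejection rules (minimum area, aspect ratio) could be added to the same filter.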

I hope this helps you to find an appropriate approach in your case.


#2


10  

That's a pretty complicated task that cannot be solved by simply searching for contours. The Economist cover, for example, shows only one edge of the magazine, which splits the image in half. How should your computer know which half is the magazine and which is the table? So you have to add much more intelligence to your program.

You might look for lines in your image, using a Hough transform for example. Then find sets of more or less parallel or orthogonal lines, lines of a certain length... Find prints by checking for typical print colours or colours that you usually don't find on a table. Search for high-contrast frequencies as created by printed text... Imagine how you as a human recognize a printed paper...

All in all, this is too broad a question for this site. Try to break it down into smaller sub-problems and solve those; if you hit a wall, come back here.
