I have many scanned documents in PDF.
I use ImageMagick with Ghostscript to convert the PDF to PNG at a high density, with this command:
convert -density 288 2.pdf 2.png
After that, I read the pixels with PHP, find where the QR code is, and decode it. Because the image is very big (~2500px), this needs a lot of RAM. Before reading the pixels with PHP, I want to crop the image with ImageMagick and keep only the part with the QR code.
Can I detect the approximate location of the QR code with ImageMagick, crop the image, and keep only that part?
Further Update
I see your discussion with Kurt about extracting the image from the PDF more cleanly in the first place, and his recommendation was to use pdfimages. I just wanted to add that running brew search pdfimages won't find it, because pdfimages ships as part of poppler, so you actually need to use
brew install poppler
and then you get the pdfimages executable.
Updated Answer
If you change the tile size on the crop command to 100x100 and run this on the second PDF you supplied:
convert -density 288 pdf2.pdf -crop 100x100 tile%04d.png
and then run the same entropy analysis command as in the original answer below:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
...
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
The last 4 tiles listed, which have the highest entropy, are the ones containing the QR code.
Likewise, for the other PDF file you supplied, you get:
0.863498:+1900+500:tile0139.png
0.954581:+2000+500:tile0140.png
0.974077:+1900+600:tile0163.png
0.97671:+2000+600:tile0164.png
which picks out these four adjacent tiles as the location of the QR code.
I think that should help you locate the QR code, at least approximately.
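Going a step further, the top tiles can be merged into a single crop rectangle automatically. The following is a minimal Python sketch, not part of ImageMagick: the function name qr_crop_geometry and the default of 4 tiles are my own assumptions. It parses the lines printed by the entropy command, keeps the highest-entropy tiles, and returns their bounding box as a WxH+X+Y geometry. It treats every tile as tile_size square, so if an edge tile is picked the box may extend past the page; -crop only keeps the part that overlaps the image.

```python
import re

# One line of `convert -format "%[entropy]:%X%Y:%f\n" ... info:` output,
# e.g. "0.980938:+700+400:tile0103.png".
LINE = re.compile(r"^([0-9.eE+-]+):\+(\d+)\+(\d+):(\S+)$")


def qr_crop_geometry(lines, tile_size, top_n=4):
    """Return a WxH+X+Y geometry covering the `top_n` highest-entropy tiles."""
    records = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:  # skips "..." and anything else that isn't a tile line
            records.append((float(m.group(1)), int(m.group(2)), int(m.group(3))))
    if not records:
        raise ValueError("no tile lines found")
    records.sort()  # ascending entropy, like `sort -n`
    best = records[-top_n:]
    x0 = min(x for _, x, _ in best)
    y0 = min(y for _, _, y in best)
    x1 = max(x for _, x, _ in best) + tile_size
    y1 = max(y for _, _, y in best) + tile_size
    return f"{x1 - x0}x{y1 - y0}+{x0}+{y0}"


second_pdf = """\
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
"""
print(qr_crop_geometry(second_pdf.splitlines(), 100))  # 200x200+600+400
```

On the other PDF's four tiles it returns 200x200+1900+500. The geometry is in pixels of the 288-dpi render, so it should be applied to an image rendered at that same density, for example with -crop followed by +repage.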
Original Answer
This is not all that scientific, but it may help you get started. The key, I think, is the entropy of the various areas of the image. The QR code encodes a lot of information in a small area, so it should have high entropy. So, I use ImageMagick to split the image into square 400x400 tiles like this:
convert image.png -crop 400x400 tile%03d.png
which gives me 54 tiles. Then I calculate the entropy of each tile and sort the tiles by increasing entropy, also outputting each tile's offset from the top left of the frame and its name, like this:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
0.00408949:+1200+2800:tile045.png
0.00473755:+1600+2800:tile046.png
0.00944815:+800+2800:tile044.png
0.0142171:+1200+3200:tile051.png
0.0143607:+1600+3200:tile052.png
0.0341039:+400+2800:tile043.png
0.0349564:+800+3200:tile050.png
0.0359226:+800+0:tile002.png
0.0549334:+800+400:tile008.png
0.0556793:+400+3200:tile049.png
0.0589632:+400+0:tile001.png
0.0649078:+1200+0:tile003.png
0.10811:+1200+400:tile009.png
0.116287:+2000+3200:tile053.png
0.120092:+800+800:tile014.png
0.12454:+0+2800:tile042.png
0.125963:+1600+0:tile004.png
0.128795:+800+1200:tile020.png
0.133506:+0+400:tile006.png
0.139894:+1600+400:tile010.png
0.143205:+2000+2800:tile047.png
0.144552:+400+2400:tile037.png
0.153143:+0+0:tile000.png
0.154167:+400+400:tile007.png
0.173786:+0+2400:tile036.png
0.17545:+400+1600:tile025.png
0.193964:+2000+400:tile011.png
0.209993:+0+3200:tile048.png
0.211954:+1200+800:tile015.png
0.215337:+400+2000:tile031.png
0.218159:+800+1600:tile026.png
0.230095:+2000+1200:tile023.png
0.237791:+2000+0:tile005.png
0.239336:+2000+1600:tile029.png
0.24275:+800+2400:tile038.png
0.244751:+0+2000:tile030.png
0.254958:+800+2000:tile032.png
0.271722:+2000+2000:tile035.png
0.275329:+0+1600:tile024.png
0.278992:+2000+800:tile017.png
0.282241:+400+1200:tile019.png
0.285228:+1200+1200:tile021.png
0.290524:+400+800:tile013.png
0.320734:+0+800:tile012.png
0.330168:+1600+2000:tile034.png
0.360795:+1200+2000:tile033.png
0.391519:+0+1200:tile018.png
0.421396:+1200+1600:tile027.png
0.421421:+2000+2400:tile041.png
0.421696:+1600+2400:tile040.png
0.486866:+1600+1600:tile028.png
0.489479:+1600+800:tile016.png
0.611449:+1600+1200:tile022.png
0.674079:+1200+2400:tile039.png
And, hey presto, the last one listed (i.e. the one with the highest entropy), tile039.png, is the tile that contains the QR code.
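To see why a QR tile tends to win, here is a minimal Python sketch of a histogram-based entropy score. It is an illustration under stated assumptions, not ImageMagick's implementation: it treats a tile as a flat list of 8-bit gray values and divides the Shannon entropy of its gray-level histogram by log2(256), so the score lies between 0 and 1. ImageMagick's %[entropy] may normalize differently, so compare the ordering of tiles rather than the raw numbers. A histogram measure also ignores where pixels are, so it rewards a tile whose gray levels are evenly mixed, such as dense QR modules, over a mostly white margin.

```python
import math
from collections import Counter


def normalized_entropy(pixels):
    """Shannon entropy (bits) of the gray-level histogram of `pixels`,
    scaled by 1/log2(256) so 8-bit data maps into [0, 1].
    Illustrative only: ImageMagick may normalize %[entropy] differently."""
    total = len(pixels)
    if total == 0:
        return 0.0
    bits = 0.0
    for count in Counter(pixels).values():
        p = count / total
        bits -= p * math.log2(p)
    return bits / math.log2(256)


# A blank margin tile versus a tile that is half dark, half light.
blank = [255] * 64
qr_like = [0, 255] * 32
print(normalized_entropy(blank))    # 0.0
print(normalized_entropy(qr_like))  # 0.125
```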
I have drawn a rectangle around tile039's location using this command:
convert image.png -stroke red -fill none -strokewidth 3 -draw "rectangle 1200,2400 1600,2800" a.jpg
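The rectangle coordinates follow directly from the tile's offset and size: the top left corner is the +X+Y offset, and the opposite corner is that offset plus 400 in each direction. A small Python sketch of that arithmetic, using a hypothetical helper named draw_rectangle_arg that reads the last line of the sorted output:

```python
def draw_rectangle_arg(sorted_lines, tile_size):
    """Return the -draw argument outlining the highest-entropy tile.

    `sorted_lines` is `sort -n` output whose lines look like
    "0.674079:+1200+2400:tile039.png", so the best tile is the last line.
    """
    last = [line.strip() for line in sorted_lines if line.strip()][-1]
    _, offset, _ = last.split(":", 2)   # offset is e.g. "+1200+2400"
    _, x, y = offset.split("+")         # -> ["", "1200", "2400"]
    x, y = int(x), int(y)
    return f"rectangle {x},{y} {x + tile_size},{y + tile_size}"


lines = [
    "0.611449:+1600+1200:tile022.png",
    "0.674079:+1200+2400:tile039.png",
]
print(draw_rectangle_arg(lines, 400))  # rectangle 1200,2400 1600,2800
```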
I concede there may be luck involved, but I only have one image to test my mad theories on. You may need to tile twice, the second time with an x-offset and y-offset of half a tile width, so that you don't cut the QR code and split it across 2 tiles. You may need different tile sizes for different barcode sizes. You may need to pass the last 3-5 tiles in the list to your next algorithm, not just the last one. But I think it could form the basis of a method.
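Here is a hedged Python sketch of the two-pass idea, working on tile coordinates only. The helper names are hypothetical, 2400x3600 is an assumed page size that gives the same 6 x 9 = 54 tiles as above, and box is an assumed (x0, y0, x1, y1) rectangle standing in for a QR code. It shows a code that is split by the normal grid landing whole inside a tile of the half-tile-shifted grid. The second pass makes this more likely but does not guarantee it for every position, so the tile size should still be comfortably larger than the code.

```python
def tile_origins(width, height, tile, shift=0):
    """Top-left corners of tile x tile tiles, row by row, starting at (shift, shift)."""
    return [(x, y)
            for y in range(shift, height, tile)
            for x in range(shift, width, tile)]


def two_pass_origins(width, height, tile):
    """Normal grid plus a second grid offset by half a tile in x and y."""
    return tile_origins(width, height, tile) + tile_origins(width, height, tile, tile // 2)


def first_tile_containing(origins, tile, box):
    """First origin whose tile fully contains box=(x0, y0, x1, y1), else None."""
    x0, y0, x1, y1 = box
    for ox, oy in origins:
        if ox <= x0 and oy <= y0 and x1 <= ox + tile and y1 <= oy + tile:
            return (ox, oy)
    return None


print(len(tile_origins(2400, 3600, 400)))  # 54

# A 200x200 code straddling the x=400 and y=400 boundaries of a 400px grid.
qr = (350, 250, 550, 450)
print(first_tile_containing(tile_origins(800, 800, 400), 400, qr))      # None
print(first_tile_containing(two_pass_origins(800, 800, 400), 400, qr))  # (200, 200)
```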