Unable to search the contents of PDF files in the terminal

Date: 2022-09-12 23:05:23

I have PDF files whose contents I have not managed to search with any terminal program. I can only search them with Acrobat Reader and Skim.

How can you search the contents of PDF files in the terminal?

It seems that a better question is:

How is the search done in PDF viewers such as Acrobat Reader and Skim?

Perhaps I need to write such a search tool myself if none exists.

3 answers

#1


Try installing xpdf from MacPorts; it should come with a tool called pdftotext, which converts a PDF to plain text that you can then search with grep.
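
A minimal sketch of that approach for many files at once, assuming pdftotext is installed and on your PATH and that you are in a POSIX shell. The function name search_pdfs is hypothetical, not an xpdf command. It walks a directory tree with find, has pdftotext write each PDF's text to standard output (the `-` output argument), and prints the names of the files whose text contains the pattern, ignoring case:

```sh
# search_pdfs PATTERN [DIR]  (hypothetical helper, not an xpdf command)
# Prints the path of every *.pdf under DIR (default: current directory)
# whose extracted text contains PATTERN, ignoring case.
# Assumes pdftotext is installed and on PATH.
search_pdfs() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' -print |
        while IFS= read -r f; do
            # "-" makes pdftotext write the text to stdout instead of a .txt file;
            # error messages for unreadable or encrypted PDFs are discarded.
            if pdftotext "$f" - 2>/dev/null | grep -q -i -- "$pattern"; then
                printf '%s\n' "$f"
            fi
        done
}
```

For example, `search_pdfs spidey ~/Documents` would list the PDFs under ~/Documents that mention "spidey". Reading the file list with `while IFS= read -r` keeps names containing spaces intact, though names containing newlines are not handled.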

#2


pdftotext is indeed an excellent tool, but it can produce very long lines, so each grep match may print far more text than you need; to keep the output readable, break the lines up before grepping, e.g.:

pdftotext drscheme.pdf - | fmt | grep -i spidey
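
The same pipeline wrapped as a small function, as a sketch: pdf_grep_wrapped is a hypothetical name, and the default width of 72 columns is an arbitrary choice. `fmt -w` sets the maximum line width, and `grep -n` adds line numbers (numbers within the re-wrapped text, not PDF page numbers):

```sh
# pdf_grep_wrapped FILE PATTERN [WIDTH]  (hypothetical helper)
# Extracts the text of FILE, re-wraps it to at most WIDTH columns
# (default 72) so each grep hit is a short snippet rather than a whole
# long line, then prints the matching lines, numbered, ignoring case.
# Assumes pdftotext and fmt are installed.
pdf_grep_wrapped() {
    pdftotext "$1" - | fmt -w "${3:-72}" | grep -n -i -- "$2"
}
```

`pdf_grep_wrapped drscheme.pdf spidey` behaves like the one-liner above, with line numbers added. Wrapping can split a multi-word phrase across two lines, so a phrase search may miss matches that a search on the unwrapped text would find.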

#3


PDF files are usually compressed. PDF viewers such as Acrobat Reader and Skim search the contents by decompressing the PDF text into memory and then searching that text. To search from the command line, one option is to use pdftk to decompress the PDF and then use grep (or your favorite command-line text search utility) to find the desired text. For example:

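# Search for the text "text_to_search_for", and print out 3 lines of context
# above and below each match. -a makes grep treat the binary parts of the
# PDF as text, so it prints the matching lines instead of only reporting
# that a binary file matched.
pdftk mydoc.pdf output - uncompress | grep -a -C3 text_to_search_for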
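
A sketch of the pdftk approach applied to several files, prefixing each hit with its file name; pdf_raw_grep is a hypothetical name, and pdftk is assumed to be installed. Because grep sees the raw, uncompressed PDF syntax rather than extracted text, this only finds strings stored literally in the file. Text in content streams is often split into fragments (for example inside the array operand of a `TJ` operator) or stored in a font-specific or hex encoding, so this can miss words that pdftotext would find; it works best for literal strings such as metadata or simply encoded text.

```sh
# pdf_raw_grep PATTERN FILE...  (hypothetical helper)
# Decompresses each FILE with pdftk and prints the lines of the raw PDF
# that match PATTERN, each prefixed with "FILE: ".
# Assumes pdftk is installed and on PATH.
pdf_raw_grep() {
    pattern=$1
    shift
    for f in "$@"; do
        # -a: treat the uncompressed PDF (which still contains binary data
        # such as images and fonts) as text, so grep prints the matching
        # lines instead of only reporting that a binary file matched.
        pdftk "$f" output - uncompress 2>/dev/null |
            grep -a -- "$pattern" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}
```

For example, `pdf_raw_grep text_to_search_for *.pdf` searches every PDF in the current directory.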
