How to programmatically search and replace text in PDF files

Date: 2022-03-30 16:51:21

How can I programmatically search and replace some text in a large number of PDF files? I would like to remove a URL that has been added to a set of files. I have been able to remove the link using JavaScript under Batch Processing in Adobe Pro, but the link text remains. I have seen recommendations to use text touchup, which works manually, but I don't want to modify 1300 files manually.

8 answers

#1


15  

Finding text in a PDF can be inherently hard because of the graphical nature of the document format -- the letters you are searching for may not be contiguous in the file. That said, CAM::PDF has some search-replace capabilities and heuristics. Give changepagestring.pl a try and see if it works on your PDFs.
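To make the "not contiguous" point concrete, here is a small Python sketch. The content-stream fragment is hand-written for illustration (it is not taken from any real file), and the parser is a simplification that ignores escaped parentheses and hex strings. It shows how a TJ array can split a URL into pieces so that a plain byte search misses it, while joining the pieces recovers the text.

```python
import re

# Hypothetical fragment of an uncompressed page content stream.
# The TJ operator draws an array of string pieces separated by kerning offsets.
CONTENT = b"BT /F1 12 Tf [(www.exa) -15 (mple.c) 20 (om)] TJ ET"


def tj_text(stream: bytes) -> list[str]:
    """Join the string pieces of each TJ array into one logical string."""
    results = []
    for array in re.findall(rb"\[(.*?)\]\s*TJ", stream, flags=re.S):
        pieces = re.findall(rb"\((.*?)\)", array, flags=re.S)
        results.append(b"".join(pieces).decode("latin-1"))
    return results


print(b"www.example.com" in CONTENT)  # False: the raw bytes never contain the full URL
print(tj_text(CONTENT))               # ['www.example.com']
```

Tools such as CAM::PDF presumably need heuristics of this kind, which is why results vary from one PDF to another.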

#2


5  

I had also become desperate: after installing 10 PDF editors, all of which cost money, I still had no success.

pdftk plus a text editor suffice:

Replace Text in PDF Files

  • Use pdftk to uncompress the PDF page streams:

    pdftk original.pdf output original.clear.pdf uncompress

  • Replace the text in an editor (sometimes this works, sometimes it doesn't).

  • Repair the modified (and now broken) PDF:

    pdftk original.clear.pdf output original.clear.fixed.pdf

(from Joel Dare)
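For 1300 files, the three steps above could be scripted. The following Python sketch assumes pdftk is on the PATH and uses only the two pdftk invocations quoted above. The function names and folder layout are made up for illustration. It replaces raw bytes, so it only helps when the text is stored contiguously and unencoded in the uncompressed stream.

```python
import subprocess
from pathlib import Path


def replace_literal(data: bytes, old: bytes, new: bytes = b"") -> tuple[bytes, int]:
    """Return the edited bytes and the number of occurrences that were replaced."""
    return data.replace(old, new), data.count(old)


def rewrite_pdf(src: Path, dst: Path, old: bytes, new: bytes = b"") -> int:
    """Uncompress src with pdftk, edit the raw bytes, then let pdftk repair the result."""
    clear = dst.with_name(dst.stem + ".clear.pdf")  # temporary uncompressed copy
    subprocess.run(["pdftk", str(src), "output", str(clear), "uncompress"], check=True)
    data, hits = replace_literal(clear.read_bytes(), old, new)
    clear.write_bytes(data)
    subprocess.run(["pdftk", str(clear), "output", str(dst)], check=True)
    clear.unlink()
    return hits


def rewrite_folder(in_dir: Path, out_dir: Path, old: bytes, new: bytes = b"") -> dict[str, int]:
    """Process every PDF in in_dir and report the hit count per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        pdf.name: rewrite_pdf(pdf, out_dir / pdf.name, old, new)
        for pdf in sorted(in_dir.glob("*.pdf"))
    }
```

A call such as rewrite_folder(Path("pdfs"), Path("fixed"), b"http://www.example.com") would return a per-file hit count. A count of 0 suggests the text is split or encoded in that file, which is the "sometimes it doesn't" case.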

#3


2  

I'm not sure I would want to do all the work of writing code to modify your 1300 files when there is a program that can do it for you. The other day, I used the Professional version of Infix to batch modify almost 100 files using its "Find and Replace in Files" feature. It works great. I have evaluated other programs in hopes of finding find-and-replace functionality similar to Microsoft Word's. Infix was the only one I found that can do it. Check out: http://www.iceni.com/infix-pro.htm

#4


2  

You can use the 'redaction' feature in Adobe Acrobat Pro to find & replace all references in a single document in one step...not sure if it can be automated to multiple steps.

http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS5E28D332-9FF7-4569-AFAD-79AD60092D4D.w.html

#5


1  

I just finished trying out Infix on a text laden with diacritics, hoping to generate another text in which characters with double and composed diacritics are replaced by alternates with single diacritics. Infix is definitely a good solution for someone who would rather not take the trouble to understand how programmatic solutions work. All the requested changes were made. I still need to understand how to reflow words when a change alters the layout of the text.

#6


1  

This is just half a solution, but I used Touch up combined with AppleScript's support for sending keystrokes to replace a string in thousands of table cells. Depending on how your pages are laid out, it could work for you. In my case I had to manually insert the cursor at the beginning of every table (tens of tables, quite manageable for a manual process), but after that I replaced thousands of cells automatically.

#7


1  

The question is for a programmatic solution, but I will still share this free online tool which helped me mass replace text in some PDF files:

http://www.pdfdu.com/pdf-replace-text.aspx

I did not notice any ads or other modifications in the resulting PDF files after replacing the text.

I was not able to make the changes locally with the software I tried. I think the main problem was that I was missing the font used in the PDF and it did not work properly, even with Acrobat Pro. The online tool did not complain and produced a great result.

#8


0  

I suggest using the VeryPDF PDF Text Replacer Command Line software to batch replace text in PDF pages. You can run pdftr.exe to replace text in PDF pages easily, for example:

pdftr.exe -contentreplace "My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "My Name=>D:\temp\myname.png*20*20" D:\in.pdf D:\out.pdf

pdftr.exe -pagerange 1-3 -contentreplace "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchtext "string" C:\in.pdf

pdftr.exe -pagerange 1 -searchtext "string" C:\in.pdf

pdftr.exe -pagerange 1 -searchandoverlaytext "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -overlaytextfontname "Arial" -overlaytextcolor FF0000 -overlaybgcolor 00FF00 -searchandoverlaytext "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -opw 123 -upw 456 -contentreplace "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "PDFcamp Printer=>VeryPDF Printer" -overlaytextfontsize 8 D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "PDFcamp Printer=>VeryPDF Printer" -overlaytextfontsize 80% D:\in.pdf D:\out.pdf
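The commands above handle one file each. A loop over a folder might look like the Python sketch below. It uses only the -contentreplace form and the "old=>new||old=>new" rule syntax shown in the examples. The function names and folder arguments are hypothetical. Whether pdftr.exe accepts an empty replacement (to delete text outright) is an assumption that would need checking against its documentation.

```python
import subprocess
from pathlib import Path


def pdftr_command(src: Path, dst: Path, replacements: dict[str, str],
                  exe: str = "pdftr.exe") -> list[str]:
    """Build one pdftr.exe -contentreplace call from a mapping of old text to new text."""
    rules = "||".join(f"{old}=>{new}" for old, new in replacements.items())
    return [exe, "-contentreplace", rules, str(src), str(dst)]


def run_batch(in_dir: Path, out_dir: Path, replacements: dict[str, str]) -> None:
    """Run pdftr.exe once per PDF in in_dir, writing results to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(in_dir.glob("*.pdf")):
        subprocess.run(pdftr_command(pdf, out_dir / pdf.name, replacements), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the "=>" and "||" characters in the rule string.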
