PHP regex: find a pattern but replace only one character

Time: 2021-09-15 19:26:15

I'm converting a PDF to text using xpdf pdf2text and it works great except for one thing: it converts paragraph symbols (¶) into the number 8. I need to find a way to get to everything with the pattern of:

preg_match_all('/\b8\d{1,2}-/', 'text');

but only replace the "8" from that pattern. I've tried saving the matches into an array, but then how do I re-insert them into the text where they belong?

Ideally, the paragraph symbol would just convert properly, but I've tried several different encodings with no success; I think some of the PDFs have embedded fonts.

Any ideas on how I could replace just the "8" in that pattern? I can't just replace all 8's because the page or chapter of the article being referenced may be 8; but there is no danger of the paragraph being 80-something (which is why I check for a digit after the 8).

Thanks.

1 solution

#1


5  

Capture the rest of the pattern in a group and put it back in place:

$str = preg_replace('/\b8(\d{1,2}-)/', 'replacement$1', $str);
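
To make that concrete, here is a minimal sketch with "¶" as the replacement. The sample string is made up for illustration (it is not from the question), and the `${1}` form is used instead of `$1` only so the backreference stays unambiguous if the replacement text is ever followed by a digit. The second call shows a lookahead variant, which matches the 8 alone so there is nothing to put back; treat it as an alternative, not as what the answer above prescribes.

```php
<?php
// Hypothetical sample text: the leading "8" in "812-14" and "87-9"
// stands in for a paragraph symbol that was lost during PDF extraction.
$str = 'See ch. 8, 812-14 and 87-9; page 8 is unaffected.';

// Capture the digits and hyphen that follow the 8, then re-emit them
// after the replacement character via the ${1} backreference.
$fixed = preg_replace('/\b8(\d{1,2}-)/', '¶${1}', $str);

// Alternative: a lookahead checks for the digits and hyphen without
// consuming them, so only the 8 itself is matched and replaced.
$fixedLookahead = preg_replace('/\b8(?=\d{1,2}-)/', '¶', $str);

echo $fixed, PHP_EOL;
// prints: See ch. 8, ¶12-14 and ¶7-9; page 8 is unaffected.
```

Both calls leave a standalone 8 (such as a chapter or page number) alone, because the pattern requires one or two digits and a hyphen right after it.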
