Matching a specific pattern multiple times in one sentence

Date: 2021-02-12 23:37:04

I have the following problem with a LaTeX text file that consists of multiple sentences, e.g.:

Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}.

What I need to find out is how to isolate the \cref{fig:xxx} parts in each sentence. The catch is that the regex should only consider sentences in which \cref{fig:xxx} occurs more than once (>1).

A good result would be if the regex could return fig:2 and fig:3 from the "bbb" sentence, as well as fig:5, fig:6, and fig:7 from the "ddd" sentence.
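
To pin down the expected result, here is a minimal Python sketch (not TextMate, and deliberately not a single regex) that splits the sample on "." and keeps only sentences with more than one \cref. It assumes a sentence simply ends at a period; the function name multi_ref_sentences is hypothetical.

```python
import re

# Sample input from the question; a raw string keeps the backslash in "\cref".
TEXT = r"Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}."

# One \cref{fig:N}, capturing the "fig:N" label.
CREF = re.compile(r"\\cref\{(fig:\d+)\}")


def multi_ref_sentences(text):
    """Return the fig labels of each sentence that has more than one \\cref.

    Assumption: a sentence ends at a literal '.', the same simplification
    the [^.] class in the regex answers relies on.
    """
    result = []
    for sentence in text.split("."):
        labels = CREF.findall(sentence)
        if len(labels) > 1:  # keep only sentences with >1 reference
            result.append(labels)
    return result


print(multi_ref_sentences(TEXT))
# [['fig:2', 'fig:3'], ['fig:5', 'fig:6', 'fig:7']]
```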

I have to use regular expressions for the search in TextMate (a text editor).

2 answers

#1


In addition to my comment, you could come up with a recursive approach. However, judging by the documentation, TextMate does not seem to support recursion. In that case, you can simply repeat the pattern one more time, which satisfies your requirement of sentences with more than one occurrence:

(?:\\cref\{(fig:\d+)\})(?:[^.]+?(?:\\cref\{(fig:\d+)\}))+

Broken down, this looks for \cref{...} and captures the inner fig: plus digits, then lazily consumes one or more characters that are not a dot ([^.]+?) and matches the \cref{...} subpattern again, one or more times. As already mentioned in the comments, you will likely need to experiment with the sentence condition (i.e. what counts as a sentence; that is the [^.] part). See a demo of the approach on regex101.com.
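
One rough way to try this pattern outside TextMate is Python's re module; its syntax should behave the same here as TextMate's (as far as I know, Oniguruma-based) engine, but that is an assumption. The sketch also shows a caveat: group 2 sits inside a repeated group, so it only keeps the last repetition (fig:7, not fig:6). Re-scanning each whole match with a simpler pattern recovers every label. The helper name find_multi_ref_spans is hypothetical.

```python
import re

# Sample input from the question; a raw string keeps the backslash in "\cref".
TEXT = r"Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}."

# Answer #1's pattern as a Python raw string (typed without r"..." in TextMate).
PATTERN = re.compile(r"(?:\\cref\{(fig:\d+)\})(?:[^.]+?(?:\\cref\{(fig:\d+)\}))+")
# Used to re-scan a whole match for every label it contains.
LABEL = re.compile(r"fig:\d+")


def find_multi_ref_spans(text):
    """Return (captured groups, all labels in the match) for each match."""
    results = []
    for m in PATTERN.finditer(text):
        # Group 2 is inside a repeated group, so it keeps only the last repetition.
        results.append((m.groups(), LABEL.findall(m.group(0))))
    return results


for groups, labels in find_multi_ref_spans(TEXT):
    print(groups, labels)
# ('fig:2', 'fig:3') ['fig:2', 'fig:3']
# ('fig:5', 'fig:7') ['fig:5', 'fig:6', 'fig:7']
```

Sentences with a single reference (fig:1, fig:4) never match, because the repeated group needs at least one further \cref before the next dot.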

#2


What you need is a positive lookahead assertion, e.g.:

\S*(?=\s*\\cref{)

Note: I'm not sure how escapes and special characters are entered in your text editor, so to be clear: by the double "\\" I mean a literal \ character, \s is a whitespace character and \S is a non-whitespace character. To also return the fig, you will need to introduce additional groups. This guide might help you: http://www.rexegg.com/regex-lookarounds.html#compound
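
A minimal Python sketch of the lookahead idea with the extra group mentioned above. Two changes are my own assumptions: \S* becomes \S+ (the original can also produce empty matches, e.g. right before each \cref), and the fig label is captured inside the lookahead. Running it suggests the lookahead alone returns every reference, including the single ones in the "Aaa" and "Ccc" sentences, so the more-than-once condition still has to be handled separately.

```python
import re

# Sample input from the question; a raw string keeps the backslash in "\cref".
TEXT = r"Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}."

# Variant of answer #2: \S+ instead of \S* (avoids empty matches) plus a group
# inside the lookahead that captures the fig label. The lookahead consumes nothing,
# so only the preceding word is part of the match itself.
LOOKAHEAD = re.compile(r"(\S+)(?=\s*\\cref\{(fig:\d+)\})")

# With two groups, findall returns (word, label) tuples.
pairs = LOOKAHEAD.findall(TEXT)
print(pairs)
# [('Aaa', 'fig:1'), ('Bbb', 'fig:2'), ('bbb', 'fig:3'), ('Ccc', 'fig:4'),
#  ('Ddd', 'fig:5'), ('ddd', 'fig:6'), ('ddd', 'fig:7')]
```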
