I have the following problem with a LaTeX text file that consists of multiple sentences, e.g.:
Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}.
What I need to find out is how to isolate the \cref{fig:xxx} parts in each sentence. The catch is that the regex should only consider sentences in which \cref{fig:xxx} occurs more than once (>1).
A good result would be if the regex could return fig:2 and fig:3 from sentence Bbb, as well as fig:5, fig:6, and fig:7 from sentence Ddd.
I have to use regular expressions for the search in TextMate (a text editor).
2 Answers
#1
1
In addition to my comment, you could come up with a recursive approach. However, judging by the documentation, recursion does not seem to be supported in TextMate. In that case, you can simply repeat the pattern one more time (which satisfies your requirement of sentences with more than one occurrence):
(?:\\cref\{(fig:\d+)\})(?:[^.]+?(?:\\cref\{(fig:\d+)\}))+
Broken down, this looks for \cref{...} (escaped as \\cref\{...\} in the pattern) and captures the inner fig: plus digits, then lazily matches one or more characters that are not a dot ([^.]+?) and repeats the first subpattern; the outer group must match at least once. As already mentioned in the comments, you will likely need to play around with the sentence conditions (e.g. what counts as a sentence; this is the [^.] part). See a demo of the approach on regex101.com.
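To see what the pattern does outside TextMate, here is a minimal Python sketch (Python's re module is an assumption here; TextMate uses a different engine, so results there may differ). One thing worth knowing: in Python, a capture group inside a repeated group keeps only its last match, so for the Ddd sentence groups 1 and 2 hold fig:5 and fig:7, and fig:6 is lost. The sketch therefore uses the answer's pattern only to locate qualifying spans and then re-scans each span with a simpler pattern. The names MULTI, SINGLE and labels_in_multi_cref_sentences are hypothetical.

```python
import re

# Sample text from the question (raw string so backslashes stay literal).
text = (r"Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. "
        r"Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}.")

# The pattern from the answer: one \cref, then one or more
# repetitions of "non-dot text followed by another \cref".
MULTI = re.compile(r"(?:\\cref\{(fig:\d+)\})(?:[^.]+?(?:\\cref\{(fig:\d+)\}))+")

# A single \cref with its label captured.
SINGLE = re.compile(r"\\cref\{(fig:\d+)\}")

def labels_in_multi_cref_sentences(text):
    """Return one list of labels per span that contains more than one \\cref."""
    return [SINGLE.findall(m.group(0)) for m in MULTI.finditer(text)]

print(labels_in_multi_cref_sentences(text))
# [['fig:2', 'fig:3'], ['fig:5', 'fig:6', 'fig:7']]

# The repeated group only remembers its last iteration:
last = list(MULTI.finditer(text))[-1]
print(last.group(1), last.group(2))
# fig:5 fig:7
```

The two-pass design sidesteps the lost intermediate captures; in an editor that only offers a single find pattern, you would instead have to add one more explicit repetition per extra \cref you expect.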
#2
1
What you need is a positive lookahead, e.g.:
\S*(?=\s*\\cref{)
Note: I'm not sure how escapes and symbols are entered in your text editor, so to be clear: the double "\\" stands for a literal \ character, \s matches a whitespace character, and \S matches a non-whitespace character. To also return the fig label, you will need to introduce capture groups. This guide might help you: http://www.rexegg.com/regex-lookarounds.html#compound
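As a rough illustration of combining a lookahead with a capture group, the following Python sketch matches a \cref only when another \cref follows before the next dot. The pattern is an illustrative variation, not the one given in this answer, and Python's re module is assumed rather than TextMate's engine. Note the limitation: the last \cref of each sentence is not returned, because nothing follows it for the lookahead to find.

```python
import re

# Sample text from the question (raw string so backslashes stay literal).
text = (r"Aaa \cref{fig:1}. Bbb \cref{fig:2} bbb \cref{fig:3}. "
        r"Ccc \cref{fig:4}. Ddd \cref{fig:5} ddd \cref{fig:6} ddd \cref{fig:7}.")

# Capture the label, then assert (without consuming) that another
# \cref appears later with no dot in between.
FOLLOWED = re.compile(r"\\cref\{(fig:\d+)\}(?=[^.]*?\\cref\{)")

followed_labels = FOLLOWED.findall(text)
print(followed_labels)
# ['fig:2', 'fig:5', 'fig:6']
```

Picking up the final label of each sentence as well would need a matching lookbehind or a second pass over the sentence.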