How can I use regular expressions to search and replace within the results of a previous search?

Time: 2022-04-08 16:49:23

Thanks for taking the time to read this and apologies right off the bat if this is slightly confusing, remedial or has been previously asked (extensive searching, limited results).

I'm editing using archaic software, HomeSite 5 if you're familiar, and it allows the use of scripts.

My conundrum is as follows:

I would like to isolate multiple selections of text. I am currently doing this using a (long-winded) regex that captures all of the content following a specific date (in this instance "2030-12-31") until it reaches a certain tag (in this instance ]]></content>). Thus far I have managed.

I then would like, within only those previously found selections of text, to remove all of the <span> tags it contains. However, I would like the <span> tags in other sections of text to remain (for example those assigned earlier dates).

Individually I can carry out both functions, isolating the specific sections or removing all of the <span> tags. I feel there's just a link I'm not aware of that would let me run one within the other.

Once again, apologies if the answer is very simple; my knowledge of scripting and regex is limited at best. I've been doing most of my work using JScript, but I'm not certain whether HomeSite accepts other formats. I'm open to multiple solutions!

TLDR: Search and replace only within certain selections, as specified by an immediately preceding regex.

EDIT 1: Please see below the expression used to isolate the required sections. The first is the entire expression. The second is the part that captures the content:

/<version recordId="([0-9]{4,})" start="2030-12-31"([^>]*)>([^<]*)<title><!\[CDATA\[<span class="uk">([^<]*)<\/span>\]\]><\/title>([^<]*)<number><!\[CDATA\[<span class="uk">([0-9]{1,3})\.<\/span>\]\]><\/number>([^<]*)<content><!\[CDATA\[([^]]*)\]\]><\/content>([^<]*)<\/version>/g;

..<content><!\[CDATA\[([^]]*)\]\]></content>..

Within that I would hope to amend as follows:

<span class="uk">content</span>
content

Now that I've typed that out in public I am aware what kind of a horror show of a regular expression it is and I apologise to the good coders of * for even having to look at it!

EDIT 2: Please see below an example of desired output:

<version recordId="1234" start="2012-01-01"><stuffhere...<content><![CDATA[[
  <span class="uk">content1</span>
  <span class="uk">content2</span>
 ]]</content>
    </version>
 <version record="4231" start="2030-12-31"><stuffhere...<content><![CDATA[[
   <span class="uk">content1</span>
   <span class="uk">content2</span>
 ]]</content>
    </version>

BECOMES

<version recordId="1234" start="2012-01-01"><stuffhere...<content><![CDATA[[
  <span class="uk">content1</span>
  <span class="uk">content2</span>
 ]]</content>
    </version>
 <version record="4231" start="2030-12-31"><stuffhere...<content><![CDATA[[
   content1
   content2
 ]]</content>
    </version>

n.b: Thanks to Hannele for earlier formatting corrections.

1 Solution

#1


Using a callback function with String.replace()

The second argument to the String.replace() method (the replacement text) may be specified as a callback function. This callback function can in turn make another replace() call. In this manner, you can easily process text within a section. Here is an example that demonstrates the technique.

Given this example text:

Before:

blah foo? foo blah foo, foo.
<section1>blah foo? foo blah foo, foo.</section1>
blah foo? foo blah foo, foo.
<section2>blah foo? foo blah foo, foo.</section2>
blah foo? foo blah foo, foo.

Let's say you want to replace each foo with bar, but only within the sections. This is easily done by using a callback function as the replacement argument of the String.replace() method like so:

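function f1(text) {
    // Match a whole section; \1 requires the closing tag number to equal the opening one.
    var re1 = /<section(\d+)>[\S\s]*?<\/section\1>/g;
    // Match every "foo", case-insensitively.
    var re2 = /foo/ig;
    text = text.replace(re1,
        function(m0, m1){
            // m0 is the whole matched section; m1 (the section number) is unused.
            return m0.replace(re2, 'bar');
        });
    return text;
}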

When a pattern match is found, the replace() method calls the callback function and passes the whole match as the first argument (in the above example I named it "m0"). If the regex has capture groups, the matched text for each group is passed in the arguments that follow (in this case there is only one capture group, which I've named "m1"; note that the function does not use it).
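To make the argument order concrete, here is a small sketch that records what replace() hands to the callback. The helper name showArgs, the second capture group, and the sample string are made up for illustration. In standard JavaScript the match offset follows the capture groups; whether an older JScript engine behaves identically is an assumption worth checking in HomeSite.

```javascript
// Hypothetical helper: collect the arguments replace() passes to its callback.
function showArgs(text) {
    var seen = [];
    // Two capture groups here: the section number and the section body.
    var re = /<section(\d+)>([\S\s]*?)<\/section\1>/g;
    text.replace(re, function (m0, m1, m2, offset) {
        // m0 = whole match, m1 = section number, m2 = section body,
        // offset = index of the match within the input text.
        seen.push({ whole: m0, number: m1, body: m2, offset: offset });
        return m0; // return the match unchanged; we only want to inspect it
    });
    return seen;
}

var args = showArgs('x<section1>foo</section1>y<section2>bar</section2>');
// args holds one entry per matched section, in document order.
```

The callback's return value becomes the replacement text, so returning m0 leaves the input as it was.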

Here is the example text after being processed by the above function:

After:

blah foo? foo blah foo, foo.
<section1>blah bar? bar blah bar, bar.</section1>
blah foo? foo blah foo, foo.
<section2>blah bar? bar blah bar, bar.</section2>
blah foo? foo blah foo, foo.
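The same idea can be applied to the question's data. The following is only a sketch under some assumptions: each block is a non-nested <version>...</version> element, the start date appears as a start="..." attribute in the opening tag, no attribute value contains a ">" character, and the script engine accepts a function as the replacement argument. The name stripSpansForDate and its parameters are hypothetical, and the outer pattern is deliberately simpler than the long expression in the question.

```javascript
// Sketch only: strip <span> tags inside <version> blocks whose start
// attribute equals the given date; every other block is left untouched.
function stripSpansForDate(xml, date) {
    // Match one whole <version ...>...</version> element (non-greedy);
    // group 1 captures the attribute text of the opening tag.
    var reVersion = /<version\b([^>]*)>[\S\s]*?<\/version>/g;
    // Match opening and closing span tags, with or without attributes.
    var reSpan = /<\/?span\b[^>]*>/gi;
    return xml.replace(reVersion, function (m0, attrs) {
        // Skip blocks whose opening tag does not carry the requested date.
        if (attrs.indexOf('start="' + date + '"') === -1) {
            return m0;
        }
        // Inner replace: remove the span tags but keep their contents.
        return m0.replace(reSpan, '');
    });
}

var input =
    '<version recordId="1234" start="2012-01-01"><content><![CDATA[<span class="uk">content1</span>]]></content></version>\n' +
    '<version recordId="4231" start="2030-12-31"><content><![CDATA[<span class="uk">content1</span>]]></content></version>';
var output = stripSpansForDate(input, '2030-12-31');
// Only the second block (start="2030-12-31") loses its span tags.
```

Checking the date inside the callback, rather than hard-coding it into the outer regex, keeps the pattern short and lets the same function be reused for other dates.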
