How do I replace everything inside an href with a JavaScript regex?

My text is something like:

<a href="http://example.com/test this now">Stuff</a>

More stuff

<a href="http://example.com/more?stuff goes here">more</a>

I want to replace what's inside the href with a function that URL-encodes just the URL portion.

How would I go about this?

UPDATE: Here's what I've tried:

postdata.comment.content = postdata.comment.content.replace(/href=\"(.+?)\"/g, function(match, p1) {
    return encodeURI(p1);
});

It does not do what I had hoped.

The expected result is:

<a href="http%3A%2F%2Fexample.com%2Ftest%20this%20now">Stuff</a>

More stuff

<a href="http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here">more</a>

5 solutions

#1

The regex matches the complete attribute href="...", but the replacement returns only the encoded URL, so the href=" prefix and the closing quote are lost. Put them back in the replacement, and use encodeURIComponent() to encode the complete URL.

var string = '<a href="http://example.com/test this now">Stuff</a>';

string = string.replace(/href="(.*?)"/, function(m, $1) {
    // Rebuild the whole attribute: the match includes href=" and the closing quote.
    return 'href="' + encodeURIComponent($1) + '"';
    //      ^^^^^^                     ^
});

var str = `<a href="http://example.com/test this now">Stuff</a>

More stuff

<a href="http://example.com/more?stuff goes here">more</a>`;

str = str.replace(/href="(.*?)"/g, (m, $1) => 'href="' + encodeURIComponent($1) + '"');

console.log(str);
document.body.textContent = str;

#2

For the encoding, you can use encodeURIComponent:

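var links = document.querySelectorAll('a');
for (var i = 0; i < links.length; ++i) {
  // Use the raw attribute: the .href property returns the resolved URL with spaces
  // already turned into %20, which encodeURIComponent would double-encode as %2520.
  links[i].setAttribute('href', encodeURIComponent(links[i].getAttribute('href')));
}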

<a href="http://example.com/test this now">Stuff</a>
More stuff
<a href="http://example.com/more?stuff goes here">more</a>

If you only have an HTML string instead of DOM elements, don't use regular expressions. Parse the string with a DOM parser instead.

var codeString = '<a href="http://example.com/test this now">Stuff</a>\nMore stuff\n<a href="http://example.com/more?stuff goes here">more</a>';
var doc = new DOMParser().parseFromString(codeString, "text/html");
var links = doc.querySelectorAll('a');
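for (var i = 0; i < links.length; ++i) {
  // getAttribute returns the href exactly as written in the string (not resolved or pre-encoded).
  links[i].setAttribute('href', encodeURIComponent(links[i].getAttribute('href')));
}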
document.querySelector('code').textContent = doc.body.innerHTML;

<pre><code></code></pre>

Note that if you encode the whole URL, including the "http://" part, the browser will treat it as a relative URL.
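
A small sketch of that effect outside the browser, assuming a runtime with the global WHATWG URL class (modern browsers or Node 10+). The base address http://example.com/page/ is a hypothetical stand-in for the page that contains the link:

```javascript
// Once ":" and "/" are escaped, the string no longer starts with a recognizable
// "http:" scheme, so URL resolution treats it as a relative path.
const encoded = encodeURIComponent("http://example.com/test this now");
console.log(encoded);
// http%3A%2F%2Fexample.com%2Ftest%20this%20now

// Hypothetical base: the URL of the page the <a> element lives on.
const resolved = new URL(encoded, "http://example.com/page/");
console.log(resolved.href);
// http://example.com/page/http%3A%2F%2Fexample.com%2Ftest%20this%20now
```

So a link whose href is fully encoded points at a path on the current site, not at example.com/test this now.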

#3

Disclaimer: Don't use regex to parse HTML (there are too many reasons to list here).

But if you insist, this might work.

Find /(<[\w:]+(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)/

Replace $1$2 + someEncoding( $3 ) + $2$4

Expanded

 (                             # (1 start)
      < [\w:]+ 
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s 
      href \s* = \s* 
 )                             # (1 end)
 (?:
      ( ['"] )                      # (2)
      (                             # (3 start)
           [\S\s]*? 
      )                             # (3 end)
      \2 
 )
 (                             # (4 start)
      (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
      >
 )                             # (4 end)
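
A sketch of this find/replace in JavaScript. Assumptions: encodeURIComponent stands in for the placeholder someEncoding, and the name encodeHrefAttributes is hypothetical. Because group 2 captures the opening quote and \2 requires the same character to close it, both double- and single-quoted hrefs are handled:

```javascript
// Same pattern as above, with the g flag so every tag is processed.
const HREF_IN_TAG =
  /(<[\w:]+(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)/g;

// Hypothetical helper implementing "Replace $1$2 + someEncoding( $3 ) + $2$4".
function encodeHrefAttributes(html) {
  // start = tag up to "href=", quote = ' or ", url = attribute value, rest = remainder of the tag
  return html.replace(HREF_IN_TAG, (match, start, quote, url, rest) =>
    start + quote + encodeURIComponent(url) + quote + rest);
}

const html = '<a href="http://example.com/test this now">Stuff</a>\n\nMore stuff\n\n' +
  "<a href='http://example.com/more?stuff goes here'>more</a>";
console.log(encodeHrefAttributes(html));
```

The disclaimer still applies: this only approximates HTML tag syntax and can misfire on comments, scripts, or malformed markup.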

#4

Where is this running? If you have a DOM, you are MUCH better off looping over document.links or document.querySelectorAll("a") than running a regex over the HTML. Also, you likely do not want to encode EVERYTHING, only the search part.

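var allLinks = document.querySelectorAll("a");
for (var i=0;i<allLinks.length;i++) {
   var search = allLinks[i].search;   // query string including the leading "?", or "" if none
   if (search) {
     // Placeholder edit: swaps the literal text "stuff" for an encoded value; adapt the pattern.
     allLinks[i].search="?"+search.substring(1).replace(/stuff/,encodeURIComponent("something"));
   }
}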

In case you really DO want encoded hrefs:

for (var i=0;i<allLinks.length;i++) {
   var href = allLinks[i].href;
   if (href) {
     // Same placeholder edit, applied to the whole (resolved) href instead of only the query.
     allLinks[i].href=href.replace(/stuff/,encodeURIComponent("something"));
   }
}
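
The search-only edit can also be sketched without a DOM, assuming a runtime with the global WHATWG URL class, whose search and href properties behave much like those on an <a> element. The helper name replaceInSearch and the replacement value "a&b" are hypothetical:

```javascript
// Hypothetical helper: edits only the query part of a URL and leaves scheme, host and path alone.
function replaceInSearch(href, pattern, value) {
  const url = new URL(href);
  if (url.search) {
    // The URL parser has already percent-encoded spaces in the query as %20.
    url.search = "?" + url.search.substring(1).replace(pattern, encodeURIComponent(value));
  }
  return url.href;
}

console.log(replaceInSearch("http://example.com/more?stuff goes here", /stuff/, "a&b"));
// http://example.com/more?a%26b%20goes%20here
```

Only the inserted value is escaped with encodeURIComponent, so "&" becomes %26 and cannot be mistaken for a parameter separator, while "http://" and the path stay readable.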

#5

Your expected string "http%3A%2F%2Fexample.com%2Ftest%20this%20now" is what encodeURIComponent("http://example.com/test this now") produces; the encodeURI function does not escape ":" and "/", so it cannot give that result:

var str = '<a href="http://example.com/test this now">Stuff</a>More stuff<a href="http://example.com/more?stuff goes here">more</a>';
str = str.replace(/href="(.+?)"/g, function (m, p1) {
    // Re-add href="..." around the encoded URL; returning only encodeURIComponent(p1) would drop it.
    return 'href="' + encodeURIComponent(p1) + '"';
});

console.log(str);
// <a href="http%3A%2F%2Fexample.com%2Ftest%20this%20now">Stuff</a>More stuff<a href="http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here">more</a>
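
For comparison, a quick sketch of both built-in functions applied to the second URL from the question:

```javascript
const url = "http://example.com/more?stuff goes here";

// encodeURI keeps URL syntax characters such as : / ? & = # and escapes things like spaces.
console.log(encodeURI(url));
// http://example.com/more?stuff%20goes%20here

// encodeURIComponent also escapes those syntax characters, which the expected result requires.
console.log(encodeURIComponent(url));
// http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here
```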
