Regex help - match any URL parameters and values that are not in a list

Time: 2021-10-31 08:14:46

Thank you for looking at this!

I am trying to build a regex that works in JavaScript and matches ALL URL parameters (and their values) that are not in my predefined list. Example:

Raw URL:

/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234

My list of known parameters:

/((knownParamA|knownParamB|knownParamC)=[^&]*&?)/gi

Resulting (cleaned-up) URL:

/folder/index.html?knownParamA=1234&knownParamB=1234

Ultimately, I want to capture a cleaned-up version of any URL with only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.

Thank you so much for the help!!!

Solution Based on Feedback Below:

pageURL = window.location.pathname + window.location.search;
knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
// Step 1 strips every parameter whose name is not in knownParams.
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
// Step 2 captures the first remaining parameter so it can be re-prefixed with "?".
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');

1 solution

#1


Negative searches are tricky, and require zero-width lookaheads.

This will find the unknown parameters and strip them out of the URL. (Update 2: it no longer keeps unknown parameters whose names merely start with a known parameter name.)

step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"

However, if the first parameter gets stripped out, your first remaining parameter will be preceded by a & instead of a ?, and you will need to replace that too:

clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
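The example URL above starts with a known parameter, so step 2 has nothing visible to fix there. Here is a minimal sketch of the case where it matters; the URL is made up for illustration and is not from the question:

```javascript
// Illustrative URL (an assumption): the FIRST parameter is not in the known list.
var url = '/folder/index.html?unknownParamA=1&knownParamA=2&unknownParamB=3&knownParamB=4';

// Step 1: strip every parameter whose name is not exactly knownParamA or knownParamB.
var step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');

// Step 2: the first remaining parameter is now introduced by "&"; turn that into "?".
var clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');

console.log(step1); // "/folder/index.html&knownParamA=2&knownParamB=4"
console.log(clean); // "/folder/index.html?knownParamA=2&knownParamB=4"
```

Step 2 has no g flag, so it only touches the first separator and leaves the later & characters alone.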

You can chain these together, of course:

clean = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '').
  replace(/[?&]([^=]+=[^&]*)/, '?$1');

Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.

pageURL = window.location.pathname + window.location.search;
knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
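If you want to reuse this, the two steps could be wrapped in a function along these lines. This is only a sketch: cleanUrl and its knownParams array argument are names I made up, not part of the code above. It also escapes regex metacharacters in the names, and it assumes every parameter has the form name=value (a bare flag such as ?debug with no = would confuse the [^=]+ part).

```javascript
// Hypothetical helper (not from the original answer): wraps both steps in one function.
// Assumes every parameter looks like name=value.
function cleanUrl(url, knownParams) {
  // Escape regex metacharacters so a name like "a.b" is matched literally.
  var names = knownParams.map(function (name) {
    return name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  }).join('|');

  // Step 1: strip parameters whose name is not exactly one of the known names.
  var stripUnknown = new RegExp('[?&](?!(?:' + names + ')(?==))[^=]+=[^&]*', 'gi');
  // Step 2: make sure the first remaining parameter is introduced by "?".
  var fixFirstSeparator = /[?&]([^=]+=[^&]*)/;

  return url.replace(stripUnknown, '').replace(fixFirstSeparator, '?$1');
}

console.log(cleanUrl('/folder/index.html?x=1&knownParamA=2', ['knownParamA', 'knownParamB']));
// "/folder/index.html?knownParamA=2"
```

Passing the names as an array rather than a pre-joined string keeps the escaping in one place.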

To help you interpret these regexes:

  • [?&] = either ? or &

  • (...) = capturing group

  • (?!...) = not followed by a match for this group (negative lookahead)

  • (?:...) = non-capturing group

  • (?=...) = followed by a match for this group (positive lookahead)

  • = = a literal =

  • [^=] = any character other than =

  • + = one or more times

  • [^&] = any character other than &

  • * = zero or more times

Outside the regex body,

  • The g flag means 'all matches' (as opposed to only the first)

  • The i flag means 'case-insensitive'

  • In the replacement string, $1 means 'captured group 1'
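To see what the (?==) part buys you, here is a small comparison. The parameter name knownParamAExtra is made up for illustration; this is presumably the situation Update 2 refers to.

```javascript
// Illustrative URL (an assumption): the second name merely STARTS with a known name.
var url = '/folder/index.html?knownParamA=1&knownParamAExtra=2';

// Without (?==): the lookahead sees the prefix "knownParamA" and so keeps "knownParamAExtra".
var withoutCheck = url.replace(/[?&](?!(?:knownParamA|knownParamB))[^=]+=[^&]*/gi, '');

// With (?==): a known name must be followed immediately by "=", so only exact names are kept.
var withCheck = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');

console.log(withoutCheck); // "/folder/index.html?knownParamA=1&knownParamAExtra=2"
console.log(withCheck);    // "/folder/index.html?knownParamA=1"
```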
