Thank you for looking at this!
I am trying to build a JavaScript regex that matches ALL URL parameters (and their values) that are not in my predefined list. Example:
Raw URL:
/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234
My list of known parameters:
((knownParamA|knownParamB|knownParamC)=[^&]*&?)/gi
Resulting (cleaned-up) URL:
/folder/index.html?knownParamA=1234&knownParamB=1234
Ultimately, I want to capture a cleaned-up version of any URL that contains only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.
Thank you so much for the help!!!
Solution based on the feedback below:
var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
// Step 1: strip every ?name=value or &name=value pair whose name is not in knownParams
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
// Step 2: if the first remaining pair now starts with &, turn that & back into ?
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
Answer:
Negative searches are tricky and require zero-width lookaheads.
This finds the unknown parameters and strips them out of the URL. (Update 2: it no longer keeps unknown parameters whose names merely start with a known parameter name.)
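var url = '/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234';
var step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"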
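To see the Update 2 behavior, here is a small check that uses a made-up parameter, knownParamAX (hypothetical, not from the question), whose name only begins with knownParamA. The (?==) inside the negative lookahead requires an = immediately after the known name, so this parameter counts as unknown and is stripped.

```javascript
// Hypothetical URL: knownParamAX is an invented name that only *starts* with a known name.
const url = '/folder/index.html?knownParamA=1234&knownParamAX=5678&knownParamB=1234';

// Same step-1 regex as above: drop every ?name=value or &name=value pair
// whose name is not exactly knownParamA or knownParamB.
const step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');

console.log(step1); // "/folder/index.html?knownParamA=1234&knownParamB=1234"
```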
However, if the first parameter gets stripped out, your first remaining parameter will be preceded by & instead of ?, and you will need to replace that too:
var clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
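The example URL never triggers step 2, because its first parameter is already known. The sketch below uses a reordered, hypothetical URL in which the first parameter is unknown. Step 1 then leaves a dangling & after the path, and step 2 turns it back into ?.

```javascript
// Hypothetical URL: the first parameter (unknownParamA) is not in the known list.
const url = '/folder/index.html?unknownParamA=1234&knownParamA=1234&knownParamB=1234';

// Step 1 removes the unknown pair, including its leading "?".
const step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
console.log(step1); // "/folder/index.html&knownParamA=1234&knownParamB=1234"

// Step 2 (no g flag) rewrites only the first remaining "?" or "&" as "?".
const clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
console.log(clean); // "/folder/index.html?knownParamA=1234&knownParamB=1234"
```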
You can chain these together, of course:
var clean = url
  .replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '')
  .replace(/[?&]([^=]+=[^&]*)/, '?$1');
Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.
var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
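If the known parameter names live in an array instead of a hand-written 'a|b|c' string, the same two steps can be wrapped in a function. This is a minimal sketch: the names cleanUrl and escapeRegExp are hypothetical. It also escapes regex metacharacters in each name, which the original code does not do, so a name containing a dot (for example) is matched literally. It takes the URL string as an argument instead of reading window.location, so it can also run outside a browser.

```javascript
// Escape characters that have special meaning in a RegExp pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: keep only the query parameters whose names appear in knownNames.
function cleanUrl(url, knownNames) {
  const known = knownNames.map(escapeRegExp).join('|');
  // Step 1: remove every ?name=value / &name=value pair whose name is not known.
  const step1 = new RegExp('[?&](?!(?:' + known + ')(?==))[^=]+=[^&]*', 'gi');
  // Step 2: make sure the first surviving pair starts with "?" rather than "&".
  const step2 = /[?&]([^=]+=[^&]*)/;
  return url.replace(step1, '').replace(step2, '?$1');
}

// Usage in a browser, mirroring the code above:
// const cleanPageURL = cleanUrl(window.location.pathname + window.location.search,
//                               ['knownParamA', 'knownParamB', 'knownParamC', 'knownParamD']);
```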
To help you interpret these regexes:
- [?&] = either ? or &
- (...) = capturing group
- (?!...) = not followed by a match for this group (negative lookahead)
- (?:...) = non-capturing group
- (?=...) = followed by a match for this group (positive lookahead)
- = = a literal = character
- [^=] = any character other than =
- + = one or more times
- [^&] = any character other than &
- * = zero or more times
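Putting the pieces of that list together, the step-1 pattern can be written as concatenated, commented fragments. This sketch assembles the same pattern as the regex literal above (with knownParamA and knownParamB as the known names) and checks that the two sources are identical.

```javascript
const known = 'knownParamA|knownParamB';

const step1Pattern =
  '[?&]' +                  // either ? or &
  '(?!' +                   // not followed by...
    '(?:' + known + ')' +   // ...a known name (non-capturing group)...
    '(?==)' +               // ...that is immediately followed by =
  ')' +
  '[^=]+' +                 // the parameter name: one or more characters other than =
  '=' +                     // a literal =
  '[^&]*';                  // the value: zero or more characters other than &

const step1 = new RegExp(step1Pattern, 'gi');

// Compare with the regex literal used in the answer.
console.log(step1.source === /[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi.source); // true
```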
Outside the regex body,
- The g flag means 'all matches' (as opposed to only the first)
- The i flag means 'case-insensitive'
- In the replacement string, $1 means 'captured group 1'
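The g flag is why step 1 removes every unknown pair. Its absence is why step 2 touches only the first ? or &. Here is a quick sketch of what would happen if step 2 used g by mistake, on a hypothetical string that step 1 has already processed:

```javascript
const step1Result = '/folder/index.html&knownParamA=1234&knownParamB=1234';

// Without g: only the first [?&] pair is rewritten; $1 re-inserts the captured "name=value".
const withoutG = step1Result.replace(/[?&]([^=]+=[^&]*)/, '?$1');
console.log(withoutG); // "/folder/index.html?knownParamA=1234&knownParamB=1234"

// With g: every & would become ?, which breaks the query string.
const withG = step1Result.replace(/[?&]([^=]+=[^&]*)/g, '?$1');
console.log(withG); // "/folder/index.html?knownParamA=1234?knownParamB=1234"
```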