I have a CSV file that has 3 columns: Site ID, HTML Header, HTML Footer
I need to go through the HTML Header and Footer columns, locate any version of the Google Analytics tracking code, and remove it, while leaving everything else in those cells intact.
I tried using this regex: <script(?m:.*?)\'UA-.{8,12}\'(?m:.*?)</script>
but it seems to be getting thrown off and removing too much, probably from some malformed code somewhere in the CSV.
Any ideas on a better way to do this?
1 Solution
#1
1
For a quick, hacky one-off replacement, you can probably fix it by avoiding ungreedy repetition and excluding <script or </script from the allowed sequences within the repetition, so that a single match can never run from one script block into the next. Replace both .*? with
(?:(?!</?script).)*
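
As a rough illustration of that substitution, here is a minimal Python sketch. It is only one possible reading of the answer: the function names `strip_ga` and `clean_csv` are made up for this example, the column names are taken from the question, and `re.DOTALL` is used on the assumption that the `(?m:...)` groups in the original pattern were meant to let `.` match newlines. The character count in `.{8,12}` is kept exactly as in the question.

```python
import csv
import re

# The question's pattern with both ".*?" replaced by (?:(?!</?script).)* .
# Each "." is only consumed if "<script" or "</script" does not start there,
# so one match cannot span more than one script block.
# Assumption: re.DOTALL stands in for the original (?m:...) groups.
GA_SCRIPT = re.compile(
    r"<script(?:(?!</?script).)*'UA-.{8,12}'(?:(?!</?script).)*</script>",
    re.DOTALL | re.IGNORECASE,
)


def strip_ga(html):
    """Remove script blocks containing a quoted 'UA-...' id; keep the rest."""
    return GA_SCRIPT.sub("", html)


def clean_csv(src, dst, columns=("HTML Header", "HTML Footer")):
    """Copy CSV rows from src to dst, cleaning only the given columns.

    src and dst are open text file objects. The column names are assumed
    to match the header row described in the question.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in columns:
            row[col] = strip_ga(row[col] or "")
        writer.writerow(row)
```

For real files, opening both with `newline=""` (as the `csv` module documentation recommends) keeps line breaks inside quoted HTML cells intact. Like the original pattern, this only catches tracking ids wrapped in single quotes, and it remains a regex hack rather than real HTML parsing.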