How do I use a regular expression in Python to search for a word and then replace text?

Time: 2022-03-30 16:51:33

I'm trying to write a script that will search through an HTML file and then replace the form action. So in this basic code:

<html>
    <head>
        <title>Forms</title>
    </head>
    <body>
    <form action="login.php" method="post">
        Username: <input type="text" name="username" value="" />
        <br />
        Password: <input type="password" name="password" value="" /> 
        <br />
        <input type="submit" name="submit" value="Submit">
    </form>
    </body>
</html>

I would like the script to search for form action="login.php" but replace only login.php, with, say, newlogin.php. The key thing is that the form action may differ from file to file, i.e. in another HTML file login.php might be something totally different, so the regular expression has to search for form action= and replace the text after it (maybe using the " characters as delimiters?).

My knowledge of regular expressions is pretty basic, for example I'd know how to replace just login.php:

(re.sub('login.php', 'newlogin.php', line))

but obviously it's no use as mentioned above if the login.php changes from file to file.

Any help is much appreciated!

Thanks all =)

3 solutions

#1


1  

Make the regex capture two groups: the first is the form tag and everything up to and including the first quote after action, and the second is the action value itself.

Use the first group in the replacement, followed by the new action:

re.sub(r'(<form.*?action=")([^"]+)', r'\1newlogin.php',  content)
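To try that on the HTML from the question, here is a minimal sketch. The `content` string and the `result` name are just for illustration; in a real script `content` would be the text read from the HTML file. Note that `.` does not match newlines by default, so this assumes `action` is on the same line as `<form`.

```python
import re

# Stand-in for the file contents; a real script would read these from disk.
content = """<html>
    <body>
    <form action="login.php" method="post">
        <input type="submit" name="submit" value="Submit">
    </form>
    </body>
</html>
"""

# Group 1: "<form" up to and including the opening quote of the action value.
# Group 2: the old action value, which the replacement simply leaves out.
result = re.sub(r'(<form.*?action=")([^"]+)', r'\1newlogin.php', content)
print(result)
```

If the new value could start with a digit, write the replacement as `\g<1>` followed by the value, so the digit isn't read as part of the group number.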

#2


2  

You can use a regex, or just simple string manipulation. Here's a quick test case:

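with open("file") as f:
    for line in f:
        if '<form action="' in line:
            line = line.rstrip()
            # a[-1] is everything after the opening quote, starting with the old value
            a = line.split('<form action="')
            # drop the old value up to its closing quote, keep the rest of the tag
            a[-1] = 'newlogin.php"' + a[-1].split('"', 1)[-1]
            line = '<form action="'.join(a)
        print(line.rstrip("\n"))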
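The same idea can be wrapped in a small function so it's easy to test on one line at a time. This is only a sketch: the name `swap_form_action` and the `prefix` parameter are made up for illustration, and it assumes the tag is written exactly as `<form action="` with double quotes.

```python
def swap_form_action(line, new_action, prefix='<form action="'):
    """Return line with the value that follows prefix replaced by new_action."""
    before, found, rest = line.partition(prefix)
    if not found:
        return line  # no form tag on this line; leave it untouched
    _old, quote, after = rest.partition('"')
    if not quote:
        return line  # no closing quote found; leave it untouched
    return before + prefix + new_action + quote + after


print(swap_form_action('    <form action="login.php" method="post">', "newlogin.php"))
```

Unlike the regex versions, this won't cope with extra attributes between `<form` and `action`, or with single-quoted values.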

#3


0  

You can try this technique:

(<form[^>]*action=")[^"]*

In Python (content is the HTML text, new_value is the new action):

re.sub(r'(<form[^>]*action=")[^"]*', lambda m: m.group(1) + new_value, content)

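If your regex engine supports variable-length lookbehind (.NET does), you can match only the value itself with the regex below. Python's built-in re module does not: it requires fixed-width lookbehind and raises an error on this pattern, so in Python stick with the capture-group version above: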

(?<=<form[^>]*action=")[^"]*
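A quick way to check this in Python is sketched below. It tries to compile the lookbehind pattern with the standard `re` module, then falls back to the capture-group pattern; the variable names and the sample `html` string are assumptions for illustration.

```python
import re

lookbehind = r'(?<=<form[^>]*action=")[^"]*'

try:
    re.compile(lookbehind)
    supported = True
except re.error as exc:
    # The standard re module only allows fixed-width lookbehind.
    supported = False
    print("re rejects the lookbehind pattern:", exc)

# Capture-group equivalent that re accepts: keep group 1, replace the rest.
pattern = re.compile(r'(<form[^>]*action=")[^"]*')
html = '<form action="login.php" method="post">'
result = pattern.sub(lambda m: m.group(1) + "newlogin.php", html)
print(result)
```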
