How can I convert an HTML input string of the form:
String tag = "<input type=\"submit\" class=\"cssSubmit\"/>";
to
"<input type=\"submit\" class=\"cssSubmit disable\" disabled=\"disabled\"/>"
Is there a way to do this in Java or Groovy?
For example:
String convert(String input) {
    // input: <input type=\"submit\" class=\"cssSubmit\"/>
    // process the input string
    // processedString: <input type=\"submit\" class=\"cssSubmit disable\" disabled=\"disabled\"/>
    return processedString;
}
2 Answers
#1
This is the most generic way I can think of:
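import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EditTag {

    public static String editTagXML(String tag,
            Map<String, String> newAttributes,
            Collection<String> removeAttributes)
            throws SAXException, IOException,
            ParserConfigurationException, TransformerConfigurationException,
            TransformerException {
        // Parse the single tag as a tiny XML document
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(tag)));
        Element root = doc.getDocumentElement();
        NamedNodeMap attrs = root.getAttributes();
        for (String removeAttr : removeAttributes) {
            // removeNamedItem throws a DOMException if the attribute is absent
            if (attrs.getNamedItem(removeAttr) != null) {
                attrs.removeNamedItem(removeAttr);
            }
        }
        for (Map.Entry<String, String> addAttr : newAttributes.entrySet()) {
            final Attr attr = doc.createAttribute(addAttr.getKey());
            attr.setValue(addAttr.getValue());
            attrs.setNamedItem(attr); // replaces an existing attribute of the same name
        }
        // Serialize the document back to a string without the <?xml ...?> header
        StringWriter result = new StringWriter();
        final Transformer transformer = TransformerFactory.newInstance()
                .newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(result));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        // First run: includes class loading and parser/transformer setup
        long start = System.nanoTime();
        String tag = "<input type=\"submit\" class=\"cssSubmit\"/>";
        String edited = editTagXML(tag, new HashMap<String, String>() {{
            put("class", "cssSubmit disable");
            put("disabled", "disabled");
        }}, new ArrayList<>());
        long time = System.nanoTime() - start;
        System.out.println(edited);
        System.out.println("Time: " + time + " ns");

        // Second run: same work, with everything already loaded
        start = System.nanoTime();
        tag = "<input type=\"submit\" class=\"cssSubmit\"/>";
        editTagXML(tag, new HashMap<String, String>() {{
            put("class", "cssSubmit disable");
            put("disabled", "disabled");
        }}, new ArrayList<>());
        time = System.nanoTime() - start;
        System.out.println("Time2: " + time + " ns");
    }
}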
It is ugly, long, and complicated, throws a lot of checked exceptions, and changes the attribute order, which may or may not matter. It is probably not how it should be done. It is also fairly slow.
Here is the output:
<input class="cssSubmit disable" disabled="disabled" type="submit"/>
Time: 86213231 ns
Time2: 2379674 ns
The first run is probably so slow because it takes a while to load the necessary libraries. The second run is surprisingly fast, but my PC is pretty powerful too. If you put some constraints on your input (for example, attribute values are only ever quoted with ", and there is no " inside attribute values), there is probably a much better way to do it, such as regular expressions or maybe even simple iteration.
For example, if your input always looks like that, this could work just as well:
start = System.nanoTime();
edited = tag.replaceFirst("\"cssSubmit\"", "\"cssSubmit disable\" disabled=\"disabled\"");
time = System.nanoTime() - start;
System.out.println(edited);
System.out.println("Time3: " + time + " ns");
Output:
<input type="submit" class="cssSubmit disable" disabled="disabled"/>
Time3: 1422672 ns
Hmm. The funny thing is, it's not that much faster.
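One possible reason the numbers look so close: a single System.nanoTime() measurement of one call also includes class loading and JIT warm-up, and replaceFirst treats its first argument as a regular expression that is compiled on every call. The sketch below is only a rough way to get a steadier figure, not a proper benchmark. It assumes the input contains the literal text "cssSubmit" exactly once and swaps in String.replace (which replaces every literal occurrence) for replaceFirst. The names ReplaceTiming, disableLiteral and averageNanos are made up for this illustration.

```java
public class ReplaceTiming {
    static final String TAG = "<input type=\"submit\" class=\"cssSubmit\"/>";

    // Literal (non-regex) replacement; assumes "cssSubmit" occurs exactly once.
    static String disableLiteral(String tag) {
        return tag.replace("\"cssSubmit\"",
                "\"cssSubmit disable\" disabled=\"disabled\"");
    }

    // Average time per call, in nanoseconds, over the given number of iterations.
    static long averageNanos(int iterations) {
        long sink = 0; // consume the results so the loop body is not trivially dead
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += disableLiteral(TAG).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            throw new IllegalStateException("unexpected empty result");
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        averageNanos(100_000); // warm-up pass, result ignored
        System.out.println(disableLiteral(TAG));
        System.out.println("Average: " + averageNanos(100_000) + " ns per call");
    }
}
```

The absolute numbers will differ from machine to machine; for anything beyond a rough comparison, a benchmarking harness would be a better choice than a hand-written loop.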
OK, but what if we want a more generic solution, but still simple enough? We could use regular expressions:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches class="..." with double quotes and no spaces around '='
private static final Pattern classAttributePattern
        = Pattern.compile("\\bclass=\"([^\"]+)\"");

public static String disableTag(String tag) {
    Matcher matcher = classAttributePattern.matcher(tag);
    if (!matcher.find()) {
        throw new IllegalArgumentException("Doesn't match: " + tag);
    }
    int start = matcher.start();
    int end = matcher.end();
    String classValue = matcher.group(1);
    if (classValue.endsWith(" disable")) {
        return tag; // already disabled
    } else {
        // assume that if the class doesn't end with " disable",
        // then the disabled attribute is not present as well
        return tag.substring(0, start)
                + "class=\"" + classValue
                + " disable\" disabled=\"disabled\""
                + tag.substring(end);
    }
}
Note that using regular expressions on XML/(X)HTML is usually extremely error-prone. Here is a non-exhaustive list of example inputs that could break the code above:
- <input type="submit" class="cssSubmit disable " disabled="disabled"/> - this will break because of the space before the closing quote;
- <input type="submit" class='cssSubmit disable' disabled="disabled"/> - this will break because our code does not expect single quotes;
- <input type="submit" class = "cssSubmit" disabled="disabled"/> - this will break because there are spaces around the = sign;
- <input title='this is an input with class="cssSubmit" that could be changed to class="cssSubmit disable"' type="submit" class="cssSubmit" disabled="disabled"/> - this will break because there is attribute-like text in another attribute's value.
Each of these cases can be fixed by modifying the pattern in some way (although I'm not sure about the last one), but then you can always find yet another case where it breaks. So this technique is best used on input that was generated by a program rather than written by a human, and even then you should be careful about where that program's input came from (it could easily contain attribute values like the one in the last example).
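To illustrate what "modifying the pattern" might look like for the first three cases, here is a hedged sketch. It accepts either quote style, allows whitespace around the = sign, compares whole class names instead of using endsWith, and only adds the disabled attribute when none seems to be present. The class name TolerantDisable and both pattern constants are invented for this example. It still fails on the last case, because the pattern will happily match the class="..." text inside the title value first, and it normalizes the spacing of the class attribute it rewrites.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TolerantDisable {
    // class = "..." or class='...'; group 1 is the quote, group 2 the value.
    private static final Pattern CLASS_ATTR =
            Pattern.compile("\\bclass\\s*=\\s*([\"'])(.*?)\\1");
    // Rough check for an existing disabled attribute (same caveats as above).
    private static final Pattern DISABLED_ATTR =
            Pattern.compile("\\sdisabled\\s*=");

    public static String disableTag(String tag) {
        Matcher m = CLASS_ATTR.matcher(tag);
        if (!m.find()) {
            throw new IllegalArgumentException("No class attribute: " + tag);
        }
        String quote = m.group(1);
        String value = m.group(2).trim();
        // Compare whole class names so stray spaces do not matter
        boolean hasDisable = Arrays.asList(value.split("\\s+")).contains("disable");
        String newValue = hasDisable ? value
                : (value.isEmpty() ? "disable" : value + " disable");
        // Do not add a second disabled attribute if one is already there
        String disabledAttr = DISABLED_ATTR.matcher(tag).find()
                ? "" : " disabled=\"disabled\"";
        return tag.substring(0, m.start())
                + "class=" + quote + newValue + quote
                + disabledAttr
                + tag.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(disableTag("<input type=\"submit\" class=\"cssSubmit\"/>"));
        System.out.println(disableTag("<input type=\"submit\" class = 'cssSubmit'/>"));
    }
}
```

Even with these changes the approach remains text matching rather than parsing, so the warning about untrusted or hand-written input still applies.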
#2
You can do this in Groovy:
String tag = "<input type=\"submit\" class=\"cssSubmit\"/>"
tag = new XmlSlurper().parseText(tag).with { x ->
    x.@class = 'cssSubmit disable'
    x.@disabled = 'disabled'
    new groovy.xml.StreamingMarkupBuilder().bind { delegate.out << x }.toString()
}
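For readers who need plain Java, the same idea as the Groovy snippet (parse the tag, set two attributes, serialize it back) can be sketched with the JDK's DOM API alone. This is only a sketch under a few assumptions: the input is a single well-formed XML element, the class name DomDisable is made up for the example, and all checked exceptions are wrapped in one IllegalArgumentException for brevity. Unlike the Groovy version it appends " disable" to whatever class value is already there instead of hard-coding it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomDisable {
    public static String disable(String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tag)));
            Element root = doc.getDocumentElement();
            String classes = root.getAttribute("class"); // "" when absent
            // Append "disable" only if it is not already one of the class names
            if (!(" " + classes + " ").contains(" disable ")) {
                classes = classes.isEmpty() ? "disable" : classes + " disable";
            }
            root.setAttribute("class", classes);
            root.setAttribute("disabled", "disabled");

            // Serialize without the <?xml ...?> declaration
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parser and transformer exceptions are collapsed into one unchecked type
            throw new IllegalArgumentException("Cannot process tag: " + tag, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(disable("<input type=\"submit\" class=\"cssSubmit\"/>"));
    }
}
```

As with the first answer, the serializer decides the attribute order, so the output may not list the attributes in the order they appeared in the input.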