I have an XML string that is posted to an .ashx handler on the server. The string is built on the client side from several entries made on a form. Occasionally users copy and paste text from other sources into the web form. When I try to load the string into an XmlDocument object using xmldoc.LoadXml(xmlStr), I get the following exception:
System.Xml.XmlException = {"'', hexadecimal value 0x0B, is an invalid character. Line 2, position 1."}
In debug mode I can see the rogue character (sorry, I'm not sure of its official name).
My question is: how can I sanitise the XML string before I attempt to load it into the XmlDocument object? Do I need a custom function that strips out these sorts of characters one by one, or can I use a native .NET 4 class to remove them?
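For reference, the failure can be reproduced with a minimal sketch like the one below. The class name InvalidCharRepro, the helper TryLoad, and the sample strings are made up for illustration; the only assumption taken from the question is that the pasted text contains a raw 0x0B (vertical tab) character.

```csharp
using System;
using System.Xml;

public static class InvalidCharRepro
{
    // Hypothetical helper: returns the exception message,
    // or null if the string loads cleanly.
    public static string TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        // \u000B is the vertical tab (0x0B) named in the exception,
        // placed at line 2, position 1 of the document.
        string bad = "<root>\n\u000Bpasted text</root>";
        string good = "<root>\npasted text</root>";

        Console.WriteLine(TryLoad(bad) ?? "loaded");
        Console.WriteLine(TryLoad(good) ?? "loaded");
    }
}
```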
2 solutions
#1
23
Here is an example that removes invalid XML characters using Regex:
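// Strip invalid characters before loading the document.
xmlString = CleanInvalidXmlChars(xmlString);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);

// Requires: using System.Text.RegularExpressions;
public static string CleanInvalidXmlChars(string text)
{
    // XML 1.0 allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.
    // In .NET regex, \x takes exactly two hex digits, so larger code points need \uXXXX;
    // characters above U+FFFF are surrogate pairs, so only unpaired surrogates are removed.
    string re = @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
    return Regex.Replace(text, re, "");
}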
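On the "native .NET 4 class" part of the question: XmlConvert has had IsXmlChar and IsXmlSurrogatePair since .NET 4, so a filter can be written without a hand-built pattern. The following is only a sketch; the class name XmlSanitizer and the method StripInvalidXmlChars are hypothetical, and it assumes that dropping the offending characters (rather than replacing them) is acceptable.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: keeps only characters that XmlConvert
    // accepts as valid XML 1.0 characters.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // A valid surrogate pair encodes a character above U+FFFF; keep both halves.
            // Note the argument order: low surrogate first, then high surrogate.
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string dirty = "<root>\u000Bpasted text</root>";
        var doc = new XmlDocument();
        doc.LoadXml(StripInvalidXmlChars(dirty));
        Console.WriteLine(doc.DocumentElement.InnerText); // prints "pasted text"
    }
}
```

Tab, line feed and carriage return pass through unchanged, since they are valid XML characters.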
#2
2
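An alternative that avoids pre-processing the string is to set the CheckCharacters flag in XmlReaderSettings to false. Be aware that, according to the XmlReaderSettings documentation, a reader processing text data always validates text content; the flag only turns off checking for character entity references (such as &#x0B;), so a raw 0x0B character may still throw.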
var xmlDoc = new XmlDocument();
var xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
using (var stringReader = new StringReader(xml))
{
    using (var xmlReader = XmlReader.Create(stringReader, xmlReaderSettings))
    {
        xmlDoc.Load(xmlReader);
    }
}
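Before relying on this flag, it may be worth checking how it behaves for your input. The sketch below feeds the same reader setup two variants of the problem character: one written as a character entity reference and one as a raw character, as in the question. The class name CheckCharactersDemo and the helper LoadsWithoutCharacterChecks are made up for illustration. Based on the documented behaviour of CheckCharacters, only the entity form should be expected to load.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CheckCharactersDemo
{
    // Hypothetical helper: returns true if the XML loads with
    // CheckCharacters = false, false if an XmlException is thrown.
    public static bool LoadsWithoutCharacterChecks(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        try
        {
            using (var stringReader = new StringReader(xml))
            using (var xmlReader = XmlReader.Create(stringReader, settings))
            {
                new XmlDocument().Load(xmlReader);
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // 0x0B written as a character entity reference.
        Console.WriteLine(LoadsWithoutCharacterChecks("<root>&#x0B;</root>"));
        // 0x0B as a raw character, as in the question.
        Console.WriteLine(LoadsWithoutCharacterChecks("<root>\u000B</root>"));
    }
}
```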