Writing XML strings in the correct format?

Time: 2021-10-06 20:23:24

Please pardon my lack of proper terminology, as I'm sure there's a term for this. I'm writing XML text using raw strings (not with any kind of XML builder/parser, for ease of use). However, some characters in the data I'm providing break the resulting XML. For example, the & symbol: when a string includes it, the parser at the receiving end fails. How do I account for this properly and convert strings so they conform to the XML standard?

I'm writing plain strings to a string list and reading its Text property, as shown below. Note the subroutine A(const Text: String; const Indent: Integer = 0), which is a shorthand for adding a line to the XML output with the necessary indent. The subroutine Standardize is the part I need to fill in.

uses Windows, Classes, SysUtils, DB, ADODB, ActiveX;

function TSomething.FetchXML(const SQL: String): String;
var
  L: TStringList;
  Q: TADOQuery;
  X, Y: Integer;
  function Standardize(const S: String): String;
  begin
    Result:= S; //<<<--- Need to convert string to XML standards
  end;
  procedure A(const Text: String; const Indent: Integer = 0);
  var
    I: Integer;
    S: String;
  begin
    S:= '';
    for I := 1 to Indent do // one two-space indent per level
      S:= S + '  ';
    L.Append(S + Text);
  end;
begin
  Result:= '';
  L:= TStringList.Create;
  try
    Q:= TADOQuery.Create(nil);
    try
      Q.ConnectionString:= FCredentials.ConnectionString;
      Q.SQL.Text:= SQL;
      Q.Open;
      A('<?xml version="1.0" encoding="UTF-8"?>');
      A('<dataset Source="ECatAPI">');
      A('<table>');
      A('<fields>', 1);
      for X := 0 to Q.FieldCount - 1 do begin
        A('<field Name="'+Q.Fields[X].FieldName+'" '+
          'Type="'+IntToStr(Integer(Q.Fields[X].DataType))+'" '+
          'Width="'+IntToStr(Q.Fields[X].DisplayWidth)+'" />', 2);
      end;
      A('</fields>', 1);
      A('<rows>', 1);
      if not Q.IsEmpty then begin
        Q.First;
        while not Q.Eof do begin
          A('<row>', 2);
          for Y:= 0 to Q.FieldCount - 1 do begin
            A('<value Field="'+Q.Fields[Y].FieldName+'">'+
              Standardize(Q.Fields[Y].AsString)+'</value>', 3);
          end;
          A('</row>', 2);
          Q.Next;
        end;
      end;
      A('</rows>', 1);
      A('</table>');
      A('</dataset>');
      Result:= L.Text;
      Q.Close;
    finally
      Q.Free;
    end;
  finally
    L.Free;
  end;
end;

NOTE

The above is pseudo-code: it was copied and modified, and irrelevant parts have been altered or excluded.

MORE INFO

This application is a stand-alone web server providing read-only access to data. I only need to write XML data, not read it, and even if I did, I already have an XML parser library covering that part. I'm trying to keep this as lightweight as possible, without filling memory with unnecessary objects.

4 Answers

#1


4  

Do not generate XML by hand. Period.

Writing correct code for escaping complex data (for instance XML, HTML, or other SGML embedded in XML, or escaped CDATA) is not worth it.

The escaping you do is just a start. Wait until someone puts something in your data that is incompatible.
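
To make the "incompatible data" point concrete: XML 1.0 forbids most control characters (anything below U+0020 other than tab, line feed, and carriage return), and entity escaping cannot rescue them, because a character reference such as &#1; is rejected as well. Below is a minimal sketch of a filter in Delphi-style Pascal. The name StripInvalidXmlChars is hypothetical and not part of any library, and the sketch assumes that any surrogates in the input already form valid pairs.

```pascal
program StripDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the original post): drops characters that
// XML 1.0 does not allow in a document, even in escaped form.
function StripInvalidXmlChars(const S: string): string;
var
  I, N, Code: Integer;
begin
  SetLength(Result, Length(S)); // upper bound; trimmed at the end
  N := 0;
  for I := 1 to Length(S) do
  begin
    Code := Ord(S[I]);
    // Below U+0020 only tab, line feed and carriage return are legal.
    // U+FFFE and U+FFFF are never legal. Surrogates are passed through
    // on the assumption that they are correctly paired.
    if (Code = 9) or (Code = 10) or (Code = 13) or
       ((Code >= $20) and (Code <> $FFFE) and (Code <> $FFFF)) then
    begin
      Inc(N);
      Result[N] := S[I];
    end;
  end;
  SetLength(Result, N);
end;

begin
  // The #1 control character is dropped, the tab (#9) is kept.
  WriteLn(Length(StripInvalidXmlChars('a'#1'b'#9'c'))); // prints 4
end.
```

Such a filter would run before entity escaping. Whether silently dropping characters is acceptable depends on the data, which is part of why a real XML writer is usually the safer choice.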

Many databases support creating well-formed XML from queries anyway (see the other answers); that is the direction you should be looking in.

#2


3  

Another tip: maybe your database supports generating results as XML.

#3


1  

Jerry's solution is a good one.

It's worth noting that there are existing VCL routines that do this.

Unit IdStrings has StrXHtmlEncode(). This is identical to Jerry's solution.

Unit HttpApp has HTMLEncode(). This function is more efficient than Jerry's solution, but be warned: it is actually broken for Unicode strings. It worked correctly in pre-Unicode compilers, but was not correctly upgraded for Unicode, and the error has never been fixed.

A Unicode-safe version of HttpApp.HTMLEncode(), with the apos replacement added, is as follows. It's more verbose than the StringReplace() style, but a lot more efficient in terms of run-time performance. (apos is a predefined entity for XML and XHTML, but not for HTML 4.)

function XHTMLEncode( const sRawValue: string): string;
var
  Sp, Rp: PChar;
begin
  SetLength( result, Length( sRawValue) * 10);
  Sp := PChar( sRawValue);
  Rp := PChar( result);
  while Sp^ <> #0 do
  begin
    case Sp^ of
      '&': begin
             FormatBuf( Rp^, 10, '&amp;', 10, []);
             Inc(Rp,4);
           end;
      '<',
      '>': begin
             if Sp^ = '<' then
               FormatBuf(Rp^, 8, '&lt;', 8, [])
             else
               FormatBuf(Rp^, 8, '&gt;', 8, []);
             Inc(Rp,3);
           end;
      '"': begin
             FormatBuf(Rp^, 12, '&quot;', 12, []);
             Inc(Rp,5);
           end;
      '''': begin
             FormatBuf(Rp^, 12, '&apos;', 12, []);
             Inc(Rp,5);
           end;
    else
      Rp^ := Sp^
    end;
    Inc(Rp);
    Inc(Sp);
  end;
  SetLength( result, Rp - PChar( result))
end;
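
For readers who would rather avoid raw pointer arithmetic and FormatBuf, the same single-allocation idea can be sketched with plain string indexing: one pass to compute the exact output length, a second pass to fill it. This is only an illustrative variant, not the HttpApp implementation; the names XmlEscape and EntityFor are hypothetical, and no performance claim is made for it.

```pascal
program XmlEscapeDemo;
{$APPTYPE CONSOLE}

// Hypothetical two-pass encoder: measures first, then writes into a
// result string that is allocated exactly once.
function XmlEscape(const S: string): string;
var
  I, J, K, Total: Integer;
  E: string;

  // Returns the entity for C, or '' when C needs no escaping.
  function EntityFor(C: Char): string;
  begin
    case C of
      '&':  Result := '&amp;';
      '<':  Result := '&lt;';
      '>':  Result := '&gt;';
      '"':  Result := '&quot;';
      '''': Result := '&apos;';
    else
      Result := '';
    end;
  end;

begin
  // Pass 1: compute the exact length of the escaped text.
  Total := 0;
  for I := 1 to Length(S) do
  begin
    E := EntityFor(S[I]);
    if E = '' then
      Inc(Total)
    else
      Inc(Total, Length(E));
  end;

  // Pass 2: copy characters or entities into the preallocated result.
  SetLength(Result, Total);
  J := 0;
  for I := 1 to Length(S) do
  begin
    E := EntityFor(S[I]);
    if E = '' then
    begin
      Inc(J);
      Result[J] := S[I];
    end
    else
      for K := 1 to Length(E) do
      begin
        Inc(J);
        Result[J] := E[K];
      end;
  end;
end;

begin
  WriteLn(XmlEscape('Tom & "Jerry" <b>'));
  // prints: Tom &amp; &quot;Jerry&quot; &lt;b&gt;
end.
```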

#4


0  

Thanks to the comments on the question above, I've implemented a function that replaces the characters that have predefined entities with the corresponding entity references. This is the new subroutine:

function EncodeXmlStr(const S: String): String;
begin
  Result:= StringReplace(S,      '&',  '&amp;',  [rfReplaceAll]);
  Result:= StringReplace(Result, '''', '&apos;', [rfReplaceAll]);
  Result:= StringReplace(Result, '"',  '&quot;', [rfReplaceAll]);
  Result:= StringReplace(Result, '<',  '&lt;',   [rfReplaceAll]);
  Result:= StringReplace(Result, '>',  '&gt;',   [rfReplaceAll]);
end;
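
As a usage sketch: the field name in the question's code ends up inside an attribute value, so it needs the same treatment as the element content. The example below repeats the function so that it compiles on its own; ValueElement is a hypothetical helper introduced only for illustration. Note that the '&' replacement has to come first, otherwise the ampersands produced by the later replacements would be escaped a second time.

```pascal
program EncodeDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Same replacements as the function above. '&' is handled first so that
// the '&' of the entities added afterwards is not escaped again.
function EncodeXmlStr(const S: string): string;
begin
  Result := StringReplace(S,      '&',  '&amp;',  [rfReplaceAll]);
  Result := StringReplace(Result, '''', '&apos;', [rfReplaceAll]);
  Result := StringReplace(Result, '"',  '&quot;', [rfReplaceAll]);
  Result := StringReplace(Result, '<',  '&lt;',   [rfReplaceAll]);
  Result := StringReplace(Result, '>',  '&gt;',   [rfReplaceAll]);
end;

// Hypothetical helper: builds one <value> element, escaping both the
// attribute value and the element content.
function ValueElement(const FieldName, Value: string): string;
begin
  Result := '<value Field="' + EncodeXmlStr(FieldName) + '">' +
            EncodeXmlStr(Value) + '</value>';
end;

begin
  WriteLn(ValueElement('R&D "Cost"', '1 < 2'));
  // prints: <value Field="R&amp;D &quot;Cost&quot;">1 &lt; 2</value>
end.
```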
