python xml.dom中的XML字符

I am working on producing an xml document from python. We are using the xml.dom package to create the xml document. We are having a problem where we want to produce the character φ which is a φ. However, when we put that string in a text node and call toxml() on it we get &#x03c6;. Our current solution is to use saxutils.unescape() on the result of toxml() but this is not ideal because we will have to parse the xml twice.

我正在从python生成一个xml文档。我们使用xml.dom包来创建xml文档。我们遇到了一个问题,我们想要制作角色φ这是一个φ。但是,当我们将该字符串放在文本节点中并在其上调用toxml()时,我们得到φ。我们当前的解决方案是在toxml()的结果上使用saxutils.unescape()但这并不理想,因为我们必须解析xml两次。

Is there someway to get the dom package to recognize "φ" as an xml character?

有没有办法让dom包识别“φ”作为一个xml字符?

1 个解决方案

#1

I think you need to use a Unicode string with \u03c6 in it, because the .data field of a text node is supposed (as far as I understand) to be "parsed" data, not including XML entities (whence the & when made back into XML). If you want to ensure that, on output, non-ascii characters are expressed as entities, you could do:

我认为你需要在其中使用带有\ u03c6的Unicode字符串,因为文本节点的.data字段被认为(据我所知)是“解析”数据,不包括XML实体(从而& when回到XML)。如果要确保在输出时将非ascii字符表示为实体,则可以执行以下操作:

import codecs

def ent_replace(exc):
  if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
    s = []
    for c in exc.object[exc.start:exc.end]:
      s.append(u'&#x%4.4x;' % ord(c))
    return (''.join(s), exc.end)
  else:
    raise TypeError("can't handle %s" % exc.__name__)

codecs.register_error('ent_replace', ent_replace)

and use x.toxml().encode('ascii', 'ent_replace').

并使用x.toxml()。encode('ascii','ent_replace')。

#1

import codecs

def ent_replace(exc):
  if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
    s = []
    for c in exc.object[exc.start:exc.end]:
      s.append(u'&#x%4.4x;' % ord(c))
    return (''.join(s), exc.end)
  else:
    raise TypeError("can't handle %s" % exc.__name__)

codecs.register_error('ent_replace', ent_replace)

and use x.toxml().encode('ascii', 'ent_replace').

并使用x.toxml()。encode('ascii','ent_replace')。

秒客网

python xml.dom中的XML字符

1 个解决方案

#1

#1

相关文章