How to write an XML declaration using xml.etree.ElementTree


72

I am generating an XML document in Python using ElementTree, but the tostring function does not include an XML declaration when converting to plain text.

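from xml.etree.ElementTree import Element, SubElement, tostring

document = Element('outer')
node = SubElement(document, 'inner')
node.set('NewValue', '1')  # XML attributes are set with set(), using string values
print(tostring(document))  # Outputs b'<outer><inner NewValue="1" /></outer>' on Python 3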

I need my string to include the following XML declaration:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

However, there does not seem to be any documented way of doing this.

Is there a proper method for rendering the XML declaration in ElementTree?

Answers:


115

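I am surprised to find that there doesn't seem to be a way to do this with ElementTree.tostring(). You can, however, use ElementTree.ElementTree.write() to write your XML document to a fake file: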

from io import BytesIO
from xml.etree import ElementTree as ET

document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
et = ET.ElementTree(document)

f = BytesIO()
et.write(f, encoding='utf-8', xml_declaration=True) 
print(f.getvalue())  # your XML file, encoded as UTF-8

See this question. Even then, I don't think you can get the "standalone" attribute without prepending it yourself.
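Since the answer above suggests that the standalone attribute has to be written by hand, here is a minimal sketch of one way to do that. It assumes UTF-8 output, and the helper name tostring_with_standalone is made up for illustration; it is not part of ElementTree.

```python
from io import BytesIO
from xml.etree import ElementTree as ET


def tostring_with_standalone(element, standalone="yes"):
    """Hypothetical helper: serialize element with a hand-written declaration."""
    declaration = (
        '<?xml version="1.0" encoding="UTF-8" standalone="%s" ?>\n' % standalone
    )
    buffer = BytesIO()
    # xml_declaration=False stops write() from emitting its own declaration
    ET.ElementTree(element).write(buffer, encoding="UTF-8", xml_declaration=False)
    return declaration.encode("utf-8") + buffer.getvalue()


document = ET.Element("outer")
ET.SubElement(document, "inner")
xml_bytes = tostring_with_standalone(document)
print(xml_bytes.decode("utf-8"))
```

The declaration and the body are both encoded as UTF-8 here, so the label in the declaration matches the actual bytes. If you serialize with a different encoding, the hand-written declaration would need to change too.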


Why is the "node" variable defined here?
Rail Suleymanov

6
Thanks, this line et.write(f, encoding='utf-8', xml_declaration=True) saved my day
Vineel

Is there a pretty-print parameter for et.write()? Or any other way to generate XML with line breaks?
jan-seins

@jan-seins Yes, see stackoverflow.com/questions/749796/…
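As a rough illustration of the pretty-printing question in the comments: assuming Python 3.9 or later, where xml.etree.ElementTree.indent() is available, the tree can be indented in place before it is written together with the declaration. This is only a sketch of one option, not necessarily what the linked question recommends.

```python
from io import BytesIO
from xml.etree import ElementTree as ET

document = ET.Element("outer")
ET.SubElement(document, "inner")

# indent() modifies the tree in place by inserting whitespace (Python 3.9+)
ET.indent(document, space="  ")

buffer = BytesIO()
ET.ElementTree(document).write(buffer, encoding="utf-8", xml_declaration=True)
pretty = buffer.getvalue().decode("utf-8")
print(pretty)
```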

29

I would use lxml (see http://lxml.de/api.html).

Then you can:

from lxml import etree
document = etree.Element('outer')
node = etree.SubElement(document, 'inner')
print(etree.tostring(document, xml_declaration=True))

21

If you include encoding='utf8', you will get an XML header:

xml.etree.ElementTree.tostring writes an XML encoding declaration when called with encoding='utf8'

Sample Python code (works with Python 2 and 3):

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(
    ElementTree.fromstring('<xml><test>123</test></xml>')
)
root = tree.getroot()

print('without:')
print(ElementTree.tostring(root, method='xml'))
print('')
print('with:')
print(ElementTree.tostring(root, encoding='utf8', method='xml'))

Python 2 output:

$ python2 example.py
without:
<xml><test>123</test></xml>

with:
<?xml version='1.0' encoding='utf8'?>
<xml><test>123</test></xml>

With Python 3 you will notice the b prefix, indicating that byte literals are returned (just as with Python 2):

$ python3 example.py
without:
b'<xml><test>123</test></xml>'

with:
b"<?xml version='1.0' encoding='utf8'?>\n<xml><test>123</test></xml>"

In Python 3, the escape characters show up in the declaration when printing: <?xml version=\'1.0\' encoding=\'utf8\'?>
Stevoisiak

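What helped me here was wondering why you were doing so much with ElementTree.ElementTree(ElementTree.fromstring(...; I now realise that fromstring returns an Element, not an ElementTree, whereas the parse method does return an ElementTree. That makes trying to mock XML files in a test suite with strings very confusing! If you take the element and run tostring on it, it accepts the encoding and method arguments, but the output lacks the <?xml declaration line, which I now see is because it is not a full document.
Davos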

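Note that utf8 is not a valid character encoding string. This is also the reason why Python 3 adds the declaration and returns the whole thing as bytes instead of a string.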
mbirth

@mbirth So the method should be called "tobytes" rather than "tostring".
Marek Marczak

@MarekMarczak No, the XML should read encoding='utf-8' to be valid.
mbirth

3

I ran into this issue recently. After some digging in the code, I found that the following snippet is the definition of the function ElementTree.write:

def write(self, file, encoding="us-ascii"):
    assert self._root is not None
    if not hasattr(file, "write"):
        file = open(file, "wb")
    if not encoding:
        encoding = "us-ascii"
    elif encoding != "utf-8" and encoding != "us-ascii":
        file.write("<?xml version='1.0' encoding='%s'?>\n" % 
     encoding)
    self._write(file, self._root, encoding, {})

So the answer is: if you need to write the XML header to your file, set the encoding argument to something other than utf-8 or us-ascii, e.g. UTF-8


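That would be a nice hack, albeit a brittle one, but it doesn't seem to work (the encoding is probably lowercased before that comparison). Also, ElementTree.ElementTree.write() is documented to have an xml_declaration parameter (see the accepted answer). But ElementTree.tostring(), which is the method asked about in the original question, does not have that parameter.
Quentin Pradet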
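To see how the encoding name affects the automatic declaration on a current interpreter, a small experiment like the one below can help. It is a sketch that assumes Python 3, where the name appears to be compared case-insensitively against utf-8 and us-ascii; the quoted source above is from an older version and may behave differently.

```python
import xml.etree.ElementTree as ET

document = ET.Element("outer")
ET.SubElement(document, "inner")

# Record, for each encoding name, whether tostring() added a declaration
results = {}
for encoding in ("utf-8", "UTF-8", "us-ascii", "utf8", "iso-8859-1"):
    output = ET.tostring(document, encoding=encoding)
    results[encoding] = output.startswith(b"<?xml")
    print(encoding, results[encoding])
```

On Python 3 this should report no declaration for utf-8, UTF-8 and us-ascii, and a declaration for utf8 and iso-8859-1, which matches the comment that the uppercase trick does not work there.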

2

A minimal working example using the ElementTree package:

import xml.etree.ElementTree as ET

document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
node.text = '1'
res = ET.tostring(document, encoding='utf8', method='xml').decode()
print(res)

The output is:

<?xml version='1.0' encoding='utf8'?>
<outer><inner>1</inner></outer>

3
Unfortunately 'utf8' is not valid XML, but 'UTF-8' is: docs.python.org/3.8/library/xml.etree.elementtree.html#id6
空袭

1

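Another very simple option is to concatenate the desired header to the XML string, like this:

import xml.etree.ElementTree as ET

# `root` is assumed to be an existing Element.
# A bytes literal works on Python 2 and 3; bytes(text, encoding=...) raises TypeError on Python 2.
xml = b'<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding='utf-8')
# Write the bytes directly so the file content really is UTF-8
with open('invoice.xml', 'wb') as f:
    f.write(xml)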

It gives this error: TypeError: str() takes at most 1 argument (2 given)
Panduranga Rao Sadhu

1

Easy

Sample for both Python 2 and 3 (the encoding parameter must be utf8):

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(ElementTree.fromstring('<xml><test>123</test></xml>'))
root = tree.getroot()
print(ElementTree.tostring(root, encoding='utf8', method='xml'))

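Since Python 3.8 there is an xml_declaration parameter for this:

New in version 3.8: The xml_declaration and default_namespace parameters.

xml.etree.ElementTree.tostring(element, encoding="us-ascii", method="xml", *, xml_declaration=None, default_namespace=None, short_empty_elements=True) Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements have the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

Sample for Python 3.8 and higher: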

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(ElementTree.fromstring('<xml><test>123</test></xml>'))
root = tree.getroot()
print(ElementTree.tostring(root, encoding='unicode', method='xml', xml_declaration=True))
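With encoding='unicode' the result is a str, and the encoding name written into the declaration is chosen by the library rather than by the caller (as far as I can tell, it is taken from the locale). If a fixed label matters, one option is to request bytes with an explicit encoding and decode afterwards. The sketch below assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<xml><test>123</test></xml>")

# Ask for bytes with an explicit encoding so the declaration says UTF-8
xml_bytes = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
xml_text = xml_bytes.decode("utf-8")
print(xml_text)
```</xml>"
</test>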

1

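The xml_declaration argument

Is there a proper method for rendering the XML declaration in ElementTree?

Yes, and there is no need to use the .tostring function. According to the ElementTree documentation, you should create an ElementTree object, create the Element and SubElements, set the tree's root, and finally use the xml_declaration argument of the .write function, so that the declaration line is included in the output file.

You can do it this way: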

import xml.etree.ElementTree as ET

document = ET.Element("outer")
node1 = ET.SubElement(document, "inner")
node1.text = "text"

# Pass the root element to the constructor instead of calling the private _setroot()
tree = ET.ElementTree(document)
tree.write("./output.xml", encoding="UTF-8", xml_declaration=True)

The output file is:

<?xml version='1.0' encoding='UTF-8'?>
<outer><inner>text</inner></outer>

0

I would use ET

try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
               try:
                   # normal ElementTree install
                   import elementtree.ElementTree as etree
                   print("running with ElementTree")
               except ImportError:
                   print("Failed to import ElementTree from any known place")

document = etree.Element('outer')
node = etree.SubElement(document, 'inner')
print(etree.tostring(document, encoding='UTF-8', xml_declaration=True))

0

This works if you only want to print. I got an error when I tried to send it to a file...

import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement, Comment, tostring

def prettify(elem):
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")
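The error seen when writing to a file is not described, so the following is only a guess at a workaround: ask toprettyxml() for bytes through its encoding argument and open the file in binary mode. The helper name prettify_bytes and the file name pretty.xml are placeholders for illustration.

```python
import os
import tempfile
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET


def prettify_bytes(elem):
    """Return pretty-printed UTF-8 bytes, including an XML declaration."""
    rough_string = ET.tostring(elem, "utf-8")
    reparsed = minidom.parseString(rough_string)
    # With encoding set, toprettyxml() returns bytes instead of str
    return reparsed.toprettyxml(indent="  ", encoding="utf-8")


document = ET.Element("outer")
ET.SubElement(document, "inner")

path = os.path.join(tempfile.gettempdir(), "pretty.xml")
with open(path, "wb") as f:  # binary mode, because we are writing bytes
    f.write(prettify_bytes(document))
```

Writing bytes in binary mode avoids relying on the platform's default text encoding, which is one common source of errors when a str containing non-ASCII characters is written to a file.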

0

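Including 'standalone' in the declaration

I did not find any alternative way of adding the standalone argument in the documentation, so I adapted the ET.tostring function to take the declaration as an argument.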

from xml.etree import ElementTree as ET

# Sample
document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
et = ET.ElementTree(document)

# Function that you need
def tostring(element, declaration, encoding=None, method=None):
    class dummy:
        pass
    data = []
    data.append(declaration + "\n")
    file = dummy()
    file.write = data.append
    ET.ElementTree(element).write(file, encoding, method=method)
    return "".join(data)

# Working example
# On Python 3, encoding='unicode' makes write() emit str chunks, which can be
# joined with the str declaration (a byte encoding would produce bytes chunks).
xdec = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>"""
xml = tostring(document, encoding='unicode', declaration=xdec)
Licensed under cc by-sa 3.0 with attribution required.