
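When writing shell scripts, data is usually stored in files of single-line records, such as CSV. Working with such data using grep and sed is very simple. But I often have to deal with XML, so I would really like a way to script access to XML data from the command line. What are the best tools?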

xml command-line scripting

— Joseph Holsten

xml_grep works for grepping.


I've found xmlstarlet to be very good at this sort of thing.

http://xmlstar.sourceforge.net/

It should also be available in most distributions' repositories. There is an introductory tutorial here:

http://www.ibm.com/developerworks/library/x-starlet.html

— Russ


I'd like to point out that Windows binaries are available on the SourceForge site.

— Steve Bennett

As far as I can tell, it doesn't support XQuery.

— Steve Bennett

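@SteveBennett Indeed it doesn't, but it adds enough on top of raw XPath to make it competitive with "grep and sed". If you want the fanciness of XQuery, that's more like the XML equivalent of perl or awk. :)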

— Charles Duffy


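Some promising tools:

nokogiri: parses HTML/XML DOMs in Ruby using XPath and CSS selectors
hpricot: deprecated
fxgrep: queries documents with its own XPath-like syntax. Written in SML, so installation may be difficult.
LT XML: an XML toolkit derived from the SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support for XPath, XInclude and other W3C standards.
xmlgrep2: simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.
XQSharp: supports XQuery, the extension of XPath. Written for the .NET Framework.
xml-coreutils: Laird Breyer's XML equivalent of the GNU coreutils. Discussed in an interesting essay on what an ideal toolkit should contain.
xmldiff: a simple tool for comparing two XML files.
xmltk: doesn't seem to have a package in Debian, Ubuntu, Fedora or MacPorts, hasn't had a release since 2007, and uses non-portable build automation.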

xml-coreutils seems to be the best documented and the most UNIX-oriented.

— Joseph Holsten


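Couldn't you create a wrapper script for the Ruby program and pass the script's argument array on to hpricot? For example, in a PHP shell script something like this should work: <?php /path/to/hpricot $argv ?>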

— 9


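To Joseph Holsten's excellent list I'd add the xpath command-line script that ships with the Perl library XML::XPath. It's a great way to extract information from XML files: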

 xpath -q -e '/entry[@xml:lang="fr"]' *xml
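If the Perl script isn't installed, a rough equivalent of that query can be sketched with Python's standard-library xml.etree.ElementTree. This is an illustrative sketch, not part of XML::XPath: the helper name `french_entries` is hypothetical, and it checks one document string at a time rather than a glob of files. ElementTree exposes the predefined xml: prefix under its namespace URI, so xml:lang is looked up in {uri}name form.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined by the XML spec and always maps to this URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"

def french_entries(xml_text):
    """Return [root] if the root is an <entry> with xml:lang="fr", like /entry[@xml:lang="fr"]."""
    root = ET.fromstring(xml_text)
    if root.tag == "entry" and root.get(f"{{{XML_NS}}}lang") == "fr":
        return [root]
    return []

for entry in french_entries('<entry xml:lang="fr"><title>Bonjour</title></entry>'):
    print(ET.tostring(entry, encoding="unicode"))
```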

— Bortzmeyer


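It's installed by default on OS X, but without the -q and -e options. For example, to get the value of the "package" attribute of the "manifest" node in "AndroidManifest.xml": xpath AndroidManifest.xml 'string(/manifest/@package)' 2> /dev/null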
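For comparison, a minimal Python sketch of string(/manifest/@package) using only the standard library. `manifest_package` is a hypothetical helper; like XPath's string(), it returns an empty string when the attribute (or a manifest root) is missing. For a file on disk, ET.parse("AndroidManifest.xml").getroot() would take the place of ET.fromstring.

```python
import xml.etree.ElementTree as ET

def manifest_package(manifest_xml):
    """Mimic string(/manifest/@package): the attribute value, or "" if absent."""
    root = ET.fromstring(manifest_xml)
    if root.tag != "manifest":
        return ""
    # package is an un-prefixed attribute, so no namespace lookup is needed
    return root.get("package", "")

sample = ('<manifest xmlns:android="http://schemas.android.com/apk/res/android" '
          'package="com.example.app" android:versionCode="1"/>')
print(manifest_package(sample))
```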

— antonj, 2011


There's also the xml2 and 2xml pair. It lets ordinary string-editing tools work on XML.

Example, q.xml:

<?xml version="1.0"?>
<foo>
    text
    more text
    <textnode>ddd</textnode><textnode a="bv">dsss</textnode>
    <![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>

xml2 < q.xml

/foo=
/foo=   text
/foo=   more text
/foo=   
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo=    asfdasdsa <foo> sdfsdfdsf <bar> 
/foo=

xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml

<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>

P.S. There's also html2/2html.
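To show the idea behind xml2's line format, here is a simplified Python sketch that flattens an element tree into path=value lines. It only approximates xml2 and is not its actual algorithm: `flatten` is a hypothetical helper, it strips surrounding whitespace, omits xml2's separator lines between repeated siblings, and writes namespaced tags in ElementTree's {uri}tag form, so its output can't reliably be fed back into 2xml.

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Return path=value lines loosely modelled on xml2's output (simplified)."""
    lines = []

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        for name, value in elem.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        if elem.text and elem.text.strip():
            lines.append(f"{path}={elem.text.strip()}")
        for child in elem:
            walk(child, path)
            # text after a child's closing tag belongs to the parent element
            if child.tail and child.tail.strip():
                lines.append(f"{path}={child.tail.strip()}")

    walk(ET.fromstring(xml_text), "")
    return lines

doc = '<foo>text<textnode>ddd</textnode><textnode a="bv">dsss</textnode></foo>'
print("\n".join(line for line in flatten(doc) if "textnode" in line))
```

Filtering then becomes a plain string test on each line, which is exactly what makes piping xml2 output through grep and sed work.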

— Vi.

@Joseph Holsten Yes. It lets you hack on XML without thinking about XPath.

— Vi.

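Nice! I've been focusing on tools that don't use an intermediate format, but the idea of a high-fidelity, line-oriented XML representation seems like a good way to keep using the real grep and sed. Have you tried pyxie? How does it compare? Are there any other line-oriented representations? Do you think this is better than just replacing newlines in the XML with entities (&#10;)? That would at least let you keep each record on a single line. Oh, and could you edit your post to include a link to the project?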

— Joseph Holsten

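@Joseph Holsten No, I think the pyxie format is less useful than the xml2 format. xml2 gives the "full path" of nested XML elements, which allows more line-oriented matching and substitution. Also, 2xml can easily recreate XML from partial (filtered) xml2 output.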

— Vi.


+1, I can't upvote this enough... cat foo.xml | xml2 | grep /bar | 2xml gives you the same structure as the original, but with every element except the "bar" elements removed. Awesome.
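The effect of that pipeline can be sketched directly on the tree in Python: keep every element named bar with its whole subtree, keep its ancestors only as empty wrappers (xml2 lines for an ancestor's own text and attributes don't contain /bar, so grep drops them), and remove everything else. `keep_only` is a hypothetical helper and only approximates the grep, which matches /bar anywhere in a line (e.g. /barn as well).

```python
import xml.etree.ElementTree as ET

def keep_only(elem, tag):
    """Prune elem in place so only `tag` subtrees and their bare ancestors remain.

    Returns True if elem is, or contains, a `tag` element.
    """
    if elem.tag == tag:
        return True  # keep this subtree untouched
    kept_any = False
    for child in list(elem):  # iterate over a copy because we remove children
        if keep_only(child, tag):
            kept_any = True
            child.tail = None  # tail text belongs to the pruned ancestor
        else:
            elem.remove(child)
    # pure ancestors lose their own text and attributes
    elem.text = None
    elem.attrib.clear()
    return kept_any

root = ET.fromstring('<foo a="1">x<bar>b</bar>y<baz>z</baz><q><bar c="2">d</bar></q></foo>')
keep_only(root, "bar")
print(ET.tostring(root, encoding="unicode"))
```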

— mogsie


You can use xmllint:

xmllint --xpath //title books.xml

It should be bundled with most distributions, and it's also bundled with Cygwin.

$ xmllint --version
xmllint: using libxml version 20900

See:

$ xmllint
Usage : xmllint [options] XMLfiles ...
        Parse the XML files and output the result of the parsing
        --version : display the version of the XML library used
        --debug : dump a debug tree of the in-memory document
        ...
        --schematron schema : do validation against a schematron
        --sax1: use the old SAX1 interfaces for processing
        --sax: do not build a tree but work just at the SAX level
        --oldxml10: use XML-1.0 parsing rules before the 5th edition
        --xpath expr: evaluate the XPath expression, inply --noout
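For comparison, ElementTree in the Python standard library supports a small XPath subset, which is enough for this particular query. A minimal sketch, assuming a hypothetical `titles` helper that returns the text of each match (xmllint instead prints the serialized elements):

```python
import xml.etree.ElementTree as ET

def titles(xml_text):
    """Return the text of every <title> element, similar to //title."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, root included, just as //title would
    return [el.text for el in root.iter("title")]

books = "<books><book><title>A</title></book><book><title>B</title></book></books>"
print(titles(books))
```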

— Dave Jarvis


There is no --xpath argument for xmllint: manpagez.com/man/1/xmllint

— Miserable Variable


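@MiserableVariable: The man page is wrong. I just checked the man page for my version: the xpath argument isn't listed. It's a documentation bug. Try running the program.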

— Dave Jarvis


@MiserableVariable --xpath is a fairly recent addition; it's not in, for example, the RHEL 6 version of xmllint.

— Daniel Beck, 2013


More precisely, xmllint --xpath was introduced in libxml2 2.7.7 (2010).

— 2014


If you're looking for a solution on Windows, PowerShell has built-in support for reading and writing XML.

test.xml:

<root>
  <one>I like applesauce</one>
  <two>You sure bet I do!</two>
</root>

PowerShell script:

# load XML file into local variable and cast as XML type.
$doc = [xml](Get-Content ./test.xml)

$doc.root.one                                   #echoes "I like applesauce"
$doc.root.one = "Who doesn't like applesauce?"  #replace inner text of <one> node

# create new node...
$newNode = $doc.CreateElement("three")
$newNode.set_InnerText("And don't you forget it!")

# ...and position it in the hierarchy
$doc.root.AppendChild($newNode)

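# write results to disk; XmlDocument.Save resolves relative paths against the
# process working directory, not the current PowerShell location, so pass a full path
$doc.Save((Join-Path $PWD "testNew.xml"))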

testNew.xml:

<root>
  <one>Who doesn't like applesauce?</one>
  <two>You sure bet I do!</two>
  <three>And don't you forget it!</three>
</root>

Source: https://serverfault.com/questions/26976/update-xml-from-the-command-line-windows
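The same three edits can be sketched in Python with the standard-library ElementTree, for anyone on a platform without PowerShell. This is an illustrative equivalent, not a translation of the PowerShell API: `update` and `update_file` are hypothetical helpers, and ET.SubElement creates the new element and appends it to <root> in a single call.

```python
import xml.etree.ElementTree as ET

def update(tree):
    """Apply the same edits as the PowerShell script to a parsed ElementTree."""
    root = tree.getroot()
    one = root.find("one")
    print(one.text)                            # print the current text of <one>
    one.text = "Who doesn't like applesauce?"  # replace inner text of <one>
    three = ET.SubElement(root, "three")       # create <three> as the last child of <root>
    three.text = "And don't you forget it!"
    return tree

def update_file(src, dst):
    """Read src, apply the edits, and write the result to dst."""
    update(ET.parse(src)).write(dst, encoding="utf-8", xml_declaration=True)
```

Usage would be update_file("test.xml", "testNew.xml"). ElementTree doesn't re-indent the appended node; on Python 3.9+, ET.indent(tree) can be called before writing if pretty output matters.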

— Clay

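I fought with various Linux tools for hours before turning to PowerShell. I'm surprised this is so hard; the Linux command line is usually really good, but there seems to be a gap here. Note: my use case was 1) find a node by XPath, 2) delete it if found, 3) add a new node, 4) save the file. I'm updating a bunch of Solr configs. If anyone knows a simple, reliable way to do this, I'm all ears.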
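For that find / delete / add / save sequence, Python's standard-library ElementTree can do the job with one caveat: its elements have no parent pointer, so the sketch locates the parent first and removes matches from it. `replace_node` and the Solr-like element and file names below are hypothetical, and find()/findall() accept only ElementTree's limited XPath subset.

```python
import xml.etree.ElementTree as ET

def replace_node(root, parent_path, child_path, new_elem):
    """Remove every child matching child_path under parent_path, then append new_elem.

    Returns the number of nodes removed.
    """
    parent = root.find(parent_path)
    if parent is None:
        raise ValueError(f"no element matches {parent_path!r}")
    old = parent.findall(child_path)
    for node in old:
        parent.remove(node)  # removal must go through the parent
    parent.append(new_elem)
    return len(old)
```

A typical run might be tree = ET.parse("solrconfig.xml"), then replace_node(tree.getroot(), ".", "requestHandler[@name='/select']", new_handler), then tree.write("solrconfig.xml").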

— Richard Hauer

Wow, this is very nearly an acceptable solution. But honestly, I'd probably accept it if it looked like xps $doc .root.one, xps $doc 'AppendChild("three")' and xps $doc '.three.set_InnerText("And don't you forget it!")', which is obviously inferior!

— Joseph Holsten


There are also xmlsed and xmlgrep from the NetBSD xmltools!

http://blog.huoc.org/xmltools-not-dead.html

— Mark


It depends on what you want to do.

XSLT may be the way to go, but it has a learning curve. Try xsltproc, and note that you can pass in parameters.

— Adrian Mouat


saxon-lint also lets you use XPath 3.0/XQuery 3.0 from the command line. (Other command-line tools use XPath 1.0.)

Examples:

HTTP/HTML:

$ saxon-lint --html --xpath 'count(//a)' http://stackoverflow.com/q/91791
328

XML:

$ saxon-lint --xpath '//a[@class="x"]' file.xml
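Counting links on a web page, as in the first example, doesn't strictly need an XML engine, and real-world HTML is rarely well-formed XML anyway. A rough standard-library Python sketch using html.parser, with a hypothetical `count_links` helper and an optional class filter similar to the second query (exact match on the whole attribute value); fetching the page, e.g. with urllib.request, is left out:

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Count <a> start tags, optionally only those with class exactly equal to cls."""

    def __init__(self, cls=None):
        super().__init__()
        self.cls = cls
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so <A> matches too
        if tag == "a" and (self.cls is None or dict(attrs).get("class") == self.cls):
            self.count += 1

def count_links(html_text, cls=None):
    parser = LinkCounter(cls)
    parser.feed(html_text)
    parser.close()
    return parser.count

page = '<p><a href="x">1</a><A class="x">2</A></p>'
print(count_links(page), count_links(page, cls="x"))
```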

— Gilles Quenot


D. Bohdan maintains an open-source GitHub repository listing command-line tools for structured text, with a section on XML/HTML tools:

https://github.com/dbohdan/structured-text-tools#xml-html

— Demon


XQuery may be a good solution. It is (relatively) easy to learn and is a W3C standard.

I'd recommend XQSharp as a command-line processor.

— Oliver Hallam


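BaseX also has a command-line XQuery processor (in addition to its database mode) and keeps up with the latest versions of the standard (it followed the evolving XQuery 3.0 drafts very closely).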

— Charles Duffy


I started with xmlstarlet and still use it. When queries get hard and I need XPath 2 and XQuery features for XML, I turn to xidel: http://www.videlibri.de/xidel.html

— TruthAdjuster


Grep equivalent

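You can define a bash function, e.g. "xp" (for "xpath"), that wraps some python3 code. To use it you need python3 and python-lxml installed. Benefits:

Regular-expression matching, which xmllint, for example, lacks.
Works as a filter (in a pipeline) on the command line.

It's both simple and powerful to use: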

xmldoc=$(cat <<EOF
<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>
EOF
)
selection='//*[namespace-uri()="http://www.sample.com/" and local-name()="job" and re:test(.,"^pro.*ing$")]/text()'
echo "$xmldoc" | xp "$selection"
# prints programming

xp() looks like this:

xp()
{
    local selection="$1"
    local xmldoc
    if ! [[ -t 0 ]]; then
        read -rd '' xmldoc    # document piped in on stdin
    else
        xmldoc="$2"           # otherwise take it from the second argument
    fi
    # Pass the XPath as argv[1] instead of splicing it into the Python source,
    # so quotes and backslashes in the expression survive intact.
    python3 -c '
import sys
from lxml import etree
tree = etree.parse(sys.stdin.buffer)
ns = {"re": "http://exslt.org/regular-expressions"}
for e in tree.xpath(sys.argv[1], namespaces=ns):
    print(e if isinstance(e, str) else etree.tostring(e).decode("UTF-8"))
' "$selection" <<< "$xmldoc"
}
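Where lxml isn't available, the same selection (namespace plus regular-expression test) can be approximated with the Python standard library alone. ElementTree has no EXSLT support, so in this sketch the regex check moves into Python's re module; `jobs_matching` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

def jobs_matching(xml_text, pattern):
    """Text of every {http://www.sample.com/}job element whose text matches pattern."""
    root = ET.fromstring(xml_text)
    # iter() includes the root itself, which matters here because <job> is the root
    jobs = root.iter("{http://www.sample.com/}job")
    return [el.text for el in jobs if el.text and re.search(pattern, el.text)]

xmldoc = ('<?xml version="1.0" encoding="utf-8"?>\n'
          '<job xmlns="http://www.sample.com/">programming</job>')
print(jobs_matching(xmldoc, r"^pro.*ing$"))
```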

Sed equivalent

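Consider using xq, which gives you the full power of the jq "programming language". If you have python-pip installed, you can install xq with pip install yq. In the example below, we replace "Keep Accounts" with "Keep Accounts 2":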

xmldoc=$(cat <<'EOF'
<resources>
    <string name="app_name">Keep Accounts</string>
    <string name="login">"login"</string>
    <string name="login_password">"password："</string>
    <string name="login_account_hint">input to login</string>
    <string name="login_password_hint">input your password</string>
    <string name="login_fail">login failed</string>
</resources>
EOF
)
# update only the matching element's text; the other <string> elements are kept
echo "$xmldoc" | xq '(.resources.string[] | select(."#text" == "Keep Accounts") | ."#text") = "Keep Accounts 2"' -x
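For comparison, the same substitution in Python with only the standard library, as a sketch that assumes matching on the exact element text is what's wanted; `replace_text` is a hypothetical helper, and every other <string> element passes through untouched.

```python
import xml.etree.ElementTree as ET

def replace_text(xml_text, old, new):
    """Set the text of every <string> whose text equals old to new; keep everything else."""
    root = ET.fromstring(xml_text)
    for s in root.iter("string"):
        if s.text == old:
            s.text = new
    return ET.tostring(root, encoding="unicode")

doc = ('<resources><string name="app_name">Keep Accounts</string>'
       '<string name="login_fail">login failed</string></resources>')
print(replace_text(doc, "Keep Accounts", "Keep Accounts 2"))
```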

— methuselah-0


JEdit has a plugin called "XQuery" that lets you query XML documents.

Not exactly the command line, but it works!

— Ben

While JEdit may have a way to search files, that doesn't make it a competitor to grep(1).

— Joseph Holsten

Grep and sed equivalent for XML command-line processing
