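How do I use wget/curl to download all the links to .zip files on a given web page?

A page contains links to a set of .zip files, all of which I want to download. I know this can be done with wget and curl. How is it done?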

curl download wget

— uyetch

Answers:

125

The command is:

wget -r -np -l 1 -A zip http://example.com/download/

Options meaning:

-r,  --recursive          specify recursive download.
-np, --no-parent          don't ascend to the parent directory.
-l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
-A,  --accept=LIST        comma-separated list of accepted extensions.

— creaktive

The -nd (no directories) flag is handy if you don't want any extra directories created (i.e., all files end up in the root folder).

— Steve Davis
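For anyone who wants to check the flags before hitting a server, here is a minimal sketch of a wrapper around the command above with the -nd flag from the comment added. The function name fetch_zips, the DRY_RUN variable, and the destination directory passed through -P are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical helper (not from the answer): wraps the wget call so the
# exact command line can be inspected before anything is downloaded.
fetch_zips() {
  url=$1
  dest=${2:-.}   # assumed default: download into the current directory
  # -r -np -l 1 -A zip come from the answer, -nd from the comment;
  # -P (--directory-prefix) sets where wget saves the files.
  set -- wget -r -np -nd -l 1 -A zip -P "$dest" "$url"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"    # print the command instead of running it
  else
    "$@"
  fi
}

# Usage (the first line only prints the command):
#   DRY_RUN=1 fetch_zips http://example.com/download/ zips
#   fetch_zips http://example.com/download/ zips
```

Printing the command first is just a convenience; the flags themselves are unchanged from the answer and the comment.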

How can I adapt this solution to go deeper than the given page? I tried -l 20, but wget stops immediately.

— Wrench

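If the files are not in the same directory as the starting URL, you may need to drop -np. If they are on a different host, you will need --span-hosts (-H).

— Dan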
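To make the different-host case concrete: whether the span-hosts option (-H) is needed comes down to whether a link's host differs from the host of the starting page. A small sketch of that check follows. url_host and needs_span_hosts are made-up helper names, and the parsing is deliberately naive: it ignores userinfo (user@host) and treats relative links as same-host.

```shell
# Hypothetical helper: print the host part of an absolute URL,
# or nothing at all for a relative link.
url_host() {
  printf '%s\n' "$1" | sed -n 's|^[a-zA-Z][a-zA-Z0-9+.-]*://\([^/:]*\).*|\1|p'
}

# Hypothetical helper: succeeds (exit status 0) when the link lives on a
# different host than the page, i.e. when wget would need -H to follow it.
needs_span_hosts() {
  page_host=$(url_host "$1")
  link_host=$(url_host "$2")
  [ -n "$link_host" ] && [ "$link_host" != "$page_host" ]
}

# Usage:
#   if needs_span_hosts "$page_url" "$link_url"; then echo "add -H"; fi
```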

The above solution does not work for me. For me, only this one works:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off [url of website]

Options meaning:

-r            recursive
-l1           maximum recursion depth (1=use only this directory)
-H            span hosts (visit other hosts in the recursion)
-t1           Number of retries
-nd           Don't make new directories, put downloaded files in this one
-N            turn on timestamping
-A.mp3        download only mp3s
-erobots=off  execute "robots=off" as if it were a part of .wgetrc

— K.-Michael Aye

Source: commandlinefu.com/commands/view/12498/...

— James Jeffery

Yeah, thanks! I don't remember where it came from; I just had it in my scripts.

— K.-Michael Aye, 2014

No idea, sorry. Ask a new question! ;)

— K.-Michael Aye, 2015

+1 for the -H switch. That is what kept the first answer (which I had tried before looking on SO) from working.

— Alex

I get a "Mandatory arguments to long options are mandatory for short options too" error. :(

— François Leblanc

For other scenarios, with a bit of parallel magic, I use:

curl [url] | grep -i [filending] | sed -n 's/.*href="\([^"]*\).*/\1/p' | parallel -N5 wget

— Lindblad
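One caveat with the pipeline above: the sed expression is greedy, so it captures only the last href on each line. A rough sketch of a link extractor that copes with several links per line follows. extract_links is a hypothetical helper name, and the sketch assumes double-quoted href attributes and a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper: read HTML on stdin and print every href value
# that ends in the given extension (case-insensitive), one per line.
extract_links() {
  ext=$1
  grep -oi 'href="[^"]*"' |           # one href="..." attribute per output line
    sed 's/^[^"]*"//; s/"$//' |       # strip the href=" prefix and closing quote
    grep -i "\.${ext}\$"              # keep only links ending in .<ext>
}

# Usage (relative links must still be resolved against the page URL
# before they can be handed to wget):
#   curl -s "$url" | extract_links zip | xargs -n 5 wget
```

Here xargs -n 5 plays the role of batching five URLs per wget call; it runs the batches sequentially, so GNU parallel remains the choice if concurrent downloads are wanted.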