Suppose I have 10,000 XML files, and I want to send them to a friend. Before sending them, I'd like to compress them.
Method 1: Don't compress them
Results:
Resulting Size: 62 MB
Percent of initial size: 100%
Method 2: Zip each file individually and send him 10,000 zipped XML files
Command:
for x in $(ls -1) ; do echo $x ; zip "$x.zip" $x ; done
Results:
Resulting Size: 13 MB
Percent of initial size: 20%
Method 3: Create a single zip containing all 10,000 XML files
Command:
zip all.zip $(ls -1)
Results:
Resulting Size: 12 MB
Percent of initial size: 19%
Method 4: Concatenate the files into a single file and zip that
Command:
cat *.xml > oneFile.txt ; zip oneFile.zip oneFile.txt
Results:
Resulting Size: 2 MB
Percent of initial size: 3%
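The gap between methods 2 and 4 can be reproduced on a smaller scale with gzip instead of zip. This is a minimal sketch, not the original experiment. It generates 200 hypothetical, near-identical XML files (the names record1.xml through record200.xml and their contents are made up) in a temporary directory. It then compares three layouts: one .gz per file, a single gzip stream over the concatenation, and a tar archive piped through gzip. Real numbers will depend on how similar the actual files are.

```shell
#!/bin/sh
# Sketch: gzip analogues of method 2 (one archive per file) and method 4
# (concatenate, then compress once), plus a tar | gzip stream, run on
# made-up XML files in a throwaway directory.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# 200 small, hypothetical XML files that share most of their text.
i=1
while [ "$i" -le 200 ]; do
  cat > "record$i.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<record id="$i">
  <name>Item number $i</name>
  <status>active</status>
  <description>Example record used only to compare compression layouts.</description>
</record>
EOF
  i=$((i + 1))
done

# $(( ... )) strips the padding some wc implementations print.
raw=$(( $(cat ./*.xml | wc -c) ))

# Method 2 analogue: compress each file on its own and add up the sizes.
per_file=0
for f in ./*.xml; do
  n=$(( $(gzip -c "$f" | wc -c) ))
  per_file=$((per_file + n))
done

# Method 4 analogue: a single gzip stream over all files concatenated.
combined=$(( $(cat ./*.xml | gzip -c | wc -c) ))

# .tar.gz: tar bundles the files, then gzip compresses the bundle as one stream.
tarball=$(( $(tar cf - ./*.xml | gzip -c | wc -c) ))

echo "raw XML:           $raw bytes"
echo "one .gz per file:  $per_file bytes"
echo "cat | gzip:        $combined bytes"
echo "tar | gzip:        $tarball bytes"
```

If the single-stream figures come out far smaller than the per-file total, that suggests method 4 is benefiting from redundancy shared between files. On its own, though, this does not show how zip itself handles method 3.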
Questions:
- Why do I get such dramatically better results when I zip just a single file (method 4)?
- I expected method 3 to give much better results than method 2, but it didn't. Why?
- Is this behavior specific to zip? Would I get different results if I tried gzip?
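One factor worth checking for these questions is the DEFLATE sliding window. DEFLATE is zip's default compression method and the one gzip uses, and it can only refer back to data within roughly the previous 32 KB. The sketch below illustrates that limit by writing a chunk of random bytes twice in a row. It is only an illustration: whether the window accounts for the sizes measured above is an assumption the sketch does not test.

```shell
#!/bin/sh
# Sketch: DEFLATE (used by both zip and gzip) can only match data that
# appeared within about the previous 32 KB. Repeating a chunk of random
# bytes shows the effect: a repeat inside the window is almost free, and a
# repeat beyond it is not found at all.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Random data is incompressible on its own, so any saving comes from the repeat.
head -c 16000 /dev/urandom > "$dir/small"
head -c 40000 /dev/urandom > "$dir/large"

# Each input is its chunk written twice in a row.
near=$(( $(cat "$dir/small" "$dir/small" | gzip -9 -c | wc -c) ))
far=$(( $(cat "$dir/large" "$dir/large" | gzip -9 -c | wc -c) ))

echo "16000-byte chunk x2 = 32000 bytes -> $near bytes gzipped"
echo "40000-byte chunk x2 = 80000 bytes -> $far bytes gzipped"
```

The first input shrinks to about the size of one chunk because the second copy starts 16,000 bytes back, which is inside the window. The second input barely shrinks because its copy starts 40,000 bytes back.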
Additional information:
$ zip --version
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Currently maintained by E. Gordon. Please send bug reports to
the authors using the web page at www.info-zip.org; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip,
as of above date; see http://www.info-zip.org/ for other sites.
Compiled with gcc 4.4.4 20100525 (Red Hat 4.4.4-5) for Unix (Linux ELF) on Nov 11 2010.
Zip special compilation options:
USE_EF_UT_TIME (store Universal Time)
SYMLINK_SUPPORT (symbolic links supported)
LARGE_FILE_SUPPORT (can read and write large files on file system)
ZIP64_SUPPORT (use Zip64 to store large files in archives)
UNICODE_SUPPORT (store and read UTF-8 Unicode paths)
STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
UIDGID_NOT_16BIT (old Unix 16-bit UID/GID extra field not used)
[encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)
Edit: metadata
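One answer suggested that the difference comes from system metadata stored in the zip. I don't think that's the case. To test this, I did the following: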
for x in $(seq 10000) ; do touch $x ; done
zip allZip $(ls -1)
The resulting zip file was 1.4 MB. That means roughly 10 MB is still unexplained.
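Here is the arithmetic behind that conclusion as a small sketch. It uses the sizes quoted above and assumes 1 MB = 1,000,000 bytes.

```shell
#!/bin/sh
# Back-of-the-envelope check of the metadata test, using the sizes quoted
# above and assuming 1 MB = 1,000,000 bytes.
entries=10000
empty_zip=1400000     # zip of 10,000 empty files
method3_zip=12000000  # single zip of the 10,000 XML files (method 3)

per_entry=$((empty_zip / entries))
unexplained=$((method3_zip - empty_zip))

echo "overhead per entry: ~$per_entry bytes"
echo "left over for compressed data: ~$unexplained bytes"
```

The empty files were named 1 through 10000, so their names are short. zip stores each name in both the local file header and the central directory. If the XML file names are longer, the per-entry overhead in the real archive is therefore somewhat higher than this estimate.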
Instead of $(ls -1), just use *: for x in * ; zip all.zip *
Consider a .tar.gz instead of just zipping the whole directory.