Quickly concatenating a large number of small PDFs

0

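I'm on Windows 10. I have 2000 PDF files, each with two or three pages (only one of which is blank) and only 40~50 KiB in size, totaling less than 100 MiB. I want to concatenate all pages from all files into a single PDF file. The method I'm currently using is Acrobat DC → Tools → Combine Files. I drag all the files into the tool and click Start. After some estimation, I found that it would take more than 12 hours (Core i7-4710HQ laptop, 16 GiB RAM and an SSD). That is rather impractical for me. Is there a faster way?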

— iBug

1

If you care to use Python, several Python scripts were discussed in an earlier thread: https://stackoverflow.com/questions/3444645/merge-pdf-files

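Because of the way the Python PDF libraries work, all input files are opened first, and their contents are only read when the output file is written. You should therefore expect high memory consumption. A workaround is to split the files across multiple folders.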

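You can easily extend this script, for example to combine all PDFs in a subtree, including all of its subfolders.

This program supports optional flags for verbose output and for skipping the last page of each input file. Wildcards are allowed in the input file pattern.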

from argparse import ArgumentParser
from glob import glob
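# Legacy PyPDF2 API (PyPDF2 < 3.0); later releases renamed these to
# PdfReader, PdfWriter, len(reader.pages), reader.pages[i] and add_page.
from PyPDF2 import PdfFileReader, PdfFileWriter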



def PDF_cat(files, output_filename, skiplastpage, verbose):
    # First open all the files, then produce the output file, and
    # finally close the input files. This is necessary because
    # the data isn't read from the input files until the write
    # operation. Thanks to
    # https://stackoverflow.com/questions/6773631/problem-with-closing-_
    #    python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733

    writer = PdfFileWriter()
    skip = 1 if skiplastpage else 0

    # collect and open input files
    inp = [open(f,'rb') for f in glob(files) if f != output_filename]
    n = len(inp)
    print('merging %d files' % n)
    for i, fh in enumerate(inp, 1):
        reader = PdfFileReader(fh)
        for pg in range(reader.getNumPages() - skip):
            writer.addPage(reader.getPage(pg))
        if verbose:
            print('%d/%d %s' % (i, n, fh.name))

    print('writing output file...')
    with open(output_filename, 'wb') as fout:
        writer.write(fout)
    # finally, close the input files
    for fh in inp:
        fh.close()

if __name__ == '__main__':
    parser = ArgumentParser()

    # add more options if you like
    parser.add_argument('-o', '--output',
                        dest='output_filename',
                        help='write merged PDF files to FILE',
                        metavar='FILE')
    parser.add_argument(dest='files',
                        nargs='?',  # optional, so the default pattern below applies
                        help='PDF files to merge')
    parser.add_argument('-s', '--skiplastpage',
                        dest='skiplastpage',
                        action='store_true',
                        help='skip last page of each merged PDF')
    parser.add_argument('-v', '--verbose',
                        dest='verbose',
                        action='store_true',
                        help='show progress')
    parser.set_defaults(output_filename='mergedPDFs.pdf', files=r'.\*.pdf',
                        skiplastpage=False, verbose=False)

    args = parser.parse_args()
    PDF_cat(args.files, args.output_filename, args.skiplastpage, args.verbose)
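
The memory workaround mentioned earlier (splitting the inputs into groups) does not strictly need separate folders. As a minimal sketch, assuming the list of file names fits in memory, the inputs could be cut into fixed-size batches, each batch merged into an intermediate file, and the intermediate files merged in a final pass. The helper name `batches`, the batch size of 3 and the `partNNN.pdf` names are hypothetical and not part of the script above.

```python
from typing import Iterator, List, Sequence


def batches(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


# Hypothetical usage: show which inputs would go into which intermediate PDF.
# Each batch would then be merged on its own, so only `size` files are open
# at any one time.
names = ["%04d.pdf" % i for i in range(1, 8)]
for number, batch in enumerate(batches(names, 3), 1):
    print("part%03d.pdf <- %s" % (number, ", ".join(batch)))
```

The trade-off is one extra pass over the data in exchange for a bounded number of open files.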

A quick test: merging 501 identical PDFs of 91 KB each took 61 seconds on my laptop, and 83 seconds using PDFtk.exe. The output files differ in size but display identically.
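
For the subtree extension mentioned above, one possible approach is to collect the input paths with `os.walk` instead of a single wildcard pattern. This is only a sketch: the helper name `find_pdfs` is hypothetical, and sorting by path is an assumption about the desired page order.

```python
import os
from typing import List


def find_pdfs(root: str) -> List[str]:
    """Return the paths of all *.pdf files under `root`, sorted by path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match the extension case-insensitively (.pdf, .PDF, ...)
            if name.lower().endswith(".pdf"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The returned list could stand in for `glob(files)` in the list comprehension inside `PDF_cat`.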

— user1016274

I'm eager to use Windows Subsystem for Linux (it's my daily working platform). I'll give it a try tomorrow.

— iBug

OK. It finished the 2000 PDF files in less than half a minute. I'll wait for some more user-friendly solutions before accepting this answer.

— iBug

0

You can try other alternatives to Acrobat. These tools may help you in some way.

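1. PDFSam

Merge and split PDF files at given page numbers, at a given bookmark level, or into files of a given size
Extract pages from PDFs
Rotate PDF files, either every page or only selected pages
Mix PDF files together, taking pages alternately from one and the other.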

2. PDFMerge

Secure file merging and processing
Offers an online platform for merging PDFs
Also offers a desktop version

3. PDFTK

A simple but very powerful toolkit
Comes with a command-line tool that makes it easy to work with multiple PDFs on the command line.

For now, I'd suggest pdftk, since its command-line tool is very powerful and can save a lot of time and effort.
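
As a rough sketch of how the pdftk route might be driven from a script: pdftk concatenates files with `pdftk in1.pdf in2.pdf ... cat output out.pdf`. The code below assumes pdftk is installed and on the PATH, and the helper names `pdftk_cat_command` and `run_pdftk_cat` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def pdftk_cat_command(inputs: Sequence[str], output: str) -> List[str]:
    """Build the argument list for `pdftk in1 in2 ... cat output OUT`."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *inputs, "cat", "output", output]


def run_pdftk_cat(inputs: Sequence[str], output: str) -> None:
    """Run pdftk; raises CalledProcessError if it exits with an error."""
    subprocess.run(pdftk_cat_command(inputs, output), check=True)
```

With thousands of inputs the resulting command line may exceed the operating system's length limit, in which case the files would have to be merged in several smaller batches.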

Feel free to edit the list to add any other tools.

— C0deDaedalus

I'd prefer Python code over a command-line tool.

— iBug