How can I quickly split a PDF file into single pages (i.e., from the Terminal command line)?

23

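I have a 6-page PDF file that I want to split into 1.pdf, 2.pdf, 3.pdf, etc.

Preview doesn't seem to handle this (unless I'm missing something).

I'd love to be able to do this simple task from the command line, but at this point I'll take anything that gets the job done (without downloading sketchy software).

FYI, http://users.skynet.be/tools/ does not work as advertised.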

macos command-line pdf

— user391339

2

This SE answer is a good command-line solution. You can install ghostscript using Homebrew.
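The linked answer is not reproduced here. As a rough sketch of what a Ghostscript-based split can look like, the Python helper below builds one `gs` command per page using the `pdfwrite` device with `-dFirstPage` and `-dLastPage`. The helper name `gs_page_command`, the file name `input.pdf`, and the page count of 6 are assumptions for illustration; the linked answer may use a different invocation.

```python
def gs_page_command(pdf_path, page, out_path):
    """Build a Ghostscript command that writes one page to its own PDF.

    Hypothetical helper: it only assembles the argument list.
    """
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dFirstPage={}".format(page),
        "-dLastPage={}".format(page),
        "-sOutputFile={}".format(out_path),
        pdf_path,
    ]


# Print the six commands for a 6-page file (assumed names: input.pdf, 1.pdf ... 6.pdf).
for page in range(1, 7):
    print(" ".join(gs_page_command("input.pdf", page, "{}.pdf".format(page))))
```

Each printed line can be pasted into the terminal, or the list can be passed to `subprocess.run(..., check=True)` once Ghostscript is installed.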

— fideli

21

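Open the PDF in Preview, then choose Thumbnails from the View menu. Select the pages you want (Command-click to pick several), then drag and drop them onto your desktop.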

— lee

1

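This worked great. After I had spent about 30 minutes on the problem, it took me about 30 seconds to do it this way. Some people combine this technique with Automator, but I haven't tried that yet.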

— user391339, 2014

35

This can be achieved with pdfseparate. You can install poppler with Homebrew using brew install poppler, which also installs pdfseparate. To split the PDF document.pdf into single pages 1.pdf, 2.pdf, etc., use:

pdfseparate document.pdf %d.pdf
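If the split is one step in a larger script, the same command can be assembled from Python. This is a minimal sketch: the helper name `pdfseparate_command` and its defaults are illustrative assumptions. It relies only on `pdfseparate`, its `-f`/`-l` first-page and last-page options, and the `%d` page-number placeholder.

```python
import shlex


def pdfseparate_command(pdf_path, pattern="%d.pdf", first=None, last=None):
    """Build the argument list for poppler's pdfseparate (hypothetical helper)."""
    if "%d" not in pattern:
        raise ValueError("pattern needs a %d placeholder for the page number")
    cmd = ["pdfseparate"]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to extract
    if last is not None:
        cmd += ["-l", str(last)]   # last page to extract
    return cmd + [pdf_path, pattern]


cmd = pdfseparate_command("document.pdf")
print(shlex.join(cmd))  # shows the command without running it
# To actually run it: subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string keeps file names with spaces intact.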

— q

1

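I had installed poppler just a day earlier to convert PDF documents to SVG with pdf2svg, and hadn't noticed that poppler also ships the pdfseparate command. Since the accepted answer above (dragging all the PDF pages from Preview to the desktop) requires me to click, and since I prefer terminal solutions that run automatically from a single command line, pdfseparate is exactly what I needed. Thanks a lot for the tip!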

— Arvid

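Interestingly, the total size of the PDFs generated by pdfseparate is much larger than the original PDF. I had a 1.9 MB document with 400 pages. After splitting, the pages added up to about 60 MB.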
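A likely explanation is that every single-page file carries its own copy of resources the pages used to share, such as embedded fonts. To check the growth on your own files, a small sketch like the following can help; the function names are made up for illustration.

```python
import os


def growth_ratio(original_bytes, page_bytes):
    """Return how many times larger the split pages are than the original."""
    return sum(page_bytes) / original_bytes


def growth_ratio_for_files(original_pdf, page_pdfs):
    """Same ratio, measured from files on disk."""
    sizes = [os.path.getsize(p) for p in page_pdfs]
    return growth_ratio(os.path.getsize(original_pdf), sizes)


# Figures from the comment above: 1.9 MB original, about 60 MB after splitting.
print(round(growth_ratio(1.9, [60.0]), 1))
```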

— Konstantin

5

If you're interested in doing this from the command line, you can look at Benjamin Han's splitPDF Python script to do the job. For example:

splitPDF.py in.pdf 3 5

will split the file in.pdf into 3 files, splitting at pages 3 and 5.
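To make the split points concrete, here is a small sketch of how they could map to page ranges. It assumes each split point is the last page of a part, which should be checked against the script itself; `parts_from_split_points` and the 10-page document are hypothetical.

```python
def parts_from_split_points(total_pages, split_points):
    """Return 1-based inclusive (first, last) page ranges, one per output file.

    Assumption: each split point marks the last page of a part.
    """
    ranges = []
    first = 1
    for point in sorted(split_points):
        if not first <= point <= total_pages:
            raise ValueError("split point {} is out of range".format(point))
        ranges.append((first, point))
        first = point + 1
    if first <= total_pages:
        ranges.append((first, total_pages))  # remaining pages form the last part
    return ranges


# A 10-page in.pdf split at pages 3 and 5 gives three parts.
print(parts_from_split_points(10, [3, 5]))
```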

— Jean-Philippe Pellet

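This works well, and its output is much more flexible than that of pdfseparate above. Although it's mainly meant for splitting a PDF into multi-page chunks, if you do want every page split out separately, you can easily use seq in the command to generate the series of numbers. Thanks!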

— dgig

1

Something like python splitPDF.py MyPDF.pdf $(seq -s ' ' 1 10 411) worked for me
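For reference, `seq -s ' ' 1 10 411` expands to every tenth number from 1 up to 411, separated by spaces. A Python equivalent, as a sketch that assumes a positive step (the name `seq_numbers` is made up):

```python
def seq_numbers(first, step, last):
    """Numbers from first to last (inclusive) in increments of step, like seq."""
    return list(range(first, last + 1, step))


points = seq_numbers(1, 10, 411)
print(" ".join(str(n) for n in points))  # the split points passed to splitPDF.py
```

For one split per page, the same helper with a step of 1 produces the full series.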

— dgig '16

1

Works great. I can confirm this works out of the box on macOS 10.13.3.

— MichaelCodes

1

For another option, see this answer. It uses the ImageMagick command-line tool.

convert x.pdf -quality 100 -density 300x300 x-%04d.pdf

However, you have to be careful about quality.
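Part of the reason is that convert rasterizes each PDF page, so the -density value decides how many pixels each page gets. A rough sketch of that arithmetic, assuming the standard 72 points per inch for PDF page units; the helper name and the US Letter page size are illustrative.

```python
def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rendered at the given density (dots per inch)."""
    points_per_inch = 72  # PDF page dimensions are expressed in points
    return (round(width_pt / points_per_inch * dpi),
            round(height_pt / points_per_inch * dpi))


# US Letter (612 x 792 pt) at the 300 dpi used in the command above.
print(raster_size(612, 792, 300))  # (2550, 3300)
```

A lower density gives smaller files but blurrier text, and the text in the output is no longer selectable either way.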

— on

1

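If you want to extract a range of pages, you can use the following script, invoked as shown below (assuming you save it as pdfextract.py somewhere on your system PATH, e.g. /usr/local/bin, and give it execute permission with chmod 744 pdfextract.py):

pdfextract.py --file-in /path/to/large/pdf --file-out /path/to/new/pdf --start <first page> --stop <last page>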

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Extract a page range from a PDF using poppler's pdfseparate and pdfunite."""

import argparse
import os
import subprocess as sp
import sys
import tempfile


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file-in', required=True, type=str, dest='file_in')
    parser.add_argument('--file-out', required=True, type=str, dest='file_out')
    parser.add_argument('--start', required=True, type=int, dest='start')
    parser.add_argument('--stop', required=True, type=int, dest='stop')

    args = parser.parse_args()
    if not os.path.isfile(args.file_in):
        sys.exit('input file not found: {}'.format(args.file_in))
    if os.path.exists(args.file_out):
        sys.exit('output file already exists: {}'.format(args.file_out))
    if not 1 <= args.start <= args.stop:
        sys.exit('expected 1 <= --start <= --stop')

    # Work in a private temporary directory that is removed automatically,
    # instead of deleting every pdfseparate-* file found in the shared /tmp.
    with tempfile.TemporaryDirectory(prefix='pdfseparate-') as tmp_dir:
        pattern = os.path.join(tmp_dir, 'page-%d.pdf')

        # Passing an argument list (no shell) keeps paths with spaces intact.
        sp.check_call(['pdfseparate', '-f', str(args.start), '-l', str(args.stop),
                       args.file_in, pattern])

        # Merge the extracted pages, in order, into the output file.
        pages = [os.path.join(tmp_dir, 'page-{:d}.pdf'.format(i))
                 for i in range(args.start, args.stop + 1)]
        sp.check_call(['pdfunite'] + pages + [args.file_out])


if __name__ == "__main__":
    main()

— Konstantin