106

我需要使用Python向现有的PDF中添加一些额外的文本，最好的方法是什么，需要安装哪些额外的模块。

注意：理想情况下，我希望能够在Windows和Linux上都可以运行此程序，但是一键式仅Linux可以运行。

编辑：pyPDF和ReportLab看起来不错，但没人允许我编辑现有的PDF，还有其他选择吗？

python pdf

— 冰雪奇缘
source

88

我知道这是一篇较旧的文章，但是我花了很长时间尝试寻找解决方案。我碰巧只使用ReportLab和PyPDF，所以我想分享一下：

使用阅读您的PDF PdfFileReader()，我们称此输入
创建一个包含要使用ReportLab添加的文本的新pdf文件，并将其另存为字符串对象
使用读取字符串对象PdfFileReader()，我们将此文本称为
使用创建一个新的PDF对象PdfFileWriter()，我们将其称为输出
遍历输入内容并申请.mergePage(*text*.getPage(0))要添加文本的每个页面，然后用于output.addPage()将修改后的页面添加到新文档中

这对于简单的文本添加效果很好。请参阅PyPDF的示例为文档加水印。

这是一些代码，可以回答以下问题：

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
<do something with canvas>
can.save()
packet.seek(0)
input = PdfFileReader(packet)

在这里，您可以将输入文件的页面与另一个文档合并。

— wel
source

2

“创建一个包含要使用ReportLab添加的文本的新pdf，将其另存为字符串对象”您如何做到这一点？它是一个画布实例。

— Lakshman Prasad

1

我在上面添加了一些示例代码来回答Lakshman的问题。

— dwelch 2010年

我建议使用PyPDF2，因为它已更新，还请检查其示例代码：github.com/mstamy2/PyPDF2/blob/…–

— blaze

2

此代码将创建一个新的pdf文件，并将跳过所有元数据。因此，它不会追加到现有的pdf中。

— 安东·库库巴

124

[Python 2.7]的示例：

from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

Python 3.x的示例：

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

— 戴维·德汉（David Dehghan）
source

13

对于python3，数据包应为，io.BytesIO并使用PyPDF2而不是pyPDF（未维护）。好答案！

— Noufal Ibrahim

4

感谢分享。效果很好。一个注意事项：我认为最好使用open代替file。

— mitenka's

我相信这是一个更可接受的答案，尤其是因为它包含一个可行的示例。

— 凯西（Casey）

1

注意：新文档仅包含原始文档的第一页！将其余页面从复制existing_pdf到output，这很容易，示例代码却没有。

— Alexis

@alexis：您将如何修改代码以将某些内容放置在pdf的第二页上？我有一个使用两页的表格，并且卡在第一页上。提前致谢。

— DavidV

11

pdfrw允许您读取现有PDF的页面并将其绘制到reportlab画布上（类似于绘制图像）。github上的pdfrw examples / rl1子目录中有一些示例。免责声明：我是pdfrw的作者。

— 帕特里克·莫平
source

我认为您可以在此处建立链接

— The6thSense

好点子！发布该消息时，我并没有做太多事情，并且担心“最小文本加链接策略”。（当时我的代表只有46岁，而IIRC在一个答案上刚收到-2，所以我有点担心5岁的新问题的答案：）

— Patrick Maupin

旧的问题得到更多的关注:)和关注

— The6thSense

FWIW，如果您开始关注此链接，还有更多的reportlab / pdfrw示例。我根据伪造目标的答案在那里回答了。

— 帕特里克·莫平

7

利用David Dehghan的回答，以下在Python 2.7.13中起作用：

from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger

import StringIO

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(290, 720, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader("original.pdf")
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

— 罗斯·史密斯二世
source

3

cpdf将通过命令行执行此工作。它不是python，但是（afaik）：

cpdf -add-text "Line of text" input.pdf -o output .pdf

— 用户名
source

0

将问题分解为将PDF转换为可编辑格式，编写更改，然后再将其转换回PDF可能会更好。我不知道可以直接编辑PDF的库，但是例如在DOC和PDF之间有很多转换器。

— 艾尔克
source

1

问题是我只有PDF来源（来自第三方），并且PDF-> DOC-> PDF在转换中会损失很多。另外，我需要在Linux上运行它，因此DOC可能不是最佳选择。

— Frozenskys

我相信Adobe会将PDF编辑功能保持相当封闭和专有的状态，以便他们可以出售其更好版本的Acrobat的许可证。也许您可以找到一种使用某种宏接口自动使用Acrobat Pro进行编辑的方法。

— aehlke

如果要编写的部分是表单字段，则有XML接口可以编辑它们-否则我什么也找不到。

— aehlke

不，我只是想在每页上添加几行文字。

— Frozenskys

0

如果您使用的是Windows，这可能会起作用：

PDF Creator飞行员

还有一个Python中的PDF创建和编辑框架的白皮书。这有点过时，但也许可以给您一些有用的信息：

使用Python作为PDF编辑和处理框架

— 德兹
source

白皮书看起来不错，但是对代码的了解却很少，我本人实际上没有资源来实现整个PDF框架！;）

— Frozenskys

-4

您尝试过pyPdf吗？

抱歉，它无法修改页面的内容。

— 佐曼
source

看起来可能可行，有人用过吗？内存使用情况如何？

— Frozenskys

它确实可以添加文本水印，并且如果格式正确，它可能会起作用。

— Frozenskys

使用Python将文本添加到现有PDF

[Python 2.7]的示例：

Python 3.x的示例：