How to search for some text in a PDF file

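I want to search for some text in a PDF file. For example, where does the phrase "go to" appear in my PDF, and if it is found, on which page?

I found this command line: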

find /TEMP -name 'manu.pdf' -exec pdftotext {} - \; | grep "go to"

It returns some matches.

I would like to get the page number of each result. How can I retrieve it?

linux pdf search

— Braiam

Answers:

pdfgrep seems to do this. From the man page:

-n, --page-number
Prefix each match with the number of the page where it was found.

— Kai Sternad

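Thank you very much, and sorry for this topic; I should have looked at that page!

Sorry, pdfgrep is not installed on my server. I installed poppler-utils, but I cannot install pdfgrep, so I have no results.

Why can't you install pdfgrep?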

— Kai Sternad

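On CentOS 5.7 and Ubuntu 9.10: apt-get (or yum) install pdfgrep: no package pdfgrep available. I downloaded 1.3.0.tar.gz, extracted it, and ran ./configure: configure: error: Package requirements (poppler-cpp) were not met: No package 'poppler-cpp' found. I am stuck.

pdfgrep is available from Ubuntu 10.10 onward. I just installed it successfully in an Ubuntu 11 VM.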

— Kai Sternad

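By default, pdftotext inserts a form feed character (0xC) between pages. You can count the form feeds that precede each occurrence of the word you are searching for to work out its page.

Another approach is to use the bbox option: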
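
As a minimal sketch of the form feed approach, the function below treats each form feed as a page break and prefixes every matching line with its page number. The name page_grep is hypothetical, the pattern is matched as a fixed string, and it assumes pdftotext is run without an option that suppresses page breaks (such as -nopgbrk).

```shell
# page_grep PATTERN
# Reads pdftotext output on stdin and prints "page:line" for every line
# that contains PATTERN (fixed string, not a regular expression).
# Hypothetical helper name; assumes pages are separated by form feeds.
page_grep() {
    awk -v pat="$1" '
    BEGIN { RS = "\f" }                    # one awk record per PDF page
    {
        n = split($0, lines, "\n")         # break the page into its lines
        for (i = 1; i <= n; i++)
            if (index(lines[i], pat))      # substring test on each line
                printf "%d:%s\n", NR, lines[i]   # NR is the page number
    }'
}
```

It could be used as: pdftotext /TEMP/manu.pdf - | page_grep "go to"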

 Generate an XHTML file containing bounding box information for each word in the file.

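Here, each word is enclosed in a page container. So you can take the index of the page element that contains your word, plus 1, as the page number.

Do you have an example of how to get it?

Recoll can search PDF documents. It has a command-line mode, but the GUI is more helpful for showing exactly where the matches occur, and it lets you click to open the document at the right place.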
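
One possible sketch for the bbox approach, assuming the -bbox output puts each opening page tag and each word element on its own line, and that this pdftotext build writes to stdout when the output file is given as -. The function name bbox_word_pages is hypothetical. Because every word is a separate element, it looks up a single word rather than a phrase such as "go to". Counting from 1 at each opening page tag is the same as taking the page index plus 1.

```shell
# bbox_word_pages WORD
# Reads "pdftotext -bbox" XHTML on stdin and prints the number of each
# page that contains WORD as a whole word.
# Hypothetical helper name; assumes one <page ...> or <word ...> per line.
# Note: special characters appear as XML entities (e.g. &amp;) in this output.
bbox_word_pages() {
    awk -v target="$1" '
    /<page / { page++ }                      # a new page container starts
    /<word / {
        w = $0
        sub(/^.*<word [^>]*>/, "", w)        # strip up to the end of the opening tag
        sub(/<\/word>.*$/, "", w)            # strip the closing tag
        if (w == target) print page
    }' | uniq                                # report each page only once
}
```

It could be used as: pdftotext -bbox /TEMP/manu.pdf - | bbox_word_pages "go"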

— user2391635


Licensed under cc by-sa 3.0 with attribution required.