Listing named destinations in a PDF

6

How can I list the named destinations in a PDF file?

Named destinations are the formal name for what you might call anchors. Major browsers jump to the named destination foo when you visit http://example.com/some.pdf#foo.
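To make the URL convention concrete, here is a minimal sketch of how the destination name can be pulled out of such a link. The function name destination_from_url is hypothetical, and handling the longer "nameddest=foo" fragment form alongside the bare "#foo" form is an assumption about what a viewer accepts, not something tested against any particular browser.

```python
from urllib.parse import urlsplit, unquote

def destination_from_url(url):
    """Return the named destination a PDF URL points at, or None."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return None
    # Assumption: the fragment may carry several "key=value" parameters
    # separated by "&", one of which can be "nameddest=<name>".
    for part in fragment.split("&"):
        if part.startswith("nameddest="):
            return unquote(part[len("nameddest="):])
    # Other key=value fragments (e.g. "page=3") do not name a destination.
    if "=" in fragment:
        return None
    # A bare fragment is taken to be the destination name itself.
    return unquote(fragment)

print(destination_from_url("http://example.com/some.pdf#foo"))
```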

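I have documents where I can see that the anchors work, but I can't find a way to list them. Evince, okular and xpdf jump to them when instructed, but don't seem to have an interface for listing them. pdftk dump_data lists bookmarks, but that's not the same thing (those are table-of-contents entries, which may well sit at the same positions as named destinations but can't be used as anchors).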

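I'm looking for a command-line solution (e.g. suitable for use in completion after evince -n). Since the order is meaningful, I'd like to list the destinations in the order in which they appear in the document. Bonus: show the target page number and other information that helps work out roughly where the destination is.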

See also "Viewing anchors in a PDF document" on Software Recommendations, for GUI viewers.

pdf

— Gilles

7

The pyPdf library can list the anchors:

#!/usr/bin/env python
import sys
from pyPdf import PdfFileReader
def pdf_list_anchors(fh):
    reader = PdfFileReader(fh)
    destinations = reader.getNamedDestinations()
    for name in destinations:
        print name
pdf_list_anchors(open(sys.argv[1], 'rb'))  # PDFs are binary files, so open in binary mode

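This is good enough for the completion use case, but the anchors are listed in arbitrary order. Using only the stable interface of pyPdf 1.13, I couldn't find a way to list the anchors in document order. I haven't tried pyPdf2 yet.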

— Gilles

6

Poppler's pdfinfo command-line utility will give you the page number, position and name of every named destination in a PDF. You need at least version 0.58 of Poppler.

$ pdfinfo -dests input.pdf
Page  Destination                 Name
   1 [ XYZ null null null      ] "F1"
   1 [ XYZ  122  458 null      ] "G1.1500945"
   1 [ XYZ   79  107 null      ] "G1.1500953"
   1 [ XYZ   79   81 null      ] "G1.1500954"
   1 [ XYZ null null null      ] "P.1"
   2 [ XYZ null null null      ] "L1"
   2 [ XYZ null null null      ] "P.2"
(...)
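For the completion use case, this output can be post-processed into a plain list of names in document order. The following is a minimal sketch, assuming the output format is exactly as shown above (page, bracketed destination, quoted name) and that for XYZ destinations the three fields are left, top and zoom. The names Dest, parse_dests and list_dests are hypothetical helpers, and treating a null top as "top of the page" is an assumption.

```python
import re
import subprocess
import sys
from collections import namedtuple

Dest = namedtuple("Dest", "page kind top name")

# One line of `pdfinfo -dests` output, e.g.:
#    1 [ XYZ  122  458 null      ] "G1.1500945"
LINE_RE = re.compile(r'^\s*(\d+)\s+\[\s*(\w+)\s*(.*?)\s*\]\s+"(.*)"\s*$')

def parse_dests(text):
    """Parse `pdfinfo -dests` output into Dest tuples, ordered down the document."""
    dests = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # header line, blank lines, anything unrecognized
        page, kind, params, name = m.groups()
        fields = params.split()
        top = None
        # Assumption: for XYZ destinations the fields are left, top, zoom.
        if kind == "XYZ" and len(fields) >= 2 and fields[1] != "null":
            top = float(fields[1])
        dests.append(Dest(int(page), kind, top, name))
    # PDF y coordinates grow upward, so a larger top is higher on the page.
    # A missing top is treated as the top of the page (an assumption).
    dests.sort(key=lambda d: (d.page, -(d.top if d.top is not None else float("inf"))))
    return dests

def list_dests(pdf_path):
    """Run pdfinfo on a file and return its destinations in document order."""
    out = subprocess.run(["pdfinfo", "-dests", pdf_path],
                         check=True, capture_output=True, text=True).stdout
    return parse_dests(out)

if __name__ == "__main__" and len(sys.argv) > 1:
    for d in list_dests(sys.argv[1]):
        print(d.name)
```

Printing one name per line keeps the result easy to feed into a shell completion function; the page and top fields are kept in the tuple in case the caller wants to display them as well.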

— George

1

This prints them (twice): sorted by name, then sorted by position down the PDF. A large sample PDF containing named destinations

#!/usr/bin/env python
import sys
from pyPdf import PdfFileReader
def pdf_get_anchors(fh):
    reader = PdfFileReader(fh)
    destinations = reader.getNamedDestinations()                #completely unsorted order, does not include pagenums
    L=list();
    for PageNum in range(1,reader.numPages+1) :
        ThisPage = reader.getPage(PageNum-1)
        PageTop = ThisPage['/MediaBox'][3]
        for name in destinations:
            ThisDest = destinations[name]
            ThisDestPage = ThisDest.page.getObject()
            if ThisDestPage == ThisPage:                        #have to do this to identify the pagenum
                DownPage = (PageTop - ThisDest.top) / PageTop   # calc fraction of page down
                Position = PageNum + DownPage                   # a sortable number down the whole pdf
                L.append((name, PageNum, Position));            # put everything in a sortable list         
    return L, len (destinations), reader.getNumPages()

def pdf_print_anchors ( L ) :
    for dest in L :
        name=dest[0]
        PageNum=dest[1]
        Position= round(dest[2]*100)/100
        print "%-8.2f %s" % (Position, name)            # position (page + fraction down the page), then name

HeaderLine="\n Page   Name\n"                     
L, NumDests, NumPages =pdf_get_anchors(open(sys.argv[1],'rb'))
print HeaderLine
L.sort(key=lambda dest: dest[0])                        #sort name order
pdf_print_anchors(L);     
print HeaderLine
L.sort(key=lambda dest: dest[2])                        #sort in order down the pdf
pdf_print_anchors(L);
print HeaderLine
print "Number of NamedDestinations: ", NumDests, "NumPages: ", NumPages
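The sort key used above can be illustrated in isolation, without pyPdf. This is a small sketch of the same arithmetic, assuming every page has the same height and that its lower edge is at y = 0. The destination tuples and the function name position_in_document are made up for illustration.

```python
def position_in_document(page_num, page_height, top):
    """Return the page number plus the fraction of the page above the destination."""
    if top is None:
        top = page_height  # assumption: no explicit top means top of the page
    return page_num + (page_height - top) / page_height

# Hypothetical destinations: (name, 1-based page number, top in points).
dests = [("b", 2, 700.0), ("a", 1, 100.0), ("c", 1, 458.0)]
page_height = 792.0  # US Letter height in points, assumed for every page

ordered = sorted(dests, key=lambda d: position_in_document(d[1], page_height, d[2]))
print([name for name, _, _ in ordered])  # ['c', 'a', 'b']
```

Because the fractional part always stays between 0 and 1, a destination on a later page always sorts after every destination on an earlier page.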

— Henry Crun

1

Your answer builds on the work of an earlier one. That's allowed, but you should acknowledge it.

— JigglyNaga

2

My initial answer merely fixed the bugs in the original; I would have thought that was self-evident, and it certainly wasn't anything more.

— Henry Crun '16