Get the last n lines of a file with Python, similar to tail

181

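I'm writing a log file viewer for a web application, and for that I want to paginate through the lines of the log file. The items in the file are line-based, with the newest item at the bottom.

So I need a tail() method that can read n lines from the bottom and supports an offset. What I came up with looks like this: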

def tail(f, n, offset=0):
    """Reads a n lines from f with an offset of offset lines."""
    avg_line_length = 74
    to_read = n + offset
    while 1:
        try:
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None]
        avg_line_length *= 1.3

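Is this a reasonable approach? What is the recommended way to tail log files with offsets?

— Armin Ronacher

On my system (Linux SLES 10), seeking relative to the end raises an IOError, "can't do nonzero end-relative seeks". I like this solution, but I modified it to get the file length (seek(0, 2), then tell()) and use that value to seek relative to the beginning.

— Anne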

2

Congrats, this question made it into the Kippo source code.

— Miles

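The parameters passed to open() to create the file object f should be specified, because f must be treated differently depending on whether it was opened with f=open(..., 'rb') or f=open(..., 'rt').

— Igor Fobia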
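
The two comments above can be combined into a minimal sketch: open the file in binary mode, take its size with seek(0, 2) and tell(), then seek relative to the start. The helper name seek_from_end is hypothetical and is not part of the question's code; io.BytesIO stands in for a real binary file.

```python
import io

def seek_from_end(f, nbytes):
    """Position f about nbytes before its end using only start-relative seeks.

    Assumes f is a seekable file object opened in binary mode.
    Returns the resulting absolute position.
    """
    f.seek(0, io.SEEK_END)          # a zero end-relative seek is always allowed
    size = f.tell()
    target = max(0, size - nbytes)  # clamp so small files start at byte 0
    f.seek(target, io.SEEK_SET)
    return target

# In-memory stand-in for open(path, 'rb')
buf = io.BytesIO(b"one\ntwo\nthree\n")
pos = seek_from_end(buf, 6)
tail_bytes = buf.read()
```

Because the position is clamped at zero, the same call also works on files shorter than the requested distance.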

123

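This may be quicker than yours. It makes no assumptions about line length. It backs through the file one block at a time until it has found the right number of '\n' characters.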

def tail( f, lines=20 ):
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = [] # blocks of size BLOCK_SIZE, in reverse order starting
                # from the end of the file
    while lines_to_go > 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            # read the last block we haven't yet read
            f.seek(block_number*BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # file too small, start from beginning
            f.seek(0,0)
            # only read what was not read
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count('\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = ''.join(reversed(blocks))
    return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

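I don't like tricky assumptions about line length when, as a practical matter, you can never know things like that.

Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74-character figure is actually accurate, make the block size 2048 and you'll tail 20 lines almost immediately.

Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. With these high-level I/O packages, I doubt you'll see any performance consequence of trying to align on OS block boundaries. If you use lower-level I/O, you might see a speedup.

UPDATE

For Python 3.2 and up, follow the process on bytes, since in text files (those opened without a "b" in the mode string) only seeks relative to the beginning of the file are allowed (the exception being seeking to the very end of the file with seek(0, 2)):

For example: f = open('C:/.../../apache_logs.txt', 'rb')

def tail(f, lines=20):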
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            f.seek(block_number*BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            f.seek(0,0)
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

— S.Lott

13

This fails on small log files: IOError: invalid argument, raised by f.seek(block * 1024, 2)

1

A very nice approach indeed. I used a slightly modified version of the code above and came up with

— GiampaoloRodolà '11

6

This no longer works in Python 3.2. I get io.UnsupportedOperation: can't do nonzero end-relative seeks. I can change the offset to 0, but that defeats the purpose of the function.

— Logical Fallacy

4

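@DavidEnglund The reason is given here. In brief: seeking relative to the end of the file is not allowed in text mode, presumably because the file contents have to be decoded, and in general seeking to an arbitrary position within a sequence of encoded bytes can give undefined results when you try to decode to Unicode starting from that position. The suggestion offered at the link is to open the file in binary mode and do the decoding yourself, catching the DecodeError exceptions.

— max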
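
As a rough illustration of the comment above, the sketch below shows the text-mode seek failing on Python 3 and the binary-mode alternative with manual decoding. The sample file content is made up for the demonstration, and the exception caught while decoding is the standard UnicodeDecodeError.

```python
import io
import os
import tempfile

# Write a small sample file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write("first\nsecond\nthird \u00e9\n".encode("utf-8"))
    path = tmp.name

# Text mode: a nonzero end-relative seek is rejected in Python 3.
with open(path, "r", encoding="utf-8") as text_file:
    try:
        text_file.seek(-5, os.SEEK_END)
        text_mode_error = None
    except io.UnsupportedOperation as exc:
        text_mode_error = exc

# Binary mode: the same kind of seek works; decode the bytes afterwards.
with open(path, "rb") as binary_file:
    binary_file.seek(-9, os.SEEK_END)
    raw = binary_file.read()
    try:
        decoded = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The seek landed inside a multi-byte character; replace the fragment.
        decoded = raw.decode("utf-8", errors="replace")

os.remove(path)
```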

6

Do NOT use this code. It corrupts lines in some edge cases in Python 2.7. The answer from @papercrane below fixes it.

— xApple

88

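Assuming a Unix-like system, on Python 2 you can do:

import os
def tail(f, n, offset=0):
  # f is a file name here; tail needs the line count as a string
  stdin,stdout = os.popen2("tail -n "+str(n+offset)+" "+f)
  stdin.close()
  lines = stdout.readlines(); stdout.close()
  # drop the newest `offset` lines (lines[:-0] would be empty)
  return lines[:-offset] if offset else lines

For Python 3 you may do:

import subprocess
def tail(f, n, offset=0):
    proc = subprocess.Popen(['tail', '-n', str(n + offset), f], stdout=subprocess.PIPE)
    lines = proc.stdout.readlines()
    # drop the newest `offset` lines (lines[:-0] would be empty)
    return lines[:-offset] if offset else lines

— Mark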

5

It should be platform independent. Besides, if you read the question you will see that f is a file-like object.

— Armin Ronacher

40

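The question doesn't say platform dependence is unacceptable. I fail to see why this deserves two downvotes when it provides a very unixy (maybe what you're looking for... it certainly was for me) way of doing exactly what the question asks.

— Shabbyrobe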

3

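Thanks, I was thinking I had to solve this in pure Python, but there's no reason not to use UNIX utilities when they're at hand, so I went with this. FWIW, in modern Python, subprocess.check_output is likely preferable to os.popen2; it simplifies things a bit, as it just returns the output as a string and raises on a non-zero exit code.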

— mrooney
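
A minimal sketch of the check_output suggestion, assuming a Unix-like system with a tail binary on PATH. The function name tail_lines and the demo file are illustrative only and do not come from the answer.

```python
import os
import subprocess
import tempfile

def tail_lines(path, n, offset=0):
    """Return up to n lines that end `offset` lines before the end of `path`.

    Assumes a Unix-like system with a `tail` binary on PATH.
    """
    # check_output raises CalledProcessError if tail exits with non-zero status.
    out = subprocess.check_output(["tail", "-n", str(n + offset), path])
    lines = out.decode("utf-8", errors="replace").splitlines()
    # lines[:-0] would be empty, so only slice when an offset is given.
    return lines[:-offset] if offset else lines

# Demo on a throwaway file containing the lines "1" .. "10".
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join("%d\n" % i for i in range(1, 11)))
last_three = tail_lines(tmp.name, 3)
previous_page = tail_lines(tmp.name, 3, offset=3)
os.remove(tmp.name)
```

Passing the arguments as a list avoids building a shell command string, so file names with spaces need no quoting.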

3

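Although this is platform dependent, it is a very efficient way of doing what has been asked, as well as an extremely fast one (you don't have to load the entire file into memory). @Shabbyrobe

— EarthmeL '14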

6

You might want to precalculate the offset like this: offset_total = str(n+offset), and replace this line with stdin,stdout = os.popen2("tail -n "+offset_total+" "+f) to avoid TypeErrors (cannot concatenate int+str).

— AddingColor '16

32

Here is my answer. Pure Python. Using timeit it seems pretty fast. Tailing 100 lines of a log file that has 100,000 lines:

>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10)
0.0014600753784179688
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100)
0.00899195671081543
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=1000)
0.05842900276184082
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10000)
0.5394978523254395
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100000)
5.377126932144165

Here is the code:

import os


def tail(f, lines=1, _buffer=4098):
    """Tail a file and get X lines from the end"""
    # place holder for the lines found
    lines_found = []

    # block counter will be multiplied by buffer
    # to get the block size from the end
    block_counter = -1

    # loop until we find X lines
    while len(lines_found) < lines:
        try:
            f.seek(block_counter * _buffer, os.SEEK_END)
        except IOError:  # either file is too small, or too many lines requested
            f.seek(0)
            lines_found = f.readlines()
            break

        lines_found = f.readlines()

        # we found enough lines, get out
        # Removed this line because it was redundant the while will catch
        # it, I left it for history
        # if len(lines_found) > lines:
        #    break

        # decrement the block counter to get the
        # next X bytes
        block_counter -= 1

    return lines_found[-lines:]

— glenbot

3

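Elegant solution! Is the if len(lines_found) > lines: really necessary? Wouldn't the loop condition catch it as well?

— Maximilian Peters

A question for my understanding: is os.SEEK_END used simply for clarity? As far as I have found, its value is constant (= 2). I was wondering about leaving it out to be able to drop the import os. Thanks for a great solution!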

— n1k31t4

2

@MaximilianPeters Yes. It's not necessary. I commented it out.

— glenbot

@DexterMorgan You can replace os.SEEK_END with its integer equivalent. It is mainly there for readability.

— glenbot

1

I upvoted, but have a small nit. After the seek, the first line read may be incomplete, so to get N _complete_ lines I changed while len(lines_found) < lines to while len(lines_found) <= lines in my copy. Thanks!

— Graham Klyne

30

If reading the whole file is acceptable, use a deque.

from collections import deque
deque(f, maxlen=n)

Prior to 2.6, deques didn't have a maxlen option, but it's easy enough to implement.

import itertools
def maxque(items, size):
    items = iter(items)
    q = deque(itertools.islice(items, size))
    for item in items:
        del q[0]
        q.append(item)
    return q

If it's a requirement to read the file from the end, use a galloping (a.k.a. exponential) search.

def tail(f, n):
    assert n >= 0
    pos, lines = n+1, []
    while len(lines) <= n:
        try:
            f.seek(-pos, 2)
        except IOError:
            f.seek(0)
            break
        finally:
            lines = list(f)
        pos *= 2
    return lines[-n:]

— A. Coady

Why does that bottom function work? pos *= 2 seems completely arbitrary. What is its significance?

— 2mac '14

1

@2mac Exponential search. It reads from the end of the file iteratively, doubling the amount read each time, until enough lines are found.

— A. Coady

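I think the solutions that read from the end will not support files encoded with UTF-8, since the character length is variable and you could (likely will) land at some odd offset that cannot be interpreted correctly.

— Mike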
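
One way to address this concern, sketched under the assumption that the file is opened in binary mode: in UTF-8 the newline byte 0x0A never occurs inside a multi-byte character, so the data can be split on b'\n' first and each complete line decoded afterwards. The first line of the buffer may start mid-character, so it is dropped unless the start of the file was reached. The name tail_utf8 and the sample data are illustrative.

```python
import io

def tail_utf8(f, n, chunk_size=4096):
    """Return the last n lines of binary file f, decoded as UTF-8."""
    if n <= 0:
        return []
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    data = b""
    # Read backwards until the buffer holds more than n newlines, so the
    # first of the last n lines is known to be complete.
    while pos > 0 and data.count(b"\n") <= n:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
    lines = data.splitlines()
    if pos > 0:
        lines = lines[1:]  # first line may be cut mid-character; drop it
    return [line.decode("utf-8") for line in lines[-n:]]

# A tiny chunk size forces reads that start in the middle of characters.
sample = "α\nβγ\nδεζ\nηθ\n".encode("utf-8")
last_two = tail_utf8(io.BytesIO(sample), 2, chunk_size=3)
whole_file = tail_utf8(io.BytesIO(sample), 10)
```

Decoding happens only after the split, so a chunk boundary inside a character never reaches the decoder.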

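Unfortunately, your galloping search solution doesn't work for Python 3, as f.seek() does not take a negative offset. I've updated your code to make it work for Python 3 (link).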

— itsjwala

25

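S.Lott's answer above almost works for me but ends up giving me partial lines. It turns out that it corrupts data on block boundaries because data holds the read blocks in reversed order. When ''.join(data) is called, the blocks are in the wrong order. This fixes that.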

def tail(f, window=20):
    """
    Returns the last `window` lines of file `f` as a list.
    f - a byte file-like object
    """
    if window == 0:
        return []
    BUFSIZ = 1024
    f.seek(0, 2)
    bytes = f.tell()
    size = window + 1
    block = -1
    data = []
    while size > 0 and bytes > 0:
        if bytes - BUFSIZ > 0:
            # Seek back one whole BUFSIZ
            f.seek(block * BUFSIZ, 2)
            # read BUFFER
            data.insert(0, f.read(BUFSIZ))
        else:
            # file too small, start from beginning
            f.seek(0,0)
            # only read what was not read
            data.insert(0, f.read(bytes))
        linesFound = data[0].count('\n')
        size -= linesFound
        bytes -= BUFSIZ
        block -= 1
    return ''.join(data).splitlines()[-window:]

— papercrane

1

Inserting at the beginning of a list is a bad idea. Why not use a deque structure?

— Sergey11g
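
A possible shape for that suggestion, written for Python 3 and binary files: collections.deque.appendleft is O(1), while list.insert(0, ...) has to shift every existing element. The function name tail_blocks and the bufsize parameter are illustrative, not taken from the answer.

```python
import io
from collections import deque

def tail_blocks(f, window=20, bufsize=1024):
    """Return the last `window` lines of binary file f as a list of bytes."""
    if window <= 0:
        return []
    f.seek(0, io.SEEK_END)
    end = f.tell()
    blocks = deque()
    newlines = 0
    # Need window + 1 newlines so the first returned line is complete.
    while end > 0 and newlines <= window:
        start = max(0, end - bufsize)
        f.seek(start)
        block = f.read(end - start)
        blocks.appendleft(block)  # keeps blocks in file order, O(1) per block
        newlines += block.count(b"\n")
        end = start
    return b"".join(blocks).splitlines()[-window:]

data = b"".join(b"line %d\n" % i for i in range(1, 8))
last_three = tail_blocks(io.BytesIO(data), 3, bufsize=5)
```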

1

Sadly not Python 3 compatible... trying to figure out why.

— Sherlock70

20

The code I ended up using. I think this is the best so far:

def tail(f, n, offset=None):
    """Reads a n lines from f with an offset of offset lines.  The return
    value is a tuple in the form ``(lines, has_more)`` where `has_more` is
    an indicator that is `True` if there are more lines in the file.
    """
    avg_line_length = 74
    to_read = n + (offset or 0)

    while 1:
        try:
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None], \
                   len(lines) > to_read or pos > 0
        avg_line_length *= 1.3

— Armin Ronacher
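
The slice lines[-to_read:offset and -offset or None] in the answer above relies on the old and/or idiom: with an offset of 0 or None the slice end becomes None, otherwise it becomes -offset. A small worked example of the same paging arithmetic on a plain list, using a conditional expression instead (the helper name page_from_end is hypothetical):

```python
def page_from_end(lines, n, offset=0):
    """Return n lines ending `offset` lines before the end of `lines`."""
    to_read = n + offset
    end = -offset if offset else None  # same as: offset and -offset or None
    return lines[-to_read:end]

log = ["line %d" % i for i in range(1, 11)]
newest = page_from_end(log, 3)
previous = page_from_end(log, 3, offset=3)
```

Here newest covers lines 8 to 10 and previous covers lines 5 to 7, which is the pagination behaviour the question asks for.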

5

Does not exactly answer the question.

— sheki '12

13

Simple and fast solution with mmap:

import mmap
import os

def tail(filename, n):
    """Returns last n lines from the filename. No exception handling"""
    size = os.path.getsize(filename)
    with open(filename, "rb") as f:
        # for Windows the mmap parameters are different
        fm = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
        try:
            for i in xrange(size - 1, -1, -1):
                if fm[i] == '\n':
                    n -= 1
                    if n == -1:
                        break
            return fm[i + 1 if i else 0:].splitlines()
        finally:
            fm.close()

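— Dimitri

1

This is probably the fastest answer when the input could be huge (or it would be, if it used the .rfind method to scan backwards for newlines rather than performing byte-at-a-time checks at the Python level; in CPython, replacing Python-level code with C built-in calls usually wins by a lot). For smaller inputs, the deque with a maxlen is simpler and probably similarly fast.

— ShadowRanger '15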

4

An even cleaner Python 3 compatible version that doesn't insert but appends and reverses:

def tail(f, window=1):
    """
    Returns the last `window` lines of file `f` as a single bytes object.
    """
    if window == 0:
        return b''
    BUFSIZE = 1024
    f.seek(0, 2)
    end = f.tell()
    nlines = window + 1
    data = []
    while nlines > 0 and end > 0:
        i = max(0, end - BUFSIZE)
        nread = min(end, BUFSIZE)

        f.seek(i)
        chunk = f.read(nread)
        data.append(chunk)
        nlines -= chunk.count(b'\n')
        end -= nread
    return b'\n'.join(b''.join(reversed(data)).splitlines()[-window:])

Use it like this:

with open(path, 'rb') as f:
    last_lines = tail(f, 3).decode('utf-8')

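— Hauke Rehfeld

Not too shabby, but I would in general advise against adding an answer to a 10-year-old question that already has plenty of answers. But help me out: what is specific to Python 3 in your code?

— usr2564301

The other answers were not exactly working out well :-) py3: see stackoverflow.com/questions/136168/...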

— Hauke Rehfeld

3

Update of the @papercrane solution to Python 3. Open the file with open(filename, 'rb') and:

def tail(f, window=20):
    """Returns the last `window` lines of file `f` as a list.
    """
    if window == 0:
        return []

    BUFSIZ = 1024
    f.seek(0, 2)
    remaining_bytes = f.tell()
    size = window + 1
    block = -1
    data = []

    while size > 0 and remaining_bytes > 0:
        if remaining_bytes - BUFSIZ > 0:
            # Seek back one whole BUFSIZ
            f.seek(block * BUFSIZ, 2)
            # read BUFFER
            bunch = f.read(BUFSIZ)
        else:
            # file too small, start from beginning
            f.seek(0, 0)
            # only read what was not read
            bunch = f.read(remaining_bytes)

        bunch = bunch.decode('utf-8')
        data.insert(0, bunch)
        size -= bunch.count('\n')
        remaining_bytes -= BUFSIZ
        block -= 1

    return ''.join(data).splitlines()[-window:]

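— Emilio

3

Posting an answer at the behest of commenters on my answer to a similar question, where the same technique was used to mutate the last line of a file, not just get it.

For a file of significant size, mmap is the best way to do this. To improve on the existing mmap answer, this version is portable between Windows and Linux and should run faster (though it won't work without some modifications on 32-bit Python with files in the GB range; see the other answer for hints on handling this, and for modifying it to work on Python 2).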

import io  # Gets consistent version of open for both Py2.7 and Py3.x
import itertools
import mmap
import os  # only needed for the optional os.linesep handling shown below

def skip_back_lines(mm, numlines, startidx):
    '''Factored out to simplify handling of n and offset'''
    for _ in itertools.repeat(None, numlines):
        startidx = mm.rfind(b'\n', 0, startidx)
        if startidx < 0:
            break
    return startidx

def tail(f, n, offset=0):
    # Reopen file in binary mode
    with io.open(f.name, 'rb') as binf, mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # len(mm) - 1 handles files ending w/newline by getting the prior line
        startofline = skip_back_lines(mm, offset, len(mm) - 1)
        if startofline < 0:
            return []  # Offset lines consumed whole file, nothing to return
            # If using a generator function (yield-ing, see below),
            # this should be a plain return, no empty list

        endoflines = startofline + 1  # Slice end to omit offset lines

        # Find start of lines to capture (add 1 to move from newline to beginning of following line)
        startofline = skip_back_lines(mm, n, startofline) + 1

        # Passing True to splitlines makes it return the list of lines without
        # removing the trailing newline (if any), so list mimics f.readlines()
        return mm[startofline:endoflines].splitlines(True)
        # If Windows style \r\n newlines need to be normalized to \n, and input
        # is ASCII compatible, can normalize newlines with:
        # return mm[startofline:endoflines].replace(os.linesep.encode('ascii'), b'\n').splitlines(True)

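This assumes the number of lines tailed is small enough that you can safely read them all into memory at once; you could also make this a generator function and manually read a line at a time by replacing the final line with: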

        mm.seek(startofline)
        # Call mm.readline n times, or until EOF, whichever comes first
        # Python 3.2 and earlier:
        for line in itertools.islice(iter(mm.readline, b''), n):
            yield line

        # 3.3+:
        yield from itertools.islice(iter(mm.readline, b''), n)

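Lastly, this reads in binary mode (necessary to use mmap), so it gives str lines (Py2) and bytes lines (Py3); if you want unicode (Py2) or str (Py3), the iterative approach could be tweaked to decode for you and/or fix newlines: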

        lines = itertools.islice(iter(mm.readline, b''), n)
        if f.encoding:  # Decode if the passed file was opened with a specific encoding
            lines = (line.decode(f.encoding) for line in lines)
        if 'b' not in f.mode:  # Fix line breaks if passed file opened in text mode
            lines = (line.replace(os.linesep, '\n') for line in lines)
        # Python 3.2 and earlier:
        for line in lines:
            yield line
        # 3.3+:
        yield from lines

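Note: I typed this all up on a machine where I lack access to Python to test. Please let me know if I typoed anything; this was similar enough to my other answer that I think it should work, but the tweaks (e.g. handling an offset) could lead to subtle errors. Please let me know in the comments if there are any mistakes.

— ShadowRanger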

3

I found the Popen above to be the best solution. It's quick and dirty, and it works for Python 2.6 on a Unix machine. I used the following:

import subprocess

def GetLastNLines(n, fileName):
    """
    Name:           GetLastNLines
    Description:    Gets last n lines using Unix tail
    Output:         returns last n lines of a file
    Keyword arguments:
    n -- number of last lines to return
    fileName -- name of the file you need to tail into
    """
    p = subprocess.Popen(['tail', '-n', str(n), fileName], stdout=subprocess.PIPE)
    # communicate() returns (stdout, stderr); stderr is None since it is not piped
    soutput, serror = p.communicate()
    return soutput

soutput will contain the last n lines of the file. To iterate through soutput line by line, do:

for line in GetLastNLines(50,'myfile.log').split('\n'):
    print line

— Marco

2

Based on S.Lott's top-voted answer (Sep 25 '08 at 21:43), but fixed for small files.

def tail(the_file, lines_2find=20):  
    the_file.seek(0, 2)                         #go to end of file
    bytes_in_file = the_file.tell()             
    lines_found, total_bytes_scanned = 0, 0
    while lines_2find+1 > lines_found and bytes_in_file > total_bytes_scanned: 
        byte_block = min(1024, bytes_in_file-total_bytes_scanned)
        the_file.seek(-(byte_block+total_bytes_scanned), 2)
        total_bytes_scanned += byte_block
        lines_found += the_file.read(1024).count('\n')
    the_file.seek(-total_bytes_scanned, 2)
    line_list = list(the_file.readlines())
    return line_list[-lines_2find:]

    #we read at least 21 line breaks from the bottom, block by block for speed
    #21 to ensure we don't get a half line

Hope this is useful.

— Eyecue

2

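There are some existing implementations of tail on PyPI which you can install using pip:

mtFileUtil
multitail
log4tailer
...

Depending on your situation, there may be advantages to using one of these existing tools.

— Travis Bear

Are you aware of any module that works on Windows? I tried tailhead and tailer, but they didn't work. I also tried mtFileUtil. It initially threw an error because the print statements had no parentheses (I am on Python 3.6). I added those in reverse.py and the error messages were gone, but when my script calls the module (mtFileUtil.tail(open(logfile_path), 5)), it doesn't print anything.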

— Technext

2

Simple:

with open("test.txt") as f:
    data = f.readlines()
tail = data[-2:]
print(''.join(tail))

— Samba Siva Reddy

This is a thoroughly bad implementation. Consider handling huge files where n is also huge; it is too expensive an operation.

— Nivesh Krishna

1

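For efficiency with very large files (common in log file situations where you may want to use tail), you generally want to avoid reading the whole file (even if you do it without reading the whole file into memory at once). However, you do need to somehow work out the offset in lines rather than characters. One possibility is reading backwards with seek() char by char, but this is very slow. Instead, it's better to process in larger blocks.

I have a utility function I wrote a while ago to read files backwards that can be used here.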

import os, itertools

def rblocks(f, blocksize=4096):
    """Read file as series of blocks from end of file to start.

    The data itself is in normal order, only the order of the blocks is reversed.
    ie. "hello world" -> ["ld","wor", "lo ", "hel"]
    Note that the file must be opened in binary mode.
    """
    if 'b' not in f.mode.lower():
        raise Exception("File must be opened using binary mode.")
    size = os.stat(f.name).st_size
    fullblocks, lastblock = divmod(size, blocksize)

    # The first(end of file) block will be short, since this leaves 
    # the rest aligned on a blocksize boundary.  This may be more 
    # efficient than having the last (first in file) block be short
    f.seek(-lastblock,2)
    yield f.read(lastblock)

    for i in range(fullblocks-1,-1, -1):
        f.seek(i * blocksize)
        yield f.read(blocksize)

def tail(f, nlines):
    buf = ''
    result = []
    for block in rblocks(f):
        buf = block + buf
        lines = buf.splitlines()

        # Return all lines except the first (since may be partial)
        if lines:
            result.extend(lines[1:]) # First line may not be complete
            if(len(result) >= nlines):
                return result[-nlines:]

            buf = lines[0]

    return ([buf]+result)[-nlines:]


f=open('file_to_tail.txt','rb')
for line in tail(f, 20):
    print line

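[Edit] Added a more specific version (avoids the need to reverse twice)

— Brian

A quick test shows that this performs a lot worse than my version above. Probably because of your buffering.

— Armin Ronacher

I suspect it's because I'm doing multiple seeks backwards, so I'm not getting as good use of the readahead buffer. However, I think it may do better when your guess at the line length isn't accurate (e.g. very large lines), as it avoids having to re-read data in that case.

— Brian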

1

You can go to the end of your file with f.seek(0, 2) and then read off lines one by one with the following replacement for readline():

def readline_backwards(self, f):
    backline = ''
    last = ''
    while not last == '\n':
        backline = last + backline
        if f.tell() <= 0:
            return backline
        f.seek(-1, 1)
        last = f.read(1)
        f.seek(-1, 1)
    backline = last
    last = ''
    while not last == '\n':
        backline = last + backline
        if f.tell() <= 0:
            return backline
        f.seek(-1, 1)
        last = f.read(1)
        f.seek(-1, 1)
    f.seek(1, 1)
    return backline

— rabbit

1

Based on Eyecue's answer (Jun 10 '10 at 21:28): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)

— 数据库

1

Several of these solutions have issues if the file doesn't end in \n, or in ensuring that the complete first line is read.

def tail(file, n=1, bs=1024):
    f = open(file)
    f.seek(-1,2)
    l = 1-f.read(1).count('\n') # If file doesn't end in \n, count it anyway.
    B = f.tell()
    while n >= l and B > 0:
            block = min(bs, B)
            B -= block
            f.seek(B, 0)
            l += f.read(block).count('\n')
    f.seek(B, 0)
    l = min(l,n) # discard first (incomplete) line if l > n
    lines = f.readlines()[-l:]
    f.close()
    return lines

— David Rogers

1

Here is a pretty simple implementation:

with open('/etc/passwd', 'r') as f:
  try:
    f.seek(0,2)
    s = ''
    while s.count('\n') < 11:
      cur = f.tell()
      f.seek((cur - 10))
      s = f.read(10) + s
      f.seek((cur - 10))
    print s
  except Exception as e:
    f.readlines()

— GL2014

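Great example! Could you please explain the use of try before the f.seek? Why not before the with open? Also, why do you do an f.readlines() in the except?

Honestly, the try should probably go first. I don't remember having a reason for not catching the open() other than that on a healthy standard Linux system, /etc/passwd should always be readable. try, then with, is the more common order.

— GL2014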

1

There is a very useful module that can do this:

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:
    # getting lines by lines starting from the last line up
    for l in frb:
        print(l)

— 昆腾·卡波

1

Another solution

If your txt file looks like this: mouse snake cat lizard wolf dog

you could reverse this file by simply using array indexing in Python:

contents=[]
def tail(contents,n):
    with open('file.txt') as file:
        for i in file.readlines():
            contents.append(i)

    for i in contents[:n:-1]:
        print(i)

tail(contents,-5)

Result: dog wolf lizard cat

— 布莱恩·麦克马洪

1

The simplest way is to use deque:

from collections import deque

def tail(filename, n=10):
    with open(filename) as f:
        return deque(f, n)

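— Zhen Wang

0

I had to read a specific value from the last line of a file, and stumbled upon this thread. Rather than reinventing the wheel in Python, I ended up with a tiny shell script, saved as /usr/local/bin/get_last_netp: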

#! /bin/bash
tail -n1 /home/leif/projects/transfer/export.log | awk {'print $14'}

And in the Python program:

from subprocess import check_output

last_netp = int(check_output("/usr/local/bin/get_last_netp"))

— 莱福克

0

Not the first example using a deque, but a simpler one. This one is general: it works on any iterable object, not just a file.

#!/usr/bin/env python
import sys
import collections
def tail(iterable, N):
    deq = collections.deque()
    for thing in iterable:
        if len(deq) >= N:
            deq.popleft()
        deq.append(thing)
    for thing in deq:
        yield thing
if __name__ == '__main__':
    for line in tail(sys.stdin,10):
        sys.stdout.write(line)

— Hal Canary

0

This is my version of tailf

import sys, time, os

filename = 'path to file'

try:
    with open(filename) as f:
        size = os.path.getsize(filename)
        if size < 1024:
            s = size
        else:
            s = 999
        f.seek(-s, 2)
        l = f.read()
        print l
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)
                continue
            print line
except IOError:
    pass

— Raj

0

import time

attemps = 600
wait_sec = 5
fname = "YOUR_PATH"

with open(fname, "r") as f:
    where = f.tell()
    for i in range(attemps):
        line = f.readline()
        if not line:
            time.sleep(wait_sec)
            f.seek(where)
        else:
            print line, # already has newline

— moylop260

0

import itertools
fname = 'log.txt'
offset = 5
n = 10
with open(fname) as f:
    n_last_lines = list(reversed([x for x in itertools.islice(f, None)][-(offset+1):-(offset+n+1):-1]))

— Kal

0

abc = "2018-06-16 04:45:18.68"
filename = "abc.txt"
with open(filename) as myFile:
    for num, line in enumerate(myFile, 1):
        if abc in line:
            lastline = num
print "last occurance of work at file is in "+str(lastline)

— Kant

0

Update for the answer given by A. Coady

Works with Python 3.

This uses exponential search, buffers only N lines from the back, and is very efficient.

import time
import os
import sys

def tail(f, n):
    assert n >= 0
    pos, lines = n+1, []

    # set file pointer to end

    f.seek(0, os.SEEK_END)

    isFileSmall = False

    while len(lines) <= n:
        try:
            f.seek(f.tell() - pos, os.SEEK_SET)
        except ValueError as e:
            # lines greater than file seeking size
            # seek to start
            f.seek(0,os.SEEK_SET)
            isFileSmall = True
        except IOError:
            print("Some problem reading/seeking the file")
            sys.exit(-1)
        finally:
            lines = f.readlines()
            if isFileSmall:
                break

        pos *= 2

    print(lines)

    return lines[-n:]




with open("stream_logs.txt") as f:
    while(True):
        time.sleep(0.5)
        print(tail(f,2))

— Itsjwala

-1

On second thought, this is probably as fast as anything here.

def tail( f, window=20 ):
    lines= ['']*window
    count= 0
    for l in f:
        lines[count%window]= l
        count += 1
    print lines[count%window:], lines[:count%window]

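It's a lot simpler. And it does seem to rip along at a good pace.

— S.Lott

Because nearly everything here doesn't work with log files of more than 30 MB or so without loading the same amount of memory into RAM ;) Your first version is a lot better, but for the test files here it performs slightly worse than mine, and it doesn't work with different newline characters.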

— Armin Ronacher

3

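I was wrong. Version 1 took 0.00248908996582 for 10 tails through the dictionary. Version 2 took 1.2963051796 for 10 tails through the dictionary. I'd almost vote myself down.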

— S.Lott

"doesn't work with different newline characters." Replace data.count('\n') with len(data.splitlines()) if it matters.

— S.Lott
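
To make the difference concrete, here is a short comparison on a made-up byte string that mixes Windows, old Mac and Unix line endings: count only sees "\n", while splitlines() also breaks on "\r" and "\r\n".

```python
# Hypothetical sample mixing \r\n, \r and \n line endings.
data = b"alpha\r\nbeta\rgamma\ndelta"

newline_count = data.count(b"\n")    # counts only the "\n" bytes
line_count = len(data.splitlines())  # splits on "\r", "\n" and "\r\n"
```

With this input the two counts disagree, so a loop that waits for a fixed number of "\n" bytes can read further back than needed on files with other line endings.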