Answers:
Python 2
with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head
Python 3
with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)
Here's another way (Python 2 and 3):
from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
N = 10
with open("file.txt", "r") as file:  # "r" opens the file for reading; append mode ("a") cannot be read
    for i in range(N):
        line = next(file).strip()
        print(line)
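I cringe every time I see f = open("file") with no exception handling to close the file. The Pythonic way to handle files is to use a context manager, i.e. the with statement. This is covered in the Input and Output section of the Python tutorial: "It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way."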
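To make the point about closing concrete, here is a minimal sketch comparing the with statement to the try/finally block it roughly replaces. The function names and the demo.txt file are made up for illustration.

```python
import os
import tempfile

def first_line_with(path):
    # The with statement closes the file when the block exits,
    # whether it exits normally or because of an exception.
    with open(path) as f:
        line = f.readline()
    return line, f.closed

def first_line_try_finally(path):
    # Roughly the manual equivalent: close() has to sit in a finally clause.
    f = open(path)
    try:
        line = f.readline()
    finally:
        f.close()
    return line, f.closed

# Hypothetical demo file, created in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as out:
    out.write("first\nsecond\n")

print(first_line_with(demo_path))
print(first_line_try_finally(demo_path))
```

Both functions report the file as closed afterwards; the with version just needs less ceremony.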
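If you want to read the first lines quickly and don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.
For example, for the first 5 lines: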
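with open("pathofmyfileandfileandname") as myfile:
    firstNlines = myfile.readlines()[0:5]  # put the range you want here
print firstNlines
Note: the whole file is read, so this is not the best option for performance, but it is easy to use, quick to write and easy to remember, which makes it convenient for a one-off calculation.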
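One advantage over the other answers is that you can easily select a range of lines, e.g. lines 11 to 30 by skipping the first 10 ([10:30]), the last 10 lines ([-10:]), everything except the last 10 ([:-10]), or every second line ([::2]).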
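The slices mentioned above behave as follows on a stand-in list (the lines list here is invented in place of the result of readlines()). Note that [:-10] drops the last ten lines, while [-10:] selects them.

```python
# Stand-in for the list that myfile.readlines() would return: 40 lines.
lines = ["line %d\n" % i for i in range(1, 41)]

skip_first_10 = lines[10:30]   # lines 11 through 30
last_10 = lines[-10:]          # the final ten lines
all_but_last_10 = lines[:-10]  # everything except the final ten
every_other = lines[::2]       # lines 1, 3, 5, ... (indexes 0, 2, 4, ...)

print(len(skip_first_10), len(last_10), len(all_but_last_10), len(every_other))
```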
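What I do is read the first N lines with pandas. I don't think the performance is the best, but for example, with N=1000:
import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)  # read_csv, since pandas has no pd.read()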
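Better to use the nrows option, which can be set to 1000 so that the whole file is not loaded. pandas.pydata.org/pandas-docs/stable/generated/… In general, pandas has this and other memory-saving techniques for large files.
You may also need sep to define the column delimiter (one that should not occur in a non-CSV file).
I can't find pandas.read() in the documentation. Do you know anything about this?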
Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.
# Python 2 only: the built-in file type was removed in Python 3.
class File(file):
    def head(self, lines_2find=1):
        self.seek(0)  # Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)  # go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        # Scan backwards in 1024-byte blocks until enough newlines are found
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]
Usage:
f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
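Since subclassing file only works on Python 2, here is a rough Python 3 sketch of the same head/tail idea as plain functions. It swaps in a different technique for tail: collections.deque with maxlen, which reads the whole file once but keeps only the last n lines in memory, rather than seeking backwards from the end. The function names and the demo file are assumptions for illustration.

```python
import os
import tempfile
from collections import deque
from itertools import islice

def head(path, n=1):
    """Return up to the first n lines of the file at path."""
    with open(path) as f:
        return list(islice(f, n))

def tail(path, n=1):
    """Return up to the last n lines of the file at path."""
    with open(path) as f:
        # A deque with maxlen keeps only the most recent n lines seen.
        return list(deque(f, maxlen=n))

# Hypothetical demo file with five lines.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as out:
    out.write("a\nb\nc\nd\ne\n")

print(head(demo_path, 3))
print(tail(demo_path, 2))
```

For very large files the seek-based approach above avoids reading everything, so this sketch trades speed for simplicity.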
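The two most intuitive ways of doing this are:
1. Iterate over the file line by line, and break after N lines.
2. Iterate over the file line by line using the next() method N times. (This is essentially a different syntax for what the top answer does.)
Here is the code: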
# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break
# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line
The bottom line is, as long as you don't use readlines() or otherwise enumerate the whole file into memory, you have plenty of options.
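To illustrate why the lazy approaches avoid loading everything, the sketch below uses a made-up generator, counting_lines, in place of a real file and records how many lines are actually produced when only the first three are requested with islice.

```python
from itertools import islice

def counting_lines(n_total, consumed):
    """Yield fake lines and record each one that is actually produced."""
    for i in range(n_total):
        consumed.append(i)
        yield "line %d\n" % i

consumed = []
first_three = list(islice(counting_lines(1000000, consumed), 3))

print(first_three)
print(len(consumed))  # how many lines were generated in total
```

Only three of the million potential lines are ever generated, which is the same reason iterating over a file object and stopping early stays cheap.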
If you want something that obviously works (without looking up esoteric stuff in manuals), with no imports and no try/except, on a fair range of Python 2.x versions (2.2 to 2.6):
def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)
If you have a really big file and want the output as a numpy array, using np.genfromtxt will freeze your computer. In my experience, this works much better:
import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list
    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array
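Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as:
with open("datafile") as myfile:
    head = myfile.readlines(N)
print head
(You don't have to worry about the file having fewer than N lines, since no StopIteration exception is raised.)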
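The method name says lines, but the argument refers to bytes: readlines(N) takes a size hint, so it does not return exactly N lines.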
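The difference between a size hint and a line count is easy to check. The sketch below uses io.StringIO as an in-memory stand-in for a text file (an assumption made for the demo, using Python 3's io module); with a hint of 3, readlines() stops once the lines collected so far total at least 3 characters, which here is after one line.

```python
import io
from itertools import islice

# In-memory stand-in for a text file with ten short lines of 6 characters each.
data = "".join("row %d\n" % i for i in range(10))

# readlines(3): the argument is a size hint, not a number of lines.
with_hint = io.StringIO(data).readlines(3)

# islice(f, 3): takes exactly three lines (or fewer if the file is shorter).
with_islice = list(islice(io.StringIO(data), 3))

print(len(with_hint), len(with_islice))
```

If an exact line count is needed, the islice version is the safer choice.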
This works for Python 2 and 3:
from itertools import islice
with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):  # skip the first N lines, then take the next M
        print(line)
fname = input("Enter file name: ")
num_lines = 0
with open(fname, 'r') as f:  # count the lines in the file
    for line in f:
        num_lines += 1
num_lines_input = int(input("Enter line numbers: "))
if num_lines_input <= num_lines:
    with open(fname, "r") as f:
        for x in range(num_lines_input):
            a = f.readline()
            print(a)
else:
    # Fewer lines than requested: print only the lines that exist
    with open(fname, "r") as f:
        for x in range(num_lines):
            a = f.readline()
            print(a)
    print("Don't have", num_lines_input, " lines print as much as you can")
    print("Total lines in the text", num_lines)
#!/usr/bin/python
import subprocess
p = subprocess.Popen(["tail", "-n", "3", "passlist"], stdout=subprocess.PIPE)  # pass the option and its value as separate arguments
output, err = p.communicate()
print output
This approach worked for me.