Read a file starting from line 2, or skip the header row

242

How can I skip the header row and start reading a file from line 2?

python file-io

— 超级9
source

453

with open(fname) as f:
    next(f)
    for line in f:
        pass  # do something with each line

— 寂静幽灵
source

51

If you need the header later, use f.readline() instead of next(f) and store the result in a variable.

— 该死

36

Or use header_line = next(f).
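A minimal sketch of keeping the header while skipping it, assuming a small throwaway file. The helper name `read_with_header` and the sample contents are hypothetical, not from the answers above. Passing a default to `next()` avoids a `StopIteration` on an empty file.

```python
import os
import tempfile


def read_with_header(path):
    """Return (header, rows): the first line and the remaining lines."""
    with open(path) as f:
        # next(f, None) returns None instead of raising StopIteration
        # when the file is empty.
        header_line = next(f, None)
        rows = [line.rstrip("\n") for line in f]
    header = None if header_line is None else header_line.rstrip("\n")
    return header, rows


# Hypothetical sample data written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("name,age\nalice,30\nbob,25\n")
    sample_path = tmp.name

header, rows = read_with_header(sample_path)
os.remove(sample_path)
print(header)
print(rows)
```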

— 塞缪尔

94

with open(fname, 'r') as f:
    lines = f.readlines()[1:]

— 克里斯考利
source

This skips 1 line. ['a', 'b', 'c'][1:] => ['b', 'c']

— 埃里克·杜米尼尔

3

@LjubisaLivac is right: this answer generalizes to any line, so it is a more powerful solution.

— Daniel Soutar '18

17

Fine until the file is too large to read into memory. For small files it works well.

— CppLearner '18

1

Slicing also builds a copy of the contents. That is just needlessly inefficient.
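For comparison, a lazy version that never holds the whole file in memory might look like the sketch below. The generator name `data_lines` is hypothetical, and `io.StringIO` stands in for a file opened with `open(fname)`.

```python
import io


def data_lines(f, skip=1):
    """Yield lines from an open file, skipping the first `skip` lines."""
    for _ in range(skip):
        # Stop quietly if the file has fewer lines than `skip`.
        if next(f, None) is None:
            return
    for line in f:
        yield line


# io.StringIO stands in for a real file object here.
sample = io.StringIO("header\nrow 1\nrow 2\n")
remaining = [line.strip() for line in data_lines(sample)]
print(remaining)
```

Only one line is held at a time inside the generator, so no list and no sliced copy are built unless the caller asks for one.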

— chepner

How about using consume() from more-itertools, as described at docs.python.org/3/library/itertools.html#itertools-recipes? I heard about it at stackoverflow.com/questions/11113803
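A sketch of the `consume()` recipe from the itertools documentation, applied to skipping header lines. The sample data is hypothetical, and `io.StringIO` stands in for a real file; more-itertools provides a function of the same name, which is not imported here.

```python
import io
from collections import deque
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # Feed the whole iterator into a zero-length deque.
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


f = io.StringIO("header 1\nheader 2\nrow 1\nrow 2\n")
consume(f, 2)  # skip the two header lines
rows = [line.strip() for line in f]
print(rows)
```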

— AnotherParker

24

If you want the first line and then want to do some operations on the rest of the file, this code will be helpful.

with open(filename, 'r') as f:
    first_line = f.readline()
    for line in f:
        pass  # perform some operations on each remaining line

— saimadhu.polamuri
source

If you don't need this line, you don't have to assign readline() to a variable. I like this solution best.

— 安娜，

Mixing direct reads with using the file as an iterator is not recommended (although in this case it does no harm).
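One place where the two styles can interact: in CPython 3, a text-mode file object may refuse `tell()` once `next()` has been called on it. The sketch below is an illustration under that assumption; it catches the `OSError` instead of relying on it, since the behavior is an implementation detail. The file contents and variable names are hypothetical.

```python
import os
import tempfile

# Hypothetical sample file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow 1\nrow 2\n")
    mixed_path = tmp.name

with open(mixed_path) as f:
    header = next(f)  # iterator protocol
    try:
        f.tell()  # direct file method after next()
        tell_worked = True
    except OSError:
        # Typically "telling position disabled by next() call" in CPython 3.
        tell_worked = False
    rest = [line.strip() for line in f]

os.remove(mixed_path)
print(header.strip(), rest, tell_worked)
```

Sticking to the iterator protocol throughout (`next(f)` followed by `for line in f`) sidesteps the question entirely.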

— chepner

9

If only slicing worked on iterators...

from itertools import islice
with open(fname) as f:
    for line in islice(f, 1, None):
        pass

— 瓦杰克·赫尔梅茨（Vajk Hermecz）
source

1

This is a very nice, Pythonic way to solve the problem, and it extends to any number of header lines.
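To illustrate the point about extending to several header lines, here is a small sketch. The constant `HEADER_LINES` and the sample data are hypothetical, and `io.StringIO` stands in for an open file.

```python
import io
from itertools import islice

HEADER_LINES = 2  # hypothetical number of header lines to skip

f = io.StringIO("title\ncolumns\nrow 1\nrow 2\n")
# islice(f, HEADER_LINES, None) discards the first HEADER_LINES items
# lazily and then yields everything that follows.
rows_after_headers = [line.strip() for line in islice(f, HEADER_LINES, None)]
print(rows_after_headers)  # ['row 1', 'row 2']
```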

— Dai Dai

This is a really nice implementation!

— 柴油

Wonderful solution

— Russ Hyde

This should have far more upvotes than it does now.

— chepner

8

with open(fname) as f:
    lines = f.readlines()
first_line = lines.pop(0)  # removes the first line
for line in lines:
    ...

— 德罗·希尔曼
source

2

This reads the whole file into memory at once, so it is only practical for smaller files.

— 海顿·希夫

1

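To generalize the task of reading multiple header lines and to improve readability, I would use method extraction. Suppose you want to tokenize the first two lines of coordinates.txt to use as header information.

Example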

coordinates.txt
---------------
Name,Longitude,Latitude,Elevation, Comments
String, Decimal Deg., Decimal Deg., Meters, String
Euler's Town,7.58857,47.559537,0, "Blah"
Faneuil Hall,-71.054773,42.360217,0
Yellowstone National Park,-110.588455,44.427963,0

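Method extraction then lets you specify what you want to do with the header information (in this example we simply tokenize the header lines on commas and return each one as a list, but there is room to do much more).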

def __readheader(filehandle, numberheaderlines=1):
    """Read the specified number of lines and yield the comma-delimited
    strings on each line as a list."""
    for _ in range(numberheaderlines):
        yield [token.strip() for token in filehandle.readline().split(',')]

with open('coordinates.txt', 'r') as rh:
    # Single header line
    # print(next(__readheader(rh)))

    # Multiple header lines
    for headerline in __readheader(rh, numberheaderlines=2):
        print(headerline)  # Or do other stuff with headerline tokens

Output

['Name', 'Longitude', 'Latitude', 'Elevation', 'Comments']
['String', 'Decimal Deg.', 'Decimal Deg.', 'Meters', 'String']

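If coordinates.txt contains another header line, simply change numberheaderlines. Best of all, it is clear what __readheader(rh, numberheaderlines=2) is doing, and we avoid the ambiguity of having to figure out, or comment on, why the author of the accepted answer uses next() in their code.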
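As a variation on the same idea, the header lines could be tokenized with the standard `csv` module instead of `str.split`, which also copes with quoted fields. This is only a sketch: the function name `read_header_rows` is hypothetical, and `io.StringIO` stands in for coordinates.txt.

```python
import csv
import io
from itertools import islice


def read_header_rows(filehandle, number_header_lines=1):
    """Return the first header rows as lists of tokens, parsed by csv.reader."""
    # skipinitialspace=True drops spaces that directly follow a delimiter.
    reader = csv.reader(filehandle, skipinitialspace=True)
    return list(islice(reader, number_header_lines))


sample = io.StringIO(
    "Name,Longitude,Latitude,Elevation, Comments\n"
    "String, Decimal Deg., Decimal Deg., Meters, String\n"
    "Euler's Town,7.58857,47.559537,0, \"Blah\"\n"
)
headers = read_header_rows(sample, number_header_lines=2)
for tokens in headers:
    print(tokens)
```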

— 明创
source

1

If you want to read multiple CSV files starting from line 2, this works like a charm

import csv

for files in csv_file_list:
    with open(files, 'r') as r:
        next(r)  # skip the header row
        rr = csv.reader(r)
        for row in rr:
            pass  # do something with each row

(This is part of Parfait's answer to another question.)
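A related sketch: calling `next()` on the `csv.reader` rather than on the file skips one logical CSV record, which matters if a quoted header field contains a line break. The sample data is hypothetical, and `io.StringIO` stands in for an opened CSV file.

```python
import csv
import io

# A header whose quoted field spans two physical lines (hypothetical data).
sample = io.StringIO('id,"long\nname"\n1,alice\n2,bob\n')

reader = csv.reader(sample)
header_row = next(reader, None)  # skips one logical CSV record
data_rows = list(reader)
print(header_row)
print(data_rows)
```

For headers that fit on a single physical line, `next(r)` on the file and `next(reader)` on the reader skip the same content.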

— 蒂亚戈·马丁斯·佩雷斯李大仁
source

0

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 1000 rows
    for j in range(0, 1000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the first column: first_col
        first_col = line[0]

        # If the column value is in the dict, increment its value
        if first_col in counts_dict:
            counts_dict[first_col] += 1

        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)
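The same counting loop could also be sketched with `collections.Counter` and `itertools.islice`, which stops cleanly if the file has fewer than 1000 data rows. The sample rows and column names below are hypothetical stand-ins for world_dev_ind.csv.

```python
import io
from collections import Counter
from itertools import islice

# Hypothetical stand-in for the real CSV file.
sample = io.StringIO(
    "CountryName,Year,Value\n"
    "Arab World,1960,1\n"
    "Arab World,1961,2\n"
    "Euro area,1960,3\n"
)
next(sample, None)  # skip the column names
# Count first-column values over at most the first 1000 data rows.
counts = Counter(line.split(",")[0] for line in islice(sample, 1000))
print(counts)
```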

— 毛罗·雷曼特里亚
source