Convert a row of a pandas DataFrame into column headers

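The data I have to work with is a bit messy: it contains the header names inside its data. How can I select a row from an existing pandas data frame and make it the column headers (rename the columns to its values)?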

I want to do something like:

header = df[df['old_header_name1'] == 'new_header_name1']

df.columns = header

— EK

In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])

In [22]: df
Out[22]: 
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6

Set the column labels equal to the values in the second row (index position 1):

In [23]: df.columns = df.iloc[1]

If the index has unique labels, you can drop the second row with:

In [24]: df.drop(df.index[1])
Out[24]: 
1 foo bar baz
0   1   2   3
2   4   5   6
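Putting the two steps together, a minimal sketch on the same toy frame might look like this. The name `header_pos` is just illustrative, and clearing `df.columns.name` is an optional extra step (an assumption, not part of the answer) that removes the leftover `1` printed above the columns.

```python
import pandas as pd

# Same toy frame as above: the real header sits in the row at position 1.
df = pd.DataFrame([(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6)])

header_pos = 1                        # hypothetical name for the header row's position
df.columns = df.iloc[header_pos]      # promote that row's values to column labels
df = df.drop(df.index[header_pos])    # drop the row by label (safe: the index is unique)
df.columns.name = None                # optional: clear the old row label kept as the name

print(df)
```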

If the index is not unique, you can use:

In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]: 
1 foo bar baz
0   1   2   3
2   4   5   6

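df.drop(df.index[1]) removes all rows that have the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or latent bugs) like this, it is usually better to make sure the index is unique, even though pandas does not require it.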

— Unutbu
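To make the non-unique-index pitfall concrete, here is a small sketch on a hypothetical frame whose index repeats the label 'a' (the data and labels are invented for illustration). Dropping by label removes every row carrying that label, while the position-based idiom from the answer removes only one.

```python
import pandas as pd

# Hypothetical frame with a repeated index label 'a'.
df = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'a'])

# Label-based: df.index[0] is 'a', so BOTH rows labelled 'a' are dropped.
by_label = df.drop(df.index[0])

# Position-based: drop position 0 only, then select the remaining positions with iloc.
by_position = df.iloc[pd.RangeIndex(len(df)).drop(0)]

print(by_label)
print(by_position)
```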

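Thanks a lot for the quick reply! How can I select the header row by its value instead of by its index position? So for your example, something like: df.columns = df[df[0] == 'foo']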

— EK

The problem with that is that more than one row could contain the value "foo". One way around it is to explicitly pick the first such row: df.columns = df.iloc[np.where(df[0] == 'foo')[0][0]].

— unutbu, 2014

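Ah, I see why you'd do it that way. In my case I know only one row has the value "foo", so that's fine. I did it this way, which I guess amounts to the same thing as the one-liner you gave me above: idx_loc = df[df[0] == 'foo'].index.tolist()[0], then df.columns = df.loc[idx_loc]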

— EK
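A minimal sketch of the "select the header by value" approach from this thread, on a hypothetical variant of the toy frame in which 'foo' appears twice in column 0, so taking the first match actually matters. It uses numpy's `np.where`, as in unutbu's comment.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the marker value 'foo' appears in two rows of column 0.
df = pd.DataFrame([(1, 2, 3), ('foo', 'bar', 'baz'), (4, 5, 6), ('foo', 'qux', 'quux')])

# np.where returns a tuple of arrays; [0] is the array of matching
# positions and the second [0] picks the first match.
first_pos = np.where(df[0] == 'foo')[0][0]
df.columns = df.iloc[first_pos]

print(first_pos)
print(list(df.columns))
```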

This works (pandas v0.19.2):

df.rename(columns=df.iloc[0])

— Zachary Wilson

You can also remove the header row by appending .drop(df.index[0])

— ostrokach

I like this better than the accepted answer. I like short one-line solutions.

— 哈维尔
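Combining the rename one-liner with the suggested .drop(...) might look like the sketch below (toy data assumed, with the header placed in row 0 to match df.iloc[0]). rename() treats the Series as a mapping from old label to new label and returns a new frame, so the original df keeps its integer column labels.

```python
import pandas as pd

# Toy frame (assumed): the header values sit in row 0.
df = pd.DataFrame([('foo', 'bar', 'baz'), (1, 2, 3), (4, 5, 6)])

# The Series df.iloc[0] acts as the mapping 0 -> 'foo', 1 -> 'bar', 2 -> 'baz';
# .drop(df.index[0]) then removes the now-redundant header row.
new_df = df.rename(columns=df.iloc[0]).drop(df.index[0])

print(new_df)
```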

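It would be easier to recreate the data frame. This also gives you a chance to interpret the column types from scratch, although df.values of a mixed frame is an object array, so the new columns stay object dtype until you re-infer them (for example with infer_objects()).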

headers = df.iloc[0]
new_df  = pd.DataFrame(df.values[1:], columns=headers)

— shahar_m
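A small check of the dtype behaviour, on a hypothetical frame whose header sits in row 0 (data invented for illustration). The rebuilt frame still has object columns; `DataFrame.infer_objects()` then soft-converts the columns that now hold only numbers.

```python
import pandas as pd

# Hypothetical messy frame: header values in row 0 make every column object dtype.
df = pd.DataFrame([('pears', 'apples'), (40, 50), (41, 51)])

headers = df.iloc[0]
new_df = pd.DataFrame(df.values[1:], columns=headers)
print(new_df.dtypes)      # both columns are still object dtype

# Re-infer the types now that the header strings are gone.
typed_df = new_df.infer_objects()
print(typed_df.dtypes)    # both columns become int64
```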

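You can specify the row index in the read_csv or read_html functions via the header parameter, which is documented as "Row number(s) to use as the column names, and the start of the data". This has the advantage of automatically dropping all the preceding rows, which are presumably junk.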

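import pandas as pd
from io import StringIO

csv = '''junk1, junk2, junk3, junk4, junk5
junk1, junk2, junk3, junk4, junk5
pears, apples, lemons, plums, other
40, 50, 61, 72, 85
'''

# header=2: line 2 (0-based) becomes the column names; the junk lines above it are skipped.
# skipinitialspace=True strips the blank after each comma, so names are 'apples', not ' apples'.
df = pd.read_csv(StringIO(csv), header=2, skipinitialspace=True)
print(df)

Output:
   pears  apples  lemons  plums  other
0     40      50      61     72     85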

— ccpizza
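If the number of junk lines is known in advance, read_csv's skiprows parameter reaches the same result: skip those lines, and header then defaults to the first remaining line. A sketch on the same sample string (skipinitialspace=True is an added assumption to strip the blank after each comma):

```python
import pandas as pd
from io import StringIO

csv = '''junk1, junk2, junk3, junk4, junk5
junk1, junk2, junk3, junk4, junk5
pears, apples, lemons, plums, other
40, 50, 61, 72, 85
'''

# skiprows=2 discards the two junk lines; the default header=0 then
# refers to the first remaining line ('pears, apples, ...').
df = pd.read_csv(StringIO(csv), skiprows=2, skipinitialspace=True)
print(df)
```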