将列表列表转换为Pandas Dataframe


30

我正在尝试将如下所示的列表列表转换为Pandas Dataframe

[['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'], 
['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], 
['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'], 
['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

我基本上是试图将数组中的每个项目转换为具有四列的pandas数据框。最好的方法是pd.Dataframe并不能完全满足我的需求。


在堆栈溢出中看到以下问题:stackoverflow.com/questions
/.../…

Answers:


36
import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

df = pd.DataFrame.from_records(data)

4
您可以使用以下方法进一步完善它:DataFrame.from_records(data,columns = ['Team','Player','whatever-stat-is-that','position'])
Juan Ignacio Gil

1
有没有一种方法可以更具体地指定进口?例如,我想指定DataFrame["Team"]必须引用每个子列表的第一项(即data[i][0]),并DataFrame["Position"]引用每个子列表的最后一项(即data[i][-1])?
伊沃

@Ivo:使用columns的参数DataFrame.from_records
Emre

14

获得数据后:

import pandas as pd

data = [['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'], 
        ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], 
        ['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'], 
        ['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

您可以通过转置数据来创建数据框:

data_transposed = zip(data)
df = pd.DataFrame(data_transposed, columns=["Team", "Player", "Salary", "Role"])

其它的办法:

df = pd.DataFrame(data)
df = df.transpose()
df.columns = ["Team", "Player", "Salary", "Role"]

5

您可以将其直接定义为数据框,如下所示:

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data)

1
import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

df = pd.DataFrame(data)

0

到目前为止,这是最简单的:

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data)

现在,如果键是列表列表中的第一个列表(数据[0]),则可以将其分配给数据框中的列标题,如下所示:

import pandas as pd

data = [['key1', 'key2', key3, 'key4'], 
    ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
    ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
    ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data[1:], columns=data[0])
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.