Pandas DataFrame to list of dictionaries


I have the following DataFrame:

customer  item1   item2   item3
1         apple   milk    tomato
2         water   orange  potato
3         juice   mango   chips

I want to translate this into a list of dictionaries, one per row:

rows = [{'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
    {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
    {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

— Mohammad Ibrahim
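For experimenting with the answers below, here is a minimal sketch (assuming pandas is installed) that rebuilds the sample DataFrame and produces the desired `rows` with plain Python, by zipping the column names with each tuple from `itertuples(index=False, name=None)`. It is an alternative to the built-in methods in the answers, not the asker's code.

```python
import pandas as pd

# Rebuild the sample DataFrame from the question
df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Plain-Python conversion: pair each column name with the values of one row
cols = list(df.columns)
rows = [dict(zip(cols, row)) for row in df.itertuples(index=False, name=None)]
print(rows)
```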

Welcome to Stack Overflow! I indented your code sample by 4 spaces so that it renders properly. Please see the editing help for more information on formatting.

— ByteHamster, 2015


Edit

As John Galt mentions in his answer, you should probably use df.to_dict('records') instead. It's faster than transposing manually.

In [20]: timeit df.T.to_dict().values()
1000 loops, best of 3: 395 µs per loop

In [21]: timeit df.to_dict('records')
10000 loops, best of 3: 53 µs per loop
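The timings above come from IPython's `timeit` magic on an older pandas. A rough way to repeat the comparison in a plain script is the standard-library `timeit` module; this sketch assumes the question's sample data, and the numbers it prints will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

n = 1000  # loops per measurement
# Transpose first, then take the inner dicts (list() needed on Python 3)
t_transpose = timeit.timeit(lambda: list(df.T.to_dict().values()), number=n)
# Built-in row-wise conversion
t_records = timeit.timeit(lambda: df.to_dict(orient="records"), number=n)

print(f"transpose: {t_transpose / n * 1e6:.1f} µs per loop")
print(f"records:   {t_records / n * 1e6:.1f} µs per loop")
```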

Original answer

Use df.T.to_dict().values(), like below:

In [1]: df
Out[1]:
   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

In [2]: df.T.to_dict().values()
Out[2]:
[{'customer': 1.0, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 {'customer': 2.0, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 {'customer': 3.0, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
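The output above is from Python 2, where `dict.values()` returned a list. In Python 3 it returns a `dict_values` view, so a sketch of the same approach (assuming the question's data) wraps it in `list()`:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# After transposing, each original row is a column, so to_dict() maps
# row label -> {column name: value}; .values() keeps only the inner dicts
view = df.T.to_dict().values()
print(type(view).__name__)  # dict_values

rows = list(view)  # materialize the view as a real list
```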

— 电脑研究员

What if the DataFrame contains many rows for each customer?

— Aziz
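One possible answer, assuming the goal is to collect all rows belonging to a customer under that customer's key: group by `customer` and call `to_dict(orient="records")` on each group. The repeated customer below is made-up sample data, and the `by_customer` name is hypothetical.

```python
import pandas as pd

# Hypothetical data: customer 1 appears on two rows
df = pd.DataFrame({
    "customer": [1, 1, 2],
    "item1": ["apple", "bread", "water"],
    "item2": ["milk", "eggs", "orange"],
})

# Map each customer to the list of its rows; the customer column is dropped
# from the inner dicts because it is already the outer key
by_customer = {
    cust: group.drop(columns="customer").to_dict(orient="records")
    for cust, group in df.groupby("customer")
}
print(by_customer)
```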


When using df.T.to_dict().values(), I also lose the sort order.

— Hussain
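With `to_dict(orient="records")`, the resulting list follows the DataFrame's current row order, so one way to control the order is to sort before converting. This sketch assumes sorting by `item1` is the desired order.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Sort first; "records" then emits one dict per row in that order
sorted_df = df.sort_values("item1")
rows = sorted_df.to_dict(orient="records")
print([r["item1"] for r in rows])  # ['apple', 'juice', 'water']
```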

When reading a CSV file into a list of dictionaries, I get twice the speed with unicodecsv.DictReader.

— radtek
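`unicodecsv` is a Python 2 package; on Python 3 the standard-library `csv.DictReader` already handles Unicode. A minimal sketch, using an in-memory string as a stand-in for the CSV file. Every value comes back as a string, since no type conversion happens; the speed comparison in the comment is the commenter's measurement and is not reproduced here.

```python
import csv
import io

# Stand-in for a CSV file; with a real file use open(path, newline="")
csv_text = """customer,item1,item2,item3
1,apple,milk,tomato
2,water,orange,potato
3,juice,mango,chips
"""

with io.StringIO(csv_text) as f:
    # The header row supplies the keys; each later row becomes one dict
    rows = list(csv.DictReader(f))

print(rows[0])
```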


Use df.to_dict('records'); it gives the output without having to transpose externally.

In [2]: df.to_dict('records')
Out[2]:
[{'customer': 1L, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 {'customer': 2L, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 {'customer': 3L, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
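The `L` suffix marks Python 2 long integers. On Python 3 the same call produces plain ints, and a quick check (assuming the question's data) shows the result equals the `rows` list from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

rows = df.to_dict(orient="records")  # one dict per row, keyed by column name

expected = [
    {'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
    {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
    {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'},
]
print(rows == expected)  # True
```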

— Zero

How would I change this to include the index value in each entry of the resulting list?

— Gabriel L. Oliveira


@GabrielL.Oliveira you can do df.reset_index().to_dict('records')

— 马伟
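A sketch of that suggestion with the question's default `RangeIndex`: `reset_index()` moves the index into a regular first column, which pandas names `index` when the index itself has no name.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# The unnamed index becomes a column called "index", so it shows up in every dict
rows = df.reset_index().to_dict(orient="records")
print(rows[0])
```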

Is the order of the columns preserved in every case, i.e. is the nth entry in the resulting list always also the nth column?

— Cleb '18

@Cleb In "i.e. is the nth entry in the resulting list always also the nth column?", do you mean the nth column or the nth row?

— Nauman Naeem
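For what it's worth, a quick check on Python 3.7+ (where dicts keep insertion order) suggests both readings hold in recent pandas: the nth list element is the nth row, and each row's keys follow the column order. The columns are deliberately reordered here to make that visible; treat this as observed behavior rather than a quoted guarantee.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Reorder the columns so the key order is easy to spot
reordered = df[["item3", "customer", "item1", "item2"]]
rows = reordered.to_dict(orient="records")

for position, row in enumerate(rows):
    # nth list element corresponds to the nth row
    assert row["customer"] == reordered["customer"].iloc[position]
    # keys inside each dict follow the column order
    assert list(row) == list(reordered.columns)

print(list(rows[0]))  # ['item3', 'customer', 'item1', 'item2']
```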


As an extension to John Galt's answer:

For the following DataFrame,

   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

if you want each row's dictionary together with its index value, you can do the following:

df.to_dict('index')

This outputs a dictionary of dictionaries, where the keys of the parent dictionary are the index values. In this case:

{0: {'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 1: {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 2: {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}}
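A sketch of the same call, plus one way to turn the result into a list when a list is what's needed; the `"index"` key name is an arbitrary choice. Note that `orient="index"` requires a unique index and raises `ValueError` otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# index label -> {column name: value}
by_index = df.to_dict(orient="index")

# Fold each index label into its row dict to get a flat list instead
rows_with_index = [{"index": idx, **row} for idx, row in by_index.items()]
print(rows_with_index[0])
```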

— Hossain Muctadir

If you only want to select one column, this will work:

df[["item1"]].to_dict("records")

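The following will not work and produces "TypeError: unsupported type". I believe this is because it is trying to convert a Series to a dictionary rather than a DataFrame.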

df["item1"].to_dict("records")

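I needed to select just one column and convert it into a list of dictionaries with the column name as the key. I was stuck on this for a while, so I figured I'd share.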
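A sketch covering both cases, assuming the question's data: selecting with a one-element list keeps a DataFrame, and a Series can be promoted to one with `to_frame()`. `Series.to_dict` has no `orient` parameter (its only option is `into`, the mapping type), which is why passing the string `"records"` ends in a `TypeError`.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# Double brackets select a one-column DataFrame, which supports orient="records"
one_col = df[["item1"]].to_dict(orient="records")
print(one_col)  # [{'item1': 'apple'}, {'item1': 'water'}, {'item1': 'juice'}]

# Starting from a Series: turn it into a one-column DataFrame first
from_series = df["item1"].to_frame().to_dict(orient="records")

# Series.to_dict does not take an orient, so "records" is rejected
try:
    df["item1"].to_dict("records")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```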

— Joe Rivera