4

Consider the following DataFrame:
         A      B         C         D
    0  foo    one  0.162003  0.087469
    1  bar    one -1.156319 -1.526272
    2  foo    two  0.833892 -1.666304
    3  bar  three -2.026673 -0.322057
    4  foo    two  0.411452 -0.954371
    5  bar    two  0.765878 -0.095968
    6  foo    one -0.654890  0.678091
    7  foo  three -1.789842 -1.130922
The following command works:
    > df.groupby('A').apply(lambda x: (x['C'] …

174 python pandas

5

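pandas loc vs. iloc vs. ix vs. at vs. iat?

I recently started branching out from my safe place (R) into Python, and I am a bit confused by cell localization/selection in Pandas. I have read the documentation, but I am still struggling to understand the practical implications of the various localization/selection options. Why should I use .loc or .iloc over the most general option, .ix? My understanding is that .loc, iloc, at, and iat may provide some guarantee of correctness that .ix cannot, but I have also read that .ix tends to be the fastest solution across the board. Please explain the real-world, best-practice reasoning behind using anything other than .ix.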

171 python pandas performance indexing lookup
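
A minimal sketch of the label-versus-position distinction the question asks about, using a made-up three-row frame (the data and variable names are assumptions for illustration). Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the sketch covers only .loc, .iloc, .at, and .iat.

```python
import pandas as pd

# Hypothetical frame whose integer index labels are deliberately out of order,
# so that "label 0" and "position 0" refer to different rows.
df = pd.DataFrame({"x": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0, "x"]       # row whose index LABEL is 0
by_position = df.iloc[0, 0]     # row at POSITION 0 (the first row)
scalar_by_label = df.at[1, "x"]  # label-based scalar access
scalar_by_pos = df.iat[2, 0]     # position-based scalar access

print(by_label, by_position, scalar_by_label, scalar_by_pos)
```

With an integer index like this one, an accessor that guesses between labels and positions can silently pick the wrong row, which is the ambiguity the explicit accessors avoid.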

10

How to save a Seaborn plot to a file

I tried the following code (test_seaborn.py):
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    matplotlib.style.use('ggplot')
    import seaborn as sns
    sns.set()
    df = sns.load_dataset('iris')
    sns_plot = sns.pairplot(df, hue='species', size=2.5)
    fig = sns_plot.get_figure()
    fig.savefig("output.png")
    #sns.plt.show()
But I get this error:
    Traceback (most recent call last):
      File "test_searborn.py", line 11, in <module>
        fig = sns_plot.get_figure()
    AttributeError: 'PairGrid' object has no attribute 'get_figure'
I expect the final output.png to exist and look like this: …

171 python pandas matplotlib seaborn

8

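How to print a Pandas DataFrame without the index

I want to print the whole DataFrame, but I do not want to print the index. In addition, one column is of datetime type, and I only want to print the time, not the date. The DataFrame looks like this:
       User ID           Enter Time  Activity Number
    0      123  2014-07-08 00:09:00             1411
    1      123  2014-07-08 00:18:00              893
    2      123  2014-07-08 00:49:00             1041
I want it to print as:
    User ID  Enter Time  Activity Number
        123    00:09:00             1411
        123    00:18:00              893
        123    00:49:00             1041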

170 python datetime pandas dataframe
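
One possible approach, sketched under the assumption that the "Enter Time" column is a true datetime column: format the time on a copy with `Series.dt.strftime`, then render with `DataFrame.to_string(index=False)`. The frame below is rebuilt from the rows shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "User ID": [123, 123, 123],
    "Enter Time": pd.to_datetime([
        "2014-07-08 00:09:00",
        "2014-07-08 00:18:00",
        "2014-07-08 00:49:00",
    ]),
    "Activity Number": [1411, 893, 1041],
})

# Work on a copy so the original datetime column is left untouched.
out = df.copy()
out["Enter Time"] = out["Enter Time"].dt.strftime("%H:%M:%S")

# index=False drops the row labels from the rendered text.
text = out.to_string(index=False)
print(text)
```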

7

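How to filter rows in pandas by regex

I would like to cleanly filter a DataFrame using a regex on one of the columns. For a contrived example:
    In [210]: foo = pd.DataFrame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
    In [211]: foo
    Out[211]:
       a    b
    0  1   hi
    1  2  foo
    2  3  fat
    3  4  cat
I want to filter the rows down to those that start with f, using a regex. First attempt:
    In [213]: foo.b.str.match('f.*')
    Out[213]:
    0    []
    1    ()
    2    ()
    3    []
That is not very useful. However, this will get me my boolean index:
    In [226]: foo.b.str.match('(f.*)').str.len() …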

169 python regex pandas
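
A short sketch of one way to get the boolean mask directly, assuming a reasonably recent pandas: `Series.str.contains` with an anchored pattern returns booleans that can index the frame. The frame is the one from the question; the names `mask` and `filtered` are illustrative.

```python
import pandas as pd

foo = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["hi", "foo", "fat", "cat"]})

# "^f" anchors the pattern at the start of the string.
mask = foo["b"].str.contains(r"^f", regex=True)

# Boolean indexing keeps only the rows where the mask is True.
filtered = foo[mask]
print(filtered)
```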

5

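What is the difference between a pandas Series and a single-column DataFrame?

Why does pandas distinguish between a Series and a single-column DataFrame? In other words: what is the reason the Series class exists? I mainly work with time series with a datetime index; maybe that helps set the context.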

168 python pandas
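
A small illustration of the structural difference, using a made-up frame: selecting with a single label yields a one-dimensional Series, while selecting with a list of one label yields a two-dimensional DataFrame. This only shows the difference; it does not claim to explain the design rationale.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

as_series = df["a"]      # single label  -> Series (1-D, has a name)
as_frame = df[["a"]]     # list of labels -> DataFrame (2-D, has columns)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```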

3

pandas: merge (join) two data frames on multiple columns

I am trying to join two pandas data frames using two columns:
    new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
But I get the following error:
    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
    pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
    pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
    KeyError: '[B_1, c2]'
Any idea what the right way to do this would be? Thanks!

168 python python-3.x pandas join
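
The call in the question passes each key set as a single string, which pandas treats as one column name. A minimal sketch of the usual fix is to pass lists of column names to `left_on` and `right_on`. The two small frames and the `val_a`/`val_b` columns are invented for illustration; only `A_c1`, `B_c1`, and `c2` come from the question.

```python
import pandas as pd

A_df = pd.DataFrame({
    "A_c1": [1, 2, 3],
    "c2": ["x", "y", "z"],
    "val_a": [10, 20, 30],     # hypothetical payload column
})
B_df = pd.DataFrame({
    "B_c1": [1, 2],
    "c2": ["x", "y"],
    "val_b": [100, 200],       # hypothetical payload column
})

# Lists of column names, not a string that merely looks like a list.
new_df = pd.merge(
    A_df, B_df,
    how="left",
    left_on=["A_c1", "c2"],
    right_on=["B_c1", "c2"],
)
print(new_df)
```

Because this is a left join, the row of `A_df` with no match in `B_df` is kept and its `val_b` is missing.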

7

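Convert a Pandas DataFrame to a dictionary

I have a DataFrame with four columns. I want to convert this DataFrame to a Python dictionary, with the elements of the first column as the keys and the elements of the other columns in the same row as the values. DataFrame:
      ID  A  B  C
    0  p  1  3  2
    1  q  4  3  2
    2  r  4  0  9
The output should look like this. Dictionary:
    {'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}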

168 python pandas dictionary dataframe
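
One way this could be done, sketched with the frame from the question: make `ID` the index, transpose so each ID becomes a column, and call `to_dict("list")`. The variable name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["p", "q", "r"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

# After set_index + transpose, each ID is a column whose values are that
# row's A, B, C entries; to_dict("list") maps column name -> list of values.
result = df.set_index("ID").T.to_dict("list")
print(result)
```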

5

pandas groupby sorting

I want to group my DataFrame by two columns and then sort the aggregated results within each group.
    In [167]: df
    Out[167]:
       count     job source
    0      2   sales      A
    1      4   sales      B
    2      6   sales      C
    3      3   sales      D
    4      7   sales      E
    5      5  market      A
    6      3  market      B
    7      2  market      C
    8      4  market      D
    9      1  market      E
    In [168]: …

166 python sorting pandas group-by

6

Pandas DataFrame: group by two columns and get counts

I have a pandas DataFrame in the following format:
    df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
    df.columns = ['col1','col2','col3','col4','col5']
df:
      col1 col2 col3     col4 col5
    0  1.1    A  1.1    x/y/z    1
    1  1.1    A  1.7      x/y    3
    2  1.1    A  2.5  x/y/z/n    3
    3  2.6    B  2.6      x/u    2
    …

165 python pandas dataframe
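
The excerpt is cut off before it says which two columns should be grouped, so the following is only a sketch under an assumption: it groups by `col5` and `col2` on a small invented frame and counts rows per pair with `groupby(...).size()`.

```python
import pandas as pd

# Hypothetical, shortened data; the grouping columns are an assumption.
df = pd.DataFrame({
    "col2": list("AABBA"),
    "col5": ["1", "3", "3", "1", "1"],
})

# size() counts rows per (col5, col2) pair; reset_index turns the
# resulting MultiIndex Series back into a flat DataFrame.
counts = df.groupby(["col5", "col2"]).size().reset_index(name="count")
print(counts)
```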

4

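How to add a header row to a Pandas DataFrame

I am reading a csv file into pandas. This csv file consists of four columns and some rows, but it has no header row, which I want to add. I have been trying the following:
    Cov = pd.read_csv("path/to/file.txt", sep='\t')
    Frame=pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])
    Frame.to_csv("path/to/file.txt", sep='\t')
But when I apply the code, I get the following error:
    ValueError: Shape of passed values is (1, 1), indices imply (4, 1)
What exactly does the error mean? And what would be a clean way in Python to add a header row to my csv file / pandas df?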

165 python csv pandas header
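
A likely reading of the error is that `pd.DataFrame([Cov], ...)` wraps the whole frame in a one-element list, giving a 1 x 1 shape that cannot take four column names. A minimal sketch of a cleaner route is to supply the names at read time with `header=None` and `names=`. The tab-separated sample text below is invented and read from memory so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical headerless, tab-separated content standing in for the file.
raw = "chr1\t1\t100\t5.0\nchr2\t50\t200\t7.5\n"

# header=None stops pandas from consuming the first data row as a header;
# names= assigns the column labels.
cov = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["Sequence", "Start", "End", "Coverage"],
)
print(cov)
```

Writing the frame back with `cov.to_csv(path, sep="\t", index=False)` would then include the header row.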

5

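Apply a function with multiple arguments to create a new pandas column

I want to create a new column in a pandas DataFrame by applying a function to two existing columns. Following this answer, I have been able to create a new column when I only need one column as an argument:
    import pandas as pd
    df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
    def fx(x):
        return x * x
    print(df)
    df['newcolumn'] = df.A.apply(fx)
    print(df)
However, I cannot figure out how to do the same thing when the function requires multiple arguments. For example, how do I create a new column by passing column A and column B to the function below?
    def fxy(x, y):
        return x * y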

165 python pandas
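
Two common options, sketched with the frame and `fxy` from the question: a row-wise `DataFrame.apply(..., axis=1)`, or, when the function only uses operations that work on whole columns, calling it on the columns directly. The column name `vectorized` is added here for comparison only.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def fxy(x, y):
    return x * y

# Row-wise: each row is passed as a Series, so both columns are available.
df["newcolumn"] = df.apply(lambda row: fxy(row["A"], row["B"]), axis=1)

# Column-wise: works here because * is defined element-wise on Series.
df["vectorized"] = fxy(df["A"], df["B"])

print(df)
```

The column-wise form is usually faster, but it only applies when the function body is itself vectorizable.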

7

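How to display a pandas DataFrame of floats using a format string for columns?

I would like to display a pandas DataFrame in a given format using print() and the IPython display(). For example:
    df = pd.DataFrame([123.4567, 234.5678, 345.6789, 456.7890], index=['foo','bar','baz','quux'], columns=['cost'])
    print df
              cost
    foo   123.4567
    bar   234.5678
    baz   345.6789
    quux  456.7890
I would like to somehow coerce this into printing:
             cost
    foo   $123.46
    bar   $234.57
    baz   $345.68
    quux  $456.79
without having to modify the data itself or create a copy, just by changing the way it is displayed. How can I do this?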

165 python python-2.7 pandas ipython dataframe
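
One possible way to change only the rendering, sketched for Python 3 with the frame from the question: pass a per-column formatter to `DataFrame.to_string`. The underlying values are not modified.

```python
import pandas as pd

df = pd.DataFrame(
    [123.4567, 234.5678, 345.6789, 456.7890],
    index=["foo", "bar", "baz", "quux"],
    columns=["cost"],
)

# formatters maps a column name to a one-argument function returning a string.
text = df.to_string(formatters={"cost": "${:,.2f}".format})
print(text)
```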

4

Pandas DataFrame to a list of dictionaries

I have the following DataFrame:
    customer  item1   item2   item3
    1         apple   milk    tomato
    2         water   orange  potato
    3         juice   mango   chips
I want to translate it into a list of dictionaries, one per row:
    rows = [{'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'}, {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'}, {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

165 python list dictionary pandas dataframe
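
A minimal sketch using `DataFrame.to_dict("records")`, which returns one dictionary per row. The frame is rebuilt from the rows listed in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# "records" orientation: a list with one {column: value} dict per row.
rows = df.to_dict("records")
print(rows)
```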

7

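Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

I have a large spreadsheet file (.xlsx) that I am processing with Python pandas. It happens that I need data from two tabs in that large file. One of the tabs has a ton of data, and the other is just a few square cells. When I use pd.read_excel() on any worksheet, it looks to me as though the whole file is loaded (not just the worksheet I am interested in). So when I use the method twice (once for each sheet), I effectively have to suffer the whole workbook being read in twice (even though we are only using the specified sheet). Am I using it wrong, or is it just limited in this way? Thanks!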

165 python excel pandas dataframe
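
A sketch of one way to avoid opening the workbook twice: open it once with `pd.ExcelFile` and parse each sheet from that object. The helper name `load_sheets`, the path, and the sheet names are hypothetical, and how much of the file is actually read per sheet depends on the Excel engine in use.

```python
import pandas as pd

def load_sheets(path, sheet_names):
    """Open the workbook once and parse only the requested sheets.

    Returns a dict mapping each sheet name to its DataFrame.
    """
    with pd.ExcelFile(path) as workbook:
        return {name: workbook.parse(name) for name in sheet_names}

# Hypothetical usage (path and sheet names are placeholders):
# frames = load_sheets("path/to/workbook.xlsx", ["big_tab", "small_tab"])
```

Alternatively, `pd.read_excel(path, sheet_name=[...])` accepts a list of sheet names and returns a dictionary of DataFrames from a single call.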

Questions tagged «pandas»