Python: get a frequency count of row occurrences based on two columns (variables) in a pandas DataFrame


Answers:


144

You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1

7
Thanks. A minor addition: to select the top k (= 20) values by frequency ("Time"), use df.groupby(["Group", "Size"]).size().reset_index(name="Time").sort_values(by="Time", ascending=False).head(20)
Dileep Kumar Patchigolla

1
Just note that .size() returns a Series, while .size().reset_index(name="Time") is a DataFrame. Thanks, Andy.
alemol

Or you can simply do df.groupby(by=["Group", "Size"], as_index=False).size()
Naveen Kumar
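
As a rough, self-contained sketch of the points raised in these comments, the snippet below rebuilds a small frame (the values are assumed from the sample data shown elsewhere in this thread) and shows the Series versus DataFrame difference along with the top-k sort. The names `sizes`, `counts`, and `top_k` are illustrative only, and k is set to 2 instead of 20 because the sample is tiny.

```python
import pandas as pd

# Sample data assumed from the thread's example table.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# .size() yields a Series indexed by a (Group, Size) MultiIndex.
sizes = df.groupby(["Group", "Size"]).size()

# reset_index(name=...) turns that Series into a flat DataFrame.
counts = sizes.reset_index(name="Time")

# Keep the k most frequent combinations (k = 2 here for illustration).
top_k = counts.sort_values(by="Time", ascending=False).head(2)
print(top_k)
```

The Series form is convenient for lookups such as `sizes.loc[("Short", "Small")]`, while the DataFrame form is usually easier to sort, merge, or export.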

51

Update: since pandas 1.1, value_counts accepts multiple columns:

df.value_counts(["Group", "Size"])
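
A minimal sketch of how this might look end to end, assuming pandas 1.1 or later and the sample data from this thread. By default the result is a Series sorted by count in descending order; the variable names `vc` and `counts` are illustrative.

```python
import pandas as pd

# Sample data assumed from the thread's example table.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Count each (Group, Size) combination; most frequent comes first.
vc = df.value_counts(["Group", "Size"])

# Flatten to a DataFrame with the count column named "Time".
counts = vc.reset_index(name="Time")
print(counts)
```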

You can also try pd.crosstab():

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group,df.Size)


Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

Edit: to get your output:

pd.crosstab(df.Group,df.Size).replace(0,np.nan).\
     stack().reset_index().rename(columns={0:'Time'})
Out[591]: 
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0
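
Note that the replace(0, np.nan) step above turns the counts into floats. One possible alternative, sketched here on the assumption that dropping zero-count pairs is the only goal, is to stack the table and filter out the zeros so that "Time" stays an integer. The names `table` and `long_counts` are illustrative.

```python
import pandas as pd

# Sample data assumed from the thread's example table.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Wide table of counts: one row per Group, one column per Size.
table = pd.crosstab(df["Group"], df["Size"])

# Reshape to long form, then drop combinations that never occur.
long_counts = table.stack().reset_index(name="Time")
long_counts = long_counts[long_counts["Time"] > 0].reset_index(drop=True)
print(long_counts)
```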

7
Nice! You can even add margins=True to get the marginal counts!
Matt Hancock
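
To illustrate the margins=True suggestion, here is a small sketch on the thread's sample data (values assumed from the table shown in this answer). With margins enabled, crosstab appends a row and a column labelled "All" that hold the totals.

```python
import pandas as pd

# Sample data assumed from the thread's example table.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# margins=True adds an "All" row and column with marginal totals.
table = pd.crosstab(df["Group"], df["Size"], margins=True)
print(table)
```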

0

Another possibility is using .pivot_table() with aggfunc='size':

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
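
A hedged sketch of how the result could be brought into the same Group / Size / Time layout as the other answers. It assumes that, with no value columns left over, pivot_table returns a Series here (as it does in recent pandas versions); the sample data and the name `counts` are illustrative.

```python
import pandas as pd

# Sample data assumed from the thread's example table.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# aggfunc="size" counts the rows in each (Group, Size) combination.
df_solution = df.pivot_table(index=["Group", "Size"], aggfunc="size")

# Assumption: df_solution is a Series, so it can be named and flattened.
counts = df_solution.rename("Time").reset_index()
print(counts)
```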
Licensed under cc by-sa 3.0 with attribution required.