pandas: get the average/mean of a column


155

I can't get the average or mean of a column in pandas. I have a DataFrame, and nothing I have tried below gives me the average of the weight column.

>>> allDF 
         ID           birthyear  weight
0        619040       1962       0.1231231
1        600161       1963       0.981742
2      25602033       1963       1.3123124     
3        624870       1987       0.94212

The following returns several values, not one:

allDF[['weight']].mean(axis=1)

So does this:

allDF.groupby('weight').mean()


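df.groupby('weight') is not what you want: it splits the DataFrame into separate groups, one for each distinct weight value. Use df['weight'].mean() instead, or equivalently allDF.weight.mean().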
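To make the difference concrete, here is a minimal sketch that rebuilds the DataFrame from the question (the variable names `per_row`, `per_group`, and `col_mean` are only illustrative) and compares the two attempts with a plain column mean:

```python
import pandas as pd

# Rebuild the DataFrame shown in the question.
allDF = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.1231231, 0.981742, 1.3123124, 0.94212],
})

# Double brackets select a one-column DataFrame; axis=1 then averages
# across each row, so the result has one value per row.
per_row = allDF[["weight"]].mean(axis=1)
print(len(per_row))

# groupby("weight") creates one group per distinct weight value,
# so the result again has one row per distinct weight.
per_group = allDF.groupby("weight").mean()
print(len(per_group))

# Single brackets select a Series; .mean() reduces it to one scalar.
col_mean = allDF["weight"].mean()
print(col_mean)
```

Both attempts produce four results because each one ends up with one entry per row (or per distinct weight), whereas reducing the Series gives the single number the question asks for.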

Answers:


265

If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]: 
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007

1
What if I want the mean of every column?
Chris

3
@Chris df.describe()
Abhishek Poojary '18

2
@Chris df.mean() gives you the mean of every column and returns the result as a Series.
emschorsch '19
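As a small sketch of the per-column case, assuming a DataFrame that also contains a text column (the `name` column and its values are made up for illustration): passing `numeric_only=True` restricts the mean to numeric columns.

```python
import pandas as pd

# Hypothetical frame with one non-numeric column.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "birthyear": [1962, 1963, 1987],
    "weight": [0.5, 1.0, 1.5],
})

# One mean per numeric column, returned as a Series indexed by column name.
means = df.mean(numeric_only=True)
print(means)
```

The result is indexed by column name, so `means["weight"]` picks out a single column's mean.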

24

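Try df.mean(axis=0). The axis=0 argument computes the mean of each column of the DataFrame, so the result has one value per column. axis=1 computes the mean of each row, which is why you were getting multiple values.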
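A short sketch of the two axes on a small made-up frame (the values are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 5, 8], "B": [3, 3, 4], "C": [8, 9, 9]})

# axis=0: collapse the rows, leaving one mean per column.
col_means = df.mean(axis=0)
print(col_means)

# axis=1: collapse the columns, leaving one mean per row.
row_means = df.mean(axis=1)
print(row_means)
```

`col_means` is indexed by column name and `row_means` by row label, which is a quick way to check which axis was reduced.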


13

Try print(df.describe()). Hopefully an overall description of your DataFrame will be helpful.


1
display(df.describe()) is better (in Jupyter notebooks), because IPython's display renders formatted HTML instead of ASCII, which is more visually useful/pleasing.
陈占文

6

You can use

df.describe() 

to get basic statistics for the DataFrame. To get the mean of a specific column, you can use

df["columnname"].mean()
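For illustration, a sketch showing that the "mean" row of `describe()` holds the same number as calling `.mean()` on the column directly. The frame reuses two columns from the question; `summary`, `mean_from_summary`, and `direct` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.1231231, 0.981742, 1.3123124, 0.94212],
})

# describe() returns a DataFrame whose index holds the statistic names
# (count, mean, std, min, quartiles, max).
summary = df.describe()

# Pull a single statistic for a single column out of the summary.
mean_from_summary = summary.loc["mean", "weight"]

# The direct route, without computing the other statistics.
direct = df["weight"].mean()
print(mean_from_summary, direct)
```

If only the mean is needed, the direct call avoids computing the rest of the summary.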

1
This is a duplicate of the answers above.
Mehdi Boukhechba '18


4

Mean of each column in df:

    A   B   C
0   5   3   8
1   5   3   9
2   8   4   9

df.mean()

A    6.000000
B    3.333333
C    8.666667
dtype: float64

And if you want the average over all columns together:

df.stack().mean()
6.0
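One caveat, sketched on a made-up frame with missing values: the mean over all cells and the mean of the column means agree only when every column has the same number of non-missing values. The names `overall`, `mean_of_means`, and `overall_np` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where column B is mostly missing.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, np.nan, np.nan]})

# Mean over every non-missing cell: (1 + 2 + 3 + 10) / 4.
overall = df.stack().mean()

# Mean of the per-column means: (2 + 10) / 2.
mean_of_means = df.mean().mean()

# Same "all cells" result via NumPy, ignoring NaN explicitly.
overall_np = np.nanmean(df.to_numpy())

print(overall, mean_of_means, overall_np)
```

Which of the two is appropriate depends on whether each cell or each column should carry equal weight.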

1

Additionally, if you want to round the value after computing the mean:

import pandas as pd

# Create a DataFrame
data = {
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4', 'semester1',
                'semester2', 'semester3'],
    'Score': [62.73, 47.76, 55.61, 74.67, 31.55, 77.31, 85.47]}
df1 = pd.DataFrame(data, columns=['Subject', 'Score'])

rounded_mean = round(df1['Score'].mean()) # specified nothing as decimal place
print(rounded_mean) # 62

rounded_mean_decimal_0 = round(df1['Score'].mean(), 0) # specified decimal place as 0
print(rounded_mean_decimal_0) # 62.0

rounded_mean_decimal_1 = round(df1['Score'].mean(), 1) # specified decimal place as 1
print(rounded_mean_decimal_1) # 62.2

1

You can use either of the following two statements:

numpy.mean(df['col_name'])
# or
df['col_name'].mean()
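The two can behave differently once the data is converted to a plain NumPy array and contains missing values. A minimal sketch, assuming a made-up Series with one NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], name="col_name")

# Series.mean() skips NaN by default (skipna=True).
pandas_mean = s.mean()

# numpy.mean on a plain ndarray propagates NaN.
raw_numpy_mean = np.mean(s.to_numpy())

# numpy.nanmean ignores NaN, matching the pandas default.
nan_aware = np.nanmean(s.to_numpy())

print(pandas_mean, raw_numpy_mean, nan_aware)
```

When the column may contain missing values, sticking with the pandas method or with `numpy.nanmean` keeps the result from turning into NaN.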

Please enrich your answer with appropriate comments. Otherwise, it is likely to be flagged for deletion.

0
You can easily follow the code below:

    import pandas as pd

    classxii = {'Name': ['Karan', 'Ishan', 'Aditya', 'Anant', 'Ronit'],
                'Subject': ['Accounts', 'Economics', 'Accounts', 'Economics', 'Accounts'],
                'Score': [87, 64, 58, 74, 87],
                'Grade': ['A1', 'B2', 'C1', 'B1', 'A2']}
    df = pd.DataFrame(classxii, index=['a', 'b', 'c', 'd', 'e'],
                      columns=['Name', 'Subject', 'Score', 'Grade'])
    print(df)
    # Use the lines below for the mean if you already have a DataFrame.
    # Double brackets return a one-entry Series; df['Score'].mean() returns a scalar.
    print('mean of score is:')
    print(df[['Score']].mean())

0

You can simply call df.describe(), which gives you all the relevant details you need. To find the minimum, maximum, or mean of a specific column (in your case 'weight'), use:

    df['weight'].mean(): For the average value
    df['weight'].max(): For the maximum value
    df['weight'].min(): For the minimum value
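If all three statistics are wanted at once, `Series.agg` accepts a list of function names. A sketch using the weight values from the question (`stats` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({"weight": [0.1231231, 0.981742, 1.3123124, 0.94212]})

# agg with a list of names returns a Series indexed by those names.
stats = df["weight"].agg(["mean", "max", "min"])
print(stats)
```

Individual values can then be read back with, for example, `stats["max"]`.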
Licensed under cc by-sa 3.0 with attribution required.