Standardize the samples (compute the z-score)


14

Given a list of floating-point numbers, standardize it.

Details

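  • A list $x_1, x_2, \ldots, x_n$ is standardized if the mean of all values is 0 and the standard deviation is 1. One way to compute this is to first compute the mean $\mu$ and the standard deviation $\sigma$ as
    $$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2},$$
    and then compute the standardization by replacing every $x_i$ with $\frac{x_i - \mu}{\sigma}$.
  • You can assume that the input contains at least two distinct entries (which implies $\sigma \neq 0$).
  • Note that some implementations use the sample standard deviation, which is not equal to the population standard deviation $\sigma$ that we use here.
  • There is a CW answer for all trivial solutions.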
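
As an ungolfed reference, the two steps described above can be sketched in Python. This is only an illustrative sketch and not one of the submitted answers; the function name `standardize` is made up for the example.

```python
import math

def standardize(xs):
    """Return the z-scores of xs, using the population standard deviation."""
    n = len(xs)
    mu = sum(xs) / n                                        # mean
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)   # population std (divide by n)
    return [(x - mu) / sigma for x in xs]

print(standardize([1, 2, 3]))
print(standardize([1, 2]))
```

The sketch relies on the guarantee that the input has at least two distinct entries, so `sigma` is never zero.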

Examples

[1,2,3] -> [-1.224744871391589,0.0,1.224744871391589]
[1,2] -> [-1,1]
[-3,1,4,1,5] -> [-1.6428571428571428,-0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]

(These examples were generated with this script.)
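
One way to sanity-check an answer is to test the defining property directly: the output should have mean 0 and population standard deviation 1, up to floating-point error. A minimal sketch, where the helper name `is_standardized` and the tolerance `1e-9` are assumptions chosen for illustration:

```python
import math

def is_standardized(zs, tol=1e-9):
    """Check that zs has mean 0 and population standard deviation 1 (within tol)."""
    n = len(zs)
    mean = sum(zs) / n
    var = sum((z - mean) ** 2 for z in zs) / n
    return abs(mean) <= tol and abs(math.sqrt(var) - 1.0) <= tol

# The expected outputs listed in the examples above.
examples = [
    [-1.224744871391589, 0.0, 1.224744871391589],
    [-1, 1],
    [-1.6428571428571428, -0.21428571428571433, 0.8571428571428572,
     -0.21428571428571433, 1.2142857142857144],
]
for zs in examples:
    print(is_standardized(zs))
```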

Answers:





4

MATL, 10 bytes

tYm-t&1Zs/

Try it online!

Explanation

t       % Implicit input
        % Duplicate
Ym      % Mean
-       % Subtract, element-wise
t       % Duplicate
&1Zs    % Standard deviation using normalization by n
/       % Divide, element-wise
        % Implicit display

4

APL+WIN, 41 32 30 bytes

9 bytes saved thanks to Erik, and 2 more thanks to ngn

x←v-(+/v)÷⍴v←⎕⋄x÷(+/x×x÷⍴v)*.5

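Prompts for a vector of numbers and calculates the mean, the standard deviation, and the standardized elements of the input vector.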


Can't you assign x←v-(+/v)÷⍴v←⎕ and then do x÷((+/x*2)÷⍴v)*.5?
Erik the Outgolfer

I can indeed. Thanks.
Graham

Does APL+WIN do singleton extension (1 2 3+,4 ←→ 1 2 3+4)? If so, you could rewrite (+/x*2)÷⍴v as +/x×x÷⍴v
ngn

@ngn That works for another 2 bytes. Thanks.
Graham

3

R + pryr, 53 52 bytes

-1 byte by using sum(x|1) instead of length(x), as seen in @Robert S.'s solution

pryr::f((x-(y<-mean(x)))/(sum((x-y)^2)/sum(x|1))^.5)

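For a language built for statisticians, I'm amazed that this doesn't have a built-in function. At least I couldn't find one. Even the function mosaic::zscore doesn't yield the expected results. This is likely because the challenge uses the population standard deviation rather than the sample standard deviation.

Try it online!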


2
You can change the <- to a = to save 1 byte.
Robert S. '18

@J.Doe No, I used the method I commented on Robert S.'s solution. scale is neat!
Giuseppe

2
@J.Doe Since you only use n once, you can use it directly for 38 bytes
Giuseppe

2
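@RobertS. Here on PPCG we tend to encourage flexible input and output, including outputting more than is required, except in challenges where the precise layout of the output is the whole point.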
ngm

6
Of course R built-ins wouldn't use "population variance". Only confused engineers would use such a thing (hence the Python and Matlab answers ;))
ngm


2

Jelly, 10 bytes

_ÆmµL½÷ÆḊ×

Try it online!

It's not any shorter, but Jelly's determinant function ÆḊ also calculates the vector norm.

_Æm             x - mean(x)
   µ            then:
    L½          Square root of the Length
      ÷ÆḊ       divided by the norm
         ×      Multiply by that value
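
The norm trick works because $\sigma = \lVert x - \mu \rVert / \sqrt{n}$, so dividing by $\sigma$ is the same as multiplying by $\sqrt{n} / \lVert x - \mu \rVert$. A rough Python sketch of that idea (the function name `standardize_via_norm` is hypothetical, and this is not a translation of the Jelly code byte for byte):

```python
import math

def standardize_via_norm(xs):
    """Standardize xs by scaling the centered vector by sqrt(n) / ||x - mean||."""
    n = len(xs)
    mu = sum(xs) / n
    centered = [x - mu for x in xs]                   # x - mean(x)
    norm = math.sqrt(sum(c * c for c in centered))    # Euclidean norm of the centered vector
    scale = math.sqrt(n) / norm                       # equals 1 / sigma
    return [c * scale for c in centered]

print(standardize_via_norm([-3, 1, 4, 1, 5]))
```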

Hey, nice alternative! Unfortunately, I can't see a way of shortening it.
Erik the Outgolfer

2

Mathematica, 25 bytes

Mean[(a=#-Mean@#)a]^-.5a&

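Pure function. Takes a list of numbers as input and returns a list of machine-precision numbers as output. Note that the built-in Standardize function uses the sample variance by default.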
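
The population/sample distinction is easy to see with Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1. A small sketch for comparison only (the variable names are made up here):

```python
import statistics

xs = [1, 2, 3]
mu = statistics.mean(xs)

pop_sd = statistics.pstdev(xs)    # population standard deviation (divide by n)
sample_sd = statistics.stdev(xs)  # sample standard deviation (divide by n - 1)

z_population = [(x - mu) / pop_sd for x in xs]     # what this challenge asks for
z_sample = [(x - mu) / sample_sd for x in xs]      # what sample-based built-ins return

print(z_population)
print(z_sample)
```

For this input the sample-based result is [-1, 0, 1] rather than the ±1.2247… the challenge expects, which is why the sample-based built-ins mentioned in the answers give different results.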


2

J, 22 bytes

-1 byte thanks to Cows quack!

(-%[:%:1#.-*-%#@[)+/%#

Try it online!

J, 31 23 bytes

(-%[:%:#@[%~1#.-*-)+/%#

Try it online!

                   +/%# - mean (sum (+/) divided (%) by the number of samples (#)) 
(                 )     - the list is a left argument here (we have a hook)
                 -      - the difference between each sample and the mean
                *       - multiplied by 
               -        - the difference between each sample and the mean
            1#.         - sum by base-1 conversion
          %~            - divided by
       #@[              - the length of the samples list
     %:                 - square root
   [:                   - convert to a fork (function composition) 
 -                      - subtract the mean from each sample
  %                     - and divide it by sigma

1
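After a rearrangement I get [:(%[:%:1#.*:%#)]-+/%# for 22 tio.run/##y/qfVmyrp2CgYKVg8D/…, I think one of those caps could be removed, but I haven't had any luck so far. EDIT: a more direct byte save is (-%[:%:1#.-*-%#@[)+/%#, also at 22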
Kritixi Lithos

@Cows quack Thanks!
Galen Ivanov '18

2

APL (Dyalog Unicode), 33 29 bytes

{d÷.5*⍨l÷⍨+/×⍨d←⍵-(+/⍵)÷l←≢⍵}

-4 bytes thanks to @ngn

Try it online!


You could assign ⍵-m to a variable and remove m← like this: {d÷.5*⍨l÷⍨+/×⍨d←⍵-(+/⍵)÷l←≢⍵}
ngn

@ngn Ah, nice, thanks. I somehow didn't see that duplication
Quintec

2

Haskell, 80 75 68 bytes

t x=k(/sqrt(f$sum$k(^2)))where k g=g.(-f(sum x)+)<$>x;f=(/sum(1<$x))

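Thanks to @flawr for the suggestions to use sum(1<$x) instead of sum[1|_<-x] and to inline the mean, and to @xnor for inlining the standard deviation and other reductions.

Expanded: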

-- Standardize a list of values of any floating-point type.
standardize :: Floating a => [a] -> [a]
standardize input = eachLessMean (/ sqrt (overLength (sum (eachLessMean (^2)))))
  where

    -- Map a function over each element of the input, less the mean.
    eachLessMean f = map (f . subtract (overLength (sum input))) input

    -- Divide a value by the length of the input.
    overLength n = n / sum (map (const 1) input)
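
For readers less used to Haskell, the same structure (one helper that maps a function over the mean-centered values, one that divides by the length) can be sketched in Python. The names below mirror the expanded Haskell version but are otherwise just illustrative.

```python
def standardize_hof(values):
    """Standardize values using two small helpers, mirroring the expanded Haskell."""

    def over_length(total):
        # Divide a value by the length of the input.
        return total / len(values)

    mean = over_length(sum(values))

    def each_less_mean(f):
        # Apply f to each element of the input, less the mean.
        return [f(x - mean) for x in values]

    sigma = over_length(sum(each_less_mean(lambda d: d ** 2))) ** 0.5
    return each_less_mean(lambda d: d / sigma)

print(standardize_hof([1, 2]))
```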

1
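You could replace [1|_<-x] with (1<$x) to save a few bytes. That is a great trick for avoiding fromIntegral, which I hadn't seen so far!
flawr

By the way: I like to use tryitonline. You can run your code there and then copy the preformatted answer for posting here!
flawr

And you do not have to define m.
flawr

You can write (-x+) for (+(-x)) to avoid parens. It also looks like f can be point-free: f=(/sum(1<$x)), and s can be replaced with its definition.
xnor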
xnor

@xnor Ooh, (-x+) is handy, I'm sure I'll be using that in the future
Jon Purdy

2

MathGolf, 7 bytes

▓-_²▓√/

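Try it online!

Explanation

This is literally a byte-for-byte recreation of Kevin Cruijssen's 05AB1E answer, but I save some bytes because MathGolf has 1-byte built-ins for everything this challenge needs. The answer also looks quite good, in my opinion!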

▓         get average of list
 -        pop a, b : push(a-b)
  _       duplicate TOS
   ²      pop a : push(a*a)
    ▓     get average of list
     √    pop a : push(sqrt(a)), split string to list
      /   pop a, b : push(a/b), split strings

1

JavaScript (ES7), 80 79 bytes

a=>a.map(x=>(x-g(a))/g(a.map(x=>(x-m)**2))**.5,g=a=>m=eval(a.join`+`)/a.length)

Try it online!

Commented

a =>                      // given the input array a[]
  a.map(x =>              // for each value x in a[]:
    (x - g(a)) /          //   compute (x - mean(a)) divided by
    g(                    //   the standard deviation:
      a.map(x =>          //     for each value x in a[]:
        (x - m) ** 2      //       compute (x - mean(a))²
      )                   //     compute the mean of this array
    ) ** .5,              //   and take the square root
    g = a =>              //   g = helper function taking an array a[],
      m = eval(a.join`+`) //     computing the mean
          / a.length      //     and storing the result in m
  )                       // end of outer map()


1

Haskell, 59 bytes

(%)i=sum.map(^i)
f l=[(0%l*y-1%l)/sqrt(2%l*0%l-1%l^2)|y<-l]

Try it online!

Doesn't use libraries.

The helper function % computes the sum of the i-th powers of a list, which lets us get three useful values.

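  • 0%l is the length of l (call this n)
  • 1%l is the sum of l (call this s)
  • 2%l is the sum of squares of l (call this v)

We can express the z-score of an element y as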

(n*y-s)/sqrt(n*v-s^2)

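(This is the expression (y-s/n)/sqrt(v/n-(s/n)^2) simplified a bit by multiplying the top and bottom by n.)

We can insert the expressions 0%l, 1%l, 2%l without parentheses because the % we define has higher precedence than the arithmetic operators.

(%)i=sum.map(^i) is the same length as i%l=sum.map(^i)l. Making it more point-free doesn't help. Defining it like g i=... loses bytes when we call it. Although % works for any list and we only ever call it with the problem's input list, there is no byte loss in passing the argument l every time, because a two-argument call i%l is no longer than a one-argument call g i.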
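
The power-sum formulation carries over directly to other languages. A minimal Python sketch, where `power_sum` plays the role of the % helper and n, s, v are the length, the sum, and the sum of squares (the function names are invented for this illustration):

```python
import math

def power_sum(i, xs):
    """Sum of the i-th powers of xs; i = 0 gives the length, i = 1 the sum."""
    return sum(x ** i for x in xs)

def standardize_power_sums(xs):
    n = power_sum(0, xs)   # length
    s = power_sum(1, xs)   # sum
    v = power_sum(2, xs)   # sum of squares
    denom = math.sqrt(n * v - s ** 2)
    # (n*y - s) / sqrt(n*v - s^2) equals (y - s/n) / sqrt(v/n - (s/n)^2)
    return [(n * y - s) / denom for y in xs]

print(standardize_power_sums([1, 2]))
```

Note that this single-pass form can lose precision when the values are large relative to their spread, since n*v and s^2 are then nearly equal; that does not matter for the challenge's small examples.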


We do have LaTeX here :)
flawr

I really like the % idea! It looks just like the discrete version of the statistical moments.
flawr

1

K (oK), 33 23 bytes

-10 bytes thanks to ngn!

{t%%(+/t*t:x-/x%#x)%#x}

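Try it online!

First attempt at coding (I don't dare call it "golfing") in K. I'm pretty sure it can be done much better (too many variable names here...)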


1
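Nice! You can replace the initial (x-m) with t (tio)
ngn

1
The inner { } is unnecessary: its implicit parameter name is x, and it is passed x as its argument (tio)
ngn

1
Another -1 byte by replacing x-+/x with x-/x. The left argument to -/ serves as the initial value for the reduction (tio)
ngn

@ngn Thanks! Now I see that the first two golfs are obvious. The last one is beyond my current level :)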
Galen Ivanov '18
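
The seeded-reduction trick from the last golf has a rough analogue in Python's `functools.reduce`, which also accepts an initial value. The sketch below assumes the K expression `x-/x%#x` is read as "start from the whole list x and subtract each x_i / n in turn"; the function name `center_by_fold` is hypothetical.

```python
from functools import reduce

def center_by_fold(xs):
    """Subtract the mean from every element via a fold seeded with the list itself."""
    n = len(xs)
    terms = [x / n for x in xs]   # the x_i / n pieces; they add up to the mean
    # Each step subtracts one term from every element of the accumulator.
    return reduce(lambda acc, term: [a - term for a in acc], terms, list(xs))

print(center_by_fold([1, 2, 3]))
```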


1

TI-Basic (83 series), 14 11 bytes

Ans-mean(Ans
Ans/√(mean(Ans²

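Takes input in Ans. For example, if you type the above into prgmSTANDARD, then {1,2,3}:prgmSTANDARD will return {-1.224744871,0.0,1.224744871}.

Previously, I tried using the 1-Var Stats command, which stores the population standard deviation in σx, but it's less trouble to calculate it manually.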


1

05AB1E, 9 bytes

ÅA-DnÅAt/

Port of @Arnauld's JavaScript answer, so make sure to upvote him!

Try it online or verify all test cases.

Explanation:

ÅA          # Calculate the mean of the (implicit) input
            #  i.e. [-3,1,4,1,5] → 1.6
  -         # Subtract it from each value in the (implicit) input
            #  i.e. [-3,1,4,1,5] and 1.6 → [-4.6,-0.6,2.4,-0.6,3.4]
   D        # Duplicate that list
    n       # Take the square of each
            #  i.e. [-4.6,-0.6,2.4,-0.6,3.4] → [21.16,0.36,5.76,0.36,11.56]
     ÅA     # Pop and calculate the mean of that list
            #  i.e. [21.16,0.36,5.76,0.36,11.56] → 7.84
       t    # Take the square-root of that
            #  i.e. 7.84 → 2.8
        /   # And divide each value in the duplicated list with it (and output implicitly)
            #  i.e. [-4.6,-0.6,2.4,-0.6,3.4] and 2.8 → [-1.6428571428571428,
            #   -0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]


0

Pyth, 21 19 bytes

mc-dJ.OQ@.Om^-Jk2Q2

Try it online here.

mc-dJ.OQ@.Om^-Jk2Q2Q   Implicit: Q=eval(input())
                       Trailing Q inferred
    J.OQ               Take the average of Q, store the result in J
           m     Q     Map the elements of Q, as k, using:
             -Jk         Difference between J and k
            ^   2        Square it
         .O            Find the average of the result of the map
        @         2    Square root it
                       - this is the standard deviation of Q
m                  Q   Map elements of Q, as d, using:
  -dJ                    d - J
 c                       Float division by the standard deviation
                       Implicit print result of map

Edit: after seeing Kevin's answer, changed to use the average builtin for the inner results. Previous answer: mc-dJ.OQ@csm^-Jk2QlQ2


0

SNOBOL4 (CSNOBOL4), 229 bytes

	DEFINE('Z(A)')
Z	X =X + 1
	M =M + A<X>	:S(Z)
	N =X - 1.
	M =M / N
D	X =GT(X) X - 1	:F(S)
	A<X> =A<X> - M	:(D)
S	X =LT(X,N) X + 1	:F(Y)
	S =S + A<X> ^ 2 / N	:(S)
Y	S =S ^ 0.5
N	A<X> =A<X> / S
	X =GT(X) X - 1	:S(N)
	Z =A	:(RETURN)

Try it online!

Link is to a functional version of the code which constructs an array from STDIN given its length and then its elements, then runs the function Z on that, and finally prints out the values.

Defines a function Z which returns an array.

The 1. on line 4 is necessary to do the floating point arithmetic properly.
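
A rough Python analogue of why the float literal matters, assuming the underlying issue is that dividing two integers would otherwise truncate. Python's `//` operator stands in for integer division here; the variable names are made up for the example.

```python
values = [1, 2]
total = sum(values)    # 3, an integer
count = len(values)    # 2, an integer

truncated_mean = total // count       # integer division drops the fraction
float_mean = total / float(count)     # forcing a float operand keeps it

print(truncated_mean, float_mean)
```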



0

Charcoal, 25 19 bytes

≧⁻∕ΣθLθθI∕θ₂∕ΣXθ²Lθ

Try it online! Link is to verbose version of code. Explanation:

       θ    Input array
≧           Update each element
 ⁻          Subtract
   Σ        Sum of
    θ       Input array
  ∕         Divided by
     L      Length of
      θ     Input array

Calculate μ and vectorised subtract it from each xi.

  θ         Updated array
 ∕          Vectorised divided by
   ₂        Square root of
     Σ      Sum of
       θ    Updated array
      X     Vectorised to power
        ²   Literal 2
    ∕       Divided by
         L  Length of
          θ Array
I           Cast to string
            Implicitly print each element on its own line.

Calculate σ, vectorised divide each xi by it, and output the result.

Edit: Saved 6 bytes thanks to @ASCII-only for a) using SquareRoot() instead of Power(0.5) b) fixing vectorised Divide() (it was doing IntDivide() instead) c) making Power() vectorise.


crossed out 25 = no bytes? :P (Also, you haven't updated the TIO link yet)
ASCII-only

@ASCII-only Oops, thanks!
Neil
Licensed under cc by-sa 3.0 with attribution required.