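The statistical and mathematical concepts are exactly the same, with the understanding that "family" is a generic mathematical term whose technical variations are adapted to different circumstances:
A parametric family is a curve (or surface, or other finite-dimensional generalization thereof) in the space of all distributions.
The rest of this post explains what that means. As an aside, I don't think any of this is controversial, either mathematically or statistically (apart from one minor issue noted below). In support of this opinion I have supplied many references (mostly to Wikipedia articles).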
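The terminology of "families" tends to be used when studying classes $\mathcal{C}_Y$ of functions into a set $Y$, or "maps." Given a domain $X$, a family $\mathcal{F}$ of maps on $X$ parameterized by some set $\Theta$ (the "parameters") is a function
$$\mathcal{F} : X \times \Theta \to Y$$
for which (1) for each $\theta\in\Theta$, the function $\mathcal{F}_\theta : X \to Y$ given by $\mathcal{F}_\theta(x) = \mathcal{F}(x,\theta)$ is in $\mathcal{C}_Y$ and (2) $\mathcal{F}$ itself has certain "nice" properties.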
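The idea is that we want to vary functions from $X$ to $Y$ in a "smooth" or controlled manner. Property (1) means that each $\theta$ designates such a function, while the details of property (2) capture the sense in which a "small" change in $\theta$ induces a sufficiently "small" change in $\mathcal{F}_\theta$.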
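As a minimal sketch of this definition, the Python below assumes a hypothetical family of logistic curves with $X=\mathbb{R}$, $\Theta=\mathbb{R}$, and $Y=[0,1]$. The names `family`, `member`, and `sup_gap` are illustrative only. It shows property (1) by fixing $\theta$ to obtain a single map, and probes one possible reading of property (2) by checking that a smaller change in $\theta$ produces a smaller change in the map over a grid.

```python
import math

def family(x: float, theta: float) -> float:
    """Hypothetical family F(x, theta): a logistic curve with location theta."""
    return 1.0 / (1.0 + math.exp(-(x - theta)))

def member(theta: float):
    """Property (1): fixing theta yields one map F_theta from X to Y."""
    return lambda x: family(x, theta)

# A finite grid standing in for X = R (an approximation, not the whole line).
grid = [i / 10 for i in range(-100, 101)]

def sup_gap(theta_a: float, theta_b: float) -> float:
    """Largest pointwise difference between two members, over the grid."""
    f, g = member(theta_a), member(theta_b)
    return max(abs(f(x) - g(x)) for x in grid)

# Property (2), informally: smaller changes in theta give smaller changes in F_theta.
gaps = [sup_gap(0.0, h) for h in (1.0, 0.1, 0.01)]
print(gaps)
```

What counts as "nice" in property (2) depends on the setting; closeness in the sup norm over a grid is just one convenient choice for illustration.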
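A standard mathematical example, close to the one mentioned in the question, is a homotopy. In this case $\mathcal{C}_Y$ is the class of continuous maps from a topological space $X$ into a topological space $Y$; $\Theta=[0,1]\subset\mathbb{R}$ is the unit interval with its usual topology; and we require that $\mathcal{F}$ be a continuous map from the topological product $X\times\Theta$ into $Y$. It can be thought of as a "continuous deformation of the map $\mathcal{F}_0$ to $\mathcal{F}_1$." When $X=[0,1]$ is itself an interval, such maps are curves in $Y$ and the homotopy is a continuous deformation from one curve to another.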
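For statistical applications, $\mathcal{C}_Y$ is the set of all distributions on $\mathbb{R}$ (or, in practice, on $\mathbb{R}^n$ for some $n$, but to keep the exposition simple I will focus on $n=1$). We may identify it with the set of all non-decreasing càdlàg functions $\mathbb{R}\to[0,1]$ for which the closure of the range includes both $0$ and $1$: these are the cumulative distribution functions, or simply distribution functions. Thus $X=\mathbb{R}$ and $Y=[0,1]$.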
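To make "non-decreasing càdlàg function" concrete, here is a small sketch using the empirical distribution function of a made-up sample. The helper name `ecdf` and the sample values are assumptions for illustration. The function jumps at each data point, equals its right-hand limit there, and has a strictly smaller left-hand limit.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function F(x) = #{x_i <= x} / n."""
    data = sorted(sample)
    n = len(data)

    def F(x: float) -> float:
        # bisect_right counts the observations that are <= x,
        # which is what makes F right-continuous at each jump.
        return bisect_right(data, x) / n

    return F

F = ecdf([2.0, 0.5, 2.0, 3.5])   # hypothetical sample

print(F(-10.0))        # far to the left of the data
print(F(2.0 - 1e-9))   # just left of the jump at 2.0
print(F(2.0))          # at the jump: equals the value just to the right
print(F(3.5))          # at the largest observation
```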
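A family of distributions is any subset of $\mathcal{C}_Y$. Another name for a family is statistical model. It consists of all the distributions that we suppose might govern our observations, without our otherwise knowing which one is the actual distribution.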
- A family can be empty.
- $\mathcal{C}_Y$ itself is a family.
- A family may consist of a single distribution, or of just a finite number of them.
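These abstract set-theoretic characteristics are of relatively little interest or use. The concept becomes useful only when we consider additional (relevant) mathematical structure on $\mathcal{C}_Y$. But which properties of $\mathcal{C}_Y$ are of statistical interest? Some that show up frequently are: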
- $\mathcal{C}_Y$ is a convex set: given any two distributions $F, G\in\mathcal{C}_Y$, we may form the mixture distribution $(1-t)F+tG\in\mathcal{C}_Y$ for all $t\in[0,1]$. This is a kind of "homotopy" from $F$ to $G$.
- Large parts of $\mathcal{C}_Y$ support various pseudo-metrics, such as the Kullback-Leibler divergence or the closely related Fisher information metric.
- $\mathcal{C}_Y$ has an additive structure: corresponding to any two distributions $F$ and $G$ is their sum, $F\star G$ (the distribution of the sum of independent random variables with distributions $F$ and $G$, that is, their convolution).
- $\mathcal{C}_Y$ supports many useful, natural functions, often termed "properties." These include any fixed quantile (such as the median) as well as the cumulants.
- $\mathcal{C}_Y$ is a subset of a function space. As such, it inherits many useful metrics, such as the sup norm ($L^\infty$ norm) given by $$\|F-G\|_\infty=\sup_{x\in\mathbb{R}}|F(x)-G(x)|.$$
- Natural group actions on $\mathbb{R}$ induce actions on $\mathcal{C}_Y$. The commonest actions are translations $T_\mu:x\to x+\mu$ and scalings $S_\sigma:x\to x\sigma$ for $\sigma\gt 0$. The effect these have on a distribution is to send $F$ to the distribution given by $F_{\mu,\sigma}(x)=F((x-\mu)/\sigma)$. These lead to the concepts of location-scale families and their generalizations. (I don't supply a reference, because extensive Web searches turn up a variety of different definitions: here, at least, may be a tiny bit of controversy.)
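Three of the structures just listed (mixtures, the sup norm, and the location-scale action) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the standard Normal and standard logistic distribution functions are arbitrary choices, and the supremum is approximated by a maximum over a finite grid, which can only underestimate it.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard Normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x: float) -> float:
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def mixture(F, G, t: float):
    """Convexity: (1 - t) F + t G is again a distribution function."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return lambda x: (1.0 - t) * F(x) + t * G(x)

def location_scale(F, mu: float, sigma: float):
    """Group action: send F to F_{mu,sigma}(x) = F((x - mu) / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: F((x - mu) / sigma)

def sup_distance(F, G, grid) -> float:
    """Grid approximation to the sup norm of F - G."""
    return max(abs(F(x) - G(x)) for x in grid)

grid = [i / 100 for i in range(-1000, 1001)]

d_full = sup_distance(normal_cdf, logistic_cdf, grid)
# Moving a quarter of the way along the mixture path moves a quarter of the distance.
d_quarter = sup_distance(normal_cdf, mixture(normal_cdf, logistic_cdf, 0.25), grid)
shifted = location_scale(normal_cdf, mu=1.0, sigma=2.0)

print(d_full, d_quarter, shifted(1.0))
```

The proportionality between `d_quarter` and `d_full` reflects the identity $\|F-((1-t)F+tG)\|_\infty = t\,\|F-G\|_\infty$, which is one sense in which the mixture path behaves like a straight line in $\mathcal{C}_Y$.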
The properties that matter depend on the statistical problem and on how you intend to analyze the data. Addressing all the variations suggested by the preceding characteristics would take too much space for this medium. Let's focus on one common important application.
Take, for instance, Maximum Likelihood. In most applications you will want to be able to use Calculus to obtain an estimate. For this to work, you must be able to "take derivatives" in the family.
(Technical aside: The usual way in which this is accomplished is to select a domain $\Theta\subset\mathbb{R}^d$ for $d\ge 0$ and specify a continuous, locally invertible function $p$ from $\Theta$ into $\mathcal{C}_Y$. (This means that for every $\theta\in\Theta$ there exists a ball $B(\theta,\epsilon)$, with $\epsilon\gt 0$, for which $p\mid_{B(\theta,\epsilon)}:B(\theta,\epsilon)\cap\Theta\to\mathcal{C}_Y$ is one-to-one. In other words, if we alter $\theta$ by a sufficiently small amount we will always get a different distribution.))
Consequently, in most ML applications we require that $p$ be continuous (and hopefully, almost everywhere differentiable) in the $\Theta$ component. (Without continuity, maximizing the likelihood generally becomes an intractable problem.) This leads to the following likelihood-oriented definition of a parametric family:
A parametric family of (univariate) distributions is a locally invertible map
$$\mathcal{F}:\mathbb{R}\times\Theta\to[0,1],$$
with $\Theta\subset\mathbb{R}^n$, for which (a) each $\mathcal{F}_\theta$ is a distribution function and (b) for each $x\in\mathbb{R}$, the function $\mathcal{L}_x:\Theta\to[0,1]$ given by $\mathcal{L}_x(\theta)=\mathcal{F}(x,\theta)$ is continuous and almost everywhere differentiable.
Note that a parametric family $\mathcal{F}$ is more than just the collection of $\mathcal{F}_\theta$: it also includes the specific way in which parameter values $\theta$ correspond to distributions.
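To illustrate why differentiability in $\theta$ matters for Maximum Likelihood, here is a minimal sketch that finds a Poisson estimate by applying Newton's method to the derivative of the log likelihood. Several things are assumptions of the sketch rather than part of the definition above: the data are made up, the likelihood is built from the Poisson probability mass function rather than from the distribution function, and the function names are hypothetical.

```python
import math

counts = [2, 4, 3, 5, 1, 4, 3, 4, 2, 4]   # made-up observations

def log_likelihood(lam: float) -> float:
    """Poisson log likelihood of the sample at rate lam > 0."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)

def score(lam: float) -> float:
    """First derivative of the log likelihood with respect to lam."""
    return sum(counts) / lam - len(counts)

def curvature(lam: float) -> float:
    """Second derivative of the log likelihood with respect to lam."""
    return -sum(counts) / lam ** 2

def newton_mle(start: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve score(lam) = 0 by Newton's method, starting from `start`."""
    lam = start
    for _ in range(max_iter):
        step = score(lam) / curvature(lam)
        lam -= step
        if abs(step) < tol:
            break
    return lam

lam_hat = newton_mle(start=1.0)
print(lam_hat, sum(counts) / len(counts))   # the estimate and the sample mean
```

For this family the root of the score is the sample mean, so the iteration is not needed in practice; it is shown only because the same calculus-based recipe is what breaks down when the family is not differentiable in its parameter.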
Let's end with some illustrative examples.
Let $\mathcal{C}_Y$ be the set of all Normal distributions. As given, this is not a parametric family: it's just a family. To be parametric, we have to choose a parameterization. One way is to choose $\Theta=\{(\mu,\sigma)\in\mathbb{R}^2\mid\sigma\gt 0\}$ and to map $(\mu,\sigma)$ to the Normal distribution with mean $\mu$ and variance $\sigma^2$.
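The distinction between the family and its parameterization can be sketched in code. Below, `normal_cdf` implements the map $(\mu,\sigma)\to$ Normal distribution described above, while `normal_cdf_log_scale` is a hypothetical alternative parameterization by $(\mu,\log\sigma)$ with $\Theta=\mathbb{R}^2$. Both names are illustrative. The two maps reach the same set of distributions but label them differently, so they would count as different parametric families.

```python
import math

def normal_cdf(mu: float, sigma: float):
    """Map (mu, sigma), sigma > 0, to the Normal(mu, sigma^2) distribution function."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_cdf_log_scale(mu: float, log_sigma: float):
    """Hypothetical alternative: parameterize by (mu, log sigma), any real pair."""
    return normal_cdf(mu, math.exp(log_sigma))

F = normal_cdf(1.0, 2.0)
G = normal_cdf_log_scale(1.0, math.log(2.0))

# Same distribution, reached through two different parameter values.
print(F(0.0), G(0.0))
```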
The set of Poisson$(\lambda)$ distributions is a parametric family with $\lambda\in\Theta=(0,\infty)\subset\mathbb{R}^1$.
The set of Uniform$(\theta,\theta+1)$ distributions (which features prominently in many textbook exercises) is a parametric family with $\theta\in\mathbb{R}^1$. In this case, $F_\theta(x)=\max(0,\min(1,x-\theta))$ is differentiable in $\theta$ except for $\theta\in\{x,x-1\}$.
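The two exceptional points can be checked numerically. This sketch, with hypothetical helper names and an arbitrary choice of $x=2$, compares one-sided difference quotients in $\theta$: they agree at an interior value of $\theta$ but disagree at $\theta=x$ and $\theta=x-1$, which is what failure of differentiability looks like.

```python
def uniform_cdf(x: float, theta: float) -> float:
    """Distribution function of Uniform(theta, theta + 1), evaluated at x."""
    return max(0.0, min(1.0, x - theta))

def one_sided_slopes(x: float, theta: float, h: float = 1e-6):
    """Left and right difference quotients of theta -> F_theta(x)."""
    here = uniform_cdf(x, theta)
    left = (here - uniform_cdf(x, theta - h)) / h
    right = (uniform_cdf(x, theta + h) - here) / h
    return left, right

x = 2.0
at_x = one_sided_slopes(x, theta=x)                # kink: slopes near -1 and 0
at_x_minus_1 = one_sided_slopes(x, theta=x - 1.0)  # kink: slopes near 0 and -1
interior = one_sided_slopes(x, theta=1.5)          # smooth: both slopes near -1

print(at_x, at_x_minus_1, interior)
```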
Let $F$ and $G$ be any two distributions. Then $\mathcal{F}(x,\theta)=(1-\theta)F(x)+\theta G(x)$ is a parametric family for $\theta\in[0,1]$. (Proof: the image of $\mathcal{F}$ is a set of distributions and its partial derivative in $\theta$ equals $-F(x)+G(x)$, which is defined everywhere.)
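The proof can be spelled out a little, reading "distribution" as "distribution function" in the sense defined earlier. For fixed $\theta\in[0,1]$, the weights $1-\theta$ and $\theta$ are non-negative and sum to one, so $\mathcal{F}_\theta$ inherits monotonicity, right-continuity, and the limits $0$ and $1$ from $F$ and $G$. One caveat worth noting: the map $\theta\to\mathcal{F}_\theta$ is one-to-one only when $F\ne G$, since $F=G$ sends every $\theta$ to the same distribution.

```latex
\begin{aligned}
x \le y \;&\Longrightarrow\;
  \mathcal{F}_\theta(y) - \mathcal{F}_\theta(x)
  = (1-\theta)\,[F(y)-F(x)] + \theta\,[G(y)-G(x)] \;\ge\; 0, \\
\lim_{x\to-\infty}\mathcal{F}_\theta(x) &= (1-\theta)\cdot 0 + \theta\cdot 0 = 0,
\qquad
\lim_{x\to+\infty}\mathcal{F}_\theta(x) = (1-\theta)\cdot 1 + \theta\cdot 1 = 1, \\
\frac{\partial}{\partial\theta}\,\mathcal{F}(x,\theta) &= -F(x) + G(x).
\end{aligned}
```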
The Pearson family is a four-dimensional family, $\Theta\subset\mathbb{R}^4$, which includes (among others) the Normal distributions, Beta distributions, and Inverse Gamma distributions. This illustrates the fact that any one given distribution may belong to many different distribution families. This is perfectly analogous to observing that any point in a (sufficiently large) space may belong to many paths that intersect there. This, together with the previous construction, shows us that no distribution uniquely determines a family to which it belongs.
The family $\mathcal{C}_Y$ of all finite-variance absolutely continuous distributions is not parametric. The proof requires a deep theorem of topology: if we endow $\mathcal{C}_Y$ with any topology (whether statistically useful or not) and $p:\Theta\to\mathcal{C}_Y$ is continuous and locally has a continuous inverse, then locally $\mathcal{C}_Y$ must have the same dimension as that of $\Theta$. However, in all statistically meaningful topologies, $\mathcal{C}_Y$ is infinite dimensional.