Definition of a family of distributions?


14

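Does a family of distributions have a different definition in statistics than in other disciplines?

In general, a family of curves is a set of curves, each of which is given by a function or parameterization in which one or more of the parameters vary. Such families are used, for example, to characterize electronic components.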

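For statistics, a family according to one source is the result of varying a shape parameter. How, then, are we to understand that the gamma distribution has a shape and a scale parameter, and that only the generalized gamma distribution has, in addition, a location parameter? Does that make the family the result of varying the location parameter? According to @whuber, the meaning of a family is implicit in: "A 'parameterization' of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family."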

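What, in simple language, is a family of statistical distributions?

A question about the relationships among statistical properties of distributions from the same family has already generated considerable controversy in another question, so it seems worthwhile to explore the meaning.

That this is not necessarily a simple question is borne out by the use of the word in the phrase "exponential family", which has nothing to do with a family of curves. It relates instead to changing the form of a distribution's PDF by reparameterization, not only of the parameters but also by substituting functions of independent random variables.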


1
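By the phrasing "family for a distribution", do you mean "a family of distributions"? An exponential family is a family of distributions (with certain properties), and if the pdf of each distribution is interpreted as a curve, it even corresponds to a family of curves, so the last paragraphs seem confused.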
Juho Kokkala

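@JuhoKokkala It seems confusing because the meaning of "family" depends on context. For example, a normal distribution with unknown mean and known variance is in the exponential family. A normal distribution has infinite support, $(-\infty, +\infty)$, and an exponential distribution has semi-infinite support, $[0, +\infty)$, so there is no family of curves for an exponential distribution that covers the range of a normal distribution; they never have the same shape...
Carl

@JuhoKokkala ...and an exponential PDF does not even have a location parameter, whereas a normal distribution cannot do without one. See the link above for the substitutions needed and for the context in which a normal pdf is in the exponential family.
Carl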

1
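stats.stackexchange.com/questions/129990/… may be relevant. As far as I can tell, "a normal distribution with unknown mean and known variance is in the exponential family" is an abuse of terminology (albeit a somewhat common one). To be precise, an exponential family is a family of distributions with certain properties. The family of normal distributions with unknown mean and known variance is an exponential family; the family of exponential distributions is another exponential family; and so on.
Juho Kokkala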

1
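@JuhoKokkala: That "family" is so commonly (mis)used, in this special case, to mean "set of families" may be worth mentioning. (I can't think of other cases: for some reason no one seems prone to speak of "the location-scale family".)
Scortchi - Reinstate Monica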

Answers:


14

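The statistical and mathematical concepts are exactly the same, provided we understand that "family" is a generic mathematical term with technical variations adapted to different circumstances:

A parametric family is a curve (or a surface, or another finite-dimensional generalization thereof) in the space of all distributions.

The rest of this post explains what that means. As an aside, I don't think any of this is controversial, either mathematically or statistically (apart from one minor issue noted below). In support of this opinion I have supplied many references (mostly to Wikipedia articles).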


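This terminology of "families" tends to be used when studying classes $C_Y$ of functions into a set $Y$, or "maps." Given a domain $X$, a family $F$ of maps on $X$ parameterized by some set $\Theta$ (the "parameters") is a function

$$F: X \times \Theta \to Y$$

for which (1) for each $\theta \in \Theta$, the function $F_\theta: X \to Y$ given by $F_\theta(x) = F(x, \theta)$ is in $C_Y$, and (2) $F$ itself has certain "nice" properties.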

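The idea is that we want to vary functions from $X$ to $Y$ in a "smooth" or controlled manner. Property (1) means that each $\theta$ designates such a function, while the details of property (2) capture the sense in which a "small" change in $\theta$ induces a sufficiently "small" change in $F_\theta$.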

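A standard mathematical example, close to the one mentioned in the question, is a homotopy. In this case $C_Y$ is the category of continuous maps from a topological space $X$ into a topological space $Y$; $\Theta = [0,1] \subset \mathbb{R}$ is the unit interval with its usual topology; and we require that $F$ be a continuous map from the topological product $X \times \Theta$ into $Y$. It can be thought of as a "continuous deformation of the map $F_0$ to $F_1$." When $X = [0,1]$ is itself an interval, such maps are curves in $Y$ and the homotopy is a smooth deformation from one curve to another.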
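As a rough illustration of the homotopy idea, here is a minimal Python sketch of a straight-line homotopy between two curves on $X = [0,1]$. It assumes $Y = \mathbb{R}$, so that linear interpolation stays inside $Y$; the curves `f0` and `f1` and the function name `homotopy` are made up for the example.

```python
import math

def f0(x):
    """Starting curve on X = [0, 1] (hypothetical choice)."""
    return math.sin(math.pi * x)

def f1(x):
    """Ending curve on X = [0, 1] (hypothetical choice)."""
    return x * x

def homotopy(x, t):
    """Straight-line homotopy F(x, t) = (1 - t) * f0(x) + t * f1(x), t in [0, 1]."""
    return (1.0 - t) * f0(x) + t * f1(x)

xs = [i / 10 for i in range(11)]
for t in (0.0, 0.5, 1.0):
    # Each fixed t picks out one curve F_t; t = 0 gives f0 and t = 1 gives f1.
    print(t, [round(homotopy(x, t), 3) for x in xs])
```

Each value of `t` designates one curve, and a small change in `t` moves every point of the curve by a small amount, which is the "nice" property (2) in this setting.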

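For statistical applications, $C_Y$ is the set of all distributions on $\mathbb{R}$ (or, in practice, on $\mathbb{R}^n$ for some $n$, but to keep the exposition simple I will focus on $n = 1$). We may identify it with the set of all non-decreasing càdlàg functions $\mathbb{R} \to [0,1]$ for which the closure of the range includes both $0$ and $1$: these are the cumulative distribution functions, or simply distribution functions. Thus, $X = \mathbb{R}$ and $Y = [0,1]$.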

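A family of distributions is any subset of $C_Y$. Another name for a family is a statistical model. It consists of all distributions that we suppose govern our observations, although we do not otherwise know which distribution is the actual one.

  • A family can be empty.
  • $C_Y$ itself is a family.
  • A family may consist of a single distribution or of just a finite number of them.

These abstract set-theoretic characteristics are of relatively little interest or utility. It is only when we consider additional (relevant) mathematical structure on $C_Y$ that this concept becomes useful. But which properties of $C_Y$ are of statistical interest? Some that show up frequently are: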

  1. $C_Y$ is a convex set: given any two distributions $F, G \in C_Y$, we may form the mixture distribution $(1-t)F + tG \in C_Y$ for all $t \in [0,1]$. This is a kind of "homotopy" from $F$ to $G$.

  2. Large parts of $C_Y$ support various pseudo-metrics, such as the Kullback-Leibler divergence or the closely related Fisher information metric.

  3. $C_Y$ has an additive structure: corresponding to any two distributions $F$ and $G$ is their sum, $F \star G$ (their convolution).

  4. $C_Y$ supports many useful, natural functions, often termed "properties." These include any fixed quantile (such as the median) as well as the cumulants.

  5. $C_Y$ is a subset of a function space. As such, it inherits many useful metrics, such as the sup norm ($L^\infty$ norm) given by

    $$\|F - G\|_\infty = \sup_{x \in \mathbb{R}} |F(x) - G(x)|.$$

  6. Natural group actions on $\mathbb{R}$ induce actions on $C_Y$. The commonest actions are translations $T_\mu: x \to x + \mu$ and scalings $S_\sigma: x \to x\sigma$ for $\sigma > 0$. The effect these have on a distribution is to send $F$ to the distribution given by $F_{\mu,\sigma}(x) = F((x - \mu)/\sigma)$. These lead to the concepts of location-scale families and their generalizations. (I don't supply a reference, because extensive Web searches turn up a variety of different definitions: here, at least, may be a tiny bit of controversy.)
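Three of the structures listed above (convexity, the sup norm, and the location-scale action) can be tried out numerically. The following is only a sketch under stated assumptions: distributions are represented as plain Python functions returning CDF values, the sup norm is approximated on a finite grid (so it is a lower bound on the true supremum), and all function names are invented for the illustration.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def mixture(F, G, t):
    """Convexity: (1 - t) F + t G is again a distribution function for t in [0, 1]."""
    return lambda x: (1.0 - t) * F(x) + t * G(x)

def location_scale(F, mu, sigma):
    """Group action: send F to F_{mu,sigma}(x) = F((x - mu) / sigma), with sigma > 0."""
    return lambda x: F((x - mu) / sigma)

def sup_distance(F, G, grid):
    """Grid approximation (a lower bound) to sup_x |F(x) - G(x)|."""
    return max(abs(F(x) - G(x)) for x in grid)

grid = [i / 100.0 for i in range(-1000, 1001)]
H = mixture(norm_cdf, logistic_cdf, 0.25)
shifted = location_scale(norm_cdf, 2.0, 3.0)

print(sup_distance(norm_cdf, logistic_cdf, grid))  # distance between the endpoints
print(sup_distance(norm_cdf, H, grid))             # the mixture sits a quarter of the way along
print(shifted(2.0))                                # the median moves from 0 to mu = 2
```

Because the mixture is linear in $t$, its sup distance from $F$ is $t$ times the distance between $F$ and $G$, which is one concrete sense in which the mixture path behaves like a straight line in $C_Y$.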

The properties that matter depend on the statistical problem and on how you intend to analyze the data. Addressing all the variations suggested by the preceding characteristics would take too much space for this medium. Let's focus on one common important application.

Take, for instance, Maximum Likelihood. In most applications you will want to be able to use Calculus to obtain an estimate. For this to work, you must be able to "take derivatives" in the family.

(Technical aside: The usual way in which this is accomplished is to select a domain $\Theta \subset \mathbb{R}^d$ for $d \ge 0$ and specify a continuous, locally invertible function $p$ from $\Theta$ into $C_Y$. (This means that for every $\theta \in \Theta$ there exists a ball $B(\theta, \epsilon)$, with $\epsilon > 0$, for which $p\mid_{B(\theta,\epsilon)}: B(\theta,\epsilon) \cap \Theta \to C_Y$ is one-to-one. In other words, if we alter $\theta$ by a sufficiently small amount we will always get a different distribution.))

Consequently, in most ML applications we require that $p$ be continuous (and hopefully, almost everywhere differentiable) in the $\Theta$ component. (Without continuity, maximizing the likelihood generally becomes an intractable problem.) This leads to the following likelihood-oriented definition of a parametric family:

A parametric family of (univariate) distributions is a locally invertible map

$$F: \mathbb{R} \times \Theta \to [0,1],$$

with $\Theta \subset \mathbb{R}^n$, for which (a) each $F_\theta$ is a distribution function and (b) for each $x \in \mathbb{R}$, the function $L_x: \theta \to [0,1]$ given by $L_x(\theta) = F(x, \theta)$ is continuous and almost everywhere differentiable.

Note that a parametric family $F$ is more than just the collection of $F_\theta$: it also includes the specific way in which parameter values $\theta$ correspond to distributions.
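To make that last point concrete, here is a small Python sketch of two different parameterizations whose images are the same set of Normal distributions. The second one, using $(\mu, \log\sigma)$, is an assumption chosen for illustration and is not taken from the answer; the function names are likewise hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Distribution function of Normal(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def family_sd(x, theta):
    """Parameterization 1: theta = (mu, sigma), with sigma > 0."""
    mu, sigma = theta
    return normal_cdf(x, mu, sigma)

def family_logsd(x, theta):
    """Parameterization 2: theta = (mu, log sigma), unrestricted in R^2."""
    mu, log_sigma = theta
    return normal_cdf(x, mu, math.exp(log_sigma))

xs = [-3.0, -1.0, 0.0, 0.5, 2.0]
a = [family_sd(x, (1.0, 2.0)) for x in xs]
b = [family_logsd(x, (1.0, math.log(2.0))) for x in xs]

# Different parameter values, same distribution: the gap is only rounding error.
print(max(abs(u - v) for u, v in zip(a, b)))
```

Both maps trace out the same family as a set, yet they are different parametric families in the sense above, because the correspondence between $\theta$ and the distribution differs.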

Let's end up with some illustrative examples.

  • Let $C_Y$ be the set of all Normal distributions. As given, this is not a parametric family: it's just a family. To be parametric, we have to choose a parameterization. One way is to choose $\Theta = \{(\mu, \sigma) \in \mathbb{R}^2 \mid \sigma > 0\}$ and to map $(\mu, \sigma)$ to the Normal distribution with mean $\mu$ and variance $\sigma^2$.

  • The set of Poisson$(\lambda)$ distributions is a parametric family with $\lambda \in \Theta = (0, \infty) \subset \mathbb{R}^1$.

  • The set of Uniform$(\theta, \theta+1)$ distributions (which features prominently in many textbook exercises) is a parametric family with $\theta \in \mathbb{R}^1$. In this case, $F_\theta(x) = \max(0, \min(1, x - \theta))$ is differentiable in $\theta$ except for $\theta \in \{x, x-1\}$.

  • Let $F$ and $G$ be any two distributions. Then $F(x, \theta) = (1-\theta)F(x) + \theta G(x)$ is a parametric family for $\theta \in [0,1]$. (Proof: the image of this map is a set of distributions and its partial derivative in $\theta$ equals $-F(x) + G(x)$, which is defined everywhere.)

  • The Pearson family is a four-dimensional family, $\Theta \subset \mathbb{R}^4$, which includes (among others) the Normal distributions, Beta distributions, and Inverse Gamma distributions. This illustrates the fact that any one given distribution may belong to many different distribution families. This is perfectly analogous to observing that any point in a (sufficiently large) space may belong to many paths that intersect there. This, together with the previous construction, shows us that no distribution uniquely determines a family to which it belongs.

  • The family $C_Y$ of all finite-variance absolutely continuous distributions is not parametric. The proof requires a deep theorem of topology: if we endow $C_Y$ with any topology (whether statistically useful or not) and $p: \Theta \to C_Y$ is continuous and locally has a continuous inverse, then locally $C_Y$ must have the same dimension as that of $\Theta$. However, in all statistically meaningful topologies, $C_Y$ is infinite dimensional.
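The Uniform$(\theta, \theta+1)$ example above can be checked numerically. This is a rough sketch only: it uses one-sided difference quotients with a hypothetical step size `h` to show that $L_x(\theta)$ is continuous everywhere but has kinks exactly at $\theta = x - 1$ and $\theta = x$. The function names are invented for the illustration.

```python
def uniform_cdf(x, theta):
    """F_theta(x) = max(0, min(1, x - theta)) for Uniform(theta, theta + 1)."""
    return max(0.0, min(1.0, x - theta))

def one_sided_slopes(f, theta, h=1e-6):
    """Left and right difference quotients of f at theta (h is an arbitrary small step)."""
    left = (f(theta) - f(theta - h)) / h
    right = (f(theta + h) - f(theta)) / h
    return left, right

x = 0.75

def L_x(theta):
    """The function theta -> F(x, theta) for the fixed x above."""
    return uniform_cdf(x, theta)

for theta in (x - 1.0, 0.25, x):
    # The slopes agree in the interior (both about -1) and disagree at the two kinks.
    print(theta, one_sided_slopes(L_x, theta))
```

At the interior point the two slopes agree, so $L_x$ is differentiable there; at $\theta = x - 1$ and $\theta = x$ they differ, which matches the exceptional set stated in the example.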


2
It will take me about a day to digest your answer. I will have to chew slowly. Meanwhile, thank you.
Carl

(+1) OK, I slogged through it. So is $F: \mathbb{R} \times \Theta \to [0,1]$ a Polish space or not? Can we do a simple answer so people know how to avoid using the word family improperly, please? @JuhoKokkala related, for example, that Wikipedia abused language in their exponential family, that needs clarification.
Carl

1
Doesn't the second sentence of this answer serve that request for simplicity?
whuber

IMHO, however uninformed, no, it does not due to incompleteness, it doesn't say what a family isn't. The concept "in the space of all distributions" seems to relate to statistics only.
Carl

1
I have accepted your answer. You have enough information in it that I could apply it to the question in question.
Carl

1

To address a specific point brought up in the question: "exponential family" does not denote a set of distributions. (The standard, say, exponential distribution is a member of the family of exponential distributions, an exponential family; of the family of gamma distributions, also an exponential family; of the family of Weibull distributions, not an exponential family; & of any number of other families you might dream up.) Rather, "exponential" here refers to a property possessed by a family of distributions. So we shouldn't talk of "distributions in the exponential family" but of "exponential families of distributions"—the former is an abuse of terminology, as @JuhoKokkala points out. For some reason no-one commits this abuse when talking of location–scale families.


0

Thanks to @whuber there is enough information to summarize in what I hope is a simpler form relating to the question from which this post arose. "Another name for a family [Sic, statistical family] is [a] statistical model."

From that Wikipedia entry: A statistical model consists of all distributions that we suppose govern our observations, but we do not otherwise know which distribution is the actual one. What distinguishes a statistical model from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values, but instead have probability distributions; i.e., some of the variables are stochastic. A statistical model is usually thought of as a pair $(S, \mathcal{P})$, where $S$ is the set of possible observations, i.e., the sample space, and $\mathcal{P}$ is a set of probability distributions on $S$.

Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The model is said to be a Parametric model if $\Theta$ has a finite dimension. In notation, we write that $\Theta \subseteq \mathbb{R}^d$ where $d$ is a positive integer ($\mathbb{R}$ denotes the real numbers; other sets can be used, in principle). Here, $d$ is called the dimension of the model.

As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming that
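$$\mathcal{P} = \left\{ P_{\mu,\sigma}(x) \equiv \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}.$$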

In this example, the dimension, d, equals 2, end quote.

Thus, if we reduce the dimensionality by assigning, for the example above, μ=0, we can show a family of curves by plotting σ=1,2,3,4,5 or whatever choices for σ.
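For what it's worth, that family of curves can be tabulated with a few lines of Python. This is only a sketch: the grid of $x$ values and the names `normal_pdf` and `curves` are my own choices, and the values are printed rather than plotted so that no plotting library is assumed.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma > 0."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

# Fix mu = 0 and vary sigma: each sigma picks out one curve of the family.
xs = [i / 2.0 for i in range(-10, 11)]
curves = {sigma: [normal_pdf(x, 0.0, sigma) for x in xs] for sigma in (1, 2, 3, 4, 5)}

for sigma, ys in curves.items():
    # The peak height at x = 0 shrinks as sigma grows, while the curve widens.
    print(sigma, round(max(ys), 4))
```

Passing `xs` and each list in `curves` to any plotting tool would draw the five curves on one set of axes.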

Licensed under cc by-sa 3.0 with attribution required.