Can I use the moments of a distribution to sample from the distribution?


14

I've noticed that in statistics/machine learning methods, a distribution is often approximated by a Gaussian, and the Gaussian is then used for sampling. They start by computing the first two moments of the distribution, and use these to estimate μ and σ². Then they can sample from that Gaussian.

It seems to me that the more moments I compute, the better I ought to be able to approximate the distribution I wish to sample from.

What if I compute 3 moments... how can I use those to sample from the distribution? And can this be extended to N moments?


2
Three moments don't determine a distributional form*; if you choose a distribution family with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate those three parameters and then generate values from such a distribution. There are many such distributions. [*Indeed, sometimes even having all the moments isn't sufficient to determine a distribution.]
Glen_b -Reinstate Monica

Thanks @Glen_b! I will read up on the "method of moments" to understand when this is possible. Can you point me to the theory describing when the moments are insufficient to determine a distribution?
curious_dan

The "method of moments" only tells you how to estimate parameters from moments. The rest of your comment is a new question (I think one that's already answered on site); briefly -- if the moment generating function exists (in a neighborhood of 0), it uniquely identifies a distribution (technically, you could in principle do an inverse Laplace transform to obtain it). If some moments are not finite, this means the MGF doesn't exist, but there are also cases where all the moments are finite yet the MGF still doesn't exist in a neighborhood of 0.
Glen_b -Reinstate Monica

I'm writing an answer based on my comments.
Glen_b -Reinstate Monica

Answers:


22

Three moments don't determine a distributional form; if you choose a distribution family with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate the three parameters and then generate values from such a distribution. There are many such distributions.

Sometimes even having all the moments isn't sufficient to determine a distribution. If the moment generating function exists (in a neighborhood of 0) then it uniquely identifies a distribution (you could in principle do an inverse Laplace transform to obtain it).

[If some moments are not finite, this means the MGF doesn't exist, but there are also cases where all the moments are finite yet the MGF still doesn't exist in a neighborhood of 0.]

Given the choice of distributions available, one might be tempted to consider the maximum entropy solution with constraints on the first three moments, but there is no distribution on the real line that attains it (since the resulting cubic term in the exponent would be unbounded).


How the process works for a particular choice of distribution

We can simplify the process of obtaining a distribution matching three moments by ignoring the mean and variance and working with a scaled third moment -- the moment-skewness (γ₁ = μ₃/μ₂^(3/2)).

We can do this because having selected a distribution with the relevant skewness, we can then back out the desired mean and variance by scaling and shifting.

Let's consider an example. Yesterday I created a large data set (which still happens to be in my R session) whose distribution I haven't tried to calculate the functional form of (it's a large set of values of the log of the sample variance of a Cauchy at n=10). We have the first three raw moments as 1.519, 3.597 and 11.479 respectively, or correspondingly a mean of 1.518, a standard deviation* of 1.136 and a skewness of 1.429 (so these are sample values from a large sample).
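The conversion from raw moments to mean, standard deviation and skewness can be checked in a few lines; here is a sketch in Python using the raw-moment values quoted above:

```python
# Convert the first three raw moments (m1, m2, m3) quoted in the answer
# into mean, standard deviation and moment-skewness.
m1, m2, m3 = 1.519, 3.597, 11.479   # raw moments from the answer

mean = m1
var = m2 - m1**2                    # second central moment, mu_2
mu3 = m3 - 3*m1*m2 + 2*m1**3        # third central moment, mu_3
sd = var**0.5
skew = mu3 / var**1.5               # gamma_1 = mu_3 / mu_2^(3/2)

print(mean, sd, skew)               # approximately 1.519, 1.136, 1.43
```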

Formally, method of moments would attempt to match the raw moments, but the calculation is simpler if we start with the skewness (turning solving three equations in three unknowns into solving for one parameter at a time, a much simpler task).

* I am going to handwave away the distinction between using an n-denominator on the variance - as would correspond to formal method of moments - and an n-1 denominator and simply use sample calculations.

This skewness (~1.43) indicates we seek a distribution which is right-skew. I could choose, for example, a shifted lognormal distribution (three-parameter lognormal: shape σ, scale μ and location-shift γ) with the same moments. Let's begin by matching the skewness. The population skewness of a two-parameter lognormal is:

γ₁ = (e^(σ²) + 2) √(e^(σ²) − 1)

So let's start by equating that to the desired sample value to obtain an estimate of σ², say σ̃².

Note that γ₁² is (τ + 2)²(τ − 1) where τ = e^(σ²). This then yields a simple cubic equation, τ³ + 3τ² − 4 = γ₁². Using the sample skewness in that equation yields τ̃ ≈ 1.1995, or σ̃² ≈ 0.1819. (The cubic has only one real root so there's no issue with choosing between roots; nor is there any risk of choosing the wrong sign on γ₁ -- we can flip the distribution left-for-right if we need negative skewness.)
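The cubic is easy to solve numerically; a sketch in Python (simple bisection, using the sample skewness quoted above):

```python
import math

def tau_from_skewness(g1_sq, lo=1.0, hi=10.0, tol=1e-10):
    """Solve tau^3 + 3*tau^2 - 4 = gamma_1^2 for tau = exp(sigma^2) > 1
    by bisection; the left-hand side is strictly increasing for tau > 0."""
    f = lambda t: t**3 + 3*t**2 - 4 - g1_sq
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

g1 = 1.429                  # sample skewness from the answer
tau = tau_from_skewness(g1**2)
sigma2 = math.log(tau)      # sigma^2 = log(tau)
print(tau, sigma2)          # approximately 1.1995 and 0.1819
```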

We can then in turn solve for μ by matching the variance (or standard deviation) and then for the location parameter by matching the mean.
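Carrying that through for the shifted lognormal: once σ̃² is in hand, the variance match fixes μ and the mean match fixes the location shift. A sketch in Python (the value 0.1819 for σ̃² and the sample mean and sd are taken from the text above):

```python
import math, random

mean, sd = 1.518, 1.136     # sample mean and sd from the answer
sigma2 = 0.1819             # sigma^2 estimated from the skewness match
tau = math.exp(sigma2)

# The variance of a lognormal is (tau - 1) * tau * exp(2*mu);
# match it to sd^2 and solve for mu:
mu = 0.5 * math.log(sd**2 / ((tau - 1) * tau))

# The mean of a lognormal is exp(mu + sigma^2/2); the location shift
# absorbs the difference from the target mean:
shift = mean - math.exp(mu + sigma2 / 2)

# Draw from the fitted shifted lognormal:
random.seed(1)
samples = [shift + random.lognormvariate(mu, math.sqrt(sigma2))
           for _ in range(100_000)]
print(mu, shift)            # approximately 0.842 and -1.025
```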

But we could as easily have chosen a shifted-gamma or a shifted-Weibull distribution (or a shifted-F or any number of other choices) and run through essentially the same process. Each of them would be different.
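The shifted gamma, for instance, works through the same steps: a gamma's skewness is 2/√k, so the shape k follows from the skewness, then the scale from the sd and the shift from the mean. A sketch in Python with the sample values from above:

```python
import math, random

mean, sd, skew = 1.518, 1.136, 1.429   # sample values from the answer

# Gamma(shape=k, scale=theta): skewness = 2/sqrt(k), sd = sqrt(k)*theta,
# mean = k*theta.  Match skewness, then sd, then mean via the shift.
k = (2 / skew)**2
theta = sd / math.sqrt(k)
shift = mean - k * theta

random.seed(1)
samples = [shift + random.gammavariate(k, theta) for _ in range(100_000)]
print(k, theta, shift)      # approximately 1.959, 0.812 and -0.072
```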

[For the sample I was dealing with, a shifted gamma would probably have been a considerably better choice than a shifted lognormal, since the distribution of the logs of the values was left skew and the distribution of their cube root was very close to symmetric; these are consistent with what you will see with (unshifted) gamma densities, but a left-skewed density of the logs cannot be achieved with any shifted lognormal.]

One could even take the skewness-kurtosis diagram in a Pearson plot and draw a line at the desired skewness and thereby obtain a two-point distribution, a sequence of beta distributions, a gamma distribution, a sequence of beta-prime distributions, an inverse-gamma distribution and a sequence of Pearson type IV distributions, all with the same skewness.

We can see this illustrated in a skewness-kurtosis plot (Pearson plot) below (note that β₁ = γ₁² and β₂ is the kurtosis), with the regions for the various Pearson distributions marked in.

[Figure: Pearson plot with the desired skewness line drawn in]

The green horizontal line represents γ₁² = 2.042, and we see it pass through each of the mentioned distribution families, each point corresponding to a different population kurtosis. (The dashed curve represents the lognormal, which is not a Pearson-family distribution; its intersection with the green line marks the particular lognormal shape we identified. Note that the dashed curve is purely a function of σ.)


More moments

Moments don't pin distributions down very well, so even if you specify many moments, there will still be a lot of different distributions (particularly in relation to their extreme-tail behavior) that will match them.

You can of course choose some distributional family with at least four parameters and attempt to match more than three moments; for example, the Pearson distributions above allow us to match the first four moments, and there are other choices of distributions that would allow a similar degree of flexibility.

One can adopt other strategies to choose distributions that can match distributional features - mixture distributions, modelling the log-density using splines, and so forth.

Frequently, however, if one goes back to the initial purpose for which one was trying to find a distribution, it often turns out there's something better that can be done than the sort of strategy outlined here.


2

So, the answer is generally NO, you can't do this, but sometimes you can.

When you can't

The reasons you usually can't do this are twofold.

First, if you have N observations, then you can calculate at most N moments. What about the other moments? You can't simply set them to zero.

Second, higher-moment calculations become less and less precise, because you have to raise the numbers to higher powers. Consider the 100th non-central moment; you usually can't calculate it with any precision:

γ₁₀₀ = (1/n) Σᵢ xᵢ^100
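The instability is easy to see by resampling: the sample 100th moment is dominated by the largest observation, so estimates from independent samples of the same distribution differ by many orders of magnitude. A sketch in Python using standard-normal draws:

```python
import random

random.seed(0)

def moment100(n):
    """Sample 100th non-central moment from n standard-normal draws."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    return sum(x**100 for x in xs) / n

# Estimate the same population quantity from 20 independent samples:
estimates = [moment100(200) for _ in range(20)]
spread = max(estimates) / min(estimates)
print(spread)   # the estimates differ by many orders of magnitude
```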

When you can

Now, sometimes you can get the distribution from moments. That's when you make an assumption of some sort about the distribution. For instance, you declare that it's normal. In this case all you need is just two moments, which can usually be calculated with decent precision. Note that the normal distribution does have higher moments, e.g. kurtosis, but we don't need them. If you were to calculate all the moments of the normal distribution (without assuming it's normal), then tried to recover the characteristic function to sample from the distribution, it wouldn't work. However, when you forget about the higher moments and stick to the first two, it does work.
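A minimal sketch of that two-moment version in Python (the data here are stand-in normal draws, purely for illustration): estimate μ and σ from the data, then sample from the fitted Gaussian.

```python
import math, random

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]  # stand-in data

# Method of moments for a normal: mu-hat is the sample mean, sigma-hat
# is the square root of the second central moment (n-denominator).
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat)**2 for x in data) / n)

# Sample from the fitted Gaussian:
new_samples = [random.gauss(mu_hat, sigma_hat) for _ in range(10_000)]
print(mu_hat, sigma_hat)    # close to the true values 5.0 and 2.0
```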

Licensed under cc by-sa 3.0 with attribution required.