I've noticed that in statistical/machine-learning methods, a distribution is often approximated by a Gaussian, and that Gaussian is then used for sampling. They start by computing the first two moments of the distribution, and use them to estimate the mean and variance. Then they can sample from that Gaussian.

It seems to me that the more moments I compute, the better I should be able to approximate the distribution I want to sample from.

What if I compute 3 moments... how can I use those to sample from the distribution? And can this be extended to N moments?
Answers:
Three moments don't determine a distributional form; if you choose a distribution family with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate the three parameters and then generate values from that distribution. There are many such distributions.
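As a minimal sketch of that recipe (my own illustration in Python with NumPy; the target moments below are made up), one convenient three-parameter family is a shifted gamma: its skewness depends only on the shape parameter, so the three parameters can be solved for one at a time.

```python
import numpy as np

def fit_shifted_gamma(mean, sd, skew):
    """Method-of-moments fit of a shifted (three-parameter) gamma.

    For a gamma with shape k and scale theta, shifted by c:
        mean = c + k*theta,  sd = sqrt(k)*theta,  skewness = 2/sqrt(k).
    Requires skew > 0 (negate the data first if it is left-skewed).
    """
    k = (2.0 / skew) ** 2        # shape from the skewness alone
    theta = sd / np.sqrt(k)      # scale from the standard deviation
    c = mean - k * theta         # location shift from the mean
    return k, theta, c

# Illustrative target moments (not from any real data set)
k, theta, c = fit_shifted_gamma(mean=2.0, sd=1.5, skew=1.2)

# Generate values from the fitted distribution
rng = np.random.default_rng(0)
draws = c + rng.gamma(shape=k, scale=theta, size=100_000)
```

Because each moment equation introduces exactly one new parameter, the three-equation system unwinds sequentially instead of requiring a joint solve.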
Sometimes even having all the moments isn't sufficient to determine a distribution. If the moment generating function exists (in a neighborhood of 0) then it uniquely identifies a distribution (you could in principle do an inverse Laplace transform to obtain it).
[If some moments aren't finite, the mgf doesn't exist; but there are also cases where all the moments are finite and the mgf still doesn't exist in a neighborhood of 0.]
Given that a choice of distributions exists, one might be tempted to consider a maximum-entropy solution with constraints on the first three moments, but there's no distribution on the real line that attains it (since the resulting cubic term in the exponent would be unbounded).
How the process works for a particular choice of distribution
We can simplify the process of obtaining a distribution matching three moments by ignoring the mean and variance and working with a scaled third moment -- the moment-skewness ($\gamma_1 = \mu_3/\mu_2^{3/2}$).
We can do this because having selected a distribution with the relevant skewness, we can then back out the desired mean and variance by scaling and shifting.
Let's consider an example. Yesterday I created a large data set (which still happens to be in my R session) whose distribution I haven't tried to calculate the functional form of (it's a large set of values of the log of the sample variance of a Cauchy at n=10). We have the first three raw moments as 1.519, 3.597 and 11.479 respectively, or correspondingly a mean of 1.519, a standard deviation* of 1.136 and a skewness of 1.429 (so these are sample values from a large sample).
Formally, method of moments would attempt to match the raw moments, but the calculation is simpler if we start with the skewness (turning solving three equations in three unknowns into solving for one parameter at a time, a much simpler task).
* I am going to handwave away the distinction between using an n-denominator on the variance -- as would correspond to formal method of moments -- and an n-1 denominator, and simply use sample calculations.
This skewness (~1.43) indicates we seek a distribution which is right-skew. I could choose, for example, a shifted lognormal distribution (three-parameter lognormal, shape $\sigma$, scale $\mu$ and location-shift $\gamma$) with the same moments. Let's begin by matching the skewness. The population skewness of a two-parameter lognormal is
$$\gamma_1 = (e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1}\,.$$
So let's start by equating that to the desired sample value to obtain an estimate of $\sigma^2$, $\tilde{\sigma}^2$ say.
Note that $\gamma_1 = (\tau+2)\sqrt{\tau-1}$ where $\tau = e^{\sigma^2}$. This then yields a simple cubic equation $\tau^3+3\tau^2-4 = \gamma_1^2$. Using the sample skewness in that equation yields $\tilde{\tau}\approx 1.1995$ or $\tilde{\sigma}^2 = \log(\tilde{\tau}) \approx 0.1819$. (The cubic has only one real root so there's no issue with choosing between roots; nor is there any risk of choosing the wrong sign on $\sqrt{\tau-1}$ -- we can flip the distribution left-for-right if we need negative skewness.)
We can then in turn solve for $\mu$ by matching the variance (or standard deviation) and then for the location parameter $\gamma$ by matching the mean.
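The three matching steps can be sketched numerically (my own NumPy illustration of the calculation described above; not code from the original analysis):

```python
import numpy as np

# Sample moments quoted above
mean, sd, skew = 1.519, 1.136, 1.429

# Step 1: the skewness pins down sigma^2 via the cubic
#   tau^3 + 3*tau^2 - 4 = gamma1^2, with tau = exp(sigma^2)
roots = np.roots([1.0, 3.0, 0.0, -4.0 - skew**2])
tau = float(roots[(np.abs(roots.imag) < 1e-8) & (roots.real > 1.0)].real[0])
sigma2 = np.log(tau)

# Step 2: match the variance; for a lognormal,
#   var = (tau - 1) * tau * exp(2*mu), so solve for mu
mu = 0.5 * np.log(sd**2 / ((tau - 1.0) * tau))

# Step 3: match the mean; the shifted-lognormal mean is c + exp(mu + sigma2/2)
c = mean - np.exp(mu + sigma2 / 2.0)

# Sample from the fitted shifted lognormal
rng = np.random.default_rng(1)
draws = c + rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=100_000)
```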
But we could as easily have chosen a shifted-gamma or a shifted-Weibull distribution (or a shifted-F or any number of other choices) and run through essentially the same process. Each of them would be different.
[For the sample I was dealing with, a shifted gamma would probably have been a considerably better choice than a shifted lognormal, since the distribution of the logs of the values was left skew and the distribution of their cube root was very close to symmetric; these are consistent with what you will see with (unshifted) gamma densities, but a left-skewed density of the logs cannot be achieved with any shifted lognormal.]
One could even take the skewness-kurtosis diagram in a Pearson plot and draw a line at the desired skewness and thereby obtain a two-point distribution, a sequence of beta distributions, a gamma distribution, a sequence of beta-prime distributions, an inverse-gamma distribution and a sequence of Pearson type IV distributions, all with the same skewness.
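The two-point case is simple enough to write down in closed form. Here is a sketch (my own derivation, using the standard fact that a standardized two-point distribution with probability $p$ on its left point has skewness $(2p-1)/\sqrt{p(1-p)}$), applied to the sample moments from the example above:

```python
import numpy as np

def two_point_from_moments(mean, sd, skew):
    """Construct the two-point distribution with the given first three moments.

    For a standardized (mean 0, variance 1) two-point distribution putting
    probability p on x1 = -sqrt((1-p)/p) and 1-p on x2 = sqrt(p/(1-p)),
    the skewness is (2p - 1)/sqrt(p(1-p)), which solves to
        p = (1 + skew/sqrt(4 + skew^2)) / 2.
    """
    p = 0.5 * (1.0 + skew / np.sqrt(4.0 + skew**2))
    x1 = -np.sqrt((1.0 - p) / p)              # standardized support points
    x2 = np.sqrt(p / (1.0 - p))
    points = mean + sd * np.array([x1, x2])   # undo the standardization
    probs = np.array([p, 1.0 - p])
    return points, probs

points, probs = two_point_from_moments(1.519, 1.136, 1.429)
rng = np.random.default_rng(2)
draws = rng.choice(points, size=100_000, p=probs)
```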
We can see this illustrated in a skewness-kurtosis plot (Pearson plot) below (note that $\beta_1 = \gamma_1^2$ and $\beta_2$ is the kurtosis), with the regions for the various Pearson distributions marked in.
The green horizontal line represents $\gamma_1 = 1.429$, and we see it pass through each of the mentioned distribution families, each point corresponding to a different population kurtosis. (The dashed curve represents the lognormal, which is not a Pearson-family distribution; its intersection with the green line marks the particular lognormal shape we identified. Note that the dashed curve is purely a function of $\sigma$.)
More moments
Moments don't pin distributions down very well, so even if you specify many moments, there will still be a lot of different distributions (particularly in relation to their extreme-tail behavior) that will match them.
You can of course choose some distributional family with at least four parameters and attempt to match more than three moments; for example the Pearson distributions above allow us to match the first four moments, and there are other choices of distributions that would allow a similar degree of flexibility.
One can adopt other strategies to choose distributions that can match distributional features - mixture distributions, modelling the log-density using splines, and so forth.
Frequently, however, if one goes back to the initial purpose for which one was trying to find a distribution, it turns out there's something better that can be done than the sort of strategy outlined here.
So, the answer is generally NO, you can't do this, but sometimes you can.
There are usually two reasons you can't do this.

First, if you have N observations, then you can calculate at most N moments. What about the other moments? You can't simply set them to zero.
Second, higher-moment calculations become less and less precise, because you have to raise the numbers to higher powers. Consider the 100th non-central moment; you usually can't calculate it with any precision.
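For instance (an illustrative sketch of my own; the standard normal is convenient here because its 100th raw moment is known exactly to be $99!!$, about $10^{78}$):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Sample estimate of the 100th raw moment of a standard normal
estimate = np.mean(x ** 100)

# The true value is the double factorial 99!! -- astronomically large
true = math.prod(range(1, 100, 2))

# The estimate is dominated entirely by the few most extreme draws and
# undershoots the true moment by many orders of magnitude.
print(f"estimate ~ 1e{np.log10(estimate):.0f}, true ~ 1e{math.log10(true):.0f}")
```

Even with 100,000 observations the sample can't contain the far-tail values that dominate such a high moment, so the estimate is hopeless.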
Now, sometimes you can get the distribution from moments. It's when you make an assumption of some sort about the distribution. For instance, you declare that it's normal. In this case all you need is the first two moments, which can usually be calculated with decent precision. Note that the normal distribution does have higher moments too, e.g. kurtosis, but we don't need them. If you were to calculate all the moments of the distribution (without assuming it's normal), then tried to recover the characteristic function to sample from it, it wouldn't work. However, when you forget about the higher moments and stick to the first two, it does work.
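A minimal sketch of that two-moment workflow (my own illustration with simulated "unknown" data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.gamma(shape=2.0, scale=1.5, size=50_000)  # skewed "unknown" data

# Assume normality: only the first two moments are needed
mu_hat = data.mean()
sigma_hat = data.std()

# Sample from the fitted Gaussian -- mean and sd are matched,
# but all the higher-moment structure (e.g. the skewness) is discarded
approx = rng.normal(loc=mu_hat, scale=sigma_hat, size=50_000)
```

The fitted Gaussian reproduces the mean and standard deviation of the data but is symmetric, so the right-skew of the original sample is lost; that's the price of the normality assumption.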