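I would like to know whether there is a standard way to measure the "sortedness" of an array. Would an array with the median number of possible inversions be considered maximally unsorted? By that I mean it is basically as far as possible from being either sorted or reverse sorted.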
Answers:
No, it depends on your application. The measures of sortedness are often referred to as measures of disorder, which are functions from $N^{<N}$ to $\mathbb{R}$, where $N^{<N}$ is the collection of all finite sequences of distinct nonnegative integers. The survey by Estivill-Castro and Wood [1] lists and discusses 11 different measures of disorder in the context of adaptive sorting algorithms.
The number of inversions might work for some cases, but is sometimes insufficient. An example given in [1] is the sequence
$$\langle \lfloor n/2 \rfloor + 1, \lfloor n/2 \rfloor + 2, \ldots, n, 1, \ldots, \lfloor n/2 \rfloor \rangle$$
that has a quadratic number of inversions, but only consists of two ascending runs. It is nearly sorted, but this is not captured by inversions.
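To make the contrast between the two measures concrete, here is a minimal sketch that counts inversions and ascending runs. It assumes the sequence meant above is the upper half of 1..n followed by the lower half; the function names (`countInversions`, `countRuns`, `twoRunSequence`) are made up for this illustration, and the inversion count uses the plain quadratic double loop rather than a faster merge-based method.

```cpp
#include <cstddef>
#include <vector>

// Number of pairs (i, j) with i < j and a[i] > a[j], by brute force in O(n^2).
long long countInversions(const std::vector<int>& a) {
    long long count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[i] > a[j]) ++count;
        }
    }
    return count;
}

// Number of maximal ascending runs (0 for an empty sequence).
std::size_t countRuns(const std::vector<int>& a) {
    if (a.empty()) return 0;
    std::size_t runs = 1;
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[i] < a[i - 1]) ++runs;  // every descent starts a new run
    }
    return runs;
}

// Hypothetical helper: builds floor(n/2)+1, ..., n, 1, ..., floor(n/2).
std::vector<int> twoRunSequence(int n) {
    std::vector<int> a;
    for (int v = n / 2 + 1; v <= n; ++v) a.push_back(v);
    for (int v = 1; v <= n / 2; ++v) a.push_back(v);
    return a;
}
```

For n = 8 the helper yields 5,6,7,8,1,2,3,4: every element of the first half is inverted with every element of the second half (16 inversions), yet there are only two runs.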
Mannila [1] axiomatizes presortedness (with a focus on comparison-based algorithms) as follows (paraphrasing).
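Let $\Sigma$ be a totally ordered set. Then a mapping $m$ from $\Sigma^\star$ (the sequences of distinct elements from $\Sigma$) to the naturals is a measure of presortedness if it satisfies the conditions below.
If $X \in \Sigma^\star$ is sorted then $m(X) = 0$.
If $X, Y \in \Sigma^\star$ with $X = [x_1, \dots, x_n]$, $Y = [y_1, \dots, y_n]$ and $x_i < x_j \iff y_i < y_j$ for all $i, j \in [1..n]$, then $m(X) = m(Y)$.
If $X$ is a subsequence of $Y \in \Sigma^\star$, then $m(X) \leq m(Y)$.
If $x < y$ for all $x \in X$ and $y \in Y$ for some $X, Y \in \Sigma^\star$, then $m(X \cdot Y) \leq m(X) + m(Y)$.
$m(a \cdot X) \leq |X| + m(X)$ for all $X \in \Sigma^\star$ and $a \in \Sigma \setminus X$.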
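Examples of such measures are the number of inversions, the number of exchanges needed to sort the sequence, and the number of elements that must be removed to leave a sorted sequence (the input length minus the length of a longest increasing subsequence).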
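As an illustration of a measure of this kind, the sketch below computes the input length minus the length of a longest increasing subsequence, often called Rem in the adaptive-sorting literature. The function name `remMeasure` is made up here, and the code assumes distinct elements, as the axioms do.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Rem(X) = |X| - (length of a longest increasing subsequence of X),
// i.e. the fewest elements to remove so that the rest is sorted.
std::size_t remMeasure(const std::vector<int>& x) {
    // tails[k] is the smallest possible last element of an increasing
    // subsequence of length k + 1 among the elements seen so far.
    std::vector<int> tails;
    for (int v : x) {
        auto it = std::lower_bound(tails.begin(), tails.end(), v);
        if (it == tails.end()) {
            tails.push_back(v);  // v extends the longest subsequence found
        } else {
            *it = v;             // v gives a smaller tail for that length
        }
    }
    return x.size() - tails.size();
}
```

A sorted input gives 0, as the first condition requires, and prepending one element raises the value by at most 1, which is well within the bound of the last condition.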
Note that random distributions based on these measures have been defined, i.e. distributions that make sequences more or less likely depending on how sorted they are. These are called Ewens-like distributions [2, Ch. 4-5; 3, Example 12; 4], a special case of which is the so-called Mallows distribution. For a measure $m$, the weights are parametric in a constant $\theta > 0$ and fulfill
$$\frac{\Pr(X)}{\Pr(Y)} = \theta^{m(X) - m(Y)}.$$
Note how $\theta = 1$ defines the uniform distribution (for all $m$).
Since it is possible to sample permutations w.r.t. these measures efficiently, this body of work can be useful in practice when benchmarking sorting algorithms.
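For the special case where the measure is the number of inversions (the Mallows distribution), sampling can be sketched by inserting the elements 1..n one at a time. This is only an illustration under stated assumptions: the name `sampleMallows` and the parameter `theta` are chosen here, and the weights are computed naively, so very large `theta` combined with large `n` would overflow.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws a permutation of 1..n with probability proportional to
// theta^(number of inversions), for theta > 0.
std::vector<int> sampleMallows(int n, double theta, std::mt19937& rng) {
    std::vector<int> perm;
    for (int i = 1; i <= n; ++i) {
        // i is larger than everything placed so far, so putting it with j
        // elements to its right adds exactly j inversions. Position j
        // therefore gets weight theta^j, for j = 0 .. i-1.
        std::vector<double> weights(static_cast<std::size_t>(i));
        double w = 1.0;
        for (int j = 0; j < i; ++j) {
            weights[static_cast<std::size_t>(j)] = w;
            w *= theta;
        }
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        int j = pick(rng);
        perm.insert(perm.end() - j, i);
    }
    return perm;
}
```

With `theta` below 1 the samples tend towards sorted order, with `theta` above 1 towards reverse order, and `theta` equal to 1 gives every permutation the same probability.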
I have my own definition of "sortedness" of a sequence.
Given any sequence [a,b,c,…], we compare it with the sorted sequence containing the same elements, count the number of matches, and divide that by the number of elements in the sequence.
For example, given sequence [5,1,2,3,4]
we proceed as follows:
1) sort the sequence: [1,2,3,4,5]
2) compare the sorted sequence with the original by moving it one position at a time and counting the maximal number of matches:
        [5,1,2,3,4]
[1,2,3,4,5]                    one match
        [5,1,2,3,4]
  [1,2,3,4,5]                  no matches
        [5,1,2,3,4]
    [1,2,3,4,5]                no matches
        [5,1,2,3,4]
      [1,2,3,4,5]              no matches
        [5,1,2,3,4]
        [1,2,3,4,5]            no matches
        [5,1,2,3,4]
          [1,2,3,4,5]          4 matches
        [5,1,2,3,4]
            [1,2,3,4,5]        no matches
...
        [5,1,2,3,4]
                [1,2,3,4,5]    no matches
3) The maximal number of matches is 4, so we can calculate the "sortedness" as 4/5 = 0.8.
Sortedness of a sorted sequence would be 1, and sortedness of a sequence with elements placed in reversed order would be 1/n.
The idea behind this definition is to estimate the minimal amount of work we would need to do to convert any sequence to the sorted sequence. In the example above we need to move just one element, the 5 (there are many ways, but moving 5 is the most efficient). If the elements were placed in reversed order, we would need to move 4 elements. And if the sequence were already sorted, no work would be needed.
I hope my definition makes sense.
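The procedure above can be sketched as follows. The function name `shiftSortedness` is made up for this illustration, and treating an empty sequence as fully sorted is an assumption the description does not cover.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Largest number of positions at which seq agrees with its sorted copy,
// taken over all relative shifts of the two, divided by the length.
double shiftSortedness(const std::vector<int>& seq) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(seq.size());
    if (n == 0) return 1.0;  // assumption: an empty sequence counts as sorted
    std::vector<int> sorted(seq);
    std::sort(sorted.begin(), sorted.end());
    std::ptrdiff_t best = 0;
    // shift = how far the sorted copy is moved to the right of the original
    for (std::ptrdiff_t shift = -(n - 1); shift <= n - 1; ++shift) {
        std::ptrdiff_t matches = 0;
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const std::ptrdiff_t j = i + shift;  // index in seq above sorted[i]
            if (j >= 0 && j < n && seq[static_cast<std::size_t>(j)] ==
                                       sorted[static_cast<std::size_t>(i)]) {
                ++matches;
            }
        }
        best = std::max(best, matches);
    }
    return static_cast<double>(best) / static_cast<double>(n);
}
```

This tries all 2n-1 shifts and compares up to n positions for each, so it runs in quadratic time after the initial sort.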
If you need something quick and dirty (summation signs scare me), I wrote a very simple disorder function in C++ for a class named Array, which generates int arrays filled with randomly generated numbers:
void Array::disorder() {
    // Assumes array holds the values 1..arraySize in some order, and that
    // <cstdlib> and <iostream> are included with using namespace std.
    double disorderValue = 0;
    for (int n = 0; n < this->arraySize; n++) {
        // Distance between the stored value and the value a sorted array holds here.
        disorderValue += abs((n + 1) - array[n]);
    }
    // Divide by 2.0 rather than 2: integer division would truncate for odd
    // sizes and divide by zero for a one-element array.
    cout << "Disorder Value = " << (disorderValue / this->arraySize) / (this->arraySize / 2.0) << "\n" << endl;
}
The function simply compares the value in each element to the index of the element + 1, so it assumes the array holds the values 1 to arraySize in some order. A sorted array then has a disorder value of 0, and an array in reverse order has a disorder value of 1 (exactly 1 for even sizes, slightly below 1 for odd sizes). Not sophisticated, but it works.
Michael