I find that heuristic arguments are often quite misleading when considering task scheduling (and closely related problems like bin packing): counter-intuitive things can happen. For such a simple case, it is worthwhile actually doing the probability theory.
Let $n = km$ with $k$ a positive integer. Suppose $T_{ij}$ is the time taken to complete the $j$-th task given to processor $i$. This is a random variable with mean $\mu$ and variance $\sigma^2$. The expected makespan in the first case is
$$E[M] = E\left[\max\left\{\sum_{j=1}^{k} T_{ij} \;\middle|\; i = 1, 2, \dots, m\right\}\right].$$
The sums are all iid with mean $k\mu$ and variance $k\sigma^2$, assuming that the $T_{ij}$ are all iid (this is stronger than pairwise independence).
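To make this concrete, here is a minimal Monte Carlo sketch in Python (numpy). The gamma task-time distribution, the parameter values, and the trial count are illustrative assumptions on my part; the argument above only requires the $T_{ij}$ to be iid with mean $\mu$ and variance $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

m, k = 10, 20          # m processors, k tasks per processor (n = k*m tasks)
mu, sigma = 1.0, 0.5   # per-task mean and standard deviation
trials = 20_000        # Monte Carlo repetitions

# Illustrative choice: gamma-distributed task times with the given mean
# and variance (shape*scale = mu, shape*scale^2 = sigma^2).
shape, scale = (mu / sigma) ** 2, sigma ** 2 / mu
T = rng.gamma(shape, scale, size=(trials, m, k))

# Makespan M = max over processors of the sum of that processor's k tasks.
M = T.sum(axis=2).max(axis=1)
print(f"estimated E[M] = {M.mean():.3f} (compare k*mu = {k * mu:.3f})")
```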
Now to obtain the expectation of a maximum, one either needs more information about the distribution, or one has to settle for distribution-free bounds, such as:
- Peter J. Downey, Distribution-free bounds on the expectation of the maximum with scheduling applications, Operations Research Letters 9, 189–201, 1990. doi:10.1016/0167-6377(90)90018-Z
which can be applied if the processor-wise sums are iid. This would not necessarily be the case if the underlying times were just pairwise independent. In particular, by Theorem 1 the expected makespan is bounded above by
$$E[M] \le k\mu + \sigma\sqrt{k}\,\frac{m-1}{\sqrt{2m-1}},$$
since the maximum is taken over $m$ iid sums, each with mean $k\mu$ and standard deviation $\sigma\sqrt{k}$.
Downey also gives a particular distribution achieving this bound, although the distribution changes as $m$ does, and is not exactly natural.
Note that the bound says that the expected makespan can increase as any of the parameters increases: the variance $\sigma^2$, the number of processors $m$, or the number of tasks per processor $k$.
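As a sanity check, one can compare the bound against a simulated makespan; the following sketch reuses the illustrative gamma choice from above (the bound itself depends only on the mean and variance):

```python
import numpy as np

rng = np.random.default_rng(1)

m, k = 10, 20
mu, sigma = 1.0, 0.5
trials = 20_000

# Distribution-free bound applied to the m iid processor sums,
# each with mean k*mu and standard deviation sigma*sqrt(k).
bound = k * mu + sigma * np.sqrt(k) * (m - 1) / np.sqrt(2 * m - 1)

# Monte Carlo estimate under the illustrative gamma task times.
shape, scale = (mu / sigma) ** 2, sigma ** 2 / mu
M = rng.gamma(shape, scale, size=(trials, m, k)).sum(axis=2).max(axis=1)

print(f"bound = {bound:.3f}, simulated E[M] = {M.mean():.3f}")
```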
For your second question, the low-variance scenario resulting in a larger makespan seems to be an unlikely outcome of a thought experiment. Let $X = \max_{i=1}^{m} X_i$ denote the makespan for the first distribution, and $Y = \max_{i=1}^{m} Y_i$ for the second (with all other parameters the same). Here $X_i$ and $Y_i$ denote the sums of $k$ task durations corresponding to processor $i$ under the two distributions. Since the lower-variance sums are more concentrated around the common mean $k\mu$, it is reasonable to assume $\Pr[X_i \le x] \le \Pr[Y_i \le x]$ for all $x \ge k\mu$ (this holds, for instance, for normal distributions with equal means and ordered variances); independence then yields
$$\Pr[X \le x] = \prod_{i=1}^{m} \Pr[X_i \le x] \le \prod_{i=1}^{m} \Pr[Y_i \le x] = \Pr[Y \le x].$$
Since most of the mass of the probability distribution of the maximum will be above its mean, $E[X]$ will therefore tend to be larger than $E[Y]$. This is not a completely rigorous answer, but in short, the second case seems preferable.
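The comparison is easy to check empirically. The sketch below pairs a high-variance distribution (exponential) with a low-variance one (a narrow uniform), both with the same per-task mean; this pairing is my own illustrative choice, not from the question. The high-variance case typically shows the larger expected makespan:

```python
import numpy as np

rng = np.random.default_rng(2)

m, k, trials = 10, 20, 20_000
mu = 1.0

def mean_makespan(draw):
    """Estimate E[max_i sum_j T_ij] given a sampler of task times."""
    T = draw(size=(trials, m, k))
    return T.sum(axis=2).max(axis=1).mean()

# Two task-time distributions with the same mean mu:
#   X: exponential, variance mu^2 (high);
#   Y: uniform on [0.9*mu, 1.1*mu], variance mu^2/300 (low).
EX = mean_makespan(lambda size: rng.exponential(mu, size=size))
EY = mean_makespan(lambda size: rng.uniform(0.9 * mu, 1.1 * mu, size=size))

print(f"high-variance E[X] = {EX:.3f}, low-variance E[Y] = {EY:.3f}")
```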