12

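This golf requires a factorial calculation to be split up among multiple threads or processes.

Some languages make this easier to coordinate than others, so it is language-agnostic. Ungolfed example code is provided, but you should develop your own algorithm.

The goal of the contest is to see who can come up with the shortest (in bytes, not seconds) multicore factorial algorithm for calculating N!, as measured by votes when the contest closes. There should be a multicore advantage, so we require that it work for N ~ 10,000. Voters should vote down if the author fails to provide a valid explanation of how the work is spread among processors/cores, and vote up based on golf conciseness.

For curiosity, please post some performance numbers. There may be a performance vs. golf score tradeoff at some point; go with golf so long as it meets the requirements. I'd be curious to know when this occurs.

You may use normally available single-core big integer libraries. For instance, perl is usually installed with bigint. However, note that simply calling a system-provided factorial function won't normally split the work across multiple cores.

You must accept the input N from STDIN or ARGV, and output the value of N! to STDOUT. You may optionally use a second input parameter to give the program the number of processors/cores, so that it doesn't do what you'll see below :-) Or you may design explicitly for 2, 4, or whatever you have available.

I will post my own oddball perl example below, previously submitted on Stack Overflow under Factorial Algorithms in Different Languages. It is not golf. Numerous other examples were submitted, many of them golf but many not. Because of share-alike licensing, feel free to use the code of any example in the link above as a starting point.

Performance in my example is lackluster for a number of reasons: it uses too many processes and too much string/bigint conversion. As I said, it is an intentionally oddball example. It will calculate 5000! in under 10 seconds on a 4-core machine here. However, a more obvious two-liner for/next loop can do 5000! on one of the four processors in 3.6s.

You'll definitely have to do better than this: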

#!/usr/bin/perl -w                                                              
use strict;
use bigint;
die "usage: f.perl N (outputs N!)" unless ($ARGV[0] > 1);
print STDOUT &main::rangeProduct(1,$ARGV[0])."\n";
sub main::rangeProduct {
    my($l, $h) = @_;
    return $l    if ($l==$h);
    return $l*$h if ($l==($h-1));
    # arghhh - multiplying more than 2 numbers at a time is too much work       
    # find the midpoint and split the work up :-)                               
    my $m = int(($h+$l)/2);
    my $pid = open(my $KID, "-|");
      if ($pid){ # parent                                                       
        my $X = &main::rangeProduct($l,$m);
        my $Y = <$KID>;
        chomp($Y);
        close($KID);
        die "kid failed" unless defined $Y;
        return $X*$Y;
      } else {
        # kid                                                                   
        print STDOUT &main::rangeProduct($m+1,$h)."\n";
        exit(0);
    }
}
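
The "too many processes" cost comes from forking at every level of the recursion. As a rough illustration (a Python sketch, not the poster's code), the same midpoint split can fork only for the first few levels and pass results back as raw bytes instead of decimal strings. The names `range_product` and `forked_range_product` and the `depth` parameter are assumptions made up for this sketch, and it needs a POSIX system for `os.fork`.

```python
import os


def range_product(lo, hi):
    """Product of the integers lo..hi (inclusive) by recursive halving, no forking."""
    if lo > hi:
        return 1
    if lo == hi:
        return lo
    mid = (lo + hi) // 2
    return range_product(lo, mid) * range_product(mid + 1, hi)


def forked_range_product(lo, hi, depth):
    """Same split, but fork a child for the upper half while depth > 0.

    `depth` is a hypothetical knob: at most 2**depth processes work at once.
    """
    if depth <= 0 or hi - lo < 2:
        return range_product(lo, hi)
    mid = (lo + hi) // 2
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute the upper half and send it back as raw bytes.
        try:
            os.close(read_fd)
            upper = forked_range_product(mid + 1, hi, depth - 1)
            with os.fdopen(write_fd, "wb") as out:
                out.write(upper.to_bytes((upper.bit_length() + 7) // 8, "big"))
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: compute the lower half, then read the child's result.
    os.close(write_fd)
    lower = forked_range_product(lo, mid, depth - 1)
    with os.fdopen(read_fd, "rb") as inp:
        upper = int.from_bytes(inp.read(), "big")
    os.waitpid(pid, 0)
    return lower * upper
```

For example, `forked_range_product(1, 5000, 2)` would compute 5000! with four processes; whether that beats the plain loop depends on the machine.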

My interest in this is simply (1) alleviating boredom and (2) learning something new. This isn't a homework or research problem for me.

Good luck!

code-golf

— Paul

10

You can't determine the shortest code by votes, and the golf and multi-threading requirements seem to go together poorly.

— aaaaaaaaaaaaa

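My ancient single-core notebook can do 10000! in less than 0.2 seconds in Python.

— gnibbler, 2011

Multi-threading a CPU-bound process will almost always slow it down. All you're doing is adding overhead with little or no performance gain. Multi-threading is for I/O wait.

— mellamokb, 2011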

2

@mellamokb: I beg to differ for multi-core systems.

— Joey

@Joey: Ah. Missed that small detail :s Agreed.

— mellamokb

7

Mathematica

With parallel capability:

 f[n_, g_] := g[Product[N@i, {i, 1, n, 2}] Product[N@i, {i, 2, n, 2}]]

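Where g is Identity or Parallelize, depending on the kind of evaluation wanted.

For the timing test, we slightly modify the function so that it returns the wall-clock time.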

f[n_, g_] := First@AbsoluteTiming[g[Product[N@i,{i,1,n,2}] Product[N@i,{i,2,n,2}]]]

And we test both modes (from 10^5 up to 9*10^5), with only two kernels here:

ListLinePlot[{Table[f[i, Identity],    {i, 100000, 900000, 100000}], 
              Table[f[i, Parallelize], {i, 100000, 900000, 100000}]}]

Result: [plot comparing the Identity and Parallelize timings; image not available]

— Dr. belisarius

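Are you missing a ] in the first line of code? It looks unbalanced.

— Peter Taylor

@Peter Thanks, the last "]" didn't make it through the copy buffer. Corrected.

— Dr. belisarius, 2011

1

This appears to be the shortest. It also looks like the fastest, unless I'm misreading something. I no longer subscribe to Mathematica, so I cannot verify it. Thanks for participating.

— Paul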

7

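Haskell: 209 200 198 177 chars

176 167 source + 33 10 compiler flag

This solution is pretty silly. It applies product in parallel to a value of type [[Integer]], where the inner lists are at most two items long. Once the outer list is down to at most 2 lists, we flatten it and take the product directly. And yes, the type checker needs something annotated with Integer, otherwise it won't compile.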

import Control.Parallel.Strategies
s(x:y:z)=[[x,y::Integer]]++s z;s x=[x]
p=product
f n=p$concat$(until((<3).length)$s.parMap rseq p)$s[1..n]
main=interact$show.f.read

(Feel free to read the middle part of f, between concat and s, as "until I heart the length".)
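
The pairwise reduction described above can be sketched in Python as follows. This is only an illustration of the shape of the algorithm, not a port of the golfed program: `pair_up` and `tree_factorial` are made-up names, and the thread pool merely stands in for `parMap`. CPython threads share one interpreter lock, so this sketch by itself should not be expected to show a multicore speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod


def pair_up(xs):
    """Group a flat list into sublists of at most two items (the role of `s`)."""
    return [xs[i:i + 2] for i in range(0, len(xs), 2)]


def tree_factorial(n, workers=4):
    """Multiply pairs in rounds until fewer than three groups remain."""
    groups = pair_up(list(range(1, n + 1)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(groups) >= 3:
            # One round: take the product of every group, then regroup in pairs.
            groups = pair_up(list(pool.map(prod, groups)))
    # Flatten what is left and multiply it directly.
    return prod(x for group in groups for x in group)
```

Each round halves the number of groups, so about log2(n) rounds are needed, and the operands in a round have similar sizes.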

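Things looked like they were going to be pretty good, since parMap from Control.Parallel.Strategies makes it easy to farm this out to multiple threads. However, it looks like GHC 7 requires a whopping 33 chars in command line options and environment vars to actually enable the threaded runtime to use multiple cores (which I included in the total). Unless I am missing something, which is ~~possible~~ definitely the case. (Update: the threaded GHC runtime appears to use N-1 threads, where N is the number of cores, so there is no need to fiddle with runtime options.)

To compile: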

ghc -threaded prog.hs

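The runtime, however, was pretty good considering the ridiculous number of parallel evaluations being sparked and that I didn't compile with -O2. For 50000! on a dual-core MacBook, I get: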

SPARKS: 50020 (29020 converted, 1925 pruned)

INIT  time    0.00s  (  0.00s elapsed)
MUT   time    0.20s  (  0.19s elapsed)
GC    time    0.12s  (  0.07s elapsed)
EXIT  time    0.00s  (  0.00s elapsed)
Total time    0.31s  (  0.27s elapsed)

Total times for a few different values; the first column is the golfed parallel version, the second is the naive sequential version:

          Parallel   Sequential
 10000!      0.03s        0.04s
 50000!      0.27s        0.78s
100000!      0.74s        3.08s
500000!      7.04s       86.51s

For reference, the naive sequential version is this (compiled with -O2):

factorial :: Integer -> Integer
factorial n = product [1..n]
main = interact $ show.factorial.read

1

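IMO, you don't have to count the arguments for the compiler and interpreter.

— FUZxxl, 2011

@FUZxxl: Normally I would agree, but this problem specifically requested that it run in multiple threads or processes, and those flags are required to make that happen (with GHC 7.0.2, from the latest Haskell Platform, at least).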

6

Ruby - 111 + 56 = 167 characters

This is a two-file script. The main file (fact.rb):

c,n=*$*.map(&:to_i)
p=(0...c).map{|k|IO.popen("ruby f2.rb #{k} #{c} #{n}")}
p p.map{|l|l.read.to_i}.inject(:*)

The extra file (f2.rb):

c,h,n=*$*.map(&:to_i)
p (c*n/h+1..(c+1)*n/h).inject(:*)

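It simply takes the number of processes and the number to calculate as args and splits the work up into ranges that each process can calculate individually. The results are then multiplied together at the end.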
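
The same range split, with one child interpreter per range, might look like this in Python. It is a sketch under assumptions, not a translation of the golfed script: `chunk_bounds`, `factorial_subprocesses` and the inline `WORKER` program are invented for illustration, and the children report their products in hexadecimal so that very large values parse without hitting CPython's limit on long decimal strings.

```python
import subprocess
import sys

# Program run by each child interpreter: multiply lo..hi and print the product in hex.
WORKER = (
    "import sys, math\n"
    "lo, hi = int(sys.argv[1]), int(sys.argv[2])\n"
    "print(hex(math.prod(range(lo, hi + 1))))\n"
)


def chunk_bounds(n, parts):
    """Inclusive (lo, hi) bounds of `parts` consecutive ranges covering 1..n."""
    return [(k * n // parts + 1, (k + 1) * n // parts) for k in range(parts)]


def factorial_subprocesses(n, parts):
    """Start one child process per range, then multiply the partial products."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(lo), str(hi)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for lo, hi in chunk_bounds(n, parts)
    ]
    result = 1
    for proc in procs:
        out, _ = proc.communicate()  # waits for this child to finish
        result *= int(out, 16)
    return result
```

All children are started before any output is read, so they run concurrently; the final multiplication still happens in the parent, as in the Ruby version.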

This really shows how much slower Rubinius is than YARV:

Rubinius:

time ruby fact.rb 5 5000 #=> 61.84s

Ruby 1.9.2:

time ruby fact.rb 5 50000 #=> 3.09s

(Note the extra 0)

— Nemo157

1

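inject can take a symbol as an argument, so you can save a character by using inject(:+). Here is an example from the docs: (5..10).reduce(:+).

— Michael Kohl

@Michael: Thanks :). Also, in case anyone had problems running this, I just noticed that I had an 8 where there should have been a *.

— Nemo157, 2011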

6

Java, 523 519 434 430 429 chars

import java.math.*;public class G extends Thread{BigInteger o,i,r=BigInteger.ONE,h;G g;G(BigInteger O,int
I,int n){o=O;i=new BigInteger(""+I);if(n>1)g=new G(O.subtract(r),I,n-1);h=n==I?i:r;start();}public void
run(){while(o.signum()>0){r=r.multiply(o);o=o.subtract(i);}try{g.join();r=r.multiply(g.r);}catch(Exception
e){}if(h==i)System.out.println(r);}public static void main(String[] args){new G(new BigInteger(args[0]),4,4);}}

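The two 4s in the final line are the number of threads to use.

50000! tested with the following framework (an ungolfed version of the original, with a handful fewer bad practices, although it still has plenty) gives these times on my 4-core Linux machine: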

7685ms
2338ms
1361ms
1093ms
7724ms

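The five times are for 1, 2, 3, 4 and then 1 thread again. Note that I repeated the single-thread test at the end for fairness, because the JIT might have warmed up.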

import java.math.*;

public class ForkingFactorials extends Thread { // Bad practice!
    private BigInteger off, inc;
    private volatile BigInteger res;

    private ForkingFactorials(int off, int inc) {
        this.off = new BigInteger(Integer.toString(off));
        this.inc = new BigInteger(Integer.toString(inc));
    }

    public void run() {
        BigInteger p = new BigInteger("1");
        while (off.signum() > 0) {
            p = p.multiply(off);
            off = off.subtract(inc);
        }
        res = p;
    }

    public static void main(String[] args) throws Exception {
        int n = Integer.parseInt(args[0]);
        System.out.println(f(n, 1));
        System.out.println(f(n, 2));
        System.out.println(f(n, 3));
        System.out.println(f(n, 4));
        System.out.println(f(n, 1));
    }

    private static BigInteger f(int n, int numThreads) throws Exception {
        long now = System.currentTimeMillis();
        ForkingFactorials[] th = new ForkingFactorials[numThreads];
        for (int i = 0; i < n && i < numThreads; i++) {
            th[i] = new ForkingFactorials(n-i, numThreads);
            th[i].start();
        }
        BigInteger f = new BigInteger("1");
        for (int i = 0; i < n && i < numThreads; i++) {
            th[i].join();
            f = f.multiply(th[i].res);
        }
        long t = System.currentTimeMillis() - now;
        System.err.println("Took " + t + "ms");
        return f;
    }
}

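Java with bigints isn't the right language for golfing (look at what I have to do just to construct the wretched things, because the constructor which takes a long is private), but hey.

It should be completely obvious from the ungolfed code how it breaks up the work: each thread multiplies one equivalence class modulo the number of threads. The key point is that each thread does roughly the same amount of work.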
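
For readers who don't want to trace the Java, the split by equivalence class can be sketched in Python like this. The function name and the per-thread result list are assumptions of the sketch, and because CPython threads share one interpreter lock it shows the partitioning rather than a real speedup.

```python
import threading


def residue_class_factorial(n, num_threads):
    """Thread k multiplies n-k, n-k-num_threads, n-k-2*num_threads, ... down to 1."""
    partial = [1] * num_threads  # one result slot per thread, so no lock is needed

    def worker(k):
        p = 1
        for i in range(n - k, 0, -num_threads):
            p *= i
        partial[k] = p

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_threads)]
    for t in threads:
        t.start()
    result = 1
    for k, t in enumerate(threads):
        t.join()  # wait for thread k before reading its slot
        result *= partial[k]
    return result
```

With 4 threads and n = 10, the classes are {10, 6, 2}, {9, 5, 1}, {8, 4} and {7, 3}, so every thread gets a mix of small and large factors.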

— Peter Taylor

5

C# - 206 215 chars

using System;using System.Numerics;using System.Threading.Tasks;class a{static void Main(){var n=int.Parse(Console.ReadLine());var r=new BigInteger(1);Parallel.For(1,n+1,i=>{lock(this)r*=i;});Console.WriteLine(r);}}

Splits up the calculation using C#'s Parallel.For() functionality.

Edit: forgot the lock.

Execution times:

n = 10,000, time: 59ms.
n = 20,000, time: 50ms.
n = 30,000, time: 38ms.
n = 40,000, time: 100ms.
n = 50,000, time: 139ms.
n = 60,000, time: 164ms.
n = 70,000, time: 222ms.
n = 80,000, time: 266ms.
n = 90,000, time: 401ms.
n = 100,000, time: 424ms.
n = 110,000, time: 501ms.
n = 120,000, time: 583ms.
n = 130,000, time: 659ms.
n = 140,000, time: 832ms.
n = 150,000, time: 1143ms.
n = 160,000, time: 804ms.
n = 170,000, time: 653ms.
n = 180,000, time: 1031ms.
n = 190,000, time: 1034ms.
n = 200,000, time: 1765ms.
n = 210,000, time: 1059ms.
n = 220,000, time: 1214ms.
n = 230,000, time: 1362ms.
n = 240,000, time: 2737ms.
n = 250,000, time: 1761ms.
n = 260,000, time: 1823ms.
n = 270,000, time: 3357ms.
n = 280,000, time: 2110ms.

— Rob

4

Perl (140)

Takes N from standard input.

use bigint;$m=<>;open A,'>',
undef;$i=$p=fork&&1;$n=++$i;
{$i+=2;$n*=$i,redo if$i<=$m}
if($p){wait;seek A,0,0;$_=<A
>;print$n*$_}else{print A$n}

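Features:

computation split: evens on one side, odds on the other (anything more complex than that would need a lot of characters to balance the computation load properly).
IPC using a shared anonymous file.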
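
The two features above, an even/odd split plus a shared anonymous file, can be sketched in Python as follows. This is an illustration rather than a port: `stride_product` and `factorial_even_odd` are invented names, the child writes raw bytes instead of a decimal string, and `os.fork` makes it POSIX-only.

```python
import os
import tempfile


def stride_product(start, n, step=2):
    """Product of start, start+step, start+2*step, ... up to n."""
    result = 1
    for i in range(start, n + 1, step):
        result *= i
    return result


def factorial_even_odd(n):
    """Parent multiplies the even numbers, a forked child the odd ones."""
    with tempfile.TemporaryFile() as shared:  # anonymous file, inherited across fork
        pid = os.fork()
        if pid == 0:
            # Child: write the odd product into the shared file, then exit.
            try:
                odd = stride_product(1, n)
                shared.write(odd.to_bytes((odd.bit_length() + 7) // 8, "big"))
                shared.flush()
            finally:
                os._exit(0)
        even = stride_product(2, n)
        os.waitpid(pid, 0)  # the child has finished writing once this returns
        shared.seek(0)
        odd = int.from_bytes(shared.read(), "big")
    return even * odd
```

The file offset is shared between parent and child after the fork, which is why the parent rewinds with `seek(0)` before reading.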

Benchmark:

10000! is printed in 2.3s forked, 3.4s unforked
100000! is printed in 5'08.8 forked, 7'07.9 unforked

— JB

4

Scala (345 266 244 232 214 chars)

Using Actors:

object F extends App{import actors.Actor._;var(t,c,r)=(args(1).toInt,self,1:BigInt);val n=args(0).toInt min t;for(i<-0 to n-1)actor{c!(i*t/n+1 to(i+1)*t/n).product};for(i<-1 to n)receive{case m:Int=>r*=m};print(r)}

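Edit - removed references to System.currentTimeMillis(), factored out a(1).toInt, changed from List.range to x to y

Edit 2 - changed the while loop to a for, changed the left fold to a list function that does the same thing, relying on implicit type conversions so that the 6-char BigInt type appears only once, changed println to print

Edit 3 - found out how to do multiple declarations in Scala

Edit 4 - various optimisations I've learned since I first did this

Ungolfed version: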

import actors.Actor._
object ForkingFactorials extends App
{
    var (target,caller,result)=(args(1).toInt,self,1:BigInt)
    val numthreads=args(0).toInt min target
    for(i<-0 to numthreads-1)
        actor
        {
            caller ! (i*target/numthreads+1 to(i+1)*target/numthreads).product
        }
    for(i<-1 to numthreads)
        receive
        {
            case m:Int=>result*=m
        }
    print(result)
}

— Gareth

3

Scala-2.9.0 170

object P extends App{
def d(n:Int,c:Int)=(for(i<-1 to c)yield(i to n by c)).par
println((BigInt(1)/: d(args(0).toInt,args(1).toInt).map(x=>(BigInt(1)/: x)(_*_)))(_*_))}

Ungolfed:

object ParallelFactorials extends App
{
  def distribute (n: Int, cores: Int) = {
    val factorgroup = for (i <- 1 to cores) 
      yield (i to n by cores)
    factorgroup.par
  }

  val parallellist = distribute (args(0).toInt, args(1).toInt)

  println ((BigInt (1) /: parallellist.map (x => (BigInt(1) /: x) (_ * _)))(_ * _))

}

It calculates the factorial of 10 on 4 cores by generating 4 lists:

1 5 9
2 6 10
3 7
4 8

which are multiplied in parallel. There would have been a simpler way to distribute the numbers:

 (1 to n).sliding ((n/cores), (n/cores))

1 2 3
4 5 6
7 8 9
10

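But that distribution wouldn't be as good: the lower numbers would all end up in the same list and the highest in another, leading to a longer computation for the last list (for high Ns, the last thread wouldn't be almost empty, but would contain at least (N/cores)-cores elements).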
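
To make the difference between the two distributions concrete, here is a small Python sketch that builds both groupings and reports the bit length of each group's product as a rough proxy for the work involved. The helper names are invented, and the consecutive split rounds the block size up (an assumption) so that it reproduces the four blocks shown above.

```python
from math import prod


def strided_groups(n, cores):
    """Group i holds i, i+cores, i+2*cores, ... (the `i to n by cores` split)."""
    return [list(range(i, n + 1, cores)) for i in range(1, cores + 1)]


def contiguous_groups(n, cores):
    """Consecutive blocks of 1..n; block size rounded up (assumption), n >= 1."""
    size = max(1, -(-n // cores))  # ceiling division
    return [list(range(lo, min(lo + size, n + 1))) for lo in range(1, n + 1, size)]


def product_bits(groups):
    """Bit length of each group's product, a rough proxy for the work per group."""
    return [prod(g).bit_length() for g in groups]
```

For n = 10 on 4 cores the strided products need 6, 7, 5 and 6 bits, while the consecutive blocks need 3, 7, 9 and 4 bits, so the strided groups are more even in size.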

Scala 2.9 contains parallel collections, which handle the parallel invocation themselves.

— user unknown

2

Erlang - 295 characters.

This is the first thing I've ever written in Erlang, so I wouldn't be surprised if someone could easily halve this:

-module(f).
-export([m/2,f/4]).
m(N,C)->g(N,C,C,[]).
r([],B)->B;
r(A,B)->receive{F,V}->r(lists:delete(F,A),V*B)end.
s(H,L)->spawn(f,f,[self(),H,L,1]).
g(N,1,H,A)->r([s(N div H,1)|A],1);
g(N,C,H,A)->g(N,C-1,H,[s(N*C div H,N*(C-1) div H)|A]).
f(P,H,H,A)->P!{self(),A};
f(P,H,L,A)->f(P,H-1,L,A*H).

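It uses the same threading model as my earlier Ruby entry: it splits the range into sub-ranges, multiplies each sub-range in a separate thread, then multiplies the results together back in the main thread.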
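
The spawn-then-receive pattern can be sketched in Python with a queue standing in for the parent's mailbox. This is a loose analogy under assumptions, not a translation of the Erlang: the function name and worker ids are invented, and CPython threads will not run the multiplications on separate cores.

```python
import queue
import threading
from math import prod


def message_passing_factorial(n, workers):
    """Each worker sends (id, partial product); the parent collects until none are pending."""
    mailbox = queue.Queue()  # plays the role of the parent's mailbox

    def worker(worker_id, lo, hi):
        mailbox.put((worker_id, prod(range(lo, hi + 1))))

    pending = set()
    for k in range(workers):
        lo, hi = k * n // workers + 1, (k + 1) * n // workers
        threading.Thread(target=worker, args=(k, lo, hi)).start()
        pending.add(k)

    result = 1
    while pending:  # keep receiving until every worker has reported
        worker_id, value = mailbox.get()
        pending.discard(worker_id)
        result *= value
    return result
```

Results are multiplied in whatever order they arrive, which is fine because multiplication is commutative.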

I wasn't able to figure out how to get escript working, so just save it as f.erl, open up erl and run:

c(f).
f:m(NUM_TO_CALC, NUM_OF_PROCESSES).

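with the appropriate substitutions.

On my MacBook Air (dual-core), 50000 took around 8s with 2 processes and 10s with 1 process.

Note: I just noticed that it freezes if you attempt to use more processes than the number to factorize.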