洞穴,惠勒和背部


15

背景

Burrows-Wheeler变换(BWT)是一个字符串的字符的可逆置换的结果在对某些类型的字符串如纯文本相似字符大运行。例如,它在bzip2压缩算法中使用

BWT定义如下:

给定一个输入字符串,例如codegolf,计算它的所有可能的旋转并将它们按字典顺序排序:

codegolf
degolfco
egolfcod
fcodegol
golfcode
lfcodego
odegolfc
olfcodeg

字符串的BWT codegolf是由该字符串的每个字符的最后一个字符组成的字符串,即该块的最后一列。因为codegolf,这产生了fodleocg

就其本身而言,这种转换是不可逆的,因为字符串codegolfgolfcode导致相同的字符串。但是,如果我们知道字符串以结束f,则只有一个可能的原像。

任务

实现一个渐进式程序或函数,该程序或函数从STDIN读取单个字符串,或者作为命令行或函数参数读取并打印或返回BWT或它的输入字符串的倒数。

如果输入字符串不包含空格,则您的提交应在输入后附加一个空格并计算BWT。

如果输入字符串已经包含一个空格,则它应该计算具有尾随空格的BWT的原像,并去除该空格。

例子

INPUT:  ProgrammingPuzzles&CodeGolf
OUTPUT: fs&e grodllnomzomaiCrGgPePzu
INPUT:  fs&e grodllnomzomaiCrGgPePzu
OUTPUT: ProgrammingPuzzles&CodeGolf
INPUT:  bt4{2UK<({ZyJ>LqQQDL6!d,@:~L"#Da\6%EYp%y_{ed2GNmF"1<PkB3tFbyk@u0#^UZ<52-@bw@n%m5xge2w0HeoM#4zaT:OrI1I<|f#jy`V9tGZA5su*b7X:Xn%L|9MX@\2W_NwQ^)2Yc*1b7W<^iY2i2Kr[mB;,c>^}Z]>kT6_c(4}hIJAR~x^HW?l1+^5\VW'\)`h{6:TZ)^#lJyH|J2Jzn=V6cyp&eXo4]el1W`AQpHCCYpc;5Tu@$[P?)_a?-RV82[):[@94{*#!;m8k"LXT~5EYyD<z=n`Gfn/;%}did\fw+/AzVuz]7^N%vm1lJ)PK*-]H~I5ixZ1*Cn]k%dxiQ!UR48<U/fbT\P(!z5l<AefL=q"mx_%C:2=w3rrIL|nghm1i\;Ho7q+44D<74y/l/A)-R5zJx@(h8~KK1H6v/{N8nB)vPgI$\WI;%,DY<#fz>is"eB(/gvvP{7q*$M4@U,AhX=JmZ}L^%*uv=#L#S|4D#<
OUTPUT: <#Q6(LFksq*MD"=L0<f^*@I^;_6nknNp;pWPBc@<A^[JZ?\B{qKc1u%wq1dU%;2)?*nl+U(yvuwZl"KIl*mm5:dJi{\)8YewB+RM|4o7#9t(<~;^IzAmRL\{TVH<bb]{oV4mNh@|VCT6X)@I/Bc\!#YKZDl18WDIvXnzL2Jcz]PaWux[,4X-wk/Z`J<,/enkm%HC*44yQ,#%5mt2t`1p^0;y]gr~W1hrl|yI=zl2PKU~2~#Df"}>%Io$9^{G_:\[)v<viQqwAU--A#ka:b5X@<2!^=R`\zV7H\217hML:eiD2ECETxUG}{m2:$r'@aiT5$dzZ-4n)LQ+x7#<>xW)6yWny)_zD1*f @F_Yp,6!ei}%g"&{A]H|e/G\#Pxn/(}Ag`2x^1d>5#8]yP>/?e51#hv%;[NJ"X@fz8C=|XHeYyQY=77LOrK3i5b39s@T*V6u)v%gf2=bNJi~m5d4YJZ%jbc!<f5Au4J44hP/(_SLH<LZ^%4TH8:R
INPUT:  <#Q6(LFksq*MD"=L0<f^*@I^;_6nknNp;pWPBc@<A^[JZ?\B{qKc1u%wq1dU%;2)?*nl+U(yvuwZl"KIl*mm5:dJi{\)8YewB+RM|4o7#9t(<~;^IzAmRL\{TVH<bb]{oV4mNh@|VCT6X)@I/Bc\!#YKZDl18WDIvXnzL2Jcz]PaWux[,4X-wk/Z`J<,/enkm%HC*44yQ,#%5mt2t`1p^0;y]gr~W1hrl|yI=zl2PKU~2~#Df"}>%Io$9^{G_:\[)v<viQqwAU--A#ka:b5X@<2!^=R`\zV7H\217hML:eiD2ECETxUG}{m2:$r'@aiT5$dzZ-4n)LQ+x7#<>xW)6yWny)_zD1*f @F_Yp,6!ei}%g"&{A]H|e/G\#Pxn/(}Ag`2x^1d>5#8]yP>/?e51#hv%;[NJ"X@fz8C=|XHeYyQY=77LOrK3i5b39s@T*V6u)v%gf2=bNJi~m5d4YJZ%jbc!<f5Au4J44hP/(_SLH<LZ^%4TH8:R
OUTPUT: bt4{2UK<({ZyJ>LqQQDL6!d,@:~L"#Da\6%EYp%y_{ed2GNmF"1<PkB3tFbyk@u0#^UZ<52-@bw@n%m5xge2w0HeoM#4zaT:OrI1I<|f#jy`V9tGZA5su*b7X:Xn%L|9MX@\2W_NwQ^)2Yc*1b7W<^iY2i2Kr[mB;,c>^}Z]>kT6_c(4}hIJAR~x^HW?l1+^5\VW'\)`h{6:TZ)^#lJyH|J2Jzn=V6cyp&eXo4]el1W`AQpHCCYpc;5Tu@$[P?)_a?-RV82[):[@94{*#!;m8k"LXT~5EYyD<z=n`Gfn/;%}did\fw+/AzVuz]7^N%vm1lJ)PK*-]H~I5ixZ1*Cn]k%dxiQ!UR48<U/fbT\P(!z5l<AefL=q"mx_%C:2=w3rrIL|nghm1i\;Ho7q+44D<74y/l/A)-R5zJx@(h8~KK1H6v/{N8nB)vPgI$\WI;%,DY<#fz>is"eB(/gvvP{7q*$M4@U,AhX=JmZ}L^%*uv=#L#S|4D#<
INPUT:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OUTPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
INPUT:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OUTPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

附加规则

  • 您不能使用任何内置运算符来计算字符串的BWT(或其反函数)。

  • 您不能使用任何内置函数来反转功能。

  • 如果您选择STDOUT作为输出,即使您的代码不支持输入的尾随换行符,您的代码也可能会打印尾随的换行符。

  • 您的代码必须能处理500个或更少可打印ASCII字符(0x20至0x7E)的任何输入,包括最多一个空格。

  • 对于上述任何可能的输入,您的代码必须在我的机器上(Intel Core i7-3770,16 GiB RAM)在十分钟之内完成。最后一个测试用例应该是最慢的用例,因此请确保您将代码和该用例一起计时。

    对于大多数语言和方法,应该很容易遵守时间限制。此规则仅用于防止将任一变换实现为另一变换。

  • 适用标准代码高尔夫规则。以字节为单位的最短提交获胜。


“您不能使用任何内置函数来反转功能。” 如果存在这样的事情,我会感到非常惊讶!
休·艾伦

4
@HughAllen J有inv
丹尼斯

它可能适用于琐碎或内置的函数,但是如何适用于一般的用户定义函数?此“ inv”记录在某处吗?
休·艾伦

@HughAllen:我真的不知道Ĵ,但这里有一个例子inv一个用户定义函数的工作。我不确定是否inv可以为BWT工作,但比遗憾更好。
丹尼斯2015年

相关信息,请参见codegolf.stackexchange.com/questions/4771/…,以获取完整的bzip2。
基思·兰德尔

Answers:


9

Pyth,29个字节

?tthu+VzSGzz}dzseMS.>L+z\ hlz

示范。 测试线束。

这直接分为编码和解码部分,其中代码决定与三元数一起使用。

?tthu+VzSGzz}dzseMS.>L+z\ hlz
                                 Implicit:
                                 z = input()
                                 d = ' '

?           }dz                  If there is a space in the input,
    u     zz                     Update G to the result of the following,
                                 with G starting as z and repeating len(z) times:
     +V                          The vectorized sum of
       zSG                       z and sorted(G)
   h                             Take the first such result, which will consist of
                                 the first character of z followed by the
                                 first cyclic permuation of the pre-BWT string,
                                 which must start with ' '.
 tt                              Remove the first two characters and return.
                 L               Otherwise, left-map (map with the variable as the 
                                 left parameter and a constant as the right)
               .>                cyclic right shift
                  +z\            of z + ' '
                      hlz        over range(len(z)+1)
              S                  Sort the shifted strings,
            eM                   take their last charactes,
           s                     combine into one string and return.

6

CJam,41 36 35字节

q:XS&LX,{X\.+$}*0=1>XS+_,,\fm<$zW>?

在这里测试。

说明

q:X   e# Read STDIN and store it in X.
S&    e# Take the set intersection with " ". We'll use this as a truthy/falsy value to
      e# select the correct output later.

# Compute the iBWT:
LX,   e# Push an empty array, compute the length of X.
{     e# Run the following block that many times:
  X\  e# Push X and pull the other array on top.
  .+  e# Add the characters of X to the corresponding line of the other array,
      e# i.e. prepend X as a new column.
  $   e# Sort the rows.
}*
0=    e# Since we just sorted the rows, the first permutation of the output will be
      e# one starting with a space, followed by the string we actually want. So just
      e# pick the first permutation.
1>    e# Remove the leading space.

# Compute the BWT:
XS+   e# Push X and append a space.
_,    e# Get that string's length N.
,\    e# Turn it into a range [0 .. N-1], swap it with the string.
fm<   e# Map each value in the range to the string shifted left by that many characters.
$     e# Sort the permutations.
zW>   e# Transpose the grid and discard all lines but the last.

?     e# Choose between the iBWT and the BWT based on whether the input had a space.

2

Perl 5,179

我的另一个不太好。这可能会有一些收获,但是它不会与专用的高尔夫语言竞争。

$_=<>;@y=/./g;if(/ /){@x=sort@y;for(1..$#y){@x=sort map$y[$_].$x[$_],0..$#x}
say$x[0]=~/^ (.*)/}else{push@y," ";say map/(.)$/,sort map{$i=$_;join"",
map$y[($_+$i)%@y],0..$#y}0..$#y}

未打高尔夫球:

# Read input
$_ = <>;
# Get all the chars of the input
my @chars = /./g;

if (/ /) {
  # If there's a space, run the IBWT:
  # Make the first column of the table
  my @working = sort @chars;

  # For each remaining character
  for (1 .. $#chars) {
    # Add the input as a new column to the left of @working,
    # then sort @working again
    @working = sort map {
      $chars[$_] . $working[$_]
    } 0 .. $#working;
  }
  # Print the first element of @working (the one beginning with space), sans space
  say $working[0] =~ /^ (.*)/;
} else {
  # BWT
  # Add a space to the end of the string
  push @chars, " ";
  # Get all the rotations of the string and sort them
  @rows = sort map {
    my $offset = $_;
    join "", map {
      $chars[($_ + $offset) % @chars]
    } 0 .. $#chars
  } 0 .. $#chars;

  # Print all the last characters
  say map /(.)$/, @rows;
}

一些小的改进:1.如果使用-n开关(通常计为1个字节),则不需要$_=<>;。2. for(1..$#y){...}可以成为... for 1..$#y。3. $"被初始化为," "并且缩短了一个字节。
丹尼斯2015年

@丹尼斯好电话。我for在同一时间有多个语句,因此后缀形式不是一个胜利,但是当我将其缩减为一个时,我没有注意到:)
霍布斯2015年
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.