Watson-Crick palindromes


31

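Problem

Create a function that can determine whether an arbitrary DNA string is a Watson-Crick palindrome. The function takes a DNA string and outputs a true value if the string is a Watson-Crick palindrome and a false value otherwise. (True and False may also be represented as 1 and 0, respectively.)

The DNA string may be either all uppercase or all lowercase, depending on your preference.

Also, the DNA string will not be empty.

Explanation

A DNA string is a Watson-Crick palindrome when its reverse complement is equal to itself.

Given a DNA string, first reverse it, and then complement each character according to the DNA base pairs (A↔T and C↔G). If the original string equals the reverse-complemented string, it is a Watson-Crick palindrome.

For more information, see this question. It is a different challenge in which you must find the longest substring of a DNA string such that the substring is a Watson-Crick palindrome.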
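As an ungolfed reference for the definition above, here is a minimal Python sketch. The names `COMPLEMENT`, `reverse_complement` and `is_wc_palindrome` are made up for illustration and are not part of the challenge; uppercase input is assumed.

```python
# Hypothetical helper names; the challenge does not prescribe any of them.
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A<->T, C<->G


def reverse_complement(dna: str) -> str:
    """Reverse the string, then complement every base."""
    return dna[::-1].translate(COMPLEMENT)


def is_wc_palindrome(dna: str) -> bool:
    """True when the string equals its own reverse complement."""
    return dna == reverse_complement(dna)


if __name__ == "__main__":
    cases = ["ATCGCGAT", "AGT", "GTGACGTCAC", "GCAGTGA",
             "GCGC", "AACTGCGTTTAC", "ACTG"]
    for case in cases:
        print(case, "=", str(is_wc_palindrome(case)).lower())
```

Running it prints one line per string in the same `<input> = <output>` format the challenge uses for its test cases.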

Goal

This is code-golf, so the shortest code wins.

Test cases

The format is <input> = <output>.

ATCGCGAT = true
AGT = false
GTGACGTCAC = true
GCAGTGA = false
GCGC = true
AACTGCGTTTAC = false
ACTG = false


3
Someone should write a program in DNA# that is also a Watson-Crick palindrome. :D (might not be possible)
mbomb007 '16

Or, if you prefer, "a word is a Watson-Crick palindrome if it has order 2 in the free group on 2 generators" (or on n generators!).
wchargin '16

(Technically, I guess that should be "order at most 2".)
wchargin '16

1
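@AndrasDeak According to Watson's book, Franklin was apparently a thorn in their side. She repeatedly refused to hand over the X-ray images showing the helix (as I recall), because she refused to believe it. It's worth a read if you have any interest in the discovery.
Obsidian Phoenix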

Answers:


27

05AB1E, 10 7 bytes

Code:

Â'š×Â‡Q

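Explanation:

To check whether a string is a Watson-Crick palindrome, we only need to compare the input with the input in which a and t are swapped, c and g are swapped, and the result is reversed. That is exactly what we do. We push the input and the reversed input using Â (bifurcate). Now comes the tricky part. 'š× is the compressed version of creating. If we reverse it, you can see why it is in the code: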

CreATinG
|  ||  |
GniTAerC

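This is used to transliterate the reversed input. Transliteration is done with ‡. After that, we just check whether the input and the transliterated input are equal with Q and print that value. This is what the stack looks like for the input actg: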

Â          # ["actg", "gtca"]
 'š×       # ["actg", "gtca", "creating"]
    Â      # ["actg", "gtca", "creating", "gnitaerc"]
     ‡     # ["actg", "cagt"]
      Q    # [0]
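The same idea can be sketched in Python, assuming lowercase input as in the trace above. The function name `check` is hypothetical; the point is that the word "creating" and its reverse form a transliteration table that happens to swap a with t and c with g.

```python
# "creating" and its reverse serve as the two transliteration alphabets.
word = "creating"
table = str.maketrans(word, word[::-1])  # c->g, a->t, t->a, g->c (plus r, e, i, n)


def check(dna: str) -> int:
    """Return 1 if the lowercase DNA string is a Watson-Crick palindrome, else 0."""
    reversed_input = dna[::-1]                        # bifurcate: input and its reverse
    transliterated = reversed_input.translate(table)  # transliterate the reversed input
    return int(dna == transliterated)                 # equality check


print(check("actg"))      # 0, matching the stack trace above
print(check("atcgcgat"))  # 1
```

The letters r, e, i and n are also remapped, but they never occur in a DNA string, so they are harmless.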

This can also be seen with the debug flag (Try it here).

Uses the CP-1252 encoding. Try it online!


4
Very, er, creative...
Toby Speight '16

2
This language has some very neat features.
miles

18

Jelly, 9 bytes

O%8µ+U5ḍP

Try it online! or verify all test cases.

How it works

O%8µ+U5ḍP  Main link. Argument: S (string)

O          Compute the code points of all characters.
 %8        Compute the residues of division by 8.
           This maps 'ACGT' to [1, 3, 7, 4].
   µ       Begin a new, monadic link. Argument: A (array of residues)
    +U     Add A and A reversed.
      5ḍ   Test the sums for divisibility by 5.
           Of the sums of all pairs of integers in [1, 3, 7, 4], only 1 + 4 = 5
           and 3 + 7 = 10 are divisible by 5, thus identifying the proper pairings.
        P  Take the product of the resulting Booleans.
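For readers who do not know Jelly, the steps above translate fairly directly into Python. This is only an illustrative sketch (the name `jelly_style` is made up), not a golfed solution.

```python
def jelly_style(dna: str) -> int:
    """Mirror the Jelly link step by step: O, %8, +U, 5ḍ, P."""
    residues = [ord(ch) % 8 for ch in dna]                         # 'ACGT' -> 1, 3, 7, 4
    sums = [a + b for a, b in zip(residues, reversed(residues))]   # A plus A reversed
    divisible = [int(s % 5 == 0) for s in sums]                    # divisibility by 5
    result = 1
    for flag in divisible:                                         # product of the Booleans
        result *= flag
    return result


print([ord(ch) % 8 for ch in "ACGT"])  # [1, 3, 7, 4]
print(jelly_style("GCGC"))             # 1
print(jelly_style("ACTG"))             # 0
```

Because 32 is a multiple of 8, lowercase letters leave the same residues, so the sketch behaves the same for all-lowercase input.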

4
I think Python is pretty close to competing with this answer! Compare the first nine bytes of my answer: lambda s:. That's almost the full solution!
orlp

Wait, the "How it works" part does not really explain how it works... Why residues of 8 and sums of 5?? Where are the letters complemented?
ZeroOne

@ZeroOne I've clarified that part.
Dennis

Oh, wow! That's darn clever. :) Thanks!
ZeroOne

12

Python 2, 56 45 44 bytes

lambda s:s==s[::-1].translate("_T_GA__C"*32)

lambda s:s==s[::-1].translate("TCG_A"*99) works in Python 3
Alex Varga

8

Perl, 27 bytes

Includes +2 for -lp

Give input on STDIN, prints 1 or nothing:

dnapalin.pl <<< ATCGCGAT

dnapalin.pl:

#!/usr/bin/perl -lp
$_=y/ATCG/TAGC/r=~reverse

Replace $_= by $_+= to get 0 instead of empty for the false case



7

Retina, 34 33 bytes

$
;$_
T`ACGT`Ro`;.+
+`(.);\1
;
^;

Try it online! (Slightly modified to run all test cases at once.)

Explanation

$
;$_

Duplicate the input by matching the end of the string and inserting a ; followed by the entire input.

T`ACGT`Ro`;.+

Match only the second half of the input with ;.+ and perform the substitution of pairs with a transliteration. As for the target set Ro: o references the other set, that is o is replaced with ACGT. But R reverses this set, so the two sets are actually:

ACGT
TGCA

If the input is a DNA palindrome, we will now have the input followed by its reverse (separated by ;).

+`(.);\1
;

Repeatedly (+) remove a pair of identical characters around the ;. This will either continue until only the ; is left or until the two characters around the ; are no longer identical, which would mean that the strings aren't the reverse of each other.

^;

Check whether the first character is ; and print 0 or 1 accordingly.
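The four stages can be imitated with Python's `re` module. This is a rough sketch of the same string manipulation under the assumption of uppercase input; the function name `retina_style` is hypothetical and the code is not a Retina interpreter.

```python
import re


def retina_style(dna: str) -> int:
    # Stage 1: append ';' followed by a copy of the whole input.
    text = dna + ";" + dna
    # Stage 2: transliterate only the part after ';' (ACGT -> TGCA).
    head, tail = text.split(";")
    text = head + ";" + tail.translate(str.maketrans("ACGT", "TGCA"))
    # Stage 3: repeatedly delete one identical character on each side of ';'.
    while True:
        shrunk = re.sub(r"(.);\1", ";", text)
        if shrunk == text:
            break
        text = shrunk
    # Stage 4: count matches of ^; which yields 0 or 1.
    return len(re.findall(r"^;", text))


print(retina_style("GCGC"))  # 1
print(retina_style("AGT"))   # 0
```

For GCGC the intermediate strings are GCGC;CGCG, GCG;GCG, GC;CG, G;G and finally ;, whereas AGT gets stuck at AG;CA.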


6

JavaScript (ES6), 59 bytes

f=s=>!s||/^(A.*T|C.*G|G.*C|T.*A)$/.test(s)&f(s.slice(1,-1))

Best I could do without using Regexp was 62 bytes:

f=s=>!s||parseInt(s[0]+s.slice(-1),33)%32%7<1&f(s.slice(1,-1))

5

Ruby, 35

I tried other ways, but the obvious way was the shortest:

->s{s.tr('ACGT','TGCA').reverse==s}

in test program

f=->s{s.tr('ACGT','TGCA').reverse==s}

puts f['ATCGCGAT']
puts f['AGT']
puts f['GTGACGTCAC']
puts f['GCAGTGA']
puts f['GCGC']
puts f['AACTGCGTTTAC'] 

2
->s{s.==s.reverse.tr'ACGT','TGCA'} is a byte shorter
Mitch Schwartz

@MitchSchwartz wow, that works, but I have no idea what that first . is for. The code looks more right to me without it, but it is required to make it run. Is it documented anywhere?
Level River St

Are you sure you don't want to figure it out on your own?
Mitch Schwartz

@MitchSchwartz hahaha I already tried. I find ruby's requirements for whitespace very idiosyncratic. Strange requirements for periods is a whole other issue. I have several theories but all of them may be wrong. I suspect it may have something to do with treating == as a method rather than an operator, but searching by symbols is impossible.
Level River St

You suspected correctly. :) It's just a plain old method call.
Mitch Schwartz

5

Haskell, 48 45 bytes

(==)=<<reverse.map((cycle"TCG_A"!!).fromEnum)

Usage example: (==)=<<reverse.map((cycle"TCG_A"!!).fromEnum) $ "ATCGCGAT" -> True.

A non-pointfree version is

f x = reverse (map h x) == x           -- map h to x, reverse and compare to x
h c = cycle "TCG_A" !! fromEnum c      -- take the ascii-value of c and take the
                                       -- char at this position of string
                                       -- "TCG_ATCG_ATCG_ATCG_A..."
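Indexing the infinite cyclic string at the ASCII value is the same as indexing "TCG_A" at the value modulo 5. A small Python sketch of that lookup, reusing the names `h` and `f` from the non-pointfree version purely for illustration:

```python
def h(c: str) -> str:
    """Character at position ord(c) of the cyclic string "TCG_ATCG_A..."."""
    return "TCG_A"[ord(c) % 5]


def f(x: str) -> bool:
    """Map h over x, reverse, and compare with x."""
    return "".join(map(h, x))[::-1] == x


for base in "ACGT":
    print(base, ord(base) % 5, h(base))  # A 0 T, C 2 G, G 1 C, T 4 A
print(f("ATCGCGAT"))                     # True
```

The underscore at index 3 is never reached by A, C, G or T, so any filler character would do there.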

Edit: @Mathias Dolidon saved 3 bytes. Thanks!


Works with cycle "TCG_A" too. :)
Mathias Dolidon


4

Julia, 47 38 bytes

s->((x=map(Int,s)%8)+reverse(x))%5⊆0

This is an anonymous function that accepts a Char array and returns a boolean. To call it, assign it to a variable.

This uses Dennis' algorithm, which is shorter than the naïve solution. We get the remainder of each code point divided by 8, add that to itself reversed, get the remainders from division by 5, and check whether all are 0. The last step is accomplished using ⊆, the infix version of issubset, which casts both arguments to Set before checking. This means that [0,0,0] is declared a subset of 0, since Set([0,0,0]) == Set(0). This is shorter than an explicit check against 0.
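The subset trick has a close analogue with Python sets. The sketch below is an assumption-laden illustration rather than a translation of Julia semantics: the name `julia_style` is made up, and `{0}` stands in for Julia's `Set(0)`.

```python
def julia_style(dna: str) -> bool:
    x = [ord(ch) % 8 for ch in dna]                             # residues mod 8
    remainders = [(a + b) % 5 for a, b in zip(x, reversed(x))]  # pair sums mod 5
    # A set of remainders is a subset of {0} exactly when every remainder is 0.
    return set(remainders) <= {0}


print(julia_style("GTGACGTCAC"))  # True
print(julia_style("GCAGTGA"))     # False
```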

Try it online!

Saved 9 bytes thanks to Dennis!


4

Jolf, 15 Bytes

Try it!

=~A_iγ"AGCT"_γi

Explanation:

   _i            Reverse the input
 ~A_iγ"AGCT"_γ   DNA swap the reversed input
=~A_iγ"AGCT"_γi  Check if the new string is the same as the original input

3

Jolf, 16 bytes

Try it here!

pe+i~Aiγ"GATC"_γ

Explanation

pe+i~Aiγ"GATC"_γ
    ~Aiγ"GATC"_γ  perform DNA transformation
  +i              i + (^)
pe                is a palindrome

3

Actually, 19 bytes

O`8@%`M;RZ`5@Σ%Y`Mπ

This uses Dennis's algorithm.

Try it online!

Explanation:

O`8@%`M;RZ`5@Σ%Y`Mπ
O                    push an array containing the Unicode code points of the input
 `8@%`M              modulo each code point by 8
       ;RZ           zip with reverse
          `5@Σ%Y`M   test sum for divisibility by 5
                  π  product

3

Oracle SQL 11.2, 68 bytes

SELECT DECODE(TRANSLATE(REVERSE(:1),'ATCG','TAGC'),:1,1,0)FROM DUAL; 

2
With SQL like that, I'm confident you must have written reports for some of my projects before...
corsiKa

3

Julia 0.4, 22 bytes

s->s$reverse(s)⊆""

The string contains the control characters EOT (4) and NAK (21). Input must be in form of a character array.

This approach XORs the characters of the input with the corresponding characters in the reversed input. For valid pairings, this results in the characters EOT or NAK. Testing for inclusion in the string of those characters produces the desired Boolean.
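A Python sketch of the XOR idea, assuming uppercase input (the names `ALLOWED` and `xor_style` are hypothetical). It also shows where the two control characters come from.

```python
# EOT (4) and NAK (21) are exactly the XORs of the two valid base pairs.
ALLOWED = {ord("C") ^ ord("G"), ord("A") ^ ord("T")}


def xor_style(dna: str) -> bool:
    """XOR each character with its mirror and test membership in ALLOWED."""
    return all((ord(a) ^ ord(b)) in ALLOWED for a, b in zip(dna, reversed(dna)))


print(sorted(ALLOWED))        # [4, 21]
print(xor_style("ATCGCGAT"))  # True
print(xor_style("AGT"))       # False
```

In an odd-length string the middle character is XORed with itself, giving 0, which is not in the allowed set, so such strings are rejected.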

Try it online!


3

C,71

r,e;f(char*s){for(r=0,e=strlen(s)+1;*s;s++)r|=*s*s[e-=2]%5^2;return!r;}

2 bytes saved by Dennis. Additional 2 bytes saved by adapting for lowercase input: constants 37 and 21 are revised to 5 and 2.

C,75

i,j;f(char*s){for(i=j=0;s[i];i++)j|=s[i]*s[strlen(s)-i-1]%37!=21;return!j;}

Saved one byte: Eliminated the parentheses by taking the product of the two ASCII codes mod 37. The valid pairs evaluate to 21. Assumes uppercase input.

C,76

i,j;f(char*s){for(i=j=0;s[i];i++)j|=(s[i]+s[strlen(s)-i-1])%11!=6;return!j;}

Uses the fact that the ASCII codes of the valid pairs sum to 138 (C+G) or 149 (A+T). Both sums are congruent to 6 mod 11, and no other pair of bases has a sum congruent to 6 mod 11. Assumes uppercase input.
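The magic numbers used across the three versions (sum mod 11 equal to 6, product mod 37 equal to 21, and product mod 5 equal to 2 for lowercase) can be checked by brute force over all 16 ordered pairs of bases. A small Python sketch with made-up names:

```python
from itertools import product

VALID = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def pairs_passing(test, alphabet):
    """All ordered pairs of letters whose ASCII codes satisfy the given test."""
    return {(a, b) for a, b in product(alphabet, repeat=2) if test(ord(a), ord(b))}


upper_sum = pairs_passing(lambda a, b: (a + b) % 11 == 6, "ACGT")
upper_prod = pairs_passing(lambda a, b: a * b % 37 == 21, "ACGT")
lower_prod = pairs_passing(lambda a, b: a * b % 5 == 2, "acgt")

print(upper_sum == VALID)   # True
print(upper_prod == VALID)  # True
print(lower_prod == {(a.lower(), b.lower()) for a, b in VALID})  # True
```

Each test accepts exactly the four complementary pairs, which is why a single comparison per position is enough.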

ungolfed in test program

i,j;

f(char *s){
   for(i=j=0;s[i];i++)                  //initialize i and j to 0; iterate i through the string
     j|=(s[i]+s[strlen(s)-i-1])%11!=6;  //add characters at i from each end of string, take result mod 11. If not 6, set j to 1
return!j;}                              //return not j (true if mismatch NOT detected.)

main(){
  printf("%d\n", f("ATCGCGAT"));
  printf("%d\n", f("AGT"));
  printf("%d\n", f("GTGACGTCAC"));
  printf("%d\n", f("GCAGTGA"));
  printf("%d\n", f("GCGC"));
  printf("%d\n", f("AACTGCGTTTAC"));
} 

1
r,e;f(char*s){for(r=0,e=strlen(s)+1;*s;s++)r|=*s*s[e-=2]%37^21;return!r;} saves a couple of bytes.
Dennis

@Dennis thanks, I really wasn't in the mood for modifying pointers, but it squeezed a byte out! I should have seen != > ^ myself. I reduced another 2 by changing to lowercase input: both magic numbers are now single digit.
Level River St

3

Factor, 72 bytes

Unfortunately regex can't help me here.

[ dup reverse [ { { 67 71 } { 65 84 } { 71 67 } { 84 65 } } at ] map = ]

Reverse, lookup table, compare equal.


Wow, that's a lot of whitespace!!! Is it all necessary? Also, a link to the language homepage would be useful.
Level River St

@LevelRiverSt Unfortunately, every bit of it is necessary. I'll add a link to the header.
cat

3

Bash + coreutils, 43 32 bytes

[ `tr ATCG TAGC<<<$1|rev` = $1 ]

Tests:

for i in ATCGCGAT AGT GTGACGTCAC GCAGTGA GCGC AACTGCGTTTAC; do ./78410.sh $i && echo $i = true || echo $i = false; done
ATCGCGAT = true
AGT = false
GTGACGTCAC = true
GCAGTGA = false
GCGC = true
AACTGCGTTTAC = false

3

J - 21 bytes

0=[:+/5|[:(+|.)8|3&u:

Based on Dennis' method

Usage

   f =: 0=[:+/5|[:(+|.)8|3&u:
   f 'ATCGCGAT'
1
   f 'AGT'
0
   f 'GTGACGTCAC'
1
   f 'GCAGTGA'
0
   f 'GCGC'
1
   f 'AACTGCGTTTAC'
0
   f 'ACTG'
0

Explanation

0=[:+/5|[:(+|.)8|3&u:
                 3&u:    - Convert from char to int
               8|        - Residues from division by 8 for each
            |.           - Reverse the list
           +             - Add from the list and its reverse element-wise
        [:               - Cap, compose function
      5|                 - Residues from division by 5 for each
    +/                   - Fold right using addition to create a sum
  [:                     - Cap, compose function
0=                       - Test the sum for equality to zero

3

Labyrinth, 42 bytes

_8
,%
;
"}{{+_5
"=    %_!
 = """{
 ;"{" )!

Terminates with a division-by-zero error (error message on STDERR).

Try it online!

The layout feels really inefficient but I'm just not seeing a way to golf it right now.

Explanation

This solution is based on Dennis's arithmetic trick: take all character codes modulo 8, add a pair from both ends and make sure it's divisible by 5.

Labyrinth primer:

  • Labyrinth has two stacks of arbitrary-precision integers, main and aux(iliary), which are initially filled with an (implicit) infinite amount of zeros.
  • The source code resembles a maze, where the instruction pointer (IP) follows corridors when it can (even around corners). The code starts at the first valid character in reading order, i.e. in the top left corner in this case. When the IP comes to any form of junction (i.e. several adjacent cells in addition to the one it came from), it will pick a direction based on the top of the main stack. The basic rules are: turn left when negative, keep going ahead when zero, turn right when positive. And when one of these is not possible because there's a wall, then the IP will take the opposite direction. The IP also turns around when hitting dead ends.
  • Digits are processed by multiplying the top of the main stack by 10 and then adding the digit.

The code starts with a small 2x2, clockwise loop, which reads all input modulo 8:

_   Push a 0.
8   Turn into 8.
%   Modulo. The last three commands do nothing on the first iteration
    and will take the last character code modulo 8 on further iterations.
,   Read a character from STDIN or -1 at EOF. At EOF we will leave loop.

Now ; discards the -1. We enter another clockwise loop which moves the top of the main stack (i.e. the last character) down to the bottom:

"   No-op, does nothing.
}   Move top of the stack over to aux. If it was at the bottom of the stack
    this will expose a zero underneath and we leave the loop.
=   Swap top of main with top of aux. The effect of the last two commands
    together is to move the second-to-top stack element from main to aux.
"   No-op.

Now there's a short linear bit:

{{  Pull two characters from aux to main, i.e. the first and last (remaining)
    characters of the input (mod 8).
+   Add them.
_5  Push 5.
%   Modulo.

The IP is now at a junction which acts as a branch to test divisibility by 5. If the result of the modulo is non-zero, we know that the input is not a Watson-Crick palindrome and we turn east:

_   Push 0.
!   Print it. The IP hits a dead end and turns around.
_   Push 0.
%   Try to take modulo, but division by zero fails and the program terminates.

Otherwise, we need to keep checking the rest of the input, so the IP keeps going south. The { pulls over the bottom of the remaining input. If we've exhausted the input, then this will be a 0 (from the bottom of aux), and the IP continues moving south:

)   Increment 0 to 1.
!   Print it. The IP hits a dead end and turns around.
)   Increment 0 to 1.
{   Pull a zero over from aux, IP keeps moving north.
%   Try to take modulo, but division by zero fails and the program terminates.

Otherwise, there are more characters in the string to be checked. The IP turns west and moves into the next (clockwise) 2x2 loop which consists largely of no-ops:

"   No-op.
"   No-op.
{   Pull one value over from aux. If it's the bottom of aux, this will be
    zero and the IP will leave the loop eastward.
"   No-op.

After this loop, we've got the input on the main stack again, except for its first and last character and with a zero on top. The ; discards the 0 and then = swaps the tops of the stacks, but this is just to cancel the first = in the loop, because we're now entering the loop in a different location. Rinse and repeat.
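Stripped of the stack shuffling, the control flow amounts to consuming the string from both ends. The Python sketch below models that with a deque; the name `labyrinth_style` is hypothetical, and pairing a lone middle character with 0 is my reading of how the implicit zeros at the bottom of the stacks behave.

```python
from collections import deque


def labyrinth_style(dna: str) -> int:
    codes = deque(ord(ch) % 8 for ch in dna)  # first loop: read all input modulo 8
    while codes:
        first = codes.popleft()
        # Assumption: a lone middle character gets paired with an implicit zero.
        last = codes.pop() if codes else 0
        if (first + last) % 5 != 0:
            return 0                          # branch that prints 0
    return 1                                  # input exhausted: print 1


print(labyrinth_style("GTGACGTCAC"))  # 1
print(labyrinth_style("AGT"))         # 0
```

Since none of the residues 1, 3, 7 and 4 is divisible by 5 on its own, odd-length input always ends in the 0 branch.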


3

sed, 67 61 bytes

G;H;:1;s/\(.\)\(.*\n\)/\2\1/;t1;y/ACGT/TGCA/;G;s/^\(.*\)\1$/1/;t;c0

(67 bytes)

Test

for line in ATCGCGAT AGT GTGACGTCAC GCAGTGA GCGC AACTGCGTTTAC ACTG
do echo -n "$line "
    sed 'G;H;:1;s/\(.\)\(.*\n\)/\2\1/;t1;y/ACGT/TGCA/;G;s/^\(.*\)\1$/1/;t;c0' <<<"$line"
done

Output

ATCGCGAT 1
AGT 0
GTGACGTCAC 1
GCAGTGA 0
GCGC 1
AACTGCGTTTAC 0
ACTG 0

By using extended regular expressions, the byte count can be reduced to 61.

sed -r 'G;H;:1;s/(.)(.*\n)/\2\1/;t1;y/ACGT/TGCA/;G;s/^(.*)\1$/1/;t;c0'

If you can do it in 61 bytes, then that's your score -- there's nothing against NFA or turing-complete regexp on this particular challenge. Some challenges disallow regex in full, but usually only regex-golf will disallow non regular-expressions.
cat

3

C#, 65 bytes

bool F(string s)=>s.SequenceEqual(s.Reverse().Select(x=>"GACT"[("GACT".IndexOf(x)+2)%4]));

.NET has some fairly long framework method names at times, which doesn't necessarily make for the best code golf framework. In this case, framework method names make up 33 characters out of 90. :)

Based on the modulus trick from elsewhere in the thread:

bool F(string s)=>s.Zip(s.Reverse(),(a,b)=>a%8+b%8).All(x=>x%5==0);

Now weighs in at 67 characters whereof 13 are method names.

Another minor optimization to shave off a whopping 2 characters:

bool F(string s)=>s.Zip(s.Reverse(),(a,b)=>(a%8+b%8)%5).Sum()<1;

So, 65 of which 13 are framework names.

Edit: Omitting some of the limited "boilerplate" from the solution and adding a couple of conditions leaves us with the expression

s.Zip(s.Reverse(),(a,b)=>(a%8+b%8)%5).Sum()

Which gives 0 if and only if the string s is a valid answer. As cat points out, "bool F(string s)=>" is actually replaceable with "s=>" if it's otherwise clear in the code that the expression is a Func<string,bool>, i.e. maps a string to a boolean.


1
Welcome to PPCG, nice first answer! :D
cat

@cat Thanks for that! :)
robhol

1
I don't really know C#, but if this is a lambda, then you can leave out its type and assigning it, as anonymous functions are fine as long as they are assignable.
cat

1
Also, can't you do !s.Zip... instead of s.Zip...==0? (Or can't you ! ints in C#?) Even if you can't boolean-negate it, you can leave out any sort of inversion and state in your answer that this returns <this thing> for falsy and <this other deterministic, clearly discernable thing> for truthy.
cat

1
@cat: You're right about dropping the type. I thought the code had to be directly executable, but making simple assumptions about input and output makes it a bit easier. The other thing won't work, however - rightly so, in my opinion, since a boolean operation has no logical (hue hue) way to apply to a number. Assigning 0 and 1 the values of false and true is, after all, just convention.
robhol

2

REXX 37

s='ATCGCGAT';say s=translate(reverse(s),'ATCG','TAGC')

2

R, 101 bytes

g=function(x){y=unlist(strsplit(x,""));all(sapply(rev(y),switch,"C"="G","G"="C","A"="T","T"="A")==y)}

Test Cases

g("ATCGCGAT")
[1] TRUE
g("AGT")
[1] FALSE
g("GTGACGTCAC")
[1] TRUE
g("GCAGTGA")
[1] FALSE
g("GCGC")
[1] TRUE
g("AACTGCGTTTAC")
[1] FALSE
g("ACTG")
[1] FALSE

strsplit(x,"")[[1]] is 3 bytes shorter than unlist(strsplit(x,"")) and, here, is equivalent since x is always a single character string.
plannapus

2

Octave, 52 bytes

f=@(s) prod(mod((i=mod(toascii(s),8))+flip(i),5)==0)

Following Dennis's trick ... take the ASCII values mod 8, flip and add together; if every sum is a multiple of five, you're golden.


That one whitespace is significant? That's... odd.
cat

Also, you can leave out the f= assignment; unnamed functions are okay.
cat

1

Clojure/ClojureScript, 49 chars

#(=(list* %)(map(zipmap"ATCG""TAGC")(reverse %)))

Works on strings. If the requirements are loosened to allow lists, I can take off the (list* ) and save 7 chars.


1

R, 70 bytes

f=function(x)all(chartr("GCTA","CGAT",y<-strsplit(x,"")[[1]])==rev(y))

Usage:

> f=function(x)all(chartr("GCTA","CGAT",y<-strsplit(x,"")[[1]])==rev(y))
> f("GTGACGTCAC")
[1] TRUE
> f("AACTGCGTTTAC")
[1] FALSE
> f("AGT")
[1] FALSE
> f("ATCGCGAT")
[1] TRUE

1

C, 71 bytes

Requires ASCII codes for the relevant characters, but accepts uppercase, lowercase or mixed-case input.

f(char*s){char*p=s+strlen(s),b=0;for(;*s;b&=6)b|=*--p^*s++^4;return!b;}

This code maintains two pointers, s and p, traversing the string in opposite directions. At each step, we compare the corresponding characters, setting b true if they don't match. The matching is based on XOR of the character values:

'A' ^ 'T' = 10101
'C' ^ 'G' = 00100

'C' ^ 'T' = 10111
'G' ^ 'A' = 00110
'A' ^ 'C' = 00010
'T' ^ 'G' = 10011
 x  ^  x  = 00000

We can see in the above table that we want to record success for xx10x and failure for anything else, so we XOR with 00100 (four) and mask with 00110 (six) to get zero for AT or CG and non-zero otherwise. Finally, we return true if all the pairs accumulated a zero result in b, false otherwise.
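The XOR-and-mask test can be replayed in Python to confirm the table. This is an illustrative sketch (the name `c_style` is made up), following the same accumulate-then-mask order as the C loop.

```python
def c_style(dna: str) -> bool:
    b = 0
    for left, right in zip(dna, reversed(dna)):
        b |= ord(left) ^ ord(right) ^ 4  # flip bit 2, which is set for valid pairs
        b &= 6                           # keep only bits 1 and 2
    return b == 0                        # zero means every pair matched


print(format(ord("A") ^ ord("T"), "05b"))  # 10101
print(format(ord("C") ^ ord("G"), "05b"))  # 00100
print(c_style("GCGC"))                     # True
print(c_style("AtCgCgAt"))                 # True, the case bit is masked out
print(c_style("ACTG"))                     # False
```

The mask 6 discards bit 5 (the ASCII case bit), which is why mixed-case input is accepted.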

Test program:

#include <stdio.h>
int main(int argc, char **argv)
{
    while (*++argv)
        printf("%s = %s\n", *argv, f(*argv)?"true":"false");
}

Licensed under cc by-sa 3.0 with attribution required.