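Your task, should you choose to accept it, is to write a program that outputs its own source code in its binary UTF-8 representation.

Rules

The source must be at least 1 byte long.
Your program must not take input (or must have an unused, empty input).
The output may be in any convenient format.
An optional trailing newline is allowed.
Note that one byte is 8 bits, so the length of the binary UTF-8 representation is necessarily a multiple of 8.
This is code-golf, so all the usual golfing rules apply, and the shortest code (in bytes) wins.
Standard loopholes are forbidden.

Example

Let's say your source code is Aä$$€h; its corresponding UTF-8 binary representation is 010000011100001110100100001001000010010011100010100000101010110001101000.

If I run Aä$$€h, the output must be 010000011100001110100100001001000010010011100010100000101010110001101000.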

A      --> 01000001
ä      --> 1100001110100100
$      --> 00100100
$      --> 00100100
€      --> 111000101000001010101100
h      --> 01101000
Aä$$€h --> 010000011100001110100100001001000010010011100010100000101010110001101000
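
For readers who want to check candidate outputs, here is a minimal Python sketch of the conversion shown in the table above. It is not part of the challenge; the function name `to_binary_utf8` is an assumption made for illustration.

```python
def to_binary_utf8(text: str) -> str:
    """Return the UTF-8 bytes of `text` as one string of 0s and 1s."""
    # Each byte is formatted as exactly 8 binary digits, most significant first.
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


if __name__ == "__main__":
    bits = to_binary_utf8("Aä$$€h")
    print(bits)
    # Every byte contributes 8 bits, so the length is a multiple of 8.
    print(len(bits) % 8 == 0)
```

Because every byte is padded to 8 digits, the rule about the length being a multiple of 8 holds automatically.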

String to binary UTF-8 converter

code-golf quine binary

— mdahmoune
source

1

By "binary", do you mean a string representation of the binary values, i.e. a string consisting only of 1s and 0s?

1

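@mdahmoune Now that's already much better. The question remains how to represent the source as UTF-8. Note that Unicode representation is mainly based on how a character looks (and only occasionally on its semantic meaning). What if no assigned Unicode glyph looks like a character in the source code? Unicode also has many look-alikes (homoglyphs). How does one decide which one to use? E.g. Dyalog APL has an AND function which may be encoded as 01011110 or 0010011100100010 in UTF-8 (they look pretty alike: ^ vs ∧)

— Adám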

1

Better example: 01111100 and 0010001100100010 encode | and ∣.

— Adám

4

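@Adám I think it would be fair to output any binary sequence that corresponds to a symbol that will compile/run in a certain implementation of a language.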

— qwr

1

How about machine code? (The Commodore C64 takes 28 bytes, assuming the machine code itself is the "source".)

— Martin Rosenau

7

V, 28 (or 16?) Latin 1 bytes (35 UTF-8 bytes)

ñéÑ~"qpx!!xxd -b
ÎdW54|D
Íßó

Try it online!

Hexdump (in Latin 1):

00000000: f1e9 d17e 2271 7078 2121 7878 6420 2d62  ...~"qpx!!xxd -b
00000010: 0ace 6457 3534 7c44 0acd dff3            ..dW54|D....

Output (binary representation of the same code in UTF-8, not Latin 1):

110000111011000111000011101010011100001110010001011111100010001001110001011100000111100000100001001000010111100001111000011001000010000000101101011000100000110111000011100011100110010001010111001101010011010001111100010001000000110111000011100011011100001110011111110000111011001100001010

Explanation:

ñéÑ~"qpx            " Standard quine. Anything after this doesn't affect the
                    " program's 'quine-ness' unless it modifies text in the buffer
        !!xxd -b    " Run xxd in binary mode on the text
Î                   " On every line...
 dW                 "   delete a WORD
   54|              "   Go to the 54'th character on this line
      D             "   And delete everything after the cursor
Í                   " Remove on every line...
  ó                 "   Any whitespace
 ß                  "   Including newlines
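
The post-processing that the explanation above applies to the `xxd -b` output (drop the offset word, keep the bit columns, remove whitespace) can be sketched in Python. This is only an illustration of the column arithmetic, not the V program itself; both function names are hypothetical, and `fake_xxd_b` merely imitates the 6-bytes-per-line layout shown in this answer rather than calling the real tool.

```python
def fake_xxd_b(data: bytes) -> list:
    """Hypothetical helper: lay bytes out roughly like `xxd -b` (6 per line)."""
    lines = []
    for offset in range(0, len(data), 6):
        chunk = data[offset:offset + 6]
        groups = " ".join(format(b, "08b") for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # 6 groups of 8 bits plus 5 separating spaces occupy 53 columns.
        lines.append(f"{offset:08x}: {groups.ljust(53)}  {text}")
    return lines


def strip_xxd_binary(lines) -> str:
    """Reduce `xxd -b` style lines to a bare string of bits."""
    parts = []
    for line in lines:
        # Like `dW`: drop the leading offset word and the space after it.
        _, _, rest = line.partition(" ")
        # Like `54|D`: keep columns 1..53, discard the ASCII column.
        bit_columns = rest[:53]
        # Like the final substitution: remove all remaining whitespace.
        parts.append("".join(bit_columns.split()))
    return "".join(parts)


if __name__ == "__main__":
    source = 'ñéÑ~"qpx!!xxd -b\n'.encode("utf-8")
    print(strip_xxd_binary(fake_xxd_b(source)))
```

Cutting by column rather than by content matters because the ASCII column on the right may itself contain the characters 0 and 1.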

Or...

V, 16 bytes

ñéÑ~"qpx!!xxd -b

Try it online!

Output:

00000000: 11000011 10110001 11000011 10101001 11000011 10010001  ......
00000006: 01111110 00100010 01110001 01110000 01111000 00100001  ~"qpx!
0000000c: 00100001 01111000 01111000 01100100 00100000 00101101  !xxd -
00000012: 01100010 00001010                                      b.

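OP said:

The output may be in any convenient format.

This outputs in a much more convenient format for V :P (but I'm not sure whether that's stretching the rules)

— James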
source

6

CJam, 20 bytes

{s"_~"+{i2b8Te[}%}_~

Try it online!

Surprised to see CJam winning! We'll see how long that lasts...

— 硕果累累
source

4

05AB1E, 105 bytes

0"D34çýÇbεDg•Xó•18в@ƶà©i7j0ìëR6ôRíć7®-jšTìJ1®<×ì]ð0:J"D34çýÇbεDg•Xó•18в@ƶà©i7j0ìëR6ôRíć7®-jšTìJ1®<×ì]ð0:J

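05AB1E has no UTF-8 conversion builtins, so I have to do everything manually.

Try it online or verify that it's a quine.

Explanation:

Quine part:

The shortest quine for 05AB1E is this one: 0"D34çý"D34çý (14 bytes), provided by @OliverNi. My answer uses a modified version of that quine, with the extra code added at the ... here: 0"D34çý..."D34çý.... A short explanation of this quine: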

0               # Push a 0 to the stack (can be any digit)
 "D34çý"        # Push the string "D34çý" to the stack
        D       # Duplicate this string
         34ç    # Push 34 converted to an ASCII character to the stack: '"'
            ý   # Join everything on the stack (the 0 and both strings) by '"'
                # (output the result implicitly)
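
The stack steps listed above can be mimicked in Python to see why the result reproduces the source. This is only a sketch that models the 05AB1E stack as a Python list; the function name is an assumption, and it does not run 05AB1E code.

```python
def run_quine_sketch() -> str:
    stack = []
    stack.append("0")         # 0        : push a digit
    stack.append("D34çý")     # "D34çý"  : push the string literal
    stack.append(stack[-1])   # D        : duplicate the top of the stack
    quote = chr(34)           # 34ç      : character with code 34, i.e. '"'
    return quote.join(stack)  # ý        : join the whole stack by that character


if __name__ == "__main__":
    print(run_quine_sketch())
```

The joined result is the digit, a quote, the string, a quote, and the string again, which is exactly the layout of the program text.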

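Challenge part:

Now for the challenge part of the code. As I mentioned at the top, 05AB1E has no UTF-8 conversion builtins, so I have to do these things manually. I used this source as a reference on how to do that: Manually converting Unicode code points into UTF-8 and UTF-16. Here is a short summary of that regarding the conversion of Unicode characters to UTF-8:

Convert the Unicode characters to their Unicode values (i.e. "dЖ丽" becomes [100,1046,20029])
Convert these Unicode values to binary (i.e. [100,1046,20029] becomes ["1100100","10000010110","100111000111101"])
Check in which of the following ranges the character lies:
1. 0x00000000 - 0x0000007F (0-127): 0xxxxxxx
2. 0x00000080 - 0x000007FF (128-2047): 110xxxxx 10xxxxxx
3. 0x00000800 - 0x0000FFFF (2048-65535): 1110xxxx 10xxxxxx 10xxxxxx
4. 0x00010000 - 0x001FFFFF (65536-2097151): 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

There are also ranges for 5 or 6 bytes, but let's leave them out for now.

The character d is in the first range, so it takes 1 byte in UTF-8; the character Ж is in the second range, so 2 bytes in UTF-8; and the character 丽 is in the third range, so 3 bytes in UTF-8.

The x in the pattern are then filled with the binary of these characters, from right to left. So the d (1100100) with pattern 0xxxxxxx becomes 01100100; the Ж (10000010110) with pattern 110xxxxx 10xxxxxx becomes 11010000 10010110; and the 丽 (100111000111101) with pattern 1110xxxx 10xxxxxx 10xxxxxx becomes 1110x100 10111000 10111101, after which the remaining x are replaced with 0: 11100100 10111000 10111101.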
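
The range lookup and right-to-left fill described above can be sketched in Python as follows. This is an illustrative reimplementation under the four patterns listed, not the 05AB1E code; `PATTERNS` and `encode_code_point` are names introduced here.

```python
# (upper bound of the range, bit pattern) for the 1- to 4-byte forms.
PATTERNS = [
    (0x7F, "0xxxxxxx"),
    (0x7FF, "110xxxxx10xxxxxx"),
    (0xFFFF, "1110xxxx10xxxxxx10xxxxxx"),
    (0x1FFFFF, "11110xxx10xxxxxx10xxxxxx10xxxxxx"),
]


def encode_code_point(cp: int) -> str:
    """Return the UTF-8 bit string for one code point, filled by hand."""
    pattern = next(p for limit, p in PATTERNS if cp <= limit)
    bits = list(format(cp, "b"))       # binary digits, most significant first
    out = []
    for ch in reversed(pattern):       # walk the pattern from right to left
        if ch == "x":
            # Take the next least significant digit; pad with 0 when exhausted.
            out.append(bits.pop() if bits else "0")
        else:
            out.append(ch)             # fixed marker bit of the pattern
    return "".join(reversed(out))


if __name__ == "__main__":
    for char in "dЖ丽":
        print(char, encode_code_point(ord(char)))
```

Comparing the result with Python's own `str.encode("utf-8")` is a quick way to confirm the hand-filled bits for any sample character.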

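So I use this approach in my code as well. Instead of checking the actual ranges, however, I just look at the length of the binary and compare it to the number of x in the patterns, since that saves a few bytes.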

Ç               # Convert each character in the string to its unicode value
 b              # Convert each value to binary
  ε             # Map over these binary strings:
   Dg           #  Duplicate the string, and get its length
     •Xó•       #  Push compressed integer 8657
         18в    #  Converted to Base-18 as list: [1,8,12,17]
            @   #  Check for each if the length is >= to this value
                #  (1 if truthy; 0 if falsey)
   ƶ            #  Multiply each by their 1-based index
    à           #  Pop and get its maximum
     ©          #  Store it in the register (without popping)
   i            #  If it is exactly 1 (first range):
    7j          #   Add leading spaces to the binary to make it of length 7
      0ì        #   And prepend a "0"
   ë            #  Else (any of the other ranges):
    R           #   Reverse the binary
     6ô         #   Split it into parts of size 6
       Rí       #   Reverse it (and each individual part) back
    ć           #   Pop, and push the remainder and the head separated to the stack
     7®-        #   Calculate 7 minus the value from the register
        j       #   Add leading spaces to the head binary to make it of that length
         š      #   Add it at the start of the remainder-list again
    Tì          #   Prepend "10" before each part
      J         #   Join the list together
    1®<×        #   Repeat "1" the value from the register - 1 amount of times
        ì       #   Prepend that at the front
  ]             # Close both the if-else statement and map
   ð0:          # Replace all spaces with "0"
      J         # And join all modified binary strings together
                # (which is output implicitly - with trailing newline)
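
The byte-count step of the listing above (the base-18 list, the >= check, the multiplication by index, and the maximum) can be mirrored in Python. This sketch covers only that step and is not a transliteration of the full program; `to_base` and `utf8_length` are hypothetical names.

```python
def to_base(n: int, base: int) -> list:
    """Digits of n in the given base, most significant first."""
    digits = []
    while n:
        n, digit = divmod(n, base)
        digits.append(digit)
    return digits[::-1] or [0]


# 8657 written in base 18 gives the minimum bit lengths for 1..4 bytes.
THRESHOLDS = to_base(8657, 18)


def utf8_length(cp: int) -> int:
    """Number of UTF-8 bytes, decided from the bit length alone."""
    bit_len = len(format(cp, "b"))                    # "0" has length 1
    flags = [bit_len >= t for t in THRESHOLDS]        # like @
    # Like ƶ then à: multiply by the 1-based index and take the maximum.
    return max(flag * index for index, flag in enumerate(flags, start=1))


if __name__ == "__main__":
    print(THRESHOLDS)
    print([utf8_length(ord(c)) for c in "dЖ丽"])
```

The thresholds work because the 1- to 4-byte patterns hold 7, 11, 16 and 21 payload bits, so a value needs the next form as soon as its binary has 8, 12 or 17 digits.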

See this 05AB1E answer of mine (sections How to compress large integers? and How to compress integer lists?) to understand why •Xó•18в is [1,8,12,17].

— Kevin Cruijssen
source

3

JavaScript (Node.js), 60 bytes

-15 bytes from @Neil and @Shaggy

f=_=>[...Buffer(`f=`+f)].map(x=>x.toString(2).padStart(8,0))

Try it online!

— Luis felipe De jesus Munoz
source

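padStart(8,0) saves 2 bytes.

— Neil

The spec allows output in any convenient format, so you could keep the map and ditch the join to output an array of bits

— Shaggy

60 bytes with output as an array of bytes.

— Shaggy

Thanks @Neil and @Shaggy!!

— Luis felipe De jesus Munoz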

2

Rust, 187 bytes

fn f(o:u8){for c in b"go!g)n;t9(zgns!b!ho!c#%#/huds)(zhg!b_n <27zqshou )#z;19c|#-b_n(:|dmrdzg)1(:|||go!l`ho)(zg)0(:|".iter(){if c^o!=36{print!("{:08b}",c^o);}else{f(0);}}}fn main(){f(1);}

Try it online!

— NieDzejkob
source

2

Perl 6, 46 bytes

<say "<$_>~~.EVAL".ords.fmt("%08b",'')>~~.EVAL

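Try it online!

The standard quine, with .fmt("%08b",'') formatting the list of ordinal values as binary strings of length 8 and joining them with an empty string.

— Jo King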
source

2

Perl 5, 42 bytes

$_=q(say unpack'B*',"\$_=q($_);eval");eval

TIO

— Nahuel Fouilleul
source

2

Java 10, 339 308 265 227 225 186 184 bytes

v->{var s="v->{var s=%c%s%1$c;return 0+new java.math.BigInteger(s.format(s,34,s).getBytes()).toString(2);}";return 0+new java.math.BigInteger(s.format(s,34,s).getBytes()).toString(2);}

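-8 bytes thanks to @NahuelFouilleul, who removed the unnecessary &255 (and an additional -35 for pointing out that the full-program requirement of the challenge was revoked and a function is now allowed as well).
-41 bytes thanks to @OlivierGrégoire.

Try it online.

Explanation:

Quine part:

var s contains the unformatted source code String
%s is used to put this String into itself with s.format(...)
%c, %1$c and 34 are used to format the double quotes (")
s.format(s,34,s) puts it all together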

Challenge part:

v->{                         //  Method with empty unused parameter and String return-type
  var s="...";               //   Unformatted source code String
  return 0+                  //   Return, with a leading "0":
   new java.math.BigInteger( //    A BigInteger of:
     s.format(s,34,s)        //     The actual source code String
      .getBytes())           //     Converted to a list of bytes (UTF-8 by default)
   .toString(2);}            //    And convert this BigInteger to a binary-String
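
To see why a single leading "0" has to be prepended to the big-integer output, here is a small Python sketch of the same idea. It uses Python's arbitrary-precision `int` in place of `java.math.BigInteger`, which is an assumption made for illustration; the function names are hypothetical.

```python
def bits_via_big_integer(data: bytes) -> str:
    # Treat all bytes as one big-endian number and print it in base 2.
    # As with any number, leading zero bits are not printed.
    return format(int.from_bytes(data, "big"), "b")


def bits_per_byte(data: bytes) -> str:
    # Reference result: every byte padded to exactly 8 binary digits.
    return "".join(format(b, "08b") for b in data)


if __name__ == "__main__":
    source_start = "v->{var s".encode("utf-8")
    # 'v' is 0x76 = 01110110, so exactly one leading zero bit gets lost.
    print("0" + bits_via_big_integer(source_start) == bits_per_byte(source_start))
```

Only the first byte loses digits, because zero bits inside a number are always printed. The fix of adding one "0" relies on the source starting with a byte between 0x40 and 0x7F, which holds for v.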

— Kevin Cruijssen
source

1

265 bytes using a lambda; also, because all of the source is ASCII, it seems the unsigned conversion c&255 is not needed

— Nahuel Fouilleul

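@NahuelFouilleul The original question stated "You must build a full program." and "Your output has to be printed to STDOUT.", hence the verbose boilerplate code I had instead of a lambda function returning a String. Good point about not needing &255 since we don't use any non-ASCII characters, though, thanks!

— Kevin Cruijssen

OK, I'm not very familiar with the conventions yet, but other languages like JavaScript submit a lambda returning a string. I also don't understand why, in Java, we don't count the type and the final semicolon when using a lambda. Where can I find the rules?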

— Nahuel Fouilleul

1

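Well, that's where I'm lost. However, I tried, and here's a new candidate at 184 bytes. Tell me if I'm wrong somewhere ;)

— Olivier Grégoire

1

@OlivierGrégoire Ah, nice approach! I completely forgot that BigInteger is pretty short for converting to binary Strings. And 2 more bytes saved by changing return'0'+ to return 0+. Hmm, why is that leading 0 necessary, by the way? It confuses me that all the inner binary Strings have this leading 0, but the very first one doesn't when using BigInteger.toString(2)..

— Kevin Cruijssen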

2

Python 2, 68 67 bytes

_="print''.join(bin(256|ord(i))[3:]for i in'_=%r;exec _'%_)";exec _

Try it online!

A modification of this answer

-1 byte by removing the space after 'in' (thanks @mdahmoune)
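
The `bin(256|ord(i))[3:]` idiom used in the answer can be checked in isolation. The following is a Python 3 sketch (the answer itself is Python 2), and `byte_bits` is a name introduced here for illustration.

```python
def byte_bits(n: int) -> str:
    # 256 | n sets bit 8, so bin() returns "0b1" followed by exactly
    # 8 digits; dropping the first three characters leaves the padded byte.
    return bin(256 | n)[3:]


if __name__ == "__main__":
    print(byte_bits(ord("_")))
    # The idiom agrees with the format mini-language for every byte value.
    print(all(byte_bits(n) == format(n, "08b") for n in range(256)))
    # printf-style % formatting has no binary conversion, so this fails.
    try:
        "%08b" % 95
    except ValueError as exc:
        print("ValueError:", exc)
```

The trick only holds for values below 256, which is sufficient here because the source is pure ASCII.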

— MilkyWay90
source

-1 byte: you can remove the space after in

— mdahmoune

You haven't updated the TIO link. Also, I tried to use '%08b'%ord(i) instead of bin(256|ord(i))[3:], but it didn't work for some reason

— Jo King

2

R, 138 114 bytes

x=function(){rev(rawToBits(rev(charToRaw(sprintf("x=%s;x()",gsub("\\s","",paste(deparse(x),collapse="")))))))};x()

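Try it online!

Uses R's ability to deparse functions to their character representation. The revs are needed because rawToBits puts the least significant bit first. as.integer is needed because otherwise the bits are displayed with a leading zero.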
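
The double reversal can be illustrated with a Python sketch that imitates the bit order described above (least significant bit first within each byte). The functions are stand-ins written for this illustration, not R's actual implementation.

```python
def raw_to_bits_lsb_first(data: bytes) -> list:
    """Imitate the described rawToBits order: per byte, LSB first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def msb_first_bits(data: bytes) -> list:
    # Same shape as rev(rawToBits(rev(bytes))): reverse the bytes, expand
    # each one LSB-first, then reverse the whole bit vector.
    return raw_to_bits_lsb_first(data[::-1])[::-1]


if __name__ == "__main__":
    sample = b"x="
    print("".join(map(str, raw_to_bits_lsb_first(sample))))  # wrong bit order
    print("".join(map(str, msb_first_bits(sample))))         # conventional order
```

Reversing the full bit vector fixes the order inside each byte but also flips the byte order, which is why the bytes are reversed once beforehand.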

Edited once I realised that any convenient output was allowed. The original byte count was also off by one.

— Nick Kennedy
source

1

C# (Visual C# Interactive Compiler), 221 bytes

var s="var s={0}{1}{0};Write(string.Concat(string.Format(s,(char)34,s).Select(z=>Convert.ToString(z,2).PadLeft(8,'0'))));";Write(string.Concat(string.Format(s,(char)34,s).Select(z=>Convert.ToString(z,2).PadLeft(8,'0'))));

Try it online!

C# (Visual C# Interactive Compiler) with flag `/u:System.String`, 193 bytes

var s="var s={0}{1}{0};Write(Concat(Format(s,(char)34,s).Select(z=>Convert.ToString(z,2).PadLeft(8,'0'))));";Write(Concat(Format(s,(char)34,s).Select(z=>Convert.ToString(z,2).PadLeft(8,'0'))));

Try it online!

— Embodiment of Ignorance
source

1

Bash + GNU tools, 48 bytes

trap -- 'trap|xxd -b|cut -b9-64|tr -dc 01' EXIT

TIO

— Nahuel Fouilleul
source

Thanks, updated. It is indeed the shortest variation; otherwise it would have to be removed from the trap output

— Nahuel Fouilleul
