在韩文两组键盘和qwerty键盘之间转换


14

介绍

它有点像DVORAK键盘布局,但要难得多。

让我们先谈谈韩文键盘。正如您在Wikipedia中看到的那样,有一个Kor / Eng键可以在韩文和英文键集之间进行切换。

韩国人有时输入错误:他们试图在qwerty键盘上用韩语书写,或者在两组键盘上用英语书写。

因此,这就是问题所在:如果给定在两组键盘中输入的韩语字符,请将其转换为在qwerty键盘中键入的字母字符。如果使用qwerty输入了给定的字母字符,请将其更改为两组键盘。

两件套键盘

这是两套键盘布局:

ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔ
 ㅁㄴㅇㄹㅎㅗㅓㅏㅣ
  ㅋㅌㅊㅍㅠㅜㅡ

并使用shift键:

ㅃㅉㄸㄲㅆㅛㅕㅑㅒㅖ

仅第一行发生变化,而其他行则保持不变。

关于韩文字符

如果到此结束,这可能很容易,但是没有。当您键入

dkssud, tprP!

输出不会以这种方式显示:

ㅇㅏㄴㄴㅕㅇ, ㅅㅔㄱㅖ!

但是这样:

안녕, 세계!(means Hello, World!)

这使事情变得更加困难。

朝鲜语字符分为三部分:“宗城(辅音)”,“正城(元音)”和“正城(音节结尾的辅音:可以为空白)”,您必须将其分开。

幸运的是,有办法做到这一点。

如何分开

朝鲜有19个成城,21个成城和28个成城(带空白),0xAC00是朝鲜语的第一个字符“가”。使用此功能,我们可以将朝鲜语字符分为三个部分。这是两个键盘中每个键盘的顺序及其位置。

选择顺序:

ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ
r R s e E f a q Q t T d w W c z x v g

se城订单:

ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ
k o i O j p u P h hk ho hl y n nj np nl b m ml l

钟城顺序:

()ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ
()r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g

让我们说(unicode value of some character) - 0xAC00Korean_code,并CHOSEONG,Jungseong的指数,Jongseong是ChoJungJong

Korean_code(Cho * 21 * 28) + Jung * 28 + Jong

为方便起见以下是将韩文字符与该韩文网站分开的JavaScript代码。

var rCho = [ "ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ" ];
var rJung =[ "ㅏ", "ㅐ", "ㅑ", "ㅒ", "ㅓ", "ㅔ", "ㅕ", "ㅖ", "ㅗ", "ㅘ", "ㅙ", "ㅚ", "ㅛ", "ㅜ", "ㅝ", "ㅞ", "ㅟ", "ㅠ", "ㅡ", "ㅢ", "ㅣ" ];
var rJong = [ "", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ","ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ" ];
var cho, jung, jong;
var sTest = "탱";
var nTmp = sTest.charCodeAt(0) - 0xAC00;
jong = nTmp % 28; // Jeongseong
jung = ((nTmp - jong) / 28 ) % 21 // Jungseong
cho = ( ( (nTmp - jong) / 28 ) - jung ) / 21 // Choseong

alert("Choseong:" + rCho[cho] + "\n" + "Jungseong:" + rJung[jung] + "\n" + "Jongseong:" + rJong[jong]);

组装时

  1. 需要注意的是是其他jungseongs的组合。
ㅗ+ㅏ=ㅘ, ㅗ+ㅐ=ㅙ, ㅗ+ㅣ=ㅚ, ㅜ+ㅓ=ㅝ, ㅜ+ㅔ=ㅞ, ㅜ+ㅣ=ㅟ, ㅡ+ㅣ=ㅢ
  1. 车城是必须的 这意味着,如果frk给出的话ㄹㄱㅏ,它可以以两种方式改变:ㄺㅏㄹ가。然后,您必须将其转换为选择的方式。如果jjjrjr给定的话,则ㅓㅓㅓㄱㅓㄱ领导s没有可以选择的任何东西,但是第四个具有可以选择的东西,因此将其更改为ㅓㅓㅓ걱

又如:세계tprP)。可以将其更改为섹ㅖ(ㅅㅔㄱ)(ㅖ)),但是由于选择是必需的,因此将其更改为세계(ㅅㅔ)(ㄱㅖ))。

例子

输入1

안녕하세요

输出1

dkssudgktpdy

输入2

input 2

输出2

ㅑㅞㅕㅅ 2

输入3

힘ㄴㄴ

输出3

glass

输入4

아희(Aheui) is esolang which you can program with pure Korean characters.

输出4

dkgml(모뎌ㅑ) ㅑㄴ ㄷ내ㅣ뭏 조ㅑ초 ㅛㅐㅕ ㅊ무 ㅔ갷ㄱ므 쟈소 ㅔㅕㄱㄷ ㅏㅐㄱㄷ무 촘ㄱㅁㅊㅅㄷㄱㄴ.

输入5

dkssud, tprP!

输出5

안녕, 세계!

输入6

ㅗ디ㅣㅐ, 째깅! Hello, World!

输出6

hello, World! ㅗ디ㅣㅐ, 째깅!

最短的代码获胜。(以字节为单位)

方便您的新规则

您可以关闭A两组键盘中没有对应字符的字符。所以AheuiAㅗ뎌ㅑ正常。但是,如果更改Aheui모뎌ㅑ,则可以得到-5点,因此您可以赚取5个字节。

您可以分隔两个jungseongs(喜欢ㅗ+ㅏ)。像rhk고ㅏ,或者howㅗㅐㅈ。但是,如果你把它(如rhkhowㅙㅈ),你可以赚更多的-5点。


jungseong命令部分,缺少字母之一。我看到21个韩文符号,但只有20个字母(对)。编辑:似乎缺少一个试运行l后,ml为韩国符号
凯文·克鲁伊森

@KevinCruijssen编辑。l为ㅣ。
LegenDUST

1
有时可能会有不止一种解释。例如,fjfau可以解释为럶ㅕ럴며。我们如何解决这个问题?
尼克·肯尼迪

1
@LegenDUST嗯,我看不懂韩语,所以我不得不跟你解释。; p关于tprP测试用例5:转换为ㅅㅔㄱㅖ,其中一个choiceong,一个jungseong和一个jongseong。那么,是否应该将其转换为섷ㅖ(分组为(ㅅㅔㄱ)(ㅖ))而不是세계(分组为(ㅅㅔ)(ㄱㅖ))?在前面的评论中,您声明它是通过键入来解释的,因此我希望将ㅅㅔㄱ其转换为。还是朝鲜语是从右到左而不是从左到右打字?
凯文·克鲁伊森

1
来自Unicode.org的@KevinCruijssen PDF文件。AC00()至D7AF()。
LegenDUST

Answers:


6

果冻296264字节

Ẏœṣjƭƒ
“ȮdȥŒ~ṙ7Ṗ:4Ȧịعʂ ="÷Ƥi-ẓdµ£f§ñỌ¥ẋaḣc~Ṡd1ÄḅQ¥_æ>VÑʠ|⁵Ċ³(Ė8ịẋs|Ṇdɼ⁼:Œẓİ,ḃṙɠX’ṃØẠs2ḟ€”A
“|zƒẉ“®6ẎẈ3°Ɠ“⁸)Ƙ¿’ḃ2’T€ị¢
¢ĖẈṪ$ÞṚƊ€
3£OŻ€3¦ŒpFḟ0Ɗ€J+“Ḥœ’,ƲyO2£OJ+⁽.[,Ʋ¤y¹ỌŒḊ?€µ¢ṖŒpZF€’ḋ588,28+“Ḥþ’Ʋ0;,ʋ/ṚƲ€ñṣ0ḊḢ+®Ṫ¤Ɗ;ṫ®$Ɗ¹Ḋ;⁶Ṫ⁼ṁ@¥¥Ƈ@¢ṪẈṪ‘;Ʋ€¤ḢƲ©?€ṭḢƲF2£żJ+⁽.[Ɗ$ẈṪ$ÞṚ¤ñỌ

在线尝试!

以字符串为参数并返回字符串(隐式打印)的完整程序。这可以通过三遍:首先将所有韩语字符转换为拉丁字母的代码点列表。然后,它会识别并构建复合韩文字符。最后,它将所有剩余的流浪拉丁字母转换为与韩国语对应的字母。请注意,规范中未出现的其他字符和拉丁字母(例如A)被单独保留。

如果需要转换为超出规格的小写大写字母,则可以额外花费10个字节来完成。

说明

Helper链接1:带有参数x和y的二元链接。x是成对的搜索和替换子列表的列表。y将用相应的替换子列表替换每个搜索子列表

Ẏ      | Tighten (reduce to a single list of alternating search and replace sublists)
     ƒ | Reduce using y as starting argument and the following link:
    ƭ  | - Alternate between using the following two links:
 œṣ    |   - Split at sublist
   j   |   - Join using sublist

助手链接2:拉丁字符/字符对的列表,其顺序与朝鲜语字符的Unicode顺序相对应

“Ȯ..X’          | Base 250 integer 912...
      ṃØẠ       | Base decompress into Latin letters (A..Za..z)
         s2     | Split into twos
           ḟ€”A | Filter out A from each (used as filler for the single characters)

助手链接3:用于成城,中城和中城的拉丁字符列表

“|...¿’        | List of base 250 integers, [1960852478, 2251799815782398, 2143287262]
       ḃ2      | Convert to bijective base 2
         ’     | Decrease by 1
          T€   | List of indices of true values for each list
            ị¢ | Index into helper link 2

辅助链接4:上面列出的拉丁字符列表按长度从小到大的顺序排列和排序

¢         | Helper link 3 as a nilad
       Ɗ€ | For each list, the following three links as a monad
 Ė        | - Enumerate (i.e. prepend a sequential index starting at 1 to each member of the list)
    $Þ    | - Sort using, as a key, the following two links as a monad
  Ẉ       |   - Lengths of lists
   Ṫ      |   - Tail (this will be the length of the original character or characters)
      Ṛ   | - Reverse

主链接:以果冻字符串作为参数并返回翻译后的果冻字符串的Monad

第1节:将词素块转换为相应拉丁字符的Unicode代码点

第1.1节:获取制作块所需的拉丁字符列表

3£      | Helper link 3 as a nilad (lists of Latin characters used for Choseong, Jungseong and Jongseong)
  O     | Convert to Unicode code points
   Ż€3¦ | Prepend a zero to the third list (Jongseong)

1.2节:创建这些字母的所有组合(19×21×28 = 11,172以适当的词汇顺序组合)

Œp      | Cartesian product
     Ɗ€ | For each combination:
  F     | - Flatten
   ḟ0   | - Filter zero (i.e. combinations with an empty Jonseong)

1.3节:将块的Unicode代码点与相应的拉丁字符列表配对,并使用它们来翻译输入字符串中的语素块

       Ʋ   | Following as a monad
J          | - Sequence from 1..11172
 +“Ḥœ’     | - Add 44031
      ,    | - Pair with the blocks themelves
        y  | Translate the following using this pair of lists
         O | - The input string converted to Unicode code points

第2节:将第1节输出中的各个朝鲜语字符转换为与拉丁语等效的代码点

          ¤  | Following as a nilad
2£           | Helper link 2 (list of Latin characters/character pairs in the order that corresponds to the Unicode order of the Korean characters)
  O          | Convert to Unicode code points
         Ʋ   | Following as a monad:
   J         | - Sequence along these (from 1..51)
    +⁽.[     | - Add 12592
        ,    | - Pair with list of Latin characters
           y | Translate the output from section 1 using this mapping

第3节:在第2节的输出中整理未翻译的字符(之所以有效,因为从韩语翻译的任何内容现在都将在子列表中,因此深度为1)

  ŒḊ?€  | For each member of list if the depth is 1:
¹       | - Keep as is
 Ọ      | Else: convert back from Unicode code points to characters
      µ | Start a new monadic chain using the output from this section as its argument

第4节:将拉丁字符的词素块转换为韩文

第4.1节:获取城城与正城的所有可能组合

¢    | Helper link 4 (lists of Latin characters enumerated and sorted in decreasing order of length)
 Ṗ   | Discard last list (Jongseong)
  Œp | Cartesian product

第4.2节:用基础语素块的Unicode代码点标记每个组合(即没有Jongseong)

                       Ʋ€ | For each Choseong/Jungseong combination
Z                         | - Transpose, so that we now have e.g. [[1,1],["r","k"]]
 F€                       | - Flatten each, joining the strings together
                    ʋ/    | - Reduce using the following as a dyad (effectively using the numbers as left argument and string of Latin characters as right)
                Ʋ         |   - Following links as a monad
   ’                      |     - Decrease by 1
    ḋ588,28               |     - Dot product with 21×28,28
           +“Ḥþ’          |     - Add 44032
                 0;       |     - Prepend zero; used for splitting in section 4.3 before each morphemic block (Ż won’t work because on a single integer it produces a range)
                   ,      |     - Pair with the string of Latin characters
                      Ṛ   |   - Reverse (so we now have e.g. ["rk", 44032]

第4.3节:将第3节输出中的这些拉丁字符字符串替换为基本词素块的Unicode代码点

ñ   | Call helper link 1 (effectively search and replace)
 ṣ0 | Split at the zeros introduced in section 4.2

第4.4节:确定每个语素块中是否有城城

                                        Ʋ | Following as a monad:
Ḋ                                         | - Remove the first sublist (which won’t contain a morphemic block; note this will be restored later)
                                     €    | - For each of the other lists Z returned by the split in section 4.3 (i.e. each will have a morphemic block at the beginning):
                                  Ʋ©?     |   - If the following is true (capturing its value in the register in the process) 
             Ḋ                            |     - Remove first item (i.e. the Unicode code point for the base morphemic block introduced in section 4.3)
              ;⁶                          |     - Append a space (avoids ending up with an empty list if there is nothing after the morphemic block code point)
                                          |       (Output from the above will be referred to as X below)
                                ¤         |       * Following as a nilad (call this Y):
                        ¢                 |         * Helper link 4
                         Ṫ                |         * Jongseong
                              Ʋ€          |         * For each Jongseong Latin list:
                          Ẉ               |           * Lengths of lists
                           Ṫ              |           * Tail (i.e. length of Latin character string)
                            ‘             |           * Increase by 1
                             ;            |           * Prepend this (e.g. [1, 1, "r"]
                     ¥Ƈ@                  |     - Filter Y using X from above and the following criteria
                Ṫ                         |       - Tail (i.e. the Latin characters for the relevant Jongseong
                 ⁼ṁ@¥                     |       - is equal to the beginning of X trimmed to match the relevant Jongseong (or extended but this doesn’t matter since no Jongseong are a double letter)
                                  Ḣ       |       - First matching Jongseong (which since they’re sorted by descending size order will prefer the longer one if there is a matching shorter one)
           Ɗ                              | - Then: do the following as a monad (note this is now using the list Z mentioned much earlier):
      Ɗ                                   |   - Following as a monad
 Ḣ                                        |     - Head (the Unicode code point of the base morphemic block)
  +®Ṫ¤                                    |     - Add the tail of the register (the position of the matched Jongsepng in the list of Jongseong)
       ;                                  |   - Concatenate to:
        ṫ®$                               |     - The rest of the list after removing the Latin characters representing the Jongseong
            ¹                             | - Else: leave the list untouched (no matching Jongseong)
                                       ṭ  | - Prepend:
                                        Ḣ |   - The first sublist from the split that was removed at the beginning of this subsection

第5节:处理与韩文匹配但不属于morphemuc块的拉丁字母

F                   | Flatten
                ¤   | Following as a nilad
 2£                 | - Helper link 2 (Latin characters/pairs of characters in Unicode order of corresponding Korean character)
          $         | - Following as a monad
   ż     Ɗ          |   - zip with following as a monad
    J               |     - Sequence along helper link 2 (1..51)
     +⁽.[           |     - Add 12592
             $Þ     | - Sort using following as key
           Ẉ        |   - Lengths of lists
            Ṫ       |   - Tail (i.e. length of Latin string)
               Ṛ    | - Reverse
                 ñ  | Call helper link 1 (search Latin character strings and replace with Korean code points)
                  Ọ | Finally, convert all Unicode code points back to characters and implicitly output

1
输出是错误的:我放的时候我例外cor,但是它给了cBor。并且它不会更改ccan必须转换为ㅊ무,但转换为c무。而且我还排除了规范中未出现的大字符会取消大写,但这可能没问题。
LegenDUST

@LegenDUST c的问题已修复。我用作A单个字符的第二个字符的占位符,由于某种原因,后一个字符c以形式出现B。可以转换为其他字母的小写字母,但是感觉到已经很困难的挑战变得不必要了。
肯尼迪

我知道这很难。因此,我添加了一条新规则:如果您进行资本缩减,您可以赚取5个字节。但这很好。
LegenDUST

3

的JavaScript(Node.js的)587 582 575 569 557个 554 550 549字节

tww你不知道string.charCodeAt() == string.charCodeAt(0)

s=>s.replace(eval(`/[ㄱ-힣]|${M="(h[kol]?|n[jpl]?|ml?|[bi-puyOP])"}|([${S="rRseEfaqQtTdwWczxvg"}])(${M}((s[wg]|f[raqtxvg]|qt|[${S}])(?!${M}))?)?/g`,L="r,R,rt,s,sw,sg,e,E,f,fr,fa,fq,ft,fx,fv,fg,a,q,Q,qt,t,T,d,w,W,c,z,x,v,g,k,o,i,O,j,p,u,P,h,hk,ho,hl,y,n,nj,np,nl,n,m,ml,l".split`,`,l=L.filter(x=>!/[EQW]/.test(x)),I="indexOf"),(a,E,A,B,C,D)=>a<"~"?E?X(E):A&&C?F(43193+S[I](A)*588+L[I](C)*28+l[I](D)):X(A)+X(C)+X(D):(b=a.charCodeAt()-44032)<0?L[b+31439]||a:S[b/588|0]+L[30+b/28%21|0]+["",...l][b%28],F=String.fromCharCode,X=n=>n?F(L[I](n)+12593):"")

在线尝试!

547如果可以忽略字母和韩文字母以外的字符。

好吧,我花了很长时间努力写这篇文章,但这应该行得通。没有使用韩文的Jamo /音节,因为它们太贵了(每次使用3个字节)。在正则表达式中使用以节省字节。

s=>                                                    // Main Function:
 s.replace(                                            //  Replace all convertible strings:
  eval(
   `/                                                  //   Matching this regex:
    [ㄱ-힣]                                             //   ($0) All Korean jamos and syllables
    |${M="(h[kol]?|n[jpl]?|ml?|[bi-puyOP])"}           //   ($1) Isolated jungseong codes
    |([${S="rRseEfaqQtTdwWczxvg"}])                    //   ($2) Choseong codes (also acts as lookup)
     (                                                 //   ($3) Jungseong and jongseong codes:
      ${M}                                             //   ($4)  Jungseong codes
      (                                                //   ($5)  Jongseong codes:
       (                                               //   ($6)
        s[wg]|f[raqtxvg]|qt                            //          Diagraphs unique to jongseongs
        |[${S}]                                        //          Or jamos usable as choseongs
       ) 
       (?!${M})                                        //         Not linked to the next jungseong
      )?                                               //        Optional to match codes w/o jongseong
     )?                                                //       Optional to match choseong-only codes
   /g`,                                                //   Match all
   L="(...LOOKUP TABLE...)".split`,`,                  //   Lookup table of codes in jamo order
   l=L.filter(x=>!/[EQW]/.test(x)),                    //   Jongseong lookup - only first half is used
   I="indexOf"                                         //   [String|Array].prototype.indexOf
  ),
  (a,E,A,B,C,D)=>                                      //   Using this function:
   a<"~"?                                              //    If the match is code (alphabets):
    E?                                                 //     If isolated jungseongs code:
     X(E)                                              //      Return corresponding jamo
    :A&&C?                                             //     Else if complete syllable code:
     F(43193+S[I](A)*588+L[I](C)*28+l[I](D))           //      Return the corresponding syllable
    :X(A)+X(C)+X(D)                                    //     Else return corresponding jamos joined
   :(b=a.charCodeAt()-44032)<0?                        //    Else if not syllable:
    L[b+31439]||a                                      //     Return code if jamo (if not, ignore)
   :S[b/588|0]+L[30+b/28%21|0]+["",...l][b%28],        //    Else return code for the syllable
  F=String.fromCharCode,                               //   String.fromCharCode
  X=n=>                                                //   Helper function to convert code to jamo
   n?                                                  //    If not undefined:
    F(L[I](n)+12593)                                   //     Return the corresponding jamo
   :""                                                 //    Else return empty string
 )

2

Wolfram语言(数学)405个 401 400字节

c=CharacterRange
p=StringReplace
q=StringReverse
r=Reverse
t=Thread
j=Join
a=j[alphabet@"Korean",4520~c~4546]
x=j[#,r/@#]&@t[a->Characters@"rRseEfaqQtTdwWczxvgkoiOjpuPh"~j~StringSplit@"hk ho hl y n nj np nl b m ml l r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g"]
y=t[""<>r@#&/@Tuples@TakeList[Insert[a,"",41]~p~x~p~x,{19,21,28}]->44032~c~55203]
f=q@p[q@#,#2]&
g=f[#,r/@y]~p~x~f~y&

在线尝试!

略松一口

要在Mathematica中进行测试,只需替换 alphabetAlphabet; 但是,TIO不支持Wolfram Cloud,因此我Alphabet["Korean"]在标头中定义了它。

我们首先将所有韩文音节分解为韩文字母,然后交换拉丁文和韩文字符,然后重新构成音节。


1
测试用例input 2结果ㅑㅜㅔㅕㅅ 2而不是ㅑㅞㅕㅅ 2您的TIO。虽然在解决同样的情况我工作的,因为这两个的jungseong,我的印象是唯一CHOSEONG + jungseong + jongseong或CHOSEONG + jungseong +空将结合下。我要求OP进行验证,为什么会ㅜㅔ变成
凯文·克鲁伊森

@KevinCruijssen n(np)本身就是一名正城
Nick Kennedy

1
对于两个字符的辅音或元音,这似乎无法正常工作。例如fnpfa应为单个字符,但最终以루ㅔㄹㅁ
Nick Kennedy

修复中。它不应该花费太多。
lirtosiast

2

Java 19, 19,1133 1126 1133字节

s->{String r="",k="ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ",K[]=k.split(" "),a="r R s e E f a q Q t T d w W c z x v g k o i O j p u P h hk ho hl y n nj np nl b m ml l r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g";var A=java.util.Arrays.asList(a.split(" "));k=k.replace(" ","");int i,z,y,x=44032;for(var c:s.toCharArray())if(c>=x&c<55204){z=(i=c-x)%28;y=(i=(i-z)/28)%21;s=s.replace(c+r,r+K[0].charAt((i-y)/21)+K[1].charAt(y)+(z>0?K[2].charAt(z-1):r));}for(var c:s.split(r))r+=c.charAt(0)<33?c:(i=k.indexOf(c))<0?(i=A.indexOf(c))<0?c:k.charAt(i):A.get(i);for(i=r.length()-1;i-->0;r=z>0?r.substring(0,i)+(char)(K[0].indexOf(r.charAt(i))*588+K[1].indexOf(r.charAt(i+1))*28+((z=K[2].indexOf(r.charAt(i+2)))<0?0:z+1)+x)+r.substring(z<0?i+2:i+3):r)for(z=y=2;y-->0;)z&=K[y].contains(r.charAt(i+y)+"")?2:0;for(var p:"ㅗㅏㅘㅗㅐㅙㅗㅣㅚㅜㅓㅝㅜㅔㅞㅜㅣㅟㅡㅣㅢ".split("(?<=\\G...)"))r=r.replace(p.substring(0,2),p.substring(2));return r;}

ASDFGHJKLZXCVBNM自大写字母以来的输出不变.toLowerCase()成本高于-5奖金。

返回+7个字节,作为unicode值20,000以上的非韩语字符的错误修复(感谢@NickKennedy注意)。

在线尝试。

说明:

s->{                         // Method with String as both parameter and return-type
  String r="",               //  Result-String, starting empty
         k="ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ",
                             //  String containing the Korean characters
         K[]=k.split(" "),   //  Array containing the three character-categories
         a="r R s e E f a q Q t T d w W c z x v g k o i O j p u P h hk ho hl y n nj np nl b m ml l r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g"; 
                             //  String containing the English characters
  var A=java.util.Arrays.asList(a.split(" "));
                             //  List containing the English character-groups
  k=k.replace(" ","");       //  Remove the spaces from the Korean String
  int i,z,y,                 //  Temp integers
      x=44032;               //  Integer for 0xAC00
  for(var c:s.toCharArray()) //  Loop over the characters of the input:
    if(c>=x&c<55204){        //   If the unicode value is in the range [44032,55203]
                             //   (so a Korean combination character):
      z=(i=c-x)%28;          //    Set `i` to this unicode value - 0xAC00,
                             //    And then `z` to `i` modulo-28
      y=(i=(i-z)/28)%21;     //    Then set `i` to `i`-`z` integer divided by 28
                             //    And then `y` to `i` modulo-21
      s=s.replace(c+r,       //    Replace the current non-Korean character with:
        r+K[0].charAt((i-y)/21)
                             //     The corresponding choseong
         +K[1].charAt(y)     //     Appended with jungseong
         +(z>0?K[2].charAt(z-1):r));}
                             //     Appended with jongseong if necessary
  for(var c:s.split(r))      //  Then loop over the characters of the modified String:
    r+=                      //   Append to the result-String:
       c.charAt(0)<33?       //    If the character is a space:
        c                    //     Simply append that space
       :(i=k.indexOf(c))<0?  //    Else-if the character is NOT a Korean character:
         (i=A.indexOf(c))<0? //     If the character is NOT in the English group List:
          c                  //      Simply append that character
         :                   //     Else:
          k.charAt(i)        //      Append the corresponding Korean character
       :                     //    Else:
        A.get(i);            //     Append the corresponding letter
  for(i=r.length()-1;i-->0   //  Then loop `i` in the range (result-length - 2, 0]:
      ;                      //    After every iteration:
       r=z>0?                //     If a group of Korean characters can be merged:
          r.substring(0,i)   //      Leave the leading part of the result unchanged
          +(char)(K[0].indexOf(r.charAt(i))
                             //      Get the index of the first Korean character,
                   *588      //      multiplied by 588
                  +K[1].indexOf(r.charAt(i+1))
                             //      Get the index of the second Korean character,
                   *28       //      multiplied by 28
                  +((z=K[2].indexOf(r.charAt(i+2)))
                             //      Get the index of the third character
                    <0?      //      And if it's a Korean character in the third group:
                      0:z+1) //       Add that index + 1
                  +x         //      And add 0xAC00
                 )           //      Then convert that integer to a character
          +r.substring(z<0?i+2:i+3) 
                             //      Leave the trailing part of the result unchanged as well
         :                   //     Else (these characters cannot be merged)
          r)                 //      Leave the result the same
     for(z=y=2;              //   Reset `z` to 2
         y-->0;)             //   Inner loop `y` in the range (2, 0]:
       z&=                   //    Bitwise-AND `z` with:
         K[y].contains(      //     If the `y`'th Korean group contains
           r.charAt(i+y)+"")?//     the (`i`+`y`)'th character of the result
          2                  //      Bitwise-AND `z` with 2
         :                   //     Else:
          0;                 //      Bitwise-AND `z` with 0
                             //   (If `z` is still 2 after this inner loop, it means
                             //    Korean characters can be merged)
  for(var p:"ㅗㅏㅘㅗㅐㅙㅗㅣㅚㅜㅓㅝㅜㅔㅞㅜㅣㅟㅡㅣㅢ".split("(?<=\\G...)"))
                             //  Loop over these Korean character per chunk of 3:
    r=r.replace(p.substring(0,2),
                             //   Replace the first 2 characters in this chunk
         p.substring(2));    //   With the third one in the result-String
  return r;}                 //  And finally return the result-String

1
他们是从44032到55203。您已经为起始位置编码了。结局只是44032 + 19×21×28 - 1
尼克·肯尼迪

现在运作良好。以为我已经投票赞成但没有投票赞成,所以就到这里!
肯尼迪
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.