ASCII number is deprecated and does not work correctly with Unicode text. Use id of someCharacter instead:
set charNum to id of "בְּ" -- this returns the ids of 3 characters, because "בְּ" is a composed character
log charNum
set charNum to id of "ב"
log charNum
-->result:
(*1489, 1456, 1468*)
(*1489*)
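The three ids are the Unicode code points of the base letter and its two combining marks. As a rough cross-check outside AppleScript, the following sketch lists the same code points together with their general category. Python is used here only for illustration and is not part of the original answer; the helper name code_points is hypothetical.

```python
import unicodedata

def code_points(text):
    """Return the Unicode code point of every character in text."""
    return [ord(ch) for ch in text]

# bet (U+05D1) followed by sheva (U+05B0) and dagesh (U+05BC)
composed = "\u05d1\u05b0\u05bc"

for ch in composed:
    # category "Lo" is a letter, "Mn" is a nonspacing (combining) mark
    print(ord(ch), unicodedata.category(ch))
```

The two entries reported as Mn are the marks that the solutions below strip out.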
So I don't know of a way to do this in pure AppleScript.
However, you can run a perl command through do shell script:
-- The text may look wrong in this code block, but it will be correct after the script is compiled
set theString to "בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֙הוּ֙ וָבֹ֔הוּ וְחֹ֖שֶׁךְ עַל־פְּנֵ֣י תְה֑וֹם וְר֣וּחַ אֱלֹהִ֔ים מְרַחֶ֖פֶת עַל־פְּנֵ֥י הַמָּֽיִם׃
וַיֹּ֥אמֶר אֱלֹהִ֖ים יְהִ֣י א֑וֹר וַֽיְהִי־אֽוֹר׃
וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָא֖וֹר כִּי־ט֑וֹב וַיַּבְדֵּ֣ל אֱלֹהִ֔ים בֵּ֥ין הָא֖וֹר וּבֵ֥ין הַחֹֽשֶׁךְ׃
וַיִּקְרָ֨א אֱלֹהִ֤ים ׀ לָאוֹר֙ י֔וֹם וְלַחֹ֖שֶׁךְ קָ֣רָא לָ֑יְלָה וַֽיְהִי־עֶ֥רֶב וַֽיְהִי־בֹ֖קֶר י֥וֹם אֶחָֽד׃ (פ)"
return do shell script "perl -CSD -pe 'use utf8; s~\\p{NonspacingMark}~~og; s~־|׀~ ~g; s~ +~ ~g;' <<< " & quoted form of theString
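Here is a brief explanation of the perl script:
- The -CSD option: standard output and standard error are written as UTF-8, and the input is assumed to be UTF-8.
- s~\\p{NonspacingMark}~~og: removes the nonspacing marks (the backslash is doubled because the command sits inside an AppleScript string).
- s~־|׀~ ~g: replaces every ־ and ׀ with a space.
- s~ +~ ~g: replaces several consecutive spaces with a single space.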
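To make the three substitutions easier to follow, here is a minimal sketch of the same logic in Python. It is only an illustration of what the perl command does, not a replacement for it; the function name strip_marks and the sample string are assumptions made for this example.

```python
import re
import unicodedata

def strip_marks(text):
    """Mirror the three perl substitutions on a Python str."""
    # 1. drop every nonspacing mark (Unicode general category Mn)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # 2. replace maqaf (U+05BE) and paseq (U+05C0) with a space
    text = re.sub("[\u05be\u05c0]", " ", text)
    # 3. collapse runs of spaces into a single space
    return re.sub(" +", " ", text)

# Two words joined by a maqaf, written with escapes so the
# vowel points and the accent are explicit.
sample = "\u05e2\u05b7\u05dc\u05be\u05e4\u05b0\u05bc\u05e0\u05b5\u05a3\u05d9"
print(strip_marks(sample))
```

Only the base letters and a single separating space remain; the vowel points and the cantillation accent are gone.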
If the AppleScript reads the text from a file, you can let perl read the file directly:
do shell script "perl -CSD -pe 'use utf8; s~\\p{NonspacingMark}~~og; s~־|׀~ ~g; s~ +~ ~g;' < " & quoted form of posix path of pathOfTheTextFile
The file must be encoded as UTF-8.
Another solution is to use Cocoa-AppleScript:
use framework "Foundation"
use scripting additions
-- The text may look wrong in this code block, but it will be correct after the script is compiled
set theString to "בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֙הוּ֙ וָבֹ֔הוּ וְחֹ֖שֶׁךְ עַל־פְּנֵ֣י תְה֑וֹם וְר֣וּחַ אֱלֹהִ֔ים מְרַחֶ֖פֶת עַל־פְּנֵ֥י הַמָּֽיִם׃
וַיֹּ֥אמֶר אֱלֹהִ֖ים יְהִ֣י א֑וֹר וַֽיְהִי־אֽוֹר׃
וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָא֖וֹר כִּי־ט֑וֹב וַיַּבְדֵּ֣ל אֱלֹהִ֔ים בֵּ֥ין הָא֖וֹר וּבֵ֥ין הַחֹֽשֶׁךְ׃
וַיִּקְרָ֨א אֱלֹהִ֤ים ׀ לָאוֹר֙ י֔וֹם וְלַחֹ֖שֶׁךְ קָ֣רָא לָ֑יְלָה וַֽיְהִי־עֶ֥רֶב וַֽיְהִי־בֹ֖קֶר י֥וֹם אֶחָֽד׃ (פ)"
return stripString(theString)
on stripString(t)
set sourceString to current application's NSMutableString's stringWithString:t
set myOpt to current application's NSRegularExpressionSearch
set theSuccess to sourceString's applyTransform:(current application's NSStringTransformStripCombiningMarks) |reverse|:false range:(current application's NSMakeRange(0, (sourceString's |length|))) updatedRange:(missing value)
if theSuccess then
-- *** Replace all "־" and "׀" by a space, each character must be separated by a vertical bar character, e.g. "a|d|z"
sourceString's replaceOccurrencesOfString:"־|׀" withString:" " options:myOpt range:(current application's NSMakeRange(0, (sourceString's |length|)))
-- **** Replace multiple spaces in a row by one space
sourceString's replaceOccurrencesOfString:" +" withString:" " options:myOpt range:(current application's NSMakeRange(0, (sourceString's |length|)))
return sourceString as string -- convert the NSString object to an AppleScript's string
end if
return "" -- else, the transform was not applied
end stripString
Following up on the comments:
For a droplet, the script needs an on open handler, like this:
on open theseFiles
repeat with f in theseFiles
set cleanText to do shell script "perl -CSD -pe 'use utf8; s~\\p{NonspacingMark}~~og; s~־|׀~ ~g; s~ +~ ~g;' " & quoted form of POSIX path of f
-- do something with that cleanText
end repeat
end open
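If you want to edit the files in place, the perl command needs the -i option followed by a backup extension (here '.bak').
This creates a backup of each file, with ".bak" appended to its name: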
on open theseFiles
repeat with f in theseFiles -- *** create a backup and edit the file in-place ***
do shell script "perl -i'.bak' -CSD -pe 'use utf8; s~\\p{NonspacingMark}~~og; s~־|׀~ ~g; s~ +~ ~g;' " & quoted form of POSIX path of f
end repeat
end open
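For readers unfamiliar with perl's -i option, the sketch below shows the general idea in Python: keep a copy of the original under a suffixed name, then overwrite the file with the transformed text. It is an illustration under assumptions, not what perl does internally (perl renames the original rather than copying it); edit_in_place and its parameters are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path

def edit_in_place(path, transform, backup_suffix=".bak"):
    """Rewrite a UTF-8 file with transform(text), keeping a backup copy first.

    An empty backup_suffix skips the backup, like perl -i with no extension.
    """
    path = Path(path)
    if backup_suffix:
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
    text = path.read_text(encoding="utf-8")
    path.write_text(transform(text), encoding="utf-8")

# Demo on a throwaway file; str.upper stands in for the real clean-up step.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.txt"
    target.write_text("abc\n", encoding="utf-8")
    edit_in_place(target, str.upper)
    print(target.read_text(encoding="utf-8"), end="")
    print((Path(tmp) / "sample.txt.bak").read_text(encoding="utf-8"), end="")
```

Passing backup_suffix="" corresponds to editing in place without a backup.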
If you don't want a backup of each file, give the -i option an empty extension (''), like this:
-- *** edit the file in-place without backup***
do shell script "perl -i'' -CSD -pe 'use utf8; s~\\p{NonspacingMark}~~og; s~־|׀~ ~g; s~ +~ ~g;' " & quoted form of POSIX path of f
return