这是一个粗略的开始,基于此答案中组合字符的列表(然后扩展)。(将其标记为社区Wiki-请对此进行编辑和改进!)
(defconst arabic-diacritics '(#x064b #x064c #x064d #x064e #x064f #x0650 #x0651 #x0652 #x0653 #x0654 #x0655 #x0670)
"Unicode codepoints for Arabic combining characters.")
(defconst arabic-diacritics-regexp (regexp-opt (mapcar #'string arabic-diacritics)))
(defconst arabic-equivalents
'(
;; "alef" is equivalent to "alef with hamza above" etc
(#x0627 #x0623 #x0625 #x0622)))
;; (require 'cl-lib)
;; (defun arabic-strip-diacritics (string)
;; (cl-reduce (lambda (s c) (remove c s)) arabic-diacritics :initial-value string))
(defun arabic-search-without-diacritics (string)
(interactive (list (read-string "Search for: " nil nil nil t)))
(let ((regexp
(apply #'concat
(mapcar (lambda (c)
(let ((equivalents (assq c arabic-equivalents)))
(concat
(if equivalents
(regexp-opt (mapcar #'string equivalents))
(regexp-quote (string c)))
arabic-diacritics-regexp "*")))
string))))
(search-forward-regexp regexp)))
因此,如果缓冲区包含“ الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ”,并且我求值(arabic-search-without-diacritics "الحمد لله رب العالمين")
,它将找到文本。它也可以与交互工作M-x arabic-search-without-diacritics
。
替代方法:
这是一个完整的代码示例,演示了如何Mn
从正则表达式匹配项中的规范化字符串中删除变音符号和其他非间距标记(属性)。它适用于给出的示例,而IMO是正确的方法。
(defun kill-marks (string)
(concat (loop for c across string
when (not (eq 'Mn (get-char-code-property c 'general-category)))
collect c)))
(let* ((original1 "your Arabic string here")
(normalized1 (ucs-normalize-NFKD-string original1))
(original2 "your other Arabic string here")
(normalized2 (ucs-normalize-NFKD-string original2)))
(equal
(replace-regexp-in-string "." 'kill-marks normalized1)
(replace-regexp-in-string "." 'kill-marks normalized2)))
ucs-normalize-*
功能lisp/international/ucs-normalize.el
。没有像针对大小写折叠那样的预定义搜索折叠,但是您至少可以在搜索区域之前对其进行标准化。一个好的实现可能是一个相当复杂的任务。