How can I use an OCR tool to instantly extract text from a screen region?

27

In Ubuntu 12.10, if I type

gnome-screenshot -a | tesseract output

it returns:

** Message: Unable to use GNOME Shell's builtin screenshot interface, resorting to fallback X11.

How can I select text on the screen and convert it to text (in the clipboard or in a document)?

Thanks!

— Erling

Are you just using gnome-screenshot -a? And why are you piping its output to tesseract? If I'm not mistaken, gnome-screenshot saves the picture to a file; it does not "print" it.

— Salem
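As the comment above points out, gnome-screenshot writes the capture to a file rather than to standard output, so a pipe into tesseract has nothing to read. A minimal sketch of the file-based approach follows. The function name ocr_region is made up for illustration, and the sketch assumes a gnome-screenshot that supports -a (area) and -f (file name) and an installed tesseract.

```shell
#!/bin/bash
# ocr_region: hypothetical helper, not part of any tool.
# Captures a screen region to a temporary file, then runs OCR on that file.
ocr_region() {
    local dir
    dir=$(mktemp -d) || return 1
    # -a: select an area interactively, -f: write the image to this file
    gnome-screenshot -a -f "$dir/shot.png"
    # tesseract appends .txt to the output base name it is given
    if tesseract "$dir/shot.png" "$dir/out" >/dev/null 2>&1; then
        cat "$dir/out.txt"
    fi
    rm -rf "$dir"
}
```

Calling `ocr_region > recognized.txt` would then save the recognized text to a document.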

Going by Bugzilla, the warning should be harmless. Question: what is your auto-save-directory? Did it drop anything in there? Interesting link: forums.debian.net/viewtopic.php?f=

— t

gnome-screenshot -a -c should copy the selection to the clipboard, shouldn't it? But piping it to tesseract gives the same error. The default directory is home/Pictures (which works fine).

— Erling 2013

1

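I just did this using gnome-screenshot. I then had to edit the file to reduce the color depth from 16M to 2 (it was black text on a white background, but with the font smoothing that is popular today it is not really black). After that I had to enlarge the image to 200% of the original before I got accurate OCR from tesseract, but once that was done it worked very well.

@SteveLake Hi Steve, thanks for your suggestion. I edited the script to programmatically modify the image before OCR in the way you described. The detection rate should be better now.

— Glutanimate 2013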
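The preprocessing described in the comments above (reduce the image to black and white, then enlarge it) can be sketched with ImageMagick's convert. The function name prepare_for_ocr and the 50% threshold are assumptions for illustration; the 200% scale is the figure from the comment. The right threshold will depend on the screenshot.

```shell
#!/bin/bash
# prepare_for_ocr IN OUT: hypothetical helper name.
prepare_for_ocr() {
    local in=$1 out=$2
    # -colorspace Gray  drops the color information
    # -resize 200%      doubles the image size
    # -threshold 50%    forces every pixel to pure black or pure white
    #                   (50% is an assumed value, tune it for your fonts)
    convert "$in" -colorspace Gray -resize 200% -threshold 50% "$out"
}
```

For example, `prepare_for_ocr shot.png shot_bw.png` followed by `tesseract shot_bw.png result` would write the text to result.txt.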

35

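Maybe there is already a tool that does this, but since you are already experimenting with them, you can also write a simple script around a screenshot tool and tesseract.

Take this script as an example (on my system I saved it as /usr/local/bin/screen_ts):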

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot

# Quick language menu, add more if you need other languages.
select tesseract_lang in eng rus equ; do break; done

SCR_IMG=$(mktemp)
# Remove the temp file and everything derived from it on exit.
trap 'rm -f "$SCR_IMG"*' EXIT

# Increase quality with option -q from the default 75 to 100.
scrot -s "$SCR_IMG.png" -q 100

# Desaturate and enlarge; this should increase the detection rate.
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"

# Pass the language chosen in the menu (falls back to eng if none was picked).
tesseract "$SCR_IMG.png" "$SCR_IMG" -l "${tesseract_lang:-eng}" &> /dev/null
cat "$SCR_IMG.txt"
exit

And with clipboard support:

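#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot xsel

# Quick language menu, add more if you need other languages.
select tesseract_lang in eng rus equ; do break; done

SCR_IMG=$(mktemp)
# Remove the temp file and everything derived from it on exit.
trap 'rm -f "$SCR_IMG"*' EXIT

# Increase image quality with option -q from the default 75 to 100.
scrot -s "$SCR_IMG.png" -q 100

# Desaturate and enlarge; this should increase the detection rate.
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"

# Pass the language chosen in the menu (falls back to eng if none was picked).
tesseract "$SCR_IMG.png" "$SCR_IMG" -l "${tesseract_lang:-eng}" &> /dev/null
# Copy the recognized text to the clipboard.
xsel -bi < "$SCR_IMG.txt"

exit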

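It uses scrot to capture the screen, tesseract to recognize the text, and cat to display the result. The clipboard version additionally uses xsel to put the output on the clipboard.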

Sample usage

Note: scrot, xsel, imagemagick and tesseract-ocr are not installed by default, but they are available from the default repositories.
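Since none of these tools is installed by default, a script can check for them before doing anything else. A minimal sketch: missing_tools is a hypothetical helper, and the command names in the usage comment (scrot, xsel, mogrify, tesseract) are the executables I would expect those packages to provide.

```shell
#!/bin/bash
# missing_tools CMD...: print every command that cannot be found.
missing_tools() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Usage sketch:
# missing=$(missing_tools scrot xsel mogrify tesseract)
# [ -z "$missing" ] || { echo "Please install: $missing" >&2; exit 1; }
```

The packages themselves can be installed with `sudo apt install scrot xsel imagemagick tesseract-ocr`.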

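You can replace scrot with gnome-screenshot, but that may take a lot of work. As for the output, you can use anything that can read a text file (open it in a text editor, show the recognized text as a notification, and so on).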
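As one example of an alternative output, the recognized text could be shown as a desktop notification. This is only a sketch: it assumes notify-send is available (on Ubuntu it should come with the libnotify-bin package), and show_ocr_result is a hypothetical name.

```shell
#!/bin/bash
# show_ocr_result FILE: show the contents of an OCR result file
# as a desktop notification.
show_ocr_result() {
    local text
    text=$(cat -- "$1") || return 1
    # An empty file means tesseract recognized nothing.
    [ -n "$text" ] || text="(no text recognized)"
    notify-send "OCR result" "$text"
}
```

In the scripts above, `show_ocr_result "$SCR_IMG.txt"` would take the place of the final cat line.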

GUI version of the script

Here is a simple graphical version of the OCR script, including a language selection dialog:

#!/bin/bash
# DEPENDENCIES: tesseract-ocr imagemagick scrot yad xsel
# AUTHOR:       Glutanimate 2013 (http://askubuntu.com/users/81372/)
# NAME:         ScreenOCR
# LICENSE:      GNU GPLv3
#
# BASED ON:     OCR script by Salem (http://askubuntu.com/a/280713/81372)

TITLE=ScreenOCR # set yad variables
ICON=gnome-screenshot

# - tesseract won't work if LC_ALL is unset so we set it here
# - you might want to delete or modify this line if you 
#   have a different locale:

export LC_ALL=en_US.UTF-8

# language selection dialog
LANG=$(yad \
    --width 300 --entry --title "$TITLE" \
    --image=$ICON \
    --window-icon=$ICON \
    --button="ok:0" --button="cancel:1" \
    --text "Select language:" \
    --entry-text \
    "eng" "ita" "deu")

# - You can modify the list of available languages by editing the line above
# - Make sure to use the same ISO codes tesseract does (man tesseract for details)
# - Languages will of course only work if you have installed their respective
#   language packs (https://code.google.com/p/tesseract-ocr/downloads/list)

RET=$? # check return status

if [ "$RET" = 252 ] || [ "$RET" = 1 ]  # WM-Close or "cancel"
  then
      exit
fi

echo "Language set to $LANG"

SCR_IMG=$(mktemp) # create tempfile
trap 'rm -f "$SCR_IMG"*' EXIT # make sure tempfiles get deleted afterwards

scrot -s "$SCR_IMG.png" -q 100 # take screenshot of area
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png" # postprocess to prepare for OCR
tesseract -l "$LANG" "$SCR_IMG.png" "$SCR_IMG" # OCR in given language
xsel -bi < "$SCR_IMG.txt" # pass to clipboard (tesseract writes its result to $SCR_IMG.txt)
exit

In addition to the dependencies listed above, you will need to install YAD, a Zenity fork, from the webupd8 PPA for the script to work.

— Salem

Works great in the terminal! Thanks! I want to screen-copy code from tutorials to test it. How can I get scrot to use the clipboard?

— Erling 2013

1

What happens to the temporary files?

— Erling 2013

1

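The temporary files stay there until you reboot your computer. If that is a problem for you, you can simply delete them at the end of the script (rm $SCR_IMG.png $SCR_IMG.txt).

— Salem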
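Rather than deleting the files by hand on the last line, a script can register the cleanup with trap, so that it also runs when the script is interrupted. A sketch of the idea: with_tmpdir is a hypothetical helper, and it uses a temporary directory where the scripts above use a single temp file name.

```shell
#!/bin/bash
# with_tmpdir CMD [ARGS...]: hypothetical helper that runs CMD with a fresh
# temporary directory appended as its last argument, then removes the directory.
with_tmpdir() (
    # The body is a subshell, so the EXIT trap fires when the helper finishes.
    dir=$(mktemp -d) || exit 1
    # Single quotes delay the expansion of $dir until the trap runs.
    trap 'rm -rf "$dir"' EXIT
    "$@" "$dir"
    exit $?
)

# Usage sketch (capture_and_ocr would be your own function that takes
# the directory as its argument):
# with_tmpdir capture_and_ocr
```

Everything the command writes into that directory disappears once it returns, whether it succeeded or not.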

1

scrot by itself cannot use the clipboard. However, if you want to copy and paste the text, you can use a tool like xclip or xsel to do the job.

— Salem 2013
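A small sketch of how a script could use whichever of the two clipboard tools happens to be installed. The helper name to_clipboard is made up; the options are the ones I know from xsel (--clipboard, --input) and xclip (-selection clipboard).

```shell
#!/bin/bash
# to_clipboard: copy standard input to the X11 clipboard using
# xsel if present, otherwise xclip. Hypothetical helper name.
to_clipboard() {
    if command -v xsel >/dev/null 2>&1; then
        xsel --clipboard --input
    elif command -v xclip >/dev/null 2>&1; then
        xclip -selection clipboard
    else
        echo "to_clipboard: need xsel or xclip" >&2
        return 1
    fi
}
```

In the clipboard script above, the last step would then read `to_clipboard < "$SCR_IMG.txt"`.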

1

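Regarding Salem's answer: if you are running KDE, you can call another script to send the resulting text to the clipboard automatically, ready for pasting. You will find a suitable script here. Follow the instructions on that page to install it. Then all you need to do is add | clipboard to the end of the last line of Salem's script.

— Chris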

3

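I don't know whether anyone needs my solution. Here is one that runs under Wayland.

It shows the recognized text in a text editor, and if you add the parameter "yes" you also get a translation from the Google Translate tool (an internet connection is required). Before you can use it, install tesseract-ocr, imagemagick and google-trans. When the text you want to recognize is on screen, start the script in GNOME with Alt+F2, then drag the cursor around the text. That's it. The script has only been tested with GNOME; for other window managers it will have to be adapted. To translate the text into another language, replace the language ID (de) on the trans line of the script.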

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick google-trans

# Pass "yes" as the first argument to translate the recognized text.
translate="${1:-no}"

SCR_IMG=$(mktemp)
# Remove the temp file and everything derived from it on exit.
trap 'rm -f "$SCR_IMG"*' EXIT

# Select an area interactively and save it to the temp file.
gnome-screenshot -a -f "$SCR_IMG.png"

# Desaturate and enlarge; this should increase the detection rate.
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"

tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null

if [ "$translate" = "yes" ]; then
        trans :de "file://$SCR_IMG.txt" -o "$SCR_IMG.translate.txt"
        gnome-text-editor "$SCR_IMG.translate.txt"
else
        gnome-text-editor "$SCR_IMG.txt"
fi

exit

— Ronald

1

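I just wrote a blog post on how to take screenshots in a modern way. Although my target audience is Chinese, the screenshots and the code are in English. OCR is just one of the features.

Features of my OCR:

Opens the result in konsole + vimx or in gedit for further editing.
For vimx + English, spell checking is enabled.
Supports dynamic language selection; nothing is hardcoded.
Shows progress dialogs while convert and tesseract run, since both are slow.

Function code: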

function ocr () {
    tmpj="$1"
    tmpocr="$2"
    tmpocr_p="$3"
    # Build the radiolist entries (FALSE lang FALSE lang ...) from the installed languages
    atom="$(tesseract --list-langs 2>&1)"
    atom=(`echo "${atom#*:}"`)
    atom=(`echo "$(printf 'FALSE\n%s\n' "${atom[@]}")"`)
    atom[0]='True'
    # Ask for the language, then convert and OCR behind progress dialogs
    ans=(`yad --center --height=200 --width=300 --separator='|' --on-top --list --title '' --text='Select Languages:' --radiolist --column '✓' --column 'Languages' "${atom[@]}" 2>/dev/null`) &&
    ans="$(echo "${ans:5:-1}")" &&
    convert "$tmpj[x2000]" -unsharp 15.6x7.8+2.69+0 "$tmpocr_p" |
        yad --on-top --title '' --text='Converting ...' --progress --pulsate --auto-close 2>/dev/null &&
    tesseract "$tmpocr_p" "$tmpocr" -l "$ans" 2>>/tmp/tesseract.log |
        yad --percentage=50 --on-top --title '' --text='Tesseracting ...' --progress --pulsate --auto-close 2>/dev/null &&
    if [[ "$ans" == 'eng' ]]; then
        konsole -e "vimx -c 'setlocal spell spelllang=en_us' -n $tmpocr.txt" 2>/dev/null
    else
        gedit "$tmpocr.txt"
    fi
    rm "$tmpocr_p"
}

Calling code:

for cmd in "mktemp" "convert" "tesseract" "gedit" "konsole" "vimx" "yad"; do 
    command -v $cmd >/dev/null 2>&1 || {  LANG=POSIX; xmessage "Require $cmd but it's not installed.  Aborting." >&2; exit 1; }; :;
done
tmpj="$(mktemp /tmp/`date +"%s_%Y-%m-%d"`_XXXXXXXXXX.png)"
tmpocr="$(mktemp -u /tmp/`date +"%s_%Y-%m-%d"`_ocr_XXXXX)"
tmpocr_p="${tmpocr}.png"
gnome-screenshot -a -f "$tmpj" 2>&1 >/dev/null | ts >>/tmp/gnome_area_PrtSc_error.log
ocr $tmpj $tmpocr $tmpocr_p &

Combine these two pieces of code into a single shell script to run them.
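The dynamic language selection in the function above comes down to parsing the output of tesseract --list-langs. A more readable sketch of just that step follows. The name list_ocr_langs is hypothetical, and the sketch assumes the output is a single header line followed by one language code per line. The 2>&1 is kept from the original code, on the assumption that some releases print the list on stderr.

```shell
#!/bin/bash
# list_ocr_langs: print the installed tesseract language codes, one per line.
list_ocr_langs() {
    # Skip the header line, e.g. "List of available languages (2):"
    tesseract --list-langs 2>&1 | tail -n +2
}

# Usage sketch with bash's built-in menu instead of a yad dialog:
# select lang in $(list_ocr_langs); do break; done
```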

Screenshot 1:

Screenshot 2:

— 林果皞

Looks like a nice solution, but the script's readability is poor.

— ukos

0

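The idea is that whenever a new screenshot file appears in the folder, tesseract OCR runs on it and the result is opened in a text editor.

You can leave this script running in the output directory of your favorite screenshot tool: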

#!/bin/bash
# wait_for_it.sh
inotifywait -m . -e create -e moved_to |
    while read -r path action file; do
        echo "The file '$file' appeared in directory '$path' via '$action'"
        cd "$path" || continue
        # Only OCR PNG files; anything else (including the .txt output) is ignored.
        if [[ $file == *.png ]]; then
                tesseract "$file" "$file"
                sleep 1
                gedit "$file.txt" &
        fi
    done

You will need these installed for it to work:

sudo apt install tesseract-ocr
sudo apt install inotify-tools

— Eduard Florinescu

0

I created a free and open-source program for this:

https://danpla.github.io/dpscreenocr/

— danpla