Pytesseract: "TesseractNotFound Error: tesseract is not installed or it's not in your path". How do I fix this?



I'm trying to run a basic and very simple piece of code in Python.

from PIL import Image
import pytesseract

im = Image.open("sample1.jpg")

text = pytesseract.image_to_string(im, lang = 'eng')

print(text)

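That is all there is to it. I have in fact installed tesseract for Windows via the installer. I'm new to Python and unsure how to proceed.

Any guidance here would be very helpful. I tried restarting the Spyder application, but to no avail.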


What isn't working? Could you add the error you are getting to the question?
Mooncrater

Answers:



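I see the steps scattered across different answers. Based on my recent experience with this pytesseract error on Windows, writing the steps out in sequence should make the error easier to resolve:

1. Install tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki

2. Note the tesseract path from the installation. The default installation path at the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so please check the installation path.

3. pip install pytesseract

4. Set the tesseract path in the script before calling image_to_string: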

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
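Since the install folder differs between machines (the per-user AppData folder above, or C:\Program Files and C:\Program Files (x86) mentioned in other answers), a minimal sketch could probe those folders before setting tesseract_cmd. It assumes the standard Windows variables LOCALAPPDATA, ProgramFiles and ProgramFiles(x86) point at those folders; candidate_paths and find_tesseract are hypothetical helper names, not part of pytesseract.

```python
import os
from typing import List, Mapping, Optional


def candidate_paths(env: Mapping[str, str]) -> List[str]:
    """Build likely tesseract.exe locations from standard Windows folder variables."""
    paths = []
    # Per-user install (AppData\Local) first, then the two Program Files folders
    for var in ("LOCALAPPDATA", "ProgramFiles", "ProgramFiles(x86)"):
        base = env.get(var)
        if base:
            paths.append(os.path.join(base, "Tesseract-OCR", "tesseract.exe"))
    return paths


def find_tesseract(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first candidate that exists as a file, or None."""
    for path in candidate_paths(env):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract.exe not found in the usual folders; check your install path")
    else:
        import pytesseract  # the wrapper only needs to know where the binary is

        pytesseract.pytesseract.tesseract_cmd = exe
        print("Using", exe)
```

If nothing is found, fall back to the manual path from step 2.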


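This worked for me! If anyone else has trouble finding the "Tesseract-OCR" folder, also search in the "C:\Program Files\" folder.
Shahriar Rahman Zahin

You are a hero of the internet
ArturMüllerRomanov

Had to restart my computer for the environment settings to be reloaded
Barmaley '20

This answer is better than the documentation, because the path in tesseract_cmd really does need to point to tesseract.exe. The documentation is missing this.
本治

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe' This answer saved me on an important deadline for a computer vision OCR project. Many thanks @Nafeez Quraishi :-)
Vetrivel PS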


First, you should install the binary:

On Linux

sudo apt-get update
sudo apt-get install libleptonica-dev
sudo apt-get install tesseract-ocr tesseract-ocr-dev
sudo apt-get install libtesseract-dev

On Mac

brew install tesseract

On Windows

Download the binary from https://github.com/UB-Mannheim/tesseract/wiki. Then add pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe' to your script.

Then you should install the Python package using pip:

pip install pytesseract

Reference: https://pypi.org/project/pytesseract/ (installation section) and https://github.com/tesseract-ocr/tesseract/wiki#installation
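After installing the binary, one quick way to confirm that it can actually be launched (which is what TesseractNotFoundError is about) is to run it with --version. The sketch below is a hypothetical stdlib-only helper; it reads stderr as well as stdout in case the version banner is printed there, and treats "could not start the program" as None.

```python
import subprocess
from typing import Optional


def tool_version(cmd: str = "tesseract") -> Optional[str]:
    """Run `<cmd> --version` and return the first output line, or None if it cannot start."""
    try:
        proc = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            timeout=30,
        )
    except OSError:
        # Program missing or not executable: the same situation pytesseract reports
        return None
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(tool_version() or "tesseract could not be started; check the install and your PATH")
```

On Windows you can pass the full path, e.g. tool_version(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe").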


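Windows only

1 - You need to have Tesseract OCR installed on your computer.

Get it from here: https://github.com/UB-Mannheim/tesseract/wiki

Download the appropriate version.

2 - Add the Tesseract path to your system environment, i.e. edit the system variables.

3 - Run pip install pytesseract and pip install tesseract

4 - Add this line to your Python script every time:

pytesseract.pytesseract.tesseract_cmd = 'C:/OCR/Tesseract-OCR/tesseract.exe'  # your path may be different

5 - Run your code.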
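Editing PATH system-wide (step 2) generally only affects programs started afterwards, so Spyder or the terminal may need a restart. pytesseract's default tesseract_cmd is just 'tesseract', which is looked up on PATH. As an alternative that needs no restart, a minimal sketch can prepend the install folder to PATH for the current Python process only and confirm with shutil.which that tesseract now resolves. The folder C:\OCR\Tesseract-OCR comes from step 4 and is only an example; prepend_to_path and resolve are hypothetical names.

```python
import os
import shutil
from typing import Optional


def prepend_to_path(folder: str) -> None:
    """Put `folder` at the front of PATH for this process and the programs it starts."""
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = folder + os.pathsep + current if current else folder


def resolve(name: str = "tesseract") -> Optional[str]:
    """Return the full path that PATH lookup finds for `name`, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    prepend_to_path(r"C:\OCR\Tesseract-OCR")  # example install folder from step 4
    print(resolve() or "tesseract is still not on PATH")
```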


Where can I find the path on Linux?
Pallavi

@Pallavi This is an answer for Windows, so there is no Linux here. Because of step 4, step 2 doesn't seem to be needed.
Timo


https://pypi.org/project/pytesseract/

pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

On Windows:

pip install tesseract

pip install tesseract-ocr

and check the file stored on your system at usr/appdata/local/programs/site-packages/python/python36/lib/pytesseract/pytesseract.py and compile that file


Note that this does not work on Windows. On Windows you need to install the binary (from github.com/tesseract-ocr/tesseract/wiki) and add the line pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" to get tesseract working.
shahar_m

On Mac, you can install it as follows. This worked for me.

brew install tesseract

The error occurs because tesseract is not installed on your computer.

If you are using Ubuntu, install tesseract with the following command:

sudo apt-get install tesseract-ocr

For Mac:

brew install tesseract
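Because pip install pytesseract only installs the Python wrapper, it helps to check the two pieces separately: the wrapper package and the tesseract program itself. A small stdlib-only sketch (diagnose is a hypothetical name) that tells the two situations apart:

```python
import importlib.util
import shutil


def diagnose(cmd: str = "tesseract") -> dict:
    """Report whether the Python wrapper and the tesseract program can each be found."""
    return {
        # True if `import pytesseract` would find the package (find_spec does not import it)
        "pytesseract_installed": importlib.util.find_spec("pytesseract") is not None,
        # Full path of the executable found on PATH, or None
        "tesseract_on_path": shutil.which(cmd),
    }


if __name__ == "__main__":
    report = diagnose()
    if report["tesseract_on_path"] is None:
        print("tesseract binary not on PATH: install it (apt-get / brew / Windows installer)"
              " or set pytesseract.pytesseract.tesseract_cmd")
    if not report["pytesseract_installed"]:
        print("Python package missing: pip install pytesseract")
    print(report)
```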


On 64-bit Windows, just add the following to the PATH environment variable: "C:\Program Files\Tesseract-OCR", and it will work.


I was able to fix this by updating the tesseract_cmd variable in the pytesseract.py file with the bin/tesseract path.


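I had the same problem on Windows. I tried updating the environment variable with the tesseract path, which didn't work.

What worked for me was modifying pytesseract.py, which can be found at C:\Program Files\Python37\Lib\site-packages\pytesseract or usually at C:\Users\YOUR USER\APPDATA\Python

I changed one line as follows:

#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'

Note that I had to add an extra \ before tesseract, because Python interprets \t as a tab character, and otherwise you will get the following error message:

pytesseract.pytesseract.TesseractNotFoundError: C:\Program Files\Tesseract-OCR esseract.exe is not installed or it's not in your path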
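The \t problem described here is easy to reproduce in plain Python. In the sketch below (the paths are only examples), a normal string turns \t into a tab, while a raw string, doubled backslashes, or forward slashes all give the intended Windows path:

```python
from pathlib import PureWindowsPath

# "\t" inside a normal string literal is a TAB, so "\tesseract.exe" loses its "t"
broken = "C:\\Program Files\\Tesseract-OCR\tesseract.exe"

# Three spellings that keep the backslash before "tesseract.exe"
raw = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
escaped = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
forward = "C:/Program Files/Tesseract-OCR/tesseract.exe"

print(repr(broken))    # the repr shows the embedded \t
print(raw == escaped)  # True: the two literals produce the same string
# PureWindowsPath treats "/" and "\" as equivalent separators
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
print(repr(PureWindowsPath(broken).name))  # one mangled component instead of 'tesseract.exe'
```

Any of the three working spellings can be assigned to pytesseract.pytesseract.tesseract_cmd.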



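Step 1:

Install tesseract on your system according to your operating system. The latest installer can be found at https://github.com/UB-Mannheim/tesseract/wiki

Step 2: Install the following dependencies with: pip install pytesseract, pip install opencv-python, pip install numpy

Step 3: Sample code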

import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string

# Path of the working folder on disk; replace it with your own working folder
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"
# If you don't have the tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Note: this plain Python variable is not read by pytesseract or tesseract;
# tesseract looks for TESSDATA_PREFIX in the process environment
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'

def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)

    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)

    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)

    #  Apply threshold to get image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

    # Write the image after applying OpenCV to do some ...
    cv2.imwrite(src_path + "thres.png", img)

    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))

    # Remove template file
    #os.remove(temp)

    return result


print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png") )

print("------ Done -------")

On Windows, with a default tesseract installation, you have to redirect the command path.

  1. On a 32-bit system, add this line after the import statements:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
  2. On a 64-bit system, add this line instead:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

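This may be happening because, like me, you did not install the language, even though Tesseract itself was installed correctly. Fortunately, this is easy to fix, and I didn't even need to mess with tesseract_cmd:

sudo apt-get install tesseract-ocr -y
sudo apt-get install tesseract-ocr-spa -y
tesseract --list-langs

Note that on the second line we specify -spa for Spanish.

If the installation succeeded, you should get a list of the available languages, for example:

List of available languages (3):
eng
osd
spa

I found this in this blog post (in Spanish). There is also a post about installing Spanish on Windows (apparently not as easy).

Note: since the question uses lang = 'eng', this may not be the answer in this particular case. But the same error can occur in other situations, which is why I'm posting this answer here.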
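To check from Python whether the language you pass as lang (for example 'eng' or 'spa') is actually installed, one option is to run tesseract --list-langs, as in this answer, and parse its output. This is a hypothetical helper: it skips the "List of available languages ..." header, whose exact wording can vary between tesseract versions, and falls back to stderr only if nothing was printed on stdout.

```python
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages(cmd: str = "tesseract") -> List[str]:
    """Run tesseract and return its installed languages (raises OSError if it cannot start)."""
    proc = subprocess.run([cmd, "--list-langs"], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, text=True)
    text = proc.stdout if proc.stdout.strip() else proc.stderr
    return parse_list_langs(text)


if __name__ == "__main__":
    sample = "List of available languages (3):\neng\nosd\nspa\n"
    print(parse_list_langs(sample))  # ['eng', 'osd', 'spa']
```

With tesseract installed, "spa" in installed_languages() tells you whether lang="spa" can work.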



# {Windows 10 instructions}
# Before you use the script, you need to install the dependencies:
# 1. Download tesseract from the official link:
#   https://github.com/UB-Mannheim/tesseract/wiki
# 2. Install tesseract
#   I chose this path
#       *replace "user" in the path below with the name of the user you are logged in as on your machine
#       C:\Users\user\AppData\Local\Tesseract-OCR\
# 3. Install Pillow for your Python version
# * the best way for me to install it is this form (I'm using Python 3.7, and in my CMD I run this version of Python by typing py -3.7):
# * if you are using another version of Python, first check how you start Python from your CMD
# * on some machines, the command to run Python from the CMD is different
    # [examples]
    # =================================
    # PYTHON VERSION 3.7
    # python
    # python3.7
    # python -3.7
    # python 3.7
    # python3
    # python -3
    # python 3
    # py3.7
    # py -3.7
    # py 3.7
    # py3
    # py -3
    # py 3
    # PYTHON VERSION 3.6
    # python
    # python3.6
    # python -3.6
    # python 3.6
    # python3
    # python -3
    # python 3
    # py3.6
    # py -3.6
    # py 3.6
    # py3
    # py -3
    # py 3
    # PYTHON VERSION 2.7
    # python
    # python2.7
    # python -2.7
    # python 2.7
    # python2
    # python -2
    # python 2
    # py2.7
    # py -2.7
    # py 2.7
    # py2
    # py -2
    # py 2
    # ================================
# we are using pip to install the dependencies,
# because I start Python 3.7 with the following line:
    # py -3.7
# open CMD on the Windows machine and type the following line:
    # py -3.7 -m pip install pillow
# 4. Install pytesseract and tesseract for your Python version
# * the best way for me to install them is this form (I'm using Python 3.7, and in my CMD I run this version of Python by typing py -3.7):
# we are using pip to install the dependencies
# open CMD on the Windows machine and type the following lines:
    # py -3.7 -m pip install pytesseract
    # py -3.7 -m pip install tesseract


#!/usr/bin/python
from PIL import Image
import pytesseract
import os
import getpass

def extract_text_from_image(image_file_name_arg):

    # IMPORTANT
    # if you have followed my installation instructions in the text above,
    # this is the path on my machine;
    # if you don't put the right path to tesseract.exe, the script will not work
    username = getpass.getuser()
    # the line above gets the username for your machine automatically
    tesseract_exe_path_installation="C:\\Users\\"+username+"\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd=tesseract_exe_path_installation

    # specify the directory of your image files manually, or use the line below if the images are in the "images" folder of the script directory
    # image_dir="D:\\GIT\\ai_example\\extract_text_from_image\\images"
    image_dir=os.getcwd()+"\\images"
    dir_seperator="\\"
    image_file_name=image_file_name_arg
    # if your images are in a different format, change the extension (e.g. ".png")
    image_ext=".jpg"
    image_path_dir=image_dir+dir_seperator+image_file_name+image_ext

    print("=============================================================================")
    print("image used is in the following path dir:")
    print("\t"+image_path_dir)
    print("=============================================================================")

    img=Image.open(image_path_dir)
    text=pytesseract.image_to_string(img, lang="eng")
    print(text)

# replace "image_1" with the name of your image, without the extension
# image_file_name_arg="image_1"
image_file_name_arg="image_2"
# image_file_name_arg="image_3"
# image_file_name_arg="image_4"
# image_file_name_arg="image_5"
extract_text_from_image(image_file_name_arg)

# ==================================
# CREATED BY: SHERIFI
# e-mail: sherif_co@yahoo.com
# git-link for script: https://github.com/sherifi/ai_example.git
# ==================================
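The script above builds the image path by concatenating strings with "\\". A sketch of the same lookup with pathlib, which uses the correct separator for the OS; the images folder, the image_2 name and the .jpg extension come from the script, and image_path is a hypothetical helper:

```python
from pathlib import Path


def image_path(name: str, ext: str = ".jpg", folder: str = "images") -> Path:
    """Return <current working directory>/<folder>/<name><ext>."""
    return Path.cwd() / folder / (name + ext)


if __name__ == "__main__":
    path = image_path("image_2")
    print(path, "exists" if path.is_file() else "not found")
    # With the file in place, the OCR call from the script stays the same:
    # text = pytesseract.image_to_string(Image.open(path), lang="eng")
```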

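Can you add a text explanation to the code snippet you posted?
Dmytro Chasovskyi

I added text to the code snippet to make it easier to follow and to complete the answer
Sherifi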


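There are already many good solutions, but I'd like to share a great website I visited when I couldn't solve "TesseractNotFound Error: tesseract is not installed or it's not in your path": https://www.thetopsites.net/article/50655738.shtml

I realized this error occurred because I had installed pytesseract with pip but forgot to install the binary. Your machine may be missing tesseract-ocr. See the installation instructions here: https://github.com/tesseract-ocr/tesseract/wiki

On Mac, you can install it with Homebrew:

brew install tesseract

After that, it should run fine!

On Windows 10, the following worked for me:

  1. Go to this link, download tesseract and install it. The Windows version is available here: https://github.com/UB-Mannheim/tesseract/wiki

  2. Find the script file pytesseract.py in C:\Users\User\Anaconda3\Lib\site-packages\pytesseract and open it. Change the line tesseract_cmd = 'tesseract' to: tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe' (this is the path where Tesseract-OCR is installed, so check where yours is installed and update the path accordingly)

  3. You may also need to add C:/Program Files (x86)/Tesseract-OCR/ to your environment variables.

Hope this helps!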
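Several answers here edit pytesseract.py inside site-packages, whose location depends on the Python installation (Anaconda, a per-user install, and so on). Keep in mind that edits there are lost when the package is reinstalled or upgraded, whereas setting pytesseract.pytesseract.tesseract_cmd in your own script is not. If you do want to find the file, Python can report where a module would be imported from; a minimal sketch using importlib, with module_file as a hypothetical helper name:

```python
import importlib.util
from typing import Optional


def module_file(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # pytesseract is a package, so this prints its __init__.py;
    # pytesseract.py sits in the same folder
    print(module_file("pytesseract") or "pytesseract is not installed for this interpreter")
```

Run it with the same interpreter Spyder or Anaconda uses, otherwise you may be looking at a different installation.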


For Windows users only:

Install tesseract with the following command:

pip install tesseract

Then add this line to your code; note the "\\" before tesseract.exe:

pytesseract.pytesseract.tesseract_cmd = "C:\Program Files (x86)\Tesseract-OCR\\tesseract.exe" 
Licensed under cc by-sa 3.0 with attribution required.