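Using YOLO or other image recognition techniques to identify all alphanumeric text present in images

I have multiple diagram images, all of which contain labels made up of alphanumeric characters rather than just plain text labels. I want my YOLO model to identify all the numbers and alphanumeric characters present in them.

How can I train my YOLO model to do this? The dataset can be found here: https://drive.google.com/open?id=1iEkGcreFaBIJqUdAADDXJbUrSj99bvoi

For example, see the bounding boxes. I want YOLO to detect text wherever it is present. However, there is currently no need to recognize the text inside the boxes.

The same needs to be done for these types of images as well.

The images can be downloaded here.

This is what I have tried using OpenCV, but it does not work for all the images in the dataset.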

import cv2
import numpy as np
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"

image = cv2.imread(r'C:\Users\HPO2KOR\Desktop\Work\venv\Patent\PARTICULATE DETECTOR\PD4.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
clean = thresh.copy()

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 100:
        cv2.drawContours(clean, [c], -1, 0, 3)
    elif area > 1000:
        cv2.drawContours(clean, [c], -1, 0, -1)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    x,y,w,h = cv2.boundingRect(c)
    if len(approx) == 4:
        cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)

open_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(clean, cv2.MORPH_OPEN, open_kernel, iterations=2)
close_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,2))
close = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, close_kernel, iterations=4)
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    if area > 500:
        ROI = image[y:y+h, x:x+w]
        ROI = cv2.GaussianBlur(ROI, (3,3), 0)
        # Strip trailing whitespace/newlines so isalnum() can succeed on clean text
        data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6').strip()
        if data.isalnum():
            cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
            print(data)

cv2.imwrite('image.png', image)
cv2.imwrite('clean.png', clean)
cv2.imwrite('close.png', close)
cv2.imwrite('opening.png', opening)
cv2.waitKey()

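Is there any model, OpenCV technique, or pre-trained model that can do this for me? I only need bounding boxes around all the alphanumeric characters present in the images. After that, I need to identify what they contain, but that second part is not important for now.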

— Pulkit Bhatnagar
source

Does this answer your question? OpenCV !_src.empty() error in function 'cvtColor'

— Amit Yadav

Take a look at using OpenCV.

— 获取

It doesn't work for all images.

— Pulkit Bhatnagar

Answers:

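One possible approach is to use the EAST (Efficient and Accurate Scene Text) deep learning text detector, based on Zhou et al.'s 2017 paper, "EAST: An Efficient and Accurate Scene Text Detector". The model was originally trained to detect text in natural scene images, but it may also be applicable to diagram images. EAST is quite robust and is capable of detecting blurred or reflective text. Here is a modified version of Adrian Rosebrock's implementation of EAST. Instead of applying the text detector directly to the image, we can try to remove as many non-text objects as possible from the image before performing text detection. The idea is to remove horizontal lines, vertical lines, and non-text contours (curves, diagonals, circular shapes) before applying detection. Here are the results on some of your images:

Input -> Non-text contours to remove, shown in green

Result

Other images

The pretrained model required to perform text detection, frozen_east_text_detection.pb, can be found here. Although the model catches most of the text, the results are not 100% accurate, and it produces occasional false positives, probably because it was trained on natural scene images. To obtain more accurate results, you would probably have to train your own custom model. But if you want a decent out-of-the-box solution, this should work for you. Check out Adrian's OpenCV Text Detection (EAST text detector) blog post for a more comprehensive explanation of the EAST text detector.

Code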

from imutils.object_detection import non_max_suppression
import numpy as np
import cv2

def EAST_text_detector(original, image, confidence=0.25):
    # Set the new width and height and determine the changed ratio
    (h, W) = image.shape[:2]
    (newW, newH) = (640, 640)
    rW = W / float(newW)
    rH = h / float(newH)

    # Resize the image and grab the new image dimensions
    image = cv2.resize(image, (newW, newH))
    (h, W) = image.shape[:2]

    # Define the two output layer names for the EAST detector model that
    # we are interested -- the first is the output probabilities and the
    # second can be used to derive the bounding box coordinates of text
    layerNames = [
        "feature_fusion/Conv_7/Sigmoid",
        "feature_fusion/concat_3"]

    net = cv2.dnn.readNet('frozen_east_text_detection.pb')

    # Construct a blob from the image and then perform a forward pass of
    # the model to obtain the two output layer sets
    blob = cv2.dnn.blobFromImage(image, 1.0, (W, h), (123.68, 116.78, 103.94), swapRB=True, crop=False)
    net.setInput(blob)
    (scores, geometry) = net.forward(layerNames)

    # Grab the number of rows and columns from the scores volume, then
    # initialize our set of bounding box rectangles and corresponding
    # confidence scores
    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []

    # Loop over the number of rows
    for y in range(0, numRows):
        # Extract the scores (probabilities), followed by the geometrical
        # data used to derive potential bounding box coordinates that
        # surround text
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]

        # Loop over the number of columns
        for x in range(0, numCols):
            # If our score does not have sufficient probability, ignore it
            if scoresData[x] < confidence:
                continue

            # Compute the offset factor as our resulting feature maps will
            # be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)

            # Extract the rotation angle for the prediction and then
            # compute the sin and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)

            # Use the geometry volume to derive the width and height of
            # the bounding box
            h = xData0[x] + xData2[x]
            w = xData1[x] + xData3[x]

            # Compute both the starting and ending (x, y)-coordinates for
            # the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - w)
            startY = int(endY - h)

            # Add the bounding box coordinates and probability score to
            # our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])

    # Apply non-maxima suppression to suppress weak, overlapping bounding
    # boxes
    boxes = non_max_suppression(np.array(rects), probs=confidences)

    # Loop over the bounding boxes
    for (startX, startY, endX, endY) in boxes:
        # Scale the bounding box coordinates based on the respective
        # ratios
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)

        # Draw the bounding box on the image
        cv2.rectangle(original, (startX, startY), (endX, endY), (36, 255, 12), 2)
    return original

# Convert to grayscale and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
clean = thresh.copy()

# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove non-text contours (curves, diagonals, circular shapes)
cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 1500:
        cv2.drawContours(clean, [c], -1, 0, -1)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    x,y,w,h = cv2.boundingRect(c)
    if len(approx) == 4:
        cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)

# Bitwise-and with original image to remove contours
filtered = cv2.bitwise_and(image, image, mask=clean)
filtered[clean==0] = (255,255,255)

# Perform EAST text detection
result = EAST_text_detector(image, filtered)

cv2.imshow('filtered', filtered)
cv2.imshow('result', result)
cv2.waitKey()

— nathancy
source

Very thorough answer. How many hours of effort?

— karlphillip

It took about an hour and thirty minutes to write up.

— nathancy

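To this day, I'm still surprised by the number of people who post extremely similar computer vision questions within a matter of days. It almost looks like people from the same image processing class are asking for help to finish their homework, or just looking for someone to do the homework for them. It's a really bizarre "coincidence".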

— karlphillip

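@karlphillip Maybe this question looks familiar because the OP posted it around a week ago. He pretty much wants a CTRL+C, CTRL+V answer that covers all his cases right out of the box, so I guess you might see this very same question again in a couple of weeks!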

— eldesgraciado

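@eldesgraciado I just realized that the OP posted a similar question a few weeks ago. I didn't realize it was the same person until now! I was also wondering why the question looked so familiar.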

— nathancy

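For convenience's sake, I would like to add the package keras_ocr. It can easily be installed with pip and is based on the CRAFT text detector, which is a bit newer than the EAST detector, if I'm not mistaken.

Alongside the detection, it already performs some OCR too! The results are shown below, and it is easier to implement than the accepted answer.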
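
A minimal sketch of how keras_ocr might be run on one of the diagram images, assuming the package is installed (`pip install keras-ocr`) and that its pretrained weights can be downloaded on first use. The helper names `quad_to_rect` and `detect_words` are illustrative and not part of the package. The pipeline returns each word together with a four-point box, which the helper converts to an axis-aligned rectangle.

```python
import numpy as np


def quad_to_rect(quad):
    """Convert a 4x2 array of corner points to (x_min, y_min, x_max, y_max)."""
    quad = np.asarray(quad, dtype=float)
    x_min, y_min = quad.min(axis=0)
    x_max, y_max = quad.max(axis=0)
    return (int(np.floor(x_min)), int(np.floor(y_min)),
            int(np.ceil(x_max)), int(np.ceil(y_max)))


def detect_words(image_path):
    """Run the keras_ocr pipeline on one image; returns [(word, rect), ...]."""
    # Imported lazily so quad_to_rect stays usable without keras_ocr installed.
    import keras_ocr

    pipeline = keras_ocr.pipeline.Pipeline()  # detector + recognizer
    image = keras_ocr.tools.read(image_path)
    predictions = pipeline.recognize([image])[0]  # list of (word, 4x2 box)
    return [(word, quad_to_rect(box)) for word, box in predictions]
```

Calling `detect_words('1.png')` (the file name is only an example) would then yield the recognized words with their rectangles, which can be drawn with `cv2.rectangle` as in the accepted answer.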

— Victor Sonck
source

Hi Victor, does it work for at least 70% of my images?

— Pulkit Bhatnagar

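You have not included labels in your dataset, so I can't really tell you what percentage of images it works on when I have no way of verifying that by comparing the output against labels. However, it's a pip package, so it should be easy enough for you to run it on your dataset and see for yourself :)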
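
If labelled boxes were available, one rough way to put a number on "how many images it works for" is to match predicted boxes to ground-truth boxes by intersection over union (IoU). The following is a sketch under that assumption; the function names and the 0.5 threshold are illustrative choices, and boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixels.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(predicted, ground_truth, threshold=0.5):
    """Fraction of ground-truth boxes matched by a not-yet-used prediction."""
    if not ground_truth:
        return 1.0
    unused = list(predicted)
    hits = 0
    for gt in ground_truth:
        # Greedy matching: take the unused prediction that overlaps most.
        best = max(unused, key=lambda p: iou(p, gt), default=None)
        if best is not None and iou(best, gt) >= threshold:
            hits += 1
            unused.remove(best)
    return hits / len(ground_truth)
```

Averaging `recall` over all labelled images gives a single figure to compare detectors against each other.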

— Victor Sonck

What you are describing appears to be OCR (optical character recognition). One OCR engine I know of is Tesseract, although IBM and others offer one as well.

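As YOLO was originally trained for a very different task, using it to localize text will likely require retraining it from scratch. One could try using existing packages (adapted to your specific setting) to obtain the ground truth (although it is worth remembering that the model will generally be at most as good as the ground truth). Or, perhaps more easily, generate synthetic data for training (i.e. add text at positions you choose to existing drawings, then train the model to localize it).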

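Alternatively, if all your target images are structured similarly to the ones above, one could try to create the ground truth using classic CV heuristics, as you did above, to separate/segment out the symbols, followed by classification using a CNN trained on MNIST or similar to determine whether a given blob contains a symbol.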

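In case you do opt for YOLO, there are existing implementations in Python (for example, I have some experience with this one), and it should be fairly straightforward to set up training with your own ground truth.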
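
For reference, Darknet-style YOLO implementations generally expect one text file per image, with one line per box in the form `class x_center y_center width height`, all normalized to the image size. Below is a small sketch of that conversion, assuming a single "text" class with id 0 and pixel boxes given as `(x1, y1, x2, y2)`. The function names are illustrative, and the exact format should be checked against the implementation you choose.

```python
def to_yolo_line(box, img_w, img_h, class_id=0):
    """Format one pixel box as a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def write_labels(path, boxes, img_w, img_h):
    """Write all boxes of one image to a label file, one line per box."""
    with open(path, "w") as f:
        for box in boxes:
            f.write(to_yolo_line(box, img_w, img_h) + "\n")
```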

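Finally, if using YOLO or a CNN is not a goal in itself but only a means to a solution, any of the "ground truth" approaches above could be used directly as the solution rather than for training a model.

I hope I understood your question correctly.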

— Yuri Feldman
source

Could you provide code for this, since this question carries a bounty?

— Pulkit Bhatnagar

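The task is ultimately to get the text, but I am first trying to locate all the alphanumeric characters in the image and then run OCR on them once they have been located.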
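
That two-stage flow (locate boxes first, recognize afterwards) might look like the sketch below. The `recognize` argument is a placeholder for whatever OCR call is used, for example `pytesseract.image_to_string`; the function names and the padding value are assumptions.

```python
import numpy as np


def crop_boxes(image, boxes, pad=2):
    """Cut padded (x1, y1, x2, y2) regions out of an image, clipped to its bounds."""
    image = np.asarray(image)
    img_h, img_w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
        x2, y2 = min(img_w, x2 + pad), min(img_h, y2 + pad)
        if x2 > x1 and y2 > y1:
            crops.append(((x1, y1, x2, y2), image[y1:y2, x1:x2]))
    return crops


def read_boxes(image, boxes, recognize, pad=2):
    """Apply an OCR callable to each detected region; keep non-empty results."""
    results = []
    for rect, crop in crop_boxes(image, boxes, pad):
        text = recognize(crop).strip()
        if text:
            results.append((rect, text))
    return results
```

Keeping detection and recognition separate like this means the detector can be swapped later without touching the OCR step.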

— Pulkit Bhatnagar

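None of what I proposed is really an out-of-the-box solution, and I don't think the algorithmic code would be either short or simple, so I'm leaving it at the level of ideas :-). P.S. Thanks for the upvote!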

— Yuri Feldman