I am having a problem with my python OCR project. I am getting inconsistent results using pytesseract an I would like some second opinion on whether these results are a coding failure on my part or an issue with pytesseract. I included some examples and sample code. I could use some suggestions
Screenshot1: [It returned nothing for some reason]
Screenshot 2:
がんばります!
中 N 大神一族、粉骨大身の覚悟で
screenshot 3:
二
思
球
|
RI
培
唄
堆名此懐人的集中力,
Screenshot 4:
「よう、 ジュード」
「どうも、 ローエンさん。どうですか、 新しい浄水装置の方は」
「おかげさんで、何の問題ちないよ」
「そいつは何より」
Sample Code:
import cv2
import pytesseract
from pytesseract import *
pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image
img = cv2.imread(r"C:\Users\tnu20\Downloads\New folder\swstella_intro_cg2.jpg")
if img is None:
print("Image not found at path")
# Inverts Image
invert = cv2.bitwise_not(img)
# Cranks up contrast Image
contrast = cv2.convertScaleAbs(invert, alpha=1.5, beta=1)
# Greyscales Image
gray = cv2.cvtColor(contrast, cv2.COLOR_BGR2GRAY)
# Apply threshold to convert to binary image
threshold_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Pass the image through pytesseract
text = pytesseract.image_to_string(threshold_img, lang='jpn')
print(text)
#text = pytesseract.image_to_data(threshold_img, lang='jpn', output_type=Output.DICT)
#cv2.imshow('img', img)
#cv2.imshow('invert', invert)
#cv2.imshow('contrast', contrast)
#cv2.imshow('gray', gray)
cv2.imshow('threshold_img', threshold_img)
cv2.waitKey(0)
cv2.destroyAllWindows()