Decoding base64-encoded image data and recognizing its text with OCR

Date: 2024-03-04 12:31:37
1. First, import the base64 module, decode the string back into image bytes, and write them to a file
import base64

img_base64 = \'iVBORw0KGgoAAAANSUhEUgAAAFAAAAAYCAIAAADieO37AAAIVUlEQVR42rWY+08bVxbH+Uu6SZ+rXVXb/tDdSv2hq22ibdKmbaoqaUXarLLtqrtpq2grLZsISiCAARs/8IOxwW97bGxsgwnYYPMyDxuMYxtsYOyExMVAjE0oaVMIyQ897jXTYTw20M1KR2h8uXNnPvec8z3nTsnNe4+fiM2tbkp4Fc7hgV9xL7G2XfhfW7mLzDZMU9RpwvH0oRaPrz+i/ix5UsDIhqamxLzyzuuWQ9CmsyTi+laHYzKe2ckHVirbvJGbACzQqa3Do+6BoIqNF9mjfGAWR0Gu/CSBK4++j0wsKDcaFbH09kFouRhrdnmjTsZuEjdLuHJ/NEmbE4gvy9ukI/5wZPW+ZXiMp5ErhLhF40T/7TzzJjKi8OO4IjyU2Pg/AnvnCLHoG7WcF13+bt+79F0mi1ljE1ZK/nGM++4z3FNHeO880/b5cbukJrK4gubMJDIqtaJvZByuAbtjwCOqlOj1drgmgYU8vq3bGUkyPBFT2yciyWLA9VWsQ+dhepsEhp8+4qZYWi3HGoPxZPEbxyIE962nGE145sXpQAhNiyQ39Liuy9GHgnMquoRVtTbrNCQwoFq7+wQcTj62zjrY751/wsBUD6PoWkhvjd+YVWIcUngKhlzpnySXL4bmYvH1HWLtx2CUkH71LmIWXXg9unI/lnnYircBhslqBUMLQhovpB6QwLv7ksXmc9hUbJvL3+ny7xPS/yMzo9IWsZ5RD5fFHfZFAViONU0Tibk7qZyrTx1ltXJAtKxup9sfAA+Dn8Hbcyv3aWlMCbctmZZvMJtJh4OHddahgsALUAAuXpi+eNp38gXfiecCZRfmxyYOCzzumhGfvlL9wtm6P3xiYRmK3xhd/Z4rrBc0COdXH3jD862ixvDtuwgY+9trQJsN4/iS0qxD8yGfIasht2nAYV/IdeXfXaWnus6/j1d+CRPA4SK+YHTmjlRtZwaOrj7QXfrce+w3NDsUsNvoIa+RdTaZC91VzWEjFTB3ODs6svI77JvCrlxAwCOOTnImAPvjS+gadBvUGzScBA70D5HXyHw4jlYOJzZAqxmAifTDFpbWd+aPCDLgcsPs6HSo+9wx3/y3Bweu+X0pXiaLpbb7tYNopOn1i8WBwcC9fBbfaTXZuFcAVXD2pVG3kzoTQtpKGYH6DFWaxLN/ctojEcXXtqd7+9FI75d/z1XjzE62Gq8/YvCwCR/wHj+KgPlVtVWNvOjqD8M3FhoaOcHFtQMCY2crd5/0CI1UPftBcWCaRJtYl4SNl/U6ObUbmbv7QysuBw2jVgcS2P3Nf8lmA410fXgStJPacjEAT84uD51+NRfJb/0W/+c5Lg8L3s44R/0cfnMkuXkQ4Gn/7eJKxqDVTJWJf+4VI8amTqPS0kQrErmVP8hTNocSawVbS9Sy4RX13hPPk9k78vbvZGVfdw8FLI5BoUS6UFh1STb/YuYgwLH0w7xOcCcwG2/+zyVjzb9y2G8f6VAIiuwUg0qv/eL2Xq9XZlIxAGdbcwkejKX63SGbWO3/7ARVsYzlZRK5SaE3yVQ6aowRqW2GBJYZ5lNbxWkBtQ3Xmx19o2GCtoPo56ANzzn5w5dBwwqdK2i0QNFSL6IOAi0wMwDHMjtiTZtcogvdyrTUa0G9ok7H1Md/RsCTH70GHmYL5FKlVmvqJGnljXp/OEljuyqW8w3d+7p3bvU+0AIzkDO6ercOH5GJGqBW0WjVjdjsYppki++uAElHab82IZ75qmYyk/eEtKbLausZ1Cus83d/zG32zWUEDNUY1WfYfohqq2MQTQBaWa0KMZNs4/OJCpHCPBygAXOq66i9gUounEmk4dXzacGCM3MIWPzRS9CHQDdCa2CG3FPtmD4/nqluF9fwumyucGKdOYd7xn09E5Oej487GmrjK5tw3DFcLkPAE+ffIKfBtoF6gYbRmEm20FLmR
mKjWmoYnUsWAs4G7YTPaFBxm4XYZ3/pVfInJ8ZjqGdMPRjsNEhKX0HAncIq5NJyQe2eTVl/TAWeIpIMMpbc7LK6IMiRNoGk7wH2EQmt3Zrfcnj/+rSBt+ddoT5BlYJaRWUm2VyhbMfviSaBmRyEXgqA7b3d0eUNcp12o9re11/o8ACm+OLkwu6Riw587zE1pFs1+kIyRtLy1boSakFr0StCGMf/den4iee8x58GoXadf3N2crqGzyX2Rh00IWwunxQbqntx93U0aPEEycGKa+zG6jqtViHmXjV1GAPxJCgfxLNcxhtw9fO++ED86RuC957PStR7z4rOvGio/NRtUsb3O1GTYG0a3ZA/Uki0c7uzvAlWsrfE5eQXtFopwRdSW608aWhxjWDKMZq0ol0cJxb5xpac6tx73GJ1qR3js8kNud4m1ZicE9MWh1MmEzazK4TCxkjyHkQ1ODm68j2mUKnbLYzJXMx+jmpIZohnISZbSG/nfzOhWUnx70y9PSNgB3w8SBp0+dfa6sjeAIoTS2F2TsdmkptXa3kzu00LRLW9p7uFfw0i3IArABtQARiwAf5QzBDVijoh/DV0XrcPjKpklQZtk92udXv6vaFg+PZKfG+jss8Xj/CtNXAy7TtYIdp6kX4skqR1QtCB1La1w97rbf06m2vPKSWLfR0TNYBcEz+XDXOPSyBuCd/J0BcvGttIrme/3RBKZXBCBkhvOAjAgG3QcWELqG4v2XdRbaveF7p1kM0G2mt81Vgkyfi6NCcjq6htAFQi/Ut2ODxTbJ5gklhCJxl0u9hgHggS+wY242ctWpCXkO8k0LYPBGP5NwyOBi3GbBfBLW/+1cy5z1d5Tgbg/GkjNxagVsGJpUqinE9lO4JgIoOZbO1uTyGvQEir2dhBvjT8BAQhlOViOuz5AAAAAElFTkSuQmCC\'
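# Decode the base64 text back into the raw image bytes
imgdata = base64.b64decode(img_base64)
# Save the bytes to disk; "with" closes the file automatically.
# The sample data is actually a PNG (it starts with "iVBORw0KGgo"), but Pillow
# detects the format from the file contents, so the .jpg name still works in step 2.
with open('1.jpg', 'wb') as file:
    file.write(imgdata)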
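If the base64 text is copied from a web page, it may carry a data-URI prefix (for example data:image/png;base64,) or stray line breaks, and the real format may not match the .jpg name used above (the sample here is a PNG). A minimal standard-library sketch that handles those cases; the helper names decode_base64_image and save_base64_image and the small signature table are illustrative assumptions, not part of the original post:

```python
import base64
import binascii
from typing import Tuple

# Leading bytes ("magic numbers") of a few common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def decode_base64_image(b64_text: str) -> Tuple[bytes, str]:
    """Decode base64 image text and guess a file extension from its magic bytes."""
    # Strip an optional data-URI header such as "data:image/png;base64,"
    if b64_text.startswith("data:") and "," in b64_text:
        b64_text = b64_text.split(",", 1)[1]
    # Drop spaces/newlines that often sneak in when a long string is copied
    b64_text = "".join(b64_text.split())
    try:
        data = base64.b64decode(b64_text, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            return data, ext
    return data, "bin"  # unknown format: keep the bytes, use a neutral extension


def save_base64_image(b64_text: str, stem: str = "1") -> str:
    """Write the decoded image to '<stem>.<ext>' and return that file name."""
    data, ext = decode_base64_image(b64_text)
    filename = f"{stem}.{ext}"
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

With the sample string above, save_base64_image(img_base64) would write the data to 1.png instead of 1.jpg, because the bytes start with the PNG signature.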

2. Recognizing the text in the image

  First, install the required packages:

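  pip install pillow
  pip install pytesseract

  Pillow is the maintained fork of PIL and is still imported as PIL (from PIL import Image). The old PIL package itself cannot be installed with pip on Python 3, so pip install PIL is not an alternative.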

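  Once the packages are installed, you still need to install the OCR engine itself: pytesseract is only a wrapper that calls the Tesseract executable.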

  Download page: https://github.com/tesseract-ocr/tesseract/wiki/Downloads

  Windows installer (4.00.00dev): http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe

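  After installing, add the Tesseract installation directory to the PATH environment variable, then run tesseract -v in cmd; if it prints the version information, the engine is set up.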
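The same check can be run from Python before calling pytesseract. A minimal sketch, assuming the engine should be reachable on PATH under the name tesseract; the helper name tesseract_version is illustrative. Depending on the build, the version text may go to stdout or stderr, so both are checked:

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of '<cmd> --version', or None if cmd is not on PATH."""
    exe = shutil.which(cmd)  # full path of the executable, or None if not found
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some builds print the version to stdout, others to stderr
    output = result.stdout.strip() or result.stderr.strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH; check the environment variable")
```

A None result usually means the PATH entry is missing or the terminal was opened before the variable was changed.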

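  Next, tell pytesseract where tesseract.exe is. In PyCharm, open the installed pytesseract.py file and set tesseract_cmd to your own installation path:

  tesseract_cmd = 'd://Program Files//Tesseract-OCR//tesseract.exe'

  Alternatively, putting pytesseract.pytesseract.tesseract_cmd = 'd://Program Files//Tesseract-OCR//tesseract.exe' near the top of your own script has the same effect without modifying the library.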

  However, even with the path set, running the recognition still failed with an error.


  After some searching, the fix was to specify the tessdata directory explicitly in the code through a tessdata_dir_config string:

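from PIL import Image
import pytesseract

# Point Tesseract at its language-data folder (the directory containing eng.traineddata)
tessdata_dir_config = '--tessdata-dir "d://Program Files//Tesseract-OCR//tessdata"'
# Recognize the image saved in step 1 using the English model
text = pytesseract.image_to_string(Image.open('1.jpg'), lang='eng', config=tessdata_dir_config)
print(text)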
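Tesseract loads each language model from a file named <lang>.traineddata inside the tessdata directory, so a wrong path only surfaces as an error at OCR time. A small sketch that builds the same config string but checks for the language file first; the helper name build_tessdata_config is hypothetical:

```python
import os


def build_tessdata_config(tessdata_dir: str, lang: str = "eng") -> str:
    """Return a pytesseract config string, failing early if the language file is missing."""
    # Tesseract looks for <lang>.traineddata inside the tessdata directory
    traineddata = os.path.join(tessdata_dir, f"{lang}.traineddata")
    if not os.path.isfile(traineddata):
        raise FileNotFoundError(
            f"{traineddata} not found; check the tessdata path or install the '{lang}' language data"
        )
    # Quote the directory because paths such as "Program Files" contain spaces
    return f'--tessdata-dir "{tessdata_dir}"'
```

For example, config = build_tessdata_config('d://Program Files//Tesseract-OCR//tessdata') can then be passed as config= to pytesseract.image_to_string together with lang='eng'.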

With this config in place, the text in the image is recognized correctly.
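Tesseract generally does better when the characters are a reasonable number of pixels tall, and the sample decoded in step 1 is only 80x24 pixels according to its PNG header. An optional preprocessing sketch, not part of the original method: convert to grayscale, enlarge, and binarize with Pillow before OCR. The helper name preprocess_for_ocr, the scale factor and the threshold are illustrative assumptions to tune per image:

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img: Image.Image, scale: int = 4, threshold: int = 128) -> Image.Image:
    """Grayscale, enlarge and binarize an image before handing it to Tesseract."""
    gray = ImageOps.grayscale(img)  # drop colour information (mode "L")
    # Enlarge so the glyphs are several times taller than in the tiny original
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Pixels brighter than the threshold become white (255), the rest black (0)
    return big.point(lambda p: 255 if p > threshold else 0)
```

The result can be passed in place of the raw image, e.g. pytesseract.image_to_string(preprocess_for_ocr(Image.open('1.jpg')), lang='eng', config=tessdata_dir_config + ' --psm 7'), where --psm 7 tells Tesseract to treat the image as a single line of text.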