pip install "paddleocr>=2.0.1"
Recognizing text in an image
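from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

if __name__ == '__main__':
    # Load the detection, angle-classification and recognition models once
    ocr = PaddleOCR(use_angle_cls=True, lang='ch')
    img_path = 'demo/demo_kie.jpeg'
    # Run detection + recognition; cls=True also runs the text-angle classifier
    result = ocr.ocr(img_path, cls=True)
    # Each line is [box, (text, score)]. Newer PaddleOCR releases nest the
    # lines per image; in that case iterate over result[0] instead.
    for line in result:
        print(line)
    # Draw the boxes, texts and scores onto the image and save it
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='data/chineseocr/labels/font.TTF')
    im_show = Image.fromarray(im_show)
    im_show.save('output/result5.jpg')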
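The lang argument in PaddleOCR(use_angle_cls=True, lang='ch') accepts many languages, for example `ch`, `en`, `fr`, `german`, `korean`, `japan`.
This call performs both text detection and text recognition; the annotated image saved by the script shows a typical result.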
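Depending on the installed PaddleOCR version, the list returned by ocr.ocr is either a flat list of [box, (text, score)] lines or a list holding one such list per image. The helper below is a minimal sketch for handling both shapes before drawing; flatten_ocr_result and _is_line are hypothetical names introduced here, not part of PaddleOCR.

```python
def _is_line(item):
    """Return True if item looks like one OCR line: [box, (text, score)]."""
    return (
        isinstance(item, (list, tuple))
        and len(item) == 2
        and isinstance(item[1], (list, tuple))
        and len(item[1]) == 2
        and isinstance(item[1][0], str)
    )


def flatten_ocr_result(result):
    """Split an OCR result into parallel lists of boxes, texts and scores.

    Accepts both the flat shape (a list of lines) and the nested shape
    (a list with one list of lines per image). Entries that are None,
    which can stand for an image without detected text, are skipped.
    """
    lines = []
    for item in result or []:
        if item is None:
            continue
        if _is_line(item):
            # Flat shape: the item is already a single line
            lines.append(item)
        else:
            # Nested shape: the item is the list of lines for one image
            lines.extend(entry for entry in item if _is_line(entry))
    boxes = [line[0] for line in lines]
    txts = [line[1][0] for line in lines]
    scores = [line[1][1] for line in lines]
    return boxes, txts, scores
```

The three returned lists can be passed to draw_ocr in place of the list comprehensions used in the script.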
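If the image contains only a simple piece of text, such as the cropped word in demo/demo_text_recog.jpg,
we only need recognition and can skip detection.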
from paddleocr import PaddleOCR

if __name__ == '__main__':
    ocr = PaddleOCR(use_angle_cls=True, lang='en')
    img_path = 'demo/demo_text_recog.jpg'
    # det=False skips detection and runs recognition on the whole image
    result = ocr.ocr(img_path, cls=True, det=False)
    for line in result:
        print(line)
Output (partial):
('STAR', 0.8838256597518921)
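Each recognition-only entry pairs the recognized text with a confidence score, so low-confidence predictions can be filtered out. The following is a small sketch under that assumption; confident_texts and the 0.5 threshold are illustrative choices, not PaddleOCR defaults.

```python
def confident_texts(result, min_score=0.5):
    """Return the recognized texts whose score is at least min_score.

    Accepts a flat list of (text, score) pairs as well as a list that
    holds one list of pairs per image.
    """
    kept = []
    for item in result or []:
        if not item:
            # Skip None or empty entries
            continue
        # A bare (text, score) pair starts with a string; otherwise the
        # item is treated as a list of pairs for one image
        pairs = [item] if isinstance(item[0], str) else item
        for text, score in pairs:
            if score >= min_score:
                kept.append(text)
    return kept
```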
The PaddleOCR framework can be downloaded from its GitHub repository, PaddlePaddle/PaddleOCR, a multilingual OCR toolkit based on PaddlePaddle (a practical ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices).
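We again use the Kaggle captcha text recognition dataset as the example. PaddleOCR's dataset format differs slightly from MMOCR's: the training images and the test images must be placed in two separate folders, each with its own label file.
Since all the images were previously stored together, we write a script to separate them.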
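import os
import shutil

if __name__ == '__main__':
    # The test directory must exist before files can be moved into it
    os.makedirs('data/toy_dataset/test', exist_ok=True)
    with open('data/toy_dataset/test_label.txt', 'r') as f:
        for line in f:
            # Each label line is "<filename> <text>"; keep only the file name
            filename = line.split(' ')[0]
            shutil.move('data/toy_dataset/train/' + filename, 'data/toy_dataset/test/' + filename)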
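After moving the files, it can be worth checking that every image named in a label file actually sits in the matching folder. This is a minimal sketch; missing_images is a hypothetical helper, and it assumes file names contain no whitespace (true for the captcha names used here).

```python
import os


def missing_images(label_file, image_dir):
    """Return the file names listed in label_file that are absent from image_dir."""
    missing = []
    with open(label_file, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The file name is the first field, whether the separator
            # is a space or a tab
            filename = line.split()[0]
            if not os.path.isfile(os.path.join(image_dir, filename)):
                missing.append(filename)
    return missing
```

For example, missing_images('data/toy_dataset/test_label.txt', 'data/toy_dataset/test') should return an empty list once the split has succeeded.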
In addition, the fields in its label file are separated by a tab (\t), whereas MMOCR separates them with a space.
2wc38.png 2wc38
y5n6d.png y5n6d
men4f.png men4f
57b27.png 57b27
x3deb.png x3deb
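If the existing label files still use the space-separated MMOCR layout, they can be converted to tab-separated files with a few lines of Python. The sketch below assumes each line has the form "<filename> <label>"; to_tab_separated is a hypothetical helper name.

```python
def to_tab_separated(src_path, dst_path):
    """Copy a label file, replacing the first space on each line with a tab.

    Lines that already contain a tab are copied unchanged. Returns the
    number of label lines written.
    """
    count = 0
    with open(src_path, 'r', encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            line = line.rstrip('\r\n')
            if not line.strip():
                continue
            if '\t' not in line:
                # Split only once so labels containing spaces stay intact
                filename, label = line.split(' ', 1)
                line = filename + '\t' + label
            dst.write(line + '\n')
            count += 1
    return count
```

Writing to a new path rather than overwriting keeps the original MMOCR label file usable.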
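Edit the configs/rec/rec_icdar15_train.yml file under the PaddleOCR root directory. This is only one of the available recognition configs, and we use it as the example. The modified parts are as follows: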
Train:
  dataset:
    name: SimpleDataSet
    # data_dir: ./train_data/ic15_data/
    data_dir: ./data/toy_dataset/train/
    # label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    label_file_list: ["./data/toy_dataset/train_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100] # for Chinese: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
    use_shared_memory: False
Eval:
  dataset:
    name: SimpleDataSet
    # data_dir: ./train_data/ic15_data
    data_dir: ./data/toy_dataset/test/
    # label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
    label_file_list: ["./data/toy_dataset/test_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 4
    use_shared_memory: False
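The image_shape entry is [channels, height, width]. As far as I understand RecResizeImg for non-Chinese models, an image is scaled to the target height with its aspect ratio preserved, and the width is capped at the target width (narrower results are padded). The sketch below only reproduces that width arithmetic; rec_resized_width is a hypothetical name, and the real transform should be checked in the PaddleOCR source.

```python
import math


def rec_resized_width(img_h, img_w, target_h=32, target_w=100):
    """Width of an image after aspect-preserving resize to target_h,
    capped at target_w (assumed behaviour of the padded resize)."""
    ratio = img_w / float(img_h)
    return min(target_w, int(math.ceil(target_h * ratio)))


if __name__ == '__main__':
    # A 200x50 (width x height) image exceeds the cap at height 32
    print(rec_resized_width(50, 200))
    # A 64x32 image keeps its width and would be padded up to 100
    print(rec_resized_width(32, 64))
```

Under this assumption, images much wider than 100 pixels at height 32 get squeezed horizontally, which is one reason the Chinese config uses a width of 320.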
Copy train.py from the tools folder into the PaddleOCR root folder and add the argument
--config=configs/rec/rec_icdar15_train.yml
Run it to start training.
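As an alternative to copying the script, training can also be launched from the PaddleOCR root by calling tools/train.py directly. PaddleOCR's training script accepts -c for the config file and -o for key=value overrides of config entries. The sketch below only assembles such a command; build_train_command is a hypothetical helper, and the override shown in the usage note is an illustrative assumption.

```python
import sys


def build_train_command(config_path, overrides=None, script='tools/train.py'):
    """Build the argument list for launching PaddleOCR training.

    overrides maps dotted config keys to values and is passed after -o.
    """
    cmd = [sys.executable, script, '-c', config_path]
    if overrides:
        cmd.append('-o')
        cmd.extend('{}={}'.format(key, value) for key, value in overrides.items())
    return cmd
```

For example, build_train_command('configs/rec/rec_icdar15_train.yml', {'Train.loader.batch_size_per_card': 64}) yields a list that can be passed to subprocess.run(cmd, check=True) from the PaddleOCR root directory.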