
Q: tess-two OCR not decoding correctly

Stack Overflow user

Asked 2015-05-14 23:07:51

1 answer, 643 views, 0 followers, 0 votes

I followed a tutorial to install Tesseract, specifically tess-two and eyes-two, as part of my Android app.

It runs, but the OCR text returned from baseApi.getUTF8Text() is complete gibberish.

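import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.Matrix;
import android.media.ExifInterface;
import android.util.Log;
import com.googlecode.tesseract.android.TessBaseAPI;
import java.io.IOException;

// Decode the photo at 1/4 of its width and height to save memory.
BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 4;
Bitmap bmp = BitmapFactory.decodeFile(path, options);
receipt.setImageBitmap(bmp);

try {
    // Map the camera's EXIF orientation tag to a rotation angle.
    ExifInterface exif = new ExifInterface(path);
    int exifOrientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);
    int rotate = 0;
    switch (exifOrientation) {
        case ExifInterface.ORIENTATION_ROTATE_90:    rotate =  90;    break;
        case ExifInterface.ORIENTATION_ROTATE_180:   rotate = 180;    break;
        case ExifInterface.ORIENTATION_ROTATE_270:   rotate = 270;    break;
    }
    // Rotate the bitmap upright so the text lines run horizontally.
    if (rotate != 0) {
        int w = bmp.getWidth();
        int h = bmp.getHeight();
        Matrix matrix = new Matrix();
        matrix.preRotate(rotate);
        bmp = Bitmap.createBitmap(bmp, 0, 0, w, h, matrix, false);
    }

    // Make a mutable ARGB_8888 copy, the bitmap config tess-two works with.
    bmp = bmp.copy(Bitmap.Config.ARGB_8888, true);

    // DATA_PATH is the parent directory of tessdata/, which must contain eng.traineddata.
    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.init(DATA_PATH, "eng");
    baseApi.setImage(bmp);
    String OCRText = baseApi.getUTF8Text();
    baseApi.end();  // release the native Tesseract resources

    Log.i("OCR Text", "rotate  " + rotate);
    Log.i("OCR Text", "OCR   ");
    Log.i("OCR Text",  OCRText);
    Log.i("OCR Text", "=======================================================================================");
} catch (IOException e) {
    // new ExifInterface(path) throws IOException if the file cannot be read.
    Log.e("OCR Text", "Could not read EXIF data from " + path, e);
}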
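The line options.inSampleSize = 4; decides how much BitmapFactory shrinks the photo before Tesseract sees it. According to the Android BitmapFactory.Options documentation, a value of 4 returns an image 1/4 the width and height of the original (1/16 of the pixels), any value <= 1 behaves like 1, and other values are rounded down to a power of two. The plain-Java sketch below reproduces that arithmetic without Android. The class and method names are hypothetical, the 3264x2448 camera resolution is an assumed example rather than anything from the question, and the integer division only approximates the decoder's exact rounding.

```java
/** Hypothetical, Android-free sketch of the documented inSampleSize arithmetic. */
public final class SampleSizeMath {

    private SampleSizeMath() {}

    /** Values <= 1 act as 1; other values round down to the nearest power of two. */
    public static int effectiveSampleSize(int requested) {
        if (requested <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requested);
    }

    /** Approximate decoded size of one dimension; the real decoder may round differently. */
    public static int decodedDimension(int original, int requested) {
        return original / effectiveSampleSize(requested);
    }

    public static void main(String[] args) {
        int width = 3264;   // assumed camera resolution, not taken from the question
        int height = 2448;
        int sample = 4;     // the value used in the question's code
        int w = decodedDimension(width, sample);
        int h = decodedDimension(height, sample);
        long pixelRatio = ((long) width * height) / ((long) w * h);
        System.out.println(w + "x" + h + ", about 1/" + pixelRatio + " of the pixels");
    }
}
```

For the assumed photo this prints 816x612, about 1/16 of the pixels, so every character on the check reaches Tesseract at a quarter of the height it had in the camera image.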

Taking a photo of a check that contains OCR characters returns:

05-14 11:01:59.131: I/OCR Text(18199): rotate  90
05-14 11:01:59.131: I/OCR Text(18199): OCR   
05-14 11:01:59.131: I/OCR Text(18199): 4— ‘ ‘
05-14 11:01:59.131: I/OCR Text(18199): \Dxﬁ ‘
05-14 11:01:59.131: I/OCR Text(18199): I W man"! no Accounv
05-14 11:01:59.131: I/OCR Text(18199): 1’
05-14 11:01:59.131: I/OCR Text(18199): my... «unblm m. mm.
05-14 11:01:59.131: I/OCR Text(18199): :~A
05-14 11:01:59.131: I/OCR Text(18199): «Ln.
05-14 11:01:59.131: I/OCR Text(18199): ‘ “w “IN. N I “H‘M‘
05-14 11:01:59.131: I/OCR Text(18199): mmnwnmw- .; k. '
05-14 11:01:59.131: I/OCR Text(18199): Wilt-run”. uni” nl
05-14 11:01:59.131: I/OCR Text(18199): mam. I
05-14 11:01:59.131: I/OCR Text(18199): =======================================================================================
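The first log line, rotate  90, is produced by the switch on the EXIF orientation tag. ExifInterface's ORIENTATION_* constants use the standard EXIF codes (1 = normal, 3 = rotate 180, 6 = rotate 90, 8 = rotate 270), so this photo must have carried orientation 6. The sketch below mirrors that switch in plain Java so it runs without Android; the class name is hypothetical and the codes are inlined as constants on the assumption that they match ExifInterface's values.

```java
/** Hypothetical, Android-free mirror of the question's EXIF switch. */
public final class ExifRotation {

    // Standard EXIF orientation codes (same values as ExifInterface.ORIENTATION_*).
    static final int ORIENTATION_NORMAL = 1;
    static final int ORIENTATION_ROTATE_180 = 3;
    static final int ORIENTATION_ROTATE_90 = 6;
    static final int ORIENTATION_ROTATE_270 = 8;

    private ExifRotation() {}

    /** Degrees needed to make the image upright; mirrored codes (2, 4, 5, 7) give 0, as in the question. */
    public static int rotationDegrees(int exifOrientation) {
        switch (exifOrientation) {
            case ORIENTATION_ROTATE_90:  return 90;
            case ORIENTATION_ROTATE_180: return 180;
            case ORIENTATION_ROTATE_270: return 270;
            default:                     return 0;
        }
    }

    /** A 90 or 270 degree rotation swaps the bitmap's width and height. */
    public static boolean swapsDimensions(int degrees) {
        return degrees % 180 != 0;
    }

    public static void main(String[] args) {
        int[] codes = {ORIENTATION_NORMAL, ORIENTATION_ROTATE_90, ORIENTATION_ROTATE_180, ORIENTATION_ROTATE_270};
        for (int code : codes) {
            int degrees = rotationDegrees(code);
            System.out.println("EXIF " + code + " -> rotate " + degrees
                    + (swapsDimensions(degrees) ? " (width/height swap)" : ""));
        }
    }
}
```

Bitmap.createBitmap handles the width/height swap itself when given the rotation matrix; swapsDimensions is only there to make the effect explicit.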

Any suggestions on how to clean up and improve the OCR recognition? The device is a Samsung Galaxy 7".

Tags: android, tesseract, tess-two

1 answer

Stack Overflow user

Answered 2015-05-15 01:23:12

You could use something like:

OCRText = OCRText.replaceAll("[^a-zA-Z0-9]+", " ");
OCRText = OCRText.trim();

This is based on a Tesseract implementation I found here: SimpleAndroidOCRActivity.java
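A self-contained sketch of what those two lines do, run on a few lines copied from the question's log. The class and method names are hypothetical. The pattern treats newlines as non-alphanumeric too, so multi-line output collapses into a single line, and it only deletes characters: a misread word such as Accounv stays misread.

```java
/** Hypothetical wrapper around the answer's two-line cleanup. */
public final class OcrTextCleaner {

    private OcrTextCleaner() {}

    /** Replaces each run of characters outside a-z, A-Z, 0-9 with one space, then trims. */
    public static String clean(String ocrText) {
        String cleaned = ocrText.replaceAll("[^a-zA-Z0-9]+", " ");
        return cleaned.trim();
    }

    public static void main(String[] args) {
        // Sample lines taken from the gibberish in the question's log.
        String[] samples = {
            "I W man\"! no Accounv",
            "my... \u00abunblm m. mm.",
            "mam. I"
        };
        for (String s : samples) {
            System.out.println("[" + clean(s) + "]");
        }
    }
}
```

The three samples print as [I W man no Accounv], [my unblm m mm] and [mam I], which shows the filter tidies the output but does not recover the original text.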

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/30240780
