Syncfusion Feedback

Update OCR to use tesseract 4.0.0

Thread ID: 140843
Created: Nov 9, 2018 10:01 PM UTC
Updated: Nov 13, 2018 09:07 AM UTC
Platform: ASP.NET MVC - EJ 2
Replies: 3
Tags: PDF
Jason Morse
Asked On November 9, 2018 10:01 PM UTC

Tesseract has been updated to 4.0.0 as of October 29, 2018.
https://github.com/tesseract-ocr/tesseract

The Syncfusion OCR library currently uses Tesseract 3.02/3.05. I've found 4.0.0 to be much better for OCR of PDF documents. Is there a work item for upgrading the OCR library to use the newer Tesseract library?
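For anyone who wants to compare the two Tesseract 4 engines outside the Syncfusion library, here is a minimal sketch that drives the standalone `tesseract` command-line tool from Python. It assumes Tesseract 4 is installed and on the PATH, and that the PDF pages have already been rendered to image files (the command-line tool reads images, not PDFs). The helper names `build_ocr_command` and `run_ocr` are made up for illustration; `--oem 0` selects the legacy engine and `--oem 1` the LSTM engine.

```python
import subprocess

# Engine modes accepted by the Tesseract 4 command-line tool via --oem.
OEM_LEGACY = 0  # the original "Tesseract" engine
OEM_LSTM = 1    # the neural-network (LSTM) engine introduced in 4.0


def build_ocr_command(image_path, output_base, oem, lang="eng"):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]


def run_ocr(image_path, output_base, oem):
    """Run tesseract; the recognized text is written to output_base + '.txt'.

    Assumes the tesseract 4 binary is on PATH. The legacy engine only works
    with traineddata files that still contain the legacy model.
    """
    subprocess.run(build_ocr_command(image_path, output_base, oem), check=True)
```

Running `run_ocr("page1.png", "page1_lstm", OEM_LSTM)` and `run_ocr("page1.png", "page1_legacy", OEM_LEGACY)` on the same page image would give two text files that can be compared side by side for recognition quality.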

Dilli Babu Nandha Gopal [Syncfusion]
Replied On November 12, 2018 04:17 PM UTC

Hi Jason, 
 
We have tested the new Tesseract version and found that it performs slower than the older version. Please find the details in the table below.
Tesseract version 4.0 OCR processing time report (Syncfusion Tesseract .dll):
  
Document                Size     Page   Tesseract 4.0    Tesseract 4.0       Tesseract 3.05
                                 count  LSTM engine      Tesseract engine    (OCR time)
                                        (OCR time)       (OCR time)
Input.pdf               1.8 MB   1      00.27.604 sec    00.13.496 sec       00.12.870 sec
Defect_143275.pdf       106 KB   1      00.19.612 sec    00.25.264 sec       00.18.931 sec
DefectID_WF11781.pdf    3.0 MB   8      00.58.832 sec    00.42.624 sec       00.44.559 sec
SpecialCharacters.pdf   15.7 KB  1      00.09.172 sec    00.07.523 sec       00.09.441 sec
DefectID_WF13618_1.pdf  3.6 MB   2      00.23.752 sec    00.37.851 sec       00.34.297 sec
WF25811.pdf             24.8 MB  15     04.59.782 sec    03.41.819 sec       03.50.944 sec
DefectID_WF32606.pdf    4.27 MB  12     01.16.325 sec    01.01.061 sec       01.04.171 sec
Defect_139301.pdf       5.4 MB   32     05.14.944 sec    04.26.121 sec       04.18.701 sec
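The OCR times in this report appear to be written as minutes.seconds.milliseconds (an assumption inferred from the values; for example, 04.59.782 for the 15-page WF25811.pdf corresponds to 299.782 seconds). A small sketch for converting such a value to plain seconds, with `parse_ocr_time` being a hypothetical helper name:

```python
def parse_ocr_time(text):
    """Convert a 'mm.ss.fff sec' string to seconds as a float.

    Assumes the three dot-separated fields are minutes, seconds,
    and milliseconds, which is how the report seems to be laid out.
    """
    value = text.replace("sec", "").strip()
    minutes, seconds, millis = value.split(".")
    return int(minutes) * 60 + int(seconds) + int(millis) / 1000


# Example: the LSTM time reported for WF25811.pdf.
lstm_seconds = parse_ocr_time("04.59.782 sec")
```

Converting to seconds first makes it straightforward to compare runs that cross the one-minute mark.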
 
  
At present, we don't have any immediate plans to provide support for this newer version. We have logged a feature request for it and will let you know once the feature has been implemented.
 
Regards, 
Dilli babu. 
 
 


Jason Morse
Replied On November 12, 2018 07:15 PM UTC

Thank you for the update. Although Tesseract 4.0 is currently slower in general, performance is not my primary concern; recognition quality is. I am more than willing to accept a performance degradation in exchange for much better recognition with the LSTM engine.

Document                Size     Page   Tesseract 4.0, LSTM engine        Tesseract 4.0, Tesseract engine   Tesseract 3.05 (baseline)
                                 count  OCR time (sec)  Perf improvement  OCR time (sec)  Perf improvement  OCR time (sec)
Input.pdf               1.8 MB   1      27.604          -114%             13.496          -5%               12.87
Defect_143275.pdf       106 KB   1      19.612          -4%               25.264          -33%              18.931
DefectID_WF11781.pdf    3.0 MB   8      58.832          -32%              42.624          4%                44.559
SpecialCharacters.pdf   15.7 KB  1      9.172           3%                7.523           20%               9.441
DefectID_WF13618_1.pdf  3.6 MB   2      23.752          31%               37.851          -10%              34.297
WF25811.pdf             24.8 MB  15     299.782         -30%              221.819         4%                230.944
DefectID_WF32606.pdf    4.27 MB  12     76.325          -19%              61.061          5%                64.171
Defect_139301.pdf       5.4 MB   32     314.944         -22%              266.121         -3%               258.701
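
The "Perf improvement" figures above are consistent with (baseline time - engine time) / baseline time, rounded to a whole percent, with Tesseract 3.05 as the baseline. Negative values mean the 4.0 run was slower. A minimal sketch of that calculation, where the function name `perf_improvement` is illustrative only:

```python
def perf_improvement(baseline_sec, candidate_sec):
    """Percent improvement of candidate over baseline, rounded to an integer.

    Positive means the candidate run was faster than the baseline;
    negative means it was slower.
    """
    return round((baseline_sec - candidate_sec) / baseline_sec * 100)


# Input.pdf: Tesseract 3.05 baseline vs. Tesseract 4.0 LSTM engine.
input_lstm = perf_improvement(12.87, 27.604)          # -114
# DefectID_WF13618_1.pdf: the one document where LSTM was clearly faster.
wf13618_lstm = perf_improvement(34.297, 23.752)       # 31
```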

Dilli Babu Nandha Gopal [Syncfusion]
Replied On November 13, 2018 09:07 AM UTC

Hi Jason, 
 
Thank you for your update. 
  
We have considered your request and logged a feature request for it. We will implement this feature in one of our upcoming releases. The implementation will also depend greatly on factors such as product design, code compatibility, and complexity. Please visit our website periodically for feature-related updates.
 
Regards, 
Dilli babu. 
 

