Plug in an external OCR engine instead of the built-in OCR

Question

I am evaluating Syncfusion for creating searchable PDFs.
Is it possible to plug in an external OCR engine instead of the built-in Tesseract OCR library?

Use case: When a user uploads non-searchable TIFF/PDF files, the system performs OCR using Azure and exports a searchable PDF built from the Azure OCR output JSON object.

Appreciate your suggestions.
Thank You,
Amit

Gowthamraj Kumar · Answer

Hi Amit, 
 
We are currently analyzing your requirement and will update you with further details by February 21st, 2022.
 
Regards, 
Gowthamraj K

Gowthamraj Kumar · Answer

Hi Amit, 
We are still analyzing the possibilities for achieving your requirement, with high priority, and will update you with further details on February 23rd, 2022.
Regards, 
Gowthamraj K

Gowthamraj Kumar · Answer

Hi Amit, 
On further analysis, we are able to OCR the PDF document by using the Azure Computer Vision API. To do this, we extract the images from each document page, get the OCR result for each image using the Azure Computer Vision API, and then draw that result onto the page graphics. We have created a simple proof-of-concept sample for this requirement. Please find the sample at the download link below:
 
Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/AzureOCRVisionAPI_Sample1301096197  
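
For readers who want a feel for the "draw the result onto the page graphics" step before opening the sample, here is a minimal sketch in Python. It is not taken from the linked sample. It assumes the OCR output for one page looks like an Azure Read API page result: a dict with "width", "height", and a "lines" list whose entries carry "text" and an eight-number "boundingBox" (x1, y1, ... x4, y4) in image units. It also assumes the PDF page uses a top-left origin. The names PlacedText and place_lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class PlacedText:
    """One OCR line positioned in PDF points (hypothetical helper type)."""
    text: str
    x: float       # left edge, measured from the left of the page
    y: float       # top edge, measured from the top of the page
    width: float
    height: float


def place_lines(read_result: Dict[str, Any],
                page_width_pt: float,
                page_height_pt: float) -> List[PlacedText]:
    """Scale OCR line boxes from image units to PDF page points.

    Assumes read_result has "width", "height" and "lines", and that each
    line has "text" and "boundingBox" = [x1, y1, x2, y2, x3, y3, x4, y4].
    """
    sx = page_width_pt / read_result["width"]
    sy = page_height_pt / read_result["height"]
    placed: List[PlacedText] = []
    for line in read_result.get("lines", []):
        box = line["boundingBox"]
        xs, ys = box[0::2], box[1::2]   # split into x and y coordinates
        left, top = min(xs), min(ys)    # axis-aligned box around the quad
        placed.append(PlacedText(
            text=line["text"],
            x=left * sx,
            y=top * sy,
            width=(max(xs) - left) * sx,
            height=(max(ys) - top) * sy,
        ))
    return placed


if __name__ == "__main__":
    # Made-up page result for illustration only.
    sample = {
        "width": 1000, "height": 2000, "unit": "pixel",
        "lines": [{"boundingBox": [100, 200, 500, 200, 500, 260, 100, 260],
                   "text": "Invoice 42"}],
    }
    for item in place_lines(sample, 500.0, 1000.0):
        print(item)
```

Each PlacedText could then be drawn as invisible or transparent text at (x, y) on the page so the PDF becomes selectable and searchable. If the PDF library uses a bottom-left origin, the y value would need to be flipped as page_height_pt - y - height.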
 
We have logged a feature request for "Add support to include other engines instead of Tesseract in OCRProcessor". In this implementation, we will internally extract the images from the PDF document and return them. You can then perform OCR on those images by using the Azure Computer Vision API and get the result. Finally, you pass the OCR result back, and we internally draw it onto the page graphics and return the PDF document with selectable text. We plan to include this support in our upcoming Volume 2 main release, which is tentatively expected in June 2022.
 
Please find the feedback link to track the implementation of the feature below. 
https://www.syncfusion.com/feedback/32293/add-support-to-include-other-engines-than-tessarct-in-ocrprocessor 
 
Regards, 
Gowthamraj K