Why is text extracted with (lots of) random line breaks here and there, like this:
(B)
Compress ed Air (kgf/cm 2 )
whereas opening the PDF with Adobe Acrobat, selecting the text, and copy-pasting it here gives no line breaks:
(B) Compressed Air (kgf/cm2)
|
We use the Tesseract engine to perform OCR on PDF documents on our end. Tesseract processes the document word by word, so the output depends on how the content is preserved in the PDF. Because of this, the extracted text breaks at seemingly random places, and this is the expected behavior.
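If the stray line breaks are a problem downstream, one possible workaround is to post-process the extracted text. The following is only a minimal sketch in Python, not part of the OCR library discussed here; the function name `collapse_line_breaks` is hypothetical. It collapses every run of whitespace (including line breaks) into a single space. It does not rejoin words that OCR has split, such as "Compress ed".

```python
import re


def collapse_line_breaks(ocr_text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", ocr_text).strip()


# Sample taken from the question above: the label is broken across lines.
fragmented = "(B)\nCompress ed Air (kgf/cm 2 )"
print(collapse_line_breaks(fragmented))  # (B) Compress ed Air (kgf/cm 2 )
```

This is only suitable when the original line structure does not matter, since every line break in the text is removed, including intentional ones.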
Please let us know if you have any concerns about this.
|
Is it possible to OCR multiple languages in one go? Currently it accepts only a single language.
|
We can perform OCR with multiple languages at once using the code snippet below.
Note: Make sure to include the language data file for each language in the Tessdata folder.
Please download the language data files from the link below.
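As an illustration of the note above, here is a minimal Python sketch. It is not the .NET API this thread is about, and both helper names are hypothetical. It relies on two Tesseract conventions: several languages can be requested at once by joining their codes with `+` (for example `-l eng+deu` on the Tesseract command line), and each language needs a `<code>.traineddata` file in the tessdata folder. The sketch builds the combined language string and reports which data files are missing before OCR is attempted.

```python
from pathlib import Path


def build_language_argument(languages):
    """Join language codes with '+', the form Tesseract accepts (e.g. 'eng+deu')."""
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def missing_language_files(tessdata_dir, languages):
    """Return the codes whose '<code>.traineddata' file is absent from tessdata_dir."""
    folder = Path(tessdata_dir)
    return [
        code for code in languages
        if not (folder / f"{code}.traineddata").is_file()
    ]


languages = ["eng", "deu"]
print(build_language_argument(languages))  # eng+deu

# "tessdata" is an assumed folder name; point this at your own Tessdata folder.
missing = missing_language_files("tessdata", languages)
if missing:
    print("Missing language data files for:", ", ".join(missing))
```

Checking for the data files up front gives a clearer error than letting the OCR call fail on a language it cannot load.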
|