OCR could not extract text on PDF

Hi,


I have the following questions.


I selected an area on the PDF viewer on the client side (Angular) using a rectangle annotation and got the coordinates of the selected area. I then applied the same coordinates to extract text from the PDF using the OCR .NET Core library.


I have attached a screenshot of the area where we need to perform OCR (bottom right), along with the sample code. The PDF is included inside the sample.


1) I could not extract text from the PDF using the code below:


https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#performing-ocr-for-a-region


In the sample code, the controller action PerformOCRPDF() did not extract any text.


I also tried a different approach; please check the PerformOCRUsingPDFImage() method. It extracts text but sometimes returns the wrong text.


Please tell me how to extract the text.


2) Where can I get the libraries for Tesseract version 4.0 in .NET Core?


https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-framework#performing-ocr-with-tesseract-version-40




3) On the client side, when I add a rectangle to select an area, I cannot change the cursor type to crosshair for the pdf-viewer control in Angular. Can you give a sample showing how to change the cursor type on the pdf-viewer?




Thanks


Dayakar




Attachment: OCRCoreSample_bf6e5259.zip

7 Replies

GK Gowthamraj Kumar Syncfusion Team February 14, 2022 04:06 PM UTC

Hi Dayakar,

I could not extract text from the PDF using the code below.


In the sample code, the controller action PerformOCRPDF() did not extract any text.
We are currently analyzing this sample and will update you with further details on February 16th, 2022.

Where can I get the libraries for Tesseract version 4.0 in .NET Core?
 
 
You can get the TesseractBinaries and tessdata folders from the OCR Processor download or from the installed location of the Syncfusion.PDF.OCR.Net.Core NuGet package. Please refer to the following example folder paths.
 
TesseractBinaries:
syncfusionocrprocessor\Tesseractbinaries_core (or)
C:\Users\username\.nuget\packages\Syncfusion.PDF.OCR.Net.Core\XX.X.X.XX\lib\TesseractBinaries
tessdata:
syncfusionocrprocessor\tessdata (or)
C:\Users\username\.nuget\packages\Syncfusion.PDF.OCR.Net.Core\XX.X.X.XX\lib\tessdata
 
 
 
 
On the client side, when I add a rectangle to select an area, I cannot change the cursor type to crosshair for the pdf-viewer control in Angular. Can you give a sample showing how to change the cursor type on the pdf-viewer?

You can change the cursor type shown while resizing an annotation by using the resizerCursorType property. We have shared a sample and code snippet for your reference.
  
  
Code snippet:  
  
<ejs-pdfviewer  
      id="pdfViewer"  
      [serviceUrl]="service"  
      [documentPath]="document"  
      [rectangleSettings]="rectangleSettings"  
      style="height:640px;display:block"  
    ></ejs-pdfviewer>  
  
  
public rectangleSettings: any = {  
    annotationSelectorSettings: {  
      resizerCursorType: 'crossHair',  
    },  
  };  
  
  
  

Regards, 
Gowthamraj K 



DR Dayakar Reddy replied to Gowthamraj Kumar February 15, 2022 06:54 AM UTC

Hi, thank you for the reply.


For the 3rd question: I have to draw an annotation (e.g., circle, square) by selecting it from a custom toolbox. When I select the annotation from the toolbox and start to draw, I need to show a crosshair cursor on the PDF viewer.


Thanks,

Dayakar



GK Gowthamraj Kumar Syncfusion Team February 15, 2022 12:36 PM UTC

Hi Dayakar, 
 
For the 3rd question: I have to draw an annotation (e.g., circle, square) by selecting it from a custom toolbox. When I select the annotation from the toolbox and start to draw, I need to show a crosshair cursor on the PDF viewer.

The Syncfusion PDF Viewer shows a crosshair cursor while drawing rectangle and circle annotations. We have shared a video for your reference.


Could you please try this and get back to us with a screenshot of your exact requirement? It will help us investigate further and provide a solution at the earliest.
 
 
Regards, 
Gowthamraj K 



GK Gowthamraj Kumar Syncfusion Team February 16, 2022 02:00 PM UTC

Hi Dayakar, 

I could not extract text from the PDF using the code below.


In the sample code, the controller action PerformOCRPDF() did not extract any text.

We were able to reproduce the reported issue with the provided sample on our end. We are currently analyzing it and will update you with further details on February 18th, 2022.


 
Regards, 
Gowthamraj K 



GK Gowthamraj Kumar Syncfusion Team February 18, 2022 12:43 PM UTC

Hi Dayakar,
 
1) I could not extract text from the PDF using the code below.
In the sample code, the controller action PerformOCRPDF() did not extract any text.

We have checked the reported issue on our end. The provided PDF document does not contain any images on which to perform OCR. The OCR process returns text only when the PDF document contains a scanned image; if it does not, the result is empty. That is why the resultant text for the provided document is empty.

I also tried a different approach; please check the PerformOCRUsingPDFImage() method. It extracts text but sometimes returns the wrong text.
We have checked the PerformOCRUsingPDFImage() method; the quality of the cloned image in it is very low. If the input image is not of sufficient quality, OCR returns empty or incorrect characters. You can check the quality of the cloned image by saving it. This is the expected behavior of the OCR processor.
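
As a rough illustration of why render resolution affects the result: a PDF point is 1/72 inch, so a page image rendered at a given DPI has dpi/72 pixels per point, and a small region contains few pixels at a low DPI. The sketch below is only arithmetic. The Rect type and the regionToPixels helper are hypothetical names introduced for this example and are not part of any Syncfusion API.

```typescript
// Hypothetical helper types for illustration; not Syncfusion types.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// One PDF point is 1/72 of an inch.
const POINTS_PER_INCH = 72;

// Map a region given in PDF points onto the pixel grid of a page
// image rendered at `dpi` dots per inch.
function regionToPixels(region: Rect, dpi: number): Rect {
  if (dpi <= 0) {
    throw new RangeError("dpi must be positive");
  }
  const toPx = (v: number): number => Math.round((v * dpi) / POINTS_PER_INCH);
  return {
    x: toPx(region.x),
    y: toPx(region.y),
    width: toPx(region.width),
    height: toPx(region.height),
  };
}

// The same 144 x 18 point region at two render resolutions.
const region: Rect = { x: 72, y: 36, width: 144, height: 18 };
const lowRes = regionToPixels(region, 96);
const highRes = regionToPixels(region, 300);
console.log(lowRes.width, lowRes.height, highRes.width, highRes.height);
```

Saving the cropped image and comparing its pixel size against numbers like these may help confirm whether the region reaches the OCR engine at a usable resolution.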
 
Regards, 
Gowthamraj K 



DR Dayakar Reddy replied to Gowthamraj Kumar February 21, 2022 08:10 AM UTC

Hi Gowthamraj,


For the 1st question, the PDF provided for PerformOCRPDF() contains only scanned images.

And for PerformOCRUsingPDFImage(), do you have a sample showing how to increase the image quality of PDF images for OCR extraction?


For the 3rd question, I have attached a sample video. I need to select the drawing number or revision number from the list view control on the right side, and then select the area where the drawing number is located.

So, in the PDF viewer's page click event, I add a rectangle annotation to the PDF to get the coordinates of the drawing number; please see the code below for reference. My question is: how can I show a crosshair cursor on the PDF viewer when I select the drawing number from the list view control, so that I can draw the rectangle on the PDF viewer to get the coordinates?


 this.pdfViewer.importAnnotation({
      pdfAnnotation: {
        [pageIndex]: {
          shapeAnnotation: [{
            ShapeAnnotationType: "Rectangle",
            StrokeColor: "rgba(255,0,0,1)",
            FillColor: "rgba(255,255,255,0)",
            Opacity: 1,
            Bounds: {
              X: data.x,
              Y: data.y,
              Width: data.width,
              Height: data.height
            },
            Thickness: 1,
            BorderStyle: "Solid",
            BorderDashArray: 0,
            RotateAngle: "RotateAngle0",
            AnnotName: data.id,
            AnnotType: "shape"
          }]
        }
      }
    });

Attachment: 20220221082827_87cc2f77.zip


GK Gowthamraj Kumar Syncfusion Team February 21, 2022 01:44 PM UTC

Hi Dayakar,

 
For the 1st question
As we said earlier, we internally use the Google Tesseract engine to OCR the images. We tested this issue directly in the Tesseract engine from the command prompt (CMD), and the Tesseract engine itself does not return the characters properly. We have already tried exporting the image at a high DPI, but the image quality is still poor, so we are unable to proceed further on this issue.

For the 3rd question
We have checked the code snippet you shared. You import the annotations in the pageClick event. When annotations are imported directly, the Syncfusion PDF Viewer does not show the crosshair cursor for selecting the rectangle annotation. So, we suggest adding the rectangle annotation using the setAnnotationMode method, which gives the crosshair cursor while the rectangle annotation is being drawn. We have shared a sample and code snippet for your reference.
  
  
Code snippet:  
  
  
 drawingNumber() {  
    var viewer = (<any>document.getElementById('pdfViewer')).ej2_instances[0];  
    viewer.annotationModule.setAnnotationMode('Rectangle');  
  }  
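
Once a rectangle has been drawn this way, its bounds may still need converting before they are sent to the server-side OCR call. The following is a minimal sketch under stated assumptions: the viewer reports bounds in CSS pixels (96 per inch) measured from the top-left of the page, and the server expects PDF points (72 per inch) with the same origin. The ViewerRect and PointRect shapes and the viewerRectToPoints helper are hypothetical names for this example, not Syncfusion types, so please verify the actual units and property names against your viewer version.

```typescript
// Hypothetical shapes for illustration; these are not Syncfusion types.
interface ViewerRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface PointRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const CSS_PIXELS_PER_INCH = 96;
const POINTS_PER_INCH = 72;

// Convert bounds measured in CSS pixels at a given zoom factor into
// PDF points. Assumption: both coordinate systems share a top-left origin.
function viewerRectToPoints(rect: ViewerRect, zoom: number = 1): PointRect {
  if (zoom <= 0) {
    throw new RangeError("zoom must be positive");
  }
  const scale = POINTS_PER_INCH / (CSS_PIXELS_PER_INCH * zoom);
  return {
    x: rect.left * scale,
    y: rect.top * scale,
    width: rect.width * scale,
    height: rect.height * scale,
  };
}

// Example: bounds captured at 100% zoom.
const drawn: ViewerRect = { left: 96, top: 192, width: 48, height: 24 };
const ocrRegion = viewerRectToPoints(drawn);
console.log(ocrRegion.x, ocrRegion.y, ocrRegion.width, ocrRegion.height);
```

If the extracted text comes from the wrong part of the page, a unit mismatch of this kind is one possible cause worth ruling out.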
  
  
 
 

Regards,
 
Gowthamraj K 

