I want to extract text from a PDF page / TIFF image using the PDF Viewer.
The text needs to be extracted from the PDF based on coordinates.
Currently I am using the code below for extraction, but the text extracted for the given bounds is not accurate.
public void ExtractTextfromPDF(double X, double Y, double Width, double Height, int PageIndex)
{
    string docPath = Path.GetFullPath("wwwroot/Data/Input.pdf");
    // Initialize the OCR processor.
    using (OCRProcessor processor = new OCRProcessor())
    {
        FileStream fileStream = new FileStream(docPath, FileMode.Open, FileAccess.Read);
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);
        processor.Settings.Language = "dan";
        RectangleF rectangle = new RectangleF((float)(X), (float)(Y), (float)(Width), (float)(Height)); // X, Y, width, height
        // Assign rectangles to the page.
        List<PageRegion> pageRegions = new List<PageRegion>();
        PageRegion region = new PageRegion();
        region.PageIndex = PageIndex;
        region.PageRegions = new RectangleF[] { rectangle };
        pageRegions.Add(region);
        processor.Settings.Regions = pageRegions;
        string extracttext = processor.PerformOCR(loadedDocument, @"wwwroot/Data/TessData/");
        loadedDocument.Close(true);
    }
}
Could you please guide me on how to achieve this with the Syncfusion PDF Viewer? Any examples or references would be greatly appreciated.
Hi Nirmal Chandran,
Thank you for reaching out to us. Below, we have provided a sample for extracting text from specific coordinates obtained from the rectangle annotation. In the sample, you can add the rectangle using the "Select Area" button and then extract the text from that area by clicking the "Extract Text" button.
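If the extracted region still looks offset, one possible cause (an assumption, not something confirmed for your document) is a unit mismatch: bounds taken from a viewer are often in pixels at 96 DPI, possibly scaled by the zoom factor, while PDF page coordinates are in points (72 per inch). A minimal sketch of such a conversion follows; `ViewerUnits`, `PixelsToPoints`, and the `zoom` parameter are hypothetical names used only for illustration and are not part of the Syncfusion API.

```csharp
using System;

public static class ViewerUnits
{
    // Assumption: viewer bounds are in pixels at 96 DPI,
    // and PDF page coordinates are in points (72 per inch).
    private const double PointsPerPixel = 72.0 / 96.0;

    // Converts a rectangle from viewer pixels to PDF points.
    // 'zoom' is the viewer zoom factor (1.0 = 100%); it is assumed
    // that the pixel bounds were measured at that zoom level.
    public static (float X, float Y, float Width, float Height) PixelsToPoints(
        double x, double y, double width, double height, double zoom = 1.0)
    {
        if (zoom <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(zoom), "Zoom must be positive.");
        }

        double scale = PointsPerPixel / zoom;
        return ((float)(x * scale), (float)(y * scale),
                (float)(width * scale), (float)(height * scale));
    }
}
```

Under these assumptions, a 96 x 48 pixel area at 100% zoom maps to 72 x 36 points, and those converted values would be the ones passed on as the region rectangle.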
Please review and confirm if this meets your requirements.
Sample to Extract Text
Demo on Extract Text
Regards,
Sathiyaseelan K
Hi Sathya,
Thanks for your quick response. I tried your sample, and it works as expected for English. However, I am trying to extract Danish text, and it does not extract all of the selected text. Could you please update this sample to extract text in other languages?
Thanks,
Nirmal C
Hi Nirmal Chandran,
Thank you for the update. We attempted to extract text from a PDF containing Danish text, but we did not encounter any issues. The text was extracted correctly from the PDF. Below, we have provided the sample we tested. Please review the sample and PDF to confirm whether the issue persists on your end.
If you are still experiencing issues with your PDF or sample, kindly provide a modified sample and demo that replicates the problem so we can investigate further and provide an appropriate solution.
Sample to extract Danish text
Demo on extracting Danish text
Regards,
Sathiyaseelan K