In VB.NET, Convert Non-searchable PDF to Searchable PDF

Hi all, I have a PDF file which is non-searchable.

I have tried several ways to read the content of the PDF file, but all of them failed.



If the PDF is searchable, I can use the code below to read it.


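' Load the PDF selected in the open-file dialog.
Dim pdfDocumentView As New PdfDocumentView()
pdfDocumentView.Load(openFileDialog.FileName)

' These two declarations were not shown in the original post and are assumed here.
Dim textLines As New TextLines()
Dim extractedText As String = String.Empty

' Extract the text page by page and append it line by line.
For i As Integer = 0 To pdfDocumentView.PageCount - 1
    Dim extractedLines As String = pdfDocumentView.ExtractText(i, textLines)
    Dim linesArray As String() = extractedLines.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
    For Each line As String In linesArray
        extractedText += line & Environment.NewLine
    Next
Next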





So, could you please show me how to convert a non-searchable PDF to a searchable PDF?

Or is there any other way to get the content from a non-searchable PDF?


1 Reply

JT Jeyalakshmi Thangamarippandian Syncfusion Team July 4, 2024 09:46 AM UTC

Hi Choong Boon Koo,

Thank you for reaching out to Syncfusion support.

 

Upon further analysis, you can achieve your requirement by performing OCR (Optical Character Recognition). OCR is a technology used to convert scanned paper documents, in the form of PDF files or images, into searchable and editable data. The Syncfusion OCR processor library supports performing OCR on scanned PDF documents and images with the help of Google’s Tesseract Optical Character Recognition engine. We have attached our UG documentation and a sample for your reference.

UG: https://help.syncfusion.com/document-processing/pdf/pdf-library/net/working-with-ocr/features?cs-save-lang=1&cs-lang=vb

 

Sample:  https://www.syncfusion.com/downloads/support/directtrac/general/ze/Perform_OCR-186753588
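
For quick reference, here is a minimal VB.NET sketch of the OCR step, following the pattern shown in the UG documentation linked above. It assumes the Syncfusion OCR processor package is referenced and that the Tesseract binaries and language data are available to it (older versions may require their paths to be passed explicitly). The file names Input.pdf and Output.pdf and the module name OcrSketch are placeholders, not part of the attached sample.

```vb
Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Parsing

Module OcrSketch
    Sub Main()
        ' Initialize the OCR processor. This assumes the Tesseract binaries
        ' and tessdata files are resolved by the installed package.
        Using processor As New OCRProcessor()
            ' Load the scanned (non-searchable) PDF. "Input.pdf" is a placeholder.
            Dim loadedDocument As New PdfLoadedDocument("Input.pdf")

            ' Set the language used for recognition.
            processor.Settings.Language = Languages.English

            ' Run OCR on the loaded document's pages.
            processor.PerformOCR(loadedDocument)

            ' Save the processed document and release its resources.
            loadedDocument.Save("Output.pdf")
            loadedDocument.Close(True)
        End Using
    End Sub
End Module
```

Once the document has been processed this way, the saved Output.pdf should be searchable, so the ExtractText loop from the question should be able to read it.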

 

Please try the above solution and let us know the result. 


Regards,

Jeyalakshmi T

