In VB.NET, Convert Non-searchable PDF to Searchable PDF

Question

Hi all, I have a PDF file which is non-searchable.

I have tried several ways to read the content of the PDF file, but all of them failed.

If the PDF is searchable, I can use the code below to read it:

Dim pdfDocumentView As New PdfDocumentView()
pdfDocumentView.Load(openFileDialog.FileName)

' extractedText and textLines are assumed to be declared earlier (not shown in the post).
For i As Integer = 0 To pdfDocumentView.PageCount - 1
    ' Extract the text of page i; textLines receives the per-line details.
    Dim extractedLines As String = pdfDocumentView.ExtractText(i, textLines)
    Dim linesArray As String() = extractedLines.Split(New String() {Environment.NewLine}, StringSplitOptions.None)

    For Each line As String In linesArray
        extractedText += line & Environment.NewLine
    Next
Next

Could you please show me how to convert a non-searchable PDF to a searchable PDF, or any other way to get the content out of a non-searchable PDF?

Jeyalakshmi Thangamarippandian · Answer

Hi Choong Boon Koo,

Thank you for reaching out to Syncfusion support. Upon further analysis, you can achieve your requirement by performing OCR (optical character recognition). OCR is a technology used to convert scanned paper documents, in the form of PDF files or images, into searchable and editable data.

The Syncfusion OCR processor library supports performing OCR on scanned PDF documents and images with the help of Google’s Tesseract optical character recognition engine. We have attached our UG documentation and a sample for your reference.

UG: https://help.syncfusion.com/document-processing/pdf/pdf-library/net/working-with-ocr/features?cs-save-lang=1&cs-lang=vb

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/Perform_OCR-186753588

Please try the above solution and let us know the result.

Regards,
Jeyalakshmi T
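
For readers who want a starting point before opening the attached sample, here is a minimal sketch of the OCR approach described in the answer. It is not taken from the sample. It assumes the Syncfusion OCR processor and PDF assemblies are referenced and that the Tesseract binaries and language data are available to the library. "Input.pdf" and "Output.pdf" are placeholder file names. It also assumes a recent release; older releases reportedly expect the Tesseract binaries path in the OCRProcessor constructor and a tessdata path in PerformOCR, so check the UG page for the version you have installed.

```vb
Imports System
Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Parsing

Module OcrSketch
    Sub Main()
        ' Assumption: this constructor form matches your installed version.
        Using processor As New OCRProcessor()
            ' Load the scanned (non-searchable) PDF. Placeholder file name.
            Dim loadedDocument As New PdfLoadedDocument("Input.pdf")

            ' Tell Tesseract which language the scanned pages are written in.
            processor.Settings.Language = Languages.English

            ' Run OCR over the document. Assumption: the call returns the
            ' recognized text as a string and adds a text layer to the pages.
            Dim recognizedText As String = processor.PerformOCR(loadedDocument)

            ' Save the result as a new, searchable PDF. Placeholder file name.
            loadedDocument.Save("Output.pdf")
            loadedDocument.Close(True)

            Console.WriteLine(recognizedText)
        End Using
    End Sub
End Module
```

Once the output file has been saved, it should behave like any other searchable PDF, so the ExtractText loop from the question can be pointed at it instead of the original scan.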