ExtractText not working for pdfDocument object?

Thread ID: 95899
Created: Aug 6, 2010 04:44 PM UTC
Updated: Dec 18, 2017 12:45 PM UTC
Platform: ASP.NET Web Forms (Classic)
Replies: 3
Tags: PDF
Asked On August 6, 2010 04:44 PM UTC


I am using version 8.303.0.21 of Syncfusion Essential PDF. I have a few questions:

1) Does the ExtractText function work only on a PdfLoadedDocument object, and not on a PdfDocument?

2) It seems that after importing a page with "pFinalDoc.ImportPage(pTempDoc, j)", I am not able to call ExtractText on the PdfDocument. It returns "Nothing".

Any advice is appreciated. Thanks.

Imports System.IO
Imports System.Collections.Generic
Imports Syncfusion.Pdf

Sub test(ByVal msInputFile As MemoryStream)

    Dim pDoc As Syncfusion.Pdf.Parsing.PdfLoadedDocument = New Parsing.PdfLoadedDocument(msInputFile)
    Dim found As Boolean
    Dim searchKey As String
    Dim searchList As New SortedList(Of String, Byte())
    Dim m As MemoryStream
    Dim pFinalDoc As PdfDocument
    Dim pTempDoc As Syncfusion.Pdf.Parsing.PdfLoadedDocument
    Dim s As String = String.Empty

    For i As Integer = 0 To pDoc.Pages.Count - 1

        'create a new PDF doc
        pFinalDoc = New Syncfusion.Pdf.PdfDocument()

        'search if there is any existing PDF having the same key info
        searchKey = pDoc.Pages(i).ExtractText().Substring(0, 10)
        found = searchList.ContainsKey(searchKey)

        If found Then
            'already existing, load existing pages
            pTempDoc = New Parsing.PdfLoadedDocument(searchList(searchKey))

            For j As Integer = 0 To pTempDoc.Pages.Count - 1
                pFinalDoc.ImportPage(pTempDoc, j)

                'this is the call in question: ExtractText on a PdfDocument page
                s &= pFinalDoc.Pages(j).ExtractText()

                If String.IsNullOrEmpty(pFinalDoc.Pages(0).ExtractText()) Then
                    MsgBox("Error")
                End If
            Next
        End If

        'add current page
        pFinalDoc.ImportPage(pDoc, i)

        'save final doc to memory in order to get byte array
        m = New MemoryStream()
        pFinalDoc.Save(m)

        If found Then
            'set to the new value
            searchList(searchKey) = m.ToArray()
        Else
            'first page seen with this key
            searchList.Add(searchKey, m.ToArray())
        End If

        m = Nothing
        pFinalDoc = Nothing

    Next

End Sub

Angappan G [Syncfusion]
Replied On August 13, 2010 10:28 AM UTC

Hi HY,

Thank you for your interest in Essential PDF.

We regret the delay in getting back to you.

1. The ExtractText method works only with PdfLoadedDocument objects, not with PdfDocument objects.

2. ExtractText cannot be used with a PdfDocument even after importing the contents of an existing document, because the method works only with the PdfLoadedDocument class.

Please let us know if you have any queries.


Rodrigo T
Replied On December 15, 2017 04:55 PM UTC

Hi, this topic is very useful; please add it to the main documentation for pdf.PdfDocument.ExtractText.

With pdf.PdfDocument.ExtractText, formatted text and other content comes back garbled.

With pdf.PdfLoadedDocument.ExtractText, everything works fine.

Otherwise, there is still (in 2017) a bug in pdf.PdfDocument.ExtractText when its output is compared with that of pdf.PdfLoadedDocument.ExtractText.


Sabari Anand Senthamarai Kannan [Syncfusion]
Replied On December 18, 2017 12:45 PM UTC

Hi Rodrigo, 

Thank you for contacting Syncfusion products. 

Text cannot be extracted from a PDF document through the PdfDocument class, even after its pages have been imported from a PdfLoadedDocument object. Extraction can only be performed through the PdfLoadedDocument class. We will add this to our UG documentation, which will be refreshed within a week.

Please let us know if you need any further assistance. 

Sabari Anand 
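
Note: the replies above suggest a possible workaround, which is to save the assembled PdfDocument to a stream and reopen it as a PdfLoadedDocument before calling ExtractText. The sketch below does not come from the Syncfusion replies. It assumes that PdfDocument.Save accepts a stream and that PdfLoadedDocument can be constructed from a stream, as in the original question. The module and function names (ExtractTextSketch, ExtractTextFromNewDocument) are hypothetical.

```vbnet
Imports System.IO
Imports System.Text
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing

Module ExtractTextSketch

    ' Hypothetical helper: returns the text of every page of a PdfDocument
    ' by round-tripping it through a PdfLoadedDocument.
    Function ExtractTextFromNewDocument(ByVal doc As PdfDocument) As String
        Dim sb As New StringBuilder()

        Using ms As New MemoryStream()
            ' Serialize the in-memory document (assumes Save(Stream) is available).
            doc.Save(ms)
            ms.Position = 0

            ' Reopen the saved bytes as a loaded document, the only class
            ' the replies say ExtractText works with.
            Dim loaded As New PdfLoadedDocument(ms)
            For i As Integer = 0 To loaded.Pages.Count - 1
                sb.Append(loaded.Pages(i).ExtractText())
            Next
            loaded.Close(True)
        End Using

        Return sb.ToString()
    End Function

End Module
```

In the code from the question, s &= ExtractTextFromNewDocument(pFinalDoc) could stand in for the direct call on pFinalDoc.Pages(j). Alternatively, the text could be read from pTempDoc.Pages(j) before the page is imported, since pTempDoc is already a PdfLoadedDocument.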

