
Overview

The Syncfusion PDF Library is a .NET library that lets you extract data such as text, images, attachments, and form data from PDF documents using C#.

Text can be extracted from PDF documents for archiving or indexing. Extracting text from a PDF document with Syncfusion Essential PDF is easy and efficient, regardless of the document's content and properties.

The text extraction feature is supported on WinForms, WPF, Blazor, .NET MAUI, WinUI, Flutter, ASP.NET MVC, ASP.NET Core, UWP, and Xamarin, and runs on Windows, Linux, and macOS.


How to extract text from the entire PDF document in C#

  • Install the Syncfusion.Pdf.Net.Core NuGet package in your project.
  • Create an instance of the PdfLoadedDocument class and initialize it with a FileStream for the input PDF.
  • Iterate through the pages of the PDF document and extract the text from each page using the ExtractText method.
  • Save the extracted text to a text file named Result.txt.
  • Call the Close method to release the memory used by the PDF DOM and its document stream.

Here is an example of extracting the text from the entire PDF document in C# using the Syncfusion PDF library. You can extract text from the PDF file with just a few lines of code.

using System.IO;
using System.Text;
using Syncfusion.Pdf.Parsing;

//Get stream from an existing PDF document
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
//Load the PDF document
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Collect the text of every page (StringBuilder avoids copying the string on each append)
StringBuilder extractedText = new StringBuilder();
foreach (PdfLoadedPage loadedPage in loadedDocument.Pages)
{
    extractedText.AppendLine(loadedPage.ExtractText());
}
//Save the text to file
File.WriteAllText("Result.txt", extractedText.ToString());
//Close the document and release its stream
loadedDocument.Close(true);

Extract images from a PDF document

You can extract images from a particular page, or from an entire PDF document, using the ExtractImages method of the PdfPageBase class.

  • Install the Syncfusion.Pdf.Imaging.Net.Core NuGet package in your project.
  • Create an instance of the PdfLoadedDocument class and initialize it with a FileStream for the input PDF.
  • Get a reference to the page that contains the images as a PdfPageBase object.
  • Call the ExtractImages method to extract all the images from that page.
  • Save the extracted images to disk.
  • Finally, call the Close method to release the memory used by the PDF DOM and its document stream.

using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

//Get stream from an existing PDF document
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
//Load the PDF document
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];
//Extract images from the first page
Stream[] extractedImages = pageBase.ExtractImages();
//Save each image to a separate file (the .jpg extension assumes JPEG data)
for (int i = 0; i < extractedImages.Length; i++)
{
    using (FileStream imageFile = File.Create("ExtractedImage" + i + ".jpg"))
    {
        //Copy from the start of the image stream when it is seekable
        if (extractedImages[i].CanSeek)
        {
            extractedImages[i].Position = 0;
        }
        extractedImages[i].CopyTo(imageFile);
    }
}
//Close the document
loadedDocument.Close(true);
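The example above names every extracted image with a .jpg extension. The steps do not state which format ExtractImages returns, so if the streams might not always be JPEG data, one option is to choose the extension from the image's leading bytes. The sketch below is a hypothetical helper, GuessImageExtension, and is not part of the Syncfusion API. It checks the standard JPEG, PNG, GIF, BMP, and TIFF signatures and falls back to .bin for anything else.

```csharp
using System;

// Hypothetical helper, not part of the Syncfusion API: returns a file extension
// based on the signature ("magic number") at the start of the image data.
static string GuessImageExtension(byte[] data)
{
    // True when data begins with every byte in signature
    bool HasSignature(params byte[] signature)
    {
        if (data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    if (HasSignature(0xFF, 0xD8, 0xFF)) return ".jpg";                               // JPEG
    if (HasSignature(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return ".png";  // PNG
    if (HasSignature(0x47, 0x49, 0x46, 0x38)) return ".gif";                          // "GIF8"
    if (HasSignature(0x42, 0x4D)) return ".bmp";                                      // "BM"
    if (HasSignature(0x49, 0x49, 0x2A, 0x00) ||   // TIFF, little-endian ("II*\0")
        HasSignature(0x4D, 0x4D, 0x00, 0x2A))     // TIFF, big-endian ("MM\0*")
    {
        return ".tif";
    }
    return ".bin"; // Unknown format: keep the raw bytes
}

// Usage: a PNG header is detected as ".png"
Console.WriteLine(GuessImageExtension(new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }));
```

In the image loop, you could read each extracted stream into a byte array and pass it to this helper to build the file name, instead of using the fixed .jpg extension. Checking only the first few bytes is cheap, but it only identifies formats whose signatures the helper lists.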

Extract attachments from PDF

Essential PDF supports extracting attachments and saving them to disk using the PdfAttachment class.

  • Install the Syncfusion.Pdf.Net.Core NuGet package in your project.
  • Create an instance of the PdfLoadedDocument class and initialize it with a FileStream for the input PDF.
  • Iterate over each attachment in the document's Attachments collection, get its data, and save it to disk using a FileStream.
  • Finally, call the Close method to release the memory used by the PDF DOM and its document stream.

using System.IO;
using Syncfusion.Pdf.Interactive;
using Syncfusion.Pdf.Parsing;

//Get stream from the existing PDF document
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
//Load the PDF document
PdfLoadedDocument document = new PdfLoadedDocument(docStream);
//Iterate the attachments
foreach (PdfAttachment attachment in document.Attachments)
{
    //Keep only the file name so a path stored in the PDF cannot write outside the current folder
    string fileName = Path.GetFileName(attachment.FileName);
    //Write the attachment data to disk; the using block disposes the stream
    using (FileStream fileStream = new FileStream(fileName, FileMode.Create))
    {
        fileStream.Write(attachment.Data, 0, attachment.Data.Length);
    }
}
//Close the document
document.Close(true);
