The Syncfusion PDF Library is a .NET library that allows users to extract various types of useful data from a PDF document, such as text, images, attachments, and form data, using C#.
Text can be extracted from PDF documents for archiving or indexing. Extracting text from PDF using Syncfusion Essential PDF is easy and efficient, regardless of the document’s content and its properties.
The text extraction feature works on a wide range of platforms: WinForms, WPF, Blazor, .NET MAUI, WinUI, Flutter, ASP.NET MVC, ASP.NET Core, UWP, and Xamarin, on Windows, Linux, and macOS.
The following example extracts the text from every page of a PDF document in C# using the Syncfusion PDF library. It takes only a few lines of code.
//Get stream from an existing PDF document
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
//Load the PDF document
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Collect the text in a StringBuilder to avoid copying the whole string on every page
var extractedText = new System.Text.StringBuilder();
//Extract all the text from the PDF document pages
foreach (PdfLoadedPage loadedPage in loadedDocument.Pages)
{
    extractedText.Append(loadedPage.ExtractText());
}
//Save the text to file
File.WriteAllText("Result.txt", extractedText.ToString());
//Close the document
loadedDocument.Close(true);
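Because extracted text is often destined for indexing, the per-page strings can be fed into a simple inverted index. The following is a minimal sketch that uses only the .NET base class library. The `PageIndex` class and its `Build` method are hypothetical names introduced for illustration, not part of the Syncfusion API, and the input is assumed to be one already-extracted string per page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Syncfusion API): maps each word to the
// set of 1-based page numbers on which it appears.
public static class PageIndex
{
    public static Dictionary<string, SortedSet<int>> Build(IReadOnlyList<string> pageTexts)
    {
        // Case-insensitive keys so "Invoice" and "invoice" share one entry
        var index = new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

        for (int page = 0; page < pageTexts.Count; page++)
        {
            string[] words = pageTexts[page].Split(separators, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                if (!index.TryGetValue(word, out var pages))
                {
                    pages = new SortedSet<int>();
                    index[word] = pages;
                }
                pages.Add(page + 1);
            }
        }
        return index;
    }
}
```

To use it, collect one string per page in the extraction loop instead of a single concatenated string, then pass the list to `PageIndex.Build` and look words up with `TryGetValue`.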
You can also extract images from a particular page or from an entire PDF document. To extract the images from a page, call the ExtractImages method of the PdfPageBase class.
//Get stream from an existing PDF document
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
//Load the PDF document
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];
//Extract images from the first page
Stream[] extractedImages = pageBase.ExtractImages();
//Get each image and save it to a separate file
for (int i = 0; i < extractedImages.Length; i++)
{
    //Rewind if possible, then copy to disk without assuming the stream is a MemoryStream
    Stream image = extractedImages[i];
    if (image.CanSeek) image.Position = 0;
    using (FileStream output = File.Create("ExtractedImage" + i + ".jpg"))
    {
        image.CopyTo(output);
    }
}
//Close the document
loadedDocument.Close(true);
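The example above saves every image with a .jpg extension. Whether the extracted streams are always JPEG-encoded is not stated here, so a cautious option is to inspect the leading "magic" bytes before choosing an extension. This is a minimal sketch under that assumption; `ImageFormatSniffer` and `GuessExtension` are hypothetical names, not part of the Syncfusion API, and only a few common formats are covered.

```csharp
using System;

// Hypothetical helper (not part of the Syncfusion API): guesses a file
// extension from the leading signature bytes of an image.
public static class ImageFormatSniffer
{
    public static string GuessExtension(byte[] data)
    {
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return ".jpg";
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return ".png";
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return ".gif";   // "GIF8"
        if (StartsWith(data, 0x42, 0x4D)) return ".bmp";               // "BM"
        if (StartsWith(data, 0x49, 0x49, 0x2A, 0x00) ||                // TIFF, little-endian
            StartsWith(data, 0x4D, 0x4D, 0x00, 0x2A)) return ".tif";   // TIFF, big-endian
        return ".bin"; // Unknown format: fall back to a neutral extension
    }

    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Inside the save loop, the bytes of each image could be passed to `GuessExtension` and the result appended to the file name in place of the fixed ".jpg".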
Essential PDF also supports extracting attachments and saving them to disk using the PdfAttachment class.
//Get stream from the existing PDF document
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
//Load the PDF document
PdfLoadedDocument document = new PdfLoadedDocument(docStream);
//Iterate over the attachments
foreach (PdfAttachment attachment in document.Attachments)
{
    //Extract the attachment and save it to the disk
    File.WriteAllBytes(attachment.FileName, attachment.Data);
}
//Close the document
document.Close(true);
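The attachment example writes each file using the name stored in the PDF. If the document comes from an untrusted source, that name might contain directory segments such as "../". Whether the library sanitizes the name is not stated here, so the following is a defensive sketch using only the .NET base class library. `AttachmentPaths` and `ToSafePath` are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Syncfusion API): reduces an attachment
// name to a bare file name and places it inside a chosen output directory.
public static class AttachmentPaths
{
    public static string ToSafePath(string outputDirectory, string attachmentName)
    {
        // Treat both separator styles as directories so only the last segment is kept
        string normalized = (attachmentName ?? string.Empty).Replace('\\', '/');
        string fileName = normalized.Substring(normalized.LastIndexOf('/') + 1);

        // Replace characters that the current platform does not allow in file names
        foreach (char invalid in Path.GetInvalidFileNameChars())
        {
            fileName = fileName.Replace(invalid, '_');
        }

        // Empty or relative-directory names get a neutral placeholder
        if (fileName.Length == 0 || fileName == "." || fileName == "..")
        {
            fileName = "attachment";
        }
        return Path.Combine(outputDirectory, fileName);
    }
}
```

In the loop, the result of `AttachmentPaths.ToSafePath("Attachments", attachment.FileName)` could be used as the destination path, provided the output directory already exists.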