
How to extract text from a PDF file in C#, VB.NET?

Platform: WinForms |
Control: PDF |
Published Date: August 16, 2018 |
Last Revised Date: May 3, 2019

Syncfusion Essential PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can extract text from a PDF document.

Essential PDF supports basic text extraction and layout-based extraction.
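To make the difference between the two modes concrete, here is a minimal sketch. It assumes the Syncfusion.Pdf.WinForms package is referenced and that a PDF exists at the hypothetical path input.pdf. The helper name ExtractFirstPage is illustrative only and is not part of Essential PDF. Basic extraction calls ExtractText() with no arguments, and layout-based extraction calls ExtractText(true), which is the call used in the steps below.

```csharp
using System;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

public static class ExtractionModes
{
    // Hypothetical helper: extracts the text of the first page of the PDF at
    // the given path, using either basic or layout-based extraction.
    public static string ExtractFirstPage(string path, bool layoutBased)
    {
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument(path);
        try
        {
            PdfPageBase page = loadedDocument.Pages[0];
            // ExtractText() performs basic extraction;
            // ExtractText(true) performs layout-based extraction.
            return layoutBased ? page.ExtractText(true) : page.ExtractText();
        }
        finally
        {
            // Close the document and release its resources.
            loadedDocument.Close(true);
        }
    }

    public static void Main()
    {
        // "input.pdf" is a hypothetical file in the working directory.
        Console.WriteLine(ExtractFirstPage("input.pdf", false));
        Console.WriteLine(ExtractFirstPage("input.pdf", true));
    }
}
```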

Steps to extract text from a PDF programmatically:

  1. Create a new C# console application project in Visual Studio.
  2. Install the Syncfusion.Pdf.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.
  3. Include the following namespaces in the Program.cs file.

C#

using System;
using System.IO;
using System.Reflection;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

 

VB.NET

Imports System.IO
Imports System.Reflection
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing

 

  4. Call ExtractText() with the parameter set to true to perform layout-based text extraction on a page of the PDF document.

C#

//Extract text from first page
string extractedTexts = page.ExtractText(true);

 

  5. The following C# and VB.NET code snippets show how to extract text from the PDF document.

C#

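//Load an existing PDF that is embedded in the assembly as a resource
//(input.pdf added to the project with Build Action set to Embedded Resource)
Assembly assembly = typeof(Program).GetTypeInfo().Assembly;
Stream fileStream = assembly.GetManifestResourceStream("ConsoleApplication.input.pdf");
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);
 
//Load first page
PdfPageBase page = loadedDocument.Pages[0];
 
//Extract text from first page using layout-based extraction
string extractedTexts = page.ExtractText(true);
 
//Display the extracted text in the console window
Console.WriteLine(extractedTexts);
 
//Close the document
loadedDocument.Close(true);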

 

VB.NET

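'Load an existing PDF that is embedded in the assembly as a resource
'(input.pdf added to the project with Build Action set to Embedded Resource)
Dim assembly As Assembly = GetType(Program).GetTypeInfo().Assembly
Dim fileStream As Stream = assembly.GetManifestResourceStream("ConsoleApplication.input.pdf")
Dim loadedDocument As PdfLoadedDocument = New PdfLoadedDocument(fileStream)
 
'Load first page
Dim page As PdfPageBase = loadedDocument.Pages(0)
 
'Extract text from first page using layout-based extraction
Dim extractedTexts As String = page.ExtractText(True)
 
'Display the extracted text in the console window
Console.WriteLine(extractedTexts)
 
'Close the document
loadedDocument.Close(True)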
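The sample above reads only the first page. A minimal sketch for extracting text from every page follows. It assumes the same package and a PDF at the hypothetical path input.pdf in the working directory. ExtractAllText is an illustrative name and is not part of Essential PDF. The method loops over loadedDocument.Pages by index and appends the layout-based text of each page.

```csharp
using System;
using System.Text;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

public static class ExtractAllPages
{
    // Hypothetical helper: concatenates the layout-based text of every page
    // in the PDF at the given path, one page after another.
    public static string ExtractAllText(string path)
    {
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument(path);
        try
        {
            StringBuilder allText = new StringBuilder();
            for (int i = 0; i < loadedDocument.Pages.Count; i++)
            {
                PdfPageBase page = loadedDocument.Pages[i];
                allText.AppendLine(page.ExtractText(true));
            }
            return allText.ToString();
        }
        finally
        {
            // Close the document and release its resources.
            loadedDocument.Close(true);
        }
    }

    public static void Main()
    {
        // "input.pdf" is a hypothetical file in the working directory.
        Console.WriteLine(ExtractAllText("input.pdf"));
    }
}
```

This sketch loads the file with the PdfLoadedDocument(string) constructor rather than from an embedded resource. Both approaches produce the same PdfLoadedDocument, so the extraction calls do not change.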

  

A complete working sample can be downloaded from Extract-Text-from-PDF-File.zip.

The input PDF document is as follows.

(Image: input PDF text to be extracted)

When you run the program, the extracted text appears in the console window as shown below.

(Image: text extracted from the PDF, shown in the console output)

Refer to the documentation for details on basic and layout-based text extraction with Essential PDF. The documentation also briefly covers OCR processing and image extraction, with code examples.

Refer here to explore the rich set of Syncfusion Essential PDF features.

An online sample that extracts text from a PDF document is also available.

Note:

Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to this link to learn how to generate and register a Syncfusion license key in your application so that you can use the components without the trial message.
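Registering the key is typically a single call at application startup. The following minimal sketch assumes the Syncfusion.Licensing assembly is referenced. "YOUR LICENSE KEY" is a placeholder for the key you generate. The call should run before any Syncfusion type is used.

```csharp
using Syncfusion.Licensing;

public static class Program
{
    public static void Main()
    {
        // Register the license key before any other Syncfusion type is used.
        // "YOUR LICENSE KEY" is a placeholder; replace it with your own key.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        // ... load the PDF and extract text as in the steps above ...
    }
}
```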

 
