How to Support German and Other Languages in the OCR Processor?

The Syncfusion .NET Optical Character Recognition (OCR) Library extracts text from scanned PDFs and images. With a few lines of C# code, you can convert a scanned PDF document containing raster images of German text into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or a searchable PDF document. The .NET OCR Library uses the Tesseract OCR engine.

Steps to convert a scanned PDF to a searchable PDF by applying OCR in the German language:

  1. Create a new C# console application project.

Console app creation

  2. Install the Syncfusion.Pdf.OCR.NET NuGet package as a reference to your .NET console application from NuGet.org.

NuGet package reference

  3. Download the language packages (for German, deu.traineddata) from the following link.
https://github.com/tesseract-ocr/tessdata
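
The download step can also be scripted. The following is a minimal sketch, not part of the Syncfusion API: the helper names `LanguagePackUri` and `EnsureLanguagePackAsync` are hypothetical, the `raw/main` URL layout assumes the repository serves its files from a `main` branch, and the target folder is whatever folder your OCR setup reads language data from.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: builds the download address of a language pack.
// Assumption: files are served from the "main" branch of the tessdata repository.
static Uri LanguagePackUri(string code) =>
    new Uri($"https://github.com/tesseract-ocr/tessdata/raw/main/{code}.traineddata");

// Hypothetical helper: downloads "<code>.traineddata" into tessdataFolder
// unless it is already there, and returns the full path of the file.
static async Task<string> EnsureLanguagePackAsync(string code, string tessdataFolder, HttpClient client)
{
    Directory.CreateDirectory(tessdataFolder);
    string target = Path.Combine(tessdataFolder, code + ".traineddata");
    if (File.Exists(target))
    {
        return target; // Already downloaded, nothing to do.
    }

    // Download to a temporary file first so that a failed transfer
    // does not leave a truncated language pack behind.
    string temp = target + ".part";
    using (HttpResponseMessage response = await client.GetAsync(LanguagePackUri(code)))
    {
        response.EnsureSuccessStatusCode();
        await using (FileStream file = File.Create(temp))
        {
            await response.Content.CopyToAsync(file);
        }
    }
    File.Move(temp, target);
    return target;
}
```

A call such as `await EnsureLanguagePackAsync("deu", "tessdata", new HttpClient())` would fetch the German pack once and reuse it on later runs. Both the folder name `tessdata` and the language code in that call are illustrative.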

  4. Include the following namespaces in the Program.cs file.

C#

using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

VB.NET

Imports System.IO
Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Graphics
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing

  5. Use the following code example to convert a scanned PDF to a searchable PDF by applying OCR in the German language.

C#

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
   //Open the scanned PDF for reading; the stream is disposed when the block ends.
   using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
   {
      //Load the existing PDF document.
      PdfLoadedDocument document = new PdfLoadedDocument(stream);
      //Set the OCR language ("deu" is the Tesseract code for German).
      processor.Settings.Language = "deu";
      //Perform OCR on the input document using the tessdata (language packs).
      processor.PerformOCR(document);
      //Create the output file stream.
      using (FileStream outputFileStream = new FileStream("OCR.pdf", FileMode.Create, FileAccess.ReadWrite))
      {
         //Save the PDF document to the file stream.
         document.Save(outputFileStream);
      }
      //Close the document.
      document.Close(true);
   }
}

VB.NET

'Initialize the OCR processor.
Using processor As OCRProcessor = New OCRProcessor()
   'Open the scanned PDF for reading; the stream is disposed when the block ends.
   Using stream As FileStream = New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
      'Load the existing PDF document.
      Dim document As PdfLoadedDocument = New PdfLoadedDocument(stream)
      'Set the OCR language ("deu" is the Tesseract code for German).
      processor.Settings.Language = "deu"
      'Perform OCR on the input document using the tessdata (language packs).
      processor.PerformOCR(document)
      'Create the output file stream.
      Using outputFileStream As FileStream = New FileStream("OCR.pdf", FileMode.Create, FileAccess.ReadWrite)
         'Save the PDF document to the file stream.
         document.Save(outputFileStream)
      End Using
      'Close the document.
      document.Close(True)
   End Using
End Using
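
To target a language other than German, replace "deu" with the matching Tesseract code (for example "fra" for French) and make sure the corresponding .traineddata file has been downloaded. The sketch below shows one way to prepare that value; the helper names `BuildLanguageSetting` and `MissingLanguagePacks` are hypothetical and not part of the Syncfusion API. Tesseract itself accepts several codes joined with "+", but whether the Language setting passes a combined value through unchanged is an assumption you should verify against the OCR documentation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: joins Tesseract language codes with '+',
// the separator Tesseract uses when several languages are requested.
// Blank entries and duplicates are dropped; case is left untouched.
static string BuildLanguageSetting(params string[] codes) =>
    string.Join("+", codes
        .Select(c => c.Trim())
        .Where(c => c.Length > 0)
        .Distinct());

// Hypothetical helper: returns the codes whose "<code>.traineddata"
// file is not present in the given folder.
static List<string> MissingLanguagePacks(string tessdataFolder, params string[] codes) =>
    codes
        .Where(c => !File.Exists(Path.Combine(tessdataFolder, c + ".traineddata")))
        .ToList();

// Build the value that would be assigned to processor.Settings.Language.
string setting = BuildLanguageSetting("deu", "eng");
Console.WriteLine(setting); // prints "deu+eng"
```

Checking `MissingLanguagePacks` before running OCR gives a clearer failure than an engine error caused by a missing language file.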

A complete working sample can be downloaded from Perform-OCR-on-PDF-document.zip.

By executing the program, you will get the searchable PDF document (OCR.pdf) as follows.

Output screenshot

Take a moment to peruse the documentation, where you will find other options, such as performing OCR on an image, a region of the document, a rotated page, and large PDF documents, along with code examples.

Refer here to explore the rich set of Syncfusion Essential PDF features.

Note: Starting with v16.2.0.x, if you reference Syncfusion assemblies from a trial setup or from the NuGet feed, include a license key in your projects. Refer to this link to learn about generating and registering a Syncfusion license key in your application to use the components without a trial message.

Conclusion

I hope you enjoyed learning how to support German and other languages in the OCR Processor.

You can refer to our PDF feature tour page to learn about its other features. You can also explore our documentation to understand how to create and manipulate data.

Current customers can check out our components from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our other controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
