How to compare text in two PDF documents?

Syncfusion Essential PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can compare the text in two PDF documents by extracting the text from each document and comparing the results. The resultant PDF document highlights the entire line of changed text.

Steps to compare the text in PDF documents programmatically:

  1. Create a new Windows Forms application project.
  2. Install the Syncfusion.Pdf.Base NuGet package as a reference to your .NET Framework application from NuGet.org.
  3. Include the following namespaces in the Form1.Designer.cs file.

C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;

 

  4. Add a label and a button in Form1.Designer.cs to compare the PDF files as follows.
    label = new Label();
    button = new Button();
     
    //Label
    label.Location = new System.Drawing.Point(0, 40);
    label.Size = new System.Drawing.Size(426, 35);
    label.Text = "Click the button to view the compared PDF file generated by Essential PDF";
    label.TextAlign = System.Drawing.ContentAlignment.MiddleCenter;
     
    //Button
    button.Location = new System.Drawing.Point(180, 110);
    button.Size = new System.Drawing.Size(85, 26);
    button.Text = "Compare PDF";
    button.Click += new EventHandler(ComparePDF);
     
    //Create PDF
    ClientSize = new System.Drawing.Size(450, 150);
    Controls.Add(label);
    Controls.Add(button);
    Text = "Create PDF";
    

 

  5. Add the following code in the ComparePDF click handler (the method wired to button.Click above) to compare the text in two PDF documents.
    //Load the first PDF document
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument("../../Data/Source1.pdf");
     
    //Load the second PDF document
    PdfLoadedDocument loadedDocument1 = new PdfLoadedDocument("../../Data/Source2.pdf");
     
    //Creating the list to store text data in PDF documents
    List<TextData> textData = new List<TextData>();
    List<TextData> textData1 = new List<TextData>();
    List<TextData> maxContainsData = new List<TextData>();
    List<TextData> diff = new List<TextData>();
     
    //Compare only the pages that exist in both documents
    for (int i = 0; i < Math.Min(loadedDocument.Pages.Count, loadedDocument1.Pages.Count); i++)
    {
        //Get the page from first document
        PdfLoadedPage loadedPage = loadedDocument.Pages[i] as PdfLoadedPage;
        //Extract the text from page of first document 
        string extractedText = loadedPage.ExtractText(out textData);
     
        //Extract the text from page of second document 
        string extractedText1 = loadedDocument1.Pages[i].ExtractText(out textData1);
     
        //Start each page with an empty list so that differences from earlier pages are not redrawn
        diff.Clear();

        //Number of lines that exist on both pages
        int minCount = Math.Min(textData.Count, textData1.Count);

        //Compare the lines that exist on both pages by their text
        for (int j = 0; j < minCount; j++)
        {
            if (textData[j].Text != textData1[j].Text)
            {
                //Add diff text to the list
                diff.Add(textData[j]);
            }
        }

        //Lines beyond minCount exist only on the page with more lines
        maxContainsData = textData.Count > textData1.Count ? textData : textData1;
        for (int j = minCount; j < maxContainsData.Count; j++)
        {
            //Add diff text to the list
            diff.Add(maxContainsData[j]);
        }
        //Highlight the changed text
        foreach (TextData data in diff)
        {
            loadedPage.Graphics.DrawRectangle(PdfPens.Red, PdfBrushes.Transparent, data.Bounds);
        }
    }
     
    //Save and close the document 
    loadedDocument.Save("ComparedPDF.pdf");
    loadedDocument.Close(true);
    loadedDocument1.Close(true);
     
    //Open the saved PDF file in the default PDF viewer to see the result
    System.Diagnostics.Process.Start("ComparedPDF.pdf");
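
The comparison above matches lines by position, so a single line inserted near the top of a page shifts every later line and can cause all of them to be reported as changed. The following is a minimal sketch of an alternative, not the method used by the sample: it aligns the two line lists with a longest common subsequence (LCS) and reports only the lines of the first list that have no counterpart in the second. The LineDiff class and the ChangedLineIndexes method are hypothetical names introduced for illustration, and the sketch works on plain strings so that it does not depend on the PDF library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Essential PDF.
public static class LineDiff
{
    // Returns the indexes of lines in 'first' that have no counterpart in 'second',
    // based on a longest-common-subsequence (LCS) alignment of the two lists.
    public static List<int> ChangedLineIndexes(IList<string> first, IList<string> second)
    {
        int n = first.Count, m = second.Count;

        // lcs[i, j] = length of the LCS of first[i..] and second[j..]
        int[,] lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
        {
            for (int j = m - 1; j >= 0; j--)
            {
                lcs[i, j] = first[i] == second[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk the table; lines of 'first' skipped by the alignment are reported
        List<int> changed = new List<int>();
        int a = 0, b = 0;
        while (a < n && b < m)
        {
            if (first[a] == second[b]) { a++; b++; }
            else if (lcs[a + 1, b] >= lcs[a, b + 1]) { changed.Add(a); a++; }
            else { b++; }
        }

        // Remaining lines of 'first' have nothing left to match against
        for (; a < n; a++) changed.Add(a);
        return changed;
    }
}
```

To use it with the listing above, one option would be to pass the Text values of textData and textData1 as the two lists and draw the rectangle for textData[index].Bounds for each returned index. The table needs memory proportional to the product of the two line counts, which is usually small for a single page.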
    

 

A complete working sample can be downloaded from PDFComparisonSample.zip.

By executing the program, you will get a PDF document in which the changed lines are outlined in red.
(Screenshot of the output PDF file in WinForms)

Note:

Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, include a license key in your projects. Refer to the Syncfusion licensing documentation to learn about generating and registering a Syncfusion license key in your application to use the components without a trial message.
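
As a rough sketch of what registering the key can look like in a Windows Forms project: the snippet below assumes that the Syncfusion.Licensing assembly is referenced, that the project uses the default Program.cs entry point with a form named Form1, and that "YOUR LICENSE KEY" is replaced with a key generated for your account. Check the licensing documentation for the steps that apply to your version.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main()
    {
        // Register the key before any Syncfusion component is created.
        // "YOUR LICENSE KEY" is a placeholder, not a working key.
        Syncfusion.Licensing.SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        Application.Run(new Form1());
    }
}
```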

 

 
