PDF compare: highlight only the differing portion

We want to compare 2 PDFs. https://support.syncfusion.com/kb/article/8010/how-to-compare-text-in-two-pdf-documents

In the above example, everything from the position of the first difference to the end of the document is highlighted.

What we are looking for is a way to skip the text that is the same and highlight only the parts that differ.


5 Replies

JT Jeyalakshmi Thangamarippandian Syncfusion Team July 15, 2024 11:21 AM UTC

Hi Mahesh,

We have created a sample to fulfill your requirements and attached it for your reference. Kindly try it on your end and let us know if you need any further assistance.

Code snippet:

PdfDocument document = new PdfDocument();

//Create a new page in the document
PdfPage page = document.Pages.Add();

foreach (var line in diff)
{
    // Gets bounds of the line
    RectangleF lineBounds = line.Bounds;
    // Gets text in the line
    string text = line.Text;

    page.Graphics.DrawString(text, new PdfStandardFont(PdfFontFamily.Helvetica, 8), PdfBrushes.Black, new PointF(0, 0));
}

Sample:

https://www.syncfusion.com/downloads/support/directtrac/general/ze/PDFComparisonSample1565764664

Please refer to the UG documentation for further details:

Working with Text | Syncfusion

Regards,

Jeyalakshmi T




IM Ido Millet July 21, 2024 11:47 AM UTC

Jeyalakshmi, it looks like you are drawing the diffs on top of each other at new PointF(0, 0).
The y argument needs to be incremented for each line.



JT Jeyalakshmi Thangamarippandian Syncfusion Team July 22, 2024 01:16 PM UTC

To draw each line at its own position on the page, the draw call needs that line's coordinates. Our earlier code used the fixed coordinates (0, 0), so every line was drawn at the same spot. Please try the following code snippet, which uses the bounds of each line instead, to obtain the expected result:

PdfDocument document = new PdfDocument();

document.PageSettings.Margins.All = 0;

//Create a new page in the document
PdfPage page = document.Pages.Add();

foreach (var line in diff)
{
    // Gets bounds of the line
    RectangleF lineBounds = line.Bounds;
    // Gets text in the line
    string text = line.Text;
    string fontName = line.FontName;
    FontStyle fontStyle = line.FontStyle;
    float fontSize = line.FontSize;
    Font font = new Font(fontName, fontSize, fontStyle);
    PdfFont pdfFont = new PdfTrueTypeFont(font, true);

    page.Graphics.DrawString(text, pdfFont, PdfBrushes.Black, new PointF(lineBounds.X, lineBounds.Y));
}



Regards,

Jeyalakshmi T



MM Mahesh Machina replied to Jeyalakshmi Thangamarippandian July 22, 2024 05:16 PM UTC

Hi Jeyalakshmi,

Thanks for the quick response.

Is there any way to ignore boundaries during the comparison? The idea is to read both PDFs sentence by sentence: take a sentence from PDF 1 and search for it in PDF 2. If it exists in PDF 2 (it need not be within the same boundaries), that is fine; otherwise, find the boundaries of that text in PDF 2 and highlight it. Similarly, compare the style of that sentence in both PDFs.

To summarize the problem we are facing:

For example, the two strings below are the same except that the second has extra spaces. We want to ignore differences of this kind.

Test String and  Test   String  

color: #000000 and color: Black both give the same output, so in this scenario we don't want a difference to be called out.

Please suggest.




SG Sivaram Gunabalan Syncfusion Team July 23, 2024 03:44 PM UTC

We don't have direct support for this kind of comparison. To ignore minor spacing differences between two PDF documents, we suggest normalizing the text with a regex pattern before comparing it. Please refer to the following code snippet:

 

using System;
using System.Text.RegularExpressions;

string str1 = "Text     String";
string str2 = "Text  String";

bool areEqual = CompareStringsIgnoringWhitespace(str1, str2);

Console.WriteLine(areEqual ? "Strings are equal" : "Strings are not equal");

static bool CompareStringsIgnoringWhitespace(string str1, string str2)
{
    // Define a regex pattern to match any sequence of whitespace characters
    string pattern = @"\s+";

    // Normalize both strings by collapsing each run of whitespace into a single space
    string normalizedStr1 = Regex.Replace(str1.Trim(), pattern, " ");
    string normalizedStr2 = Regex.Replace(str2.Trim(), pattern, " ");

    // Compare the normalized strings
    return normalizedStr1 == normalizedStr2;
}
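
The same normalization can be extended to the position-independent lookup and the color case raised earlier in this thread. The following is only a rough sketch, not a Syncfusion feature: it assumes the sentences have already been extracted from both PDFs as plain strings, and the helper names Normalize, FindMissing and SameColor are hypothetical. It also assumes colors arrive as HTML-style strings and uses System.Drawing.ColorTranslator to compare them by ARGB value.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collapse whitespace runs so "Test String" and "Test   String" match.
string Normalize(string s) => Regex.Replace(s.Trim(), @"\s+", " ");

// Hypothetical helper: returns the sentences of `first` that do not occur anywhere
// in `second`, regardless of where they sit on the page or how they are spaced.
List<string> FindMissing(IEnumerable<string> first, IEnumerable<string> second)
{
    var known = new HashSet<string>(second.Select(Normalize));
    return first.Where(s => !known.Contains(Normalize(s))).ToList();
}

// Hypothetical helper: treats "#000000" and "Black" as the same color
// by comparing their ARGB values.
bool SameColor(string a, string b) =>
    ColorTranslator.FromHtml(a).ToArgb() == ColorTranslator.FromHtml(b).ToArgb();

// Stand-ins for sentences extracted from the two PDFs (assumed input).
var pdf1 = new[] { "Test String", "Only in the first document" };
var pdf2 = new[] { "Some other line", "Test   String  " };

foreach (string sentence in FindMissing(pdf1, pdf2))
    Console.WriteLine($"Not found in second PDF: {sentence}");

Console.WriteLine(SameColor("#000000", "Black"));
```

With the sample input, only "Only in the first document" is reported, because "Test String" matches "Test   String  " after normalization. The sentences reported as missing are the ones whose bounds would then need to be located and highlighted.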

 



