
Version of assemblies required for OCR support

Hello,

I have an existing .net core 3.1 project which uses the following Syncfusion packages:

<PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="20.2.0.40" />
<PackageReference Include="Syncfusion.Pdf.Net.Core" Version="20.2.0.40" />


I attempted to install the OCR support using the following command:

dotnet add package Syncfusion.PDF.OCR.Net.Core

After that, my project fails to restore:

Determining projects to restore...

Writing /var/folders/xq/g92c84y57ks8mt57rgxl8fh00000gn/T/tmpDs8Wlp.tmp

info : Adding PackageReference for package 'Syncfusion.PDF.OCR.Net.Core' into project '/Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj'.

info : CACHE https://api.nuget.org/v3/registration5-gz-semver2/syncfusion.pdf.ocr.net.core/index.json

info : Restoring packages for /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj...

error: NU1605: Detected package downgrade: Syncfusion.Pdf.Net.Core from 21.1.41 to 20.2.0.40. Reference the package directly from the project to select a different version.

error: LCEnterpriseMIS.Web -> Syncfusion.PDF.OCR.Net.Core 21.1.41 -> Syncfusion.Pdf.Imaging.Net.Core 21.1.41 -> Syncfusion.Pdf.Net.Core (>= 21.1.41)

error: LCEnterpriseMIS.Web -> Syncfusion.Pdf.Net.Core (>= 20.2.0.40)

info : Package 'Syncfusion.PDF.OCR.Net.Core' is compatible with all the specified frameworks in project '/Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj'.

info : PackageReference for package 'Syncfusion.PDF.OCR.Net.Core' version '21.1.41' added to file '/Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj'.

info : Generating MSBuild file /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/obj/LCEnterpriseMIS.Web.csproj.nuget.g.targets.

info : Writing assets file to disk. Path: /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/obj/project.assets.json

log : Failed to restore /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj (in 1.31 sec).

How do I get around the "package downgrade detected" errors? Many thanks in advance.


12 Replies

MA Mark April 30, 2023 07:28 PM UTC

I was able to resolve my problem and got it working on macOS. Now I am trying to get it working on Windows and Linux.

I am following instructions from here to get the Windows and Linux binaries:

https://support.syncfusion.com/kb/article/9789/how-to-convert-html-to-pdf-using-blink-in-linux-docker 

I installed the following packages:

Syncfusion.HtmlToPdfConverter.Blink.Net.Core.Windows 

Syncfusion.HtmlToPdfConverter.Blink.Net.Core.Linux

When I look in the \.nuget\packages\syncfusion.htmltopdfconverter.net.linux\21.1.41 folder, I do not see the BlinkBinariesLinux folder. The same goes for the Windows binaries. What am I missing?



GK Gowthamraj Kumar Syncfusion Team May 1, 2023 01:58 PM UTC

The reported error is most likely caused by mismatched versions of the Syncfusion assemblies. When a project references multiple Syncfusion packages, all of them, including their dependencies, must use the same version; otherwise this error occurs. Please reference the same version for every Syncfusion package to resolve the issue. We have created a sample that converts HTML to PDF and performs OCR on a PDF document using the Syncfusion library, and it works properly. The sample is attached for your reference; please try it on your end and let us know the result.
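
As an illustration of keeping the versions aligned, the package references from the question could be brought to a single version as sketched below. The version 21.1.41 is taken from the restore log in the question; whether a 21.1.41 build of every listed package suits a given project is an assumption.

```xml
<!-- Sketch: every Syncfusion package pinned to the same version (21.1.41 assumed) -->
<ItemGroup>
  <PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="21.1.41" />
  <PackageReference Include="Syncfusion.Pdf.Net.Core" Version="21.1.41" />
  <PackageReference Include="Syncfusion.PDF.OCR.Net.Core" Version="21.1.41" />
</ItemGroup>
```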


You can find the Linux Blink binaries in the NuGet package install location: [screenshot]


You can find the Windows Blink binaries in the NuGet package install location: [screenshot]


NuGet Package: https://help.syncfusion.com/file-formats/pdf/converting-html-to-pdf#nuget-packages-required-recommended




Attachment: SyncfusionSample_97aa3956.zip


MA Mark May 1, 2023 09:53 PM UTC

Thanks for the information. I was able to make it work on Windows for now. What I'm seeing is that Blink is slower and increases the size of the Docker image. Is there no way to keep using the legacy WebKit engine on .NET Core with the latest version of your software?

Is this the last release of WebKit engine?

<PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="20.2.0.40" />


SN Santhiya Narayanan Syncfusion Team May 2, 2023 12:32 PM UTC

WebKit-based HTML to PDF conversion is deprecated. The WebKit NuGet packages do not appear in nuget.org search results, but you can still install the Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core package from the Package Manager Console using the command below:

NuGet\Install-Package Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core -Version 21.1.41

 

https://www.nuget.org/packages/Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core




MA Mark May 2, 2023 04:38 PM UTC

Thank you again for the quick response! It's great to have this option for backwards compatibility.

Now I have updated my assemblies and I am proceeding with the original goal which is redaction of sensitive information. I am using the following blog entry as a starting point: https://www.syncfusion.com/blogs/post/easy-ways-to-redact-pdfs-using-c.aspx

The following code throws a null reference exception because GetImagesInfo() on loadedPage returns null. I imagine I need to make some .NET Core-specific adjustments, because the referenced assembly "Syncfusion.PDF.OCR.WPF" is for .NET Framework.

I am also attaching the PDF document used in my test. Many thanks in advance for additional guidance on how to make this work in .net core.

public byte[] RedactSensitiveInformation(RedactPdfRequest request)
{
    byte[] result = request.PdfBytes;
    using (OCRProcessor processor = new OCRProcessor(@"../../TesseractBinaries/3.02/"))
    {
        //Load the PDF document
        PdfLoadedDocument lDoc = new PdfLoadedDocument(request.PdfBytes);

        //Load the PDF page
        PdfLoadedPage loadedPage = lDoc.Pages[0] as PdfLoadedPage;
        //Language to process the OCR
        processor.Settings.Language = Languages.English;

        //Extract image and information from the PDF for processing OCR
        PdfImageInfo[] imageInfoCollection = loadedPage.GetImagesInfo(); // <---- imageInfoCollection is null

        foreach (PdfImageInfo imgInfo in imageInfoCollection)
        {
            Bitmap ocrImage = imgInfo.Image as Bitmap;
            OCRLayoutResult ocrResult = null;
            float scaleX = 0, scaleY = 0;
            if (ocrImage != null)
            {
                //Process OCR by providing loaded PDF document, Data dictionary and language
                string text = processor.PerformOCR(ocrImage, @"../../LanguagePack/", out ocrResult);

                //Calculate the scale factor for the image used in the PDF
                scaleX = imgInfo.Bounds.Height / ocrImage.Height;
                scaleY = imgInfo.Bounds.Width / ocrImage.Width;
            }

            //Get the text from page and lines.
            foreach (var page in ocrResult.Pages)
            {
                foreach (var line in page.Lines)
                {
                    if (line.Text != null)
                    {
                        //Regular expression for social security number
                        var ssnMatches = Regex.Matches(line.Text, @"(\d{3})+[ -]*(\d{2})+[ -]*\d{4}", RegexOptions.IgnorePatternWhitespace);
                        if (ssnMatches.Count >= 1)
                        {
                            Syncfusion.Drawing.RectangleF redactionBound = new Syncfusion.Drawing.RectangleF(line.Rectangle.X * scaleX, line.Rectangle.Y * scaleY,
                                (line.Rectangle.Width - line.Rectangle.X) * scaleX, (line.Rectangle.Height - line.Rectangle.Y) * scaleY);

                            //Create PDF redaction for the found SSN location
                            PdfRedaction redaction = new PdfRedaction(redactionBound);

                            //Adds the redaction to loaded page
                            loadedPage.AddRedaction(redaction);
                        }
                    }
                }
            }
        }

        // Save the redacted PDF document
        MemoryStream stream = new MemoryStream();
        lDoc.Save(stream);
        result = stream.ToArray();
        stream.Position = 0;
        lDoc.Close(true);
    }
    return result;
}

<PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="21.1.41" />
<PackageReference Include="Syncfusion.Pdf.Net.Core" Version="21.1.41" />
<PackageReference Include="Syncfusion.PDF.OCR.Net.Core" Version="21.1.41" />

Attachment: 2023_04_25_15_05_39.pdf_ac431189.zip


SN Santhiya Narayanan Syncfusion Team May 3, 2023 02:29 PM UTC


We checked the reported issue with the given document, but it works properly on our end. We have attached a sample for your reference; please try it on your end and let us know the result.


Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/Perform_OCR_ASPNetCore-1273977551


If you still face any issue, please share the modified sample and the input document so that we can reproduce the issue on our end. This will help us analyze it and assist you further.



MA Mark May 5, 2023 02:02 PM UTC

Hello and thank you again for your most excellent support!


The sample works great, except that the generated document is not redacted. The input document contains a fake SSN in plain text that I would expect to be redacted. Is there something else that I am missing?



SN Santhiya Narayanan Syncfusion Team May 8, 2023 03:01 PM UTC

We were able to reproduce the reported issue on our end with the provided details. We are currently validating it and will provide further details on May 10, 2023.



PV Prakash Viswanathan Syncfusion Team May 9, 2023 02:10 PM UTC

We have validated the reported redaction issue on our side. The image on the first page of the document is rotated by 180 degrees; you can confirm the rotation by saving the image to a file in the sample. Because of this rotation, the calculated bounds of the SSN text are wrong, so the content is not redacted properly.


We have modified the code to calculate the correct X and Y manually for the 180-degree rotation. In .NET Core, you must also call the lDoc.Redact() method to apply the redaction. Please refer to the modified code example below to resolve the issue in the sample.
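
The 180-degree adjustment can be checked in isolation with made-up numbers. The following is only a sketch: Rect and Rotate180 are hypothetical helpers written for illustration, not Syncfusion types, and the page size is an assumed 612 x 792 points.

```csharp
using System;

// Hypothetical stand-in for a rectangle type; not a Syncfusion API.
public readonly struct Rect
{
    public Rect(float x, float y, float width, float height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    public float X { get; }
    public float Y { get; }
    public float Width { get; }
    public float Height { get; }
}

public static class RotationSketch
{
    // Maps a rectangle located on a 180-degree-rotated image back to page
    // coordinates. The size is unchanged; the top-left corner is mirrored
    // on both axes, which is what the two adjustments to
    // redactionBound.X and redactionBound.Y do.
    public static Rect Rotate180(Rect r, float pageWidth, float pageHeight)
    {
        return new Rect(
            pageWidth - r.X - r.Width,
            pageHeight - r.Y - r.Height,
            r.Width,
            r.Height);
    }
}
```

For example, on the assumed 612 x 792 page, a rectangle at (100, 50) with size 200 x 20 maps to X = 612 - 100 - 200 = 312 and Y = 792 - 50 - 20 = 722. Applying the mapping a second time returns the original rectangle, which is a quick sanity check for a 180-degree rotation.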


 

    foreach (PdfImageInfo imgInfo in imageInfoCollection)
    {
        Bitmap ocrImage = imgInfo.Image as Bitmap;

        MemoryStream imgStream = new MemoryStream();
        ocrImage.Save(imgStream, System.Drawing.Imaging.ImageFormat.Bmp);

        OCRLayoutResult ocrResult = null;
        float scaleX = 0, scaleY = 0;
        if (ocrImage != null)
        {
            //Process OCR by providing loaded PDF document, Data dictionary and language
            string text = processor.PerformOCR(imgStream, tessdata,out ocrResult);

            //Calculate the scale factor for the image used in the PDF
            scaleX = imgInfo.Bounds.Height / ocrImage.Height;
            scaleY = imgInfo.Bounds.Width / ocrImage.Width;
        }

        //Get the text from page and lines.
        foreach (var page in ocrResult.Pages)
        {
            foreach (var line in page.Lines)
            {
                if (line.Text != null)
                {
                    //Regular expression for social security number
                    var ssnMatches = Regex.Matches(line.Text, @"(\d{3})+[ -]*(\d{2})+[ -]*\d{4}", RegexOptions.IgnorePatternWhitespace);
                    if (ssnMatches.Count >= 1)
                    {
                        Syncfusion.Drawing.RectangleF redactionBound = new Syncfusion.Drawing.RectangleF(line.Rectangle.X * scaleX, line.Rectangle.Y * scaleY,
                            (line.Rectangle.Width - line.Rectangle.X) * scaleX, (line.Rectangle.Height - line.Rectangle.Y) * scaleY);

                        //Image is rotated by 180 degrees, so apply height - y to get the correct y position.
                        redactionBound.Y = loadedPage.Size.Height - redactionBound.Y - redactionBound.Height;

                        //Image is rotated by 180 degrees, so apply width - x to get the correct x position.
                        redactionBound.X = loadedPage.Size.Width - redactionBound.X - redactionBound.Width;

                        //Create PDF redaction for the found SSN location
                        PdfRedaction redaction = new PdfRedaction(redactionBound);

                        //Adds the redaction to loaded page
                        loadedPage.AddRedaction(redaction);
                    }
                }
            }
        }
    }

    lDoc.Redact();




MA Mark May 19, 2023 02:59 PM UTC

Thanks again for your help! 



MA Mark May 19, 2023 03:18 PM UTC

However, this change is still not redacting the embedded SSN. I am attaching the modified project.

Also, I think this line:

string text = processor.PerformOCR(imgStream, tessdata,out ocrResult);

should be:

string text = processor.PerformOCR(ocrImage, tessdata,out ocrResult);


Attachment: Perform_OCR_ASPNetCore_9edb87cf.rar


SN Santhiya Narayanan Syncfusion Team May 22, 2023 03:27 PM UTC

We checked the reported issue with the provided sample, but the text is redacted properly on our end. We have attached the output for your reference.

Output : https://www.syncfusion.com/downloads/support/directtrac/general/ze/Output712909651

Please refer to the screenshots below.

[Screenshots: Input, Output]



If you are still facing any issue, please describe it in detail so that we can check it on our end. This will help us analyze it and assist you further.

