Version of assemblies required for OCR support

12 Replies
4 Participants

Created by
MA Mark

Platform
ASP.NET Core - EJ 2

Control
PDF

Created On
Apr 27, 2023 11:19 PM UTC

Last Activity On
May 22, 2023 03:27 PM UTC

Hello,

I have an existing .net core 3.1 project which uses the following Syncfusion packages:

    <PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="20.2.0.40" />
    <PackageReference Include="Syncfusion.Pdf.Net.Core" Version="20.2.0.40" />

I attempted to install the OCR support using the following command:

dotnet add package Syncfusion.PDF.OCR.Net.Core

After that, my project fails to restore:

Determining projects to restore...

Writing /var/folders/xq/g92c84y57ks8mt57rgxl8fh00000gn/T/tmpDs8Wlp.tmp

info : Adding PackageReference for package 'Syncfusion.PDF.OCR.Net.Core' into project '/Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj'.

info : CACHE https://api.nuget.org/v3/registration5-gz-semver2/syncfusion.pdf.ocr.net.core/index.json

info : Restoring packages for /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj...

error: NU1605: Detected package downgrade: Syncfusion.Pdf.Net.Core from 21.1.41 to 20.2.0.40. Reference the package directly from the project to select a different version.

error: LCEnterpriseMIS.Web -> Syncfusion.PDF.OCR.Net.Core 21.1.41 -> Syncfusion.Pdf.Imaging.Net.Core 21.1.41 -> Syncfusion.Pdf.Net.Core (>= 21.1.41)

error: LCEnterpriseMIS.Web -> Syncfusion.Pdf.Net.Core (>= 20.2.0.40)

info : Package 'Syncfusion.PDF.OCR.Net.Core' is compatible with all the specified frameworks in project '/Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj'.

info : PackageReference for package 'Syncfusion.PDF.OCR.Net.Core' version '21.1.41' added to file '/Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj'.

info : Generating MSBuild file /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/obj/LCEnterpriseMIS.Web.csproj.nuget.g.targets.

info : Writing assets file to disk. Path: /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/obj/project.assets.json

log : Failed to restore /Users/markorciuch/Projects/lcenterprisemisproduct/LCEnterpriseMIS/LCEnterpriseMIS.Web/LCEnterpriseMIS.Web.csproj (in 1.31 sec).

How can I get around the "package downgrade detected" errors? Many thanks in advance.

12 Replies

MA Mark April 30, 2023 07:28 PM UTC

I was able to resolve my problem and got it working on macOS. Now I am trying to make it work on Windows and Linux.

I am following instructions from here to get the Windows and Linux binaries:

https://support.syncfusion.com/kb/article/9789/how-to-convert-html-to-pdf-using-blink-in-linux-docker

I installed the following packages:

Syncfusion.HtmlToPdfConverter.Blink.Net.Core.Windows

Syncfusion.HtmlToPdfConverter.Blink.Net.Core.Linux

When I look in the \.nuget\packages\syncfusion.htmltopdfconverter.net.linux\21.1.41 folder, I do not see the BlinkBinariesLinux folder. The same is true for the Windows binaries. What am I missing?

GK Gowthamraj Kumar Syncfusion Team May 1, 2023 01:58 PM UTC

The reported error is likely caused by mismatched versions of the Syncfusion assemblies, so please reference the same product version for all of them. When a project uses multiple Syncfusion assemblies, all of them and their dependencies must have the same version; otherwise this error occurs. We have created a sample that converts HTML to PDF and performs OCR on a PDF document using the Syncfusion library, and it works properly. The sample is attached for your reference; please try it on your end and let us know the result.
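
To illustrate the version-matching rule: the restore log in the question shows Syncfusion.PDF.OCR.Net.Core 21.1.41 requiring Syncfusion.Pdf.Net.Core >= 21.1.41, so one way to clear NU1605 is to move every Syncfusion reference in the project file to that same version. This is only a sketch, assuming the three packages named in this thread and that 21.1.41 is the version you want to target; it is not taken from the attached sample.

```xml
<ItemGroup>
  <!-- All Syncfusion packages use one version, so no transitive
       dependency asks for a newer Syncfusion.Pdf.Net.Core than the project pins. -->
  <PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="21.1.41" />
  <PackageReference Include="Syncfusion.Pdf.Net.Core" Version="21.1.41" />
  <PackageReference Include="Syncfusion.PDF.OCR.Net.Core" Version="21.1.41" />
</ItemGroup>
```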

You can find the Blink binaries for Linux in the NuGet installation location below,

You can find the Blink binaries for Windows in the NuGet installation location below,

NuGet Package: https://help.syncfusion.com/file-formats/pdf/converting-html-to-pdf#nuget-packages-required-recommended

Attachment: SyncfusionSample_97aa3956.zip

MA Mark May 1, 2023 09:53 PM UTC

Thanks for the information. I was able to make it work on Windows for now. What I'm seeing is that Blink is slower and adds to the size of the docker image. Is there no way to continue using the legacy Webkit engine in .net core in the latest version of your software?

Is this the last release of WebKit engine?

<PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="20.2.0.40" />

SN Santhiya Narayanan Syncfusion Team May 2, 2023 12:32 PM UTC

WebKit-based HTML to PDF conversion is deprecated, and the public WebKit NuGet package no longer appears in nuget.org search results. However, you can still install the Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core package from the Package Manager Console using the command below:

NuGet\Install-Package Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core -Version 21.1.41

https://www.nuget.org/packages/Syncfusion.HtmlToPfConverter.QtWebKit.Net.Core

MA Mark May 2, 2023 04:38 PM UTC

Thank you again for the quick response! It's great to have this option for backwards compatibility.

Now I have updated my assemblies and I am proceeding with the original goal which is redaction of sensitive information. I am using the following blog entry as a starting point: https://www.syncfusion.com/blogs/post/easy-ways-to-redact-pdfs-using-c.aspx

The following code throws a null object exception when it tries to use the result of GetImagesInfo() from loadedPage. I imagine I need to make some .NET Core-specific adjustments, because the referenced assembly "Syncfusion.PDF.OCR.WPF" is for .NET Framework.

I am also attaching the PDF document used in my test. Many thanks in advance for additional guidance on how to make this work in .net core.

        public byte[] RedactSensitiveInformation(RedactPdfRequest request)
        {
            byte[] result = request.PdfBytes;
            using (OCRProcessor processor = new OCRProcessor(@"../../TesseractBinaries/3.02/"))
            {
                //Load the PDF document 
                PdfLoadedDocument lDoc = new PdfLoadedDocument(request.PdfBytes);

                //Load the PDF page
                PdfLoadedPage loadedPage = lDoc.Pages[0] as PdfLoadedPage;
                //Language to process the OCR
                processor.Settings.Language = Languages.English;

                //Extract image and information from the PDF for processing OCR
                PdfImageInfo[] imageInfoCollection = loadedPage.GetImagesInfo(); // <---- imageInfoCollection is null

                foreach (PdfImageInfo imgInfo in imageInfoCollection)
                {
                    Bitmap ocrImage = imgInfo.Image as Bitmap;
                    OCRLayoutResult ocrResult = null;
                    float scaleX = 0, scaleY = 0;
                    if (ocrImage != null)
                    {
                        //Process OCR by providing loaded PDF document, Data dictionary and language
                        string text = processor.PerformOCR(ocrImage, @"../../LanguagePack/", out ocrResult);

                        //Calculate the scale factor for the image used in the PDF
                        scaleX = imgInfo.Bounds.Height / ocrImage.Height;
                        scaleY = imgInfo.Bounds.Width / ocrImage.Width;
                    }

                    //Get the text from page and lines.
                    foreach (var page in ocrResult.Pages)
                    {
                        foreach (var line in page.Lines)
                        {
                            if (line.Text != null)
                            {
                                //Regular expression for social security number
                                var ssnMatches = Regex.Matches(line.Text, @"(\d{3})+[ -]*(\d{2})+[ -]*\d{4}", RegexOptions.IgnorePatternWhitespace);
                                if (ssnMatches.Count >= 1)
                                {
                                    Syncfusion.Drawing.RectangleF redactionBound = new Syncfusion.Drawing.RectangleF(line.Rectangle.X * scaleX, line.Rectangle.Y * scaleY,
                                        (line.Rectangle.Width - line.Rectangle.X) * scaleX, (line.Rectangle.Height - line.Rectangle.Y) * scaleY);

                                    //Create PDF redaction for the found SSN location
                                    PdfRedaction redaction = new PdfRedaction(redactionBound);

                                    //Adds the redaction to loaded page
                                    loadedPage.AddRedaction(redaction);
                                }
                            }
                        }
                    }
                }

                // Save the redacted PDF document
                MemoryStream stream = new MemoryStream();
                lDoc.Save(stream);
                result = stream.ToArray();
                stream.Position = 0;
                lDoc.Close(true);
            }
            
            return result;
        }

    <PackageReference Include="Syncfusion.HtmlToPdfConverter.QtWebKit.Net.Core" Version="21.1.41" />
    <PackageReference Include="Syncfusion.Pdf.Net.Core" Version="21.1.41" />
    <PackageReference Include="Syncfusion.PDF.OCR.Net.Core" Version="21.1.41" />

Attachment: 2023_04_25_15_05_39.pdf_ac431189.zip
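
A side note on the SSN pattern used in the code above: the `+` after each group lets the digit groups repeat, so the pattern also matches inside longer digit runs. The sketch below compares it with a stricter pattern. The stricter pattern and the class and method names are hypothetical, introduced only for illustration; they do not come from the Syncfusion blog or from this thread.

```csharp
using System.Text.RegularExpressions;

public static class SsnPatternDemo
{
    // Pattern copied from the post above. The "+" after each group allows
    // the 3-digit and 2-digit groups to repeat.
    public const string OriginalPattern = @"(\d{3})+[ -]*(\d{2})+[ -]*\d{4}";

    // Hypothetical stricter alternative (an assumption, not from the thread):
    // exactly 3-2-4 digits, at most one space or hyphen between groups,
    // and word boundaries on both sides.
    public const string StricterPattern = @"\b\d{3}[ -]?\d{2}[ -]?\d{4}\b";

    // Uses the same option as the original code. In .NET, white space inside
    // a character class such as [ -] is still treated literally.
    public static bool MatchesOriginal(string text) =>
        Regex.IsMatch(text, OriginalPattern, RegexOptions.IgnorePatternWhitespace);

    public static bool MatchesStricter(string text) =>
        Regex.IsMatch(text, StricterPattern);
}
```

For example, `SsnPatternDemo.MatchesOriginal("1234567890123")` is true because part of the 13-digit run satisfies the original pattern, while `SsnPatternDemo.MatchesStricter("1234567890123")` is false. Both accept "123-45-6789".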

SN Santhiya Narayanan Syncfusion Team May 3, 2023 02:29 PM UTC

We checked the reported issue with the given document, but it works properly on our end. We have attached a sample for your reference; please try it on your end and let us know the result.

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/Perform_OCR_ASPNetCore-1273977551

If you are still facing any issue, please share the modified sample and the input document so that we can reproduce the issue on our end. This will help us analyze it and assist you further.

MA Mark May 5, 2023 02:02 PM UTC

Hello and thank you again for your most excellent support!

The sample works great, except that the generated document is not redacted. The input document contains a fake SSN in plain text that I would expect to be redacted. Is there something else that I am missing?

SN Santhiya Narayanan Syncfusion Team May 8, 2023 03:01 PM UTC

We were able to reproduce the reported issue on our end with the provided details. We are currently validating it and will provide further details on May 10th, 2023.

PV Prakash Viswanathan Syncfusion Team May 9, 2023 02:10 PM UTC

We have validated the reported redaction issue on our side. The image on the first page of the document is rotated by 180 degrees; you can verify the rotation by saving the image to a file in the sample. Because of this rotation, the computed bounds of the SSN text are wrong and the content is not redacted properly.

We have modified the code to calculate the correct X and Y positions for the 180-degree rotation manually. In addition, to apply the redaction in .NET Core, the lDoc.Redact() method must be called. Please refer to the modified code example below to resolve the issue in the sample.

        foreach (PdfImageInfo imgInfo in imageInfoCollection)
        {
            Bitmap ocrImage = imgInfo.Image as Bitmap;
            MemoryStream imgStream = new MemoryStream();
            ocrImage.Save(imgStream, System.Drawing.Imaging.ImageFormat.Bmp);
            OCRLayoutResult ocrResult = null;
            float scaleX = 0, scaleY = 0;
            if (ocrImage != null)
            {
                //Process OCR by providing loaded PDF document, Data dictionary and language
                string text = processor.PerformOCR(imgStream, tessdata,out ocrResult);

                //Calculate the scale factor for the image used in the PDF
                scaleX = imgInfo.Bounds.Height / ocrImage.Height;
                scaleY = imgInfo.Bounds.Width / ocrImage.Width;
            }

            //Get the text from page and lines.
            foreach (var page in ocrResult.Pages)
            {
                foreach (var line in page.Lines)
                {
                    if (line.Text != null)
                    {
                        //Regular expression for social security number
                        var ssnMatches = Regex.Matches(line.Text, @"(\d{3})+[ -]*(\d{2})+[ -]*\d{4}", RegexOptions.IgnorePatternWhitespace);
                        if (ssnMatches.Count >= 1)
                        {
                            Syncfusion.Drawing.RectangleF redactionBound = new Syncfusion.Drawing.RectangleF(line.Rectangle.X * scaleX, line.Rectangle.Y * scaleY,
                                (line.Rectangle.Width - line.Rectangle.X) * scaleX, (line.Rectangle.Height - line.Rectangle.Y) * scaleY);

                            //Image is rotated by 180 degrees, so apply height - y to get the correct y position.
                            redactionBound.Y = loadedPage.Size.Height - redactionBound.Y - redactionBound.Height;

                            //Image is rotated by 180 degrees, so apply width - x to get the correct x position.
                            redactionBound.X = loadedPage.Size.Width - redactionBound.X - redactionBound.Width;

                            //Create PDF redaction for the found SSN location
                            PdfRedaction redaction = new PdfRedaction(redactionBound);

                            //Adds the redaction to loaded page
                            loadedPage.AddRedaction(redaction);
                        }
                    }
                }
            }
        }

        //Apply the added redactions to the document
        lDoc.Redact();
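
The 180-degree correction in the reply above is plain coordinate arithmetic and can be checked in isolation. The sketch below reproduces only that arithmetic with ordinary floats. The class and method names are hypothetical, no Syncfusion types are used, and it assumes (as the reply's code does) that the rotated image covers the whole page.

```csharp
public static class RotatedBounds
{
    // Maps a rectangle (x, y, width, height) found in an image rotated by
    // 180 degrees to its position on the page. The size is unchanged; only
    // the origin is mirrored on both axes:
    //   newX = pageWidth  - x - width
    //   newY = pageHeight - y - height
    public static (float X, float Y, float Width, float Height) Rotate180(
        float pageWidth, float pageHeight,
        float x, float y, float width, float height)
    {
        float newX = pageWidth - x - width;
        float newY = pageHeight - y - height;
        return (newX, newY, width, height);
    }
}
```

On a 612 x 792 page, a rectangle at (100, 50) with size 200 x 20 maps to (312, 722) with the same size. Applying the mapping a second time returns the original rectangle, which is a quick sanity check for the formula.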

MA Mark May 19, 2023 02:59 PM UTC

Thanks again for your help!

MA Mark May 19, 2023 03:18 PM UTC

However, this change is still not redacting the embedded SSN. I am attaching the modified project.

Also, I think this line:

string text = processor.PerformOCR(imgStream, tessdata,out ocrResult);

should be:

string text = processor.PerformOCR(ocrImage, tessdata,out ocrResult);

Attachment: Perform_OCR_ASPNetCore_9edb87cf.rar

SN Santhiya Narayanan Syncfusion Team May 22, 2023 03:27 PM UTC

We checked the reported issue with the provided sample, but the text is redacted properly on our end. We have attached the output for your reference.

Output : https://www.syncfusion.com/downloads/support/directtrac/general/ze/Output712909651

Please refer to the screenshots below.

[Screenshots: Input and Output]

If you are still facing any issue, please describe it in more detail so that we can check it on our end and assist you further.
