Syncfusion.Pdf.Net.Core Extract Text removes y from text

8 Replies
2 Participants

Created by
MZ Martin Zeug

Platform
WinForms

Control
PDF

Created On
Dec 8, 2022 11:54 PM UTC

Last Activity On
Jan 18, 2023 08:57 AM UTC

Hi,

We are parsing PDFs for a pattern via .ExtractText(), in this case email addresses. Some of them are @yahoo.com addresses, and the y is missing from the parsed text. If I open the file in Adobe Acrobat, save it again, and retry, it works. The problem is that the document is created by third-party software.

Does anybody have an idea why the y could not be parsed in this case? Other y characters could not be read either.

Greetings,
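
The pattern search described above can be sketched as follows. This is a language-neutral illustration in Python, not the poster's actual code: the `EMAIL_PATTERN` regex, the `find_emails` helper, and the sample strings are all assumptions, and the strings merely stand in for whatever `.ExtractText()` returns. It shows that a single dropped character still produces a match, only with the wrong domain, so the error is easy to miss.

```python
import re

# Simplified email pattern, an assumption for illustration only;
# real-world address validation is considerably more involved.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(extracted_text: str) -> list[str]:
    """Return every substring of the text that looks like an email address."""
    return EMAIL_PATTERN.findall(extracted_text)


# Hypothetical strings standing in for the output of a PDF text extractor.
expected_text = "Contact: [someone@yahoo.com]"
faulty_text = "Contact: [someone@ahoo.com]"  # the "y" was dropped during extraction

print(find_emails(expected_text))  # ['someone@yahoo.com']
print(find_emails(faulty_text))    # ['someone@ahoo.com']
```

Because the faulty text still matches the pattern, a check against a list of known domains would be needed to catch this kind of defect automatically.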

8 Replies

IJ Irfana Jaffer Sadhik Syncfusion Team December 9, 2022 07:30 AM UTC

We suspect that the reported issue is document-specific. If possible, please share the sample input document with us so that we can investigate further.

MZ Martin Zeug December 9, 2022 06:00 PM UTC

Hi,

I have added the file. The email address in [] is extracted incorrectly.

Many Thanks,

Martin

Attachment: 99910_00010_Max_Muster_02_2021_Brutto_Netto_302_4969f4ec.zip

IJ Irfana Jaffer Sadhik Syncfusion Team December 14, 2022 12:12 PM UTC

We confirmed the issue “ExtractText returns incorrect text for the particular PDF document” as a defect in our product. Since our 2022 Volume 4 release is expected to be rolled out in the upcoming week, there will be no weekly release. We will include the fix for the reported issue in our upcoming weekly NuGet release once Volume 4 is rolled out, which we expect in mid-December.

Please use the feedback link below to track the status of the reported bug:

https://www.syncfusion.com/feedback/39776/extracttext-returns-incorrect-text-for-the-particular-pdf-document

Note: If you require a patch for the reported issue in any of our Essential Studio Main or SP release version, then kindly let us know the version, so that we can provide a patch in that version based on our SLA policy.

Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and works reprioritization.”

IJ Irfana Jaffer Sadhik Syncfusion Team December 29, 2022 06:38 AM UTC

We have not included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly release. It will be available in the next weekly release on January 3rd, 2023. We have created a custom NuGet package with this fix for product version 20.4.0.40. Please find the download link below.

Custom NuGet:

https://www.syncfusion.com/downloads/support/directtrac/general/ze/syncfusion.pdf.imaging.net.core.20.4.0.40.nupkg-76041363

Please refer to the KB steps below to install the custom NuGet package:

How to install the customer patch NuGet in Windows machine | Miscellaneous - Extension (syncfusion.com)

Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and works reprioritization.”

IJ Irfana Jaffer Sadhik Syncfusion Team January 4, 2023 06:35 AM UTC

We have included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly NuGet release (v20.4.0.42). Please use the link below to download the latest weekly NuGet package:

NuGet link: NuGet Gallery | Syncfusion.Pdf.Net.Core 20.4.0.42

MZ Martin Zeug January 4, 2023 07:10 AM UTC

Great to hear - we will check that.

Thanks,

Martin

MZ Martin Zeug January 17, 2023 09:17 PM UTC

I tried version v20.4.0.43. The missing y is there now, but the address has surrounding brackets, [[email protected]], and the extracted text is only [[email protected], without the closing bracket.

So one part is solved, but not the whole problem.
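
One way to notice this kind of truncation in extracted text is to check whether each bracketed address is actually closed. The sketch below is a language-neutral illustration in Python under stated assumptions: the `BRACKETED_EMAIL` pattern, the `find_bracketed_emails` helper, and the sample strings are hypothetical and come neither from this thread nor from Syncfusion's API.

```python
import re

# An opening bracket, a simplified email address, and an optional closing bracket.
# The pattern is an assumption for illustration, not a full address validator.
BRACKETED_EMAIL = re.compile(
    r"\[(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})(?P<close>\])?"
)


def find_bracketed_emails(extracted_text: str) -> list[tuple[str, bool]]:
    """Return (address, is_closed) for every address that follows an opening bracket."""
    return [
        (match.group("email"), match.group("close") is not None)
        for match in BRACKETED_EMAIL.finditer(extracted_text)
    ]


# Hypothetical strings standing in for extracted PDF text.
print(find_bracketed_emails("Mail: [someone@yahoo.com]"))  # [('someone@yahoo.com', True)]
print(find_bracketed_emails("Mail: [someone@yahoo.com"))   # [('someone@yahoo.com', False)]
```

Making the closing bracket optional keeps the address itself usable while still flagging the pages where the extraction dropped a character.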

IJ Irfana Jaffer Sadhik Syncfusion Team January 18, 2023 08:57 AM UTC

We have checked the reported issue on our end with the details provided. However, we are able to extract the complete text from the PDF document (please refer to the screenshot below). We have attached a sample for your reference. Please try it on your end and let us know the result.

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/NetCoreSample1985161822

We request that you clear the NuGet cache for your project, reinstall the NuGet package, and let us know the result. If you still face the issue, please share the input document you are currently working with so that we can assist you further.
