
Syncfusion.Pdf.Net.Core: ExtractText removes the letter "y" from text

Hi,

We are parsing PDFs for a pattern via .ExtractText(); specifically, we are looking for email addresses. Some of them are @yahoo.com addresses, and the "y" is missing from the extracted text. If I open the file in Adobe Acrobat, save it again, and retry, it works. The problem is that the document is created by third-party software.

Does anybody have an idea why the "y" cannot be parsed in this case? Other occurrences of "y" in the document cannot be read either.
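As an illustration of the pattern-matching step described above, here is a minimal Python sketch. It assumes the text has already been returned by ExtractText() as a plain string; the regular expression and the helper names find_emails and missing_expected are hypothetical, and the simplified pattern is not a complete RFC 5322 validator. It also shows why a dropped "y" is easy to miss: "name@ahoo.com" still looks like a valid address, so comparing against the addresses you expect is the more reliable check.

```python
import re
from typing import List

# Simplified, hypothetical email pattern for illustration only (not RFC 5322-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(extracted_text: str) -> List[str]:
    """Return every email-like substring in text obtained from ExtractText()."""
    return EMAIL_RE.findall(extracted_text)


def missing_expected(extracted_text: str, expected: List[str]) -> List[str]:
    """Return the expected addresses that do not appear verbatim in the extracted text."""
    found = set(find_emails(extracted_text))
    return [address for address in expected if address not in found]
```

For example, find_emails("Contact: [name@ahoo.com]") returns ["name@ahoo.com"], while missing_expected on the same text with ["name@yahoo.com"] reports the intended address as missing.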


Greetings,


8 Replies

Irfana Jaffer Sadhik (Syncfusion Team), December 9, 2022 07:30 AM UTC

We suspect that the reported issue is document-specific. If possible, please share the input document with us so that we can investigate further.



Martin Zeug, December 9, 2022 06:00 PM UTC

Hi,

I have attached the file. The email address in [] is extracted incorrectly.


Many thanks,

Martin


Attachment: 99910_00010_Max_Muster_02_2021_Brutto_Netto_302_4969f4ec.zip


Irfana Jaffer Sadhik (Syncfusion Team), December 14, 2022 12:12 PM UTC

We have confirmed the issue “ExtractText returns incorrect text for the particular PDF document” as a defect in our product. Since our 2022 Volume 4 release is expected to roll out next week, there will be no weekly release until then. We will include the fix for the reported issue in our first weekly NuGet release after Volume 4 is rolled out, which we expect in mid-December.


Please use the feedback link below to track the status of the reported bug:

https://www.syncfusion.com/feedback/39776/extracttext-returns-incorrect-text-for-the-particular-pdf-document

 

Note: If you require a patch for the reported issue in any of our Essential Studio Main or SP release versions, kindly let us know the version so that we can provide a patch for it based on our SLA policy.

 

Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and work reprioritization.”



Irfana Jaffer Sadhik (Syncfusion Team), December 29, 2022 06:38 AM UTC

We have not included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly release. It will be available in our next weekly release on January 3, 2023. We have created a custom NuGet package with this fix for product version 20.4.0.40. Please find the download link below:


Custom NuGet:

https://www.syncfusion.com/downloads/support/directtrac/general/ze/syncfusion.pdf.imaging.net.core.20.4.0.40.nupkg-76041363


Please refer to the KB article below for the steps to install the custom NuGet package:

 How to install the customer patch NuGet in Windows machine | Miscellaneous - Extension (syncfusion.com)




Irfana Jaffer Sadhik (Syncfusion Team), January 4, 2023 06:35 AM UTC

We have included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly NuGet release (v20.4.0.42). Please use the link below to download our latest weekly NuGet package:


NuGet link: NuGet Gallery | Syncfusion.Pdf.Net.Core 20.4.0.42



Martin Zeug, January 4, 2023 07:10 AM UTC

Great to hear - we will check that.

Thanks,

Martin



Martin Zeug, January 17, 2023 09:17 PM UTC

I tried version v20.4.0.43. The missing "y" is there now, but the address was surrounded by square brackets, [name@yahoo.com], and the extracted text is only [name@yahoo.com, without the closing bracket.


So one part is solved, but not the whole problem.
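To catch the remaining symptom automatically, here is a hedged Python sketch that flags bracketed addresses whose closing "]" is missing from the extracted text. The pattern and the function name unclosed_bracketed_emails are hypothetical, and the sketch assumes the closing bracket should appear immediately after the address, as in the example above.

```python
import re
from typing import List

# "[" + an email-like token; group 2 captures "]" only if it directly follows the address.
BRACKETED_RE = re.compile(r"\[([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})(\])?")


def unclosed_bracketed_emails(extracted_text: str) -> List[str]:
    """Return addresses that follow "[" but are not immediately followed by "]"."""
    return [
        match.group(1)
        for match in BRACKETED_RE.finditer(extracted_text)
        if match.group(2) is None  # closing bracket missing after the address
    ]
```

Run against the text from an older and a newer package version, an empty result means every bracketed address was extracted with its closing bracket intact.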


Gr



Irfana Jaffer Sadhik (Syncfusion Team), January 18, 2023 08:57 AM UTC

We have checked the reported issue with the provided details on our end, but we are able to extract the complete text from the PDF document (please refer to the screenshot below). We have attached the sample for your reference. Please try it on your end and let us know the result.



Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/NetCoreSample1985161822

Please clear the NuGet cache for your project, reinstall the NuGet package, and let us know the result. If you are still facing the issue, please share the input document you are currently working with so that we can assist you further.

