Hi,
We are parsing PDFs for a pattern via .ExtractText(); in this case we are looking for email addresses. Some of them are @yahoo.com addresses, and the y is missing in the extracted text. If I open the file in Adobe Acrobat, save it again, and retry, the extraction works. The problem is that the document is created by third-party software.
Does anybody have an idea why the y could not be parsed in this case? Other y characters could not be read either.
Greetings,
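For anyone reproducing this, the pattern search described above can be sketched independently of the PDF library. The following is a minimal Python sketch that assumes the text has already been obtained (for example from .ExtractText()). The regular expression, the function name find_emails, and the sample strings are illustrative assumptions, not the poster's actual code. It shows why a dropped glyph is easy to miss: the address is still matched, only with the wrong domain.

```python
import re

# Simplified email pattern for illustration; real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_emails(text):
    """Return all substrings of text that look like email addresses."""
    return EMAIL_RE.findall(text)

# Hypothetical extracted strings: one intact, one with the "y" dropped.
intact = "Contact: [name@yahoo.com]"
damaged = "Contact: [name@ahoo.com]"

print(find_emails(intact))   # ['name@yahoo.com']
print(find_emails(damaged))  # ['name@ahoo.com']
```

Because the damaged text still yields a syntactically valid address, comparing the extracted domains against a list of expected ones may be the only way to notice the problem automatically.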
We suspect that the reported issue is document specific. If possible, please share the sample input document with us so that we can investigate further.
Hi,
I have added the file. The email address in [] is extracted incorrectly.
Many Thanks,
Martin
We confirmed the issue “ExtractText returns incorrect text for the particular PDF document” as a defect in our product. Since our 2022 Volume 4 release is expected to be rolled out in the upcoming week, there will be no weekly release. We will include the fix for the reported issue in our weekly NuGet release once Volume 4 is rolled out, which we expect in mid-December.
Please use the below feedback link to track the status of the reported bug,
Note: If you require a patch for the reported issue in any of our Essential Studio Main or SP release version, then kindly let us know the version, so that we can provide a patch in that version based on our SLA policy.
Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and works reprioritization.”
We have not included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly release. It will be available in the next weekly release on January 3rd, 2023. In the meantime, we have created a custom NuGet package with this fix for product version 20.4.0.40. Please find the download link below.
Custom NuGet:
Please refer to the below KB steps to install the custom NuGet package,
We have included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly NuGet release (v20.4.0.42). Please use the below link to download our latest weekly NuGet,
Nuget Link: NuGet Gallery | Syncfusion.Pdf.Net.Core 20.4.0.42
Great to hear - we will check that.
Thanks,
Martin
I tried version v20.4.0.43. The missing y is now there, but the address is surrounded by brackets, [name@yahoo.com], and the extracted text is only [name@yahoo.com, without the closing bracket.
So one part is solved, but not the whole problem.
Gr
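One way to check for this kind of truncation across a larger batch of documents is to test whether every opening bracket in the extracted text has a matching close. This is a minimal Python sketch, assuming the extracted text is available as a plain string; unclosed_brackets is a hypothetical helper name, not part of any library.

```python
def unclosed_brackets(text):
    """Return the indices of '[' characters that have no matching ']'."""
    open_positions = []
    for index, char in enumerate(text):
        if char == "[":
            open_positions.append(index)
        elif char == "]" and open_positions:
            # A closing bracket pairs with the most recent unmatched opener.
            open_positions.pop()
    return open_positions

print(unclosed_brackets("[name@yahoo.com]"))  # []
print(unclosed_brackets("[name@yahoo.com"))   # [0]
```

A non-empty result only indicates that a closing bracket is absent from the text; it cannot tell whether the extractor dropped it or the source document never contained it.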
We checked the reported issue on our end with the provided details, but we are able to extract the complete text from the PDF document (please refer to the screenshot below). We have attached a sample for your reference. Please try it on your end and let us know the result.
Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/NetCoreSample1985161822
We request that you clear the NuGet cache for your project, reinstall the NuGet package, and let us know the result. If you still face the issue, please share the input document you are currently working with so that we can assist you further.