We are parsing PDFs for email addresses: we extract the text via .ExtractText() and then search it for a pattern. Some of the addresses are @yahoo.com addresses, and the "y" is missing from the extracted text. If I open the file in Adobe Acrobat, save it again, and retry, the extraction works. The problem is that the document is created by third-party software.
Does anybody have an idea why the "y" cannot be extracted in this case? Other occurrences of "y" cannot be read either.
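As a rough illustration of the pattern-matching step described above (not the poster's actual code), here is a minimal Python sketch that finds email-like strings in text that has already been extracted from the PDF, and flags addresses whose domain looks like a truncated form of an expected domain, such as "ahoo.com" for "yahoo.com". The regex is a simplified assumption rather than a full RFC 5322 matcher, and the function names and the `expected` domain list are hypothetical. The check is only a heuristic for spotting characters lost at the start of a domain.

```python
import re

# Simplified email pattern (an assumption for illustration, not RFC 5322 complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return all email-like substrings found in the extracted text."""
    return EMAIL_RE.findall(text)


def suspicious_domains(emails: list[str], expected=("yahoo.com",)) -> list[str]:
    """Flag addresses whose domain is a strict suffix of an expected domain.

    Example: 'ahoo.com' is a suffix of 'yahoo.com', which hints that the
    leading 'y' was dropped during text extraction. Hypothetical helper.
    """
    flagged = []
    for email in emails:
        domain = email.rsplit("@", 1)[1].lower()
        for exp in expected:
            if domain != exp and exp.endswith(domain):
                flagged.append(email)
                break
    return flagged


if __name__ == "__main__":
    extracted = "Contact: john.doe@ahoo.com, jane@example.org"
    found = find_emails(extracted)
    print(found)                      # ['john.doe@ahoo.com', 'jane@example.org']
    print(suspicious_domains(found))  # ['john.doe@ahoo.com']
```

A check like this does not fix the extraction, but it makes document-specific glyph-mapping problems visible early instead of silently producing wrong addresses.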
We suspect that the reported issue is document-specific. If possible, please share the sample input document with us so that we can investigate further.
I have attached the file. The email address in  is extracted incorrectly.
We have not included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly release. It will be available in the next weekly release on January 3rd, 2022. In the meantime, we have created a custom NuGet package containing this fix for product version 220.127.116.11. Please find the download link below:
Please refer to the KB steps below to install the custom NuGet package:
Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and works reprioritization.”
We have included the fix for the reported issue “ExtractText returns incorrect text for the particular PDF document” in our latest weekly NuGet release (v18.104.22.168). Please use the link below to download it:
Great to hear - we will check that.
I tried version v22.214.171.124. The missing "y" is now extracted, but the address is enclosed in square brackets, [firstname.lastname@example.org], and the extracted text is only [email@example.com, without the closing bracket.
So one part is solved, but not the whole problem.
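To spot this kind of partial extraction automatically, a minimal Python sketch (an illustrative assumption, not part of the library or the poster's code) can scan the extracted text for addresses that start with "[" and report whether a closing "]" follows. The regex and the `check_brackets` name are hypothetical; the pattern is the same simplified email approximation, not a full address grammar.

```python
import re

# "[" + simplified email address, with an optional closing "]" captured in group 2.
BRACKETED_RE = re.compile(r"\[([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})(\])?")


def check_brackets(text: str) -> tuple[list[str], list[str]]:
    """Split '['-prefixed addresses into those that are closed and those that are not."""
    complete, unclosed = [], []
    for match in BRACKETED_RE.finditer(text):
        target = complete if match.group(2) else unclosed
        target.append(match.group(1))
    return complete, unclosed


if __name__ == "__main__":
    sample = "Mail: [jane.doe@example.org] and [john@example.com"
    print(check_brackets(sample))
```

Any address in the second list points at text where a trailing character was lost, which is the symptom described above.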
We have checked the reported issue with the provided details on our end. However, we are able to extract the complete text from the PDF document (please refer to the screenshot below). We have attached the sample for your reference. Please try it on your end and let us know the result.
Please clear the NuGet cache, reinstall the NuGet package in your project, and let us know the result. If you still face the issue, please share the input document you are currently working with so that we can assist you further.