PdfLoadedPage.ExtractText() extracts extra letter

Question

Hi, I am using the PdfLoadedPage.ExtractText() method. If a word in the PDF contains the letter y, that letter is extracted twice. For example, "myth" is extracted as "myyth". How can I fix this? Thanks.

Irfana Jaffer Sadhik · Accepted Answer

We have already logged a similar issue, “The unwanted characters being returned during text extraction from PDF”, and the fix is planned for inclusion in our upcoming weekly NuGet release, which will be available on March 14th, 2023.

Please use the feedback link below to track the status of the reported bug.

https://www.syncfusion.com/feedback/41795/the-unwanted-characters-being-returned-during-text-extraction-from-pdf

If the issue persists after updating to that NuGet release, it may be related to the specific PDF document. To assist you further, we kindly request that you share the input document with us.

Note: If you require a patch for the reported issue in any of our Essential Studio Main or SP release versions, kindly let us know the version so that we can provide a patch for it based on our SLA policy.

Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and works reprioritization.”

Irfana Jaffer Sadhik · Answer

We have included the fix for the reported issue “The unwanted characters being returned during text extraction from PDF” in our latest weekly NuGet release (v20.4.0.54).

Please use the link below to download our latest weekly NuGet package:
https://www.nuget.org/packages/Syncfusion.Pdf.Net.Core/20.4.0.54
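
After updating the package, a quick way to check whether the doubled letters are gone is to extract the text of the affected page again. The following is only a minimal sketch, assuming a .NET console project that references Syncfusion.Pdf.Net.Core 20.4.0.54; the file name input.pdf and the choice of the first page are placeholders for illustration and do not come from this thread.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class Program
{
    static void Main()
    {
        // "input.pdf" is a hypothetical path; substitute the document
        // in which "myth" was previously extracted as "myyth".
        using (FileStream stream = new FileStream("input.pdf", FileMode.Open, FileAccess.Read))
        {
            // Load the existing PDF from the stream.
            PdfLoadedDocument document = new PdfLoadedDocument(stream);

            // Take the first page (index 0 is an assumption for this sketch).
            PdfPageBase page = document.Pages[0];

            // Extract the page text and print it so it can be inspected for doubled letters.
            string text = page.ExtractText();
            Console.WriteLine(text);

            // Release the document and its resources.
            document.Close(true);
        }
    }
}
```

If the output still contains doubled letters with this package version, the problem is likely specific to the document, and sharing the input file as requested above is the next step.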