Hello, in my ASP.NET Core 5 web application I added the Syncfusion.Pdf.Net.Core (19.4.0.42) NuGet package.
When I extract text from the PDF pages, accented characters such as ã and é are dropped. For example, the sentence "A Gestão da Documentação Escolar: o caso do Colégio" is extracted as "A Gesto da Documentaço Escolar: o caso do Colgio".
Here is my code:
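```csharp
using System.IO;
using Syncfusion.Pdf;            // PdfPageBase
using Syncfusion.Pdf.Parsing;    // PdfLoadedDocument
using FileStream docStream = new FileStream(filePath, FileMode.Open, FileAccess.Read); // disposed at end of scope
PdfLoadedDocument pdf = new PdfLoadedDocument(docStream);
PdfPageBase page = pdf.Pages[0];
string texto = page.ExtractText(); // accented characters are missing from this string
pdf.Close(true); // release the loaded document
```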
I have attached the sample PDF in case you need to reproduce the issue.
Any help is very welcome! Thank you in advance.
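To check which characters an extraction run loses (for example, after upgrading the NuGet package), one option is to compare the extracted text with the expected sentence. This is a minimal sketch that assumes a hypothetical helper, `ExtractionCheck.MissingNonAscii`, which is not part of the Syncfusion API. It only reports characters above U+007F that never appear in the extracted string.

```csharp
using System.Linq;

// Hypothetical helper for illustration only; not part of the Syncfusion API.
public static class ExtractionCheck
{
    // Returns the distinct non-ASCII characters of `expected` (in order of first
    // appearance) that do not occur anywhere in `extracted`.
    public static char[] MissingNonAscii(string expected, string extracted) =>
        expected.Where(c => c > 127 && !extracted.Contains(c))
                .Distinct()
                .ToArray();
}
```

With the sample sentence and the faulty output, this reports `ã` and `é`. `ç` is not reported because it survived extraction ("Documentaço"). The check ignores position, so a character that is dropped in one word but kept in another will not be flagged.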
Hi Dhivya,
Thank you for your support. I really appreciate your help on this issue, and I look forward to the weekly NuGet release on February 16, 2022.
When I tried to open this link https://www.syncfusion.com/feedback/32056/accents-in-characters-are-not-extracted-properly, I received an "Access Denied" message stating that this private feedback is not associated with my account. Could you let me know whether this is expected?
Thank you very much again.
Best regards.
João Paulo
Hello Dhivya, thank you for your feedback and efforts to solve this issue. I will wait for the next weekly release. Thank you again.
Regards,
Joao
Service-side packages:
ASP.NET Core:
https://www.nuget.org/packages/Syncfusion.EJ2.PdfViewer.AspNet.Core.Windows/
https://www.nuget.org/packages/Syncfusion.EJ2.PdfViewer.AspNet.Core.Linux/
ASP.NET MVC:
Hello Dhivya, thank you very much for your support during this time. I really appreciate your efforts in getting this issue solved.
Best Regards,
Joao Paulo