PDF ExtractText method suppresses characters with accents

Hello, in my ASP.NET Core 5 web application I added the Syncfusion.Pdf.Net.Core (19.4.0.42) NuGet package.

When I try to extract text from the pdf pages, all the characters with accents are suppressed. For example, the sentence "A Gestão da Documentação Escolar: o caso do Colégio" is extracted as "A Gesto da Documentaço Escolar: o caso do Colgio".

Here is my code:

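// Requires: using System.IO; using Syncfusion.Pdf; using Syncfusion.Pdf.Parsing;
// Open the PDF file as a read-only stream (disposed at the end of the scope).
using FileStream docStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
// Load the existing document and take its first page.
PdfLoadedDocument pdf = new PdfLoadedDocument(docStream);
PdfPageBase page = pdf.Pages[0];
// Extract the page text; this is where the accented characters go missing.
string texto = page.ExtractText();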

I attached the sample PDF in case you need to reproduce this case.

Any help is very welcome. Thank you in advance.


Attachment: analise_da_gestao_de_documentacao_69378d7e.rar

10 Replies

DM Dhivyabharathi Mohan Syncfusion Team January 17, 2022 01:00 PM UTC

Hi Joao,

Thank you for the PDF document. We were able to reproduce the reported issue, “Texts are not properly extracted with the provided PDF document”. We will check and provide further details on January 20, 2022.

Regards,
Dhivya.



DM Dhivyabharathi Mohan Syncfusion Team January 20, 2022 05:27 PM UTC

Hi Joao,

We have confirmed the reported issue, “Accents in characters are not extracted properly”, as a defect, and the fix will be included in our upcoming weekly NuGet release on February 16, 2022. You can track the status using the feedback link below:
https://www.syncfusion.com/feedback/32056/accents-in-characters-are-not-extracted-properly

Regards,
Dhivya.



JP Joao Paulo Lima Braga January 21, 2022 11:56 AM UTC

Hi Dhivya,

Thank you for your support. I really appreciate your help on this issue, and I look forward to the weekly NuGet release on February 16, 2022.

I just tried to open this link, https://www.syncfusion.com/feedback/32056/accents-in-characters-are-not-extracted-properly, and I received an Access Denied message stating that this private feedback is not associated with my account. Please let me know if this is expected.

Thank you very much again.

Best regards.

João Paulo



DM Dhivyabharathi Mohan Syncfusion Team January 24, 2022 11:02 AM UTC

Hi Joao,

Thank you for your update. We have changed the feedback access permission. Kindly check and let us know whether you can now access the provided feedback link. As mentioned in our previous update, we will include the fix in our upcoming weekly NuGet release on February 16, 2022.

Regards,
Dhivya.



DM Dhivyabharathi Mohan Syncfusion Team February 17, 2022 06:28 AM UTC

Hi Joao,

Sorry for the inconvenience. The fix was not included in our latest weekly release. We will include the fix in our upcoming weekly NuGet release on February 23, 2022.

Regards,
Dhivya.



JP Joao Paulo Lima Braga February 18, 2022 04:01 PM UTC

Hello Dhivya, thank you for your feedback and efforts to solve this issue. I will wait for the next weekly release. Thank you again.


Regards,

Joao



DM Dhivyabharathi Mohan Syncfusion Team February 25, 2022 03:49 AM UTC

Hi Joao,

Sorry for the inconvenience. The fix was not included in our latest weekly release. We will include the fix in our upcoming weekly NuGet release on March 2, 2022.

Regards,
Dhivya.



DM Dhivyabharathi Mohan Syncfusion Team March 2, 2022 11:14 AM UTC

Hi Joao,

We have fixed the reported issue, and the fix is included in our latest weekly NuGet release, v19.4.0.54. Kindly upgrade to that version to resolve the issue.
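
For a project that references the package directly in its project file, the upgrade could look like the fragment below. This is only a sketch: it assumes the Syncfusion.Pdf.Net.Core package named in the original question, and any other Syncfusion packages in the project may need to be moved to a matching version.

```xml
<!-- Sketch of a .csproj fragment; assumes the project references Syncfusion.Pdf.Net.Core directly. -->
<ItemGroup>
  <!-- Version bumped from 19.4.0.42 (where the issue was reported) to the release containing the fix. -->
  <PackageReference Include="Syncfusion.Pdf.Net.Core" Version="19.4.0.54" />
</ItemGroup>
```

After restoring packages, rerunning the extraction code from the question against the same PDF is a quick way to check that the accented characters now appear in the output.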
 
Packages:
Service side package
ASP.NET Core:
ASP.NET MVC:

Regards, 
Dhivya. 



JP Joao Paulo Lima Braga March 3, 2022 02:04 AM UTC

Hello Dhivya, thank you very much for your support during this time. I really appreciate your efforts on getting this issue solved.


Best Regards,

Joao Paulo



DM Dhivyabharathi Mohan Syncfusion Team March 3, 2022 07:12 AM UTC

Hi Joao,

Thank you for your update. We are glad to know that the reported issue is resolved.

Regards,
Dhivya.

