ExtractText() returns distinct words as a single one joi

Hi everybody,
I'm trying to read a PDF file generated by AutoCAD, which contains text, tables and, of course, geometry.

The problem is that words that are far apart, such as those in separate table cells, come back as a single line containing a single word, with all the characters joined together and no separation between the original words.

In Adobe PDF reader, each word can easily be selected individually.

Setting the isLayout parameter to true does not make much difference; TextLines.Wordcollection contains the same result.

The code is very simple, just load the PDF and extract text...

Any advice would be greatly appreciated.

Max


Attachment: 03_200000PE0107_e9ae0037.rar

1 Reply

IJ Irfana Jaffer Sadhik Syncfusion Team September 30, 2022 11:26 AM UTC

Hi Massimo Cicognani,

The Syncfusion PDF Library provides an ExtractText API with a TextLineCollection overload that returns the collection of lines along with the text extracted from the PDF. We confirmed that the reported issue is resolved with this API. Please try this on your end and let us know whether it meets your requirement.


Please refer to the code snippet below:

using System.Drawing;          // RectangleF
using Syncfusion.Pdf;          // PdfPageBase, TextLines, TextLine
using Syncfusion.Pdf.Parsing;  // PdfLoadedDocument

// Load the existing PDF document
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileName);

// Get the first page of the loaded PDF document
PdfPageBase page = loadedDocument.Pages[0];

// Collection that receives the extracted lines
TextLines lineCollection = new TextLines();

// Extract text from the first page
string extractedText = page.ExtractText(out lineCollection);

// Get a specific line from the collection
TextLine line = lineCollection[0];

// Get the bounds of the line
RectangleF lineBounds = line.Bounds;

// Get the text in the line
string text = line.Text;
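
If the text fragments of a line still come back fused, one possible workaround is to rebuild the line from the fragment bounds, inserting a separator wherever the horizontal gap is large. The following is only a minimal sketch of that idea and is not part of the Syncfusion API: the Fragment and GapJoiner types and the maxGap threshold are hypothetical names introduced for illustration, and it assumes you can copy each fragment's text, left edge and width out of the extracted bounds yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical holder for one extracted fragment (a word or a glyph):
// its text plus the left edge and width of its bounding box.
public sealed class Fragment
{
    public string Text { get; }
    public float X { get; }
    public float Width { get; }

    public Fragment(string text, float x, float width)
    {
        Text = text;
        X = x;
        Width = width;
    }
}

// Hypothetical helper, not a Syncfusion type.
public static class GapJoiner
{
    // Joins the fragments of one line from left to right and inserts
    // `separator` wherever the horizontal gap between two neighbours
    // is larger than `maxGap` (same units as the bounds).
    public static string Join(IEnumerable<Fragment> fragments, float maxGap, string separator = "\t")
    {
        var result = new StringBuilder();
        float? previousRight = null;

        foreach (var fragment in fragments.OrderBy(x => x.X))
        {
            if (previousRight.HasValue && fragment.X - previousRight.Value > maxGap)
            {
                result.Append(separator);
            }

            result.Append(fragment.Text);
            previousRight = fragment.X + fragment.Width;
        }

        return result.ToString();
    }
}
```

A suitable maxGap depends on the font size and the table layout of the drawing, so it would need to be tuned against the actual file.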


Please refer to the links below for more information:

https://help.syncfusion.com/file-formats/pdf/working-with-text-extraction#working-with-lines

https://help.syncfusion.com/cr/file-formats/Syncfusion.Pdf.PdfPageBase.html#Syncfusion_Pdf_PdfPageBase_ExtractText_Syncfusion_Pdf_TextLines__


Please let us know if you need any further assistance with this.


Regards,

Irfana J.

