Data Extraction for PDF-like structure

I tried to extract table-like structured data from a PDF using the Syncfusion PDF library. When the data is extracted with the ExtractText method, the text is trimmed so that no spacing is preserved wherever there are more than two spaces. This is a problem when dealing with structured data like tables, where spacing is significant. Is there an alternative way to handle this scenario?

Code:

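// CONTENT stands in for the PDF bytes; pageIndex is the zero-based page number.
byte[]? pdfData = CONTENT;
PdfLoadedDocument document = new PdfLoadedDocument(pdfData);
// ExtractText returns the page text and fills the out parameter with the extracted lines.
string pageText = document.Pages[pageIndex].ExtractText(out TextLineCollection textLineCollection);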

3 Replies 1 reply marked as answer

JT Jeyalakshmi Thangamarippandian Syncfusion Team June 12, 2024 09:40 AM UTC

Hi G Priyanka,

We suspect that the problem may be specific to the document. Therefore, we kindly request you to share the input PDF document with us so that we can replicate the issue on our end. This will enable us to analyze it further and provide you with a prompt solution.


Regards,

Jeyalakshmi T



Marked as answer

GP G Priyanka June 13, 2024 07:42 AM UTC

Hi Jeyalakshmi,

Please find the attached file.


Attachment: SyncfusionData_449566f8.zip


JT Jeyalakshmi Thangamarippandian Syncfusion Team June 14, 2024 10:01 AM UTC

Hi G Priyanka,

Currently, we do not have direct support for extracting table data from a PDF document, and we do not support preserving the PDF structure during data extraction. We add only a single space between elements, not multiple spaces. This is the expected behavior.

Regards,

Jeyalakshmi T
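
Since ExtractText inserts only a single space between elements, one possible workaround (not confirmed anywhere in this thread) is to rebuild the spacing from word positions instead of relying on the extracted string. The sketch below is a minimal, self-contained illustration of that idea. The Word record, the RebuildLine helper, and the charWidth value are all hypothetical. It assumes each word's horizontal position is available, for example from the Bounds of each TextWord in a TextLine's WordCollection in the TextLineCollection that the question's code already fills.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for one extracted word and the X coordinate of its left edge.
public record Word(string Text, float X);

public static class LayoutText
{
    // Rebuilds one line of text, padding with spaces so that each word starts
    // at a column proportional to its X coordinate.
    // charWidth is an assumed average glyph width in the same units as X.
    public static string RebuildLine(IEnumerable<Word> words, float charWidth)
    {
        var sb = new StringBuilder();
        foreach (var w in words.OrderBy(w => w.X))
        {
            int targetColumn = (int)Math.Round(w.X / charWidth);
            // Keep at least one space between adjacent words, even if they overlap.
            int minColumn = sb.Length == 0 ? 0 : sb.Length + 1;
            int column = Math.Max(targetColumn, minColumn);
            sb.Append(' ', column - sb.Length);
            sb.Append(w.Text);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample row; real positions would come from the PDF.
        var row = new List<Word>
        {
            new("Item", 0f),
            new("Qty", 60f),
            new("Price", 120f),
        };
        Console.WriteLine(LayoutText.RebuildLine(row, 5f));
    }
}
```

With the made-up sample row and a charWidth of 5, "Qty" starts at column 12 and "Price" at column 24, so the column gaps survive as runs of spaces. The right charWidth depends on the document's font and would need tuning per PDF. Words that share a row also have to be grouped first, for instance by comparing their vertical positions.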


