Data Extraction for Table-Like Structures in PDF

Question

I tried to extract table-like structured data from a PDF using the Syncfusion PDF library. When the text is extracted with the ExtractText method, the data comes back trimmed, with no spacing preserved wherever the PDF has more than two spaces between values. This is a problem when dealing with structured data such as tables, where spacing is significant. Is there an alternative way to handle this scenario?

Code:

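// CONTENT is a placeholder for the raw bytes of the input PDF.
byte[]? pdfData = CONTENT;
PdfLoadedDocument document = new PdfLoadedDocument(pdfData);
TextLineCollection textLineCollection = new();
// Returns the page text and fills textLineCollection through the out parameter.
string pageText = document.Pages[pageIndex].ExtractText(out textLineCollection);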

Jeyalakshmi Thangamarippandian · Accepted Answer

Hi G Priyanka,

We suspect that the problem may be specific to the document. Therefore, we kindly request you to share the input PDF document with us so that we can replicate the issue on our end. This will enable us to analyze it further and provide you with a prompt solution.

Regards,
Jeyalakshmi T

G Priyanka · Answer

Hi Jeyalakshmi,

Please find the attached file.

Attachment: SyncfusionData_449566f8.zip

Jeyalakshmi Thangamarippandian · Answer

Hi G Priyanka,

As of now, we don’t have direct support for extracting table data from a PDF document, and we also don’t support preserving the PDF structure during data extraction. We add only a single space between elements, not multiple spaces. This is the expected behavior.

Regards,
Jeyalakshmi T
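
Since the extracted string carries only single spaces, one possible workaround is to rebuild the column spacing from word positions instead of from the text itself. The following is a minimal sketch of that idea, not a Syncfusion feature. PositionedWord, ColumnLayout, RebuildLine, and pointsPerChar are hypothetical names introduced for illustration. It assumes each word's left x coordinate is available, for example from the textLineCollection filled by ExtractText, if it exposes per-word bounds (verify this against the Syncfusion documentation).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for one extracted word: its text and the x coordinate
// (in points) of its left edge. This is not a Syncfusion type.
public record PositionedWord(string Text, float X);

public static class ColumnLayout
{
    // Places each word at a character column proportional to its x position,
    // so horizontal gaps in the PDF become runs of spaces in the output.
    // pointsPerChar is an assumed average character width; tune it per document.
    public static string RebuildLine(IEnumerable<PositionedWord> words, float pointsPerChar = 6f)
    {
        var line = new StringBuilder();
        foreach (var word in words.OrderBy(w => w.X))
        {
            int targetColumn = Math.Max(0, (int)Math.Round(word.X / pointsPerChar));

            // Keep at least one space between neighbours whose columns collide.
            if (line.Length > 0 && targetColumn <= line.Length)
                targetColumn = line.Length + 1;

            line.Append(' ', targetColumn - line.Length);
            line.Append(word.Text);
        }
        return line.ToString();
    }
}
```

Calling ColumnLayout.RebuildLine once per text line, with the words of that line, yields rows whose columns line up as long as the same pointsPerChar is used for every row. A fixed character width is only an approximation for proportional fonts, so this works best on tables with clearly separated columns.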