FormatException while extracting text line by line from a PDF file

Hi,

I'm trying to extract text from this PDF file line by line using the line collection properties, but it throws this error: FormatException: String must be exactly one character long. If I extract the text using the layout-based option instead, it works fine.

Another issue: how can I read a table inside the PDF file as it is? Please see the picture below, taken from the PDF file.

table.png


Here is my sample code.

// loadedDocument (a PdfLoadedDocument) and extractedText (a string) are declared earlier (not shown).
foreach (PdfPageBase page in loadedDocument.Pages)
{
    // Extract the page text as a collection of lines.
    TextLineCollection lineCollection = new TextLineCollection();
    page.ExtractText(out lineCollection);

    foreach (TextLine line in lineCollection.TextLine)
    {
        foreach (TextWord word in line.WordCollection)
        {
            // Skip words that consist of a single space.
            if (word.Text != " ")
            {
                extractedText = extractedText + word.Text.Trim() + " ";
            }
        }
        extractedText += Environment.NewLine;
    }

    // Append a separator between pages.
    extractedText += Environment.NewLine + Environment.NewLine + Environment.NewLine;
    extractedText += "===========================================================";
    extractedText += Environment.NewLine + Environment.NewLine + Environment.NewLine;
}


Thanks in advance for your help.


6 Replies

VS Vasugi Sivajothi Syncfusion Team August 9, 2021 12:11 PM UTC

Hi Moshiur, 
 
Thank you for contacting Syncfusion support. 
 
Please find the details. 
 
Query: I'm trying to extract text from this PDF file line by line using the line collection properties, but it throws this error: FormatException: String must be exactly one character long. If I extract the text using the layout-based option instead, it works fine.

Details: We were able to reproduce the reported issue, “Exception is thrown while extracting text”, with the provided document. We will analyze it further and update you with more details on August 11, 2021.

Query: Another issue: how can I read a table inside the PDF file as it is? Please see the picture below, taken from the PDF file.

Details: Our Syncfusion PDF Viewer control extracts text from PDF documents based on the structure of the content in the document. Because of that, it cannot recognize the rows and columns of a table in the PDF document, and it cannot extract the text in the same order as it appears in the document. Sorry for the inconvenience.
 
 
 
Regards, 
Vasugi. 
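
For readers who still need something table-like, here is a minimal sketch of a common heuristic: group words that share roughly the same vertical position into a row, then order each row left to right. This is not a Syncfusion feature and does not contradict the reply above; it is only an approximation that will fail on wrapped cells, merged cells, or rotated text. PositionedWord, TableRowGrouper, GroupIntoRows, and the yTolerance default are all hypothetical names and values introduced for illustration. The sketch assumes you can obtain a position for each extracted word (for example from word bounds, if your extraction API reports them).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an extracted word and the top-left corner of its bounds.
public sealed class PositionedWord
{
    public string Text { get; }
    public float X { get; }
    public float Y { get; }

    public PositionedWord(string text, float x, float y)
    {
        Text = text;
        X = x;
        Y = y;
    }
}

public static class TableRowGrouper
{
    // Groups words into rows by comparing each word's Y with the first word of the
    // current row, then joins each row's words (sorted by X) with tab characters.
    // yTolerance is an assumed value; tune it to the font size of the document.
    public static List<string> GroupIntoRows(IEnumerable<PositionedWord> words, float yTolerance = 2f)
    {
        var rows = new List<List<PositionedWord>>();

        foreach (var word in words.OrderBy(w => w.Y).ThenBy(w => w.X))
        {
            var current = rows.Count > 0 ? rows[rows.Count - 1] : null;

            if (current != null && Math.Abs(current[0].Y - word.Y) <= yTolerance)
            {
                current.Add(word);                           // same row
            }
            else
            {
                rows.Add(new List<PositionedWord> { word }); // start a new row
            }
        }

        return rows
            .Select(row => string.Join("\t", row.OrderBy(w => w.X).Select(w => w.Text)))
            .ToList();
    }
}
```

Calling TableRowGrouper.GroupIntoRows with the words of one page returns one tab-separated string per detected row. Column detection (aligning cells across rows by X position) is deliberately left out of this sketch.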




VS Vasugi Sivajothi Syncfusion Team August 11, 2021 03:10 PM UTC

Hi Moshiur, 
Please find the details. 
Query: I'm trying to extract text from this PDF file line by line using the line collection properties, but it throws this error: FormatException: String must be exactly one character long. If I extract the text using the layout-based option instead, it works fine.

Details: The reported issue can be worked around by using the PdfRenderer class instead of PdfLoadedDocument. Please refer to the code snippet below.

Code snippet:
// Load the PDF with PdfRenderer instead of PdfLoadedDocument.
PdfRenderer pdfExtractText = new PdfRenderer();
pdfExtractText.Load("wwwroot/data/2ND PO_BASICLINE_REVISED.pdf");
List<Syncfusion.Blazor.PdfViewer.TextData> textCollection = new List<Syncfusion.Blazor.PdfViewer.TextData>();
// Append the extracted text of each page.
for (int i = 0; i < pdfExtractText.PageCount; i++)
{
    extractedText += pdfExtractText.ExtractText(i, out textCollection);
}
 
However, we have confirmed that the reported issue, “Exception is thrown while extracting text using PdfLoadedDocument”, is a defect. The fix will be included in our upcoming weekly release on August 31, 2021.
You can track the status using the feedback link below.
 
Regards, 
Vasugi. 



MR Moshiur Rahman Shohel August 12, 2021 01:35 AM UTC

Thanks for your reply.



VS Vasugi Sivajothi Syncfusion Team August 13, 2021 03:34 AM UTC

Hi Moshiur,  
 
Thank you for the update. As we mentioned earlier, the fix for the reported issue will be included in our upcoming weekly release on August 31, 2021.  
 
Regards, 
Vasugi. 



VS Vasugi Sivajothi Syncfusion Team August 31, 2021 02:04 PM UTC

Hi Moshiur,   
 
Sorry for the inconvenience. The fix was not included in the latest weekly release. However, it will be included in our upcoming weekly NuGet release on September 8, 2021.
 
Regards, 
Vasugi 



VS Vasugi Sivajothi Syncfusion Team September 8, 2021 04:41 PM UTC

Hi Moshiur,    
  
We have fixed the reported issue, and the fix is included in our latest weekly release, v19.2.0.60. Please upgrade to that version to resolve the issue.
 
Packages: 
 
Blazor Client  
Blazor Server
Service-side package
ASP.NET Core:
     
ASP.NET MVC:    
  
 
Regards, 
Vasugi. 

