Hi,
Text extraction from a PDF is not handling line breaks properly. The code is below and the PDFs are attached.
| Document | Issue |
| 1_BA20CBE6-4227-4E55-81E5-CF1BD280F4C8.txt | No proper line breaks. For example, "ARTICLE 1.1" should start on a new line. |
| 1_E5E4AE9D-2C0E-4CCF-A39D-666D45D39B16.txt | No proper line breaks, and no spaces between words. |
Code:
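// Requires: using System.IO; using Syncfusion.Pdf; using Syncfusion.Pdf.Parsing;
string docText = string.Empty;
using (FileStream docStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
    PdfLoadedPageCollection loadedPages = loadedDocument.Pages;
    foreach (PdfLoadedPage loadedPage in loadedPages)
    {
        // Append the text extracted from each page.
        docText += loadedPage.ExtractText();
    }
    loadedDocument.Close(true);
}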
Attachment:
Text_Extraction_Issue_91717c7e.zip