PDF ExtractText using Physical Layout

Hi,

Does anyone know how to extract text from a PDF document according to its physical layout (WYSIWYG)?


I am working with PDF documents from a vendor in which the content is not written in a linear order.

For example, the content in the PDF file can be:
Name: Apple ***
Age : 21 ***
Sex : Male ***


When I use the ExtractText function, I get the following string:
Name: ***
Age : ***
Sex : ***
Apple
21
Male


What I want to get is:
Name: Apple ***
Age : 21 ***
Sex : Male ***

Any advice is appreciated.
Thanks.

Regards
HY

3 Replies

Angappan G (Syncfusion Team), April 26, 2010 10:33 AM UTC

Hi,

Thank you for your interest in Essential Studio.

Essential PDF supports text extraction, in which glyphs are extracted in the order in which they are stored in the document structure. Please have a look at the sample at the link below, where the contents of the document are extracted in a linear fashion.
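To illustrate the difference between storage order and physical layout, here is a minimal, library-agnostic sketch. It is not Essential PDF's API: it assumes some extractor can report each text fragment together with its position on the page, and the `Fragment` type and `layout_order` function are hypothetical names used only for illustration. Fragments are grouped into visual lines by their y-coordinate (within a tolerance), and each line is then ordered left to right.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical positioned text run; x grows rightward, y grows downward."""
    text: str
    x: float
    y: float


def layout_order(fragments, line_tolerance=2.0):
    """Rebuild reading order: group fragments into lines by y, then sort by x."""
    lines = []  # each entry is [y of the line's first fragment, fragments in it]
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(frag.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(frag)  # close enough vertically: same visual line
        else:
            lines.append([frag.y, [frag]])  # start a new visual line
    return "\n".join(
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for _, line in lines
    )


# Fragments listed in the (assumed) order they are stored in the file:
# all labels first, then all values, as in the question above.
frags = [
    Fragment("Name:", 10, 100),
    Fragment("Age :", 10, 120),
    Fragment("Sex :", 10, 140),
    Fragment("Apple", 60, 100.5),
    Fragment("21", 60, 120),
    Fragment("Male", 60, 140),
]

print("\n".join(f.text for f in frags))  # storage order
print(layout_order(frags))               # physical layout order
```

The `line_tolerance` value is an assumption; real documents may need it tuned to the font size, and multi-column pages would need column detection before this line grouping.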

Sample Link:
http://files.syncfusion.com/samples/PDF.Windows/TextExtractionSample.zip

Please try this and let us know if you have any queries.

Regards,
Angappan.


Robert Titular, October 9, 2013 07:30 PM UTC

I have the same request. When I tried the attached solution, there was no difference in the output text file. I'm looking to get the text in the physical layout of the original document.

I have v11.3.0.30 of the PDF library.

Is this possible? I just tried the trial version of Aspose's PDF .NET product and was able to extract the text so that it matched the physical layout of the original document.

Praveenkumar H (Syncfusion Team), October 14, 2013 04:25 AM UTC

Hi Robert,

Thank you for using Syncfusion products.

Please provide us with a sample input file.
It will help us investigate this further.

With Regards,
Praveen
