How to get the bounds of words by extracting text using PDF Viewer server library

Platform: ASP.NET Core - EJ 2 |
Control: PDF Viewer |
Published Date: September 19, 2019 |
Last Revised Date: September 19, 2019

Extract text using PDF Viewer server library

The PDF Viewer server library can extract the text of a page together with its bounds. Call the ExtractText() method to do this: it returns the extracted text and, through an out parameter, the bounds of each character. Refer to the following user guide (UG) link for more details.

https://ej2.syncfusion.com/aspnetcore/documentation/pdfviewer/how-to/extract-text/

Getting bounds of words using ExtractText()

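The ExtractText() method of the PDF Viewer server library returns the bounds of each character, not of each word. To get the bounds of a word, group the consecutive non-whitespace characters and take the smallest rectangle that encloses their bounds. The following two steps show the code.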
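As a side note before the steps, the enclosing-rectangle idea can be checked in isolation. The sketch below is illustrative only: the three character rectangles are made-up values rather than output from ExtractText(), the class and method names (UnionSketch, Enclose) are hypothetical, and it relies on the standard System.Drawing.RectangleF.Union method instead of anything from the PDF Viewer library.

```csharp
using System;
using System.Drawing;

public static class UnionSketch
{
    // Returns the smallest rectangle that encloses every rectangle in "boxes".
    public static RectangleF Enclose(params RectangleF[] boxes)
    {
        if (boxes.Length == 0)
            return RectangleF.Empty;

        RectangleF result = boxes[0];
        for (int i = 1; i < boxes.Length; i++)
            result = RectangleF.Union(result, boxes[i]);
        return result;
    }

    public static void Main()
    {
        // Hypothetical character bounds (x, y, width, height) for "P", "D", "F".
        var p = new RectangleF(10, 20, 6, 12);
        var d = new RectangleF(16, 20, 7, 12);
        var f = new RectangleF(23, 19, 6, 13);

        // The word spans x = 10..29 and y = 19..32.
        RectangleF word = Enclose(p, d, f);
        Console.WriteLine($"{word.X}, {word.Y}, {word.Width}, {word.Height}");
    }
}
```

With these inputs the enclosing rectangle starts at (10, 19) and is 19 wide and 13 high, which is the same result the minimum/maximum comparisons in Step 2 produce.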

Step 1: Extract the text from the PDF document.

    PdfRenderer renderer = new PdfRenderer();
    renderer.Load(@"currentDirectory\..\..\..\..\Data\HTTP Succinctly.pdf");

    // "textDataCollection" receives one entry (text and bounds) per extracted character
    List<TextData> textDataCollection = new List<TextData>();

    // "text" contains the text extracted from the page passed to ExtractText()
    string text = renderer.ExtractText(1, out textDataCollection);
    System.IO.File.WriteAllText(@"currentDirectory\..\..\..\..\Data\ExtractedText.txt", text);

 

Step 2: Get the bounds of each word from the extracted characters.

    // "textBounds" holds the text and bounds of each word
    List<TextBounds> textBounds = new List<TextBounds>();

    for (int j = 0; j < textDataCollection.Count; j++)
    {
        string current = textDataCollection[j].Text;

        // A space or a carriage return separates words
        if (current.Contains(" ") || current.Contains("\r"))
        {
            // After a carriage return, skip the following entry too (typically the line feed)
            if (current.Contains("\r"))
                j++;
            continue;
        }

        // Start a new word at character j
        string finalText = current;
        var minx = textDataCollection[j].Bounds.Left;
        var miny = textDataCollection[j].Bounds.Top;
        var maxx = textDataCollection[j].Bounds.Right;
        var maxy = textDataCollection[j].Bounds.Bottom;

        // Extend the word until the next separator or the end of the collection
        int k = j + 1;
        for (; k < textDataCollection.Count; k++)
        {
            string next = textDataCollection[k].Text;
            if (next.Contains(" ") || next.Contains("\r"))
                break;

            // Grow the word bounds so that they enclose this character
            minx = Math.Min(minx, textDataCollection[k].Bounds.Left);
            miny = Math.Min(miny, textDataCollection[k].Bounds.Top);
            maxx = Math.Max(maxx, textDataCollection[k].Bounds.Right);
            maxy = Math.Max(maxy, textDataCollection[k].Bounds.Bottom);
            finalText += next;
        }

        // Store the word; this also covers a word that ends at the last character
        var glyphBounds = new RectangleF((float)minx, (float)miny, (float)(maxx - minx), (float)(maxy - miny));
        textBounds.Add(new TextBounds(finalText, glyphBounds));

        // Resume the outer loop at the separator that ended the word
        j = k - 1;
    }
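The code above depends on the library's TextData type and on a TextBounds type that this article does not define (it is presumably part of the linked sample). To try the grouping logic without the PDF library, a minimal sketch with stand-in types might look like the following. CharBox, WordBox, and WordGrouping are hypothetical names introduced for illustration; they only mimic the Text and Bounds members used above, and the input rectangles are made up. The sketch also treats any white-space character as a separator, which is a simplification of the space and carriage-return check in Step 2.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical stand-in for one extracted character (mimics the Text and Bounds members of TextData).
public class CharBox
{
    public string Text { get; }
    public RectangleF Bounds { get; }
    public CharBox(string text, RectangleF bounds) { Text = text; Bounds = bounds; }
}

// Hypothetical stand-in for the TextBounds type used in the article.
public class WordBox
{
    public string Text { get; }
    public RectangleF Bounds { get; }
    public WordBox(string text, RectangleF bounds) { Text = text; Bounds = bounds; }
}

public static class WordGrouping
{
    // An entry separates words if it is null, empty, or contains white space.
    static bool IsSeparator(string s)
    {
        if (string.IsNullOrEmpty(s))
            return true;
        foreach (char c in s)
            if (char.IsWhiteSpace(c))
                return true;
        return false;
    }

    // Groups consecutive non-separator characters and unions their bounds.
    public static List<WordBox> GroupWords(IReadOnlyList<CharBox> chars)
    {
        var words = new List<WordBox>();
        string text = "";
        RectangleF bounds = RectangleF.Empty;

        foreach (CharBox c in chars)
        {
            if (IsSeparator(c.Text))
            {
                // A separator closes the word in progress, if there is one.
                if (text.Length > 0)
                    words.Add(new WordBox(text, bounds));
                text = "";
                continue;
            }

            // The first character sets the bounds; later ones enlarge them.
            bounds = text.Length == 0 ? c.Bounds : RectangleF.Union(bounds, c.Bounds);
            text += c.Text;
        }

        // Flush a word that runs to the end of the input.
        if (text.Length > 0)
            words.Add(new WordBox(text, bounds));
        return words;
    }

    public static void Main()
    {
        // Made-up character data for the text "Hi a".
        var chars = new List<CharBox>
        {
            new CharBox("H", new RectangleF(10, 10, 8, 12)),
            new CharBox("i", new RectangleF(18, 10, 3, 12)),
            new CharBox(" ", new RectangleF(21, 10, 4, 12)),
            new CharBox("a", new RectangleF(25, 13, 6, 9)),
        };

        foreach (WordBox w in GroupWords(chars))
            Console.WriteLine($"{w.Text}: {w.Bounds}");
    }
}
```

A single pass with an explicit flush after the loop keeps the end-of-input case in one place, so a one-character word at the very end of the page is not dropped.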

 

Sample link:

https://www.syncfusion.com/downloads/support/directtrac/general/ze/WordBounds-1782596420
