Syncfusion Feedback

Trusted by the world’s leading companies

Syncfusion Trusted Companies

Overview

The Syncfusion® JavaScript PDF Library provides robust APIs for extracting meaningful content from PDF documents in web applications. It supports plain-text extraction for indexing and analysis, extracting text with precise bounds for layout-based processing, and high-quality embedded image retrieval. Create enabling powerful, data-driven workflows for analyzing content, processing specific regions, or reusing graphical assets. The library is compatible with JavaScript, TypeScript, Angular, React, Vue and Node.js.

How to extract text with bounds in JavaScript?

import { PdfDocument } from '@syncfusion/ej2-pdf'; 
import { PdfDataExtractor, TextLine } from '@syncfusion/ej2-pdf-data-extract'; 

// Load an existing PDF document.
let document: PdfDocument = new PdfDocument(data); 
// Initialize a new instance of the `PdfDataExtractor` class 
let extractor: PdfDataExtractor = new PdfDataExtractor(document); 
// Extract `TextLine` from the PDF document. 
let textLines: Array<TextLine> = extractor.extractTextLines({ startPageIndex: 0, endPageIndex: document.pageCount-1}); 
// Iterate through each text line in the collection 
textLines.forEach((textLine: TextLine) => { 
    // Gets the bounds of the text line. 
    let lineBounds: Rectangle = textLine.bounds; 
    // Gets the single line of extracted text from the PDF page. 
    let line: string = textLine.text; 
    // Gets the page index of the text line extracted. 
    let pageIndex: number = textLine.pageIndex; 
    // Gets the collection of text words extracted from a specified page in a PDF document. 
    let words: TextWord[] = textLine.words; 
    // Gets the name of the font used for a particular line of text. 
    let fontName: string = textLine.fontName; 
    // Gets the font style used for a particular line of text. 
    let fontStyle: PdfFontStyle = textLine.fontStyle; 
    // Gets the font size used for a particular line of text. 
    let fontSize: number = textLine.fontSize; 
}); 
// Save the document 
document.save('output.pdf'); 
// Destroy the document 
document.destroy();

Different ways to extract text and images from PDFs

Explore different methods for extracting text and images from PDFs.

Extract text from a PDF document using JavaScript PDF.

Extract text from a PDF document

Extract complete text content from a PDF page or an entire document using simple text extraction APIs. The library extracts text in reading order, making it easier to perform full text searching, content analysis, and further document processing.

Extract text with bounds from a PDF document using JavaScript PDF.

Extract text with bounds from a PDF document

Extract text along with its positional data to understand exactly where each fragment appears on the page. Bounds include x and y coordinates, width, and height, enabling precise mapping for region-based text filtering and layout and structure analysis.

 Extract text using a layout-based approach from a PDF document using JavaScript PDF.

Extract text from a PDF document using a layout-based approach

Extract text from a PDF while preserving its visual layout, including reading order, columns, line breaks, and spacing. This is especially useful for multi-column documents, tables, and structured reports, enabling full text searching with natural reading flow and content analysis that respects columns and line structure.

Extract embedded visuals within a PDF document using JavaScript PDF.

Extract images

Extract images embedded within PDF pages for image analysis and processing; reuse in documents, presentations, or web apps; archiving or converting images.

Syncfusion JavaScript PDF Library Resources

Explore these resources for comprehensive guides, knowledge base articles, insightful blogs, and ebooks.

Frequently Asked Questions

The Syncfusion JavaScript PDF Library provides text extraction APIs to retrieve plain text from a page or an entire document for search, indexing, or content analysis.

Yes. The library extracts embedded images from PDF pages, preserving their original quality.

It retrieves text along with its exact position on the page. Use it when processing specific regions, analyzing forms, detecting column layouts, or mapping content locations.

Our Customers Love Us

Having an excellent set of tools and a great support team, Syncfusion® reduces customers’ development time.
Here are some of their experiences.


Rated by users across the globe

Awards

Greatness—it’s one thing to say you have it, but it means more when others recognize it. Syncfusion® is proud to hold the following industry awards.

Scroll up icon