How to extract text from a PDF document based on the text color?
The PDF component does not directly support extracting text from a PDF document based on the color of the text. However, this can be achieved by using the ExtractText method with the option to obtain the text along with its format details, and then keeping only the text whose font color matches the requested color.
Refer to the following code snippet.
PdfLoadedDocument pdf;

private void Form1_Load(object sender, System.EventArgs e)
{
    //Loads the PDF document
    pdf = new PdfLoadedDocument(@"Succinctly.pdf");
    textBox1.Text = "Red";
}

private void button1_Click(object sender, EventArgs e)
{
    List<TextData> TextFormat = new List<TextData>();
    string text = null;

    //Gets the color by using the name of the color
    Color color = Color.FromName(textBox1.Text);
    if (color.ToArgb() == 0)
    {
        MessageBox.Show("Enter the valid color name");
        return;
    }

    for (int i = 0; i < pdf.Pages.Count; i++)
    {
        //Gets the PDF page
        PdfPageBase page = pdf.Pages[i];

        //Extracts text with its format
        string pageTexts = page.ExtractText(out TextFormat);

        for (int j = 0; j < TextFormat.Count; j++)
        {
            if (TextFormat[j].FontColor.ToArgb() == color.ToArgb())
            {
                text += TextFormat[j].Text;
            }
        }
    }

    if (text != null)
        MessageBox.Show(text);
    else
        MessageBox.Show("The PDF document does not contain " + textBox1.Text + " color text");
}
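The snippet validates the color name with `color.ToArgb()==0` and matches colors by their ARGB values. The following is a minimal sketch of why both steps behave as they do, using only `System.Drawing`. The class `ColorNameCheck` and the helper `TryParseColorName` are hypothetical names introduced for illustration and are not part of the PDF component. `IsKnownColor` is shown here as one possible alternative to the zero check, not as the article's method.

```csharp
using System;
using System.Drawing;

public static class ColorNameCheck
{
    // Hypothetical helper (not part of the PDF library): resolves a color name
    // and reports whether it is one of the predefined color names.
    public static bool TryParseColorName(string name, out Color color)
    {
        // For an unrecognized name, FromName returns a color whose
        // ARGB components are all 0.
        color = Color.FromName(name);
        return color.IsKnownColor;
    }

    public static void Main()
    {
        Color red;
        Console.WriteLine(TryParseColorName("Red", out red));       // True

        Color unknown;
        Console.WriteLine(TryParseColorName("Redd", out unknown));  // False
        Console.WriteLine(unknown.ToArgb());                        // 0

        // A named color and an unnamed color with the same components are not
        // equal under ==, so the comparison is done on the ARGB values instead.
        Color fromComponents = Color.FromArgb(255, 255, 0, 0);
        Console.WriteLine(red == fromComponents);                   // False
        Console.WriteLine(red.ToArgb() == fromComponents.ToArgb()); // True
    }
}
```

Because the match is an exact ARGB comparison, text drawn in a slightly different shade of the requested color is not returned.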