How to extract text from a PowerPoint presentation?

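In a PowerPoint presentation, text is always associated with shapes. Text can be added to, modified in, and extracted from auto-shapes such as text boxes, rectangles, ovals, and partial circles. Text inside a table is stored in the table's cells, so tables are handled separately. The following code sample extracts the text from every shape and table cell in a PowerPoint presentation and writes it to a text file.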


//Namespaces required by the sample (the presentation types come from the Syncfusion Presentation library)
using System.Collections.Generic;
using Syncfusion.Presentation;

//Load the PowerPoint presentation
IPresentation presentation = Presentation.Open("Sample.pptx");
//Text collection to store the extracted text
List<string> textCollection = new List<string>();
//Iterate each slide in the presentation
foreach (ISlide slide in presentation.Slides)
{
    //Iterate all the shapes in the slide to get the text
    foreach (IShape shape in slide.Shapes)
    {
        //Check whether the shape is a table
        if (shape is ITable)
        {
            ITable table = shape as ITable;
            //Iterate all the cells in the table and get the text
            foreach (IRow row in table.Rows)
            {
                foreach (ICell cell in row.Cells)
                {
                    //Get the text from the cell body
                    string text = cell.TextBody.Text;
                    //Add the extracted text into the string collection
                    textCollection.Add(text);
                }
            }
        }
        else
        {
            //Iterate all the paragraphs in the shape and get the text
            foreach (IParagraph paragraph in shape.TextBody.Paragraphs)
            {
                foreach (ITextPart textpart in paragraph.TextParts)
                {
                    //Get the text from the text part
                    string text = textpart.Text;
                    //Add the extracted text into the string collection
                    textCollection.Add(text);
                }
            }
        }
    }
}
//Write the text collection to a text file
System.IO.File.WriteAllLines("Sample.txt", textCollection);
//Dispose the presentation instance
presentation.Close();


You can download the sample here.


