
Parsing Table of Contents

I'm writing a routine that parses Word documents into a proprietary object hierarchy used in my program.

I am having trouble recognising a Word Table of Contents. In the older DOC format, I can use EntityType.TOC to detect when I am dealing with a TOC, but when the exact same document is saved as DOCX, the exact same code never finds an item with EntityType.TOC.

Here's a simple example:

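    ' DocIO namespaces needed for WordDocument, WSection, WParagraph, ParagraphItem and EntityType
    Imports Syncfusion.DocIO
    Imports Syncfusion.DocIO.DLS

    Public Sub ExampleParse(myfile As String)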

        Dim doc As New WordDocument(myfile)
        Dim TOCEntity As ParagraphItem = Nothing

        For Each section As WSection In doc.Sections
            For Each paragraph As WParagraph In section.Paragraphs
                For Each item As ParagraphItem In paragraph.Items
                    Select Case item.EntityType
                        Case EntityType.TOC
                            ' I get here with a Word DOC file but never with the exact same file in DOCX version
                            TOCEntity = item
                        Case Else
                    End Select
                Next
            Next
        Next

    End Sub


Has anyone got any ideas on why the DOCX version of the same file has this problem? I've attached two test documents as examples. Set myfile to the name and path of one or the other of the test Word documents.

I see that the WordDocument class has a Friend member called TOC but I can't get access to that because it is marked Friend and not Public.


toc test_41d18709.zip

3 Replies

RM Ramkumar M Syncfusion Team August 7, 2012 12:43 PM UTC

Hi Steve,

Thank you for your interest in Syncfusion products.

On analyzing your DOCX document, we found that the TOC field is preserved inside a StructureDocumentTag entity instead of within a paragraph. Currently DocIO provides only preservation support for the StructureDocumentTag entity, so it is not possible to loop through this entity to get the TOC. As a workaround, please resave the document with another version of MS Word, or use DocIO, so that the TOC is preserved within a paragraph. For your reference, the resaved document with the TOC preserved inside a paragraph is available from the following link.

Resaved Document:

http://www.syncfusion.com/downloads/Support/DirectTrac/96973/toc%20test-resaved-1222843957.zip
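To see whether a given DOCX is affected, one option is to look at the package directly rather than through DocIO. The sketch below is not a DocIO API. It assumes the TOC is stored as a complex field whose instruction text (w:instrText) starts with "TOC", and that the content control appears as a w:sdt element in word/document.xml. The module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression   ' .NET 4.5+; on .NET Framework also reference System.IO.Compression.FileSystem
Imports System.Linq
Imports System.Xml.Linq

' Hypothetical helper, not part of DocIO.
Module TocLocationCheck

    ' WordprocessingML main namespace used by word/document.xml.
    Private ReadOnly W As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' True if at least one field instruction starting with "TOC"
    ' has a w:sdt (content control) element among its ancestors.
    Public Function TocIsInsideContentControl(body As XDocument) As Boolean
        Dim instructions As IEnumerable(Of XElement) = body.Descendants(W + "instrText")
        For Each instruction As XElement In instructions
            Dim text As String = instruction.Value.TrimStart()
            If text.StartsWith("TOC", StringComparison.OrdinalIgnoreCase) AndAlso
               instruction.Ancestors(W + "sdt").Any() Then
                Return True
            End If
        Next
        Return False
    End Function

    ' Loads word/document.xml from the DOCX package, which is a zip archive.
    Public Function LoadBody(docxPath As String) As XDocument
        Using archive As ZipArchive = ZipFile.OpenRead(docxPath)
            Dim entry As ZipArchiveEntry = archive.GetEntry("word/document.xml")
            If entry Is Nothing Then
                Throw New InvalidDataException("word/document.xml not found in " & docxPath)
            End If
            Using stream As Stream = entry.Open()
                Return XDocument.Load(stream)
            End Using
        End Using
    End Function

End Module
```

Calling TocIsInsideContentControl(LoadBody(myfile)) on the original DOCX should return True if the TOC sits inside a content control as described above, and False for a file where the TOC field is directly inside a paragraph. A TOC written as a simple field (w:fldSimple) would not be picked up by this check.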

Please let us know if you have any other questions.

Regards

Ramkumar



SA Steve Aspey August 9, 2012 06:45 PM UTC

Thanks for the quick reply. I'm not clear on one thing. If, for example, I have a Word 2010 format DOCX and I use DocIO to open and resave it in a different format, I'm assuming the only choice I have is to resave it as, say, Word 2007. Is that how you created the resaved file, or did you use MS Word and save it in another format?

Thanks


RM Ramkumar M Syncfusion Team August 13, 2012 03:50 AM UTC

Hi Steve,

Thanks for your update.

We resaved your DOC format document (toc test.doc) as a DOCX document using DocIO; we did not resave the DOCX format document (toc test.docx). If you resave your DOCX document, the TOC field will not be moved into a paragraph; it will remain as it is, inside the StructureDocumentTag (content control).

Please let us know if you have any questions.
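For reference, the DOC to DOCX resave described above can be sketched roughly as follows. This is a minimal sketch rather than the exact steps used to produce the linked file. The module name, Sub name and file paths are placeholders, and FormatType.Docx is assumed to be available in your DocIO version (some releases also expose version-specific values).

```vbnet
Imports Syncfusion.DocIO
Imports Syncfusion.DocIO.DLS

' Placeholder module and Sub names, for illustration only.
Module ResaveSketch

    ' Opens the binary DOC version and writes it back out as DOCX,
    ' so that the TOC is written inside a paragraph.
    Public Sub ResaveDocAsDocx(docPath As String, docxPath As String)
        Dim doc As New WordDocument(docPath)
        Try
            ' Assumption: FormatType.Docx exists in the installed DocIO version.
            doc.Save(docxPath, FormatType.Docx)
        Finally
            ' Release the document even if saving fails.
            doc.Close()
        End Try
    End Sub

End Module
```

For example, ResaveDocAsDocx("toc test.doc", "toc test-resaved.docx") followed by the original parsing loop on the new file should reach the EntityType.TOC case. Starting from the DOCX file instead will not help, because the content control is preserved as it is.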

Regards

Ramkumar

