
How to convert HTML document to plain text in C# and VB.NET?

Platform: WinForms
Control: DocIO

Essential DocIO converts HTML files to Word documents and vice versa. You can also convert an HTML document to plain text, and vice versa.

The Word library (DocIO) uses XmlReader to parse the content of the input HTML. The input HTML must therefore meet the XML standard (every opening tag needs a matching closing tag), even when you set the XHTMLValidationType parameter to XHTMLValidationType.None.
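Because DocIO parses the input with XmlReader, it can help to confirm that an HTML file is well-formed before loading it. The following is a minimal sketch, not part of the DocIO API: `HtmlInputCheck` and `IsWellFormed` are hypothetical names, and the check uses only the standard System.Xml classes. It skips the DOCTYPE instead of downloading the XHTML DTD, so it tests well-formedness only, not Transitional or Strict validity.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper (not part of DocIO): reports whether an HTML string
// is well-formed XML, the precondition for XmlReader-based parsing.
public static class HtmlInputCheck
{
    public static bool IsWellFormed(string html)
    {
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE instead of failing on it or fetching the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(html), settings))
            {
                // Reading every node forces the parser to check tag nesting.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // Unclosed or mismatched tags, unquoted attributes, undeclared entities, etc.
            return false;
        }
    }
}
```

For example, `<p>Hello<br/></p>` passes while `<p>Hello<br></p>` fails, because `<br>` is never closed. Because the DTD is ignored, named entities such as `&nbsp;` are also reported as errors, so this check may be stricter than DocIO itself.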

XHTML Validation

Except when XHTMLValidationType.None is used, HTML content is validated against a Document Type Definition (DTD), which is a set of markup declarations that defines a document type for an SGML-family markup language (GML, SGML, XML, HTML).

XHTML validation types

Essential DocIO supports the following XHTML validation types when importing HTML content.

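XHTMLValidationType.None: Performs no schema validation, but the HTML content must still conform to the XHTML 1.0 format.

XHTMLValidationType.Transitional: Validates against the XHTML 1.0 Transitional rules, which permit a wider set of attributes within tags, including presentational ones such as align or bgcolor.

XHTMLValidationType.Strict: Validates against the XHTML 1.0 Strict rules, which reject the deprecated presentational attributes and elements that Transitional allows.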


Steps to convert an HTML document to plain text in C#

  1. Create a new C# console application project.

Create new C# console app in WinForms

  2. Install the Syncfusion.DocIO.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.

Install WinForms NuGet packages

  3. Include the following namespaces in the Program.cs file.

C#

using Syncfusion.DocIO;
using Syncfusion.DocIO.DLS;

VB

Imports Syncfusion.DocIO
Imports Syncfusion.DocIO.DLS

  4. Use the following code to convert the HTML document to plain text.

C#

//Loads the HTML document with validation type None
WordDocument document = new WordDocument("Input.html", FormatType.Html, XHTMLValidationType.None);
//Saves the document as a plain text file
document.Save("HTMLtoText.txt", FormatType.Txt);
//Closes the document
document.Close();

VB

'Loads the HTML document with validation type None
Dim document As WordDocument = New WordDocument("Input.html", FormatType.Html, XHTMLValidationType.None)
'Saves the document as a plain text file
document.Save("HTMLtoText.txt", FormatType.Txt)
'Closes the document
document.Close()
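The same calls can be wrapped in a small reusable method. This is a sketch assuming the Syncfusion.DocIO.WinForms package is installed; `HtmlToTextConverter` and `ConvertFile` are hypothetical names, and the file-existence check and try/finally block are additions of this sketch. It relies only on the WordDocument constructor, Save, and Close calls used in the snippet above.

```csharp
using System.IO;
using Syncfusion.DocIO;
using Syncfusion.DocIO.DLS;

// Hypothetical wrapper around the DocIO calls shown in this article.
public static class HtmlToTextConverter
{
    public static void ConvertFile(string htmlPath, string txtPath)
    {
        // Fail early with a clear message if the input file is missing.
        if (!File.Exists(htmlPath))
            throw new FileNotFoundException("Input HTML file not found.", htmlPath);

        // Loads the HTML with validation type None; the markup must still be well-formed.
        WordDocument document = new WordDocument(htmlPath, FormatType.Html, XHTMLValidationType.None);
        try
        {
            // Saves the loaded content as a plain text file.
            document.Save(txtPath, FormatType.Txt);
        }
        finally
        {
            // Closes the document even if saving throws.
            document.Close();
        }
    }
}
```

For example, `HtmlToTextConverter.ConvertFile("Input.html", "HTMLtoText.txt");` writes the same output file as the snippet above.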


A complete working example of converting an HTML document to plain text in C# can be downloaded from here.

The input HTML document is as follows:

Input HTML document

Running the program produces the following plain text:

Output Text file

Take a moment to peruse the documentation, where you can find basic Word document processing options along with features such as mail merge, merging and splitting documents, finding and replacing text, protecting Word documents, and, most importantly, PDF and image conversion, all with code examples.

Explore more about the rich set of Syncfusion Word Framework features.

An online example to protect the Word document from editing using Essential DocIO.

See Also:

Word to HTML and HTML to Word Conversions

Note:

Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to this link to learn how to generate and register a Syncfusion license key in your application so you can use the components without the trial message.
