This code quickly strips the HTML tags out of a string. It does NOT require a regular expression, so it runs quite a bit faster, especially on shorter strings. It works by first replacing every "<" (assumed to appear only at the start of an HTML tag) with "><". This gives a single, consistent character to split the string on while still leaving the "<" in place to identify the sections that are HTML. It then splits the string on ">", which cuts each section just before an HTML tag (as indicated by the "<") and at the end of each tag (as indicated by the closing ">"). It then filters the resulting array, removing every element that contains a "<"; these are the elements that were HTML tags. The final operation simply re-joins the remaining elements, which are the text.
Example: "<i>this is <b>a <a href='test.html'>test</b></a>" (note that the HTML need not be well formed or have matching closing tags.)
><i>this is ><b>a ><a href='test.html'>test></b>></a> (after replacing all "<" with "><")
|<i|this is |<b|a |<a href='test.html'|test|</b||</a| (after splitting on ">"; the | character is used to show the elements of the array)
|this is |a |test|| (after filtering out all the elements with a "<")
this is a test (after joining the remaining elements)
Function StripHTML(ByRef asHTML)
    ' Join needs an explicit "" delimiter; its default delimiter is a single space.
    StripHTML = Join(Filter(Split(Replace(asHTML, "<", "><"), ">"), "<", False), "")
End Function
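To make the four stages easier to follow, the one-line expression can be unrolled into separate statements. This is only a sketch: the name StripHTMLSteps and the local variable names are illustrative assumptions, not part of the original page.

```vbscript
' Hypothetical step-by-step variant; StripHTMLSteps and the
' variable names are illustrative, not from the original page.
Function StripHTMLSteps(ByVal asHTML)
    Dim sMarked, aParts, aText
    ' 1. Put a ">" in front of every "<" so that ">" marks both ends of each tag.
    sMarked = Replace(asHTML, "<", "><")
    ' 2. Split on ">"; every piece that came from a tag now contains "<".
    aParts = Split(sMarked, ">")
    ' 3. Keep only the pieces that do NOT contain "<" (Include = False).
    aText = Filter(aParts, "<", False)
    ' 4. Join with an explicit empty delimiter; Join defaults to a space.
    StripHTMLSteps = Join(aText, "")
End Function

' Usage under Windows Script Host (cscript); in an ASP page,
' use Response.Write instead of WScript.Echo.
WScript.Echo StripHTMLSteps("<i>this is <b>a <a href='test.html'>test</b></a>")
' prints: this is a test
```

Because the approach treats every "<" and ">" as a tag delimiter, a literal "<" or ">" in the text or inside an attribute value will be handled as if it were part of a tag.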
You may also want to remove excessive whitespace with:
Set regEx = New RegExp
regEx.Pattern = "\s+"
regEx.Global = True ' Set global applicability.
asHTML = regEx.Replace(asHTML, " ")
And possibly process common strings such as:
asHTML = Replace(asHTML, "&nbsp;", " ")
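The two follow-up steps can be combined into one helper that is called on text whose tags have already been stripped. This is a minimal sketch, assuming the "common strings" are HTML character entities; the function name CleanText and the particular entity list are illustrative assumptions.

```vbscript
' Hypothetical helper: CleanText and the entity list are illustrative.
' Call it on text that has already had its tags stripped.
Function CleanText(ByVal asText)
    Dim oRegEx
    ' Decode a few common entities. "&amp;" goes last so that
    ' "&amp;lt;" ends up as the literal text "&lt;", not "<".
    asText = Replace(asText, "&nbsp;", " ")
    asText = Replace(asText, "&lt;", "<")
    asText = Replace(asText, "&gt;", ">")
    asText = Replace(asText, "&quot;", """")
    asText = Replace(asText, "&amp;", "&")
    ' Collapse every run of whitespace into a single space.
    Set oRegEx = New RegExp
    oRegEx.Pattern = "\s+"
    oRegEx.Global = True
    CleanText = Trim(oRegEx.Replace(asText, " "))
End Function

' Usage under Windows Script Host (cscript):
WScript.Echo CleanText("Tom &amp;   Jerry&nbsp;&lt;3")
' prints: Tom & Jerry <3
```

Decoding entities only after the tags are gone matters: if "&lt;" were turned into "<" first, the tag-stripping step would treat the decoded character as the start of a tag.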
file: /Techref/language/asp/striphtml.htm, 2KB, updated: 2008/12/25 13:47