Search in EPiServer can generate invalid html

I found two defects in the search code used by the public templates that can cause the search result to contain invalid html. Since a lot of people have copied this code to their own sites, I think it is worth mentioning.

EPiServer.Core.Html.TextIndexer.StripHtml()

This method returns text with the html tags removed. A defect was introduced in R2 as part of an optimization.

If the maxTextLengthToReturn parameter is greater than zero, the raw text may be cut in the middle of an html tag before the tags are removed. The regular expression that removes html tags then fails to match the incomplete tag.

You can end up with half a tag at the end of the string.

Example: "ipsum dolor <p al"
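The failure mode can be reproduced outside EPiServer. The sketch below is in Python purely for illustration and does not show EPiServer's actual implementation: the tag pattern `<[^>]*>` is an assumed stand-in for whatever regular expression TextIndexer.StripHtml() uses, and strip_html_buggy / strip_html_fixed are hypothetical names. The buggy variant truncates the raw html first and strips tags afterwards; the fixed variant strips tags from the full text and truncates the plain text.

```python
import re

# Assumed stand-in for the tag-removing regex; the real EPiServer
# pattern is not shown in the post.
TAG_PATTERN = re.compile(r"<[^>]*>")

def strip_html_buggy(html: str, max_len: int) -> str:
    """Mimics the defect: cut the raw html first, then remove tags."""
    if max_len > 0:
        html = html[:max_len]          # may cut a tag in half
    return TAG_PATTERN.sub("", html)   # a half tag has no '>' so it survives

def strip_html_fixed(html: str, max_len: int) -> str:
    """Remove tags from the full html first, then truncate the plain text."""
    text = TAG_PATTERN.sub("", html)
    if max_len > 0:
        text = text[:max_len]
    return text

raw = 'ipsum dolor <p align="left">sit amet</p>'
print(strip_html_buggy(raw, 17))   # ipsum dolor <p al
print(strip_html_fixed(raw, 17))   # ipsum dolor sit a
```

Stripping the whole text costs more on large pages, which is presumably what the optimization tried to avoid. A cheaper workaround is to keep the early cut and additionally remove any trailing unclosed tag (for example with the pattern `<[^>]*$`), at the price of returning fewer text characters than requested.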

Search.aspx – HtmlEncode missing

In Search.aspx from the public templates, a local method is called to display a short snippet of text for each page in the search result.

Since this preview text can contain reserved characters like "<" or "&" (even if the defect above is fixed), you should always html-encode it with HttpUtility.HtmlEncode, like this:

<%# HttpUtility.HtmlEncode(GetPreviewText(Container.DataItem)) %>
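To see what the encoding buys you, here is a small Python analogue, again for illustration only. Python's `html.escape` plays the role of HttpUtility.HtmlEncode; `render_preview` and the `<p class="preview">` wrapper are hypothetical and not taken from Search.aspx. Once encoded, a truncated tag or a stray "&" is rendered as text instead of breaking the surrounding markup.

```python
import html

def render_preview(preview_text: str) -> str:
    """Wrap a preview snippet in markup, encoding reserved characters first.

    html.escape stands in for HttpUtility.HtmlEncode here; the
    <p class="preview"> wrapper is a hypothetical example.
    """
    return '<p class="preview">' + html.escape(preview_text) + "</p>"

# A truncated snippet like the one StripHtml can produce, and plain text with "&" and "<"
print(render_preview("ipsum dolor <p al"))
print(render_preview("Fish & chips, 3 < 5"))
```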

Read more about pitfalls that can cause invalid html in EPiServer.