<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fredrik Haglund's blog &#187; IndexingService</title>
	<atom:link href="http://blog.fredrikhaglund.se/blog/tag/indexingservice/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fredrikhaglund.se</link>
	<description>Chatter about EPiServer, ASP.NET, CSS and Web Development.</description>
	<lastBuildDate>Tue, 28 Jun 2011 13:37:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>About Searching inside Uploaded Files</title>
		<link>http://blog.fredrikhaglund.se/blog/2008/01/25/about-searching-inside-uploaded-files/</link>
		<comments>http://blog.fredrikhaglund.se/blog/2008/01/25/about-searching-inside-uploaded-files/#comments</comments>
		<pubDate>Fri, 25 Jan 2008 12:23:05 +0000</pubDate>
		<dc:creator>Fredrik Haglund</dc:creator>
				<category><![CDATA[EPiServer]]></category>
		<category><![CDATA[IFilter]]></category>
		<category><![CDATA[IndexingService]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Microsoft Indexing Service]]></category>
		<category><![CDATA[Unified File System]]></category>

		<guid isPermaLink="false">http://blog.fredrikhaglund.se/blog/2008/01/25/about-searching-inside-uploaded-files/</guid>
		<description><![CDATA[EPiServer uses both an open source library (Lucene) and Microsoft Indexing Service to create the search index for files. In EPiServer 4, Microsoft Indexing Service was responsible for building an index of ordinary files (i.e. the upload folder), and EPiServer Indexing Service was needed to index the versioned files in the documents folder. The [...]]]></description>
			<content:encoded><![CDATA[<p>EPiServer uses both an open source library (<a href="http://lucene.apache.org/">Lucene</a>) and <a href="http://msdn2.microsoft.com/en-us/library/ms689718.aspx">Microsoft Indexing Service</a> to create the search index for files.</p>
<p>In <strong>EPiServer 4</strong>, Microsoft Indexing Service was responsible for building an index of ordinary files (i.e. the <em>upload</em> folder), and EPiServer Indexing Service was needed to index the versioned files in the <em>documents</em> folder. EPiServer has to implement its own search for the documents folder because the path and file name are stored in the database, while the content is stored in a file with a GUID as its name.</p>
<p>In <strong>EPiServer CMS 5</strong>, all files are stored using the <em>VirtualPathVersioningProvider</em>, which always stores the path and file name in the database and the content as a file with a GUID as its name. For this reason, EPiServer Indexing Service must be running, and the web site must be configured to use it, if you want to search files.</p>
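<p>To make the storage model concrete, here is a toy Python sketch. The <code>VersionedFileStore</code> class and its methods are hypothetical and are not EPiServer&#8217;s API; the sketch assumes only what is described above: the virtual path is kept in a database (a dictionary here) and the bytes are written to a file named by a GUID. It shows why a crawler that only walks the physical folder sees nothing but GUID-named files.</p>

```python
import os
import tempfile
import uuid


class VersionedFileStore:
    """Toy model (hypothetical, not EPiServer's API): the virtual path is kept
    in a 'database' and the content in a file whose name is a GUID."""

    def __init__(self, root):
        self.root = root
        self.db = {}  # stands in for the database table: virtual path to GUID

    def save(self, virtual_path, data):
        # Each save writes a new GUID-named file; earlier blobs stay on disk.
        blob_id = str(uuid.uuid4())
        with open(os.path.join(self.root, blob_id), "wb") as f:
            f.write(data)
        self.db[virtual_path] = blob_id
        return blob_id

    def read(self, virtual_path):
        # Only the database knows which GUID file holds this path's content.
        blob_id = self.db[virtual_path]
        with open(os.path.join(self.root, blob_id), "rb") as f:
            return f.read()


root = tempfile.mkdtemp()
store = VersionedFileStore(root)
store.save("/documents/report.pdf", b"%PDF-1.4 ...")

print(os.listdir(root))  # one GUID-named file; no "report.pdf" on disk
print(store.read("/documents/report.pdf"))  # b'%PDF-1.4 ...'
```

<p>Listing the folder gives a single GUID-named file; only the lookup in the database maps it back to <em>/documents/report.pdf</em>.</p>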
<h3>So how do the keywords get into the index?</h3>
<p>You cannot simply index the raw content of a binary Word document or PDF file. The binary file must first be converted to text, and for this EPiServer relies on a part of Microsoft Indexing Service. Applications can register converters that implement a COM interface (<a href="http://blogs.msdn.com/ifilter/">IFilter</a>), and these converters are used by Microsoft Indexing Service, SharePoint, EPiServer or any other application interested in getting the text out of a binary document.</p>
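<p>The idea of pluggable converters can be sketched in Python as a registry of text extractors keyed by file extension. Everything here (<code>register</code>, <code>extract_text</code>, <code>keywords</code>) is hypothetical and only mimics the concept; the real IFilter is a COM interface with methods such as <code>GetChunk</code> and <code>GetText</code>, and real filters parse actual Word or PDF formats. The two toy converters below handle plain text and comma-separated text only.</p>

```python
import os
import re

# Hypothetical registry: each file extension is mapped to a function that
# turns raw bytes into plain text. It mimics the idea behind IFilter (one
# pluggable converter per file type), not the actual COM interface.
CONVERTERS = {}


def register(extension):
    """Decorator that registers a converter for a file extension."""
    def decorator(func):
        CONVERTERS[extension.lower()] = func
        return func
    return decorator


@register(".txt")
def txt_to_text(data):
    return data.decode("utf-8", errors="replace")


@register(".csv")
def csv_to_text(data):
    # Toy converter: treat commas as word separators.
    return data.decode("utf-8", errors="replace").replace(",", " ")


def extract_text(filename, data):
    """Pick a converter by extension; None means the file cannot be indexed."""
    ext = os.path.splitext(filename)[1].lower()
    converter = CONVERTERS.get(ext)
    if converter is None:
        return None
    return converter(data)


def keywords(text):
    """Split extracted text into the unique, lower-cased words to index."""
    return sorted(set(re.findall(r"\w+", text.lower())))


print(keywords(extract_text("notes.csv", b"alpha,beta\ngamma,delta")))
# ['alpha', 'beta', 'delta', 'gamma']
print(extract_text("photo.jpg", b"\xff\xd8"))  # None
```

<p>A file type without a registered converter returns <code>None</code>, which corresponds to a document whose contents never reach the index.</p>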
<p>You can look at the implementation with <a href="http://www.aisto.com/roeder/dotnet/">Lutz Roeder&#8217;s .NET Reflector</a> if you load <em>EPiServer.IndexingService.exe</em> and look at the class <em>EPiServer.IndexingService.Indexers.FileItemIndexer</em>.</p>
<h3>Do you want to create your own EPiServer File System?</h3>
<p>A closer analysis reveals (at least in version 5.1.422) that EPiServer Indexing Service does not ask the <em>VirtualPathProvider</em> class for the content of the file! Instead, it has hardcoded knowledge of the physical location used by the <em>VirtualPathVersioningProvider</em> (see <em>EPiServer.IndexingService.ItemIndexerManager.CreateDocument</em>). This makes it impossible to create your own implementation of EPiServer&#8217;s Unified File System and get its files indexed correctly.</p>
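<p>The consequence can be illustrated with a small Python sketch that contrasts an indexer which asks the provider for content with one that reads a hardcoded physical location. All names are hypothetical, and the short file names stand in for GUIDs; this is not EPiServer code, only a model of the behaviour described above.</p>

```python
import os
import tempfile


class DiskProvider:
    """Hypothetical stand-in for the built-in provider: files live under root."""

    def __init__(self, root):
        self.root = root

    def read(self, name):
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()


class MemoryProvider:
    """Hypothetical custom provider that keeps content somewhere else
    (here a dict; in practice perhaps a database or a remote store)."""

    def __init__(self, files):
        self.files = files

    def read(self, name):
        return self.files[name]


def index_via_provider(provider, name):
    # Works for any provider: the indexer only uses the provider interface.
    return provider.read(name)


def index_via_hardcoded_path(root, name):
    # Mirrors the behaviour described above: the indexer assumes the
    # built-in provider's physical layout and reads the disk directly.
    path = os.path.join(root, name)
    if not os.path.exists(path):
        return None  # content stored elsewhere is never seen
    with open(path, "rb") as f:
        return f.read()


root = tempfile.mkdtemp()
with open(os.path.join(root, "c3d4"), "wb") as f:
    f.write(b"annual report")

builtin = DiskProvider(root)
custom = MemoryProvider({"a1b2": b"quarterly report"})

print(index_via_provider(builtin, "c3d4"))     # b'annual report'
print(index_via_hardcoded_path(root, "c3d4"))  # b'annual report'
print(index_via_provider(custom, "a1b2"))      # b'quarterly report'
print(index_via_hardcoded_path(root, "a1b2"))  # None
```

<p>Files written by the built-in provider are found either way, but content served by a custom provider is only reachable through the provider interface, which is the route the indexer does not take.</p>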
<p>See also: <a href="http://blog.fredrikhaglund.se/blog/2008/01/07/storing-metadata-attached-to-uploaded-files/">Storing metadata attached to uploaded files</a> and <a href="http://www.google.se/search?hl=sv&amp;q=technotes+%22Microsoft+Index+Server%22+site%3Aepiserver.com">EPiServer Tech Notes</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.fredrikhaglund.se/blog/2008/01/25/about-searching-inside-uploaded-files/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

