Wednesday, February 8, 2006

SP2003: Searching for PDF files, and other file types

SharePoint Portal Server can be configured to search PDF files, with some work...

SharePoint Portal Server 2003 has some strange search engine issues. The main one is that there are actually two search engines.

Search Engine 1: SharePoint Portal Server search. This is the search that is run when you are on the main SharePoint home page. It is based on the old SharePoint Portal Server software that used to store everything in flat files, and it uses the SharePoint server indexing client to prepare files for searching.

Search Engine 2: Windows SharePoint Services search. This is the search that runs when you are on a site, like a Team Site or a Document Workspace. It is based on the old Team Services software, which stored everything in SQL Server. It uses the SQL Server indexing client to prepare files for searching.

This is somewhat annoying, as search results can differ depending on where you are in the system. It also means that changes must be made in two places.

By default, SharePoint can index Microsoft Office documents and several other file types. It can also be extended via "iFilters" to index other file types. The most common file type you may want to add for indexing is the Adobe Acrobat PDF.

You can download the PDF iFilter for free from Adobe's site. The link to version 6.0 is:
http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611

Adobe provides a nice README with installation instructions, but unfortunately it's packed in the installer, so you must install the iFilter before you can read how to install it. The key things to know from this document are the server requirements:

PDF iFilter 6.0 requires one of the following environments:

  • Microsoft Windows 2000 Professional, Service Pack 2

  • Microsoft Windows XP Professional, Service Pack 1

  • Microsoft Windows 2000 Server, Service Pack 3

  • Microsoft Windows 2003 Server

Each of these environments must also contain Microsoft Indexing Services.

In addition, PDF iFilter 6.0 has been tested in the following environments:

  • Microsoft Windows 2000 Server (Service Pack 3) with Microsoft Office SharePoint Portal Server 2001

  • Microsoft Windows 2003 Server with Microsoft Office SharePoint Portal Server 2003

Once you've downloaded the PDF iFilter, and your server has all of the required service packs, you must first install the iFilter on the SharePoint server. Microsoft has provided detailed instructions on how to do this:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;832809

The only thing that document isn't clear on is the exact location of the PDF icons. So far I've discovered that the following files are needed, all in web server extensions\60\TEMPLATE\IMAGES:

  • icpdf.gif (16x16 GIF)

  • PDF16.gif (16x16 GIF)

  • PDF32.gif (32x32 GIF)

Right-click on these images and choose "Save Picture As..." to download them.
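Since the icon location is the part that is easy to get wrong, here is a minimal Python sketch for checking that the three files are in place. It is only an illustration: the function name `missing_icons` and the example directory in the usage block are my own assumptions, so point it at the actual IMAGES folder on your server.

```python
from pathlib import Path

# Icon file names listed in this post.
REQUIRED_ICONS = ("icpdf.gif", "PDF16.gif", "PDF32.gif")


def missing_icons(images_dir, required=REQUIRED_ICONS):
    """Return the required icon names that are not found in images_dir."""
    directory = Path(images_dir)
    if not directory.is_dir():
        # Nothing can be present if the folder itself is missing.
        return list(required)
    # Compare in lower case, because Windows file names are case-insensitive.
    present = {p.name.lower() for p in directory.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    # Hypothetical path; substitute the real TEMPLATE\IMAGES folder.
    example_dir = r"C:\example\TEMPLATE\IMAGES"
    for name in missing_icons(example_dir):
        print("missing:", name)
```

An empty result means all three icons were found in that folder.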

After you've finished that installation, searches from the main SharePoint home page should return PDF files.

Next, you must install the iFilter on the SQL server. Make sure your SQL server has the latest service packs, or you may run into this issue:
http://support.microsoft.com/default.aspx?scid=kb;en-us;323040
This install is similar to the SharePoint server install, with a few differences:
First, stop the Indexing Service.
Next, install the software.
Next, register the DLL.
Finally, restart the Indexing Service.
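The four steps above can be sketched as a small Python helper that lists the commands in order and, by default, only prints them. Treat the details as assumptions rather than the documented procedure: `cisvc` is the usual short service name for the Windows Indexing Service, `regsvr32 /s` is the standard silent DLL registration call, and the DLL path in the usage block is a hypothetical placeholder that you should replace with wherever the installer actually put the filter.

```python
import subprocess


def ifilter_install_steps(dll_path, service="cisvc"):
    """Return (description, command) pairs; command is None for manual steps."""
    return [
        ("Stop the Indexing Service", ["net", "stop", service]),
        # The Adobe installer is run by hand, so there is no command here.
        ("Install the iFilter software (manual)", None),
        ("Register the DLL", ["regsvr32", "/s", dll_path]),
        ("Restart the Indexing Service", ["net", "start", service]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute the commands only when dry_run is False."""
    for description, command in steps:
        if command is None:
            print(description)
            continue
        # Joined with spaces for display only.
        print(description + ": " + " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical DLL location; check your own install folder.
    run_steps(ifilter_install_steps(r"C:\example\PDFFilt.dll"))
```

Keeping the dry run as the default makes it safe to review the command list before anything touches a production server.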

NOTE: This may not work on SQL Server 2005. Apparently there are additional "security" measures that make it very difficult to install iFilters on SQL Server 2005. I've found several suggested solutions online, but so far none have worked. I hope to update this soon with a real solution.

If you've already installed the icons for the SharePoint install, you don't need to install them again for the SQL install.
You will need to re-index the existing files on the site, and this is done on the SQL server.
The easiest way to do this is to use SQL Server Enterprise Manager.
Open the SharePoint databases and find any that have catalogs listed under "Full-Text Catalogs". For our site there was only one catalog, and it was under the Server_SITE database. Next, right-click on the "Full-Text Catalogs" icon and choose "Repopulate All Catalogs". This can take several hours, as the server will re-index all existing files.
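If you would rather script the repopulation than click through Enterprise Manager, the sketch below builds the T-SQL for it in Python. It assumes SQL Server 2000's `sp_fulltext_catalog` stored procedure with the `start_full` action, which should be roughly the scripted counterpart of the menu command; I have not confirmed that the two behave identically. The catalog name in the usage block is hypothetical, and `sp_help_fulltext_catalogs` can be run in the database to list the real ones.

```python
def repopulate_statements(database, catalogs):
    """Return T-SQL statements that start a full population of each catalog."""
    # Escape a closing bracket so the database name stays a single identifier.
    statements = ["USE [%s]" % database.replace("]", "]]")]
    for catalog in catalogs:
        # Double any single quotes so the name is a valid T-SQL string literal.
        safe_name = catalog.replace("'", "''")
        statements.append(
            "EXEC sp_fulltext_catalog '%s', 'start_full'" % safe_name
        )
    return statements


if __name__ == "__main__":
    # Server_SITE is the database from this post; the catalog name is made up.
    for statement in repopulate_statements("Server_SITE", ["ix_example"]):
        print(statement)
```

The printed statements can be pasted into Query Analyzer and run one at a time against the SQL server.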

Once the SQL server indexing is finished, PDF files will show up in searches done on the site pages.

There are many iFilters out there for different file types. Many companies, like Adobe, provide free iFilters for their document types. Others can be purchased. One site for purchasing iFilters is: http://www.ifiltershop.com/