Tuesday, February 14, 2006

SP2003: Advanced Searching

With some work you can improve SharePoint's search features to include Boolean, Wildcard, and Phrase Searching.

SharePoint Portal Server 2003 has some very frustrating search quirks. The first stems from the fact that there are actually two search engines running on the site. I've discussed this before in the context of searching for PDFs:

The two search engines also respond differently to advanced searches that use Boolean operators (AND, OR, NOT) or quotes "" around a phrase to find exact matches.

The two search engines are the main SharePoint Portal Server search and the Windows SharePoint Services (WSS) search. Both can handle a fairly basic search, but both have limitations when it comes to advanced searching. Here are some differences between them:

SharePoint Portal Server search (from main SharePoint home page):
This search engine allows the use of quotes "" to enclose a phrase when searching for an exact match.
It does not allow Boolean searches (AND, OR, NOT).
There is an Advanced Search option that can be reached by clicking on the magnifying glass. The Advanced Search allows you to narrow your search by searching the properties of documents, lists, and other types of items. This feature does allow AND and OR Boolean searches, but it is limited to file properties and will not work on file contents.

Windows SharePoint Services search (from a team site):
This search engine will not use quotes "" to search for a phrase. Instead, matches to any word in the phrase will be included in the search results.
It does not allow Boolean searches (AND, OR, NOT).
There is no Advanced Search page.

Here are some more specifics on the WSS search from the Help file:

The search engine automatically includes variations of words based upon the base stem, such as plurals. For example, searching for the word "page" also returns results for "pages."

You cannot use the asterisk "*" character.

The search engine does not support Boolean functions such as AND and OR.

The search engine automatically ignores common words such as "the," "it," and "by" as well as single-digit numbers.

The search engine is not case sensitive.

Attachments to list items do not appear in search results.

You cannot search for information in items (rows) that are not included in the current view. For example, if you search in a view that shows only items created by you, any items created by another user are not searched. However, searches include all information in the items that are not filtered, including information in columns that are hidden. Searches also include information in items that exceed the current view's item limit.

There may be an internal error on the search server. Contact your server administrator for more information.

The lack of Boolean capabilities in particular is very frustrating. It appears that the search uses a FREETEXT function. FREETEXT automatically stems all of the words (a search for "fish" will include "fishes" but not "fishbowl") and puts an OR between them. This makes it impossible to run Boolean searches with the standard search engines.
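To see why the implied OR matters, here is a toy Python model of the two matching styles. It is only a sketch: it is not how SQL Server full-text search is implemented, it does no stemming at all, and the function names and sample documents are made up for illustration.

```python
def tokenize(text):
    """Lower-case the text and split it into a set of words (no stemming)."""
    return set(text.lower().split())

def matches_any(query, document):
    """FREETEXT-like behaviour: an implied OR between the query words."""
    return bool(tokenize(query) & tokenize(document))

def matches_all(query, document):
    """What a Boolean AND search would require: every query word present."""
    return tokenize(query) <= tokenize(document)

# Hypothetical documents, for illustration only.
documents = ["budget report", "budget forecast", "weather forecast"]
query = "budget forecast"

or_hits = [d for d in documents if matches_any(query, d)]
and_hits = [d for d in documents if matches_all(query, d)]

print(or_hits)   # all three documents match at least one word
print(and_hits)  # only "budget forecast" contains both words
```

With the implied OR, a two-word search returns every document containing either word, which is why adding more words widens the results instead of narrowing them.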

Looking around, I found this useful article:

It details the stored procedures used to perform the search, or at least the Windows SharePoint Services search. The main procedure is called "proc_FetchDocSearchResults", and it contains the FreeTextTable function that is causing the problems. The article describes several ways to circumvent it and use the Boolean-friendly "ContainsTable" function instead.

There are three solutions provided, but each is lacking. The first destroys the existing procedure in favor of just using the ContainsTable function. The second uses the clever idea of pushing only Boolean searches to a new procedure with the ContainsTable function, while keeping all other searches in the standard procedure. The only thing lacking here is that it only works on AND and OR Boolean searches. The third option provides a lot more functionality, including NOT, quotes "" for phrase searching, and * wildcard searching. The only problem here is that it requires you to type "FT" in front of your search if you want to use the default WSS FreeText search; otherwise all searches are done in the new procedure.
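The routing idea behind the second option can be sketched in a few lines of Python. This is not the article's code, which lives in SQL stored procedures. The name of the second procedure is hypothetical, and treating only upper-case, whole-word AND and OR as operators is my assumption.

```python
import re

# The standard WSS search procedure named above.
STANDARD_PROC = "proc_FetchDocSearchResults"
# Hypothetical name for the new ContainsTable-based procedure.
CONTAINS_PROC = "proc_FetchDocSearchResults_Contains"

# Assumption: an operator only counts when it is a whole, upper-case word.
BOOLEAN_OPERATOR = re.compile(r"\b(AND|OR)\b")

def choose_procedure(search_text):
    """Return the name of the procedure that should handle this search."""
    if BOOLEAN_OPERATOR.search(search_text):
        return CONTAINS_PROC
    return STANDARD_PROC

print(choose_procedure("budget AND forecast"))  # routed to the ContainsTable procedure
print(choose_procedure("budget forecast"))      # stays with the standard procedure
```

Because the check looks for whole words, a search for "ORlando" or "android" still goes to the standard procedure.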

I wanted one that defaults to the standard WSS FreeText search unless an advanced search feature is used: quotes "", Boolean operators (AND, OR, NOT), or the * wildcard. To do this, I took the samples and created the following: