Mastering SharePoint

Search Results (follow-up from Findability webinar)

Latest post Mon, Oct 6 2008 12:55 AM by Bob Mixon. 2 replies.
  • Thu, Sep 25 2008 12:38 PM

    Search Results (follow-up from Findability webinar)

    Thank you so much for providing the webinars on findability.

    My question relates to searchability when the index contains misspellings.  In the real world, documents, indexed fields and metadata often contain mistakes, and some correctly written words will be OCR'd incorrectly (for example, a missing space that runs two words together). After crawling, the index will accurately reflect what the crawler saw, but the affected documents will not be findable by the user.

    In spite of this, these documents need to be discoverable, and I was wondering whether SharePoint has any "smarts" built in to handle this.  For example, if a handwritten document is hard to read, the indexer may very well get it wrong and type in 'Mixan' instead of 'Mixon' when assigning a keyword to the document. However, the user searching would have no idea that the name is wrong and would search on 'Mixon'.

    I have worked with a search provider whose algorithm measures the closeness of two words by counting how many pairs of consecutive letters they share (along with other factors).  A sliding scale lets you say whether you want only exact matches or, for example, 80% confidence.  I believe this is called 'fuzzy searching'.
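
    As a rough illustration of the letter-pair idea described above, here is a minimal Python sketch. It is not the search provider's actual algorithm (which also weighs "other factors"); it assumes a simple Dice-style score over consecutive letter pairs, and the function names and the threshold parameter are hypothetical.

```python
from collections import Counter


def letter_pairs(word):
    """Count each pair of consecutive letters in a word (case-insensitive)."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def pair_similarity(a, b):
    """Score from 0.0 to 1.0: twice the shared pairs over the total pairs.

    This Dice-style formula is an assumption for illustration only.
    """
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Words too short to form any pair: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = sum((pairs_a & pairs_b).values())
    return 2.0 * shared / total


def fuzzy_match(query, indexed_terms, threshold=0.8):
    """Return the indexed terms whose score meets the confidence threshold."""
    return [t for t in indexed_terms if pair_similarity(query, t) >= threshold]
```

    Under this scoring, 'Mixon' and 'Mixan' share two of their four letter pairs and score 0.5, so the misspelled keyword would be found at a 50% threshold but missed at 80%.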

    I am a developer, with a background in imaging, starting to dig into SharePoint and I'm very interested in its capabilities.  I know it's not supposed to replace a document management system, but in fact many customers expect it to work like one.

    Thanks,

    Maggie

  • Sat, Sep 27 2008 7:39 PM In reply to

    Re: Search Results (follow-up from Findability webinar)

    The SharePoint capability that most closely resembles what you describe is provided by the search predicate CONTAINS. This predicate has a number of features, one of which is FORMSOF. FORMSOF has two possible values. INFLECTIONAL finds inflected forms of a given word: “program” finds “programming”, “programmed”, etc. THESAURUS finds words of similar meaning: “happy” finds “glad”, etc.  You can add terms to the thesaurus to account for misspellings, e.g. pairing “Mixan” with “Mixon” or “Microsof” with “Microsoft”, so that a search on one also finds the other.  To make use of the CONTAINS feature, you have to develop a custom Enterprise Search class in code.
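
    To make the shape of such a query concrete, here is a small Python sketch that only builds the query text. The column list, the Scope() source and the helper name are assumptions for illustration and should be checked against the Enterprise Search SQL reference; running the query still requires custom code against the SharePoint search object model, which is not shown.

```python
def formsof_query(term, form="THESAURUS", columns=("Title", "Path")):
    """Build a search SQL string that uses CONTAINS with FORMSOF.

    Hypothetical helper: it only formats text and does not call SharePoint.
    """
    form = form.upper()
    if form not in ("INFLECTIONAL", "THESAURUS"):
        raise ValueError("form must be INFLECTIONAL or THESAURUS")
    # Drop double quotes and double up single quotes so the term cannot
    # break out of the quoted CONTAINS expression.
    safe = term.replace('"', "").replace("'", "''")
    return (
        f"SELECT {', '.join(columns)} FROM Scope() "
        f"WHERE CONTAINS('FORMSOF({form}, \"{safe}\")')"
    )


if __name__ == "__main__":
    print(formsof_query("Mixon"))
    print(formsof_query("program", form="INFLECTIONAL"))
```

    With a thesaurus entry that ties 'Mixon' to 'Mixan', the first query is the kind that would be expected to surface the mis-keyed document.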

     

  • Mon, Oct 6 2008 12:55 AM In reply to

    Re: Search Results (follow-up from Findability webinar)

    Hi Maggie,

    I was going to say the same thing as Jose: the thesaurus can be used for this type of thing.

    I am also curious to better understand your statement that "SharePoint is not supposed to replace a document management system".

     

Copyright (c) 2008 Mixon Consulting, Inc.