Intrallect intraLibrary 2.7: Advanced Searching


Revision: 2

Created: 4th August 2005

Last Revised: 7th July 2006

Contact: support@intrallect.com

Company: Intrallect Ltd

Product: intraLibrary, Learning Object Repository

Copyright: © Intrallect Ltd 2003-2005. All rights reserved.

This document is made available to support Intrallect's customers and users of Intrallect's software. The text of this document and the design of the intraLibrary software are both the intellectual property of Intrallect Ltd. Intrallect do not provide this document for any other purpose, and offer no warranty nor accept any liability for its use in any other context. Parts of this document are based on the documentation for Apache Lucene.


Table of Contents
1. Introduction
2. Query Syntax
2.1. Basics
2.2. Case Insensitivity
2.3. Stopwords
2.4. Stemming
2.5. Plurals
2.6. Wildcard Searches
2.7. Fuzzy Searches
2.8. Proximity Searches
2.9. Range Searches
2.10. Boosting a Term
3. Search Operators
3.1. Boolean Operators
3.2. Grouping Operators
3.3. Special Characters
4. Advanced Search
4.1. Fields
4.2. Combining Searches
4.3. ALL or ANY
4.4. Saving Searches
4.5. Reusing Saved Searches

1. Introduction

This guide describes two aspects of the search interface of intraLibrary: the syntax of search queries, and the construction of complex search criteria by combining search terms. It also describes how search parameters can be saved and reused.

Searches are conducted in intraLibrary through either simple search, which is always immediately available, or advanced search, which is available from the top navigation bar. In this manual, the Query Syntax and Search Operators sections apply to all simple searches and to free text fields in advanced searches. The final section, Advanced Search, applies only to advanced searches.


2. Query Syntax

2.1. Basics

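A query is made up of query terms. There are two types of term: single terms and phrases. A single term is a single word, such as opera or verdi. A phrase is a group of words surrounded by double quotes, such as "italian opera".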

IntraLibrary provides a powerful and flexible search syntax. It automatically detects plurals and words with similar stems and ignores case. In addition there are many features that allow the search to be modified to narrow or broaden the terms used. These include: wildcards, fuzzy searching, proximity searching, range searching and boosting of terms.


2.2. Case Insensitivity

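All searches are case insensitive. You can type your search terms in upper or lower case, and words in either case will be found: a search for opera will also match Opera and OPERA. The one exception is the Boolean operators described in section 3.1, which must be written in capitals.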


2.3. Stopwords

IntraLibrary uses a list of stopwords: common words that are ignored when they appear in a search. These are usually words such as "the", "and" and "or". Your intraLibrary administrator can modify the list of stopwords on your intraLibrary installation.
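
The combined effect of case folding and stopword removal on a query can be sketched in a few lines of Python. This is an illustration only, not intraLibrary's code: the stopword list below is a hypothetical example (your administrator configures the real one), and the sketch handles plain words only, not operators or phrases.

```python
# Hypothetical stopword list; the real list is configured by your administrator.
STOPWORDS = {"and", "the", "or"}


def effective_terms(query: str) -> list[str]:
    """Return the words that would actually be searched for.

    Each word is lowercased (searches are case insensitive) and any word
    in the stopword list is dropped. Operators and phrases are not handled.
    """
    return [word for word in query.lower().split() if word not in STOPWORDS]


print(effective_terms("The Barber and the Opera"))  # ['barber', 'opera']
```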


2.4. Stemming

Searches in intraLibrary are filtered using the Porter stemming algorithm. This means that words are reduced to their stems before matching, so that, for example, a search for running will match run, and a search for run will match running.


2.5. Plurals

Plurals are matched automatically, so that a search for bucket will also match buckets.


2.6. Wildcard Searches

There is support in intraLibrary for single and multiple character wildcard searches.

The single character wildcard, ?, matches exactly one character in that position. For example, to search for text or test you can use the search:

te?t

The multiple character wildcard, *, matches zero or more characters. For example, to search for test, tests or tester, you can use the search:

test*

You can also use wildcards in the middle of a term:

te*t

Note: You cannot use a * or ? symbol as the first character of a search.
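
The following Python sketch shows what the two wildcards match by translating a wildcard term into a regular expression. It illustrates the matching rules described above and is not intraLibrary's implementation; the function names wildcard_to_regex and matches are hypothetical.

```python
import re


def wildcard_to_regex(term: str):
    """Translate a wildcard term into a compiled, case-insensitive regex.

    ? matches exactly one character and * matches zero or more characters;
    every other character is matched literally.
    """
    if term[:1] in ("?", "*"):
        # Mirrors the rule that a search cannot start with a wildcard.
        raise ValueError("a wildcard cannot be the first character")
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term: str, word: str) -> bool:
    """True if the whole of word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


print([w for w in ["text", "test", "tent", "toast"] if matches("te?t", w)])
# ['text', 'test', 'tent']
```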


2.7. Fuzzy Searches

IntraLibrary supports fuzzy searches based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, use the tilde, ~, symbol at the end of a single word term. For example, to search for a term similar in spelling to "roam", use the fuzzy search:

roam~

This search will find terms like foam and roams.
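
The edit distance itself can be computed with the standard dynamic-programming algorithm, sketched below in Python. It shows why foam and roams count as close to roam: each is a single edit away. How many edits intraLibrary tolerates before a term no longer matches is not covered by this guide, so the sketch only computes the distance; the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b, computed one row at a time."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("roam", "foam"))   # 1 (substitute r -> f)
print(levenshtein("roam", "roams"))  # 1 (insert s)
```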


2.8. Proximity Searches

IntraLibrary supports finding words which are within a specific distance of each other. To do a proximity search, use the tilde, ~, symbol at the end of a phrase, followed by the maximum distance. The default distance is 0 (~0), which matches the exact phrase. For example, to search for objects in which grilling and fish occur within 10 words of each other, use the search:

"grilling fish"~10
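
A simplified Python sketch of a proximity check follows. It assumes that "within 10 words" means at most 10 other words between the two terms, and it ignores word order and punctuation; the real search may treat both differently. The function name within_distance is hypothetical.

```python
def within_distance(text: str, first: str, second: str, slop: int) -> bool:
    """True if first and second occur with at most `slop` words between them.

    Simplified sketch: word order and punctuation are ignored, and matching
    is case insensitive.
    """
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    # abs(i - j) - 1 is the number of words strictly between the two positions.
    return any(abs(i - j) - 1 <= slop for i in pos_first for j in pos_second)


text = "Grilling is a quick way to cook fish"
print(within_distance(text, "grilling", "fish", 10))  # True (6 words in between)
print(within_distance(text, "grilling", "fish", 5))   # False
```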


2.9. Range Searches

Range queries allow you to match objects whose field values lie between the lower and upper bounds specified by the range query. Range queries can be inclusive or exclusive of the upper and lower bounds. Values are compared lexicographically, character by character as in a dictionary, rather than numerically. For example:

[Ernani TO Falstaff]

When used to search the Title field, this will find all objects whose titles are between Ernani and Falstaff, including Ernani and Falstaff themselves.

Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets.

{Aida TO Otello}

This will find all objects whose titles are between Aida and Otello, but NOT including Aida and Otello.
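
The difference between inclusive and exclusive ranges, and what lexicographic comparison means, can be sketched in Python. This is an illustration, not intraLibrary's code: the function name in_range is hypothetical, and lowercasing the values first is an assumption based on searches being case insensitive.

```python
def in_range(value: str, lower: str, upper: str, inclusive: bool = True) -> bool:
    """Lexicographic range test.

    inclusive=True behaves like [lower TO upper]; inclusive=False behaves
    like {lower TO upper}. Everything is lowercased first, on the assumption
    that range searches are case insensitive like other searches.
    """
    v, lo, hi = value.lower(), lower.lower(), upper.lower()
    if inclusive:
        return lo <= v <= hi
    return lo < v < hi


titles = ["Aida", "Ernani", "Falstaff", "Nabucco", "Otello", "Tosca"]
print([t for t in titles if in_range(t, "Ernani", "Falstaff")])
# ['Ernani', 'Falstaff']
print([t for t in titles if in_range(t, "Aida", "Otello", inclusive=False)])
# ['Ernani', 'Falstaff', 'Nabucco']
```

Because the comparison is lexicographic, in_range("10", "1", "5") is True in this sketch: "10" sorts before "5" character by character.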


2.10. Boosting a Term

The relevance of each matching object is based on the search terms found in it. To boost the relevance of a specific term, add the caret symbol, ^, followed by a boost factor (a number) to the end of the term you are searching for. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of an object by boosting a specific term. For example, if you are searching for

Verdi Rigoletto

and you want the term Verdi to be more relevant, boost it by adding the ^ symbol and a boost factor directly after the term. You would type:

Verdi^4 Rigoletto

This will make objects containing the term Verdi appear more relevant. You can also boost phrases, as in this example:

"opera by verdi"^4 "opera by puccini"

By default, the boost factor is 1. The boost factor must be positive, but it can be less than 1 (e.g. 0.2), which makes a term less relevant than an unboosted one.
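
The effect of a boost on ranking can be illustrated with a deliberately simple scoring rule in Python. The rule used here (add a term's boost factor whenever the term occurs) is an assumption made for illustration; intraLibrary's actual relevance calculation is more involved, and the function name toy_score and the sample objects are hypothetical.

```python
def toy_score(text: str, query: dict[str, float]) -> float:
    """Toy relevance score: add a term's boost factor whenever the term occurs
    in the text. Not intraLibrary's actual relevance calculation."""
    words = set(text.lower().split())
    return sum(boost for term, boost in query.items() if term.lower() in words)


# Verdi^4 Rigoletto: Verdi boosted to 4, Rigoletto keeps the default boost of 1.
query = {"Verdi": 4.0, "Rigoletto": 1.0}
objects = ["Rigoletto libretto", "Verdi biography", "Verdi Rigoletto recording"]
for text in sorted(objects, key=lambda t: toy_score(t, query), reverse=True):
    print(toy_score(text, query), text)
# 5.0 Verdi Rigoletto recording
# 4.0 Verdi biography
# 1.0 Rigoletto libretto
```

Without the boost, "Verdi biography" and "Rigoletto libretto" would tie under this rule; boosting Verdi is what separates them.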


3. Search Operators

In free text searches, including all simple searches, terms may be combined using Boolean operators. They may also be grouped together to create powerful combinations.


3.1. Boolean Operators

Boolean operators allow terms to be combined using logical operations. IntraLibrary supports AND, +, OR, NOT and - as Boolean operators.

Note: Boolean operators must always be written in capitals, for example AND rather than and.

OR
The OR operator is the default operator: if there is no Boolean operator between two terms, OR is used. The OR operator links two terms and finds a matching object if either of the terms exists in the object. This is equivalent to a union using sets. The symbol || can be used in place of the word OR. To search for objects that contain either "italian operetta" or just "opera", use either of these queries:

"italian operetta" opera

"italian operetta" OR opera

AND
The AND operator matches objects where both terms exist anywhere in the text of a single object. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND. To search for objects that contain both "italian opera" and "spanish opera", use the query:

"italian opera" AND "spanish opera"

+ (plus)
The + or required operator requires that the term after the + symbol exists somewhere in an object. To search for objects that must contain "opera" and may contain "verdi", use the query:

+opera verdi

NOT
The NOT operator excludes objects that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT. To search for objects that contain "italian opera" but not "spanish opera", use the query:

"italian opera" NOT "spanish opera"

Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

NOT "scottish opera"

- (minus)
The - or prohibit operator excludes objects that contain the term after the - symbol. To search for objects that contain "italian opera" but not "spanish opera", use the query:

"italian opera" - "spanish opera"

Note: The - symbol is only treated as an operator when it has a blank space on at least one side of it. Otherwise it is treated as a hyphen. That means if you search for x-ray you can be hopeful of exposing some objects.
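
The set operations named above map directly onto Python's set operators, as the following sketch shows. The objects and their words are hypothetical, and single words stand in for the phrases used in the examples; it illustrates the logic only, not how intraLibrary evaluates queries.

```python
# Hypothetical objects and the words they contain, for illustration only.
objects = {
    1: {"italian", "opera", "verdi"},
    2: {"spanish", "opera"},
    3: {"italian", "spanish", "opera"},
    4: {"scottish", "folk"},
}


def having(word: str) -> set[int]:
    """Ids of the objects that contain word."""
    return {oid for oid, words in objects.items() if word in words}


print(having("italian") | having("spanish"))  # OR (union): {1, 2, 3}
print(having("italian") & having("spanish"))  # AND (intersection): {3}
print(having("italian") - having("spanish"))  # NOT or - (difference): {1}
# +opera verdi: opera is required and verdi is optional, so the matching
# set is just the objects containing opera.
print(having("opera"))  # {1, 2, 3}
```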


3.2. Grouping Operators

IntraLibrary allows you to use parentheses to group clauses into subqueries. This is useful when you want to control the Boolean logic of a query.

To search for objects that contain "opera" and either "italian" or "spanish", use the query:

(italian OR spanish) AND opera

This removes any ambiguity: opera must be present, together with at least one of italian or spanish.
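
Parentheses in a query play the same role as parentheses in an ordinary expression: the grouped part is evaluated first. The Python sketch below evaluates the example query over a few hypothetical objects using sets; it illustrates the logic only.

```python
# Hypothetical objects and the words they contain, for illustration only.
objects = {
    1: {"italian", "opera"},
    2: {"spanish", "opera"},
    3: {"italian", "cookery"},
    4: {"french", "opera"},
}


def having(word: str) -> set[int]:
    """Ids of the objects that contain word."""
    return {oid for oid, words in objects.items() if word in words}


# (italian OR spanish) AND opera: the parenthesised OR is evaluated first.
result = (having("italian") | having("spanish")) & having("opera")
print(result)  # {1, 2}
```

Object 3 is excluded because it lacks opera, and object 4 because it contains neither italian nor spanish.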


3.3. Special Characters

To search for terms that contain characters with a special meaning in the query syntax, you must escape those characters so that they lose their special meaning and are searched for literally. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape one of these characters, put a \ immediately before it. For example, to search for (1+1):2, use the query:

\(1\+1\)\:2
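
Escaping can be automated when search text comes from elsewhere, for example when pasting a title that contains brackets. The Python sketch below puts a backslash before every character in the list above. It assumes that escaping each & and | individually is an acceptable way to escape the two-character operators && and ||; the function name escape_query is hypothetical.

```python
# The special characters listed above. && and || are handled by escaping
# every & and | individually (an assumption, see the lead-in).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text: str) -> str:
    """Put a backslash before every special character so it is searched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```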


4. Advanced Search

The Advanced Search facility in intraLibrary allows you to search specific fields and to construct complex searches from combinations of these specific searches. In addition, you can save these searches to use again at a later time. IntraLibrary administrators may also make saved searches public so that they are available to all users.


4.1. Fields

You can select specific fields to search using the pull-down menu in the Advanced Search interface.

The options that appear on this menu have been configured by your intraLibrary administrator. Once you have selected a field, you will be presented with an option to search that field. Some of these fields contain free text, while others contain a limited vocabulary of terms. If the field you choose has a limited vocabulary, the available options will be presented in a pull-down menu. If the field contains free text, you can include any of the search terms and operators that would normally be used in the simple search interface.

To conduct the search, click the search button.


4.2. Combining Searches

When using Advanced Search to search specific fields, you can combine searches on several fields. For example, you might want to search the Title field for Mozart and the Technical Format field for a suitable audio format.

To add another field to your search, click on the + button.

You can add as many additional fields as you need. To remove a field, click the - button.


4.3. ALL or ANY

When a search has multiple constraints, by default ALL of them must be satisfied. You can use the pull-down menu to change this so that an object matches if ANY of the constraints is satisfied.

Remember to click on the search button to conduct the search.
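
The difference between ALL and ANY corresponds to Python's built-in all() and any() applied to one test per field constraint. The records, field names and format values below are hypothetical stand-ins for the Title and Technical Format example in section 4.2; the sketch shows the logic only, not how intraLibrary runs the search.

```python
# Hypothetical records and field names, for illustration only.
records = [
    {"title": "Mozart Requiem", "format": "audio/mpeg"},
    {"title": "Mozart biography", "format": "text/html"},
    {"title": "Bach cantatas", "format": "audio/mpeg"},
]

# One predicate per field constraint in the search.
constraints = [
    lambda r: "mozart" in r["title"].lower(),    # Title contains Mozart
    lambda r: r["format"].startswith("audio/"),  # an audio Technical Format
]

match_all = [r["title"] for r in records if all(c(r) for c in constraints)]
match_any = [r["title"] for r in records if any(c(r) for c in constraints)]
print(match_all)  # ['Mozart Requiem']
print(match_any)  # ['Mozart Requiem', 'Mozart biography', 'Bach cantatas']
```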


4.4. Saving Searches

A set of search constraints is called a search filter. If you have created a search filter that you may wish to use again in the future, you can save it by entering a name for it in the box below "save current search as a filter". This personal search filter will be available to you in all future sessions.


4.5. Reusing Saved Searches

You can use public and private search filters by selecting them from the menu below "choose search filter". A private filter is one that you have saved earlier and is available only to you. A public search filter is one that has been made available by your intraLibrary administrator. If you no longer require a search filter, select it and use the delete button to remove it.

A saved search filter can be applied by default to all your searches if you wish. To set a default search filter, choose the appropriate filter in the Preferences section of your Profile.

Note: When a search filter is applied, its name is shown at the top of every search results page.