Advanced Keyword Searches (Regular Expressions)


Keyword searches at this site use "Perl-style" regular expressions, a widely used pattern-matching syntax. A regular expression (or "RE") is a string of characters which can stand for any word, partial word, or group of words. Using REs in a search allows you to find complex, specific phrases without having to know their exact wording or spelling!

A basic regular expression consists of the words or word parts you wish to find, joined by special characters which stand for whatever may appear between them. For example, the character "." can stand for ANY SINGLE CHARACTER. Thus the RE "ab.d" would match "abcd", "abed", "ab(d", and so forth. You may use more than one "." in your RE: for example, "a.c." can match "abcd", "aXcY", etc. Thus the RE "...." will match ANY four-character string! [In fact, without delimiters (defined below), "...." would match any text containing at least four characters! See below...]
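The "." examples above can be tried out in a few lines of Python. This is only a sketch: it assumes Python's built-in "re" module as a stand-in for this site's search engine (its syntax is Perl-style but not identical to Perl), and the helper name "matches_whole" is made up for illustration.

```python
import re

def matches_whole(pattern, text):
    """Return True only if the pattern matches the ENTIRE text."""
    return re.fullmatch(pattern, text) is not None

# "ab.d" needs exactly one character between "ab" and "d".
dot_results = {
    word: matches_whole(r"ab.d", word)
    for word in ["abcd", "abed", "ab(d", "abd", "abccd"]
}
print(dot_results)

# Without delimiters, "...." is found inside any text that has
# at least four characters, but not in a shorter one.
found_in_long = re.search(r"....", "hello world") is not None
found_in_short = re.search(r"....", "abc") is not None
print(found_in_long, found_in_short)
```

Here "abd" and "abccd" fail because "." must stand for exactly one character, no fewer and no more.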

Adding a "*" after the "." makes it represent zero or more of ANY character. Thus, the RE "ab.*d" would match "abd", "abcd", "abcCd", "ab(even a phrase)d", and so forth! Using "?" in place of "*" means you want only zero or one character: thus the RE "ab.?d" would match "abd", "abcd", "ab(d", etc., but NOT "abcccd"! Finally, the character "+" stands for one or more characters. As you might guess then, "ab.+d" would match "abcd", "abcLONGSTRINGd", or "ab<any phrase>d", but NOT "abd"!
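To compare "*", "?", and "+" side by side, here is a small sketch, again assuming Python's "re" module rather than this site's own search engine. The names "patterns", "samples", and "quantifier_results" are illustrative only.

```python
import re

# One pattern per repetition character described above.
patterns = {"*": r"ab.*d", "?": r"ab.?d", "+": r"ab.+d"}
samples = ["abd", "abcd", "abcccd"]

# For each repetition character, list the samples matched in full.
quantifier_results = {
    symbol: [s for s in samples if re.fullmatch(pattern, s)]
    for symbol, pattern in patterns.items()
}

for symbol, matched in quantifier_results.items():
    print(symbol, matched)
# * ['abd', 'abcd', 'abcccd']
# ? ['abd', 'abcd']
# + ['abcd', 'abcccd']
```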

Note that the "." in all the examples above could have been replaced by another RE feature, the character range: a character range is any group of characters inside square brackets "[]". Some examples: "[123]" matches any ONE of the digits 1, 2, or 3. "[cC]" matches either upper- or lower-case "c". Even better, the RE "[a-z]" matches ANY lower-case letter, but nothing else! Combined with "*", "?", and "+", character ranges are very powerful: for instance, the RE "NGC [0-9]+" would match "NGC 40", "NGC 7331", etc.
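Character ranges are easy to experiment with. The sketch below assumes Python's "re" module, and the sample sentences are invented for illustration. It pulls every "NGC" designation out of a line of text and shows "[cC]" matching both cases.

```python
import re

# "[0-9]+" means one or more digits, so this finds whole NGC numbers.
observing_note = "Tonight: NGC 40, then NGC 7331, and maybe M31."
ngc_found = re.findall(r"NGC [0-9]+", observing_note)
print(ngc_found)  # ['NGC 40', 'NGC 7331']

# "[cC]" matches either an upper- or lower-case "c".
either_case = re.findall(r"[cC]lear", "Clear skies and clear minds")
print(either_case)  # ['Clear', 'clear']

# "[a-z]" matches a lower-case letter and nothing else.
lower_only = [ch for ch in "aB3(z" if re.fullmatch(r"[a-z]", ch)]
print(lower_only)  # ['a', 'z']
```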

Finally, regular expressions allow you to "delimit" your search term: in other words, to search for complete words or complete lines of text. The strings used to do this are "\b", "^" and "$": "\b" stands for the beginning or end of a word; "^" stands for the beginning of a line of text, and "$" stands for the end of a line of text. Thus "^abc" will ONLY match a line which BEGINS with "abc", while "xyz$" will only match a line ENDING with "xyz". And as you may guess, "^abcxyz$" only matches a COMPLETE LINE "abcxyz"! Likewise, "\babc\b" matches any line containing the WORD "abc".
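The delimiters can be checked against a handful of sample lines. As before, this is a sketch assuming Python's "re" module. The list "lines" and the helper "lines_matching" are made up for the example.

```python
import re

lines = ["abc", "abcxyz", "xyz abc", "an abc here", "abcd xyz"]

def lines_matching(pattern):
    """Return the sample lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

begins_with_abc = lines_matching(r"^abc")    # line BEGINS with "abc"
ends_with_xyz = lines_matching(r"xyz$")      # line ENDS with "xyz"
whole_line = lines_matching(r"^abcxyz$")     # the COMPLETE line
word_abc = lines_matching(r"\babc\b")        # "abc" as a whole WORD

print(begins_with_abc)  # ['abc', 'abcxyz', 'abcd xyz']
print(ends_with_xyz)    # ['abcxyz', 'abcd xyz']
print(whole_line)       # ['abcxyz']
print(word_abc)         # ['abc', 'xyz abc', 'an abc here']
```

Note that "\babc\b" skips "abcxyz" and "abcd xyz": there the letters "abc" run straight into more letters, so they do not form a word of their own.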


Regular expressions are powerful, but they do take some getting used to! Below are a few more simple examples of regular expressions to help you get started. A much more complete (and complex) reference can be found at The GREP Tutorial.

Search Examples Using Regular Expressions
The RE Search String:          Would match any of:
the time for all good          "Now is the time for all good men...", "Is the time for all good now come to an end?", etc.
^Now is the time$              "Now is the time" (and nothing else)
[Oo]pen [Hh]eart               "...open heart...", "Open heart...", "...open Heart...", "open Heart", etc.
[Rr][Ee].*grand.*!$            "REs are worth a grand!", "An 're' is a grand thing!", "Reality is overly aggrandized!", etc.
\b[Ii][Cc] *[0-9]+\b           "IC 10", "ic328", etc. (but NOT "RIC 0" or "IC 0a")
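
The examples above can be verified mechanically. The sketch below assumes Python's "re" module in place of this site's search engine. The "examples" list simply restates entries from the table, each paired with the result the table claims.

```python
import re

# (search string, sample text, should it match?)
examples = [
    (r"the time for all good", "Now is the time for all good men...", True),
    (r"^Now is the time$", "Now is the time", True),
    (r"^Now is the time$", "Now is the time for all good men...", False),
    (r"[Oo]pen [Hh]eart", "open Heart", True),
    (r"[Rr][Ee].*grand.*!$", "Reality is overly aggrandized!", True),
    (r"\b[Ii][Cc] *[0-9]+\b", "IC 10", True),
    (r"\b[Ii][Cc] *[0-9]+\b", "ic328", True),
    (r"\b[Ii][Cc] *[0-9]+\b", "RIC 0", False),
    (r"\b[Ii][Cc] *[0-9]+\b", "IC 0a", False),
]

# True for each row where the actual result agrees with the table.
agrees_with_table = [
    (re.search(pattern, text) is not None) == expected
    for pattern, text, expected in examples
]
print(all(agrees_with_table))  # True
```

"RIC 0" fails because there is no word boundary between "R" and "I", and "IC 0a" fails because there is none between "0" and "a".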


Clear skies!
Lew Gramer <dedalus@alum.mit.edu>