1.) Selection of a Search Strategy:
InfoTom starts processing information while it is still querying databases and search engines: a good search strategy is the best guarantee of success!
Choosing the right key words and combining them with Boolean operators such as AND and OR leads to much better results than searching for a single phrase.
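The following minimal Python sketch shows how AND and OR change the result set. It is not how WebFerret Pro or any search engine works internally. It assumes that a page matches a key word when the word occurs anywhere in its text, and it represents a query as a nested tuple such as ("AND", "patent", ("OR", "databases", "engines")). The function names and sample pages are hypothetical.

```python
import re
from typing import Union

# A query is either a single key word or a tuple: (operator, operand, operand, ...)
Query = Union[str, tuple]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def evaluate(query: Query, words: set[str]) -> bool:
    """Recursively evaluate an AND/OR query against one page's word set."""
    if isinstance(query, str):
        return query.lower() in words
    operator, *operands = query
    results = (evaluate(q, words) for q in operands)
    if operator == "AND":
        return all(results)  # every operand must match
    if operator == "OR":
        return any(results)  # at least one operand must match
    raise ValueError(f"unsupported operator: {operator}")


def boolean_search(pages: dict[str, str], query: Query) -> list[str]:
    """Return the names of all pages that satisfy the query."""
    return [name for name, text in pages.items() if evaluate(query, tokenize(text))]


pages = {
    "page1": "Patent retrieval in online databases",
    "page2": "Search engines for patent information",
    "page3": "Cooking recipes and kitchen tips",
}

# patent AND (databases OR engines)
print(boolean_search(pages, ("AND", "patent", ("OR", "databases", "engines"))))
# → ['page1', 'page2']
```

Replacing the inner OR with AND narrows the query to pages that mention both databases and engines, which none of the sample pages do.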
[Figure: Boolean Search with WebFerret Pro]
Another very useful approach is to define the proximity between key words or phrases. This can be immediate proximity, as in a multi-word "Exact Phrase", or any definable neighborhood between words, for example a proximity of 5 words. Related search parameters require that a certain word appear within the first N words of a web page (e.g. within the first 10 words). In both kinds of search, "noise" words such as "the", "a" and "when" are typically ignored.
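The three kinds of proximity condition can be sketched in a few lines of Python. This is an illustration under simple assumptions, not BullsEye Pro's implementation: distance is counted as the difference in word positions after noise words have been removed. The NOISE_WORDS list and the function names are hypothetical.

```python
import re

# Hypothetical noise-word list; real search tools ship their own, longer lists
NOISE_WORDS = {"the", "a", "an", "when", "of", "and", "in", "is"}


def content_words(text: str) -> list[str]:
    """Return the words of the text in order, with noise words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in NOISE_WORDS]


def exact_phrase(text: str, phrase: str) -> bool:
    """Immediate proximity: the phrase's content words appear consecutively."""
    words, target = content_words(text), content_words(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


def near(text: str, word1: str, word2: str, max_distance: int) -> bool:
    """Neighborhood proximity: the two words are at most max_distance positions apart."""
    words = content_words(text)
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)


def in_first_n_words(text: str, word: str, n: int) -> bool:
    """The word occurs among the first n content words of the page."""
    return word.lower() in content_words(text)[:n]


page = "The retrieval of patent information from online databases is a key task"
print(exact_phrase(page, "patent information"))  # → True
print(near(page, "retrieval", "databases", 5))   # → True
print(near(page, "retrieval", "databases", 4))   # → False
print(in_first_n_words(page, "task", 5))         # → False
```

Because noise words are dropped before positions are counted, the phrase "the retrieval of patent" also counts as an exact match in this sketch.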
In addition, there is a variety of more complex search strategies:
"Fuzzy" searches using wild cards can find occurrences of key words even when they are misspelled.
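Wild cards absorb the uncertain part of a key word, so a single pattern can catch common misspellings. The Python sketch below borrows the shell-style wildcard syntax of the standard `fnmatch` module as a stand-in. Actual search tools define their own wildcard characters, and the sample text is made up.

```python
import fnmatch
import re


def wildcard_search(text: str, pattern: str) -> list[str]:
    """Return the distinct words of text that match a wildcard pattern.

    Uses fnmatch syntax: '*' matches any run of characters, '?' exactly one.
    """
    found: list[str] = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if fnmatch.fnmatchcase(word, pattern.lower()) and word not in found:
            found.append(word)
    return found


page = "Information retrieval and the retreival of patents"
# 'ret*v*l' covers both the correct spelling and the swapped 'ie'/'ei'
print(wildcard_search(page, "ret*v*l"))  # → ['retrieval', 'retreival']
```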
Phonetic search strategies will find key words sharing a similar pronunciation.
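Phonetic matching is commonly done by reducing each word to a sound code and comparing the codes. The sketch below uses the classic American Soundex algorithm as one well-known example. The text does not say which phonetic method InfoTom's tools use, so treat this as an illustration only. Note that Soundex was designed for English names.

```python
import re

# American Soundex digit for each consonant; vowels, h, w and y have no digit
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":
            # Vowels reset 'previous'; h and w do not separate equal digits
            previous = digit
    return (code + "000")[:4]


def phonetic_search(text: str, key_word: str) -> list[str]:
    """Return the distinct words of text that sound like key_word."""
    target = soundex(key_word)
    return sorted({w for w in re.findall(r"[a-z]+", text.lower()) if soundex(w) == target})


print(soundex("Meyer"), soundex("Maier"))                     # → M600 M600
print(phonetic_search("Maier and Meyer met Mayer", "Meyer"))  # → ['maier', 'mayer', 'meyer']
```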
A third group of specialized searches takes the stemming of key words into account. These searches can find key words even when they appear in a different grammatical form than the one given in the search phrase. For example, searching for "retrieval" will also track down pages containing "retrieve" or "retrievable".
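A real stemmer, such as the Porter algorithm, applies many ordered rules. The following toy Python suffix-stripper only shows the principle: the key word and every word on the page are reduced to a common stem before they are compared. The suffix list is a hypothetical minimum chosen to reproduce the "retrieval" example above and will mis-stem many other words.

```python
import re

# Hypothetical, deliberately tiny suffix list, checked in this order
SUFFIXES = ("able", "ing", "al", "ed", "es", "s", "e")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_search(text: str, key_word: str) -> list[str]:
    """Return the distinct words of text that share key_word's stem, in order."""
    target = stem(key_word)
    found: list[str] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if stem(word) == target and word not in found:
            found.append(word)
    return found


text = "Retrieve the patents: retrieval is fast and the data are retrievable."
print(stem_search(text, "retrieval"))  # → ['retrieve', 'retrieval', 'retrievable']
```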
Finally, InfoTom developed a comprehensive thesaurus of synonymous key words. Instead of several consecutive searches, a single search can then also return results for synonyms of the specified key words.
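Thesaurus expansion amounts to OR-ing each key word with its synonyms inside a single query. The Python sketch below assumes a small hand-made synonym table. The entries and function names are hypothetical and do not reproduce InfoTom's thesaurus.

```python
import re

# Hypothetical thesaurus entries for illustration only
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "physician": {"doctor"},
}


def expand(key_word: str) -> set[str]:
    """Return the key word together with its listed synonyms."""
    key = key_word.lower()
    return {key} | THESAURUS.get(key, set())


def synonym_search(pages: dict[str, str], key_word: str) -> list[str]:
    """One pass over the pages that matches the key word or any of its synonyms."""
    terms = expand(key_word)
    return [name for name, text in pages.items()
            if terms & set(re.findall(r"[a-z0-9]+", text.lower()))]


pages = {
    "page1": "New car models",
    "page2": "Automobile industry news",
    "page3": "Train timetables",
}
print(synonym_search(pages, "car"))  # → ['page1', 'page2']
```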
[Figure: Proximity Search with BullsEye Pro]
2.) Filter:
A second form of pre-search processing is the use of filters, which automatically exclude results containing any key word from a defined list. While excluding adult-oriented pages might be helpful, specialized filters have to be defined very carefully, because pages removed by a pre-search filter will probably never come to your attention again! In most cases it is therefore better to apply filters after a search is done.
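The Python sketch below illustrates why a pre-search filter is risky. It assumes each result is a dictionary with hypothetical "title" and "snippet" fields and silently excludes any result containing a word from the blocked list. A filter word meant for one kind of unwanted page also removes a relevant research page, and nothing tells the user that it happened.

```python
import re


def words_of(result: dict[str, str]) -> set[str]:
    """Lower-cased words from a result's title and snippet."""
    return set(re.findall(r"[a-z0-9]+", (result["title"] + " " + result["snippet"]).lower()))


def exclusion_filter(results: list[dict[str, str]], blocked: list[str]) -> list[dict[str, str]]:
    """Silently drop every result that contains any blocked key word."""
    blocked_set = {w.lower() for w in blocked}
    return [r for r in results if not words_of(r) & blocked_set]


results = [
    {"title": "Free virus scanner download", "snippet": "Remove every virus now"},
    {"title": "Influenza research", "snippet": "Structure of the influenza virus"},
    {"title": "Patent databases", "snippet": "Searching patent literature"},
]
kept = exclusion_filter(results, ["virus"])
print([r["title"] for r in kept])  # → ['Patent databases']
```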
After a filtered or unfiltered search, the results can be processed with the same procedures as before the search. In most cases this is safer than applying filters beforehand, because the results can first be screened and a decision made on whether a filter seems useful. In general, even a carefully designed filter may exclude valuable information! In my experience it is therefore usually better to screen the results manually, although this costs more time.
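Applying a filter after the search, as recommended here, can be sketched as flagging instead of deleting. In this Python sketch (again with hypothetical "title" and "snippet" result fields) matching results are set aside together with the words that triggered them, so nothing is lost before it has been screened manually.

```python
import re


def words_of(result: dict[str, str]) -> set[str]:
    """Lower-cased words from a result's title and snippet."""
    return set(re.findall(r"[a-z0-9]+", (result["title"] + " " + result["snippet"]).lower()))


def flag_results(
    results: list[dict[str, str]], blocked: list[str]
) -> tuple[list[dict[str, str]], list[tuple[dict[str, str], list[str]]]]:
    """Split results into (clean, flagged) instead of discarding matches.

    Each flagged entry carries the blocked words it contains, so the user
    can screen it and decide whether the filter should really apply.
    """
    blocked_set = {w.lower() for w in blocked}
    clean, flagged = [], []
    for r in results:
        hits = words_of(r) & blocked_set
        if hits:
            flagged.append((r, sorted(hits)))
        else:
            clean.append(r)
    return clean, flagged


results = [
    {"title": "Free virus scanner download", "snippet": "Remove every virus now"},
    {"title": "Influenza research", "snippet": "Structure of the influenza virus"},
    {"title": "Patent databases", "snippet": "Searching patent literature"},
]
clean, flagged = flag_results(results, ["virus", "download"])
for result, hits in flagged:
    print(result["title"], hits)
# Free virus scanner download ['download', 'virus']
# Influenza research ['virus']
```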
[Figure: Post-Retrieval Processing using Proximity in BullsEye Pro]
[Figure: Post-Retrieval Processing using Date and Size in BullsEye Pro]
3.) Processing and Data Extraction from the Final Set of Results:
Results from online databases may be exportable as ASCII text or in other easy-to-handle data formats. Most web pages, however, present their results as HTML. These data sets have to be processed with specialized extraction and conversion tools that remove the markup and transfer the information into exportable data formats, so that it becomes accessible for the subsequent data analysis.
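Python's standard library is enough to sketch the markup-removal step: `html.parser.HTMLParser` collects the text between tags, and the `csv` module writes it into a format that spreadsheets and statistics packages can import. This is a minimal illustration, not one of the specialized tools mentioned above. It discards layout, keeps only visible text, and skips `<script>` and `<style>` content. The URL in the example is a placeholder.

```python
import csv
import io
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script and style blocks."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True (default) turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip all markup and return the page text as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def export_csv(pages: dict[str, str]) -> str:
    """Convert {url: html} into CSV text with one row per page."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "text"])
    for url, html in pages.items():
        writer.writerow([url, html_to_text(html)])
    return buffer.getvalue()


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Results</h1><p>Patent &amp; trademark data</p></body></html>")
print(html_to_text(page))  # → Results Patent & trademark data
print(export_csv({"http://example.com/results": page}))
```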
Dr. Thomas Wassmer, Ph.D., Kapellenstr. 20, D-55124 Mainz, Germany, Phone: +49-6131-211819, Fax: +49-6131-213911, Email: tom at infotom.com