Information Processing

1.) Selection of a Search Strategy:
InfoTom already starts to process information while querying databases and search engines: a good search strategy is the best guarantee of success!
Choosing the right key words and combining them with Boolean operators such as AND and OR leads to much better results than using just a single phrase.


Boolean Search with WebFerret Pro
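
The effect of combining key words with AND and OR can be illustrated with a minimal Python sketch. It is not how WebFerret Pro or any particular search engine works internally; the function names and the sample pages are hypothetical and serve only to show the logic.

```python
def matches_all(text, keywords):
    """AND: every key word must occur in the text (case-insensitive)."""
    words = set(text.lower().split())
    return all(k.lower() in words for k in keywords)


def matches_any(text, keywords):
    """OR: at least one key word must occur in the text (case-insensitive)."""
    words = set(text.lower().split())
    return any(k.lower() in words for k in keywords)


# Hypothetical sample pages, used only for illustration.
pages = {
    "page1": "information retrieval with search engines",
    "page2": "database retrieval tools",
    "page3": "cooking recipes",
}

# "retrieval AND database" narrows the result set ...
and_hits = [name for name, text in pages.items()
            if matches_all(text, ["retrieval", "database"])]

# ... while "retrieval OR recipes" widens it.
or_hits = [name for name, text in pages.items()
           if matches_any(text, ["retrieval", "recipes"])]
```

AND narrows the result set to pages containing every key word, whereas OR widens it to pages containing any of them.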


Another very useful approach is to define the proximity between key words or phrases. This can be immediate adjacency, as in an "Exact Phrase" search, or any definable distance between words, for example within 5 words of each other. Similar are search parameters requiring that a certain word appear within the first N words (e.g. within the first 10 words) of a web page. Typically, "noise" words such as "the", "a", and "when" are ignored in both search strategies.
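
Both kinds of proximity test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the noise-word list is a small hypothetical sample, distance is counted in word positions after noise words have been removed, and the function names are invented for this example.

```python
import re

# Assumed, deliberately short list of "noise" words to ignore.
NOISE_WORDS = {"the", "a", "an", "when", "of", "and"}


def tokenize(text):
    """Lower-case the text, split it into words, and drop noise words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in NOISE_WORDS]


def within_proximity(text, first, second, max_distance):
    """True if the two words occur at most max_distance positions apart."""
    tokens = tokenize(text)
    pos_first = [i for i, w in enumerate(tokens) if w == first.lower()]
    pos_second = [i for i, w in enumerate(tokens) if w == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in pos_first for j in pos_second)


def within_first_n(text, word, n):
    """True if the word occurs among the first n non-noise words."""
    return word.lower() in tokenize(text)[:n]


sample = "The retrieval of information from a database is the first step"
near = within_proximity(sample, "retrieval", "database", 5)
early = within_first_n(sample, "database", 10)
```

An "Exact Phrase" search corresponds to the special case in which the words must follow each other directly and in order.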

In addition, there are a variety of more complex search strategies:
"Fuzzy" searches involving wild cards can find occurrences of key words even if they are misspelled.
Phonetic search strategies find key words that share a similar pronunciation. A third group of specialized searches takes the stemming of key words into account. These searches may find key words even if they appear in a grammatical variation other than the one defined in the search phrase. For example, searching for "retrieval" will also track down pages containing "retrieve" or "retrievable".
Finally, InfoTom has developed a comprehensive thesaurus of synonymous key words. Instead of conducting several consecutive searches, a single search can also bring up results for synonyms of the specified key words.
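
Three of these strategies can be roughly approximated with the Python standard library. The sketch below is not InfoTom's implementation: the fuzzy match relies on `difflib` string similarity rather than wild cards, the stemmer is a deliberately crude suffix stripper that only covers the example above, and the thesaurus entry is hypothetical.

```python
import difflib


def fuzzy_matches(keyword, page_words, cutoff=0.8):
    """Return page words that are similar enough to the key word."""
    candidates = [w.lower() for w in page_words]
    return difflib.get_close_matches(keyword.lower(), candidates,
                                     n=5, cutoff=cutoff)


# Assumed suffix list, longest first; a real stemmer uses far more rules.
SUFFIXES = ("able", "al", "e")


def crude_stem(word):
    """Strip one known suffix so that grammatical variants share a stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


# Hypothetical thesaurus entry, for illustration only.
THESAURUS = {"retrieval": ["search", "lookup"]}


def expand_with_synonyms(keyword):
    """Return the key word followed by its known synonyms."""
    return [keyword] + THESAURUS.get(keyword, [])


misspelled_hits = fuzzy_matches("retrieval", ["retreival", "database"])
stems = {crude_stem(w) for w in ("retrieval", "retrieve", "retrievable")}
queries = expand_with_synonyms("retrieval")
```

All three variants in the stemming example reduce to the same stem, so a search on the stem finds each of them. A phonetic search would need an additional encoding step, which is omitted here.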


Proximity Search with BullsEye Pro


2.) Filter:
A second form of pre-search processing is the use of filters, which automatically exclude results containing any key word from a defined list. While excluding adult-oriented pages might be helpful, specialized filters have to be defined very carefully, because pages filtered out before the search will probably never come to your attention again! In most cases it is therefore better to apply filters after a search is done.

After a filtered or non-filtered search, the results can be processed with the same filter procedures that are available before a search. In most cases this is safer than applying filters beforehand, because the results can be screened first and a decision can then be made whether applying a filter seems useful. In general, even a carefully designed filter may exclude valuable information! In my experience it is therefore usually better to screen the results manually, although this costs more time.
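
A post-search filter that keeps the excluded results available for manual screening might look like the following Python sketch. The function name, the sample results, and the excluded key word are hypothetical.

```python
def partition_by_filter(results, excluded_keywords):
    """Split results into (kept, removed) instead of discarding matches."""
    excluded = {k.lower() for k in excluded_keywords}
    kept, removed = [], []
    for result in results:
        words = set(result.lower().split())
        # A result is removed if it contains any excluded key word.
        if words & excluded:
            removed.append(result)
        else:
            kept.append(result)
    return kept, removed


# Hypothetical result titles, for illustration only.
results = [
    "Information retrieval tutorial",
    "Casino bonus offers",
    "Database search strategies",
]
kept, removed = partition_by_filter(results, ["casino"])
```

Because the removed results are returned rather than dropped, they can still be reviewed by hand before they are finally discarded.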


Post Retrieval Processing using Proximity in BullsEye Pro
Post Retrieval Processing using date and size in BullsEye Pro
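
Filtering retrieved results by date and size can be sketched in Python as follows. This does not reproduce BullsEye Pro; the `Result` record, its field names, and the sample URLs are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """Hypothetical record describing one retrieved page."""
    url: str
    modified: date
    size_bytes: int


def filter_by_date_and_size(results, newer_than, max_size_bytes):
    """Keep results modified on or after a date and not above a size limit."""
    return [r for r in results
            if r.modified >= newer_than and r.size_bytes <= max_size_bytes]


# Hypothetical sample data, for illustration only.
hits = [
    Result("http://example.com/old", date(1997, 5, 1), 12_000),
    Result("http://example.com/new", date(1999, 3, 15), 8_000),
    Result("http://example.com/huge", date(1999, 6, 1), 900_000),
]
recent_small = filter_by_date_and_size(hits, date(1998, 1, 1), 100_000)
```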


3.) Processing and Data Extraction from the Final Set of Results:
Results from online databases might be exportable in ASCII or other easy-to-handle data formats. The majority of web pages, however, present their results as HTML code. These data sets have to be processed by specialized extraction and conversion tools that remove the markup code and transfer the information into exportable data formats, so that it becomes accessible for the subsequent data analyses.
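
Removing the markup from an HTML result page can be sketched with Python's standard `html.parser` module. This is a minimal illustration rather than a full extraction tool: the class and function names are hypothetical, and only script and style blocks are skipped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and ignore tags, scripts, and style sheets."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string as one plain line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>Results</h1><p>First <b>hit</b></p></body></html>")
plain = html_to_text(page)
```

The resulting plain text can then be written to an ASCII file or passed on to further analysis steps.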

Dr. Thomas Wassmer, Ph.D., Kapellenstr. 20, D-55124 Mainz, Germany, Phone: +49-6131-211819, Fax: +49-6131-213911, Voice: Email: tom at