TITUS Text Retrieval Engine

The linguistic content of the texts that are stored in the TITUS Text Database is accessible in two ways:

By reference to a given text passage. All texts are subdivided into several levels such as chapters, paragraphs, verses, pages (of editions), lines and the like, mostly in agreement with the underlying printed editions. A given passage can be accessed through an entry form that covers the complete text database; additionally, text-specific entry forms are displayed in the right frame of the text window when the start page of a given text is opened.
By using the word search engine. This can be accessed in four ways:
- In a given text, every word is linked to the search engine, so that double-clicking on it makes the search engine look for all occurrences of that word (including minor orthographic variants, e.g. those differing in the use of diacritics) in the given text. The results are displayed in "KWIC" ("keyword-in-context") style in the Query Result Table, which can be used as the starting point for further enquiries.
- Within the Query Result Table, all words displayed as part of the output are again linked to the search engine, with the difference that double-clicking on them makes the search engine look for all occurrences of these words (including minor orthographic variants, e.g. those differing in the use of diacritics) in all TITUS texts and in all matching language varieties. The results are displayed in "KWIC" ("keyword-in-context") style in another Query Result Table, which can in turn be used as the starting point for further enquiries. Note that the output is ordered by a) word form and b) text age.
- Every text is provided with a search entry form, which can be selected via a link in the right frame of the text window. When activated, the entry form is displayed in the same frame; it contains several radio buttons to determine the scope of the search (e.g., language varieties, scope of texts to be included, search conditions) and a box where the word(s) to be searched for can be typed. Please note that for the entry of non-Latin and special characters the TITUS encoding conventions must be observed. The results are displayed in the Query Result Table, which can be used as the starting point for further enquiries. Note that "wildcard" symbols representing single characters or sequences of characters can be used in the entry form.
- In a similar way, the retrieval engine can also be accessed from a special query form page. Here too, you first have to select the language variety (or varieties) the search applies to, then enter the word form to be searched for. Please note that for the entry of non-Latin and special characters the TITUS encoding conventions must be observed. The results are displayed in the Query Result Table, which can be used as the starting point for further enquiries. Note that "wildcard" symbols representing single characters or sequences of characters can be used in the entry form.
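
To illustrate what a "KWIC" ("keyword-in-context") display looks like, here is a minimal Python sketch. It is not the TITUS implementation: the function name kwic, the width parameter, the case-insensitive comparison and the simple word splitting are all assumptions made for the example.

```python
import re

def kwic(text, keyword, width=3):
    """Return one line per hit, showing `width` words of context on each side.

    Hypothetical helper for illustration only; the real engine's tokenisation
    and variant handling are not documented here.
    """
    words = re.findall(r"\w+", text)  # naive word splitting (assumption)
    rows = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            rows.append(f"{left:>30} [{word}] {right}")
    return rows

for row in kwic("the cat sat on the mat", "the", width=2):
    print(row)
```

Each hit is printed on its own line with the keyword in brackets, so that all occurrences can be compared in a single column.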

Within the TITUS retrieval engine, searching can be done at three levels of exactness:

Exact match: All graphical elements of the given word(form) are considered as obligatory, including the differentiation of upper case / lower case letters and all diacritics.
Inexact match: The word(form)s are reduced to a minimal distinction of graphical elements by neglecting, e.g., the difference between upper and lower case letters or the presence of diacritics.
Fuzzy match: The elements of the word(form) entered in the query form are considered as forming a skeleton only, with any (sequence of) character(s) admissible between them.
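
The three levels can be pictured with the following Python sketch. It only models the descriptions given above; the function names are hypothetical, and several details are assumptions: that "reduction" means removing combining diacritics and case, that the fuzzy skeleton is anchored at the first and last character entered, and that fuzzy matching works on the reduced forms.

```python
import re
import unicodedata

def reduce_form(word):
    """Reduce a word form by dropping diacritics and case (assumed reduction)."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def exact_match(query, word):
    # Every graphical element counts, including case and diacritics.
    return query == word

def inexact_match(query, word):
    # Compare the reduced forms only.
    return reduce_form(query) == reduce_form(word)

def fuzzy_match(query, word):
    # The characters of the query form a skeleton; any sequence of
    # characters (including none) may stand between them.
    skeleton = ".*".join(re.escape(c) for c in reduce_form(query))
    return re.fullmatch(skeleton, reduce_form(word)) is not None

print(exact_match("deva", "Déva"))    # False
print(inexact_match("deva", "Déva"))  # True
print(fuzzy_match("hrt", "heart"))    # True
```

In this sketch each level accepts everything the previous one accepts, which matches the intent of the descriptions but is not guaranteed for the actual engine.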

In the query entry forms, two "wildcard" symbols can be used: the question mark, "?", represents a single character, and the asterisk, "*", represents a sequence of characters. E.g., "h?t" will represent "hat", "hit", "hot", and "hut", while "h*t" will also represent "hunt", "host", "haircut" etc. Please note that the search type will automatically be set to "inexact match" when you use "*" as a wildcard in the query.
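
The behaviour of the two wildcards can be reproduced with a short Python sketch that translates a query into a regular expression. The function names are hypothetical, and it is an assumption that "*" may also stand for an empty sequence; the automatic switch to "inexact match" is not modelled.

```python
import re

def wildcard_to_regex(pattern):
    """Translate "?" and "*" into regular-expression syntax."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # any sequence, possibly empty (assumption)
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))

def wildcard_search(pattern, words):
    """Return the words that match the wildcard pattern as a whole."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]

words = ["hat", "hit", "hot", "hut", "hunt", "host", "haircut", "hats"]
print(wildcard_search("h?t", words))  # ['hat', 'hit', 'hot', 'hut']
print(wildcard_search("h*t", words))
```

The second query returns every word in the list except "hats", since the pattern must cover the whole word.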

We hope to provide further search types in the future, including language-specific wildcard searches to match "all vowels", "all stops" etc.

Attention: All texts stored in the TITUS text database are encoded in Unicode / UTF-8. The special characters they contain can only be displayed and printed with a font that covers Unicode, such as the TITUS font Titus Cyberbit Unicode.

Notice on copyright and etiquette

All texts that can be downloaded via HTTP from the TITUS server can be used freely for scholarly purposes, provided that they are quoted as sources and that the name(s) of the editor(s) and the date of the last changes are indicated in publications.
The texts must not be used for any commercial purpose. Downloading of some of the texts is restricted to members of the TITUS project.
Most of the texts mentioned below are accessible on the TITUS WordCruncher Server for investigations of many kinds.
For details concerning the concept of the TITUS server cf. the project description of November 2001.

This page written by JG, 21.4.2002

Thesaurus Indogermanischer Text- und Sprachmaterialien