Digitization of Tocharian Manuscripts

Short notice about a new project

Original publication of this notice in: Tocharian and Indo-European Studies 7, 1997

At the conference marking "100 Years of Tocharian Studies" (Saarbrücken, Oct. 13-15, 1995), a panel discussion addressed the present state of the Tocharian manuscripts preserved in European museums and collections, as well as their availability for future research. The speakers and the audience agreed that a project to digitize the manuscripts should be undertaken as soon as possible, both to preserve the data they contain permanently and to make them more easily accessible to the scholarly world.

A first step towards this aim has meanwhile been taken. In a joint effort, the Berlin-Brandenburgische Akademie der Wissenschaften, the Staatsbibliothek Berlin, the Institut für Vergleichende Sprachwissenschaft of the University of Frankfurt, and the Tamai Foundation have started digitizing the Tocharian manuscripts that are preserved in the institutions of the Stiftung Preußischer Kulturbesitz in Berlin. After a test phase, which was completed in 1996, the following procedure was established in accordance with conservation requirements:

a) Every document is first photographed in its present state. To achieve maximum quality, this is done on high-resolution colour slide film. This has two advantages over using digital cameras: first, the slides can be stored as secondary reference copies of the documents themselves; second, digitization from slides still gives the best results with respect to orthochromaticity and resolution.
b) During the test phase, several arrangements of the manuscripts for photographing were tried. Given that most of the documents are stored in glass frames and should not be removed from them, a suitable background had to be found. It turned out that light-coloured paper (white or grey) is best for the purpose, yielding optimal contrast with both the manuscript paper and the ink used for writing.
c) The colour slides thus produced are digitized using a high-resolution colour slide scanner. In view of the different purposes for which the digitized images will be used, this is done in at least two ways. First, every document is scanned in its entirety, i.e. including the glass frame, at a medium resolution of between 1000 and 1300 dpi (dots per inch). This resolution gives a digital image that fills a normal computer screen, with the text easily readable without further enlargement. These images will be made accessible to the public via CD-ROM and/or the internet in the future (for a set of specimens, cf. below). Second, the individual manuscripts are scanned at a high resolution of 2700 dpi. At this resolution, a maximum of information can be stored in the digital files, thus meeting the requirements of permanent preservation of the data. As this procedure yields enormous file sizes (up to 26 MB per image, whereas today's CD-ROMs cannot hold more than 650 MB!), a huge amount of storage capacity is necessary for these images. The amount can be reduced by applying data compression routines, e.g. those of the so-called "JPG" (JPEG) format. As this brings about a certain loss of information (the standard rate is 15%), the final decision on the format in which to store the data is still being discussed.
d) Many documents are hardly legible, either because of damage or because the ink has faded. Such documents are scanned both as is, i.e. as they appear to the eye, and with the aid of enhancing procedures such as increasing the contrast or intensifying the (ink) colour. Similar procedures can also be applied after digitizing (i.e., using photo editing software); which steps are necessary to achieve maximum legibility depends on the individual document.
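
The storage figures mentioned under (c) can be checked with a little arithmetic. The following Python sketch is an illustration only, not part of the project's workflow. It assumes a standard 35 mm slide frame of 36 x 24 mm and uncompressed 24-bit colour (3 bytes per pixel); neither assumption is stated in this notice, and all function names are hypothetical.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions obtained when an area of the given size is scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def uncompressed_bytes(width_px, height_px, bytes_per_pixel=3):
    """Size of the raw image data; 3 bytes per pixel assumes 24-bit colour."""
    return width_px * height_px * bytes_per_pixel


def images_per_disc(disc_mb, image_mb):
    """Number of whole images of `image_mb` that fit on a disc of `disc_mb`."""
    return int(disc_mb // image_mb)


# Assumption: full 35 mm slide frame (36 x 24 mm), scanned at 2700 dpi.
width_px, height_px = scan_pixels(36.0, 24.0, 2700)
raw_size = uncompressed_bytes(width_px, height_px)

print(f"{width_px} x {height_px} pixels")
print(f"about {raw_size / 2**20:.1f} MiB uncompressed")

# Figures taken from the notice itself: 26 MB per image, 650 MB per CD-ROM.
print(f"{images_per_disc(650, 26)} images of 26 MB per 650 MB CD-ROM")
```

Under these assumptions a full frame comes to roughly 28 MiB, which is of the same order as the "up to 26 MB" reported above; the difference would be explained by a slightly smaller scanned area. At 26 MB per image, a single CD-ROM holds only 25 uncompressed high-resolution scans, which is why compression is under discussion.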

To date, about one third of the Tocharian documents preserved in Berlin have been digitized in the way described, and we expect the photographing and scanning to be finished by the end of 1997. As a first use of the material thus produced, the texts are now being re-transliterated by Tatsushi Tamai in cooperation with Klaus T. Schmidt and J. Gippert in order to establish a basis for a palaeographic investigation.

It is to be hoped that in due time, other institutions that own Tocharian manuscripts will join our efforts to prepare the documents for scholarly analysis and eternal preservation.

Frankfurt, 15.4.1997 Jost Gippert


Numbers refer to the Berlin catalogue of Tocharian manuscripts from Turfan [THT]; they are identical with the numbers used in Sieg-Siegling's edition of the Tocharian B texts.
Attention: File sizes range between 80 KB (resolution of 675 dpi) and 260 KB (resolution of 2700 dpi). Retrieving the graphic files may be time-consuming!

THT no. 50 recto, scanning resolution of 675 dpi, background dark-blue
THT no. 50 recto - extract, scanning resolution of 2700 dpi
THT no. 71 recto, scanning resolution of 675 dpi, background white
THT no. 71 recto - extract, scanning resolution of 2700 dpi
THT no. 71 recto - extract, scanning resolution of 2700 dpi, enhanced contrast
THT no. 94 recto, scanning resolution of 675 dpi, background dark-blue
THT no. 94 recto - extract, scanning resolution of 2700 dpi
THT no. 133 recto, scanning resolution of 675 dpi, background dark-blue
THT no. 133 verso - extract, scanning resolution of 1350 dpi, enhanced contrast
THT no. 239 recto, scanning resolution of 675 dpi, background white
THT no. 239 recto - extract, scanning resolution of 2700 dpi
THT no. 245 recto, scanning resolution of 675 dpi, background light blue
THT no. 245 recto - extract, scanning resolution of 2700 dpi