TEI Recommendations for Encoding Language Corpora
What is a corpus?
"linguistically motivated selection"
Sampling issues
Representativeness
BNC Composition
Sampling frame
Encoding
User Requirements
1. Structural Issues
Newspapers and Periodicals
Transcribed speech
TEI basic structure
TEI basic structure (2)
TEI basic structure (3)
TEI Syntax
BNC Architecture
Components
TEI proposals
BNC structure...
Low-level segmentation
Examples
Sample written text
Sample spoken text
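The basic-structure and architecture slides above describe a corpus as one shared header followed by a sequence of texts, each with its own header and body. As a rough illustration only, the Python sketch below builds that nesting with the standard library's ElementTree. The element names (teiCorpus.2, TEI.2, teiHeader, fileDesc, text, body, p, s) follow an XML rendering of the P3-era TEI as we understand it (the BNC itself was distributed in SGML). The function names, titles and sentence text are hypothetical, and real BNC headers are far richer.

```python
import xml.etree.ElementTree as ET

def make_header(title):
    """Minimal teiHeader: a fileDesc with its three required children."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = title
    ET.SubElement(ET.SubElement(file_desc, "publicationStmt"), "p").text = "Unpublished sample"
    ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "p").text = "Invented for illustration"
    return header

def make_text(text_id, title, sentences):
    """One corpus text: its own header plus a body of numbered <s> elements."""
    tei = ET.Element("TEI.2", id=text_id)
    tei.append(make_header(title))
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    para = ET.SubElement(body, "p")
    for n, sentence in enumerate(sentences, start=1):
        ET.SubElement(para, "s", n=str(n)).text = sentence
    return tei

def make_corpus(corpus_title, texts):
    """A corpus: one shared header followed by one TEI.2 element per text."""
    corpus = ET.Element("teiCorpus.2")
    corpus.append(make_header(corpus_title))
    for text_id, title, sentences in texts:
        corpus.append(make_text(text_id, title, sentences))
    return corpus

# Hypothetical sample data.
corpus = make_corpus("Sample corpus", [
    ("T01", "First text", ["Hello world.", "A second sentence."]),
    ("T02", "Second text", ["Another text."]),
])
print(ET.tostring(corpus, encoding="unicode"))
```

Because each text keeps its own header, text-level metadata can travel with the text, while declarations shared by every text belong once in the corpus header.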
2. Reference schemes
BNC reference scheme
Editorial practices
for example: skuzzy/SCSI
3. Texts are not just words...
For example...
Who does the work?
Types of linguistic annotation
Inherited properties
for example...
Word class categorization
BNC practice
Providing an analysis
using interp...
Hierarchic bundling of interps
Feature structures
Using a feature structure...
feature definitions may be stored as a feature library...
...and invoked by reference
Clustering
discontinuous segments
Anaphoric reference
Translation pairs
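The feature-structure slides above note that feature definitions may be stored once in a feature library and invoked by reference. A minimal sketch of that lookup, assuming P3/P4-style markup (an fLib of f elements identified by an id attribute, each holding a sym value, and fs elements whose IDREFS-valued feats attribute points into the library; P5 renames some of these, e.g. symbol and xml:id). The sample features and the NN1/NN2 labels are hypothetical, and the function names are ours.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a feature library plus feature structures that
# refer to library entries through the space-separated feats attribute.
SAMPLE = """
<corpus>
  <fLib type="word class">
    <f id="N" name="pos"><sym value="noun"/></f>
    <f id="V" name="pos"><sym value="verb"/></f>
    <f id="SG" name="number"><sym value="singular"/></f>
    <f id="PL" name="number"><sym value="plural"/></f>
  </fLib>
  <fs id="NN1" type="word" feats="N SG"/>
  <fs id="NN2" type="word" feats="N PL"/>
</corpus>
"""

def load_library(root):
    """Map each library feature's id to its (name, value) pair."""
    library = {}
    for f in root.iter("f"):
        if "id" in f.attrib:
            library[f.get("id")] = (f.get("name"), f.find("sym").get("value"))
    return library

def expand(fs, library):
    """Resolve a feature structure's feats references into a plain dict."""
    features = {}
    for ref in fs.get("feats", "").split():
        name, value = library[ref]
        features[name] = value
    return features

root = ET.fromstring(SAMPLE)
library = load_library(root)
for fs in root.iter("fs"):
    print(fs.get("id"), expand(fs, library))
```

Storing each feature once means a correction to its definition reaches every structure that points at it.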
4. Contextual Information
Information Discovery Needs
Describing a source
Contextualizing a source
The TEI header
TEI Header structure
The File Description
The publication statement
The source description
The Encoding Description
Editorial Declarations
The Profile Description
Text classification
BNC categorization scheme
Text classifications
Written text: medium
Spoken texts: region
Language
Ancillary documentation
BNC Participant Description
A BNC Setting Description
The revision description
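The text classification slides above rely on a pointer mechanism: categories are declared once in a taxonomy, and each text's profile description refers to them by identifier. A minimal sketch, assuming P3/P4-style names (taxonomy and category in encodingDesc/classDecl, catRef with an IDREFS-valued target attribute in profileDesc/textClass, id attributes rather than P5's xml:id). The category labels are hypothetical, and the corpus-level declaration and text-level reference are folded into one fragment for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragment: taxonomies declared in classDecl,
# referenced from textClass through catRef/@target.
HEADER = """
<teiHeader>
  <encodingDesc>
    <classDecl>
      <taxonomy id="medium">
        <category id="M1"><catDesc>Book</catDesc></category>
        <category id="M2"><catDesc>Periodical</catDesc></category>
      </taxonomy>
      <taxonomy id="domain">
        <category id="D1"><catDesc>Imaginative</catDesc></category>
        <category id="D2"><catDesc>Natural science</catDesc></category>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <catRef target="M2 D2"/>
    </textClass>
  </profileDesc>
</teiHeader>
"""

def classify(header):
    """Return {taxonomy id: category description} for every catRef target."""
    # Index each category by id, remembering which taxonomy declares it.
    categories = {}
    for taxonomy in header.iter("taxonomy"):
        for category in taxonomy.iter("category"):
            categories[category.get("id")] = (taxonomy.get("id"), category.findtext("catDesc"))
    result = {}
    for cat_ref in header.iter("catRef"):
        for target in cat_ref.get("target", "").split():
            scheme, description = categories[target]
            result[scheme] = description
    return result

print(classify(ET.fromstring(HEADER)))
```

Keeping the taxonomy separate from the references lets several independent classification schemes (medium, domain, region and so on) coexist without duplicating their definitions in every text.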
Transcribing speech
The Spoken base tagset
Features of speech
Utterances
Vocals and events
Voice quality and prosody
Another example
Timing
Overlap
Synchronization
Defining a timeLine
Using a timeLine
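The timing, overlap and timeline slides above describe points in time defined relative to one another. A minimal sketch, assuming TEI-style when elements whose interval attribute gives a distance (in the timeline's unit) from the point named by since, resolved here back to the timeline's origin. The point ids and intervals are hypothetical, the origin's absolute clock time is not used, and utterances are represented simply as (start, end) pairs of point ids rather than as markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical timeline: every point except the origin is an interval
# (in seconds) measured from some earlier point.
TIMELINE = """
<timeline origin="T0" unit="s">
  <when id="T0" absolute="10:00:00"/>
  <when id="T1" interval="2.5" since="T0"/>
  <when id="T2" interval="1.0" since="T1"/>
  <when id="T3" interval="4.0" since="T0"/>
</timeline>
"""

def offsets(timeline):
    """Return each point's offset from the origin, following since chains."""
    points = {w.get("id"): w for w in timeline.iter("when")}
    resolved = {timeline.get("origin"): 0.0}

    def resolve(point_id):
        if point_id not in resolved:
            when = points[point_id]
            resolved[point_id] = resolve(when.get("since")) + float(when.get("interval"))
        return resolved[point_id]

    for point_id in points:
        resolve(point_id)
    return resolved

def overlaps(a, b, times):
    """True if two (start, end) point-id pairs share any stretch of time."""
    return times[a[0]] < times[b[1]] and times[b[0]] < times[a[1]]

times = offsets(ET.fromstring(TIMELINE))
print(times)
# Two hypothetical utterances: one runs T0 to T2, the other T1 to T3.
print(overlaps(("T0", "T2"), ("T1", "T3"), times))
```

Anchoring utterances to shared timeline points, rather than to one another, makes overlap a matter of comparing resolved offsets; utterances that merely touch at a point do not count as overlapping here.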
The TEI as a standard
Why use this approach?
Home Page: http://users.ox.ac.uk/~lou
Other information: Presented at GLDV 99, 8 July 1999, Frankfurt a/M.