Divergent-Discourses

Divergent Discourses: Processes of Narrative Construction in Tibet, 1955-62

Divergent Discourses is an international, collaborative UK-German research project. It studies the early history of a conflict that began in the high Himalayas in the 1950s and led to nearly two decades of armed conflict. That conflict continues today in the form of disputes over ideas and narratives between the Chinese government and the exile Tibetan community, marked by recurrent unrest and protests within Tibet as well as by protracted border tensions between China and its neighbours.

The project looks at the early phase of that conflict, just after the People’s Liberation Army annexed Tibet in 1950, during which initial attempts at compromise between the two sides collapsed. By 1959, armed struggle had spread across the Tibetan Plateau, leading the Dalai Lama, the traditional ruler of Tibet, to flee with some 80,000 other Tibetans to India. In response, Chinese officials produced millions of words in newsprint, historical tracts, propaganda leaflets and books to justify their claim to Tibet. From India, exiled Tibetans produced newspapers, refugee accounts, testimonies, memoirs, and histories of Tibet to counter China’s claims. The newspapers and documents written in those early years, just before and after the eruption of open conflict, provide crucial clues to the initial concerns, fears, and needs that underlay the conflict and shaped the aims and strategies pursued by either side.

By collecting and digitising Tibetan-language newspapers and other texts from those times, and by developing computational tools for their analysis, Divergent Discourses will study the two competing discourses that emerged in the 1950s, each with its own account of Tibetan history, identity, and traditions.

Computational objectives: The project's initial goal is to develop the tools necessary for analysing a corpus of modern Tibetan texts. It will do this using the iLCM, an integrated research environment for the analysis of structured and unstructured data that enables extensive forms of text mining by social scientists (see https://ilcm.informatik.uni-leipzig.de/ilcm/ilcm/). To create the corpus and enable full use of the iLCM tools, the project will build a Tibetan language model for use with spaCy; develop an OCR model in Transkribus for transcribing scanned Tibetan newspapers; develop a layout model in Transkribus for recognising text regions in newspapers; develop scripts for pre-processing images, extracting metadata from contributing library catalogues, and normalising output texts in Tibetan; develop or adapt tools for tokenising and part-of-speech (PoS) tagging modern Tibetan texts; and develop a semantic search tool and named-entity recognition (NER) capacity for modern Tibetan.
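As a rough illustration of the normalising and tokenising steps mentioned above, the following sketch cleans up a string of Tibetan text and splits it into syllables. It is only a minimal example built on the Python standard library: the function names `normalise` and `syllables` and the particular clean-up rules are assumptions made for illustration, and they do not reproduce the project's TibNorm or BoTok-based tools.

```python
import re
import unicodedata

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause/sentence mark


def normalise(text: str) -> str:
    """Apply a few generic clean-up steps to OCR output (illustrative only)."""
    text = unicodedata.normalize("NFC", text)
    # Drop zero-width characters and byte-order marks left by OCR or copy-paste.
    text = re.sub("[\u200b\u200c\u200d\ufeff]", "", text)
    # Treat the non-breaking tsheg (U+0F0C) as an ordinary tsheg.
    text = text.replace("\u0f0c", TSHEG)
    # Collapse runs of tsheg and runs of whitespace.
    text = re.sub(f"{TSHEG}+", TSHEG, text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def syllables(text: str) -> list[str]:
    """Split normalised text into syllables on tsheg, shad, and spaces."""
    parts = re.split(f"[{TSHEG}{SHAD} ]+", normalise(text))
    return [part for part in parts if part]
```

Syllable splitting of this kind is only a naive baseline. Word-level tokenisation of Tibetan requires a lexicon or a trained model, which is what the project's tokenising tools are meant to provide.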

The project team includes researchers from the University of Leipzig in Germany (Franz Xaver Erhard) and SOAS University of London in the UK (Robert Barnett and James Engels), with major collaboration from Trinity College Dublin (Nathan Hill), as well as from Staatsbibliothek zu Berlin (CrossAsia), the Library of the Grassi Museum für Völkerkunde zu Leipzig, the Oriental Institute of the Czech Academy of Sciences, the Library of Tibetan Works and Archives in Dharamsala, India (LTWA), and others. It is jointly funded by the Deutsche Forschungsgemeinschaft (DFG) in Germany and the Arts and Humanities Research Council (AHRC) in the UK and will run from 2023 to 2026. For more details on the project, see our website at https://research.uni-leipzig.de/diverge/.

Licence: You are welcome to use the project data and code, but please acknowledge the project by name when doing so.

Popular repositories

  1. TibNorm Public

    Normalising Tibetan Text

    Python 1

  2. transkribus_utils Public

    Utilities for extracting text regions, text, etc. from Transkribus PAGE XML outputs

    Python 1

  3. modern-botok Public

    བོད་ཐོག BoTok custom dialect pack for modern Tibetan

    Python 1

  4. tibetan_tokenizers Public

    Allows directories to be fed to BoTok

    Python

  5. .github Public

    Divergent Discourses - The Project

  6. POS_utils Public

    Python
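Several of the repositories above work with Transkribus PAGE XML exports. As a hedged sketch of what reading such a file can look like, the example below collects the transcribed lines of each text region using only the Python standard library. The function name `text_by_region` and the inline sample document are invented for illustration, and the code is not taken from `transkribus_utils`. It assumes the usual PAGE layout of `TextRegion` > `TextLine` > `TextEquiv` > `Unicode` and reads the namespace from the root element rather than hard-coding a schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified PAGE XML document used only for this example.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="r1l1"><TextEquiv><Unicode>line one</Unicode></TextEquiv></TextLine>
      <TextLine id="r1l2"><TextEquiv><Unicode>line two</Unicode></TextEquiv></TextLine>
    </TextRegion>
    <TextRegion id="r2">
      <TextLine id="r2l1"><TextEquiv><Unicode>line three</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def text_by_region(page_xml: str) -> dict[str, list[str]]:
    """Map each TextRegion id to the list of its transcribed lines."""
    root = ET.fromstring(page_xml)
    # The root tag looks like "{namespace}PcGts"; reuse that namespace below.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""
    prefix = f"{{{ns}}}" if ns else ""
    regions: dict[str, list[str]] = {}
    for region in root.iter(f"{prefix}TextRegion"):
        lines = []
        # Only direct TextLine children, so nested regions are not counted twice.
        for line in region.findall(f"{prefix}TextLine"):
            unicode_el = line.find(f"{prefix}TextEquiv/{prefix}Unicode")
            if unicode_el is not None:
                lines.append(unicode_el.text or "")
        regions[region.get("id", "")] = lines
    return regions


regions = text_by_region(SAMPLE)
```

Real exports also carry coordinates, reading order, and region types, which a layout-aware pipeline for newspapers would need to take into account.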

Repositories

Showing 10 of 14 repositories
  • Tibetan_places Public

    Name lists of foreign toponyms, foreign anthroponyms, and Tibetan names as found in Tibetan-language newspapers of the 1950s and 1960s

    0 GPL-2.0 0 0 0 Updated Dec 19, 2024
  • dd_preprocess Public

    Preprocesses images of written documents to prepare them for optical character recognition (OCR) or handwritten text recognition (HTR). Aims to obtain more accurate transcriptions by making text more machine-readable.

    Python 0 MIT 0 0 0 Updated Dec 19, 2024
  • dd_custom_preprocess Public

    Preprocesses images of written documents to prepare them for optical character recognition (OCR) or handwritten text recognition (HTR). Applies one of two pipelines to an image depending on its quality.

    Python 0 MIT 0 0 0 Updated Dec 19, 2024
  • gemini_POS-tagger Public

    Pipeline to produce tokenised and PoS-tagged Tibetan datasets in CoNLL-U format using Google's Gemini via the Google Cloud API

    Python 0 0 2 0 Updated Dec 15, 2024
  • Tibetan_SpaCy-Model Public

    How to train a Tibetan language model for SpaCy

    Python 0 0 0 0 Updated Dec 15, 2024
  • Transcription-conventions-for-Tibetan Public
    0 GPL-2.0 0 0 0 Updated Dec 3, 2024
  • ilcm-integration Public

    A Tibetan language model, a stop-word list, and instructions on how to load them into iLCM (Docker container)

    0 0 0 0 Updated Nov 6, 2024
  • modern-botok Public

    བོད་ཐོག BoTok custom dialect pack for modern Tibetan

    Python 1 0 0 0 Updated Nov 4, 2024
  • xlsx-to-xml-mods-processing Public

    Converts Tibetan newspaper metadata from an Excel spreadsheet to MODS XML

    Python 0 MIT 0 0 0 Updated Sep 25, 2024
  • TibNorm Public

    Normalising Tibetan Text

    Python 1 0 6 0 Updated Aug 12, 2024
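For readers unfamiliar with the CoNLL-U format mentioned in the listing above, the sketch below serialises already tokenised and tagged sentences into it. It is an assumption-laden illustration, not the `gemini_POS-tagger` pipeline: the function name `to_conllu`, the sample tokens, and their tags are hypothetical. CoNLL-U uses ten tab-separated columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), an underscore for unspecified values, and a blank line after each sentence.

```python
def to_conllu(sentences: list[list[tuple[str, str]]]) -> str:
    """Serialise sentences of (form, upos) pairs as CoNLL-U text."""
    blocks = []
    for sent_id, tokens in enumerate(sentences, start=1):
        lines = [f"# sent_id = {sent_id}"]
        for token_id, (form, upos) in enumerate(tokens, start=1):
            # Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
            # Only ID, FORM, and UPOS are filled in; the rest stay unspecified.
            columns = [str(token_id), form, "_", upos, "_", "_", "_", "_", "_", "_"]
            lines.append("\t".join(columns))
        blocks.append("\n".join(lines))
    # Every sentence, including the last, is followed by a blank line.
    return "".join(block + "\n\n" for block in blocks)


# Hypothetical two-token sentence with illustrative tags.
example = to_conllu([[("\u0f56\u0f7c\u0f51", "PROPN"), ("\u0f61\u0f72\u0f42", "NOUN")]])
```

Leaving the lemma, head, and dependency columns as underscores keeps the output usable for PoS-tagging work while allowing fuller annotation to be added later.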
