Terminology, corpora and language resources

This research area covers contrastive, applied research in terminology and lexicography, especially within the economic-administrative and legal domains.

Our aim is to provide research-based terminological resources to support higher education, trade and industry as well as society in general within the field of professional communication.

Our researchers who work with terminology, terminography and specialised lexicography have developed new theories on concepts, conceptual analyses and ontologies to enhance professional communication. We also develop language technology solutions to improve terminological databases. This includes the implementation of semi-automatic term extraction tools based on parallel and multilingual corpora.

We actively develop our own monolingual and multilingual resources and use them in empirical research, especially on professional and academic communication. These resources include spoken and written corpora, term bases, technology for semi-automatic corpus-based term extraction, web-based corpora, and parallel corpora containing translations of texts from various specialist domains. We also study the occurrence of terminology and its transfer from specialist to general text domains.

Ongoing research projects

  • Clarino and Termportalen

    CLARINO (Common Language Resources and Technology Infrastructure: Norway) is a coordinated effort to create a national infrastructure for language resources.

    Our department is responsible for a work package devoted to developing a national portal for terminology resources. The project helps to overcome fragmentation in the terminology field and addresses the need to coordinate terminological efforts at national and international levels. These resources are widely used by field experts and by NHH students and staff.

    CLARINO is funded by the Research Council of Norway and linked to the Europe-wide CLARIN project.

    For more information:

    Termportalen at NHH

    Clarino

  • Clarin

    CLARIN (Common Language Resources and Technology Infrastructure) is a Europe-wide research infrastructure that aims to provide easy and sustainable access, for scholars in the humanities and social sciences, to digital language data (in written, spoken, video or multimodal form) and advanced tools in order to discover, explore, exploit, annotate, analyse or combine them, independent of where they are located.

    Our department is a member of the CLARIN network through our active involvement in the CLARA and CLARINO projects.

    For more information:

    CLARIN

  • NHH Termbase

    NHH TERMBASE is a Norwegian-English term base developed as a teaching resource within the economic-administrative subject area.

    The termbase consists of central concepts and their associated Norwegian and English terminologies.

    For more information: 

    NHH termbase

  • Maritim Ordbok

    Maritim ordbok is a terminology project that aims to create a common open-access termbase for Norwegian terminology linked to English and other languages within the maritime sector.
    One of the goals is to maintain the Norwegian language within a sector that is becoming steadily more international and more influenced by English. At present, no updated Norwegian terminology exists within this domain, which is very important for Norwegian trade and industry.

    Terminology from central maritime domains, such as the fishing industry, shipping and marine biology, is included.

    The termbase is available through the national terminological infrastructure (Termportalen) being developed at NHH within the CLARINO network.

    The project is funded by the Bergesen Charitable Foundation, the Norwegian Ministry of Culture, the Free Word Foundation, the Norwegian Language Council, NHH and the Shipowners’ Association fund at NHH.

  • The Forskning.no corpus

    The Forskning.no corpus is a large Norwegian corpus of popular science articles published online. The corpus contains 21 million tokens from all areas of science collected from the news service Forskning.no. It is a valuable resource for research on popular science as a genre, for the study of terminology and language for specific purposes, as well as for applications such as term extraction, ontology building, etc.

    The corpus is accessible via the Corpuscle infrastructure in CLARINO (requires FEIDE/CLARIN logon):

    http://clarino.uib.no/korpuskel/corpus-list

    The compilation procedure is the same as that used for the large Norwegian Newspaper Corpus (Andersen & Hofland 2012), with steps such as automatic harvesting of web texts, boilerplate removal, conversion and tagging of texts. The texts are tagged with the Oslo-Bergen tagger, and metadata such as author, author’s gender, date, source and domain/subject area is available and searchable in the Corpuscle interface to the corpus.

    The project is a cooperation between NHH and Uni Research Computing and is funded by Småforsk and NHH.

    Contact: Gisle Andersen, NHH

    Reference: Andersen, Gisle and Knut Hofland (2012). Building a large monitor corpus based on newspapers on the web. In G. Andersen (ed.) Exploring Newspaper Language - Using the web to create and investigate a large corpus of modern Norwegian. Amsterdam, John Benjamins: 1-30.
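
The compilation steps described for the Forskning.no corpus (boilerplate removal, conversion, attachment of metadata) can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the set of boilerplate tags, the function names and the record layout are assumptions made for illustration, whitespace splitting stands in for real tokenisation, and tagging with the Oslo-Bergen tagger is not reproduced.

```python
from html.parser import HTMLParser

# Tags whose content is treated as boilerplate in this sketch (an assumption).
BOILERPLATE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping anything inside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def strip_boilerplate(html):
    """Return the running text of an HTML page without boilerplate."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def to_record(html, metadata):
    """Convert a harvested page into a corpus record (hypothetical layout)."""
    text = strip_boilerplate(html)
    # Whitespace splitting is a placeholder for proper tokenisation.
    return {"text": text, "tokens": text.split(), **metadata}


page = (
    "<html><head><style>p{}</style></head><body><nav>Menu</nav>"
    "<p>Forskere fant en ny art.</p><footer>Kontakt</footer></body></html>"
)
record = to_record(page, {"source": "forskning.no", "domain": "biologi"})
print(record["text"])
```

In a real setting the tag list would need tuning per source site, and the resulting records would be passed on to a tagger before indexing.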

Completed research projects

  • Mikroøkonomen

    The Mikroøkonomen terminology project developed a termbase consisting of 800 term records in the field of microeconomics (English-Norwegian).

    This has become a useful resource for students, lecturers and researchers of economics to bridge the gap between the literature written in English and the users’ need to acquire equal competence in Norwegian. Mikroøkonomen has since been further developed in the project NHH Termbase.

    The Mikroøkonomen project (completed 2008) received funding from the Research Council of Norway and NHH.

    For more information:
    Mikroøkonomen

  • KB-N KNOWLEDGE BANK OF NORWAY

    The KB-N project developed a concept-oriented, text- and term-based knowledge management system for economic-administrative domains.

    The project included language technology applications for use primarily within translation, documentation and publishing. It incorporated a 30-million-word parallel corpus of Norwegian and English economic-administrative texts from about 30 subdomains, as well as a bilingual termbase of some 8,400 term records. As part of the project, an automatic term extractor module for Norwegian was developed.

    The 3-year KB-N project (completed 2006) was funded by the Research Council of Norway within its KUNSTI programme for language technology.

  • CLARA

    CLARA (Common Language Resources and their Applications) was an EU-funded project that carried out research on theoretical, methodological and technical topics relating to the task of harmonising language resources and terminology for professional domains.

    Professional knowledge domains, such as economy, energy and medicine, present special challenges to correct understanding, especially across languages.

    In this project a large corpus of English and Spanish Free Trade Agreements was compiled and experiments were performed with semi-automatic extraction of term candidates and specialised collocation candidates.

    The work focused specifically on parallel corpora, computational terminology and phraseology.

    For further information: 
    CLARA
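
The semi-automatic extraction of term candidates mentioned for CLARA and KB-N can be illustrated with a minimal Python sketch. It is not the method used in those projects: it ranks word n-grams by frequency and drops those that begin or end with a stopword, and the stopword list, function name and thresholds are assumptions made for illustration. The "semi-automatic" part is that a terminologist would still review the resulting list.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a project resource).
STOPWORDS = {
    "the", "of", "and", "a", "an", "in", "to", "for",
    "is", "are", "on", "by", "with", "shall",
}


def term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs for n-grams of up to max_len words."""
    # Lowercase word tokens; sentence boundaries are ignored in this sketch,
    # so an n-gram may span two sentences.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


sample = (
    "The rules of origin apply to goods. "
    "Rules of origin are set by the parties. Customs duties apply."
)
candidates = term_candidates(sample)
for term, freq in candidates:
    print(freq, term)
```

Stopwords are allowed inside a candidate so that multiword terms such as "rules of origin" survive. A production extractor would typically add part-of-speech patterns and a comparison against a general-language reference corpus.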