Genomics IULA Corpus in Catalan Corpus Text

Resource Name Genomics IULA Corpus in Catalan
Description The corpus consists of specialized texts from the genome domain. This LSP (Language for Specific Purposes) corpus was created from articles in specialized publications, PhD theses, etc. It contains about 950K words in 134 documents.
Language Name Catalan
Annotation Mode Automatic
Annotation Standoff true
Annotation Tool TreeTagger
Annotation Type Morphosyntactic Annotation (POS Tagging)
Character Encoding UTF-8
Contact Person Jorge Vivaldi
Creation Mode Automatic
Domain medicine
Funding Project METANET4U – Enhancing the European Linguistic Infrastructure
Identifier IULA_cGenCAT
Language Code
Language Identifier ca
Licence CC BY-NC-SA
Linguality Monolingual
Media Type Media Type
Meta Share Identifier NOT_DEFINED_FOR_V2
Mime Type
Original Source IULACT GENOME
Resource Creator Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
Resource Short Name Genomics Catalan
Segmentation Level Word
Size Information