Genomics IULA Corpus in Catalan Corpus Text

Resource Name Genomics IULA Corpus in Catalan
Description The corpus consists of specialized texts from the genomics domain. This LSP corpus was built from articles in specialized publications, PhD theses, etc. It contains about 950 K words in 134 documents.
Language Name Catalan
Url http://www.iula.upf.edu
Documentation
Annotation Mode Automatic
Annotation Standoff true
Annotation Tool TreeTagger
Annotation Type Morphosyntactic Annotation Pos Tagging
Character Encoding UTF-8
Contact Person Jorge Vivaldi
Creation Mode Automatic
Domain medicine
Funding Project METANET4U – Enhancing the European Linguistic Infrastructure
Identifier IULA_cGenCAT
Language Code http://www.fao.org/aims/aos/languagecode.owl#cat
Language Identifier ca
Licence CC BY-NC-SA
Linguality Monolingual
Media Type Media Type
Meta Share Identifier NOT_DEFINED_FOR_V2
Mime Type http://purl.org/NET/mediatypes/text/xml
Original Source IULACT GENOME
Resource Creator Universitat Pompeu Fabra. Institut Universitari De Lingüística Aplicada (IULA)
Resource Short Name Genomics Catalan
Segmentation Level Word
Size Information http://lodserver.iula.upf.edu/Metashare/resource/size_N13F1A
Tagset MULTEXT/PAROLE