Lexica

Apertium RDF CA-IT RDF version of the Apertium bilingual dictionary CA-IT. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17105. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF EN-CA RDF version of the Apertium bilingual dictionary EN-CA. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17096. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF EN-ES RDF version of the Apertium bilingual dictionary EN-ES. The original dataset (in LMF) comes from http://hdl.handle.net/10230/1711. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF EN-GL RDF version of the Apertium bilingual dictionary EN-GL. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17109. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF ES-CA RDF version of the Apertium bilingual dictionary ES-CA. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17127. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF ES-GL RDF version of the Apertium bilingual dictionary ES-GL. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17121. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF ES-PT RDF version of the Apertium bilingual dictionary ES-PT. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17101. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF ES-RO RDF version of the Apertium bilingual dictionary ES-RO. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17122. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF EU-ES RDF version of the Apertium bilingual dictionary EU-ES. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17095. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF FR-CA RDF version of the Apertium bilingual dictionary FR-CA. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17097. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF FR-ES RDF version of the Apertium bilingual dictionary FR-ES. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17090. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF OC-CA RDF version of the Apertium bilingual dictionary OC-CA. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17107. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF OC-ES RDF version of the Apertium bilingual dictionary OC-ES. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17124. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF PT-CA RDF version of the Apertium bilingual dictionary PT-CA. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17120. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Apertium RDF PT-GL RDF version of the Apertium bilingual dictionary PT-GL. The original dataset (in LMF) comes from http://hdl.handle.net/10230/17115. Apertium is a free/open-source machine translation platform. The LMF (Lexical Markup Framework) version of their linguistic data (available at http://repositori.upf.edu/discover?scope=%2F&query=Apertium) has served as input for its RDF version. Additional information about Apertium can be found in its wikipedia entry (http://en.wikipedia.org/wiki/Apertium). Apertium RDF contains the RDF (Resource Description Framework) version of the Apertium bilingual dictionaries, which have been transformed into RDF and published on the Web following the Linked Data principles. The core linguistic data has been modelled using the lemon model and the translations between terms have been modelled using the lemon translation module. More about the Apertium RDF dictionaries at http://linguistic.linkeddata.es/apertium/
Asturian LMF Freeling Lexicon This is the LMF version of the Asturian Freeling lexicon. FreeLing is a developer-oriented library providing language analysis services. FreeLing is designed to be used as an external library from any application requiring this kind of service. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line.
Bank of Catalan Neologisms This is the LMF version of the Catalan Bank of Neologisms at the Observatori de Neologia (UPF). The Observatori de Neologia (OBNEO), under the direction of Dr. M. Teresa Cabré, is a publicly funded consolidated group within the Institut Universitari de Lingüística Aplicada at Universitat Pompeu Fabra. This project analyzes the appearance of new words, or neologisms, in usage, for both Catalan and Spanish. Since 1996 it has been recognized as a consolidated research group of Universitat Pompeu Fabra.
Bank of Spanish Neologisms This is the LMF version of the Spanish Bank of Neologisms at the Observatori de Neologia (UPF). The Observatori de Neologia (OBNEO), under the direction of Dr. M. Teresa Cabré, is a publicly funded consolidated group within the Institut Universitari de Lingüística Aplicada at Universitat Pompeu Fabra. This project analyzes the appearance of new words, or neologisms, in usage, for both Catalan and Spanish. Since 1996 it has been recognized as a consolidated research group of Universitat Pompeu Fabra.
Basic Vocabulary of Human Genome This is the LMF version of the Basic Vocabulary of Human Genome from the UPF-IULA, located at http://www.iula.upf.edu/rec/vbgenoma/esp/index.html. The project Vocabulary of the Human Genome (Biotechnology 2), approved in the 2003 plenary meeting of REALITER, incorporates the basic terminology most used in texts about genomics. The vocabulary presents selected entries in English and their equivalents in peninsular and Latin American Spanish, French, Italian, Galician, Portuguese and Catalan. Along with the information on equivalents, users can find grammatical information and synonyms documented as variants in each of the languages.
Basque EuroWordNet-Lemon This is the Basque EuroWordNet-Lemon lexicon. The lexicon was created from the Basque Word-Net-LMF lexicon which is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR). The lexicon conforms to the 'lemon' specification.
Basque LMF Apertium Dictionary This is the LMF version of the Basque Apertium dictionary. Monolingual dictionaries for Spanish, Catalan, Galician and Basque have been generated from the Apertium expanded lexicons of the es-ca (for both Spanish and Catalan), es-gl (for Galician) and eu-es (for Basque). Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Basque WordNet-LMF This is the Basque Word-Net-LMF lexicon. The Basque lexicon is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR) and contains 23399 lexical entries. The MCR currently integrates in the same EuroWordNet framework wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. Its format was defined during the KYOTO Project (http://www.kyoto-project.eu/). The lexicon validates against the kyoto_wn.dtd, which is also included in this distribution. The kyoto_wn.dtd is LMF compliant.
Basque-English LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Basque and English languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Basque-Spanish LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Basque and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Catalan EuroWordNet-Lemon This is the Catalan EuroWordNet-Lemon lexicon. The lexicon was created from the Catalan Word-Net-LMF lexicon which is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR). The lexicon conforms to the 'lemon' specifications and it is linked to the Catalan Parole/Simple lemon lexicon.
Catalan LMF Apertium Dictionary This is the LMF version of the Catalan Apertium dictionary. Monolingual dictionaries for Spanish, Catalan, Galician and Basque have been generated from the Apertium expanded lexicons of the es-ca (for both Spanish and Catalan), es-gl (for Galician) and eu-es (for Basque). Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Catalan LMF Freeling Lexicon This is the LMF version of the Catalan Freeling lexicon. FreeLing is a developer-oriented library providing language analysis services. FreeLing is designed to be used as an external library from any application requiring this kind of service. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line.
Catalan LMF Freeling Sense This is the LMF version of the Catalan Freeling Sense. FreeLing is a developer-oriented library providing language analysis services. FreeLing is designed to be used as an external library from any application requiring this kind of service. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line. The original Catalan and Spanish sense dictionaries are extracted from EuroWordNet, and the reduced subsets included in this FreeLing package are distributed under the GNU GPL license.
Catalan LMF Parole/Simple Lexicon This is the LMF version of the Catalan Parole-Simple lexicon. The original PAROLE lexica (20,000 entries per language) were built in conformance with a model based on EAGLES guidelines and GENELEX results, underlying a common lexical tool adapted from the EUREKA-GENELEX project. This software tool was extended to support the PAROLE model and conversion and management processes of the resulting resources. The languages involved in PAROLE lexica are: Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish. The goal of the SIMPLE project was to add semantic information, selected for its relevance for LE applications, to the set of harmonised multifunctional lexica built for 12 European languages by the PAROLE consortium. PAROLE+SIMPLE lexicons contain morphological, syntactic and semantic information, organised according to a common model and to common linguistic specifications.
Catalan PAROLE-SIMPLE lemon Lexicon The Catalan Parole/Simple 'lemon' Lexicon is the OWL version of the Catalan Parole & Simple lexicons (defined during the PAROLE LE2-4017 and SIMPLE LE4-8346 projects) once mapped to the Lexinfo Model (http://lexinfo.net/). This data set has been published as Linked Open Data in the Data Hub (http://thedatahub.org/en/dataset/parole-simple-ont). The goal of the SIMPLE project was to add semantic information, selected for its relevance for LE applications, to the set of harmonised multifunctional lexica built for 12 European languages by the PAROLE consortium. PAROLE+SIMPLE lexicons contain morphological, syntactic and semantic information, organised according to a common model and to common linguistic specifications. The Catalan lexicon includes 20,545 entries annotated with syntactic information, half of which are also annotated with semantic information.
Catalan WordNet-LMF This is the Catalan Word-Net-LMF lexicon. The Catalan lexicon is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR) and contains 23399 lexical entries. The MCR currently integrates in the same EuroWordNet framework wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. Its format was defined during the KYOTO Project (http://www.kyoto-project.eu/). The lexicon validates against the kyoto_wn.dtd, which is also included in this distribution. The kyoto_wn.dtd is LMF compliant.
English LMF Apertium Dictionary This is the LMF version of the Apertium English dictionary. The monolingual dictionary for English was generated from the Apertium expanded lexicon of the en-es pair system (English/Spanish). Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
English-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for English and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
English-Galician CLUVI Dictionary This is the LMF version of the English-Galician CLUVI Dictionary developed under the direction of Xavier Gómez Guinovart (2005-2012) from parallel texts in the CLUVI Corpus of the University of Vigo.
English-Galician LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for English and Galician languages. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
English-Spanish LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for English and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Esperanto-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Esperanto and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Esperanto-English LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Esperanto and English languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Esperanto-French LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Esperanto and French languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Esperanto-Spanish LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Esperanto and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
French LMF Apertium Dictionary This is the LMF version of the Apertium French dictionary. The monolingual dictionary for French was generated from the Apertium expanded lexicon of the fr-es pair system (French/Spanish). Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
French-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for French and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
French-Spanish LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for French and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Galician EuroWordNet-Lemon This is the Galician EuroWordNet-Lemon lexicon. The lexicon was created from the Galician Word-Net-LMF lexicon which is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR). The lexicon conforms to the 'lemon' specification.
Galician LMF Apertium Dictionary This is the LMF version of the Galician Apertium dictionary. Monolingual dictionaries for Spanish, Catalan, Galician and Euskera have been generated from the Apertium expanded lexicons of the es-ca (for both Spanish and Catalan) es-gl (for Galician) and eu-es (for Basque). Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Galician LMF Freeling Lexicon This is the LMF version of the Galician Freeling dictionary. FreeLing is a developer-oriented library providing language analysis services. FreeLing is designed to be used as an external library from any application requiring this kind of services. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line.
Galician WordNet-LMF This is the Galician Word-Net-LMF lexicon. The Galician lexicon is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR) and contains 23399 lexical Entries. The MCR currently integrates in the same EuroWordNet framework wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. Its format was defined during the KYOTO Project (http://www.kyoto-project.eu/). The lexicon validates against the kyoto_wn.dtd which is also included in this distribution. The kyoto_wn.dtd is LMF compliant.
Italian LMF Apertium Dictionary This is the LMF version of the Apertium Italian dictionary. Monolingual dictionary for Italian was generated from the Apertium expanded lexicon of the es-it. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Italian-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Italian and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Italian-Spanish LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Italian and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Spanish). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
KNAB, Place Names Database The Place Names Database (KNAB) is a systematic computerized collection of data on geographical names from both Estonia and abroad that is being developed at the Institute of Estonian Language. Its purpose is to facilitate the study and standardization of geographical names by providing data on their history and modern use. It has been planned as a linguistically-oriented database, to enable the compilation and preparation of different gazetteers and dictionaries. The data of KNAB may be freely used provided that the source is quoted. Extensive usage of the data (e.g. if a monograph, gazetteer or a map is based on the data of KNAB) will be possible on the presumption that the compiler of the database be informed of this usage beforehand. There will be no charge for the use of data; instead, the user might receive information on the completeness and reliability of the data he/she needs.
LMF UPF Term This is the LMF version of the UPF Term resource located at http://www.iula.upf.edu/rec/upfterm/cat/index.htm. The UPF_TERM terminological data bank was created with the purpose of establishing an electronic resource for the consultation and dissemination of terminological projects elaborated by students of the Facultat de Traducció i Interpretació, by the IULA and by other work and research centres in the Universitat Pompeu Fabra. The UPF_TERM bank, hosted on a database server devoted to this end, is freely accessible to students, lecturers and researchers, from the Universitat Pompeu Fabra as well as to outside users. UPF_TERM is the name given to the whole set of resources of this terminology bank, which is divided into several databases. Each database gathers terminological papers with common features, so that queries and their results can be more productive. Each record of each database states the author, the title and the origin of the project.
LMF version of the SenSem Catalan Data Base This is the LMF version of the SenSem database created by the Spanish Inter-University Research Group GRIAL. As part of the SenSem project, a corpus of sentences annotated at the semantic and syntactic levels was created. The source corpus is made up of around 13 million words extracted from the online versions of a Spanish newspaper. From this corpus, 25,000 sentences have been randomly selected, 100 for each of the 250 most frequent verbs in current Spanish. Each sentence has been labelled according to the verb sense it exemplifies, the type of complements it takes (arguments or adjuncts), their syntactic category and function, and finally each argument has been labelled with a semantic role. The sentence has also been annotated as to its semantics, both in relation to aspectual information and to the type of construction being expressed. From this annotated corpus a lexical database of verbs was created in which all the previous information is collected. The unit of description of the verbs is the sense. The description of the verbs includes argument structure, incorporating subcategorization patterns with their frequency information, semantic roles and information regarding sentence semantics. The lexicon and the corpus are associated at sense level and together make up what we call the data bank of the sentential semantics of Spanish verbs. Both resources are available via the web and will form a very important source of linguistic information which we hope will be of use in different areas of natural language processing and linguistic research in general. The LMF conversion has been done by the Universitat Pompeu Fabra.
LMF version of the SenSem Spanish Data Base This is the LMF version of the SenSem database created by the Spanish Inter-University Research Group GRIAL. As part of the SenSem project, a corpus of sentences annotated at the semantic and syntactic levels was created. The source corpus is made up of around 13 million words extracted from the online versions of a Spanish newspaper. From this corpus, 25,000 sentences have been randomly selected, 100 for each of the 250 most frequent verbs in current Spanish. Each sentence has been labelled according to the verb sense it exemplifies, the type of complements it takes (arguments or adjuncts), their syntactic category and function, and finally each argument has been labelled with a semantic role. The sentence has also been annotated as to its semantics, both in relation to aspectual information and to the type of construction being expressed. From this annotated corpus a lexical database of verbs was created in which all the previous information is collected. The unit of description of the verbs is the sense. The description of the verbs includes argument structure, incorporating subcategorization patterns with their frequency information, semantic roles and information regarding sentence semantics. The lexicon and the corpus are associated at sense level and together make up what we call the data bank of the sentential semantics of Spanish verbs. Both resources are available via the web and will form a very important source of linguistic information which we hope will be of use in different areas of natural language processing and linguistic research in general. The LMF conversion has been done by the Universitat Pompeu Fabra.
Multilingual Vocabulary of Economics This is the LMF version of the Multilingual Vocabulary of Economics Resource developed within the frame of the research project RICOTERM2. Financed by the Ministry of Science and Technology, within the Programa Nacional de Promoción General del Conocimiento (2004-2007).
Occitan-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Occitan and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Occitan-Spanish LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Occitan and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
PANACEA Bilingual Glossary German-English with Contextual Transfer Information This glossary contains German entries with multiple English transfers, 22,000 transfers in total. It is lemma-based, annotated with part-of-speech information and with source language conceptual contexts which indicate which translation to select in a given context, to support transfer selection for multiple translation possibilities. It is the result of PANACEA research.
PANACEA Bilingual Glossary German-English with Probabilities This glossary contains German entries with multiple English transfers, 22,000 transfers in total. It is lemma-based, annotated with part-of-speech information and with several probabilities, to support transfer selection for multiple translation possibilities. It is the result of PANACEA research.
PANACEA English Gold Standard for lexical semantic classification We present a set of English gold-standards for different noun classes created in PANACEA to train and test automatic classifiers. To create these gold-standards we used the data from the SemEval 2007 workshop Task 07: Coarse Grained English All-Words (Navigli et al., 2007). The words used in this task were first automatically tagged with an automatic clustering method (Navigli, 2006) using senses based on the WordNet sense inventory and later manually validated by expert lexicographers. For our experiments, we extracted all of the words from this inventory that contained as their first sense a sense that corresponded to the lexical semantic classes, i.e. “people” in the case of the class HUMAN. These gold-standards were created in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
PANACEA English V-SUBCAT gold-standard for ENV domain This is a domain-specific gold-standard for English subcategorization frames, in this case for the environment (ENV) domain. This gold-standard was manually developed, choosing a set of 28 verbs and 200 sentences for each verb. For each sentence, the SCFs present for the studied verb were manually annotated. The sentences were selected from crawled Web pages that were automatically detected to be in the English language and were automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011. This gold-standard was created in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
PANACEA English V-SUBCAT gold-standard for LAB domain This is a domain-specific gold-standard for English subcategorization frames, in this case for the labour (LAB) domain. This gold-standard was manually developed, choosing a set of 29 verbs and 200 sentences for each verb. For each sentence, the SCFs present for the studied verb were manually annotated. The sentences were selected from crawled Web pages that were automatically detected to be in the English language and were automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011. This gold-standard was created in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
PANACEA English automatically acquired lexicon for ENV domain: Lexical Semantic classes for nouns This is a domain-specific lexicon for English for the environment (ENV) domain. This lexicon contains a set of nouns classified into seven different semantic classes. It has been automatically created using the PANACEA web services for noun classification and the crawled data for this domain and language, previously annotated with the FreeLing tagger. The crawled data was obtained by crawling web pages that were automatically detected to be in the English language and were automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011.
PANACEA English automatically acquired lexicon for LAB domain: Lexical Semantic classes for nouns This is a domain-specific lexicon for English for the labour (LAB) domain. This lexicon contains a set of nouns classified into seven different semantic classes. It has been automatically created using the PANACEA web services for noun classification and the crawled data for this domain and language, previously annotated with the FreeLing tagger. The crawled data was obtained by crawling web pages that were automatically detected to be in the English language and were automatically classified as relevant to the LAB domain. Data collection took place in the summer of 2011.
PANACEA English automatically acquired lexicon for ENV domain: Subcategorization Frames (V-SUBCAT) This lexicon was produced using an inductive SCF classifier, the tpc_subcat_inductive webservice in the PANACEA project. The lexicon was automatically produced from the PANACEA MCv2 crawled corpus, by parsing the data with the RASP parser (Third Release, Open-Source Version, February 2001, available from http://ilexir.co.uk; see also E. Briscoe, J. Carroll, and R. Watson, 2006, The Second Release of the RASP System, in Proceedings of COLING/ACL Interactive Presentation Sessions), and then processing the parsed data with tpc_subcat_inductive. Only verb lemmas with at least 200 instances in MCv2 were retained.
PANACEA English automatically acquired lexicon for LAB domain: Subcategorization Frames (V-SUBCAT) This lexicon was produced using an inductive SCF classifier, the tpc_subcat_inductive webservice in the PANACEA project. The lexicon was automatically produced from the PANACEA MCv2 crawled corpus, by parsing the data with the RASP parser (Third Release, Open-Source Version, February 2001, available from http://ilexir.co.uk; see also E. Briscoe, J. Carroll, and R. Watson, 2006, The Second Release of the RASP System, in Proceedings of COLING/ACL Interactive Presentation Sessions), and then processing the parsed data with tpc_subcat_inductive. Only verb lemmas with at least 200 instances in MCv2 were retained.
PANACEA English automatically acquired lexicon for ENV domain: Subcategorization Frames and Lexical Semantic classes for nouns This is a domain-specific lexicon for English for the environment (ENV) domain. This lexicon contains both subcategorization frames for verbs and lexical semantic classes for nouns. This lexicon has been automatically created using PANACEA webservices and crawled data. The crawled data was obtained by crawling web pages that were automatically detected to be in the English language and were automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011.
PANACEA English automatically acquired lexicon for LAB domain: Subcategorization Frames and Lexical Semantic classes for nouns This is a domain-specific lexicon for English for the labour (LAB) domain. This lexicon contains both subcategorization frames for verbs and lexical semantic classes for nouns. This lexicon has been automatically created using PANACEA webservices and crawled data. The crawled data was obtained by crawling web pages that were automatically detected to be in the English language and were automatically classified as relevant to the LAB domain. Data collection took place in the summer of 2011.
PANACEA Environment Bilingual Glossary French-to-English This glossary contains terminology in French-to-English, with a focus on environmental terms, resulting from PANACEA research. It contains about 3846 entries, both single words and multiwords, with part-of-speech information, manually validated.
PANACEA Environment SCF MWE merged Italian Lexicon The Italian PANACEA_ENV_MWE_SCF_merged.lmf.xml lexicon is obtained by merging two automatically extracted lexicons: a domain lexicon (environment) for SCFs, PANACEA_SCF_IT_environment.lmf.xml and a MWE Italian lexicon env-mw.lmf.xml. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Environment Bilingual Glossary EL-EN (Greek-English) This folder contains files for bilingual glossary creation from factored phrase tables that include part of speech tagged text for EL-EN language pair. The tables are firstly filtered using part of speech tag sequences for each language so that entries with unsuitable part of speech sequences are filtered out. Then, feature scores from the phrase table are combined in a log-linear model to score each entry. The user specifies how large the output glossary should be (relative to the input) and the bottom ranking entries are discarded to produce the desired size glossary.
PANACEA Environment Bilingual Glossary FR-EN (French-English) This folder contains files for bilingual glossary creation from factored phrase tables that include part of speech tagged text for FR-EN language pair. The tables are firstly filtered using part of speech tag sequences for each language so that entries with unsuitable part of speech sequences are filtered out. Then, feature scores from the phrase table are combined in a log-linear model to score each entry. The user specifies how large the output glossary should be (relative to the input) and the bottom ranking entries are discarded to produce the desired size glossary.
PANACEA Environment and Parole merged Italian Lexicon The Italian PAROLE_env_merged.lmf.xml is an SCF lexicon obtained by merging two automatically extracted lexicons: a domain lexicon (environment), PANACEA_SCF_IT_environment.lmf.xml, and the SCF Italian lexicon PAROLE_lmf_subcat_ita.lmf.xml, generated from the PAROLE SIMPLE Lexicon. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Environment and Repubblica merged Italian Lexicon The Italian PANACEA_rep_env_merged.lmf.xml is an SCF lexicon obtained by merging two automatically extracted lexicons: a domain lexicon (environment) for SCFs, PANACEA_SCF_IT_environment.lmf.xml, and a general domain SCF Italian lexicon, repubblica.scf_extracted.lmf. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Environment Multi Word Italian Lexicon The Environment MW Italian Lexicon is a lexicon of noun-noun multiword expressions automatically extracted from a 36-million-word web-crawled corpus in the environmental domain. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Italian Parole V-SUBCAT Gold Standard lexicon The PAROLE-SCF-31-IT is a lexicon of verb subcategorisation frames for 31 verb lemmas extracted from the PAROLE Italian Lexicon (Ruimy et al. 2003).
PANACEA Italian V-SUBCAT Repubblica lexicon (language dependent extractor) The OpenDomain SCF Italian Lexicon is a lexicon of verb subcategorisation frames automatically extracted from a 300-million-word newspaper corpus using language-dependent SCF acquisition software. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Italian V-SUBCAT Repubblica lexicon (language independent extractor) This is a lexicon of verb subcategorisation frames automatically extracted from a 300-million-word newspaper corpus using language-independent SCF acquisition software. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Italian V-SUBCAT gold-standard for ENV domain The PANACEA_SCF_Gold_ENV_IT is a manually created gold-standard lexicon of verbal subcategorisation frames for 26 verb lemmas. The language is Italian and the domain is Environment. The lexicon was created through manual annotation of 200 random contexts drawn from the MC_v1_ENV_IT corpus. The gold-standard was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Italian V-SUBCAT Gold Standard lexicon The PANACEA_SCF_Lexicon_ITA_Gold is a manually created gold-standard lexicon of verbal subcategorisation frames for 30 verb lemmas. The language is Italian and the domain is general. The lexicon was manually created using three already existing lexicons, two manually curated (PAROLE-IT and VERBAT) and one automatically created (LexIT). The gold-standard was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Italian V-SUBCAT gold-standard for LAB domain The PANACEA_SCF_Gold_LAB_IT is a manually created gold-standard lexicon of verbal subcategorisation frames for 27 verb lemmas. The language is Italian and the domain is Labour Legislation. The lexicon was created through manual annotation of 200 random contexts drawn from the MCv1_LAB_IT corpus. The gold-standard was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Italian automatically acquired lexicon for ENV domain: Subcategorization Frames (V-SUBCAT) The PANACEA_SCF_IT_ENV is an automatically created lexicon of verbal subcategorisation frames for 26 verb lemmas. The language is Italian and the domain is Environment. The lexicon was acquired from a monolingual corpus automatically crawled, normalised, cleaned, and NLP processed (Corpus name: PANACEA Annotated Italian Environment Corpus Version 2, PANACEA_MCv2_IT_ENV).
PANACEA Italian automatically acquired lexicon for LAB domain: Subcategorization Frames (V-SUBCAT) The PANACEA_SCF_IT_LAB is an automatically created lexicon of verbal subcategorisation frames for 27 verb lemmas. The language is Italian and the domain is Labour Legislation. The lexicon was acquired from a monolingual corpus automatically crawled, normalised, cleaned, and NLP processed (Corpus name: PANACEA Annotated Italian Labour Legislation Corpus Version 2, PANACEA_MCv2_IT_LAB).
PANACEA Labour Bilingual Glossary EL-EN (Greek-English) This folder contains files for bilingual glossary creation from factored phrase tables that include part of speech tagged text for EL-EN language pair. The tables are firstly filtered using part of speech tag sequences for each language so that entries with unsuitable part of speech sequences are filtered out. Then, feature scores from the phrase table are combined in a log-linear model to score each entry. The user specifies how large the output glossary should be (relative to the input) and the bottom ranking entries are discarded to produce the desired size glossary.
PANACEA Labour Bilingual Glossary FR-EN (French-English) This folder contains files for bilingual glossary creation from factored phrase tables that include part of speech tagged text for FR-EN language pair. The tables are firstly filtered using part of speech tag sequences for each language so that entries with unsuitable part of speech sequences are filtered out. Then, feature scores from the phrase table are combined in a log-linear model to score each entry. The user specifies how large the output glossary should be (relative to the input) and the bottom ranking entries are discarded to produce the desired size glossary.
PANACEA Labour Legislation Bilingual Glossary French-to-English This glossary contains terminology in French-to-English, with a focus on labour legislation terms, resulting from PANACEA research. It contains about 2441 entries, both single words and multiwords, with part-of-speech information, manually validated.
PANACEA Labour SCF MWE merged Italian Lexicon The Italian PANACEA_LAB_SCF_MWE_merged.lmf.xml lexicon is obtained by merging two automatically extracted lexicons: a domain lexicon (labour) for SCFs, PANACEA_SCF_IT_labour.lmf.xml and a MWE Italian lexicon lab-mw.lmf.xml. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Labour and Parole merged Italian Lexicon The Italian PAROLE_lab_merged.lmf.xml is an SCF lexicon obtained by merging two automatically extracted lexicons: a domain lexicon (labour), PANACEA_SCF_IT_labour.lmf.xml, and the SCF Italian lexicon PAROLE_lmf_subcat_ita.lmf.xml, generated from the PAROLE SIMPLE Lexicon. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Labour and Repubblica merged Italian Lexicon The Italian PANACEA_rep_lab_merged.lmf.xml is an SCF lexicon obtained by merging two automatically extracted lexicons: a domain lexicon (labour), PANACEA_SCF_IT_labour.lmf.xml, and a general domain SCF Italian lexicon, repubblica.scf_extracted.lmf. The lexicon was produced at CNR-ILC, Pisa, Italy as an outcome of the PANACEA EU-FP7 Funded Project under Grant Agreement 248064 (http://www.panacea-lr.eu).
PANACEA Spanish Gold Standard for lexical semantic classification We present a set of Spanish gold-standards for different noun classes created in PANACEA to train and test automatic classifiers. To create these gold-standards we used the Spanish working lexicon of the Incyta Machine Translation system (Alonso and Bocsák, 2005) to create the abstract, human, location, process, artifact, matter, semiotic and social gold-standard lists. The gold standard for event nouns (non-deverbal eventive nouns) was developed for the experiments of Bel et al. (2010). These gold-standards were created in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
PANACEA Spanish V-SUBCAT Gold Standard lexicon ENV domain This is a domain-specific gold-standard for Spanish subcategorization frames, in this case for the environment (ENV) domain. This gold-standard was manually developed, choosing a set of 30 verbs and 200 sentences for each verb. For each sentence, the SCFs present for the studied verb were manually annotated. The sentences were selected from crawled Web pages that were automatically detected to be in the Spanish language and were automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011. This gold-standard was created in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
PANACEA Spanish V-SUBCAT Gold Standard lexicon LAB domain This is a domain-specific gold-standard for Spanish subcategorization frames, in this case for the labour legislation (LAB) domain. This gold-standard was manually developed, choosing a set of 30 verbs and 200 sentences for each verb. For each sentence, the SCFs present for the studied verb were manually annotated. The sentences were selected from crawled Web pages that were automatically detected to be in the Spanish language and were automatically classified as relevant to the LAB domain. Data collection took place in the summer of 2011. This gold-standard was created in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
PANACEA Spanish automatically acquired lexicon for ENV domain: Lexical Semantic classes for nouns This is a domain-specific lexicon of for Spanish for environment (ENV) domain. This lexicon contains the a set of nouns classified into nine different semantic classes. It has been automatically created using the PANACEA web services for noun classification and the crawled data for this domain and language, previously annotated with FreeLing tagger.
PANACEA Spanish automatically acquired lexicon for LAB domain: Lexical Semantic classes for nouns This is a domain-specific lexicon for Spanish for the labour (LAB) domain. This lexicon contains a set of nouns classified into nine different semantic classes. It has been automatically created using the PANACEA web services for noun classification and the crawled data for this domain and language, previously annotated with the FreeLing tagger.
PANACEA Spanish automatically acquired lexicon for ENV domain: Subcategorization Frames (V-SUBCAT) This is a domain-specific lexicon for Spanish subcategorization frames for the environment (ENV) domain. This lexicon has been automatically created using the PANACEA web service named tpc_subcat_inductive (http://registry.elda.org/services/223) and the crawled data for this domain and language, previously annotated with the Spanish Malt Parser web service (http://lod.iula.upf.edu/resources/249).
PANACEA Spanish automatically acquired lexicon for LAB domain: Subcategorization Frames (V-SUBCAT) This is a domain-specific lexicon for Spanish subcategorization frames for the labour (LAB) domain. This lexicon has been automatically created using the PANACEA web service named tpc_subcat_inductive (http://registry.elda.org/services/223) and the crawled data for this domain and language, previously annotated with the Spanish Malt Parser web service (http://lod.iula.upf.edu/resources/249).
PANACEA Spanish automatically acquired lexicon for ENV domain: Subcategorization Frames and Lexical Semantic classes for nouns This is a domain-specific lexicon for Spanish for the environment (ENV) domain. This lexicon contains both subcategorization frames for verbs and lexical semantic classes for nouns. It has been automatically created by applying PANACEA web services to crawled data.
PANACEA Spanish automatically acquired lexicon for LAB domain: Subcategorization Frames and Lexical Semantic classes for nouns This is a domain-specific lexicon for Spanish for the labour (LAB) domain. This lexicon contains both subcategorization frames for verbs and lexical semantic classes for nouns. It has been automatically created by applying PANACEA web services to crawled data.
PANACEA Spanish multi-level, multi-domain lexicon This is a multi-level, multi-domain lexicon for Spanish. It combines the lexica automatically acquired for the ENV and LAB domains using the PANACEA platform with some manually developed general-domain lexica. The automatically acquired lexica consist of a semantic classification of nouns into 9 different classes and acquired subcategorization frames (SCF) for verbs. This information has been acquired for the LAB and ENV domains using automatically crawled corpora. These automatically acquired lexica were combined (using a multilevel merger) with two general-domain resources: a general-domain SCF gold-standard for Spanish developed in the scope of PANACEA and a morphological dictionary created from existing dictionaries in the Metanet4U project.
PAROLE-SIMPLE LexInfo Ontology The Parole/Simple 'lexinfo' Ontology is the OWL version of the Parole & Simple model (defined during the PAROLE LE2-4017 and SIMPLE LE4-8346 projects) once mapped to the Lexinfo model (http://lexinfo.net/). This data set has been published as Linked Open Data in the Data Hub (http://thedatahub.org/en/dataset/parole-simple-ont). The goal of the SIMPLE project was to add semantic information, selected for its relevance for LE applications, to the set of harmonised multifunctional lexica built for 12 European languages by the PAROLE consortium. PAROLE+SIMPLE lexicons contain morphological, syntactic and semantic information, organised according to a common model and to common linguistic specifications. The original Parole/Simple model expressed in the parole DTD has been mapped into the Lexinfo model (http://lexinfo.net/). Thus the resulting Parole/Simple Ontology imports the Lexinfo Ontology and adds 'parole elements' (classes and/or properties) whenever these cannot be mapped to any 'lexinfo element'.
Portuguese LMF Apertium Dictionary This is the LMF version of the Apertium Portuguese dictionary. The monolingual dictionary for Portuguese was generated from the Apertium expanded lexicon of the pt-ca pair system. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Portuguese-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Portuguese and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Portuguese-Galician LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Portuguese and Galician languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish EuroWordNet-Lemon This is the Spanish EuroWordNet-Lemon lexicon. The lexicon was created from the Spanish WordNet-LMF lexicon, which is part of the Multilingual Central Repository (MCR, http://adimen.si.ehu.es/web/MCR). The lexicon conforms to the 'lemon' specification.
Spanish LMF Apertium Dictionary This is the LMF version of the Apertium Spanish dictionary. Monolingual dictionaries for Spanish, Catalan, Galician and Basque have been generated from the Apertium expanded lexicons of the es-ca (for both Spanish and Catalan), es-gl (for Galician) and eu-es (for Basque) pairs. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish LMF Freeling Sense This is the LMF version of the Spanish Freeling Sense. FreeLing is a developer-oriented library providing language analysis services. FreeLing is designed to be used as an external library from any application requiring this kind of service. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line. The original Catalan and Spanish sense dictionaries are extracted from EuroWordNet, and the reduced subsets included in this FreeLing package are distributed under the GNU GPL license.
Spanish LMF Freeling Lexicon This is the LMF version of the Spanish Freeling lexicon. FreeLing is a developer-oriented library providing language analysis services. FreeLing is designed to be used as an external library from any application requiring this kind of service. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line.
Spanish LMF Parole Lexicon This is the LMF version of the Spanish Parole lexicon. The original PAROLE lexica (20,000 entries per language) were built to conform to a model based on EAGLES guidelines and GENELEX results, underlying a common lexical tool adapted from the EUREKA-GENELEX project. This software tool was extended to support the PAROLE model and the conversion and management processes of the resulting resources. The languages involved in PAROLE lexica are: Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish. The current lexicon contains 64594 lexical entries.
Spanish LMF Parole/Simple Lexicon This is the LMF version of the Spanish Parole-Simple lexicon. The original PAROLE lexica (20,000 entries per language) were built to conform to a model based on EAGLES guidelines and GENELEX results, underlying a common lexical tool adapted from the EUREKA-GENELEX project. This software tool was extended to support the PAROLE model and the conversion and management processes of the resulting resources. The languages involved in PAROLE lexica are: Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish. The goal of the SIMPLE project was to add semantic information, selected for its relevance for LE applications, to the set of harmonised multifunctional lexica built for 12 European languages by the PAROLE consortium. PAROLE+SIMPLE lexicons contain morphological, syntactic and semantic information, organised according to a common model and to common linguistic specifications.
Spanish PAROLE-SIMPLE lemon Lexicon The Spanish Parole/Simple 'lemon' Lexicon is the OWL version of the Spanish Parole & Simple lexicons (defined during the PAROLE LE2-4017 and SIMPLE LE4-8346 projects) once mapped to the Lexinfo model (http://lexinfo.net/). This data set has been published as Linked Open Data in the Data Hub (http://thedatahub.org/en/dataset/parole-simple-ont). The goal of the SIMPLE project was to add semantic information, selected for its relevance for LE applications, to the set of harmonised multifunctional lexica built for 12 European languages by the PAROLE consortium. PAROLE+SIMPLE lexicons contain morphological, syntactic and semantic information, organised according to a common model and to common linguistic specifications. The Spanish Parole/Simple lexicon contains 7,572 entries fully annotated with syntactic and semantic information.
Spanish WordNet-LMF This is the Spanish WordNet-LMF lexicon. The Spanish lexicon is part of the Multilingual Central Repository (MCR, http://adimen.si.ehu.es/web/MCR) and contains 37,876 lexical entries. The MCR currently integrates in the same EuroWordNet framework wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. Its format was defined during the KYOTO Project (http://www.kyoto-project.eu/). The lexicon validates against the kyoto_wn.dtd, which is also included in this distribution. The kyoto_wn.dtd is LMF compliant.
Spanish-Aragonese LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Spanish and Aragonese languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish-Asturian LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Spanish and Asturian languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish-Catalan LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Spanish and Catalan languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish-Galician LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Spanish and Galician languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish-Portuguese LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Spanish and Portuguese languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
Spanish-Romanian LMF Apertium Bilingual dictionary This is the LMF version of the Apertium bilingual dictionary for Spanish and Romanian languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair and linguistic data for a growing number of language pairs.
TRL Spanish V-SUBCAT lexicon: LMF Format Gold-standard for Spanish verbal subcategorization frames. The gold-standard was built by merging two manually developed dictionaries: the Spanish working lexicon of the Incyta Machine Translation system (Alonso and Bocsák, 2005) and the Spanish Resource Grammar (Marimon, 2010). See Necsulescu et al. (2011) for details about the process of merging both dictionaries and about the information encoded in the feature structures.
Termoteca This lexical resource is the LMF version of the Termoteca, a multilingual terminological database based on the monolingual and parallel speciality texts collected in the corpora of the University of Vigo, namely in the CLUVI Corpus and in the Galician Technical Corpus.