Given a list of URLs pointing to LMF files, this web service merges them into a single LMF file. It works for LMF files that encode the information in the same way, i.e. with the same labels, values and structure. This works, for example, for merging different lexica learnt on the PANACEA platform...
BASYQUE (Base de Données Syntaxique Basque, i.e. Basque Syntactic Database) is the web application we have developed to store, organize, manage and search all the information concerning dialectal variation in Basque-speaking areas, specifically in the North-Eastern Basque dialects. In order to collect and analyze...
Application that finds words ending in a character sequence given by the user. BertsolarIXA finds not only lemmas but also inflected forms. Results can be filtered by domain, and phonetic rules can also be applied. It is a tool intended to help verse-makers.
Word-Sense Disambiguation. The WSD system is based on the well-known Support Vector Machine (SVM) algorithm. It has been trained on the EuSemCor corpus (the only semantically tagged Basque corpus). Due to the corpus's small size, the WSD system has been trained for 402 polysemous...
This WS creates an alignment file by combining the Hunalign output with two sentence ID lists extracted from GrAF documents.
FreeLing-based chunk parser. The languages supported are English, Catalan, Spanish, Asturian and Galician. WARNING: This WS has a new version.
IXA pipes is a modular set of Natural Language Processing tools (or pipes) that provides easy access to NLP technology for English and Spanish. It offers robust and efficient linguistic annotation to both researchers...
This WS provides a FreeLing-based sentence splitter. It splits a UTF-8 encoded plain-text file into units (tokens); output sentences are separated by empty lines. The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, Russian and Portuguese...
This WS performs basic text transformations on an input text. The service is based on the 'sed' program, a Unix utility that parses and transforms text using a simple, compact programming language.
This WS converts the output of the IULA tagger (a PoS tagger) into GrAF format.
This WS deploys a FreeLing-based text tokenizer. It splits a UTF-8 encoded plain-text file into units (tokens). The languages supported are Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh, and Asturian. WARNING: This WS has a new version.
This WS converts a corpus into a Weka vector file in ARFF format. The languages supported are Asturian, Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, and Welsh.
This WS segments text into minor structural units (titles, paragraphs, sentences, etc.), detects entities not found in a dictionary (numbers, abbreviations, URLs, emails, etc.), and keeps sequences of two or more words together in a single block (dates, phrases, etc.)...
This WS provides a FreeLing-based sentence splitter (v 3.0). It splits a UTF-8 encoded plain-text file into units (tokens), one per line; output sentences are separated by empty lines. The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician...
This WS provides a FreeLing-based part-of-speech tagger (v 3.0). Job duration depends on the server load; tagging approximately 1 million words takes about one minute. The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, and Portuguese. The output is a tabular...
Given a list of lemmas, the WS looks up their occurrences in the IULA corpus, applies the given regular expressions and returns all the signatures.
This WS deploys a FreeLing-based text tokenizer (v 3.0). It splits a UTF-8 encoded plain-text file into units (tokens), one per line. The languages supported are Catalan, English, Galician, Italian, Portuguese, Russian, Spanish, Welsh, and Asturian...
The IULA tokenizer WS splits a UTF-8 encoded plain-text file into units (tokens). The languages supported are Catalan and Spanish.
This WS collects all the headers of input XML files used in a Taverna workflow. The metadata that can be stored in the resulting XML file are: 1) workflow name, 2) workflow myExperiment link, 3) processors list, and 4) list of XML headers.
This WS allows querying an already indexed corpus (see CQP indexer WS for indexing details). The WS is based on the IMS Open Corpus Workbench (CWB). Language independent WS.
CQP indexer WS based on the IMS Open Corpus Workbench (CWB). The input is an annotated corpus in tabular format. The output is the Corpus ID to be used by the CQPquery Web Service. Language independent WS.
This WS creates a compressed file (in TGZ format) containing output documents stored on this same server, identified by their URLs.
This WS converts the character encoding of given files from one encoding to another. It is based on the Linux 'iconv' command, which converts text between character encodings.
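The decode/re-encode step that the service delegates to iconv can be illustrated with Python's standard codecs. This is a minimal sketch, not the service's code; the function name `convert_encoding` and the strict error policy are assumptions.

```python
def convert_encoding(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Decode bytes from one character encoding and re-encode them in another.

    Unmappable input raises UnicodeDecodeError/UnicodeEncodeError, which is
    roughly like iconv failing without //IGNORE or //TRANSLIT.
    """
    text = data.decode(from_enc)  # bytes -> Unicode code points
    return text.encode(to_enc)    # code points -> bytes in the target encoding


if __name__ == "__main__":
    latin1 = "Iruñea".encode("iso-8859-1")
    print(convert_encoding(latin1, "iso-8859-1", "utf-8").decode("utf-8"))
```

The same two-step pattern works for any pair of encodings Python's codec registry knows.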
A command line tool for applying XSLT stylesheets to XML documents.
This WS is a Panacea project converter that creates GrAF elements from dependency parser output.
This WS is a Panacea project converter that creates GrAF documents from the output of PoS taggers (Freeling and IULA tagger).
A WS to convert MS Word documents to plain text format. Language independent WS.
This WS is a Panacea project converter that creates a GrAF skeleton from BASIC XCES documents.
Converts the character encoding of given files from one encoding to another. Based on the Linux 'iconv' command.
This is the Panacea conversion tool.
A WS that converts a BasicXCES text corpus into plain text (.TXT).
A WS to convert HTML documents to plain text format. Language independent WS.
A WS to convert PoS Tagger formats to XCES.
Given an XML signatures file (signatures.xml) and an indicators file (indicators.txt) listing the nouns that do or do not belong to the class, this WS creates an ARFF file for experiments with Weka. Warning: the default encoding for input and output files is ISO-8859-1. It may be changed...
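As a rough illustration of the target format, the sketch below writes an ARFF file from per-noun cue counts. The relation name, the `noun` string attribute, the cue names and the `yes`/`no` class values are hypothetical; the real service derives them from signatures.xml and indicators.txt, whose layout is not described here.

```python
from typing import List, Sequence, Tuple


def to_arff(relation: str, cue_names: Sequence[str],
            rows: List[Tuple[str, Sequence[int], str]]) -> str:
    """Build ARFF text: one NUMERIC attribute per cue plus a nominal class.

    rows holds (noun, counts aligned with cue_names, 'yes' or 'no').
    Cue names are assumed to be simple identifiers (no spaces).
    """
    lines = [f"@RELATION {relation}", "", "@ATTRIBUTE noun STRING"]
    for cue in cue_names:
        lines.append(f"@ATTRIBUTE {cue} NUMERIC")
    lines.append("@ATTRIBUTE class {yes,no}")
    lines += ["", "@DATA"]
    for noun, counts, label in rows:
        quoted = "'" + noun.replace("'", "\\'") + "'"  # escape quotes in strings
        lines.append(",".join([quoted] + [str(c) for c in counts] + [label]))
    return "\n".join(lines) + "\n"
```

To respect the default encoding mentioned above, the result could be written with `open(path, "w", encoding="iso-8859-1")`.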
This WS converts PDF documents to plain text format. Language independent WS.
Given a training set encoded as vectors of cue (or feature) occurrences, this web service estimates the parameters P(cue_i|class): the probability of seeing each cue as a member or non-member of the class. This estimation is performed using Bayesian inference, which combines prior knowledge...
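The description does not say which prior the service uses. A common choice for a present/absent cue is a Beta(alpha, beta) prior, whose posterior mean is (k + alpha) / (n + alpha + beta), where k is the number of instances in the group showing the cue and n the group size. The sketch below assumes that prior (with Laplace smoothing, alpha = beta = 1, as default) and is not the service's implementation.

```python
from typing import Dict, List, Sequence


def estimate_theta(vectors: Sequence[Sequence[int]], labels: Sequence[bool],
                   alpha: float = 1.0, beta: float = 1.0) -> Dict[bool, List[float]]:
    """Posterior-mean estimate of P(cue_i | class) and P(cue_i | not class)
    under an assumed Beta(alpha, beta) prior on each Bernoulli cue."""
    n_cues = len(vectors[0])
    result: Dict[bool, List[float]] = {}
    for cls in (True, False):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        n = len(members)
        # k_i: how many instances of this group show cue i at least once
        k = [sum(1 for v in members if v[i] > 0) for i in range(n_cues)]
        result[cls] = [(k_i + alpha) / (n + alpha + beta) for k_i in k]
    return result
```

Unlike a plain relative frequency, the prior keeps every estimate strictly between 0 and 1, even for cues never seen in a group.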
This WS identifies process nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This web service creates a Weka file containing context information for a list of nouns in a given corpus. The context information for each noun is extracted using a set of regular expressions and encoded in one vector (one line per noun in the Weka file). Each slot in the vector...
Given two LMF files, this web service merges them into a single LMF file. It works for LMF files that encode the information in the same way, i.e. with the same labels, values and structure. This works, for example, for merging different lexica learnt on the PANACEA platform. If the LMF files...
This WS identifies artifact nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This WS identifies eventive nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This WS identifies matter nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This WS identifies abstract nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
Given an LMF file with nouns classified with a score (see the noun classifier Web Services), this WS filters the nouns, keeping those whose confidence is above a desired threshold. Language-independent WS.
This WS identifies semiotic nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This WS identifies location nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This WS identifies human nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This WS identifies social nouns in a part-of-speech tagged text (tagged with the FreeLing Morphosyntactic tagger v 3.0 WS). The classification is performed with a pre-trained decision tree. The output is an LMF file with the classifier prediction for each noun. You can choose to have this prediction...
This web service performs traditional Naive Bayes classification of instances given in a Weka file. It outputs the predicted class for each instance and some statistics about classification performance. The parameters needed as input can be learnt using estimate_bayes...
Given a set of signatures in a Weka file (test_file.arff), this WS classifies them using the parameters estimated for each cue (theta_file.csv).
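Classification with per-cue parameters can be sketched as a Bernoulli Naive Bayes decision rule: pick the class that maximizes log P(class) + sum over cues of log P(x_i | class), treating each cue as present or absent. The dictionary layout of `theta` and `prior` is an assumption; the actual theta_file.csv format is not described here, and the service may use counts rather than presence.

```python
import math
from typing import Dict, Hashable, Sequence


def classify(x: Sequence[int], theta: Dict[Hashable, Sequence[float]],
             prior: Dict[Hashable, float]) -> Hashable:
    """Bernoulli Naive Bayes decision rule.

    theta[c][i] is P(cue_i present | c) and must lie strictly in (0, 1);
    x[i] > 0 means cue i was observed for the instance.
    """
    best_cls, best_score = None, -math.inf
    for cls, probs in theta.items():
        score = math.log(prior[cls])
        for present, p in zip(x, probs):
            # absent cues contribute log(1 - p), present cues log(p)
            score += math.log(p) if present else math.log(1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls
```

Working in log space avoids numerical underflow when there are many cues.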
This WS verifies that a Soaplab web service is Panacea compliant.
Morphological analyzer.
Lemmatizer. Eustagger is a robust, wide-coverage morphological analyser and Part-of-Speech tagger for Basque. The analyser is based on the two-level formalism and has been designed incrementally, with three main modules: the standard analyser, the analyser of linguistic...
This Web Service deploys a FreeLing-based morphological analyzer. The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, Russian and Portuguese. WARNING: This WS has a new version.
This WS is based on the Twitter NLP tool developed by Noah's ARK group (Noah Smith's research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University): a fast and robust Java-based tokenizer and part-of-speech tagger for Twitter, its training...
This Web Service deploys a FreeLing-based morphological analyzer (v 3.0). The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, Russian and Portuguese.
This WS provides a FreeLing-based part-of-speech tagger. Job duration depends on the server load; tagging approximately 1 million words takes about one minute. The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, and Portuguese. WARNING: This WS has a new version.
This WS is a morphosyntactic tagger. The disambiguation process is done by a TreeTagger instance trained by the IULA. The input is plain text in Catalan or Spanish. The output allows optional formats and optional encoding. (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
Eihera is a system for Named Entity recognition and classification in written Basque. The system is designed in four steps: first, the development of a recognizer based on linguistic information represented as finite-state transducers; second, the generation of semi-automatically annotated...
Historians, literary scientists, and others are interested in the semantic interpretation of text. With automatic pre-processing of texts, e.g. named entity recognition, coreference resolution, and dependency parsing, relevant semantic relations can be extracted. The Stuttgart tools...
This WS substitutes proper nouns with tags. This process anonymizes an input text by removing the names of persons, places, corporations, etc. The service automatically calls the FreeLing WS and uses its Named Entity Recognition tool to detect proper nouns. The languages supported...
ContaWords is a web application that reads the words of a text file and decides which part of speech to assign to each word (e.g. credit-Noun-credit but credit-Verb-to_credit). It then counts how many times a word appears in the text in every possible form (credits, credit, credited, etc.)...
This Web Service deploys a FreeLing-based named entity recognizer (v 3.0). The languages supported are English, Catalan, Spanish, Asturian, Welsh, Galician, Italian, Russian and Portuguese.
PML-TQ is a powerful open-source search tool for all kinds of linguistically annotated treebanks, with several client interfaces and two search backends (one based on an SQL database and one based on Perl and the TrEd toolkit). The tool works natively with treebanks encoded in the PML data...
Keeleveeb is a portal where one can run queries on several dictionaries and corpora. There are 12 Estonian monolingual dictionaries, 12 bilingual dictionaries (one of them Estonian), 19 specialty dictionaries, 15 learner dictionaries (bilingual, Estonian-Russian-Estonian), 23 corpora, ...
Given a lemma and a category, this WS returns the sentences of the IULA corpus where this lemma occurs. The user can perform a domain search. The languages supported are Spanish and English.
New version of the corpus search and post-processing tool Glossa. While the old version was tightly coupled to the IMS Corpus Workbench (CWB) and could only search in CWB-encoded corpora, the new version is flexible with respect to search engines and can even search in corpora located...
GrETEL stands for Greedy Extraction of Trees for Empirical Linguistics. It is a user-friendly search engine for the exploitation of treebanks. It comes in two formats: a) example-based search, in which you can use a natural language example as a starting point for searching...
This WS calculates the probability of seeing a linguistic cue given a lexical class (the P(cue|class) value). This probability is computed from the occurrences of cues in a corpus (codified in the signatures file) and information on whether or not these words belong to different...
This WS performs the Count function from Ted Pedersen's Ngram Statistics Package (used to identify word Ngrams that appear in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, the Dice Coefficient, etc.)...
This WS calculates the Term Frequency (TF) and the Inverse Document Frequency (IDF) of a word in a given corpus. Their product, TF-IDF, is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.
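A minimal TF-IDF sketch follows, assuming relative term frequency for TF and idf = log(N / df), where N is the number of documents and df the number of documents containing the word. Several weighting variants exist and the service's exact choice is not stated.

```python
import math
from collections import Counter
from typing import List


def tf_idf(term: str, doc_index: int, corpus: List[List[str]]) -> float:
    """TF-IDF of `term` in corpus[doc_index] (each document is a token list)."""
    doc = corpus[doc_index]
    tf = Counter(doc)[term] / len(doc)          # relative frequency in the document
    df = sum(1 for d in corpus if term in d)    # documents containing the term
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf
```

A word that occurs in every document gets idf = log(1) = 0, so it scores zero no matter how frequent it is.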
Ted Pedersen's Ngram Statistics Package (used to identify word Ngrams that appear in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, the Dice Coefficient, etc.).
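To show what counting followed by an association test involves, the sketch below counts adjacent bigrams and scores each with the Dice coefficient 2 * n11 / (n1p + np1), where n11 is the bigram count, n1p how often the first word starts a bigram and np1 how often the second word ends one. It follows NSP's count-then-score idea in spirit but is a simplified Python illustration, not NSP itself.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bigram_dice(tokens: List[str]) -> Dict[Tuple[str, str], float]:
    """Count adjacent bigrams and score each with the Dice coefficient."""
    pairs = list(zip(tokens, tokens[1:]))
    n11 = Counter(pairs)                      # joint bigram counts
    n1p = Counter(w1 for w1, _ in pairs)      # word as first element
    np1 = Counter(w2 for _, w2 in pairs)      # word as second element
    return {
        (w1, w2): 2 * count / (n1p[w1] + np1[w2])
        for (w1, w2), count in n11.items()
    }
```

For "new york new york city", the bigram ("new", "york") scores 1.0 because "new" always precedes "york" and "york" is always preceded by "new".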
Given a training set encoded as vectors of cue (or feature) occurrences in Weka format, this web service computes P(cue_i|class), the probability of seeing each cue as a member or non-member of the class, using a maximum likelihood estimation (MLE) approach, i.e. counting how often each cue appears in each class...
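The MLE approach amounts to relative frequencies. The sketch below assumes P(cue_i | class) is the fraction of instances of the class in which cue i occurs at least once; the service may instead normalize raw occurrence counts, which the description does not specify.

```python
from typing import Dict, List, Sequence


def mle_theta(vectors: Sequence[Sequence[int]],
              labels: Sequence[str]) -> Dict[str, List[float]]:
    """Maximum likelihood estimate of P(cue_i | class) as relative frequencies."""
    theta: Dict[str, List[float]] = {}
    for cls in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        n = len(members)
        theta[cls] = [sum(1 for v in members if v[i] > 0) / n
                      for i in range(len(vectors[0]))]
    return theta
```

Cues never observed in a class get probability 0, which is one reason smoothed or Bayesian estimates are often preferred for small training sets.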
This WS calculates different lexicometric measures and displays them graphically (tokens, types, hapaxes and type/token ratio). The input is a plain text corpus with one token per line. Language independent WS.
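The four measures can be computed directly from a one-token-per-line corpus. This sketch assumes that blank lines are skipped and that tokens are compared as exact strings (no lowercasing); the service may normalize differently.

```python
from collections import Counter
from typing import Dict, Iterable


def lexicometrics(lines: Iterable[str]) -> Dict[str, float]:
    """Tokens, types, hapaxes and type/token ratio for one-token-per-line input."""
    counts = Counter(line.strip() for line in lines if line.strip())
    tokens = sum(counts.values())
    types = len(counts)
    hapaxes = sum(1 for c in counts.values() if c == 1)  # words seen exactly once
    return {
        "tokens": tokens,
        "types": types,
        "hapaxes": hapaxes,
        "ttr": types / tokens if tokens else 0.0,
    }
```

Because the type/token ratio falls as a text grows, it is most meaningful when comparing samples of similar length.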
This WS allows analyzing an already indexed corpus (see CQP indexer WS for indexing details). The WS returns an Excel file with some statistical metrics such as number of nouns, verbs, ngrams, etc. The languages supported are Spanish and English.
The file espmalt-1.0.mco contains a single MaltParser configuration for parsing Spanish text. The parser presupposes that the input is in CoNLL-X format and tagged with the part-of-speech tags of the FreeLing tagger.
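Preparing parser input means writing each token as a line of ten tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with a blank line after each sentence. The sketch assumes the coarse tag is the first character of the FreeLing tag; how espmalt-1.0.mco expects CPOSTAG and FEATS to be filled is not stated, so check against its training data.

```python
from typing import List, Tuple


def to_conllx(sentence: List[Tuple[str, str, str]]) -> str:
    """Format one tagged sentence of (form, lemma, FreeLing tag) triples as
    CoNLL-X parser input, leaving the dependency columns as '_'."""
    rows = []
    for i, (form, lemma, tag) in enumerate(sentence, start=1):
        cpos = tag[0]  # assumption: coarse tag = first character of the FreeLing tag
        cols = [str(i), form, lemma, cpos, tag, "_", "_", "_", "_", "_"]
        rows.append("\t".join(cols))
    return "\n".join(rows) + "\n\n"  # blank line terminates the sentence
```

For example, `to_conllx([("El", "el", "DA0MS0"), ("gato", "gato", "NCMS000")])` yields two 10-column lines followed by an empty line.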
This WS provides a FreeLing-based chunk parser (v 3.0). It requires plain-text input. The possible output formats are FreeLing, XML, and CQP-ready XML. The languages supported are English, Catalan, Spanish, Asturian and Galician.
This WS deploys a FreeLing-based dependency parser (v 3.0). It requires plain-text input. The possible output formats are FreeLing, XML, and CQP-ready XML. The languages supported are English, Catalan, Spanish, Asturian and Galician.
Chunking for Basque, available as a web service. Zatiak performs shallow syntactic analysis of a sentence. The program reads an input text and, after morphological processing, identifies pieces of text (chunks). Each chunk is marked with its type: nominal phrase (NP or PP) or verb chain...
FreeLing-based dependency parser. The languages supported are English, Catalan, Spanish, Asturian and Galician. WARNING: This WS has a new version.
This WS is based on Ted Pedersen's Text Similarity module. It measures the similarity of two documents based on the number of shared words, scaled by the lengths of the files. The Text Similarity WS computes the F-Measure, the Dice Coefficient, the Cosine, and the Lesk measure. Language-independent...
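The "shared words scaled by length" idea can be sketched as follows. This is a simplified illustration, not the module's code: overlap is counted as the multiset intersection of lowercased whitespace tokens, precision and recall divide it by the lengths of the second and first text, and the Lesk measure (which also rewards multi-word overlaps) is omitted.

```python
import math
from collections import Counter
from typing import Dict


def overlap_scores(text1: str, text2: str) -> Dict[str, float]:
    """Word-overlap similarity: shared token count scaled by text lengths."""
    w1, w2 = text1.lower().split(), text2.lower().split()
    overlap = sum((Counter(w1) & Counter(w2)).values())  # shared tokens (multiset)
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f": 0.0, "dice": 0.0, "cosine": 0.0}
    precision = overlap / len(w2)
    recall = overlap / len(w1)
    return {
        "precision": precision,
        "recall": recall,
        "f": 2 * precision * recall / (precision + recall),
        "dice": 2 * overlap / (len(w1) + len(w2)),
        "cosine": overlap / math.sqrt(len(w1) * len(w2)),
    }
```

With these definitions the F-measure and the Dice coefficient coincide algebraically, since 2PR / (P + R) simplifies to 2 * overlap / (len1 + len2).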
This WS randomizes the order of the translation units in TMX files. The goal is to make it difficult to reproduce the original text. The input size limit is 100 MB. Language independent WS.
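Shuffling translation units only requires reordering the `<tu>` children of the TMX `<body>` element. The sketch below uses Python's standard ElementTree; the function name and the optional seed are illustrative assumptions, and ElementTree does not preserve the XML declaration or DOCTYPE, so a production tool would need to re-add them.

```python
import random
import xml.etree.ElementTree as ET
from typing import Optional


def shuffle_tmx(tmx_text: str, seed: Optional[int] = None) -> str:
    """Return the TMX document with its <tu> elements in random order."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    if body is None:
        raise ValueError("no <body> element found")
    units = body.findall("tu")
    for tu in units:
        body.remove(tu)
    random.Random(seed).shuffle(units)  # seeded RNG makes runs reproducible
    body.extend(units)
    return ET.tostring(root, encoding="unicode")
```

Each `<tu>` keeps all its language variants, so individual translation pairs stay intact while the document order is lost.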
This WS scrambles the lines of a parallel text corpus while preserving the alignment. The goal is to make it difficult to reproduce the original text. The input size limit is 100 MB. Language-independent WS.
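Keeping the alignment while scrambling means applying the same random permutation to both sides. A minimal sketch, assuming the two sides are already loaded as equal-length lists of lines:

```python
import random
from typing import List, Optional, Tuple


def scramble_parallel(src: List[str], tgt: List[str],
                      seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Shuffle two line-aligned lists with one shared permutation, so line i
    of the source output still corresponds to line i of the target output."""
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    pairs = list(zip(src, tgt))
    random.Random(seed).shuffle(pairs)
    return [s for s, _ in pairs], [t for _, t in pairs]
```

Shuffling the zipped pairs, rather than each file separately, is what guarantees the alignment survives.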
This WS scrambles the lines in a file. The goal is to make it difficult to reproduce the original text. The input size limit is 100 MB. Language independent WS.
Given a verb (infinitive or inflected form), this WS outputs its verbal paradigm grouped by tense and mood. The languages supported are Catalan and Spanish.
Given a word form, this WS returns its lexical information by looking it up in the IULA lexicon. The languages supported are Catalan, Spanish and English.
WordTies describes a multilingual wordnet initiative undertaken in the META-NORD/META-NET projects, originally concerned with the validation and pilot linking of Nordic and Baltic wordnets. The builders of these wordnets in the Nordic/Baltic countries have applied very...
A web application that enables simultaneous search in three micro-comparative databases on Dutch dialects via a common interface. This makes it possible to investigate potential correlations between variables at the three different linguistic levels. Cartographic functionality enables...
Basic school dictionary for students (Cuba). Besides ordinary dictionary lookup, the GUI of the Diccionario Básico Escolar allows users to detect the most common misspellings, consult verb conjugations, see the syllabification of headwords and, in some cases, view illustrations attached to the entries...
Machine translation from Spanish to Basque. Matxin is a transfer-based MT system from Spanish into Basque. It is an open, reusable and interoperable framework that could be improved in the near future by combining it with the statistical model. The MT architecture reuses several open tools...
Statistical Machine Translation from Spanish to Basque. The system uses segmentation and reordering, which allows it to achieve a relative improvement of 10% in the HTER metric.
A Question-Answering system for the area of Science and Technology. Ihardetsi is a question answering system for Basque. It is a general platform whose architecture pays special attention to: 1) the integration of the development and evaluation environments, and 2) the systematic use of...
Online spelling corrector. Xuxen is a spelling corrector for Basque integrated in MS Office, OpenOffice, Firefox, OCR programs and others. It can be downloaded from the Basque Government's website (more than 25,000 downloads). Eleka is the company that now manages it. The fact that Basque is a...
This WS performs dependency parsing using Bohnet's graph-based parser. The input is plain text or text in CoNLL format. The languages supported are English and Spanish.
This WS calls an instance of MaltParser for Spanish trained on the IULA treebank developed in the Metanet4you project. The input of this WS is plain text. The service performs PoS tagging with FreeLing and then dependency parsing with MaltParser. The output follows CoNLL...
Statistical dependency parser. Given a set of Basque sentences, one per line, it produces a dependency analysis of the sentences in a format equivalent to the CoNLL format (although not identical, as the columns appear in a different order).
This WS extracts a column from a tabular input text file. It is useful for working with CoNLL- or FreeLing-annotated corpora. Language-independent WS.
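Column extraction from tabular annotation output can be sketched as below. It assumes 0-based column indices and passes blank lines through as empty strings, so sentence boundaries in CoNLL-style files survive. CoNLL files are tab-separated, while FreeLing's plain output is typically space-separated, hence the `sep` parameter; lines with too few fields yield an empty string.

```python
from typing import Iterable, Iterator


def extract_column(lines: Iterable[str], index: int, sep: str = "\t") -> Iterator[str]:
    """Yield column `index` (0-based, non-negative) of each line.

    Blank lines, which mark sentence boundaries, are yielded as ''."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            yield ""
            continue
        fields = line.split(sep)
        yield fields[index] if index < len(fields) else ""
```

For example, index 1 of a CoNLL file gives the word forms, and index 2 with `sep=" "` gives the tags in FreeLing's "form lemma tag probability" lines.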
This WS splits an input file into smaller files, each containing the number of lines given as an input parameter. The split files are stored in the public results directory, and the output is a file listing the URLs of the split files. Language-independent WS.
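The chunking step itself is simple. This sketch covers only the splitting; storing each chunk under a public URL and writing the URL list are service-specific and left out. The function name is illustrative.

```python
from typing import List, Sequence


def split_lines(lines: Sequence[str], n: int) -> List[List[str]]:
    """Split a sequence of lines into consecutive chunks of at most n lines."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [list(lines[i:i + n]) for i in range(0, len(lines), n)]
```

The last chunk holds whatever remains, so five lines split by two give chunks of 2, 2 and 1 lines.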
Migmap is a web application in which the user first chooses a generation (forward or backward in time) and gender, while the migration map of the Netherlands related to an interactively selected municipality (or other aggregation unit) is shown. The existing map-making software module "Kaart"...