IULA Penn Treebank Corpus Text

Resource Name IULA Penn Treebank
Description This treebank consists of Spanish and English sentences that have been manually annotated with syntactic information. The sentences were chosen from the Penn Treebank corpus, a resource containing texts from the Wall Street Journal that was originally compiled by the University of Pennsylvania. It contains 805 sentences that have been translated into Spanish by human translators. Each original English sentence and its Spanish translation share the same identification number. Sentences in both languages have been processed using the DELPH-IN environment (http://www.delph-in.net/).
Language Name
  • English
  • Spanish
Url http://hdl.handle.net/10230/20049
Documentation Annotating Wall Street Journal Texts Using a Hand-Crafted Deep Linguistic Grammar
Annotation Mode Manual
Annotation Standoff false
Annotation Tool http://nlp.lsi.upc.edu/freeling/
Annotation Type Syntactic Annotation: Treebanks
Character Encoding UTF-8
Contact Person Jorge Vivaldi
Creation Mode Automatic
Domain
  • medicine
  • economy
  • environment
  • computer science
  • law
Funding Project METANET4U – Enhancing the European Linguistic Infrastructure
Identifier http://hdl.handle.net/10230/20049
Language Code
Language Identifier
  • en
  • es
Licence CC BY
Media Type Media Type
Meta Share Identifier NOT_DEFINED_FOR_V2
Mime Type http://purl.org/NET/mediatypes/text/xml
Multilinguality Type Parallel
Original Source http://www.cis.upenn.edu/~treebank/
Resource Creator Universitat Pompeu Fabra. Institut Universitari De Lingüística Aplicada (IULA)
Resource Short Name IULA Penn Treebank
Segmentation Level Word
Size Information http://lodserver.iula.upf.edu/Metashare/resource/size_N19484
Tagset MULTEXT/PAROLE
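
Usage note: the Description states that each English sentence and its Spanish translation share the same identification number. The following is a minimal sketch of how such a shared identifier could be used to pair the two sides once the sentences have been loaded. The function name `pair_by_id`, the identifiers `s1` to `s3`, and the example sentences are hypothetical and are not taken from the corpus. The actual file layout of the resource is not described in this record and is not assumed here.

```python
from typing import Dict, List, Tuple


def pair_by_id(
    english: Dict[str, str], spanish: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (sentence_id, english_text, spanish_text) for every
    identifier that occurs on both sides, sorted by identifier."""
    shared_ids = sorted(english.keys() & spanish.keys())
    return [(sid, english[sid], spanish[sid]) for sid in shared_ids]


# Made-up illustration data (NOT real corpus content or real corpus IDs).
english = {"s1": "Prices rose.", "s2": "The market closed."}
spanish = {"s1": "Los precios subieron.", "s3": "Sin pareja."}

pairs = pair_by_id(english, spanish)
for sid, en, es in pairs:
    print(sid, en, es, sep="\t")
```

Identifiers present on only one side are skipped silently in this sketch. A stricter variant could report them instead.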