
OPUS: the open parallel corpus


http://opus.lingfil.uu.se/

OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and the corpus is also delivered as an open content package. We used several tools to compile the current collection. All pre-processing is done automatically. No manual corrections have been carried out.

Publications

Jörg Tiedemann, 2009,
News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces.
In N. Nicolov, K. Bontcheva, G. Angelova and R. Mitkov (eds.), Recent Advances in Natural Language Processing (vol. V), pages 237-248, John Benjamins, Amsterdam/Philadelphia.
Jörg Tiedemann, 2011,
Bitext Alignment. Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers.
Jörg Tiedemann, 2008,
Synchronizing Translated Movie Subtitles.
In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'2008)
Jörg Tiedemann, 2007,
Building a Multilingual Parallel Subtitle Corpus.
In Proceedings of CLIN 17, Leuven, Belgium, 2007.
Jörg Tiedemann, 2007,
Improved Sentence Alignment for Movie Subtitles.
In Proceedings of RANLP '07, Borovets, Bulgaria, 2007.
Jörg Tiedemann, unpublished
OPUS - an open source parallel corpus.
In Proceedings of the 13th Nordic Conference on Computational Linguistics, University of Iceland, Reykjavik, 2003.
Jörg Tiedemann and Lars Nygaard, 2004,
The OPUS corpus - parallel & free.
In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal, May 26-28.
