Bootcamp 'Corpus linguistics and/or statistics with R'

July 30 - Aug 5 for the first session: Corpus linguistics with R


1. Bootcamp 'Corpus linguistics with R'
The corpus bootcamp is a 30-hour hands-on introduction to quantitative corpus linguistics for both graduate students and seasoned researchers. Using the open-source software and programming language R, we will learn
−how to generate frequency lists and search for words and patterns;
−how to process corpora and perform corpus-linguistic searches in ways that typical corpus software does not support;
−how to write small functions for recurrent corpus-linguistic tasks.
The data we will work with include plain-text corpora, corpora with SGML or XML annotation, ICE-GB files, and others. Participants will also receive small functions and scripts they can use for their own corpus-linguistic tasks (concordancing, generating n-grams of words or characters, and others).
The content of this corpus linguistics bootcamp is based on Gries (2009e, <http://tinyurl.com/QuantCorpLingWithR>), but it (i) is structured differently to accommodate the workshop format of the bootcamp and (ii) provides functions and examples not discussed in the book.

2. Bootcamp 'Statistics for linguistics with R'
The statistics bootcamp is a 30-hour hands-on introduction to statistical methods for both graduate students and seasoned researchers. Using the open-source software and programming language R, we will
−briefly recap basic aspects of statistical evaluation as well as several descriptive statistics;
−discuss monofactorial statistical tests for frequencies, means, dispersions, correlations;
−explore different kinds of multifactorial and multivariate methods, in particular different kinds of regression approaches as well as hierarchical cluster analysis.
For all statistical methods to be explored, we will discuss how to test their assumptions and how to visualize their results with clear, annotated statistical graphs; in some cases we will reanalyze published data from corpus-linguistic studies. Participants will also receive small functions they can use for their own statistical applications. There will also be a short section on writing your own small statistical and visualization functions.
The content of this statistics bootcamp is based on Gries (2009d, <http://tinyurl.com/StatForLingWithR>), but goes beyond it in terms of the methods and datasets covered; in fact, the bootcamp will use materials currently being integrated into the second edition.