Weblinks

ONLINE CORPORA

There are a number of corpora of English which are available online for free. To find them you can simply search for ‘online corpora’ in your search engine. However, here are a few of the corpora available, with the size (in millions of words), dates and Internet address:

Collins Wordbanks Online, 550m, 1980s to present; http://www.collinslanguage.com
(free trial only)

British National Corpus, 100m, early 1990s; http://www.natcorp.ox.ac.uk/
(there is a limit of 50 lines)

Corpus of Contemporary American English, 425m, 1990–2011; http://corpus.byu.edu/coca

Google Books, 155m, 1810–2009 (books only); http://googlebooks.byu.edu/

These are generalised corpora: most include some spoken English and cover different registers/genres (e.g. newspapers, academic writing, novels). There are many more specialised corpora (e.g. of learner English, of particular varieties of English).

The most obvious use you will want to make of them is to call up some examples, or concordance lines, of a word to check on the way it is used (as suggested in one of the activities following D7). The programmes associated with the corpora will do this kind of search (sometimes called KWIC, ‘keyword in context’) very quickly. Bear in mind, though, that some words, such as ‘the’, are very frequent, so you might need to limit the number of lines that you request. Other possible refinements of a search are:

  • to order or ‘sort’ the concordance lines according to a number of criteria, the most common being to arrange the lines alphabetically according to the word following or preceding the keyword
  • to extend the line to see more context: most corpora initially offer around 100 characters (letters, punctuation, gaps), which is adequate for most purposes
  • to search for phrases (or ‘strings’) as well as words
  • to search for ‘lemmas’ (word-families, for example run, runs, ran, running)
  • to use ‘wildcards’, e.g. recogni* will yield recognise etc. as well as recognition
  • to search for words according to word class and sub-class (e.g. run as a noun)
  • to search for collocates, i.e. words that are commonly found together with the keyword (but not necessarily next to it)
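To see what some of these refinements involve, here is a minimal sketch in Python. It is not how any of the corpora listed above actually work: the sample text is invented, the tokenizer is deliberately crude, and all the function names (tokenize, concordance, sort_by_following_word, collocates) are made up for illustration. It shows a wildcard search, concordance lines with a few words of context on each side, sorting by the words following the keyword, and a simple count of collocates.

```python
import re
from collections import Counter

# A tiny invented text standing in for a real corpus (an assumption for this sketch).
SAMPLE_TEXT = (
    "She did not recognise the man at first. "
    "Public recognition came late in his career. "
    "They recognised the need for change."
)


def tokenize(text):
    """Split text into lower-case word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def wildcard_to_regex(term):
    """Convert a search term such as 'recogni*' into a regular expression."""
    return re.compile(re.escape(term).replace(r"\*", ".*"))


def concordance(tokens, term, width=3):
    """Return (left context, keyword, right context) for every match of term."""
    regex = wildcard_to_regex(term)
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, token, right))
    return lines


def sort_by_following_word(lines):
    """Order concordance lines alphabetically by the words after the keyword."""
    return sorted(lines, key=lambda line: line[2])


def collocates(lines):
    """Count the words that occur within the context window of the keyword."""
    counts = Counter()
    for left, _, right in lines:
        counts.update(left + right)
    return counts


tokens = tokenize(SAMPLE_TEXT)
lines = concordance(tokens, "recogni*")

# Print the sorted lines with the keyword aligned in the middle, KWIC-style.
for left, keyword, right in sort_by_following_word(lines):
    print(f"{' '.join(left):>20}  [{keyword}]  {' '.join(right)}")
```

Real corpus tools measure context in characters rather than words, and usually rank collocates with a statistical measure rather than a raw count; the sketch uses the simplest option in each case.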

 

There are many other things that such programmes can do, such as produce frequency lists.
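A frequency list, for instance, is simply a count of how often each word form occurs, ordered from most to least frequent. The following is a rough sketch in Python under the same assumptions as before: the sample sentence is invented and frequency_list is a hypothetical helper, not a function offered by any of the corpora above.

```python
import re
from collections import Counter


def frequency_list(text, top=None):
    """Return (word, count) pairs, most frequent first; top limits the length."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)


# An invented sample sentence standing in for a corpus.
sample = "The cat sat on the mat. The dog sat too."
for word, count in frequency_list(sample, top=3):
    print(word, count)
```

Even in so short a sample ‘the’ comes out on top, which is why very frequent words need a limit on the number of lines requested.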