Crawling the web (Motoko Ueyama)


Working papers on the Web as Corpus

This book collects articles deriving from presentations at two Web as Corpus workshops (held in Forlì and Birmingham in 2005) and articles that were born out of discussions and collaborative experimentation among the WaCky community members. WaCky (for "Web as Corpus kool ynitiative") brings together linguists who think the World Wide Web is a great resource for their research, and that it would be even greater if it could be annotated and interrogated in a more linguist-friendly way.

Topics covered in this book include practical experiences with the construction and evaluation of Web corpora, methods to classify and represent Web corpora, and applications to terminology. The introduction provides an accessible account of the various steps and issues involved in building very large Web corpora and making them available to the linguistic community. English, Chinese and Japanese are among the languages studied.

Web corpora are undoubtedly a timely and important topic for the corpus/computational linguistics community. This book is unique in that it provides detailed technical discussion of the issues related to constructing Web corpora, as well as examples of concrete applications to terminology practice and teaching. As such, it should be of interest to a wide audience of linguists, language technologists, language/translation teachers and language professionals.

How to quote this book:
Baroni, Marco and Bernardini, Silvia (eds.) 2006. Wacky! Working papers on the Web as Corpus. Bologna: GEDIT. [ISBN 88-6027-004-9]


© The authors and editors
First edition: September 2006

This work is licensed under a Creative Commons Attribution-NoDerivs 2.5 License.