ICAME46: Per Corpora ad Astra: Exploring the Past, Mapping the Future
Welcome to the 46th ICAME (International Computer Archive of Modern and Medieval English) Conference!
Hosted by the Faculty of Philology at Vilnius University, this year’s conference brings together researchers from 25 countries to explore corpora, English, and the latest in corpus linguistics.
Over five days in June, participants will engage in keynote talks, pre-conference workshops, software demonstrations, and presentations of full papers and work-in-progress reports.
Beyond academia, the social program offers a walking tour of Vilnius Old Town, a welcome reception, a boat trip around magnificent Trakai Island Castle, and a conference dinner with a disco at the iconic 1960s-style “Neringa” restaurant.
Vilnius University’s motto is “Hinc Itur ad Astra”. While we can’t literally take you to the stars, we promise an inspiring experience!
On behalf of the ICAME46 organising committee,
Prof. Jolanta Šinkūnienė
Keynote speakers

Sebastian Hoffmann
Universität Trier
Rosa Lorés
Universidad de Zaragoza
Rūta Petrauskaitė
Vytautas Magnus University
Lukas Sönning
Universität Bamberg
Lukas Sönning
Universität Bamberg
Lukas Sönning is a post-doctoral researcher associated with the Chair of English Linguistics at the University of Bamberg (Germany). Following his PhD project, which looked at phonological features in German learner English, his interest shifted to statistical aspects of corpus-linguistic methodology. He has worked on topics such as keyness analysis, dispersion, and down-sampling, and his habilitation (post-doc) project concentrates on the linguistically grounded use of mixed-effects models in variationist corpus research. Lukas has also been an active promoter of open-science practices, and his work is strongly informed by his passion for data visualization. He is currently also involved in a DFG-funded project on the analysis of high-dimensional survey data drawn from the BSLVC (Bamberg Survey of Language Variation and Change).
Abstract
Per corpora et diagrammata ad astra: Data visualization in corpus linguistics
Since corpus-based work often involves the quantitative analysis of relatively complex data sets, data visualization has always played a critical role in our field. Today we are confronted with an unprecedented supply of graph types, which are in many cases relatively straightforward to implement with freely available software such as R. While this overabundance holds out many opportunities both for the individual researcher and for the scientific community, it also necessitates critical reflection and debate about the merits and added value of (novel) graph types.
This talk traces the evolution of corpus data visualization over the course of the past 30 years. An analysis of just over 1,200 published corpus-linguistic research articles allows us to chart emerging practices in the field, identify trends, and examine the state of the art. We observe that the usage rate of graphs has increased over time and that, as a means of data communication, they are nowadays on a par with tabular displays. While our review does detect a recent influx of novel graph types, the usage rate of traditional forms is remarkably stable over time, suggesting that certain workhorses of data visualization are here to stay.
The present talk will illustrate what a constructive discourse about data visualization in our field could look like. To this end, I will examine the use of the three most common graph types – bar charts (37% of articles), line plots (23%), and scatterplots (14%) – from the viewpoint of the design recommendations given in the data visualization literature. This kind of critical review allows us to see where we stand, and to acknowledge room for improvement. As this discussion targets the common core of visuals in corpus linguistics, it is likely to be of relevance for most practicing corpus linguists. Further, we will take a closer look at a number of newcomers, which have recently entered the scene of presentation graphs in corpus-linguistic journals. Specifically, we will examine (the use of) dendrograms, mosaic charts, and CARTs (classification and regression trees) from the perspective of graph construction and perception. Since each of these forms has applied emphatically for a permanent position in the corpus-linguistic visualization toolbox, a careful engagement with their (potential) weaknesses is needed.
This talk will give us an opportunity to take a dive into the fascinating field of visualization research, including its cognitive underpinnings and empirical grounding. It will be apparent that there are many implicit and explicit parallels between work on (statistical) data visualization and the study of language.