When word frequencies do not regress towards the mean
- R. Harald Baayen [c] (Author),
- Fermín Moscoso del Prado Martín [b] (Author),
- Robert Schreuder [c] (Author),
- [a] Wayne State University,
- [b] MRC Cognition and Brain Sciences Unit,
- [c] University of Nijmegen
Research Output: Chapter in Book/Report/Conference proceeding (Chapter)
Abstract
Ever since Gernsbacher (1984), it has been widely believed that corpus-based word frequency counts are unreliable, particularly for the highest- and lowest-frequency words, because of regression towards the mean. In this study, however, we show that word frequency counts across corpora are not subject to regression towards the mean, either in theory or in practice. Sampling error due to underdispersion nevertheless remains a serious concern.
