
Speech Communication Research Group



OSCAAR (Online Speech/Corpora Archive and Analysis Resource) provides a secure, web-accessible and extensible repository for the many speech recordings and experimental materials of the Speech Communication Research Group. OSCAAR also includes a small number of speech recording collections from colleagues in the Linguistics Department at Northwestern, as well as from colleagues beyond Northwestern (Indiana University and University College London).

If you are interested in accessing speech recordings from OSCAAR, please sign up for an account on the OSCAAR website.

OSCAAR was originally developed in 2009 by Tyler Kendall. In many ways, the original vision of OSCAAR was influenced by the Sociolinguistic Archive and Analysis Project (SLAAP) at North Carolina State University. The current version of OSCAAR was developed by Chun Liang Chan.

To learn more about this project, please visit the OSCAAR website.


ALLSSTAR (Archive of L1 and L2 Scripted and Spontaneous Transcripts And Recordings) is a continuously expanding corpus of digitized speech recordings by speakers from various language backgrounds performing comparable speech production tasks in both English (their L2) and their L1. The ALLSSTAR corpus currently contains recordings from over 120 speakers from a variety of L1 backgrounds. The recorded materials in all languages include simple and complex sentences, paragraphs and spontaneous monologues.

Intelligibility ratings are available for a number of native Mandarin Chinese speakers, native English speakers and Spanish Heritage speakers.

To learn more about this project, please visit the ALLSSTAR website.

Diapix

Diapix is a dialogue elicitation technique used within the Speech Communication Research Group in which pairs of participants cooperatively complete a language task.

To learn more about this technique, please visit the Diapix website.

The Wildcat Corpus of Native- and Foreign-Accented English is a corpus of scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). In addition to scripted materials, the corpus includes dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s) and between one native and one non-native speaker of English.

To learn more about the Wildcat Corpus, please visit the Wildcat Corpus website.