This directory contains the code and data for selecting the nonword stimuli.
As outlined in the paper, we performed the following steps:
- Selected monomorphemic syllables involving orthographically existing onsets, bodies, and legal bigrams from the ARC Nonword Database, resulting in ARC_nonwords.txt
- Filtered for low numbers of onset and phonological neighbors (see Filtering nonwords)
- Randomly distributed the resulting 256 nonwords into 64 groups of 4, resulting in stimuli/nonwords.txt
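
The random grouping step in the list above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the function name `group_nonwords`, the fixed seed, and the placeholder items are all assumptions made for the example.

```python
import random


def group_nonwords(nonwords, group_size=4, seed=0):
    """Shuffle the items and split them into consecutive groups of group_size.

    Hypothetical helper: the name, signature, and seed are illustrative only.
    """
    if len(nonwords) % group_size != 0:
        raise ValueError("number of items must be a multiple of group_size")
    shuffled = list(nonwords)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return [
        shuffled[i:i + group_size]
        for i in range(0, len(shuffled), group_size)
    ]


# Placeholder items standing in for the 256 selected nonwords.
items = [f"item{i:03d}" for i in range(256)]
groups = group_nonwords(items)
```

With 256 items and a group size of 4, this yields 64 groups, and every item appears in exactly one group.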
We used filter_nonwords.py to filter nonwords by NN, NON, and NPN (see the ARC webpage for an explanation of the column headers) and then randomly sampled 500 of them. From this sample, we manually selected 256 based on pronounceability, as indicated by the Include column in nonwords_filtered_sample_annot.csv.
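
For orientation, the filter-then-sample logic might look like the sketch below. It does not reproduce filter_nonwords.py: the threshold values, the `Word` column name, the seed, and the toy rows are assumptions chosen for illustration, and only the NN, NON, and NPN column names come from the description above.

```python
import random


def filter_and_sample(rows, max_nn, max_non, max_npn, n_sample, seed=0):
    """Keep rows whose neighbor counts are at or below the given thresholds,
    then draw a random sample without replacement.

    Hypothetical helper: thresholds and seed are illustrative, not the
    values used for the actual stimuli.
    """
    kept = [
        r for r in rows
        if int(r["NN"]) <= max_nn
        and int(r["NON"]) <= max_non
        and int(r["NPN"]) <= max_npn
    ]
    rng = random.Random(seed)
    # Never ask for more items than survived the filter.
    return rng.sample(kept, min(n_sample, len(kept)))


# Toy rows standing in for the database export (values are made up).
rows = [
    {"Word": "a", "NN": "0", "NON": "1", "NPN": "2"},
    {"Word": "b", "NN": "5", "NON": "1", "NPN": "0"},
    {"Word": "c", "NN": "1", "NON": "2", "NPN": "1"},
    {"Word": "d", "NN": "0", "NON": "0", "NPN": "7"},
]
sample = filter_and_sample(rows, max_nn=1, max_non=2, max_npn=2, n_sample=2)
```

In the toy data, rows "b" and "d" exceed a threshold and are dropped, so the sample can only contain "a" and "c". For the real data, `n_sample` would be 500, and the manual pronounceability check would follow as a separate step.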