Automated creak detection

This MATLAB wrapper was created by Manuel E. Díaz Cádiz, MS, for our project investigating percent (%) creak as a potential outcome measure for adductor laryngeal dystonia (AdLD). Please cite:

Marks, K. L., Díaz Cádiz, M. E., Toles, L. E., Buckley, D. P., Tracy, L. F., Noordzij, J. P., Grillone, G. A., & Stepp, C. E. (In Press). Automated Creak Differentiates Adductor Laryngeal Dystonia and Muscle Tension Dysphonia. The Laryngoscope.

The algorithm used in the MATLAB wrapper for this project is available as open source in COVAREP v.1.3.2 [1], and this neural network version of the creak detector [2] is considered state-of-the-art for creak detection as of 2023.

From Marks et al. (In Press) [3]: “The automated creak detector [4] is the result of an artificial neural network model that was trained to detect the following acoustic features associated with creak: 1) H2−H1 and fo creak, which characterizes the strong presence of secondary residual peaks often found in creaky voice; 2) residual peak prominence, which is meant to characterize each excitation peak in the time domain; 3) power peak parameters, which highlight the amplitude variation within individual pulses; 4) inter-pulse similarity, which is used to discriminate glottal pulses corresponding to creaky voice from unvoiced regions; 5) intra-frame period, which was designed to help differentiate creaky voice from other voiced regions; and 6) additional acoustic features: energy norm, power standard deviation, and ZeroXrate, which were included to avoid false positives in unvoiced and silent regions. From their visual analysis of the acoustic speech signals, Drugman, Kane, and Gobl [4] described three creaky voice patterns: highly irregular temporal characteristics, fairly regular temporal characteristics with strong excitation peaks, and fairly regular temporal characteristics without strong secondary excitations.”

Downloads

To use our MATLAB wrapper for the creak detector, download all files contained in this Zip File:

Instructions/Troubleshooting

  1. Download the Zip file, extract it, and save the contents to a permanent location on your computer (not a temporary folder or your Downloads folder).
  2. Next, make sure the Signal Processing Toolbox is installed in MATLAB. If it is not, go to Home → Add-Ons → Get Add-Ons, type “Signal Processing Toolbox” in the search field, open the toolbox page, and click the Install button.
  3. Open detect_creaky_voice_Dataset2XLS.m in MATLAB. Copy the path where your audio files are located into the INPUT_PATH variable assignment (the text shown in purple). This script calculates % creak for each file in that folder and creates an Excel sheet with the file name, group, total voiced time, total creak time, and % creak. If you have multiple groups, you can add subfolders within the audio folder.
  4. The two scripts in the “creaky_voice_detection” folder (detect_creaky_voice_Dataset2XLS.m and detect_creaky_voice_1example.m) need the function detect_creaky_voice() and the folder “private” to work.
  5. The script detect_creaky_voice_1example.m also needs the audio file “test1.wav”, which is in the same folder.
  6. There is no need to add the folder “private” to the MATLAB path. This folder name “private” is a MATLAB reserved name for folders that may contain private functions (functions only visible to the scripts in the same location as this folder), so the path is added automatically for the scripts above.
  7. The input format/directories inside the INPUT_PATH location (defined in the detect_creaky_voice_Dataset2XLS.m script) should look like:

INPUT_PATH →

  • Group1_folder
  • Group2_folder
  • GroupN_folder →
    • Participant1_folder
    • Participant2_folder
    • ParticipantM_folder →
      • audiofile_1.wav
      • audiofile_2.wav

  8. If there are problems running the detect_creaky_voice() function, troubleshoot by checking whether the script detect_creaky_voice_1example.m runs successfully. If it runs, it should display a figure with the “test1.wav” audio file data and the portions of the audio where the detector identifies possible instances of creak, like the figure below. If it does not run, refer to the error message to understand what the issue is. If it runs and shows the figure below, then the detect_creaky_voice_Dataset2XLS.m script should run as long as the INPUT_PATH location follows the format illustrated in step 7.

  9. The resulting .xls sheet will be in the same folder where you saved the MATLAB code.
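
The checks in the steps above can be sketched as a short pre-flight script to run before detect_creaky_voice_Dataset2XLS.m. This is only an illustrative sketch, not part of the distributed wrapper: the INPUT_PATH value is a placeholder, the variable names are hypothetical, and the Group/Participant/audio layout is assumed to match step 7. It uses only built-in MATLAB functions (ver, exist, dir, fullfile).

```matlab
% Hypothetical pre-flight check (not part of the distributed wrapper).
% Assumption: replace this placeholder with your own audio folder.
INPUT_PATH = fullfile(pwd, 'audio');

% Step 2: ver('signal') returns an empty struct array when the
% Signal Processing Toolbox is not installed.
hasSignalToolbox = ~isempty(ver('signal'));
fprintf('Signal Processing Toolbox installed: %d\n', hasSignalToolbox);

% Step 4: check that detect_creaky_voice is visible from the current
% folder or the MATLAB path (exist returns 2 for a file).
hasDetector = exist('detect_creaky_voice', 'file') == 2;
fprintf('detect_creaky_voice found: %d\n', hasDetector);

% Step 7: count .wav files laid out as INPUT_PATH/Group/Participant/*.wav
nWav = 0;
if exist(INPUT_PATH, 'dir') ~= 7
    fprintf('INPUT_PATH does not exist: %s\n', INPUT_PATH);
else
    groups = dir(INPUT_PATH);
    groups = groups([groups.isdir] & ~ismember({groups.name}, {'.', '..'}));
    for g = 1:numel(groups)
        groupPath = fullfile(INPUT_PATH, groups(g).name);
        participants = dir(groupPath);
        participants = participants([participants.isdir] & ...
            ~ismember({participants.name}, {'.', '..'}));
        for p = 1:numel(participants)
            wavs = dir(fullfile(groupPath, participants(p).name, '*.wav'));
            fprintf('%s / %s: %d wav file(s)\n', ...
                groups(g).name, participants(p).name, numel(wavs));
            nWav = nWav + numel(wavs);
        end
    end
end
fprintf('Total .wav files found: %d\n', nWav);
```

If the total is zero or a participant folder is missing from the printout, the folder layout probably does not match step 7, and it is worth correcting before running the dataset script.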

References

  1. Degottex G, Kane J, Drugman T, Raitio T, Scherer S. COVAREP—A collaborative voice analysis repository for speech technologies. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2014:960-964.
  2. Drugman T, Kane J, Gobl C. Data-driven detection and analysis of the patterns of creaky voice. Computer Speech & Language. 2014;28(5):1233-1253. doi:10.1016/j.csl.2014.03.002
  3. Marks KL, Díaz Cádiz ME, Toles LE, et al. Automated Creak Differentiates Adductor Laryngeal Dystonia and Muscle Tension Dysphonia. The Laryngoscope. 2023.
  4. Drugman T, Kane J, Gobl C. Data-driven detection and analysis of the patterns of creaky voice. arXiv preprint arXiv:2006.00518. 2020.

Funding

This work was funded by the National Institute on Deafness and Other Communication Disorders (NIDCD) grants R01DC015570 (Stepp) and F32DC020349 (Marks), as well as the American Speech-Language-Hearing Association Speech Science Research Grant (Marks). Please feel free to use the algorithm in scientific research. If you do so, we ask that you cite it in this way when using the creak detector: “Percent creak values were calculated using an automated MATLAB program; algorithm details can be found in Marks et al. (In Press).”

Disclaimer

We do not recommend use of this algorithm or % creak as clinical outcome measures at this time. However, we hope to determine whether there is a role for creak in clinical voice assessment. So far, this algorithm has been applied to investigate whether creak can differentiate speakers with adductor laryngeal dystonia (AdLD) from controls, and speakers with AdLD from speakers with muscle tension dysphonia [3].