
Year:  2002  Vol. 68   Ed. 4 - (14º)

Original Article

Pages: 540 to 544

Standardization of acoustic measures of the normal voice

Author(s): Simone Adad Araújo 1,
Marcos Grellet 2,
José Carlos Pereira 3,
Marcelo Oliveira Rosa 4

Keywords: acoustic measures, normal voice, standardization.

Abstract:
Introduction: With the advance of digital technology, acoustic analysis of the voice has become a promising complementary exam for raising diagnostic precision in laryngology. Objectives: To standardize the acoustic measures of fundamental frequency, perturbation and noise in the normal voice of male and female Brazilians. Methodology: The research was carried out from March to August 1997 at the Otorhinolaryngology Clinic of the Hospital das Clínicas, Medical School of Ribeirão Preto, University of São Paulo, with 80 volunteers from the city of Ribeirão Preto and its region, 40 males and 40 females, selected by means of a screening protocol. Digital recordings of the vowels /a/, /e/ and /i/ yielded a sample of 240 acoustic signals, which were submitted to the Program of Vocal Acoustic Analysis of the School of Engineering of São Carlos, University of São Paulo, to obtain the acoustic measures. Results: Mean values by gender and phoneme were obtained for the following measures: Fundamental Frequency, Jitter (Directional Perturbation Factor, Perturbation Variation Index, Jitter Ratio, Jitter Factor, Period Perturbation Quotient), Shimmer (Directional Perturbation Factor, Amplitude Variation Index, Amplitude Perturbation Quotient), Spectral Noise Level, Harmonic-to-Noise Ratio, Harmonic-to-Noise Ratio cepstrum, Normalized Noise Energy and Breathiness Ratio. Conclusion: Standardization of acoustic measures is necessary for a better understanding of the normal voice. The great majority of the values obtained are compatible with the existing literature.

INTRODUCTION

The voice is a complex phenomenon, and describing its characteristics requires multiple measures, including perceptual and acoustic assessments. Perceptual measures are subjective and can lead to discrepant results. Acoustic measures are objective and, being numeric, allow the voice to be documented and compared with other results. Bless (1991) stated that objective voice measurements serve not only to document but also to inform about what our eyes and ears cannot discriminate. The analysis of acoustic signals produces indirect measures of the vocal fold vibration pattern, including frequency and intensity.

Acoustic analysis programs use signal processing algorithms to define the shape of sound waves, analyze fundamental frequency, and measure perturbation (jitter and shimmer) and noise, enabling an almost complete description of the human voice. Read et al. (1992) reported that, with the advent of microcomputers and software, a single system can perform various analytical functions and combine different elements to provide an integrated perspective of the signal.

Fundamental frequency (F0), defined as the number of vibrations per second produced by the vocal folds, has been studied by many authors: Emanuel & Whitehead (1979) studied the normal male voice; Murry & Doherty (1980) investigated vowel /a/ in male subjects; Horii (1982) studied eight English vowels in male subjects; Sanderson & Maran (1992) collected mean values for male and female voices; Pegoraro-Krook & Castro (1994) used connected speech of male Brazilian subjects; and Behlau (1997) investigated male and female Brazilian speakers from the city of São Paulo.

Jitter, defined as the cycle-to-cycle perturbation or variability of fundamental frequency, was studied by the following authors: Deal & Emanuel (1978) studied normal male subjects; Horii (1980, 1982) collected values for sustained vowels; Sorensen & Horii (1983) studied sustained vowels in female subjects; and Murray & Zubick (1996) reported normal jitter values.

Shimmer, defined as the cycle-to-cycle perturbation or variability of amplitude, was studied by the following authors: Takahashi & Koike (1975) investigated sustained vowel /a/ in male and female subjects; Deal & Emanuel (1978) collected sustained-vowel measures for male subjects; and Sorensen & Horii (1984) studied sustained vowels in normal female voices.

Noise measures quantify the noise originating from air turbulence at the level of the glottis. They were studied by the following authors: Spectral Noise Level (SNL) by Sansone & Emanuel (1970), Lively & Emanuel (1970) and Emanuel et al. (1973), who reported that this measure quantifies roughness; Harmonic-to-Noise Ratio (HNR) by Behlau (1997) and Rodrigues et al. (1994), who reported that this measure provides an index relating the harmonic component to the noise component; Normalized Noise Energy (NNE) by Kasuya et al. (1986), who reported that this measure is an acoustic index for estimating noise generated by insufficient glottal closure; and Breathiness Ratio (BR) by Fukazawa et al. (1988), who reported that this measure estimates the perceptual characteristics of breathiness.

The present study aimed to standardize the acoustic measures of fundamental frequency, perturbation and noise for normal Brazilian voices, by gender and for the phonemes /a/, /e/ and /i/ of Brazilian Portuguese, using the Program of Vocal Acoustic Analysis developed at the School of Engineering of São Carlos, University of São Paulo, by Rosa (1998).

MATERIAL AND METHOD

The present study was conducted at the Otorhinolaryngology Outpatient Clinic, Hospital das Clínicas, Medical School of Ribeirão Preto, University of São Paulo, between March and August 1997, after approval by the Research Ethics Committee.

We selected 80 adult subjects, 40 male and 40 female, using a screening protocol. The subjects were recorded producing the vowels /a/, /e/ and /i/, yielding 240 computer-recorded phonemes, which were processed using the Program of Vocal Acoustic Analysis developed at the School of Engineering of São Carlos, University of São Paulo, by Rosa (1998).

The study accepted Brazilian volunteers of both genders from the city of Ribeirão Preto and surrounding areas, aged 20 to 40 years, with no history of dysphonia or related diseases, as investigated during data collection. Vocal folds were normal on videolaryngoscopy, hearing was within the normal range, and the voice was perceptually normal for gender and age, based on perceptual assessment conducted by speech and voice therapists.

Voice recordings were made directly on a Pentium 100 MHz computer with a conventional SoundBlaster SB16 sound card, in a noise-free environment. The signal was captured by a unidirectional cardioid microphone with dynamic gain, positioned 5 cm from the mouth of the volunteer, who was seated and inhaled deeply before producing the sustained phonemes /a/, /e/ and /i/, separated by 7 seconds on average. We instructed the subjects to phonate at a comfortable intensity and frequency, avoiding excessive vocal fold tension. Recordings were preceded by training.

The signals stored in the computer were pre-processed by deleting the unstable initial and final portions, standardizing the signal duration to 5 seconds, normalizing the amplitude to the range between -1 and +1, and applying an algorithm to remove linear trend. This standardization of the acoustic signals was made to obtain a uniform analysis and to prevent recording characteristics from influencing the acoustic parameters.
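The pre-processing steps above can be sketched as follows. This is only a minimal illustration, not the program by Rosa (1998): the function names are hypothetical, keeping the central 5 seconds is an assumed way of discarding the unstable onset and offset, and a least-squares straight line is assumed for the trend removal.

```python
def remove_linear_trend(x):
    """Subtract the least-squares straight line from the samples."""
    n = len(x)
    if n < 2:
        return [0.0] * n
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def normalize_peak(x):
    """Scale the signal so that its largest absolute sample equals 1."""
    peak = max((abs(v) for v in x), default=0.0)
    if peak == 0.0:
        return list(x)
    return [v / peak for v in x]


def preprocess(x, fs, keep_seconds=5.0):
    """Keep the central keep_seconds (assumption), detrend, then normalize."""
    keep = int(round(keep_seconds * fs))
    start = max(0, (len(x) - keep) // 2)
    segment = x[start:start + keep]
    return normalize_peak(remove_linear_trend(segment))
```

For a 7-second recording sampled at `fs` samples per second, `preprocess(x, fs)` returns 5 seconds of signal with zero mean slope and samples bounded by -1 and +1.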

The measures studied were frequency, perturbation and noise. Fundamental frequency was obtained by cepstral processing of the signals. The perturbation measures for Jitter were Directional Perturbation Factor (DPF), Perturbation Variation Index (PVI), Jitter Ratio (JR), Jitter Factor (JF) and Period Perturbation Quotient (PPQ), and for Shimmer, Directional Perturbation Factor (DPF), Amplitude Variation Index (AVI) and Amplitude Perturbation Quotient (APQ). The noise measures were Spectral Noise Level (SNL), Harmonic-to-Noise Ratio (HNR), Harmonic-to-Noise Ratio cepstrum (HNR cepstrum), Normalized Noise Energy (NNE) and Breathiness Ratio (BR).

Results were submitted to statistical analysis of means and standard deviations.
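A summary of this kind can be sketched as below, grouping each measure by gender and phoneme. The record layout and function name are hypothetical, the values in the usage lines are invented for illustration only, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Return {(gender, vowel): (mean, sample SD)} for (gender, vowel, value) records."""
    groups = defaultdict(list)
    for gender, vowel, value in records:
        groups[(gender, vowel)].append(value)
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}


# Illustrative values only, not data from the study.
records = [
    ("male", "a", 120.0), ("male", "a", 130.0),
    ("female", "a", 210.0), ("female", "a", 220.0),
]
summary = summarize(records)
```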

RESULTS

Means and standard deviations were computed for male and female subjects, for phonemes /a/, /e/ and /i/, for the following acoustic measures: fundamental frequency; Jitter - Directional Perturbation Factor (DPF), Perturbation Variation Index (PVI), Jitter Ratio (JR), Jitter Factor (JF), Period Perturbation Quotient (PPQ5), Period Perturbation Quotient (PPQ11); Shimmer - Directional Perturbation Factor (DPF), Amplitude Variation Index (AVI), Amplitude Perturbation Quotient (APQ11); Spectral Noise Level (SNL) in the interval 100 to 5,100 Hz; Harmonic-to-Noise Ratio (HNR); Harmonic-to-Noise Ratio cepstrum (HNR cepstrum) in the interval 200 to 5,000 Hz; Normalized Noise Energy (NNE) in the interval 1,000 to 5,000 Hz; and Breathiness Ratio (BR), as described in Table 1 and Table 2.

Table 1. Means of the acoustic measures of fundamental frequency, perturbation and noise by gender and phoneme.



Table 2. Standard deviations of the acoustic measures of fundamental frequency, perturbation and noise by gender and phoneme.



DISCUSSION

In our opinion, the results demonstrate great variability among normal voices, possibly owing to the large number of individual differences: the voice is a personal characteristic, and no two voices are perfectly identical.

We assume that results collected with different vocal acoustic analysis programs can differ even for similar measures, owing to differences in algorithms, in the method used to calculate fundamental frequency, in the type of microphone, in the way the recorded voice is stored, and in the type of token used (connected speech or sustained vowels). This agrees with Bielamowicz et al. (1996), who compared voice analysis programs and found divergent results even for similar measures.

The fundamental frequency (F0) results obtained in the present study were in accordance with those reported in the literature. Emanuel & Whitehead (1979) reported mean values for males of 105.7 Hz in vowel /a/ and 109.7 Hz in /i/; Murry & Doherty (1980), 115.3 Hz for males in vowel /a/; Horii (1982), 125 Hz in /a/ and 128.5 Hz in /i/ for males; Sanderson & Maran (1992), 117 Hz for males and 217 Hz for females; Pegoraro-Krook & Castro (1994), 134 Hz for males; and Behlau (1997), 113 Hz for males (range 80 to 150 Hz) and 205 Hz for females (range 150 to 250 Hz). In the present study, the mean values for males were 127.61 Hz for /a/, 132.45 Hz for /e/ and 142.63 Hz for /i/, and for females, 215.42 Hz for /a/, 214.28 Hz for /e/ and 226.73 Hz for /i/.

In the present study, most values of Jitter Ratio (JR), Jitter Factor (JF) and Period Perturbation Quotient (PPQ5 and PPQ11) were below 1%, as reported by Horii (1980, 1982), Sorensen & Horii (1983) and Murray & Zubick (1996). The exceptions were Jitter Factor (JF) in females for phonemes /a/ (1.85%) and /e/ (1.75%), a small deviation probably due to the calculation procedure.
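Since the calculation procedure affects these values, it may help to see how such measures are commonly computed from a sequence of glottal period lengths. The sketch below uses widely cited textbook-style definitions, which are assumptions here: the text does not give the formulas used by the analysis program, and some published definitions of Jitter Ratio scale by 1000 rather than 100. Function names are hypothetical.

```python
def jitter_ratio(periods):
    """Mean absolute difference of consecutive periods over the mean period, in %."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def jitter_factor(periods):
    """Same ratio computed on per-cycle frequencies (1 / period), in %."""
    freqs = [1.0 / p for p in periods]
    diffs = [abs(a - b) for a, b in zip(freqs, freqs[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(freqs) / len(freqs))


def period_perturbation_quotient(periods, k=5):
    """Mean deviation of each period from its k-point moving average, in %."""
    half = k // 2
    devs = []
    for i in range(half, len(periods) - half):
        window = periods[i - half:i + half + 1]
        devs.append(abs(sum(window) / k - periods[i]))
    return 100.0 * (sum(devs) / len(devs)) / (sum(periods) / len(periods))


# Perfectly regular cycles give zero perturbation on every measure.
steady = [8.0] * 20            # period lengths in ms
alternating = [10.0, 12.0] * 3  # exaggerated cycle-to-cycle alternation
```

With `k=5` and `k=11` the last function corresponds to the smoothing lengths named PPQ5 and PPQ11 in the text. The moving average makes the quotient less sensitive to slow drifts in pitch than the cycle-to-cycle ratios.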

The Jitter Directional Perturbation Factor (DPF) presented results similar to those reported in the literature. Sorensen & Horii (1984) reported values for male subjects of 46.24% in vowel /a/ and 46.37% in /i/, and for females, 48.79% in /a/ and 52.04% in /i/. In the present study, the results for males were 64.90% in phoneme /a/ and 65.94% in /i/, and for females, 65.54% in /a/ and 68.06% in /i/.
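As a rough illustration of what a directional measure counts, the sketch below computes the percentage of sign changes between successive cycle-to-cycle differences. This is an assumed, commonly described form of the DPF; the exact denominator and the treatment of zero differences in the analysis program are not stated in the text, and the function name is hypothetical.

```python
def directional_perturbation_factor(values):
    """Percentage of adjacent difference pairs whose algebraic sign changes."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    pairs = list(zip(diffs, diffs[1:]))
    # A strictly negative product means the two differences have opposite signs.
    changes = sum(1 for d1, d2 in pairs if d1 * d2 < 0)
    return 100.0 * changes / len(pairs)


zigzag = directional_perturbation_factor([10, 12, 10, 12, 10])  # every pair flips
ramp = directional_perturbation_factor([1, 2, 3, 4])             # no pair flips
```

The same function applies to a sequence of periods (Jitter DPF) or of peak amplitudes (Shimmer DPF), because only the direction of change is counted, not its size.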

The Perturbation Variation Index (PVI) presented lower, negative results compared with those reported in the literature. The comparison is difficult because we do not know the unit used by Deal & Emanuel (1978), who reported 0.4712 for vowel /a/ and 0.4898 for /i/ in male subjects; our results for male subjects were -1.16 dB for phoneme /a/ and -1.11 dB for /i/.

The Shimmer Directional Perturbation Factor (DPF) results of the present study were similar to those in the literature: 63.77% for phoneme /a/ and 64.95% for /i/ in males, and 65.17% for /a/ and 65.58% for /i/ in females. Sorensen & Horii (1984) obtained 59.47% for vowel /a/ and 61.13% for /i/ in males, and 63.13% for /a/ and 61.71% for /i/ in females.

The Amplitude Variation Index (AVI) results of the current study were greater than those in the literature. The comparison is difficult because we do not know the unit used by Deal & Emanuel (1978), who reported -0.1330 for vowel /i/ and -0.0619 for /a/ in males; our results for males were 2.37 dB for phoneme /a/ and 1.91 dB for /i/.

The Amplitude Perturbation Quotient (APQ) results of the present study were below the literature values, although the comparison is difficult because we do not know the unit used by Takahashi & Koike (1975), who reported values for vowel /a/ between 21.4 and 56.4 in males and between 18.1 and 47.7 in females; our results for phoneme /a/ were -30% in males and -36% in females.

The Spectral Noise Level (SNL) results were -88.39 dB for phoneme /a/ and -92.12 dB for /i/ in males, and -87.98 dB for /a/ and -91.97 dB for /i/ in females. These values are negative and below those reported in the literature by Sansone & Emanuel (1970), Lively & Emanuel (1970) and Emanuel et al. (1973), who reported 18.9 dB for vowel /a/ and 17.0 dB for /i/ in males, and 18.2 dB for /a/ and 16.1 dB for /i/ in females. The difference is possibly due to the normalization of amplitude between -1 and +1, which could have influenced the calculation of the measure.
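The effect of the amplitude reference on an absolute level in decibels can be shown with a small sketch. It does not reproduce the SNL computation; it only illustrates, under the assumption of a simple mean-square level, that multiplying a signal by a gain g shifts its level by 20 log10(g) dB, so a signal normalized to at most 1 in amplitude yields negative levels. The function names are hypothetical.

```python
import math


def level_db(x):
    """Mean-square level of a signal in dB relative to unit amplitude."""
    power = sum(v * v for v in x) / len(x)
    return 10.0 * math.log10(power)


def scale(x, gain):
    """Multiply every sample by a constant gain."""
    return [gain * v for v in x]


signal = [0.5, -0.5, 0.25, -0.25]
# Scaling by 0.01 lowers the level by 20 * log10(0.01), that is, by 40 dB.
shift = level_db(scale(signal, 0.01)) - level_db(signal)
```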

The results for Harmonic-to-Noise Ratio (HNR) and Harmonic-to-Noise Ratio cepstrum (HNR cepstrum) ranged from about -1.64 to 2.3 dB, below the levels reported in the literature. Behlau (1997) found mean values of 13.9 dB in females and 11.8 dB in males, and Rodrigues et al. (1994) reported mean values of 10.17 dB in females and 8.63 dB in males. We believe that this difference is due to the normalization of amplitude between -1 and +1.

The current study also analyzed the mean values of Normalized Noise Energy (NNE): for males, -13.08 dB for /a/, -9.52 dB for /e/ and -9.68 dB for /i/, and for females, -14.40 dB for /a/, -9.44 dB for /e/ and -10.63 dB for /i/, similar to the value of -11 dB reported by Kasuya et al. (1986).

We also obtained mean values of Breathiness Ratio (BR): for males, 21.54 dB for /a/, 23.85 dB for /e/ and 24.59 dB for /i/, and for females, 23.34 dB for /a/, 26.30 dB for /e/ and 27.08 dB for /i/, similar to the mean value of 27 dB reported by Fukazawa et al. (1988).

CONCLUSION

Fundamental frequency (F0) is characteristic of each gender, and normal female voices present higher fundamental frequency than male voices. The mean results were compatible with those reported in the literature.

Measures of Jitter produce better discrimination of perturbation. Jitter Ratio (JR), Jitter Factor (JF) and Period Perturbation Quotient (PPQ) presented compatible results, but Directional Perturbation Factor (DPF) and Perturbation Variation Index (PVI) did not. The mean results were similar to those reported in the literature.

Measures of Shimmer produce worse discrimination of perturbation. Directional Perturbation Factor (DPF), Amplitude Variation Index (AVI) and Amplitude Perturbation Quotient (APQ) were not in agreement. Their results were similar to those reported in the literature.

Measures of noise produce good noise discrimination. Spectral Noise Level (SNL), Harmonic-to-Noise Ratio (HNR), Harmonic-to-Noise Ratio cepstrum (HNR cepstrum), Normalized Noise Energy (NNE) and Breathiness Ratio (BR) presented results that agreed with one another.

Spectral Noise Level (SNL), Harmonic-to-Noise Ratio (HNR) and Harmonic-to-Noise Ratio cepstrum (HNR cepstrum) showed results below those reported in the literature. Normalized Noise Energy (NNE) and Breathiness Ratio (BR) showed results similar to those reported in the literature.

REFERENCES

1. Bless DM. Measurement of vocal function. In: Voice Disorders. Otolaryngologic Clinics of North America 1991;24:1023-33.
2. Read C, Buder EH, Kent RD. Speech analysis systems: An evaluation. Journal of Speech and Hearing Research 1992;35:314-32.
3. Emanuel FW, Whitehead RL. Harmonic levels and vowel roughness. Journal of Speech and Hearing Research 1979;22:829-40.
4. Murry T, Doherty ET. Selected acoustic characteristics of pathologic and normal speakers. Journal of Speech and Hearing Research 1980;23:361-69.
5. Horii Y. Jitter and Shimmer differences among sustained vowel phonations. Journal of Speech and Hearing Research 1982;25:12-14.
6. Sanderson RJ, Maran AGD. The quantitative analysis of dysphonia. Clinical Otolaryngology 1992;17:440-3.
7. Pegoraro-Krook MI, Castro VC. Normative speaking fundamental frequency (SFF) characteristics of Brazilian male subjects. Brazilian Journal of Medical and Biological Research 1994;27:1659-61.
8. Behlau M. Considerações sobre a análise acústica em laboratórios computadorizados de voz. In: Fonoaudiologia Atual. São Paulo: Revinter; 1997. cap.12, p.93-115.
9. Deal RE, Emanuel FW. Some waveform and spectral features of vowel roughness. Journal of Speech and Hearing Research 1978;21:250-64.
10. Horii Y. Vocal shimmer in sustained phonation. Journal of Speech and Hearing Research 1980;23:202-09.
11. Sorensen D, Horii Y. Frequency and Amplitude Perturbation in the Voice of Female Speakers. Journal of Communication Disorders 1983;16:57-61.
12. Murray KD, Zubick HH. Evaluation of vocal function. In: Fried MP. The Larynx. 2nd ed. Mosby; 1996. cap.11, p.115-24.
13. Takahashi H, Koike Y. Some perceptual dimensions and acoustical correlates of pathologic voices. Acta Otolaryngologica 1975; (suppl.)338:1-24.
14. Sorensen D, Horii Y. Directional Perturbation Factors for Jitter and for Shimmer. Journal of Communication Disorders 1984;17:143-51.
15. Sansone FE Jr, Emanuel FW. Spectral Noise Levels and roughness severity ratings for normal and simulated rough vowels produced by adult males. Journal of Speech and Hearing Research 1970;13:472-88.
16. Lively MA, Emanuel FW. Spectral Noise Levels and roughness severity ratings for normal and simulated rough vowels produced by adult females. Journal of Speech and Hearing Research 1970;13:503-17.
17. Emanuel FW, Lively MA, McCoy JF. Spectral Noise Levels and roughness ratings for vowels produced by males and females. Folia Phoniatrica 1973;25:110-20.
18. Rodrigues S, Behlau M, Pontes P. Proporção harmônico-ruído: valores para indivíduos adultos brasileiros. Acta AWHO 1994;13:112-16.
19. Kasuya H, Ogawa S, Mashima K, Ebihara S. Normalized Noise Energy as an acoustic measure to evaluate pathologic voice. The Journal of the Acoustical Society of America 1986;80:1329-34.
20. Fukazawa T, El-Assuooty A, Honjo I. A new index for evaluation of the turbulent noise in pathological voice. The Journal of the Acoustical Society of America 1988;83:1189-93.
21. Rosa MO. Análise Acústica da Voz para Pré-diagnóstico de Patologias da Laringe. Dissertação (Mestrado). Faculdade de Engenharia Elétrica de São Carlos, Universidade de São Paulo, 1998.
22. Bielamowicz S, Kreiman J, Gerratt BR, Dauer MS, Berke GS. Comparison of voice analysis systems for perturbation measurement. Journal of Speech and Hearing Research 1996;39:126-34.




[1] Master's degree in Otorhinolaryngology, Medical School of Ribeirão Preto, University of São Paulo; doctoral studies in Otorhinolaryngology in progress, Medical School, University of São Paulo.
[2] Ph.D., Professor, Department of Otorhinolaryngology, Hospital das Clínicas, Medical School of Ribeirão Preto, University of São Paulo - HCFMRP.
[3] Ph.D., Faculty Professor, Department of Electrical Engineering, School of Engineering of São Carlos, University of São Paulo.
[4] Master's degree; doctoral studies in Electrical Engineering in progress, School of Engineering of São Carlos, University of São Paulo.

Affiliation: Medical School of Ribeirão Preto, University of São Paulo.
Address correspondence to: Simone Adad Araújo - Rua 20, nº 324, apt. 201 - Setor Central - Goiânia - Goiás - 74030-110 - Tel. (55 62) 224.2282
Financial Support: CAPES
