Department Library


Brian Anderson (Senior Thesis, July 2001, Advisor: William Strong )


Daniel O Ludwigsen (PhD Dissertation, December 2001, Advisor: William Strong )


Simulations of the player/trombone system can provide understanding of its important mechanisms. The player’s lips are the most subtle and complex structures of the system. This model of the lip reed introduces the finite element method to represent a continuum structure. Previous lip models used masses and springs, with imposed eigenfrequencies and definite degrees of freedom. In the spirit of physical modeling, the geometry, boundary conditions, and material parameters of this continuum model are specified from measurements of the actual system when possible. Presumably, fidelity of the resulting lip modes and lip behavior in simulation reflects the degree of validity of the assumptions made in creating the model. This work established the feasibility of a continuum model of the lips in a simulation of the player/trombone system. The lowest modes of the lip structure recall the swinging and stretching motions of the current theory of the lip reed. These modes can be tuned using boundary conditions and material properties to enable self-sustained oscillation in simulation. Two standard tones are simulated here, as well as a pedal tone. Both the Bb2 and F3 tones utilize a swinging lip mode, showing the lower lip entrained to move with the upper. The motion and waveforms can be compared with those of other models and real players; limited agreement is shown. The main differences are an extended closure phase of the waveform cycle and dissimilarity in the trajectories of a point on the tip of the upper lip. The shape of the area waveform (when open) is encouraging, and radiated pressure waveforms audibly resemble trombone tones. The profound ability to control or tune the embouchure of this model offers a new tool for investigating the mechanisms of player control of the lip reed.


Steven L Tait, Jr. (Honors Thesis, June 2000, Advisor: William Strong )


Michael Wayne Thompson (Masters Thesis, August 2000, Advisor: William Strong )


A frequency-domain model of trombone sound production that includes the effects of wave steepening is proposed. The trombone is approximated as a set of contiguous cylindrical tubes with superposition of incoming and outgoing waves in each cylinder and with continuity of pressure and flow at each cylinder junction. The far-field radiated pressure spectrum is calculated from the spectrum of a measured pressure wave in the mouthpiece. This calculation includes the effects of wave steepening for the outgoing wave in each cylinder. The equations describing the model are given. Mouthpiece spectra are processed both with and without the effects of wave steepening. The predicted far-field spectra are compared to the corresponding measured far-field spectra. In all cases analyzed, the inclusion of wave steepening greatly reduced the error between predicted and measured spectra.
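The cylinder-chain propagation described above can be sketched in a minimal, lossless form: the input impedance of contiguous cylinders is chained back from the open far end, imposing continuity of pressure and flow at each junction. The losses and the wave-steepening correction central to the thesis are omitted here, and the termination, air constants, and geometry are illustrative assumptions.

```python
import numpy as np

def input_impedance(radii, lengths, freqs, c=343.0, rho=1.21):
    """Input impedance of a chain of contiguous lossless cylinders.
    radii/lengths are listed from the input end; an ideal open end
    (Z = 0) terminates the chain. Losses and wave steepening from
    the thesis model are omitted in this sketch."""
    Z = np.zeros(len(freqs), dtype=complex)       # open-end termination
    k = 2 * np.pi * np.asarray(freqs, dtype=float) / c
    for a, L in zip(radii[::-1], lengths[::-1]):  # chain from far end back
        Zc = rho * c / (np.pi * a ** 2)           # characteristic impedance
        t = np.tan(k * L)
        # standard transmission-line input-impedance recursion
        Z = Zc * (Z + 1j * Zc * t) / (Zc + 1j * Z * t)
    return Z
```

For a single open cylinder this reduces to Z = jZc·tan(kL), with impedance minima at the half-wave resonances.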


Ronald E. Ainsworth (Jr.) (Senior Thesis, December 1998, Advisor: William Strong )



Eric J Hunter (Masters Thesis, August 1997, Advisor: William Strong )


A geometrical display of speech spectra intended as an adjunct to lip-reading was developed. Spectra were calculated at 5-ms intervals from speech sound pairs ambiguous to lipreaders. The spectra were displayed as sequences of irregular decagons. Human subjects were asked to discriminate between pairs of spectral decagon sequences derived from pairs of ambiguous speech sounds. Subjects were able to discriminate between most of the visual spectral patterns derived from ambiguous sounds. However, spectral patterns associated with the voiced/unvoiced contrast in some stop pairs were not discriminated consistently.


Rong Lin (PhD Dissertation, April 1996, Advisor: William Strong )


An extensive set of carefully recorded utterances provided a speech database for investigating acoustic correlates among eight emotional states. Four professional actors and four professional actresses simulated the emotional states of joy, conversation, nervousness, anger, sadness, hate, fear, and depression. The values of 14 acoustic parameters were extracted from analyses of the simulated portrayals. The parameters were normalized to reduce talker dependence. Correlates of emotion were investigated by means of principal components analysis. Sadness and depression were found to be “ambiguous” with respect to each other, but “unique” with respect to joy and anger in the principal components space. Joy, conversation, nervousness, anger, hate, and fear did not separate well in the space and so exhibited ambiguity with respect to one another. The different talkers expressed joy, anger, sadness, and depression more consistently than the other four emotions. The analysis results were compared with the results of a subjective study using the same speech database and considerable consistency between the two was found.
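The principal components analysis used above can be sketched generically as an eigen-decomposition of the covariance of the normalized parameter vectors. The 14 acoustic parameters and their normalization are specific to the thesis; the data below are stand-ins.

```python
import numpy as np

def principal_components(X, n):
    """Project rows of X (observations x parameters) onto the top-n
    principal components; also return the fraction of total variance
    each retained component explains."""
    Xc = X - X.mean(axis=0)                  # center each parameter
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n]       # largest n first
    return Xc @ vecs[:, order], vals[order] / vals.sum()
```

Emotional states that overlap in this reduced space would appear “ambiguous” in the sense used above; well-separated clusters would appear “unique.”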


David C Copley (Masters Thesis, April 1995, Advisor: William Strong )


A physical model of a player/trombone system is developed and used for music synthesis. The trombone is modeled as a series of concatenated cylindrical tubes of varying diameter. The instrument’s acoustic input impedance is calculated and compared to experimental measurements. While some discrepancies exist between theory and measurements, the model appears reasonable. The input impedance is used to obtain the input impulse response. Similarly, the transfer impulse response is obtained through calculation of the transfer impedance. The player’s lips are examined using computer controlled fiber optic stroboscopy. Lip motion is observed from the front and side for six notes (Bb2, F3, Bb3, D4, Ab4) played at loud and soft dynamic levels. The video sequences are used to obtain information on lip opening area, lip motion perpendicular to airflow and lip motion parallel to airflow. The data are used to generate input airflow signals to the trombone. By convolving the input with the transfer impulse response, synthetic trombone tones are realized and compared to real tones. While the syntheses exhibit many desired characteristics, the inherent limitations of the model prevent them from sounding authentic.


Grant Jensen (Honors Thesis, April 1994, Advisor: William Strong )



Mont A. Johnson (Senior Thesis, April 1992, Advisor: William Strong )


Si Ning Li (Masters Thesis, April 1992, Advisor: William Strong )


An image method employing artificial diffusion was used to simulate source-to-receiver transmission of sound in a large rectangular room. The simulation model, which uses an impulse and a generic tone as sources, was evaluated by comparing the standard deviations of its response spectra to theoretical and experimental values obtained by others. Several source and receiver position combinations were used in order to assess their effects on the spectral variability under various “listening” conditions. A five-source/five-receiver arrangement provided a room-average spectrum with a standard deviation of 1.12 dB. Nominal “performance listening” conditions with single source/single receiver, single source/two receiver, and two source/two receiver arrangements resulted in standard deviations of 5.55, 4.01, and 3.60 dB, respectively. Simulated frequency modulation and critical band effects also reduced the spectral variability. Consideration was also given to room effects on transient spectra. An attempt was made to address the most interesting question, that of perceptual effects, by comparing the simulation results with a perceptual “model”. However, this question remains largely unanswered because of a lack of pertinent perceptual data.
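The basic image construction for a rectangular room can be sketched as follows. Each image source is a mirrored copy of the source, attenuated by a wall reflection coefficient per bounce and by spherical spreading, and placed into an impulse response at its propagation delay. The artificial diffusion used in the thesis is omitted, and the uniform reflection coefficient and sampling rate are illustrative assumptions.

```python
import numpy as np

def image_response(room, src, rcv, order=1, beta=0.9, c=343.0, fs=8000):
    """Impulse response of a rectangular room by the image method.
    room, src, rcv are 3-vectors (meters); beta is a single wall
    reflection coefficient applied per bounce. The thesis's
    artificial diffusion is not modeled in this sketch."""
    taps = []
    rng = range(-order, order + 1)
    for nx in rng:
        for ny in rng:
            for nz in rng:
                for px in (0, 1):
                    for py in (0, 1):
                        for pz in (0, 1):
                            # image position: 2*n*L + (1 - 2*p)*src per axis
                            img = [(1 - 2 * p) * s + 2 * n * L
                                   for p, s, n, L in zip((px, py, pz), src,
                                                         (nx, ny, nz), room)]
                            d = sum((i - r) ** 2
                                    for i, r in zip(img, rcv)) ** 0.5
                            # number of wall bounces for this image
                            refl = (abs(2 * nx - px) + abs(2 * ny - py)
                                    + abs(2 * nz - pz))
                            taps.append((int(round(fs * d / c)),
                                         beta ** refl / (4 * np.pi * d)))
    h = np.zeros(max(t for t, _ in taps) + 1)
    for t, amp in taps:
        h[t] += amp
    return h
```

Summing the magnitude spectrum of h over several source/receiver placements would give the kind of room-average spectrum whose standard deviation is reported above.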


David A Berry (PhD Dissertation, August 1991, Advisor: William Strong )


An event-synchronous technique has been designed in an attempt to optimize time and frequency resolution in speech analysis. The technique isolates “microevents” in the speech waveform and then analyzes them, thus differing from commonly used asynchronous methods that employ a fixed window stepped forward in constant time increments. A microevent (ME) is associated with a “packet of energy” in the waveform and is initiated by some underlying input of energy or fluctuation of energy. Four ME types are envisioned: (1) a voiced ME is initiated by a pitch pulse; (2) a plosive ME is initiated by a plosive burst; (3) a noise ME is initiated by a positive fluctuation in energy; and (4) a mixture ME. The current algorithm, developed and tested with portions of the 1988 DARPA TIMIT Acoustic-Phonetic Continuous Speech Database, isolates over 99% of the voiced MEs and plosive burst MEs correctly. Noise MEs are also marked in a reasonable fashion. Once isolated, MEs are characterized by their one-third octave spectra. MEs of various allophones are plotted in a two-dimensional principal components space and a newly introduced trispectral space. ME trajectories are also plotted in the trispectral space to further investigate the results of microevent analysis.

Ji Lu Feng (PhD Dissertation, April 1991, Advisor: William Strong )


Intonation and stability of clarinet tones are influenced by the resonance frequencies of the instrument, which are influenced in turn by the placement and size of its toneholes. Dimensions of an instrument of moderate quality were measured and served as the starting point for the optimization procedure. The positions, diameters, and heights of the toneholes were then optimized for 47 different fingerings of the instrument. For each fingering, minimization of the frequency differences between four modal frequencies (modes 1 and 2 in the chalumeau register, mode 2 in the clarion register, and mode 3 in the altissimo register) and four reference frequencies served as the optimization criterion. The reference frequencies were arbitrarily chosen from an equal tempered chromatic scale tuned to A4 = 440 Hz. In one method the parameters were modified one at a time, while in a second method all three were modified together. Both methods produced similar results with reductions in rms “frequency difference” of 60% for chalumeau mode 1, 36% for chalumeau mode 2, and 60% for combined mode 2 of the clarion and mode 3 of the altissimo registers. Optimization procedures and results are discussed.
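The one-parameter-at-a-time search (the first method mentioned) can be sketched generically. The real objective, the rms difference between computed modal frequencies and the equal-tempered references, depends on a tonehole acoustics model not reproduced here; a plain quadratic stands in for it below.

```python
def coordinate_minimize(f, x0, step=0.1, shrink=0.5, iters=50):
    """Minimize scalar objective f by nudging one parameter at a time
    (one-at-a-time search); the step is halved whenever no parameter
    move improves the objective. f stands in for the rms frequency
    difference of the thesis."""
    x = list(x0)
    best = f(x)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                val = f(trial)
                if val < best:
                    x, best = trial, val
                    improved = True
        if not improved:
            step *= shrink          # refine once coarse moves stall
    return x, best
```

The second method of the thesis, varying all three tonehole parameters together, would correspond to a joint (e.g. gradient-free simplex) search over the same objective.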


Xiaoping Li (Masters Thesis, August 1989, Advisor: William Strong )


A simplified violin model has been designed for investigating the possibility of simplification of violin manufacture. A primary requirement of such an instrument is realistic tonal quality which is determined by its body response. A predetermined body response curve is used as the basis for this model. The simplified violin body system consists of a piston and four mass-spring systems. The simulation is implemented by the digital computer in several phases: string admittance, string impulse response, bow speed, bow force, and radiation pressure. The impulse response is obtained by the Fourier transform of the admittance. A convolution of the string impulse response at the bowing point with the bow force creates the speed of the string at the bowing point. A graphical method is used in the convolution because of hysteretic effects in the interaction between the bow and the string. The speed of the radiating piston is found through a transfer function from the speed of the string at the bowing point. The simplified violin radiation pressure is the product of the speed of the piston and the radiation resistance of the piston.


Paraskev Papachristou (Masters Thesis, August 1987, Advisor: William Strong )


Twenty steady sounds (10 vowel and 10 instrumental) matched for pitch and loudness were synthesized. Three spectral representations for each of the twenty sounds were used: 1) one-third octave band spectra normalized to the overall sound level; 2) specific loudness per one-third octave band; and 3) F-weighted loudness per one-third octave band. Each of the spectral representations was subjected to a principal component analysis to reduce its dimensionality. The one-third octave spectra and loudnesses produced similar results for the first four principal components, but the first four components of the F-weighted loudnesses accounted for less of the total variance than those of the other two representations. In separate analyses of the vowels and the instrumental sounds, the two were found to produce markedly different correlation matrices and spectral spaces. Listener tests, using paired comparisons, were performed to obtain a perceptual space and multidimensional scaling techniques were utilized to reduce the dimensionality of the perceptual space. Observations are made about the similarities of the sound pressure level and loudness spaces relative to the perceptual space.

Yuan Gong Xu (Masters Thesis, April 1987, Advisor: William Strong )


The influence of top thickness on violin tone has been investigated using impulsive excitation of the instruments and a microphone-filter system to sense the resulting sound. The computer-controlled system includes an electromechanical plunger which repeatedly taps the bridge of the violin and a narrow-band digital filter whose center frequency is stepped over the frequency range of interest. A digital voltmeter measures the response at each frequency and a frequency-response curve is generated. Four commercial violins having tops which were thicker than normal were tested, regraduated to normal top thickness, and again tested. The principal changes in the response curves produced by regraduation were a decrease in the frequency of the “wood” resonance and an increase in the amplitude of the “air” resonance. In the low-frequency region, the final response curves resembled more nearly the response curves of good reference violins than did the initial curves. A panel of violinists judged the violin tone to be considerably improved as a result of regraduation of the tops.


Scott D Sommerfeldt (Masters Thesis, August 1986, Advisor: William Strong )


A time-domain simulation model has been developed for investigating the player-clarinet system. The three components which constitute the simulation model consist of the “sub-reed” system, reed, and clarinet. The “sub-reed” system is represented in terms of an analogous circuit model to obtain the “sub-reed” pressure. The reed is represented in terms of its input impedance impulse response. A convolution of the impulse response with the volume velocity determines the mouthpiece pressure. Use of the model is valid for both small- and large-amplitude oscillations. Many of the nonlinearities associated with the clarinet are incorporated in the model in a rather natural way. Several vocal tract configurations are investigated to determine the influence of the vocal tract on the clarinet tone.


Lyle Gordon Shirley (Masters Thesis, April 1984, Advisor: William Strong )


A method is developed for calculating the acoustic input impedance of a horn by modeling the horn as a series of conical or cylindrical segments with losses. For horns with continuous cross-sectional area and continuous wall slope, the method simplifies to finding a complex-valued effective horn length by summing the effective lengths of the individual segments. The effective horn length as calculated with lossy cones is compared with a WKB approximation and with lossy cylinders. There was excellent agreement between measurement and calculations of the input impedance of a truncated cone. Three methods of exciting the sound in the horn were used, with a high-impedance capillary method being emphasized. The perturbing effects due to the finite acoustic impedances of the capillary and probe microphone were calculated. A computer fitting routine was developed to find the frequency dependence of these impedances and a correction to the effective horn length that gave the best agreement between measurement and theory. The high degree of correlation between these impedance functions as obtained with different horns attached to the measuring system indicates that the measurement errors could be largely eliminated through calibration.


Donald Robert Allen (PhD Dissertation, December 1983, Advisor: William Strong )


A model has been developed which is designed to preserve some of the naturalness that is usually lost in speech synthesis. A parameterized function is used to produce an approximation to the cross-sectional area through the glottis. A circuit model of the subglottal and glottal system is used with the supraglottal pressure to generate the glottal volume-velocity. The tract used to obtain the supraglottal pressure is represented by its input-impedance impulse-response, which can be calculated from the area function of the tract. A convolution of the input-impedance impulse-response with the volume velocity determines the supraglottal pressure. The two coupled equations for the volume velocity are solved simultaneously. The output of the model is generated by convolving the resulting glottal volume-velocity with the transfer-function impulse-response of the tract. This technique preserves the interaction between the glottal flow and the vocal tract. Listening tests showed that vowels synthesized with the interaction were preferred as more natural sounding than those without the interaction.


Donald Robert Allen (Masters Thesis, April 1980, Advisor: William Strong )


In a series of experiments, the following versions of speech and speech codes have been compared: (1) natural speech; (2) speech severely low-pass filtered at 900 Hz; (3) an all-harmonic code consisting of many harmonic sinusoids; (4) a largest-harmonic code consisting of four harmonic sinusoids closest to the formants; and (5) a formant code consisting of three sinusoids scaled to the formant frequencies. Fundamental frequency and formant frequencies are scaled by different amounts in the various codes. Normal-hearing subjects were tested on three different categories of code. The Diagnostic Rhyme Test (DRT) was used on the speech codes that were not frequency lowered, a Diagnostic Discrimination Test (DDT) was used on frequency-lowered speech codes, and a periodic test was run on all versions of the speech and speech codes. Results of each test are presented and compared for the various talker, speech, and speech code combinations; they show that the low-pass-filtered speech was always more intelligible than any low-frequency speech code tested.


Stephen E Stewart (PhD Dissertation, August 1979, Advisor: William Strong )


A functional model of a simplified clarinet has been developed and implemented on a digital computer. The simplified clarinet consists of a standard clarinet mouthpiece and reed attached to a straight cylindrical tube. In the model, the tube and mouthpiece are represented by a lumped element approximation to a transmission line and the reed is represented as a non-uniform bar, clamped at one end. Differential equations for the system are solved numerically on a digital computer to obtain pressures and volume velocities of the air in the tube and mouthpiece and positions of the reed at successive time increments. The model exhibits self-sustained oscillations, threshold blowing pressures, frequency shifts, and spectra of mouthpiece and radiated pressures all in reasonable agreement with published data on the clarinet. A previously unreported dependence of volume velocity in the reed aperture on the initial or rest opening of the aperture was found. Suggestions are made for further work with the functional model.


Gary L Clement (Masters Thesis, April 1975, Advisor: William Strong )


This study describes the analysis and synthesis of speech using linear prediction. A detailed comparison is made of speech synthesized using two, four, six, eight, ten, and twelve predictor coefficients. The synthetic and original speech are examined using form IV of the diagnostic rhyme test. Overall intelligibility scores are measured for each type of speech as well as mean scores for the six consonant attributes of voicing, nasality, sustention, sibilation, graveness, and compactness. Significant differences in these scores are evaluated by means of t-tests. Possible explanations for observed results are proposed to suggest improvements to speech analysis-synthesis systems.
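The predictor coefficients compared above can be computed with the standard autocorrelation method and Levinson-Durbin recursion; a generic implementation (not the thesis code) is sketched here.

```python
import numpy as np

def lpc(x, order):
    """Linear-prediction coefficients a = [1, a1, ..., ap] by the
    autocorrelation method and Levinson-Durbin recursion; the
    polynomial A(z) = 1 + a1*z^-1 + ... whitens the input."""
    x = np.asarray(x, dtype=float)
    # biased autocorrelation, lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update inner coefficients
        a[i] = k
        err *= (1.0 - k * k)                 # prediction error shrinks
    return a
```

Resynthesis then drives the all-pole filter 1/A(z) with a pulse or noise excitation; raising the order from two toward twelve gives the finer spectral fits whose intelligibility the thesis compares.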

Lynn O. Keeler (Masters Thesis, April 1975, Advisor: William Strong )


This study describes the analysis and synthesis of speech using linear prediction and formant coding methods. A detailed comparison is made of speech synthesized using twelve predictor coefficients, six predictor coefficients, five formant frequencies and amplitudes, three formant frequencies and amplitudes, three formant frequencies and amplitudes with monotone pitch, and three formant frequencies with monotone pitch and amplitudes calculated by formula. The synthetic speech and the original speech are examined using Griffiths’ articulation test for rhyming minimal contrasts. Intelligibility scores are measured for each method of synthesizing speech, and confusion matrices are constructed for each method. Possible explanations for the observed confusions are investigated with a view to suggesting improvements to speech analysis-synthesis systems.

Kaye Reeder (PhD Dissertation, April 1975, Advisor: William Strong )


A formant speech code was tested as a possible speech reception aid for the severely to profoundly hearing impaired. Three formant frequencies were extracted from speech and used to control three sine wave oscillators whose combined outputs were presented aurally to subjects. The formant frequencies were divided by four to place them in the residual hearing range of typical sensory-neural impaired ears, and a 1-Hz bias was added to each to place it in a frequency range such as to minimize problems of acoustical coupling between headphones and the ears. The code was first tested for completeness, discriminability, and learnability using ten normally hearing subjects. The normally hearing subjects learned from 20 to 50 of the most common English words. They also took a diagnostic rhyme test which provided a more refined test of discriminability. Six sensory-neural subjects were given the diagnostic rhyme test to check the ability of the impaired ear to discriminate the code. The tests showed positive results for both the normally hearing and the hearing-impaired subjects with the vocabulary used.
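The frequency-divided sine-wave coding can be sketched as below. Formant extraction itself is not shown, and the small bias term is omitted; the divisor of four follows the description above, while the sampling rate and formant value are illustrative.

```python
import numpy as np

def lowered_formant_code(formants, fs, dur, divisor=4.0):
    """Three-sine 'formant code': each extracted formant frequency is
    divided by `divisor` to move it into the residual hearing range,
    then drives a sine oscillator; the oscillator outputs are summed.
    (The small additive bias of the thesis is omitted here.)"""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * (f / divisor) * t) for f in formants)
```

A 2000-Hz formant, for example, is presented to the listener as a 500-Hz tone.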


Randall L Christensen (Masters Thesis, January 1974, Advisor: William Strong )


Three methods of extracting resonance information for speech from predictor coefficients are compared. The methods are finding roots of the polynomial in the denominator of the transfer function using Newton iteration, picking peaks in the spectrum of the transfer function, and picking peaks in the negative of the second derivative of the spectrum. A relationship was found between the bandwidth of a resonance and the magnitude of the second derivative peak. Data, accumulated from a total of about two minutes of running speech from both female and male talkers, are presented illustrating the relative effectiveness of each method in locating resonances. The second-derivative method was shown to locate about 98 percent of the significant resonances while the simple peak-picking method located about 85 percent.
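The first of the three methods, rooting the denominator polynomial, can be sketched as follows; np.roots stands in for the Newton iteration of the thesis. Each complex root pair maps to a resonance frequency via its angle and a bandwidth via its radius.

```python
import numpy as np

def lpc_resonances(a, fs):
    """Resonance (formant) candidates from predictor coefficients
    a = [1, a1, ..., ap]: root the prediction polynomial, keep one
    root of each conjugate pair, and convert angle -> frequency and
    radius -> bandwidth. Generic sketch of the root-finding method."""
    roots = np.roots(a)
    roots = roots[roots.imag > 0]                 # one per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)    # Hz
    bws = -np.log(np.abs(roots)) * fs / np.pi     # -3 dB bandwidth, Hz
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

The bandwidth expression makes concrete the relationship noted above: poles closer to the unit circle (narrow bandwidth) produce sharper spectral peaks and hence larger second-derivative magnitudes.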

Paul Andrew Wheeler (Masters Thesis, April 1974, Advisor: William Strong )


The homomorphic filtering method and linear prediction method of analysis and synthesis were applied to French horn tones. The homomorphic filtering method created problems in picking the fundamental period and in spectral balance. The linear prediction method produced problems in gain normalization. The homomorphic filtering method used seven parameters, compared to six with the linear prediction method. The linear prediction method is also more economical in terms of computer time. An audio test was given to determine the adequacy of tone representation. Two homomorphic notes out of ten played were chosen by more than half the listeners as being real tones. Six linear prediction notes were chosen as real while seven of the real notes were chosen as being real by more than half of the listeners.


Robert Byron Purves (PhD Dissertation, April 1973, Advisor: William Strong )


A method of automatic speech recognition has been programmed on a small computer. The system accepts syntactic units of carefully spoken continuous speech from a single co-operative male speaker. The recognition parameters are low order cepstrum coefficients, zero crossing rate, slope change rate, cepstrum peak height and apparent place of articulation. The segmentation is performed using a “segmentation by recognition” method. Two phoneme choices are assigned to each segment. The utterance is identified by generating successive phoneme strings until one is found which satisfies the lexical and syntactic constraints. The lexical constraint requires the word string to consist only of phonemicon (phonemic dictionary) entries. The syntactic constraint requires the word string to satisfy a simplified English syntax. The phonemicon was built to contain about a thousand entries. Twenty utterances containing an average of 3.4 words were used to evaluate the system. Of these 35 percent were correctly recognized without application of the syntactic constraint. Imposition of the syntactic constraint improved the recognition rate to 65 percent.


George R. Plitnik (PhD Dissertation, January 1972, Advisor: William Strong )


The purpose of this study was to achieve a better understanding of the physical properties of double-reed instruments and their tones by making extensive use of the digital computer. To attain this objective, the physical dimensions of the instrument were used to compute the input impedance, and an analysis-synthesis scheme was developed. The numerically computed input impedance for various fingerings of the oboe was compared to experimentally derived curves. The agreement was fairly good in most cases. The reasons for the observed discrepancies are discussed and suggestions for improving the agreement between the predicted and experimental frequencies are given. The analysis of double-reed instrument tones was performed by using cepstral techniques. The tones were then synthesized by several different schemes. These synthetic tones were then compared to the results of other analysis-synthesis schemes, and to real tones, by means of psycho-acoustic testing.

Chang Ho Tien (Masters Thesis, May 1972, Advisor: William Strong )


This research is concerned with on-line speech analysis using the cepstrum technique and the fast Fourier transform algorithm. The analysis scheme was used to measure the fundamental frequencies of the vowels of the Chinese National Phonetic System. The analysis system was simulated on a PDP-15 digital computer, and the analysis results were displayed on the PDP-15 graphic display and photographed. The results of this research are consistent with another acoustic study, with the linguistic results, and with a listening test.
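The cepstrum-based fundamental-frequency measurement can be sketched as a generic real-cepstrum peak picker; the PDP-15 implementation of the thesis differs in detail, and the search range below is an assumption.

```python
import numpy as np

def cepstral_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Fundamental-frequency estimate via the real cepstrum: inverse
    FFT of the log magnitude spectrum, with the peak picked in the
    quefrency range corresponding to voiced pitch."""
    spec = np.fft.rfft(x * np.hanning(len(x)))
    ceps = np.fft.irfft(np.log(np.abs(spec) + 1e-12))
    lo, hi = int(fs / fmax), int(fs / fmin)   # quefrency search range
    q = lo + np.argmax(ceps[lo:hi])           # dominant rahmonic
    return fs / q
```

A voiced frame produces a sharp cepstral peak at the pitch period; unvoiced frames produce no such peak, which is how voiced/unvoiced decisions are commonly made with this technique.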


David Edward Cahoon (Masters Thesis, May 1970, Advisor: William Strong )


The purpose of this study was to show that the change in frequency with time observed for a drum struck with a hard blow is caused by a change in the average tension of the drum membrane as it vibrates. The problem was approached theoretically by first determining a relationship between the fundamental frequency and the average tension in the membrane. A relationship was also found between the tension and the displacement amplitude of the center of the membrane. Finally, an experiment was performed to measure the displacement amplitude as a continuous function of time. Combining these relationships, we obtained a theoretical fundamental frequency as a function of time. This theoretical frequency was then compared to the actual measured frequency. Within the limits of the simplifying assumptions, the results were very good. This would indicate that the change in frequency is indeed caused by the change in the average tension of the drum membrane.
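The frequency-tension relationship underlying the argument is the textbook result for an ideal circular membrane, with x01 ≈ 2.405 the first zero of the Bessel function J0; the numerical parameters below are illustrative, not from the thesis.

```python
import numpy as np

def membrane_fundamental(radius, tension, surface_density):
    """Fundamental of an ideal circular membrane:
    f1 = (x01 / (2*pi*a)) * sqrt(T / sigma), with x01 = 2.4048 the
    first zero of J0, a the radius (m), T the tension (N/m), and
    sigma the surface density (kg/m^2)."""
    x01 = 2.4048
    return x01 / (2 * np.pi * radius) * np.sqrt(tension / surface_density)
```

Because f1 grows as the square root of T, the momentary rise in average tension after a hard blow raises the pitch, which then glides down as the vibration amplitude (and with it the average tension) decays.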

Glen Allen Higbee (Masters Thesis, August 1970, Advisor: William Strong )


This research represents the initial effort at Brigham Young University in the use of an acoustic domain approach to speech synthesis by rule to develop control signals for a computer-simulated parallel terminal analog synthesizer. The utterances produced by the synthesis strategy, synthesizer, digital-to-analog converter, and speaker were judged by listeners to be intelligible for vowels (83 percent), semivowels and liquids (89 percent), and diphthongs. For consonants, intelligibility was generally low (34 percent). The failure to achieve high consonant intelligibility may be attributed primarily to errors in the synthesis strategy.


Brent Scott Baxter (Masters Thesis, January 1969, Advisor: William Strong )


A method is described for generating synthetic speech with a computer simulation of the human vocal system. The vocal cords and the vocal tract are each represented in the model in a way that relates closely to the physical processes involved in natural speech production. The vocal cords are modeled as a system of rectangular gates free to move transverse to the axis of the throat. The flow through the glottal opening is described in terms of viscous-turbulent flow equations and the motion of the vocal cords is that of a simple second order mechanical system having mass, stiffness, damping and a driving force due to the intraglottal pressure. The vocal tract is represented as a series of coaxial cylinders with plane waves in them. The boundary conditions at the glottis end and the mouth end enable the radiated pressure to be computed. The model appears to have the correct behavior for stop consonants and vowels. In order to generate fricative type consonants and nasals, the model must be modified slightly.
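The second-order vocal-cord mechanics described above can be sketched as a driven mass-spring-damper stepped in time. The gate geometry and the viscous-turbulent flow coupling of the model are omitted, and the parameter values are illustrative assumptions.

```python
import numpy as np

def fold_displacement(force, m, b, k, dt):
    """Displacement of a one-mass vocal-cord model
    m*x'' + b*x' + k*x = F(t), integrated with semi-implicit Euler.
    force is the sampled driving force (e.g. intraglottal pressure
    times gate area); m, b, k are mass, damping, and stiffness."""
    x = v = 0.0
    out = np.empty(len(force))
    for n, F in enumerate(force):
        v += (F - b * v - k * x) / m * dt   # velocity update first
        x += v * dt                         # then position (stable form)
        out[n] = x
    return out
```

In the full model the driving force itself depends on the glottal flow, which depends on the gate opening, closing the feedback loop that sustains oscillation.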

James M Reynolds (Masters Thesis, August 1969, Advisor: William Strong )


The purpose of this thesis was the production of three demonstrations for use in a survey course of acoustics. The first was delayed speech feedback. An inexpensive tape recorder was modified by the addition of an extra amplifier and head that provided a variable time delay that ranged from 0.18 sec to 0.35 sec. These delay times were sufficient to have deleterious effects upon the speech of several subjects on whom the unit was informally tested. The second was the artificial vocal system. Artificial tracts replicating the human tract for each of six steady vowels were built from wood, clay, and plastic. A Western Electric Electronic Artificial Larynx simulated the glottis. Each tract was equipped with a window so that students might see its configuration. Degraded speech comprised the third. A digital synthesizer had already been developed to synthesize the utterance, “Robby will like you Daddy-oh.” The elimination of one or more formants, the modification of the fundamental frequency, and the proportionate raising of the formants were the categories of degradation utilized.