at the GDC March 1996
and Techniques for Sound Effect Design What
have heard the question "if a tree falls in a forest far from any sound
detector (human ear, microphone, etc.), does the tree's fall make any noise?"
If we define sound as waves that are carried by the air, the answer would
be yes -- wherever there are sound waves there is sound. What if we define
sound subjectively as a sensation in the ear? Then the tree falling would
make no sound if there is no ear to detect it. We could go even further
in the definition and say that sound is what a brain decodes using electrical
impulses from the ears. In this case, too, the tree makes no sound when
there are no ears to detect it. Since we are going to be relating sound
to interactive game play, let's consider sound to be whatever the game player's
brain decodes. It will be helpful for us to keep the following overall goal
in mind: Sound should communicate effectively what we want the listener
to know or experience -- it should focus the player's attention. In order
to meet this goal we must always approach sound from the player's viewpoint.
The Physics of Sound
Sound is produced by vibrations that disturb the air, causing pressure waves
that travel out in all directions from the source of the sound. When the
waves reach someone's ear, they set up vibrations that cause electrical
signals to be sent to the brain. These electrical signals are perceived
as sound. Sound is one of the first sensations an unborn child experiences.
It brings us our first lessons of life while we are still in the womb. We
don't have to learn to detect the vibrations of sound. We take it all for
granted from our earliest experiences.
Recording and Storing Sounds
In the early days of movie making it was discovered that the human eye is
fooled into seeing lifelike moving pictures if the individual frames pass
in front of the eye at 24 frames per second. Similarly, the audio community
experimented with sound to see how many snapshots of a sound must be taken
per second to fool the ear into hearing lifelike sounds from digital signals.
It was decided that 44,100 audio snapshots per second would do the same trick for the
ears as 24 frames per second does for the eyes. These audio snapshots are
called samples and they are the equivalent of a video frame in a
movie. The audio equivalent of the number of video frames per second is
the sampling rate. You have probably heard the term sample used for a complete
sound. It is also used to describe the smallest part of a digital sound.
Very few people can hear anything lower than 16 Hz or higher than about 22
kHz (thousand cycles per second). So, what sampling rate is required to
record sounds from 16 Hz to 22 kHz? This question is answered by the Nyquist
formula, which states that the sampling rate must be at least twice the
frequency of the highest sound to be sampled. You might ask what will happen
if you record something at too low a sampling rate, disregarding the Nyquist
formula. Frequencies above half the sampling rate cannot be captured correctly,
so the answer is that the average person would not think that the
sound was lifelike.
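As a rough illustration of the Nyquist rule, here is a minimal Python sketch. The function names are made up for this example, and the 7,000 Hz rate is only a hypothetical "too low" setting. The second function uses the standard folding rule for a pure tone to estimate what frequency comes out when a tone is recorded at too low a rate.

```python
def min_sampling_rate(highest_freq_hz):
    """Lowest sampling rate (Hz) that satisfies the Nyquist rule for a given top frequency."""
    return 2 * highest_freq_hz


def apparent_frequency(freq_hz, sampling_rate_hz):
    """Frequency a pure tone appears to have after sampling (it folds around half the rate)."""
    nyquist = sampling_rate_hz / 2
    folded = freq_hz % sampling_rate_hz
    return folded if folded <= nyquist else sampling_rate_hz - folded


print(min_sampling_rate(22_000))          # 44000
print(apparent_frequency(3_000, 7_000))   # 3000 (below half the rate, so it survives)
print(apparent_frequency(5_000, 7_000))   # 2000 (above half the rate, so it comes out wrong)
```

The last line is the interesting one: a 5 kHz tone recorded at 7,000 samples per second does not just vanish, it shows up as a 2 kHz tone that was never in the original sound.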
How do we store the digital samples of a sound?
We take snapshots of the amplitudes of a sound at regular intervals and
store the values. We are merely storing values that represent the air pressure
of a vibration at any one point in time. So a digital sound file is merely
a method of storing air pressure! How much air pressure can be stored in
one byte? One byte (8 bits) can have a value from 0 to 255 decimal. For
this reason, 8-bit digital audio can store 256 discrete air pressure levels.
16 bit digital audio can store air pressure much more precisely, allowing
values from 0 to 65535, or 65536 discrete air pressure levels. As you can
well imagine, 8 bit resolution leaves a lot to be desired. Original waveforms
are "squared off" as a result of using it. The sound is not very clear.
But, maybe clarity is not what you are after.
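To put rough numbers on the difference, the following Python sketch rounds one cycle of a sine wave to 8 bit and 16 bit resolution and reports the worst rounding error. It is only an illustration: the samples are treated as values from -1.0 to 1.0 (an assumption made for convenience), and the function names are invented for this example.

```python
import math


def quantize(value, bits):
    """Round a value in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((value + 1.0) / step)   # integer code from 0 to levels - 1
    return index * step - 1.0


def max_error(bits, num_samples=1000):
    """Largest rounding error over one cycle of a full-scale sine wave."""
    worst = 0.0
    for n in range(num_samples):
        x = math.sin(2 * math.pi * n / num_samples)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst


print(2 ** 8, 2 ** 16)   # 256 65536
print(max_error(8))      # worst-case error at 8 bit resolution
print(max_error(16))     # far smaller at 16 bit resolution
```

The error can never exceed half of one step, and an 8 bit step is 257 times larger than a 16 bit step. That error is the "squaring off" described above.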
A good way to visualize the difference in 8 and 16 bit audio is to consider
a home stereo that has a volume knob with "detents" in it. You know the
type that clicks as you turn it? Imagine that you buy a tuner/amplifier
that has a knob with positions (clicks) from 0, or no volume, to 65535, or
full volume. The volume change from one click to an adjacent one would be
indistinguishable. Now take the same knob and give it positions (clicks)
from 0, or no volume, to 255, or full volume. Now you have only 256 discrete
volume settings. You can see how coarse your volume changes will be from
click to click -- you can hear each change. Now imagine those changes taking
place thousands of times a second. You can hear the roughness of 8 bit amplitude
The sound effects for Wolfenstein were recorded directly into a 16-bit
sampling keyboard, using live actor voices, live Foley effects and effects
recorded on a cassette recorder. Foley effects are named after Jack Foley,
who created a way to add sound effects live during post-production
using a recording studio. Foley effects such as footsteps, rain, thunder,
doors opening, keys rattling, etc. are recorded "live," but are not necessarily
actual recordings of "real" sounds. Thunder can be made by wiggling a large
piece of metal. Fire can be made by crushing paper or plastic. The imagination
is the only limit when it comes to Foley effects.
The sound driver for Wolf 3D was designed for playback of 7 kHz samples
at 8 bit resolution, so I had to start with high quality material for the
voices. That was the reason for recording directly into the sampler. The
cassette recorder was used when the microphone cable to the sampler was
too short. Digital Audio Tape Recorders (DAT's) were still relatively new
and not yet portable. The human voices were recorded using compression.
Compressors bring up the low level sounds and hold down the high level sounds
-- this makes the sound "meatier" and also helps to hide any noise. You
hear compressors at work every time you listen to an FM radio announcer.
That's how they get the boom in their voices. With the advent of digital
recording equipment and sound editing software, there is not a lot of need
for hardware compressors. Compression can be done in software. The sound driver for Wolf
3D was capable of playing only one digital sound at a time, so priority
of sounds was very important.
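A minimal Python sketch of what "priority of sounds" can mean when only one sound plays at a time. This is not the Wolf 3D driver; the class, the sound names and the numeric priorities are all hypothetical. The rule that a new sound wins a tie is also an assumption.

```python
class SingleChannelPlayer:
    """Toy model of a driver that can play only one digital sound at a time."""

    def __init__(self):
        self.current = None  # (name, priority) of the sound now playing, or None

    def request(self, name, priority):
        """Start `name` unless a higher-priority sound is already playing.

        Returns True if the new sound takes over the channel.
        """
        if self.current is not None and self.current[1] > priority:
            return False
        self.current = (name, priority)
        return True

    def finished(self):
        """Call when the current sound has played to its end."""
        self.current = None


player = SingleChannelPlayer()
print(player.request("door", 1))         # True: the channel was free
print(player.request("guard_shout", 5))  # True: interrupts the door
print(player.request("footstep", 0))     # False: the shout keeps playing
```

With a scheme like this, the sound designer's job is to decide the numbers: which effect the player must never miss, and which ones can be dropped.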
Some of the weapon firing sounds were recorded at a shooting range. The
explosions began as the crashing of the lid on a "Dempster Dumpster." Scott
Miller was the vocal talent for the word "Scheist!" and he could not talk
for several hours after doing 3-4 takes at full volume! The other vocal
"talent" was everyone at id Software at the time, and me. I generally do
not suggest that local talent be used, but it worked well in this project.
The only software available at the time to move the effects from the sampler
to the computer was Sample Vision by Turtle Beach. Because it did not have
any digital effects editing capabilities, I used several outboard processors
on some of the final effects. Thank goodness we don't have to do that sort
of thing any longer.
Hexxagon and Argo Checkers
What do you do for sound effects in board games? I had the fun of discovering
just that when I worked on these two games. I went to the toy store and
the party supply store to buy every kind of noisemaker I could find -- whistles,
clickers, horns, and other leftovers from Halloween. They were each recorded
with as many variations as possible so there would be a "palette" of sounds
from which to choose. One of the game pieces had to spin and fall to the
playing surface. To make this sound, a large coin was spun on a coffee table.
The final effect was a combination of a slide whistle and the coin spin.
Wave for Windows was available at the time I was doing these sounds, and
it made mixing different sounds easy. It also had the ability to change
the timing of an effect without affecting its pitch. This came in handy
for getting the sounds to sync with the animations.
Early in the development of Doom, Tom Hall created what was called
the Doom Bible. It gave a lot of background information regarding the demons
in the game. It provided the information I needed to create raw material.
The game changed considerably between that time and the final version, but
the raw material was still appropriate when it came time to complete the
final sound effects. The raw material consisted of many animal sounds, explosions,
weapon sounds, etc. To fatten these sounds up, I used my own voice. The
idea was to make a similar sound on the same pitch and mix it in at a lower
amplitude. The final sounds were completed in the last month of development.
The sound driver for Doom was a major change from Wolf 3D in that
more than one digital sound could be played at once. This made it easier
to set the priority of play but it made it more difficult to get the relative
volumes correct. There were several classes of sounds in Doom. One
was general active sounds that were not attached to any one demon. These
were more or less ambient sounds, but they didn't play until demons close
to the player "woke up" [usually based upon the player making some noise
in the area]. Then there were demon active sounds that were attached to
individual demons. These sounds let the player know what class of demon
was around the corner. Each type of demon had a sight sound that played
when the demon "saw" the player. There were also attack, hurt and death
sounds particular to each type of demon. Another helpful thing about the
sound driver was that the volume of sounds depended upon the distance from
the player to the source of the sound. This helped keep the overall volume
down during non-combat. It also helped scare the pants off the player
when a demon in a dark niche woke up and immediately screamed his attack sound.
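Distance-based volume can be sketched in a few lines of Python. This is only an illustration and not the formula the Doom sound driver used: the linear falloff, the two radii and the function name are all assumptions.

```python
import math


def volume_for_distance(source, listener, full_volume_radius=100.0, max_radius=1000.0):
    """Return a volume from 0.0 to 1.0 that falls off linearly with distance."""
    distance = math.dist(source, listener)
    if distance <= full_volume_radius:
        return 1.0   # close enough to play at full volume
    if distance >= max_radius:
        return 0.0   # too far away to hear at all
    return (max_radius - distance) / (max_radius - full_volume_radius)


player_pos = (0.0, 0.0)
print(volume_for_distance((0.0, 50.0), player_pos))    # 1.0
print(volume_for_distance((0.0, 550.0), player_pos))   # 0.5
print(volume_for_distance((0.0, 2000.0), player_pos))  # 0.0
```

A demon waking up right next to the player lands in the first case, which is why its scream arrives at full volume.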
I was at id Software for the month that the sound effects were finalized.
As I finished sounds, John Romero would plug them into the game so we could
test them. John was the evil voice at the end of Doom II. One evening,
he was playing the game with clipping off so he could walk through walls.
Something in the final level caught his attention. It was his head on a
stake! The artists had digitized John for rough artwork and as a joke put
him in the game. Since he has a great sense of humor and a good (wild) imagination,
John decided to put his own joke in the game. We recorded him saying "In
order to win the game, you must kill me, John Romero." After that, I put
heavy flanging and echo on his voice and then reversed the whole thing.
It was fun to wait for the artists to discover this joke. It was quite a
while before we told them what was being said.
The software that I depended upon most during the development of the Doom
sound effects was Wave for Windows. The most helpful thing to come along
between the two Dooms was a shareware program called Cool Edit. It
added significantly to the arsenal of digital effects and made it easy to
mix sounds from different files. I depended upon it heavily, and it is the
reason that my outboard effects equipment has since gathered a good bit
of dust.
Because so many of the sounds from Doom were familiar, it was decided
to keep them in Doom II. Several additional sounds were required, though.
The Archvile is an evil healer. Anyone getting in his way is blasted with
fire and disintegrated. This includes other demons. But, after he has wrought
his destruction, he then goes around and reanimates all of the demons. Because
of this interesting dual personality, I decided to give him a very evil
laugh as an active sound. For his death sound, I recorded a young girl saying
"why," pitch shifted it down and mixed it with other sounds. The Archvile
just doesn't understand why anyone would want to kill him as he sees himself
as only doing good for his fellow demon.
Duke Nukem 3D
Another great piece of software came to my attention after Cool Edit. Rob
Wallace had been praising Sound Forge since he first began using it. Because
I was comfortable with Cool Edit, I did not consider using Sound Forge.
In looking for new tools to complete the sound effects for the latest in
the Duke Nukem series, I tried it. What a beautiful program! It is
much faster than Cool Edit for most operations. The one thing that really
made it handy was a pop and click removal tool. I had some very old analog
sound effects that would be excellent for use in Duke, but they included
a loud hum and some crackling every once in a while. The hum was easily
removed using both Cool Edit and Sound Forge. The crackles were readily
removed with Sound Forge.
The voice of Duke was recorded in California on DAT (Digital Audio Tape)
and sent to me. I then played the tape into a digital i/o board using either
Sound Forge or Cool Edit. The result was a wave file that I could edit to
my heart's content and send on to 3D Realms for a decision as to what was
to be used in the game.
How much space is required to store the different types of samples?
Audio CDs have stereo samples at 44.1 kHz, 16 bit resolution. That resolution
requires 2 bytes per sample. At 44,100 samples per second, each taking 2
bytes (16 bits) of precision, we need 88,200 bytes per second for storage.
Since stereo is two tracks we have to double that: 176,400 bytes per second.
Take that times 60 seconds per minute and you get 10,584,000 bytes per minute
-- one minute of stereo 44.1k 16 bit sound requires a little over 10 megabytes
of storage (uncompressed).
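The same arithmetic can be wrapped in a small Python function so that other combinations are easy to try. The function name is invented for this sketch, and the two lower-quality settings are illustrative choices rather than values taken from this paper.

```python
def bytes_per_minute(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed storage needed for one minute of digital sound."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


print(bytes_per_minute(44_100, 16, 2))  # 10584000 (CD quality stereo)
print(bytes_per_minute(22_050, 8, 1))   # 1323000
print(bytes_per_minute(11_025, 8, 1))   # 661500
```

Halving the sampling rate, halving the resolution, or dropping from stereo to mono each cuts the storage in half.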
Requirements for One Minute of Sound
Figure 1 indicates the different storage requirements for different
types, sample rates and resolutions of one minute of digital sound. The
decision one makes regarding these variables should be based upon the
balancing of sound quality and storage requirement. If the sound effects
we want to use have a large dynamic range (amplitudes from very low to
very high), we would probably want to use 16 bit resolution. If all sounds
are going to be about the same volume, 8 bit should suffice.
The Psychology Of Sound
We are assailed by sound constantly. It is sometimes enough to completely
disorient us. Do you turn the car radio down when you are trying to concentrate
on street names or when you stop to look at a map? Sounds can be very
distracting. This is something we have to keep in mind before we start
throwing myriad sfx into our projects. Our brains know how distracting
sound can be. To handle the distraction, the brain focuses attention on
the more important sounds. It blocks us from consciously hearing competing
sounds. In a computer game our mind can get very confused. The brain gets
befuddled and doesn't know what sounds to focus on. This is where proper
game sound design comes in. As in the movies, we decide for the
game player's brain what it is to focus on by making one sound predominant
at a time. Careful planning will ensure that this predominant sound is
the most important one for the game player to hear.
What sounds do we hear each day? There are the sounds we remember well:
voices, sirens, explosions, favorite songs, and the like. There are also
the sounds that were there but never caught our attention: a breeze rustling
leaves, an airplane far off in the distance, general traffic noise, the
sounds of animals, the sound of our computer fans, children playing, elevator
music, someone making a presentation during a boring meeting, the sound
of our boss droning on and on ;), etc. We generally do not notice these
sounds, that is, until they stop. That we would notice quickly. Complete
silence is neither normal nor realistic.
This brings us to two important aspects of sound in computer games. Consider
both of these as generalities. First, we should have one predominant sound.
That sound can be one single sound or a cacophony of other sounds (multiple
explosions, screams, weapons fire, etc.) -- either one serves to focus
the player's attention. Second, we want to have the normal background
sounds of life going on in our games. Normal here means normal for the
environment we have created within the game. This is ambient sound that
is not ordinarily used to focus attention -- just to make the gaming experience
Ambient sounds can be divided into two categories: the sounds that are
constant in an environment and the sounds that occur on a random basis.
Listening to my environment at this very moment (02/10/96 14:26), I hear:
two computer fans on different pitches (constant), insects (constant--I
live in Florida), a powerboat passing by (random), a small airplane overhead
(random), a lawn mower far in the distance (random), my wife working in
the other room (random, but she says "constant"), my typing on the computer
keyboard (random), and a car passing by very slowly (random -- I said
that I live in Florida!). There is no one sound that my brain is
having to focus on, thankfully, as I need to concentrate on what I am
typing. Until I consciously paid attention to the sounds around me, I
did not even know they existed.
If I were immersed in a computer game and these same things were going
on, I'd not know what to focus on based upon sound cues alone. This is
not the best situation. It breaks the general rule of having one sound
keep the player focused on the game. This does not mean that the focus
sound must be invasive, intrusive, loud, or attention-getting in any other
obvious way. We could focus attention by playing random sounds more often than
in real life (maybe to keep a fear factor alive, or in a children's game
to make the child wonder what the heck is going on in the distance and
how to get to that action). We could also focus attention by bringing
up the volume of a constant sound (remember how in movie swamp scenes
the insect noises get very loud to let us know the terrible circumstances
our hero is in?). Another "movie" method of focusing a viewer's attention
works well in computer games too. You could have things get very, very,
very quiet and then BLAM something exciting happens. Whatever method is
used, remember that we are trying to do the sound work the brain handles
in everyday life -- we are trying to focus attention. Of course, there
are computer games where sound should always be ambient and never draw
attention to itself or anything else. Examples would be board games like
chess and checkers.
Questions to ask when deciding what sounds to use in a project:
1. What does the subject matter of the project suggest in the way of sound?
Come up with some adjectives that describe the project and look for sound
effects that bring these adjectives to mind.
2. How much space is available for digital sound? If space is not limited,
think in terms of high quality, but remember that lower sampling rates
can be used to make effects sound more gritty, distant or muffled. Sometimes
that is desirable. Don't be afraid to mix effects at different sampling
rates and resolutions if your sound drivers support them.
3. How many sounds can be played at one time (based upon sound driver
or hardware limitations)? If the number is limited, you will have to decide
which effects will receive priority of play.
4. What is the dynamic range of all of the sounds to be used? If it is
great, use 16 bit resolution. Otherwise, 8 bit resolution will suffice.
5. What is the range of the sampling rates to be used? Using higher sampling
rates with 8 bit resolution can increase the apparent volume of background
hiss. Since 8 bit sounds have more hiss and since hiss is generally higher
pitched, a higher sampling rate will increase the volume of those higher
pitched sounds (hissing included). Don't be afraid to experiment with
changing the resolution of noisy sounds to see if the noise can be reduced.
6. Is there going to be intelligible speech? In general, it requires a
higher sampling rate. A female voice will usually require a higher sampling
rate than a male voice. For "full bodied" speech, use 16 bit resolution.
7. Are the effects or voice-overs to be recorded professionally or are
they to be recorded with less than professional equipment? Using less
than professional equipment usually means more noise.
8. Are the sound effects going to be processed? If so, what types of processing
will be used?
9. What are the relative volumes of the different digital sounds (music/voice-over/Foley)?
Decide which sounds should have priority in volume.
10. Where do you want sound to emphasize the action in your project? Remember
that your goal is to focus the listener's attention.
11. Are ambient sounds needed? Do you ever want complete silence? Remember
that ambient sounds can mask noise.
After answering these questions, where does one start in deciding which
sfx to use in a project?
A good place to start without having to reinvent the wheel is the movies.
State of the art movie sound is years ahead of computer games, so it would
pay to take lessons from Hollywood in this respect. Of course we must
keep in mind the linearity of movies as opposed to the "random access"
of a computer game. In looking to Hollywood for lessons, I decided to
make a list of all of the movies that have won Academy Awards for sound
and/or sound effects. The list in Appendix
A is the result. I have watched and listened to many of these
movies and it is fascinating to see how Hollywood of years ago had many
of the same technical problems that we in computer gaming face today.
Early Hollywood had a major advantage -- they controlled the theaters
and the equipment contained in them. Another advantage that Hollywood
has always had with sound is that they control the whole movie experience
and know what is going to happen next. We can do that with cinematics,
but it is a challenge in truly interactive gameplay.
What do award winning sound effects in movies have in common?
1. They focus the viewer's attention.
2. They are bigger than life.
3. The sound effects and the music work together to focus the viewer's
attention.
4. There is rarely complete silence. Some background (ambient) sound is
going on almost all of the time. Otherwise, the viewer will be distracted
by some sounds outside of the movie. How many times in a completely silent
part of a movie have you been annoyed by someone talking? It is very annoying
because it causes a loss of focus. This is not to say that silence cannot
be used in a computer game. But remember that there is always the drone
of the cooling fan(s) and the buzz of a sound card.
5. They do not "get in the face" of the dialog.
6. They do not "get in the face" of one another. Usually one effect takes
precedence over all of the others.
7. They prepare us for what is to come.
8. They set us up for what is to come.
9. They distract us from what is to come.
10. They take the place of the senses that we cannot experience in a movie
(touch, taste, and smell).
- We know that the cook has touched something hot when we hear his anguished cry
of pain along with the sizzle of flesh. We "feel" the pain with him.
- We can smell the roses along with the beautiful princess when we hear her take
a deep breath while a dainty, sparkling sound plays.
- We can taste the bitter poison along with the murder victim as we hear him
gag and froth at the mouth while a discordant sound effect plays.
11. They help place the listener in another "reality."
General rules for better sound effects:
1. Start with the absolute best raw materials -- samples/recordings/actors/sounds/etc.
2. Start with the highest quality digital data. Record sound effects into
a portable Digital Audio Tape (DAT) recorder at 44.1k 16 bit. This leaves
nothing out of the recording and there is no tape hiss like you get on
an analog tape machine. Make sure that you record the hottest signal
you can without pegging the record meter. This will keep the "intelligent
data" at a high enough amplitude to cover up much of the noise present.
3. Use a high quality stereo microphone for Foley effects. Use a superior
quality mono microphone for speech.
4. With the possible exception of compression on vocals, do not use outboard
effects during recording. What is compression? It is a reduction in the
dynamic range of a sound. It can help to hide noise in a sample. Digital
effects, including compression, can be added via software.
5. Use a digital sound interface card to transfer the data from the DAT
to computer via fiber optic or coaxial cables. All wave editing software
will make this a simple matter.
6. Edit the sound file before downsampling or converting to 8 bit format.
This keeps the noise out of the sample for as long as possible. It is
acceptable sometimes to convert a file before applying some type of digital
processing. The results can be interesting when the noise is used to make
a sound less realistic.
7. The experienced sound designer does not take sound for granted and
realizes that the sounds we hear every day are pretty wimpy. There is
little expectation of getting usable sound effects by recording "real"
sounds. Instead, record similar, but greatly exaggerated, sounds.
8. If you cannot help recording extraneous background noises, make sure
to get a good sample of them for noise reduction when you get back to
the digital editing software. Also, there will often be noises that you
fail to hear at the time of the recording because you are concentrating
so hard on getting the sound effect. You will want a recording of these
by themselves also.
9. But, never depend on noise reduction algorithms/equipment if at all
possible.
10. There is often some type of background noise that will become objectionable
when a 16 bit file is reduced to an 8 bit file.
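Rule 4 above describes compression as a reduction in the dynamic range of a sound. A minimal Python sketch of that idea follows. It works on one sample at a time, which is a simplification: real compressors follow a smoothed signal level with attack and release times. The function name, the parameters and the sample values (on an assumed -1.0 to 1.0 scale) are illustrative only.

```python
def compress(samples, threshold=0.5, ratio=4.0, makeup_gain=1.0):
    """Reduce dynamic range: levels above `threshold` rise only 1/ratio as fast."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Hold down the loud part of the signal.
            level = threshold + (level - threshold) / ratio
        sign = 1.0 if s >= 0 else -1.0
        # Makeup gain then brings everything, quiet parts included, back up.
        out.append(sign * level * makeup_gain)
    return out


quiet_and_loud = [0.1, -0.2, 0.9, -1.0]
squeezed = compress(quiet_and_loud, threshold=0.5, ratio=4.0, makeup_gain=1.6)
print([round(s, 2) for s in squeezed])  # [0.16, -0.32, 0.96, -1.0]
```

The loudest sample still peaks at full scale, while the quiet samples are 1.6 times louder than before. That is the "meatier" sound, and it is also why low-level noise gets harder to notice next to the program material.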
And Speaking Of Noise --
What is noise? Instead of answering this subjective question, let's ask
one that can be answered more objectively. What is silence? A scientific
definition is that silence is the absence of rapid changes in air pressure.
Gordon Hempton, an Emmy-winning recordist, has searched the world over
for quiet. He defines a quiet place as "a place where, for a period of
time, there is no human intrusion. No chain saws. No trail bikes. No distant
trucks. A quiet place is a place where we're able to hear the world as
our ancestors heard it." Hempton feels lucky if he can record twenty minutes
of noise-free sound, patched together from a week's worth of work. As
a listener and a sound recordist, Hempton says that he has no control
over the performance [noise included], but he does have control over where
the audience will sit. "So that's exactly what I do -- I find the best
seats possible in nature's amphitheater."
Like it or not, even though you may prepare very carefully you will still
face the problem of noisy raw materials for your sound effects. Because
this is a universal problem, it would be good to discuss handling the
problem. Let's say that you have done everything to reduce noise when
recording the raw material but when you get down to using it you hear
noise. The problem gets worse when you reduce the sample resolution to
8 bits. So, what can you do? First of all, as was said above, always record
a few seconds of live mike with nothing but ambient noise. It is best
to start the recorder, let it run a few seconds, record the sound effect,
and then let the tape run for a few more seconds of ambiance -- all of
this being one recording without turning the recorder off. These seconds
of "quiet" will be very useful as you will see. Remember that it is very
important to keep your recording volume maximized. When you load the digital
waveform into your digital editor, find a selection in the sample where
there is only "quiet" (noise). Since you will always record a few seconds
of ambiance, this is no problem. Next, have the editing software analyze
the selection. Then use the noise reduction algorithm on the total sample.
You will probably have to work at this a while to get the feel for the
proper settings, but the time you spend doing so will be well spent. Cool
Edit and Sound Forge have excellent noise reduction algorithms.
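The workflow above (measure the "quiet" part, then treat the whole sample) can be shown with a much cruder stand-in technique: a simple noise gate. To be clear, this is not the noise reduction algorithm in Cool Edit or Sound Forge. The function names, the margin and the sample values below are assumptions made for this sketch.

```python
def noise_floor(quiet_samples):
    """Peak absolute level found in a stretch that contains only background noise."""
    return max(abs(s) for s in quiet_samples)


def gate(samples, floor, margin=1.5):
    """Silence every sample whose level is at or below floor * margin."""
    limit = floor * margin
    return [s if abs(s) > limit else 0.0 for s in samples]


# A few "quiet" samples, then the effect, then more "quiet" samples.
recording = [0.01, -0.02, 0.015, 0.6, -0.8, 0.5, 0.02, -0.01]
floor = noise_floor(recording[:3])   # analyze the leading ambiance only
cleaned = gate(recording, floor)
print(cleaned)  # [0.0, 0.0, 0.0, 0.6, -0.8, 0.5, 0.0, 0.0]
```

A gate like this only cleans up the gaps and leaves the noise underneath the effect untouched. It also shows why the recording level matters: the hotter the effect is recorded, the further it sits above the measured floor.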
What digital editing software is available?
A non-exhaustive list of software is in Appendix
Where can one purchase "raw sound effects" that have already been recorded?
Most game development companies license a sound effect library on CD.
The library is generally purchased as a buyout, meaning that no other
fees have to be paid to use the effects in a product. This license is
good for the life of the CD's, and you get to use the effects as many
times as you wish for as long as you own the CD's. There are differences
in the sound quality of many of the libraries -- some use true man-made
sound effects while others depend upon electronic sounds. The important
things to look for in a library are variety of sounds, quality of sounds,
and (probably most importantly) a complete index with cross references.
If you cannot find the sounds on a CD set without resorting to browsing,
you are no better off than if you had no CD's at all. The ideal situation
is to have the index and cross reference in a database that will allow
you to search for key words. Track names are too limited in most cases,
so the ability to search the descriptions of the effects on a CD becomes
a requirement if you want to hear everything that resembles what you are
looking for. Additionally, a database would allow you to add comments
so that you can make specific notes about each effect.
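Even a very small program can provide this kind of keyword search. The Python sketch below is only an illustration: the record layout, the function name and the three library entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    disc: str
    track: int
    description: str
    comments: str = ""   # your own notes about the effect


def search(library, *keywords):
    """Return effects whose description or comments contain every keyword."""
    wanted = [k.lower() for k in keywords]
    return [e for e in library
            if all(k in (e.description + " " + e.comments).lower() for k in wanted)]


library = [
    Effect("Disc 1", 4, "Heavy wooden door slams shut"),
    Effect("Disc 1", 9, "Metal gate creaks open", comments="good for dungeon door"),
    Effect("Disc 3", 2, "Thunder, distant rumble"),
]
hits = search(library, "door")
print([(e.disc, e.track) for e in hits])  # [('Disc 1', 4), ('Disc 1', 9)]
```

The second hit is found only because of the added comment, which is the argument for keeping your own notes alongside the publisher's descriptions.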
Many companies in the sound effect CD business offer a demo CD. This should
be your first step in researching what is available. Upon purchase of
a library, most companies will offer a 30-day satisfaction guarantee
or your money back. Ask about this if the offer is not advertised. Appendix
C lists several companies offering buyout effects CD's.
Why not buy the sound effects CD's offered in audio CD stores? You have
to watch out regarding the licensing agreement on these. Some state on
the cover that they are "royalty free," but the fine print states "royalty
free for personal use." Read the agreement carefully.
Who can design good sound effects?
Almost anyone. Some of the common characteristics of designers
of good sound effects: ability to hear the "sound effect" potential in
sounds that others overlook; above normal ability to hear pitch; ability
to visualize what would happen to a sound if it is altered by a sound
editor without actually having to perform the alteration; ability to visualize
a sound from a description of the source of the sound; ability to create
a sound that is only heard in one's head to begin with; knowledge of proper
recording technique and equipment; fondness for gadgets; patience; good
sense of humor; good looks ;).