import numpy as np # This is the main numerical library we will use
import matplotlib.pyplot as plt # This is the main plotting library we will use
import scipy.io.wavfile as wavfile # We will use this library to load in audio
import IPython.display as ipd # This is a library that allows us to play audio samples in Jupyter
The first thing we did as a class is load in audio from a .wav file of someone saying "Happy Birthday." Audio is often sampled at 44100 samples per second, and we see indeed that fs=44100. As we discussed, we need a sampling frequency that's at least twice the highest frequency we want to represent. Since the highest frequency humans can hear is around 20,000hz, 44100hz is adequate. (Another fun fact about this number is it is $2^2 \times 3^2 \times 5^2 \times 7^2$)
In this particular "happy birthday" example, the audio is about 3 seconds long, so we see when we plot the array x that it has about 120,000 samples. Looking at the plot of the audio waveform, we also see that "birth" occurs between about 54000 and 70000 samples. So when we slice that out, we only get "birth."
# Load in the audio with wavfile.read. It gives us both
# the sample rate (fs) and the audio samples as a numpy array (x)
fs, x = wavfile.read("happybirthday.wav")
plt.subplot(2, 1, 1)
plt.plot(x)
plt.title("Full Audio Waveform of \"Happy Birthday\"")
plt.subplot(2, 1, 2)
y = x[54000:70000]
plt.plot(y)
plt.title("\"Birth\" Only")
plt.tight_layout()
ipd.Audio(x, rate=fs)
#print("The sample rates is %i samples/second"%fs)
Next, we saw what happens when we change the rate at which we play back the audio. If we make the playback rate less than the sample rate, then the audio comes out sounding very low pitched and lethargic. Conversely, if we play the audio back at a faster rate, it sounds very quick and high-pitched, like a "chipmunk voice."
## Play twice as fast
ipd.Audio(x, rate=int(fs/2))
We also saw that if we do a different type of slicing and only take one every 12 samples, and play it back 12 times slower, we do get the same sound, but something strange has happened to it called "aliasing." This animation by one of my collaborators explains this concept very well. Basically, if we don't take enough samples to represent a high frequency, it comes back sounding like a lower frequency
Having some fun with #matplotlib animation to illustrate aliasing pic.twitter.com/8LM2XOSThe
— brianmcfee (@functiontelechy) December 24, 2019
# "Decimate" in time by a factor of 12 (take 1 out of every 12 samples) to demo aliasing
fac = 12 ## Downsample factor
y = x[0::fac]
plt.plot(y)
ipd.Audio(y, rate=fs/fac)
We can also do the reverse slicing operation to play the audio backwards. One thing I'd recommend trying is to reverse a sound, then record yourself matching that as well as you can. Then, reverse your sound and see if it sounds like the original
y = x[::-1] #Reverse audio
ipd.Audio(y, rate=fs)
We now did something similar to HW1a generating a sinusoid, but we chose the number of samples based on the sample rate of audio, and we chose the frequency to be 440hz, or the pitch of a "concert A." We then applied the formula
$ f = 440 \times 2^{(h/12)} $
to compute the fundamental frequency $f$ of a note that is $h$ halfsteps away from concert A. When we play back the arrays we generate as an audio waveform, we can hear the different pitches as "pure tone" notes. The current example shows $h=2$, which is a B two halfsteps above A.
# TODO: Change to use formula for halfsteps
h = 0 # Number of halfsteps away from concert A
freq = 660*(2**(h/12))
print(freq)
fs = 44100 # 44100 samples per second
seconds = 1 # Note duration of 1 seconds
# Generate array with seconds*sample_rate steps, ranging between 0 and seconds
t = np.linspace(0, seconds, seconds * fs)
# Generate a 440 Hz sine wave
y = np.sin(2*np.pi*freq*t)
plt.figure(figsize=(8, 4))
plt.plot(t, y)
plt.xlim([0, 0.05]) # Show the first 50 milliseconds
plt.xlabel("Seconds")
plt.ylabel("Value")
plt.title("660hz Frequency")
plt.savefig("660.svg", bbox_inches='tight')
ipd.Audio(y, rate=fs)
wavfile.write("660.wav", fs, y)
660.0
We then did an exercise together in which we made a 3 second audio clip, with the first second containing a 440hz A, the 2nd second containing a C#, which is 4 halfsteps above A, and the third second containing an E, which is 7 halfsteps above an A. Since there are $fs$ samples in a second, we use expressions in terms of $fs$ to mark the beginning and end of second intervals
## Class exercise: Play the major triad notes A, C#, E in a sequence
fs = 44100
y = np.zeros(fs*3)
t = np.linspace(0, 1, fs)
fA = 440
fCSharp = 440*(2**(4/12)) # A C# is 4 halfsteps above A
fE = 440*(2**(7/12)) # An E is 7 halfsteps above A
y[0:fs] = np.sin(2*np.pi*fA*t)
y[fs:fs*2] = np.sin(2*np.pi*fCSharp*t)
y[fs*2:fs*3] = np.sin(2*np.pi*fE*t)
ipd.Audio(y, rate=fs)
We can also add these three notes on top of each other to create a "major chord."
fs = 44100
t = np.linspace(0, 1, fs)
y = np.zeros(fs)
y = np.sin(2*np.pi*fA*t)
y = y + np.sin(2*np.pi*fCSharp*t)
y = y + np.sin(2*np.pi*fE*t)
ipd.Audio(y, rate=fs)
It's getting a little old to have to write the same kind of code over and over again though. Instead, we should make one "block" that we can call over and over again. This is called a method (more details here). Below, we create a method that creates a sinusoid corresponding to a particular note at a particular frequency, for a particular duration. You will have to something similar on the homework.
Note: There's a "docstring" at the top of this method, which is an extended comment (written as a "multiline string"), which gives information about the input and output of this method. This is a standard way for people who write libraries to give people who are using them more information about how to use them.
def make_sinusoid(note, fs, duration):
"""
Parameters
----------
note: int
The number of halfsteps away from concert A
fs: int
The sample rate
duration: float
The number of seconds elapsed
Returns
-------
ndarray(N)
An array of samples from an appropriate sinusoid
"""
## Step 1: Setup time samples
t = np.linspace(0, duration, int(fs*duration))
## Step 2: Figure out appropriate frequency
f = 440*(2**(note/12))
## Step 3: Make the array with the appropriate sine
y = np.sin(2*np.pi*f*t)
return y
We'll now use this method to compose a little ditty consisting of two A notes, a C#, and two E notes, each 0.5 seconds long with a space of 0.1 seconds before them. Note how we have to "cast" the expression fs*0.1
to an integer value before we create an array of all zeros of that length. Also, the concatenate
command, which puts all of the notes together one after another, is a bit strange, since it requires two sets of parenthesese. But this is because it's more general than how we're using it (no need to worry about this too much right now)
fs = 44100
xAHalf = make_sinusoid(0, fs, 0.5) # A that's a half a second long
xCSHalf = make_sinusoid(4, fs, 0.5) # C# that's a half a second long
xEHalf = make_sinusoid(7, fs, 0.5)
xSpace = np.zeros(int(fs*0.1)) ## Put a tenth of a second space
x = np.concatenate((xAHalf, xSpace, xAHalf, xSpace, xCSHalf, xSpace, xEHalf, xSpace, xEHalf))
ipd.Audio(x, rate=fs)
x = np.concatenate((np.array([0, 1, 2]), np.array([3, 4, 5])))
print(x)
[0 1 2 3 4 5]
## Just a quick demo of what int() does to a decimal number,
# based on a question in class. It always rounds down!
int(5.7)
5
Now, we make another kind of periodic waveform known as a "sawtooth wave," which we can get by starting with a bunch of samples in sequence from arange
, and replacing each sample with the remainder after division by 100. Amazingly, this also gives a pitched soundn which sounds like a "concert A," but this sound is a bit "richer" than the pure sinusoid we saw.
In fact, this yields a periodic sound with period 100 samples, and at a sample rate of 44100 cycles / second, this has a frequency of 441hz (which is very close to the 440hz concert A we've been listening to). We saw im class that when we decrease the number by which we're dividing to get the remaider, the period decreases (the waves get closer together), and the pitch goes up. Conversely, when the number goes up and the period goes up, the frequency also goes down.
np.mod(np.arange(50), 5)
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
# Show sawtooth wave
fs = 44100
t = np.arange(fs)
x = np.mod(t, 100)
plt.plot(x[0:1000])
ipd.Audio(x, rate=fs)
We now start to use the fact that multiple sounds can play directly on top of each other to simulate echoes. We start by manually slicing out part of the "happy birthday" sound and playing it back slightly later. This leads to the echo effect
fs, x = wavfile.read("happybirthday.wav")
ipd.Audio(x, rate=fs)
y = np.array(x) # Copy over x
delay = int(fs/4)
y[delay::] = y[delay::] + y[0:-delay]
ipd.Audio(y, rate=fs)
However, it can get quite tedious to add more echoes if we have to keep doing these slices. Instead, there's an operation called convolution which can accomplish this for us. Basically, convolution is an echo simulator. Given an array x that contains a sound and an array h which contains how much echo is coming back at any particular time, the convolution of x and h will properly superimpose the echoes of x according to h. The animation below shows this process
So let's now create an echo signal h that has a pulse every 6000 samples, and we will "convolve" (take the convolution) of that with x. In the result, we will now hear many echoes of "happy birthday" on top of each other. In the code below, try messing with the slice when constructing h to try different numbers and spacings of the pulses
h = np.zeros(fs)
## Impulse response that we create by
## setting every 6000th sample to 1
h[0::4000] = 1
plt.plot(h)
plt.title("h (Echo signal)")
y = np.convolve(x, h)
ipd.Audio(y, rate=fs)
There's an interesting minor variation we can do of this if we convolve with a signal that has ones everywhere. This can be viewed as a kind of averaging over time, and it has the effect of a "lowpass filter," or deadening the sound a bit.
## Show basic lowpass filter
h = np.ones(4000)
y = np.convolve(x, h, 'valid')
ipd.Audio(y, rate=fs)
The h we made above with a periodic "pulse train" with equal lengths between pulses is a case of what's known of as a "comb filter" (since the echo signal looks like a comb), and it has some remarkable properties. To get an idea of its capabilities, we start with an array in which every element is drawn from a Gaussian random distribution, which sounds just like "static"
fs = 44100
x = np.random.randn(fs*3)
plt.plot(x[0:1000])
ipd.Audio(x, rate=fs)
fs = 44100
x = np.random.randn(fs*3)
h = np.ones(1000)
y = np.convolve(x, h)
plt.plot(y[0:1000])
ipd.Audio(y, rate=fs)
Amazingly, when we then take this completely random signal and use convolution to simulate its echoes taking place every 100 samples, we get our concert A again! And the more echoes we add, the clearer the A sound becomes (you can change this by changing the length of h, which will influence how many evenly spaced echoes with 100 samples in between them can be placed there). Even when plotting 500 samples pulled out of the signal, the periodicity becomes visible. So comb filters are able to "shape noise" into a particular pitch that sounds quite ritch compared to a pure sinusoid
fs = 44100
x = np.random.randn(fs)
h = np.zeros(fs)
h[0::75] = 1 ## Impulse response
y = np.convolve(x, h)
plt.subplot(2, 1, 1)
plt.plot(h[0:1000])
plt.subplot(2, 1, 2)
plt.plot(y[fs:fs+500])
ipd.Audio(y, rate=fs)
We're now going to use a tool that shows a different view of our audio by showing us the frequency content of the audio. We will generate plots that show us how strong different sinusoids are in our signal. This is also known as a "Fourier Magnitude" plot. Run the block below to initialize a function which does this
def plot_fourier_mag(x, fs):
"""
Given audio samples and the sample rate, plot
the magnitude of the Fourier transform of x with
the appropriate frequency labels
Parameters
----------
x: ndarray(N)
The audio samples
fs: int
The sample rate in hz
"""
xft = np.abs(np.fft.fft(x))
freqs = np.fft.fftfreq(len(x), 1/fs)
plt.plot(freqs[freqs > 0], xft[freqs > 0])
plt.xlabel("Frequency")
plt.ylabel("Magnitude")
return xft
Now, we add three sinusoids together to create a major triad chord, and we see that there are three peaks in the Fourier magnitude plot corresponding to these three frequencies
## TODO: Add another frequency,
t = np.arange(fs)
freq = 440
fs = 44100
seconds = 1
freq2 = 440*2**(7/12)
freq3 = 440*2**(4/12)
# Generate array with seconds*sample_rate steps, ranging between 0 and seconds
t = np.linspace(0, seconds, seconds * fs, False)
x = np.sin(2*np.pi*freq*t)
x = x + np.sin(2*np.pi*freq2*t)
x = x + np.sin(2*np.pi*freq3*t)
# Take the fourier transform of x
plot_fourier_mag(x, fs)
plt.xlim([0, 2000])
ipd.Audio(x, rate=fs)
It's a lot clearer to see waht's going on in this plot than if we try to look at a snippet of the raw audio from the same signal
plt.plot(x[0:2000])
[<matplotlib.lines.Line2D at 0x7fe6047b00a0>]
Let's now generate a sawtooth wave of period 100 and look at its Fourier Transform magnitude plot.
fs = 44100
T = 100
x = np.arange(fs)
x = np.mod(x, T)
plt.subplot(2, 1, 1)
plt.plot(x[0:1000])
plt.title("Raw signal")
plt.subplot(2, 1, 2)
plot_fourier_mag(x, fs)
plt.xlim([0, 8000])
plt.title("Fourier Transform")
plt.tight_layout()
ipd.Audio(x, rate=fs)
First, we note that its fundamental frequency is 441 cycles/second. This is because
So we see a peak there in the frequency magnitude plot. However, we also see peaks at every integer multiple of 441! So, for example, we see peaks at 882, 1323, and 1764. Interestingly, the amplitude of these peaks falls of as $1/i$, where $i$ is the multiplicative factor in front of the frequency. So, for instance, the 1323hz sinusoid has a factor of $1/3$ in front of it. Furthermore, this is not visible in the magnitude plot, but the coefficients alternate in sign. So the the coefficient in front of the 440hz sinusoid is +1, the coefficient in front of the 880hz sinusoid is -1/2, the coefficient in front of the 1323hz is 1/3, and so on. This is referred to as an alternating harmonic sequence. Let's write some code below to add the first 7 sinusoids together, following this pattern
## TODO: Look at sinusoidal approximation of sawtooth wave
fs = 44100
seconds = 1
freq = 440
t = np.linspace(0, seconds, seconds * fs, False)
y = -np.sin(2*np.pi*freq*t)
y = y - (1.0/2.0)*np.sin(2*np.pi*2*freq*t)
y = y - (1.0/3.0)*np.sin(2*np.pi*3*freq*t)
y = y - (1.0/4.0)*np.sin(2*np.pi*4*freq*t)
y = y - (1.0/5.0)*np.sin(2*np.pi*5*freq*t)
y = y - (1.0/6.0)*np.sin(2*np.pi*6*freq*t)
y = y - (1.0/7.0)*np.sin(2*np.pi*7*freq*t)
plt.subplot(211)
plt.plot(y[0:1000])
plt.title("Sawtooth approximation of sinusoid")
plt.subplot(212)
plot_fourier_mag(y, fs)
plt.xlim([0, 8000])
plt.title("Discrete Fourier Transform Magnitude")
plt.tight_layout()
ipd.Audio(y, rate=fs)
What we see is a "wiggly" approximation of the sawtooth wave we started with. It sounds a lot like the sawtooth wave, except it is "muffled" a bit, since it's missing the higher harmonics.
The code above is quite repetitive, though, so we now give a sneak preview of a "loop," which is a programming paradigm that we will talk about more in the next unit. The loop allows us to do the same thing over and over again with minor alterations. In particular, we can add in 50 harmonics with the appropriate coefficients in an alternative harmonic sequence, with just 4 lines of code. When we do this, we see we get an even closer approximation of the sawtooth wave.
NOTE: Sawtooth waves are a good approximation for what a bowed string does as the string vibrates. Basically, the bow keeps sticking and slipping over and over again as the string vibrates. The slip is the sharp transition you see, and the sticking is the ramp. If you listen carefully, you can convince yourself that it's kind of like a bowed string
## Do the same thing with a for loop
y = np.zeros(fs)
n_harmonics = 6
t = np.linspace(0, 1, fs)
freq = 440
for i in range(1, n_harmonics+1):
coeff = (1.0/i)*(-1)**(i+1)
y = y + coeff*np.sin(2*np.pi*freq*i*t)
plt.subplot(211)
plt.plot(y[0:1000])
plt.title("Sawtooth approximation of sinusoid")
plt.subplot(212)
plot_fourier_mag(y, fs)
plt.xlim([0, 8000])
plt.title("Discrete Fourier Transform Magnitude")
plt.tight_layout()
ipd.Audio(y, rate=fs)
Finally, we show a similar effect on audio of speech by applying a "lowpass filter." First, we note that natural audio has most of its energy in the low frequencies, although there is a bit of energy concentrated around 3000hz.
# look at speech, look at noise
# Show effect of lowpass filter
fs, x = wavfile.read("gameover.wav")
plt.subplot(2, 1, 1)
plt.plot(x)
plt.title("Audio")
plt.subplot(2, 1, 2)
xft = plot_fourier_mag(x, fs)
plt.xlim([0, 8000])
plt.title("Fourier Magnitude")
#plt.yscale("log")
plt.ylim([10, np.max(xft)])
plt.tight_layout()
ipd.Audio(x, rate=fs)
However, if we apply a rudimentary "lowpass filter," by convolving with an array of length 1000 of all 1s, we kill the high frequencies and retain only the low ones. The audio then sounds "muffled," just like the approximation of the sawtooth wave with only a few sinusoids
fs, x = wavfile.read("gameover.wav")
x = np.convolve(np.ones(1000), x)
plt.subplot(2, 1, 1)
plt.plot(x)
plt.title("Audio")
plt.subplot(2, 1, 2)
xft = plot_fourier_mag(x, fs)
plt.xlim([0, 8000])
plt.title("Fourier Magnitude")
#plt.yscale("log")
plt.ylim([10, np.max(xft)])
plt.tight_layout()
ipd.Audio(x, rate=fs)