Learning to program involves learning lots of details. To keep things simple, instructors tend to start with trivial code examples, but these end up being pretty uninteresting. I would like to start this course with an interesting application of computing as a way of motivating you to learn how to write code. I'd like to show that the payoff, from even a little bit of code, can be huge. I don't expect you to understand all the details initially, just the broad strokes. In this first lecture/lab, we're going to leverage existing libraries of code to learn how computers represent music and other audio files.
As we go along, you'll encounter a number of completely new tasks, such as installing software on your computer from the command line. Rather than provide bite-size lectures on specific topics, we'll examine some real applications that require skills and knowledge across topics. It's better to see how all of the pieces fit together, rather than looking at topics in isolation. As you gain more experience, you will look back to these early examples and have an "ah ha!" moment when everything clicks into place for you.
We all play music files on our computers. For example, here are two interesting ones: [initial sequence from Kiss by Prince, Kiss.aiff](sound/Kiss.aiff) and [ahhh sound, ahhh.mp3](sound/ahhh.mp3). You can download those and play them using your music player. But, what if we're building a game or doing speech recognition and we need Python to load sound files and play them? By leveraging libraries of code, which are like cookbooks, we can play audio files with just a few lines of Python code. You'll have an opportunity to try all of this Python code in the [sound lab for this lecture](../labs/sound.md), but for now just try to get the gist of the code and the principles behind digital audio.
To play an audio file in Python, we first have to load that audio file into memory. As we'll see shortly, an audio file is little more than a sequence of numbers. Here's some sample Python that loads in a bit of Prince's Kiss song:
The code begins by `import`ing some necessary code from some useful Python packages. The `sf.read(...)` is the key element that does the loading of the file into memory. After that statement, variable `kiss` has the audio data. The `Audio(kiss,...)` bit is technically Python code, but it is something specific to Jupyter notebooks that lets me play the sound using the browser. It is purely for presentation purposes here. In your lab, you will do something like `sd.play(kiss, ...)` instead.
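The lecture's loading code depends on the `soundfile` package and on the Kiss.aiff file. As a stand-in that runs without either, here is a minimal sketch using only Python's standard `wave` module (a swapped-in technique, not the `sf.read(...)` call described above). It writes a short tone to a scratch file and loads it back, to show that an audio file really is a sequence of numbers plus a sample rate. The names `write_tone`, `read_samples`, and the `tone.wav` scratch file are hypothetical, made up for this illustration.

```python
import math
import os
import struct
import tempfile
import wave

def write_tone(path, freq=440.0, seconds=0.01, rate=44100):
    """Write a short mono 16-bit sine tone to `path` as a WAV file."""
    n = round(seconds * rate)
    samples = [round(16000 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n)]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)        # mono: one number per sample
        w.setsampwidth(2)        # 2 bytes = 16 bits per number
        w.setframerate(rate)     # samples per second
        w.writeframes(struct.pack(f"<{n}h", *samples))
    return samples

def read_samples(path):
    """Load a mono 16-bit WAV file as (list of ints, sample rate)."""
    with wave.open(path, "rb") as w:
        n = w.getnframes()
        raw = w.readframes(n)
        return list(struct.unpack(f"<{n}h", raw)), w.getframerate()

path = os.path.join(tempfile.mkdtemp(), "tone.wav")  # hypothetical scratch file
written = write_tone(path)
loaded, rate = read_samples(path)
print(rate, len(loaded), loaded[:5])  # sample rate, sample count, first few numbers
```

The numbers that come back are exactly the numbers that went in, which is the whole point: loading audio means reading numbers into memory.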
```python
print(ahhh[3000:3010])  # why is each sample actually 2 numbers?
```
```
[[-0.02444458 -0.02212524]
 [-0.02230835 -0.01843262]
 [-0.01998901 -0.01403809]
 [-0.01727295 -0.00921631]
 [-0.0140686  -0.00402832]
 [-0.01025391  0.00143433]
 [-0.00570679  0.00714111]
 [-0.00042725  0.01318359]
 [ 0.0055542   0.01953125]
 [ 0.01208496  0.02587891]]
```
You're probably wondering what the sample rate is and how numbers can represent audio. It works the same way movies do: a camera grabs snapshots (pictures) very frequently, and playing them back at the same speed gives the illusion of movement. How often a movie takes a picture is called the frame rate and might be something like 24 or 30 frames per second. An audio file takes snapshots as well, but instead of an image, it grabs the audio volume (sound pressure) at a particular instant. A very common sampling rate for music is 44,100 times per second (44,100 Hertz). During audio playback, each value is used to deflect the diaphragm of a speaker away from its neutral position. Believe it or not, this shakes the air molecules in the room in a way that reproduces the original sound. See this speaker movement in action in an awesome scene from The Big Bang Theory:
```python
from IPython.display import YouTubeVideo
YouTubeVideo("2CJJ6FrfuGU")
```
<https://www.youtube.com/embed/2CJJ6FrfuGU>
A microphone is the opposite of a speaker and has a very sensitive diaphragm that subtly vibrates in the presence of sound waves. If we measure the deflection of the microphone diaphragm away from neutral at a very fast and regular rate, we *digitize* the audio signal. Graphically, it looks like this time vs amplitude plot (magnitude of microphone deflection):
The microphone is wiggling in a continuous fashion and knows nothing about sampling rate. It is a so-called analog signal. To get that into a computer, we must convert it to numbers. The numbers you saw above for the Kiss song are the result of digitization.
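To make digitization concrete, here's a minimal sketch, assuming a made-up "analog" signal and a deliberately tiny sampling rate so the numbers are easy to inspect. The function `analog`, the rate of 100 samples per second, and the choice of 16-bit integers are all assumptions for illustration; real recordings use something like 44,100 samples per second.

```python
import numpy as np

def analog(t):
    """Stand-in for the continuously varying microphone deflection."""
    return 0.8 * np.sin(2 * np.pi * 5 * t)   # a slow 5 Hz wiggle in [-0.8, 0.8]

rate = 100                       # hypothetical: 100 snapshots per second
t = np.arange(rate) / rate       # the instants at which we take snapshots (1 second)

# Digitize: measure the signal at each instant, then round each measurement
# to one of the integer levels a 16-bit number can hold (-32768..32767).
samples = np.round(analog(t) * 32767).astype(np.int16)

print(samples[:10])              # the first ten numbers the computer would store
```

The continuous wiggle is gone after this step. All that remains is a list of integers, one per snapshot, and the rate at which they were taken.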
Now let's go the other way by generating and digitizing our own simple signal and then seeing what it sounds like. The key bit in the Python code that follows is the `sin(2*numpy.pi*440*t)` that creates a sine wave at 440 Hz (440 full sine waves per second, that is, 440 cycles through 0..2π radians or 0..360 degrees per second). The `plt.scatter(...)` draws the signal versus time (the X axis).
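Here's a minimal sketch of how the time values `t` and the signal `y` might be built with numpy. The names `fs`, `t`, and `y` match the ones used in this lecture, but the duration variable `T` and the exact way `t` is constructed are my assumptions, not necessarily the original code.

```python
import numpy

fs = 44100                            # sampling frequency: samples per second
T = 1.5                               # assumed duration of the signal in seconds
t = numpy.arange(int(T * fs)) / fs    # time (in seconds) of every sample
y = numpy.sin(2 * numpy.pi * 440 * t) # sine wave at 440 Hz, values in [-1, 1]

print(f"{len(t)} samples in {T} seconds")
```

At this sampling rate each cycle of the 440 Hz wave spans about 100 samples, so the first 1000 samples cover roughly ten cycles.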
```python
plt.figure(figsize=(8, 2.5))  # Prepare a plot 8x2.5 inches
plt.scatter(t[0:1000], y[0:1000], s=1)
plt.show()
```
```
66150 samples in 1.5 seconds
```
![png](img/1.1_sound_12_1.png)
**Exercise**: What do you think that will sound like if we run it through the speaker?
It's a pure tone at 440Hz. Imagine a speaker moving out then in then out repeatedly the same distance each time. Now if you move your hand like a speaker up and down you get sort of a Boing Boing Boing motion. Now start walking and move your hand up and down at the same rate. To an observer, the motion looks like a sine wave! So, that is what is happening with the speaker. A constant deflection up and down gives a pure tone to the human ear.
```python
y2 = numpy.sin(2*numpy.pi*700*t)  # pure sine wave at 700 Hz
Audio(y2, rate=fs)
```
<audio controls="controls">
<source src="img/audio4.wav" type="audio/wav"/>
<a href='img/audio4.wav'>audio4.wav</a>
</audio>
**Exercise**: If I add those signals together and play the result, what do you think it will sound like?
```python
plt.figure(figsize=(8,2.5))
...
...
plt.scatter(t[0:1000], y[0:1000]+y2[0:1000], s=1)  # zoom in on y+y2 for a plot
plt.show()
```
![png](img/1.1_sound_19_0.png)
Yep, we hear the sounds merged together as a chord. Mathematically, what we are doing is simply adding the signal amplitudes together, which we can do with `y+y2` where `y` and `y2` are our arrays of numbers. Adding the vectors adds the ith elements together to get a new signal, which we can plot and play.
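As a minimal sketch of that element-wise addition, the snippet below rebuilds the two tones and adds them. The 1.5 second duration and the rescaling step are my assumptions. Note that this works because `y` and `y2` are numpy arrays; with plain Python lists, `+` would glue the lists end to end instead of adding element by element.

```python
import numpy

fs = 44100
t = numpy.arange(int(1.5 * fs)) / fs
y = numpy.sin(2 * numpy.pi * 440 * t)    # pure tone at 440 Hz
y2 = numpy.sin(2 * numpy.pi * 700 * t)   # pure tone at 700 Hz

chord = y + y2                           # chord[i] is y[i] + y2[i] for every i

# Two sines of amplitude 1 can add up to nearly +/-2, so one option
# (an assumption about what the audio player expects) is to rescale
# the sum back into the range [-1, 1] before playing it.
chord_scaled = chord / numpy.max(numpy.abs(chord))

print(y[100], y2[100], chord[100])       # the third number is the sum of the first two
```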
If you're wondering why that sounds like the button tones on a phone, it's because each phone button plays two pure tones at once to identify which button you pushed.
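For the curious, here's a small sketch of how a phone button tone could be built the same way. It assumes the standard touch-tone (DTMF) frequency table, where each keypad row has a low tone and each column has a high tone. The function names `button_freqs` and `button_tone` are hypothetical, and this is not code from the lecture.

```python
import numpy

KEYPAD = ["123", "456", "789", "*0#"]   # button layout, row by row
ROW_HZ = [697, 770, 852, 941]           # one low tone per keypad row
COL_HZ = [1209, 1336, 1477]             # one high tone per keypad column

def button_freqs(key):
    """Return the (low, high) tone pair in Hz for one keypad button."""
    if len(key) != 1:
        raise ValueError("expected a single keypad character")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c >= 0:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a keypad button: {key!r}")

def button_tone(key, seconds=0.2, fs=44100):
    """Add the button's two pure tones, scaled to stay within [-1, 1]."""
    low, high = button_freqs(key)
    t = numpy.arange(round(seconds * fs)) / fs
    return 0.5 * (numpy.sin(2 * numpy.pi * low * t) +
                  numpy.sin(2 * numpy.pi * high * t))

tone = button_tone("5")
print(button_freqs("5"), len(tone))
```

In the lab, you could hand `tone` to whatever playback call you are using, along with the same sampling rate.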
Now let's look at the signal plots for the two audio files:
```python
plt.figure(figsize=(10,2.5))
...
...
plt.plot(kiss);
plt.show()
```
![png](img/1.1_sound_23_0.png)
```python
plt.figure(figsize=(10,2.5))
plt.plot(ahhh);  # notice this one has two plots because it is a stereo signal
plt.show()
```
Those complicated signals can all be decomposed into a sum of pure-tone sine waves. The frequencies of the sine waves indicate the frequencies of sound (pitch) present in the audio signal. Humans can hear from roughly 20 Hz to 20,000 Hz, with the upper limit dropping as we age.
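To see that decomposition in action, here's a minimal sketch that mixes two pure tones and then asks numpy's fast Fourier transform which frequencies are present. The Fourier transform is my choice of tool for the illustration (the lecture doesn't name one), and the one-second duration is an assumption that keeps the frequency bins exactly 1 Hz apart.

```python
import numpy

fs = 44100
t = numpy.arange(fs) / fs                    # exactly one second of samples
signal = (numpy.sin(2 * numpy.pi * 440 * t) +
          numpy.sin(2 * numpy.pi * 700 * t)) # two pure tones mixed together

spectrum = numpy.abs(numpy.fft.rfft(signal))        # strength of each frequency
freqs = numpy.fft.rfftfreq(len(signal), d=1 / fs)   # frequency in Hz of each entry

# Pick out the two strongest frequencies and list them low to high.
strongest = numpy.sort(freqs[numpy.argsort(spectrum)[-2:]])
print(numpy.round(strongest))
```

The two strongest entries land at about 440 Hz and 700 Hz, the very tones we mixed in. Real audio like the Kiss clip has many more frequencies present, but the idea is the same.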
A really cool looking plot is the so-called spectrogram, which shows the frequencies in use at a particular moment in time:
```python
fs = 44100  # sampling frequency
...
...
plt.show()
```
![png](img/1.1_sound_26_0.png)
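Part of the plotting code above is elided, so here is a bare-bones sketch of what a spectrogram computes, using only numpy. It chops the signal into short overlapping windows and measures the frequencies in each one. The function name `spectrogram`, the window and hop sizes, and the test signal are all assumptions for illustration; plotting libraries such as matplotlib provide a ready-made `specgram` function that does this and draws the picture too.

```python
import numpy

def spectrogram(x, fs, window=1024, hop=512):
    """Frequency strengths over time via a short-time Fourier transform.

    Returns (times, freqs, S) where S[i, j] is the strength of frequency
    freqs[j] (in Hz) in the window that starts at times[i] (in seconds).
    """
    starts = numpy.arange(0, len(x) - window + 1, hop)
    taper = numpy.hanning(window)               # fade each window in and out
    frames = numpy.stack([x[s:s + window] * taper for s in starts])
    S = numpy.abs(numpy.fft.rfft(frames, axis=1))
    freqs = numpy.fft.rfftfreq(window, d=1 / fs)
    return starts / fs, freqs, S

fs = 44100
t = numpy.arange(fs) / fs
# Test signal: 440 Hz for the first half second, then 700 Hz.
x = numpy.where(t < 0.5,
                numpy.sin(2 * numpy.pi * 440 * t),
                numpy.sin(2 * numpy.pi * 700 * t))

times, freqs, S = spectrogram(x, fs)
early = freqs[numpy.argmax(S[0])]    # dominant frequency in the first window
late = freqs[numpy.argmax(S[-1])]    # dominant frequency in the last window
print(early, late)
```

Each window can only resolve frequencies to within about 43 Hz here (44100/1024), so `early` and `late` come out near 440 and 700 rather than exactly on them. That trade-off between time detail and frequency detail is built into every spectrogram.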
### Exercises
Ok, now that we've got an idea how computers represent music, let's do a lab that lets you try playing sounds via Python. But first, we really need to do a simple "hello" program to introduce the Python tools we'll use in this class.