Commit 1cddc478, authored by wizardforcel

1.1

Parent d1bc2159
# A bit of motivation (Audio processing)
Learning to program involves learning lots of details. To keep things simple, instructors tend to start with trivial code examples, but these end up being pretty uninteresting. I would like to start this course with an interesting application of computing as a way of motivating you to learn how to write code. I'd like to show that the payoff, from even a little bit of code, can be huge. I don't expect you to understand all the details initially, just the broad strokes. In this first lecture/lab, we're going to leverage existing libraries of code to learn how computers represent music and other audio files.
As we go along, you'll encounter a number of completely new tasks, such as installing software on your computer from the command line. Rather than provide bite-size lectures on specific topics, we'll examine some real applications that require skills and knowledge across topics. It's better to see how all of the pieces fit together, rather than looking at topics in isolation. As you gain more experience, you will look back to these early examples and have an "ah ha!" moment when everything clicks into place for you.
## Playing sound files
We all play music files on our computers. For example, here are two interesting ones: [initial sequence from Kiss by Prince, Kiss.aiff](sound/Kiss.aiff) and [ahhh sound, ahhh.mp3](sound/ahhh.mp3). You can download those and play them using your music player. But, what if we're building a game or doing speech recognition and we need Python to load sound files and play them? By leveraging libraries of code, which are like cookbooks, we can play audio files with just a few lines of Python code. You'll have an opportunity to try all of this Python code in the [sound lab for this lecture](../labs/sound.md), but for now just try to get the gist of the code and the principles behind digital audio.
To play an audio file in Python, we first have to load that audio file into memory. As we'll see shortly, an audio file is little more than a sequence of numbers. Here's some sample Python that loads in a bit of Prince's Kiss song:
```python
import soundfile as sf
from IPython.display import Audio  # Jupyter-only widget for in-browser playback

kiss, samplerate = sf.read('sound/Kiss.aiff')  # audio samples plus the sampling rate
Audio(kiss, rate=samplerate)
```
<audio controls="controls" >
<source src="img/Kiss.wav" type="audio/wav" />
<a href='img/Kiss.wav'>Kiss.wav</a>
</audio>
The code begins by `import`ing some necessary code from some useful Python packages. The call to `sf.read(...)` is the key step: it loads the file into memory and returns the audio data, which we store in variable `kiss`, along with the sampling rate. The `Audio(kiss, ...)` call is ordinary Python, but it is specific to Jupyter notebooks and lets me play the sound in the browser. It is purely for presentation purposes here. In your lab, you will do something like `sd.play(kiss, ...)` instead.
Here's another audio file:
```python
import sounddevice as sd
# ... (lines elided by the diff; they load the audio into `ahhh`) ...
Audio(ahhh[:,0], rate=samplerate)  # play only the first channel
```
<audio controls="controls" >
<source src="img/ahhh.wav" type="audio/wav" />
<a href='img/ahhh.wav'>ahhh.wav</a>
</audio>
To see what's inside the Kiss audio, we can print a subset of the values in variable `kiss`:
```python
import numpy as np
np.set_printoptions(suppress=True) # print small floats without scientific notation
```
```python
print(f"n = {len(kiss)}, rate ={samplerate}hz")
print(kiss[5000:5020]) # kiss is a numpy ndarray that you will become intimately familiar with
```
n = 123269, rate =44100hz
[ 0.00003052 0. -0.00009155 0.00018311 -0.00024414 0.00030518
-0.00033569 0.00030518 -0.00027466 0.00027466 -0.00021362 0.00006104
0.00003052 -0.00003052 0.00006104 -0.00003052 -0.00009155 0.00015259
-0.00015259 0.00015259]
We can do the same for "ahhh".
```python
print(ahhh[3000:3010]) # why is each sample actually 2 numbers?
```
[[-0.02444458 -0.02212524]
[-0.02230835 -0.01843262]
[-0.01998901 -0.01403809]
[-0.01727295 -0.00921631]
[-0.0140686 -0.00402832]
[-0.01025391 0.00143433]
[-0.00570679 0.00714111]
[-0.00042725 0.01318359]
[ 0.0055542 0.01953125]
[ 0.01208496 0.02587891]]
You're probably wondering what the sample rate is and how numbers can represent audio. It works the same way that movies do: a camera grabs snapshots (pictures) very frequently, and playing them back at the same speed gives the illusion of movement. How often a movie takes a picture is called the frame rate, typically 24 frames per second for film. An audio file takes snapshots as well, but instead of an image, each one records the sound pressure (loudness) at a particular instant. A very common sampling rate for music is 44,100 times per second (44,100 Hertz). During audio playback, each value is used to deflect the diaphragm of a speaker away from its neutral position. Believe it or not, this shakes the air molecules in the room in a way that reproduces the original sound. See this speaker movement in action in an awesome scene from The Big Bang Theory:
```python
from IPython.display import YouTubeVideo
YouTubeVideo("2CJJ6FrfuGU")
```
<https://www.youtube.com/embed/2CJJ6FrfuGU>
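To make the sampling rate concrete, here is a small sketch using the two numbers printed earlier for the Kiss clip (`n = 123269` samples at `44100` Hz). The helper name `sample_time` is made up for illustration.

```python
n = 123269          # number of samples printed earlier for the Kiss clip
samplerate = 44100  # samples per second (Hz)

def sample_time(index, rate):
    """Return the time in seconds at which sample number `index` was taken."""
    return index / rate

duration = n / samplerate
print(f"clip length: {duration:.3f} seconds")
print(f"sample 5000 was taken at {sample_time(5000, samplerate):.4f} seconds")
print(f"time between snapshots: {1 / samplerate * 1e6:.1f} microseconds")
```

The clip works out to a little under 2.8 seconds: the number of samples divided by the number of samples taken per second.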
A microphone is the opposite of a speaker and has a very sensitive diaphragm that subtly vibrates in the presence of soundwaves. If we measure the deflection of the microphone away from neutral at a very fast and regular rate, we *digitize* a signal such as an audio signal. Graphically, it looks like this time vs amplitude plot (magnitude of microphone deflection):
<img src="img/Signal_Sampling.png" width="300">
The microphone is wiggling in a continuous fashion and knows nothing about sampling rate. It is a so-called analog signal. To get that into a computer, we must convert it to numbers. The numbers you saw above for the Kiss song are the result of digitization.
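The Kiss values printed earlier all look like whole-number multiples of 1/32768 (about 0.00003052). That is what one would expect if the file stores 16-bit integers and the loader scales them into the range -1 to 1; this is an assumption based on the printed values, not something the lecture states. A minimal sketch that undoes the scaling:

```python
# First values printed earlier for kiss[5000:5020] (rounded to 8 decimals).
printed = [0.00003052, 0.0, -0.00009155, 0.00018311, -0.00024414,
           0.00030518, -0.00033569, 0.00030518, -0.00027466, 0.00027466]

SCALE = 32768  # 2**15, assuming 16-bit signed samples

# Multiply by the scale and round to recover the (assumed) stored integers.
as_integers = [round(v * SCALE) for v in printed]
print(as_integers)

# Check that each printed value is close to an exact multiple of 1/SCALE.
worst_error = max(abs(v - k / SCALE) for v, k in zip(printed, as_integers))
print(worst_error < 1e-8)
```

If the assumption holds, digitization here means two things: taking snapshots at a fixed rate, and rounding each snapshot to one of a fixed set of integer levels.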
Now let's go the other way by generating and digitizing our own simple signal, then hearing what it sounds like. The key bit in the Python code that follows is the `sin(2*numpy.pi*440*t)` that creates a sine wave at 440 Hz (440 full sine waves per second; that is, 440 cycles through 0..2&pi; radians, or 0..360 degrees, per second). The `plt.scatter(...)` call draws the signal versus time (the X axis).
```python
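import numpy
import matplotlib.pyplot as plt
# ... (lines elided by the diff; they define the sampling rate `fs`, duration `T`, time points `t`, and signal `y`) ...
print(len(y), "samples in", T, "seconds")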
plt.figure(figsize=(8, 2.5)) # Prepare a plot 8x2.5 inches
plt.scatter(t[0:1000],y[0:1000],s=1)
plt.show()
```
66150 samples in 1.5 seconds
![png](img/1.1_sound_12_1.png)
**Exercise**: What do you think that will sound like if we run it through the speaker?
It's a pure tone at 440 Hz. Imagine a speaker moving out, then in, then out again, traveling the same distance each time. If you move your hand up and down like that speaker, you get a sort of boing-boing-boing motion. Now start walking while moving your hand up and down at the same rate. To an observer, your hand traces out a sine wave! That is what is happening with the speaker: a steady, regular deflection back and forth is heard by the human ear as a pure tone.
```python
from IPython.display import Audio
Audio(y, rate=fs)
```
<audio controls="controls" >
<source src="img/audio3.wav" type="audio/wav" />
<a href='img/audio3.wav'>audio3.wav</a>
</audio>
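Outside a notebook, one way to hear this tone is to write it to a WAV file and open it in any music player. The sketch below uses only Python's standard library (`math`, `struct`, `wave`) rather than the numpy code above; the file name `tone440.wav` and the helper `sine_samples` are made up for illustration.

```python
import math
import struct
import wave

fs = 44100   # sampling frequency in Hz
T = 1.5      # duration in seconds

def sine_samples(freq, duration, rate):
    """Return a list of floats in [-1, 1] sampling a sine wave at `freq` Hz."""
    n = int(duration * rate)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

y = sine_samples(440, T, fs)
print(len(y), "samples in", T, "seconds")

# Scale each float to a 16-bit signed integer and pack it as little-endian bytes.
frames = b"".join(struct.pack("<h", int(v * 32767)) for v in y)

with wave.open("tone440.wav", "wb") as f:
    f.setnchannels(1)    # mono
    f.setsampwidth(2)    # 2 bytes = 16 bits per sample
    f.setframerate(fs)
    f.writeframes(frames)
```

Changing the `440` to another frequency gives a tone of a different pitch, which is a handy way to check your guesses for the exercises that follow.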
Let's make another signal, `y2`, that is at a higher frequency (700 Hz).
**Exercise**: What do you think it will sound like in comparison to the previous signal?
```python
y2 = numpy.sin(2*numpy.pi*700*t) # pure sine wave at 700 Hz
plt.figure(figsize=(8, 2.5)) # same plot size as before
plt.scatter(t[0:1000],y2[0:1000],s=1)
plt.show()
```
![png](img/1.1_sound_16_0.png)
```python
from IPython.display import Audio
y2 = numpy.sin(2*numpy.pi*700*t) # pure sine wave at 700 Hz
Audio(y2, rate=fs)
```
<audio controls="controls" >
<source src="img/audio4.wav" type="audio/wav" />
<a href='img/audio4.wav'>audio4.wav</a>
</audio>
**Exercise**: If I add those signals together and play the result, what do you think it will sound like?
```python
plt.figure(figsize=(8, 2.5))
plt.scatter(t[0:1000],y[0:1000]+y2[0:1000],s=1) # zoom in on y+y2 for a plot
plt.show()
```
![png](img/1.1_sound_19_0.png)
Yep, we hear the sounds merged together as a chord. Mathematically, we are simply adding the signal amplitudes together, which we can do with `y+y2`, where `y` and `y2` are our arrays (vectors) of numbers. Adding the vectors adds the ith elements together to get a new signal, which we can plot and play:
```python
Audio(y+y2, rate=fs) # Play both sounds together
```
<audio controls="controls" >
<source src="img/audio5.wav" type="audio/wav" />
<a href='img/audio5.wav'>audio5.wav</a>
</audio>
If you're wondering why that sounds like the button tones on a phone, it's because each phone button plays a pair of pure tones that identifies which button you pushed.
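Those phone tones follow the DTMF (touch-tone) scheme: each keypad row has one low frequency and each column one high frequency, and a button plays the sum of its row and column tones. A minimal sketch using the standard DTMF frequencies; the function names, the 0.2 second duration, and the 8000 Hz rate are choices made here for illustration.

```python
import math

ROW_HZ = [697, 770, 852, 941]      # one low tone per keypad row
COL_HZ = [1209, 1336, 1477]        # one high tone per keypad column
KEYPAD = ["123", "456", "789", "*0#"]

def dtmf_frequencies(button):
    """Return the (low, high) frequency pair for a keypad button."""
    for row, keys in enumerate(KEYPAD):
        if len(button) == 1 and button in keys:
            return ROW_HZ[row], COL_HZ[keys.index(button)]
    raise ValueError(f"not a keypad button: {button!r}")

def dtmf_tone(button, duration=0.2, rate=8000):
    """Sample the sum of the button's two pure tones, scaled to stay in [-1, 1]."""
    low, high = dtmf_frequencies(button)
    n = int(duration * rate)
    return [0.5 * (math.sin(2 * math.pi * low * i / rate) +
                   math.sin(2 * math.pi * high * i / rate))
            for i in range(n)]

print(dtmf_frequencies("5"))
samples = dtmf_tone("5")
print(len(samples), "samples, peak amplitude", max(abs(s) for s in samples))
```

The factor of 0.5 is there because two sine waves that each range over -1 to 1 can add up to anything between -2 and 2; halving the sum keeps it in the range a player expects.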
Now let's look at the signal plots for the two audio files:
```python
plt.figure(figsize=(10, 2.5))
plt.plot(kiss);
plt.show()
```
![png](img/1.1_sound_23_0.png)
```python
plt.figure(figsize=(10, 2.5))
plt.plot(ahhh); # notice this one has two plots because it is a stereo signal
```
![png](img/1.1_sound_24_0.png)
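The `ahhh` rows printed earlier each hold two numbers because the clip is stereo: one column per channel. A small sketch, using the first few printed rows as plain Python lists (which column is left and which is right is assumed here), that separates the channels and averages them into a mono signal:

```python
# First rows printed earlier for ahhh[3000:3010]; each row is one sample.
rows = [[-0.02444458, -0.02212524],
        [-0.02230835, -0.01843262],
        [-0.01998901, -0.01403809],
        [-0.01727295, -0.00921631]]

# Column 0 and column 1 are the two channels (assumed left and right).
left = [row[0] for row in rows]
right = [row[1] for row in rows]

# Averaging the channels sample by sample gives a single mono signal.
mono = [(l + r) / 2 for l, r in zip(left, right)]

print(left)
print(right)
print(mono)
```

With the numpy array from the lecture, the same channel selection is written `ahhh[:,0]` and `ahhh[:,1]`.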
Those complicated signals can all be decomposed into the sum of a series of pure-tone sine waves. The frequencies of those sine waves indicate the frequencies of sound (pitch) present in the audio signal. Humans can hear from roughly 20 Hz to 20,000 Hz, though the upper limit drops with age.
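One way to see this decomposition in action is to measure how strongly a signal correlates with a sine and a cosine at a given frequency, which is the idea behind the discrete Fourier transform. The sketch below builds the 440 Hz + 700 Hz chord from earlier and probes it at a few frequencies. The helper name `strength`, the 8000 Hz sampling rate, and the 0.1 second length are choices made here to keep the pure-Python loop fast; the probe frequencies were picked to complete a whole number of cycles in that window, which keeps the measurement clean.

```python
import math

fs = 8000                 # sampling rate for this sketch (not the 44100 used above)
n = 800                   # 0.1 seconds of samples
t = [i / fs for i in range(n)]
chord = [math.sin(2 * math.pi * 440 * ti) + math.sin(2 * math.pi * 700 * ti)
         for ti in t]

def strength(signal, freq, rate):
    """Amplitude of the `freq` Hz sine wave present in `signal` (one DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * i / rate) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / rate) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

for freq in (300, 440, 550, 700, 1000):
    print(freq, "Hz:", round(strength(chord, freq, fs), 3))
```

Only 440 Hz and 700 Hz should come out near 1; the other probes should be near 0, because the chord contains no energy at those frequencies.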
A really cool looking plot is the so-called spectrogram, which shows the frequencies in use at a particular moment in time:
```python
fs = 44100 # sampling frequency
# ... (lines elided by the diff; they compute and draw the spectrogram) ...
plt.show()
```
![png](img/1.1_sound_26_0.png)
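The idea behind a spectrogram can be sketched without any plotting: chop the signal into short windows and ask which frequencies are strong in each one. The example below is an illustration only, not the code used to draw the figure above; the test signal, the window size, the candidate frequencies, and the helper names are all made up here.

```python
import math

fs = 8000       # sampling rate for this sketch
window = 800    # samples per window (0.1 seconds)

def tone(freq, n, rate):
    """Return n samples of a pure sine wave at freq Hz."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def strength(chunk, freq, rate):
    """Amplitude of the freq Hz component in chunk (one DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * i / rate) for i, s in enumerate(chunk))
    im = sum(s * math.sin(2 * math.pi * freq * i / rate) for i, s in enumerate(chunk))
    return 2 * math.hypot(re, im) / len(chunk)

# Test signal: 0.2 seconds at 440 Hz followed by 0.2 seconds at 700 Hz.
# (For Python lists, + joins them end to end rather than adding elementwise.)
signal = tone(440, 2 * window, fs) + tone(700, 2 * window, fs)

candidates = [300, 440, 550, 700, 1000]
dominant = []
for start in range(0, len(signal), window):
    chunk = signal[start:start + window]
    # Pick the candidate frequency with the largest amplitude in this window.
    dominant.append(max(candidates, key=lambda f: strength(chunk, f, fs)))

print(dominant)
```

A real spectrogram does the same thing for many frequencies at once and draws the amplitudes as colors, with time on one axis and frequency on the other.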
### Exercises
Ok, now that we've got an idea how computers represent music, let's do a lab that lets you try playing sounds via Python. But first, we really need to do a simple "hello" program to introduce the Python tools we'll use in this class.
* [A first taste of Python tools](../labs/hello.md)
* [Playing sounds](../labs/sound.md)