Commit 4a6fd4b5, author: wizardforcel

2.7

Parent: ad50ff0b
# How to read code
Fundamentally, programmers communicate through code. We express our thoughts not only to the computer but also to other developers. So far we have focused on designing programs and writing Python code. This is the key creative process but, in order to write code, programmers must also be able to read code written by others.
## Why read code?
We read code in order to:
* **Gain new experience**. Just as in natural language where we learn to speak by listening to others, we learn programming techniques by recognizing cool patterns in the code of others. Being able to read code quickly also lets you learn from the code shown in a programming lecture or video.
* **Find and adapt code snippets**. We can often find hints or solutions to a coding problem through code snippets found via Google search or at [Stack Overflow](https://stackoverflow.com/). Be careful here that you do not violate copyright laws or, in the case of student projects, academic honesty rules.
* **Discover the behavior of library functions or other shared code**. The complete behavior of a library function is not always clear from the name or parameter list. Looking at the source code for that function is the best way to understand what it does. The code **is** the documentation.
* **Uncover bugs, in our code or others' code**. All code has bugs, particularly code we just wrote that has not been tested exhaustively. As part of the coding process, we are constantly bouncing around, reading our existing code base to make sure everything fits together.
<img src="img/redbang.png" width="30" align="left">While we're discussing library functions, let me highlight a golden rule: *You should never ever ask your fellow programmers about the details of parameters and return values from library functions.* You can easily discover these yourself using "jump to definition" in PyCharm or by searching the web.
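As one possible illustration of looking things up yourself, Python's standard `inspect` module can show a function's parameters, documentation, and source from within the interpreter. This is only a sketch: `json.dumps` is used purely as an example target, and the helper name `describe` is hypothetical.

```python
import inspect
import json


def describe(func):
    """Return the signature text and the first docstring line of func."""
    signature = str(inspect.signature(func))
    doc = inspect.getdoc(func) or ""
    first_line = doc.splitlines()[0] if doc else ""
    return signature, first_line


# json.dumps is just an example of a library function to inspect.
sig, summary = describe(json.dumps)
print("parameters:", sig)
print("summary:", summary)

# For functions written in Python, the source itself is available too.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```

The same information is available interactively through `help(json.dumps)`.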
The purpose of this document is to explain how exactly a programmer reads code. Our first clue comes from the fact that we are not computers; hence, we should not read code like a computer, examining one symbol after the other. Instead, we're going to look for key elements and code patterns.
This is what we do when reading sentences in a foreign language. For example, my French is pretty bad so, when reading a French sentence, I have to consciously ask *who is doing what to whom*. In practice, that means identifying the subject, the verb, and the object. From these key elements, I try to imagine the thought patterns in the mind of the author. I am essentially trying to reverse the process followed by the author.
In the programming world, the process goes like this: The code author might have thought "*convert prices to a new list by dividing by 2*", which they converted to "map" pseudocode and finally to a Python `for` loop. When reading that loop code, our job is to reverse the process and imagine the original goal of the author. We are not trying to figure out the emergent behavior of the code by simulating it in our heads or on paper; rather, we are looking for patterns that tell us what high-level operations are being performed.
That's why you should emphasize clarity when writing code, so that reading the code more easily leads the reader to your intentions. There is an excellent quote (by [John F. Woods](https://groups.google.com/forum/#!msg/comp.lang.c++/rYCO5yn4lXw/oITtSkZOtoUJ) I think) that summarizes things well:
> Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live.
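To make the earlier prices example concrete, here is a minimal sketch of what the author's thought "convert prices to a new list by dividing by 2" might look like as a Python `for` loop. The variable names and sample values are assumptions chosen for illustration.

```python
# Hypothetical sample data; any list of numbers would do.
prices = [10.0, 4.5, 8.0]

# Map pattern: start with an empty target list, then append one
# transformed value per source element.
half_prices = []
for price in prices:
    half_prices.append(price / 2)

print(half_prices)
```

Reading this loop in reverse, the empty list plus the `append` call inside the loop is the clue that the author intended a map.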
## Getting the gist of a program
When looking at a textbook for the first time, it makes sense to scan through the table of contents to get an overall view of the book content. The same is true when looking at a program for the first time. Look through all of the files and the names of the functions contained in those files. Also figure out where the main program is. Depending on your goal in reading the program, you might start stepping through the main program or immediately jump to a function of interest.
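One way to build such a "table of contents" automatically is to parse a file with Python's standard `ast` module and list the functions it defines. The sketch below works on a source string so that it is self-contained; `SAMPLE_SOURCE` and `list_top_level_functions` are hypothetical names, and in practice the text would come from reading a real file.

```python
import ast

# Stand-in for the contents of a Python file we want to skim.
SAMPLE_SOURCE = '''
def load(path):
    pass

def average(data):
    pass

def main():
    pass
'''


def list_top_level_functions(source):
    """Return the names of functions defined at the top level of source."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


names = list_top_level_functions(SAMPLE_SOURCE)
print(names)
```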
It's also useful to look at the input-output pairs of the program from sample runs or unit tests, because it helps you understand the program's functionality. In some sense, we are reverse-engineering the program work plan by examining and testing the program. Previously, we used the program work plan in the forward direction to design programs.
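As a small sketch of how input-output pairs reveal functionality, consider a deliberately unhelpfully named function together with a few sample pairs of the kind a unit test might contain. Both the function `mystery` and the pairs are invented for illustration.

```python
def mystery(words):
    # Hypothetical function whose purpose we want to infer.
    return sorted(set(w.lower() for w in words))


# Input-output pairs, as they might appear in a unit test.
examples = [
    (["b", "A", "a"], ["a", "b"]),
    ([], []),
    (["Zed"], ["zed"]),
]

for given, expected in examples:
    assert mystery(given) == expected

print("all examples pass")
```

From the pairs alone we can guess the intent: lowercase the words, drop duplicates, and sort.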
## Getting the gist of a function
Once we identify a main program or function to examine, it's time to reverse-engineer the function work plan. The function's name is perhaps the biggest clue as to what the function does, assuming the code author was a decent programmer. (Using a generic function name like `f` is how faculty write code-reading questions without giving away the answer.) For example, there is no doubt what the following function's goal is:
```python
def average(...):
    ...
```
even without looking at the arguments or the function statements.
Programmers often provide comments about the usage of a function, but be careful: they frequently change the code without updating the comments, so the comments can be misleading. An acceptable comment might look like:
```python
def average(...):
...@@ -48,9 +50,9 @@ def average(...):
    ...
```
If we're lucky, that comment corresponds to the function objective description in the work plan.
The next step is to identify the parameters and return value. Again, the names of the parameters often tell us a lot but, unfortunately, Python code usually omits explicit parameter types (and Python does not check them at run time anyway), so we have to figure them out ourselves. Knowing the types of values and variables is critical to understanding a program. In a simple function like this, we can usually figure out the types of the parameters and the return values quickly. In other cases, we will have to dig through the statements of the function to figure this out (more on this later). Let's zoom in to see more detail about our function:
```python
def average(data):
...@@ -58,11 +60,11 @@ def average(data):
    return sum / n
```
At this point, we know that `data` is almost certainly a list of numbers and the function returns a single number. That means we can fill in the first part of the work plan for the function.
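The types we just inferred can also be written down explicitly with optional type hints. The following is a sketch under the assumption that `data` is a sequence of floats; it is not the original function, and the name `average_annotated` is hypothetical. Python does not enforce these hints at run time; they serve as documentation for readers and for external tools.

```python
from typing import Sequence


def average_annotated(data: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = 0.0
    for x in data:
        total += x
    return total / len(data)


print(average_annotated([1.0, 2.0, 3.0]))
```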
## What to look for in function code
Because we have prior knowledge of what the average is, we can fill in the work plan description of the function objective. In general, though, we have to scan the statements of the function to figure that out. (We might get lucky and find a reasonable function comment as well.) Let's look at the full function now:
```python
def average(data):
...@@ -73,27 +75,27 @@ def average(data):
    return sum / n
```
An inexperienced programmer must examine the statements of the function individually and literally, emulating a computer to figure out the emergent behavior. In contrast, *an experienced programmer looks for patterns in the code that represent implementations* of high-level operations like map, search, filter, etc.
By analogy, consider memorizing the state of a chessboard in the middle of play. A beginner has to memorize where all of the pieces are individually whereas a chessmaster recognizes that the board is, say, merely a variation on the Budapest Gambit.
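To give a feel for what such patterns look like, here is a minimal sketch of two of the operations named above, filter and search, written as plain loops. The data and variable names are assumptions made up for the example.

```python
values = [3, -1, 4, -5, 9]

# Filter pattern: an empty target list plus a conditional append.
positives = []
for v in values:
    if v > 0:
        positives.append(v)

# Search pattern: scan until a condition holds, remember where, and stop.
first_negative_index = -1  # -1 means "not found"
for i in range(len(values)):
    if values[i] < 0:
        first_negative_index = i
        break

print(positives, first_negative_index)
```

An experienced reader recognizes the `if` guarding an `append` as a filter, and the `if` followed by `break` as a search, without tracing individual iterations.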
How do we know where to start and what to look at? Well, let's think back to our generic data science program template:
1. Acquire data, which means finding a suitable file or collecting data from the web and storing it in a file
2. Load data from disk and place it into memory organized into data structures
3. Normalize, clean, or otherwise prepare data
4. Process the data, which can mean training a machine learning model, computing summary statistics, or optimizing a cost function
5. Emit results, which can be anything from simply printing an answer to saving data to the disk to generating a fancy visualization
The gist of that process is to load data into a handy data structure and process it. What do loading data, creating a data structure, and processing a data structure have in common? They all repeatedly execute a set of operations, which means that the gist of a program that processes data is looping. (There is even a famous book entitled [Algorithms + Data Structures = Programs](https://www.amazon.com/Algorithms-Structures-Prentice-Hall-Automatic-Computation/dp/0130224189) where *algorithm* means a process described by pseudocode or code.) A program that does not loop would likely be very boring as it could not traverse a data structure or process a data file.
From this, we can conclude that all of the action occurs in loops so we should **look for loops in the code first**. Reading code is a matter of finding such templates in the code of a function, which immediately tells us the kind of operation or pattern the author intended.
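The template above can be sketched in a few lines to show where the loops end up. This is only an illustration: the data is an in-memory string standing in for a file already acquired and stored on disk, and the function names `load`, `clean`, and `process` are hypothetical.

```python
# Stands in for the contents of a data file (the acquire step is assumed done).
RAW_TEXT = "price\n10\n\n4.5\n8\n"


def load(text):
    """Load: split the text into lines and skip the header line."""
    lines = text.splitlines()
    return lines[1:]


def clean(lines):
    """Clean: drop blank lines and convert the rest to floats."""
    values = []
    for line in lines:
        if line.strip():
            values.append(float(line))
    return values


def process(values):
    """Process: compute a summary statistic, here the mean."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


prices = clean(load(RAW_TEXT))
mean_price = process(prices)
print(f"mean price: {mean_price}")  # Emit: print the result
```

Both the cleaning and the processing steps are loops over the data, which is why scanning for loops is a good first move when reading such a program.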
## Identifying programming patterns in code
Let's dig through some loop examples, trying to identify the high-level patterns and corresponding operations. The key elements to look for are the holes in the templates we studied. This usually means identifying the loop variable, the loop bounds, which data structure we're traversing, and the operation performed on the data elements. **The goal is to reverse engineer the intentions of the code author.**
**Exercise**: To get started, what is the operation corresponding to the code pattern in the `sum` function above?
```python
sum = 0.0
for x in data:
    sum = sum + x
```
That's an accumulator.
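A quick way to convince yourself that the accumulator loop really is a sum is to compare it with Python's built-in `sum`. This sketch uses made-up data and names the variable `total`, because a variable called `sum` would shadow the built-in.

```python
data = [1.5, 2.5, 4.0]  # hypothetical sample data

# Accumulator pattern: initialize, then fold each element into the result.
total = 0.0
for x in data:
    total = total + x

print(total == sum(data))
```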
**Exercise**: Let's look at a loop where I have deliberately used crappy variable names so you have to focus on the functionality.
```python
foo = []
for blah in blort:
    foo.append(blah * 2)
```
That's a map operation, which we can see from the initialization of an empty target list and the `foo.append(...)` call. The `blah * 2` is not relevant to finding the pattern other than the fact that the target list is a function of `blah`, which comes from the source list `blort`.
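Once a loop is recognized as a map, it can often be written more directly as a list comprehension. The sketch below, with assumed sample data and clearer names, shows that the two forms produce the same list.

```python
source = [1, 2, 3]  # hypothetical source list

# Map written as an explicit loop, as in the exercise.
doubled_loop = []
for value in source:
    doubled_loop.append(value * 2)

# The same map written as a list comprehension.
doubled_comprehension = [value * 2 for value in source]

print(doubled_loop == doubled_comprehension)
```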
**Exercise**: What kind of loop (for-each, indexed, nested, etc.) do you see in the following code? What kind of high-level operation is the code performing?
```python
blort = []
for boo in range(len(foo)):
    blort.append(foo[boo] * 2)
```
That's an indexed loop that again does a map operation. The clue that it is an indexed loop is that the bounds are `range(len(foo))`, which gives a range of indices. Because of the `blort.append` and the reference to `foo[boo]`, we know it is a map operation. We know that `foo` is a list of some kind because of the `[boo]` index operator.
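To see that the loop style does not change the high-level operation, the following sketch runs the same map once as an indexed loop and once as a for-each loop. The data and names are assumptions for illustration.

```python
source = [5, 7, 11]  # hypothetical source list

# Indexed loop: i ranges over positions, and source[i] fetches the element.
by_index = []
for i in range(len(source)):
    by_index.append(source[i] * 2)

# For-each loop: the loop variable is the element itself.
by_element = []
for value in source:
    by_element.append(value * 2)

print(by_index == by_element)
```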
**Exercise**: What is the high-level operation corresponding to the pattern in this code?
```python
foo = []
for i in range(len(X)):
    foo.append(X[i]+Y[i])
```
It is combining two columns (lists) into a target column/list `foo`. We know that `X` and `Y` are lists because of the `[i]` array indexing.
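Combining two columns element by element can also be expressed with Python's built-in `zip`, which pairs up items at the same position. This sketch uses assumed sample columns and checks that both versions agree.

```python
X = [1, 2, 3]     # hypothetical first column
Y = [10, 20, 30]  # hypothetical second column

# Indexed version, as in the exercise.
combined_indexed = []
for i in range(len(X)):
    combined_indexed.append(X[i] + Y[i])

# zip version: iterate over (x, y) pairs directly.
combined_zip = [x + y for x, y in zip(X, Y)]

print(combined_indexed == combined_zip)
```

Note that `zip` stops at the shorter list, whereas the indexed version raises an `IndexError` if `Y` is shorter than `X`.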
**Exercise**: What high-level math operation is this code performing?
```python
for i in range(n):
...@@ -141,9 +143,9 @@ for i in range(n):
        C[i][j] = A[i][j] + B[i][j]
```
Matrix addition. It's important here to recognize that a nested indexed loop gives all combinations of the loop variables, `i` and `j`, in the range [0..n). One of the most common reasons to do this is to iterate through the elements of a matrix or an image. The answer here could also be image addition.
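The listing above omits its inner loop header, so the following is a complete, runnable sketch under the assumption that the inner loop is `for j in range(n):`. The matrices are small made-up lists of lists.

```python
n = 2
A = [[1, 2], [3, 4]]      # hypothetical n x n matrix
B = [[10, 20], [30, 40]]  # hypothetical n x n matrix
C = [[0] * n for _ in range(n)]  # result matrix, initially all zeros

# Nested indexed loop: visits every (i, j) combination in [0..n).
for i in range(n):
    for j in range(n):  # assumed inner loop header
        C[i][j] = A[i][j] + B[i][j]

print(C)
```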
**Exercise**: How many `hi`'s get printed by this loop?
```python
for i in range(n):
...@@ -151,9 +153,9 @@ for i in range(n):
        print('hi')
```
`n * n`. The inner loop goes around `n` times, and the outer loop means we perform the entire inner loop `n` times.
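The count can be checked empirically by replacing the `print` with a counter. This sketch assumes the omitted inner loop is a second `range(n)` loop, and the value of `n` is arbitrary.

```python
n = 4  # arbitrary example size

count = 0
for i in range(n):
    for j in range(n):  # assumed inner loop header
        count += 1      # stands in for print('hi')

print(count == n * n)
```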
**Exercise**:
```python
blort = []
...@@ -162,12 +164,12 @@ for foo in A:
        blort.append(foo + bar)
```
That's computing a value for every possible combination of an element from `A` and an element from `B`.
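Part of the listing above is omitted, so here is a complete sketch under the assumption that the inner loop is `for bar in B:`. It also shows the same all-combinations pattern using `itertools.product` from the standard library. The sample lists are made up.

```python
from itertools import product

A = [1, 2]         # hypothetical first list
B = [10, 20, 30]   # hypothetical second list

# Nested for-each loops: every element of A paired with every element of B.
sums_nested = []
for foo in A:
    for bar in B:  # assumed inner loop header
        sums_nested.append(foo + bar)

# product(A, B) yields the same pairs in the same order.
sums_product = [foo + bar for foo, bar in product(A, B)]

print(sums_nested == sums_product)
```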
**Exercise**: What is this code doing? I.e., what is the value of `blort` in the abstract after the loop completes?
```python
blort = float('-inf')
for x in X:
    if x > blort:
        blort = x
```
......