Fundamentally, programmers communicate with code. We express our thoughts not only to the computer but also to other developers. So far we have focused on designing programs and writing Python code. This is the key creative process but, in order to write code, programmers must be able to read code written by others. Reading code lets us:
* **Gain new experience**. Just as in natural language where we learn to speak by listening to others, we learn programming techniques by recognizing cool patterns in the code of others. Being able to quickly read code allows you to gain experience while watching a programming lecture or video.
* **Find and adapt code snippets**. We can often find hints or solutions to a coding problem through code snippets found via Google search or at [StackOverflow](https://stackoverflow.com/). Be careful here that you do not violate copyright laws or, in the case of student projects, academic honesty rules.
* **Discover the behavior of library functions or other shared code**. The complete behavior of a library function is not always clear from the name or parameter list. Looking at the source code for that function is the best way to understand what it does. The code **is** the documentation.
* **Uncover bugs, in our code or others' code**. All code has bugs, particularly code we just wrote that has not been tested exhaustively. As part of the coding process, we are constantly bouncing around, reading our existing code base to make sure everything fits together.
<img src="img/redbang.png" width="30" align="left">While we're discussing library functions, let me highlight a golden rule: *You should never ever ask your fellow programmers about the details of parameters and return values from library functions.* You can easily discover this yourself using "jump to definition" in PyCharm or by searching on the web.
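Outside an IDE, Python's standard `inspect` module offers a rough equivalent of "jump to definition." The sketch below uses `statistics.mean` purely as an example target (an assumption for illustration). Any library function written in Python would do, whereas functions implemented in C have no Python source to display.

```python
import inspect
import statistics

# Parameter list of the library function, without leaving the interpreter.
signature = inspect.signature(statistics.mean)
print(f"mean{signature}")

# Source text of the function; available because statistics is written in Python.
source = inspect.getsource(statistics.mean)
print(source.splitlines()[0])
```

The built-in `help(statistics.mean)` prints the docstring as well, which is often enough to answer questions about parameters and return values.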
The purpose of this document is to explain how exactly a programmer reads code. Our first clue comes from the fact that we are not computers, hence, we should not read code like a computer, examining one symbol after the other. Instead, we're going to look for key elements and code patterns.
This is what we do when reading sentences in a foreign language. For example, my French is pretty bad so, when reading a French sentence, I have to consciously ask *who is doing what to whom*. In practice, that means identifying the subject, the verb, and the object. From these key elements, I try to imagine the thought patterns in the mind of the author. I am essentially trying to reverse the process followed by the author.
In the programming world, the process goes like this: The code author might have thought "*convert prices to a new list by dividing by 2*", which they converted to "map" pseudocode and finally to a Python `for` loop. When reading that loop code, our job is to reverse the process and imagine the original goal of the author. We are not trying to figure out the emergent behavior of the code by simulating it in our heads or on paper; rather we are looking for patterns that tell us what high-level operations are being performed.
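As a minimal sketch of that last step, here is one way the "divide each price by 2" thought might end up as a Python `for` loop. The list contents and the names `prices`, `halved`, and `p` are assumptions chosen for illustration.

```python
prices = [20.0, 7.5, 3.0]   # hypothetical sample data

# "map" pattern: an empty target list, then one append per source element
halved = []
for p in prices:
    halved.append(p / 2)

print(halved)  # [10.0, 3.75, 1.5]
```

Reading this loop in reverse, the empty list followed by `append` inside the loop is the clue that the author was thinking "convert one list into another."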
That's why you should emphasize clarity when writing code, so that reading the code more easily leads the reader to your intentions. There is an excellent quote (by [John F. Woods](https://groups.google.com/forum/#!msg/comp.lang.c++/rYCO5yn4lXw/oITtSkZOtoUJ) I think) that summarizes things well:
> Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live.
## Getting the gist of a program
When looking at a textbook for the first time, it makes sense to scan through the table of contents to get an overall view of the book content. The same is true when looking at a program for the first time. Look through all of the files and the names of the functions contained in those files. Also figure out where the main program is. Depending on your goal in reading the program, you might start stepping through the main program or immediately jump to a function of interest.
It's also useful to look at the input-output pairs of the program from sample runs or unit tests, because it helps you understand the program's functionality. In some sense, we are reverse-engineering the program work plan by examining and testing the program. Previously, we used the program work plan in the forward direction to design programs.
Once we identify a main program or function to examine, it's time to reverse-engineer the function work plan. The function's name is perhaps the biggest clue as to what the function does, assuming the code author was a decent programmer. (Using a generic function name like `f` is how faculty write code-reading questions without giving away the answer.) For example, there is no doubt about the goal of a function named `average`, even without looking at the arguments or the function statements.
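To illustrate how much a name alone conveys, the stub below uses a hypothetical function name, `average`, with its parameters and statements hidden. The `*args` placeholder and the empty body are assumptions made only so that the snippet runs.

```python
def average(*args):
    ...  # parameters and statements deliberately hidden
```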
Often programmers will provide comments about the usage of a function, but be careful. Often programmers change the code without changing the comments and so the comments will be misleading. An acceptable comment might look like:
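A minimal sketch, assuming a hypothetical function named `average`. The comment wording is an illustration rather than a quoted original, and the body is left empty on purpose.

```python
# Compute the average of the numbers in data.
def average(data):
    ...  # statements not examined yet
```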
If we're lucky, that comment corresponds to the function objective description in the work plan.
The next step is to identify the parameters and return value. Again, the names of the parameters often tell us a lot but, unfortunately, Python usually does not have explicit parameter types (they aren't checked by Python anyway) so we have to figure that out ourselves. Knowing the types of values and variables is critical to understanding a program. In a simple function like this, we can usually figure out the types of the parameters and the return values quickly. In other cases, we will have to dig through the statements of the function to figure this out (more on this later). Let's zoom in to see more detail about our function:
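A sketch of what such a zoomed-in view might look like, assuming a hypothetical `average` function with a single parameter named `data`. The type hints are added here for illustration only. Python records them on the function but does not check them at run time.

```python
from typing import List

def average(data: List[float]) -> float:
    ...  # statements still hidden; only the signature is visible

# Hints are stored on the function object but never enforced by the interpreter.
print(average.__annotations__)
```

Calling `average("oops")` on this stub raises no type error, which is why a reader still has to infer the real types from names and usage.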
At this point, we know that `data` is almost certainly a list of numbers and the function returns a single number. That means we can fill in the first part of the work plan for the function.
Because we have prior knowledge of what the average is, we can fill in the work plan description of the function objective. In general, though, we have to scan the statements of the function to figure that out. (We might get lucky and find a reasonable function comment as well.) Let's look at the full function now:
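A sketch of such a full function, reconstructed under assumptions: the name `average` and the parameter `data` come from the surrounding discussion, while the accumulator name `total` is chosen here to avoid shadowing Python's built-in `sum`.

```python
def average(data):
    total = 0.0              # accumulator starts at zero
    for x in data:           # visit every element of the list
        total = total + x    # add the element to the running total
    return total / len(data)

print(average([1, 2, 3, 6]))  # 3.0
```

This sketch assumes a non-empty list, since an empty one would make `len(data)` zero and the division would fail.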
An inexperienced programmer must examine the statements of the function individually and literally, emulating a computer to figure out the emergent behavior. In contrast, *an experienced programmer looks for patterns in the code that represent implementations* of high-level operations like map, search, filter, etc...
By analogy, consider memorizing the state of a chessboard in the middle of play. A beginner has to memorize where all of the pieces are individually whereas a chessmaster recognizes that the board is, say, merely a variation on the Budapest Gambit.
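To make two of the operations named above concrete, here is a hedged sketch of what filter and search patterns often look like as plain loops. The data and the names `scores`, `passing`, and `first_failing` are hypothetical.

```python
scores = [72, 95, 58, 88]   # hypothetical sample data

# filter: an empty target list plus a conditional append
passing = []
for s in scores:
    if s >= 70:
        passing.append(s)

# search: scan until the first match, then stop
first_failing = None
for s in scores:
    if s < 70:
        first_failing = s
        break

print(passing, first_failing)  # [72, 95, 88] 58
```

The conditional `append` signals a filter, and the `break` after the first match signals a search. An experienced reader spots these shapes without tracing each iteration.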
How do we know where to start and what to look at? Well, let's think back to our generic data science program template:
1. Acquire data, which means finding a suitable file or collecting data from the web and storing it in a file
2. Load data from disk and place it into memory organized into data structures
3. Normalize, clean, or otherwise prepare data
4. Process the data, which can mean training a machine learning model, computing summary statistics, or optimizing a cost function
5. Emit results, which can be anything from simply printing an answer to saving data to the disk to generating a fancy visualization
The gist of that process is to load data into a handy data structure and process it. What do loading data, creating a data structure, and processing a data structure have in common? They all repeatedly execute a set of operations, which means that the gist of a program that processes data is looping. (There is even a famous book entitled [Algorithms + Data Structures = Programs](https://www.amazon.com/Algorithms-Structures-Prentice-Hall-Automatic-Computation/dp/0130224189) where *algorithm* means a process described by pseudocode or code.) A program that does not loop would likely be very boring as it could not traverse a data structure or process a data file.
From this, we can conclude that all of the action occurs in loops so we should **look for loops in the code first**. Reading code is a matter of finding such templates in the code of a function, which immediately tells us the kind of operation or pattern the author intended.
Let's dig through some loop examples, trying to identify the high-level patterns and corresponding operations. The key elements to look for are the holes in the templates we studied. This usually means identifying the loop variable, the loop bounds, which data structure we're traversing, and the operation performed on the data elements. **The goal is to reverse engineer the intentions of the code author.**
**Exercise**: To get started, what is the operation corresponding to the code pattern in the `sum` function above?
```python
sum = 0.0
for x in data:
    sum = sum + x
```
That's an accumulator.
**Exercise**: Let's look at a loop where I have deliberately used crappy variable names so you have to focus on the functionality.
```python
foo = []
for blah in blort:
    foo.append(blah * 2)
```
That's a map operation, which we can see from the initialization of an empty target list and the `foo.append(...)` call. The `blah * 2` is not relevant to finding the pattern other than the fact that the target list is a function of `blah`, which comes from the source list `blort`.
**Exercise**: What kind of loop (for-each, indexed, nested, etc...) do you see in the following code? What kind of high level operation is the code performing?
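A sketch consistent with the answer that follows: the names `foo`, `blort`, and `boo`, the bounds `range(len(foo))`, the `blort.append` call, and the `foo[boo]` reference are all taken from that answer. The `* 2` operation and the sample contents of `foo` are assumptions, added only so the snippet runs.

```python
foo = [3, 1, 4]   # sample input, assumed for illustration

blort = []
for boo in range(len(foo)):
    blort.append(foo[boo] * 2)   # the "* 2" is an assumed operation

print(blort)  # [6, 2, 8]
```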
That's an indexed-loop that again does a map operation. The clue that it is an indexed loop is that the bounds are `range(len(foo))` which is giving a range of indices. Because of the `blort.append` and reference to `foo[boo]`, we know it is a map operation. We know that `foo` is a list of some kind because of the `[boo]` index operator.
**Exercise**: What high-level math operation is this code performing?
```python
for i in range(n):
    for j in range(n):
        C[i][j] = A[i][j] + B[i][j]
```
Matrix addition. It's important here to recognize that a nested indexed-loop gives all combinations of the loop variables, `i` and `j`, in the range [0..n). One of the most common reasons to do this is to iterate through the elements of a matrix or an image. The answer here could also be image addition.
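For a runnable version of this pattern, the sketch below supplies small sample matrices. The values of `n`, `A`, and `B`, along with the preallocation of `C`, are assumptions for illustration.

```python
n = 2
A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
C = [[0] * n for _ in range(n)]   # preallocate an n x n result

# nested indexed loop: visits every (i, j) combination in [0..n)
for i in range(n):
    for j in range(n):
        C[i][j] = A[i][j] + B[i][j]

print(C)  # [[11, 22], [33, 44]]
```

Building `C` with a list comprehension gives each row its own list. Writing `[[0] * n] * n` instead would make every row refer to the same list.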