We've studied arrays/lists that are built into Python but they are not always the best kind of list to use. Sometimes, we are inserting and deleting things from the head or middle of the list. If we do this with lists implemented as contiguous arrays (made up of contiguous cells in memory), we have to move a lot of cells around to make room for a new element or to close a hole made by a deletion.
*Linked list* implementations of abstract lists allow us to efficiently insert and remove things anywhere we want. This flexibility comes at the cost of more memory.
The other reason to study linked lists is that the notion of a linked list *node* is easily extended to create hierarchical parent-child data structures called *trees* and then to *graphs* (that track any node->node relationship).
*TODO*: use graphviz or something else to visualize the structures. move the tree discussion to a separate page. This lecture going very quickly, confusing some students, took about 1.5 hours without the trees. Have the students connect a few notes together and visualize it. Start with tuples then do graphviz.
Imagine that I wanted to take roll in class. Since everyone is sitting next to each other, i.e. contiguous, I can simply point from one person to the next by looking to the left or right. That's the way lists work, as contiguous chunks.
A linked list requires everybody to not only remember their name but also who is to the right of them (a next pointer). As long as I remember the first person (the head) in the list, I can call that person later and ask for their name. Then I can ask them to refer to the next person in line. This works even if people distribute across the continent or randomly reassign where they are sitting. There is no requirement that these elements be contiguous because each node in the list has the information needed to get to the next person.
A linked list implementation associates a next pointer with each list value. We call these things *nodes* usually: `[value,next]` or `(value,next)`. We also keep a pointer to the *head* of the list and sometimes the *tail* of the list.
The simplest list has one element with a `next` pointer/reference that points at nothing.
最简单的列表有一个元素,`next`指针/引用不指向任何东西。
```python
users=("parrt",None)
```
<imgsrc="img/links1.png"style="width:200px">
Here's one with two elements:
这是一个有两个元素的链表:
```python
users=("parrt",("tombu",None))
```
<imgsrc="img/links2.png"style="width:290px">
and three elements:
三个元素的:
```python
users=("parrt",("tombu",("afedosov",None)))
```
<imgsrc="img/links3.png"style="width:400px">
In practice, we'll use lists not tuples, because tuples are immutable. We want to be able to change the node `next` pointers:
在实践中,我们将使用列表而不是元组,因为元组是不可变的。 我们希望能够更改节点的`next`指针:
```python
a=["parrt",None]
...
...
@@ -49,14 +49,13 @@ a[1] = b # first node's next points to 2nd element
b[1]=c
```
The most basic implementation of a list is just a *head* pointer (here I'm using `users` for a specific list). Creating an empty linked list is then just a matter of saying `users=None`.
Naturally, we can store any kind of object we want for the value component, not just strings. We can store numbers or even other lists!
Let's add some support code that will make it easier to build and manipulate linked lists. It uses some object-oriented programming syntax from Python, which you can ignore if you want. Basically, I'm defining an object (a data aggregate) called `Node` that will have two fields `value` and `next`. It's much easier to access field names than `p[0]` and `p[1]` for some tuple/list `p`.
With that definition, we can create lists more naturally:
有了这个定义,我们可以更自然地创建列表:
```python
a=Node("parrt")
...
...
@@ -90,18 +88,17 @@ print(users)
'''
```
## Inserting nodes at the head
## 在头部插入节点
Ok, so let's say we create an empty list with `head=None`. To insert something, say, `'hi'` at the head of a linked list, there are two cases: an empty list and a nonempty list. An empty list is a case where `head==None`, the initial conditions of the list. A nonempty list of course will have `head` pointing at some tuple. Both cases can be handled the same way:
That makes a new node holding the new value `'hi'` and a `next` pointer pointing at the old head tuple (even if `None`). Finally, it sets the `head` pointer to point at the new tuple.
Inserting in the middle is more complicated. We need to find the node *after* which we want to insert something. Then we hook in the new node.For example, to insert something between `tombu` and `dmose`, we create a new node whose `next` pointer points at `dmose`'s node (what `tombu` points at. Then we set `tombu`'s `next` pointer to point to the new node containing `"mary"`:
Assuming we have a linked list constructed, how do we walk that list? With an array, we can just access the *i*th value and move *i* along the list. To walk a linked list, we have to define a *cursor* (often called `p` or `q`), which we can think of as just a finger we move along between the nodes in a list. Here's the code pattern to walk a linked list and print out the values:
There's another way to walk data structures that uses what we call *recursion*. We are familiar with the concept from recurrence relations. In this case, note that starting at any node in the linked list, the rest of the list looks like a linked list. It's like a fractal, where no matter how much you zoom in and it still looks like a fractal.
A recursive function call is one that calls the function surrounding it. The most obvious one is:
递归函数调用是调用它周围函数的调用。 最明显的一个是:
```python
deff():
f()
```
But, that's not very useful because it is an infinite loop. The first thing the function `f()` does is call it self, causing a loop. We typically have a termination condition that tells it when to stop recursing.
Create a function called `tostr` that returns a bracketed string like `['parrt', 'tombu', 'dmose']` given a link list `head` as an argument. Hint: Reduce this to a problem we know how to solve. Just add the elements to a regular Python list (`[]`) as you find them in the linked list; then return the string representation of that list using `str()`.
We can also make a recursive version of that function:
我们还可以制作该函数的递归版本:
```python
defrlen(head):
...
...
@@ -233,16 +215,9 @@ rlen(users)
# 5
```
## 练习
## Exercise
Implement method called `getitem(head, i)` that returns the ith node in a list `head` starting from zero. Hint: Combine a counter-accumulator loop with a search loop (look for counter hitting `i`) and the list walker pattern. Another way to think about this is that it is adding a search pattern to the previous length exercise.
To delete the first node of a list, all we have to do is make the `head` point at what the first node's `next` points at, being careful to check for the empty list condition:
Once we're comfortable with the notion of pointers/references from one node to the other, we can extend nodes to have two pointers instead of just one `next` pointer. A *binary tree* consists of nodes that have `left` and `right` child pointers; one or both of those pointers can be `None`. A tree reduces to a linked list if each tree node has at most one child.
Trees are very common data structure in analytics, particularly machine learning. For example, there is a form of clustering called *hierarchical clustering* that constructs trees of related element groups. Then, one of the most powerful machine learning models is called *random forest* and consists of a collection of decision trees that work together to provide a very strong predictor or classifier.
We draw trees upside down from their normal biological perspective. The **root** of the tree is a single node and is drawn at the top of the diagram. The **leaves** of the tree are the children that themselves have no children.
Here is some support code for a tree node that you can safely ignore, except that the tree nodes (data aggregates) contain `value`, `left`, and `right` fields.
While it is most common to use a `while` loop to traverse a linked list, is most common to use recursion to traverse a tree. For example, here is how we would print out all the elements in a tree in *preorder*, meaning printing the value added note before it's children.