# 面向对象编程 ## 大揭秘 到目前为止,我们一直在使用函数和函数包,以及定义我们自己的函数。 但事实证明,我们一直在使用对象,我们只是没有认识到它们。 例如, ```python x = 'Hi' x.lower() # 'hi' ``` 字符串`x`是我们可以发送消息的对象。 ```python print( type(x) ) # ``` 甚至整数都是对象: ```python print(dir(99)) # ['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes'] ``` *类*是对象的蓝图,基本上是类型的名称,在这种情况下是`str`。 对象称为类的*实例*。 在`x.lower()`中,我们将`lower`消息发送到`x`字符串对象。 消息实际上只是与类/对象相关的函数。 ```python x.lower # ``` 在不支持对象学习编程的语言中,我们会做类似的事情: ```python lower(x) ``` Python 有函数和对象重编程,这就是为什么有`x.lower()`和: ```python len(x) # 2 ``` 函数或“消息”的选择取决于库设计者,但是`lower`仅对字符串有意义,因此将它与`str`的定义组合起来是有意义的。 然而,在实现方面,`x.lower()`实际上实现为`str.lower(x)`其中`str`是字符串的类定义。 电脑处理器了解函数调用; 他们不理解对象,所以我们在 Python 解释器本身中执行了这个翻译。 # 包 VS 对象成员 让我们直截了当。点`.`运算符在 Python 中重载,表示包成员和对象成员访问。你已经熟悉了这个: ```python import numpy as np np.array([1,2,3]) # array([1, 2, 3]) ``` ```python import math math.log(3000) # 8.006367567650246 ``` 阅读代码时,这是一个常见的混淆点。 当我们看到`a.f()`时,我们不知道函数`f`是由`a`标识的包的成员,还是由`a`引用的对象的成员。 在`wordsim`项目中,你定义了一个名为`wordsim.py`的文件,然后我的`test_wordsim.py`文件执行`from wordsim import *`来导入`wordsim.py`中的所有函数。 ## 练习 在下文中,将*标识符*(单词)标识为包或函数或字段: 1. `np.log(3)` 1. `np.linalg.norm(v)` 1. `from sklearn.ensemble import RandomForestRegressor` 1. `pd.read_csv("foo.csv")` 1. `pd.read_csv` 1. `'hi'.lower()` 1. `'hi'.lower` 1. `df_train.columns` 1. `np.pi` 1. `img = img.convert("L")` 现在,确定子表达式的数据类型,并将*标识符*(单词)标识为包或函数或字段: 1. `df["saledate"].dt.year` 1. `df_train.isnull().any().head(60)` # 字段 VS 方法 对象具有函数,我们将其称为*方法*,以将它们与不与对象关联的函数区分开来。 对象也有变量,我们称之为*字段*或*实例变量*。 字段是对象的*状态*。 方法是对象的*行为*。 我们一直在使用字段,例如`df.columns`,它获取数据帧中的列名的列表。 ```python import datetime now = datetime.date.today() print( type(now) ) print( now.year ) # access field year print( now.month ) ''' 2018 8 ''' ``` 如果尝试在没有括号的情况下访问对象函数,则表达式将计算为函数对象本身而不是调用它: ```python s='hi' s.title # ``` # 简单的对象定义 类是多个对象的蓝图,通常称为*实例*。 类*封装了对象的状态和行为。 想象一下外星人在你家后院的土地上,并要求你描述一辆汽车。 您可能会描述其属性,例如轮子的数量及其功能,例如可以启动和停止。 这些是状态和行为。 通过定义它们,我们有效地定义了对象。 类名仅仅为实体命名。 按照惯例,类名称应该像“Point”一样大写。 ## 作为对象替代品的元组 对象的字段是我们想要关联在一起的数据项。 例如,如果我想跟踪书名/作者,我可以使用元组列表: ```python from lolviz import * books = [ ('Gridlinked', 'Neal Asher'), ('Startide Rising', 'David Brin') ] objviz(books) ``` ![svg](img/2.8_OO_26_0.svg) ```python for b in books: print(f"{b[1]}: {b[0]}") ''' Neal Asher: Gridlinked David Brin: Startide Rising ''' ``` ```python # Or, more fancy for title, author in books: print(f"{author}: {title}") ''' Neal Asher: Gridlinked David Brin: Startide Rising ''' ``` 为了在两种情况下访问元组的元素,我们必须跟踪我们头脑中的顺序。 换句话说,我们必须访问元组元素,就像它们是列表元素一样。 ## 形式对象 更好的方法是正式声明,作者和标题数据元素应该封装到称为书籍的单个实体中。我认为 Python 的规范非常古怪,但它非常灵活。 例如,我们可以定义一个没有方法没有字段的对象,但是可以使用赋值语句动态添加字段: ```python class Book: pass b = Book() print(b) b.title = 'Gridlinked' b.author = 'Neal Asher' print(b.title, b.author) objviz(b) ''' <__main__.Book object at 0x115c51fd0> Gridlinked Neal Asher ''' ``` ![svg](img/2.8_OO_31_1.svg) 但这并不能让我们定义与该对象相关的方法(很容易)。让我们看看我们的第一个真正的类定义,它包含一个名为*构造器*的函数。 ```python class Book: def __init__(self, title, author): self.title = title self.author = author self.chapters = [] ``` 构造器通常根据参数设置初始和默认字段值。 在对象中定义的所有方法,函数必须有一个名为`self`的显式第一个参数。 这是正在考虑的对象。 然后我们可以使用实例创建语法`Book(..., ...)`来创建一列`Book`类的书籍对象或实例: ```python books = [ Book('Gridlinked', 'Neal Asher'), Book(title='David Brin', author='Startide Rising') ] ``` ```python objviz(books) ``` ![svg](img/2.8_OO_36_0.svg) ```python for b in books: print(f"{b.author}: {b.title}") # access fields ''' Neal Asher: Gridlinked Startide Rising: David Brin ''' ``` 请注意,我们不会将`self`参数传递给构造函数。 **它在调用一侧隐藏,但在定义一侧出现!** ## 顽皮的行为 还要注意我们一直在使用构造函数设置对象的字段,Python 以其无限的灵活性允许你做非常顽皮的事情,比如在任意对象上设置字段: ```python class Foo: pass # just says "empty" x = Foo() x.foo = 3 ``` 即使类本身没有定义`foo`,也不会出错! 您甚至可以[动态添加方法](https://stackoverflow.com/questions/972/adding-a-method-to-an-existing-object-instance)。 # 定义方法 如果您尝试打印一本书,您将只看到类型信息和物理内存地址: ```python print(books[0]) # <__main__.Book object at 0x115c51eb8> ``` ```python class Book: def __init__(self, title, author): self.title = title self.author = author def __str__(self): # called when conversion to string needed like print return f"Book({self.title}, {self.author})" def __repr__(self): # called in interactive mode return self.__str__() # call the string books = [ Book('Gridlinked', 'Neal Asher'), Book('Startide Rising', 'David Brin') ] ``` ```python print(books[0]) # calls __str__() books[0] # calls __repr__() ''' Book(Gridlinked, Neal Asher) Book(Gridlinked, Neal Asher) ''' ``` 确保使用`self.x`来引用字段`x`,否则你在方法中创建一个局部变量: ```python class Foo: def __init__(self): self.x = 0 def foo(self): x = 3 # WARNING: does not alter the field! should be self.x ``` 让我们创建另一种设置销售图书数量的方法。 ```python class Book: def __init__(self, title, author): self.title = title self.author = author self.sold = 0 # set default def sell(self, n): self.sold += n def __str__(self): # called when conversion to string needed like print return f"Book({self.title}, {self.author}, sold={self.sold})" def __repr__(self): # called in interactive mode return self.__str__() # call the string ``` ```python b = Book('Gridlinked', 'Neal Asher') print(b) b.sell(100) # Book.sell(b, 100) print(b) ''' Book(Gridlinked, Neal Asher, sold=0) Book(Gridlinked, Neal Asher, sold=100) ''' ``` **注意**:在方法定义中,我们调用同一对象上的其他方法,使用`self.foo(...)`调用方法`foo`。 ## The key to understanding methods versus functions `b.sell(100)` **method call** is translated and executed by the Python interpreter as **function call** `Book.sell(b,100)`. `b` becomes parameter `self` and so the `sell()` function is updating book `b`. Why we prefer `b.sell(100)` over `Book.sell(b,100)`: Instead of just functions, we send messages back and forth between objects. Instead of bark(dog) we say dog.bark() or instead of inflate(ball) we say ball.inflate(). ## Exercise Real-world objects contain ... and ... A software object's state is stored in ... A software object's behavior is exposed through ... A blueprint for a software object is called a ... ## Exercise Define a class called `Point` that has a constructor taking x, y coordinates and make them fields of the class. Define method `distance(q)` that takes a `Point` and returns the Euclidean distance from `self` to `q`. Test with ``` p = Point(3,4) q = Point(5,6) print(p.distance(q)) ``` Add method `__str__` so that `print(q)` prints something nice like `(3,4)`. ### Solution ```python import numpy as np class Point: def __init__(self, x, y): self.x = x self.y = y def distance(self, other): return np.sqrt( (self.x - other.x)**2 + (self.y - other.y)**2 ) def __str__(self): return f"({self.x},{self.y})" ``` ```python p = Point(3,4) q = Point(5,6) print(p, q) print(p.distance(q)) ''' (3,4) (5,6) 2.8284271247461903 ''' ``` # Inheritance Defining something new as it relates to something we already understand is usually a lot easier. The same thing is true in programming. Let's start with an account object: ```python class Account: def __init__(self, starting): self.balance = starting def add(self, value): self.balance += value def total(self): return self.balance ``` ```python a = Account(100.0) a.add(15) a.total() # 115.0 ``` ```python objviz(a) ``` ![svg](img/2.8_OO_65_0.svg) Inheritance behaves like an `import` or include operation from another class into a new class. (*Note that this is not really true, but we can think of it as an include for our purposes.*) If we do not specify a superclass, class `object` is the implicit superclass. That class is called the root of the class hierarchy and defines a number of standard methods: ```python x = object() # yes, we can make a generic object print(dir(x)) # ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] ``` We could define an interest-bearing account as it differs from a regular account: ```python class InterestingAccount(Account): # derive from super class to get subclass def __init__(self, starting, rate): self.balance = starting # super().__init__(starting) self.rate = rate def total(self): # OVERRIDE method return self.balance + self.balance * self.rate b = InterestingAccount(100.0, 0.15) b.add(15) b.total() ``` ```python objviz(b) ``` ![svg](img/2.8_OO_70_0.svg) The key is that we get to use `add()` without having to redefine it in `InterestingAccount` and `InterestingAccount` also gets to override what the `total()` of the account is. We have *reused* and *refined* previous functionality. You can think of the superclass as defining some initial functions that we can reuse or override in the subclass. We can also *extend* the functionality by adding a method that is not in the superclass. ```python class InterestingAccount(Account): # derive from super class to get subclass def __init__(self, starting, rate): super().__init__(starting) # does self.balance = starting above self.rate = rate def total(self): # OVERRIDE method return self.balance + self.balance * self.rate def profit(self): return self.balance * self.rate ``` ```python b = InterestingAccount(100.0, 0.15) b.add(15) b.profit() # 17.25 ``` ```python a = Account(100.0) b = InterestingAccount(100.0, 0.15) print(type(a)) print(type(b)) ''' ''' ``` The class definitions are actually objects themselves that you can access with a secret field of any object: ```python print(b.__class__) print(b.__class__.__base__) ''' ''' ``` ## Exercise 1. What is a class? 1. What's the difference between a class and an instance? 1. Define a new instance of class `Foo` using a constructor that takes no arguments. 1. What is the syntax to access a field of an object? 1. How does a method differ from a function? 1. What does the `__init__` method do? 1. Given classes `Employee` and `Manager`, which is the superclass in which is the subclass? 1. Can a method in a subclass call a method defined in the superclass? 1. How do you override a method inherited from your superclass? ## Dynamic dispatch (Advanced) When you call `b.add(15)`, Python looks up function `add` within the object definition for `b` (`InterestingAccount`). Because we have inherited that method from the superclass, subclass knows about it. When we call `b.total()`, Python again looks up method within `InterestingAccount` and finds an overridden method. That is why `b.total()` doesn't invoke the `Account` version. This behavior is desirable but extremely confusing at first. Here is an example of it in action where I have added a `__str__` method to the superclass: ```python class Account: def __init__(self, starting): self.balance = starting def add(self, value): self.balance += value def total(self): return self.balance def __str__(self): return f"Balance {self.total()}" # can call 2 different functions class InterestingAccount(Account): # derive from super class to get subclass def __init__(self, starting, rate): self.balance = starting self.rate = rate def total(self): # OVERRIDE method return self.balance + self.balance * self.rate def profit(self): return self.balance * self.rate ``` The devious part is that `__str__` in `Account` calls `Account.total()` or `InterestingAccount.total()`, depending on the type of `self`: ```python a = Account(100.0) b = InterestingAccount(100.0, 0.15) print(a) # calls Account.total() print(b) # calls InterestingAccount.total() ''' Balance 100.0 Balance 115.0 ''' ``` ## Exercise Define a `Point3D` that inherits from `Point`. Define constructor that takes x,y,z values and sets fields. Call `super().__init__(x,y)` to call constructor of superclass. Define / override `distance(q)` so it works with 3D field values to return distance. Test with ``` p = Point3D(3,4,9) q = Point3D(5,6,10) print(p.distance(q)) ``` Add method `__str__` so that `print(q)` prints something nice like `(3,4,5)`. Recall: $dist(x,y) = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2)}$ ### Solution ```python import numpy as np class Point3D(Point): def __init__(self, x, y, z): # reuse/refine super class constructor super().__init__(x,y) self.z = z def distance(self, other): return np.sqrt( (self.x - other.x)**2 + (self.y - other.y)**2 + (self.z - other.z)**2 ) def __str__(self): return f"({self.x},{self.y},{self.z})" ``` ```python p = Point3D(3,4,9) q = Point3D(5,6,10) print(p.distance(q)) # 3.0 ``` # Rationale and general thoughts Because the mind of a hunter-gatherer views the world as a collection of objects that interact by sending messages, an OO programming paradigm maps well to the real world problems we try to simulate via computer. Further, we are at our best when programming the way our minds are hardwired to think. In general when writing software, we try to map real-world entities onto programming constructs. If we take a word problem, the nouns typically become objects and the verbs typically become methods within these objects. Because we can specify how differently-typed objects are similar, we can define new objects as they differ from existing objects. By correctly relating similar classes by their category/commonality/ similarity, code reuse occurs as a side-effect of inheritance. Non-OO languages are inflexible/brittle because the exact type of variables must be specified. In OO languages, *polymorphism* is the ability to refer to groups of similar but different types using a single type reference.