diff --git "a/20171016 \347\254\25414\346\234\237/A Quick Introduction to Neural Networks.md" "b/20171016 \347\254\25414\346\234\237/A Quick Introduction to Neural Networks.md"
index a0749110e442331d3b2a4f0a8986af2c2e1866aa..3bf2e037af085c3acf6fda8984e29a1ba566bde3 100644
--- "a/20171016 \347\254\25414\346\234\237/A Quick Introduction to Neural Networks.md"
+++ "b/20171016 \347\254\25414\346\234\237/A Quick Introduction to Neural Networks.md"
@@ -1,189 +1,207 @@
-### A Quick Introduction to Neural Networks
+### 神经网络的快速介绍

原文链接:[A Quick Introduction to Neural Networks](https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com)

-An Artificial Neural Network (ANN) is a computational model that is inspired by the way biological neural networks in the human brain process information. Artificial Neural Networks have generated a lot of excitement in Machine Learning research and industry, thanks to many breakthrough results in speech recognition, computer vision and text processing. In this blog post we will try to develop an understanding of a particular type of Artificial Neural Network called the Multi Layer Perceptron.
+人工神经网络(ANN)是一种计算模型,它受人脑中生物神经网络处理信息方式的启发而建立。得益于在语音识别、计算机视觉和文本处理方面的许多突破性成果,人工神经网络在机器学习研究界和工业界都引起了极大的关注。在这篇博文中,我们将尝试理解一种特殊的人工神经网络——多层感知机。

-#### A Single Neuron
+#### **一个单独的神经元**

-The basic unit of computation in a neural network is the **neuron**, often called a **node** or **unit**. It receives input from some other nodes, or from an external source and computes an output. Each input has an associated **weight** (w), which is assigned on the basis of its relative importance to other inputs. 
The node applies a function **f** (defined below) to the weighted sum of its inputs as shown in Figure 1 below:
+在神经网络中最基础的计算单元是**神经元**,也常被称为**节点**或**单元**。它接收来自其他节点或外部数据源的输入,经过计算后产生输出。每个输入都有一个对应的**权重**(w),权重根据该输入相对于其他输入的重要程度来分配。如下图1所示,节点将函数**f**(定义见下文)作用于其输入的加权和上。

![Screen Shot 2016-08-09 at 3.42.21 AM.png](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-3-42-21-am.png?w=568&h=303)

###### Figure 1: a single neuron

-The above network takes numerical inputs **X1** and **X2** and has weights **w1** and **w2** associated with those inputs. Additionally, there is another input **1** with weight **b** (called the **Bias**) associated with it. We will learn more details about role of the bias later.
+上面的网络接收数值输入**X1**和**X2**,并有与这些输入对应的权重**w1**和**w2**。另外,还有一个值为**1**的输入,与之关联的权重为**b**(称为**偏置**)。后面我们会进一步了解偏置的作用。

-The output **Y** from the neuron is computed as shown in the Figure 1. The function **f** is non-linear and is called the **Activation Function**. The purpose of the activation function is to introduce non-linearity into the output of a neuron. This is important because most real world data is non linear and we want neurons to *learn* these non linear representations.
+神经元的输出**Y**按图1所示的方式计算。函数**f**是非线性的,称为**激活函数**。激活函数的目的是为神经元的输出引入非线性。这一点很重要,因为现实世界中的数据大多是非线性的,我们希望神经元能够*学习*这些非线性的表示。

-Every activation function (or *non-linearity*) takes a single number and performs a certain fixed mathematical operation on it [2]. There are several activation functions you may encounter in practice:
+每种激活函数(或*非线性函数*)接收一个数值,并对它执行某种固定的数学运算。下面是实践中可能遇到的几种激活函数:

-- **Sigmoid:** takes a real-valued input and squashes it to range between 0 and 1
+- **Sigmoid**激活函数:接收一个实数值输入,将它压缩到0到1的区间之内。

σ(x) = 1 / (1 + exp(−x))

-- **tanh:** takes a real-valued input and squashes it to the range [-1, 1]
+- **tanh**激活函数:接收一个实数值输入,将它压缩到[-1, 1]区间之内。

tanh(x) = 2σ(2x) − 1

-- **ReLU**: ReLU stands for Rectified Linear Unit. 
It takes a real-valued input and thresholds it at zero (replaces negative values with zero)
+- **ReLU**激活函数:ReLU代表修正线性单元(Rectified Linear Unit)。接收一个实数值输入,并以零为阈值(把负值替换为零)。

f(x) = max(0, x)

-The below figures [2] show each of the above activation functions.
+下图展示了以上各种激活函数:

###### ![Screen Shot 2016-08-08 at 11.53.41 AM](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-08-at-11-53-41-am.png?w=748)Figure 2: different activation functions

-**Importance of Bias:** The main function of Bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives). See [this link](http://stackoverflow.com/q/2480650/3297280) to learn more about the role of bias in a neuron.
+**偏置的重要性**:偏置的主要功能是给每个节点提供一个可训练的常数值(作为节点接收的常规输入之外的补充)。参见[此链接](http://stackoverflow.com/q/2480650/3297280)了解更多关于偏置在神经元中的作用。

-#### Feedforward Neural Network
+#### **前馈神经网络**

-The feedforward neural network was the first and simplest type of artificial neural network devised [3]. It contains multiple neurons (nodes) arranged in **layers**. Nodes from adjacent layers have **connections** or **edges** between them. All these connections have **weights** associated with them.
+前馈神经网络是人们设计出的最早、最简单的一类人工神经网络。它包含按**层**排列的多个神经元(节点),相邻层的节点之间有**连接**或**边**,所有这些连接都带有相应的**权重**。

-An example of a feedforward neural network is shown in Figure 3.
+图3是一个前馈神经网络的例子:

![Screen Shot 2016-08-09 at 4.19.50 AM.png](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-4-19-50-am.png?w=498&h=368)

###### Figure 3: an example of feedforward neural network

-A feedforward neural network can consist of three types of nodes:
+一个前馈神经网络可以包含三种类型的神经元:

-1. **Input Nodes –** The Input nodes provide information from the outside world to the network and are together referred to as the “Input Layer”. No computation is performed in any of the Input nodes – they just pass on the information to the hidden nodes.
-2. 
**Hidden Nodes –** The Hidden nodes have no direct connection with the outside world (hence the name “hidden”). They perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”. While a feedforward network will only have a single input layer and a single output layer, it can have zero or multiple Hidden Layers.
-3. **Output Nodes –** The Output nodes are collectively referred to as the “Output Layer” and are responsible for computations and transferring information from the network to the outside world.

-In a feedforward network, the information moves in only one direction – forward – from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network [3] (this property of feed forward networks is different from Recurrent Neural Networks in which the connections between the nodes form a cycle).
+1. **输入神经元**--输入神经元把外部世界的信息传入神经网络,它们合称为"输入层"。输入层中不进行任何计算--它们只是把信息传递给隐藏层节点。

-Two examples of feedforward networks are given below:
+2. **隐藏层神经元**--隐藏层的神经元与外部世界没有直接联系(因此称为"隐藏")。它们进行计算,并把信息从输入节点传递到输出节点。全部隐藏节点构成一个"隐藏层"。一个前馈神经网络只有一个输入层和一个输出层,但可以没有隐藏层,也可以有多个隐藏层。

-1. **Single Layer Perceptron** – This is the simplest feedforward neural network [4] and does not contain any hidden layer. You can learn more about Single Layer Perceptrons in [4], [5], [6], [7].
-2. **Multi Layer Perceptron** – A Multi Layer Perceptron has one or more hidden layers. We will only discuss Multi Layer Perceptrons below since they are more useful than Single Layer Perceptons for practical applications today.
+3. **输出神经元**--输出神经元合称为"输出层",负责进行计算并把信息从网络传递到外部世界。

-#### Multi Layer Perceptron
+在一个前馈神经网络中,信息只沿一个方向流动--前向--从输入节点出发,经由隐藏节点(如果有),到达输出节点。网络中没有循环或回路(前馈网络的这一性质使它不同于节点之间的连接构成环路的循环神经网络)。

-A Multi Layer Perceptron (MLP) contains one or more hidden layers (apart from one input and one output layer). 
While a single layer perceptron can only learn linear functions, a multi layer perceptron can also learn non – linear functions.
+下面是两个前馈神经网络的例子:

-Figure 4 shows a multi layer perceptron with a single hidden layer. Note that all connections have weights associated with them, but only three weights (w0, w1, w2) are shown in the figure.
+1. **单层感知机**--这是最简单的前馈神经网络,不包含任何隐藏层。

-**Input Layer:** The Input layer has three nodes. The Bias node has a value of 1. The other two nodes take X1 and X2 as external inputs (which are numerical values depending upon the input dataset). As discussed above, no computation is performed in the Input layer, so the outputs from nodes in the Input layer are 1, X1 and X2 respectively, which are fed into the Hidden Layer.
+2. **多层感知机**--一个多层感知机有一层或多层隐藏层。在如今的实际应用中,多层感知机比单层感知机更实用,因此下面我们只讨论多层感知机。

-**Hidden Layer:** The Hidden layer also has three nodes with the Bias node having an output of 1. The output of the other two nodes in the Hidden layer depends on the outputs from the Input layer (1, X1, X2) as well as the weights associated with the connections (edges). Figure 4 shows the output calculation for one of the hidden nodes (highlighted). Similarly, the output from other hidden node can be calculated. Remember that **f** refers to the activation function. These outputs are then fed to the nodes in the Output layer. 
+#### **多层感知机**
+
+一个多层感知机(MLP)除了一个输入层和一个输出层之外,还包含一层或多层隐藏层。单层感知机只能学习线性函数,而多层感知机还可以学习非线性函数。
+
+图4展示了一个带有一层隐藏层的多层感知机。需要注意的是,所有连接都有与之对应的权重,但图中只标出了w0、w1、w2三个权重。
+
+**输入层**:输入层有三个神经元。偏置神经元的值为1,其他两个神经元分别以X1和X2作为外部输入(它们是取决于输入数据集的数值)。如上所述,输入层中不进行任何计算,所以输入层各节点的输出分别是1、X1、X2,它们被送入隐藏层。
+
+**隐藏层**:隐藏层同样有三个神经元,其中偏置神经元的输出为1。隐藏层中其他两个神经元的输出取决于输入层的输出(1、X1、X2)以及连接(边)上的权重。图4展示了其中一个隐藏神经元(高亮标红)的输出计算方式,另一个隐藏神经元的输出可以用同样的方式算出。注意其中**f**指的是激活函数。这些输出随后被送入输出层的神经元。

![ds.png](https://ujwlkarn.files.wordpress.com/2016/08/ds.png?w=1128)

###### Figure 4: a multi layer perceptron having one hidden layer

-**Output Layer:** The Output layer has two nodes which take inputs from the Hidden layer and perform similar computations as shown for the highlighted hidden node. The values calculated (Y1 and Y2) as a result of these computations act as outputs of the Multi Layer Perceptron.
+**输出层**:输出层有两个神经元,它们接收隐藏层的输出,并执行与高亮隐藏神经元类似的计算。计算得到的Y1和Y2就是这个多层感知机的输出。

-Given a set of features **X = (x1, x2, …)** and a target **y**, a Multi Layer Perceptron can learn the relationship between the features and the target, for either classification or regression.
+在给定一组特征**X = (x1, x2, …)**和目标变量**y**的前提下,多层感知机能够学习特征与目标变量之间的关系,用于分类或者回归。

-Lets take an example to understand Multi Layer Perceptrons better. Suppose we have the following student-marks dataset:
+下面通过一个例子来更好地理解多层感知机。假设我们有如下的学生成绩数据集:

![train.png](https://ujwlkarn.files.wordpress.com/2016/08/train.png?w=297&h=112)

-The two input columns show the number of hours the student has studied and the mid term marks obtained by the student. The Final Result column can have two values 1 or 0 indicating whether the student passed in the final term. For example, we can see that if the student studied 35 hours and had obtained 67 marks in the mid term, he / she ended up passing the final term. 
+两个输入列分别表示学生学习的小时数和学生在期中考试中获得的分数。最终结果列有1和0两个取值,表示学生是否通过了期末考试。例如,我们可以看到,如果一个学生学习了35个小时并在期中考试中获得67分,那么他/她最终通过了期末考试。

-Now, suppose, we want to predict whether a student studying 25 hours and having 70 marks in the mid term will pass the final term.
+现在,假设我们想预测一个学习了25个小时、期中考试得70分的学生能否通过期末考试。

![test.png](https://ujwlkarn.files.wordpress.com/2016/08/test.png?w=300&h=40)

-This is a binary classification problem where a multi layer perceptron can learn from the given examples (training data) and make an informed prediction given a new data point. We will see below how a multi layer perceptron learns such relationships.
+这是一个二分类问题:多层感知机可以从给定的样本(训练数据)中学习,并在给定新的数据点时做出有依据的预测。接下来我们看看多层感知机是如何学习这种关系的。
+
+##### 训练我们的多层感知机:反向传播算法
+
+多层感知机的学习过程被称为反向传播算法。我建议阅读[Hemanth Kumar在Quora上的这个回答](https://www.quora.com/How-do-you-explain-back-propagation-algorithm-to-a-beginner-in-neural-network/answer/Hemanth-Kumar-Mantri)(下文引用),它对反向传播做了清晰的解释。
+
+**误差反向传播**,常简称为BackProp,是训练人工神经网络的几种方式之一。它是一种监督训练方案,也就是说,它从带标记的训练数据中学习(有一个监督者来指导它的学习)。
+
+简单来说,反向传播就像**从错误中学习**。每当人工神经网络犯错时,监督者就会**纠正**它。
+
+人工神经网络由不同层的神经元组成:输入层、中间的隐藏层和输出层。相邻层节点之间的连接带有"权重"。学习的目标就是为这些边分配正确的权重。在输入向量给定的情况下,这些权重决定了输出向量。

-##### Training our MLP: The Back-Propagation Algorithm
+在监督学习中,训练集是带标记的。这意味着,对于某些给定的输入,我们知道期望的输出(标签)。

-The process by which a Multi Layer Perceptron learns is called the Backpropagation algorithm. I would recommend reading [this Quora answer by Hemanth Kumar](https://www.quora.com/How-do-you-explain-back-propagation-algorithm-to-a-beginner-in-neural-network/answer/Hemanth-Kumar-Mantri) (quoted below) which explains Backpropagation clearly.
+**反向传播算法**:最初,所有边上的权重都被随机分配。对于训练集中的每个输入,激活人工神经网络并观察其输出。将该输出与我们已知的期望输出进行比较,误差被"传播"回前一层。记录这些误差并相应地"调整"权重。重复这一过程,直到输出误差低于预先设定的阈值。

-> **Backward Propagation of Errors,** often abbreviated as BackProp is one of the several ways in which an artificial neural network (ANN) can be trained. It is a supervised training scheme, which means, it learns from labeled training data (there is a supervisor, to guide its learning).
-> 
-> To put in simple terms, BackProp is like “**learning from mistakes**“. The supervisor **corrects**the ANN whenever it makes mistakes. 
-> 
-> An ANN consists of nodes in different layers; input layer, intermediate hidden layer(s) and the output layer. The connections between nodes of adjacent layers have “weights” associated with them. The goal of learning is to assign correct weights for these edges. Given an input vector, these weights determine what the output vector is.
-> 
-> In supervised learning, the training set is labeled. This means, for some given inputs, we know the desired/expected output (label).
-> 
-> **BackProp Algorithm:**
-> Initially all the edge weights are randomly assigned. For every input in the training dataset, the ANN is activated and its output is observed. This output is compared with the desired output that we already know, and the error is “propagated” back to the previous layer. This error is noted and the weights are “adjusted” accordingly. This process is repeated until the output error is below a predetermined threshold.
-> 
-> Once the above algorithm terminates, we have a “learned” ANN which, we consider is ready to work with “new” inputs. This ANN is said to have learned from several examples (labeled data) and from its mistakes (error propagation).
+一旦上述算法终止,我们就得到了一个"学习好的"人工神经网络,可以用来处理"新的"输入。可以说,这个网络从若干样本(带标记的数据)和它自己的错误(误差传播)中完成了学习。

-Now that we have an idea of how Backpropagation works, lets come back to our student-marks dataset shown above.
+既然我们对反向传播的工作原理有了大体理解,现在我们回到上面学生成绩的数据集。

-The Multi Layer Perceptron shown in Figure 5 (adapted from Sebastian Raschka’s [excellent visual explanation of the backpropagation algorithm](https://github.com/rasbt/python-machine-learning-book/blob/master/faq/visual-backpropagation.md)) has two nodes in the input layer (apart from the Bias node) which take the inputs ‘Hours Studied’ and ‘Mid Term Marks’. It also has a hidden layer with two nodes (apart from the Bias node). The output layer has two nodes as well – the upper node outputs the probability of ‘Pass’ while the lower node outputs the probability of ‘Fail’. 
-In classification tasks, we generally use a [Softmax function](http://cs231n.github.io/linear-classify/#softmax) as the Activation Function in the Output layer of the Multi Layer Perceptron to ensure that the outputs are probabilities and they add up to 1. The Softmax function takes a vector of arbitrary real-valued scores and squashes it to a vector of values between zero and one that sum to one. So, in this case,
+图5中展示的多层感知机(改编自Sebastian Raschka对[反向传播算法的出色可视化解释](https://github.com/rasbt/python-machine-learning-book/blob/master/faq/visual-backpropagation.md))在输入层除了偏置神经元之外,还有两个神经元,分别接收"学习时间"和"期中成绩"两个输入。隐藏层除了偏置神经元之外也有两个神经元。输出层同样有两个神经元--上面的神经元输出"通过"的概率,下面的神经元输出"不通过"的概率。

-Probability (Pass) + Probability (Fail) = 1

-**Step 1: Forward Propagation**
+在分类任务中,我们一般在多层感知机的输出层使用[Softmax函数](http://cs231n.github.io/linear-classify/#softmax)作为激活函数,以保证输出都是概率并且和为1。Softmax函数接收一个任意实数值的向量,并把它压缩成一个取值在0到1之间、总和为1的向量。所以,在这个例子中,P(通过) + P(不通过) = 1。

-All weights in the network are randomly assigned. Lets consider the hidden layer node marked **V** in Figure 5 below. Assume the weights of the connections from the inputs to that node are w1, w2 and w3 (as shown).
+**第1步:前向传播**

-The network then takes the first training example as input (we know that for inputs 35 and 67, the probability of Pass is 1).
+网络中所有的权重都被随机初始化。我们来考察下图5中标记为**V**的隐藏层神经元。假设从输入到该神经元的连接权重分别为w1、w2、w3(如图所示)。

-- Input to the network = [35, 67]
-- Desired output from the network (target) = [1, 0]
+网络将训练集中的第一个样本作为输入(我们知道,对于输入35和67,"通过"的概率是1)。

-Then output V from the node in consideration can be calculated as below (**f** is an activation function such as sigmoid):
+- 网络的输入 = [35, 67]
+- 网络的期望输出(目标)= [1, 0]
+经过下面的计算可以得到神经元V的输出(**f**是诸如sigmoid的激活函数):

V = **f** (1*w1 + 35*w2 + 67*w3)

-Similarly, outputs from the other node in the hidden layer is also calculated. The outputs of the two nodes in the hidden layer act as inputs to the two nodes in the output layer. This enables us to calculate output probabilities from the two nodes in output layer.
+类似地,隐藏层中另一个神经元的输出也可以算出来。隐藏层中两个神经元的输出将作为输出层两个神经元的输入,这使我们能够计算出输出层两个神经元输出的概率。

-Suppose the output probabilities from the two nodes in the output layer are 0.4 and 0.6 respectively (since the weights are randomly assigned, outputs will also be random). 
We can see that the calculated probabilities (0.4 and 0.6) are very far from the desired probabilities (1 and 0 respectively), hence the network in Figure 5 is said to have an ‘Incorrect Output’.
+假设输出层两个神经元给出的概率分别为0.4和0.6(因为权重是随机初始化的,所以输出也是随机的)。可以看到,计算出的概率(0.4和0.6)与期望的概率(分别为1和0)相差很远,因此可以说图5中的网络产生了"错误输出"。

![Screen Shot 2016-08-09 at 11.52.57 PM.png](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-11-52-57-pm.png?w=748)

###### Figure 5: forward propagation step in a multi layer perceptron

-**Step 2: Back Propagation and Weight Updation**
+**第2步:反向传播和权重更新**

-We calculate the total error at the output nodes and propagate these errors back through the network using Backpropagation to calculate the *gradients*. Then we use an optimization method such as *Gradient Descent* to ‘adjust’ **all** weights in the network with an aim of reducing the error at the output layer. This is shown in the Figure 6 below (ignore the mathematical equations in the figure for now).
+我们计算输出节点的总误差,并使用反向传播把这些误差传回网络以计算*梯度*。然后,我们使用*梯度下降*之类的优化方法来"调整"网络中的**全部**权重,目标是减小输出层的误差。这一过程如下图6所示(暂时忽略图中的数学公式)。

-Suppose that the new weights associated with the node in consideration are w4, w5 and w6 (after Backpropagation and adjusting weights).
+假设与所考察神经元相关的新权重是w4、w5、w6(经过反向传播和权重调整后)。

![Screen Shot 2016-08-09 at 11.53.06 PM.png](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-11-53-06-pm.png?w=748)

###### Figure 6: backward propagation and weight updation step in a multi layer perceptron

-If we now input the same example to the network again, the network should perform better than before since the weights have now been adjusted to minimize the error in prediction. As shown in Figure 7, the errors at the output nodes now reduce to [0.2, -0.2] as compared to [0.6, -0.4] earlier. This means that our network has learnt to correctly classify our first training example. 
+如果我们现在把同一个样本再次输入网络,由于权重已经朝着最小化预测误差的方向调整过,网络的表现应该比之前更好。如图7所示,输出节点的误差已经从之前的[0.6, -0.4]减小到[0.2, -0.2]。这意味着网络已经学会了正确分类我们的第一个训练样本。

![Screen Shot 2016-08-09 at 11.53.15 PM.png](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-11-53-15-pm.png?w=748)

###### Figure 7: the MLP network now performs better on the same input

-We repeat this process with all other training examples in our dataset. Then, our network is said to have *learnt* those examples.
+我们对数据集中所有其他训练样本重复这一过程。这样,就可以说网络*学会*了这些样本。

-If we now want to predict whether a student studying 25 hours and having 70 marks in the mid term will pass the final term, we go through the forward propagation step and find the output probabilities for Pass and Fail.
+如果我们现在想预测一个学习了25个小时、期中考试得70分的学生能否通过期末考试,只需执行前向传播步骤,就能得到"通过"和"不通过"的输出概率。

-I have avoided mathematical equations and explanation of concepts such as ‘Gradient Descent’ here and have rather tried to develop an intuition for the algorithm. For a more mathematically involved discussion of the Backpropagation algorithm, refer to [this link](http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html).
+这里我尽量避免使用数学公式和"梯度下降"等概念的解释,而是试图建立对这种算法的直观理解。想了解反向传播算法在数学上更深入的讨论,请参考[此链接](http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html)。

-#### 3d Visualization of a Multi Layer Perceptron
+#### 多层感知机的三维可视化

-Adam Harley has created a [3d visualization](http://scs.ryerson.ca/~aharley/vis/fc/) of a Multi Layer Perceptron which has already been trained (using Backpropagation) on the MNIST Database of handwritten digits.
+Adam Harley创建了一个多层感知机的[三维可视化](http://scs.ryerson.ca/~aharley/vis/fc/),该感知机已经(使用反向传播)在MNIST手写数字数据库上训练完成。

-The network takes 784 numeric pixel values as inputs from a 28 x 28 image of a handwritten digit (it has 784 nodes in the Input Layer corresponding to pixels). The network has 300 nodes in the first hidden layer, 100 nodes in the second hidden layer, and 10 nodes in the output layer (corresponding to the 10 digits) [15]. 
+三维视角的多层感知机 -Although the network described here is much larger (uses more hidden layers and nodes) compared to the one we discussed in the previous section, all computations in the forward propagation step and backpropagation step are done in the same way (at each node) as discussed before. +Adam Harley建立一种三维视角的剁成感知机,这种感知机已经在手写字识别中被训练和运用。 -Figure 8 shows the network when the input is the digit ‘5’. + + +该网络将784个数字像素值作为来自手写数字的28×28图像的输入(其在输入层中具有对应于像素的784个节点)。网络在第一个隐藏层中有300个节点,在第二个隐藏层中有100个节点,在输出层中有10个节点(对应于10个数字) + +虽然与前面部分讨论的网络相比,此处描述的网络要大得多(使用更多的隐藏层和节点),但前向传播步骤和反向传播步骤中的所有计算都以相同的方式(在每个节点处)完成,如上所述之前。 ![Screen Shot 2016-08-09 at 5.45.34 PM.png](https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-5-45-34-pm.png?w=748) ###### Figure 8: visualizing the network for an input of ‘5’ -A node which has a higher output value than others is represented by a brighter color. In the Input layer, the bright nodes are those which receive higher numerical pixel values as input. Notice how in the output layer, the only bright node corresponds to the digit 5 (it has an output probability of 1, which is higher than the other nine nodes which have an output probability of 0). This indicates that the MLP has correctly classified the input digit. I highly recommend playing around with this visualization and observing connections between nodes of different layers. +具有比其他输出值更高的输出值的节点由更亮的颜色表示。在输入层中,亮节点是接收较高数值像素值作为输入的节点。请注意,在输出层中,唯一的明亮节点对应于数字5(输出概率为1,高于其他9个输出概率为0的节点)。这表明MLP已正确分类输入数字。我强烈建议玩这种可视化并观察不同层节点之间的连接。 + +#### 深度神经网络 + +1. \1. [深度学习和普通机器学习有什么区别?](https://github.com/rasbt/python-machine-learning-book/blob/master/faq/difference-deep-and-normal-learning.md) -#### Deep Neural Networks + \2. [神经网络和深层神经网络有什么区别?](http://stats.stackexchange.com/questions/182734/what-is-the-difference-between-a-neural-network-and-a-deep-neural-network?rq=1) -1. 
[What is the difference between deep learning and usual machine learning?](https://github.com/rasbt/python-machine-learning-book/blob/master/faq/difference-deep-and-normal-learning.md)
-2. [What is the difference between a neural network and a deep neural network?](http://stats.stackexchange.com/questions/182734/what-is-the-difference-between-a-neural-network-and-a-deep-neural-network?rq=1)
-3. [How is deep learning different from multilayer perceptron?](https://www.quora.com/How-is-deep-learning-different-from-multilayer-perceptron)
+3. [深度学习与多层感知机有何不同?](https://www.quora.com/How-is-deep-learning-different-from-multilayer-perceptron)

-#### Conclusion
+#### 结论

-I have skipped important details of some of the concepts discussed in this post to facilitate understanding. I would recommend going through [Part1](http://cs231n.github.io/neural-networks-1/), [Part2](http://cs231n.github.io/neural-networks-2/), [Part3](http://cs231n.github.io/neural-networks-3/) and [Case Study](http://cs231n.github.io/neural-networks-case-study/) from Stanford’s Neural Network tutorial for a thorough understanding of Multi Layer Perceptrons.
+为了便于理解,我跳过了本文所讨论的一些概念的重要细节。要全面理解多层感知机,我建议学习斯坦福大学神经网络教程的[第一部分](http://cs231n.github.io/neural-networks-1/)、[第二部分](http://cs231n.github.io/neural-networks-2/)、[第三部分](http://cs231n.github.io/neural-networks-3/)和[案例分析](http://cs231n.github.io/neural-networks-case-study/)。

-Let me know in the comments below if you have any questions or suggestions! 
+如果你有任何问题或建议,欢迎在下面的评论中告诉我!
+我已经跳过了本文中讨论的一些概念的重要细节,以便于理解。我会建议通过去[第一部分](http://cs231n.github.io/neural-networks-1/),[第2部分](http://cs231n.github.io/neural-networks-2/),[第三部分](http://cs231n.github.io/neural-networks-3/)和[案例分析](http://cs231n.github.io/neural-networks-case-study/)从斯坦福大学的神经网络教程多层感知的全面理解。 #### References diff --git "a/20171016 \347\254\25414\346\234\237/BUILDING A NEURAL NET FROM SCRATCH IN GO.md" "b/20171016 \347\254\25414\346\234\237/BUILDING A NEURAL NET FROM SCRATCH IN GO.md" index 0ef4e807b7b73d7cf55b32f581276b0f12a887d4..4b9508f59627038e7a512a2e2d22cddb3255261c 100644 --- "a/20171016 \347\254\25414\346\234\237/BUILDING A NEURAL NET FROM SCRATCH IN GO.md" +++ "b/20171016 \347\254\25414\346\234\237/BUILDING A NEURAL NET FROM SCRATCH IN GO.md" @@ -1,40 +1,40 @@ -## BUILDING A NEURAL NET FROM SCRATCH IN GO +## 用Go建立神经网络 原文链接:[BUILDING A NEURAL NET FROM SCRATCH IN GO](https://www.datadan.io/building-a-neural-net-from-scratch-in-go/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) -I'm super pumped that my new book [Machine Learning with Go](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-go) is now available! Writing the book allowed me to get a complete view of the current state of machine learning in Go, and let's just say that I'm pretty excited to see how the community growing! +我很高兴我的新书[Machine Learning with Go](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-go)现已推出!写这本书让我可以全面了解Go中机器学习的现状,我很高兴看到社区如何成长! [![img](https://www.datadan.io/content/images/2017/10/book_wide.png)](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-go) -In the book (and for my own edification), I decided that I would build a neural network from scratch in Go. Turns out, this is fairly easy, and I thought it would be great to share my little neural net here. 
+在书中(也为了自我提升),我决定用Go从零开始构建一个神经网络。事实证明,这相当容易,我觉得在这里分享我的小神经网络是一件很棒的事。

-All the code and data shown below is available [on GitHub](https://github.com/dwhitena/gophernet).
+下面展示的所有代码和数据都可以在[GitHub](https://github.com/dwhitena/gophernet)上找到。

-(If you are interested in leveraging pre-existing Go packaging for machine learning, check out [all the great existing packages](https://github.com/gopherdata/resources/tree/master/tooling), and be sure to watch [Chris Benson's recent talk](https://youtu.be/CHzMEamGZDA) at GolangUK about Deep Learning in Go)
+(如果你有兴趣利用现成的Go包进行机器学习,请查看[所有优秀的现有软件包](https://github.com/gopherdata/resources/tree/master/tooling),并观看Chris Benson最近在GolangUK上关于Go中深度学习的[演讲](https://youtu.be/CHzMEamGZDA))

-## Goals
+## 目标

-There are a whole variety of ways to accomplish this task of building a neural net in Go, but I wanted to adhere to the following guidelines:
+在Go中完成构建神经网络的任务有多种方法,但我想遵循以下准则:

-- **No cgo** - I want my little neural net to compile nicely to a statically linked binary, and I also want to highlight the numerical functionality that is available natively in Go.
-- **gonum matrix input** - I want supply matrices to my neural network for training, similar to how you would supply `numpy` arrays to most Python machine learning functions.
-- **Variable numbers of nodes** - Although I will only illustrate one architecture here, I wanted my code to be flexible, such that I could tweak the numbers of nodes in each layer for other scenarios. 
+- **没有cgo** - 我希望我的小神经网络可以很好地编译为静态链接的二进制文件,我还想突出Go中原生可用的数值计算功能。
+- **gonum矩阵输入** - 我希望向我的神经网络提供gonum矩阵用于训练,类似于向大多数Python机器学习函数提供`numpy`数组的方式。
+- **可变数量的节点** - 虽然我只在这里展示一种架构,但我希望我的代码是灵活的,这样我就可以调整每层中的节点数量以用于其他场景。

-## Network Architecture
+## 网络架构

-The basic network architecture that we will utilize in this example includes an input layer, a single hidden layer, and an output layer:
+我们将在此示例中使用的基本网络架构包括一个输入层、一个隐藏层和一个输出层:

![img](https://www.datadan.io/content/images/2017/09/B05151_Chapter_08_05-1.png)

-This type of single layer neural net might not be very "deep," but it has proven to be very useful for a huge majority of simple classification tasks. In our case, we will be training our model to classify iris flowers based on the [famous iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set). This should be more than enough to solve that problem with a high degree of accuracy.
+这种类型的单隐藏层神经网络可能不是很"深",但它已被证明对绝大多数简单的分类任务非常有用。在我们的例子中,我们将基于[著名的鸢尾花数据集](https://en.wikipedia.org/wiki/Iris_flower_data_set)训练模型来对鸢尾花进行分类。这应该绰绰有余,可以以很高的准确率解决这个问题。

-Each of the **nodes** in the network will take in one or more inputs, combine those together linearly (using **weights** and a **bias**), and then apply a non-linear **activation function**. By optimizing the weights and the biases, with a process called [**backpropagation**](https://en.wikipedia.org/wiki/Backpropagation), we will be able to mimic the relationships between our inputs (measurements of flowers) and what we are trying to predict (species of flowers). We will then be able to feed new inputs through the optimized network (i.e., we will **feed** them **forward**) to predict the corresponding output. 
+网络中的每个**节点**将接收一个或多个输入,将它们线性地组合在一起(使用**权重**和**偏置**),然后应用非线性**激活函数**。通过一个称为[**反向传播**](https://en.wikipedia.org/wiki/Backpropagation)的过程来优化权重和偏置,我们就能够模拟输入(花的测量值)与我们想要预测的目标(花的种类)之间的关系。随后,我们就可以把新的输入送入优化后的网络(即**前向**馈送它们)来预测相应的输出。

-(If you are new to neural nets, you might also check out [this great intro](https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/), or, of course, you can read the relevant section in [Machine Learning with Go](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-go).)
+(如果你是神经网络的新手,你也可以查看[这个很棒的介绍](https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/),当然你还可以阅读[Machine Learning with Go](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-go)中的相关部分。)

-## Defining Useful Functions and Types
+## 定义有用的函数和类型

-Before diving into the backpropagation and feeding forward, let's define a couple types that will help us as we work with our model:
+在深入反向传播和前向传播之前,让我们先定义几个类型,它们会在我们使用模型时派上用场:

```go
// neuralNet contains all of the information
@@ -63,7 +63,7 @@ func newNetwork(config neuralNetConfig) *neuralNet {
}
```

-We also need to define our activation function and it's derivative, which we will utilize during backpropagation. There are many choices for activation functions, but here we are going to use the [sigmoid function](http://mathworld.wolfram.com/SigmoidFunction.html). This function has various advantages, including probabilistic interpretations and a convenient expression for it's derivative. 
+我们还需要定义激活函数及其导数,导数将在反向传播过程中用到。激活函数有很多选择,但在这里我们将使用[sigmoid函数](http://mathworld.wolfram.com/SigmoidFunction.html)。该函数有多方面的优点,包括可以做概率解释,以及导数的表达式十分简洁。

```go
// sigmoid implements the sigmoid function
@@ -79,22 +79,22 @@ func sigmoidPrime(x float64) float64 {
}
```

-## Implementing Backpropagation for Training
+## 为训练实现反向传播

-With the definitions above taken care of, we can write an implementation of the [backpropagation method](https://en.wikipedia.org/wiki/Backpropagation) for training, or optimizing, the weights and biases of our network. The backpropagation method involves:
+有了上面的定义,我们就可以编写[反向传播方法](https://en.wikipedia.org/wiki/Backpropagation)的实现,来训练或者说优化网络的权重和偏置。反向传播方法包括:

-1. Initializing our weights and biases (e.g., randomly).
-2. Feeding training data through the neural net forward to produce output.
-3. Comparing the output to the correct output to get errors.
-4. Calculating changes to our weights and biases based on the errors.
-5. Propagating the changes back through the network.
-6. Repeating steps 2-5 for a given number of **epochs** or until a stopping criteria is satisfied.
+1. 初始化权重和偏置(例如,随机初始化)。
+2. 将训练数据前向馈送通过神经网络以产生输出。
+3. 将输出与正确的输出进行比较以获得误差。
+4. 根据误差计算权重和偏置的变化量。
+5. 将这些变化反向传播回网络。
+6. 重复步骤2-5给定数量的**轮次**(epoch),或直到满足停止标准。

-In steps 3-5, we will utilize [**stochastic gradient descent**](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) (SGD) to determine the updates for our weights and biases.
+在步骤3-5中,我们将利用[**随机梯度下降**](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)(SGD)来确定权重和偏置的更新量。

-To implement this network training, I created a method on `neuralNet` that would take pointers to two matrices as input, `x` and `y`. `x` will be the features of our data set (i.e., the independent variables) and `y` will represent what we are trying to predict (i.e., the dependent variable). I will show an example of these later in the article, but for now, let's assume that they take this form. 
+为了实现这种网络训练,我在`neuralNet`上创建了一个方法,它接收指向两个矩阵的指针作为输入:`x`和`y`。`x`是我们数据集的特征(即自变量),而`y`代表我们试图预测的内容(即因变量)。我将在本文后面展示它们的例子,但是现在,让我们先假设它们就是这种形式。

-In this function, we first initialize our weights and biases randomly and then use backpropagation to optimize the weights and the biases:
+在这个函数中,我们首先随机初始化权重和偏置,然后使用反向传播来优化它们:

```go
// train trains a neural network using backpropagation.
@@ -143,7 +143,7 @@ func (nn *neuralNet) train(x, y *mat.Dense) error {
}
```

-The actual implementation of backpropagation is shown below. **Note/Warning**, for clarity and simplicity, I'm going to create a handful of matrices as I carry out the backpropagation. For large data sets, you would likely want to optimize this to reduce the number of matrices in memory.
+反向传播的实际实现如下所示。**注意/警告**:为了表现得清晰和简单,我将在执行反向传播时创建一些矩阵。对于大型数据集,你可能希望对此进行优化,以减少内存中的矩阵数量。

```go
// backpropagate completes the backpropagation method.
@@ -217,7 +217,7 @@ func (nn *neuralNet) backpropagate(x, y, wHidden, bHidden, wOut, bOut, output *m
}
```

-Here we have utilized a helper function that allows us to sum values along one dimension of a matrix, keeping the other dimension intact:
+在这里,我们使用了一个辅助函数,它允许我们沿着矩阵的一个维度对值求和,同时保持另一个维度不变:

```go
// sumAlongAxis sums a matrix along a particular dimension,
@@ -251,9 +251,9 @@ func sumAlongAxis(axis int, m *mat.Dense) (*mat.Dense, error) {
}
```

-## Implementing Feed Forward for Prediction
+## 实现前向预测

-After training our neural net, we are going to want to use it to make predictions. To do this, we just need to feed some given `x` values forward through the network to produce an output. This looks similar to the first part of backpropagation. Except, here we are going to return the generated output. 
+After training our neural network, we will want to use it to make predictions. To do this, we just need to feed some given `x` values forward through the network to produce an output. This looks similar to the first part of backpropagation, except that here we return the generated output.
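The forward pass itself is just "weighted sum, add bias, apply sigmoid", once per layer. A minimal Python sketch of that idea (layer shapes and names are illustrative, not the article's Go API):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: one weighted sum + sigmoid per hidden neuron.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    # Output layer: the same operation, fed by the hidden activations.
    return [sigmoid(sum(w * h for w, h in zip(weights, hidden)) + b)
            for weights, b in zip(w_out, b_out)]

# With all-zero weights and biases every sigmoid sees 0, so every
# activation is exactly 0.5 -- a handy structural sanity check.
out = feed_forward([1.0, 2.0],
                   w_hidden=[[0.0, 0.0], [0.0, 0.0]],
                   b_hidden=[0.0, 0.0],
                   w_out=[[0.0, 0.0]],
                   b_out=[0.0])
```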
To do this, we first need to read our training data, initialize a `neuralNet` value, and call the `train()` method: +让我们把这个神经网络运用起来。为此,我们首先需要读取我们的训练数据,初始化`neuralNet`值,并调用`train()`方法: ```go package main @@ -425,7 +425,7 @@ func main() { } ``` -That gives us a trained neural net. We can then parse the test data into matrices `testInputs` and `testLabels` (I'll spare you these details as they are the same as above), use our `predict()`method to make predictions for the flower species, and then compare the predictions to the actual species. Calculating the predictions and accuracy looks like the following: +这给了我们一个已经训练好的神经网络。然后我们可以将测试数据解析为矩阵`testInputs`和`testLabels`(我将略过这些细节,因为它们与上面相同),使用我们的`predict()`方法来预测花种,以及将预测情况与实际物种进行比较。计算预测和准确性如下所示: ```go func main() { @@ -475,9 +475,9 @@ func main() { } ``` -## Results +## 结果 -Compiling and running the full program results in something similar to: +编译并运行完整的程序会产生类似于: ``` $ go build @@ -485,11 +485,10 @@ $ ./gophernet Accuracy = 0.97 ``` +哇噢!对于我们从头开始的神经网络,97%的准确度并不算太糟糕! 当然,这个数字会因模型中的随机性而有所不同,但它通常表现得非常好。 -Woohoo! 97% accuracy isn't too shabby for our little from scratch neural net! Of course this number will vary due to the randomness in the model, but it does generally perform very nicely. +我希望这对你来说是有益的和有趣的。所有的代码和数据都是[可在这里](https://github.com/dwhitena/gophernet),所以自己试试吧! 此外,如果您对Go for ML / AI和Data Science感兴趣,我强烈建议: -I hope this was informative and interesting for you. All of the code and data is [available here](https://github.com/dwhitena/gophernet), so try it out yourself! 
Also, if you are interested in Go for ML/AI and Data Science in general, I highly recommend: - -- Joining [Gophers Slack](https://invite.slack.golangbridge.org/), and participating in the #data-science channel (I'm @dwhitena there) -- Checking out all the great Go ML/AI/data tooling [here](https://github.com/gopherdata/resources/tree/master/tooling) -- Following the [GopherData blog/website](http://gopherdata.io/) for more interesting articles and community information \ No newline at end of file +- 加入[Gophers Slack](https://invite.slack.golangbridge.org/),并参与#data-science频道(我在那里@dwhitena) +- 检查所有伟大的Go ML / AI /数据工具[这里](https://github.com/gopherdata/resources/tree/master/tooling) +- 关注[GopherData博客/网站](http://gopherdata.io/)获取更多有趣的文章和社区信息 \ No newline at end of file diff --git "a/20171016 \347\254\25414\346\234\237/Practical Data Science in Python.md" "b/20171016 \347\254\25414\346\234\237/Practical Data Science in Python.md" index 8f1cb69c8f4c0311fb78844c2b7c129faa39f404..9b72f490f4d2767eddb1f5c0516872596da9860f 100644 --- "a/20171016 \347\254\25414\346\234\237/Practical Data Science in Python.md" +++ "b/20171016 \347\254\25414\346\234\237/Practical Data Science in Python.md" @@ -1,45 +1,41 @@ -# Practical Data Science in Python +# Python中的实用数据科学 原文链接:[Practical Data Science in Python](https://radimrehurek.com/data_science_python/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) -This notebook accompanies my talk on "Data Science with Python" at the [University of Economics](https://www.vse.cz/english/) in Prague, December 2014. Questions & comments welcome [@RadimRehurek](https://twitter.com/radimrehurek). +这本笔记本伴随着我在2014年12月在布拉格的[经济大学](https://www.vse.cz/english/)上的“数据科学与Python”的讨论。欢迎提出问题和评论[@RadimRehurek](https://twitter.com/radimrehurek)。 -The goal of this talk is to demonstrate some high level, introductory concepts behind (text) machine learning. 
The concepts are demonstrated by concrete code examples in this notebook, which you can run yourself (after installing IPython, see below), on your own computer. +本演讲的目的是展示(文本)机器学习背后的一些高级的介绍性概念。 这些概念通过这个笔记本中的具体代码示例来演示,您可以在自己的计算机上自行运行(在安装IPython之后,见下文)。 -The talk audience is expected to have some basic programming knowledge (though not necessarily Python) and some basic introductory data mining background. This is *not*an "advanced talk" for machine learning experts. - -The code examples build a working, executable prototype: an app to classify phone SMS messages in English (well, the "SMS kind" of English...) as either "spam" or "ham" (=not spam). +我们假设听众(读者)将拥有一些基本的编程知识(虽然不一定是Python)和一些基本的入门数据挖掘背景。 对于机器学习专家来说,这不是一个“高级谈话”。 +代码示例构建了一个可工作的可执行原型:一个应用程序,用于将英语中的电话SMS消息(以及英语的“SMS类型”)分类为“垃圾邮件”或“火腿”(=非垃圾邮件)。 [![img](https://radimrehurek.com/data_science_python/python.png)](https://xkcd.com/353/) +整个过程中使用的语言将是[Python](https://www.python.org/),这是一种通用语言,有助于管道的所有部分:I/O,数据清洗和预处理,模型训练和评估。虽然Python绝不是唯一的选择,但由于其成熟的科学计算生态系统,它提供了灵活性,易开发性和性能的独特组合。其庞大的开源生态系统还避免了任何单个特定框架或库的锁定(以及相关的bitrot)。 +Python(及其大多数库)也是独立于平台的,因此您可以在Windows,Linux或OS X上运行此笔记本而无需更改。 -The language used throughout will be [Python](https://www.python.org/), a general purpose language helpful in all parts of the pipeline: I/O, data wrangling and preprocessing, model training and evaluation. While Python is by no means the only choice, it offers a unique combination of flexibility, ease of development and performance, thanks to its mature scientific computing ecosystem. Its vast, open source ecosystem also avoids the lock-in (and associated bitrot) of any single specific framework or library. - -Python (and of most its libraries) is also platform independent, so you can run this notebook on Windows, Linux or OS X without a change. - -One of the Python tools, the IPython notebook = interactive Python rendered as HTML, you're watching right now. 
We'll go over other practical tools, widely used in the data science industry, below. - +其中一个Python工具,IPython notebook =以HTML呈现的交互式Python,你现在正在观看。我们将介绍下面广泛用于数据科学行业的其他实用工具。 -Want to run the examples below interactively? (optional) -1. Install the (free) [Anaconda](https://store.continuum.io/cshop/anaconda/) Python distribution, including Python itself. -2. Install the "natural language processing" TextBlob library: [instructions here](https://textblob.readthedocs.org/en/dev/install.html). -3. Download the source for this notebook to your computer: [http://radimrehurek.com/data_science_python/data_science_python.ipynb](https://radimrehurek.com/data_science_python/data_science_python.ipynb) and run it with: - `$ ipython notebook data_science_python.ipynb` -4. Watch the [IPython tutorial video](https://www.youtube.com/watch?v=H6dLGQw9yFQ) for notebook navigation basics. -5. Run the first code cell below; if it executes without errors, you're good to go! +想以交互方式运行以下示例吗? (可选的) +1.安装(免费)[Anaconda](https://store.continuum.io/cshop/anaconda/)Python发行版,包括Python本身。 +2.安装“自然语言处理”TextBlob库:[此处说明](https://textblob.readthedocs.org/en/dev/install.html)。 +3.将此笔记本的源代码下载到您的计算机:[http://radimrehurek.com/data_science_python/data_science_python.ipynb](https://radimrehurek.com/data_science_python/data_science_python.ipynb)并运行它: +   `$ ipython notebook data_science_python.ipynb` +4.观看[IPython教程视频](https://www.youtube.com/watch?v=H6dLGQw9yFQ)了解笔记本导航基础知识。 +5.运行下面的第一个代码单元格;如果它没有错误地执行,你将可以顺利的进入下一步! -# End-to-end example: automated spam filtering +# 端到端示例:自动垃圾邮件过滤 In [1]: -``` +```python %matplotlib inline import matplotlib.pyplot as plt import csv @@ -61,13 +57,9 @@ from sklearn.learning_curve import learning_curve -## Step 1: Load data, look around - - - -Skipping the *real* first step (fleshing out specs, finding out what is it we want to be doing -- often highly non-trivial in practice!), let's download the dataset we'll be using in this demo. 
Go to https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection and download the zip file. Unzip it under `data` subdirectory. You should see a file called `SMSSpamCollection`, about 0.5MB in size: - +## 第1步:加载数据,总览全局 +跳过*真正的*第一步(充实规范,找出我们想要做的事情 - 在实践中经常非常重要!),让我们下载我们将在本演示中使用的数据集。 转到https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection并下载zip文件。 在`data`子目录下解压缩它。 你应该看到一个名为`SMSSpamCollection`的文件,大小约为0.5MB: ``` $ ls -l data @@ -77,30 +69,24 @@ total 1352 -rw-r-----@ 1 kofola staff 203415 Dec 1 15:30 smsspamcollection.zip ``` - - -This file contains **a collection of more than 5 thousand SMS phone messages** (see the `readme` file for more info): +此文件包含**超过5千条短信电话消息的集合**(有关详细信息,请参阅`readme`文件): In [2]: -``` +```python messages = [line.rstrip() for line in open('./data/SMSSpamCollection')] print len(messages) ``` - - ``` 5574 ``` - - -A collection of texts is also sometimes called "corpus". Let's print the first ten messages in this SMS corpus: +文本集合有时也称为“语料库”。 让我们输出这个SMS语料库中的前十条消息: In [3]: -``` +```python for message_no, message in enumerate(messages[:10]): print message_no, message ``` @@ -120,23 +106,17 @@ for message_no, message in enumerate(messages[:10]): 9 spam Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030 ``` +我们看到这是一个[TSV](https://en.wikipedia.org/wiki/Tab-separated_values)(“制表符分隔值”)文件,其中第一列是一个标签,说明给定的消息是否正常 消息(“火腿”)或“垃圾邮件”。 第二列是消息本身。 - -We see that this is a [TSV](https://en.wikipedia.org/wiki/Tab-separated_values) ("tab separated values") file, where the first column is a label saying whether the given message is a normal message ("ham") or "spam". The second column is the message itself. - -This corpus will be our labeled training set. Using these ham/spam examples, we'll **train a machine learning model to learn to discriminate between ham/spam automatically**. 
Then, with a trained model, we'll be able to **classify arbitrary unlabeled messages** as ham or spam. - - +这个语料库将是我们标记的训练集。 使用这些火腿/垃圾邮件示例,我们将**训练机器学习模型,以学习自动区分火腿/垃圾邮件**。 然后,通过训练有素的模型,我们将能够将任意未标记的消息**分类为火腿或垃圾邮件**。 [![img](https://radimrehurek.com/data_science_python/plot_ML_flow_chart_11.png)](http://www.astroml.org/sklearn_tutorial/general_concepts.html#supervised-learning-model-fit-x-y) - - -Instead of parsing TSV (or CSV, or Excel...) files by hand, we can use Python's `pandas` library to do the work for us: +我们可以使用Python的`pandas`库为我们完成工作,而不是手工解析TSV(或CSV或Excel ...)文件: In [4]: -``` +```python messages = pandas.read_csv('./data/SMSSpamCollection', sep='\t', quoting=csv.QUOTE_NONE, names=["label", "message"]) print messages @@ -212,8 +192,7 @@ print messages ``` - -With `pandas`, we can also view aggregate statistics easily: +使用`pandas`,我们还可以轻松查看汇总统计信息: In [5]: @@ -237,11 +216,11 @@ Out[5]: -How long are the messages? +消息有多长? In [6]: -``` +```python messages['length'] = messages['message'].map(lambda text: len(text)) print messages.head() ``` @@ -275,13 +254,13 @@ Out[7]: In [8]: -``` +```python messages.length.describe() ``` Out[8]: -``` +```python count 5574.000000 mean 80.604593 std 59.919970 @@ -293,13 +272,11 @@ max 910.000000 Name: length, dtype: float64 ``` - - -What is that super long message? +什么是超长消息? 
In [9]: -``` +```python print list(messages.message[messages.length > 900]) ``` @@ -309,19 +286,17 @@ print list(messages.message[messages.length > 900]) ["For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she should be there every time I dream.love will be then when my every breath has her name.my life should happen around her.my life will be named to her.I would cry for her.will give all my happiness and take all her sorrows.I will be ready to fight with anyone for her.I will be in love when I will be doing the craziest things for her.love will be when I don't have to proove anyone that my girl is the most beautiful lady on the whole planet.I will always be singing praises for her.love will be when I start up making chicken curry and end up makiing sambar.life will be the most beautiful then.will get every morning and thank god for the day because she is with me.I would like to say a lot..will tell later.."] ``` - - -Is there any difference in message length between spam and ham? +垃圾邮件和火腿之间的邮件长度有什么不同吗? 
In [10]: -``` +```python messages.hist(column='length', by='label', bins=50) ``` Out[10]: -``` +```python array([, ], dtype=object) ``` @@ -330,37 +305,29 @@ array([, ![img](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYgAAAERCAYAAABhKjCtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz%0AAAALEgAACxIB0t1+/AAAHgdJREFUeJzt3X20XXV95/H3h0SehBIiNc8SlNA2FpQHQ3XqcNCaRscS%0A2rWGyPKhAnZmTdoBZ2wloWuV68waBac6heXAqkUgiIlNkaIURALtsTgUghQxJaQkU6LcC7lRCKBW%0AhgS+88feN3fn5HcfzsM+j5/XWndln99++P3uzf7u7/nt3+/so4jAzMys1iGdboCZmXUnJwgzM0ty%0AgjAzsyQnCDMzS3KCMDOzJCcIMzNLcoLoEpJ2Snp3p9thZjbGCaJ7RP5jZtYVnCDMzCzJCaK7nCrp%0AUUnPS/qqpMMkHSvpbyTtlvScpNslLRjbQVJV0n+X9H8k/UTSNyQdJ+krkl6QtFnS8Z38pcymQ9Kl%0AkoYlvShpm6R3SRqSdEseDy9KeljSKYV91kjaka97TNK5hXUfzePi85L25Nu9Q9IFkn4oaVTSRzrz%0A2/YGJ4juIeDfA78JnACcAnw0L/8S8Ib85+fAF2r2XQV8CFgAvAn4h3yf2cDjwOWlt96sCZJ+Cfh9%0A4IyI+AVgObAzX30OsBE4FlgP3CZpRr5uB/Dr+T6fAm6WNKdw6GXAo2SxsCE/zmlkcfIh4AuSjizx%0AV+tpThDdI4CrI2JXROwBbgfeGhHPRcRfR8RLEfFT4NPAWTX73RART0bEi8A3gSci4m8j4hXgr4BT%0A2/y7mNXrFeAw4M2SXhMRP4yIf8nXfTcibs3P588DhwNvB4iIWyJiV768EdgOnFk47pMRsS6yh85t%0ABOYD/y0i9kbEJuBl4MR2/IK9yAmiu+wqLP8cOErSEZL+PJ/l9ALwbeAYSSpsO1pYfgnYXfP6qNJa%0AbNYCEbED+DgwBIxK2iBpXr56uLBd5K/nAUj6iKRH8ltIe4BfBV5XOHQxNn6eH+NHNWWOjwk4QXSv%0AsRlNfwicBCyLiGPIeg/Kfybbz6ynRMSGiHgncDzZeXxl/u+isW0kHQIsBJ7Ox9a+SHZranZEHAv8%0AExPHhtXJCaJ7jZ3kR5G9y3lB0mzS4wmaYNmsJ0g6KR+UPgz4f2Q931fy1adL+m1JM8l6GS8BDwCv%0AJUsgPwYOkXQBWQ/CWsQJonuNfS7iz4AjyILgfrIxhtpeQiT2m2i9WTc6DPgM8CPgGeA44LJ83dfJ%0AJmI8B3wQ+J2IeCUitgKfI5uUsYssOXyncEzHQpM02RcGSboe+HfA7og4uVD+n4HVZBn+joi4NC9f%0AC1yYl18cEXfn5acDN5INLt0ZEZeU8tuYdUCr4sQOJuly4MSI+HCn2zKIpupB3ACsKBZIOpts2tkp%0AEfGrwJ/m5UvJsvzSfJ9rCgOp1wIXRcQSYImkA45p1uOajRP35CfmW6YdNOmJGRH3AXtqiv8T8JmI%0A2JtvMzYjYCWwIZ8+tpNsfvKZ+UyEoyNic77dTcC5mPWJFsTJsna1tQf5ETQd1Mg7lyXAv5X0QP4p%0A3jPy8vkUpqPlywsS5SN5uVk/qzdOLCEiPhUR/rRzh8xscJ9jI+LXJL2N7MMnb2xts8x6Xj1x4nfI%0A1pUaSRDDwK0AEfGQpFclHUfWM1hU2G5hvu1IvlwsH0kdWJIDxUoREe2+l11PnBwUD44FK0s9sdDI%0ALabbgHdBNncZODQifgx8A/iApEMlnUDWxd6cfwz+RUln5oPWH86PMVHj2/5z+eWXu9
4+rrdD6oqT%0A1AEG6f/I9bbnp16T9iAkbSD75O7rJD0F/AlwPXC9pC1kzzH5SH4yb5W0EdgK7ANWx3iLVpNNcz2C%0AbJrrXXW31KxLtTBOzLrKpAkiIs6fYFVyTnJEfJrsYXK15Q8DJx+8h1nva1WcmHUbz78GKpWK6+3j%0Aem36Bu3cGLR66zXpJ6nbTZJ729Zykoj2D1I3xbFgZag3FtyDMDOzJCcIMzNLcoIwM7MkJwgzM0ty%0AgjAzsyQnCDMzS3KCMDOzJCcIMzNLcoIwM7MkJwgzM0tygjAzsyQnCDMzS3KCMDOzJCcIMzNLcoIw%0AM7OkSb9RrpOyr68e52fjm5m116Q9CEnXSxrNv1e3dt0nJL0qaXahbK2k7ZK2SVpeKD9d0pZ83VXT%0Ab17kP2bdq1VxYtZtprrFdAOworZQ0iLgPcAPCmVLgVXA0nyfazTeDbgWuCgilgBLJB10TLMe1myc%0A+FavdaVJT8yIuA/Yk1j1eeCTNWUrgQ0RsTcidgI7gDMlzQOOjojN+XY3Aec21WqzLtKCOFlWbgvN%0AGlP3OxdJK4HhiPh+zar5wHDh9TCwIFE+kpeb9a0G4sSs69Q1SC3pSOAysm7z/uKWtsisxzUQJx5o%0A6xKeHHOgemcxvQlYDDya/yEXAg9LOpOsZ7CosO1CsndHI/lysXxkogqGhoYKr6pApc4m2qCrVqtU%0Aq9VONqHeOEnGQzEWKpUKlUqllMZarbGk0PvvfZuNBU2VISUtBm6PiJMT654ETo+I5/LBt/Vk91MX%0AAPcAJ0ZESHoQuBjYDNwBXB0RdyWOF2PtyQJr/D9q0DO5NU4SEVFqtLciTmr2qS2yNuj36069sTDV%0ANNcNwP3ASZKeknRBzSb7/3oRsRXYCGwFvgmsLpzhq4HrgO3AjlRyMOtVLYwTs64yZQ+indyDsDK0%0AowfRau5BdEa/X3da2oMwM7PB5QRhZmZJThBmZpbkBGFmZklOEGZmluQEYWZmSU4QZmaW5ARhZmZJ%0AThBmZpbkBGFmZklOEGZmluQEYWZmSU4QZmaW5ARhZmZJThBmZpbkBGFmZklOEGZmluQEYWZmSVN9%0AJ/X1kkYlbSmU/U9Jj0t6VNKtko4prFsrabukbZKWF8pPl7QlX3dVOb+KWWe0Kk7Mus1UPYgbgBU1%0AZXcDb46ItwBPAGsBJC0FVgFL832uUfYFrwDXAhdFxBJgiaTaY5r1smbjxD1560qTnpgRcR+wp6Zs%0AU0S8mr98EFiYL68ENkTE3ojYCewAzpQ0Dzg6Ijbn290EnNui9pt1XAviZFm72mpWj2bfuVwI3Jkv%0AzweGC+uGgQWJ8pG83GxQTCdOzLpOwwlC0h8DL0fE+ha2x6yvTDNOol3tMavHzEZ2kvRR4H3AuwvF%0AI8CiwuuFZO+ORhjvXo+Vj0x07KGhocKrKlBppIk2wKrVKtVqtdPNqCdOkvFQjIVKpUKlUml1E63P%0ANRsLipj8zYukxcDtEXFy/noF8DngrIj4cWG7pcB6svupC4B7gBMjIiQ9CFwMbAbuAK6OiLsSdcVY%0Ae7Lx7bG2ianaaTYRSUSEpt6yqToW02Sc1ByvtsjaoN+vO/XGwqQ9CEkbgLOA4yQ9BVxONhvjUGBT%0APknpHyJidURslbQR2ArsA1YXzvDVwI3AEcCdqeRg1qtaGCdmXWXKHkQ7uQdhZWhHD6LV3IPojH6/%0A7tQbC55/bWZmSU4QZmaW5ARhZmZJThBmZpbkBGFmZklOEGZmluQEYWZmSU4QZmaW1NCzmMzM+sn4%0AV9dYkXsQZmaAH6p7MCcIMzNLcoIwM7MkJwgzM0tygjAzsyQnCDMzS3KCMDOzJCcIMzNLcoIwM7Ok%0ASROEpOsljUraUiibLWmTpCck3S1pVmHdWknbJW2TtLxQfrqkLfm6q8r5Vcw6o1VxYtZtpupB3ACs%0AqClbA2yKiJOA
e/PXSFoKrAKW5vtco/HPr18LXBQRS4AlkmqPadbLmo0T9+StK016YkbEfcCemuJz%0AgHX58jrg3Hx5JbAhIvZGxE5gB3CmpHnA0RGxOd/upsI+Zj2vBXGyrB3tNKtXI+9c5kTEaL48CszJ%0Al+cDw4XthoEFifKRvNysn9UbJ2Zdp6mubUQEfsKV2aSmESeOIetKjTzue1TS3IjYld8+2p2XjwCL%0ACtstJHt3NJIvF8tHJjr40NBQ4VUVqDTQRBtk1WqVarXa6WbUEyfJeCjGQqVSoVKplNNS61vNxoKy%0ANzeTbCAtBm6PiJPz158Fno2IKyWtAWZFxJp88G092f3UBcA9wIkREZIeBC4GNgN3AFdHxF2JumKs%0APdn49ljbxFTtNJuIJCKi1Af+tyJOao5XW2QlGr/e9Pd1p95YmLQHIWkDcBZwnKSngD8BrgA2SroI%0A2AmcBxARWyVtBLYC+4DVhTN8NXAjcARwZyo5TGVsQlS//YdZ72thnJh1lSl7EO00WQ9iLLt3U3ut%0AN7SjB9Fq7kG0l3sQaZ5/bWZmSU4QZmaW5ARhZmZJThBmZpbkBGFmZklOEGZmluQEYWZmSU4QZmaW%0A5ARhZmZJThBmZpbkBGFmZklOEGZmluQEYWZmSU4QZmaW5ARhZmZJThBmZpbkBGFmZklOEGZmltRw%0AgpC0VtJjkrZIWi/pMEmzJW2S9ISkuyXNqtl+u6Rtkpa3pvlm3a3eODHrJg0lCEmLgd8DTouIk4EZ%0AwAeANcCmiDgJuDd/jaSlwCpgKbACuEaSey/W1+qNE7Nu0+hF+kVgL3CkpJnAkcDTwDnAunybdcC5%0A+fJKYENE7I2IncAOYFmjjTbrEfXGiVlXaShBRMRzwOeAH5Kd8M9HxCZgTkSM5puNAnPy5fnAcOEQ%0Aw8CChlps1iMaiBOzrtLoLaY3AR8HFpNd/I+S9KHiNhERQExymMnWmfW8FsWJWcfMbHC/M4D7I+JZ%0AAEm3Am8HdkmaGxG7JM0DdufbjwCLCvsvzMsOMjQ0VHhVBSoNNtEGVbVapVqtdroZUH+cHKAYC5VK%0AhUqlUnqDrb80GwvK3sDUuZP0FuArwNuAl4Abgc3A8cCzEXGlpDXArIhYkw9Srycbd1gA3AOcGDWV%0AS9pfJInxN1Zjy6KR9tpgk0REqAP11hUnNfvWhoeVaPx6c+B1p9/+D+qNhYZ6EBHxqKSbgO8CrwL/%0ACHwROBrYKOkiYCdwXr79Vkkbga3APmC1z37rd/XGiVm3aagHURb3IKwMnepBNMM9iPZyDyLNn0Uw%0AM7MkJwgzM0tygjAzsyQnCDMzS3KCMDOzJCcIMzNLcoIwM7MkJwgzM0tygjAzsyQnCDMzS3KCMDOz%0AJCcIMzNLavT7IMzMelL2YL5Mvz2Mr9XcgzCzAeTEMB1OEGZmluQEYWZmSU4QZmaW5ARhZmZJDScI%0ASbMk3SLpcUlbJZ0pabakTZKekHS3pFmF7ddK2i5pm6TlrWm+WXerN07MukkzPYirgDsj4leAU4Bt%0AwBpgU0ScBNybv0bSUmAVsBRYAVwjyb0XGwTTjhOzbqNG5gFLOgZ4JCLeWFO+DTgrIkYlzQWqEfHL%0AktYCr0bElfl2dwFDEfFAzf77v6h9/EvEYfyLxPvvS8StfPV+UXsL660rTmq2CZ/r5Ri/toxfT4pl%0AxetOv/0f1BsLjb6LPwH4kaQbJP2jpL+Q9FpgTkSM5tuMAnPy5fnAcGH/YWBBg3Wb9Yp648SsqzSa%0AIGYCpwHXRMRpwM+o6Sbnb38mS7/9lZrNDtaKOLESSTrgk9V2oEYftTEMDEfEQ/nrW4C1wC5JcyNi%0Al6R5wO58/QiwqLD/wrzsIENDQ4VXVaDSYBNtUFWrVarVaqebAfXHyQGKsVCpVKhUKuW2diAVb2P3%0An2ZjoaExCABJfw98LCKekDQEHJmvejYirpS0B
pgVEWvyQer1wDKyW0v3ACfW3mSdzhhEUb/dH7Ry%0AdGoMIq972nFSs5/HIEoy0XiDxyAS2zeRIN4CXAccCvxf4AJgBrAReAOwEzgvIp7Pt78MuBDYB1wS%0AEd9KHHMaCaJ///OsHB1OEHXFSWE/J4iSOEG0IUGUwQnCytDJBNEoJ4jyOEGUP4vJzMz6nBOEmZkl%0AOUGYmVmSE4SZ2QQG/XMSThBmZhPqr0HqejlBmJlZkhOEmZklOUGYmVmSE4SZmSU5QZiZWZIThJmZ%0AJTlBmJlZkhOEmZklOUGYmVmSE4SZmSU5QZiZWZIThJmZJTlBmJlZUlMJQtIMSY9Iuj1/PVvSJklP%0ASLpb0qzCtmslbZe0TdLyZhtu1ivqiROzbtJsD+ISYCvjz8RdA2yKiJOAe/PXSFoKrAKWAiuAayS5%0A92KDYlpxYtZtGr5IS1oIvA+4juybvgHOAdbly+uAc/PllcCGiNgbETuBHcCyRusutGHgv9DDulud%0AcWLWVZp5F/+/gD8CXi2UzYmI0Xx5FJiTL88HhgvbDQMLmqg7Fwz6F3pY16snTsy6SkMJQtL7gd0R%0A8Qjj74oOEBFTXb19Zbe+1qI4MeuYmQ3u9w7gHEnvAw4HfkHSl4FRSXMjYpekecDufPsRYFFh/4V5%0A2UGGhoYKr6pApcEm2qCqVqtUq9VONwPqj5MDFGOhUqlQqVTKb3GfKd5+znLxYGk2FtTsH03SWcAf%0ARsRvSfos8GxEXClpDTArItbkg9TrycYdFgD3ACdGTeWS9hdl/7Fjq8eWU2XZ8iD+59v0SCIiOjpQ%0ANZ04qdm+NjysAePXkfFrRLFseteY/rm+1BsLjfYgao399a4ANkq6CNgJnAcQEVslbSSbybEPWO2z%0A3wbQpHFi5fJklvo13YNoJfcgrAzd0IOol3sQjTs4EdTXW3APYpw/i2Bmfchj/63gBGFmZklOEGZm%0AluQEYWZmSU4QZmaW5ARhZmZJThBmZpbkBGFmZklOEGZmluQEYWZmSU4QZmaW5ARhZmZJrXqaa1er%0AfXhXvzx4y8ysTAPUg/DDu8zM6jFACcLMzOrRN7eYUl8G4ltJZmaN66MeRBT+dWIwM2tWHyUIMzNr%0ApYYShKRFkv5O0mOS/knSxXn5bEmbJD0h6W5Jswr7rJW0XdI2Sctb9QuYdatG4sSsmzT0ndSS5gJz%0AI+J7ko4CHgbOBS4AfhwRn5V0KXBsRKyRtBRYD7wNWADcA5wUEa/WHLfh76ROrZ/oWB6bGCyd+k7q%0AeuOkZl9/J3WDpn/t8HdST6WhHkRE7IqI7+XLPwUeJ7vwnwOsyzdbRxYMACuBDRGxNyJ2AjuAZY3U%0AXQ9JycFrs3ZoIE7MukrTYxCSFgOnAg8CcyJiNF81CszJl+cDw4XdhskCpWQesLbuMM04MesqTSWI%0AvNv8NeCSiPhJcV3eP57s6uwrtw2EJuPErGMa/hyEpNeQnfRfjojb8uJRSXMjYpekecDuvHwEWFTY%0AfWFedpChoaHCqypQabSJNqCq1SrVarXTzQDqjpMDFGOhUqlQqVRKbq31m2ZjodFBapHdO302Iv5L%0AofyzedmVktYAs2oGqZcxPkh9Yu0oXKsHqSda7pcBJ5ueDg5S1xUnNft6kLpBHqSeWL2x0GiC+HXg%0A74HvM/7XXAtsBjYCbwB2AudFxPP5PpcBFwL7yLra30oc1wnCWq6DCaLuOCns6wTRICeIibUlQZTF%0ACcLK0KkE0QwniIlN9XRmJ4iJ1RsLXfcsphdeeKHTTTCzrle8iFtZui5BvP71b2Dfvn/tdDPMzAZe%0A1z2L6eWXX+DII8/rdDPMrIPGPuQ63Q+61ru9TU/XJQgzs0y99/39kZJW67pbTGZmRcVeQb8MFvcK%0A9yDMrMu5Z9ApThBmZpbkW0xm1lHN3kLywHR5nCDMrCGtHRsY+5Bau/e1yfgWk5
k1wWMD/cwJwszM%0AknyLycymfL7RoBvUqbbuQZhZztNJJzaYfxsnCDMzS/ItJjMrxaDeluknThBm1rSJk8GBU1Cn+sxC%0As+uttQbyFpOf+mjWavXco4/Cv7X7THWczvdEBun6MZAJohtOMjPrVYNz/WhrgpC0QtI2SdslXdrO%0Aus26Sbtiofg9Ca1419vosQbpXXc/aVuCkDQD+AKwAlgKnC/pV9pV/2Sq1arr7eN6u037Y+HA2zaT%0AJ41q3cdrpA0Hm069ZWi83tq/Yz0JsFdioZ09iGXAjojYGRF7ga8CK9tY/0HG/lPPPvvsjtQ/aBfq%0AXgmKNuiCWJjogl0FmnvH39i+1Ybqal4z9Rb/hvUlzF6JhXbOYloAPFV4PQyc2cb6E8a/+HyiE7p2%0Aep4/cWot0FQs3HzzV7jxxlsBOPxw+PrXNzJjxozWtrDph+fRxP7WLdqZIKZ5Jf0a+/b9sNyWJBVP%0A6oMTx8FT9w5cf8CRnDRsck2dIFu2PMa99+4A3gxsYObMdBinzsPU+TrZu32PG9Rnss9+1P4th4aG%0AGjpu6thlUdsqkn4NGIqIFfnrtcCrEXFlYRtfWa0UEdE1VzrHgnVSPbHQzgQxE/hn4N3A08Bm4PyI%0AeLwtDTDrEo4F6xVtu8UUEfsk/QHwLWAG8CUHhA0ix4L1irb1IMzMrLd07FlM+bzvlWQzOiCbyfEN%0Av5OyQeNYsG7VkR5E/snR88nmfw/nxYuAVcBfRsRnSqz7ELJ56AvIZpOMAJuj5D+E621Pvb2mU7Eg%0AaRawBjgXmEP2f7QbuA24IiKeL6PevO6BOid7ud5OJYjtwNL8Q0LF8kOBrRFxYkn1LgeuAXYwHowL%0AgSXA6oj4luvt3XrzuleQXfTG3o2PALdFxF1l1dmMDsbC3cC9wDpgNCJC0jzgd4F3RcTykuodqHOy%0A5+uNiLb/ANuAxYnyxcA/d6DeE4Btrrfn670KuBP4APDO/Of8vOzqsuot6W9Vdiw80ci6Hj43XG8D%0A9XZqDOLjwD2SdjD+idJFZNntD0qsdwbZO8paI5Q7HuN621Pv+yJiSW2hpK8C24GLS6y7UZ2KhR9I%0A+iSwLiJGASTNJetBlPlJ1UE7J3u63o4kiIi4S9IvcfD9se9GxL4Sq74eeEjSBg683/uBfJ3r7e16%0AX5K0LCI215QvA35eYr0N62AsrCIbg/i2pDl52SjwDeC8EusdtHOyp+sduGmukpaSzRiZnxeNkM0Y%0A2ep6e7teSacD1wJHc+B91xfJ7rs+XFbdvU7SO8mS1JaIuLvkugbmnOz1egcuQVj/ywdb9wdFROzq%0AZHu6kaTNEbEsX/494PeBvwaWA38TJc4ktN4xUN8oJ2mWpCvyL2rZI+m5fPmKfNqf6+3hevO6BRxP%0ANsi7GDhefuJcymsKy/8ReE9EfIosQXywrEo7eE6+t6YNX5K0RdL6wi22Murt6d93oBIEsBHYA1SA%0A2RExGzgbeD5f53p7uN58at92YAh4b/7zKWCHpN8sq94eNUPSbEmvA2ZExI8AIuJnQJljH506Jz9d%0AWP4c8AzwW8BDwJ+XWG9P/74DdYtJ0hMRcVK961xvz9S7DVgRETtryk8AvhkRv1xGvb1I0k4O/Lab%0AfxMRz0g6GrgvIt5aUr2dOjceiYhT8+VHgbdGfvGT9GhEvKWkenv69x20HsQPJH2y2MWSNFfZp1nL%0AnNrnettTb6emFPaciFgcESfkP2+MiGfyVa8Av11i1Z06N35R0n+V9AngmJp1Zd6C7Onfd9ASxCrg%0AOLKpfXsk7SH7zsHXUe7Uvm6q9+/6uN6xqX2XSvpg/rOG7HHaZU4p7BsR8a8R8WSJVXQqFq4jm912%0AFHAD8Iuwf0LD90qst6d/34G6xQT7H4y2AHgwIn5SKF8RbXwcg6QvR8SHS67jTLJPTb4g6bVk895P%0AAx4D/kdE
vFBSvYeRzbd+OiI2SfoQ8HZgK/DFqHmsRIvr7siUQmuepAsi4oYO1HthRJT2BqJT15xC%0AvQ9ExE8L5e+NiG9O6xiDlCAkXUw2ne9x4FTgkoi4LV+3/55dCfXezsFf8vsu4G+BiIhzSqp3K3BK%0AZN8/8BfAz4BbgN/Iy3+npHrXk93uOZJsMO4o4Na8XiLid8uo13qbpKciYlE/1dvBa05L6h20+7L/%0AATg9In4qaTHwNUmLI+LPSq53Idm75+uAV8kSxRnAn5Zcrwqfxj09Ik7Ll7+TD1yV5eSIOFnZN6c9%0ADczPk9TNwPfLqlQdfEKpTY+kLZOsLnO6aUfqpXPXnNp6b2mk3kFLEBrrakXETklnkf2HHU+5A1Vn%0AAJcAfwz8UUQ8IumliPh2iXUCPFboPj8q6W0R8ZCkk4CXS6z3kPw205HAEWSDZM8Ch1PuuNdGsieU%0AVjj4CaUbyeb4W2e9HlhBNvWz1v19WG+nrjm19VYaqXfQBql3S9o/fS//A76fbMDolLIqjYhXIuLz%0AwEeByyT9b9qTnD8GnCXpX4ClwP2SniTryXysxHpvJuvaPgB8ArhP0nVkc7DXlVjv4oi4MiJ2jU3p%0Ai4hnIuIKsg/NWefdARwVETtrf4Ay3zB1qt6OXHNaVe+gjUEsAvbWPnpBksjmgX+nTe14P/COiLis%0ATfUdQ/aY35nAcDsePZF3a1+MiOckvYmsF7UtIkq7tSVpE7CJ9BNK3xMRv1FW3WYpnbrmtKregUoQ%0A1t8kzSYbgziH8fvKY08ovSIinutU28x6kROEDYROTaE062VOEDYQOjWF0qyXDdosJutjHZzKaNaX%0AnCCsn3RqKqNZX3KCsH4yNpXxkdoVksr+zIlZ3/EYhJmZJQ3aB+XMzGyanCDMzCzJCcLMzJKcIMzM%0ALMkJwszMkv4/0VSQ6FMw/FIAAAAASUVORK5CYII=) +很有趣,但我们如何让计算机自己理解纯文本消息呢? 或者可以在这种畸形的胡言乱语之下呢? +## 步骤2:数据预处理 -Good fun, but how do we make computer understand the plain text messages themselves? Or can it under such malformed gibberish at all? - - +在本节中,我们将按原始消息(字符序列)按向矢量(数字序列)。 -## Step 2: Data preprocessing +映射不是1比1; 我们将使用[bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model)方法,其中文本中的每个唯一单词将由一个数字表示。 - - -In this section we'll massage the raw messages (sequence of characters) into vectors (sequences of numbers). - -The mapping is not 1-to-1; we'll use the [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model) approach, where each unique word in a text will be represented by one number. 
- -As a first step, let's write a function that will split a message into its individual words: +作为第一步,让我们编写一个将消息拆分为单个单词的函数: In [11]: -``` +```python def split_into_tokens(message): message = unicode(message, 'utf8') # convert bytes into proper unicode return TextBlob(message).words ``` - - -Here are some of the original texts again: +这里再次输出原始文本: In [12]: -``` +```python messages.message.head() ``` @@ -375,13 +342,13 @@ Out[12]: Name: message, dtype: object ``` - +...以下是相同信息,分词: ...and here are the same messages, tokenized: In [13]: -``` +```python messages.message.head().apply(split_into_tokens) ``` @@ -396,21 +363,19 @@ Out[13]: Name: message, dtype: object ``` +NLP问题: +1. 大写字母是否蕴含信息? +2. 区分变形形式(“去”与“去”)是否蕴含信息? +3. 有感叹词,是否决定蕴含信息吗? -NLP questions: - -1. Do capital letters carry information? -2. Does distinguishing inflected form ("goes" vs. "go") carry information? -3. Do interjections, determiners carry information? +换句话说,我们希望更好地“标准化”文本。 -In other words, we want to better "normalize" the text. - -With textblob, we'd detect [part-of-speech (POS)](http://www.ling.upenn.edu/courses/Fall_2007/ling001/penn_treebank_pos.html) tags with: +使用textblob,我们会检测[词性(POS)](http://www.ling.upenn.edu/courses/Fall_2007/ling001/penn_treebank_pos.html)标签: In [14]: -``` +```python TextBlob("Hello world, how is it going?").tags # list of (word, POS) pairs ``` @@ -425,13 +390,11 @@ Out[14]: (u'going', u'VBG')] ``` - - -and normalize words into their base form ([lemmas](https://en.wikipedia.org/wiki/Lemmatisation)) with: +并将单词标准化为基本形式([lemmas](https://en.wikipedia.org/wiki/Lemmatisation)): In [15]: -``` +```python def split_into_lemmas(message): message = unicode(message, 'utf8').lower() words = TextBlob(message).words @@ -452,70 +415,54 @@ Out[15]: Name: message, dtype: object ``` +现在更好了。 您可以想到更多改进预处理的方法:解码HTML实体(我们在上面看到的那些`&amp;`和`&lt;`); 过滤掉停用词(代词等); 添加更多功能,例如所有大写字母指示符等。 +## 步骤3:向量的数据 -Better. 
You can probably think of many more ways to improve the preprocessing: decoding HTML entities (those `&` and `<` we saw above); filtering out stop words (pronouns etc); adding more features, such as an word-in-all-caps indicator and so on. - - - -## Step 3: Data to vectors - - +现在我们将每个消息(表示为上面的标记(lemmas)列表)转换为机器学习模型可以理解的向量。 -Now we'll convert each message, represented as a list of tokens (lemmas) above, into a vector that machine learning models can understand. +这样做基本上需要三个步骤,在词袋模型中: -Doing that requires essentially three steps, in the bag-of-words model: +1.计算每个消息中出现一个单词的次数(术语频率) +2.加权计数,使频繁的令牌获得较低的权重(逆文档频率) +3.将向量归一化为单位长度,从原始文本长度(L2范数)中抽象出来 -1. counting how many times does a word occur in each message (term frequency) -2. weighting the counts, so that frequent tokens get lower weight (inverse document frequency) -3. normalizing the vectors to unit length, to abstract from the original text length (L2 norm) - - - -Each vector has as many dimensions as there are unique words in the SMS corpus: +每个向量的维度与SMS语料库中的唯一单词一样多: In [16]: -``` +```python bow_transformer = CountVectorizer(analyzer=split_into_lemmas).fit(messages['message']) print len(bow_transformer.vocabulary_) ``` - - ``` 8874 ``` +在这里,我们使用了`scikit-learn`(`sklearn`),这是一个功能强大的Python库,用于使用机器学习。 它包含多种方法和选项。 - -Here we used `scikit-learn` (`sklearn`), a powerful Python library for teaching machine learning. It contains a multitude of various methods and options. - -Let's take one text message and get its bag-of-words counts as a vector, putting to use our new `bow_transformer`: +让我们拿一条短信,把它的字袋计数作为一个向量,使用我们新的`bow_transformer`: In [17]: -``` +```python message4 = messages['message'][3] print message4 ``` - - ``` U dun say so early hor... U c already then say... ``` In [18]: -``` +```python bow4 = bow_transformer.transform([message4]) print bow4 print bow4.shape ``` - - ``` (0, 1158) 1 (0, 1899) 1 @@ -529,59 +476,47 @@ print bow4.shape (1, 8874) ``` - - -So, nine unique words in message nr. 
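These suggested improvements are easy to prototype. Below is a hedged Python 3 sketch (the article's own code is Python 2); the stop-word list is a toy stand-in for a real one such as NLTK's:

```python
import html
import re

# Toy stop-word list; a real system would use a library list (e.g. NLTK's).
STOP_WORDS = {"a", "an", "the", "i", "you", "he", "she", "it", "we", "they"}

def preprocess(message):
    text = html.unescape(message)   # decode &amp; &lt; ... HTML entities
    all_caps = text.isupper()       # extra feature: is the message shouting?
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens, all_caps

tokens, all_caps = preprocess("WINNER!! You &amp; a friend WIN &lt;100&gt; pounds")
```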
4, two of them appear twice, the rest only once. Sanity check: what are these words the appear twice? +所以,消息nr中有九个独特的单词。 4,其中两个出现两次,其余只出现一次。 理智检查:是什么单词出现了两次? In [19]: -``` +```python print bow_transformer.get_feature_names()[6736] print bow_transformer.get_feature_names()[8013] ``` - - ``` say u ``` - - -The bag-of-words counts for the entire SMS corpus are a large, sparse matrix: +整个SMS语料库的词袋计数是一个庞大的稀疏矩阵: In [20]: -``` +```python messages_bow = bow_transformer.transform(messages['message']) print 'sparse matrix shape:', messages_bow.shape print 'number of non-zeros:', messages_bow.nnz print 'sparsity: %.2f%%' % (100.0 * messages_bow.nnz / (messages_bow.shape[0] * messages_bow.shape[1])) ``` - - ``` sparse matrix shape: (5574, 8874) number of non-zeros: 80272 sparsity: 0.16% ``` - - -And finally, after the counting, the term weighting and normalization can be done with [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf), using scikit-learn's `TfidfTransformer`: +最后,在计数之后,可以使用[TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf),使用scikit-learn的`TfidfTransformer`来完成术语加权和归一化。: In [21]: -``` +```python tfidf_transformer = TfidfTransformer().fit(messages_bow) tfidf4 = tfidf_transformer.transform(bow4) print tfidf4 ``` - - ``` (0, 8013) 0.305114653686 (0, 7698) 0.225299911221 @@ -594,117 +529,89 @@ print tfidf4 (0, 1158) 0.274934159477 ``` - - -What is the IDF (inverse document frequency) of the word `"u"`? Of word `"university"`? +什么是“u”字的IDF(逆文档频率)是多少? "university"这个词? 
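That sanity check can also be answered directly by counting tokens with `collections.Counter` (a sketch using a crude regex tokenizer rather than the article's TextBlob pipeline; the counts agree for this particular message):

```python
import re
from collections import Counter

message4 = "U dun say so early hor... U c already then say..."

# Crude lowercase tokenization: keep alphabetic runs only.
tokens = re.findall(r"[a-z]+", message4.lower())
repeated = sorted(word for word, count in Counter(tokens).items() if count > 1)
# The two repeated words are "say" and "u".
```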
In [22]: -``` +```python print tfidf_transformer.idf_[bow_transformer.vocabulary_['u']] print tfidf_transformer.idf_[bow_transformer.vocabulary_['university']] ``` - - ``` 2.85068150539 8.23975323521 ``` - - -To transform the entire bag-of-words corpus into TF-IDF corpus at once: +将整个词袋语料库立即转换为TF-IDF语料库: In [23]: -``` +```python messages_tfidf = tfidf_transformer.transform(messages_bow) print messages_tfidf.shape ``` - - ``` (5574, 8874) ``` +有多种方法可以对数据进行预处理和向量化。 这两个步骤,也称为“特征工程”,通常是构建预测管道的最耗时和“不合时宜”的部分,但它们非常重要并且需要一些经验。 诀窍是不断评估:分析模型的错误,改进数据清理和预处理,为新功能进行头脑风暴,评估...... +## 步骤4:训练模型,检测垃圾邮件 -There are a multitude of ways in which data can be proprocessed and vectorized. These two steps, also called "feature engineering", are typically the most time consuming and "unsexy" parts of building a predictive pipeline, but they are very important and require some experience. The trick is to evaluate constantly: analyze model for the errors it makes, improve data cleaning & preprocessing, brainstorm for new features, evaluate... - - - -## Step 4: Training a model, detecting spam - - +将消息表示为向量,我们最终可以训练我们的垃圾邮件/火腿分类器。 这部分非常简单,有许多库可以实现训练算法。 -With messages represented as vectors, we can finally train our spam/ham classifier. This part is pretty straightforward, and there are many libraries that realize the training algorithms. 
We'll be using scikit-learn here, choosing the [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) classifier to start with:

In [24]:

```python
%time spam_detector = MultinomialNB().fit(messages_tfidf, messages['label'])
```

```
CPU times: user 3.16 ms, sys: 699 µs, total: 3.86 ms
Wall time: 3.33 ms
```

Let's try classifying our single random message:

In [25]:

```python
print 'predicted:', spam_detector.predict(tfidf4)[0]
print 'expected:', messages.label[3]
```

```
predicted: ham
expected: ham
```

Hooray! You can try it with your own texts, too.

A natural question to ask is: how many messages do we classify correctly overall?

In [26]:

```python
all_predictions = spam_detector.predict(messages_tfidf)
print all_predictions
```

```
['ham' 'ham' 'spam' ..., 'ham' 'ham' 'ham']
```

In [27]:

```python
print 'accuracy', accuracy_score(messages['label'], all_predictions)
print 'confusion matrix\n', confusion_matrix(messages['label'], all_predictions)
print '(row=expected, col=predicted)'
```

```
accuracy 0.969501255831
confusion matrix
```

In [28]:

```python
plt.matshow(confusion_matrix(messages['label'], all_predictions), cmap=plt.cm.binary, interpolation='nearest')
plt.title('confusion matrix')
plt.colorbar()
```

Out[28]:
*(figure: confusion-matrix plot produced by the `plt.matshow` call above)*

From this confusion matrix, we can compute precision and recall, or their combination (the harmonic mean), F1:

In [29]:

```python
print classification_report(messages['label'], all_predictions)
```

```
             precision    recall  f1-score   support

avg / total       0.97      0.97      0.97      5574
```

There are quite a few possible metrics for evaluating model performance. Which one is the most suitable depends on the task. For example, the cost of mispredicting "spam" as "ham" is probably much lower than the cost of mispredicting "ham" as "spam".

## Step 5: How to run experiments?

In the above "evaluation", we committed a cardinal sin. For simplicity of demonstration, we evaluated accuracy on the same data we used for training. **Never evaluate on the same dataset you train on! Bad! Incest!**

Such evaluation tells us nothing about the true predictive power of our model. If we simply remembered each example during training, the accuracy on training data would trivially be 100%, even though we wouldn't be able to classify any new messages.

A proper way is to split the data into a training/test set, where the model only ever sees the **training data** during model fitting and parameter tuning. The **test data** is never used in any way -- thanks to this process, we make sure we are not "cheating", and that our final evaluation on test data is representative of true predictive performance.
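Such a holdout split is nothing more than a random partition of the corpus. A minimal sketch without scikit-learn — `holdout_split` is a hypothetical helper (Python 3), and rounding the test fraction up mirrors how `train_test_split` arrives at the sizes reported in the next cell:

```python
import math
import random

def holdout_split(data, test_size=0.2, seed=0):
    """Shuffle indices, then carve off the first `test_size` fraction as the test set."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(data) * test_size)  # round the test share up
    train = [data[i] for i in indices[n_test:]]
    test = [data[i] for i in indices[:n_test]]
    return train, test

corpus = ['message %d' % i for i in range(5574)]  # stand-in for the SMS corpus
train, test = holdout_split(corpus)
print(len(train), len(test))  # 4459 1115
```

The two parts are disjoint by construction, so no test message can leak into training.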
In [30]:

```python
msg_train, msg_test, label_train, label_test = \
    train_test_split(messages['message'], messages['label'], test_size=0.2)

print len(msg_train), len(msg_test), len(msg_train) + len(msg_test)
```

```
4459 1115 5574
```

So, as requested, the test size is 20% of the entire dataset (1115 messages out of the total 5574), and the training set is the rest (4459 out of 5574).

Let's recap the entire pipeline up to this point, putting the steps explicitly into scikit-learn's `Pipeline`:

In [31]:

```python
pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=split_into_lemmas)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
])
```

A common practice is to partition the training set again, into smaller subsets; for example, 5 equally sized subsets. Then we train the model on four parts and compute accuracy on the last part (called the "validation set"). Repeating this five times (taking a different part for evaluation each time), we get a sense of the model's "stability". If the model gives wildly different scores for different subsets, it's a sign something is wrong (bad data, or bad model variance). Go back, analyze the errors, re-check the input data for garbage, re-check the data cleaning.
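The bookkeeping behind such a five-fold scheme is easy to write down explicitly. A sketch of the index partitioning only (Python 3, illustrative; real cross-validation also shuffles and can stratify by label):

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) pairs: each of the
    n_folds contiguous chunks serves as the validation set exactly once."""
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):  # spread any remainder over the first folds
        fold_sizes[i] += 1
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, val_idx in enumerate(folds):
        # Training indices = everything outside the i-th fold.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx

splits = list(kfold_indices(10, n_folds=5))
print(len(splits))    # 5 train/validation pairs
print(splits[0][1])   # validation part of the first split: [0, 1]
```

Every sample is used for validation exactly once, so the five accuracies together cover the whole training set.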
In our case, everything goes smoothly though:

In [32]:

```python
scores = cross_val_score(pipeline,  # steps to convert raw messages into models
                         msg_train,  # training data
                         label_train,  # training labels
                         cv=10,  # split data randomly into 10 parts: 9 for training, 1 for scoring
                         scoring='accuracy',  # which scoring metric?
                         n_jobs=-1,  # -1 = use all cores = faster
                         )
print scores
```

```
[ 0.93736018  0.96420582  0.94854586  0.94183445  0.96412556  0.94382022
  0.94606742  0.96404494  0.94831461  0.94606742]
```

The scores are indeed a little bit worse than when we trained on the entire dataset (5574 training examples, accuracy 0.97). They are fairly stable though:

In [33]:

```python
print scores.mean(), scores.std()
```

```
0.9504386476 0.00947200821389
```

A natural question is, how can we improve this model? The scores are already quite high here, but how would we go about improving a model in general?

Naive Bayes is an example of a [high bias - low variance](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff) classifier (aka simple and stable, not prone to overfitting). An example from the opposite side of the spectrum would be Nearest Neighbour (kNN) classifiers, or Decision Trees, with their low bias but high variance (easy to overfit). Bagging (Random Forests) is one way to lower variance: train many (high-variance) models and average them.
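That averaging effect is visible in our own numbers: the ten fold scores fluctuate by roughly ±0.01, while their mean is far more trustworthy. Recomputing the statistics printed above by hand (Python 3; the fold scores are copied from the output above, and `scores.std()` in NumPy is the population standard deviation):

```python
# Fold scores copied from the cross_val_score output above.
scores = [0.93736018, 0.96420582, 0.94854586, 0.94183445, 0.96412556,
          0.94382022, 0.94606742, 0.96404494, 0.94831461, 0.94606742]

mean = sum(scores) / len(scores)
# Population variance (divide by N, not N-1), matching numpy's default std().
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
std = variance ** 0.5
print(round(mean, 4), round(std, 4))  # 0.9504 0.0095
```

Both values agree with the notebook's `scores.mean(), scores.std()` printout.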
[![img](https://radimrehurek.com/data_science_python/plot_bias_variance_examples_2.png)](http://www.astroml.org/sklearn_tutorial/practical.html#bias-variance-over-fitting-and-under-fitting)

In other words:

- **high bias** = the classifier is opinionated. Not much room to change its mind with data, it has its own ideas. On the other hand, not much room to fool itself into overfitting either (picture on the left).
- **low bias** = the classifier is more obedient, but also more neurotic. It will do exactly what you ask it to do, which, as everybody knows, can be a real nuisance (picture on the right).

In [34]:

```python
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=-1, train_sizes=np.linspace(.1, 1.0, 5)):
    """
```

In [35]:

```python
%time plot_learning_curve(pipeline, "accuracy vs. 
training set size", msg_train, label_train, cv=5) ``` - - ``` CPU times: user 382 ms, sys: 83.1 ms, total: 465 ms Wall time: 28.5 s @@ -944,50 +818,35 @@ Out[35]: ``` - - ![img](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZEAAAEZCAYAAABWwhjiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz%0AAAALEgAACxIB0t1+/AAAIABJREFUeJzsnXl8VNX5/99PZrIvJEBYEwhirVoXLOISv2KqKFjc14p+%0A+aFW22/rVmtrFbW4a1vrXleEqlgU1GpFBVEDSJRFxBWtgkKAsIRAkklmn/P748xMZpKZJMBMMjc5%0A79drXsy959x7n7kT7mee5znnOaKUwmAwGAyGPSGtuw0wGAwGg3UxImIwGAyGPcaIiMFgMBj2GCMi%0ABoPBYNhjjIgYDAaDYY8xImIwGAyGPcaIiMHQjYjIhSIyP9F9Ux0RGSYijSIi3W2LYe8QM0/EYNgz%0ARGQmUK2Uurm7bekORKQMWAfYlVKB7rXG0F0YT8TQbUiQ7rYjWYiIvbtt6CJ67Hdo6BgjIr0cEfmT%0AiHwnIg0i8qWInNGq/TIR+Sqi/bDg/lIReUVEtolIrYg8HNw/TUSeizi+TEQCIpIW3K4UkTtEZCnQ%0ABOwjIhdHXGOtiFzeyobTRWS1iNQHbR0vIueKyMpW/a4VkX/H+Izni8iKVvt+JyKvBd//PPjZGkRk%0Ao4j8vhP37XJgEvDHYFgmdK4fROSPIvIZ0CgitvbusYhMEZElEdsBEfmViPxXRHaKyCN72DdNRO4T%0Ake0isk5Eroj8HmJ8nuuDn71BRL4WkeOD+yXC/loReVFEioKHLQ7+uyt4D46Mcd4jRGRl8LvbIiL3%0ABfeH/y5E5Ojg8aGXS0S+j/gc8a5vSAWUUubVi1/AOcCg4PvzAAcwMLh9LrARGB3cHgkMA2zAp8B9%0AQDaQCZQH+/wZeC7i/GVAAEgLblcCPwAHoH/E2IGfAyOC7WPR4nJYcPsIYBdwQnB7CPBjIAPYAewf%0Aca1PgDNjfMZsoAHYN2LfCuC84Psa4Jjg+z6ha3fi3s0Abmu17wdgFTAUyOzEPZ4CLIk4PgC8DhQA%0ApcA2YPwe9P018GXwfhUCCwF/6HtoZfOPgQ0RNg4D9gm+vxqoCp4nHXgceCHYNjzyu41zjz4ELgy+%0AzwGOjPV3EdHfHvwbubOj65tXary63QDzSq1X8EF8avD9fODKGH2ODj6wYj2QptG+iLwPTOvAhleB%0Aq4LvnwDui9PvMeCO4PufAHVAepy+zwE3B9//CC0qWcHt9cDlQMFu3qsZwO2t9n0PTOnEPT4t+D6W%0AMJRHbL8IXL8bff8YfP8ecFlE2wnxHvjAvsDWYJ/0Vm1fAcdHbA8GPOgfADGFoNXxi4J/E/1b7Y8n%0AIo8Br3fm+t39f8W89MuEs3o5IjJZRD4JhkN2AgcB/YPNJcDaGIeVAuvVnidTq1vZcLKIfCQiO4I2%0A/Bzo14ENAP9Eh5QA/hd4USnljdP3BeCC4PtJwKtKKVdw++zgNX8IhtuO2u1PFE3rzxfrHveLfSgA%0AWyLeNwO5u9E3L/h+cCs7NsY7gVLqO+Aa9MN+q4j8S0QGB5vLgFcjbP8K8AED27EpkkuB/YA1IrJc%0ARCbG6ygiv0J7opMidu/t9Q1JxohIL0ZEhgNPAr8F+iqlioAvaEmUVqN/pbamGhgmIrYYbQ502CLE%0AoBh9wkMCRSQTeBn4CzAgaMObnbABpdRHgEdExqIF4rlY/YIsBIpF5FDgF2hRCZ1npVLqDKAY+Dfw%0AUjvnifk54u3vxD1OFjVosQ9RGq8jgFLqX0qpY9EhKgXcG2zaAExQShVFvHKUUj
XE//yR5/1OKTVJ%0AKVUcPOdcEclu3U9EjgVuA05XSjkimtq7viEFMCLSu8lFPwhqgTQRuRj9KznE08B1IvLTYIJ1XxEZ%0ABixDP6TuEZEcEckSkfLgMauBsaIT732AG2JcN/IBmhF81QIBETkZOCmifTpwsYgcH0yyDhWRH0e0%0APwc8AniUUlXxPmjQQ5kD/A0oAt4BEJF00fMv+iil/EAjOnfQGbYC+3TQp6N73BFC5wUnsu9LwNUi%0AMkRECoHrifPQF5H9gvc3E3ADLlruwePAXcHvHREpFpHTgm3b0SGpkXENErlIRIqDm/VBGwKt+pQG%0A7f3foFcUSXvXN6QARkR6MUqpr9DJ8Q/RYZGDgA8i2ucCd6J/tTcArwBFwTDWqWgPYQPaWzgveMxC%0AdGz+M3Ty+j+0fXiFt5VSjcBV6IdIHdqjeC2ifQVwMXA/OsH+PjrxG+I5dD7k+U585BfQcf85rUJx%0AFwHfi0g9OjdyIURNiCuJc77pwIHBUMsrsTp0dI/R90K12iZO++70fQpYgP4ePgbmAf44IchM4G60%0AKNSgw5kh8X8QnbxfICINwc9xRPCzNaP/PpYG78ERMc49HvhCRBrR3+EvlFLuVvafAAwAXo4YofV5%0AR9c3pAZJnWwoIhOAB9CjeZ5WSt3bqr0IeAb9a84FXKKU+jLYdgP6P3cA+By4OOKPz2AAIBga2Yoe%0AURUvd9LrCXp4jymlyrrbFkPPImmeSDBe/ggwATgQuEBEDmjV7UZglVLqUGAy+ldHaCbsZcBPlVIH%0Ao0XoF8my1WBp/g9YbgQkmmCI8eciYheRoeih1zG9JYNhb0hmOOsI4Dul1A/BePRs4PRWfQ5AhydQ%0ASn0DlAXjpw2AF8gRPes3B9iURFsNFkREfgCuBDqcHNgLEfRoqzr0vJUvgVu60yBDzySZZRmG0naI%0AYesZrZ8CZwEfBOOpw4ESpdQnwZmtGwAnMD8YazcYwpjQTHyUUk5M7sDQBSTTE+lMsuUeoFBEPgGu%0AQE/C8ovISPS49TL0TNU8EbkwWYYaDAaDYc9Ipieyibbj1KMmPAVH5lwS2g7Wy1kHTASqlFI7gvtf%0AAcqBWZHHi4gpQWwwGAx7gFIqIXOVkumJrAR+FCy0lgGcjx6qF0ZE+gTbEJHLgEXBiUbfAEeJSLaI%0ACDAOPVO1Dd095b8zrz//+c/dboOx09hpZTutYKOV7EwkSfNElFI+EbkCXX/JBkxXSq0JljZAKfUE%0AetTWzKBH8QW6RAJKqdUi8ixaiALoxOCTybI12fzwww/dbUKnMHYmFmNn4rCCjWAdOxNJUtc7UEq9%0ABbzVat8TEe8/RFcQjXXsX9ClMAwGg8GQopgZ613AlClTutuETmHsTCzGzsRhBRvBOnYmEksvjysi%0Aysr2GwwGQ3cgIigLJNYNQSorK7vbhE5h7Ewsxs7EYQUbwTp2JhIjIgaDwWDYY0w4y2AwGHoZJpxl%0AMBgMhpTAiEgXYJU4qbEzsRg7E4cVbATr2JlIjIgYDAaDYY8xORGDwWDoZZiciMFgMBhSAiMiXYBV%0A4qTGzsRi7EwcVrARrGNnIjEiYjAYDIY9xuREDAaDoZdhciIGg8FgSAmMiHQBVomTGjsTi7EzcVjB%0ARrCOnYnEiIjBYDAY9hiTEzEYDIZehsmJGAwGgyElMCLSBVglTmrsTCzGzsRhBRvBOnYmEiMiBoPB%0AYNhjTE7EYDAYehkmJ2IwGAyGlMCISBdglTipsTOxGDsThxVsBOvYmUiMiBgMBoNhjzE5EYPBYOhl%0AmJyIwWAwGFICIyJdgFXipMbOxGLsTBxWsBHa2rl43jxuGj+eaRUV3DR+PIvnzesew5KIvbsNMBgM%0APYvF8+ax4KGHsLvd+DIzOemqqxg7cWLXXFwp/Qq9b/3v7raF3gcCbf8NtUfu27EDNm2CQIDFCxcy%0A/9ZbuXP9
+rB5U9euBei6+9EFJDUnIiITgAcAG/C0UureVu1FwDPAPoALuEQp9WWwrRB4GvgJoIJt%0AH7U63uREDIYUYvG8ecy/+mruDD4sAabusw/j772XsePH6x2xHsix9nXUFvkAj3wO+P3g84HXCx6P%0AfoXeR/7bel/k/s7072DfTdXV3OFytblHN48fz+1vv703t3mvSWROJGmeiIjYgEeAccAmYIWIvK6U%0AWhPR7UZglVLqTBH5MfBosD/Ag8CbSqlzRMQO5CbLVoPBsIeEHppuNzQ3s+Cee6IEBODOdeu4+eab%0AGbt+ffTD1udreXh7PNEP/o4e1LHEwe1uOW9GBqSn639Dr9B2enr0+472padDXl7bttbXsNujrmW/%0A4Qb4/PM2t8wWQ1isTDLDWUcA3ymlfgAQkdnA6UCkiBwA3AOglPpGRMpEpBjwAMcqpf5fsM0H1CfR%0A1qRSWVlJRUVFd5vRIcbOxNKj7FSq5YHuckFzs/5XKairg08/hU8/xf7ppzEPt23dCosXx35Ap6dD%0Afn70duSDOTOTyrVrqTjoIMjMbGkPtrU5Lj1dX1REv9LSWrbT0qL3h9pC7+O1hfZHnjfyfXC7cvFi%0AfS9F8D34YEwR8WdldfKbsQbJFJGhQHXE9kbgyFZ9PgXOAj4QkSOA4UAJOny1XURmAIcCHwNXK6Wa%0Ak2ivwWAAHSJqLRhud0vbunWwejV88gmsXAm1tXDYYTBqFL5hw+DLL9uc0n/IIfCPf7R9SHf2ob1k%0ACYwd2+ah3d4DPfxvV2K36xdw0tVXM3XduijP7MaRI5lw5ZVdb1cSSVpORETOBiYopS4Lbl8EHKmU%0AujKiTz46bHUY8DmwP/BLIAP4EChXSq0QkQeABqXULa2uYXIiBsPe4Pe3CIbTqQXD621pb2qCr75q%0AEYxVq6BvXzj8cC0chx4KI0fqB2dODouXLmX+n/7EnevWhU9x48iRTHjwwR6VTO4si+fN452HH8bm%0AcuHPyuLEK69MiftgiZwIOg9SGrFdivZGwiilGoFLQtsi8j2wDsgDNiqlVgSb5gJ/inWRKVOmUFZW%0ABkBhYSGjRo0Ku+ah4XZm22ybbah8913w+6k46ihwOql87z29faQOEFQuXw47dlAB8PHHVC5aBDU1%0AVIwaBYcfTuXo0XDeeVSM02nLytWrQSkqRoyAjAzdv7iY8Q89xM0PP0z1li34MzK47M9/ZuzEid3/%0A+btjOzc3nESvrKwkOAygy+2prKxk5syZAOHnZaJIpidiB74BTgA2A8uBCyIT6yLSB3AqpTwichlw%0AjFJqSrBtMfBLpdR/RWQakK2Uur7VNSzhiVT2pNh4CmDs7ASRSermZu1l+P26TQRsNr395ZdUzp1L%0ARW2t9jTsdu1lHH44jBoFP/qRzjEoBdnZOneRlaVzEV0YLjLfeWKxhCeilPKJyBXAfPQQ3+lKqTUi%0A8qtg+xPAgcBMEVHAF8ClEae4EpglIhnAWuDiZNlqMFgWpVoEIzhCCperZehrWpoWjMxM2LpVC8XH%0AH+t/16zRIjF0KJx2Gvz5z9C/f8tw2YwMLRrZ2fr4NDM32dAWUzvLYLAK7Y2QAv2QT0/X3oTfr0Vi%0A5cqWl8MBo0e3eBqHHKIFJiQ4djsUFLSIhs3WfZ/VkFQS6YkYETEYUpHIEVKhhLfH0yIYNpt+6IeG%0As+7cqZPeIcH49FPtYYQEY/Ro2GeflvkYoZBWfj7k5GjRsJsCFr0FIyJBrCIiVomTGjsTS6ftjDVC%0AyuPRbaHhryEPA7TArF3bIhgffwybN+scRkg0fvpT6NOnRTRAnycvD3JzW+ZX7I6d3YgVbATr2GmJ%0AnIjBYIiB398y2zokGKGEN2jvIDRDOkRTEyxb1iIaq1bpsFPIw7j4Yth/fy0yodnboM+fm6vPlZmp%0AcxwGQ4IxnojBkCw6M0IqPT06Ya0UbNzYk
vxeuRK++w4OPDA6NDVwoO4f8mCU0ucMjaAKiUZ3TLgz%0ApDwmnBXEiIghZQgEdJI71ggpkZb8ReuHutsNX3wRHZoKBGDMGC0Wo0fDwQfrYbXQUm8q9HeflaW9%0AkszMLh92a7AuRkSCWEVErBInNXbuJoEAOJ0sfuUVFjzxBHaPR5c+v+QSxp50EpUffkhFeXn0Mdu3%0AR3sZX3yhE94hL+Pww6G0tEUMQuEvv1/vy8jQohGaq5GAYbcpcz/bwQo2gnXsNDkRg6G7CHkcDQ3Q%0A2Mji995j/l13ceeGDeEuU6urdagqK0vXkYr0Mnbt0knv0aPh97/XpUMi8x+h84dEIz0dCgvNsFtD%0AymI8EYOhI1oJB0rph3tmJjdNmsQdixa1OeTmoiJu9/lgwIBoL2PffaO9h0Ag9rDb3NyWSrUGQ4Ix%0AnojBkGxaCwfoB3xu9LI2dqcz5uG2AQNg7lxdrDASpXQexOfT7+32lhFUGRFlzA0Gi2DqGHQBoUJo%0AqU6vtzMQ0EnxLVv0PIyNG/V2To5+0IeS20rBZ5/BTTfhW7Uq5qn8gwdT+fXXesPj0cN0HQ4tTFlZ%0AMHgwlJXpfMjAgfr83SQgVvjerWAjWMfORGI8EUPvJuRxNDZqryPkHeTktB3ptH07vPIKzJmjBeG8%0A8zjpnnuY+vDDUeto3zhsGBMmTSIQmgeSna09EjPs1tADMTkRQ+9DKT1no7FRvwKBllXyWj/gPR54%0A91146SX46CMYPx7OOw+OOiqc21i8cCHvPP20XjMiM5MTL72UsWeeqT0OIxq9HqUUkmJ/A2aIbxAj%0AIoZOo1S0x9GecIAeevvSS/Dvf+tKt+edBxMnxh5JFQjoZHhhoal2a8Af8OPxe3D5XDg8Dlw+FyUF%0AJWSnZ3e3aWESKSLmr70LsEqctMfZGfI4tm3TOY7qai0i2dlaDLKyogVkxw54+mk48US45BItDK+/%0ADi+/DOef3yIgXq8OZ7nd0K8fjBihcxzZ2VEC0uPuZzeSyjb6Aj6aPE3UNtUy+z+zWbtzLdUN1exw%0A7iCgAiilUPTcH7smJ2LoWbT2OJTSD/bs7Ngeh9cL778PL74IVVUwbhzccgscc0zbciQulx5VlZ2t%0AK+TGypsYejRKKbwBLx6/hyZPE02eJvxKl7JJkzTSJI28jLyoY9w+d3eY2mWYcJbB+oQe8A4H1Nfr%0A8FJosl+8h/yaNTpc9eqrepTUeefBqadq7yMSv1+fG3RV3D59dMjK0CsIqAAevwe3z43D48DpdRJQ%0AAUQEW5qNDFsGadJ+QMfhdlDSp4Sc9JwusrpjzDwRgyFSOBoa9MPeZovvcQDU1cFrr2mvo7YWzjlH%0Aj7baZ5+2fUNeR3p6yxBcM1u8xxOZz2j0NIa9CEGw2+xkp2fvVpJ84fsLefKlJ5E0IceWw1WTrmLi%0AiROTZX63YESkC7BKPZ2UtzM4Ua9y/nwqDjqoRTja8zh8Pqis1MKxZAmccALceKMOV7UWhVCiXCkt%0AGkVFWpT2kJS/n0GsYGeybPT6dWiq2duMw+PAG9Bl9G1iI92WTm5GbgdniKZqSRXlx+p6aQvfX8gt%0AT9/C+tEtw7/XProWoEcJiRERQ2oTmuEdClX5/fp9e8IB8M03Olz1yitQUqIT43/7mw5HtSa0PrnN%0Apudz5OebmeM9EKUUHr8nnM9o9jbjUz4EIU3SyLBlkGlPXKhy+pzpUQICsPawtTz8r4d7lIiYnIgh%0A9YglHDZbx8Nnd+3SQ3LnzNGzzs8+W+c69t039jUiE+X9+plEeQ8jVj4jNErKnmYnw5aR0PkbSil+%0A2PUDVdVVVFVX8cZTb+Ab62vT77jvj6NyZmXCrrsnmJyIoeexp8Lh98OiRdrrqKyEigq47jo49tjY%0AxQtNorzH4gv4WvIZ7pZ8RpqkYbfZyclIfGJ7Y8NGllYvpaq6iqUblqKUor
y0nP8Z9j9sHrCZ5Sxv%0Ac0xWWlbC7ehOjIh0AVaIOUM32ely6bpS9fXaK+iEcFRWVVExYIAWjpdf1nM0zj0X7r5b5zHiXaeL%0AE+Xme08csWxsnc/wBfSv/jRJI92WTl5mXowz7R01jTVhT6NqYxXN3mbKS8spLy3nqiOvYsvnWzhm%0A7DEAFF9QzNant0aFtEauGsmVV1yZcLu6EyMihq4n0uOIFI6sDn6h1dfryX/Tp+v3Z58NL7wAP/5x%0A7P6RM8rz8vY6UW7oPpRSuH1uPH4PDo+DZm9zeH5GKDSVyHxGiO1N26naWBX2NHa5dnF0ydGUl5Zz%0A+ejL2a/fflEhsa2yNfx+3M/GAfDUnKdAINeey5VXXNmj8iFgciKGrsLt1h7Hrl2d9jgAHX764APt%0Adbz3ng5TnXeeDlvFW2sjMlFeVGQS5RYklM9weVtKhwQI6KG2SchnhKhz1vFh9YdhT2OrYytHlhwZ%0A9jYO6H9Ah/NCWtPT54kYETEkjz0VDtBlSubM0WtyFBdr4Tj99Lbrc4QIJcr9fu3R9OvXpgyJIXUJ%0A5TOcXicOjwOP3wME8xlpdtJtyfkRUO+qZ9mmZSytXsrSDUvZ2LCRMUPGhEXjoAEHYUvbu7CnEZEU%0AxioiYoWYMyTIzr0RjsZG+M9/tNfx/fdw1lk613HggdF2VlW1rF0eSpQrpQsgplCivFd977tJaKht%0As7cZh7sln2FL0/Mz7GnRXmbk/Iu9weFxsGzjsrCnsbZuLaOHjNaiUVLOIQMP2SvBirTT6/fi9XsJ%0AqAClfUp7bAFGkxMx7D1ut143Y+fOFuHIyOg4xwE6X7F0qRaOhQv1JMDf/AZ+9rP2Q1Butw5bpafr%0AJWjz8syM8hQlND/D7XPT5NXzMwIqAGjRyLBnkCXJGbHk9DpZsXlFeATV17Vfc+jAQzmm9BimHTeN%0AUYNGJSyX4g/4w8OJUZCdnk1xbjFZ9qyk5GtSBeOJGPYctxtqavSaGyHh6OyD/IcfdLhqzhztQZx3%0AHpx5pg5DxSNWoryjSYeGLidWKXSFQhDSbemkp6UnbX0Nl8/Fx5s/DnsaX2z7goMGHER5iQ5PjR4y%0Amix7YgQrJI6RXlRBRgG5Gblk2jN3O3fSlZhwVhAjIt2I06mXj92ddcEdDpg3T3sd334LZ5yhxeOg%0Ag9o/ziTKU5rIfEajuzFcOkRESE9LT1o+A3RYbPWW1WFP49Mtn7Jfv/04pvQYykvLGTN0TEJzEeEQ%0AFQHS0BV78zLzyLRlJvVzJhrLiIiITAAeAGzA00qpe1u1FwHPAPsALuASpdSXEe02YCWwUSl1aozz%0AW0JEelxsvKlJC0h2dvwRUiECAb0i4Esvwfz5ekXA887TNawyMuIf106ivMfdz25md+30+r24/e6Y%0A+YwMW8ZeJ6JjEco1+AI+Pt3yaXjY7aqaVexTtE/Y0zhi6BHkZ+Z3fMJOElAB3D53OPyWac+kIKOA%0A7PTsmCPErPKdWyInEhSAR4BxwCZghYi8rpRaE9HtRmCVUupMEfkx8Giwf4irga+AxP1VGPaOhgYd%0AwsrJaT90tWFDS7gqN1cLx9SpeqRVe5gZ5SlFZL2pyPkZgiQ9nwE6NPbl9i95/ZvXeaz2MVZsWkFJ%0AQQnlpeVMOXQKj018jMKswoRdLzJEpVDYxU6fzD7kZOSQactMikBanaR5IiJyNPBnpdSE4PafAJRS%0A90T0eQO4Ryn1QXD7O+BopdR2ESkBZgJ3Atda2RPpMezapWtS5eWx+L33WPDMM9jdbnyZmZx0ySWM%0ALS/X4aoXX4Svv24JVx18cMd5i8hEed++JlHeTXR1valY1/+69utweGr5xuUMyBsQ9jSOLj2avtlx%0AhnnvIb6ALzyKShByMnIoyCywXIhqd7CEJwIMBaojtjcCR7bq8ylwFvCBiBwBDAdKgO3A/cAfgIIk%0A2mjo
LHV1sH075Oez+N13mX/LLdy5vqWcw9SVK0Epxh59NEyZopeY7ciDaJ0oHzTIJMq7mFAS3OnT%0A8zNcXu0FJrPeVCRKKb6t+1bPCK9eykcbP6JPZh/KS8s5Y/8z+Mu4v1Cc24H3upuEQlT+gB8EMm2Z%0A9M3uS3Z6Npm2zKSKZE8kmSLSGRfhHuBBEfkE+Bz4BAiIyCnANqXUJyJS0d4JpkyZQllZGQCFhYWM%0AGjUqHJMMrcvc3duhfaliT7ztBx54oO39U0qv3VFXR+Vnn4EIC595hjvXryf06SqAO5ua+N8DDyTw%0A61+H53BUVlXp9tbbY8bodUFWrID8fCpOPhnS03vH/exm+3x+H0cfezTN3mYWvrcwnM8oP7aclVUr%0AsaXZwvMcqpZUhdt2Z7vZ18wzc59hW802MiSDa6+4lnE/G0fVkiqUUgw+eDBV1VW8Nv81vtz2JX32%0A70N5STkjd43ktH1O49Txp4bP9+22byk+tjh87j2xp/zYcjx+Dx8s+iC8XZBZwMcffkx6WjonHH9C%0Awu7v6tWrueaaaxJ2vkRtV1ZWMnPmTIDw8zJRJDOcdRQwLSKcdQMQaJ1cb3XM98AhwA3A/wI+IAvt%0AjbyslJrcqr8lwllWSba1sVMp2LpV50HyWorZTTv7bKZ99FGb46cddRTTXn459slbJ8r79tV5lbTd%0AHwZp2fvZDUSun9F6PfBQEjxRE/kg9kJMQ5cPZcKECewcuJOq6ioEoby0PDyCqrRPaYfn3V0bQyPG%0AAgG9lG1OejBEZc8kw9bOgI69JBW+885gidFZImIHvgFOADYDy4ELIhPrItIHcCqlPCJyGXCMUmpK%0Aq/McB1xnciJdTCCgE+jNzToxHsFN48Zxx5o1bQ65uaKC22fNit5pEuVdRuSkPoe3ZT1woNPrge8t%0AF/zmAhbvu7jN/gHLBnDd1OsoLy2nrLAs4SEjpRRufzBEBaSnpdMnq094FFUqz9noDiyRE1FK+UTk%0ACmA+eojvdKXUGhH5VbD9CeBAYKaIKOAL4NJ4p0uWnYYY+P2webNOdrcSEJ59lpNqapg6eDB31tSE%0Ad984fDgTLr64pV8oUW63mxnlSSIU23f73DR6GttM6suyZyU9vu/xe/h86+es2LyClZtXUrWpCmKs%0AAbZP33248JALE35tr9+LUgpbmo38zHxy0/VEv9ZlUwzJI6l3Win1FvBWq31PRLz/EIhTxzvcZxGw%0AKCkGdhFWcXErKyup+J//0XNA/H4dbgqhFDz0EMyezdh58+C777h5xgxsLhf+rCwmXHwxY48/Xk9C%0A9PuTmii31P1MoJ0dFSnc3fXAQ+xOqKjOWcfKzSvDr8+3fc6IwhGMGTKGU/Y7hbpBdSxjWZvjsmx7%0ANwy4akkVRx5zpA5RBb2rbHs2fXP7kpWeldQQ1e5glb/NRGLk2tCC16vnd4hEr7sRCMBtt8GSJfDq%0AqzBoEGPLyhg7blzLcW63DluZNcoTRqxFl0K/utNt6XssGp1FKcXanWtZuXklKzatYGXNSrY6tnLY%0A4MMYM2QM1xx1DT8d/FPyMlryZXm/yGPL01uiciLDVw7n4ssujnWJDq/v9rvxBXw4vU58AR9F2UVk%0A27NTvqwr93dZAAAgAElEQVRIb8KUPTFo3G7tgaSlRecsfD693Oy6dfDss7rOVWSb06kFZy8S5Qb9%0AwPQGvLpIoUcXKfSp4ExwsXXJQ9Plc/HZ1s/CgrFy80py0nMYM2QMhw85nDFDx7B/v/07nHC38P2F%0AzHh5Bi6/iyxbFheffXF4gaaOCJUVUSjSJFhWJCPPhKgSjCUS612BEZEEEa8OlsulK+q63fDUU9Hh%0ALa9XF14cOjR6v6FTxJrUF1B6JFFXTOoDqG2uZcWmFeF8xlfbv2K/fvtx+JDDtWgMGcPg/MFJtSE0%0AT8Wv/OHKtwWZBWTZs7rkHvRWjIgEs
YqIpHScNKIOVuXy5S3rdDQ2wsUX6zIlDz4YXefK49F5j5KS%0AbhlpldL3M4JIO2NVtgX0Sn02e1Ir24IWrW93fBsWjBWbV7DTuZPRg0dTvL2Ys08+m8MGH5b0hZNa%0AV761p9kpyCwgJz2nXW/Lit95KmOJ0VkGCxCvDtaOHXDhhTBqFNx5Z3RbaLhuaWn7BRR7Ob6AD5fP%0ARW1TrU6CB1qS4Olpyc9nOL1OPtnySVg0Vm1eRWFWIaOHjGbM0DH8+vBfs1+//UiTNJ1YH5aYeSKx%0A6CmVbw2xMZ5Ib2XXLj2RMDc3Oo+xaRP84hdw6qnwhz9Ej6xyOnXfkhKTOG+FUopmbzNN3qYuq2wb%0AyRbHlrBgrNy0km92fMMBxQeEw1KHDzmcAbkDkmpDiMhaVABZ9iwTokoxTDgriBGRPWTHDqit1cNw%0AI/9Df/stTJoEl18Ol10WfYzTqYVjyJCOy7/3Mjx+D1sdW2n2NpNuS0/65DZ/wM/XO75mxaYVfLz5%0AY1ZsXkGjpzEsGGOGjOGQgYd0yXKsofCUP+BHoVBKkWnPJDc911S+TWGMiASxioikTJxUKS0eO3dG%0AlTEBYPVqKidNomLaNF15N5LmZj0Ca9CglJgwmDL3E6h31bO1aSv2NHubFfMSuS74qppVYcFYVbOK%0A4tzisGCMGTqGfYr22WPh2h07I70MhcImNnLSc8I5jWQJaCp95+1hFTtNTsSw+yily7g3NrYVkCVL%0A9CisX/+6rYA4HHrex8CBZvhuBF6/l21N22jyNpGTnpPQB+emhk1RCfC1dWs5aMBBjBkyhimjpvDI%0Azx9JeDn0WMTzMgqzCsOhKZPTMBhPpDfQTh0s3noLrr8enngCjj46uq2xUc8LGTDAlGePoMHVwNam%0ArdjERlb63s3E9gV8fLX9q7BgrNi0Ao/fE/YwRg8ZzSEDDiHTnvxRcK1zGWmS1iVehqHrMeGsIEZE%0AOoHfr5PlHk/b+RyzZ8O99+pJhAcf3LJfKe2B9O3b8UqEvQhfwMc2xzYaPY3kZuTu0QO1wd3Ax5s/%0ADovGp1s/ZUj+kHDy+/AhhzOicETSk8/xvIy8jDzjZfQCjIgEsYqIdFuc1OvVAuL3R5cxAXj8cZgx%0AA154AUaO1HZWVVFx9NHaAxkwQItICtId99PhcbDFsQVB2iSsF76/kGfmPoM74CYzLZNLzrmEcT8b%0Ax9LFSyk5pCQsGCs3r2R9/XoOHXhoWDBGDx5NUXZR0u1vz8tYtnQZ444fl9JehlVyDVax0+REDB3j%0A8ehJhK3rYCkFd98N8+fDK6/oGeeRbQ4HDB6sS7Yb8Af8bGvaRoO7gZz0nDYjjWKtn/HZo58xctlI%0Avqv7jsw1mWHBOP8n5/OTAT9JerHA3c1lmDCVYW8wnkhPxOXSHojNFj0h0O+HG26AL7+E556L9jT8%0Afp0zGTJEJ9INNHmaqGmsQaSt9xFi0m8nsWhk2yLT+3+2PzMenEFpQWnSQ1Mml2HYXYwnYohPvDpY%0AbjdceSXU18OLL0aP0PL5tPCUlpo6WGjvo7a5ll2uXTG9j0jqPfUx9xdmFzKsz7CE22ZGTBlSDfPz%0ApAuIXBs8qTQ1QXW1rmcVKSBNTTBlig5XPftstICEyriXllK5fHnX2LmXJPN+NnubWV+/nkZ3I/mZ%0A+XEFxB/w8/jKx/liyxcx27NsWVHrgu8poTLooeVtXT4XWfYsinOLKe1Tyr799mV44XD65fQjNyN3%0AjwSky/4+9wIr2AjWsTORGE+kp9DQoFcjzM2NnhBYVweTJ8P++8M990TPNne79fDfYcN6/ZK1ARVg%0AR/MO6px1ZKdnY29nVv5/d/yXa+dfS5Y9i7suv4tHZz+asPUzIosTGi/DYAVMTqQnsHMnbNvWtg5W%0ATY0uY3LCCTB1avRcD7dbeyYlJb2+kKLT66TGUUNABdqtYusL+Hhs5W
asrGYHAu0scAAAAASUVORK5CYII=)

+(We're effectively training on 64% of all available data: we reserved 20% for the test set above, and the 5-fold cross validation reserves another 20% for validation sets => `0.8 * 0.8 * 5574 = 3567` training examples left.)
+
+Since performance keeps growing, both for training and cross-validation scores, we see that our model is not complex/flexible enough to capture all the nuance, given the small amount of data. In this particular case this is not very pronounced, since the accuracies are high anyway.

-(We're effectively training on 64% of all available data: we reserved 20% for the test set above, and the 5-fold cross validation reserves another 20% for validation sets => `0.8*0.8*5574=3567` training examples left.)
-
-
-
-Since performance keeps growing, both for training and cross validation scores, we see our model is not complex/flexible enough to capture all nuance, given little data. In this particular case, it's not very pronounced, since the accuracies are high anyway. 
-
-At this point, we have two options:
-
-1. use more training data, to overcome low model complexity
-2. use a more complex (lower bias) model to start with, to get more out of the existing data
-
-Over the last years, as massive training data collections become more available, and as machines get faster, approach 1. is becoming more and more popular (simpler algorithms, more data). Straightforward algorithms, such as Naive Bayes, also have the added benefit of being easier to interpret (compared to some more complex, black-box models, like neural networks).
-
-Knowing how to evaluate models properly, we can now explore how different parameters affect the performace.
-
-
+At this point, we have two options:
+
+1. use more training data, to overcome low model complexity
+2. use a more complex (lower-bias) model to start with, to get more out of the existing data
+
+Over the last few years, as massive training data collections have become more available and machines have gotten faster, approach 1 has become more and more popular (simpler algorithms, more data). Straightforward algorithms, such as Naive Bayes, also have the added benefit of being easier to interpret (compared to more complex, black-box models like neural networks).
+
+Knowing how to evaluate models properly, we can now explore how different parameters affect performance.

-## Step 6: How to tune parameters?
+## Step 6: How to tune parameters?
+
+What we've seen so far is only the tip of the iceberg: there are many other parameters to tune. One example is which algorithm to use for training.
+
+We've used Naive Bayes above, but scikit-learn supports many classifiers out of the box: Support Vector Machines, Nearest Neighbours, Decision Trees, Ensemble methods...

-What we've seen so far is only a tip of the iceberg: there are many other parameters to tune. One example is what algorithm to use for training.
-
-We've used Naive Bayes above, but scikit-learn supports many classifiers out of the box: Support Vector Machines, Nearest Neighbours, Decision Trees, Ensamble methods...

[![img](https://radimrehurek.com/data_science_python/drop_shadows_background.png)](https://peekaboo-vision.blogspot.cz/2013/01/machine-learning-cheat-sheet-for-scikit.html)

+We can ask: What is the effect of IDF weighting on accuracy? Does the extra processing cost of lemmatization (vs. just plain tokens) really help?

-We can ask: What is the effect of IDF weighting on accuracy? Does the extra processing cost of lemmatization (vs. just plain words) really help? 
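Conceptually, a grid search over a parameter dictionary simply enumerates the Cartesian product of the supplied values and keeps the best-scoring combination. A minimal stdlib sketch of that idea — the scoring function here is a made-up stand-in for cross-validated accuracy, not scikit-learn's API:

```python
from itertools import product

# Parameter grid mirroring the pipeline step names used in the text.
param_grid = {
    'tfidf__use_idf': (True, False),
    'bow__analyzer': ('split_into_lemmas', 'split_into_tokens'),
}

def fake_cv_score(params):
    # Placeholder: a real grid search would fit the pipeline for each
    # combination and average the cross-validation accuracy.
    return 0.94 if params['tfidf__use_idf'] else 0.92

names = sorted(param_grid)
combos = [dict(zip(names, values))
          for values in product(*(param_grid[n] for n in names))]
assert len(combos) == 4  # 2 options x 2 options

best = max(combos, key=fake_cv_score)  # keep the highest-scoring combination
assert best['tfidf__use_idf'] is True
```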
-
-Let's find out:
+Let's find out:

In [37]:

-```
+```python
params = {
    'tfidf__use_idf': (True, False),
    'bow__analyzer': (split_into_lemmas, split_into_tokens),
@@ -1005,7 +864,7 @@ grid = GridSearchCV(

In [38]:

-```
+```python
%time nb_detector = grid.fit(msg_train, label_train)
print nb_detector.grid_scores_
```
@@ -1018,51 +877,41 @@
Wall time: 20.2 s
[mean: 0.94752, std: 0.00357, params: {'tfidf__use_idf': True, 'bow__analyzer': }, mean: 0.92958, std: 0.00390, params: {'tfidf__use_idf': False, 'bow__analyzer': }, mean: 0.94528, std: 0.00259, params: {'tfidf__use_idf': True, 'bow__analyzer': }, mean: 0.92868, std: 0.00240, params: {'tfidf__use_idf': False, 'bow__analyzer': }]
```
+(best parameter combinations are displayed first: in this case, `use_idf=True` and `analyzer=split_into_lemmas` take the prize).

-
-(best parameter combinations are displayed first: in this case, `use_idf=True` and `analyzer=split_into_lemmas` take the prize).
-
-A quick sanity check:
+A quick sanity check:

In [39]:

-```
+```python
print nb_detector.predict_proba(["Hi mom, how are you?"])[0]
print nb_detector.predict_proba(["WINNER! Credit for free!"])[0]
```
-
-
```
[ 0.99383955  0.00616045]
[ 0.29663109  0.70336891]
```
-
-
-The `predict_proba` returns the predicted probability for each class (ham, spam). In the first case, the message is predicted to be ham with > 99% probability, and spam with < 1%. So if forced to choose, the model will say "ham":
+`predict_proba` returns the predicted probability for each class (ham, spam). In the first case, the message is predicted to be ham with > 99% probability, and spam with < 1%. So if forced to choose, the model will say "ham":

In [40]:

-```
+```python
print nb_detector.predict(["Hi mom, how are you?"])[0]
print nb_detector.predict(["WINNER! Credit for free!"])[0]
```
-
-
```
ham
spam
```
-
-
-And overall scores on the test set, the one we haven't used at all during training:
+And overall scores on the test set, the one we haven't used at all during training:

In [41]:

-```
+```python
predictions = nb_detector.predict(msg_test)
print confusion_matrix(label_test, predictions)
print classification_report(label_test, predictions)
@@ -1081,17 +930,13 @@ print classification_report(label_test, predictions)

          avg / total       0.96      0.96      0.96      1115
```
+This is then the realistic predictive performance we can expect from our spam detection pipeline, when using lowercasing with lemmatization, TF-IDF, and Naive Bayes for the classifier.

-
-This is then the realistic predictive performance we can expect from our spam detection pipeline, when using lowercase with lemmatization, TF-IDF and Naive Bayes for classifier.
-
-
-
-Let's try with another classifier: [Support Vector Machines (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine). SVMs are a great starting point when classifying text data, getting state of the art results very quickly and with pleasantly little tuning (although a bit more than Naive Bayes):
+Let's try another classifier: [Support Vector Machines (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine). SVMs are a great starting point when classifying text data, getting state-of-the-art results very quickly and with pleasantly little tuning (although a bit more than Naive Bayes):

In [42]:

-```
+```python
pipeline_svm = Pipeline([
    ('bow', CountVectorizer(analyzer=split_into_lemmas)),
    ('tfidf', TfidfTransformer()),
@@ -1116,34 +961,30 @@ grid_svm = GridSearchCV(

In [43]:

-```
+```python
%time svm_detector = grid_svm.fit(msg_train, label_train) # find the best combination from param_svm
print svm_detector.grid_scores_
```

-```
+```
CPU times: user 5.24 s, sys: 170 ms, total: 5.41 s
Wall time: 1min 8s
[mean: 0.98677, std: 0.00259, params: {'classifier__kernel': 'linear', 'classifier__C': 1}, mean: 0.98654, std: 0.00100, params: {'classifier__kernel': 'linear', 'classifier__C': 10}, mean: 0.98654, std: 0.00100, params: {'classifier__kernel': 'linear', 'classifier__C': 100}, mean: 0.98654, std: 0.00100, params: {'classifier__kernel': 'linear', 'classifier__C': 1000}, mean: 0.86432, std: 0.00006, params: 
{'classifier__gamma': 0.001, 'classifier__kernel': 'rbf', 'classifier__C': 1}, mean: 0.86432, std: 0.00006, params: {'classifier__gamma': 0.0001, 'classifier__kernel': 'rbf', 'classifier__C': 1}, mean: 0.86432, std: 0.00006, params: {'classifier__gamma': 0.001, 'classifier__kernel': 'rbf', 'classifier__C': 10}, mean: 0.86432, std: 0.00006, params: {'classifier__gamma': 0.0001, 'classifier__kernel': 'rbf', 'classifier__C': 10}, mean: 0.97040, std: 0.00587, params: {'classifier__gamma': 0.001, 'classifier__kernel': 'rbf', 'classifier__C': 100}, mean: 0.86432, std: 0.00006, params: {'classifier__gamma': 0.0001, 'classifier__kernel': 'rbf', 'classifier__C': 100}, mean: 0.98722, std: 0.00280, params: {'classifier__gamma': 0.001, 'classifier__kernel': 'rbf', 'classifier__C': 1000}, mean: 0.97040, std: 0.00587, params: {'classifier__gamma': 0.0001, 'classifier__kernel': 'rbf', 'classifier__C': 1000}]
```
+So apparently, a linear kernel with `C=1` is the best parameter combination.

-
-So apparently, linear kernel with `C=1` is the best parameter combination.
-
-Sanity check again:
+Sanity check again:

In [44]:

-```
+```python
print svm_detector.predict(["Hi mom, how are you?"])[0]
print svm_detector.predict(["WINNER! Credit for free!"])[0]
```
-
-
```
ham
spam
@@ -1151,7 +992,7 @@ spam

In [45]:

-```
+```python
print confusion_matrix(label_test, svm_detector.predict(msg_test))
print classification_report(label_test, svm_detector.predict(msg_test))
```
@@ -1169,25 +1010,19 @@ print classification_report(label_test, svm_detector.predict(msg_test))

          avg / total       0.98      0.98      0.98      1115
```
+This is then the realistic predictive performance we can expect from our spam detection pipeline when using SVMs.
+
+## Step 7: Productionalizing a predictor

-This is then the realistic predictive performance we can expect from our spam detection pipeline, when using SVMs.
-
-
-
-## Step 7: Productionalizing a predictor
-
-

+With basic analysis and tuning done, the real work (engineering) begins.

-With basic analysis and tuning done, the real work (engineering) begins. 
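For reference, the precision and recall figures in classification reports like the ones above are derived directly from the confusion matrix. A small sketch of that arithmetic for the spam class, using an illustrative (made-up) matrix whose totals match the 1115-message test set:

```python
# Rows are the true label, columns the predicted label; counts are invented
# for illustration, not taken from the actual run above.
confusion = {('ham', 'ham'): 965, ('ham', 'spam'): 4,
             ('spam', 'ham'): 40, ('spam', 'spam'): 106}

tp = confusion[('spam', 'spam')]  # spam correctly flagged
fp = confusion[('ham', 'spam')]   # ham wrongly flagged as spam
fn = confusion[('spam', 'ham')]   # spam that slipped through

precision = tp / float(tp + fp)   # of messages flagged spam, how many really were
recall = tp / float(tp + fn)      # of real spam, how much we caught

assert round(precision, 2) == 0.96
assert round(recall, 2) == 0.73
```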
+The final step for a production predictor would be training it on the entire dataset again, to make full use of all the data available. We'd use the best parameters found via cross validation above, of course. This is very similar to what we did in the beginning, but this time with insight into the model's behaviour and stability, since the evaluation was done honestly, on distinct train/test subset splits.

-The final step for a production predictor would be training it on the entire dataset again, to make full use of all the data available. We'd use the best parameters found via cross validation above, of course. This is very similar to what we did in the beginning, but this time having insight into its behaviour and stability. Evaluation was done honestly, on distinct train/test subset splits.
-
-The final predictor can be serialized to disk, so that the next time we want to use it, we can skip all training and use the trained model directly:
+The final predictor can be serialized to disk, so that the next time we want to use it, we can skip all training and use the trained model directly:

In [46]:

-```
+```python
# store the spam detector to disk after training
with open('sms_spam_detector.pkl', 'wb') as fout:
    cPickle.dump(svm_detector, fout)
@@ -1196,68 +1031,50 @@ with open('sms_spam_detector.pkl', 'wb') as fout:
svm_detector_reloaded = cPickle.load(open('sms_spam_detector.pkl'))
```
-
-
-The loaded result is an object that behaves identically to the original:
+The loaded result is an object that behaves identically to the original:

In [47]:

-```
+```python
print 'before:', svm_detector.predict([message4])[0]
print 'after:', svm_detector_reloaded.predict([message4])[0]
```
-
-
```
before: ham
after: ham
```
+Another important part of a production implementation is **performance**. After a rapid, iterative model tuning and parameter search as shown here, a well-performing model can be translated into a different language and optimized. Would trading a few accuracy points give us a smaller, faster model? Is it worth optimizing memory usage, perhaps using `mmap` to share memory across processes?
+
+Note that optimization is not always necessary; always start with actual profiling.

-Another important part of a production implementation is **performance**. After a rapid, iterative model tuning and parameter search as shown here, a well performing model can be translated into a different language and optimized. Would trading a few accuracy points give us a smaller, faster model? Is it worth optimizing memory usage, perhaps using `mmap` to share memory across processes?
-
-Note that optimization is not always necessary; always start with actual profiling. 
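As a concrete illustration of the `mmap` idea: memory-mapping a serialized model file lets several processes share one copy of the underlying pages, instead of each loading its own. A minimal single-process sketch of the mechanics — the file name and contents here are invented:

```python
import mmap
import os
import tempfile

# Pretend this file holds a serialized, trained model.
path = os.path.join(tempfile.mkdtemp(), 'model.bin')
with open(path, 'wb') as f:
    f.write(b'pretend these bytes are a trained model')

# Map the file read-only; other processes mapping the same file would
# share the same physical pages via the OS page cache.
with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    assert mm[:7] == b'pretend'
    mm.close()
```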
-
-Other things to consider here, for a production pipeline, are **robustness** (service failover, redundancy, load balancing), **monitoring** (incl. auto-alerts on anomalies) and **HR fungibility** (avoiding "knowledge silos" of how things are done, arcane/lock-in technologies, black art of tuning results). These days, even the open source world can offer viable solutions in all of these areas. All the tool shown today are free for commercial use, under OSI-approved open source licenses.
-
-
-
-# Other practical concepts
-
-
-
-data sparsity
-
-online learning, data streams
-
-`mmap` for memory sharing, system "cold-start" load times
-
-scalability, distributed (cluster) processing
-
-
-
-# Unsupervised learning
+Other things to consider here, for a production pipeline, are **robustness** (service failover, redundancy, load balancing), **monitoring** (incl. auto-alerts on anomalies) and **HR fungibility** (avoiding "knowledge silos" of how things are done, arcane/lock-in technologies, and the black art of tuning results). These days, even the open-source world can offer viable solutions in all of these areas. All the tools shown today are free for commercial use, under OSI-approved open source licenses.
+
+# Other practical concepts
+
+data sparsity
+
+online learning, data streams
+
+`mmap` for memory sharing, system "cold-start" load times
+
+scalability, distributed (cluster) processing
+
+# Unsupervised learning
+
+Most data is *not* structured. We can still gain insight from it, but no intrinsic evaluation is possible (or else it becomes supervised learning!).
+
+How can we train *anything* without labels? What kind of sorcery is this?
+
+[Distributional hypothesis](https://en.wikipedia.org/wiki/Distributional_semantics): *"Words that occur in similar contexts tend to have similar meanings"*. Context = sentence, document, sliding window...
+
+Check out this [live demo of Google's word2vec](https://radimrehurek.com/2014/02/word2vec-tutorial/#app) for unsupervised learning. Simple model, large data (Google News, 100 billion words, no labels).
+
+# Where next?

-Most data *not* structured. Gaining insight, no intrinsic evaluation possible (or else becomes supervised learning!).
-
-How can we train *anything* without labels? What kind of sorcery is this?
-
-[Distributional hypothesis](https://en.wikipedia.org/wiki/Distributional_semantics): *"Words that occur in similar contexts tend to have similar meanings"*. Context = sentence, document, sliding window...
-
-Check out this [live demo of Google's word2vec](https://radimrehurek.com/2014/02/word2vec-tutorial/#app) for unsupervised learning. Simple model, large data (Google News, 100 billion words, no labels).
-
-# Where next?
-A static (non-interactive version) of this notebook rendered into HTML at [http://radimrehurek.com/data_science_python](https://radimrehurek.com/data_science_python) (you're probably watching it right now, but just in case).
+A static (non-interactive) version of this notebook is rendered into HTML at [http://radimrehurek.com/data_science_python](https://radimrehurek.com/data_science_python) (you're probably looking at it right now, but just in case).

-Interactive notebook source lives on GitHub: (see top for installation instructions).
+The interactive notebook source lives on GitHub (see the top for installation instructions).

-My company, [RaRe Technologies](http://rare-technologies.com/), lives at the exciting intersection of **pragmatic, commercial system building** and **cutting edge research**. Interested in interning / collaboration? [Get in touch](http://rare-technologies.com/#contactus). \ No newline at end of file
+My company, [RaRe Technologies](http://rare-technologies.com/), lives at the exciting intersection of **pragmatic, commercial system building** and **cutting-edge research**. Interested in interning / collaboration? [Get in touch](http://rare-technologies.com/#contactus).
diff --git "a/20171016 \347\254\25414\346\234\237/README.md" "b/20171016 \347\254\25414\346\234\237/README.md"
index 815d47fcdb616e792a9c0ae7b2911b6a3e832ecb..441fdbd2c7cbc9ecdc5b1c9b9c0616687b0fc811 100644
--- "a/20171016 \347\254\25414\346\234\237/README.md"
+++ "b/20171016 \347\254\25414\346\234\237/README.md"
@@ -1,5 +1,5 @@
| 标题 | 简介 |
| ------------------------------------------------------------ | ---- |
-| [BUILDING A NEURAL NET FROM SCRATCH IN GO](https://www.datadan.io/building-a-neural-net-from-scratch-in-go/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | |
-| [A Quick Introduction to Neural Networks](https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | |
-| [Practical Data Science in Python](https://radimrehurek.com/data_science_python/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | | \ No newline at end of file
+| [BUILDING A NEURAL NET FROM SCRATCH IN GO](https://www.datadan.io/building-a-neural-net-from-scratch-in-go/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | Building a simple neural network from scratch in Go |
+| [A Quick Introduction to Neural Networks](https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | A quick introduction to neural networks |
+| [Practical Data Science in Python](https://radimrehurek.com/data_science_python/?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | A complete walkthrough of building a machine-learning model, using spam classification (Python 2) | \ No newline at end of file