2021-02-20 20:38:44

aa628f81 · wizardforcel · 83dd9939
12 changed files
--- a/new/begin-ds-py-jupyter/1.md
+++ b/new/begin-ds-py-jupyter/1.md
@@ -158,7 +158,7 @@ Jupyter has many appealing features that make for efficient Python

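 ### Note

-The official IPython documentation can be found at: [http://ipython.readthedocs.io/en/stable/](http://ipython.readthedocs.io/en/stable/). It contains details on the features we discuss here and elsewhere.
+[The official IPython documentation can be found here](http://ipython.readthedocs.io/en/stable/). It contains details on the features we discuss here and elsewhere.

 ### Exploring some of Jupyter's most useful features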

@@ -676,7 +676,7 @@ scikit_learn==0.19.0

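 Any of these fields could be fit well by a fourth-order polynomial. Such a model would be very valuable if, for example, you were interested in predicting temperature or precipitation over a continuous range of dates.

-You can find this data source here: [http://climate.weather.gc.ca/climate_normals/results_e.html?stnID=888](http://climate.weather.gc.ca/climate_normals/results_e.html?stnID=888).
+[You can find this data source here](http://climate.weather.gc.ca/climate_normals/results_e.html?stnID=888).

 ## Activity B: Building a third-order polynomial model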


--- a/new/begin-ds-py-jupyter/2.md
+++ b/new/begin-ds-py-jupyter/2.md
@@ -809,7 +809,7 @@ The k-fold cross-validation algorithm is as follows:

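 ### Note

-The documentation for plotting validation curves can be found here: [http://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html](http://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html).
+[The documentation for plotting validation curves can be found here](http://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html).

 Consider the following validation curve, where the accuracy score is plotted against the gamma SVM parameter: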


--- a/new/begin-ds-py-jupyter/3.md
+++ b/new/begin-ds-py-jupyter/3.md
@@ -368,7 +368,7 @@ HTTP methods

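 We will get the population of each country. Then, in the next topic, this will be visualized together with the interest rate data scraped in the previous section.

-The page we look at in this activity can be found here: [http://www.worldometers.info/world-population/population-by-country/](http://www.worldometers.info/world-population/population-by-country/).
+[The page we look at in this activity can be found here](http://www.worldometers.info/world-population/population-by-country/).

 Now that we have seen the basics of web scraping, let's apply the same techniques to a new web page and scrape some more data!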


--- a/new/py-ds-essentials/1.md
+++ b/new/py-ds-essentials/1.md
@@ -1302,7 +1302,7 @@ Out: (1605, 119) (1605,)

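 For example, if you plan to analyze the Boston housing data and use the version available at [http://mldata.org/repository/data/viewslug/regression-datasets-housing](http://mldata.org/repository/data/viewslug/regression-datasets-housing), you first have to download the `regression-datasets-housing.csv` file to your local directory.

-You can download the dataset directly using the following link: [http://mldata.org/repository/data/download/csv/regression-datasets-housing](http://mldata.org/repository/data/download/csv/regression-datasets-housing).
+[You can download the dataset directly using the following link](http://mldata.org/repository/data/download/csv/regression-datasets-housing).

 Since the variables in the dataset are all numeric (13 continuous and one binary), the fastest way to load it and start using it is to try the NumPy function `loadtxt` and load all the data directly into an array.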


--- a/new/py-ds-essentials/4.md
+++ b/new/py-ds-essentials/4.md
@@ -771,7 +771,7 @@ In: from sklearn.metrics import accuracy_score, confusion_matrix

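 When your dataset contains a large number of cases or variables, XGBoost can still take a long time to train, even though it is compiled from C++. So, despite XGBoost's success, there was room in January 2017 for another algorithm (XGBoost first appeared in March 2015). It is the high-performance LightGBM, which can be distributed and handles large amounts of data quickly, and it was developed as an open source project by a team at Microsoft.

-Here is its GitHub page: [https://github.com/Microsoft/LightGBM](https://github.com/Microsoft/LightGBM). Also, [here is the academic paper describing the idea behind the algorithm](https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree).
+[Here is its GitHub page](https://github.com/Microsoft/LightGBM). Also, [here is the academic paper describing the idea behind the algorithm](https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree).

 Like XGBoost, LightGBM is based on decision trees, but it follows a different strategy. XGBoost uses decision trees to split on a variable and explores different cuts on that variable (the level-wise tree growth strategy), whereas LightGBM concentrates on one split and keeps splitting from there to achieve a better fit (the leaf-wise tree growth strategy). This lets LightGBM quickly reach a good fit of the data and produce solutions that differ from XGBoost's (which is useful if you plan to blend, that is, average, the two solutions to reduce the variance of the estimates).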


--- a/new/py-ds-essentials/5.md
+++ b/new/py-ds-essentials/5.md
@@ -555,7 +555,7 @@ In: current_palette = sns.color_palette()

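 You also have to add `hls`, `husl`, and all the matplotlib colormaps, which can be reversed by appending *_r* to the name or made darker by appending *_d*.

-The names and examples of the matplotlib colormaps can all be found on the following web page: [http://matplotlib.org/examples/color/colormaps_reference.html](http://matplotlib.org/examples/color/colormaps_reference.html).
+[The names and examples of the matplotlib colormaps can all be found on the following web page](http://matplotlib.org/examples/color/colormaps_reference.html).

 The `hls` color space is an automatic transformation of RGB values, and because colors differ in perceived intensity (for example, yellow and green are perceived as brighter, while blue is perceived as darker), it may or may not work for your representation.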


--- a/new/py-ds-essentials/7.md
+++ b/new/py-ds-essentials/7.md
@@ -59,7 +59,7 @@

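 Now let's apply a deep neural network to an image classification problem. Here, we will try to predict a traffic sign from its image. For this task, we will use a **CNN** (**Convolutional Neural Network**), a technique that can exploit the spatial correlation between nearby pixels in an image and that is the state of the art in deep learning for this kind of problem.

-The dataset can be found here: [http://benchmark.ini.rub.de/?section=gtsrb & subsection = dataset](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset). We would like to thank the team for releasing the dataset for free, and we refer to the publication about the dataset:
+[The dataset can be found here](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset). We would like to thank the team for releasing the dataset for free, and we refer to the publication about the dataset:
 <q>J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453-1460. 2011\.</q>

 First, download the dataset and unzip it. The dataset file is named `GTSRB_Final_Training_Images.zip`; after unzipping it, you will find a new directory named `GTSRB`, containing all the images, in the same directory as the Jupyter notebook.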

--- a/new/thoughtful-ds/01.md
+++ b/new/thoughtful-ds/01.md
@@ -165,7 +165,7 @@

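 If you are considering acquiring these skills but don't have time to take traditional classes, I strongly recommend using online courses.

-I especially recommend this course: [https://www.coursera.org/](https://www.coursera.org/): [https://www.coursera.org/learn/data-science-course](https://www.coursera.org/learn/data-science-course).
+[I especially recommend this course](https://www.coursera.org/): [https://www.coursera.org/learn/data-science-course](https://www.coursera.org/learn/data-science-course).

 The classic Drew Conway Venn diagram does a good job of showing what data science is and why data scientists are something of a unicorn: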


--- a/new/thoughtful-ds/02.md
+++ b/new/thoughtful-ds/02.md
@@ -109,7 +109,7 @@ my_data = [

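 ### Note

-You can install the Notebook server locally by following these instructions: [http://jupyter.readthedocs.io/en/latest/install.html](http://jupyter.readthedocs.io/en/latest/install.html).
+[You can install the Notebook server locally by following these instructions](http://jupyter.readthedocs.io/en/latest/install.html).

 To start the Notebook server locally, simply run the following command from a terminal: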


--- a/new/thoughtful-ds/05.md
+++ b/new/thoughtful-ds/05.md
@@ -787,7 +787,7 @@ class SimpleDisplayWithRenderer(BaseChartDisplay):

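 Testing the renderer implementation of the simple table

-You can find more material on this topic at: [https://pixiedust.github.io/pixiedust/develop.html](https://pixiedust.github.io/pixiedust/develop.html). Hopefully, by now you have a good idea of the types of customization you can write to integrate custom visualizations into the `display()` framework.
+[You can find more material on this topic here](https://pixiedust.github.io/pixiedust/develop.html). Hopefully, by now you have a good idea of the types of customization you can write to integrate custom visualizations into the `display()` framework.

 In the next section, we will discuss a very important topic for developers: debugging.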


--- a/new/thoughtful-ds/07.md
+++ b/new/thoughtful-ds/07.md
@@ -383,7 +383,7 @@ stream.disconnect()

 Streaming DataFrame flow

-Source: [https://spark.apache.org/docs/latest/img/structured-streaming-stream-as-a-table.png](https://spark.apache.org/docs/latest/img/structured-streaming-stream-as-a-table.png)
+[Source](https://spark.apache.org/docs/latest/img/structured-streaming-stream-as-a-table.png)

 The Spark Streaming Python API provides an elegant way to create a Streaming DataFrame using the `spark.readStream` property, which creates a new `pyspark.sql.streamingreamReader` object that conveniently lets you chain method calls, with the added benefit of producing cleaner code (see [https://en.wikipedia.org/wiki/Method_chaining](https://en.wikipedia.org/wiki/Method_chaining) for more details on this pattern).


--- a/new/thoughtful-ds/08.md
+++ b/new/thoughtful-ds/08.md
@@ -412,7 +412,7 @@ array([[ 0,  2,  4,  6],

 Illustration of the broadcasting flow

-Source: [http://www.scipy-lectures.org/_images/numpy_broadcasting.png](http://www.scipy-lectures.org/_images/numpy_broadcasting.png)
+[Source](http://www.scipy-lectures.org/_images/numpy_broadcasting.png)

 The three use cases demonstrated in the preceding diagram are: