{ "nbformat": 4, "nbformat_minor": 2, "metadata": { "language_info": { "name": "python", "codemirror_mode": { "name": "ipython", "version": 3 }, "version": "3.6.10-final" }, "orig_nbformat": 2, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "npconvert_exporter": "python", "pygments_lexer": "ipython3", "version": 3, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "source": [ "# Forecasting COVID-19 Case Counts with PaddlePaddle\n", "\n", "Since December 2019, COVID-19 has spread across the world and taken on the characteristics of a pandemic. The disease presents mainly with fever, dry cough, and fatigue. Severe cases typically develop breathing difficulty after about one week, and the most serious progress rapidly to acute respiratory distress syndrome, septic shock, metabolic acidosis that is difficult to correct, coagulation dysfunction, and multiple organ failure, posing an extremely serious threat to human health. Meanwhile, to contain the spread of the virus, many countries and regions adopted lockdown measures, which hindered the global economic recovery and drove government debt steadily higher.\n", "\n", "Against this backdrop, people everywhere hope that the pandemic will end soon so that work and daily life can return to normal. This tutorial addresses that problem: using the global COVID-19 statistics published by Johns Hopkins University, it models the time series with a temporal convolutional network (TCN) in order to forecast future case counts." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Environment Setup\n", "\n", "Before modeling, we import the required packages and configure plotting so that the results display clearly." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stderr", "text": "/mnt/qiujinxuan/PaddleNLP/paddlenlp/seq2vec/encoder.py:683: DeprecationWarning: invalid escape sequence \\s\n \"\"\"\n/mnt/qiujinxuan/PaddleNLP/paddlenlp/seq2vec/encoder.py:740: DeprecationWarning: invalid escape sequence \\s\n \"\"\"\n" } ], "source": [ "import os\n", "import sys\n", "\n", "import paddle\n", "import paddle.nn as nn\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from pylab import rcParams\n", "import matplotlib.pyplot as plt\n", "from matplotlib import rc\n", "from sklearn.preprocessing import MinMaxScaler\n", "from pandas.plotting import register_matplotlib_converters\n", "\n", "# Make the local PaddleNLP checkout (two directories up) importable\n", "sys.path.append(os.path.abspath(os.path.join(os.getcwd(), \"../..\")))\n", "from paddlenlp.seq2vec import TCNEncoder\n", "\n", "\n", "# Configure matplotlib and seaborn for inline, high-resolution plots\n", "%matplotlib inline\n", "%config InlineBackend.figure_format='retina'\n", "sns.set(style='whitegrid', 
palette='muted', font_scale=1.2)\n", "HAPPY_COLORS_PALETTE = [\"#01BEFE\", \"#FFDD00\", \"#FF7D00\", \"#FF006D\", \"#93D30C\", \"#8F00FF\"]\n", "sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))\n", "rcParams['figure.figsize'] = 14, 10\n", "register_matplotlib_converters()" ] }, { "source": [ "## Downloading the Data\n", "\n", "The dataset is provided by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The latest daily data is available from the https://github.com/CSSEGISandData/COVID-19 repository. This example ships with the case data downloaded on November 24, 2020." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Use the raw file URL; the github.com/.../blob/... page URL returns HTML rather than the CSV.\n", "# !wget https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv" ] }, { "source": [ "The dataset contains the country, province, latitude, and longitude of each location, along with its case counts from January 22, 2020 onward." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Previewing the Data\n", "\n", "Each row holds a country/region, an optional province/state, the latitude and longitude, and then one column per date containing the case count for that day." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n0 NaN Afghanistan 33.93911 67.709953 0 0 \n1 NaN Albania 41.15330 20.168300 0 0 \n2 NaN Algeria 28.03390 1.659600 0 0 \n3 NaN Andorra 42.50630 1.521800 0 0 \n4 NaN Angola -11.20270 17.873900 0 0 \n\n 1/24/20 1/25/20 1/26/20 1/27/20 ... 11/13/20 11/14/20 11/15/20 \\\n0 0 0 0 0 ... 42969 43035 43240 \n1 0 0 0 0 ... 26701 27233 27830 \n2 0 0 0 0 ... 65975 66819 67679 \n3 0 0 0 0 ... 5725 5725 5872 \n4 0 0 0 0 ... 13228 13374 13451 \n\n 11/16/20 11/17/20 11/18/20 11/19/20 11/20/20 11/21/20 11/22/20 \n0 43403 43628 43851 44228 44443 44503 44706 \n1 28432 29126 29837 30623 31459 32196 32761 \n2 68589 69591 70629 71652 72755 73774 74862 \n3 5914 5951 6018 6066 6142 6207 6256 \n4 13615 13818 13922 14134 14267 14413 14493 \n\n[5 rows x 310 columns]", "text/html": "
<div>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>Province/State</th><th>Country/Region</th><th>Lat</th><th>Long</th><th>1/22/20</th><th>1/23/20</th><th>1/24/20</th><th>1/25/20</th><th>1/26/20</th><th>1/27/20</th><th>...</th><th>11/13/20</th><th>11/14/20</th><th>11/15/20</th><th>11/16/20</th><th>11/17/20</th><th>11/18/20</th><th>11/19/20</th><th>11/20/20</th><th>11/21/20</th><th>11/22/20</th></tr>\n</thead>\n<tbody>\n<tr><th>0</th><td>NaN</td><td>Afghanistan</td><td>33.93911</td><td>67.709953</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>42969</td><td>43035</td><td>43240</td><td>43403</td><td>43628</td><td>43851</td><td>44228</td><td>44443</td><td>44503</td><td>44706</td></tr>\n<tr><th>1</th><td>NaN</td><td>Albania</td><td>41.15330</td><td>20.168300</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>26701</td><td>27233</td><td>27830</td><td>28432</td><td>29126</td><td>29837</td><td>30623</td><td>31459</td><td>32196</td><td>32761</td></tr>\n<tr><th>2</th><td>NaN</td><td>Algeria</td><td>28.03390</td><td>1.659600</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>65975</td><td>66819</td><td>67679</td><td>68589</td><td>69591</td><td>70629</td><td>71652</td><td>72755</td><td>73774</td><td>74862</td></tr>\n<tr><th>3</th><td>NaN</td><td>Andorra</td><td>42.50630</td><td>1.521800</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>5725</td><td>5725</td><td>5872</td><td>5914</td><td>5951</td><td>6018</td><td>6066</td><td>6142</td><td>6207</td><td>6256</td></tr>\n<tr><th>4</th><td>NaN</td><td>Angola</td><td>-11.20270</td><td>17.873900</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>13228</td><td>13374</td><td>13451</td><td>13615</td><td>13818</td><td>13922</td><td>14134</td><td>14267</td><td>14413</td><td>14493</td></tr>\n</tbody>\n</table>\n<p>5 rows × 310 columns</p>\n</div>