1. Import the required libraries, including pandas, for importing a CSV file:
```py
import pandas as pd
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
```
2. Read the CSV file containing the dataset:
```py
data = pd.read_csv("SomervilleHappinessSurvey2015.csv")
```
3. Separate the input features from the target. Note that the target is located in the first column of the CSV file. Convert the values into tensors, making sure the values are converted into floats:
```py
x = torch.tensor(data.iloc[:,1:].values).float()
y = torch.tensor(data.iloc[:,:1].values).float()
```
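As a quick sanity check (a minimal sketch, not part of the original solution), the resulting tensor shapes can be inspected; this assumes the survey file has its target in the first column followed by six feature columns:

```py
# x should have one row per survey response and six feature columns;
# y should be a column vector of targets
print(x.shape, y.shape)
```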
4. Define the architecture of the model and store it in a variable named **model**. Remember to create a single-layer model:
```py
model = nn.Sequential(nn.Linear(6, 1),
                      nn.Sigmoid())
```
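For reference, the same single-layer architecture can also be written as a custom `nn.Module` subclass; the following is an equivalent sketch (the class name is illustrative, not required by the activity):

```py
class SingleLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(6, 1)   # one linear layer: 6 inputs -> 1 output
        self.activation = nn.Sigmoid()  # squashes the output into (0, 1)

    def forward(self, x):
        return self.activation(self.linear(x))
```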
5. Define the loss function to be used. Use the MSE loss function:
```py
loss_function = torch.nn.MSELoss()
```
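Because the model ends in a sigmoid and the targets are binary, `nn.BCELoss` would be the more conventional choice; the activity nevertheless uses MSE, which also works here since predictions and targets both lie between 0 and 1. Should you prefer binary cross-entropy, the swap is a one-liner:

```py
# alternative (not the activity's choice): binary cross-entropy loss
loss_function = torch.nn.BCELoss()
```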
6. Define the optimizer of your model. Use the Adam optimizer and a learning rate of `0.01`:
```py
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```
7. Run the optimization for 100 iterations. Every 10 iterations, print and save the loss value:
```py
losses = []
for i in range(100):
    y_pred = model(x)
    loss = loss_function(y_pred, y)
    losses.append(loss.item())
    optimizer.zero_grad()  # clear the gradients from the previous iteration
    loss.backward()        # compute gradients of the loss w.r.t. the parameters
    optimizer.step()       # update the parameters
    if i % 10 == 0:
        print(loss.item())
```
The final loss should be approximately `0.24`.
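Beyond the loss, a rough sanity check of the trained classifier can be made by thresholding the sigmoid outputs at 0.5. This is a sketch, not part of the original activity, and assumes `model`, `x`, and `y` are still in scope:

```py
with torch.no_grad():  # no gradients needed for evaluation
    predictions = (model(x) >= 0.5).float()
accuracy = (predictions == y).float().mean().item()
print(f"Training accuracy: {accuracy:.3f}")
```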
8. Make a line plot to display the loss value for each iteration step:
```py
plt.plot(range(0,100), losses)
plt.show()
```
The resulting plot should show the loss value decreasing over the iterations.
1. Import the required libraries:
```py
import pandas as pd
```
2. Using pandas, load the **.csv** file:
```py
data = pd.read_csv("YearPredictionMSD.csv", nrows=50000)
data.head()
```
3. Verify whether any qualitative data is present in the dataset:
```py
cols = data.columns
num_cols = data._get_numeric_data().columns
list(set(cols) - set(num_cols))
```
The output should be an empty list, meaning there are no qualitative features in the dataset.
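Note that `_get_numeric_data()` is a private pandas method; an equivalent check with the public API uses `select_dtypes`, as in the following sketch:

```py
# columns whose dtype is not numeric; this list should also come back empty
non_numeric = data.select_dtypes(exclude="number").columns
print(list(non_numeric))
```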
4. Check for missing values. If you add an additional `sum()` function to the line of code previously used for this purpose, you will obtain the sum of the missing values in the entire dataset, without discriminating by column:
```py
data.isnull().sum().sum()
```
The output should be `0`, which means that none of the features contain missing values.
5. Check for outliers:
```py
outliers = {}
for i in range(data.shape[1]):
    # thresholds: three standard deviations below/above the column mean
    min_t = data[data.columns[i]].mean() \
            - (3 * data[data.columns[i]].std())
    max_t = data[data.columns[i]].mean() \
            + (3 * data[data.columns[i]].std())
    count = 0
    for j in data[data.columns[i]]:
        if j < min_t or j > max_t:
            count += 1
    # store the share of outlying values for each column
    percentage = count/data.shape[0]
    outliers[data.columns[i]] = "%.3f" % percentage
print(outliers)
```
The resulting dictionary should show that none of the features contain outliers representing more than 5% of the data.
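The same three-standard-deviation rule can also be expressed with vectorized pandas operations, which avoids the explicit Python loops; a sketch, assuming all columns are numeric (as verified in step 3):

```py
means, stds = data.mean(), data.std()
# boolean mask of entries lying more than three standard deviations from the mean
mask = (data < means - 3 * stds) | (data > means + 3 * stds)
print((mask.sum() / data.shape[0]).round(3))  # per-column outlier share
```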
6. Separate the features from the target data:
```py
X = data.iloc[:, 1:]
Y = data.iloc[:, 0]
```
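A quick shape check (a sketch, not part of the original solution) confirms that the 90 audio features and the year target were separated as intended:

```py
print(X.shape, Y.shape)  # expected: (50000, 90) (50000,)
```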
7. Rescale the features data using the standardization methodology:
```py
X = (X - X.mean())/X.std()
X.head()
```
The output should display the first rows of the rescaled features, now centered around zero.
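Equivalently, the rescaling can be done with scikit-learn's `StandardScaler`; this is an alternative to the manual formula above (note that `StandardScaler` divides by the population standard deviation, so values may differ marginally from pandas' sample standard deviation):

```py
from sklearn.preprocessing import StandardScaler

# returns a NumPy array with each feature centered at 0 with unit variance
X_scaled = StandardScaler().fit_transform(X)
```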
8. Split the data into three sets: training, validation, and test. Use the approach of your preference:
```py
from sklearn.model_selection import train_test_split
X_shuffle = X.sample(frac=1, random_state=0)
Y_shuffle = Y.sample(frac=1, random_state=0)
x_new, x_test, \
y_new, y_test = train_test_split(X_shuffle, \
                                 Y_shuffle, \
                                 test_size=0.2, \
                                 random_state=0)
dev_per = x_test.shape[0]/x_new.shape[0]
x_train, x_dev, \
y_train, y_dev = train_test_split(x_new, \
                                  y_new, \
                                  test_size=dev_per, \
                                  random_state=0)
```
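Here, `test_size=0.2` reserves 20% of the 50,000 rows (10,000) for the test set; `dev_per` then evaluates to 10,000 / 40,000 = 0.25, so the second split carves a validation set of the same size out of the remaining data, yielding a 60/20/20 partition:

```py
# 0.2 * 50000 = 10000 test rows; 10000 / 40000 = 0.25 of the remainder
print(dev_per)  # 0.25
```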
9. Print the resulting shapes as follows:
```py
print(x_train.shape, y_train.shape)
print(x_dev.shape, y_dev.shape)
print(x_test.shape, y_test.shape)
```
The output should look as follows:
```py
(30000, 90) (30000,)
(10000, 90) (10000,)
(10000, 90) (10000,)
```