# Hyperparameter tuning with Ray Tune

Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often, simple things like choosing a different learning rate or changing a network layer size can have a dramatic impact on model performance.

Fortunately, there are tools that help with finding the best combination of parameters. [Ray Tune](https://docs.ray.io/en/latest/tune.html) is an industry-standard tool for distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively supports distributed training through [Ray's distributed machine learning engine](https://ray.io/).

In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend [this tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) from the PyTorch documentation, which trains a CIFAR10 image classifier.

As you will see, we only need to add a few small modifications. In particular, we need to

1. wrap data loading and training in functions,
2. make some network parameters configurable,
3. add checkpointing (optional),
4. and define the search space for the model tuning.

To run this tutorial, make sure the following packages are installed:

* `ray[tune]`: distributed hyperparameter tuning library
* `torchvision`: for the data transformers

## Setup / Imports

Let's start with the imports:

```py
from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
```

Most of the imports are needed for building the PyTorch model. Only the last three imports are for Ray Tune.

## Data loaders

We wrap the data loaders in their own function and pass a global data directory. This way, different trials can share the same data directory.

```py
def load_data(data_dir="./data"):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset
```

## Configurable neural network

We can only tune parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:

```py
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

## The train function

Now it gets interesting, because we introduce some changes to the example [from the PyTorch documentation](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).

We wrap the training script in a function `train_cifar(config, checkpoint_dir=None, data_dir=None)`. As you can guess, the `config` parameter receives the hyperparameters we would like to train with. The `checkpoint_dir` parameter is used to restore checkpoints. The `data_dir` parameter specifies the directory where we load and store the data, so that multiple runs can share the same data source.

```py
net = Net(config["l1"], config["l2"])

# `optimizer` is created before this point in the full function further down.
if checkpoint_dir:
    model_state, optimizer_state = torch.load(
        os.path.join(checkpoint_dir, "checkpoint"))
    net.load_state_dict(model_state)
    optimizer.load_state_dict(optimizer_state)
```

The learning rate of the optimizer is made configurable, too:

```py
optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
```

We also split the training data into a training and a validation subset. We thus train on 80% of the data and calculate the validation loss on the remaining 20%. The batch size with which we iterate through the training and validation sets is configurable as well.

### Adding (multi) GPU support with `DataParallel`

Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in `nn.DataParallel` to support data-parallel training on multiple GPUs:

```py
device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)
```

By using a `device` variable, we make sure that training also works when no GPU is available. PyTorch requires us to send our data to the GPU memory explicitly, like this:

```py
for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
```

The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports [fractional GPUs](https://docs.ray.io/en/master/using-ray-with-gpus.html#fractional-gpus), so we can share GPUs among trials as long as the model still fits into GPU memory. We will come back to that later.

### Communicating with Ray Tune

The most interesting part is the communication with Ray Tune:

```py
with tune.checkpoint_dir(epoch) as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "checkpoint")
    torch.save((net.state_dict(), optimizer.state_dict()), path)

tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
```

Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early, to avoid wasting resources on them.

Saving a checkpoint is optional. However, it is necessary if we want to use advanced schedulers such as [Population Based Training](https://docs.ray.io/en/master/tune/tutorials/tune-advanced-tutorial.html). Saving the checkpoint also lets us load the trained models later and validate them on a test set.

### Full training function

The full code example looks like this:

```py
def train_cifar(config, checkpoint_dir=None, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Restore model and optimizer state when resuming from a checkpoint
    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    trainset, testset = load_data(data_dir)

    # 80% of the training data for training, the remaining 20% for validation
    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])

    trainloader = torch.utils.data.DataLoader(
        train_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)
    valloader = torch.utils.data.DataLoader(
        val_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                # Note: running_loss is reset below but epoch_steps is not,
                # so the printed value shrinks over the course of an epoch.
                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                running_loss / epoch_steps))
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        # Save a checkpoint, then report the validation metrics to Ray Tune
        with tune.checkpoint_dir(epoch) as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint")
            torch.save((net.state_dict(), optimizer.state_dict()), path)

        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
    print("Finished Training")
```

As you can see, most of the code is adapted directly from the original example.

## Test set accuracy

Commonly, the performance of a machine learning model is tested on a hold-out test set, with data that has not been used for training the model. We also wrap this in a function:

```py
def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total
```

The function also expects a `device` parameter, so we can run the test set validation on a GPU.

## Configuring the search space

Lastly, we need to define Ray Tune's search space. Here is an example:

```py
config = {
    "l1": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "l2": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
```

The `tune.sample_from()` function lets you define your own sampling methods to obtain hyperparameters. In this example, the `l1` and `l2` parameters are powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256. The `lr` (learning rate) is sampled between 0.0001 and 0.1; because the search space uses `tune.loguniform`, the sampling is uniform on a logarithmic scale rather than on a linear one. Lastly, the batch size is a choice between 2, 4, 8, and 16.

At each trial, Ray Tune randomly samples a combination of parameters from these search spaces. It then trains a number of models in parallel and finds the best performing one among them. We also use the `ASHAScheduler`, which terminates badly performing trials early.

We wrap the `train_cifar` function with `functools.partial` to set the constant `data_dir` parameter. We can also tell Ray Tune what resources should be available for each trial:

```py
gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
    progress_reporter=reporter,
    checkpoint_at_end=True)
```

You can specify the number of CPUs, which are then available, for example, to increase the `num_workers` of the PyTorch `DataLoader` instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that have not been requested for them, so you do not have to worry about two trials using the same set of resources.

Here we can also specify fractional GPUs, so something like `gpus_per_trial=0.5` is completely valid. The trials will then share GPUs with each other. You just have to make sure that the models still fit into GPU memory.

After training the models, we find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.

The full `main` function looks like this:

```py
def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)  # download the dataset once, before the trials start
    config = {
        "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2)
    reporter = CLIReporter(
        # parameter_columns=["l1", "l2", "lr", "batch_size"],
        metric_columns=["loss", "accuracy", "training_iteration"])
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter)

    best_trial = result.get_best_trial("loss", "min", "last")
    print("Best trial config: {}".format(best_trial.config))
    print("Best trial final validation loss: {}".format(
        best_trial.last_result["loss"]))
    print("Best trial final validation accuracy: {}".format(
        best_trial.last_result["accuracy"]))

    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    # Load the weights of the best trial from its last checkpoint
    best_checkpoint_dir = best_trial.checkpoint.value
    model_state, optimizer_state = torch.load(os.path.join(
        best_checkpoint_dir, "checkpoint"))
    best_trained_model.load_state_dict(model_state)

    test_acc = test_accuracy(best_trained_model, device)
    print("Best trial test set accuracy: {}".format(test_acc))

if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
```

Out:

```text
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz
Extracting /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data
Files already downloaded and verified
== Status ==
Memory usage on this node: 4.0/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 1/10 (1 RUNNING)
+---------------------+----------+-------+--------------+------+------+-------------+
| Trial name          | status   | loc   |   batch_size |   l1 |   l2 |          lr |
|---------------------+----------+-------+--------------+------+------+-------------|
| DEFAULT_d3304_00000 | RUNNING  |       |            2 |    4 |   16 | 0.000111924 |
+---------------------+----------+-------+--------------+------+------+-------------+

(pid=1588) Files already downloaded and verified
(pid=1568) Files already downloaded and verified
(pid=1504) Files already downloaded and verified
(pid=1575) Files already downloaded and verified
(pid=1494) Files already downloaded and verified
(pid=1572) Files already downloaded and verified
(pid=1567) Files already downloaded and verified
(pid=1585) Files already downloaded and verified
(pid=1565) Files already downloaded and verified
(pid=1505) Files already downloaded and verified
(pid=1588) Files already downloaded and verified
(pid=1568) Files already downloaded and verified
(pid=1504) Files already downloaded and verified
(pid=1575) Files already downloaded and verified
(pid=1494) Files already downloaded and verified
(pid=1572) Files already downloaded and verified
(pid=1567) Files already downloaded and verified
(pid=1565) Files already downloaded and verified
(pid=1585) Files already downloaded and verified
(pid=1505) Files already downloaded and verified
(pid=1585) [1,  2000] loss: 2.307
(pid=1568) [1,  2000] loss: 2.226
(pid=1565) [1,  2000] loss: 2.141
(pid=1505) [1,  2000] loss: 2.339
(pid=1504) [1,  2000] loss: 2.042
(pid=1572) [1,  2000] loss: 2.288
(pid=1567) [1,  2000] loss: 2.047
(pid=1575) [1,  2000] loss: 2.316
(pid=1494) [1,  2000] loss: 2.322
(pid=1588) [1,  2000] loss: 2.289
(pid=1585) [1,  4000] loss: 1.154
(pid=1505) [1,  4000] loss: 1.170
(pid=1565) [1,  4000] loss: 0.939
(pid=1568) [1,  4000] loss: 1.102
(pid=1504) [1,  4000] loss: 0.916
(pid=1572) [1,  4000] loss: 1.156
Result for DEFAULT_d3304_00003:
  accuracy: 0.226
  date: 2021-01-05_20-23-37
  done: false
  experiment_id: d4b00469893d498ea65a729df202882a
  experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 2.083958268547058
  node_ip: 172.17.0.2
  pid: 1588
  should_checkpoint: true
  time_since_restore: 27.169169902801514
  time_this_iter_s: 27.169169902801514
  time_total_s: 27.169169902801514
  timestamp: 1609878217
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00003

== Status ==
Memory usage on this node: 9.2/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -2.083958268547058
Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (10 RUNNING)
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status   | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING  |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00001 | RUNNING  |                 |            8 |   16 |   32 |    0.077467 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING  |                 |            4 |    8 |  128 |  0.00436986 |         |            |                      |
| DEFAULT_d3304_00003 | RUNNING  | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 2.08396 |      0.226 |                    1 |
| DEFAULT_d3304_00004 | RUNNING  |                 |            4 |   16 |   32 |    0.016474 |         |            |                      |
| DEFAULT_d3304_00005 | RUNNING  |                 |            4 |  128 |   64 |  0.00757252 |         |            |                      |
| DEFAULT_d3304_00006 | RUNNING  |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00007 | RUNNING  |                 |            8 |    8 |    8 | 0.000155891 |         |            |                      |
| DEFAULT_d3304_00008 | RUNNING  |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00009 | RUNNING  |                 |            4 |    4 |   32 |   0.0175239 |         |            |                      |
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1567) [1,  4000] loss: 0.943
(pid=1494) [1,  4000] loss: 1.155
(pid=1575) [1,  4000] loss: 1.162
(pid=1585) [1,  6000] loss: 0.768
(pid=1505) [1,  6000] loss: 0.780
(pid=1565) [1,  6000] loss: 0.582
(pid=1504) [1,  6000] loss: 0.587
(pid=1568) [1,  6000] loss: 0.770
(pid=1572) [1,  6000] loss: 0.771
(pid=1567) [1,  6000] loss: 0.615
Result for DEFAULT_d3304_00007:
  accuracy: 0.1011
  date: 2021-01-05_20-23-51
  done: true
  experiment_id: 947614a8c2a74533be128b929f363bd1
  experiment_tag: 7_batch_size=8,l1=8,l2=8,lr=0.00015589
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 2.3038805620193483
  node_ip: 172.17.0.2
  pid: 1494
  should_checkpoint: true
  time_since_restore: 41.69914960861206
  time_this_iter_s: 41.69914960861206
  time_total_s: 41.69914960861206
  timestamp: 1609878231
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00007

== Status ==
Memory usage on this node: 9.1/240.1 GiB
Using AsyncHyperBand: num_stopped=1
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -2.193919415283203
Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (10 RUNNING)
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status   | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING  |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00001 | RUNNING  |                 |            8 |   16 |   32 |    0.077467 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING  |                 |            4 |    8 |  128 |  0.00436986 |         |            |                      |
| DEFAULT_d3304_00003 | RUNNING  | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 2.08396 |      0.226 |                    1 |
| DEFAULT_d3304_00004 | RUNNING  |                 |            4 |   16 |   32 |    0.016474 |         |            |                      |
| DEFAULT_d3304_00005 | RUNNING  |                 |            4 |  128 |   64 |  0.00757252 |         |            |                      |
| DEFAULT_d3304_00006 | RUNNING  |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00007 | RUNNING  | 172.17.0.2:1494 |            8 |    8 |    8 | 0.000155891 | 2.30388 |     0.1011 |                    1 |
| DEFAULT_d3304_00008 | RUNNING  |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00009 | RUNNING  |                 |            4 |    4 |   32 |   0.0175239 |         |            |                      |
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

Result for DEFAULT_d3304_00001:
  accuracy: 0.1017
  date: 2021-01-05_20-23-51
  done: true
  experiment_id: 26ac228b4b454584869f8490742cf253
  experiment_tag: 1_batch_size=8,l1=16,l2=32,lr=0.077467
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 2.321864831352234
  node_ip: 172.17.0.2
  pid: 1575
  should_checkpoint: true
  time_since_restore: 42.09821367263794
  time_this_iter_s: 42.09821367263794
  time_total_s: 42.09821367263794
  timestamp: 1609878231
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00001

(pid=1588) [2,  2000] loss: 1.916
(pid=1585) [1,  8000] loss: 0.576
(pid=1505) [1,  8000] loss: 0.584
(pid=1565) [1,  8000] loss: 0.422
(pid=1504) [1,  8000] loss: 0.433
(pid=1572) [1,  8000] loss: 0.578
(pid=1568) [1,  8000] loss: 0.580
Result for DEFAULT_d3304_00003:
  accuracy: 0.3762
  date: 2021-01-05_20-24-00
  done: false
  experiment_id: d4b00469893d498ea65a729df202882a
  experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  hostname: 1a844a452371
  iterations_since_restore: 2
  loss: 1.7041921138763427
  node_ip: 172.17.0.2
  pid: 1588
  should_checkpoint: true
  time_since_restore: 50.74612545967102
  time_this_iter_s: 23.576955556869507
  time_total_s: 50.74612545967102
  timestamp: 1609878240
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: d3304_00003

== Status ==
Memory usage on this node: 8.0/240.1 GiB
Using AsyncHyperBand: num_stopped=2
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
Resources requested: 16/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (8 RUNNING, 2 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING    |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING    |                 |            4 |    8 |  128 |  0.00436986 |         |            |                      |
| DEFAULT_d3304_00003 | RUNNING    | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 1.70419 |     0.3762 |                    2 |
| DEFAULT_d3304_00004 | RUNNING    |                 |            4 |   16 |   32 |    0.016474 |         |            |                      |
| DEFAULT_d3304_00005 | RUNNING    |                 |            4 |  128 |   64 |  0.00757252 |         |            |                      |
| DEFAULT_d3304_00006 | RUNNING    |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00008 | RUNNING    |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00009 | RUNNING    |                 |            4 |    4 |   32 |   0.0175239 |         |            |                      |
| DEFAULT_d3304_00001 | TERMINATED |                 |            8 |   16 |   32 |    0.077467 | 2.32186 |     0.1017 |                    1 |
| DEFAULT_d3304_00007 | TERMINATED |                 |            8 |    8 |    8 | 0.000155891 | 2.30388 |     0.1011 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1567) [1,  8000] loss: 0.458
(pid=1585) [1, 10000] loss: 0.461
(pid=1505) [1, 10000] loss: 0.467
(pid=1565) [1, 10000] loss: 0.329
(pid=1504) [1, 10000] loss: 0.344
(pid=1572) [1, 10000] loss: 0.463
(pid=1568) [1, 10000] loss: 0.464
(pid=1567) [1, 10000] loss: 0.360
(pid=1588) [3,  2000] loss: 1.663
Result for DEFAULT_d3304_00002:
  accuracy: 0.3791
  date: 2021-01-05_20-24-18
  done: false
  experiment_id: eaf4d25c9a0e46219afb226ed323095b
  experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 1.6690538251161575
  node_ip: 172.17.0.2
  pid: 1504
  should_checkpoint: true
  time_since_restore: 68.1856791973114
  time_this_iter_s: 68.1856791973114
  time_total_s: 68.1856791973114
  timestamp: 1609878258
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00002

== Status ==
Memory usage on this node: 8.0/240.1 GiB
Using AsyncHyperBand: num_stopped=2
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.193919415283203
Resources requested: 16/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (8 RUNNING, 2 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING    |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING    | 172.17.0.2:1504 |            4 |    8 |  128 |  0.00436986 | 1.66905 |     0.3791 |                    1 |
| DEFAULT_d3304_00003 | RUNNING    | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 1.70419 |     0.3762 |                    2 |
| DEFAULT_d3304_00004 | RUNNING    |                 |            4 |   16 |   32 |    0.016474 |         |            |                      |
| DEFAULT_d3304_00005 | RUNNING    |                 |            4 |  128 |   64 |  0.00757252 |         |            |                      |
| DEFAULT_d3304_00006 | RUNNING    |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00008 | RUNNING    |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00009 | RUNNING    |                 |            4 |    4 |   32 |   0.0175239 |         |            |                      |
| DEFAULT_d3304_00001 | TERMINATED |                 |            8 |   16 |   32 |    0.077467 | 2.32186 |     0.1017 |                    1 |
| DEFAULT_d3304_00007 | TERMINATED |                 |            8 |    8 |    8 | 0.000155891 | 2.30388 |     0.1011 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1585) [1, 12000] loss: 0.384
(pid=1505) [1, 12000] loss: 0.390
Result for DEFAULT_d3304_00009:
  accuracy: 0.101
  date: 2021-01-05_20-24-19
  done: true
  experiment_id: 471eb6134c2a45509b005af46861c602
  experiment_tag: 9_batch_size=4,l1=4,l2=32,lr=0.017524
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 2.310983589553833
  node_ip: 172.17.0.2
  pid: 1572
  should_checkpoint: true
  time_since_restore: 69.29919123649597
  time_this_iter_s: 69.29919123649597
  time_total_s: 69.29919123649597
  timestamp: 1609878259
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00009

Result for DEFAULT_d3304_00004:
  accuracy: 0.102
  date: 2021-01-05_20-24-19
  done: true
  experiment_id: bd1f438c1fdd4a9ba98074d1cfd573fe
  experiment_tag: 4_batch_size=4,l1=16,l2=32,lr=0.016474
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 2.313420217037201
  node_ip: 172.17.0.2
  pid: 1568
  should_checkpoint: true
  time_since_restore: 69.48366618156433
  time_this_iter_s: 69.48366618156433
  time_total_s: 69.48366618156433
  timestamp: 1609878259
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00004

(pid=1565) [1, 12000] loss: 0.267
Result for DEFAULT_d3304_00005:
  accuracy: 0.3301
  date: 2021-01-05_20-24-22
  done: false
  experiment_id: 738b3d315db548a7956646b2c07f1b0c
  experiment_tag: 5_batch_size=4,l1=128,l2=64,lr=0.0075725
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 1.8058318739891053
  node_ip: 172.17.0.2
  pid: 1567
  should_checkpoint: true
  time_since_restore: 72.0806794166565
  time_this_iter_s: 72.0806794166565
  time_total_s: 72.0806794166565
  timestamp: 1609878262
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d3304_00005

Result for DEFAULT_d3304_00003:
  accuracy: 0.4242
  date: 2021-01-05_20-24-23
  done: false
  experiment_id: d4b00469893d498ea65a729df202882a
  experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  hostname: 1a844a452371
  iterations_since_restore: 3
  loss: 1.5498835063934326
  node_ip: 172.17.0.2
  pid: 1588
  should_checkpoint: true
  time_since_restore: 73.29849410057068
  time_this_iter_s: 22.552368640899658
  time_total_s: 73.29849410057068
  timestamp: 1609878263
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: d3304_00003

== Status ==
Memory usage on this node: 6.9/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING    |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING    | 172.17.0.2:1504 |            4 |    8 |  128 |  0.00436986 | 1.66905 |     0.3791 |                    1 |
| DEFAULT_d3304_00003 | RUNNING    | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 1.54988 |     0.4242 |                    3 |
| DEFAULT_d3304_00005 | RUNNING    | 172.17.0.2:1567 |            4 |  128 |   64 |  0.00757252 | 1.80583 |     0.3301 |                    1 |
| DEFAULT_d3304_00006 | RUNNING    |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00008 | RUNNING    |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00001 | TERMINATED |                 |            8 |   16 |   32 |    0.077467 | 2.32186 |     0.1017 |                    1 |
| DEFAULT_d3304_00004 | TERMINATED |                 |            4 |   16 |   32 |    0.016474 | 2.31342 |      0.102 |                    1 |
| DEFAULT_d3304_00007 | TERMINATED |                 |            8 |    8 |    8 | 0.000155891 | 2.30388 |     0.1011 |                    1 |
| DEFAULT_d3304_00009 | TERMINATED |                 |            4 |    4 |   32 |   0.0175239 | 2.31098 |      0.101 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1585) [1, 14000] loss: 0.329
(pid=1504) [2,  2000] loss: 1.708
(pid=1565) [1, 14000] loss: 0.225
(pid=1505) [1, 14000] loss: 0.334
(pid=1567) [2,  2000] loss: 1.803
(pid=1585) [1, 16000] loss: 0.288
(pid=1588) [4,  2000] loss: 1.541
(pid=1504) [2,  4000] loss: 0.840
(pid=1565) [1, 16000] loss: 0.198
(pid=1505) [1, 16000] loss: 0.292
(pid=1567) [2,  4000] loss: 0.912
Result for DEFAULT_d3304_00003:
  accuracy: 0.4494
  date: 2021-01-05_20-24-44
  done: false
  experiment_id: d4b00469893d498ea65a729df202882a
  experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  hostname: 1a844a452371
  iterations_since_restore: 4
  loss: 1.4720179980278014
  node_ip: 172.17.0.2
  pid: 1588
  should_checkpoint: true
  time_since_restore: 94.81268787384033
  time_this_iter_s: 21.514193773269653
  time_total_s: 94.81268787384033
  timestamp: 1609878284
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: d3304_00003

== Status ==
Memory usage on this node: 6.9/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING    |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING    | 172.17.0.2:1504 |            4 |    8 |  128 |  0.00436986 | 1.66905 |     0.3791 |                    1 |
| DEFAULT_d3304_00003 | RUNNING    | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 1.47202 |     0.4494 |                    4 |
| DEFAULT_d3304_00005 | RUNNING    | 172.17.0.2:1567 |            4 |  128 |   64 |  0.00757252 | 1.80583 |     0.3301 |                    1 |
| DEFAULT_d3304_00006 | RUNNING    |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00008 | RUNNING    |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00001 | TERMINATED |                 |            8 |   16 |   32 |    0.077467 | 2.32186 |     0.1017 |                    1 |
| DEFAULT_d3304_00004 | TERMINATED |                 |            4 |   16 |   32 |    0.016474 | 2.31342 |      0.102 |                    1 |
| DEFAULT_d3304_00007 | TERMINATED |                 |            8 |    8 |    8 | 0.000155891 | 2.30388 |     0.1011 |                    1 |
| DEFAULT_d3304_00009 | TERMINATED |                 |            4 |    4 |   32 |   0.0175239 | 2.31098 |      0.101 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1585) [1, 18000] loss: 0.256
(pid=1565) [1, 18000] loss: 0.173
(pid=1504) [2,  6000] loss: 0.572
(pid=1505) [1, 18000] loss: 0.259
(pid=1567) [2,  6000] loss: 0.611
(pid=1585) [1, 20000] loss: 0.230
(pid=1565) [1, 20000] loss: 0.156
(pid=1505) [1, 20000] loss: 0.234
(pid=1504) [2,  8000] loss: 0.417
(pid=1588) [5,  2000] loss: 1.452
(pid=1567) [2,  8000] loss: 0.461
Result for DEFAULT_d3304_00003:
  accuracy: 0.4839
  date: 2021-01-05_20-25-06
  done: false
  experiment_id: d4b00469893d498ea65a729df202882a
  experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023
  hostname: 1a844a452371
  iterations_since_restore: 5
  loss: 1.4083827662467956
  node_ip: 172.17.0.2
  pid: 1588
  should_checkpoint: true
  time_since_restore: 116.5817449092865
  time_this_iter_s: 21.769057035446167
  time_total_s: 116.5817449092865
  timestamp: 1609878306
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d3304_00003

== Status ==
Memory usage on this node: 6.9/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.3038805620193483
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | RUNNING    |                 |            2 |    4 |   16 | 0.000111924 |         |            |                      |
| DEFAULT_d3304_00002 | RUNNING    | 172.17.0.2:1504 |            4 |    8 |  128 |  0.00436986 | 1.66905 |     0.3791 |                    1 |
| DEFAULT_d3304_00003 | RUNNING    | 172.17.0.2:1588 |           16 |   32 |    4 |  0.00120234 | 1.40838 |     0.4839 |                    5 |
| DEFAULT_d3304_00005 | RUNNING    | 172.17.0.2:1567 |            4 |  128 |   64 |  0.00757252 | 1.80583 |     0.3301 |                    1 |
| DEFAULT_d3304_00006 | RUNNING    |                 |            2 |   64 |  256 |  0.00177236 |         |            |                      |
| DEFAULT_d3304_00008 | RUNNING    |                 |            2 |   16 |   64 |   0.0310199 |         |            |                      |
| DEFAULT_d3304_00001 | TERMINATED |                 |            8 |   16 |   32 |    0.077467 | 2.32186 |     0.1017 |                    1 |
| DEFAULT_d3304_00004 | TERMINATED |                 |            4 |   16 |   32 |    0.016474 | 2.31342 |      0.102 |                    1 |
| DEFAULT_d3304_00007 | TERMINATED |                 |            8 |    8 |    8 | 0.000155891 | 2.30388 |     0.1011 |                    1 |
| DEFAULT_d3304_00009 | TERMINATED |                 |            4 |    4 |   32 |   0.0175239 | 2.31098 |      0.101 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1504) [2, 10000] loss: 0.339
Result for DEFAULT_d3304_00000:
  accuracy: 0.1104
  date: 2021-01-05_20-25-10
  done: false
  experiment_id: 454624d453954d46b33a1eb496e7ec53
  experiment_tag: 0_batch_size=2,l1=4,l2=16,lr=0.00011192
  hostname: 1a844a452371
  iterations_since_restore: 1
  loss: 2.2988875378131866
  node_ip: 172.17.0.2
  pid: 1585
  should_checkpoint: true
  time_since_restore: 120.59520411491394
time_this_iter_s: 120.59520411491394 time_total_s: 120.59520411491394 timestamp: 1609878310 timesteps_since_restore: 0 training_iteration: 1 trial_id: d3304_00000 Result for DEFAULT_d3304_00008: accuracy: 0.0983 date: 2021-01-05_20-25-11 done: true experiment_id: 381603b190bc47a9b794321f7692695f experiment_tag: 8_batch_size=2,l1=16,l2=64,lr=0.03102 hostname: 1a844a452371 iterations_since_restore: 1 loss: 2.336980807876587 node_ip: 172.17.0.2 pid: 1505 should_checkpoint: true time_since_restore: 121.36707901954651 time_this_iter_s: 121.36707901954651 time_total_s: 121.36707901954651 timestamp: 1609878311 timesteps_since_restore: 0 training_iteration: 1 trial_id: d3304_00008 Result for DEFAULT_d3304_00006: accuracy: 0.4586 date: 2021-01-05_20-25-11 done: false experiment_id: d8bae0fc87134e6398fd0341279c1a1a experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724 hostname: 1a844a452371 iterations_since_restore: 1 loss: 1.5124113649010658 node_ip: 172.17.0.2 pid: 1565 should_checkpoint: true time_since_restore: 121.536208152771 time_this_iter_s: 121.536208152771 time_total_s: 121.536208152771 timestamp: 1609878311 timesteps_since_restore: 0 training_iteration: 1 trial_id: d3304_00006 == Status == Memory usage on this node: 6.6/240.1 GiB Using AsyncHyperBand: num_stopped=5 Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (5 RUNNING, 5 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | 
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.66905 | 0.3791 | 1 | | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.40838 | 0.4839 | 5 | | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.80583 | 0.3301 | 1 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ Result for DEFAULT_d3304_00002: accuracy: 0.4078 date: 2021-01-05_20-25-16 done: false experiment_id: eaf4d25c9a0e46219afb226ed323095b experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699 hostname: 1a844a452371 iterations_since_restore: 2 loss: 1.6191314194440842 node_ip: 172.17.0.2 pid: 1504 should_checkpoint: true time_since_restore: 126.61185264587402 time_this_iter_s: 58.42617344856262 time_total_s: 126.61185264587402 timestamp: 1609878316 timesteps_since_restore: 0 training_iteration: 2 trial_id: d3304_00002 (pid=1567) [2, 10000] loss: 0.371 (pid=1585) [2, 2000] loss: 2.298 (pid=1565) [2, 2000] loss: 1.466 (pid=1588) [6, 2000] loss: 1.383 Result for DEFAULT_d3304_00005: accuracy: 0.3647 date:
2021-01-05_20-25-24 done: true experiment_id: 738b3d315db548a7956646b2c07f1b0c experiment_tag: 5_batch_size=4,l1=128,l2=64,lr=0.0075725 hostname: 1a844a452371 iterations_since_restore: 2 loss: 1.7739140236496926 node_ip: 172.17.0.2 pid: 1567 should_checkpoint: true time_since_restore: 134.1462869644165 time_this_iter_s: 62.06560754776001 time_total_s: 134.1462869644165 timestamp: 1609878324 timesteps_since_restore: 0 training_iteration: 2 trial_id: d3304_00005 == Status == Memory usage on this node: 6.3/240.1 GiB Using AsyncHyperBand: num_stopped=6 Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (5 RUNNING, 5 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.61913 | 0.4078 | 2 | | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.40838 | 0.4839 | 5 | | DEFAULT_d3304_00005 | RUNNING | 172.17.0.2:1567 | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00007 | TERMINATED | | 
8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1504) [3, 2000] loss: 1.656 Result for DEFAULT_d3304_00003: accuracy: 0.5061 date: 2021-01-05_20-25-27 done: false experiment_id: d4b00469893d498ea65a729df202882a experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023 hostname: 1a844a452371 iterations_since_restore: 6 loss: 1.3623717227935792 node_ip: 172.17.0.2 pid: 1588 should_checkpoint: true time_since_restore: 137.95851016044617 time_this_iter_s: 21.376765251159668 time_total_s: 137.95851016044617 timestamp: 1609878327 timesteps_since_restore: 0 training_iteration: 6 trial_id: d3304_00003 (pid=1585) [2, 4000] loss: 1.147 (pid=1565) [2, 4000] loss: 0.749 (pid=1504) [3, 4000] loss: 0.838 (pid=1585) [2, 6000] loss: 0.760 (pid=1565) [2, 6000] loss: 0.498 (pid=1588) [7, 2000] loss: 1.326 (pid=1504) [3, 6000] loss: 0.560 (pid=1585) [2, 8000] loss: 0.561 Result for DEFAULT_d3304_00003: accuracy: 0.5209 date: 2021-01-05_20-25-48 done: false experiment_id: d4b00469893d498ea65a729df202882a experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023 hostname: 1a844a452371 iterations_since_restore: 7 loss: 1.316757419013977 node_ip: 172.17.0.2 pid: 1588 should_checkpoint: true time_since_restore: 158.4953932762146 time_this_iter_s: 20.536883115768433 time_total_s: 158.4953932762146 timestamp: 1609878348 timesteps_since_restore: 0 training_iteration: 7 trial_id: d3304_00003 == Status == Memory usage on this node: 5.8/240.1 GiB Using AsyncHyperBand: num_stopped=6 Bracket: Iter 8.000: None | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000:
-2.301384049916267 Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (4 RUNNING, 6 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.61913 | 0.4078 | 2 | | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.31676 | 0.5209 | 7 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1565) [2, 8000] loss: 0.372 (pid=1504) [3, 8000] loss: 0.416 (pid=1585) [2, 10000] loss: 0.434 (pid=1565) [2, 10000] loss: 0.292 (pid=1588) [8, 2000] loss: 1.278 (pid=1504) [3, 10000] loss: 0.333 (pid=1585) [2, 12000] loss: 0.347
(pid=1565) [2, 12000] loss: 0.245 Result for DEFAULT_d3304_00003: accuracy: 0.5406 date: 2021-01-05_20-26-08 done: false experiment_id: d4b00469893d498ea65a729df202882a experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023 hostname: 1a844a452371 iterations_since_restore: 8 loss: 1.267511115884781 node_ip: 172.17.0.2 pid: 1588 should_checkpoint: true time_since_restore: 179.13841199874878 time_this_iter_s: 20.64301872253418 time_total_s: 179.13841199874878 timestamp: 1609878368 timesteps_since_restore: 0 training_iteration: 8 trial_id: d3304_00003 == Status == Memory usage on this node: 5.8/240.1 GiB Using AsyncHyperBand: num_stopped=6 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (4 RUNNING, 6 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.61913 | 0.4078 | 2 | | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.26751 | 0.5406 | 8 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 |
TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ Result for DEFAULT_d3304_00002: accuracy: 0.3997 date: 2021-01-05_20-26-11 done: false experiment_id: eaf4d25c9a0e46219afb226ed323095b experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699 hostname: 1a844a452371 iterations_since_restore: 3 loss: 1.7084122330278158 node_ip: 172.17.0.2 pid: 1504 should_checkpoint: true time_since_restore: 182.02509140968323 time_this_iter_s: 55.413238763809204 time_total_s: 182.02509140968323 timestamp: 1609878371 timesteps_since_restore: 0 training_iteration: 3 trial_id: d3304_00002 (pid=1585) [2, 14000] loss: 0.290 (pid=1565) [2, 14000] loss: 0.213 (pid=1504) [4, 2000] loss: 1.653 (pid=1588) [9, 2000] loss: 1.245 (pid=1585) [2, 16000] loss: 0.244 (pid=1565) [2, 16000] loss: 0.186 Result for DEFAULT_d3304_00003: accuracy: 0.5409 date: 2021-01-05_20-26-29 done: false experiment_id: d4b00469893d498ea65a729df202882a experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023 hostname: 1a844a452371 iterations_since_restore: 9 loss: 1.2721123942375183 node_ip: 172.17.0.2 pid: 1588 should_checkpoint: true time_since_restore: 199.56540870666504 time_this_iter_s: 20.42699670791626 time_total_s: 199.56540870666504 timestamp: 1609878389 timesteps_since_restore: 0 training_iteration: 9 trial_id: d3304_00003 == Status == Memory usage on this node: 5.8/240.1 GiB Using AsyncHyperBand: num_stopped=6 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000:
-2.301384049916267 Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (4 RUNNING, 6 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.70841 | 0.3997 | 3 | | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.27211 | 0.5409 | 9 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1504) [4, 4000] loss: 0.842 (pid=1585) [2, 18000] loss: 0.214 (pid=1565) [2, 18000] loss: 0.159 (pid=1504) [4, 6000] loss: 0.561 (pid=1585) [2, 20000] loss: 0.191 (pid=1588) [10, 2000] loss: 1.210 (pid=1565) [2, 20000] loss: 0.143
Result for DEFAULT_d3304_00003: accuracy: 0.5619 date: 2021-01-05_20-26-50 done: true experiment_id: d4b00469893d498ea65a729df202882a experiment_tag: 3_batch_size=16,l1=32,l2=4,lr=0.0012023 hostname: 1a844a452371 iterations_since_restore: 10 loss: 1.2222298237800597 node_ip: 172.17.0.2 pid: 1588 should_checkpoint: true time_since_restore: 220.31984639167786 time_this_iter_s: 20.754437685012817 time_total_s: 220.31984639167786 timestamp: 1609878410 timesteps_since_restore: 0 training_iteration: 10 trial_id: d3304_00003 == Status == Memory usage on this node: 5.8/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (4 RUNNING, 6 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 2.29889 | 0.1104 | 1 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.70841 | 0.3997 | 3 | | DEFAULT_d3304_00003 | RUNNING | 172.17.0.2:1588 | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 
1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1504) [4, 8000] loss: 0.422 Result for DEFAULT_d3304_00000: accuracy: 0.2724 date: 2021-01-05_20-26-55 done: true experiment_id: 454624d453954d46b33a1eb496e7ec53 experiment_tag: 0_batch_size=2,l1=4,l2=16,lr=0.00011192 hostname: 1a844a452371 iterations_since_restore: 2 loss: 1.8605026947617531 node_ip: 172.17.0.2 pid: 1585 should_checkpoint: true time_since_restore: 225.84529209136963 time_this_iter_s: 105.25008797645569 time_total_s: 225.84529209136963 timestamp: 1609878415 timesteps_since_restore: 0 training_iteration: 2 trial_id: d3304_00000 == Status == Memory usage on this node: 5.3/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4720179980278014 | Iter 2.000: -1.7390530687630177 | Iter 1.000: -2.301384049916267 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00000 | RUNNING | 172.17.0.2:1585 | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 | | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 |
1.70841 | 0.3997 | 3 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.51241 | 0.4586 | 1 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ Result for DEFAULT_d3304_00006: accuracy: 0.5007 date: 2021-01-05_20-26-57 done: false experiment_id: d8bae0fc87134e6398fd0341279c1a1a experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724 hostname: 1a844a452371 iterations_since_restore: 2 loss: 1.3979384284215048 node_ip: 172.17.0.2 pid: 1565 should_checkpoint: true time_since_restore: 227.80454421043396 time_this_iter_s: 106.26833605766296 time_total_s: 227.80454421043396 timestamp: 1609878417 timesteps_since_restore: 0 training_iteration: 2 trial_id: d3304_00006 (pid=1504) [4, 10000] loss: 0.335 Result for DEFAULT_d3304_00002: accuracy: 0.3849 date: 2021-01-05_20-27-06 done: true experiment_id: eaf4d25c9a0e46219afb226ed323095b experiment_tag: 2_batch_size=4,l1=8,l2=128,lr=0.0043699 hostname: 1a844a452371 iterations_since_restore: 4 loss: 1.720731588792801 node_ip: 172.17.0.2 pid: 1504 should_checkpoint: true time_since_restore: 236.71593952178955 time_this_iter_s: 54.69084811210632 time_total_s: 236.71593952178955 timestamp: 1609878426 timesteps_since_restore: 0 training_iteration: 4 trial_id: d3304_00002 == Status == Memory usage
on this node: 4.7/240.1 GiB Using AsyncHyperBand: num_stopped=9 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.5963747934103012 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (2 RUNNING, 8 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00002 | RUNNING | 172.17.0.2:1504 | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 | | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.39794 | 0.5007 | 2 | | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1565) [3, 2000] loss: 1.373 (pid=1565) [3, 4000] loss: 0.696 (pid=1565) [3, 6000] loss: 0.466 (pid=1565)
[3, 8000] loss: 0.357 (pid=1565) [3, 10000] loss: 0.283 (pid=1565) [3, 12000] loss: 0.241 (pid=1565) [3, 14000] loss: 0.203 (pid=1565) [3, 16000] loss: 0.178 (pid=1565) [3, 18000] loss: 0.160 (pid=1565) [3, 20000] loss: 0.142 Result for DEFAULT_d3304_00006: accuracy: 0.5095 date: 2021-01-05_20-28-36 done: false experiment_id: d8bae0fc87134e6398fd0341279c1a1a experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724 hostname: 1a844a452371 iterations_since_restore: 3 loss: 1.4272501501079649 node_ip: 172.17.0.2 pid: 1565 should_checkpoint: true time_since_restore: 326.1525847911835 time_this_iter_s: 98.34804058074951 time_total_s: 326.1525847911835 timestamp: 1609878516 timesteps_since_restore: 0 training_iteration: 3 trial_id: d3304_00006 == Status == Memory usage on this node: 4.2/240.1 GiB Using AsyncHyperBand: num_stopped=9 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.5963747934103012 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (1 RUNNING, 9 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.42725 | 0.5095 | 3 | | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 | |
DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1565) [4, 2000] loss: 1.320 (pid=1565) [4, 4000] loss: 0.701 (pid=1565) [4, 6000] loss: 0.454 (pid=1565) [4, 8000] loss: 0.345 (pid=1565) [4, 10000] loss: 0.276 (pid=1565) [4, 12000] loss: 0.234 (pid=1565) [4, 14000] loss: 0.199 (pid=1565) [4, 16000] loss: 0.170 (pid=1565) [4, 18000] loss: 0.151 (pid=1565) [4, 20000] loss: 0.144 Result for DEFAULT_d3304_00006: accuracy: 0.4749 date: 2021-01-05_20-30-15 done: false experiment_id: d8bae0fc87134e6398fd0341279c1a1a experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724 hostname: 1a844a452371 iterations_since_restore: 4 loss: 1.4950430885698218 node_ip: 172.17.0.2 pid: 1565 should_checkpoint: true time_since_restore: 425.3827154636383 time_this_iter_s: 99.23013067245483 time_total_s: 425.3827154636383 timestamp: 1609878615 timesteps_since_restore: 0 training_iteration: 4 trial_id: d3304_00006 == Status == Memory usage on this node: 4.1/240.1 GiB Using AsyncHyperBand: num_stopped=9 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir:
/var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (1 RUNNING, 9 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.49504 | 0.4749 | 4 | | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 | | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1565) [5, 2000] loss: 1.314 (pid=1565) [5, 4000] loss: 0.663 (pid=1565) [5, 6000] loss: 0.453 (pid=1565) [5, 8000] loss: 0.341 (pid=1565) [5, 10000] loss: 0.278 (pid=1565) [5, 12000] loss: 0.235 (pid=1565) [5, 14000] loss: 0.197 (pid=1565) [5, 16000] loss: 0.173 (pid=1565) [5, 18000] loss: 0.155 (pid=1565) [5, 20000] loss: 0.137 Result for DEFAULT_d3304_00006:
accuracy: 0.531 date: 2021-01-05_20-31-56 done: false experiment_id: d8bae0fc87134e6398fd0341279c1a1a experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724 hostname: 1a844a452371 iterations_since_restore: 5 loss: 1.373500657767952 node_ip: 172.17.0.2 pid: 1565 should_checkpoint: true time_since_restore: 526.6667892932892 time_this_iter_s: 101.28407382965088 time_total_s: 526.6667892932892 timestamp: 1609878716 timesteps_since_restore: 0 training_iteration: 5 trial_id: d3304_00006 == Status == Memory usage on this node: 4.1/240.1 GiB Using AsyncHyperBand: num_stopped=9 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (1 RUNNING, 9 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.3735 | 0.531 | 5 | | DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 | | DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 | | DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 | | DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 | | DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 | | DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 | | DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 
0.000155891 | 2.30388 | 0.1011 | 1 | | DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 | | DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ (pid=1565) [6, 2000] loss: 1.325 (pid=1565) [6, 4000] loss: 0.668 (pid=1565) [6, 6000] loss: 0.457 (pid=1565) [6, 8000] loss: 0.338 (pid=1565) [6, 10000] loss: 0.283 (pid=1565) [6, 12000] loss: 0.232 (pid=1565) [6, 14000] loss: 0.198 (pid=1565) [6, 16000] loss: 0.175 (pid=1565) [6, 18000] loss: 0.149 (pid=1565) [6, 20000] loss: 0.140 Result for DEFAULT_d3304_00006: accuracy: 0.4852 date: 2021-01-05_20-33-55 done: false experiment_id: d8bae0fc87134e6398fd0341279c1a1a experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724 hostname: 1a844a452371 iterations_since_restore: 6 loss: 1.5015573524537555 node_ip: 172.17.0.2 pid: 1565 should_checkpoint: true time_since_restore: 645.3050956726074 time_this_iter_s: 118.63830637931824 time_total_s: 645.3050956726074 timestamp: 1609878835 timesteps_since_restore: 0 training_iteration: 6 trial_id: d3304_00006 == Status == Memory usage on this node: 4.1/240.1 GiB Using AsyncHyperBand: num_stopped=9 Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267 Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08 Number of trials: 10/10 (1 RUNNING, 9 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.50156 | 0.4852 | 6 |
| DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
| DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
| DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
| DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
| DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
| DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
| DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
| DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
| DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1565) [7, 2000] loss: 1.295
(pid=1565) [7, 4000] loss: 0.662
(pid=1565) [7, 6000] loss: 0.452
(pid=1565) [7, 8000] loss: 0.339
(pid=1565) [7, 10000] loss: 0.270
(pid=1565) [7, 12000] loss: 0.235
(pid=1565) [7, 14000] loss: 0.193
(pid=1565) [7, 16000] loss: 0.169
(pid=1565) [7, 18000] loss: 0.154
(pid=1565) [7, 20000] loss: 0.137
Result for DEFAULT_d3304_00006:
  accuracy: 0.4696
  date: 2021-01-05_20-35-52
  done: false
  experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  hostname: 1a844a452371
  iterations_since_restore: 7
  loss: 1.5851255111492393
  node_ip: 172.17.0.2
  pid: 1565
  should_checkpoint: true
  time_since_restore: 762.1866834163666
  time_this_iter_s: 116.88158774375916
  time_total_s: 762.1866834163666
  timestamp: 1609878952
  timesteps_since_restore: 0
  training_iteration: 7
  trial_id: d3304_00006

== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=9
Bracket: Iter 8.000: -1.267511115884781 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.58513 | 0.4696 | 7 |
| DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
| DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
| DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
| DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
| DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
| DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
| DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
| DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
| DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1565) [8, 2000] loss: 1.341
(pid=1565) [8, 4000] loss: 0.667
(pid=1565) [8, 6000] loss: 0.445
(pid=1565) [8, 8000] loss: 0.336
(pid=1565) [8, 10000] loss: 0.271
(pid=1565) [8, 12000] loss: 0.228
(pid=1565) [8, 14000] loss: 0.196
(pid=1565) [8, 16000] loss: 0.175
(pid=1565) [8, 18000] loss: 0.155
(pid=1565) [8, 20000] loss: 0.135
Result for DEFAULT_d3304_00006:
  accuracy: 0.467
  date: 2021-01-05_20-37-32
  done: true
  experiment_id: d8bae0fc87134e6398fd0341279c1a1a
  experiment_tag: 6_batch_size=2,l1=64,l2=256,lr=0.0017724
  hostname: 1a844a452371
  iterations_since_restore: 8
  loss: 1.6539037554110967
  node_ip: 172.17.0.2
  pid: 1565
  should_checkpoint: true
  time_since_restore: 862.3724186420441
  time_this_iter_s: 100.18573522567749
  time_total_s: 862.3724186420441
  timestamp: 1609879052
  timesteps_since_restore: 0
  training_iteration: 8
  trial_id: d3304_00006

== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.4607074356479388 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00006 | RUNNING | 172.17.0.2:1565 | 2 | 64 | 256 | 0.00177236 | 1.6539 | 0.467 | 8 |
| DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
| DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
| DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
| DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
| DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
| DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
| DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
| DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
| DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

== Status ==
Memory usage on this node: 4.0/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.4607074356479388 | Iter 4.000: -1.4950430885698218 | Iter 2.000: -1.7041921138763427 | Iter 1.000: -2.301384049916267
Resources requested: 0/32 CPUs, 0/2 GPUs, 0.0/157.71 GiB heap, 0.0/49.37 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-01-05_20-23-08
Number of trials: 10/10 (10 TERMINATED)
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_d3304_00000 | TERMINATED | | 2 | 4 | 16 | 0.000111924 | 1.8605 | 0.2724 | 2 |
| DEFAULT_d3304_00001 | TERMINATED | | 8 | 16 | 32 | 0.077467 | 2.32186 | 0.1017 | 1 |
| DEFAULT_d3304_00002 | TERMINATED | | 4 | 8 | 128 | 0.00436986 | 1.72073 | 0.3849 | 4 |
| DEFAULT_d3304_00003 | TERMINATED | | 16 | 32 | 4 | 0.00120234 | 1.22223 | 0.5619 | 10 |
| DEFAULT_d3304_00004 | TERMINATED | | 4 | 16 | 32 | 0.016474 | 2.31342 | 0.102 | 1 |
| DEFAULT_d3304_00005 | TERMINATED | | 4 | 128 | 64 | 0.00757252 | 1.77391 | 0.3647 | 2 |
| DEFAULT_d3304_00006 | TERMINATED | | 2 | 64 | 256 | 0.00177236 | 1.6539 | 0.467 | 8 |
| DEFAULT_d3304_00007 | TERMINATED | | 8 | 8 | 8 | 0.000155891 | 2.30388 | 0.1011 | 1 |
| DEFAULT_d3304_00008 | TERMINATED | | 2 | 16 | 64 | 0.0310199 | 2.33698 | 0.0983 | 1 |
| DEFAULT_d3304_00009 | TERMINATED | | 4 | 4 | 32 | 0.0175239 | 2.31098 | 0.101 | 1 |
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+

Best trial config: {'l1': 32, 'l2': 4, 'lr': 0.0012023396319256663, 'batch_size': 16}
Best trial final validation loss: 1.2222298237800597
Best trial final validation accuracy: 0.5619
Files already downloaded and verified
Files already downloaded and verified
Best trial test set accuracy: 0.5537
```

If you run the code, the output could look like the example above.

Most trials were stopped early to avoid wasting resources. In this run, the best-performing trial reached a validation accuracy of about 56%, which was confirmed on the test set (about 55%).

That's it! You can now tune the parameters of your PyTorch models.

**Total running time of the script:** (14 minutes 43.400 seconds)
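To make the "best trial" line in the log concrete, here is a minimal sketch in plain Python that redoes the selection by hand. The trial records are copied from the final status table of this run. The dictionary layout and the helper name `best_trial` are assumptions made for illustration and are not part of Ray Tune's API.

```python
# Final reported metrics per trial, copied from the last status table of the run.
# The record layout and the helper below are illustrative assumptions, not Ray Tune API.
trials = [
    {"name": "DEFAULT_d3304_00000", "batch_size": 2, "l1": 4, "l2": 16, "loss": 1.8605, "accuracy": 0.2724, "iters": 2},
    {"name": "DEFAULT_d3304_00001", "batch_size": 8, "l1": 16, "l2": 32, "loss": 2.32186, "accuracy": 0.1017, "iters": 1},
    {"name": "DEFAULT_d3304_00002", "batch_size": 4, "l1": 8, "l2": 128, "loss": 1.72073, "accuracy": 0.3849, "iters": 4},
    {"name": "DEFAULT_d3304_00003", "batch_size": 16, "l1": 32, "l2": 4, "loss": 1.22223, "accuracy": 0.5619, "iters": 10},
    {"name": "DEFAULT_d3304_00004", "batch_size": 4, "l1": 16, "l2": 32, "loss": 2.31342, "accuracy": 0.102, "iters": 1},
    {"name": "DEFAULT_d3304_00005", "batch_size": 4, "l1": 128, "l2": 64, "loss": 1.77391, "accuracy": 0.3647, "iters": 2},
    {"name": "DEFAULT_d3304_00006", "batch_size": 2, "l1": 64, "l2": 256, "loss": 1.6539, "accuracy": 0.467, "iters": 8},
    {"name": "DEFAULT_d3304_00007", "batch_size": 8, "l1": 8, "l2": 8, "loss": 2.30388, "accuracy": 0.1011, "iters": 1},
    {"name": "DEFAULT_d3304_00008", "batch_size": 2, "l1": 16, "l2": 64, "loss": 2.33698, "accuracy": 0.0983, "iters": 1},
    {"name": "DEFAULT_d3304_00009", "batch_size": 4, "l1": 4, "l2": 32, "loss": 2.31098, "accuracy": 0.101, "iters": 1},
]


def best_trial(records, metric="loss", mode="min"):
    """Return the record whose last reported `metric` is best under `mode`."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    return pick(records, key=lambda r: r[metric])


best = best_trial(trials)

# Trials that ran for fewer iterations than the longest-running one were cut short.
longest = max(t["iters"] for t in trials)
stopped_early = [t for t in trials if t["iters"] < longest]

print(f"best trial: {best['name']} (loss={best['loss']}, accuracy={best['accuracy']})")
print(f"{len(stopped_early)} of {len(trials)} trials ran fewer than {longest} iterations")
```

Selecting by lowest final validation loss picks the same trial that the log reports as best, and the iteration counts show how few trials ran to the end.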