Logging

Using a Logger

Spinning Up ships with basic logging tools, implemented in the Logger and EpochLogger classes. The Logger class contains most of the basic functionality for saving diagnostics, hyperparameter configurations, the state of a training run, and the trained model. The EpochLogger class adds a thin layer on top of that to make it easy to track the average, standard deviation, min, and max value of a diagnostic over each epoch and across MPI workers.

You Should Know

All Spinning Up algorithm implementations use an EpochLogger.

Examples

First, let's look at a simple example of how an EpochLogger keeps track of a diagnostic value:

>>> from spinup.utils.logx import EpochLogger
>>> epoch_logger = EpochLogger()
>>> for i in range(10):
        epoch_logger.store(Test=i)
>>> epoch_logger.log_tabular('Test', with_min_and_max=True)
>>> epoch_logger.dump_tabular()
-------------------------------------
|     AverageTest |             4.5 |
|         StdTest |            2.87 |
|         MaxTest |               9 |
|         MinTest |               0 |
-------------------------------------

The store method is used to save all values of Test to the epoch_logger's internal state. Then, when log_tabular is called, it computes the average, standard deviation, min, and max of Test over all of the values in the internal state. The internal state is wiped clean after the call to log_tabular (to prevent leakage into the statistics at the next epoch). Finally, dump_tabular is called to write the diagnostics to file and to stdout.
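
Because the internal state is wiped after each log_tabular call, the same key can be reused epoch after epoch without values from one epoch leaking into the next. A minimal sketch of that pattern, continuing with the epoch_logger from above (the values are made up for illustration):

>>> for epoch in range(2):
        for i in range(10):
            epoch_logger.store(Test=i + 100*epoch)
        epoch_logger.log_tabular('Test', average_only=True)
        epoch_logger.dump_tabular()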

Next, let's look at a full training procedure with logging, to highlight configuration and model saving as well as diagnostic logging:

import numpy as np
import tensorflow as tf
import time
from spinup.utils.logx import EpochLogger


def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(x, units=h, activation=activation)
    return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)


# Simple script for training an MLP on MNIST.
def train_mnist(steps_per_epoch=100, epochs=5,
                lr=1e-3, layers=2, hidden_size=64,
                logger_kwargs=dict(), save_freq=1):

    logger = EpochLogger(**logger_kwargs)
    logger.save_config(locals())

    # Load and preprocess MNIST data
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 28*28) / 255.0

    # Define inputs & main outputs from computation graph
    x_ph = tf.placeholder(tf.float32, shape=(None, 28*28))
    y_ph = tf.placeholder(tf.int32, shape=(None,))
    logits = mlp(x_ph, hidden_sizes=[hidden_size]*layers + [10], activation=tf.nn.relu)
    predict = tf.argmax(logits, axis=1, output_type=tf.int32)

    # Define loss function, accuracy, and training op
    y = tf.one_hot(y_ph, 10)
    loss = tf.losses.softmax_cross_entropy(y, logits)
    acc = tf.reduce_mean(tf.cast(tf.equal(y_ph, predict), tf.float32))
    train_op = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)

    # Prepare session
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    # Setup model saving
    logger.setup_tf_saver(sess, inputs={'x': x_ph},
                                outputs={'logits': logits, 'predict': predict})

    start_time = time.time()

    # Run main training loop
    for epoch in range(epochs):
        for t in range(steps_per_epoch):
            idxs = np.random.randint(0, len(x_train), 32)
            feed_dict = {x_ph: x_train[idxs],
                         y_ph: y_train[idxs]}
            outs = sess.run([loss, acc, train_op], feed_dict=feed_dict)
            logger.store(Loss=outs[0], Acc=outs[1])

        # Save model
        if (epoch % save_freq == 0) or (epoch == epochs-1):
            logger.save_state(state_dict=dict(), itr=None)

        # Log info about epoch
        logger.log_tabular('Epoch', epoch)
        logger.log_tabular('Acc', with_min_and_max=True)
        logger.log_tabular('Loss', average_only=True)
        logger.log_tabular('TotalGradientSteps', (epoch+1)*steps_per_epoch)
        logger.log_tabular('Time', time.time()-start_time)
        logger.dump_tabular()

if __name__ == '__main__':
    train_mnist()

In this example, observe that:

  • logger.save_config(locals()) is called once at the top of train_mnist, saving the hyperparameter configuration to a JSON file in the output directory.
  • logger.setup_tf_saver is called once, after the session is created but before training begins, to tell the logger which input and output tensors to save with the model.
  • Diagnostics are saved to the logger's internal state at every training step via logger.store.
  • logger.save_state is called once every save_freq epochs (and on the final epoch) to save the trained model; since itr=None, each save overwrites the previous one.
  • At the end of each epoch, logger.log_tabular and logger.dump_tabular compute and write out the diagnostics for the epoch; the keys passed to log_tabular ('Acc', 'Loss') match the keys used with logger.store.

Logging and MPI

You Should Know

Several algorithms in RL are easily parallelized by using MPI to average gradients and/or other key quantities. The Spinning Up loggers are designed to be well-behaved when using MPI: things will only get written to stdout from the process with rank 0. But information from other processes isn't lost if you're using an EpochLogger: data passed into the EpochLogger via store, regardless of which process it's stored in, gets used to compute the average/std/min/max of a diagnostic.
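
As a rough sketch of what this looks like in practice, the snippet below stores a different value in each of four MPI processes and then logs statistics over all of them. It assumes the mpi_fork and proc_id helpers from spinup.utils.mpi_tools; the per-process values are made up for illustration:

from spinup.utils.logx import EpochLogger
from spinup.utils.mpi_tools import mpi_fork, proc_id

mpi_fork(4)  # relaunch this script across 4 MPI processes

logger = EpochLogger()
# Each process stores its own local value under the same key...
logger.store(Reward=float(proc_id()))
# ...but the statistics are computed over the values from all processes,
# and only the rank-0 process writes the table to stdout and file.
logger.log_tabular('Reward', with_min_and_max=True)
logger.dump_tabular()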

Logger Classes

class spinup.utils.logx.Logger(output_dir=None, output_fname='progress.txt', exp_name=None)[source]

A general-purpose logger.

Makes it easy to save diagnostics, hyperparameter configurations, the state of a training run, and the trained model.

__init__(output_dir=None, output_fname='progress.txt', exp_name=None)[source]

Initialize a Logger.

Parameters:
  • output_dir (string) – A directory for saving results to. If None, defaults to a temp directory of the form /tmp/experiments/somerandomnumber.
  • output_fname (string) – Name for the tab-separated-value file containing metrics logged throughout a training run. Defaults to progress.txt.
  • exp_name (string) – Experiment name. If you run multiple training runs and give them all the same exp_name, the plotter will know to group them. (Use case: if you run the same hyperparameter configuration with multiple random seeds, you should give them all the same exp_name.)
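
For instance, creating a Logger that writes to a specific results directory might look like this (the directory and experiment name are hypothetical):

from spinup.utils.logx import Logger

logger = Logger(output_dir='data/my_experiment', exp_name='my_experiment')
logger.log('Logger initialized.')  # prints a colorized message to stdout
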
dump_tabular()[source]

Write all of the diagnostics from the current iteration.

Writes both to stdout, and to the output file.

log(msg, color='green')[source]

Print a colorized message to stdout.

log_tabular(key, val)[source]

Log a value of some diagnostic.

Call this only once for each diagnostic quantity, each iteration. After using log_tabular to store values for each diagnostic, make sure to call dump_tabular to write them out to file and stdout (otherwise they will not get saved anywhere).
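
A minimal sketch of that pattern with the base Logger (the diagnostic names are illustrative):

for iteration in range(3):
    logger.log_tabular('Iteration', iteration)
    logger.log_tabular('SomeDiagnostic', 0.1 * iteration)
    logger.dump_tabular()  # writes this iteration's row to file and stdout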

save_config(config)[source]

Log an experiment configuration.

Call this once at the top of your experiment, passing in all important config vars as a dict. This will serialize the config to JSON, while handling anything which can’t be serialized in a graceful way (writing as informative a string as possible).

Example use:

logger = EpochLogger(**logger_kwargs)
logger.save_config(locals())
save_state(state_dict, itr=None)[source]

Saves the state of an experiment.

To be clear: this is about saving state, not logging diagnostics. All diagnostic logging is separate from this function. This function will save whatever is in state_dict—usually just a copy of the environment—and the most recent parameters for the model you previously set up saving for with setup_tf_saver.

Call with any frequency you prefer. If you only want to maintain a single state and overwrite it at each call with the most recent version, leave itr=None. If you want to keep all of the states you save, provide unique (increasing) values for ‘itr’.

Parameters:
  • state_dict (dict) – Dictionary containing essential elements to describe the current state of training.
  • itr – An int, or None. Current iteration of training.
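
For example, to keep every snapshot instead of overwriting a single one, you might pass the epoch index as itr. This is a sketch: env here is a hypothetical environment object, and the model must already have been registered via setup_tf_saver:

if (epoch % save_freq == 0) or (epoch == epochs - 1):
    # Unique, increasing itr values keep all snapshots on disk;
    # itr=None would overwrite a single snapshot instead.
    logger.save_state(state_dict=dict(env=env), itr=epoch)
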
setup_tf_saver(sess, inputs, outputs)[source]

Set up easy model saving for tensorflow.

Call once, after defining your computation graph but before training.

参数:
  • sess – The Tensorflow session in which you train your computation graph.
  • inputs (dict) – A dictionary that maps from keys of your choice to the tensorflow placeholders that serve as inputs to the computation graph. Make sure that all of the placeholders needed for your outputs are included!
  • outputs (dict) – A dictionary that maps from keys of your choice to the outputs from your computation graph.
class spinup.utils.logx.EpochLogger(*args, **kwargs)[source]

Bases: spinup.utils.logx.Logger

A variant of Logger tailored for tracking average values over epochs.

Typical use case: there is some quantity which is calculated many times throughout an epoch, and at the end of the epoch, you would like to report the average / std / min / max value of that quantity.

With an EpochLogger, each time the quantity is calculated, you would use

epoch_logger.store(NameOfQuantity=quantity_value)

to load it into the EpochLogger’s state. Then at the end of the epoch, you would use

epoch_logger.log_tabular(NameOfQuantity, **options)

to record the desired values.

get_stats(key)[source]

Lets an algorithm ask the logger for mean/std/min/max of a diagnostic.
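
A sketch of an algorithm querying the logger directly (this assumes get_stats returns a (mean, std, min, max) tuple computed across MPI processes; the key and threshold are illustrative):

mean_kl, std_kl, min_kl, max_kl = epoch_logger.get_stats('KL')
if mean_kl > 1.5 * target_kl:
    print('Early stopping: KL divergence grew too large.')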

log_tabular(key, val=None, with_min_and_max=False, average_only=False)[source]

Log a value or possibly the mean/std/min/max values of a diagnostic.

Parameters:
  • key (string) – The name of the diagnostic. If you are logging a diagnostic whose state has previously been saved with store, the key here has to match the key you used there.
  • val – A value for the diagnostic. If you have previously saved values for this key via store, do not provide a val here.
  • with_min_and_max (bool) – If true, log min and max values of the diagnostic over the epoch.
  • average_only (bool) – If true, do not log the standard deviation of the diagnostic over the epoch.
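
For example, the two modes look like this in practice (the key names and the epoch variable are illustrative):

epoch_logger.store(Loss=0.25)                        # stored earlier in the epoch
epoch_logger.log_tabular('Loss', average_only=True)  # stats from stored values; no val
epoch_logger.log_tabular('Epoch', epoch)             # direct value; no prior store
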
store(**kwargs)[source]

Save something into the epoch_logger’s current state.

Provide an arbitrary number of keyword arguments with numerical values.

Loading Saved Graphs

spinup.utils.logx.restore_tf_graph(sess, fpath)[source]

Loads graphs saved by Logger.

Will output a dictionary whose keys and values are from the ‘inputs’ and ‘outputs’ dict you specified with logger.setup_tf_saver().

Parameters:
  • sess – A Tensorflow session.
  • fpath – Filepath to save directory.
Returns:

A dictionary mapping from keys to tensors in the computation graph loaded from fpath.

When you use this method to restore a graph saved by a Spinning Up implementation, you can minimally expect it to include the following:

  • x – Tensorflow placeholder for state input.
  • pi – Samples an action from the agent, conditioned on states in x.

The relevant value functions for an algorithm are also typically stored. For details of what else gets saved by a given algorithm, see its documentation page.
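
For instance, restoring a saved policy and sampling an action from it might look like the sketch below (the save directory path is hypothetical, and obs is assumed to be a NumPy observation vector from the environment):

import tensorflow as tf
from spinup.utils.logx import restore_tf_graph

sess = tf.Session()
model = restore_tf_graph(sess, '/path/to/output_dir/simple_save')

# 'x' and 'pi' are among the keys saved by Spinning Up implementations.
action = sess.run(model['pi'], feed_dict={model['x']: obs.reshape(1, -1)})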