This file records all major updates and new features, starting from version 0.5. Since Tensorforce is still under active development, updates and bug fixes to the internal architecture are implemented continuously and are not tracked here in detail.
- New agent argument `tracking` and corresponding function `tracked_tensors()` to track and retrieve the current value of predefined tensors, similar to `summarizer` for TensorBoard summaries
- New experimental values `trace_decay` and `gae_decay` for Tensorforce agent argument `reward_estimation`, soon for other agent types as well
- New options `"early"` and `"late"` for value `estimate_advantage` of Tensorforce agent argument `reward_estimation`
- Changed default value for `Agent.act()` argument `deterministic` from `False` to `True`
- New option for `Function` layer argument `function` to pass a string function expression with argument "x", e.g. `"(x+1.0)/2.0"`
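As a minimal sketch of what such a string function expression computes: it is an expression in a single variable "x". Tensorforce compiles the expression into TensorFlow operations; the plain-Python `make_function` helper below is purely illustrative and not part of the library.

```python
# Illustration only: semantics of a string function expression in "x".
# Tensorforce compiles such expressions to TensorFlow ops; eval() is
# used here just to show what the expression computes numerically.
def make_function(expression):
    """Turn an expression string in 'x' into a callable (sketch)."""
    return lambda x: eval(expression, {"__builtins__": {}}, {"x": x})

f = make_function("(x+1.0)/2.0")
print(f(0.0))  # 0.5
print(f(3.0))  # 2.0
```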
- New summary `episode-length` recorded as part of summary label "reward"
- Support for vectorized parallel environments via new function `Environment.is_vectorizable()` and new argument `num_parallel` for `Environment.reset()`
    - See `tensorforce/environments/cartpole.py` for a vectorizable environment example
    - `Runner` uses vectorized parallelism by default if `num_parallel > 1`, `remote=None` and the environment supports vectorization
    - See `examples/act_observe_vectorized.py` for more details on the act-observe interaction
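The vectorized act-observe pattern can be sketched as follows: a single act/observe round handles `num_parallel` environment copies at once. `ToyVecEnv` and the random "policy" are illustrative stand-ins, not Tensorforce API.

```python
import random

# Toy stand-in for a vectorized environment: reset() takes num_parallel
# and every step() consumes/produces batched actions, states, terminals
# and rewards. Each copy runs for exactly 5 steps with reward 1.0.
class ToyVecEnv:
    def reset(self, num_parallel):
        self.steps = [0] * num_parallel
        return list(range(num_parallel)), [0.0] * num_parallel  # indices, states

    def step(self, actions):
        rewards = [1.0 for _ in actions]
        self.steps = [s + 1 for s in self.steps]
        terminals = [s >= 5 for s in self.steps]
        states = [0.0 for _ in actions]
        return states, terminals, rewards

env = ToyVecEnv()
indices, states = env.reset(num_parallel=4)
total_reward = 0.0
done = False
while not done:
    actions = [random.choice([0, 1]) for _ in states]  # batched "policy"
    states, terminals, rewards = env.step(actions)
    total_reward += sum(rewards)
    done = all(terminals)
print(total_reward)  # 20.0 (4 copies x 5 steps x reward 1.0)
```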
- New extended and vectorizable custom CartPole environment via key `custom_cartpole` (work in progress)
- New environment argument `reward_shaping` to provide a simple way to modify/shape rewards of an environment, can be specified either as a callable or a string function expression
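The two ways of specifying reward shaping described above can be sketched in plain Python. The `shape_reward` helper and the variable name `reward` inside the string expression are assumptions for illustration, not Tensorforce internals.

```python
# Sketch of the two reward_shaping variants: a callable applied to the
# reward, or a string expression (here assumed to use the name "reward").
def shape_reward(reward, shaping):
    if callable(shaping):
        return shaping(reward)
    # string function expression, evaluated with the original reward bound
    return eval(shaping, {"__builtins__": {}}, {"reward": reward})

print(shape_reward(2.0, lambda r: r - 1.0))  # 1.0
print(shape_reward(2.0, "reward / 4.0"))     # 0.5
```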
- New option for command line arguments `--checkpoints` and `--summaries` to add a comma-separated checkpoint/summary filename in addition to the directory
- Added episode lengths to the logging plot besides episode returns
- Temporal horizon handling of RNN layers
- Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
- GPU problems with scatter operations
- Critical bugfix for DQN variants and DPG agent
- Removed default value `"adam"` for Tensorforce agent argument `optimizer` (since default optimizer argument `learning_rate` removed, see below)
- Removed option `"minimum"` for Tensorforce agent argument `memory`, use `None` instead
- Changed default value for `dqn`/`double_dqn`/`dueling_dqn` agent argument `huber_loss` from `0.0` to `None`
- Removed default value `0.999` for `exponential_normalization` layer argument `decay`
- Added new layer `batch_normalization` (generally should only be used for the agent arguments `reward_processing[return_processing]` and `reward_processing[advantage_processing]`)
- Added `exponential/instance_normalization` layer argument `only_mean` with default `False`
- Added `exponential/instance_normalization` layer argument `min_variance` with default `1e-4`
- Removed default value `1e-3` for optimizer argument `learning_rate`
- Changed default value for optimizer argument `gradient_norm_clipping` from `1.0` to `None` (no gradient clipping)
- Added new optimizer `doublecheck_step` and corresponding argument `doublecheck_update` for the optimizer wrapper
- Removed `linesearch_step` optimizer argument `accept_ratio`
- Removed `natural_gradient` optimizer argument `return_improvement_estimate`
- Added option to specify agent argument `saver` as string, which is interpreted as `saver[directory]` with otherwise default values
- Added default value for agent argument `saver[frequency]` as `10` (save model every 10 updates by default)
- Changed default value of agent argument `saver[max_checkpoints]` from `5` to `10`
- Added option to specify agent argument `summarizer` as string, which is interpreted as `summarizer[directory]` with otherwise default values
- Renamed option of agent argument `summarizer` from `summarizer[labels]` to `summarizer[summaries]` (the term "label" stems from an earlier version and is outdated and confusing by now)
- Changed interpretation of agent argument `summarizer[summaries] = "all"` to include only numerical summaries, so all summaries except "graph"
- Changed default value of agent argument `summarizer[summaries]` from `["graph"]` to `"all"`
- Changed default value of agent argument `summarizer[max_summaries]` from `5` to `7` (number of different colors in TensorBoard)
- Added option `summarizer[filename]` to agent argument `summarizer`
- Added option to specify agent argument `recorder` as string, which is interpreted as `recorder[directory]` with otherwise default values
- Added `--checkpoints`/`--summaries`/`--recordings` command line arguments to enable saver/summarizer/recorder agent argument specification separate from the core agent configuration
- Added `save_load_agent.py` example script to illustrate regular agent saving and loading
- Fixed problem with optimizer argument `gradient_norm_clipping` not being applied correctly
- Fixed problem with `exponential_normalization` layer not updating moving mean and variance correctly
- Fixed problem with `recent` memory for timestep-based updates sometimes sampling invalid memory indices
- Removed agent arguments `execution`, `buffer_observe`, `seed`
- Renamed agent arguments `baseline_policy`/`baseline_network`/`critic_network` to `baseline`/`critic`
- Renamed agent `reward_estimation` arguments `estimate_horizon` to `predict_horizon_values`, `estimate_actions` to `predict_action_values`, `estimate_terminal` to `predict_terminal_values`
- Renamed agent argument `preprocessing` to `state_preprocessing`
- Default agent preprocessing: `linear_normalization`
- Moved agent arguments for reward/return/advantage processing from `preprocessing` to `reward_preprocessing` and `reward_estimation[return_/advantage_processing]`
- New agent argument `config` with values `buffer_observe`, `enable_int_action_masking`, `seed`
- Renamed PPO/TRPO/DPG argument `critic_network`/`_optimizer` to `baseline`/`baseline_optimizer`
- Renamed PPO argument `optimization_steps` to `multi_step`
- New TRPO argument `subsampling_fraction`
- Changed agent argument `use_beta_distribution` default to `False`
- Added double DQN agent (`double_dqn`)
- Removed `Agent.act()` argument `evaluation`
- Removed agent function arguments `query` (functionality removed)
- Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): `save`/`load` functions and `saver` argument changed
- Default behavior when specifying `saver` is not to load the agent, unless the agent is created via `Agent.load`
- Agent summarizer functionality changed: `summarizer` argument changed, some summary labels and other options removed
summarizerargument changed, some summary labels and other options removed - Renamed RNN layers
internal_{rnn/lstm/gru}tornn/lstm/gruandrnn/lstm/grutoinput_{rnn/lstm/gru} - Renamed
autonetwork argumentinternal_rnntornn - Renamed
(internal_)rnn/lstm/grulayer argumentlengthtohorizon - Renamed
update_modifier_wrappertooptimizer_wrapper - Renamed
optimizing_steptolinesearch_step, andUpdateModifierWrapperargumentoptimizing_iterationstolinesearch_iterations - Optimizer
subsampling_stepaccepts both absolute (int) and relative (float) fractions - Objective
policy_gradientargumentratio_basedrenamed toimportance_sampling - Added objectives
state_valueandaction_value - Added
Gaussiandistribution argumentsglobal_stddevandbounded_transform(for improved bounded action space handling) - Changed default memory
deviceargument toCPU:0 - Renamed rewards summaries
Agent.create()accepts act-function asagentargument for recording- Singleton states and actions are now consistently handled as singletons
- Major change to policy handling and defaults, in particular `parametrized_distributions`, new default policies `parametrized_state/action_value`
- Combined `long` and `int` type
- Environment is always wrapped in the `EnvironmentWrapper` class
- Changed `tune.py` arguments
- Changed independent mode of `agent.act` to use final values of dynamic hyperparameters and avoid TensorFlow conditions
- Extended `"tensorflow"` format of `agent.save` to include an optimized Protobuf model with an act-only graph as `.pb` file, and `Agent.load` format `"pb-actonly"` to load an act-only agent based on the Protobuf model
- Support for custom summaries via new `summarizer` argument value `custom` to specify the summary type, and `Agent.summarize(...)` to record summary values
- Added min/max-bounds for dynamic hyperparameters to assert a valid range and infer other arguments
- Argument `batch_size` now mandatory for all agent classes
- Removed `Estimator` argument `capacity`, now always automatically inferred
- Internal changes related to agent arguments `memory`, `update` and `reward_estimation`
- Changed the default `bias` and `activation` arguments of some layers
- Fixed issues with the `sequence` preprocessor
- DQN and dueling DQN properly constrained to `int` actions only
- Added `use_beta_distribution` argument with default `True` to many agents and the `ParametrizedDistributions` policy, so the default can be changed
- DQN/DuelingDQN/DPG argument `memory` now required to be specified explicitly, plus `update_frequency` default changed
- Removed (temporarily) `conv1d/conv2d_transpose` layers due to TensorFlow gradient problems
- `Agent`, `Environment` and `Runner` can now be imported via `from tensorforce import ...`
- New generic reshape layer available as `reshape`
- Support for batched versions of `Agent.act` and `Agent.observe`
- Support for parallelized remote environments based on Python's `multiprocessing` and `socket` (replacing `tensorforce/contrib/socket_remote_env/` and `tensorforce/environments/environment_process_wrapper.py`), available via `Environment.create(...)`, `Runner(...)` and `run.py`
- Removed `ParallelRunner` and merged functionality with `Runner`
- Changed `run.py` arguments
- Changed independent mode for `Agent.act`: additional argument `internals` and corresponding return value, initial internals via `Agent.initial_internals()`, `Agent.reset()` not required anymore
- Removed `deterministic` argument for `Agent.act` unless in independent mode
- Added `format` argument to `save`/`load`/`restore` with supported formats `tensorflow`, `numpy` and `hdf5`
- Changed `save` argument `append_timestep` to `append` with default `None` (instead of `'timesteps'`)
- Added `get_variable` and `assign_variable` agent functions
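The independent-mode calling pattern described above (pass `internals` in, get updated `internals` back, seed them from `initial_internals()`) can be sketched with a pure-Python mock. `MockAgent` is an illustrative stand-in for a Tensorforce agent with a recurrent policy, not library code.

```python
# Mock illustrating the independent-mode act() interface: internals are
# threaded through successive calls instead of being stored by reset().
class MockAgent:
    def initial_internals(self):
        return {"rnn_state": 0}

    def act(self, states, internals, independent=True):
        # a real agent would run its policy network here
        action = states > 0
        internals = {"rnn_state": internals["rnn_state"] + 1}
        return action, internals

agent = MockAgent()
internals = agent.initial_internals()
for state in [1, -1, 2]:
    action, internals = agent.act(states=state, internals=internals, independent=True)
print(internals)  # {'rnn_state': 3}
```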
- Added optional `memory` argument to various agents
- Improved summary labels, particularly `"entropy"` and `"kl-divergence"`
- `linear` layer now accepts tensors of rank 1 to 3
- Network output / distribution input does not need to be a vector anymore
- Transposed convolution layers (`conv1d/2d_transpose`)
- Parallel execution functionality contributed by @jerabaul29, currently under `tensorforce/contrib/`
- Accept string for runner `save_best_agent` argument to specify a best-model directory different from the `saver` configuration
- `saver` argument `steps` removed and `seconds` renamed to `frequency`
- Moved `Parallel/Runner` argument `max_episode_timesteps` from `run(...)` to the constructor
- New `Environment.create(...)` argument `max_episode_timesteps`
- TensorFlow 2.0 support
- Improved TensorBoard summaries recording
- Summary labels `graph`, `variables` and `variables-histogram` temporarily not working
- TF optimizers updated to TensorFlow 2.0 Keras optimizers
- Added TensorFlow Addons dependency, and support for TFA optimizers
- Changed unit of `target_sync_frequency` from timesteps to updates for `dqn` and `dueling_dqn` agents
- Improved unittest performance
- Added `updates` counter and renamed `timesteps`/`episodes` counters for agents and runners
- Renamed `critic_{network,optimizer}` argument to `baseline_{network,optimizer}`
- Added Actor-Critic (`ac`), Advantage Actor-Critic (`a2c`) and Dueling DQN (`dueling_dqn`) agents
- Improved "same" baseline optimizer mode and added optional weight specification
- Reuse layer now global for parameter sharing across modules
- New block layer type (`block`) for easier sharing of layer blocks
- Renamed `PolicyAgent/-Model` to `TensorforceAgent/-Model`
- New `Agent.load(...)` function, saving includes the agent specification
- Removed `PolicyAgent` argument `(baseline-)network`
- Added policy argument `temperature`
- Removed `"same"` and `"equal"` options for `baseline_*` arguments and changed internal baseline handling
- Combined `state/action_value` into a `value` objective with argument `value` either `"state"` or `"action"`
- Fixed setup.py packages value
- DQFDAgent removed (temporarily)
- DQNNstepAgent and NAFAgent part of DQNAgent
- Agents need to be initialized via `agent.initialize()` before application
- States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
- `Agent.from_spec()` changed and renamed to `Agent.create()`
- `Agent.act()` argument `fetch_tensors` changed and renamed to `query`; `index` renamed to `parallel`; `buffered` removed
- `Agent.observe()` argument `index` renamed to `parallel`
- `Agent.atomic_observe()` removed
- `Agent.save/restore_model()` renamed to `Agent.save/restore()`
- Agent arguments: `update_mode` renamed to `update`; `states_preprocessing` and `reward_preprocessing` changed and combined to `preprocessing`; `actions_exploration` changed and renamed to `exploration`; `execution` entry `num_parallel` replaced by a separate argument `parallel_interactions`; `batched_observe` and `batching_capacity` replaced by argument `buffer_observe`; `scope` renamed to `name`
- DQN agent arguments: `update_mode` replaced by `batch_size`, `update_frequency` and `start_updating`; `optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added; `memory` defines capacity of implicitly defined memory `'replay'`; `double_q_model` removed (temporarily)
- Policy gradient agent arguments: new mandatory argument `max_episode_timesteps`; `update_mode` replaced by `batch_size` and `update_frequency`; `memory` removed; `baseline_mode` removed; `baseline` argument changed and renamed to `critic_network`; `baseline_optimizer` renamed to `critic_optimizer`; `gae_lambda` removed (temporarily)
- PPO agent arguments: `step_optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added
- TRPO agent arguments: `cg_*` and `ls_*` arguments removed
- VPG agent arguments: `optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added
- Environment properties `states` and `actions` are now functions `states()` and `actions()`
- States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
- New function `Environment.max_episode_timesteps()`
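A minimal sketch of the changed environment interface: spec properties become methods, and `int` types carry `num_values`. `CoinFlipEnv` is an illustrative stand-in written in plain Python, not a runnable Tensorforce environment.

```python
# Sketch of the new environment interface: states/actions are methods
# returning spec dicts, and int actions use num_values (not num_actions).
class CoinFlipEnv:
    def states(self):
        return dict(type="float", shape=(2,))

    def actions(self):
        # `num_values` instead of the former `num_actions`
        return dict(type="int", num_values=2)

    def max_episode_timesteps(self):
        return 100

env = CoinFlipEnv()
print(env.actions())  # {'type': 'int', 'num_values': 2}
```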
- ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to `tensorforce.environments`
- Other environment implementations removed (may be upgraded in the future)
- Improved `run()` API for `Runner` and `ParallelRunner`
- `ThreadedRunner` removed
- `examples` folder (including `configs`) removed, apart from `quickstart.py`
- New `benchmarks` folder to replace parts of the old `examples` folder