Converting between normalized probabilities and log-probabilities
The log-derivative trick is widely used to solve stochastic optimization problems.
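In equation form, the trick is the identity

$$\nabla_\theta\, p(x;\theta) = p(x;\theta)\, \nabla_\theta \log p(x;\theta),$$

which trades the derivative of a normalized density for the derivative of its (often far more tractable) logarithm.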
Score Functions
The central computation for MLE, often used in generalized linear regression, deep learning, kernel machines, dimensionality reduction, and tensor decompositions
The expected value of the score is zero (this fact is used in the proof of the REINFORCE algorithm).
The variance of the score is the Fisher information, which is used to derive the Cramér-Rao lower bound.
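A one-line derivation of the zero-mean property, using the identity above (and assuming differentiation and integration can be interchanged):

$$\mathbb{E}_{p(x;\theta)}\!\left[\nabla_\theta \log p(x;\theta)\right] = \int p(x;\theta)\, \frac{\nabla_\theta\, p(x;\theta)}{p(x;\theta)}\, dx = \nabla_\theta \int p(x;\theta)\, dx = \nabla_\theta 1 = 0.$$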
Score Function Estimators
Estimating gradients of expectations is a recurring task in ML:
Posterior computation in VI
Value function and policy learning in RL
Derivative pricing in computational finance
Inventory control in operations research
The gradient of the expectation of a function $f$ is difficult to compute, because the integral is typically unknown and the parameters $\theta$, with respect to which we are computing the gradient, are parameters of the distribution $p(x;\theta)$.
Moreover, we may want to compute this gradient even when the function $f$ is not differentiable.
The score function estimator is an unbiased estimator of this gradient.
The function $f$ need not be differentiable; we only need to be able to evaluate it, or observe its value, for a given sample $x$.
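Concretely, the log-derivative trick pushes the gradient inside the expectation, giving an unbiased Monte Carlo estimator that only requires sampling from $p$ and evaluating $f$:

$$\nabla_\theta\, \mathbb{E}_{p(x;\theta)}[f(x)] = \mathbb{E}_{p(x;\theta)}\!\left[f(x)\, \nabla_\theta \log p(x;\theta)\right] \approx \frac{1}{N}\sum_{n=1}^{N} f\big(x^{(n)}\big)\, \nabla_\theta \log p\big(x^{(n)};\theta\big), \qquad x^{(n)} \sim p(x;\theta).$$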
Score function estimators
Likelihood ratio methods
Automated variational inference
REINFORCE and policy gradients
Any gradients of the policy that correspond to high rewards are weighted higher—reinforced—by the estimator.
The estimator was called REINFORCE, and its generalization now forms the policy gradient theorem.
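A minimal sketch of the REINFORCE update on a toy three-armed bandit (the bandit, the softmax policy, and the step size are my own illustration, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([1.0, 2.0, 3.0])  # expected reward of each arm (hypothetical)
theta = np.zeros(3)                     # policy logits
alpha = 0.1                             # learning rate

def softmax(z):
    z = z - z.max()                     # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)             # sample an action from the policy
    r = rng.normal(true_means[a], 1.0)  # observe a noisy reward
    score = -pi                         # grad_theta log pi(a) = onehot(a) - pi
    score[a] += 1.0
    theta += alpha * r * score          # REINFORCE: reward-weighted score ascent

print(softmax(theta))                   # mass should concentrate on the best arm
```

High-reward actions get their log-probability gradients weighted up, which is exactly the "reinforced" behavior described above.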
Control Variates
For an MC estimator to be effective, its variance must be as low as possible.
(The gradient will not be useful otherwise.)
Control variates: used for variance reduction in MC estimators (the baseline technique is an instance)
The choice of control variate is the principal challenge in using score function estimators.
Examples: constant baselines, clever sampling schemes (antithetic or stratified), delta methods, or adaptive baselines
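The baseline case makes this concrete: because the score has zero mean, subtracting any constant $b$ leaves the estimator unbiased, while a good choice of $b$ shrinks its variance:

$$\nabla_\theta\, \mathbb{E}_{p(x;\theta)}[f(x)] \approx \frac{1}{N}\sum_{n=1}^{N} \big(f(x^{(n)}) - b\big)\, \nabla_\theta \log p\big(x^{(n)};\theta\big).$$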
Families of Stochastic Estimators
Approaches
Differentiate the function $f$ using pathwise derivatives (PD), if it is differentiable
Differentiate the density $p(x;\theta)$ using the score function (SF)
Using a stochastic computation graph, PD and SF can be combined (providing the lowest variance); see the toy comparison below
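A toy comparison of the two families, differentiating $\mathbb{E}_{x \sim \mathcal{N}(\mu,\sigma^2)}[x^2]$ with respect to $\mu$ (true gradient $2\mu$); the setup is my own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 1.5, 1.0, 100_000
eps = rng.standard_normal(N)
x = mu + sigma * eps                # reparameterized samples x ~ N(mu, sigma^2)

f = x ** 2
sf = f * (x - mu) / sigma ** 2      # SF: f(x) * grad_mu log N(x; mu, sigma^2)
pd = 2 * x                          # PD: differentiate f along the path mu + sigma*eps

print("true grad:", 2 * mu)
print("SF  mean %.3f  var %.3f" % (sf.mean(), sf.var()))
print("PD  mean %.3f  var %.3f" % (pd.mean(), pd.var()))
```

Both estimators are unbiased here, but the pathwise estimator has far lower variance, which is why combining PD and SF in a stochastic computation graph is attractive.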
This paper is about Gaussian processes, specifically kernel search. The core idea is that complicated kernels can be composed from commonly used base kernel families: the squared exponential, periodic, linear, and rational quadratic.
Kernels used in Gaussian processes are positive semidefinite, and positive semidefiniteness is preserved under addition and multiplication, so sums and products of kernels are again valid kernels.
The kernel is the most important component of a Gaussian process: it specifies which structures are likely under the GP prior, which in turn determines the generalization properties of the model.
Example expressions of composition kernels are shown below.
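A few representative compositions (standard examples in this literature, not the paper's full table):

$k_{\text{Lin}} \times k_{\text{Lin}}$: quadratic functions
$k_{\text{SE}} \times k_{\text{Per}}$: locally periodic structure
$k_{\text{Lin}} + k_{\text{Per}}$: a periodic component on top of a linear trend

As a concrete sketch, such compositions can be written directly with scikit-learn's kernel objects (my own choice of library; the paper has its own implementation):

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF,             # squared exponential (SE)
    ExpSineSquared,  # periodic (Per)
    DotProduct,      # linear (Lin)
)

# Sums and products of valid kernels are valid kernels, so composing is just + and *.
locally_periodic = RBF(length_scale=1.0) * ExpSineSquared(length_scale=1.0, periodicity=1.0)
periodic_plus_trend = DotProduct() + ExpSineSquared()
quadratic = DotProduct() * DotProduct()

gp = GaussianProcessRegressor(kernel=locally_periodic)  # ready for gp.fit(X, y)
```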
Nonparametric regression generally struggles in high dimensions and becomes computationally inefficient. The paper's point is that learning ten one-dimensional kernels is much easier than learning one ten-dimensional kernel: when working with high-dimensional data, try to decompose a complicated kernel into simpler ones (see the sketch below).
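A numpy sketch of the additive idea (my own illustration): instead of one ten-dimensional SE kernel, sum ten one-dimensional SE kernels, one per input dimension, each with its own lengthscale:

```python
import numpy as np

def additive_se_kernel(X1, X2, lengthscales):
    """Sum of one-dimensional SE kernels, one per input dimension."""
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for d, l in enumerate(lengthscales):
        diff = X1[:, d:d + 1] - X2[:, d:d + 1].T  # pairwise differences in dim d
        K += np.exp(-diff ** 2 / (2 * l ** 2))    # 1-D SE kernel on dimension d
    return K                                      # a sum of PSD kernels is PSD

X = np.random.default_rng(0).standard_normal((5, 10))  # 10-dimensional inputs
K = additive_se_kernel(X, X, lengthscales=np.ones(10))
```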
In experiments on time-series data, the more base kernels are composed, the better the regression results. Here, 'depth' measures how many base kernels are used; at greater depth, the kernel can capture more of the relevant structure.
Compared to other methods (linear regression, generalized additive models (GAM), a GP with a standard SE kernel using automatic relevance determination (GP SE-ARD), additive GPs, and the kernel-search method of hierarchical kernel learning (HKL)), structure search outperforms them all on the high-dimensional prediction task.
This paper is Andrew Ng's Ph.D. dissertation. I chose to read it because Professor Abbeel mentioned that chapters 1 and 2 are great background on MDPs.
The 'curse of dimensionality' arises from discretizing the reinforcement learning problem. Would continuous-control actions then avoid the dimensionality problem?
‘Reward shaping’ refers to the practice of choosing or modifying a reward function to help algorithms learn.
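The form studied in the thesis is potential-based shaping, which provably leaves the optimal policy unchanged: the extra reward for a transition $s \to s'$ is

$$F(s, a, s') = \gamma\, \Phi(s') - \Phi(s),$$

where $\Phi$ is a potential function over states and $\gamma$ is the discount factor.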
The remarkable thing about partially observable MDPs (POMDPs) is that they involve belief-state tracking. Because the state is not fully observable, the agent receives observations rather than the state itself, and it maintains a belief state: a distribution representing which state we believe we are in.
In my opinion, this suggests that handling POMDP environments is closely related to Bayesian methods. In ordinary MDPs all states are observed directly and value or policy iteration can be applied; extending this, in POMDPs the state is represented by a distribution, the belief.
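Indeed, the belief update is exactly a Bayesian filtering step: after taking action $a$ in belief $b$ and receiving observation $o$ (with transition model $T$ and observation model $O$),

$$b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s).$$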
Every day I read papers on deep reinforcement learning and Gaussian processes, which are my research topics of interest; they give me research inspiration and ideas. I also check Reddit and Twitter for state-of-the-art research papers.
Within reinforcement learning, topics such as better exploration methods, continuous control, and hierarchical learning are to my taste. I became interested in Gaussian processes while researching better exploration. Uncertainty is a key issue in deep learning, and it matters greatly for reinforcement learning too.
The lists below collect the papers I am currently reading.
Deep Reinforcement Learning
Conventional RL
H. van Seijen, Effective multi-step temporal-difference learning for non-linear function approximation (2016)
K. De Asis et al., Multi-step reinforcement learning: a unifying algorithm (2017)
A. Mahmood, Incremental off-policy reinforcement learning algorithms (2017, Ph.D. thesis)
R. Sutton and A. Barto, Reinforcement learning: an introduction (2nd ed.) (2017, textbook)
MDP
A. Ng, Shaping and policy search in reinforcement learning Ch.1 & 2 (2003, Ph.D. thesis)
Deep RL
L. Lin, Reinforcement learning for robots using neural networks (1993, Ph.D. thesis)
V. Mnih et al., Human-level control through deep reinforcement learning (2015)
V. Mnih et al., Playing Atari with deep reinforcement learning (2013)
Inverse RL
C. Finn et al., A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models (2016)
B. Ziebart et al., Maximum entropy inverse reinforcement learning (2008)
Imitation learning
S. Ross et al., A reduction of imitation learning and structured prediction to no-regret online learning (2011, DAGGER)
Meta learning
Y. Duan, Meta learning (2017, Ph.D. thesis)
Continuous control
Y. Wu, E. Mansimov et al., Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (2017, ACKTR)
D. Silver et al., Deterministic policy gradient algorithms (2014, DPG)
T. Lillicrap et al., Continuous control with deep reinforcement learning (2016, DDPG)
R. Islam et al., Reproducibility of benchmarked deep reinforcement learning tasks for continuous control (2017)
N. Heess et al., Learning continuous control policies by stochastic value gradients (2015, SVG)
V. Mnih et al., Asynchronous methods for deep reinforcement learning (2016, A3C)
T. Haarnoja et al., Reinforcement learning with deep energy-based policies (2017, Soft Q-learning)
Improving exploration
I. Osband et al., Deep exploration via bootstrapped DQN (2016)
M. Plappert et al., Parameter space noise for exploration (2017)
Model-based RL
M. Deisenroth and C. Rasmussen, PILCO: A model-based and data-efficient approach to policy search (2011, PILCO)
Policy gradient
R. Sutton et al., Policy gradient methods for reinforcement learning with function approximation (2000)
J. Peters and S. Schaal, Policy gradient methods for robotics (2006)
J. Peters and S. Schaal, Reinforcement learning of motor skills with policy gradients (2008)
Gaussian Process
Uncertainty
Y. Gal, Uncertainty in deep learning (2017, Ph.D. thesis)
Z. Ghahramani, Probabilistic machine learning and artificial intelligence (2015)
N. Srivastava et al., Dropout: A simple way to prevent neural networks from overfitting (2014)
Y. Gal and Z. Ghahramani, Dropout as a Bayesian approximation: representing model uncertainty in deep learning (2016)
C. Guo et al., On calibration of modern neural networks (2017)
GP
C.E. Rasmussen and C.K.I. Williams, Gaussian processes for machine learning (2006, textbook)
C. Viroli and G.J. McLachlan, Deep Gaussian mixture models (2017)
Kernel
D. Duvenaud et al., Structure discovery in nonparametric regression through compositional kernel search (2013)
A. Wilson and R. Adams, Gaussian process kernels for pattern discovery and exploration (2013)
Mathematics
G. Moore, The emergence of open sets, closed sets, and limit points in analysis and topology (2008)
Variational inference
A. Graves, Practical variational inference for neural networks (2011)
Other topics
Generative models
C. Vondrick et al., Generating videos with scene dynamics (2016)
I. Goodfellow, NIPS 2016 tutorial: Generative adversarial networks (2017)
D. Kingma and M. Welling, Auto-encoding variational Bayes (2014, VAE)
I. Goodfellow et al., Generative adversarial nets (2014, GAN)
A. Radford, L. Metz, et al., Unsupervised representation learning with deep convolutional generative adversarial networks (2016, DCGAN)
Visual domains
M. Mathieu et al., Deep multi-scale video prediction beyond mean square error (2016)
H. Altwaijry et al., Learning to match aerial images with deep attentive architectures (2016)
M. Ranzato et al., Video (language) modeling: A baseline for generative models of natural videos (2016)
W. Lotter et al., Deep predictive coding networks for video prediction and unsupervised learning (2017)
G. Huang et al., Densely connected convolutional networks (2017, DenseNet)
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition (2015, VGGnet)
M. Lin et al., Network in network (2014)
A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks (2012)
K. He et al., Deep residual learning for image recognition (2015, ResNet)
C. Szegedy et al., Going deeper with convolutions (2015, GoogLeNet)
G. Masi et al., Pansharpening by convolutional neural networks (2016)
C. Dong et al., Image super-resolution using deep convolutional networks (2015)
A. Karpathy et al., Large-scale video classification with convolutional neural networks (2014)
M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks (2013, ZFNet)
NLP
I. Sutskever et al., Sequence to sequence learning with neural networks (2014, Seq2seq)
J. Pennington et al., GloVe: Global vectors for word representation (2014)
Y. Kim, Convolutional neural networks for sentence classification (2014)