AIC Seminar Series
Reinforcement Learning by Policy Search
Leon Peshkin, Harvard University
Date: 2004-04-05 at 10:00
Location: EJ228
Teaching is hard, criticizing is easy. This metaphor stands behind the concept
of reinforcement learning as opposed to supervised learning. Reinforcement
learning means learning a policy (a mapping of observations into
actions) based on feedback from the environment. Learning can be viewed as
browsing a set of policies while evaluating them by trial through interaction
with the environment. In this talk I briefly review the framework of
reinforcement learning and present two highlights from my dissertation.
First, I describe an algorithm which learns by ascending the gradient of
expected cumulative reinforcement. I show what conditions enable experience
re-use in learning. Building on statistical learning theory, I address the
question of sufficient experience for uniform convergence of policy
evaluation and obtain sample complexity bounds. Second, I demonstrate an
application of the proposed algorithm to the complex domain of simulated
adaptive packet routing in a telecommunication network. I conclude by
suggesting how to build an intelligent agent and where to apply
reinforcement learning in computer vision and natural language processing.
Keywords: MDP, POMDP, policy search, gradient methods, reinforcement
learning, adaptive systems, stochastic control, adaptive behavior.
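As a rough illustration of what "ascending the gradient of expected
cumulative reinforcement" can look like, here is a minimal sketch of a
likelihood-ratio (REINFORCE-style) policy search on a two-armed bandit. It is
not the algorithm from the dissertation: the environment, the reward
probabilities, the running-average baseline, and the names sigmoid and train
are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    """Logistic function; maps the policy parameter to P(action = 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train(episodes=5000, lr=0.1, seed=0):
    """Stochastic gradient ascent on expected reward for a one-parameter policy.

    Returns the learned probability of choosing arm 1.
    """
    rng = random.Random(seed)
    reward_prob = (0.2, 0.8)  # hypothetical environment: arm 1 pays off more often
    theta = 0.0               # policy parameter; P(arm 1) = sigmoid(theta)
    baseline = 0.0            # running average of reward, used to reduce variance

    for _ in range(episodes):
        p1 = sigmoid(theta)
        action = 1 if rng.random() < p1 else 0
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0

        # d/dtheta log pi(action) for a Bernoulli policy with P(1) = sigmoid(theta)
        grad_log_pi = action - p1

        # Step along a sampled estimate of the gradient of expected reward.
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)

    return sigmoid(theta)


if __name__ == "__main__":
    print(f"P(choose arm 1) after training: {train():.3f}")
```

The policy is evaluated only by trial: each update uses a single sampled
action and the feedback it produced, with no supervised signal saying which
arm was correct.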
Please arrive at least 10 minutes early, as you will need to sign in by
following the instructions by the lobby phone at Building E. SRI is located
at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the parking
lots off Fourth Street. Detailed directions to SRI, as well as maps, are
available from the Visiting AIC web page.
There are two entrances to SRI International located on Ravenswood Ave.
Please check the Building E entrance signage.
©2014 SRI International 333 Ravenswood Avenue, Menlo Park, CA 94025-3493