Research on Adaptive Optimal Control for Nonlinear Systems Based on Reinforcement Learning (基于强化学习的非线性系统自适应优化控制研究)

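Title	基于强化学习的非线性系统自适应优化控制研究
Author	Yang Xiong (杨雄)
Degree type	Doctor of Engineering
Defense date	2014-05-20
Degree grantor	University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Liu Derong (刘德荣)
Keywords	Nonlinear system; reinforcement learning; neural network; optimal control; intelligent control
Alternative title	Research on Adaptive Optimal Control for Nonlinear Systems Based on Reinforcement Learning
Degree discipline	Control Theory and Control Engineering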
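Abstract (translated from Chinese)	The optimal control of nonlinear systems has long been an active research area in control theory and engineering applications. In seeking methods for this class of problems, researchers gradually established theories such as the calculus of variations, the maximum principle, and dynamic programming. These theories usually require the nonlinear system to have certain properties, such as a known mathematical model and a clear system structure. However, with the diversification of controlled plants, the growing complexity of state spaces, and the uncertainty of dynamic systems, these theories have become increasingly difficult to apply in research on modern intelligent control theory. They also have shortcomings of their own: the calculus of variations has difficulty handling problems with control constraints; the maximum principle yields only necessary conditions for optimal control and cannot solve the optimal control problem for general nonlinear systems; and dynamic programming is prone to the "curse of dimensionality" when solving for the optimal control. To overcome these shortcomings, reinforcement learning theory, which is based on the idea of dynamic programming, was established and has gradually developed into an important part of modern intelligent control theory. Reinforcement learning is a relatively novel and effective method for studying intelligent systems and has broad application prospects, so it has received close attention from many researchers and engineers. At present its theoretical framework still needs to be deepened and improved, and many problems remain to be solved when reinforcement learning is used to study the optimal control of nonlinear systems. This dissertation therefore uses reinforcement learning methods to further study the optimal control of nonlinear systems, builds reinforcement learning algorithm structures for different dynamic systems, and extends and attempts to improve the theoretical framework of reinforcement learning. The main work and contributions are reflected in the following five aspects: 1. Building on reinforcement learning methods, a new actor-critic algorithm structure is proposed for the tracking control of a class of unknown multi-input multi-output non-affine nonlinear discrete-time systems. Unlike traditional approaches to unknown systems that rely on an identifier, the proposed method does not require system identification. It combines feedback linearization with reinforcement learning theory to achieve online control of unknown non-affine discrete-time systems. First, with the help of the implicit function theorem, a controller is constructed to cancel the nonlinear part of the unknown system. Then, in designing the actor-critic structure, this controller is designed as the output of the actor network. Meanwhile, a utility function is introduced to evaluate the tracking performance of the system, and the accumulated utility serves as the output of the critic network. Finally, Lyapunov theory is used to prove that the tracking error and the neural network weights are uniformly ultimately bounded, and that the tracking error can be made to converge to a sufficiently small neighborhood of zero by tuning parameters. 2. Using reinforcement learning methods, the approximate optimal control of a class of unknown nonlinear continuous-time systems with control constraints is studied. The structure of the nonlinear system is uncertain: it may be either affine or non-affine. First, a recurrent neural network identifies the unknown system and transforms it into an affine nonlinear system with a robust term. Next, a non-quadratic cost function is introduced to convert the control-constrained problem into an unconstrained one. Then the optimal controller is designed using the actor-critic structure typical of reinforcement learning. Unlike the traditional actor-critic structure, in which the network weights are updated alternately, this algorithm updates the actor and critic network weights simultaneously. In addition, because system identification and optimal controller design are two separate processes, the algorithm is an offline method. Finally, simulation experiments verify that the method obtains the approximate optimal control for this class of systems and effectively handles the control constraints. 3. Based on the reinforcement learning structure, an identifier-critic algorithm structure is derived to study the optimal control of partially unknown affine nonlinear continuous-time systems with saturating actuators. First, ...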
English abstract	The optimal control problem of nonlinear systems has drawn intensive attention in control theory and engineering applications. To solve this problem, many significant methods have been developed, such as the calculus of variations, Pontryagin's maximum principle, and dynamic programming (DP). These approaches generally require nonlinear systems to meet certain conditions, for instance, that the mathematical models are available and the structures of the dynamic systems are known. Nevertheless, with the diversity of controlled systems, the complexity of the state space, and the uncertainties of dynamic systems, these methods no longer work well for developing the control theory of modern intelligent systems. Meanwhile, several intrinsic characteristics of these approaches prevent them from being widely used: the calculus of variations cannot readily handle constrained optimal control problems; Pontryagin's maximum principle provides only necessary conditions for optimal control and cannot be used to solve general optimal control problems of nonlinear systems; and DP may give rise to the "curse of dimensionality" when solving optimal control problems. To overcome the difficulties in applying these methods, reinforcement learning (RL), which is based on the theory of DP, was introduced and has developed into an important part of the field of intelligent control. RL is not only a novel and effective method for studying intelligent systems but also has promising prospects in real engineering applications. Consequently, RL has drawn considerable attention from researchers and engineers. However, the theory of RL is not yet well developed, and many difficulties remain when RL is used to cope with optimal control problems of nonlinear systems. Therefore, in this dissertation, the optimal control problem of nonlinear systems using RL is further investigated, RL methods are developed for different kinds of dynamic systems, and new theory is presented to extend and help complete the theory of RL. The main contributions of this dissertation include five aspects: 1. Based on the RL method, a novel actor-critic algorithm is developed to derive adaptive control of a class of multi-input multi-output unknown non-affine nonlinear discrete-time systems. Compared with traditional methods using identifiers...
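Language	Chinese
Other identifier	201118014628023
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/6581]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Yang Xiong. Research on Adaptive Optimal Control for Nonlinear Systems Based on Reinforcement Learning[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2014.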
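
The abstract's second contribution mentions a non-quadratic cost function that turns a control-constrained problem into an unconstrained one. This record does not say which functional the dissertation uses, so the sketch below is only an illustration under an assumption: it takes the inverse-hyperbolic-tangent form commonly used in the constrained optimal control literature, for a scalar input with bound `bound` and weight `r`. The function names and the argument `g_grad_v` (standing for the input gain times the value-function gradient) are hypothetical.

```python
import math


def nonquadratic_cost(u: float, bound: float, r: float = 1.0) -> float:
    """Assumed penalty W(u) = 2*r * integral_0^u bound*atanh(v/bound) dv.

    Closed form: 2*r*bound*u*atanh(u/bound) + r*bound^2*ln(1 - (u/bound)^2).
    Defined only for |u| < bound; behaves like r*u^2 for small u.
    """
    if abs(u) >= bound:
        raise ValueError("control must satisfy |u| < bound")
    s = u / bound
    return 2.0 * r * bound * u * math.atanh(s) + r * bound**2 * math.log(1.0 - s * s)


def bounded_control(g_grad_v: float, bound: float, r: float = 1.0) -> float:
    """Stationary point of W(u) + g_grad_v * u with respect to u.

    Setting 2*r*bound*atanh(u/bound) + g_grad_v = 0 gives
    u = -bound * tanh(g_grad_v / (2*r*bound)), which never exceeds
    the bound in magnitude, so no explicit constraint is needed.
    """
    return -bound * math.tanh(g_grad_v / (2.0 * r * bound))


if __name__ == "__main__":
    for x in (-50.0, -1.0, 0.0, 1.0, 50.0):
        print(x, bounded_control(x, bound=2.0))
```

Because the penalty grows steeply as the input approaches the bound, minimizing it yields a control that saturates smoothly through `tanh` instead of being clipped afterwards. That is the sense in which a constrained problem can be treated as an unconstrained one.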