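Title: Theory and Applications of Approximate Dynamic Programming Methods (近似动态规划方法的理论及其应用研究)
Author: Xie Zhihua (谢志华)
Degree type: Doctor of Engineering
Defense date: 1995-08-01
Degree-granting institution: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Zheng Yingping (郑应平)
Degree discipline: Control Theory and Control Engineering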
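Abstract (translated from the Chinese):

Intelligent control, which lies at the intersection of artificial intelligence, operations research, and control theory, has been a focus of researchers in China and abroad in recent years. Combining artificial intelligence with classical control theory and operations research has two benefits. First, artificial intelligence techniques can solve complex problems that the classical theories handle poorly. Second, artificial intelligence methods can draw on the theoretical frameworks of the classical methods to obtain analytical results, such as convergence results. The recently developed approximate dynamic programming methods are a typical example.

Approximate dynamic programming combines reinforcement learning, from connectionist learning theory, with the ideas of dynamic programming, and it has deep and lasting significance for both artificial intelligence and intelligent control. Building on the latest international work on the theory and application of approximate dynamic programming, this dissertation examines the theoretical framework and existing results of these methods, proposes new approximate dynamic programming methods, and studies the implementation of the learning methods through examples.

The main contributions are as follows:

Intelligent control research is surveyed. The development of reinforcement learning and its relation to supervised learning are discussed in some detail, including the current state of reinforcement learning and its applications.

The relation between optimal control theory and several classes of learning methods, including reinforcement learning and supervised learning, is examined in depth. Analytical results are given, yielding a fairly rigorous theoretical framework for these classes of learning methods.

The characteristics and shortcomings of existing approximate dynamic programming methods are analysed in detail. A new class of approximate dynamic programming methods is proposed which, combined with the existing methods, forms a structure of approximate dynamic programming methods for general Markov decision process problems.

The roles of exploration and control in learning control, and the relation between them, are discussed. A new and effective exploration method is proposed and combined with approximate dynamic programming to form an effective and practical learning method.

For the application of approximate dynamic programming, a learning algorithm for optimizing queueing systems and a direct adaptive control method are proposed, based on examples.

The implementation of approximate dynamic programming on connectionist networks is studied. A practical algorithm is built that uses a cellular automaton network model to realize the function mapping. It is applied to the control of an inverted pendulum system, and simulation data and analysis are given.

Finally, the author gives his views on open problems in current research and on directions for future work.

Research on approximate dynamic programming is still at an early stage. The work in this dissertation enriches the theory and practical implementation of these methods and is expected to help promote their further study and development.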
Abstract (English):

As an interdisciplinary field spanning artificial intelligence (AI), operations research (OR), and control theory (CT), intelligent control (IC) has been a central topic for researchers in many fields. In such interdisciplinary studies, AI techniques can be used to solve complex problems that are difficult for the classical theories. Meanwhile, theoretical frameworks and analytical results for AI methods can be obtained from the classical theories. The newly developed approach of approximate dynamic programming (ADP) is a typical example.

ADP is a result of connectionist learning research that connects reinforcement learning (RL) with the basic idea of dynamic programming. This research has a far-reaching influence on AI and IC. Building on the latest ADP research abroad, the dissertation studies ADP theory and existing results, proposes new ADP methods, and discusses applications of ADP methods using examples.

The main contributions are as follows:

IC research is summarized. The development of ADP and its relation to RL and supervised learning (SL) are discussed, including the latest research on ADP and its applications.

The relation between optimal control theory and several learning methods, including RL and SL, is discussed. Analytical results are given, forming a rigorous framework for the learning methods.

The existing ADP methods are analysed in detail. A new ADP method dealing with the average model in Markov decision processes (MDPs) is developed, forming an ADP method that can deal with a general MDP.

The relation between exploration and control in learning control is studied. A new effective exploration method is proposed and combined with ADP methods to form applicable learning methods.

In the application studies of ADP methods, a learning algorithm for the queueing system optimization problem and a direct adaptive optimal control method are proposed. Simulations are carried out and the results are analysed.

The realization of ADP methods in connectionist networks is studied. An algorithm that uses a cellular automaton as the function model is proposed to control an inverted-pendulum system. The simulation data and their analysis are given.

Finally, a summary and prospects for ADP studies are given. Research on ADP methods is at an early stage. The work in this dissertation will be helpful to research on ADP, its applications, and its further development.
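This record does not reproduce any of the dissertation's algorithms. For readers unfamiliar with the idea summarized in the abstract, that ADP connects reinforcement learning with dynamic programming, the following minimal sketch contrasts the two on a toy problem. Everything in it is an assumption made for illustration: the four-state chain, the reward of 1 at the goal, the discount factor, the step size, and all function names. It uses textbook value iteration and tabular Q-learning under a discounted criterion, not the author's methods and not the average model mentioned in the abstract.

```python
import random

# Hypothetical toy MDP (not from the dissertation): a 4-state chain.
N_STATES, GOAL, GAMMA = 4, 3, 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(s, a):
    """Deterministic model: move along the chain; reward 1 on reaching GOAL."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == GOAL else 0.0)


def value_iteration(sweeps=100):
    """Classical dynamic programming: uses the model and sweeps every state."""
    v = [0.0] * N_STATES  # v[GOAL] stays 0 because GOAL is terminal
    for _ in range(sweeps):
        for s in range(GOAL):
            outcomes = (step(s, a) for a in ACTIONS)
            v[s] = max(r + GAMMA * v[s2] for s2, r in outcomes)
    return v


def q_learning(episodes=500, alpha=0.5, seed=0):
    """Sample-based approximation: learns from visited transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)  # uniformly random exploration (off-policy)
            s2, r = step(s, a)       # treated as a sampled transition
            # Move Q(s, a) toward the one-step Bellman target.
            q[s][a] += alpha * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q


if __name__ == "__main__":
    v = value_iteration()
    q = q_learning()
    for s in range(GOAL):
        print(s, round(v[s], 3), [round(x, 3) for x in q[s]])
```

In this sketch the learner only calls `step` to obtain sampled transitions, while `value_iteration` needs the full model for every state. Both end up with the same greedy policy (always move right), which is the sense in which the sample-based update approximates the dynamic programming solution.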
Language: Chinese
Other identifier: 333
Content type: Dissertation
Source URL: http://ir.ia.ac.cn/handle/173211/5651
Collection: Graduates_Doctoral Dissertations (毕业生_博士学位论文)
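Recommended citation (GB/T 7714):
Xie Zhihua. Theory and Applications of Approximate Dynamic Programming Methods (近似动态规划方法的理论及其应用研究) [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences. 1995.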