Algorithm of matching law based on optimal policy search model (优化策略模型下的匹配律算法)

	Cheng Zhenbo (程振波) ; Deng Zhidong (邓志东)
	2010-06-09
Keywords	partially observable Markov decision process; reinforcement learning; optimal policy search; matching law; TP301.6
Alternative title	Algorithm of matching law based on optimal policy search model
Abstract	Building on the policy search model for partially observable Markov decision processes (POMDPs), this paper proposes a policy search algorithm that yields optimal behavior and derives from it a policy algorithm that satisfies the matching law. The subject adjusts the policy parameters, based on past experience, to maximize the expected value of the objective value function. Assuming that the subject's environment is Markovian, the policy search algorithm for optimal behavior is obtained by computing the gradient of the expected value of the value function. Theoretical analysis and simulation results show that, if the policy parameters and the expected value of the value function are affected only by current experience, a policy algorithm that conforms to the matching law can be derived from the policy algorithm for optimal behavior. The results reveal the relationship between matching behavior and the optimal policy search algorithm, and indicate that decision behavior satisfying the matching law is a class of suboptimal decision behavior.; Funding: National Natural Science Foundation of China (60621062, 60775040)
Language	Chinese
Content type	Journal article
Source URL	[http://hdl.handle.net/123456789/55368]
Collection	Tsinghua University (清华大学)
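Recommended citation (GB/T 7714)	Cheng Zhenbo, Deng Zhidong. Algorithm of matching law based on optimal policy search model[J], 2010.
APA	Cheng Zhenbo, & Deng Zhidong. (2010). Algorithm of matching law based on optimal policy search model.
MLA	Cheng Zhenbo, et al. "Algorithm of matching law based on optimal policy search model." (2010).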
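
The abstract's central claim, that a gradient-based policy update driven only by current experience leads to matching behavior, can be illustrated with a small simulation. The sketch below is not the authors' algorithm: the two-option baited reward schedule, the logistic policy, the REINFORCE-style update, and every name and parameter value (`simulate`, `bait_probs`, `lr`, `seed`) are assumptions chosen for illustration. It compares the fraction of choices made on one option with the fraction of total reward earned on it; the matching law predicts that the two are approximately equal.

```python
import math
import random


def simulate(bait_probs=(0.3, 0.1), trials=50000, lr=0.05, seed=0):
    """Hypothetical two-option task with a one-parameter logistic policy.

    Returns (choice_fraction, income_fraction) for option 0, measured
    over the second half of the run.
    """
    rng = random.Random(seed)
    theta = 0.0               # policy parameter: P(option 0) = sigmoid(theta)
    baited = [False, False]   # whether a reward is currently waiting on each option
    choices = [0, 0]
    income = [0.0, 0.0]
    burn_in = trials // 2     # discard the first half while the policy settles

    for t in range(trials):
        # Assumed schedule: each option is baited independently, and a bait
        # stays in place until that option is chosen.
        for i in (0, 1):
            if rng.random() < bait_probs[i]:
                baited[i] = True

        p0 = 1.0 / (1.0 + math.exp(-theta))
        action = 0 if rng.random() < p0 else 1
        reward = 1.0 if baited[action] else 0.0
        baited[action] = False

        # Gradient of log pi(action) with respect to theta for a logistic policy.
        grad_log = (1.0 - p0) if action == 0 else -p0
        # The update uses only the current trial's action and reward.
        theta += lr * reward * grad_log

        if t >= burn_in:
            choices[action] += 1
            income[action] += reward

    choice_fraction = choices[0] / sum(choices)
    income_fraction = income[0] / sum(income)
    return choice_fraction, income_fraction


if __name__ == "__main__":
    choice_fraction, income_fraction = simulate()
    print(f"choice fraction on option 0: {choice_fraction:.3f}")
    print(f"income fraction on option 0: {income_fraction:.3f}")
```

Under these assumptions the expected update is proportional to the covariance between reward and choice, which vanishes when both options yield the same average reward per choice. That condition is the matching law, and it generally differs from the choice probability that maximizes total reward on this schedule, in line with the abstract's description of matching as suboptimal.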
