Q-learning is approximate dynamic programming with an expectation


新语丝读书论坛 (New Threads Reading Forum)

Posted by 短江学者 on 2016-03-15, 10:05:01

In reply to: "It summarizes patterns to evaluate the cost; it is not based on sampled routes." by 短江学者 on 2016-03-15, 10:00:16

...of the cost-to-go, so it can go wrong, but not often. The more intensive the Monte Carlo search, the less likely it is to go wrong. But the combinatorics always win, as in Powerball.
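A minimal sketch of the "approximate dynamic programming with an expectation of the cost-to-go" idea, assuming a hypothetical toy cost-minimization problem (the states, costs, probabilities and function names below are invented for illustration and are not from the post). `exact_q` is dynamic programming with the exact expectation of the cost-to-go; `q_learning` replaces that expectation with one sampled transition per update. Comparing the two for increasing episode counts is only a rough analogue of "more intensive search": this is plain tabular Q-learning with random exploration, not a Monte Carlo tree search.

```python
import random

# Hypothetical toy problem: states 0..3, state 3 is the goal (cost-to-go 0).
# Action 0 "step": move to s+1 for cost 1.
# Action 1 "jump": cost 1; with probability 0.5 move to min(s+2, 3), else stay at s.
GOAL = 3
ACTIONS = (0, 1)
P_JUMP = 0.5


def outcomes(s, a):
    """List of (probability, next_state, cost) for taking action a in state s."""
    if a == 0:
        return [(1.0, s + 1, 1.0)]
    return [(P_JUMP, min(s + 2, GOAL), 1.0), (1 - P_JUMP, s, 1.0)]


def sample(s, a, rng):
    """Draw one (next_state, cost) transition according to outcomes(s, a)."""
    r = rng.random()
    acc = 0.0
    for p, s2, c in outcomes(s, a):
        acc += p
        if r < acc:
            return s2, c
    return s2, c  # guard against rounding in the cumulative sum


def exact_q(sweeps=200):
    """DP: repeatedly set Q(s,a) = E[cost + min_a' Q(s',a')] using exact probabilities."""
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}

    def v(s):
        return 0.0 if s == GOAL else min(q[(s, b)] for b in ACTIONS)

    for _ in range(sweeps):
        for key in q:
            s, a = key
            q[key] = sum(p * (c + v(s2)) for p, s2, c in outcomes(s, a))
    return q


def q_learning(episodes, seed=0):
    """Q-learning: same update target, but the expectation is replaced by one sample."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    visits = {key: 0 for key in q}

    def v(s):
        return 0.0 if s == GOAL else min(q[(s, b)] for b in ACTIONS)

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)  # uniform exploration; Q-learning is off-policy
            s2, c = sample(s, a, rng)
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]  # decaying step size
            q[(s, a)] += alpha * (c + v(s2) - q[(s, a)])
            s = s2
    return q


if __name__ == "__main__":
    target = exact_q()
    for n in (10, 100, 1000, 20000):
        learned = q_learning(n)
        err = max(abs(learned[k] - target[k]) for k in target)
        print(f"{n:>6} episodes: max |Q - Q_DP| = {err:.3f}")
```

The sampled estimate can be off when few episodes are run and tends to settle near the DP values as the count grows. Here there are only six state-action pairs; with far more pairs, the same sampling budget covers a much smaller fraction of them.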


