Q-learning is approximate dynamic programming with expectation
New Threads (新语丝) Reading Forum
Posted by: 短江学者 on 2016-03-15, 10:05:01:
In reply to: it summarize patterns to evaluate the cost. not based on sample route. by 短江学者 on 2016-03-15, 10:00:16:
of the cost-to-go, so it can go wrong, but not often. The more intensive the Monte Carlo search, the less likely it is to go wrong. But the combinatorics always win, like in Powerball.
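A toy sketch of the "more Monte Carlo, less likely to go wrong" point, not Q-learning itself. Everything here is an assumption made up for illustration: two hypothetical actions A and B with true expected costs-to-go of 1.0 and 1.2, Gaussian noise on each sampled rollout, and the helper names. The code estimates each expected cost-to-go by averaging rollouts and counts how often the worse action looks cheaper.

```python
import random

def rollout_cost(mean, rng):
    # One noisy sampled cost-to-go for an action (hypothetical toy model).
    return rng.gauss(mean, 1.0)

def estimate_cost_to_go(mean, n_rollouts, rng):
    # Monte Carlo estimate of the expected cost-to-go: average of sampled rollouts.
    return sum(rollout_cost(mean, rng) for _ in range(n_rollouts)) / n_rollouts

def wrong_choice_rate(n_rollouts, trials=2000, seed=0):
    # Action A truly costs 1.0 and action B truly costs 1.2 (assumed values),
    # so picking B counts as "going wrong".
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        est_a = estimate_cost_to_go(1.0, n_rollouts, rng)
        est_b = estimate_cost_to_go(1.2, n_rollouts, rng)
        if est_b < est_a:
            wrong += 1
    return wrong / trials

# More rollouts per decision should give a lower rate of wrong choices.
for n in (1, 10, 100):
    print(n, wrong_choice_rate(n))
```

The error rate shrinks as the rollout count grows, but it never reaches zero with a finite number of samples, and in a real game the number of routes to cover grows much faster than the number of rollouts you can afford.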