Q-learning is approximate dynamic programming with expectation

Posted by: 短江学者 on 2016-03-15, 10:05:01:

In reply to: it summarize patterns to evaluate the cost. not based on sample route. by 短江学者 on 2016-03-15, 10:00:16:

of the cost-to-go, so it can go wrong, but not often. The more intensive the Monte Carlo search, the less likely it is to go wrong. But the combinatorics always win in the end, as in Powerball: the number of possible outcomes grows faster than any search can sample.
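
A minimal Python sketch of the idea, not taken from the thread. It assumes a made-up six-state chain where each move costs 1 and a "risky" action jumps two states with probability 0.6 or stays put otherwise. All names (`step`, `exact_q`, `q_learning`, `P_JUMP`) and the 1/n step size with uniformly sampled state-action pairs are assumptions of the sketch. Exact dynamic programming computes the expected cost-to-go from the known probabilities; Q-learning only sees sampled transitions and averages them, so its estimate is off by an amount that tends to shrink as more samples are drawn.

```python
import random

N_STATES = 6          # states 0..5; state 5 is the goal (terminal, cost-to-go 0)
GOAL = N_STATES - 1
P_JUMP = 0.6          # hypothetical chance that the risky action succeeds


def step(state, action, rng):
    """Sample one transition. Returns (next_state, cost); every move costs 1."""
    if action == 0:                       # safe: always advance one state
        return min(state + 1, GOAL), 1.0
    if rng.random() < P_JUMP:             # risky: advance two states...
        return min(state + 2, GOAL), 1.0
    return state, 1.0                     # ...or stay where we are


def exact_q(sweeps=200):
    """Exact dynamic programming: value iteration with the true probabilities."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        j = [min(row) for row in q]       # cost-to-go J(s) = min over actions
        for s in range(GOAL):
            q[s][0] = 1.0 + j[min(s + 1, GOAL)]
            q[s][1] = 1.0 + P_JUMP * j[min(s + 2, GOAL)] + (1 - P_JUMP) * j[s]
    return q


def q_learning(n_samples, seed=0):
    """Tabular Q-learning: the expectation is replaced by sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    visits = [[0, 0] for _ in range(N_STATES)]
    for _ in range(n_samples):
        s = rng.randrange(GOAL)           # pick a non-terminal state uniformly
        a = rng.randrange(2)              # pick an action uniformly
        s_next, cost = step(s, a, rng)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]        # 1/n step size: a running average
        target = cost + min(q[s_next])    # sampled cost plus estimated cost-to-go
        q[s][a] += alpha * (target - q[s][a])
    return q


def max_error(q, q_ref):
    """Largest absolute gap between two Q tables over non-terminal states."""
    return max(abs(q[s][a] - q_ref[s][a]) for s in range(GOAL) for a in range(2))


if __name__ == "__main__":
    q_ref = exact_q()
    for n in (50, 5_000, 200_000):
        print(n, "samples -> max error", max_error(q_learning(n), q_ref))
```

This toy has only ten state-action pairs, so sampling covers them easily. The combinatorics point above is about problems where the table itself is far too large to enumerate or sample this way.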

