The researchers tested different algorithms on each of the two robots.
The entire testing process was a laborious and costly undertaking, with 450 separate experiments for each algorithm, taking over 950 hours.
All results and code are published on arXiv and GitHub. You can scroll to the end for the link~
Straight to the results: DDPG sits at the bottom of the ranking, while TRPO comes out on top.
The secret of TRPO's success lies in the robustness of the algorithm.
The term "robustness" may sound a bit abstract. Specifically, it refers to how sensitive an AI system is to changes in its hyperparameters. A deep learning system works well under the specific conditions set by its developers, with hyperparameters carefully tuned to help the machine learn patterns from the data.
In fact, in the laboratory, hyperparameter sensitivity is not that critical. You can try a bunch of values and pick the one with the best result.
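As a toy illustration (not from the paper itself), here is what that lab-style search looks like in code: we treat the learning rate as the hyperparameter, run a simple gradient descent on f(x) = x², and keep whichever value gives the lowest final loss. The candidate values and the toy objective are assumptions for the sketch.

```python
# Toy sketch of hyperparameter sensitivity: minimize f(x) = x^2 with
# gradient descent and see how the outcome depends on the learning rate.

def final_loss(lr, x0=10.0, steps=50):
    """Run gradient descent on f(x) = x^2 and return the final f(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return x * x

# Lab-style search: try a bunch of values and pick the best one.
candidates = [0.001, 0.01, 0.1, 0.5, 1.1]
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)

for lr, loss in results.items():
    print(f"lr={lr}: final loss = {loss:.3g}")
print("best lr:", best_lr)
```

Note how sharply the result changes: a learning rate of 0.5 converges, while 1.1 makes the updates overshoot and diverge. In the lab this is harmless, because the bad values simply get discarded.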
But once the robot leaves the safe zone of the lab and ventures out into the real world, the choice of hyperparameters becomes critical.
If one day we use machine learning models to steer self-driving cars in real time, a small hyperparameter error in the algorithm could lead to a tragic crash.
To put a finer point on it: glare reflecting off a speed limit sign could dazzle the camera, and the hapless driverless car might not even know it should slow down.