When the best AI algorithms meet real robots, the results... Hahahahahahahahahahahahaha
By Lucille
Produced by NEXTTECH | WeChat public account TechMix
Back when video games were taking off, alongside the legends who ruled the battlefield there was also a crowd of butter-fingered older players with failing eyes, hopelessly stuck at the bottom.
A flashy set of moves, and still sitting in the ten-thousand-year-old fish pond. Don't worry, your carry has arrived~
What can RL do?
We've seen so many AIs that can play chess, poker, and video games that we've built up some unrealistic fantasies about AI.
The big-name AI companies and labs will have you believe their high-end game-playing AIs will one day make their way into ordinary homes and carry you to glory. DOTA, CS:GO, whatever the game, the dream is to win all the way up the ladder.
What's more, beyond turning you into a top player in games, their new algorithms can supposedly be used in all walks of life - developing new drugs, controlling robots, even teaching computers to negotiate.
Don't ask. If you ask, the answer is one word: awesome.
But what about reality? The rosy scenarios they paint are easy to talk up, and who knows how reliable these advanced RL (reinforcement learning) algorithms actually are in other areas.
So Kindred.AI, an American startup with exactly these ambitions, ran a trial for us and put these new ideas to the test.
They started with robots: port these RL algorithms onto real, physical machines and see how they perform.
The results, emmm, are hard to describe... Every so often a robot would overheat and fail, or make the rookie mistake of winding its cables into a tangled ball.
First, what is this mysterious-sounding RL algorithm anyway?
RL, short for Reinforcement Learning, is a popular approach to machine learning.
Simply put, an agent learns by "trial and error", guided by the rewards it earns from interacting with its environment: make the right choice and you get rewarded accordingly.
Continuing with the game analogy.
In the classic shooter Doom, the agent scores points for picking up guns and ammunition, but if it takes a bullet, the points it worked so hard for get deducted. As time goes on, the agent becomes more and more skilled at playing Doom: take out the enemy quickly, then keep its head down and grind away at collecting gear.
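To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning on a made-up toy corridor (nothing to do with Doom or Kindred's experiments): the agent tries actions, collects rewards, and gradually learns which move pays off in each state.

```python
import random

# Toy corridor: states 0..4, the agent earns +1 for reaching state 4.
N_STATES, ACTIONS = 5, [-1, +1]          # move left or move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit what we know, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy action in every non-terminal state should be +1 (move right).
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)})
```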
Which of the four algorithms comes out on top
Researchers at Kindred.AI tested four RL algorithms on real robots, each set a different task.
The four algorithms were Deep Deterministic Policy Gradient (DDPG), Soft Q-learning, Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO).
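To give a flavor of what one of these methods actually optimizes, here is a hedged sketch of PPO's clipped surrogate loss in plain NumPy; the function and variable names are illustrative and not taken from Kindred's code.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (returned as a loss to minimize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting policy; advantages: advantage estimates.
    """
    ratio = np.exp(logp_new - logp_old)                       # probability ratio
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)  # keep the update small
    # Take the pessimistic of the clipped and unclipped objectives, so the
    # policy is not rewarded for drifting too far from the old policy.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Illustrative call with made-up numbers:
print(ppo_clip_loss(np.array([-0.9, -1.2]), np.array([-1.0, -1.0]),
                    np.array([0.5, -0.3])))
```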
The guinea pigs for the tests were the UR5 and Create 2 robots, plus a Dynamixel MX-64AT actuator. The UR5 is a light, flexible collaborative industrial robot arm, and the Create 2 is a robot vacuum.
The robots were set tasks such as tracking a target and docking with a charging station.
The researchers tested the different algorithms on each of these platforms.
The entire testing process was a laborious and costly undertaking, involving more than 450 separate experiments and over 950 hours of robot time.
All results and code are published on arXiv and GitHub. You can scroll to the end for the link~
Straight to the results, the DDPG algorithm is at the bottom and TRPO is at the top.
The secret of TRPO's success lies in its robustness.
The term robustness may sound a bit unfamiliar. Concretely, it describes how sensitive an AI system is to changes in its external hyperparameters. A deep learning system works well under the specific conditions its developers set up, with hyperparameters carefully tuned to help the machine learn patterns from the data.
Hyperparameters: enough to drive you up the wall
In fact, in the laboratory, hyperparameter sensitivity is not that critical. You can try a bunch of values and pick the one with the best result.
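In practice that "try a bunch of values" step can be as crude as a sweep like the sketch below; the train_and_evaluate function and the candidate values are made up for illustration.

```python
import itertools
import random

def train_and_evaluate(learning_rate, batch_size):
    """Hypothetical stand-in for a full training run: train a policy with these
    hyperparameters and return its average episode reward. Here it just returns
    a random score so the sketch runs on its own."""
    return random.random()

# Brute-force grid search: try every combination and keep the best scorer.
grid = itertools.product([1e-4, 3e-4, 1e-3], [32, 64, 256])
best = max(grid, key=lambda hp: train_and_evaluate(*hp))
print("best (learning_rate, batch_size):", best)
```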
But once the robot leaves the safety of the lab and ventures out into the real world, the choice of hyperparameters becomes critical.
If one day we use machine learning models to steer driverless cars in real time, a small hyperparameter error in the algorithm could lead to a tragic crash.
To put a finer point on it: glare off a speed limit sign could dazzle the camera, and the hapless driverless car might not even realize it should slow down.
From a safety point of view, the choice of hyperparameters can matter even more than the algorithm itself. It also means that, in most cases, the standard approach of pre-programming the robot with a hand-written controller is actually more effective. But RL algorithms are not without their uses.
Mahmood of Kindred.AI says that in cases where scripted or engineered solutions are unclear or not yet feasible, a well-performing learning algorithm comes into its own.
For example, to learn to control and manipulate arbitrary objects in a dynamic environment, a script would have to anticipate every plausible scenario in advance and handle each of them.
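For contrast, here is roughly what a hand-scripted controller looks like; this toy proportional controller for a hypothetical one-joint arm is only an illustration, not Kindred's actual setup.

```python
def scripted_reacher_controller(joint_angle, target_angle, gain=2.0, max_torque=1.0):
    """A hand-written proportional controller for a hypothetical one-joint arm:
    push the joint toward the target, with the command capped at max_torque.
    Every behavior here had to be anticipated by an engineer in advance,
    whereas an RL agent has to discover an equivalent rule by trial and error."""
    error = target_angle - joint_angle
    torque = gain * error
    return max(-max_torque, min(max_torque, torque))

# Illustrative call: joint at 0.1 rad, target at 0.5 rad.
print(scripted_reacher_controller(0.1, 0.5))
```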
RL Algorithm Baby: I'm just getting started in life
Mature scripted programs are built on decades of scientific, technological and engineering advances.
The fledgling RL algorithm, by contrast, is still a blank slate, barely a beginner. It knew nothing about these tasks at the start, and learned its solutions within a matter of hours.
The RL algorithm has a little bit of catching up to do with the script.
On top of that, the hardware ran into hiccups while the robots were training. RL algorithms encourage an agent, or robot, to explore its surroundings, but it often goes wrong in all sorts of ways before it even has a chance to learn the task, wiping out the progress made so far.
Silly as that may look, Mahmood is still optimistic about the future of RL. He is convinced that RL algorithms will come into their own once they perform on par with traditional algorithms, because they will be more cost-effective than scripts written by human experts.
In some robotics use cases the gap is already not that big. In his vision, it won't be long before we see applications based on current algorithms.
By then, people will just be sitting at home while the wins fall from the sky, heh heh heh... (a bronze-tier player wakes up laughing, drooling)
Paper and code links:
arXiv: https://arxiv.org/pdf/1809.07731.pdf
GitHub: https://github.com/kindredresearch/SenseAct