{"id":7323,"date":"2016-04-27T05:44:00","date_gmt":"2016-04-27T05:44:00","guid":{"rendered":"http:\/\/www.kurzweilai.net\/?p=278886"},"modified":"2016-04-29T05:44:39","modified_gmt":"2016-04-29T05:44:39","slug":"public-beta-of-toolkit-for-developing-machine-learning-for-robots-and-games-released","status":"publish","type":"post","link":"https:\/\/hoo.central12.com\/fugic\/2016\/04\/27\/public-beta-of-toolkit-for-developing-machine-learning-for-robots-and-games-released\/","title":{"rendered":"Public beta of toolkit for developing machine learning for robots and games released"},"content":{"rendered":"<div id=\"attachment_279010\" class=\"wp-caption aligncenter\" style=\"width: 458px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;\"><a href=\"http:\/\/www.kurzweilai.net\/public-beta-of-toolkit-for-developing-machine-learning-for-robots-and-games-released\/humanoid-walking-training\" rel=\"attachment wp-att-279010\"><img class=\" wp-image-279010\" title=\"humanoid-walking-training\" src=\"http:\/\/www.kurzweilai.net\/images\/humanoid-walking-training.gif\" alt=\"\" width=\"448\" height=\"448\" \/><\/a><p style=' padding: 0 4px 5px; margin: 0;'  class=\"wp-caption-text\">Make a three-dimensional bipedal robot walk forward as fast as possible, without falling over (credit: OpenAI Gym)<\/p><\/div>\n<p>OpenAI (a non-profit AI research company sponsored by Elon Musk and others) has released the public beta of\u00a0<a href=\"https:\/\/gym.openai.com\/\" >OpenAI Gym<\/a>, a toolkit for developing and comparing algorithms for reinforcement learning\u00a0(RL), a type of machine learning.<\/p>\n<p>OpenAI Gym consists of a growing suite of\u00a0<a href=\"https:\/\/gym.openai.com\/envs\" >environments<\/a>\u00a0(from <a href=\"https:\/\/gym.openai.com\/envs\/Humanoid-v0\" >simulated robots<\/a>\u00a0to\u00a0<a href=\"https:\/\/gym.openai.com\/envs\/MsPacman-v0\" >Atari<\/a>\u00a0games), and a site for\u00a0<a href=\"https:\/\/gym.openai.com\/envs\/CartPole-v0#feed\" >comparing and reproducing<\/a>\u00a0results. OpenAI Gym is compatible with algorithms written in any framework, such as\u00a0<a href=\"https:\/\/www.tensorflow.org\/\" >Tensorflow <\/a>and\u00a0<a href=\"https:\/\/github.com\/Theano\/Theano\" >Theano<\/a>. The environments are initially written in Python (other languages planned).<\/p>\n<p>If you&#8217;d like to dive in right away, you can work through a <a href=\"https:\/\/gym.openai.com\/docs\" >tutorial<\/a>, and you help out while learning by\u00a0reproducing\u00a0a\u00a0<a href=\"https:\/\/gym.openai.com\/evaluations\/eval_glkKKInTm6GlmcOQRZuhQ\">result<\/a>.<\/p>\n<p><strong>What is reinforcement learning?<\/strong><\/p>\n<p>Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. 
RL is exciting for two reasons, according to OpenAI's Greg Brockman and John Schulman:

- RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot's motors so that it's able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised-learning problems with sequential or structured outputs (see http://arxiv.org/abs/1511.06732, http://arxiv.org/abs/0907.0786, and http://arxiv.org/abs/1601.01705).

- RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning it required a great deal of problem-specific engineering. DeepMind's Atari results (https://deepmind.com/dqn.html), BRETT from Pieter Abbeel's group (http://news.berkeley.edu/2015/05/21/deep-learning-robot-masters-skills-via-trial-and-error/), and AlphaGo (https://googleblog.blogspot.com/2016/01/alphago-machine-learning-game-go.html) all used deep RL algorithms that make few assumptions about their environment and can therefore be applied in other settings.

However, RL research is also slowed down by two factors:

- The need for better benchmarks. In supervised learning (learning from labeled examples), progress has been driven by large labeled datasets like ImageNet (http://www.image-net.org/). In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don't have enough variety, and they are often difficult even to set up and use.

- Lack of standardization of the environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task's difficulty.
  This issue makes it difficult to reproduce published research and to compare results from different papers.

OpenAI Gym is an attempt to fix both problems.

Partners include:

- NVIDIA (http://www.nvidia.com/): a technical Q&A with John Schulman (https://devblogs.nvidia.com/parallelforall/train-reinforcement-learning-agents-openai-gym).
- Nervana (http://www.nervanasys.com/): an implementation of a DQN OpenAI Gym agent (http://www.nervanasys.com/openai).
- Amazon Web Services (AWS) (https://aws.amazon.com/): a limited number of $250 credit vouchers for select OpenAI Gym users.

More information, including environments (Atari games, 2D and 3D robots, and toy text, for example), is available at https://gym.openai.com/.

"During the public beta, we're looking for feedback on how to make this into an even better tool for research," says the OpenAI team. "If you'd like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people's results, or even implementing your own environments. Also please join us in the community chat (https://gym.openai.com/chat)!"

[Video: John Schulman | hopper (https://www.youtube.com/embed/jtXiTP96wow)]
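The team's invitation to implement your own environments deserves a concrete illustration. The sketch below is an assumption about the general shape of a custom environment, not code from the Gym docs: it subclasses gym.Env, declares an action space and an observation space, and fulfills the same reset/step contract the loop above relies on. The GuessNumberEnv class, its spaces, and its reward scheme are all made up for illustration, and the exact override points (step vs. _step) have varied across Gym versions.

```python
import gym
from gym import spaces
import numpy as np

class GuessNumberEnv(gym.Env):
    """Hypothetical toy environment: guess a hidden digit in [0, 9].

    Purely illustrative; this is not an official Gym environment.
    """

    def __init__(self):
        self.action_space = spaces.Discrete(10)   # guesses 0..9
        # Observation codes: 0 = no feedback yet, 1 = target is lower
        # than the last guess, 2 = correct, 3 = target is higher.
        self.observation_space = spaces.Discrete(4)
        self._target = None

    def reset(self):
        # Start a new episode with a fresh hidden target.
        self._target = np.random.randint(0, 10)
        return 0  # no feedback yet

    def step(self, action):
        assert self.action_space.contains(action)
        if action == self._target:
            observation, reward, done = 2, 1.0, True
        elif action > self._target:
            observation, reward, done = 1, 0.0, False
        else:
            observation, reward, done = 3, 0.0, False
        return observation, reward, done, {}
```

Because such a class honors the same interface as the built-in environments, any agent written against env.reset() and env.step() can run on it unchanged. Pinning the reward function and action set inside a named, versioned environment is precisely how Gym addresses the standardization problem described above.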