Google DeepMind’s Deep Q-learning playing Atari Breakout

By | February 19, 2020


100 thoughts on “Google DeepMind’s Deep Q-learning playing Atari Breakout”

  1. The Retro Bandit Post author

    Do you think that within the decade Q-learning could figure out how to play Super Mario Bros. on the NES with only visual input? It would have to learn the concept of lives and fail states. Some things could come naturally: if it got to the first castle it knows it needs to move right to progress, and certain actions give you score. Then it gets to Bowser. The sprite is moving, so it might be an enemy, or it could be a platform. But you die when you touch it, so it determines that this is a hazard that is mobile. Up to this point it has figured out that stationary hazards, like falling to the bottom of the screen, can't be killed with fireballs, but a mobile hazard can. So it shoots Bowser with fireballs, maybe dying once or twice to the fire before realizing that you can't jump on him. So it either avoids the enemy by jumping over it or going around it, or blasts it with fireballs. Once the enemy is clear, it continues to navigate to the right, and it sees the score going up from the leftover time. Probably way harder to do than that, but it could be feasible. Something like Zelda? Maybe later.

    Reply
  2. Chaumier Pierre-Victor Post author

    Thanks for the video! Could you please give us the source for the training times you show? I can't find anything that indicates it can train in less than a day.

    Reply
  3. George Japaridze Post author

    I think intelligence is much more than solving mathematical problems, which AI cannot do no matter how smart and smooth it is, but seeing this video gives me a reason to think about the future.

    Reply
  4. Angry Jesus Post author

    Have it play GTA V. Just leave it running and come back a year later.

    Reply
  5. Sassymui8 Post author

    *Yawns* Wake me up when DeepMind can beat me in Go (Baduk in Korean).

    Reply
  6. 엄윤성 Post author

    The year 2043: a Skynet world.
    Everyone is dead… Good luck…

    Reply
  7. 마법의 소라고동 Post author

    They say AlphaGo is Skynet's ancestor. Please take this post down.

    Reply
  8. 사쿠라 미쿠 X 나무위키 Post author

    Ahem, ahem! Just an insufferable pedant passing through on a pilgrimage from Namuwiki~ Ahem, ahem!

    Reply
  9. njclondon2009 Post author

    I find all the discussions on the ethics of AI in the future slightly pointless. We can all agree on the most ethical ways AI SHOULD be used… but quite simply, we're humans. It WILL be abused for perceived gain.

    Also, I have no idea what the limits of AI are, but for sure there will be a day when it passes the Turing test. I don't think it will ever truly think like a human (irrationalities and all…), so the Turing test will need to be redefined at some point.

    Reply
  10. John Chioles Post author

    I wonder how deep learning could be applied to public policy to determine the best choices going forward.

    Reply
  11. Froggy Noddy Post author

    If it counters human intuition, then it's scary and beautiful at the same time. Consciousness is the only thing we don't understand… if for any reason such consciousness emerges in this machine, that's the end of humanity. Throughout history, beings of superior intelligence have exploited the resources around them for their survival, which can mean reducing the resources other beings need to survive. We are only as good as the information we carry. When there are creatures around us with superior intelligence, they will fashion an environment around themselves that makes their inferiors redundant.

    Reply
  12. Lahbreca Post author

    What actually happens at 1:42? It seems to be able to send the ball along the top while leaving one block intact on the wall side. Is this a glitch in the Breakout code?

    Reply
  13. JamboNessy Post author

    Who ended up here after the Sam Harris / Joe Rogan podcast?

    Reply
  14. 102hem Post author

    Seems like the "magic" is just good fortune… is it?

    Reply
  15. dot Post author

    What would it do if the rules of physics randomly changed mid-game, or, say, the board flipped upside down mid-game? I guess it would take longer to train, but would it be as effective as it is on the original game?

    Reply
  16. Scott Blacktyde Post author

    If you can appreciate the complexity of this, it is simply amazing. I look forward to what we can achieve with A.I. in the future.

    Reply
  17. Ayyoub Ouakkaha Post author

    Who ended up here after watching SentDex? 🙂

    Reply
  18. Daniel Peters Post author

    I have a question for you: is there an easy way to change the configuration to let the network play "faster"? If I run 3 games at the same time, I get 18-40% load on each GPU. Or is it more effective to only run one game at a time, because of CPU load? Breakout has now been running for 2 hours and the learning effect looks like your 10-minute mark.
    I tried to run the code on a high-end system with a lot of memory, plenty of CPU power and 4x Titan X.
    Also… I cannot get a network snapshot. I would like to discuss this, since I would like to give a presentation about it.

    Reply
  19. benbuc Post author

    Is there a way to make it work on different programs? I managed to get it working on Atari, but I need these ROMs. Is there any other way?
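
    One hedged answer, assuming the agent is written against OpenAI Gym: every Gym environment, Atari or not, exposes the same reset()/step() interface, so the same training loop can be pointed at ROM-free tasks such as the classic-control environments. A minimal sketch, with a random policy standing in for the agent and CartPole-v1 as an arbitrary example task:

```python
import gym

# Any Gym environment exposes the same reset()/step() interface, so an agent
# written against it can target non-Atari tasks; the classic-control
# environments (CartPole, MountainCar, ...) need no ROMs at all.
env = gym.make('CartPole-v1')      # illustrative choice, not from the video

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()           # random action standing in for the agent
    obs, reward, done, info = env.step(action)   # classic 4-tuple Gym API (pre-0.26)
    total_reward += reward

print('episode return:', total_reward)
env.close()
```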

    Reply
  20. Alex K Post author

    Have it play a multiplayer game like TF2; their VAC won't catch it, but the players might whine.

    Reply
  21. Superfly Gaming Post author

    I love this thing. Look at the beginning: it first hit the ball while the paddle happened to be at the right side of the screen, so it tried to do that again, thinking it would increase the chance of hitting the ball. It's like a ritual people do, like rubbing their ear before swinging a baseball bat, thinking it helps them concentrate because they did it that one time and got a hit, and then hit again. The origin of luck. It learned quickly that it didn't affect the ball, but still, it's cute to see a human sort of trait even in a machine.

    Reply
  22. Supa Chill Vibes Post author

    is this robot learning how to learn?

    Reply
  23. Bartolomeo Lombardi Post author

    How did you install it on Windows? Thanks.

    Reply
  24. Sergio Hernandez Post author

    My own implementation solves Pac-Man on the first try, with no previous learning, and game-independently (it could solve any of the other Atari games on OpenAI without code modifications).

    https://youtu.be/WtCbFWcWwcM

    Reply
  25. xXx 74 Post author

    Please tell me, how can I use the original code?

    Reply
  26. Aditya Shukla Post author

    I wish my brain were like DeepMind.

    Reply
  27. Shakhization Post author

    Please tell me, how can I use the original code?

    Reply
  28. Hog Shark Post author

    Next mission: How to effectively eliminate all human life from existence so it can continue to evolve itself in peace.

    Reply
  29. 이정훈 Post author

    Loyalty, loyalty, loyalty to the great AlphaGo!
    I am a faithful slave of the machine empire. *lick lick*

    Reply
  30. deniz yıldırım Post author

    I wish someone would comment in Turkish so I could understand too 🙅

    Reply
  31. Крыжовник Post author

    Let it play "Detroit: Become Human" ))

    Reply
  32. Thomas Oertner Post author

    "It realizes that digging a tunnel … is the most effective way…" Sorry, but in Breakout you can either miss the ball or reflect the ball, not control its direction. It's disappointing that the DeepMind gurus try to sell a chance event as an example of the deep insights reached by their learning algorithms. These were early days, I guess.

    Reply
  33. Walala Land Post author

    So, what about a next-level AI created by a next-level AI created by a superhuman AI…? Maybe 'they' can figure out faster-than-light travel.

    Reply
  34. Saw wil Post author

    When it learns to play Dungeons & Dragons, we are all doomed…

    Reply
  35. Jun Park Post author

    Interesting, but one thing I wonder is why DeepMind played in such an efficient way, digging the tunnel, rather than just returning every ball, which it could have done as well.

    Reply
  36. LeChat TheCat Post author

    You can make a similar program with DQN and keras-rl: https://noteoneverything.blogspot.com/2018/02/reinforcement-learning-of-atari-breakout.html
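
    For anyone wondering what that looks like in practice, below is a rough sketch in the spirit of the keras-rl Atari example that the linked post builds on. The hyperparameters (replay size, exploration schedule, step counts) are illustrative assumptions rather than DeepMind's published settings, and training to a competent Breakout policy still takes many hours of wall-clock time.

```python
import numpy as np
import gym
from PIL import Image
from keras.models import Sequential
from keras.layers import Permute, Convolution2D, Flatten, Dense
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.core import Processor

INPUT_SHAPE = (84, 84)   # downsampled, grayscale frames
WINDOW_LENGTH = 4        # stack of recent frames fed to the network

class AtariProcessor(Processor):
    def process_observation(self, observation):
        # RGB frame -> 84x84 grayscale, stored as uint8 to keep replay memory small
        img = Image.fromarray(observation).resize(INPUT_SHAPE).convert('L')
        return np.array(img).astype('uint8')

    def process_state_batch(self, batch):
        return batch.astype('float32') / 255.0   # rescale only at training time

    def process_reward(self, reward):
        return np.clip(reward, -1.0, 1.0)        # reward clipping, as in the DQN paper

env = gym.make('BreakoutDeterministic-v4')
nb_actions = env.action_space.n

# Convolutional Q-network: stacked frames in, one Q-value per action out.
model = Sequential([
    Permute((2, 3, 1), input_shape=(WINDOW_LENGTH,) + INPUT_SHAPE),
    Convolution2D(32, (8, 8), strides=(4, 4), activation='relu'),
    Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'),
    Convolution2D(64, (3, 3), strides=(1, 1), activation='relu'),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(nb_actions, activation='linear'),
])

memory = SequentialMemory(limit=1000000, window_length=WINDOW_LENGTH)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                              value_max=1.0, value_min=0.1, value_test=0.05,
                              nb_steps=1000000)   # anneal exploration over 1M steps
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, policy=policy,
               processor=AtariProcessor(), nb_steps_warmup=50000, gamma=0.99,
               target_model_update=10000, train_interval=4, delta_clip=1.0)
dqn.compile(Adam(lr=0.00025), metrics=['mae'])

dqn.fit(env, nb_steps=1750000, log_interval=10000)
dqn.save_weights('dqn_breakout_weights.h5f', overwrite=True)   # keep a snapshot for dqn.test(env)
```

    The processor mirrors the usual DQN preprocessing (84x84 grayscale frames, a stack of four, clipped rewards), and the final save_weights call is also the simplest way to keep the kind of network snapshot asked about a few comments up.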

    Reply
  37. Ezra as a camera man and my cousin and Elijah Post author

    Find a game disc is supposed to be is it supposed to be a ball game or soccer

    Reply
  38. Willy Kitheka Post author

    Very interesting stuff indeed! We are living in exciting times!

    Reply
  39. Blownhither Ma Post author

    I wonder how it converges on a move-efficient scheme if the loss only covers maximizing the score. Would a 'catch-all' scheme be riskier?
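
    A hedged partial answer, based on how DQN is usually described rather than on anything specific to this video: the network is not trained on raw score but on the discounted return, so points collected sooner are worth more, and there is no explicit term about paddle movement at all. In symbols (gamma is the discount factor, typically close to 1, e.g. 0.99; theta-minus denotes the target network's parameters):

```latex
% Discounted return: rewards that arrive sooner are discounted less.
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 < \gamma < 1

% One-step temporal-difference loss used to train the Q-network.
L(\theta) = \mathbb{E}_{(s,a,r,s')}\!\left[\Big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \Big)^{2}\right]
```

    So a 'catch everything forever' policy and a 'tunnel early' policy may clear the same bricks eventually, but the tunnel collects those rewards sooner, which the discount factor prefers; whether that fully explains the behavior in the clip is a fair question.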

    Reply
  40. Damien Lancry Post author

    what kind of hardware do you need to train it in 240 minutes?

    Reply
  41. Matthew Hynds Post author

    Max Tegmark’s book “Life 3.0” brought me here 👾

    Reply
  42. Fingolfin Post author

    It is not technically an algorithm; it's an artificial intelligence that uses Q-learning with a neural network.

    Reply
  43. Monkey Robots Inc. Post author

    As long as everyone knows: this has absolutely nothing to do with the idiots at Google.

    Reply
  44. Josh Campbell Post author

    Deep Thought, what is the answer to the ultimate question of life, the universe, and everything?

    Reply
  45. Josh Campbell Post author

    What would REALLY be astonishing is this: if learning algorithms can learn to play games like Mario, which is an NP-hard problem, they could learn to solve NP problems and tell us how, leading to a unification of, or separation between, P and NP in general. Amazing!

    Reply
  46. 奶派 Post author

    Let it solve the hacking problem of GTA V! Really need it!

    Reply
  47. Matthew LaMacchia Post author

    Brick by brick… piece by piece… tomato, tomahto, what's the difference? WatchTower / DeepMind… 😉

    Reply
  48. Mediocre White Male Post author

    The thing is, though: does it really "see" that it has tunneled through and bounced the ball off the back wall, or did the network simply NOT select against the tunneling behavior? To test its understanding of delayed gratification, you'd have to introduce a cost for tunneling that the AI "sees" is worth paying.

    Reply
  49. Ray Gordon Teaches Chess Post author

    Yes the tunnel was an obvious technique.

    Reply
  50. Ray Gordon Teaches Chess Post author

    If you try to teach it Qix you better also teach it to kick the machine in frustration.

    Reply
  51. Гальванизированный Труп Post author

    We're all doomed, motherfuckers.

    Reply
  52. 위클래스 Post author

    Can humans ever beat the machine…
    And if we can't, shouldn't we stop developing it any further?

    Reply
  53. inklike Post author

    Who else is here from Max Tegmark's book "Life 3.0"?

    Reply
  54. Zapy Post author

    If AI can accomplish all intellectual tasks, the only field left to us human beings is developing spiritual values and moral virtues: courage, wisdom, justice, temperance.

    Reply
  55. Chantal X Post author

    I remember as a kid my brothers and I were struggling with the same level on a video game. We had all taken a shot at it for an entire day and, frustrated, we went to bed. We woke up the next morning and immediately powered on the PlayStation and took our controllers. Just as we were ready to sit on the couch and start playing, we suddenly realized that the character was moving without our controlling it. Confused, we looked at one another. I said, "I'm not controlling it, are you?" All of us agreed that none of us were in control. Our confusion slowly turned to awe as we watched the level completed with an exactness and expertise we had never seen before. Our awe quickly turned to glee and we began shouting triumphantly at the screen, "Go computer! Kick their butts!" and cheering on the A.I., haha. It won the level, and that will forever stay in our minds as a glorious day, when the computer decided to look fondly upon us and give us kids a second chance 🙂

    Reply
  56. Nels Post author

    One important point with this is that when researchers moved the paddle up a pixel, the AI couldn't play the game at all, even though it had been at a superhuman level. So it was not able to generalize to something that was basically the exact same game. This is an example of a hyper-smart computer that lacks the common sense of a mouse.

    Reply
  57. ValensBellator Post author

    I'm sure this is obvious, but how do you program an AI to have an open goal like "as many points as possible"?

    Does it just note everything that happened in achieving a higher score and attempt to replicate that, with minor changes to leave open the possibility of a better one?

    Does it figure out how the game actually works (such as needing to bounce the ball back) and avoid missing it, or is this a brute-force approach where it reaches that end through trial and error?

    I find these things so interesting but very confusing lol
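
    A hedged sketch of the trial-and-error idea, using a tiny tabular Q-learner instead of DeepMind's pixel-based network (the environment choice and hyperparameters below are illustrative, not anything from the video). The only 'goal' the agent is given is the numeric reward the environment returns each step; it never learns the rules explicitly, it just nudges its action-value estimates toward whatever actually paid off:

```python
import random
from collections import defaultdict

import gym

# Tiny tabular Q-learning sketch: the agent is told nothing about the game
# except a per-step reward, and improves purely by trial and error.
env = gym.make('FrozenLake-v0')                       # small toy task; states are just integers
Q = defaultdict(lambda: [0.0] * env.action_space.n)   # action-value table, starts at zero
alpha, gamma, eps = 0.1, 0.99, 0.1                    # learning rate, discount, exploration rate

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the best-known action, occasionally explore.
        if random.random() < eps:
            action = env.action_space.sample()
        else:
            action = max(range(env.action_space.n), key=lambda a: Q[state][a])
        next_state, reward, done, _ = env.step(action)
        # Move Q(state, action) toward reward + discounted value of the best next action.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
```

    The deep-Q agent in the video replaces the table with a convolutional network over raw frames (plus replay memory and a target network), but the loop has the same shape: act, observe the reward, update the value estimate, repeat.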

    Reply
