This is one of those fantastic "all that is done with so little code?!" moments.
Amazing that this can be done in 300 lines of perfectly readable JS without any libraries. And the author apparently wrote it in 2 days. Great intro to genetic algorithms and reinforcement learning. Universities should teach like that.
Edit:
Somebody asked about the big picture of how the thing works:
You use a neural network to decide at each step whether to flap the bird's wings or not. That is the only output. The input is only the bird's height (y-position) and the height of the next hole. (https://en.wikipedia.org/wiki/Feedforward_neural_network)
The question now is how to find the correct weights for the net.
That net is not trained in a traditional supervised way, e.g. with gradient descent (which would be more complicated). Instead it uses a genetic algorithm to find new weights for the neural networks of future generations. That is: keep some of the best individuals, add some randomly generated ones, and add some created by breeding from existing individuals. (https://en.wikipedia.org/wiki/Genetic_algorithm)
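The selection scheme just described - keep some of the best, add some random newcomers, breed the rest - can be sketched roughly like this (the function name, elitism fraction, and mutation rate are illustrative assumptions, not the actual game.js code):

```javascript
// One generation step of a simple genetic algorithm over weight vectors.
// Each individual is just a flat array of network weights.
function nextGeneration(population, scores, popSize) {
  // Rank individuals by fitness (their game score), best first.
  const ranked = population
    .map((weights, i) => ({ weights, score: scores[i] }))
    .sort((a, b) => b.score - a.score);

  const next = [];

  // Elitism: carry the top performers over unchanged.
  const eliteCount = Math.max(1, Math.floor(popSize * 0.2));
  for (let i = 0; i < eliteCount; i++) next.push(ranked[i].weights.slice());

  // A few fresh random individuals keep exploring the weight space.
  const randomCount = Math.floor(popSize * 0.2);
  const n = population[0].length;
  for (let i = 0; i < randomCount; i++) {
    next.push(Array.from({ length: n }, () => Math.random() * 2 - 1));
  }

  // Fill the rest by breeding: crossover of two good parents plus mutation.
  while (next.length < popSize) {
    const a = ranked[Math.floor(Math.random() * eliteCount)].weights;
    const b = ranked[Math.floor(Math.random() * eliteCount)].weights;
    const child = a.map((w, i) => {
      let v = Math.random() < 0.5 ? w : b[i];            // crossover
      if (Math.random() < 0.1) v += Math.random() - 0.5; // mutation
      return v;
    });
    next.push(child);
  }
  return next;
}
```

No gradients anywhere: the only signal the algorithm ever sees is each bird's final score.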
These are exactly the gems that I am constantly in search of, and I am thrilled every time I come across one.
This is going to turn into a rant, but sometimes it feels like there's a culture of solving problems by a philosophy of "let's throw more money at it, more technology, more people, more unnecessary abstraction layers, complexity and bloat - it's bound to be solved eventually". And with enough firepower, it usually is - but at what cost... I'm not sure what the source of the problem is, though, and it's not only with code. Many times I struggle to understand certain concepts via professional literature, and when I finally understand them, I'm puzzled as to why they are taught in a way that, if I didn't know better, I'd assume was a deliberate attempt to confuse the reader instead of making the learning simple, intuitive and fun.
Exactly the same applies to code I've had to refactor and sometimes rewrite over the years. Obviously, some of it has to do with trying to get a product out on crazy schedules, which requires making compromises, but I can tell the difference between compromise and an "I don't really care" attitude, and I'm talking about the latter. It's a pity though - not only for the one who has to sort out and sometimes clean up the mess, but also for the one who creates it. Things are so much more fun when you care and are passionate about what you do that I'm sorry for those who are missing out on it.
End of rant. Not sure it belongs here but had to get it out of my system :-)
If that's all it was, then I wouldn't bother to bring it up. What I'm saying is that more focus on the beauty of a solution (e.g. robustness, scalability, flexibility, simplicity) can be more financially sound in the long run than brute force. Of course, I'm not holding my breath for this to become the norm, since using brute force to "get what we want now" is popular because it gets fast results (and this is not limited to the software industry) - I'm just suggesting the real costs come later.
As for explaining being a tough skill - what I find in common with the previous point is the lack of stress on keeping things as simple and obvious as possible. If you truly understand something, you should be able to convey that understanding assuming you really want to - it just takes investment, usually by asking yourself why it is obvious to you, then putting yourself in the other party's shoes and then finding the shortest and clearest path to bring them from where they are to where you are. In my experience it's not a tough skill - just requires introspection into your own understanding and not assuming anything about what the other person understands.
I was not trying to dismiss what you said. If anything, I was condensing it into my understanding.
The point is not that brute force is better. The essence is that most of us are not seeking an elegant solution to a problem. We are seeking a solution to a problem. Often, just getting that answer is all that matters. Finding a more concise way to get it is something I fully agree someone should be trying to do. And, in the long term, it is a huge boon if it is found. For most tasks, though, the original solution is all that was needed.
Related observation: it is not very reliable across repeated runs.
I ran the sim a few times and got wildly different results.
- First time (about average) it stabilised (score >10,000) at Generation 18.
- The quickest stabilisation was at Generation 3.
- Sometimes it got to Generation 50 without stabilising.
Brute force will eventually get there, but I guess based on so few parameters it is easy to create a misleading weighting and difficult to un-learn that.
Right. But consider that if we just want a network that can play a perfect game of Flappy Bird, the correct answer is to find the parameters of a success and store those. Anything else is then just excess data. (Here I'm treating the program that found the answer as data - data that happened to be executed to produce the perfect player.)
Interestingly, the updates are done on the basis of only 2 external inputs [1]: the height of the bird in the screen and height of the aperture in the next pipe. Using only these two parameters the neural network decides whether to flap or not.
I would have expected at least the horizontal distance to the next pipe as well...
This isn't my field, but given the simplicity of the inputs and network, and the way commenters are seeing the demo achieve perfect play after anything between 2 and 200 generations, it makes me wonder if this isn't more of a brute-force search than actual learning?
That is, it smells like there's a "correct" set of neuron values - where any genome within some tolerance of those values wins forever, and any other genome dies quickly. If that's the case, the system can't really evolve towards a solution, can it? It would just cycle randomly through lots of genomes that die immediately, until by pure chance one lives forever. I only tried the demo a few times but that's what it looked like it was doing.
"All" evolutionary algorithms are basically a local search with smart heuristics. Where a local search is a brute-force where you move in small directions based on feedback on where you are in the solution space.
I understand how the algorithms work. What I'm suggesting is that the demo seems to behave like a hill-climbing algorithm that's been unleashed on a terrain that's flat everywhere except the solution.
Not really; if you add new random individuals then you're doing some global search (or you just have a higher mutation rate - but that has some problems).
Given the perfect play reported elsewhere, this suggests that the holes are just tall enough to accommodate a single flap, so the network essentially just has to learn the less-than function and then tune the threshold.
edit: With sigmoid or hard-threshold activation, this function is really simple to implement. If we want to flap iff bird is lower than pipe, we can do that with no hidden layers and a weight vector of [1 -1]. I'd be curious to see someone fork and implement this.
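As a rough sketch of that minimal network - no hidden layer, weights [1, -1], sigmoid output - where the function name and the screen-coordinate convention (y grows downward) are my assumptions, not code from game.js:

```javascript
// A "network" with no hidden layer: two inputs, one output neuron.
// With weights [1, -1] and zero bias: output = sigmoid(birdY - holeY).
const sigmoid = x => 1 / (1 + Math.exp(-x));

function shouldFlap(birdY, holeY) {
  // In canvas coordinates y grows downward, so birdY > holeY means the
  // bird is below the hole; the sigmoid then exceeds 0.5 and we flap.
  return sigmoid(1 * birdY + (-1) * holeY) > 0.5;
}
```

With a hard threshold at 0.5 this is exactly the less-than function; the GA only has to find weights of opposite sign and roughly equal magnitude.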
> Given the perfect play reported elsewhere, this suggests that the holes are just tall enough to accommodate a single flap, so the network essentially just has to learn the less-than function and then tune the threshold.
Yes, slightly disappointingly the neural network can be replaced by a single line that flaps whenever the bird is below the next hole.
The fact that the pipes are evenly spaced doesn't make horizontal distance go away; the distance to the next pipe is still a variable at each flap decision.
What does make the distance irrelevant is that the holes are high enough to safely flap whilst inside them, so you never have to make a timed leap through, a luxury not afforded to users of the original game if I recall.
Yes. And the optimal policy is trivial given those inputs, as long as the aperture between pipes is larger than the "jump" height of the bird: if bird y + bird y velocity > bottom pipe y, jump. So finding that with a neural network or genetic algorithm is fairly silly.
If the aperture is smaller than the jump height, then you need to do something smart to time your jumps.
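That hand-coded policy is a one-liner; here it is as a sketch (the function name and the downward-y coordinate convention are assumptions):

```javascript
// No learning at all: jump exactly when the bird's next position would
// dip below the top edge of the lower pipe. Canvas y grows downward;
// bottomPipeY marks the lower boundary of the gap.
function shouldJump(birdY, birdVy, bottomPipeY) {
  return birdY + birdVy > bottomPipeY;
}
```

This works only while the gap is taller than one flap's rise - exactly the condition discussed above.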
At generation 83, a single bird emerged that successfully navigated through several hundred columns (current score 100000+) and showed no signs of failing.
It took a few dozen generations to find versions that would make it through a few columns if those columns had gaps without too much vertical distance between them. Somewhere in generation 60-70, I could see versions figuring out how to transition between heights, but failing by overshooting, or not accounting for gravity on the far side of the gap. But once it figured out how to transition between distant heights, it had something that kept working.
Same. Mine is on generation 22 and is the first one to get past even 30 seconds. So far no sign of stopping.
edit: Gen 22 still going strong @ 1 bird, up to 231,000 points. Looking at game.js it's a fairly simple model, I suspect once you get a bird that makes it past the 1-minute mark it will be able to continue forever.
EDIT - to be more clear: the first generation died in a couple of seconds. A bird from the second generation was immediately able to navigate forever. Either this was my chance to win the national lottery (what a way to waste it) or something's wrong.
For me it was the 1st generation, and I was wondering what this project was all about. Then I hit F5; now it's 50 generations and still no successful bird in sight.
What's interesting is I've run it several times now, and I've gotten the perfect bird about half the time on an early generation (<10), but sometimes it just doesn't want to find it for a while. I'm wondering if the net is just at the threshold of being the minimal set of neurons that are needed for it to be able to get it so easily by random chance.
It's always different, and you can get to 1000 points if you are lucky with the map. But the one that got to 1000 might not survive a gap that didn't show up in the last map.
Right now I am running a bird from generation 3 with a 26k+ score.
How is it possible to get such good specimens after so few generations? I'm guessing part of it is the simplicity of the model and the environment (the game), but still.
On my second run, 3 individuals seemed to have learned perfect play by gen 8, but after successfully navigating through dozens of pipes, two of them died and one continued indefinitely.
This reminds me of a section (I think it might be the beginning) of "Surely You're Joking, Mr. Feynman!" where he recounts fixing radios as a youngster, and the outrage of a client of his that he at times would just sit and stare at the radio. Apparently the guy went around town telling people about this incredible thing he had witnessed -- this kid he knew fixed radios "by thinking."
For every incredibly complicated algorithm designed to solve some really huge problem and deployed at massive scales, there are another thousand little problems at a far lower scale. I did a project at a school once using a GA to automatically create class rosters based on gender balance, student social networks, and grade distribution. This used to be a two-week long guidance counselor project at the school. Now they press a button.
Everyone in this forum clearly understands the impact that software is having on the world. But nevertheless, I still think we underestimate the impact that a more computationally literate society with more readily available computing power can have. I think examples like this excellently written piece of code highlight that.
Here is one -- admittedly very bright -- person who can, from the comfort of his browser, come up with such a fascinating and ingenious solution to this computational problem. Yes, it is being used for Flappy Bird, but this could easily extend to a whole range of problem domains that he, his company, or his community could be facing. Yes, it is Flappy Bird, but this alone could serve as the motivation for an incredible high-school course dealing with topics from coding to the theory of evolution. "The Evolution of Cooperation" had a similar effect on me -- Axelrod uses nothing but very elementary algebra to make such a strong and insightful argument that it got me thinking like nothing ever had before.
Here's hoping we make this kind of "play", and this sort of thinking, as widespread as possible in society.
When I first watched it, nothing changed for about 200 generations (thanks for the x5 speed-up). I wanted to comment on that, but decided to reload first. Now my 12th generation is flapping for a good two minutes while I'm writing this comment.
It seems only the second time there was machine learning involved.
It's also using a simple neural network, which is the target of the genetic algorithm, if I understood it right. I haven't seen this combination often - does that make sense in general, or is this just interesting as in playing around with those concepts?
it's a pretty standard procedure to train NNs through GAs, but usually not very efficient (e.g. compared to backpropagation).
in some cases you might lack an easy way to calculate a fitness score out of the NN performance, which is needed to run the GA.
i tried training a simple NN with a stupid hill climber some time ago but quickly hit a roadblock even with very few neurons because of local minima ... or maybe bugs.
i guess for more complicated problems the pure GA training method might just not be "cost effective" (i.e. time/quality tradeoff).
> it's a pretty standard procedure to train NNs through GAs, but usually not very efficient (e.g. compared to backpropagation).
Different applications, though. Backpropagation needs inputs and expected outputs (e.g. lots of training data), while a GA/EA learns to solve the task without explicit wrong/correct actions, just from the score at the end.
Hmmm.. I made the pipes move up and down and now the neural network isn't doing so well :P
I've added "vertical velocity of next pipe" and "distance until next pipe" as inputs as well as upped the hidden layer neurons to 4 - anyone have any other ideas to improve its performance?
This sparked in me a months-long fascination with genetic algorithms solving programming puzzles given a few sets of 'initial memory states' and 'desired memory states'.
The articles start with very simplistic generated programs, and end with a sophisticated model able to generate guess-the-number games using loops, conditionals and handling input from users.
This is really cool, and I'm nowhere near smart enough to fully understand how it works, but I'm wondering: is the "computer" blind in this case? It doesn't seem to understand, even after ~50 runs, that it needs to aim for the opening. It seems to simply rely on luck, hoping that some line up. And if it lines up a few times, it will get further, so it is learning something, but still not seeing the change the next time.
Edit: Seeing the new comments here, it seems like it only starts learning after you refresh once or twice. Very cool!
For fun, try replacing the setTimeout usage in game.js with calls to setImmediate[+]. My gen 57 birdie (x4) is increasing its score by approx 2,000 points/sec.
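For reference, the swap looks something like this (the loop shape is an assumption about game.js, not its actual code; note that setImmediate is native only to Node.js and old IE - in other browsers you'd need a postMessage-based shim):

```javascript
// Browsers clamp nested setTimeout callbacks to a ~4 ms minimum, which
// caps the simulation rate; setImmediate has no such clamp, so the
// update loop runs as fast as the CPU allows.
function runLoop(update, useImmediate, maxFrames) {
  const schedule = useImmediate
    ? cb => setImmediate(cb)
    : cb => setTimeout(cb, 1000 / 60); // original ~60 fps pacing
  let frame = 0;
  (function tick() {
    update(frame++);
    if (frame < maxFrames) schedule(tick);
  })();
}
```

With rendering decoupled from the physics step, the only remaining limit on points/sec is the CPU.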