This is one of those fantastic "all that is done with so little code?!" moments.
Amazing that this can be done in 300 lines of perfectly readable JS without any libraries. And the author apparently wrote it in 2 days. Great intro to genetic algorithms and reinforcement learning. Universities should teach like that.
Edit:
Somebody asked about the big picture of how the thing works:
You use a neural network to decide at each step whether to flap the bird's wings or not. That is the only output. The input is only the bird's height (y-position) and the height of the next hole. (https://en.wikipedia.org/wiki/Feedforward_neural_network)
The question now is how to find the correct weights for the net.
That net is not trained in a traditional supervised way, e.g. with gradient descent (which would be more complicated). Instead it uses a genetic algorithm to find new weights for the neural networks of future generations. That is: keep some of the best individuals, add some randomly generated ones, and add some created by breeding from existing individuals. (https://en.wikipedia.org/wiki/Genetic_algorithm)
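The selection scheme just described - keep some of the best, add some random newcomers, breed the rest - can be sketched roughly like this (the function name, elitism fraction, and mutation rate are illustrative assumptions, not the actual game.js code):

```javascript
// One generation step of a simple genetic algorithm over weight vectors.
// Each individual is just a flat array of network weights.
function nextGeneration(population, scores, popSize) {
  // Rank individuals by fitness (their game score), best first.
  const ranked = population
    .map((weights, i) => ({ weights, score: scores[i] }))
    .sort((a, b) => b.score - a.score);

  const next = [];

  // Elitism: carry the top performers over unchanged.
  const eliteCount = Math.max(1, Math.floor(popSize * 0.2));
  for (let i = 0; i < eliteCount; i++) next.push(ranked[i].weights.slice());

  // A few fresh random individuals keep exploring the weight space.
  const randomCount = Math.floor(popSize * 0.2);
  const n = population[0].length;
  for (let i = 0; i < randomCount; i++) {
    next.push(Array.from({ length: n }, () => Math.random() * 2 - 1));
  }

  // Fill the rest by breeding: crossover of two good parents plus mutation.
  while (next.length < popSize) {
    const a = ranked[Math.floor(Math.random() * eliteCount)].weights;
    const b = ranked[Math.floor(Math.random() * eliteCount)].weights;
    const child = a.map((w, i) => {
      let v = Math.random() < 0.5 ? w : b[i];            // crossover
      if (Math.random() < 0.1) v += Math.random() - 0.5; // mutation
      return v;
    });
    next.push(child);
  }
  return next;
}
```

No gradients anywhere: the only signal the algorithm ever sees is each bird's final score.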
These are exactly the gems that I am constantly in search of, and I am thrilled every time I come across one.
This is going to turn into a rant, but sometimes it feels like there's a culture of solving problems by a philosophy of "let's throw more money at it, more technology, more people, more unnecessary abstraction layers, complexity and bloat - it's bound to be solved eventually". And with enough firepower, it usually is - but at what cost... I'm not sure what the source of the problem is, though, and it's not only with code. Many times I struggle to understand certain concepts via professional literature, and when I finally understand them, I'm puzzled as to why they are taught in a way that, if I didn't know better, I'd assume was a deliberate attempt to confuse the reader instead of making the learning simple, intuitive and fun.
Exactly the same applies to code I've had to refactor and sometimes rewrite over the years. Obviously, some of it has to do with trying to get a product out on crazy schedules, which requires making compromises, but I can tell the difference between compromise and an "I don't really care" attitude, and I'm talking about the latter. It's a pity though - not only for the one who has to sort out and sometimes clean up the mess, but also for the one who creates it. Things are so much more fun when you care and are passionate about what you do that I'm sorry for those who are missing out on it.
End of rant. Not sure it belongs here but had to get it out of my system :-)
If that's all it was, then I wouldn't bother to bring it up. What I'm saying is that more focus on the beauty of a solution (e.g. robustness, scalability, flexibility, simplicity) can be more financially sound in the long run than brute force. Of course, I'm not holding my breath for this to become the norm, since using brute force to "get what we want now" is popular because it gets fast results (and this is not limited to the software industry) - I'm just suggesting the real costs come later.
As for explaining being a tough skill - what I find in common with the previous point is the lack of stress on keeping things as simple and obvious as possible. If you truly understand something, you should be able to convey that understanding assuming you really want to - it just takes investment, usually by asking yourself why it is obvious to you, then putting yourself in the other party's shoes and then finding the shortest and clearest path to bring them from where they are to where you are. In my experience it's not a tough skill - just requires introspection into your own understanding and not assuming anything about what the other person understands.
I was not trying to dismiss what you said. If anything, I was condensing it into my understanding.
The point is not that brute force is better. The essence is that most of us are not seeking an elegant solution to a problem. We are seeking a solution to a problem. Often, just getting that answer is all that matters. Finding a more concise way to get it is something I fully agree someone should be trying to do. And, in the long term, it is a huge boon if it is found. For most tasks, though, the original solution is all that was needed.
Related observation: it is not very reliable across repeated runs.
I ran the sim a few times and got wildly different results.
- First time (about average) it stabilised (score >10,000) at Generation 18.
- The quickest stabilisation was at Generation 3.
- Sometimes it got to Generation 50 without stabilising.
Brute force will eventually get there, but I guess based on so few parameters it is easy to create a misleading weighting and difficult to un-learn that.
Right. But consider that if we just want a network that can play a perfect game of Flappy Bird, the correct answer is to find the parameters of a success and store those. Anything else is then just excess data. (Here I'm treating the program that found the answer as data - data that happened to be executed to produce the perfect player.)
Interestingly, the updates are done on the basis of only 2 external inputs [1]: the height of the bird in the screen and height of the aperture in the next pipe. Using only these two parameters the neural network decides whether to flap or not.
I would have expected at least the horizontal distance to the next pipe as well...
This isn't my field, but given the simplicity of the inputs and network, and the way commenters are seeing the demo achieve perfect play after anything between 2 and 200 generations, it makes me wonder if this isn't more of a brute-force search than actual learning?
That is, it smells like there's a "correct" set of neuron values - where any genome within some tolerance of those values wins forever, and any other genome dies quickly. If that's the case, the system can't really evolve towards a solution, can it? It would just cycle randomly through lots of genomes that die immediately, until by pure chance one lives forever. I only tried the demo a few times but that's what it looked like it was doing.
"All" evolutionary algorithms are basically a local search with smart heuristics. Where a local search is a brute-force where you move in small directions based on feedback on where you are in the solution space.
I understand how the algorithms work. What I'm suggesting is that the demo seems to behave like a hill-climbing algorithm that's been unleashed on a terrain that's flat everywhere except the solution.
Not really; if you add new random individuals then you're doing some global search (or you just have a higher mutation rate - but that has some problems).
Given the perfect play reported elsewhere, this suggests that the holes are just tall enough to accommodate a single flap, so the network essentially just has to learn the less-than function and then tune the threshold.
edit: With sigmoid or hard-threshold activation, this function is really simple to implement. If we want to flap iff bird is lower than pipe, we can do that with no hidden layers and a weight vector of [1 -1]. I'd be curious to see someone fork and implement this.
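As a rough sketch of that minimal network - no hidden layer, weights [1, -1], sigmoid output - where the function name and the screen-coordinate convention (y grows downward) are my assumptions, not code from game.js:

```javascript
// A "network" with no hidden layer: two inputs, one output neuron.
// With weights [1, -1] and zero bias: output = sigmoid(birdY - holeY).
const sigmoid = x => 1 / (1 + Math.exp(-x));

function shouldFlap(birdY, holeY) {
  // In canvas coordinates y grows downward, so birdY > holeY means the
  // bird is below the hole; the sigmoid then exceeds 0.5 and we flap.
  return sigmoid(1 * birdY + (-1) * holeY) > 0.5;
}
```

With a hard threshold at 0.5 this is exactly the less-than function; the GA only has to find weights of opposite sign and roughly equal magnitude.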
> Given the perfect play reported elsewhere, this suggests that the holes are just tall enough to accommodate a single flap, so the network essentially just has to learn the less-than function and then tune the threshold.
Yes, slightly disappointingly the neural network can be replaced by a single line that flaps whenever the bird is below the next hole.
The fact that the pipes are evenly spaced doesn't make horizontal distance go away; the distance to the next pipe is still a variable at each flap decision.
What does make the distance irrelevant is that the holes are high enough to safely flap whilst inside them, so you never have to make a timed leap through, a luxury not afforded to users of the original game if I recall.
Yes. And the optimal policy is trivial given those inputs, as long as the aperture between pipes is larger than the "jump" height of the bird: if bird y + bird y velocity > bottom pipe y, jump. So finding that with a neural network or genetic algorithm is fairly silly.
If the aperture is smaller than the jump height, then you need to do something smart to time your jumps.
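That hand-coded policy is a one-liner; here it is as a sketch (the function name and the downward-y coordinate convention are assumptions):

```javascript
// No learning at all: jump exactly when the bird's next position would
// dip below the top edge of the lower pipe. Canvas y grows downward;
// bottomPipeY marks the lower boundary of the gap.
function shouldJump(birdY, birdVy, bottomPipeY) {
  return birdY + birdVy > bottomPipeY;
}
```

This works only while the gap is taller than one flap's rise - exactly the condition discussed above.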
At generation 83, a single bird emerged that successfully navigated through several hundred columns (current score 100000+) and showed no signs of failing.
It took a few dozen generations to find versions that would make it through a few columns if those columns had gaps without too much vertical distance between them. Somewhere in generation 60-70, I could see versions figuring out how to transition between heights, but failing by overshooting, or not accounting for gravity on the far side of the gap. But once it figured out how to transition between distant heights, it had something that kept working.
Same. Mine is on generation 22 and is the first one to get past even 30 seconds. So far no sign of stopping.
edit: Gen 22 still going strong @ 1 bird, up to 231,000 points. Looking at game.js it's a fairly simple model, I suspect once you get a bird that makes it past the 1-minute mark it will be able to continue forever.
EDIT - to be more clear: the first generation died in a couple of seconds. A bird from the second generation was immediately able to navigate forever. Either this was my chance to win the national lottery (what a way to waste it) or something's wrong.
For me it was the 1st generation, and I was wondering what this project was all about. Then I hit F5; now it's 50 generations and still no successful bird in sight.
What's interesting is I've run it several times now, and I've gotten the perfect bird about half the time on an early generation (<10), but sometimes it just doesn't want to find it for a while. I'm wondering if the net is just at the threshold of being the minimal set of neurons that are needed for it to be able to get it so easily by random chance.
It's always different, and you can get to 1000 points if you are lucky with the map. But the one that got to 1000 might not survive a gap that didn't show up in the last map.
Right now I am running a bird from generation 3 with a 26k+ score.
How is it possible to get such good specimens after so few generations? I'm guessing part of it is the simplicity of the model and the environment (the game), but still.
On my second run, 3 individuals seemed to have learned perfect play by gen 8, but after successfully navigating through dozens of pipes, two of them died and one continued indefinitely.
This reminds me of a section (I think it might be the beginning) of "Surely You're Joking, Mr. Feynman!" where he recounts fixing radios as a youngster, and the outrage of a client of his that he at times would just sit and stare at the radio. Apparently the guy went around town telling people about this incredible thing he had witnessed -- this kid he knew fixed radios "by thinking."
For every incredibly complicated algorithm designed to solve some really huge problem and deployed at massive scales, there are another thousand little problems at a far lower scale. I did a project at a school once using a GA to automatically create class rosters based on gender balance, student social networks, and grade distribution. This used to be a two-week long guidance counselor project at the school. Now they press a button.
Everyone in this forum clearly understands the impact that software is having on the world. But nevertheless, I still think we underestimate the impact that a more computationally literate society with more readily available computing power can have. I think examples like this excellently written piece of code highlight that.
Here is one -- admittedly very bright -- person who can, from the comfort of his browser, come up with such a fascinating and ingenious solution to this computational problem. Yes, it is being used for Flappy Bird, but this could easily extend to a whole range of problem domains that he, his company, or his community could be facing. Yes, it is Flappy Bird, but this alone could serve as the motivation for an incredible high-school course dealing with topics from coding to the theory of evolution. "The Evolution of Cooperation" had a similar effect on me -- Axelrod uses nothing but very elementary algebra to make such a strong and insightful argument that it got me thinking like nothing ever had before.
Here's hoping we make this kind of "play", and this sort of thinking, as widespread as possible in society.
When I first watched it, nothing changed for about 200 generations (thanks for the x5 speed-up). I wanted to comment on that, but decided to reload first. Now my 12th generation is flapping for a good two minutes while I'm writing this comment.
It seems only the second time there was machine learning involved.
It's also using a simple neural network, which is the target of the genetic algorithm, if I understood it right. I haven't seen this combination often - does that make sense in general, or is this just interesting as in playing around with those concepts?
it's a pretty standard procedure to train NNs through GAs, but usually not very efficient (e.g. compared to backpropagation).
in some cases you might lack an easy way to calculate a fitness score out of the NN performance, which is needed to run the GA.
i tried training a simple NN with a stupid hill climber some time ago but quickly hit a roadblock even with very few neurons because of local minima ... or maybe bugs.
i guess for more complicated problems the pure GA training method might just not be "cost effective" (i.e. time/quality tradeoff).
> it's a pretty standard procedure to train NNs through GAs, but usually not very efficient (e.g. compared to backpropagation).
Different applications, though. Backpropagation needs inputs and expected outputs (e.g. lots of training data), while a GA/EA learns to solve the task without explicit wrong/correct actions, just from the score at the end.
Hmmm.. I made the pipes move up and down and now the neural network isn't doing so well :P
I've added "vertical velocity of next pipe" and "distance until next pipe" as inputs as well as upped the hidden layer neurons to 4 - anyone have any other ideas to improve its performance?
This sparked in me a months-long fascination with genetic algorithms solving programming puzzles given a few sets of 'initial memory states' and 'desired memory states'.
The articles start with very simplistic generated programs, and end with a sophisticated model able to generate guess-the-number games using loops, conditionals and handling input from users.
This is really cool, and I'm nowhere near smart enough to fully understand how it works, but I'm wondering: is the "computer" blind in this case? It doesn't seem to understand, even after ~50 runs, that it needs to aim for the opening. It seems to simply rely on luck, hoping that some line up. And if it lines up a few times, it will get further, so it is learning something, but still not seeing the change the next time.
Edit: Seeing the new comments here, it seems like it only starts learning after you refresh once or twice. Very cool!
For fun, try replacing the setTimeout usage in game.js with calls to setImmediate[+]. My gen 57 birdie (x4) is increasing its score by approx 2,000 points/sec.
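For reference, the swap looks something like this (the loop shape is an assumption about game.js, not its actual code; note that setImmediate is native only to Node.js and old IE - in other browsers you'd need a postMessage-based shim):

```javascript
// Browsers clamp nested setTimeout callbacks to a ~4 ms minimum, which
// caps the simulation rate; setImmediate has no such clamp, so the
// update loop runs as fast as the CPU allows.
function runLoop(update, useImmediate, maxFrames) {
  const schedule = useImmediate
    ? cb => setImmediate(cb)
    : cb => setTimeout(cb, 1000 / 60); // original ~60 fps pacing
  let frame = 0;
  (function tick() {
    update(frame++);
    if (frame < maxFrames) schedule(tick);
  })();
}
```

With rendering decoupled from the physics step, the only remaining limit on points/sec is the CPU.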