Interestingly, the updates are done on the basis of only two external inputs [1]: the bird's height on the screen and the height of the aperture in the next pipe. Using only these two parameters, the neural network decides whether to flap or not.
I would have expected at least the horizontal distance to the next pipe as well...
This isn't my field, but given the simplicity of the inputs and network, and the way commenters are seeing the demo achieve perfect play after anything between 2 and 200 generations, it makes me wonder if this isn't more of a brute-force search than actual learning?
That is, it smells like there's a "correct" set of neuron values - where any genome within some tolerance of those values wins forever, and any other genome dies quickly. If that's the case, the system can't really evolve towards a solution, can it? It would just cycle randomly through lots of genomes that die immediately, until by pure chance one lives forever. I only tried the demo a few times but that's what it looked like it was doing.
"All" evolutionary algorithms are basically a local search with smart heuristics. Where a local search is a brute-force where you move in small directions based on feedback on where you are in the solution space.
I understand how the algorithms work. What I'm suggesting is that the demo seems to behave like a hill-climbing algorithm that's been unleashed on a terrain that's flat everywhere except the solution.
Not really: if you add new individuals at random then you're doing some global search as well (or you can just use a higher mutation rate, but that has its own problems).
Given the perfect play reported elsewhere, this suggests that the holes are just tall enough to accommodate a single flap, so the network essentially just has to learn the less-than function and then tune the threshold.
edit: With sigmoid or hard-threshold activation, this function is really simple to implement. If we want to flap iff bird is lower than pipe, we can do that with no hidden layers and a weight vector of [1 -1]. I'd be curious to see someone fork and implement this.
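A minimal sketch of what that comment describes, not taken from the linked repo: a single sigmoid neuron with no hidden layers, weight vector [1, -1], and zero bias, deciding from the two inputs whether to flap. Function names and the screen-coordinate convention (y grows downward, so a larger y means lower on screen) are assumptions for illustration.

```javascript
// Assumed convention: screen coordinates, y grows downward.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// One output neuron, no hidden layers: activation = sigmoid(1*birdY - 1*holeY).
// Names are illustrative, not from the FlappyLearning codebase.
function shouldFlap(birdY, holeY) {
  const weights = [1, -1];
  const activation = sigmoid(weights[0] * birdY + weights[1] * holeY);
  // sigmoid(x) > 0.5 exactly when x > 0, i.e. when the bird is below the hole.
  return activation > 0.5;
}
```

With a hard-threshold activation the sigmoid disappears entirely and the comparison `birdY - holeY > threshold` is all that remains, which is the "less-than function plus a tuned threshold" the parent comment mentions.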
> Given the perfect play reported elsewhere, this suggests that the holes are just tall enough to accommodate a single flap, so the network essentially just has to learn the less-than function and then tune the threshold.
Yes, slightly disappointingly, the neural network can be replaced by a single line of code.
The fact that the pipes are evenly spaced doesn't make horizontal distance go away; the distance to the next pipe is still a variable at each flap decision.
What does make the distance irrelevant is that the holes are tall enough to flap safely while inside them, so you never have to time a leap through the gap, a luxury not afforded to players of the original game, if I recall correctly.
Yes. And the optimal policy is trivial given those inputs, as long as the aperture between pipes is larger than the "jump" height of the bird: if bird y + bird y velocity > bottom pipe y, jump. So finding that with a neural network or genetic algorithm is fairly silly.
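The trivial policy described above can be written out directly. This is a sketch under assumed conventions (screen coordinates, y growing downward, `bottomPipeY` being the y of the gap's lower edge); the names are illustrative, not from the repo:

```javascript
// Jump when the bird's next position would fall below the bottom of the gap.
// Assumes screen coordinates: y grows downward, velocity is positive when falling.
function optimalPolicy(birdY, birdVelocityY, bottomPipeY) {
  return birdY + birdVelocityY > bottomPipeY; // true means "jump"
}
```

Note this only works when the gap is taller than a single jump, as the parent says; otherwise the jump would need to be timed against the horizontal distance too.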
If the aperture is smaller than the jump height, then you need to do something smart to time your jumps.
[1] https://github.com/xviniette/FlappyLearning/blob/gh-pages/ga...