
I'm not an expert, but my guess is that layering makes it really easy to materialize the weights between adjacent layers as a 2D matrix, and we have some really good library code out there for dealing with matrices.

There's probably room for experimenting with more novel constructions and even some papers out there looking into the matter.
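To make the "weights as a 2D matrix" point concrete, here is a minimal sketch in plain Python. The names `matvec` and `dense_layer`, the sigmoid activation, and the example numbers are illustrative assumptions, not anything from a particular library; real code would hand the multiply to an optimized matrix routine.

```python
import math


def sigmoid(z):
    """Logistic function, used here as an assumed activation."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(weights, inputs):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def dense_layer(weights, biases, inputs):
    """One fully connected layer: activation(W @ x + b).

    Row i of `weights` holds every edge weight going into node i of the
    next layer, so the whole layer-to-layer wiring is a single 2D matrix.
    """
    pre_activations = matvec(weights, inputs)
    return [sigmoid(p + b) for p, b in zip(pre_activations, biases)]


# Hypothetical layer: 3 input nodes feeding 2 output nodes.
W = [[0.5, -1.0, 0.25],
     [1.0, 0.0, -0.5]]
b = [0.0, 0.1]
x = [1.0, 2.0, 4.0]

out = dense_layer(W, b, x)
print(out)
```

Because each layer is one matrix-vector product, stacking layers is just repeating this call, which is the part existing matrix libraries handle well.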



Efficiency is the expected answer. I'm just wondering if there's a more theoretical reason, such as "every function that can be computed by a non-layered acyclic network can be computed by a complete layered network using only a small number of extra nodes/layers."


I think that it can. With some weights of 0 and some weights of 1, you can map 'jumps' that skip from a node in one layer to a node a couple of layers distant, by means of intermediate nodes that incorporate no other inputs, right? One catch: sigmoid(1) is about 0.73, not 1, so those relay nodes would need an identity-like activation (or the downstream weights would have to compensate) to pass a value through unchanged. Once you have those, it's just a matter of how many layers you need for any acyclic structure, I think.

Although if you wanted to come up with difficult scenarios, it's not hard to think of structures that would make some of those middle layers really tall, or add a lot of middle layers.
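The relay-node idea above can be sketched as follows. This is only an illustration under stated assumptions: the tiny network, its weights, and the function names `skip_network` and `layered_network` are made up, and the relay node is allowed a ReLU activation (an assumption that goes beyond an all-sigmoid model) so that a value in (0, 1) passes through unchanged. With a sigmoid relay the value would be squashed, since sigmoid(1) is roughly 0.731 rather than 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def weighted_sum(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical weights for a 2-input network.
W1 = [[0.8, -0.3], [0.2, 0.6]]   # input -> layer 1 (2 sigmoid nodes)
W2 = [1.5, -0.7]                 # layer 1 -> layer 2 (1 sigmoid node)
V = 0.9                          # layer 2 -> output
S = -1.2                         # skip edge: layer 1 node 0 -> output


def skip_network(x):
    """Non-layered version: the output reads layer 2 AND layer 1 directly."""
    h1 = [sigmoid(weighted_sum(row, x)) for row in W1]
    h2 = sigmoid(weighted_sum(W2, h1))
    return sigmoid(V * h2 + S * h1[0])


def layered_network(x):
    """Strictly layered version: a relay node in layer 2 carries h1[0] forward.

    The relay uses weights [1, 0] on layer 1 and a ReLU activation; since
    h1[0] is a sigmoid output and therefore positive, ReLU leaves it unchanged.
    """
    h1 = [sigmoid(weighted_sum(row, x)) for row in W1]
    h2 = sigmoid(weighted_sum(W2, h1))
    relay = relu(weighted_sum([1.0, 0.0], h1))
    # The output now only sees layer 2: [h2, relay].
    return sigmoid(weighted_sum([V, S], [h2, relay]))


for x in ([0.0, 0.0], [1.0, -2.0], [3.5, 0.25]):
    assert abs(skip_network(x) - layered_network(x)) < 1e-12
```

Each skipped layer costs one extra relay node per skipping edge, which is where the "really tall middle layers" concern comes from: many long-range edges mean many relay nodes in every layer they cross.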


As I mentioned in another branch of this thread, selectively choosing edges between nodes isn't an option, because in the standard model you have complete incidence between nodes in adjacent layers.



