
I think one should distinguish between 'all necessary information is already in the pixel-space' vs 'we already know how to extract all the information needed from pixel-space'

The fact that (most) humans manage to drive around safely and successfully on current roads proves that the information needed exists in pixel-space (not just the current image, but the current image plus history). We don't yet have stacks that can successfully extract everything needed from this information, but I don't think Dr. Karpathy ever claimed that we do.

(I am not a principal engineer but a mere PhD student who argues daily with people about how RGB information is underappreciated and underutilized)



> The fact that (most) humans manage to drive around safely and successfully in current roads proves that the information needed exists in the pixel-space

But that doesn't mean that it translates to a car.

We constantly move our 576MP resolution eyes in multiple orientations in order to visualise a scene and focus on the most important areas. Cars have fixed, low-quality cameras.

We then interpret this data using the most advanced pattern recognition system the world has ever seen, trained for 20+ years to fully comprehend the behaviour of everything this planet has to offer. Cars don't have anything close to this.


You seem to be suggesting that 576 MP resolution (where did you even get that number, given people still argue about what a fair comparison between the human eye and a camera even looks like?) or having to move your head and eyes to take in your surroundings, rather than having multiple fixed cameras covering the entire surroundings all the time, is a good thing. If resolution mattered that much, every car would have ultra-high-resolution cameras on it.

Humans certainly have a stronger and more general prior for making sense of the information, and that's exactly why I left it as a possibility. Cars don't *yet* have anything close to it, just as they didn't have a way to accurately detect objects a few years ago, and just as they didn't have a way to capture RGB information a few decades ago.

I am an optimistic guy, and I certainly believe in the power of learning at scale.


> 576MP

Actually our eyes are more like 8MP: https://www.picturecorrect.com/what-is-the-resolution-of-the...

Perhaps we achieve a higher synthetic resolution by moving our eyes about, or perhaps that comparison is meaningless.


It could be reframed as saying we have a peak acuity equivalent to a 576MP camera with the same FOV, with a theoretical maximum of 20 samples per second (50 ms to move between targets; realistically probably single digits). The 8MP comparison only matters if there are so many targets needing constant full resolution that you can't focus on all of them, or if a target is larger than the peak-acuity FOV. In practice neither is the case: we can identify something once and keep tracking it in the periphery without issue, and anything that large is likely to be extremely easy to identify.
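For what it's worth, the oft-quoted 576 MP figure can be reproduced with a back-of-envelope calculation. This is a sketch using parameters commonly attributed to Roger Clark's estimate (0.3 arc-minute acuity over a 120° x 120° field), not numbers stated anywhere in this thread, and it illustrates exactly the simplification being debated: that peak foveal acuity is applied to the whole field of view, when the fovea itself only covers about 2°.

```python
# Reconstructing the "576 MP" eye-resolution estimate.
# Assumed parameters (not from this thread): peak resolvable detail of
# 0.3 arc-minutes, applied uniformly over a square 120-degree field.

ARCMIN_PER_DEG = 60
acuity_arcmin = 0.3   # finest resolvable detail, in arc-minutes
fov_deg = 120         # assumed square field of view, in degrees

samples_per_side = fov_deg * ARCMIN_PER_DEG / acuity_arcmin
total_mp = samples_per_side ** 2 / 1e6
print(f"whole-FOV equivalent: {total_mp:.0f} MP")   # 576 MP

# The same acuity confined to the ~2-degree fovea is tiny by comparison:
fovea_mp = (2 * ARCMIN_PER_DEG / acuity_arcmin) ** 2 / 1e6
print(f"foveal patch alone:   {fovea_mp:.2f} MP")   # 0.16 MP
```

So both sides of the thread can be right: 576 MP describes the scene you could resolve by scanning, while any single fixation resolves far less at full acuity.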


That doesn't make sense: a camera doesn't get more pixels just because it is recording video while tracking something. Nor would it if it had a zoom lens and a controlled gimbal.


If you turn that tracked video into a panorama, it would. Or if you took 10 zoomed photos and stitched them over top of an unzoomed photo. The point is that unless the task demands more focus areas than the eye can attend to in a given window, the visual acuity (for the parts of the scene that matter) is higher than an 8MP shot of the entire scene.


I'll agree with you that there are still techniques to be discovered.

I also agree that most humans manage to drive in challenging conditions, but their margins for error become slimmer and slimmer. I personally want my autonomous robot vehicle to be far more efficient and safer than the best human operator, and able to deal with conditions that would make any sane human pull to the side of the road.


Definitely agree with your second point! In theory, the faster reaction time and complete environmental awareness should by themselves make an autonomous system far safer than human drivers.

In some ways, I am against the philosophy of using HD maps + LIDAR data for highly accurate localization, which most companies seem to be following these days. I believe this approach is inherently brittle and is an 'easy way out' of the hard localization problem. I think more resources should be put into developing more natural techniques with no dependency on HD maps.

PS: It is my understanding that most of the major players were using HD maps; I'm not sure whether that is still true.


>their margins for error become slimmer and slimmer.

Can you elaborate on this? I've always felt that the margins of error are getting wider because automotive tech (particularly safety features) is so vastly improved. I doubt people would be able to text and drive as much, for example, if they were driving a 1950s-era Willys jeep, just because it requires so much more attention to keep on the road compared to a modern vehicle.


bleh. auto accidents are the #1 preventable cause of death for kids:

https://en.wikipedia.org/wiki/Preventable_causes_of_death#Am...




