
This is amazing. Counter to what most people think, the majority of memory bugs come from out-of-bounds access, not things like forgetting to free a pointer.


Personally, as someone who has worked in C and C++ for the last few years, memory-access errors are almost never the root cause of a bug. It's almost always logic errors: not accounting for all paths, not handling edge cases, not being able to handle certain combinations of user or file input, etc.

Occasionally an out-of-bounds access pops up, but they're generally so blindingly obvious and easy to fix that it's never been the slow part of bug fixing.


I've been programming for a long time; the ratio of memory errors to logic bugs in production is so low as to be non-existent.

My last memory error in C code in production was in 2018. Prior to that, I had a memory error in C code in production in 2007 or 2008.

In C++, I eventually gave up trying to ship the same level of quality and left the language altogether.


The wider industry data indicates that for memory-unsafe languages, 80% of issues are due to memory vulnerabilities, including in mature codebases like the Linux kernel, curl, V8, Chrome, the Mach kernel, qemu, etc. This doesn't mean that logic bugs are less common; it just means that memory safety issues are the easiest way to get access.

As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.

Keep in mind the data for this comes from popular projects that have enough attention to warrant active exploit research by a wide population. This is different from a project you wrote that doesn’t have the same level of attention.


> The wider industry data gathered indicates that for memory unsafe languages 80% of issues are due to memory vulnerabilities, including mature codebases like Linux kernel, curl, V8, Chrome, Mach kernel, qemu etc etc etc.

You are misremembering the various reports - the reports were not that 80%[1] of issues were due to memory errors, but more along the lines of 80% of exploits were due to memory errors.

You could have 1000 bugs, with 10 of them being vulnerabilities, and 8 of those 10 being due to memory errors, and that would still be in line with the reports.

> As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.

Payments processing, telecoms and munitions control software.

Of those, your explanation only applies to telecoms; payments processing (EMV) was under a basically constant stream of daily attacks, while munitions are live, in the field, with real explosives. With the munitions, we would've noticed any bugs, not just memory errors.

--------------------

[1] The number wasn't 80% IIRC, more like 70%?


Sorry, I didn’t misremember, but I did write it down without double-checking (see another comment where I got it right). I did indeed mean that 80% of security vulnerabilities are caused by memory safety issues.

For EMV, you had C connected directly to the network under a steady stream of attacks and only had an issue once? I find that hard to believe. What’s more likely is a Java webserver frontend talking to some C processing/crypto code, in which case, again, you’re less likely to encounter bugs in your code because it’s difficult to find a path to injecting unsanitized input.

For munitions, there’s generally no I/O with uncontrolled input, so it’s less likely you’d hit cases where you didn’t properly sanitize inputs and relied on an untrusted length to access a buffer. As a famous quote states, it’s OK if your code has an uptime of 3 minutes until the first bug if the bomb explodes in 2.


> For EMV you had C connected directly to the network under a steady stream of attacks and only had an issue once? I find that hard to believe. What’s more likely is a Java websever frontend talking to some C processing / crypto

EMV terminals. No Java involved.

> As a famous quote states, it’s ok if your code has an uptime of 3 minutes until the first bug if the bomb explodes in 2

Look, first you commented that it wasn't possible for nontrivial or non-networked devices; now you're trivialising code that, if wrong, directly killed people!

All through the 80s, 90s and 2000s (and even now, believe it or not), the world was filled with millions and millions of devices programmed in C, and yet you did not live a life where all the devices around you routinely crashed.

Cars, microwaves, security systems... they didn't routinely crash even though they were written in C.


EMV terminals are not under daily cybersecurity attack - you need physical access unless you designed your system weirdly. You probably had loads of vulnerabilities. But also, depending on when you did it, all you had to process was a bar code, which also isn’t some super-complicated task.

I’m not trivializing the safety of munitions. I’m attempting to highlight that safety and stability in a munitions context are very different, and memory safety issues could easily exist without you realizing it. My overall point is that you are silently making the argument that C programmers (or programmers in general) used to be better, which is a wild argument to be making about a culture in which fuzzing didn’t even exist as a concept.

You’re also conflating memory safety with crashing. That simply isn’t the case - a memory safety issue is more often exploitable as a security vulnerability than it is an immediate crash, by violating assumptions that weren’t in the happy path of those microwaves and security systems. Millions of devices were, and still are, routinely exploitable.

You’re also following a fallacious line of reasoning that the C of today is the same C that was in use in the 80s, 90s, and 2000s. It’s not, and it has gotten harder and more dangerous because a) multithreading became more of a thing and b) compiler authors started exploiting “undefined” behaviors for extra optimization.

It’s just wild for me to encounter someone who believes C is a safe language suitable to connect to I/O when there’s so much anecdotal and statistical evidence gathered that that’s not the case. Even SQLite, the darling of the C community, is not safe if asked to open arbitrary SQLite files - various security attacks are known and possible.


> EMV terminals are not under daily cybersecurity attack - you need to have physical access unless you designed your system weirdly.

They are under daily attack - in public, at tills, operated by minimum-wage earners.

> You probably had loads of vulnerabilities.

Sure. Hundreds of thousands of terminals sitting in the field, networked, under the control of minimum wage employees, each holding credit card details for hundreds of cards at a time...

Yeah, you're right, not a target at all!

> But also depending on when you did it, all you had to process was a bar code which is also isn’t some super complicated task.

You are hopelessly naive. Even in the magstripe era, certification was not easy.

> It’s just wild for me to encounter someone who believes C is a safe language

When did you meet this person?

Look, the bottom line is, the rate of errors due to memory safety in programs written in C is so small it's a rounding error. It's not even statistical noise. You spent your life surrounded by these programs that, if they went wrong, would kill you, and yet here you are, not only arguing from a place of ignorance, but reveling in it.

Just out of interest, have you ever used an LLM to write code for you?


Physical attacks are difficult to pull off at scale, especially anonymously. There’s a huge evidence trail linking the people involved to the scheme. And a device being in the hands of a minimum-wage employee is very different from a bored, talented, highly skilled person probing your software remotely. Now who’s naive?

As for certification being difficult, what does that have to do with the price of bread in Paris? Unless you’re somehow equating certification with a stamp of imperviousness to vulnerabilities, in which case you’re seeing your own naivete instead of others’. BTW, Target was fully certified and still had their payment system breached - not through the terminals, but through the PoS backend. And as for “but you’re here living and breathing”: there are constant security breaches through whatever hole, memory safety or otherwise. Persistent access into a network is generally only obtainable through credential compromise or memory safety.

> When did you meet this person?

You. You’re here claiming that memory safety issues are statistical noise, yet every piece of cloud software I’ve seen deployed regularly had them in the field, sometimes even letting a bad one through to canary. And memory safety issues persisted despite repeated attempts to fix them, and you couldn’t even tell whether something was legitimately a software issue or just a hardware flaw, because at sufficient deployment scale you’re observing bad components. It’s a real problem, and claiming it’s statistical noise ignores the consequences of even one such issue being easily accessible.


> You. You’re here claiming that memory safety issues are statistical noise yet

Claiming that the exploit rate percentage is statistical noise is different from claiming that it's a safe language.

Looks like you came with a premade argument to argue.

You haven't answered my question, though: Have you used LLMs to generate any code for yourself?


Yes. The problem is that most memory errors (out of bounds + use after free etc.) result in a vulnerability. Only a minority of the logic errors do.

For operating system kernels, browsers, etc., vulnerabilities have a much, much bigger impact than logic errors: vulnerabilities need to be fixed and released immediately. Most logic errors don't (sure, it depends on the issue and on the type of software).

I would probably say "for memory unsafe languages, 80% of the _impact_ is due to memory vulnerabilities"


logic errors aren't memory errors, unless you have some complex piece of logic for deallocating resources, which, yeah, is always tricky and should just generally be avoided


"Majority" could mean a few things; I wouldn't be surprised if the majority of discovered memory bugs are spatial, but I'd expect the majority of widely exploited memory bugs to be temporal (or pseudo-temporal, like type confusions).


I think UAFs are more common in mature software


Or type confusion bugs, or any other stuff that stems from complex logic having complex bugs.

Boundary checking for array indexing is table stakes.
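"Table stakes" here can be as small as a checked accessor that carries the length with the buffer instead of trusting the caller. A minimal sketch (function name and signature are illustrative, not from any particular library):

```c
#include <stdbool.h>
#include <stddef.h>

/* Checked array access: refuses out-of-range indices instead of
 * reading past the end of `buf`. The length travels alongside the
 * buffer rather than being assumed by the caller. */
bool get_at(const int *buf, size_t len, size_t idx, int *out) {
    if (idx >= len)
        return false;  /* reject rather than read out of bounds */
    *out = buf[idx];
    return true;
}
```

Callers are then forced to handle the failure case, e.g. `if (!get_at(arr, n, i, &v)) { /* handle bad index */ }`.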


Table stakes, but people still mess it up constantly. The "yeah, but that's only a problem if you're an idiot" approach to this kind of thing hasn't served us very well, so it's good to see something actually being done.

Trains shouldn't collide if the driver is correctly observing the signals; that's table stakes too. But rather than exclusively focusing on improving track to reduce derailments, we also install train protection systems that automatically intervene when the driver does miss a signal, because that happens a lot more often than a derailment. Even though "pay attention, see red signal? stop!" is conceptually super easy.


I'm not saying it's not important; it is. I just don't believe that '[the] majority of memory bugs are from out of bounds access'. That was maybe true 20 years ago, when an unbounded strcpy to an unprotected return pointer on the stack was super common, and exploiting that kind of vulnerability was what most vulndev consisted of.

This brings C one tiny step closer to the state of the art, which is commendable, but I don't believe codebases which start using this will reduce their published vulnerability count significantly. Making use of this requires effort and diligence, and I believe most codebases that can expend such effort already have a pretty good security track record.


The majority of security vulnerabilities in languages like C that aren’t memory-safe are due to memory safety issues like UAF, buffer overflows, etc. I don’t think I’ve seen finer-grained research that tries to break it out by class of memory safety issue. The data is something like 80% of reported vulnerabilities in code written in these languages being due to memory safety issues. This doesn’t mean there aren’t other issues; it just means it’s the cheapest exploit to search for when you’re trying to break into a C/C++ service.

And in terms of how easy it is to convert a memory safety issue into an exploit, it’s not meaningfully much harder. The harder pieces are when sandboxing comes into play so that for example exploiting V8 doesn’t give you arbitrary broader access if the compromised process is itself sandboxed.


There is use after free


Majority. Parent said majority.


Exactly. Use after free is common enough that you can't just assert that out-of-bounds is the majority without evidence.


Actually, you may be right. According to Project Zero by Google [1], ~50% are use-after-free and only ~20% are out-of-bounds errors. However, this is for errors that resulted in major exploits; I'm not sure what the overall data is.

[1] https://projectzero.google/2022/04/the-more-you-know-more-yo...



