> figure out a way where we can do things like randomizing questions but preserving the integrity of it to ensure a fair evaluation, etc
How does any of this have anything to do with copyright infringement in the context of a DMCA takedown? Do you even own the copyright to the allegedly leaked solution?
> The wiki listing of packages acts as the "registry" and populates the clib-search(1) results.
This seems to be an extremely bad idea. Since everyone can edit GitHub wiki pages, the registry is vulnerable to all kinds of malicious attacks.
> Importantly, you must do this for every open branch in your repo. It is not enough to do so for only your default branch since a malicious PR can target any of your open branches. That is, if you have an open branch that uses a vulnerable version of check-spelling then a malicious PR targeting that branch can leak a GITHUB_TOKEN which can then be used to impact any of your branches, including your default branch.
I think this is a big design flaw in GitHub Actions. Whenever there is a security patch, you have to make sure to apply it to every branch, including all the historical and stale branches that the repo owners forgot to delete.
Hard to follow this, because I'm mostly on the consuming end of CIs or only occasionally do some basic things. Although, having recently tried GHA, setting it up from scratch seems almost trivial even for complex setups. But the security of GHA seems more than shaky.
> I think this is a big design flaw in GitHub Actions. Whenever there is a security patch, you have to make sure to apply them in every branch.
On the other hand I think every action needs to be initialized once on the main branch.
If it's pulling the actions from git using a fixed commit, then a workaround could be to break history from before the vulnerability was introduced; then it wouldn't be possible to pull the vulnerable actions. GitHub GCs unreachable commits quite aggressively.
Compared with Rust’s Result monad, which allows developers to clearly see the effects of error handling, there are two other hidden fallible effects in Rust that are much harder to tackle:
* Panic unwinding. I am not sure how to ensure that a Rust function is panic safe. It is quite easy to cause soundness issues if some invariants no longer hold after a panic. I sometimes see guard types like `PanicGuard` used in the Rust standard library.
* Future cancellation. `tokio::select!` is one of the infamous examples, where it is quite easy to introduce bugs if a future cannot handle cancellation gracefully.
When trying to handle them properly, it feels more like writing traditional C code than Rust.
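To illustrate the panic-safety point: the usual defense is a drop guard whose `Drop` impl restores the invariant whether the function returns normally or unwinds. A minimal std-only sketch (function and type names are hypothetical, not the std-internal `PanicGuard`):

```rust
use std::cell::Cell;
use std::panic::{catch_unwind, AssertUnwindSafe};

// Maintains a recursion-depth invariant that must hold even if `f` panics.
fn with_depth<R>(depth: &Cell<usize>, f: impl FnOnce() -> R) -> R {
    struct Guard<'a>(&'a Cell<usize>);
    impl Drop for Guard<'_> {
        fn drop(&mut self) {
            // Runs on normal return *and* during panic unwinding.
            self.0.set(self.0.get() - 1);
        }
    }
    depth.set(depth.get() + 1);
    let _guard = Guard(depth);
    f() // if f() panics, Guard::drop still restores the invariant
}

fn main() {
    let depth = Cell::new(0);
    assert_eq!(with_depth(&depth, || 42), 42);
    assert_eq!(depth.get(), 0);

    // Without the guard, a panic here would leave depth stuck at 1.
    let result = catch_unwind(AssertUnwindSafe(|| {
        with_depth(&depth, || panic!("boom"))
    }));
    assert!(result.is_err());
    assert_eq!(depth.get(), 0); // invariant restored despite the panic
}
```

The awkward part is that *every* invariant touched across a potential panic point needs such a guard, which is exactly the C-like manual bookkeeping the parent describes.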
>I am not sure how to ensure your Rust function being panic safe.
You could use the linking trick in which your panic handler calls a non-existent extern fn, so linking fails if any panic path survives optimization. For example, this approach is used in the no-panic crate. Of course, it is nothing more than a clever hack with several significant limitations.
>Future cancellation
I would say it's a more general problem of Rust lacking linear types.
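To make the cancellation hazard concrete: a Rust future is cancelled simply by never polling it again (e.g. when it loses a `select!` race and is dropped), so any partial state it built up between polls is silently abandoned. A minimal std-only sketch, with a hand-rolled no-op waker instead of tokio (all names hypothetical):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A future that needs two polls to finish; `done` records completion.
struct TwoStep { polls: u32, done: bool }

impl Future for TwoStep {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.polls >= 2 {
            self.done = true;
            Poll::Ready(self.polls)
        } else {
            Poll::Pending
        }
    }
}

// Minimal waker that does nothing when woken.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    let mut fut = TwoStep { polls: 0, done: false };
    assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    // Simply stopping here *is* cancellation: no one ever polls again.
    assert!(!fut.done);        // partial state abandoned mid-way
    assert_eq!(fut.polls, 1);

    // For contrast, a future that is polled to completion:
    let mut fut2 = TwoStep { polls: 0, done: false };
    assert!(Pin::new(&mut fut2).poll(&mut cx).is_pending());
    assert_eq!(Pin::new(&mut fut2).poll(&mut cx), Poll::Ready(2));
    assert!(fut2.done);
}
```

Nothing forces the caller to finish `fut`; linear types would make "this future must be driven to completion" expressible, which is the gap the parent is pointing at.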
IMHO both panics and async are a mistake. The latter should be a standard library with some macros and the former should be replaced by proper error handling everywhere.
Yes, out-of-memory errors should be handleable. You might not care when writing web-backend slop designed to run under orchestrators, but for systems code it is sometimes necessary, and Rust is a systems language.
It's possible to work around both shortcomings, but they contradict the language's mission and are warts. Async is a big "excessive cleverness" mistake.
I’d go further and say it’s not possible to fully implement async/await without compiler help.
I got really far with stateful, back in 2016 [1]. Stateful was an attempt to write a coroutine library in a proc macro, which generated state machines, as opposed to using OS primitives like green threading. This was back before the Rust community really started working in this space. I ended up extracting the type system from rustc to do much of the analysis, but it ultimately failed because of how difficult it was to output Rust code that respected the borrow-checker rules. I also didn't have anything like the pinning system, so I couldn't catch move issues either.
It was a much better idea to just implement this in the compiler.
Personally I just loathe async in general. Go has it right, but I understand why you can’t do that in Rust. Async is an ugly workaround for the inefficiency of OS threads, and I wish they would just fix that so we can stop all this madness.
May I ask what missing fields you are referring to? Why @online/@software/@dataset type in biblatex [1] cannot do the job?
That being said, I think GitHub should acknowledge that it is common for authors to want people to cite their paper (or multiple papers) rather than simply the source code, because that is what counts toward citations in academia. At the same time, there is no reason not to support bibtex/biblatex in addition to CFF.
@software is simply an alias for the fallback @misc, i.e., semantics are lost: no fields for different URLs for different software media (code, build artifacts, etc.), no software-identifier support, etc.
Also, you can have people cite your paper on GitHub by specifying it as a preferred citation in CFF, and GitHub will render that instead of the source-code citation.
Which is, btw, against the software citation principles [1], but caters to people who need time adapting and want traditional credit now.
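For reference, a minimal hypothetical `CITATION.cff` sketch using the `preferred-citation` key mentioned above (project and paper names are made up):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite the paper below."
title: "example-project"   # hypothetical repository name
authors:
  - family-names: "Doe"
    given-names: "Jane"
preferred-citation:
  type: article
  title: "An Example Paper About This Software"
  authors:
    - family-names: "Doe"
      given-names: "Jane"
  journal: "Journal of Examples"
  year: 2021
```

With this file in the repository root, GitHub's "Cite this repository" button offers the paper rather than the code.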
In addition to the attacks that have been widely discussed on HN, such as converting a legit image so it is detected as CSAM (false positive) or perturbing a real CSAM image to evade detection (false negative), I think this can also be used to mount a DoS attack or to censor arbitrary images.
It works like this. First, find your target images, which are either widely shared (like internet memes, for a DoS attack) or images you want to censor. Then, compute their NeuralHash. Next, use the hash-collision tool to perturb real CSAM images so they have the same NeuralHash as the target images. Finally, report these adversarial CSAM images to the authorities. The result is that the attacker has successfully added the targeted NeuralHash to the CSAM database, and people who store the legit images will then be flagged.
You can technically hide an adversarial collision inside a completely legit, normal image. It won't be visible to human eyes, but it will trigger a detection. You can also do the complete opposite: perturb a CSAM image so it outputs a completely different hash and circumvents detection. All of these vulnerabilities are well known for perceptual hashes.
So right there seems to be an issue to me. It seems like if you were trading in CSAM, you would run CLEANER -all on anything and everything, because you know someone has already written that as a proof of concept here.
> Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.
This is a little unexpected. I'm not sure whether this has any implications for CSAM detection as a whole. Wouldn't this require Apple to add multiple versions of the NeuralHash of the same image (one for each platform/hardware) to the database to counter this issue? If that is the case, doesn't this in turn weaken the detection threshold, since the same image may match multiple times on different devices?
This may explain why they (weirdly) only announced it for iOS and iPadOS; as far as I can tell, they didn't announce it for macOS.
My first thought was that they didn't want to make the model too easily accessible by putting it on macOS, in order to avoid adversarial attacks.
But knowing this now, Intel Macs are an issue (not, as I previously wrote, because they differ in floating-point implementation from ARM; thanks my123 for the correction) because they would have to run the network on a wide variety of GPUs (at the very least multiple AMD archs and Intel's iGPU), so maybe that also factored into their decision? They would have had to deploy multiple models and (I believe, unless they could make the models converge exactly?) multiple distinct databases server-side to check against.
To people knowledgeable on the topic, would having two versions of the models increase the attack surface ?
Edit: Also, I didn't realise that, because of how perceptual hashes work, they would need their own matching threshold, independent of the "30 pictures matched to launch a human review" one. Apple's communication push implied exact matches. I'm not sure they used the right tool here (putting aside for now the fact that this is running client-side).
Is it? I checked your link, and they clearly separate which features come to which OS. Here's how I read it:
- Communication safety in Messages
> "This feature is coming in an update later this year to accounts set up as families in iCloud for iOS 15, iPadOS 15, and macOS Monterey."
- CSAM detection
> "To help address this, new technology in iOS and iPadOS"
- Expanding guidance in Siri and Search
> "These updates to Siri and Search are coming later this year in an update to iOS 15, iPadOS 15, watchOS 8, and macOS Monterey."
So while the two other features are coming, the CSAM detection is singled out as not coming to macOS.
But! At the same time (and I saw this after the editing window closed), the GitHub repo clearly states that you can get the models from macOS builds 11.4 onwards:
> If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).
So my best guess is that they trialed it on macOS as they did on iOS (and put the model there, contrary to what I had assumed) but chose not to enable it yet, perhaps because of the rounding-error issue, or something else.
Edit: This repo by KhaosT refers to 11.3 for the API availability, but it's in the same ballpark: Apple is already shipping it as part of their Vision framework, under an obfuscated class name, and the code sample runs the model directly on macOS: https://github.com/KhaosT/nhcalc/blob/5f5260295ba584019cbad6...
Ah, good catch and write-up. I believe you're right, and it's likely a matter of time for the Mac. Hard to tell if this means it's shipping with macOS but just not enabled yet.
My bad, I edited the previous post, thanks for this. Assuming this runs on Intel's iGPU, they would still need the ability to run on AMD GPUs for the iMac Pro and Mac Pro, so that's at least two extra separate cases.
This basically invalidates any claims Apple made about accuracy, and brings up an interesting point about the hashing mechanism: it seems two visually similar images will also have similar hashes. This is interesting because humans quickly learn such patterns: for example, many here will know what dQw4w9WgXcQ is without thinking about it at all.
> it seems two visually similar images will also have similar hashes
This is by design: the whole idea of a perceptual hash is that the more similar two hashes are, the more similar the two images are, so I don't think it invalidates any claims.
Perceptual hashes are different from cryptographic hashes, where any change in the message completely changes the hash.
"Hash" is applied correctly here. A hash function is "any function that can be used to map data of arbitrary size to fixed-size values." The properties of being an (essentially) unique fingerprint, or of small changes in input causing large changes in output, are properties of cryptographic hashes. Perceptual hashes do not have those properties.
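To make the distinction concrete, here is a toy sketch of a perceptual hash: a simple "average hash" over a 16-pixel image (not Apple's NeuralHash, just the idea). Similar inputs land on nearby hashes, measured by Hamming distance; a cryptographic hash would scatter them:

```rust
// Toy average hash: one bit per pixel, set if the pixel is brighter
// than the image mean. Visually similar images keep the same pattern.
fn ahash(pixels: &[u8; 16]) -> u64 {
    let mean = pixels.iter().map(|&p| p as u32).sum::<u32>() / 16;
    pixels.iter().enumerate().fold(0u64, |h, (i, &p)| {
        if p as u32 > mean { h | (1u64 << i) } else { h }
    })
}

// Hamming distance: how many bits differ between two hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

fn main() {
    let img: [u8; 16] = [10, 200, 10, 200, 10, 200, 10, 200,
                         10, 200, 10, 200, 10, 200, 10, 200];
    let brighter = img.map(|p| p + 5);   // "same" image, slightly brightened
    let inverted = img.map(|p| 210 - p); // a very different image

    assert_eq!(hamming(ahash(&img), ahash(&brighter)), 0);  // near-duplicate: identical hash
    assert_eq!(hamming(ahash(&img), ahash(&inverted)), 16); // different image: far apart
}
```

A small brightness change leaves the hash untouched, which is exactly the property a cryptographic hash is designed *not* to have.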
Good explanation, thanks. I only knew about cryptographic hashes, or those that are used for hash tables where you absolutely do not want to have collisions. Anyhow, I'm not really comfortable with this usage of the word "hash". It is completely opposite of the meaning I'm used to.
> The whole idea of a perceptual hash is that the more similar the two hashes are, the more similar the two images are
This has already been shown to be inaccurate: adversarial hashes and collisions are possible in the system. You don't have to be very skeptically minded to think this is intentional. Links to examples have already been posted in this thread.
You are banking on an ideal scenario of this technology not the reality.
> Wouldn't this require Apple to add multiple versions of NeuralHash of the same image (one for each platform/hardware) into the database to counter this issue?
Not if their processor architectures are all the same, or close enough that they can write (and have written) an emulation layer to get bit-identical behaviour.
I think it would just require generating the table of hashes once on each type of hardware in use (whether CPU or GPU), then doing the lookup only in the table that matches the hardware that generated it.
To re-do the hashes, you would need to run it on the original offending photo database, which -- as an unofficial party doing so -- could land you in trouble, wouldn't it?
And what if you re-do the hashes on a Mac with auto-backup to iCloud: next thing you know, the entire offending database has been synced into your iCloud account :-/
You're thinking of cryptographic hashes. There are many kinds of hash (geographic, perceptual, semantic, etc), many of which are designed to only be slightly different.
See https://hackertimes.com/item?id=28105849, which shows a POC to generate adversarial collisions for any neural network based perceptual hash scheme. The reason it works is because "(the network) is continuous(ly differentiable) and vulnerable to (gradient-)optimisation based attack".