Piezo mics are pretty cheap, and if you wired one up to the microphone input of a computer or phone you could probably get better accuracy too, using the same signal processing techniques.
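For the curious, the core of it (timing the ticks) can be sketched in a few lines. This is a toy version: it assumes a clean mono signal and does naive peak picking rather than real DSP, and the sample rate and threshold are just illustrative numbers:

```python
import numpy as np

def estimate_rate(samples, sr, threshold=0.5):
    """Estimate beats per hour from a mono audio signal of watch ticks.

    Finds amplitude peaks above `threshold` (relative to the loudest
    sample), at least 50 ms apart, and converts the mean tick interval
    to beats per hour.
    """
    env = np.abs(samples) / np.max(np.abs(samples))
    min_gap = int(0.05 * sr)  # ignore peaks closer than 50 ms
    peaks, last = [], -min_gap
    for i in range(1, len(env) - 1):
        if env[i] >= threshold and env[i] >= env[i - 1] and env[i] >= env[i + 1]:
            if i - last >= min_gap:
                peaks.append(i)
                last = i
    intervals = np.diff(peaks) / sr      # seconds between ticks
    return 3600.0 / np.mean(intervals)   # beats per hour

# Synthetic test signal: 6 ticks/second, i.e. 21600 bph (a common movement)
sr = 6000
t = np.zeros(sr * 2)
for k in range(0, len(t), sr // 6):
    t[k] = 1.0
print(round(estimate_rate(t, sr)))  # 21600
```

A real recording would need a bandpass filter and an envelope follower first, but the interval-to-bph math is the same.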
The astounding thing about Goliath wasn’t that it was a huge leap in performance, it was that the damn thing functioned at all. To this day, I still don’t understand why this didn’t raise more eyebrows.
This wasn't something I really dug into in great detail, but I remember my surprise back then at how all those merged models, and those "expanded" models like Goliath, still generated coherent output. IMO those were community models made by small creators for entertainment rather than work, and only really of interest to the local LLM groups on Reddit, 4chan, and Discord. People might briefly discuss one on the board and say "that's cool," but papers aren't being written about them, and academics or corporate researchers are less likely to notice.
That being said, I wonder if it's possible to combine the layers of completely different models, like say a Llama and a Qwen, and still get it to work.
Even with math probes, I hit unexpected problems. LLMs fail arithmetic in weird ways. They don’t get the answer wrong so much as get it almost right but forget to write the last digit, as if it got bored mid-number. Or they transpose two digits in the middle. Or they output the correct number with a trailing character that breaks the parser.
Would grammar-constrained decoding help here, by forcing the LLM to only output the expected tokens (i.e. digits)? Or maybe on the scoring side you could look at the actual per-token probabilities to see how much weight the model put on the correct digit.
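I haven't tried this, but assuming your inference API exposes per-token logprobs (several do), the scoring idea could look something like this sketch. The digit distributions here are made up for illustration:

```python
import math

def digit_score(expected, logprobs_per_pos):
    """Score an arithmetic answer using per-position digit probabilities.

    `expected` is the correct answer as a string, e.g. "427".
    `logprobs_per_pos` is a list (one entry per output position) of
    dicts mapping digit tokens to log-probabilities, as some inference
    APIs expose for the sampled token and its top alternatives.
    Returns the average probability mass on the right digit, so a
    near-miss (correct digit ranked second) scores better than garbage.
    """
    total = 0.0
    for digit, dist in zip(expected, logprobs_per_pos):
        total += math.exp(dist.get(digit, float("-inf")))
    return total / len(expected)

# Toy example: the model is confident on "4" and "2" but wavers on the
# last digit, actually ranking the wrong one first.
dists = [
    {"4": -0.05, "5": -3.2},
    {"2": -0.10, "3": -2.9},
    {"7": -0.9, "1": -0.6},
]
print(round(digit_score("427", dists), 3))
```

That way a "transposed two digits" failure gets partial credit instead of the binary pass/fail a string comparison gives you.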
I think the main challenge with combining layers of different models would be their differing embedding sizes and potentially different vocabularies.
Even between two models of identical architecture, they may have landed on quite different internal representations if the training data recipe was substantially different.
Even with the same embedding sizes and vocabularies, nothing forces dimension 1 of model 1 to mean the same thing as dimension 1 of model 2. There are lots of ways to permute the dimensions of a model without changing its output, so whatever dimension 1 means the first time you train a model is just as likely to end up as dimension 2 the second time you train it as it is to stay consistent with the first model.
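You can see that permutation symmetry with a toy two-layer MLP: shuffle the hidden dimensions of the first weight matrix and the corresponding columns of the second, and the network computes exactly the same function:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3

# A tiny two-layer MLP: y = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(d_hidden, d_in))
W2 = rng.normal(size=(d_out, d_hidden))
x = rng.normal(size=d_in)

relu = lambda v: np.maximum(v, 0)
y = W2 @ relu(W1 @ x)

# Permute the hidden dimensions: shuffle rows of W1 and, to match,
# the columns of W2. The function computed is unchanged.
perm = rng.permutation(d_hidden)
y_perm = W2[:, perm] @ relu(W1[perm] @ x)

print(np.allclose(y, y_perm))  # True
```

So two independently trained models are, in effect, sitting at two arbitrary points in that permutation orbit, and there's no reason their hidden dimensions line up.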
Nobody here or on Reddit has mentioned this, maybe because it's too obvious, but it's clear to me that the residual connections are an absolutely necessary component to making this merging possible: they're the only reason dimension 1 of a later layer is encouraged to mean something similar to dimension 1 of an earlier layer.
It’s a good spot for hobbyists to fill in the gaps. Maybe it’s not interesting enough for academics to study, and corporate ML teams would probably just fine-tune something that exists rather than spend time on surgery. Even Chinese labs, which are more resource constrained, don’t care as much about 4090-scale models.
I mean, from a privacy perspective alone it's clear that Meta throws its ethics out the door. There's the Cambridge Analytica scandal, the more recent incident with Instagram bypassing Android OS restrictions for more tracking, and many, many other examples.
Their apps also regularly nag you to allow access to stuff like contacts and the photo gallery when you've already said no the first time.
And for a personal anecdote: I was recently helping a senior set up WhatsApp Desktop on her Windows computer. It could chat fine but refused to join calls, displaying an error that said there was no microphone connected. Except there was a mic connected, and it could record voice notes fine. Turns out the error actually meant there was no webcam connected, and a webcam is required to join calls. I think it's the same in the mobile app, where you need to grant the camera permission to join a video call even if you turn video off. Meanwhile Zoom, Teams, Webex, and others let you join any call without a mic or camera.
As she didn't have a webcam, I first tried the OBS virtual camera, but WhatsApp refused to recognize it despite every other app working fine with it. Somehow DroidCam with no phone connected worked, displaying a black screen in the virtual camera feed, and that got WhatsApp to join the call successfully. Absolutely ridiculous, and it's clear to me how desperately they want that camera access and that sweet data.
See, this is why I made a comment in that Apple thread (see my post history) about stopping Facebook from doing things like this. I was told "Android can do it too." Yes, but no. Apple may do evil things, but they punished Facebook for their bullshit by revoking their certificate. The landscape of contact info (phone numbers, email addresses, social media accounts: "people just submitted it, they trust me, dumb f-") means you can't have bad-faith actors like Zuckerberg Zucking about. WhatsApp is such a clear antitrust case, just for starters.
Edit: sorry, that wasn't entirely clear. I mean we need Apple's system of granularity: "Deny access to contacts" needs to work even when the asking company (Facebook) tries to trick people.
Personally, I wonder if, even as the LLM hype dies down, we'll get a new boom in AI for robotics and the "digital twin" technology Nvidia has been hyping up to train them. That's going to need GPUs for both the ML component and the 3D visualization. Robots haven't had their SD 1.1 or GPT-3 moment yet; in LLM terms we're still in the early days of Pythia, GPT-J, AI Dungeon, etc.
That would tank the stock price, though, since that's a much smaller market than AI, but it wouldn't kill the company. Hence why I'm talking about something like robotics, which has a lot of room to grow and could make use of all those chips and datacenters they're building.
Now, there is one thing in AR/VR that might need this kind of infrastructure: AI-driven games, Holodeck-like stuff. Basically have the frames be generated rather than modeled and rendered traditionally.
Nvidia's not your average bear, they can walk and chew bubblegum at the same time. CUDA was developed off money made from GeForce products, and now RTX products are being subsidized by the money made on CUDA compute. If an enormous demand for efficient raster compute arises, Nvidia doesn't have to pivot much further than increasing their GPU supply.
Robotics is a bit of a "flying car" application that gets people to think outside the box. Right now, both Russia and Ukraine are using Nvidia hardware in drones and cruise missiles and C2 as well. The United States will join them if a peer conflict breaks out, and if push comes to shove then Europe will too. This is the kind of volatility that crazy people love to go long on.
I feel that the push will not be towards a general computing device, though, but rather a curated computing device, sort of like the iPhone or iPad: general in theory but actually vendor restricted inside a walled garden.
With improved cellular and possibly future satellite connectivity I feel that this would also be more of a thin client than a local first device, since companies want that recurring cloud subscription revenue over a single lump sum.
Keep in mind bitrot is a real thing if you roll your own storage. While most cloud storage solutions store multiple copies of your data, I'm not sure all of them have a system that checks for and repairs bitrot.
I love my ZFS server as it handles all that transparently but that's really not an option for everyone.
I think I have some bitrot in my photo collection, there are a few pictures that seem to be broken, but it's far less than 1%. I'm fine with it. I could probably restore most of those images if I tried.
After I got my server going, I transferred all my photos over and ran a utility overnight to check them for corruption (the name escapes me, but it was an open source CLI program). A small number of images were corrupted; most of those I replaced with thankfully pristine backup copies, and the rest were restored with minor visual glitches.
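If you don't remember the name of a tool like that either, the detection half is simple to roll yourself. A sketch that builds a SHA-256 manifest of a directory and diffs it against a later run (detection only, no repair):

```python
import hashlib
import os

def build_manifest(root):
    """Hash every file under `root` so later runs can detect silent corruption."""
    manifest = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large photos don't blow up memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest

def find_changed(old, new):
    """Files present in both manifests whose hashes no longer match."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

Save the first manifest somewhere safe (ideally on different media), rerun periodically, and any file in `find_changed` that you didn't deliberately edit is a bitrot candidate to restore from backup. This is basically what ZFS scrubbing does for you automatically.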
That's fine for the main article, but I think there should be a way to get higher quality images should the reader request them. If power is a concern, those can be hosted elsewhere.
I think it's acceptable for the drawings to be compressed this way but the photographs are very unclear.
The issue is that it's hacky, and in that case I'd rather go with an Intel or AMD x86 system with more or less out-of-the-box Linux support. What we're looking for is a performant ARM system where Linux is a first-class citizen.
Seems some people have done this already with a PC app: https://timeandtidewatches.com/how-to-make-your-own-timegrap...