This is just a smart move by Satya Nadella after the non-standard drama that occurred with OpenAI a few months back where it nearly imploded and then didn't.
You want both a backup for OpenAI as well as negotiating leverage if OpenAI gets too powerful and this achieves both.
It's also a good play to try to take resources away from local, self-hosted "Feasible AI" solutions. With compute resources, I think Microsoft hopes Mistral skews their focus and resources towards large models that can only run in the cloud, trying to lure them away with the bait:
"Don't you want to build the best AI possible, independent of compute?"
I'd be surprised if they didn't consider the notion that they are hitting two birds with one stone: OpenAI and Indie AI.
It's not like Microsoft is working on "Windows AI Studio" [1], or released Orca, or Phi.
It's not like there's any talk of AI PCs with mandatory TOPS requirements for Windows 12.
Big bad Microsoft coming for your local AI, beware.
> Mistral Remove "Committing to open models" from their website
That was 5 hours ago.
Without having insider details it is hard to know why, but the coincidence of timing with the Microsoft deal is not lost on me. It could have even been a stipulation.
I have no explanation for why Microsoft has started aggressively innovating again (with the introduction of Satya) other than my theory that the US DoD realized the country's tool of dominance in the future will be predominantly tech superiority rather than military power. Microsoft's new strategy of running everything in the cloud aligns with this, even if it may also have been motivated by the fact that most people now only own a battery-constrained mobile device and laptops are getting smaller and thinner.
From my understanding, which may be wrong, you only need the massive compute resources initially to create a compiled vector space LLM - and then that LLM once compiled can be run locally?
Is this why anti-CSAM policy measures are possible, in that compiled-release LLMs can have certain vector spaces removed before release? But apparently people are creating cracks for these types of locks?
You are a little confused. There’s no “compiling” of LLMs. It’s just once it’s trained, inference takes less compute than further training. So you can run things locally that you couldn’t necessarily train locally.
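To put rough numbers on the training vs. inference gap, here is a back-of-the-envelope sketch. It assumes the commonly cited rule of thumb for dense transformers (about 6 FLOPs per parameter per training token, about 2 FLOPs per parameter per generated token). The model size and token counts are hypothetical, picked only for illustration, and real workloads will differ.

```python
# Back-of-the-envelope compute estimate for a dense transformer.
# Assumption: ~6 FLOPs per parameter per training token (forward + backward),
# and ~2 FLOPs per parameter per generated token (forward only).

def training_flops(num_params: float, num_training_tokens: float) -> float:
    """Approximate total FLOPs to train the model once."""
    return 6 * num_params * num_training_tokens


def inference_flops(num_params: float, num_generated_tokens: float) -> float:
    """Approximate FLOPs to generate a given number of tokens."""
    return 2 * num_params * num_generated_tokens


if __name__ == "__main__":
    params = 7e9          # hypothetical 7B-parameter model
    train_tokens = 1e12   # hypothetical training set of 1 trillion tokens
    reply_tokens = 1_000  # one long-ish generated reply

    train = training_flops(params, train_tokens)
    infer = inference_flops(params, reply_tokens)
    print(f"training: ~{train:.1e} FLOPs")
    print(f"one reply: ~{infer:.1e} FLOPs")
    print(f"ratio:    ~{train / infer:.1e}x")
```

Under these assumptions a single reply costs billions of times less compute than the training run, which is why you can run things locally that you could never train locally.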
Not sure where you are getting the CSAM bit. We aren’t that good at blanking out weights in any kind of model, certainly not good enough to lobotomize specific types of content.
The CSAM bit then seems to be propaganda from at least one AI company putting out PR to falsely quell people's concerns about their LLMs being able to generate sexualized content involving children.
I've yet to see details of the minimum compute and server requirements needed to run LLMs. Maybe you know a source who's compiling a list in a feature matrix that includes such details?
Large LLMs like GPT-3 and GPT-4 need very serious hardware. They have hundreds of billions of parameters (or more) which need to be loaded in memory all at once.
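As a rough illustration of the memory point, here is a minimal sketch that estimates how much memory the weights alone take at different precisions. It assumes a simple "parameters times bytes per parameter" model and ignores activations, KV cache, and runtime overhead, so treat the results as a lower bound. The 175B figure is GPT-3's published parameter count; the 7B model is a generic example.

```python
# Rough lower bound on memory needed just to hold a model's weights.
# Assumption: memory = parameter count * bytes per parameter; activations,
# KV cache, and framework overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    models = [("7B model", 7e9), ("175B model (GPT-3 size)", 175e9)]
    for name, num_params in models:
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(num_params, precision)
            print(f"{name} @ {precision}: ~{gb:.1f} GB")
```

By this estimate a 7B model quantized to 4 bits fits on an ordinary consumer GPU or laptop, while a 175B model at 16 bits needs hundreds of GB spread across several datacenter accelerators.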
I don't see why Mistral would acquiesce. Like the other comment says, Microsoft has a lot of chips on the table for local AI. They didn't even mention DirectML, ONNX or Microsoft's other local AI frameworks - suffice to say Microsoft does care about on-device AI.
So... would Mistral deliberately sabotage their low-end models to appease Microsoft's cloud demand? I don't think so. Microsoft probably knows that letting Mistral fall behind would devalue their investment. It makes more sense to bolster the small models to increase demand for the larger ones, at least from where I'm standing.
If you're asking about Microsoft's APIs - I'd keep an eye on ONNX. It's the most ambitious, but also supports an insane amount of acceleration targets. It would be the proverbial "big guns" if vendors continued investing in more insular frameworks like Metal and CUDA.
Diversifying their AI bets definitely makes total sense. If this wasn't their strategy originally, it almost certainly became so the moment the OpenAI board fired Sam Altman.
It's easy to make simplistic judgements from the outside, but with the limited information we have, it does seem like Satya Nadella came out of this OpenAI debacle looking pretty competent.
It's hard to reconcile the fact that the Microsoft that handled the unexpected OpenAI issue so well is the same Microsoft that seems intent on literally setting fire to their flagship product! (Windows)
I totally agree. It’s also like the move where Microsoft at least supports Linux on their systems and cloud, not as a backup but just to lock you into their ecosystem. Honestly, I could see Microsoft buying Hugging Face.
Yes, Microsoft doesn't have to pick the sole winner in AI; rather, they could just start eating the AI ecosystem bit by bit so that they win by default. It is what large players can do. They may open themselves up to some scrutiny for too many acquisitions and reducing competition, but that is a separate issue.
This is how Microsoft has been doing data for at least 10 years (see Databricks).
Step 1: Get the industry leaders to be purchasable via Azure.
Step 2: Slowly build your own clone and start stealing user share even though your offering is still worse.
"Microsoft recommends OpenAI as your default overlord. Did you know it can do everything your current AI can do, sometimes better, but always more profitably for us? [Switch now] [Ask me again in 30 seconds]"
Would you mind elaborating why? I'm not super experienced in the AI world, and barely use Hugging Face. Frankly, the name makes it difficult to take it seriously.
Hugging Face is very supportive of the open source machine learning community, both in the work they do with the transformers library, as well as going above and beyond in developer and community relations to build an all-around great product offering and user experience. Microsoft does the opposite of all of those things and has only made GitHub worse and more unstable since acquiring them.
> Nadella [in December 2022] abruptly cut off Lee midsentence, demanding to know how OpenAI had managed to surpass the capabilities of the AI project Microsoft’s 1,500-person research team had been working on for decades. “OpenAI built this with 250 people,” Nadella said, according to Lee, who is executive vice president and head of Microsoft Research. “Why do we have Microsoft Research at all?”
> At the same time, even as the company began weaving OpenAI into the fabric of Microsoft’s products, Nadella decided not to abort Microsoft’s own research efforts in AI. During the tense exchange at the December meeting between the Microsoft CEO and Lee, other executives spoke up to defend the work of Microsoft’s researchers, including Mikhail Parakhin, who oversees Microsoft’s Bing search and Edge browser groups, Lee said. After grilling Lee in the meeting, Nadella called him privately, thanking him for the work Microsoft Research had done to understand and implement OpenAI’s work in a way that passed muster for corporate customers. Nadella said he saw Lee’s group as a “secret weapon.”
While this is entirely speculation, it's easy to imagine that there are many levels of PR magic going on here, to share a quote that on the surface feels "leaked" and "explosive" but, among investors and clients who read beyond the (very good) paywall, actually shores up a narrative that Microsoft has a capability that significantly augments OpenAI's, and allows the existence of MSR to become headline news without even needing a product release.
The Mistral deal feels like yet another step in this direction. Microsoft is not afraid of seeming "messy" in the press as long as it can control the narrative around its value-add to customers in the context of its partnerships. By contrast, the rest of FAANG's more consumer-facing positioning makes it a lot harder for them to maneuver in a similar way.
> Nadella [in December 2022] abruptly cut off Lee midsentence, demanding to know how OpenAI had managed to surpass the capabilities of the AI project Microsoft’s 1,500-person research team had been working on for decades. “OpenAI built this with 250 people,” Nadella said, according to Lee, who is executive vice president and head of Microsoft Research. “Why do we have Microsoft Research at all?”
The answer to that is that until Google released the "Attention Is All You Need" paper in 2017, there were no breakthroughs that allowed models like the ones we have now to be built. OpenAI, being a small and nimble team, picked up on which direction the wind was blowing with LLMs and quickly brought a product to market, whilst MS just did what corps do: move slowly (same for Google etc.).
Microsoft Research has also not been solely devoted to AI. I have seen much of its work in quantum computing, programming language research, and general computer science.
> I guess it has a cost, though? I presume OpenAI didn’t like this move. If that’s the case, what might be the consequences?
Until OpenAI releases GPT 5 and it blows everyone away, OpenAI's leverage is constantly decreasing as the gap between their best model and everyone else's best model decreases.
There don't seem to be any moats right now in this industry except for pure model performance.
Maybe someone should ask ChatGPT what OpenAI should do to maintain long-term leadership in this industry?
If I had to pick one player who wanted to win the AI race and was willing to be ruthless to do it, I'd pick Nvidia. Computation is the excludable bottleneck, and Nvidia is essentially the singular company that makes AI computers.
Hire Ilya, get him to hire as many of the best folks he can.
Stop selling GPUs. Hoard them. Introduce some subtle bug into the drivers that dramatically increases their rate of burn out.
Figure out some reasonable way to give attribution to original content creators, approximately solve the content ID problem of the AI age. Cut the content creators into the rev share in proportion to their data importance to the model. Make the content creators incredibly pissed off that their work is being stolen by big AI companies unfairly and encourage them to sue the other big AI firms. Their content share multiplier increases if they get injunctions against LLM firms.
Convince politicians that the AI firms have performed an intellectual heist of epic proportions, and that they must not be allowed to even generate synthetic training data from poisoned models. With the content creators united behind you, convince Congress that poisoned models must be destroyed, that even using synthetic training data from poisoned models must be illegal. Make them start over from a clean room with no copyrighted data.
> Make them start over from a clean room with no copyrighted data.
And when such models become popular[0], all the artists now have no job and no way to get compensation for being unable to work through no fault of their own.
I don't think that's really a winning condition. It might make you feel better about the world, but the end result is still all the artists being out of work.
[0] some models are already trained that way, although I assume you're using the word "copyrighted" in the conventional sense of "neither public domain nor an open license", as e.g. all my MIT licensed stuff is still copyrighted but it's fine to use.
In my hypothetical future, at least the people who create the content used to train the models can get "training royalties", which they aren't getting now.
There is still also money to be made in producing physical art or performances, even when AI can produce amazing digital works.
"Make them start over from a clean room with no copyrighted data." makes "the people who create the content used to train the models" the empty set.
> There is still also money to be made in producing physical art or performances, even when AI can produce amazing digital works.
Perhaps, but it may be akin to the way there is still money to be made from horse drawn carriages in city centres, even when cars displaced them over a century ago — a rare treat for special occasions, to demonstrate wealth.
Sure, though I suspect "art" is the human version of a peacock tail: the difficulty is the point, it's how we signal our worth to others, and cheapening it breaks that signal. That would suggest that making all forms of art easy messes with (many of) us at a deep, essentially automatic, level.
More specifically, I was responding to the idea that "compensating creators whose works are used to train the models" would actually solve anything; to use your examples, it would be as if the literal luddites were suggesting passing laws saying that "all textile machines that work like humans need to compensate the humans they displace, and also you need to make your new machines from scratch without talking to any textile workers to make sure you don't cheat", and my response would be analogous to saying "there's already machines which don't work like humans, so you're going to be out of work and have no compensation".
The Luddite movement preceded The Communist Manifesto by about 30 years. Everything's sped up since then, so I'd be surprised if we have to wait 30 years for a political shift which is to AI what Communism was to industrialisation. I'm just hoping we don't get someone analogous to Stalin or Pol Pot this time.
> If I had to pick one player who wanted to win the AI race and was willing to be ruthless to do it, I'd pick Nvidia. Computation is the excludable bottleneck, and Nvidia is essentially the singular company that makes AI computers.
I've thought the same thing. NVIDIA getting into AI seriously is a vertical integration play and they often do that -- like NVIDIA trying to buy ARM.
If Google's benchmarks are to be believed, Gemini 1.5 will be better than GPT, and they use their own chips (Google TPU), no Nvidia involved. There is also Groq. I don't see Nvidia keeping their lead and profit margins forever.
Don't stop; just raise the margin slightly and limit the number of higher-end chips available, using the proceeds to self-fund building their own datacenter.
Do runs of cards for themselves, with higher core counts and clock speeds, that they don't release to others.
Sure that helps with the consumer market, but most people will use AI integrated into other products and not directly.
Those integrated AI solutions will usually be done via enterprise deals where brand name is not quite as important. It will be done by people who care about cost, reliability and ease of use.
Think of nginx's dominance in web servers even though it has no name recognition among the general population. Or Stripe's payment system.
Yes, however it's increasingly likely that the GPT in ChatGPT will not be limited to OpenAI (in the US), so I'm not sure how much ChatGPT will be worth with countless other platforms containing GPT in their names.
The thing is that there is almost no lock-in with the models. So brand recognition doesn't help much, as people will look at the benchmarks and price at some point in the future, if not right when starting out.
Meh, I don't think it's worth much. In a few years that'll be like claiming that so-and-so had name brand recognition for transistors. Most people don't need to care who manufactures their transistors.
It’s not necessarily a bad thing. Most people don’t know that TSMC exists, or what Microsoft does beyond Windows and Xbox (which are a small fraction of its business).
Brands can change quickly, but they do matter in the short term. I've witnessed customer support teams use Firefox to say they only supported Internet Explorer and government ministers who thought it was "good" that IE was the "only" web browser, and weirdly a phone company whose customer support person thought their SIM cards worked better on Android than iPhone and that their web chat wouldn't work with a Mac even though they were talking to me on a Mac at the time.
And when I was a kid, it seemed like all the teachers thought it would be a waste of time to learn MacOS because "Apple would be bankrupt soon". (Given how much all the app UIs changed, right decision for the wrong reason).
All of these examples are end-products. "AI" itself will not be. The winner in AI will be whoever permeates other products/brands most successfully, and end-user brand familiarity doesn't matter much for that. Familiarity among engineering and product leaders is what matters.
Maybe, but maybe AI will become front and center of consumer and productivity IT products and their premier brand ambassadors will be anthropomorphized AI agents. Hello Clippy, this time for real.
> Most people don't need to care who manufactures their transistors.
They might, in an upside-down world where the Shockley Semiconductor board tried to fire Shockley, and where the Traitorous Eight not only didn't bail out but took his side.
Unless your market is direct to end user, end user brand name recognition doesn't matter. In the case of AI, at least so far, the primary income won't be from end-users directly, but rather via enterprise integrations into existing tools that already have end user market share (e.g. Microsoft Office, Microsoft Windows, VS Code, Notion, etc.)
Eh, all this talk of "moats" etc. feels weird when just a few years ago it seemed like everyone was complaining they'd rearranged their corporate structure to include a fully-owned profit-making subsidiary to attract investments, and all the loud voices seemed to think a cap of x100 return on investment was so large it was unlikely to be reached.
And then OpenAI tripped and fell over a magic money printing factory, and the complaints are now in the set ["it's just a stochastic parrot", "it's so good it's a professional threat to $category", "they've lobotomised it", "they don't have a moat", "they're too expensive"].
As the saying goes, "Prediction is very difficult, especially if it’s about the future!"
MSFT needs companies like OpenAI to give Azure credits to for their valuation to continue soaring. The deferred revenue on their balance sheet from the unspent Azure credits they give as investment is worth much more to their market cap than $80B.
I think the main move would be some type of true AGI that leads to a hard takeoff scenario, but it isn't clear whether we are close to that.
Basically something that is more than just another bump in the scorecard for GPT 5 over GPT 4. Otherwise it is still just a horse race between relatively interchangeable GPT engines.
There are no consequences for Microsoft. It owns a 49% stake in OpenAI, so the only action that OpenAI could take to hurt Microsoft would be to deliberately destroy its own value.
We should just automate these comments:
MS: Oh No! Embrace Extend Extinguish!
Google: Oh No! killedbygoogle!
Meta: Oh No! So many Ads!
Apple: Oh No! Evil App Store policy!
Uh, sorry, but this seems pretty consistent with trying to co-opt and kill open source AI competition:
> [EEE] describe its strategy for entering product categories involving widely used standards, extending those standards with proprietary capabilities, and then using those differences in order to strongly disadvantage its competitors.
> "The US tech giant will provide the 10-month-old Paris-based company with help in bringing its AI models to market. Microsoft will also take a minor stake in Mistral although the financial details have not been disclosed"
Where are the "widely used standards"? Where are the "extending the standards with proprietary capabilities"? Where is the "strongly disadvantaging competitors"?
Mistral is the most used and fine-tuned open source model by a mile, close to the standard for open models, and they’ve locked them down into offering their models behind an API and in Azure. The Azure offering sets them up to be the safest, most GDPR-compliant offering for enterprises in Europe, where Microsoft already has a huge reach and customer base, bolstered by Mistral being a homegrown brand.