Gcam's comments

As part of our benchmarking of Groq, we asked Groq about quantization and they assured us they are running models at full FP16. It's a good point and important to check.

Link to benchmarking: https://artificialanalysis.ai/ (Note: the question was regarding the API rather than their chat demo)


Groq's API comes close to this level of performance as well. We've benchmarked performance over time and >400 tokens/s has been sustained - you can see here https://artificialanalysis.ai/models/mixtral-8x7b-instruct (bottom of page for the over-time view)


Hi, we have this if you take a look at the models page (https://artificialanalysis.ai/models) and scroll down to 'Latency', and also on the API host comparison pages for each model (e.g. https://artificialanalysis.ai/models/llama-2-chat-70b)


Ah so you do!

Your latency numbers for OpenAI (and Azure's equivalents) seem really high. I run time-to-first-token tests and I see much better numbers!

(Also, are those numbers averages, p50, p99, etc.? I'd honestly expect a box plot to really see what is going on!)
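
For what it's worth, here's a minimal sketch of the kind of TTFT test I run - this assumes the OpenAI Python client with streaming; the model name and sample counts are just placeholders:

    import time
    import statistics

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def time_to_first_token(model="gpt-3.5-turbo"):
        # Start the clock, open a streaming request, stop at the first content chunk.
        start = time.perf_counter()
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Say hi"}],
            stream=True,
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                return time.perf_counter() - start

    samples = sorted(time_to_first_token() for _ in range(50))
    print("p50:", statistics.median(samples))
    print("p99:", samples[int(len(samples) * 0.99)])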


Hey com2kid - if you're still there, we did end up adding box plots to show variance. They can be seen on the models page https://artificialanalysis.ai/models and on each model's page, where you can view hosts by clicking one of the models. They are toward the end of the page under 'Detailed performance metrics'


We have Claude Instant on the models page: https://artificialanalysis.ai/models You can add it via the selector at the top right of each card where it says '9 Selected' (below the highlight charts)


Ah cool, I was on mobile and didn't see the selector


Definitely agree with your point on Claude Instant though. Much less than half the price and much higher throughput/speed for a relatively small quality decrease (varying by how 'quality' is measured and by use case)


Model quality index methodology is as per this comment (you can add perplexity using the dropdown): https://hackertimes.com/item?id=39014985#39017632

It's a combination of different quality metrics, across which Perplexity does not, overall, perform as well. That said, I think we are in the very early stages of model quality scoring/ranking, and (for closed-source models) we are seeing frequent changes. It will be interesting to see how measures evolve and how model rankings change


Thanks for the feedback and glad it is useful! Yes, agree that might be more representative of future use. I think a view of variance would be a good idea; it's currently just shown in over-time views - maybe a histogram of response times or a box-and-whisker plot. We have a newsletter subscribe form on the website, or Twitter (https://twitter.com/ArtificialAnlys), if you want to follow future updates


Variance would be good, and I've also seen significant variance on "cold" request patterns, which may correspond to resources scaling up on the providers' backends.

It would be interesting to see request latency and throughput when API calls occur cold (first data point), and at once-per-hour, once-per-minute, and once-per-second cadences, with the first N samples dropped.
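
Roughly something like this (a sketch only - request_latency is a stand-in for whatever timed call the harness actually makes):

    import time

    def request_latency() -> float:
        # Stand-in for a real timed API call (e.g. a TTFT measurement);
        # replace the body with an actual request to the provider under test.
        start = time.perf_counter()
        # ... make the API call here ...
        return time.perf_counter() - start

    def sample_at_interval(interval_s, n_samples, n_warmup):
        # The first call lands "cold"; later calls arrive at a fixed cadence.
        latencies = []
        for i in range(n_warmup + n_samples):
            t = request_latency()
            if i >= n_warmup:  # drop the first N samples
                latencies.append(t)
            time.sleep(interval_s)
        return latencies

    # The hourly cadence is obviously slow to collect - shown for completeness.
    for label, interval in [("hourly", 3600), ("per-minute", 60), ("per-second", 1)]:
        print(label, sample_at_interval(interval, n_samples=20, n_warmup=5))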

Also, at least with Azure OpenAI, the AI safety features (filtering & annotations) make a significant difference in time to first token.


Hi HN, thanks for checking this out! The goal of this project is to provide objective benchmarks and analysis of LLM AI models and API hosting providers so you can compare which to use in your next (or current) project. Benchmark comparisons include quality, price, and technical performance (e.g. throughput, latency).

Twitter thread with initial insights: https://twitter.com/ArtificialAnlys/status/17472648324397343...

All feedback is welcome


Any chance of including some of the better fine-tunes, e.g. Wizard or Tulu? (Worse than Mixtral, but I assume other fine-tunes will be better, just like Wizard and Tulu are better than Llama 2.)

I guess their cost is the same as the base model, although it would affect performance.


Hey, yeah, the bar for adding fine-tunes will probably be that they're hosted by ~3 supported hosting providers. Very much open to it!


Can a quality score be added for each inference provider for the same model? Many of them use different quantization and approximations, so it's not just price and throughput that matter. Especially for a model like Mixtral.


I'd love to see replicate.com (pay per sip) on there. And lambdalabs.com

[edit: And also MPS]


We've been waiting on Replicate to launch per-token pricing for LLMs because their previous pay-per-second model was uncompetitive - but it looks like they might have just turned it on with no big announcement! They'll go straight to the top of the priority list.

Do Lambda have a serverless inference API? Not aware of them playing in this space yet.

Presume you mean MPT not MPS - yep we'll look into MosaicML soon.


We have this (and other more detailed metrics) on the models page https://artificialanalysis.ai/models if you scroll down, and for individual hosts if you click into a model (via the nav, or by clicking one of the model bars/bubbles) :)

There are some interesting views of throughput vs. latency: some models are slower to the first chunk but faster for subsequent chunks, and vice versa, so they suit different use cases (e.g. if you just want a true/false answer vs. a more detailed model response)
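
As a toy illustration of the crossover (all numbers made up): expected total response time is roughly time-to-first-chunk plus tokens divided by streaming throughput.

    # Model A: fast first chunk, slow streaming. Model B: the reverse.
    def total_time(ttft_s, tokens_per_s, n_tokens):
        return ttft_s + n_tokens / tokens_per_s

    for n in (1, 24, 200):
        a = total_time(0.2, 30, n)
        b = total_time(0.8, 120, n)
        print(f"{n:>3} tokens: A={a:.2f}s  B={b:.2f}s")
    # A wins for very short outputs (true/false answers); B wins for long ones.
    # With these made-up numbers the crossover is at 24 tokens.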


Thanks!


Thanks! For Claude Instant, select the dropdown on the top right of the card where it says '8 Selected' and you can add it to the graphs. Thanks for the suggestions of adding Phi 2 and Model.com as a host - we can look into these!


The quality index is an equally-weighted average of normalized values of the Chatbot Arena Elo score, MMLU, and MT-Bench.

We have a bit more information in the FAQ: https://artificialanalysis.ai/faq but thanks for the feedback, will look into expanding more on how the normalization works. We are thinking of ways to improve this generalized metric.
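
For illustration, here is a minimal sketch of one way the equal weighting could work, assuming simple min-max normalization (the exact method may differ, and the scores below are made up):

    # Hypothetical (Elo, MMLU, MT-Bench) scores for three models - not real data.
    scores = {
        "model-a": (1200, 72.0, 8.3),
        "model-b": (1100, 65.0, 7.1),
        "model-c": (1050, 60.0, 6.5),
    }

    def min_max(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    # Normalize each metric across models, then average equally per model.
    columns = [min_max(col) for col in zip(*scores.values())]
    index = {name: sum(vals) / len(vals)
             for name, vals in zip(scores, zip(*columns))}
    print(index)  # {'model-a': 1.0, 'model-b': ~0.36, 'model-c': 0.0}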

A sticking point is that quality can of course be thought of from different perspectives: reasoning, knowledge (retrieval), use-case specific (coding, math, readability), etc. This is why we show individual scores on the home page and models page: https://artificialanalysis.ai/models

