Is Intel within ARM’s reach? Pedestrian Detection shows the way (edn.com)
23 points by kracekumar on Aug 22, 2013 | 30 comments


Sigh. Although some of this is Intel's own doing, "i3" is not a meaningful descriptor, especially when compared to "Cortex-A15" or "Cortex-A9". "Core i3" can refer to many, many generations of CPUs; this is not a nitpick, but a real complaint, because each of those CPUs has very different performance characteristics. Core i3 has been a Westmere (Nehalem tick); a Sandy Bridge; an Ivy Bridge (Sandy Bridge tick); and a Haswell. Saying "a 1.2GHz Core i3" is about as valuable, then, as saying "a Tegra" -- which could be an ARM11 MPCore (Tegra 6xx); a Cortex-A9 without NEON (Tegra 2); a Cortex-A9 with NEON (Tegra 3); or a Cortex-A15 (Tegra 4).

So, on a micro-level, the comparison is not terribly valid. On a macro-level -- saying that the devices are within an order of magnitude -- the results are reasonable, but certainly not novel...


You are right that the i3 CPU (Core i3-530) we compared with is from a somewhat old generation. I tried comparing the Core i3-530 with the Core i3-2105 online. The 2105 is Sandy Bridge and runs at a slightly higher clock, 3.1 GHz (while the 530 runs at 2.93 GHz according to CPU-World).

According to CPU-World, the 2105 is 21% faster than the 530 for single-threaded operations. If you account for the difference in clock speed, that leaves only about a 14% improvement over the 530.
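
The clock adjustment in that estimate can be reproduced in a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes single-threaded performance scales linearly with clock frequency, which real workloads only approximate.

```python
def clock_normalized_speedup(raw_speedup, new_clock_ghz, old_clock_ghz):
    """Return the speedup that remains after dividing out the clock ratio.

    Assumes performance scales linearly with clock (a simplification).
    """
    return raw_speedup / (new_clock_ghz / old_clock_ghz)


# Figures quoted above: the i3-2105 (3.1 GHz) is 21% faster than the i3-530 (2.93 GHz).
per_clock = clock_normalized_speedup(1.21, 3.1, 2.93)
print(f"per-clock improvement: {(per_clock - 1) * 100:.0f}%")  # prints 14%
```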

So it's really not such a bad comparison.

In the meantime we will try to run this on a newer CPU and let you know the results.


So code fully optimized for ARM/Cortex-A15 is almost as fast as only partially optimized code on a 1.2 GHz i3? Well, good to know I guess.


Yes, this is correct. But as you can see, the original unoptimized version ran at nearly the same speed on both, meaning that A15 and Core i3 performance is comparable when run at the same clock. The general perception has been that the raw performance of ARM CPUs (like the A15 and A57) and Intel Sandy Bridge CPUs like the ones inside the i3/i5 are not in the same ballpark. Most people believe they are leagues apart, which is also one of the reasons such comparisons haven't been made much before. The idea of the blog is to show that this is not completely true.


Not meant to counter your argument, but at least one compiler out there (GCC) is - in my experience - very good at finding optimizations for x86 but fails most of the time for ARM unless you provide very clear and very strict hints in your code. NEON optimization is one of them. It wouldn't be the first time that GCC completely ignores intrinsics in my loops or (I kid you not) introduces 16-bit Thumb code in my 32-bit code. Very frustrating to constantly have to second-guess your compiler.


Spoiler: no. 1265 ms vs 439 ms on their OpenCV benchmark.

(Then they play some what-if games by underclocking the i3 in imaginative ways and applying SIMD opts to only the ARM side)
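
For scale, the gap between those two timings works out as follows. A rough sketch only: it assumes the first figure is the ARM result and the second is the i3 result, which the comment implies but does not state, and the variable names are illustrative.

```python
arm_ms = 1265.0    # assumed to be the Cortex-A15 time
intel_ms = 439.0   # assumed to be the Core i3 time

# How many times longer the slower run takes than the faster one.
ratio = arm_ms / intel_ms
print(f"the slower run takes {ratio:.2f}x as long")  # prints 2.88x
```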


We wanted to keep the clocks out of the picture. This, I must admit, is a little unfair, because the ability to run at higher clocks is indeed a capability of the CPU that cannot be discounted. In this case I compared a Cortex-A15 with a Core i3. The A15s can run at a maximum of 2 GHz, I think, while the Core i3 cores can run at up to 3 GHz. However, ARM has the Cortex-A57 coming along, which I believe will have similar performance to the A15 and will be able to run at higher clocks too. Of course we need to wait for the A57 to appear on an actual SoC and measure it before we can truly make that claim.


I guess nobody really cares about clocks. What people care about is either maximum performance or maximum performance per watt (CPU and/or whole system). You should be benchmarking for those two scores.


This.

Assuming the compiler will generate good SSE code for the Intel CPU is a joke. If you write intrinsics for one arch, write them for both.

I'd bet money the Intel side could be made 2.5-3x faster with proper SSE intrinsics and maybe 5-6x faster with a Haswell i3 using AVX (SandyBridge doesn't have the cache bandwidth to fully utilise AVX properly).


The original OpenCV code already has intrinsics in many portions of the code. But enabling them results only in a 10% improvement.

We decided to report the non-intrinsics version, because reporting the original OpenCV numbers with intrinsics as "SSE optimized" would be unfair to Intel. Apparently it's not very well optimized.

My own guess is that if we add intrinsics for Intel to our own C code, performance will improve by around 2x. We could have written the blog without reporting the Intel C-optimized numbers, but that would have been unfair to Intel again.


Very primitive benchmark, not measuring power consumption in comparison between desktop i3 and ARM when the target is embedded use!

It would be informative having the power used to process the same tasks compared.

So what was the power consumption in every reported run? What was it when the i3 was underclocked?


We don't report power because we don't have a way to measure power accurately.


Also we have made no effort to hide this information.

We have applied SIMD optimizations only to ARM because that's our business: licensing computer vision algorithms on ARM.

The blog is a by-product of that effort.


but also, maybe? http://liliputing.com/2013/07/intel-atom-z3770-bay-trail-chi...

There's some evidence that the new Bay Trail Atoms should be pretty competitive against current ARM stuff.


I am getting tired of this whole dumb x86 vs. ARM comparison bullshit. You cannot compare chips (read: chips, not even talking about architecture here because it's not relevant) which are designed for completely different purposes (high-performance big OoO workstation CPU vs. low-power mobile chip). Please stop making those.


People always focus on the instruction set -- "ARM's fixed length instructions are easy to decode, so it will win in the long run" -- ignoring how decoding is a tiny fraction of a CPU's silicon and power budget. Memory controllers, pipelining, and efficient superscalar instruction dispatch have far more effect, and Intel has a large lead on ARM in these areas.


"Memory controllers, pipelining, and efficient superscalar instruction dispatch have far more effect"

Memory controllers (at least the SoC-level ones) are normally developed by the silicon vendor, such as Nvidia, Broadcom, Qualcomm, Samsung, TI, Freescale etc., not by ARM. And these companies have been working on them for many years. They have had graphics, video acceleration, display, and camera-interface IPs all integrated into one SoC for almost a decade now. Intel is in fact relatively new to this kind of integration.

In any case, everything including memory controllers, pipelining and superscalar dispatch has already been taken into account in this benchmark.

What has been left out is higher clocks and hyperthreading, two things that ARM doesn't have yet.


Something is wrong Ranjith, you have more "dead" posts, you probably triggered something like "too many posts for a new account."

Hello admins, Ranjith is the author of the linked article!


Probably the main one is cost. ARM licenses are cheap, and the many manufacturers work on knife edge profit margins. Intel makes very large profits on high end chips that only it can fab.

On the other hand, Intel has better technology and could cannibalize itself. The future will be interesting.


Drill, I partly agree with you. But as you know, ARM is getting into servers in a big way. The idea is low-power servers ("green servers", as they are called), of course, and there are several silicon vendors already working on them. ARM recently announced the Cortex-A57 and A53 precisely for this segment. The A57 can be expected to have similar performance to the A15, but with higher clocks. Hence this discussion, I believe, is not totally irrelevant.


It's not exclusively about the architecture (granted, fully fledged x86 out-of-order CPUs have some junk to carry around, but the architecture overhead isn't that big). The assumption that is always made, that ARM = low power, is just not true. At least not in all scenarios. Nobody expects a typical workstation-grade x86 CPU to be competing with a designed-for-mobile ARM chip. But it's not about architecture; just look at the upcoming Atom chips. They seem to be very competitive with their ARM counterparts. On the other side, it is probably no problem to build a big ARM processor like the typical Intel/AMD desktop CPU (not talking about shoving 5000 small mobile ARM cores in it, because that is just programmers' and everyone's nightmare). TL;DR You design a processor for a specific purpose regardless of architecture.


Drill,

The thing is that ARM licenses its cores, while Intel doesn't. ARM allows many, many silicon vendors to do their own thing and make their own server CPUs etc., something they had never been able to do before.

I have no doubt that Intel will get competitive over time on the power front, if they are not there already. But people need to know that ARM is getting competitive on the performance front too, so that many silicon vendors, not just Intel, can come out with servers and SoCs for other types of applications which are pure x86 right now.


Intel will be in a lot of trouble soon (within 2-3 years).

Forget the Core line chips. That's irrelevant. It will remain a cash cow for the next few years, but a rapidly shrinking cash cow nonetheless. They'll move upmarket with them, until there's nowhere to move to.

ARM chips' improvements over the next years will make them "good enough" for most people, and Intel's Core chips which cost 10x more (literally) will be very uncompetitive in that environment.

Their only solution is to fight with Atom, but so far there has been no success there, and even if they succeed, it means their profits will drop dramatically, and they will need to survive as a company with much lower revenue and profit, which means the "all-powerful Intel" of the past will be but a faint memory in the future.


If any company has enough resources (financial and know-how) to pull out the hammer when needed, it's probably Intel. They have the best people, the best manufacturing, the most experience etc. Just look at how they evolved the shitty Atom processors into something competitive (Bay Trail). Intel has staying power. And they once made ARM processors too.


In addition to what zurn said, they also don't specify the exact Core i3 model number (it could be an older generation), the RAM speed, and if the data set fits in cache.

If RAM access is needed, ARM machines usually fall behind quickly, since they usually have much lower RAM bandwidth.


The model number used for the evaluation is Core™ i3-530


That's a Westmere from Jan 2010. Westmere -> Sandy Bridge -> Ivy Bridge -> Haswell.

Comparing with something from 3.5 years and 3 generations ago is useful?


Copy pasting my reply once again.

You are right that the i3 CPU (Core i3-530) we compared with is from a somewhat old generation. I tried comparing the Core i3-530 with the Core i3-2105 online. The 2105 is Sandy Bridge (I couldn't find a direct comparison with an Ivy Bridge) and runs at a slightly higher clock, 3.1 GHz (while the 530 runs at 2.93 GHz according to CPU-World).

According to CPU-World, the 2105 is 21% faster than the 530 for single-threaded operations. If you account for the difference in clock speed, that leaves only about a 14% improvement over the 530.

So it's really not such a bad comparison.

In the meantime we will try to run this on a newer CPU and let you know the results.


You can't scale a server/desktop CPU two steps down into mobile devices.

This is what Atom is all about: the best power/performance x86 possible.

And you also can't scale a mobile device chip up to a server/desktop product.

That is what the ARMv8 Cortex-A57 is all about: low-power desktop and server class.

So technically speaking, both are marching towards each other's end, although Intel would lose out due to other factors such as business model.


For those who are not convinced, here is one more benchmark.

http://www.inpai.com.cn/doc/hard/198143_8.htm

The page takes a while to load. Then scroll down to the benchmarks. Take a look at the single-threaded Linpack benchmarks between the i7@3.5GHz and the Exynos@1.6GHz.



