Hacker News

A very good writeup, but one thing always confuses me.

What is meant specifically by the "heap" and "stack"? I know what a stack is, but "heap" gets thrown around in many different contexts and I've yet to find any explanation that made it clear.

If anyone has a good explanation or good links for those two terms in this context, I'd be very grateful. Thanks!

[EDIT: thanks everyone for the answers so far!]



A stack is a data structure that grows and shrinks at one end, and can only be accessed from that end.

The stack usually means the call stack, or C stack: one of the CPU's registers always points to the top of a stack structure in memory. That way it can be used to quickly allocate and free memory for function calls and function-scoped temporaries, and it has no trouble handling (bounded) recursion.

Heap refers to the part of memory that is fundamentally just a flat address space requested from the OS. On top of that lives the heap allocator, whose job is to portion it into usable chunks and to reuse chunks that are returned to it. In C that's malloc/free; in C++, new and delete.
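To make the distinction concrete, here's a minimal C sketch (the function names are made up for illustration): a local array lives in the current stack frame and is reclaimed automatically on return, while a malloc'd block stays allocated until it is explicitly freed.

```c
#include <stdlib.h>

/* Sum four values from a stack (automatic) buffer. The buffer is part
 * of this function's stack frame and vanishes when the frame is popped. */
int sum_stack(void) {
    int buf[4] = {1, 2, 3, 4};   /* stack: reclaimed automatically */
    return buf[0] + buf[1] + buf[2] + buf[3];
}

/* Same computation from a heap buffer. The allocator hands us a chunk
 * of the flat address space; it stays ours until we hand it back. */
int sum_heap(void) {
    int *buf = malloc(4 * sizeof *buf);  /* heap: ours until free() */
    if (!buf) return -1;
    for (int i = 0; i < 4; i++) buf[i] = i + 1;
    int s = buf[0] + buf[1] + buf[2] + buf[3];
    free(buf);                           /* return the chunk to the allocator */
    return s;
}
```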

In the olden days heap and stack actually referred to the same area of memory, the heap growing upwards from the bottom and the stack growing down from the top. Luckily, today there is enough address space that we don't have to do this anymore.


AFAIK, heap and stack are still in the same "area of memory;" the discipline of separating the two is still enforced (or not) by the operating system. The heap-grows-up, stack-grows-down convention is still common on (32-bit, maybe 64-bit?) x86 too.


This looks pretty well-answered already, but I'll throw in my $0.02.

The stack and heap are the two main regions of memory where data can live in a program. They are (somewhat unhelpfully) named after the data structures originally used to represent them. Pretty much all modern languages have a concept of a stack and a heap, although there are differences both in how explicit they are, and in implementation details.

The stack is the function call stack, used to store data local to a function call. It behaves like a traditional stack, with push and pop operations. When you call a function, a new stack frame is pushed onto the stack, containing all of the local variables that are only accessible inside that function call. When the function returns, it pops its stack frame (and delivers its return value, somehow). The language's scoping rules determine how a function may access the contents of the earlier stack frames that called it, but stack frames are basically deleted and inaccessible after they are popped (a source of errors in C).

The heap is the region of memory where you put data that lives longer than the function that created it. Heap behavior is far less specified, and more variable, between languages than stack behavior is (and involves the OS more), but if you're passing around pointers or references to an object you're probably using the heap. In languages that have them, the malloc and new operators usually mean "put this on the heap." In languages with a notion of reference types and value types, reference types are usually heap-allocated by default.
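That "source of errors in C" is worth seeing in code: returning the address of a local hands the caller a pointer into a frame that has already been popped. A small sketch (make_value is a hypothetical helper) showing the broken pattern and the heap-based fix:

```c
#include <stdlib.h>

/* A classic C error is returning the address of a local variable:
 *
 *     int *broken(void) {
 *         int x = 1;
 *         return &x;   // frame is popped on return -> dangling pointer
 *     }
 *
 * Data that must outlive the call goes on the heap instead: */
int *make_value(int v) {
    int *p = malloc(sizeof *p);  /* heap allocation; caller must free() */
    if (p) *p = v;
    return p;                    /* still valid after this frame is popped */
}
```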

The distinction between the two types of storage is more important in some languages than others. It's really important in C, where you have to manually manage all your memory, but some of the more dynamic languages (especially everything-is-an-object ones) use the heap for all user data, and the call stack only for implementation details.

Languages also vary greatly in how much control you have over where things get put. On one extreme, C lets you put any type in any location, with malloc, while on the other I am not aware of any way in JavaScript to control object lifetime. Java and C# both distinguish between value-types (stack-allocated by default) and reference-types (heap-allocated by default), but C# allows user-defined value-types while Java does not. Common Lisp has a standardized way to tell the compiler that it is safe to stack-allocate a value (of any type). And so on.


There are several things I've never had a good understanding of regarding the stack. How big is the stack? Is it a fixed size dictated by the architecture of the processor? I know I've written recursions that overflow it. Other than gut feel and catching an exception after the fact, how do programmers know when their program might overflow it?


> How big is the stack? Is it a fixed size dictated by the architecture of the processor?

It's more to do with the OS or language runtime; I think the language runtime has a bigger impact for most programs.

> Other than gut feel and catching an exception after the fact, how do programmers know when their program might overflow it?

In general, you don't. You might be able to do some system-specific thing on specific systems or just happen to know what it is on some given system, but there's no way to find out in the general case. Worse, there's generally no way to recover from blowing the stack; this has always made programs written using alloca() or, more recently, variable-length arrays potentially unstable.


Just wanted to say thanks again for the thorough answer. This really made it clear for me.


Generally, the "heap" is the addressable memory assigned to the process. E.g. you malloc a byte array and get back a pointer to that memory object on the heap. The lifecycle of a heap object is explicitly controlled by the program in some languages (e.g. C/Obj-C/etc. via freeing the heap memory object). Languages with garbage collectors do this implicitly.

The "stack" is the dynamic structure -- it's a stack :) -- that maintains the nested call/method/function frames from the initial entry point (e.g. main(..) in the C family), so it maintains the path through the invocation/call graph to the currently executing call/method/function. The 'objects' on the stack are call/function/method parameters, local variables, and various metadata (e.g. the object pointer in some OO languages). The lifecycle of objects on the stack is scoped by the call/function/method itself. Further, compilers typically try to keep (current) stack objects in CPU registers so as to avoid incurring memory access latencies.


Usually when people say "the heap" (especially in the context of garbage collection) they mean the memory where new (non-stack) data/objects are allocated from. Usually this is the bulk of memory an application uses.

There are other meanings of the word "heap" in the realm of data structures, but I haven't personally seen "heap" the data structure talked about (or used) much in the wild.


> I haven't personally seen "heap" the data structure talked about (or used) much in the wild.

Maybe not, but the concept of computing with addresses is reasonably common.

That's why programming interviews often have some variation on heap sort or radix sort.


In programming languages, generally:

* Stack: "control stack" -- a structure that tracks the context of an evaluation "step" in the program

* Heap: an environment that maps symbols to values (i.e. a data structure that tracks the bindings of names to computations and results).


Traditionally, the stack is a FILO (first in, last out) allocation area. Generally it starts at the top of memory and works its way down as space is allocated, then back up as it is freed.

The heap might start at the low end of unused memory (above the program and libraries) and is used for allocating space with arbitrary release order. This means it has to keep track of what is used and what is available as well as when to "grow the heap" by allocating more pages of memory.

Garbage collection is one way of managing a heap.



