
My understanding of simulated annealing is that solutions that are not improvements are still accepted with some probability in early steps but that this probability decreases as "temperature" drops. Looking at your description (but not code) I did not see that aspect but it looked like you would only accept improvements of the cost function. Is this correct or where does your solution accept slight regressions with some probability, too?


Based on the other comments, they are correct: when doing annealing you usually want to accept some moves that make the objective worse, in order to escape early local minima.

I abused the definition of annealing a lot in the post but I briefly touched on the idea:

"At first, you might want to make moves or swaps over large distances or you might want to accept some percent of moves that don't improve the objective, but as time goes on, you want to make smaller moves and be less likely to select moves that don't improve the objective. This is the "annealing" part of simulated annealing in the context of FPGA placement."

I think I might have made the writing confusing because I mixed the original definition of the annealing approach (of accepting moves that don't improve the objective) with the use of "annealing" for other things like action parameters (ex. swap distance between two nodes). Something I should edit to clarify better.

Note that, yes, the thing I implemented doesn't do any annealing but rather just picks actions that improve the objective. I am working on some extensions to add real annealing, but that turned out to involve a lot of in-depth technical work that is not obvious.
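For reference, the standard Metropolis acceptance rule being discussed here can be sketched in a few lines of Python (names are illustrative, not taken from the post's code):

```python
import math
import random

def metropolis_accept(delta, temperature):
    """Decide whether to accept a proposed move with cost change `delta`.

    Improvements (delta <= 0) are always accepted; regressions are
    accepted with probability exp(-delta / T), which shrinks as the
    temperature drops. At T = 0 this degenerates to greedy search,
    i.e. exactly the accept-only-improvements scheme from the post.
    """
    if delta <= 0:
        return True
    if temperature <= 0:
        return False  # zero temperature: improvements only
    return random.random() < math.exp(-delta / temperature)
```

Annealing then just means calling this with a temperature that decreases over time.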


Yes, that's the annealing part. Otherwise it's just local search.


In the "Simulated Annealing" section:

"At first, you might want to make moves or swaps over large distances or you might want to accept some percent of moves that don't improve the objective, but as time goes on ...

However, as it turns out, you technically don't need this annealing part to make FPGA placement work. You can just randomly try different moves and accept or reject them based on whether they improve the objective function. This is what I did in my toy implementation of an FPGA placer just to keep it simple."
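The simplified accept-only-improvements loop described in that quote amounts to something like this (a sketch with hypothetical `propose` and `cost` callbacks, not the post's actual code):

```python
def greedy_step(state, propose, cost):
    """One step of the simplified scheme: propose a random move and
    keep the new state only if it strictly improves the objective.

    `propose` generates a candidate state; `cost` is the objective
    function being minimized. No regressions are ever accepted, so
    this is hill climbing, not annealing.
    """
    candidate = propose(state)
    return candidate if cost(candidate) < cost(state) else state
```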


The argument for annealing in the original paper is that accepting regressions is essential to escape from local minima.

https://www2.stat.duke.edu/~scs/Courses/Stat376/Papers/Tempe...

"Annealing, as implemented by the Metropolis procedure, differs from iterative improvement in that the procedure need not get stuck since transitions out of a local optimum are always possible at nonzero temperature. A second and more important feature is that a sort of adaptive divide-and-conquer occurs. Gross features of the eventual state of the system appear at higher temperatures; fine details develop at lower temperatures. This will be discussed with specific examples."


Yes, without a temperature and cooling schedule, how can it be annealing? It's in the name. It may sound harsh, but I'd call it an abuse of the term to do hillclimbing but call it annealing. It also seems lazy, since doing it right is an all but trivial addition to the code. Finding the best cooling schedule might require some experimentation though.
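For what it's worth, a classic geometric cooling schedule really is only a few lines; `t0` and `alpha` are the knobs one would tune experimentally, as the parent notes (this is a generic sketch, not code from the post):

```python
def geometric_schedule(t0, alpha, steps):
    """Yield temperatures T_k = t0 * alpha**k for 0 < alpha < 1.

    The temperature decays by a constant factor each step: alpha close
    to 1 cools slowly (more exploration), alpha close to 0 cools
    almost immediately (nearly greedy from the start).
    """
    t = t0
    for _ in range(steps):
        yield t
        t *= alpha
```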


It's strange that in a field as important as optimization we still think in terms of "escaping from local minima". Also (as a total outsider) the progress in general optimization algorithms/implementations appears to be very slow (I was shocked at how old ipopt is). I was wondering if all the low-hanging inductive biases (for real-world problems) have already been exploited, or if we just have no good ways of expressing them? Maybe learning them from data in a fuzzy way might work?


Unless you come up with some new take on the P ?= NP problem, there isn't much we can improve in generic optimization.

There are all kinds of possibilities for specific problems, but if you want something generic, you have to traverse the possibility space and use its topology to get into an optimum. And if the topology is chaotic, you are out of luck, and if it's completely random, there's no hope.


Couldn't there be something between chaotic and completely random, let's call it correlated, where e.g. (conditional) independence structures are similar across real-world problems?


You mean something that is well behaved in practical situations but intractable in general?

There is plenty of stuff like that; things don't even need to be chaotic for that. Anyway, chaotic and random are just two specific categories; there are many others. Nature happens to like those two (or rather, not exactly random, but it surely likes things that look like it), which is why I pointed them out.


Yes.

And beyond this intuition (escape from local optima), the reason that annealing matters is that you can show that (under conditions) with the right annealing schedule (it's rather slow, T ~ 1/log(Nepoch) iirc?) you will converge to the global optimum.

I'm not well-versed enough to recall the conditions, but it wouldn't surprise me if they are quite restrictive, and/or hard to implement (e.g., with no explicit annealing guidance to choose a specific temperature).
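If I recall the result correctly, the schedule in question is the logarithmic one, roughly T_k = c / log(k + 2); a sketch (the constant `c` and the `+ 2` offset are illustrative choices, not from a specific paper's notation):

```python
import math

def log_schedule(c, k):
    """Logarithmic cooling T_k = c / log(k + 2) at iteration k.

    This is the impractically slow rate under which, for a large
    enough constant c, convergence to the global optimum can be
    guaranteed under certain conditions; in practice people use
    faster schedules and give up the guarantee.
    """
    return c / math.log(k + 2)
```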


If you want to keep the pedants at bay, call this "simulated annealing at zero temperature." Or just call it a greedy algorithm.



