Thursday, April 14, 2011

Content-aware image resizing

Today I'm going to discuss a technique called Seam Carving, originally presented in Siggraph 2007. This algorithm at it's core it's fairly simple but produces impressive results.

We will start from this image:

And take 200 pixels from its width, and turn it into this one:

Note that the image wasn't just resized, but most of the detail is still there. The size reduction is rather aggressive so there are some artifacts. But the results are quite good.

This algorithm works by repeatedly finding vertical seams of pixels and removing them. It chooses which one to remove by finding the seam with the minimal amount of energy.

The whole algorithm revolves around an energy function. In this case, I'm using a function suggested in the original paper which is based on the luminance of the image. What we do is compute the vertical and horizontal derivatives of the image, take the absolute value of each, and add both. The derivative is approximated by a simple subtraction.

The following code computes the energy of the image. The intensities image is basically the grayscale version of the image, normalized between 0 and 1.
private static FloatImage computeEnergy(FloatImage intensities) {
        int w = intensities.getWidth(), h = intensities.getHeight();
        final FloatImage energy = FloatImage.createSameSize(intensities);
        for(int x = 0; x < w-1; x++) {
            for(int y = 0; y < h-1; y++) {
                //I'm aproximating the derivatives by subtraction
                float e = abs(intensities.get(x,y)-intensities.get(x+1,y))
                        + abs(intensities.get(x,y)-intensities.get(x,y+1));
                energy.set(x,y, e);
            }
        }
        return energy;
    }

After applying this function to our image, we get the following:

You can observe that the edges are highlighted (i.e. have more energy). That is caused by our choice of an energy function. Since we're taking the derivatives and adding its absolute value, abrupt changes in luminance are highlighted (i.e. edges).

The next step is where things start to get interesting. To find the minimal energy seam, we build an image with the accumulated minimal energy. We do so by computing an image where the value of each pixel is the value of the minimum of the three above it, plus the energy of that pixel:


We do so with the following code:

final FloatImage energy = computeEnergy(intensities);

    final FloatImage minima = FloatImage.createSameSize(energy);
    //First row is equal to the energy
    for(int x = 0; x < w; x++) {
        minima.set(x,0, energy.get(x,0));
    }

    //I assume that the rightmost pixel column in the energy image is garbage
    for(int y = 1; y < h; y++) {
        minima.set(0,y, energy.get(0,y) + min(minima.get(0, y - 1),
                minima.get(1, y - 1)));

        for(int x = 1; x < w-2; x++) {
            final float sum = energy.get(x,y) + min(min(minima.get(x - 1, y - 1),
                    minima.get(x, y - 1)),minima.get(x + 1, y - 1));
            minima.set(x,y, sum);
        }
        minima.set(w-2,y, energy.get(w-2,y) + min(minima.get(w-2, y - 1),
                minima.get(w-3, y - 1)));
    }

Once we do this, the last row contains the sum of all the potential minimal seams.


With this, we search the last row for the one with the minimum total value:

//We find the minimum seam
    float minSum = Float.MAX_VALUE;
    int seamTip = -1;
    for(int x = 1; x < w-1; x++) {
        final float v = minima.get(x, h-1);
        if(v < minSum) {
            minSum=v;
            seamTip=x;
        }
    }

And backtrace the seam:

//Backtrace the seam
    final int[] seam = new int[h];
    seam[h-1]=seamTip;
    for(int x = seamTip, y = h-1; y > 0; y--) {
        float left = x>0?minima.get(x-1, y-1):Float.MAX_VALUE;
        float up = minima.get(x, y-1);
        float right = x+1<w?minima.get(x+1, y-1):Float.MAX_VALUE;
        if(left < up && left < right) x=x-1;
        else if(right < up && right < left) x= x+1;
        seam[y-1]=x;
    }
}

Having the minimum energy seam, all is left to do is remove it.

If we repeat this process several times, removing one seam at a time, we end up with a smaller image. Check the following video to see this algorithm in action:


If you want to reduce an image vertically, you have to find horizontal seams. If you want to do it vertically and horizontally you have to find which seam has the least energy (either the vertical or the horizontal one) and remove that one.

This implementation is quick & dirty and very simplistic. Many optimization can be done to make it work faster. It is also quite incomplete. By priming the energy image, you can influence the algorithm to avoid distorting certain objects in the image or to particularly pick one.

It is also possible to use it to enlarge an image (although I haven't implemented it), and by a combination of both methods one can selectively remove objects from an image.

The full source code for this demo follows. Have fun!

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;
import java.awt.image.BufferedImage;
import java.awt.*;
import static java.lang.Math.abs;
import static java.lang.Math.min;

public class SeamCarving
{
    public static void main(String[] args) throws IOException {
        final BufferedImage input = ImageIO.read(new File(args[0]));


        final BufferedImage[] toPaint = new BufferedImage[]{input};
        final Frame frame = new Frame("Seams") {

            @Override
            public void update(Graphics g) {
                final BufferedImage im = toPaint[0];
                if (im != null) {
                    g.clearRect(0,0,getWidth(), getHeight());
                    g.drawImage(im,0,0,this);
                }
            }
        };
        frame.setSize(input.getWidth(), input.getHeight());
        frame.setVisible(true);

        BufferedImage out = input;
        for(int i = 0; i < 200; i++) {
            out = deleteVerticalSeam(out);
            toPaint[0]=out;
            frame.repaint();
        }
    }

    private static BufferedImage deleteVerticalSeam(BufferedImage input) {
        return deleteVerticalSeam(input, findVerticalSeam(input));
    }

    private static BufferedImage deleteVerticalSeam(final BufferedImage input, final int[] seam) {
        int w = input.getWidth(), h = input.getHeight();
        final BufferedImage out = new BufferedImage(w-1,h, BufferedImage.TYPE_INT_ARGB);

        for(int y = 0; y < h; y++) {
            for(int x = 0; x < seam[y]; x++) {
                    out.setRGB(x,y,input.getRGB(x, y));
            }
            for(int x = seam[y]+1; x < w; x++) {
                    out.setRGB(x-1,y,input.getRGB(x, y));
            }
        }
        return out;
    }

    private static int[] findVerticalSeam(BufferedImage input) {
        final int w = input.getWidth(), h = input.getHeight();
        final FloatImage intensities = FloatImage.fromBufferedImage(input);
        final FloatImage energy = computeEnergy(intensities);

        final FloatImage minima = FloatImage.createSameSize(energy);
        //First row is equal to the energy
        for(int x = 0; x < w; x++) {
            minima.set(x,0, energy.get(x,0));
        }

        //I assume that the rightmost pixel column in the energy image is garbage
        for(int y = 1; y < h; y++) {
            minima.set(0,y, energy.get(0,y) + min(minima.get(0, y - 1),
                    minima.get(1, y - 1)));

            for(int x = 1; x < w-2; x++) {
                final float sum = energy.get(x,y) + min(min(minima.get(x - 1, y - 1),
                        minima.get(x, y - 1)),minima.get(x + 1, y - 1));
                minima.set(x,y, sum);
            }
            minima.set(w-2,y, energy.get(w-2,y) + min(minima.get(w-2, y - 1),minima.get(w-3, y - 1)));
        }

        //We find the minimum seam
        float minSum = Float.MAX_VALUE;
        int seamTip = -1;
        for(int x = 1; x < w-1; x++) {
            final float v = minima.get(x, h-1);
            if(v < minSum) {
                minSum=v;
                seamTip=x;
            }
        }

        //Backtrace the seam
        final int[] seam = new int[h];
        seam[h-1]=seamTip;
        for(int x = seamTip, y = h-1; y > 0; y--) {
            float left = x>0?minima.get(x-1, y-1):Float.MAX_VALUE;
            float up = minima.get(x, y-1);
            float right = x+1<w?minima.get(x+1, y-1):Float.MAX_VALUE;
            if(left < up && left < right) x=x-1;
            else if(right < up && right < left) x= x+1;
            seam[y-1]=x;
        }

        return seam;
    }

    private static FloatImage computeEnergy(FloatImage intensities) {
        int w = intensities.getWidth(), h = intensities.getHeight();
        final FloatImage energy = FloatImage.createSameSize(intensities);
        for(int x = 0; x < w-1; x++) {
            for(int y = 0; y < h-1; y++) {
                //I'm approximating the derivatives by subtraction
                float e = abs(intensities.get(x,y)-intensities.get(x+1,y))
                        + abs(intensities.get(x,y)-intensities.get(x,y+1));
                energy.set(x,y, e);
            }
        }
        return energy;
    }
}

import java.awt.image.BufferedImage;

public final class FloatImage {
    private final int width;
    private final int height;
    private final float[] data;

    public FloatImage(int width, int height) {
        this.width = width;
        this.height = height;
        this.data = new float[width*height];
    }

    public int getWidth() {
        return width;
    }

    public int getHeight() {
        return height;
    }

    public float get(final int x, final int y) {
        if(x < 0 || x >= width) throw new IllegalArgumentException("x: " + x);
        if(y < 0 || y >= height) throw new IllegalArgumentException("y: " + y);
        return data[x+y*width];
    }

    public void set(final int x, final int y, float value) {
        if(x < 0 || x >= width) throw new IllegalArgumentException("x: " + x);
        if(y < 0 || y >= height) throw new IllegalArgumentException("y: " + y);
        data[x+y*width] = value;
    }

    public static FloatImage createSameSize(final BufferedImage sample) {
        return new FloatImage(sample.getWidth(), sample.getHeight());
    }

    public static FloatImage createSameSize(final FloatImage sample) {
        return new FloatImage(sample.getWidth(), sample.getHeight());
    }

    public static FloatImage fromBufferedImage(final BufferedImage src) {
        final int width = src.getWidth();
        final int height = src.getHeight();
        final FloatImage result = new FloatImage(width, height);
        for(int x = 0; x < width; x++) {
            for(int y = 0; y < height; y++) {
                final int argb = src.getRGB(x, y);
                int r = (argb >>> 16) & 0xFF;
                int g = (argb >>> 8) & 0xFF;
                int b = argb & 0xFF;
                result.set(x,y, (r*0.3f+g*0.59f+b*0.11f)/255);
            }
        }
        return result;
    }
    public BufferedImage toBufferedImage(float scale) {
        final BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for(int x = 0; x < width; x++) {
            for(int y = 0; y < height; y++) {
                final int intensity = ((int) (get(x, y) * scale)) & 0xFF;
                result.setRGB(x,y,0xFF000000 | intensity | intensity << 8 | intensity << 16);
            }
        }
        return result;
    }
}

Friday, April 08, 2011

Nerding with the Y-combinator

What follows is a pointless exercise. Hereby I present you the Y-combinator in Java with generics:
public class Combinators {
    interface F<A,B> {
        B apply(A x);
    }
    //Used for proper type checking
    private static interface FF<A, B> extends F<FF<A, B>, F<A, B>> {}

    //The Y-combinator
    public static <A, B> F<A, B> Y(final F<F<A, B>,F<A, B>> f) {
        return U(new FF<A, B>() {
            public F<A, B> apply(final FF<A, B> x) {
                return f.apply(new F<A, B>() {
                    public B apply(A y) {
                        return U(x).apply(y);
                    }
                });
            }
        });
    }

    //The U-combinator
    private static <A,B> F<A, B> U(FF<A, B> a) {
        return a.apply(a);
    }

    static F<F<Integer, Integer>, F<Integer, Integer>> factorialGenerator() {
        return new F<F<Integer, Integer>, F<Integer, Integer>>() {
            public F<Integer, Integer> apply(final F<Integer, Integer> fact) {
                return new F<Integer, Integer>() {
                    public Integer apply(Integer n) {
                        return n == 0 ? 1 : n * fact.apply(n-1);
                    }
                };
            }
        };
    }

    public static void main(String[] args) {
        F<Integer, Integer> fact = Y(factorialGenerator());
        System.out.println(fact.apply(6));
    }
}
Having the Y-combinator implemented in Java, actually serves no purpose (Java supports recursion) but it was interesting to see if it could be done with proper generics.

Monday, April 04, 2011

Paper programming

When I was a kid, all I wanted was a computer. Finally when I was twelve I made a bargain with my dad. I would give up the graduation trip in exchange for a Commodore 64 (graduation trips are customary in Argentina when you finish primary and secondary school).


We bought a "Segundamano" (lit. second hand) magazine and found a used one for U$S 200. My dad contacted the seller and we went to pick it up.

You have to keep in mind that this was 1989 and the social and economic landscape in Argentina was a mess. That year the inflation rate was over 3000% (it is not a typo) and those 200 dollars was a lot of money, so my dad really made an effort.

If you are still paying attention, you might have noticed that I never mentioned a Datasette nor a disk drive. All I got was a Commodore 64, plain and simple. But this was not going to stop me.

After we got it, my dad was concerned that I might get too obsessed with the computer, so he placed some additional constraints on when I could use it (the fact that we had only one TV might have also been a factor in his decision). I was allowed to use it only on Saturday mornings before noon.

In retrospective, I think that this two factors made me a better programmer.

At that time I had some experience programming in Logo mostly. It was a Logo dialect in spanish that we used at school. We had a Commodore-128 laboratory at school (about eight machines with disk drives and monitors). I started learning Logo when I was 8, by the time I was twelve I could program a bit of BASIC also, but not much since literature was scarce.

One great thing about the Commodore 64 was that it came with BASIC, but most importantly, it came with a manual! The Commodore's Basic manual was my first programming book.

What happened was that I was forced to work as if I had punched cards. I would spend most of my spare time during the week writing programs in a notebook I had. Thinking about ways to solve problems and reading and re-reading the C64 manual.

On saturday mornings I would wake up at 6 am and hook-up the computer to the TV and start typing whatever program I was working on that week. Run it and debug it, improve it and around noon my mom would start reminding me that time was up. So at that point I began listing the BASIC source code and copying it back to my notebook.

It was during this time that I rediscovered things like an optimized Bubble-sort, although I didn't know its name then (and I wouldn't learn it for many more years). I still vividly remember the moment. It was one afternoon that I was trying to figure out a way to sort an array, so I started playing with a deck of cards. I finally figured out that I could do several passes of comparison and exchanges on adjacent cards. And if I did exactly as many passes as the number of elements the array would be sorted. I also noticed that the largest element would be at the end of the array after the first pass, and the second largest would be in place after the second pass and so on. This allowed me to save about half the number of comparisons.

The most valuable thing that I learned during this time is that when it comes to programming, thinking hard about the problem at hand before doing anything pays off, big time!

I eventually was able to save enough for a Datasette (it costed 40 dollars, my dad payed half of it) and I was able to build much larger programs.

The biggest one I wrote was an interpreter with a multitasking runtime. It was very basic. I had no notion of a lexer, let alone a parser. Commands were composed of a single letter (the first one of the line) and several arguments. You could declare procedures with names and call them. It was like pidgin-basic (since my exposure was to basic) but the procedure structure resembled Logo.

Programs were stored in memory in several arrays. There was a main one that contained statements, and couple to hold procedure names and offsets into the main array.

The runtime divided the screen in 4 areas (this was a 40x25 text screen, so not much room was left for each), and each window could run a separate program. The runtime would do a round robin on each running program, executing a single statement of each on each pass. For this it had to keep keeping a separate program counter and variables for each. I even planned to add windowing calls to create custom windows but I never got to finish it.

It was at this time that I also got interested in electronics, so I built a a few contraptions controlled by the C64, but that's a tale for another post.