Developer Diary of Johan Sivertsen: Testing and sketching

For the last two days I have been unit testing the implementations for my thesis. Finding lot's of bugs and improving the codebase a lot. One of the core algorithms is Count Sketch. It's a kind of magic that goes like this:

We take a vector x \in R^d and map it to a vector x \in R^D. Normally the smart part is that D is a lot smaller than d (which could even be infinite as in a stream). In my thesis however I exploit two things. CountSketch maintains the inner product and there is a very fast way to find the count sketch of a tensor product by polynomial multiplication using Fast Fourier Transform. Denote the sketch of x as CS(x)

E[<CS(x),CS(y)>]=<x,y>

We also use this property of tensor products:

<x^(p),y^(p)>=<x,y>^p

which is the polynomial kernel. So if we can map to tensor products we have an explicit map of the polynomial kernel. This is not very smart on it's own because the tensor product is huge (d^p). Fortunately we have a powerful tool (http://arxiv.org/abs/1108.1320) that allows us to find the count sketch of an outer product from count sketches, like this:

CS(x^(p))=FFT^-1(FFT(CS(x))*....p-times...*FFT(CS_2(x))

Notice that this is from different count sketches using different h and s. the final CS will look like it was generated from the sum of the h'es and the product of the s'es.

Developer Diary of Johan Sivertsen

Thursday, January 30, 2014

Testing and sketching

No comments:

Post a Comment