Friday, January 31, 2014

Testing fast convolution of tensor sketches

Today I have been working on the final unit test of my code and the changes needed to run it. I am testing my earliest implementations last and changing the code a lot in the process. It's fun to see how different my C++ looks now. The final beast to slay is my implementation of the fast convolution of tensor sketches, and checking that it actually keeps the promise of finding the Count Sketch of an outer product fast. Some things are bugging me:

I convert a Count Sketch into a polynomial.
I still use a discrete Fourier transform, and somehow my CS(vector) represents the polynomial well enough. The conversion seems to be a mathematical showpiece. I have spent a lot of time today trying to understand it, and it's still not really clear.

Thursday, January 30, 2014

Testing and sketching

For the last two days I have been unit testing the implementations for my thesis, finding lots of bugs and improving the codebase a lot. One of the core algorithms is Count Sketch. It's a kind of magic that goes like this:

We take a vector x \in R^d and map it to a vector CS(x) \in R^D. Normally the smart part is that D is a lot smaller than d (which could even be infinite, as in a stream). In my thesis, however, I exploit two things: Count Sketch maintains the inner product in expectation, and there is a very fast way to find the count sketch of a tensor product by polynomial multiplication using the Fast Fourier Transform. Denote the sketch of x as CS(x):

E[<CS(x),CS(y)>]=<x,y>

We also use this property of tensor products:

<x^(p),y^(p)>=<x,y>^p

which is the polynomial kernel. So if we can map to tensor products, we have an explicit map for the polynomial kernel. This is not very useful on its own because the tensor product is huge (d^p entries). Fortunately there is a powerful tool (http://arxiv.org/abs/1108.1320) that allows us to find the count sketch of an outer product from count sketches of the factors, like this:

CS(x^(p)) = FFT^-1(FFT(CS_1(x)) * ... p times ... * FFT(CS_p(x)))

Notice that this uses p different count sketches with different hash functions h and sign functions s. The final CS behaves as if it was generated from the sum (mod D) of the h's and the product of the s's.

Friday, January 24, 2014

Permutation matrix - Continuing Linear Algebra for programmers phrase book

Continuing the post from a few days ago about just how different Linear Algebra can look on paper and in C++ I today did a bit of work implementing and testing a permutation matrix.

Permutation matrix

A permutation matrix has a single 1 in each row and each column; the rest is zeros. Like this:


All it does is switch rows around. Try a few matrix multiplications with a permutation matrix on the left side and you will quickly see how it works. In C++ I implemented it like this, for the permutation-matrix-times-vector case.

void FastFood::permutate(vector<double>& data) const {
    std::shuffle(data.begin(), data.end(), std::default_random_engine(m_seed));
}


Here I have a random permutation, but depending on how I make use of the "m_seed" value I can control when I want a new random order. This is useful because I want my permutation to be random, but I want it to be the same random order every time I use the method from the same object instance.

Thursday, January 23, 2014

Distribution from hell

Today I had a few errands to run, but my working time was spent with the distribution from hell (again). Today I gave up and posted to Math StackExchange: http://math.stackexchange.com/questions/648577/the-radial-part-of-a-normal-distribution. Unfortunately I don't think this will yield an answer.

My thinking so far can be summed up as:

What problem is being solved?

The entries of the matrix look like Gaussian samples, but all rows have the same length. Scaling with $s_i$ should fix this. Conclusion: s behaves like the length of a Gaussian vector of size d.

It seems intuitive to me that this is related to the unit ball, but I am still not fully on board with it, and the weird inversion of A especially is just killing me.

EDIT: The apparent purpose of S is to make the rows of V have lengths that vary in the same way as if the elements of V had been sampled i.i.d. from N(0,1). I think the same can be achieved by drawing d samples from N(0,1) and storing their 2-norm as an entry in S. That of course requires D x d samples, but it is a one-time thing. << from an email I wrote my supervisor the next day. This is how I ended up doing it: sample d times from an N(0,1) normal distribution and use the length of the resulting vector as an entry in S.

Tuesday, January 21, 2014

Tex adventures

Today I had a fun time trying to add the dirtree style to my TeX install. In the end it was rather simple, apart from the few things you have to guess on an Ubuntu system. The procedure was:

1. Download Dirtree.
2. Run "latex dirtree.ins"
3. Copy dirtree.sty and dirtree.tex somewhere near my other .sty files (found with "locate *.sty").
4. Run "texhash" to update the tex install and make it aware of the new style.

It was all listed here: http://www.ctan.org/tex-archive/macros/generic/dirtree, except the last step, for some reason. Today I also made more tests for my thesis codebase. I wrote a section on testing methods where you only have expected properties of the outcome, and had fun trying to calculate some variances.

Monday, January 20, 2014

Xbox Music API

I had a few hours to spend on F# and I wanted to build an application using the Xbox Music API. The documentation is here: http://msdn.microsoft.com/en-us/library/dn546696.aspx. I think the structure of that page is messy, since it invites you to start at "Getting Started", but really the best place to start is "Using the xbox restful services". So, what is REST? I found a nice description here: http://martinfowler.com/articles/richardsonMaturityModel.html. It basically sums to REST being proper use of resources, HTTP verbs, and "hypermedia controls", which is short for navigation hints. Hopefully I will find some more time soon to actually build something useful with this API and F#.

In the end I got it working. Microsoft uses a service to provide a token, and you must then use the token to query Xbox Music. I spent some time failing because some of my secret keys and the token needed to be URL encoded before being passed in the request. In F# it is easy to URL encode using a .NET method, System.Web.HttpUtility.UrlEncode.

Friday, January 17, 2014

Linear algebra for programmers

Inspired by Erik Meijer, I thought I would list some of the things that have been useful to me recently in transforming a Machine Learning article into actual running C++. The idea is a kind of phrase book for programmers venturing into the strange lingo of Linear Algebra. Most examples are taken from this paper since I recently implemented it.

In the examples I will assume no familiarity with Linear Algebra so we will take some shortcuts, but we will get the job done. The idea is to transfer these strange terms into something familiar to any developer. I am pretty sure that you would fail horribly trying to do any math with this understanding of Linear Algebra, but it should work fine for coding.

Scalar

int, float, double, or objects if you please. Basic arithmetic should be implemented (+ - / *).

Vector

An ordered list of scalars: scalar[]

Matrix

An ordered list of vectors: [scalar[]]. It can also be multidimensional: [[scalar[]]]. All the innermost lists have the same length. Normally we only need the list of lists. The lengths of the lists are called the dimensionality of the matrix. An n x m matrix is a list of size n containing lists of size m.

Dot product / Inner product

The most common way to multiply two vectors. It takes the sum of the products of the elements. You can only take the dot product of two vectors of the same length:

int dotProduct(int[] a, int[] b){
    if (a.size() != b.size()) throw exception;
    int result = 0;
    for (int i = 0; i < a.size(); ++i){
        result += a[i]*b[i];
    }
    return result;
}

Notice that multiplying two vectors returns a scalar.


Diagonal matrix

Mathematicians spend a lot of time swapping elements or building strange structures like the diagonal matrix because Linear Algebra relies heavily on the rules for multiplying different elements. Say I want to multiply two vectors like this:

int[] DiagonalProduct(int[] a, int[] b){
    if (a.size() != b.size()) throw exception;
    int[] result(a.size());
    for (int i = 0; i < a.size(); ++i){
        result[i] = a[i]*b[i];
    }
    return result;
}

You could call it element-wise multiplication. There is no such rule in Linear Algebra between two vectors. So what is done is to put the vector a on the diagonal of a matrix A and fill the rest of the spaces with 0's; A is then called a diagonal matrix. The above then corresponds to multiplying A*b with the usual rules for matrix-vector multiplication.

Permutation matrix

The same is the case for the permutation matrix. What it does is swap the rows of a matrix. Consider the permutation matrix P=[[0,1],[1,0]]: it swaps the first and second row, e.g. P*[[2,4],[6,5]] = [[6,5],[2,4]]. In C++ you can use std::swap to easily do the same job.

I hope to write more entries to this some day, but for now my 30 minutes are up.

Thursday, January 16, 2014

Lazarus, dig!

Tonight I am resurrecting this blog after a long break. Long story short: I quit my job at Rehfeld to write my thesis. I have been writing since September and now things are shaping up. Back when I started my thesis, I thought I did not want to write about my progress on here because I would be documenting my work anyway.

The last three days I have been at the Warm Crocodile Developer Conference and I learned a lot of stuff. Mostly I wanted to get into F#, and I am really excited to start working with a functional language that is easy to put to actual production use. However, I did not want to write about F# today. I saw three talks by Erik Meijer and they were all really great. He talked a lot about duality as a helpful concept for understanding a bunch of things. Some of his examples:

De Morgan's laws are dual:
!(A || B) == !A && !B
!(A && B) == !A || !B

The idea of duality was often described as switching the arrows. In the above we go from a law turning OR into AND to a second law turning AND into OR.

Another example, Bayes rules:

P(B)*P(A|B) = P(A ∩ B) = P(A)*P(B|A)

Again "switching the arrow", we get a duality describing on the one hand the probability of A given B and on the other the probability of B given A.

Finally he described how Machine Learning is the dual of Querying:

f(x) => [x,y]   (querying: provide a function, get data back)
[x,y] => f(x)   (machine learning: provide data, get a function back)

I think that trying to "complete" a system by finding the dual of a statement or idea is a powerful tool, and I hope I get to hear Erik give a talk again soon. He was really provocative; in particular he threw a lot of dirt at Ted Codd, but it made me wake up and want to argue, so I think it served its purpose.