Monday, November 3, 2014

Updating DLL refs when calling F# code from a C# project

This might be useful some other day. I had a situation where everything worked fine in the F# project but would crash with a FileNotFound FSharp.core exception when calling from the C# project. The problem was the I had used a build of FParsec that relied on an old version of FSharp.core. This was taken care of in the F# app.config file, but not in the app.config file of the C# project. So I just made sure the to add the same snippet to the C# project and everything works again. Snippet:

Monday, August 4, 2014

C# F# Excel parsing

Working on some new additions to MKC-shift planner I decided I would write an Excel Parser in F# to work with the main C# project. Most methods of the parser are in "Parse.fs"
module FSpart.Parse

open FSpart.Enums
open FSpart.Types
open Converters

open Microsoft.Office.Interop.Excel


let cellToString (range :Microsoft.Office.Interop.Excel.Range) =
    range.Value2.ToString()

let xlApp = 
    new ApplicationClass(Visible = true)

let workbook path (xlApp:Microsoft.Office.Interop.Excel.ApplicationClass) =
    xlApp.Workbooks.Open(path)

let sheetFromPath path sheetName (xlApp:Microsoft.Office.Interop.Excel.ApplicationClass) =
    let xlWorkbook = xlApp.Workbooks.Open(path)
    let xlWorksheet = xlWorkbook.Worksheets.[sheetName] :?> Microsoft.Office.Interop.Excel.Worksheet
    xlWorksheet
   
let sheetFromWorkbook (workbook:Microsoft.Office.Interop.Excel.Workbook) sheetname =
    workbook.Worksheets.[sheetname] :?> Microsoft.Office.Interop.Excel.Worksheet

... More methods are here

let ExcelExit =
    xlApp.Quit()
    System.Runtime.InteropServices.Marshal.FinalReleaseComObject xlApp
To make this accesible to the main c# project I added "ExcelParser.fs"
module FSpart.Excelparser


type ExcelParser =
    static member Parse(path)=
        let Excel = Parse.xlApp
        let workbook = Parse.workbook path Excel
        let sheet = Parse.sheetFromWorkbook workbook 1
        let LongShiftPlan = Parse.longShiftAssignments sheet
        Parse.ExcelExit |> ignore
        LongShiftPlan
A reference to the F# project was added to the c# project and the parser can be called via.
FSpart.Excelparser.ExcelParser.Parse(path);
This Works, but there is a caveat. The last line in Parse.fs:
System.Runtime.InteropServices.Marshal.FinalReleaseComObject xlApp 
causes an error:
"COM object that has been separated from its underlying RCW cannot be used."
Omitting the line makes my program run, but it leaves those Excel processes running somewhere in the background. Any suggestions on avoiding this ?
SOLVED:
Mathias Brandewinder(@brandewinder) was kind enough to point me towards this SO post that includes lots of useful information. By following the "never use 2 dots on COM objects" rule the snippet below shows the changes made to ExcelParser, enough to kill of the Excel squatters.
module FSpart.Excelparser


type ExcelParser =
    static member Parse(path)=
        let Excel = Parse.xlApp
        let workbook = Parse.workbook path Excel
        let sheet = Parse.sheetFromWorkbook workbook 1
        let LongShiftPlan = Parse.longShiftAssignments sheet

        //Cleanup
        Parse.ExcelExit |> ignore
        System.GC.Collect()
        System.GC.WaitForPendingFinalizers()
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject sheet |>ignore
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject workbook |> ignore
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject Excel |> ignore


        LongShiftPla
Thanks Mathias.

Thursday, July 24, 2014

F# fibonacci numbers

It's very warm. Here is an F# fibonacci sequence. I am not sure it's really good functional fu to use mutable values.
let sumUp =
    seq {let fib = ref 0
         let fib1 = ref 1
        while true do 
        yield !fib + !fib1
        let oldfib = !fib 
        fib:=!fib1
        fib1:=oldfib + !fib1}
        
(Seq.take 20 sumUp) |> Seq.iter (fun x -> printfn "%d" x)

Monday, May 19, 2014

SqlDataConnection and wasted time

Today I was trying to connect to an Azure SQL-database using F# type providers. The guide given here is great, but does not mention how to get the connection string to a database from Azure. There are some strings given inside the Azure portal, but none of them seem to match the format suggested in the guide. I spend a long time trying to construct a connection string and it just doesn't work. The connection works fine if I only specify the server, but when I also specify the database I get errors like:

Warning : SQM1012: Unable to extract table 'dbo.Board' from SqlServer. Invalid object name 'syscomments'

The table "dbo.Board" exists in the database, so somehow the connection is pointing to the right place. My code looks like this:

open System
open System.Data
open System.Data.Linq
open Microsoft.FSharp.Data.TypeProviders
open Microsoft.FSharp.Linq
 

type dbSchema = SqlDataConnection<"Server=tcp:k9qd5a64zq.database.windows.net,1433;Database=timetobeat562;User ID=johanvts@k9qd5a64zq;Password=MYPASSWORD;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;">

let db = dbSchema.GetDataContext()
db.DataContext.Log <- System.Console.Out

printfn "%s" db.DataContext.Connection.DataSource

[]
let main argv =
    printfn "%A" argv
    System.Console.ReadLine() |> ignore
    0 // return an integer exit code

Any help would be greatly appriciated.

Wednesday, May 7, 2014

Rewriting another Query

I spend a lot of my time rewriting queries, so today I though I would document my process. We start with this query: 
Select A.*
,1 as Vægtet
,'Betalende' as rtype
From
(
SELECT AA.DT, CASE WHEN AA.afdeling='HK/Hovedstaden' AND Fsektor = 'HK/Privat' THEN aa.Asektor else aa.afdeling END as afdeling, AA.KTG_ID, aa.nvn, aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr,
(SUM(aa.blb_1mdr) * (aa.antal_medlemmer_1mdr)) - (SUM(aa.blb_ult) * (aa.antal_medlemmer_ult))as kr

FROM

(

SELECT convert(nvarchar(10),max(a.dt),105) as DT, a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END as hovedopdeling, a.ERHVERVSAKTIVE, a.Betalende, 
CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END as afd_sortering,


SUM(a.antal_medlemmer_1mdr) as antal_medlemmer_1mdr, SUM(a.antal_medlemmer_2mdr) as antal_medlemmer_2mdr, SUM(a.antal_medlemmer_ult) as antal_medlemmer_ult,

SUM(a.blb_1mdr) as blb_1mdr, SUM(a.blb_2mdr) as blb_2mdr, SUM(a.blb_ult) as blb_ult

FROM

(


SELECT a.DT, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, b.blb as blb_1mdr, 0 as blb_2mdr, 0 as blb_ult, COUNT(*) as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult
from lr.dbo.Medlem a
left join (select flag, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=1 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by ktg_id, flag,afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id  --and a.FLAG=b.flag
left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
  
where DT=(select max(dt) from lr..medlem) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1
group by a.DT, a.KTG_ID, a.AAR, a.MAANED, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb


UNION ALL


SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, b.blb as blb_2mdr, 0 as blb_ult, 0 as antal_medlemmer_1mdr, COUNT(*) as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult
from lr.dbo.Medlem a
left join (select dt, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=2 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id ) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id 
left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
  
where a.dt=(select dt from lr.dbo.tKalender_statistik where flag=1) and a.MEDL_FORB=1 -- and a.ERHVERVSAKTIVE=1
group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE,a.AAR, a.MAANED,  d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb

UNION ALL

SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, 0 as blb_2mdr, b.blb as blb_ult, 0 as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, COUNT(*) as antal_medlemmer_ult
from lr.dbo.Medlem a
left join (select dt, ktg_id, SUM(blb) as blb,  afd_sektor_id from [LR].[dbo].[tKontingentBlb] where dt = (SELECT  Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id 
left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
  
where a.dt=(SELECT  Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and a.MEDL_FORB=1 -- and a.ERHVERVSAKTIVE=1
group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling,a.ERHVERVSAKTIVE ,a.AAR, a.MAANED,  d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb

) A

group by a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END, a.ERHVERVSAKTIVE, a.Betalende, 
CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END

) AA

where aa.asektor <> '?' and aa.afdeling not in ('HK/Trafik og Jernbane', 'HK/Post og kommunikation')

GROUP BY AA.DT, AA.afdeling, AA.KTG_ID, aa.nvn,aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr
) A 
--Cross join (SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb]) B
--left join (SELECT [dt], [ktg_id], [afd_sektor_id], [Forbund],[Afdeling],[Sektor],[Fuldtid_Forbund],[Fuldtid_Afdeling],[Fuldtid_Sektor] FROM [LR].[dbo].[factKontigent_Blb] Where dt =(SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb]))
--C on A.KTG_ID = C.ktg_id and A.AFD_SEKTOR_ID = C.[afd_sektor_id]


union all

Select A.DT ,CASE WHEN A.afdeling='HK/Hovedstaden' THEN a.Asektor else a.afdeling END as afdeling,A.KTG_ID,A.nvn ,A.AFD_SEKTOR_ID,A.Asektor,A.Fsektor,A.hovedopdeling,A.ERHVERVSAKTIVE,A.Betalende,A.afd_sortering
,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end * A.antal_medlemmer_1mdr as antal_medlemmer_1mdr
,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end * A.antal_medlemmer_2mdr as antal_medlemmer_2mdr
,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end * A.antal_medlemmer_ult as antal_medlemmer_ult
,A.blb_1mdr
,A.kr
,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end as Vægtetmedlem
,'Fuldtidsvægtede' as rtype
From
(
SELECT AA.DT, AA.afdeling, AA.KTG_ID, aa.nvn, aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr,
(SUM(aa.blb_1mdr) * (aa.antal_medlemmer_1mdr)) - (SUM(aa.blb_ult) * (aa.antal_medlemmer_ult))as kr

FROM

(

SELECT convert(nvarchar(10),max(a.dt),105) as DT, a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END as hovedopdeling, a.ERHVERVSAKTIVE, a.Betalende, 
CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END as afd_sortering,


SUM(a.antal_medlemmer_1mdr) as antal_medlemmer_1mdr, SUM(a.antal_medlemmer_2mdr) as antal_medlemmer_2mdr, SUM(a.antal_medlemmer_ult) as antal_medlemmer_ult,

SUM(a.blb_1mdr) as blb_1mdr, SUM(a.blb_2mdr) as blb_2mdr, SUM(a.blb_ult) as blb_ult

FROM

(


SELECT a.DT, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, b.blb as blb_1mdr, 0 as blb_2mdr, 0 as blb_ult, COUNT(*) as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult
from lr.dbo.Medlem a
left join (select flag, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=1 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by ktg_id, flag,afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id  --and a.FLAG=b.flag
left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
  
where DT=(select max(dt) from lr..medlem) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1
group by a.DT, a.KTG_ID, a.AAR, a.MAANED, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb


UNION ALL


SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, b.blb as blb_2mdr, 0 as blb_ult, 0 as antal_medlemmer_1mdr, COUNT(*) as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult
from lr.dbo.Medlem a
left join (select dt, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=2 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id ) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id 
left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
  
where a.dt=(select dt from lr.dbo.tKalender_statistik where flag=1) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1
group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE,a.AAR, a.MAANED,  d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb

UNION ALL

SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, 0 as blb_2mdr, b.blb as blb_ult, 0 as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, COUNT(*) as antal_medlemmer_ult
from lr.dbo.Medlem a
left join (select dt, ktg_id, SUM(blb) as blb,  afd_sektor_id from [LR].[dbo].[tKontingentBlb] where dt = (SELECT  Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id 
left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
  
where a.dt=(SELECT  Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1
group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling,a.ERHVERVSAKTIVE ,a.AAR, a.MAANED,  d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb

) A

group by a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END, a.ERHVERVSAKTIVE, a.Betalende, 
CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END

) AA

where  aa.asektor <> '?' and aa.afdeling not in ('HK/Trafik og Jernbane', 'HK/Post og kommunikation')

GROUP BY AA.DT, AA.afdeling, AA.KTG_ID, aa.nvn,aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr
) A 
--Cross join (SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb]) B
left join (SELECT [dt], [ktg_id], [afd_sektor_id], [Forbund],[Afdeling],[Sektor],[Fuldtid_Forbund],[Fuldtid_Afdeling],[Fuldtid_Sektor] FROM [LR].[dbo].[factKontigent_Blb] Where dt =(SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb]))
C on A.KTG_ID = C.ktg_id and A.AFD_SEKTOR_ID = C.[afd_sektor_id]
Where antal_medlemmer_1mdr>0 or antal_medlemmer_2mdr > 0 or antal_medlemmer_ult > 0
This query is central to a few dashboards but I needed to make some changes and I would also like it to load faster, currently it takes 39 seconds. I skimmed the entire query, but to make changes I just start from the top. Open a new query windows and make a copy so I can compare my changes query results to the current ones.

Starting at the top the first 4 lines simply add two columns with fixed values "1, 'Betalende'", so I remove that outer query and add it to the first inner query starting at line 6. I also removed the mathching ending on line 75.

Next I looked at the many unioned queries starting at line 26, they seem very similar. Looking closer they select the same thing, except the last six values where they are used to sum one thing at a time, and then the outer queries collapses the rows using sum. It seems this would be better handled by a single select statement with conditional sums. I started writing my conditional sums, one thing that helped a lot is to always look at the "raw" data, that's just select * without the aggregates and looking at  the rows. In the end i had constructed this:

SELECT max(a.dt)
       , a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling
       , a.ERHVERVSAKTIVE, case when sum(b.blb) > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende
    , sum( CASE WHEN flag = 0 THEN b.blb ELSE 0 END) as blb_1mdr
    , sum( CASE WHEN flag = 1 THEN b.blb ELSE 0 END) as blb_2mdr
    , sum( CASE WHEN a.Dt = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN b.blb ELSE 0 END) as blb_ult
    , COUNT(DISTINCT CASE WHEN flag = 0 THEN PRS_ID ELSE NULL END) as antal_medlemmer_1mdr
    , COUNT( DISTINCT CASE WHEN flag = 1 THEN PRS_ID ELSE NULL END) as antal_medlemmer_2mdr
    , COUNT( DISTINCT CASE WHEN a.Dt = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN PRS_ID ELSE NULL END) as antal_medlemmer_ult
    , CASE WHEN COUNT(distinct case when flag = 0 then Beskrivelse else null end) = 0 THEN 0 ELSE SUM( CASE WHEN FLAG = 0 THEN fuld_vægt ELSE 0 END)/COUNT(distinct case when flag = 0 then Beskrivelse else null end)END
    , CASE WHEN COUNT(distinct case when flag = 1 then Beskrivelse else null end) = 0 THEN 0 ELSE SUM( CASE WHEN FLAG = 1 THEN fuld_vægt ELSE 0 END)/COUNT(distinct case when flag = 1 then Beskrivelse else null end) END
    , CASE WHEN COUNT(distinct case when a.DT = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) then Beskrivelse else null end) = 0 THEN 0 ELSE SUM( CASE WHEN a.DT = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN fuld_vægt ELSE 0 END)/COUNT(distinct case when a.DT = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) then Beskrivelse else null end) END
    from lr.dbo.Medlem a
    left join [LR].[dbo].[tKontingentBlb] b on a.KTG_ID=b.ktg_id and a.afd_sektor_id  = b.afd_sektor_id and a.DT = b.dt
    left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id
    left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id
    left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id
    left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id
    OUTER APPLY (SELECT top 1 CASE WHEN sum([Fuldtid_Forbund]+[Fuldtid_Afdeling]+[Fuldtid_Sektor]) = 0 THEN 0 ELSE 
     sum(CAST([Forbund]+[Afdeling]+[Sektor] AS FLOAT)/([Fuldtid_Forbund]+[Fuldtid_Afdeling]+[Fuldtid_Sektor])) END as fuld_vægt
    FROM [LR].[dbo].[factKontigent_Blb] 
    where ktg_id= A.KTG_ID and [afd_sektor_id]= A.AFD_SEKTOR_ID and 
    dt = a.DT ) ktg_fact 
    where (a.DT>=DATEADD(M,-1,GETDATE()) OR a.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105)) and a.MEDL_FORB=1  and beskrivelse in ('Forbund', 'Afdeling','Sektor') and a.dt = b.dt
    group by    a.KTG_ID,c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor,fuld_vægt
    order by ktg_id,afd_sektor_id,afdeling

While this is much more compact, it's still not very readable and it turns out the wrong results. The summing simply is too complicated and too many strange constrains across the rows apply. Time to start over. In my second approach i first select the data i am looking for a into a temporary table and a CTE and the abuse the PIVOT command to get the summations as needed. Notice that the aggragation in my last select are necessary for the use of more than one PIVOT statement:

/*
Datagrundlag til medlemsdelen af det fælles dashboard
Author : Johan Sivertsen

==================================
Kontingentoplysninger: Samme som viewet factkontingentblb men her valgt kun for de tre relevante måneder
*/
SELECT     Kontigent.*, Fuldtid.Forbund AS [Fuldtid_Forbund], Fuldtid.Afdeling AS [Fuldtid_Afdeling], Fuldtid.Sektor AS [Fuldtid_Sektor], 
                      Fuldtid.[AkasseAdm] AS [Fuldtid_AkasseAdm] into #kont_fact
FROM         (SELECT     dt, aar, maaned, ktg_id, afd_sektor_id, sum(isnull([Forbund], 0)) AS [Forbund], sum(isnull([Afdeling], 0)) AS [Afdeling], sum(isnull([Sektor], 0)) 
                                              AS [Sektor], Sum(isnull([AkasseAdm], 0)) AS [AkasseAdm]
                       FROM          (SELECT     Kontingent.dt, Kontingent.aar, Kontingent.maaned, Kontingent.ktg_id, Kontingent.afd_sektor_id, Kontingent.beskrivelse, 
                                                                      Kontingent.blb, fuldtid.fblb, CASE fuldtid.fblb WHEN 0 THEN 0 ELSE cast(Kontingent.blb AS FLOAT) / cast(fuldtid.fblb AS FLOAT) 
                                                                      END AS fa
                                               FROM          (SELECT     tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, blb AS fblb, beskrivelse, afd_sektor_id
                                                                       FROM          dbo.tKontingentBlb_alle
                                                                       WHERE      ((tKontingentBlb_alle.ktg_id = 1110) AND beskrivelse IN ('Sektor', 'Afdeling', 'Forbund')) OR
                                                                                              ((tKontingentBlb_alle.ktg_id = 2110) AND beskrivelse IN ('AkasseAdm'))
                                                                       GROUP BY tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, tKontingentBlb_alle.blb, tKontingentBlb_alle.afd_sektor_id, 
                                                                                              tKontingentBlb_alle.beskrivelse) AS fuldtid INNER JOIN
                                                                      tKontingentBlb_alle AS Kontingent ON fuldtid.aar = Kontingent.aar AND fuldtid.maaned = Kontingent.maaned AND 
                                                                      fuldtid.afd_sektor_id = Kontingent.afd_sektor_id AND fuldtid.beskrivelse = Kontingent.beskrivelse) AS A PIVOT (max(blb) FOR 
                                              beskrivelse IN ([Forbund], [Afdeling], [Sektor], [AkasseAdm])) AS pvt
                       GROUP BY dt, aar, maaned, ktg_id, afd_sektor_id) AS Kontigent INNER JOIN
                          (SELECT     dt, aar, maaned, ktg_id, afd_sektor_id, sum(isnull([Forbund], 0)) AS [Forbund], sum(isnull([Afdeling], 0)) AS [Afdeling], sum(isnull([Sektor], 
                                                   0)) AS [Sektor], Sum(isnull([AkasseAdm], 0)) AS [AkasseAdm]
                            FROM          (SELECT     Kontingent.dt, Kontingent.aar, Kontingent.maaned, Kontingent.ktg_id, Kontingent.afd_sektor_id, Kontingent.beskrivelse, 
                                                                           Kontingent.blb, fuldtid.fblb, CASE fuldtid.fblb WHEN 0 THEN 0 ELSE cast(Kontingent.blb AS FLOAT) / cast(fuldtid.fblb AS FLOAT) 
                                                                           END AS fa
                                                    FROM          (SELECT     tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, blb AS fblb, beskrivelse, afd_sektor_id
                                                                            FROM          dbo.tKontingentBlb_alle
                                                                            WHERE      ((tKontingentBlb_alle.ktg_id = 1110) AND beskrivelse IN ('Sektor', 'Afdeling', 'Forbund')) OR
                                                                                                   ((tKontingentBlb_alle.ktg_id = 2110) AND beskrivelse IN ('AkasseAdm'))
                                                                            GROUP BY tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, tKontingentBlb_alle.blb, tKontingentBlb_alle.afd_sektor_id, 
                                                                                                   tKontingentBlb_alle.beskrivelse) AS fuldtid INNER JOIN
                                                                           tKontingentBlb_alle AS Kontingent ON fuldtid.aar = Kontingent.aar AND fuldtid.maaned = Kontingent.maaned AND 
                                                                           fuldtid.afd_sektor_id = Kontingent.afd_sektor_id AND fuldtid.beskrivelse = Kontingent.beskrivelse) AS A PIVOT (max(fblb) FOR 
                                                   beskrivelse IN ([Forbund], [Afdeling], [Sektor], [AkasseAdm])) AS pvt
WHERE (DT>=DATEADD(M,-1,GETDATE()) OR DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105))
GROUP BY dt, aar, maaned, ktg_id, afd_sektor_id) AS Fuldtid ON Kontigent.dt = Fuldtid.dt AND Kontigent.ktg_id = Fuldtid.ktg_id AND 
                      Kontigent.afd_sektor_id = Fuldtid.afd_sektor_id;

/*
====================== # Medlem ===============
Danner medlemsoplysninger - Bruger WITH da dette er den tungeste af de to midlertidige tabeller
*/
WITH medlem AS(
SELECT medlem.dt,medlem.prs_id, medlem.KTG_ID, afd.afdeling, medlem.AFD_SEKTOR_ID, Asek.Asektor, Fsek.Fsektor
FROM LR.dbo.medlem medlem
 left join lr.dbo.tAfdeling afd on medlem.krds_id = afd.krds_id
 left join lr.dbo.tAsektor Asek on medlem.afd_sektor_id = Asek.afd_sektor_id
 left join lr.dbo.tFsektor Fsek on medlem.FORB_SEKTOR_ID = Fsek.forb_sektor_id
WHERE (medlem.DT>=DATEADD(M,-1,GETDATE()) OR medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105)) --medlemmer fra de to seneste md samt ult. sidste år
      and medlem.MEDL_Forb = 1  )                       --medlemmer af forbundet)




/*Det hele klistres sammen med en lækker triple PIVOT, den første pivot opbruger egentlig den midlertidige tabel, 
men hvis man undlader at kalde den(SourceTable) ved navn virker det. Grupperingen er nødvendig da PIVOT ikke ved
at de tre "id" nøgler egentlig alle peger samme opdeling af dt.
*/


SELECT afdeling, CASE WHEN Fsektor = 'HK/Privat' and afdeling = 'HK/Hovedstaden' THEN Asektor ELSE afdeling END as Afdeling_Privat,Fsektor,ktg_id,nvn,hovedopdeling, ---row
  isnull(MAX([0]),0) as antal_dm,
  isnull(MAX([1]),0) as antal_sm,
  isnull(MAX([2]),0) as antal_ult,
  isnull(MAX([3]),0) as ktg_dm,
  isnull(MAX([4]),0) as ktg_sm,
  isnull(MAX([5]),0) as ktg_ult,
  Coalesce(MAX([6]),MAX([7]),MAX([8])) as fuldktg_dm,
  Coalesce(MAX([7]),MAX([6]),MAX([8])) as fuldktg_sm,
  Coalesce(MAX([8]),MAX([7]),MAX([6])) as fuldktg_ult,
  ISNULL(MAX([9]),0) as fb_dm,
  ISNULL(MAX([10]),0) as fb_sm,
  ISNULL(MAX([11]),0) as fb_ult,
  Coalesce(MAX([12]),MAX([9]),1) as fb_fuld_dm, --The coalesce avoid deviding with 0
  Coalesce(MAX([13]),MAX([10]),1) as fb_fuld_sm,
  Coalesce(MAX([14]),MAX([11]),1) as fb_fuld_ult,
  CASE WHEN MAX([3])>0 THEN 1 ELSE 0 END AS Betalende
FROM(
SELECT  CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 2 WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 1 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 0 ELSE NULL END as antal_id,
  CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 5  WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 4 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 3 ELSE NULL END as blb_id,
  CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 8  WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 7 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 6 ELSE NULL END as fuld_id,
  CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 11  WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 10 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 9 ELSE NULL END as fb_id,
  CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 14  WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 13 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 12 ELSE NULL END as fb_fuld_id,
  medlem.prs_id,medlem.afdeling,medlem.Asektor,medlem.Fsektor,medlem.KTG_ID,kont_description.nvn,kont_description.hovedopdeling,
  forbund+kont.afdeling+sektor as beløb,
  Fuldtid_forbund+fuldtid_afdeling+fuldtid_sektor as Fuldt_beløb,
  Forbund as fb_beløb,
  Fuldtid_Forbund as fb_fuldt_beløb
 FROM medlem
 Left join #kont_fact as kont on medlem.KTG_ID=kont.ktg_id and medlem.AFD_SEKTOR_ID = kont.afd_sektor_id and medlem.DT = kont.dt
 Left join tKontingent as kont_description on medlem.KTG_ID = kont_description.ktg_id) AS SourceTable
 PIVOT
 (
 COUNT(SourceTable.prs_id)
 FOR antal_id IN([0],[1],[2])
 ) AS PivotTable
 PIVOT
 (
 SUM(beløb)
 FOR blb_id IN([3],[4],[5])
 ) AS PivotTable
 PIVOT
 (
 SUM(Fuldt_beløb)
 FOR fuld_id IN([6],[7],[8])
 ) AS PivotTable
 PIVOT
 (
 SUM(fb_beløb)
 FOR fb_id IN ([9],[10],[11])
 ) AS PivotTable
 PIVOT
 (
 SUM(fb_fuldt_beløb)
 FOR fb_fuld_id IN ([12],[13],[14])
 ) AS PivotTable
 GROUP BY afdeling, CASE WHEN Fsektor = 'HK/Privat' and afdeling = 'HK/Hovedstaden' THEN Asektor ELSE afdeling END,Fsektor,ktg_id,nvn,hovedopdeling

drop table #kont_fact
While this query is far from elegant it is still more compact and readable than what we started with, but it requires an understanding of the PIVOT function, and how to abuse it to pivot multiple colums. This is achieved by constructing multiple ID's and using aggregation. Not pretty, but it works. Best of all it runs in around 14 seconds, mostly because it does not rely on the factKontingentblb view, but just selects the necessary subsection into a tmp. table.

Tuesday, May 6, 2014

Since last time - Ugly fonts on SSRS charts

Since finishing my thesis I have moved away from working with C++ to a temporary job with SQL and the microsoft BI stack while I await a decision on my Ph.D. application. So I will try to write a bit about SSRS and SQL here the next few months. I also try to catch up on some F# in my spare time so hopefully I will also write a bit about that.

Today I had to deal with a old nemesis: SSRS 2008. When drawing graphs the font sometimes becomes terribly ugly. The problem is that SSRS cannot use anti-aliasing when you set the chart background to "No color", simply because it appearently has to render the chart before knowing about the color of the background. If you set a background color the text becomes smooth again. You can then use if statements to make the background conditional, ex. make it different when printing and viewing online using Globals!RenderFormat.

Also yesterday I learned that the VB 2005 expressions you write in SSRS 2008 have two different "OR" statements: Or and OrElse. The last one is short-curcuiting, but the other one isn't. Why would you ever need one that isn't? I really cannot come up with a case for VB 2005 regular Or.


Monday, April 7, 2014

ssh agent

Ok, I am writing a bit again. Currently in the processe of applying for a Ph.D and wasted some time trying to add a key to ssh-agent using ssh-add 'key'. Turns out the agent needs to be runnig first and it can be hard to start through an ssh. But "eval $(ssh-agent)" did the trick in the end.

Friday, January 31, 2014

Testing fast convolution of tensor sketches

Today I have been working on the final unit test of my code and the changes necessary to run  it. I am testing my first implementations last and changing the code a lot in the progress. It's fun  to see how different my C++ looks now. The final beast to slay is my implementation of the fast convulution of tensor sketches and checking that it actually keeps the promise of finding the Count Sketch of an outer product fast. Some things are bugging me:

I convert a CS into a polynomial.
I still use a discrete fourier transform. Somehow my CS(vector) represent the polynomial well enough. The conversion seems to be a math showpiece. I  have been spending a lot of time understanding this today and it's still not really clear.

Thursday, January 30, 2014

Testing and sketching

For the last two days I have been unit testing the implementations for my thesis. Finding lot's of bugs and improving the codebase a lot. One of the core algorithms is Count Sketch. It's a kind of magic that goes like this:

We take a vector x \in R^d and map it to a vector x \in R^D. Normally the smart part is that D is a lot smaller than d (which could even be infinite as in a stream). In my thesis however I exploit two things. CountSketch maintains the inner product and there is a very fast way to find the count sketch of a tensor product by polynomial multiplication using Fast Fourier Transform. Denote the sketch of x as CS(x)

E[<CS(x),CS(y)>]=<x,y>

We also use this property of tensor products:

<x^(p),y^(p)>=<x,y>^p

which is the polynomial kernel. So if we can map to tensor products we have an explicit map of the polynomial kernel. This is not very smart on it's own because the tensor product is huge  (d^p). Fortunately we have a powerful tool (http://arxiv.org/abs/1108.1320) that allows us to find the count sketch of an outer product from count sketches, like this:

CS(x^(p))=FFT^-1(FFT(CS(x))*....p-times...*FFT(CS_2(x))

Notice that this is from different count sketches using different h and s. the final CS will look like it was generated from the sum of the h'es and the product of the s'es.

Friday, January 24, 2014

Permutation matrix - Continuing Linear Algebra for programmers phrase book

Continuing the post from a few days ago about just how different Linear Algebra can look on paper and in C++ I today did a bit of work implementing and testing a permutation matrix.

Permutation matrix

A permutation matrix has a single 1 on each row, the rest is zeros. Like this:


All it does is switch rows around. Try to do a few matrix multiplications with this on the left side and you quickly see how it works. In C++ I implemented it like this for a Permutation Matrix * vector use.

void FastFood::permutate(vector<double>& data) const{
    std::shuffle(data.begin(),data.end(),std::default_random_engine(m_seed));}


Here I have a random permutation, but depending on how I make use of the "m_seed" value I can control when I want a new random order. This is useful because I want my permutation to be random, but I want it to be the same random order every time I use the method from the same object instance.

Thursday, January 23, 2014

Distribution from hell

Today I had a few errands to run but my working time was spend with the distribution from hell (again). Today I gave up and posted to math exchange http://math.stackexchange.com/questions/648577/the-radial-part-of-a-normal-distribution unfortunatly I don't think this will yield an answer.

My thinking so far can be summed to.

What problem is being solved?

The entires of the matrix look like guassian samples, but all rows have the same length. By scaling with $s_i$ this should be fixed. Conclusion: the s behaves like the lenght of a gaussian vector of size d.

It seems intuitive to me that this is related to the unit ball, but I am not fully on page with it still, and especially the weird inversion of A is just killing me.

EDIT: Det der synes at være formålet med S er at få rækkerne i V til at have længder der varierer på samme måde som hvis elementerne i V´ havde været samplet i.i.d fra N(0,1). Jeg tænker at det samme må kunne opnås ved at trække d samples fra en N(0,1) og gemme R^2 normen af disse som indgang i S. Det kræver selvfølgelig D x d samples, men det er en engangsforestilling.  << from an email i wrote my supervisor the next day. This is how i ended up doing it: sample d times from a N(0,1) normal distribution and use the length of resulting vectors as entries in S.

Tuesday, January 21, 2014

Tex adventures

Today I had a fun time trying to add the dirtree style to my tex install. In the end it was rather simple with the few things that you have to guess on an ubuntu system. The procedure was:

1. Download Dirtree.
2. Run "latex dirtree.ins"
3. Copy the dirtree.sty and dirtree.tex somewhere near my other .sty files. "locate *.sty"
4. Run "texhash" to update the tex install and make it aware of the new style.

I was all listed here: http://www.ctan.org/tex-archive/macros/generic/dirtree. Except the last step for some reason. Today I also made more tests for my thesis codebase. I wrote a section on testing methods where you only have expected properties of the outcome, and had fun trying to calculate some variances.

Monday, January 20, 2014

Xbox Music API

I had a few hours to spend on F# and I wanted to do an application using the Xbox Music API. The documentation is here http://msdn.microsoft.com/en-us/library/dn546696.aspx. I think the structure of that page is messy since it invites you to start at "Getting Started", but really the best place to start is "Using the xbox restful services". So, what is REST? I found a nice description here http://martinfowler.com/articles/richardsonMaturityModel.html that basically sums to rest being proper use of resources, HTTP Verbs and "hypermedia controls" which is short for navigation hints. Hopefully I will find some more time soon to actually build something useful with this API and F#.

In the end I got it working. Microsoft uses a service to provide a token and you must then use the token to query xbox music. I spend some time failing because some of my secret keys and the token needed to be URL encoded before being passed in the request. In F# it is easy to URL encode using a .NET module, I think it was in something like System.Web.HttpUtilies.URLencode.

Friday, January 17, 2014

Linear algebra for programmers

Inspired by Eric Meijer I thought I would list some of the things that have been useful to me recently in transforming a Machine Learning article to actual running C++. So the idea is a kind of phrase book for programmers venturing into the strange lingo of Linear Algebra. Most examples are taken from this paper since I recently implemented it.

In the examples I will assume no familiarity with Linear Algebra so we will take some shortcuts, but we will get the job done. The idea is to transfer these strange terms into something familiar to any developer. I am pretty sure that you would fail horribly trying to do any math with this understanding of Linear Algebra, but it should work fine for coding.

Scalar

int, float, double, or objects if you please. Basic arithmetic should be implemented (+ - / *).

Vector

An ordered list of scalars.  scalar[]

Matrix

An ordered list of vectors [scalar[]]. Can also be multidimensional. [[[scalar[]]]. All the innermost lists have the same length. Normally we only need the list of lists. The length of the lists is called the dimensionallity of the matrix. An n x m Matrix is a list of size n containing lists of size m.

Dot product / Inner product

The most normal way to multiply two vectors. Takes the sum of the product of the elements. You can only take the dot product of two vectors of the same length:

int dotProduct(int[] a, int[] b){
    if (a.size()!=b.size) throw exception;
    int result =0;
    for(int i=0; i<a.size();++i){
        result += a[i]*b[i];
    }
    return result;
}

Notice that multiplying two vectors returns a scalar.


Diagonal matrix

Mathematicians spend a lot of time swapping elements or building strange structures like the diagonal matrix because Linear Algebra relies heavily on the rules for multiplying different elements. Say I want to multiply two vectors like this:

int[] DiagonalProduct(int[] a, int[] b){
    if (a.size()!=b.size) throw exception;    
    int[] result.reserve(a.size());.
    for(int i=0; i,a<size();++i){
        result[i] = a[i]*b[i];
    }
    return result;
}

You could call it element-wise multiplication. There is no rule like that in Linear Algebra between two vectors. So what is done is put the a vector on the diagonal of a matrix A and fill the rest of the spaces with 0's. A is then called a diagonal matrix. Then the above corresponds to multiplying A*b with the rules for Matrix vector multiplication.

Permutation matrix

The same is the case for the permutation matrix. What it does is swap the position of the list in a matrix. Consider a permutation matrix P=[[0,1],[1,0]]. What it does is swap the first and second row of a matrix. Example P*[[2,4],[6,5]] = [[6,5],[2,4]]. I C++ you can use the swap method to easily do the same job.

I hope to write more entries to this some day, but for now my 30 minutes are up.








Thursday, January 16, 2014

Lazarus, dig!

Tonight I am resurrecting this blog after a long break. Long story short i quit my job at Rehfeld to write my thesis. I have been writing since september and now things are shaping up. Back when I started my thesis I thought that I did not want to write about my progress on here because I would be documenting my work anyway.

The last three days I have been at the Warm Crocodile Developer Conference and I learned a lot of stuff. Mostly I wanted to get into F# and I am really excited to start working with a functional language that is easy to put to actual production use. However I did not want to write about F# today. I saw three talks by Eric Meijer and they where all really great. He talked a lot about duality, as a helpful concept to understand a bunch of things. Some of his examples:

De morgans laws are dual
!(A || B) => !A && !B
!(A & B) => !A || ! B

The idea of duality was often described as switching the arrows. In the above we go from a law regarding OR to AND to the second law which describes AND to OR.

Another example, Bayes rules:

P(B)*P(A|B) = P(A UNION B) = P(A)*P(B|A)

Again "switching the arrow" we get a duality describing on the one hand a prb. of A given B and on the other hand a prob. of B given A.

Finally the described how Machine Learning is the dual of Querying:

f(x) => [x,y] query by providing a function and getting data
[x,y]=> f(x) machine learning provides a function by learning from data

I think that trying to "complete" a system by finding the dual of a statement or idea is a powerful tool, and I hope I get to hear Eric give a talk again soon. He was really provoking, especially he threw a lot of dirt at Tedd Codd, but it made me wake up and want to argument so I think it served its purpose.