I try to spend 30 minutes writing down what is in my head at the end of each working day. I also sometimes write longer entries. This blog is mostly a way for me to reflect and look back at what I have been doing, I hope some of it might also be useful to others.
Monday, November 3, 2014
Updating DLL refs when calling F# code from a C# project
This might be useful some other day. I had a situation where everything worked fine in the F# project but would crash with a FileNotFound FSharp.core exception when calling from the C# project.
The problem was the I had used a build of FParsec that relied on an old version of FSharp.core. This was taken care of in the F# app.config file, but not in the app.config file of the C# project. So I just made sure the to add the same snippet to the C# project and everything works again. Snippet:
Monday, August 4, 2014
C# F# Excel parsing
Working on some new additions to MKC-shift planner I decided I would write an Excel Parser in F# to work with the main C# project. Most methods of the parser are in "Parse.fs"
"COM object that has been separated from its underlying RCW cannot be used."
Omitting the line makes my program run, but it leaves those Excel processes running somewhere in the background. Any suggestions on avoiding this ?
SOLVED:
Mathias Brandewinder(@brandewinder) was kind enough to point me towards this SO post that includes lots of useful information. By following the "never use 2 dots on COM objects" rule the snippet below shows the changes made to ExcelParser, enough to kill of the Excel squatters.
module FSpart.Parse open FSpart.Enums open FSpart.Types open Converters open Microsoft.Office.Interop.Excel let cellToString (range :Microsoft.Office.Interop.Excel.Range) = range.Value2.ToString() let xlApp = new ApplicationClass(Visible = true) let workbook path (xlApp:Microsoft.Office.Interop.Excel.ApplicationClass) = xlApp.Workbooks.Open(path) let sheetFromPath path sheetName (xlApp:Microsoft.Office.Interop.Excel.ApplicationClass) = let xlWorkbook = xlApp.Workbooks.Open(path) let xlWorksheet = xlWorkbook.Worksheets.[sheetName] :?> Microsoft.Office.Interop.Excel.Worksheet xlWorksheet let sheetFromWorkbook (workbook:Microsoft.Office.Interop.Excel.Workbook) sheetname = workbook.Worksheets.[sheetname] :?> Microsoft.Office.Interop.Excel.Worksheet ... More methods are here let ExcelExit = xlApp.Quit() System.Runtime.InteropServices.Marshal.FinalReleaseComObject xlAppTo make this accesible to the main c# project I added "ExcelParser.fs"
module FSpart.Excelparser type ExcelParser = static member Parse(path)= let Excel = Parse.xlApp let workbook = Parse.workbook path Excel let sheet = Parse.sheetFromWorkbook workbook 1 let LongShiftPlan = Parse.longShiftAssignments sheet Parse.ExcelExit |> ignore LongShiftPlanA reference to the F# project was added to the c# project and the parser can be called via.
FSpart.Excelparser.ExcelParser.Parse(path);This Works, but there is a caveat. The last line in Parse.fs:
System.Runtime.InteropServices.Marshal.FinalReleaseComObject xlAppcauses an error:
"COM object that has been separated from its underlying RCW cannot be used."
Omitting the line makes my program run, but it leaves those Excel processes running somewhere in the background. Any suggestions on avoiding this ?
SOLVED:
Mathias Brandewinder(@brandewinder) was kind enough to point me towards this SO post that includes lots of useful information. By following the "never use 2 dots on COM objects" rule the snippet below shows the changes made to ExcelParser, enough to kill of the Excel squatters.
module FSpart.Excelparser type ExcelParser = static member Parse(path)= let Excel = Parse.xlApp let workbook = Parse.workbook path Excel let sheet = Parse.sheetFromWorkbook workbook 1 let LongShiftPlan = Parse.longShiftAssignments sheet //Cleanup Parse.ExcelExit |> ignore System.GC.Collect() System.GC.WaitForPendingFinalizers() System.Runtime.InteropServices.Marshal.FinalReleaseComObject sheet |>ignore System.Runtime.InteropServices.Marshal.FinalReleaseComObject workbook |> ignore System.Runtime.InteropServices.Marshal.FinalReleaseComObject Excel |> ignore LongShiftPlaThanks Mathias.
Thursday, July 24, 2014
F# fibonacci numbers
It's very warm. Here is an F# fibonacci sequence. I am not sure it's really good functional fu to use mutable values.
let sumUp = seq {let fib = ref 0 let fib1 = ref 1 while true do yield !fib + !fib1 let oldfib = !fib fib:=!fib1 fib1:=oldfib + !fib1} (Seq.take 20 sumUp) |> Seq.iter (fun x -> printfn "%d" x)
Monday, May 19, 2014
SqlDataConnection and wasted time
Today I was trying to connect to an Azure SQL-database using F# type providers. The guide given here is great, but does not mention how to get the connection string to a database from Azure. There are some strings given inside the Azure portal, but none of them seem to match the format suggested in the guide. I spend a long time trying to construct a connection string and it just doesn't work. The connection works fine if I only specify the server, but when I also specify the database I get errors like:
Warning : SQM1012: Unable to extract table 'dbo.Board' from SqlServer. Invalid object name 'syscomments'
The table "dbo.Board" exists in the database, so somehow the connection is pointing to the right place. My code looks like this:
Any help would be greatly appriciated.
Warning : SQM1012: Unable to extract table 'dbo.Board' from SqlServer. Invalid object name 'syscomments'
The table "dbo.Board" exists in the database, so somehow the connection is pointing to the right place. My code looks like this:
open System open System.Data open System.Data.Linq open Microsoft.FSharp.Data.TypeProviders open Microsoft.FSharp.Linq type dbSchema = SqlDataConnection<"Server=tcp:k9qd5a64zq.database.windows.net,1433;Database=timetobeat562;User ID=johanvts@k9qd5a64zq;Password=MYPASSWORD;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;"> let db = dbSchema.GetDataContext() db.DataContext.Log <- System.Console.Out printfn "%s" db.DataContext.Connection.DataSource [] let main argv = printfn "%A" argv System.Console.ReadLine() |> ignore 0 // return an integer exit code
Any help would be greatly appriciated.
Wednesday, May 7, 2014
Rewriting another Query
I spend a lot of my time rewriting queries, so today I though I would document my process. We start with this query:
Select A.* ,1 as Vægtet ,'Betalende' as rtype From ( SELECT AA.DT, CASE WHEN AA.afdeling='HK/Hovedstaden' AND Fsektor = 'HK/Privat' THEN aa.Asektor else aa.afdeling END as afdeling, AA.KTG_ID, aa.nvn, aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr, (SUM(aa.blb_1mdr) * (aa.antal_medlemmer_1mdr)) - (SUM(aa.blb_ult) * (aa.antal_medlemmer_ult))as kr FROM ( SELECT convert(nvarchar(10),max(a.dt),105) as DT, a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END as hovedopdeling, a.ERHVERVSAKTIVE, a.Betalende, CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END as afd_sortering, SUM(a.antal_medlemmer_1mdr) as antal_medlemmer_1mdr, SUM(a.antal_medlemmer_2mdr) as antal_medlemmer_2mdr, SUM(a.antal_medlemmer_ult) as antal_medlemmer_ult, SUM(a.blb_1mdr) as blb_1mdr, SUM(a.blb_2mdr) as blb_2mdr, SUM(a.blb_ult) as blb_ult FROM ( SELECT a.DT, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, b.blb as blb_1mdr, 0 as blb_2mdr, 0 as blb_ult, COUNT(*) as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult from lr.dbo.Medlem a left join (select flag, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=1 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by ktg_id, flag,afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id --and a.FLAG=b.flag left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id where DT=(select max(dt) from lr..medlem) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1 group by a.DT, a.KTG_ID, a.AAR, a.MAANED, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb UNION ALL SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, b.blb as blb_2mdr, 0 as blb_ult, 0 as antal_medlemmer_1mdr, COUNT(*) as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult from lr.dbo.Medlem a left join (select dt, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=2 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id ) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id where a.dt=(select dt from lr.dbo.tKalender_statistik where flag=1) and a.MEDL_FORB=1 -- and a.ERHVERVSAKTIVE=1 group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE,a.AAR, a.MAANED, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb UNION ALL SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, 0 as blb_2mdr, b.blb as blb_ult, 0 as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, COUNT(*) as antal_medlemmer_ult from lr.dbo.Medlem a left join (select dt, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where dt = (SELECT Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id where a.dt=(SELECT Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and a.MEDL_FORB=1 -- and a.ERHVERVSAKTIVE=1 group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling,a.ERHVERVSAKTIVE ,a.AAR, a.MAANED, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb ) A group by a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END, a.ERHVERVSAKTIVE, a.Betalende, CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END ) AA where aa.asektor <> '?' and aa.afdeling not in ('HK/Trafik og Jernbane', 'HK/Post og kommunikation') GROUP BY AA.DT, AA.afdeling, AA.KTG_ID, aa.nvn,aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr ) A --Cross join (SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb]) B --left join (SELECT [dt], [ktg_id], [afd_sektor_id], [Forbund],[Afdeling],[Sektor],[Fuldtid_Forbund],[Fuldtid_Afdeling],[Fuldtid_Sektor] FROM [LR].[dbo].[factKontigent_Blb] Where dt =(SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb])) --C on A.KTG_ID = C.ktg_id and A.AFD_SEKTOR_ID = C.[afd_sektor_id] union all Select A.DT ,CASE WHEN A.afdeling='HK/Hovedstaden' THEN a.Asektor else a.afdeling END as afdeling,A.KTG_ID,A.nvn ,A.AFD_SEKTOR_ID,A.Asektor,A.Fsektor,A.hovedopdeling,A.ERHVERVSAKTIVE,A.Betalende,A.afd_sortering ,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end * A.antal_medlemmer_1mdr as antal_medlemmer_1mdr ,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end * A.antal_medlemmer_2mdr as antal_medlemmer_2mdr ,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end * A.antal_medlemmer_ult as antal_medlemmer_ult ,A.blb_1mdr ,A.kr ,Case isnull(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor],0) when 0 then 0 else (C.[Forbund]+C.[Afdeling]+C.[Sektor])/(C.[Fuldtid_Forbund]+C.[Fuldtid_Afdeling]+C.[Fuldtid_Sektor]) end as Vægtetmedlem ,'Fuldtidsvægtede' as rtype From ( SELECT AA.DT, AA.afdeling, AA.KTG_ID, aa.nvn, aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr, (SUM(aa.blb_1mdr) * (aa.antal_medlemmer_1mdr)) - (SUM(aa.blb_ult) * (aa.antal_medlemmer_ult))as kr FROM ( SELECT convert(nvarchar(10),max(a.dt),105) as DT, a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END as hovedopdeling, a.ERHVERVSAKTIVE, a.Betalende, CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END as afd_sortering, SUM(a.antal_medlemmer_1mdr) as antal_medlemmer_1mdr, SUM(a.antal_medlemmer_2mdr) as antal_medlemmer_2mdr, SUM(a.antal_medlemmer_ult) as antal_medlemmer_ult, SUM(a.blb_1mdr) as blb_1mdr, SUM(a.blb_2mdr) as blb_2mdr, SUM(a.blb_ult) as blb_ult FROM ( SELECT a.DT, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, b.blb as blb_1mdr, 0 as blb_2mdr, 0 as blb_ult, COUNT(*) as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult from lr.dbo.Medlem a left join (select flag, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=1 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by ktg_id, flag,afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id --and a.FLAG=b.flag left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id where DT=(select max(dt) from lr..medlem) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1 group by a.DT, a.KTG_ID, a.AAR, a.MAANED, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb UNION ALL SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, b.blb as blb_2mdr, 0 as blb_ult, 0 as antal_medlemmer_1mdr, COUNT(*) as antal_medlemmer_2mdr, 0 as antal_medlemmer_ult from lr.dbo.Medlem a left join (select dt, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where flag=2 and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id ) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id where a.dt=(select dt from lr.dbo.tKalender_statistik where flag=1) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1 group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE,a.AAR, a.MAANED, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb UNION ALL SELECT a.dt, a.AAR, a.MAANED, a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, case when b.blb > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende, 0 as blb_1mdr, 0 as blb_2mdr, b.blb as blb_ult, 0 as antal_medlemmer_1mdr, 0 as antal_medlemmer_2mdr, COUNT(*) as antal_medlemmer_ult from lr.dbo.Medlem a left join (select dt, ktg_id, SUM(blb) as blb, afd_sektor_id from [LR].[dbo].[tKontingentBlb] where dt = (SELECT Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and beskrivelse in ('Forbund', 'Afdeling','Sektor') group by dt, ktg_id, afd_sektor_id) b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id where a.dt=(SELECT Dateadd(dd,-1,DATEADD(MILLISECOND, -1, DATEADD(YEAR,DATEDIFF(YEAR, 0, DATEADD(YEAR, -1, GETDATE())) + 1, 0)))) and a.MEDL_FORB=1 --and a.ERHVERVSAKTIVE=1 group by a.dt, a.KTG_ID, c.nvn, c.hovedopdeling,a.ERHVERVSAKTIVE ,a.AAR, a.MAANED, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, b.blb ) A group by a.afdeling, a.KTG_ID, a.nvn, a.AFD_SEKTOR_ID, a.Asektor, a.Fsektor, CASE WHEN a.KTG_ID IN (1910,1914) THEN 'Fritidsjob u. 25 år' ELSE a.hovedopdeling END, a.ERHVERVSAKTIVE, a.Betalende, CASE WHEN a.afdeling = 'HK/Hovedstaden' THEN 1 WHEN a.afdeling= 'HK/Midt' THEN 2 WHEN a.afdeling = 'HK/Sjælland' THEN 3 WHEN a.afdeling = 'HK/Sydjylland' THEN 4 WHEN a.afdeling = 'HK/Nordjylland' THEN 5 WHEN a.afdeling = 'HK/Midtvest' THEN 6 WHEN a.afdeling = 'HK/Østjylland' THEN 7 WHEN a.afdeling = 'HK/Post og Kommunikation' THEN 8 WHEN a.afdeling = 'HK/Trafik og Jernbane' THEN 9 ELSE 99 END ) AA where aa.asektor <> '?' and aa.afdeling not in ('HK/Trafik og Jernbane', 'HK/Post og kommunikation') GROUP BY AA.DT, AA.afdeling, AA.KTG_ID, aa.nvn,aa.AFD_SEKTOR_ID, aa.Asektor, aa.Fsektor, aa.hovedopdeling, aa.ERHVERVSAKTIVE, aa.Betalende, aa.afd_sortering, aa.antal_medlemmer_1mdr, aa.antal_medlemmer_2mdr, aa.antal_medlemmer_ult, aa.blb_1mdr ) A --Cross join (SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb]) B left join (SELECT [dt], [ktg_id], [afd_sektor_id], [Forbund],[Afdeling],[Sektor],[Fuldtid_Forbund],[Fuldtid_Afdeling],[Fuldtid_Sektor] FROM [LR].[dbo].[factKontigent_Blb] Where dt =(SELECT Max([dt]) as dt FROM [LR].[dbo].[factKontigent_Blb])) C on A.KTG_ID = C.ktg_id and A.AFD_SEKTOR_ID = C.[afd_sektor_id] Where antal_medlemmer_1mdr>0 or antal_medlemmer_2mdr > 0 or antal_medlemmer_ult > 0This query is central to a few dashboards but I needed to make some changes and I would also like it to load faster, currently it takes 39 seconds. I skimmed the entire query, but to make changes I just start from the top. Open a new query windows and make a copy so I can compare my changes query results to the current ones.
Starting at the top the first 4 lines simply add two columns with fixed values "1, 'Betalende'", so I remove that outer query and add it to the first inner query starting at line 6. I also removed the mathching ending on line 75.
Next I looked at the many unioned queries starting at line 26, they seem very similar. Looking closer they select the same thing, except the last six values where they are used to sum one thing at a time, and then the outer queries collapses the rows using sum. It seems this would be better handled by a single select statement with conditional sums. I started writing my conditional sums, one thing that helped a lot is to always look at the "raw" data, that's just select * without the aggregates and looking at the rows. In the end i had constructed this:
SELECT max(a.dt) , a.KTG_ID, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor, c.nvn, c.hovedopdeling , a.ERHVERVSAKTIVE, case when sum(b.blb) > 0 THEN 'Betalende' ELSE 'Ikke betalende' END as Betalende , sum( CASE WHEN flag = 0 THEN b.blb ELSE 0 END) as blb_1mdr , sum( CASE WHEN flag = 1 THEN b.blb ELSE 0 END) as blb_2mdr , sum( CASE WHEN a.Dt = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN b.blb ELSE 0 END) as blb_ult , COUNT(DISTINCT CASE WHEN flag = 0 THEN PRS_ID ELSE NULL END) as antal_medlemmer_1mdr , COUNT( DISTINCT CASE WHEN flag = 1 THEN PRS_ID ELSE NULL END) as antal_medlemmer_2mdr , COUNT( DISTINCT CASE WHEN a.Dt = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN PRS_ID ELSE NULL END) as antal_medlemmer_ult , CASE WHEN COUNT(distinct case when flag = 0 then Beskrivelse else null end) = 0 THEN 0 ELSE SUM( CASE WHEN FLAG = 0 THEN fuld_vægt ELSE 0 END)/COUNT(distinct case when flag = 0 then Beskrivelse else null end)END , CASE WHEN COUNT(distinct case when flag = 1 then Beskrivelse else null end) = 0 THEN 0 ELSE SUM( CASE WHEN FLAG = 1 THEN fuld_vægt ELSE 0 END)/COUNT(distinct case when flag = 1 then Beskrivelse else null end) END , CASE WHEN COUNT(distinct case when a.DT = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) then Beskrivelse else null end) = 0 THEN 0 ELSE SUM( CASE WHEN a.DT = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN fuld_vægt ELSE 0 END)/COUNT(distinct case when a.DT = convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) then Beskrivelse else null end) END from lr.dbo.Medlem a left join [LR].[dbo].[tKontingentBlb] b on a.KTG_ID=b.ktg_id and a.afd_sektor_id = b.afd_sektor_id and a.DT = b.dt left join lr.dbo.tkontingent c on a.KTG_ID=c.ktg_id left join lr.dbo.tAfdeling d on a.KRDS_ID=d.krds_id left join lr.dbo.tAsektor e on a.AFD_SEKTOR_ID=e.afd_sektor_id left join lr.dbo.tFsektor f on a.FORB_SEKTOR_ID=F.forb_sektor_id OUTER APPLY (SELECT top 1 CASE WHEN sum([Fuldtid_Forbund]+[Fuldtid_Afdeling]+[Fuldtid_Sektor]) = 0 THEN 0 ELSE sum(CAST([Forbund]+[Afdeling]+[Sektor] AS FLOAT)/([Fuldtid_Forbund]+[Fuldtid_Afdeling]+[Fuldtid_Sektor])) END as fuld_vægt FROM [LR].[dbo].[factKontigent_Blb] where ktg_id= A.KTG_ID and [afd_sektor_id]= A.AFD_SEKTOR_ID and dt = a.DT ) ktg_fact where (a.DT>=DATEADD(M,-1,GETDATE()) OR a.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105)) and a.MEDL_FORB=1 and beskrivelse in ('Forbund', 'Afdeling','Sektor') and a.dt = b.dt group by a.KTG_ID,c.nvn, c.hovedopdeling, a.ERHVERVSAKTIVE, d.afdeling, a.AFD_SEKTOR_ID, e.Asektor, f.Fsektor,fuld_vægt order by ktg_id,afd_sektor_id,afdeling
While this is much more compact, it's still not very readable and it turns out the wrong results. The summing simply is too complicated and too many strange constrains across the rows apply. Time to start over. In my second approach i first select the data i am looking for a into a temporary table and a CTE and the abuse the PIVOT command to get the summations as needed. Notice that the aggragation in my last select are necessary for the use of more than one PIVOT statement:
/* Datagrundlag til medlemsdelen af det fælles dashboard Author : Johan Sivertsen ================================== Kontingentoplysninger: Samme som viewet factkontingentblb men her valgt kun for de tre relevante måneder */ SELECT Kontigent.*, Fuldtid.Forbund AS [Fuldtid_Forbund], Fuldtid.Afdeling AS [Fuldtid_Afdeling], Fuldtid.Sektor AS [Fuldtid_Sektor], Fuldtid.[AkasseAdm] AS [Fuldtid_AkasseAdm] into #kont_fact FROM (SELECT dt, aar, maaned, ktg_id, afd_sektor_id, sum(isnull([Forbund], 0)) AS [Forbund], sum(isnull([Afdeling], 0)) AS [Afdeling], sum(isnull([Sektor], 0)) AS [Sektor], Sum(isnull([AkasseAdm], 0)) AS [AkasseAdm] FROM (SELECT Kontingent.dt, Kontingent.aar, Kontingent.maaned, Kontingent.ktg_id, Kontingent.afd_sektor_id, Kontingent.beskrivelse, Kontingent.blb, fuldtid.fblb, CASE fuldtid.fblb WHEN 0 THEN 0 ELSE cast(Kontingent.blb AS FLOAT) / cast(fuldtid.fblb AS FLOAT) END AS fa FROM (SELECT tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, blb AS fblb, beskrivelse, afd_sektor_id FROM dbo.tKontingentBlb_alle WHERE ((tKontingentBlb_alle.ktg_id = 1110) AND beskrivelse IN ('Sektor', 'Afdeling', 'Forbund')) OR ((tKontingentBlb_alle.ktg_id = 2110) AND beskrivelse IN ('AkasseAdm')) GROUP BY tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, tKontingentBlb_alle.blb, tKontingentBlb_alle.afd_sektor_id, tKontingentBlb_alle.beskrivelse) AS fuldtid INNER JOIN tKontingentBlb_alle AS Kontingent ON fuldtid.aar = Kontingent.aar AND fuldtid.maaned = Kontingent.maaned AND fuldtid.afd_sektor_id = Kontingent.afd_sektor_id AND fuldtid.beskrivelse = Kontingent.beskrivelse) AS A PIVOT (max(blb) FOR beskrivelse IN ([Forbund], [Afdeling], [Sektor], [AkasseAdm])) AS pvt GROUP BY dt, aar, maaned, ktg_id, afd_sektor_id) AS Kontigent INNER JOIN (SELECT dt, aar, maaned, ktg_id, afd_sektor_id, sum(isnull([Forbund], 0)) AS [Forbund], sum(isnull([Afdeling], 0)) AS [Afdeling], sum(isnull([Sektor], 0)) AS [Sektor], Sum(isnull([AkasseAdm], 0)) AS [AkasseAdm] FROM (SELECT Kontingent.dt, Kontingent.aar, Kontingent.maaned, Kontingent.ktg_id, Kontingent.afd_sektor_id, Kontingent.beskrivelse, Kontingent.blb, fuldtid.fblb, CASE fuldtid.fblb WHEN 0 THEN 0 ELSE cast(Kontingent.blb AS FLOAT) / cast(fuldtid.fblb AS FLOAT) END AS fa FROM (SELECT tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, blb AS fblb, beskrivelse, afd_sektor_id FROM dbo.tKontingentBlb_alle WHERE ((tKontingentBlb_alle.ktg_id = 1110) AND beskrivelse IN ('Sektor', 'Afdeling', 'Forbund')) OR ((tKontingentBlb_alle.ktg_id = 2110) AND beskrivelse IN ('AkasseAdm')) GROUP BY tKontingentBlb_alle.aar, tKontingentBlb_alle.maaned, tKontingentBlb_alle.blb, tKontingentBlb_alle.afd_sektor_id, tKontingentBlb_alle.beskrivelse) AS fuldtid INNER JOIN tKontingentBlb_alle AS Kontingent ON fuldtid.aar = Kontingent.aar AND fuldtid.maaned = Kontingent.maaned AND fuldtid.afd_sektor_id = Kontingent.afd_sektor_id AND fuldtid.beskrivelse = Kontingent.beskrivelse) AS A PIVOT (max(fblb) FOR beskrivelse IN ([Forbund], [Afdeling], [Sektor], [AkasseAdm])) AS pvt WHERE (DT>=DATEADD(M,-1,GETDATE()) OR DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105)) GROUP BY dt, aar, maaned, ktg_id, afd_sektor_id) AS Fuldtid ON Kontigent.dt = Fuldtid.dt AND Kontigent.ktg_id = Fuldtid.ktg_id AND Kontigent.afd_sektor_id = Fuldtid.afd_sektor_id; /* ====================== # Medlem =============== Danner medlemsoplysninger - Bruger WITH da dette er den tungeste af de to midlertidige tabeller */ WITH medlem AS( SELECT medlem.dt,medlem.prs_id, medlem.KTG_ID, afd.afdeling, medlem.AFD_SEKTOR_ID, Asek.Asektor, Fsek.Fsektor FROM LR.dbo.medlem medlem left join lr.dbo.tAfdeling afd on medlem.krds_id = afd.krds_id left join lr.dbo.tAsektor Asek on medlem.afd_sektor_id = Asek.afd_sektor_id left join lr.dbo.tFsektor Fsek on medlem.FORB_SEKTOR_ID = Fsek.forb_sektor_id WHERE (medlem.DT>=DATEADD(M,-1,GETDATE()) OR medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105)) --medlemmer fra de to seneste md samt ult. sidste år and medlem.MEDL_Forb = 1 ) --medlemmer af forbundet) /*Det hele klistres sammen med en lækker triple PIVOT, den første pivot opbruger egentlig den midlertidige tabel, men hvis man undlader at kalde den(SourceTable) ved navn virker det. Grupperingen er nødvendig da PIVOT ikke ved at de tre "id" nøgler egentlig alle peger samme opdeling af dt. */ SELECT afdeling, CASE WHEN Fsektor = 'HK/Privat' and afdeling = 'HK/Hovedstaden' THEN Asektor ELSE afdeling END as Afdeling_Privat,Fsektor,ktg_id,nvn,hovedopdeling, ---row isnull(MAX([0]),0) as antal_dm, isnull(MAX([1]),0) as antal_sm, isnull(MAX([2]),0) as antal_ult, isnull(MAX([3]),0) as ktg_dm, isnull(MAX([4]),0) as ktg_sm, isnull(MAX([5]),0) as ktg_ult, Coalesce(MAX([6]),MAX([7]),MAX([8])) as fuldktg_dm, Coalesce(MAX([7]),MAX([6]),MAX([8])) as fuldktg_sm, Coalesce(MAX([8]),MAX([7]),MAX([6])) as fuldktg_ult, ISNULL(MAX([9]),0) as fb_dm, ISNULL(MAX([10]),0) as fb_sm, ISNULL(MAX([11]),0) as fb_ult, Coalesce(MAX([12]),MAX([9]),1) as fb_fuld_dm, --The coalesce avoid deviding with 0 Coalesce(MAX([13]),MAX([10]),1) as fb_fuld_sm, Coalesce(MAX([14]),MAX([11]),1) as fb_fuld_ult, CASE WHEN MAX([3])>0 THEN 1 ELSE 0 END AS Betalende FROM( SELECT CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 2 WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 1 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 0 ELSE NULL END as antal_id, CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 5 WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 4 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 3 ELSE NULL END as blb_id, CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 8 WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 7 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 6 ELSE NULL END as fuld_id, CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 11 WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 10 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 9 ELSE NULL END as fb_id, CASE WHEN medlem.DT =convert(datetime,'31-12-'+CAST(YEAR(GETDATE())-1 AS varchar),105) THEN 14 WHEN MONTH(medlem.DT) = MONTH(GETDATE())-1 THEN 13 WHEN MONTH(medlem.dt)=MONTH(GETDATE()) THEN 12 ELSE NULL END as fb_fuld_id, medlem.prs_id,medlem.afdeling,medlem.Asektor,medlem.Fsektor,medlem.KTG_ID,kont_description.nvn,kont_description.hovedopdeling, forbund+kont.afdeling+sektor as beløb, Fuldtid_forbund+fuldtid_afdeling+fuldtid_sektor as Fuldt_beløb, Forbund as fb_beløb, Fuldtid_Forbund as fb_fuldt_beløb FROM medlem Left join #kont_fact as kont on medlem.KTG_ID=kont.ktg_id and medlem.AFD_SEKTOR_ID = kont.afd_sektor_id and medlem.DT = kont.dt Left join tKontingent as kont_description on medlem.KTG_ID = kont_description.ktg_id) AS SourceTable PIVOT ( COUNT(SourceTable.prs_id) FOR antal_id IN([0],[1],[2]) ) AS PivotTable PIVOT ( SUM(beløb) FOR blb_id IN([3],[4],[5]) ) AS PivotTable PIVOT ( SUM(Fuldt_beløb) FOR fuld_id IN([6],[7],[8]) ) AS PivotTable PIVOT ( SUM(fb_beløb) FOR fb_id IN ([9],[10],[11]) ) AS PivotTable PIVOT ( SUM(fb_fuldt_beløb) FOR fb_fuld_id IN ([12],[13],[14]) ) AS PivotTable GROUP BY afdeling, CASE WHEN Fsektor = 'HK/Privat' and afdeling = 'HK/Hovedstaden' THEN Asektor ELSE afdeling END,Fsektor,ktg_id,nvn,hovedopdeling drop table #kont_factWhile this query is far from elegant it is still more compact and readable than what we started with, but it requires an understanding of the PIVOT function, and how to abuse it to pivot multiple colums. This is achieved by constructing multiple ID's and using aggregation. Not pretty, but it works. Best of all it runs in around 14 seconds, mostly because it does not rely on the factKontingentblb view, but just selects the necessary subsection into a tmp. table.
Tuesday, May 6, 2014
Since last time - Ugly fonts on SSRS charts
Since finishing my thesis I have moved away from working with C++ to a temporary job with SQL and the microsoft BI stack while I await a decision on my Ph.D. application. So I will try to write a bit about SSRS and SQL here the next few months. I also try to catch up on some F# in my spare time so hopefully I will also write a bit about that.
Today I had to deal with a old nemesis: SSRS 2008. When drawing graphs the font sometimes becomes terribly ugly. The problem is that SSRS cannot use anti-aliasing when you set the chart background to "No color", simply because it appearently has to render the chart before knowing about the color of the background. If you set a background color the text becomes smooth again. You can then use if statements to make the background conditional, ex. make it different when printing and viewing online using Globals!RenderFormat.
Also yesterday I learned that the VB 2005 expressions you write in SSRS 2008 have two different "OR" statements: Or and OrElse. The last one is short-curcuiting, but the other one isn't. Why would you ever need one that isn't? I really cannot come up with a case for VB 2005 regular Or.
Today I had to deal with a old nemesis: SSRS 2008. When drawing graphs the font sometimes becomes terribly ugly. The problem is that SSRS cannot use anti-aliasing when you set the chart background to "No color", simply because it appearently has to render the chart before knowing about the color of the background. If you set a background color the text becomes smooth again. You can then use if statements to make the background conditional, ex. make it different when printing and viewing online using Globals!RenderFormat.
Also yesterday I learned that the VB 2005 expressions you write in SSRS 2008 have two different "OR" statements: Or and OrElse. The last one is short-curcuiting, but the other one isn't. Why would you ever need one that isn't? I really cannot come up with a case for VB 2005 regular Or.
Monday, April 7, 2014
ssh agent
Ok, I am writing a bit again. Currently in the processe of applying for a Ph.D and wasted some time trying to add a key to ssh-agent using ssh-add 'key'. Turns out the agent needs to be runnig first and it can be hard to start through an ssh. But "eval $(ssh-agent)" did the trick in the end.
Friday, January 31, 2014
Testing fast convolution of tensor sketches
Today I have been working on the final unit test of my code and the changes necessary to run it. I am testing my first implementations last and changing the code a lot in the progress. It's fun to see how different my C++ looks now. The final beast to slay is my implementation of the fast convulution of tensor sketches and checking that it actually keeps the promise of finding the Count Sketch of an outer product fast. Some things are bugging me:
I convert a CS into a polynomial.
I still use a discrete fourier transform. Somehow my CS(vector) represent the polynomial well enough. The conversion seems to be a math showpiece. I have been spending a lot of time understanding this today and it's still not really clear.
I convert a CS into a polynomial.
I still use a discrete fourier transform. Somehow my CS(vector) represent the polynomial well enough. The conversion seems to be a math showpiece. I have been spending a lot of time understanding this today and it's still not really clear.
Thursday, January 30, 2014
Testing and sketching
For the last two days I have been unit testing the implementations for my thesis. Finding lot's of bugs and improving the codebase a lot. One of the core algorithms is Count Sketch. It's a kind of magic that goes like this:
We take a vector x \in R^d and map it to a vector x \in R^D. Normally the smart part is that D is a lot smaller than d (which could even be infinite as in a stream). In my thesis however I exploit two things. CountSketch maintains the inner product and there is a very fast way to find the count sketch of a tensor product by polynomial multiplication using Fast Fourier Transform. Denote the sketch of x as CS(x)
E[<CS(x),CS(y)>]=<x,y>
We also use this property of tensor products:
<x^(p),y^(p)>=<x,y>^p
which is the polynomial kernel. So if we can map to tensor products we have an explicit map of the polynomial kernel. This is not very smart on it's own because the tensor product is huge (d^p). Fortunately we have a powerful tool (http://arxiv.org/abs/1108.1320) that allows us to find the count sketch of an outer product from count sketches, like this:
CS(x^(p))=FFT^-1(FFT(CS(x))*....p-times...*FFT(CS_2(x))
Notice that this is from different count sketches using different h and s. the final CS will look like it was generated from the sum of the h'es and the product of the s'es.
We take a vector x \in R^d and map it to a vector x \in R^D. Normally the smart part is that D is a lot smaller than d (which could even be infinite as in a stream). In my thesis however I exploit two things. CountSketch maintains the inner product and there is a very fast way to find the count sketch of a tensor product by polynomial multiplication using Fast Fourier Transform. Denote the sketch of x as CS(x)
E[<CS(x),CS(y)>]=<x,y>
We also use this property of tensor products:
<x^(p),y^(p)>=<x,y>^p
which is the polynomial kernel. So if we can map to tensor products we have an explicit map of the polynomial kernel. This is not very smart on it's own because the tensor product is huge (d^p). Fortunately we have a powerful tool (http://arxiv.org/abs/1108.1320) that allows us to find the count sketch of an outer product from count sketches, like this:
CS(x^(p))=FFT^-1(FFT(CS(x))*....p-times...*FFT(CS_2(x))
Notice that this is from different count sketches using different h and s. the final CS will look like it was generated from the sum of the h'es and the product of the s'es.
Friday, January 24, 2014
Permutation matrix - Continuing Linear Algebra for programmers phrase book
Continuing the post from a few days ago about just how different Linear Algebra can look on paper and in C++ I today did a bit of work implementing and testing a permutation matrix.
Permutation matrix
A permutation matrix has a single 1 on each row, the rest is zeros. Like this:
All it does is switch rows around. Try to do a few matrix multiplications with this on the left side and you quickly see how it works. In C++ I implemented it like this for a Permutation Matrix * vector use.
Permutation matrix
A permutation matrix has a single 1 on each row, the rest is zeros. Like this:
void FastFood::permutate(vector<double>& data) const{
std::shuffle(data.begin(),data.end(),std::default_random_engine(m_seed));}
Here I have a random permutation, but depending on how I make use of the "m_seed" value I can control when I want a new random order. This is useful because I want my permutation to be random, but I want it to be the same random order every time I use the method from the same object instance.
Thursday, January 23, 2014
Distribution from hell
Today I had a few errands to run but my working time was spend with the distribution from hell (again). Today I gave up and posted to math exchange http://math.stackexchange.com/questions/648577/the-radial-part-of-a-normal-distribution unfortunatly I don't think this will yield an answer.
My thinking so far can be summed to.
What problem is being solved?
The entires of the matrix look like guassian samples, but all rows have the same length. By scaling with $s_i$ this should be fixed. Conclusion: the s behaves like the lenght of a gaussian vector of size d.
It seems intuitive to me that this is related to the unit ball, but I am not fully on page with it still, and especially the weird inversion of A is just killing me.
EDIT: Det der synes at være formålet med S er at få rækkerne i V til at have længder der varierer på samme måde som hvis elementerne i V´ havde været samplet i.i.d fra N(0,1). Jeg tænker at det samme må kunne opnås ved at trække d samples fra en N(0,1) og gemme R^2 normen af disse som indgang i S. Det kræver selvfølgelig D x d samples, men det er en engangsforestilling. << from an email i wrote my supervisor the next day. This is how i ended up doing it: sample d times from a N(0,1) normal distribution and use the length of resulting vectors as entries in S.
My thinking so far can be summed to.
What problem is being solved?
The entires of the matrix look like guassian samples, but all rows have the same length. By scaling with $s_i$ this should be fixed. Conclusion: the s behaves like the lenght of a gaussian vector of size d.
It seems intuitive to me that this is related to the unit ball, but I am not fully on page with it still, and especially the weird inversion of A is just killing me.
EDIT: Det der synes at være formålet med S er at få rækkerne i V til at have længder der varierer på samme måde som hvis elementerne i V´ havde været samplet i.i.d fra N(0,1). Jeg tænker at det samme må kunne opnås ved at trække d samples fra en N(0,1) og gemme R^2 normen af disse som indgang i S. Det kræver selvfølgelig D x d samples, men det er en engangsforestilling. << from an email i wrote my supervisor the next day. This is how i ended up doing it: sample d times from a N(0,1) normal distribution and use the length of resulting vectors as entries in S.
Tuesday, January 21, 2014
Tex adventures
Today I had a fun time trying to add the dirtree style to my tex install. In the end it was rather simple with the few things that you have to guess on an ubuntu system. The procedure was:
1. Download Dirtree.
2. Run "latex dirtree.ins"
3. Copy the dirtree.sty and dirtree.tex somewhere near my other .sty files. "locate *.sty"
4. Run "texhash" to update the tex install and make it aware of the new style.
I was all listed here: http://www.ctan.org/tex-archive/macros/generic/dirtree. Except the last step for some reason. Today I also made more tests for my thesis codebase. I wrote a section on testing methods where you only have expected properties of the outcome, and had fun trying to calculate some variances.
1. Download Dirtree.
2. Run "latex dirtree.ins"
3. Copy the dirtree.sty and dirtree.tex somewhere near my other .sty files. "locate *.sty"
4. Run "texhash" to update the tex install and make it aware of the new style.
I was all listed here: http://www.ctan.org/tex-archive/macros/generic/dirtree. Except the last step for some reason. Today I also made more tests for my thesis codebase. I wrote a section on testing methods where you only have expected properties of the outcome, and had fun trying to calculate some variances.
Monday, January 20, 2014
Xbox Music API
I had a few hours to spend on F# and I wanted to do an application using the Xbox Music API. The documentation is here http://msdn.microsoft.com/en-us/library/dn546696.aspx. I think the structure of that page is messy since it invites you to start at "Getting Started", but really the best place to start is "Using the xbox restful services". So, what is REST? I found a nice description here http://martinfowler.com/articles/richardsonMaturityModel.html that basically sums to rest being proper use of resources, HTTP Verbs and "hypermedia controls" which is short for navigation hints. Hopefully I will find some more time soon to actually build something useful with this API and F#.
In the end I got it working. Microsoft uses a service to provide a token and you must then use the token to query xbox music. I spend some time failing because some of my secret keys and the token needed to be URL encoded before being passed in the request. In F# it is easy to URL encode using a .NET module, I think it was in something like System.Web.HttpUtilies.URLencode.
In the end I got it working. Microsoft uses a service to provide a token and you must then use the token to query xbox music. I spend some time failing because some of my secret keys and the token needed to be URL encoded before being passed in the request. In F# it is easy to URL encode using a .NET module, I think it was in something like System.Web.HttpUtilies.URLencode.
Friday, January 17, 2014
Linear algebra for programmers
Inspired by Eric Meijer I thought I would list some of the things that have been useful to me recently in transforming a Machine Learning article to actual running C++. So the idea is a kind of phrase book for programmers venturing into the strange lingo of Linear Algebra. Most examples are taken from this paper since I recently implemented it.
In the examples I will assume no familiarity with Linear Algebra so we will take some shortcuts, but we will get the job done. The idea is to transfer these strange terms into something familiar to any developer. I am pretty sure that you would fail horribly trying to do any math with this understanding of Linear Algebra, but it should work fine for coding.
Scalar
int, float, double, or objects if you please. Basic arithmetic should be implemented (+ - / *).
Vector
An ordered list of scalars. scalar[]
Matrix
An ordered list of vectors [scalar[]]. Can also be multidimensional. [[[scalar[]]]. All the innermost lists have the same length. Normally we only need the list of lists. The length of the lists is called the dimensionallity of the matrix. An n x m Matrix is a list of size n containing lists of size m.
Dot product / Inner product
The most normal way to multiply two vectors. Takes the sum of the product of the elements. You can only take the dot product of two vectors of the same length:
Notice that multiplying two vectors returns a scalar.
Diagonal matrix
Mathematicians spend a lot of time swapping elements or building strange structures like the diagonal matrix because Linear Algebra relies heavily on the rules for multiplying different elements. Say I want to multiply two vectors like this:
You could call it element-wise multiplication. There is no rule like that in Linear Algebra between two vectors. So what is done is put the a vector on the diagonal of a matrix A and fill the rest of the spaces with 0's. A is then called a diagonal matrix. Then the above corresponds to multiplying A*b with the rules for Matrix vector multiplication.
Permutation matrix
The same is the case for the permutation matrix. What it does is swap the position of the list in a matrix. Consider a permutation matrix P=[[0,1],[1,0]]. What it does is swap the first and second row of a matrix. Example P*[[2,4],[6,5]] = [[6,5],[2,4]]. I C++ you can use the swap method to easily do the same job.
I hope to write more entries to this some day, but for now my 30 minutes are up.
In the examples I will assume no familiarity with Linear Algebra so we will take some shortcuts, but we will get the job done. The idea is to transfer these strange terms into something familiar to any developer. I am pretty sure that you would fail horribly trying to do any math with this understanding of Linear Algebra, but it should work fine for coding.
Scalar
int, float, double, or objects if you please. Basic arithmetic should be implemented (+ - / *).
Vector
An ordered list of scalars. scalar[]
Matrix
An ordered list of vectors [scalar[]]. Can also be multidimensional. [[[scalar[]]]. All the innermost lists have the same length. Normally we only need the list of lists. The length of the lists is called the dimensionallity of the matrix. An n x m Matrix is a list of size n containing lists of size m.
Dot product / Inner product
The most normal way to multiply two vectors. Takes the sum of the product of the elements. You can only take the dot product of two vectors of the same length:
int dotProduct(int[] a, int[] b){
if (a.size()!=b.size) throw exception;
int result =0;
for(int i=0; i<a.size();++i){
result += a[i]*b[i];
}
return result;
}
Notice that multiplying two vectors returns a scalar.
Diagonal matrix
Mathematicians spend a lot of time swapping elements or building strange structures like the diagonal matrix because Linear Algebra relies heavily on the rules for multiplying different elements. Say I want to multiply two vectors like this:
int[] DiagonalProduct(int[] a, int[] b){
if (a.size()!=b.size) throw exception;
int[] result.reserve(a.size());.
for(int i=0; i,a<size();++i){
result[i] = a[i]*b[i];
}
return result;
}
You could call it element-wise multiplication. There is no rule like that in Linear Algebra between two vectors. So what is done is put the a vector on the diagonal of a matrix A and fill the rest of the spaces with 0's. A is then called a diagonal matrix. Then the above corresponds to multiplying A*b with the rules for Matrix vector multiplication.
Permutation matrix
The same is the case for the permutation matrix. What it does is swap the position of the list in a matrix. Consider a permutation matrix P=[[0,1],[1,0]]. What it does is swap the first and second row of a matrix. Example P*[[2,4],[6,5]] = [[6,5],[2,4]]. I C++ you can use the swap method to easily do the same job.
I hope to write more entries to this some day, but for now my 30 minutes are up.
Thursday, January 16, 2014
Lazarus, dig!
Tonight I am resurrecting this blog after a long break. Long story short i quit my job at Rehfeld to write my thesis. I have been writing since september and now things are shaping up. Back when I started my thesis I thought that I did not want to write about my progress on here because I would be documenting my work anyway.
The last three days I have been at the Warm Crocodile Developer Conference and I learned a lot of stuff. Mostly I wanted to get into F# and I am really excited to start working with a functional language that is easy to put to actual production use. However I did not want to write about F# today. I saw three talks by Eric Meijer and they where all really great. He talked a lot about duality, as a helpful concept to understand a bunch of things. Some of his examples:
De morgans laws are dual
!(A || B) => !A && !B
!(A & B) => !A || ! B
The idea of duality was often described as switching the arrows. In the above we go from a law regarding OR to AND to the second law which describes AND to OR.
Another example, Bayes rules:
P(B)*P(A|B) = P(A UNION B) = P(A)*P(B|A)
Again "switching the arrow" we get a duality describing on the one hand a prb. of A given B and on the other hand a prob. of B given A.
Finally the described how Machine Learning is the dual of Querying:
f(x) => [x,y] query by providing a function and getting data
[x,y]=> f(x) machine learning provides a function by learning from data
I think that trying to "complete" a system by finding the dual of a statement or idea is a powerful tool, and I hope I get to hear Eric give a talk again soon. He was really provoking, especially he threw a lot of dirt at Tedd Codd, but it made me wake up and want to argument so I think it served its purpose.
The last three days I have been at the Warm Crocodile Developer Conference and I learned a lot of stuff. Mostly I wanted to get into F# and I am really excited to start working with a functional language that is easy to put to actual production use. However I did not want to write about F# today. I saw three talks by Eric Meijer and they where all really great. He talked a lot about duality, as a helpful concept to understand a bunch of things. Some of his examples:
De morgans laws are dual
!(A || B) => !A && !B
!(A & B) => !A || ! B
The idea of duality was often described as switching the arrows. In the above we go from a law regarding OR to AND to the second law which describes AND to OR.
Another example, Bayes rules:
P(B)*P(A|B) = P(A UNION B) = P(A)*P(B|A)
Again "switching the arrow" we get a duality describing on the one hand a prb. of A given B and on the other hand a prob. of B given A.
Finally the described how Machine Learning is the dual of Querying:
f(x) => [x,y] query by providing a function and getting data
[x,y]=> f(x) machine learning provides a function by learning from data
I think that trying to "complete" a system by finding the dual of a statement or idea is a powerful tool, and I hope I get to hear Eric give a talk again soon. He was really provoking, especially he threw a lot of dirt at Tedd Codd, but it made me wake up and want to argument so I think it served its purpose.
Subscribe to:
Posts (Atom)