T-SQL vs. CLR – A Comparison
I've recently been looking at optimizing a data load that I'm working on, and have got quite a nice example of how the performance can be improved significantly by creating a CLR object.
To do this, I created a set of data (which is downloadable from here, is approx. 3 MB, and is extractable using 7-Zip) which has approx. 200k records in it. The file has two columns (pipe-delimited): a row ID, and a set of events in a string (in the format "1,2,3,4,5"). Each string has a variable number of events in it (between 1 and 75), and looks something like this.
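A few made-up rows in that shape, purely for illustration:

1|4,12,3,5
2|1,2,5
3|7,7,2,8,10,12

To get the file into a table for the scripts that follow, something along these lines would do it; the LargeFile table and its columns are taken from the scripts below, but the file path is a placeholder:

-- a minimal loading sketch; adjust the path to wherever the file was extracted
CREATE TABLE dbo.LargeFile
(
    EventID   int,
    EventList nvarchar(4000)
);

BULK INSERT dbo.LargeFile
FROM 'C:\temp\largefile.txt'   -- hypothetical path
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');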
What I'm wanting to do is split this out, to have an EventID, a sequence number and an Event, with the EventList above being split over multiple records, and the sequence number showing the position of each event code in the list. Sounds straightforward.
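Using the first made-up row above (1|4,12,3,5), the target shape would be:

EventID | SequenceNo | Event
--------|------------|------
1       | 1          | 4
1       | 2          | 12
1       | 3          | 3
1       | 4          | 5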
I'm effectively looking for a quick way to generate this data.
T-SQL – Custom Split
A custom function to split the string was created, looking like this:
CREATE FUNCTION [dbo].[Split]
(
    @String nvarchar(4000),
    @Delimiter nvarchar(10)
)
RETURNS @Table TABLE ( [Value] nvarchar(4000) )
BEGIN
    DECLARE @NextString nvarchar(4000)
    DECLARE @Pos int, @NextPos int

    SET @NextString = ''
    SET @String = @String + @Delimiter
    SET @Pos = CHARINDEX(@Delimiter, @String)
    SET @NextPos = 1
    WHILE (@Pos <> 0)
    BEGIN
        SET @NextString = SUBSTRING(@String, 1, @Pos - 1)
        INSERT INTO @Table ( [Value] ) VALUES ( @NextString )
        SET @String = SUBSTRING(@String, @Pos + LEN(@Delimiter), LEN(@String))
        SET @NextPos = @Pos
        SET @Pos = CHARINDEX(@Delimiter, @String)
    END
    RETURN
END
-- this code was taken from here:
-- http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-recor/314833#314833
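As a quick sanity check (my own test call, not part of the original scripts), the function can be exercised on its own:

SELECT [Value] FROM dbo.Split('1,2,3,4,5', ',')

which returns five rows, one per event code.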
This was executed in Management Studio, with this script:
SELECT EventID,
       IDENTITY(int, 1, 1) AS SequenceNo,
       [Value] AS Event
INTO dbo.SQL_Events
FROM dbo.LargeFile
CROSS APPLY dbo.Split(dbo.LargeFile.EventList, ',') AS t
ORDER BY dbo.LargeFile.EventID
Running this generated a table with 9.8 million rows and took 6 minutes 8 seconds, so it handled approx. 26.6k records a second.
CLR – SplitEvents
To split the EventList in a CLR object, I did the following in Visual Studio 2010:
1. Choose the .NET Framework CLR project type.
2. Right-click the project when it's created, and choose 'Add -> Stored Procedure'.
In the body of the class, I used the following code:
using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void CLR_Split()
    {
        char[] delimiters = new char[] { ',', '-' };

        using (SqlConnection conn = new SqlConnection("context connection=true"))
        {
            using (SqlCommand cmd = new SqlCommand("SELECT EventID, EventList FROM LargeFile", conn))
            {
                SqlPipe pipe = SqlContext.Pipe;

                // define the shape of the result set that gets streamed back
                SqlMetaData[] cols = new SqlMetaData[3];
                cols[0] = new SqlMetaData("EventID", SqlDbType.Int);
                cols[1] = new SqlMetaData("Sequence_No", SqlDbType.Int);
                cols[2] = new SqlMetaData("Event", SqlDbType.Int);

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    int iRow = 0;
                    if (reader.HasRows)
                    {
                        SqlDataRecord rec = new SqlDataRecord(cols);
                        pipe.SendResultsStart(rec);

                        while (reader.Read())
                        {
                            int iCtiQueueID = Convert.ToInt32(reader[0]);
                            string sIncoming = reader[1].ToString();
                            if (sIncoming.Length > 0)
                            {
                                // split the event list and send one row per event
                                string[] sSplit = sIncoming.Split(delimiters);
                                int iSplitCount = 1;
                                foreach (string s in sSplit)
                                {
                                    try
                                    {
                                        int iSplitVal = Convert.ToInt32(s);
                                        rec.SetSqlInt32(0, iCtiQueueID);
                                        rec.SetSqlInt32(1, iSplitCount);
                                        rec.SetSqlInt32(2, iSplitVal);
                                        pipe.SendResultsRow(rec);
                                    }
                                    catch (Exception)
                                    {
                                        // ignore anything that isn't a valid integer
                                    }
                                    iSplitCount++;
                                }
                            }
                            iRow++;
                        }
                        pipe.SendResultsEnd();
                    }
                    else
                    {
                        pipe.Send("");
                    }
                }
                conn.Close();
            }
        }
    }
}
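Visual Studio can deploy the assembly for you, but if you register it by hand the T-SQL looks roughly like this. This is a sketch: the SqlSplit assembly name and the path are assumptions, not from the project above.

-- CLR integration is off by default
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- register the compiled assembly and expose the method as a procedure
CREATE ASSEMBLY SqlSplit
FROM 'C:\temp\SqlSplit.dll'   -- hypothetical build output path
WITH PERMISSION_SET = SAFE;

CREATE PROCEDURE dbo.CLR_Split
AS EXTERNAL NAME SqlSplit.StoredProcedures.CLR_Split;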
Having compiled and deployed this to SQL Server, it could be run using this:
INSERT INTO CLR_Events (EventID, Sequence_No, Event)
EXEC CLR_Split
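This assumes the CLR_Events table already exists; a minimal sketch of it, inferred from the column list above rather than taken from the original scripts:

CREATE TABLE dbo.CLR_Events
(
    EventID     int,
    Sequence_No int,
    Event       int
);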
Notes
When running these tests, I was using SQL Server 2008 R2, on an i5-430M laptop, with 4 GB RAM.
The database had plenty of space pre-allocated to it, so there was no performance hit from growing the database.
Published at DZone with permission of Nick Haslam, DZone MVB. See the original article here.