As a SQL Server developer I often need test databases that resemble my client databases as much as possible. But with the increasingly and correct, in my opinion, concern over data privacy you often need to obfuscate data. This is also called scrubbing the data. I have a common table that is used as a repository for a lot of client data. I need to write a script that would make the data obscured but keep the weight of the data there so I could continue testing. I use the following script to randomly pick from a 1000 Unicode characters to build strings to replace those customers had added. Here is the heart of it (in a loop with the length of the target string as its limit):
declare @somevalueid int;
declare @intstringlength int;
declare @intcounter int;
declare @mystring nvarchar(50);
declare @mynewstring nvarchar(50);select @mystring=mystring from tblguitar where guitarid_pk=@somevalueid
select @intstringlength= len(@mystring)while @intcounter > 0
begin
set @mynewstring=@mynewstring+nchar((cast(rand()*1000 as int))+65)
set @intcounter = @intcounter-1
end
update tblguitar set mystring=@mynewstring where guitarid_pk=@somevalueid
It should prove easy to adapt this into your code and build random strings of any length.
I could, too, restrict which Unicode or regular ASCII characters I want simply by figuring out the character range and code numbers involved. I used Unicode because it’s so big and I like the weird characters I come up with. With a large database it’s unlikely I’ll have 2 entries the same using 1000 characters; even so you may need to check if the entry already exists if you, like I, require unique entries. For those with less worry about that, here’s two examples limiting you to the normal alphabetic range:
select nchar((cast(rand()*1000 as int)%26)+97) –just a-z
select nchar((cast(rand()*1000 as int)%26)+65) –just A-Z
In the end, it’s the shape of the data that is important to me, not the actual meaning. If I can retain that shape and work on without worry that I’m compromising my client’s privacy, so much the better.