Saturday, October 14, 2017


So I'm on this project to move a (big!!) database to a new home. Upgrade the hardware and RDBMS too, but mainly split up into a more logically planned set of databases (plural), instead of schemas all within one database.
Sounds easy, doesn't it - especially to data-ninjas like all you out there reading this.
Yeah, well, not so much.
The thing that everyone looks at is size - if it's huge (tens of terabytes or more, say) then it must be difficult.
But that's not really so. It's kinda like logistics - once you've done it once then you know how, so doing it again is easy ... it's the organising of doing in 10 billion times at the same time that takes skill.
In this case it isn't the size of the data that makes things hard - it would be as hard with 1 MB or 1 GB or 1 TB or 1 EB of data - but it's the interconnectivity between objects that's the killer.
Not when you move something in a lump - that's when the size becomes a pain - but when you move things relative to each other - that's when things get really awkward!
You see, if you have two things sitting next to each other, so to speak, is (let's say) Schema_a, then one can refer to the other directly by just the name - like this:
select Field_a, Field_b from Table_a
If you move Table_a to another Schema (Schema_b, say), then you need to say
select Field_a, Field_b from Schema_b.Table_a
to distinguish it from the version that stayed home, back in Schema_a. Then you take it to another database in the same instance
select Field_a, Field_b from Database_b.Schema_b_Table_a
and then move it off to another server .....
select Field_a, Field_b from Server_b.Database_b.Schema_b_Table_a

All this seems very simple - and it is. However, there's a certain amount of work involved to change from the first to the fourth example:
1. Find where the external reference is
2. Open the code for editing
3. Edit the code (correctly!)
4. Save and close the code (compiling it as needed)
Which also is simple, albeit taking a little longer than just realising that it needs to be done.

In fact, the difficult pieces are not the ones in that list above, but these:
1. Locating all the places where the code will have to be changed, missing none.
2. Doing it all within the inevitably unrealistic time-constraints imposed on you by people who agreed to the time scheduling without knowing what had to be done.

So the first thing you do is write some code. This code finds every piece of code that refers to any of the objects that's going to move. That could easily be just a few, but if you're working with a database that's grown slowly over the last ten years or so, it's quite possible to number in the thousands.

Then the number of connections ........... what you would call the interconnectivity. Just how many of the queries in your code depend on information that's suddenly going to be in a new place? Will it be hundreds (lucky you!), or will it be tens of thousands (unlucky you!).

Either way, upping your Courage Vitamin might be good .

Good luck!

Sunday, October 08, 2017

I need Data, but I mustn't have Data!

[ Tech ]
That's a strange title, but it's a fairly common dilemma these days in software development, where the developers need access to realistic data to be able to test their general algorithms and also to exceptional data (such as incorrect and "illegal" or impermissible values), but, on the other hand, cannot be given access to really "real" data because of privacy and security concerns.

Normally the answer to the privacy and security concerns is to encrypt the data with an algorithm that can be reversed, so that "Fred Bloggs" is stored as "&*@#$G KJF^^S" but is decrypted back to "Fred Bloggs" every time. However, a lot of the software will still have to deal with the "Fred Bloggs" version for functions like mail composition, and for checking that someone isn't trying to attack the system (entering their name as "Fred ';drop table patient;", for example).

For development there's a very lucky freebie - "real" real data doesn't have to be used - so long as it appears to be real then that's fine. So, we can take "real" real data and transform it into something that developers can use ("fake" real data ?), providing that the transformation cannot be reversed.

So here's a suggestion of how to anonymise your data [Please think hard about the way you implement this and, if you realise any flaws, please tell me!].

1. Determine the fields that are to be anonymised. Obviously things like names, addresses, phone numbers, and identifiers like Social Security IDs have to be altered, but other, less obvious items, can be used to link to individuals. For example, if you leave an intact claim number to a claim for medical treatment then the recipient of that treatment could be located (the fact that such activity would be illegal doesn't matter - it's still possible).

2. Make some tables to store these data. For example, one table for the person, one for their address(es), another for phone(s), etc., all linked so that a phone number will be linkable to the correct person. Additionally, each table must also have a unique key (just an integer).

3. Each item in (2) above must also include a reference to the table(s) and field(s) where that piece of data is located.

3. When you have collected all the data that will need to be altered make a complete copy of this metadata. This copy will be altered to be anonymous, and then used to replace the occurrences of the original data. Having both available will allow a comparison to verify that the anonymisation has worked.

4. There are a number of types of data. Determine an algorithm for anonymisation for each type.

Overall plan. 
The way that I have successfully done this has been to utilise the data itself for substitute values. For example, for first names, I chose an minimal offset that was the number of instances of the most common first name. Then, each time I wanted to alter a first name, I obtained a random number, added the minimal offset to it, and used the first name from that many records further on down the table. If the search went over the end then I simply continued at the top of the table. One note here: you need to ensure that the replacement values that you are obtaining for some fields come from a record having the same gender as the record receiving them - someone called Bill Smith is unlikely to be undergoing a hysterectomy, for example (not impossible, but unlikely, and exceptions like this, while technically possible, will be very rare in real life).

Now for suggestions for dealing with particular types
  • First Names. As mentioned above, select names of same gender as the receiving record (this means that gender must be recorded in the metadata, but not altered).
  • Names. You may have to take national origin into account for both first and last names. This may be needed in order to allow software testing that "looks" real. Having someone with a name that is obviously of Indian origin assigned an English birthplace is not unreasonable; Alaska, however, despite being nearer, is lots less likely. 
  • National IDs. In the USA the format is 999-99-9999; in the UK it's XX999999X. In either case replace the three sections separately, picking replacements from three different source records.
  • Addresses. These can be complicated! The postal code has to match the country and local area. The street name has to match the country. For example, 90453 translates to the Postleitzahl for  Nürnberg in Germany, but is (at the time of writing) not an assigned value for the US zip code system. Thus simply grabbing a number that looks correct can trigger warnings later on when software tries to use it! Similarly, one might have to have some sort of street lookup system to make sure that a street does exist if the street name and city are not used together. House numbers, of course, can be changed, but should not be raised with abandon if address checking is part of the software being tested. 
  • References to other items. These need to remain contextually correct, but the reference identities must change. For example, as pointed out above, assigning a claim for a female-only procedure to a patient tagged with another gender should raise errors that you probably don't want to see. Not altering the ID code of the claim will offer a viewer an easy way of locating the original claimant, thus making all your work pointless!


Lastly, a description of how I actually did a project like this.
  1. This was for a health insurance claims system from auto insurance, so HIPAA was involved from the start. Also, overseas drivers were a possibility, so names and addresses, etc., had to be handled with care.
  2. The metadata included 
    1. Surrogate key
    2. First name
    3. Middle name(s)
    4. Last name
    5. Gender
    6. National ID (e.g. SSN)
    7. Address Street Number
    8. Address Street Name
    9. Address City
    10. Address County
    11. Address State
    12. Address Postal Code
    13. Address Country
    14. Claim Insurer  (multiple, in child table)
    15. Claim Identifier (multiple, in same child table)
    16. Some other fields as well
  3. The metadata items were mapped back to their sources, so that a single record in the metadata would map to a varying number of fields in data tables, depending on the field in the metadata table. For example, National ID would only be mapped to one field in one table, whereas First Name and Last Name might appear in several other tables. 
  4. In addition, text within the claims tables might contain the name of the individual, so this also had to be searched and possibly altered.
  5. The data above was extracted from all the basic tables of the database.The surrogate key here was available for linking an individual record (which is what this was) with a table of claim records. 
  6. Once the metadata was assembled it was retained as a Source Copy, and copied as a Destination Copy, which would store the alterations.  
  7. Finally the data in the Destination Copy would be used to alter the data in the database that was to be used for development and sales purposes.

StartRecordPlace in metadata table to start. In the example it starts at the beginning.
RunLengthNumber of records to process in this run
RecordNrVariable holding surrogate key of metadata table.
OffsetNumber of records to skip down the metadata table before starting to search for an appropriate record from which to use data
RunLengthNumber of records to process in this run
RunLengthNumber of records to process in this run
RunLengthNumber of records to process in this run

StartRecord = 0
RunLength = SystemDate.seconds
Wait(random number of seconds, seeded by part of system time)
Offset = SystemDate.seconds
For RecordNr from (StartRecord + 1) to (StartRecord + 1 + RunLength)
      ProcessRecord(RecordNr, Offset)
//Save RecordNr for next batch.

Within ProcessRecord the code would search for the first record in the Source Table after the record to be altered where the Gender and the Address Country matched the record to be altered. The search would start a random number of records forward - as determined by the Offset value. One part of the located record was used to update the Destination Table, and the field marked as changed in a log table. If the search reached the end of the Source Table without finding a Country/Gender match then the search would resume at the start of the table, but only to the record before the Source Record of the individual being altered.
The alterations made are described above: if the National ID was being altered then three progressive searches would be made, each returning one part of the ID. In this way the identity value was scrambled but still composed of valid parts.

After processing one batch of records the next batch would be processed using different basic values as obtained by using the system time.
Once this was completed for one small group of fields then the process was restarted for the next group of fields.

Finally, the log would be examined to locate any records with fields that had not been altered. These were altered by hand.

The randomising agent was the system time, which would not recur. Because no precise record of when the operation took place was kept, it would not be reasonable to expect someone to be able to work backwards from the transformed Destination Data to obtain the Source Data, but the transformed data itself, when viewed, appeared to be very normal, to the extent that it caused a major panic when demonstrators first used it and believed that they were showing real data!

So, as I pointed out above, this process does take some time, but yields you some very real-looking data that you can then use for demonstrations and for development purposes.  As I also asked above, if you notice something that is missing or a flaw in the logic, please tell me (in the comments would be a good place!), as leaving a known flaw for others to adopt in ignorance would be bad.


Saturday, July 29, 2017

Computers Accumulate!

[Tech - Work]
Having moved a little while ago to Tacoma, I now have a new manager and today was the day of the new team's first meeting. So, a trek from Tacoma to Bellevue (about 75 minutes) and a really long one back (about 3 hours) because of at least one accident. Oh well!
It was really nice to meet all the people that I'd only seen as names on emails up to now - they're a really great bunch and really exhibit the qualities people notice about Microsoft People - very friendly and very bright!

[Tech - Home]
Looking back I saw a blog entry here from October 3, 2010 where the Tech entry mentioned that I'd shed a pair of tower computers and that the household was down to just six computers.Phones didn't really count in those days. Right now we're at a high spot again
In 2010 I would have had a Sony VAIO laptop for home and recording off TV, and a 17" Acer Aspire laptop for work. The Sony, along with another amazingly tiny Sony laptop acquired out of curiosity from a thrift store, are retired but not shed, and the two remaining desktops of that era were shed in 2016. In 2011 I bought a System-76 Bonobo (17.5", 32 GB, etc) - in short, a small server masquerading as a laptop (you'd need knees of steel!). We also have two Windows-based netbooks, an Asus netbox for on-monitor mounting, 3 ChromeBooks (so convenient it's untrue!), and an Acer Switch tablet with keyboard. Very similar to the SurfaceBook from Microsoft that I use for work.
In addition there's a Samsung tablet lurking somewhere, but that hasn't been seen since we moved!
Twelve machines plus two iPhone SEs. Like I say, they accumulate!

Salmon, basil, tomato, garlic, and some pepper. Heat and eat!

Well, not quite so simple, but here's how I've been doing it.

Many years ago I owned a contact infra-red grill by a company called Rima. It was a gift from my father while I was away at university. It was a 240v beast, so never made it across the Atlantic with me, and the company is long gone now.
The nearest thing that I've found is the
George Foreman grill (right). Here in the US they cost about $50, but can often be found in thrift stores for $10 or $20. They're fairly easy to clean, being Teflon coated, but unfortunately the plates don'e come off for washing as they did with the old Rima.

1 cup Couscous1 oz Butter
4 Salmon piecesBasil
One large Tomato, sliced thinlyGarlic powder or flakes
Plum Tomatoes

  1. Melt the butter in a pot, then bring it to a "boil".
  2. Add the couscous and stir well so that the hot butter is well coating the couscous.
  3. keep this on medium heat for a few minutes
  4. Add a cu of water or, better, chicken broth
  5. After another minute or two on heat, remove from heat and leave for couscous to absorb the water
  6. Prepare the salmon. Mine came frozen in pieces this time, but basically you will need to cut it into pieces each about half of what one will want to eat.
  7. If you have fresh Basil then roll some leaves and then slice the roll. This gives you long thin strips like tinsel that are easy to drape onto the salmon. This time I used dried basil from Trader Joe's as an experiment. It came out rather well.
  8. Sprinkle the salmon with some pepper and some garlic powder or flakes.
  9. Add the basil onto the salmon.
  10. Lay tomato slices onto the pieces of salmon.
  11. Cook the salmon and toppings in a George Foreman grill for about 4-5 minutes (depending on the thickness of the salmon).
  12. Place the cooked salmon on a plate, add the couscous beside it, and garnish with plum tomatoes and chopped celery.

Bon appetit!


Sunday, July 23, 2017

Power On The Go

What's the most common sight you see in an airport or hotel or meeting or bar or coffee-shop these days?
I would suggest that it's the sight of a poor human searching desperately for an unoccupied pair of holes in the wall (or wherever!) that can be used to provide food for his or her trusty electronic companion, which is currently powerless (sorry !) !
I've also been one of those poor humans, but took steps to try to fix my problem. Yesterday, however, the problem visited a friend of my wife's, and I was able to help (in a small way), so I'd like to share. Maybe it'll help you, dear reader, avoid her situation. By the way, this is not an advertisement for Amazon or any of the products described here - it's simply what you might term a PSA - Public Service Announcement!

My wife and some friends went with some others to a small festival about 40 miles from where we now live, and I turned up, as arranged, to meet my wife when they all returned. Goodbyes were said and everyone made for their cars - it was about 85F at the time - and most people left.
My wife and I spent a little time getting things in the car sorted out, and then we looked up to realise that one of the other women, who I'll name "J" to minimise her embarrassment, was still trying to get into her car. I was about to get out to ask if I could help when she got the driver's door open and got in - "panic over", I thought.
However, a few seconds later, J was again out of her car, so I also got out to see what was wrong. It turned out that her car wouldn't start (it turns out that the battery had died, but that would be another story) and her large iPhone was also completely out of juice - hardly even registering a red line!

She said that she didn't live that far away and that her husband could come and get her, if she could only contact him. So, obviously, the first thing I did was hand her my phone and say "use it!" - she did.

She was happy to wait there in the shade for him, but my wife and I weren't happy just leaving her alone with no communications, so I loaned her my battery and cable, and here is the point of the story.

I had already used the battery to top off my wife's iPhone SE from about 50% that day. I don't know how long J's iPhone took to charge, but apparently it charged fully, and I got the battery back the next day with 50% of its capacity still remaining!

The picture on the right is of my battery - search for "intocircuit" on Amazon. Right now this one, holding 11,200 mAh, is being withdrawn and replaced with one holding 15,000 mAh. I have had this one since August 2015, carry it with me everywhere (including in aeroplane carry-on) and it never fails me. The reason, I believe, is that there is a nice meter to tell me how much it is depleted, so I always know when to recharge it well before it runs out of power.

The moral of the story: if you have a phone, carry a battery too, and another in the car! Make them as large as you can afford (this one is about $20; the replacement about $27).

One extra note: cables are a real pain to carry around; try this (on the right) for an idea. It's a retractable spring-loaded cable that offers USB-A to Lightning. Great for charging your iPhone from the intoCircuit beast above.
On the left, however, you see the reason for that strange little loop on the Lightning plug ... it stops you losing it, because the cable actually ends with a micro-USB connector, and the loop attaches an adapter from micro-USB to Lightning. So you can use the same cable to charge that battery and then charge either an iPhone or an Android phone.
These cables cost about $6 each - there's another company on Amazon (Miger) with essentially the same product at $12 for two. I haven't had the ones above for very long, so I can't report on longevity. I did buy some 1:4 retractables some 5 months ago and 1 of 4 just died on me.

Be safe and in touch!

Friday, July 14, 2017

The Wonder of Chicken!

I follow a blog called French Letters, by  a lady from a town near Nîmes, and a couple of posts ago she described how she'd got addicted to a particular chicken dish. So, I decided to try it out.

I should warn you that while this is really easy,
a. it has peanuts
b. no pictures until next time - it was so good we scarfed it all down!

You'll need alarge frying pan and an oven pot, such as a Pyrex or Arcopol glass bowl with a flat base and lid (I used an arcopol 26).
So, to start, there's two parts that come together during the process.

1. The Meat.
2. The Gravy

The Meat
This was chicken. She used thighs, but I used two chicken breasts double cut to make four thin pieces.
Preparation here is simple; heat maybe two or three tablespoons of oil in a large pan and fry the chicken on each side until it's cooked and browned just a bit. The first couple of times you turn it, sprinkle a little salt over the top before the turn, so it heats and melts and flavours the meat as it heats up.
When the chicken pieces are done place them flat in the bottom of your oven-proof dish.
Turn your oven on to 350F.

At this point I started rice in my rice cooker; you have just enough time to do this and have it be ready when the dish emerges from the oven.

The Gravy
The ingredients are these:

Diced Red Bell Pepper         1 tsp Cinnamon     
Chopped Cilantro 1 tsp Paprika
Chopped Spring Onions 2 tsp Ginger
Peanut Butter A little Salt
1 can of Coconut Milk Pepper to taste.

I used a bunch of eight small Spring Onions and chopped them completely (excepting the roots and dried tips, of course!). I used a small box of Cilantro from the supermarket and one really large but very firm red Bell Pepper.
Using the oil left in the pan after the chicken has been cooked, add the vegetables and fry them for about 3-4 minutes until they begin to wilt. Then add the spices (right-hand column) and stir them in to the frying vegetables. After a minute or so, add two good dollops of peanut butter and then the Coconut milk. Mix all the ingredients well, stirring so that the peanut butter mixes in to the cocout milk.
Pour this mixture over the chicken breasts in the pot
Cover the pot and cook in the oven for about 25 minutes.

Remove pot from oven, uncover, and let it start to cool.

At this point my rice was cooked, so I could fill plates with beds of rice.

Plate your masterpiece, serve, and bask in the applause from your family and guests.