Saturday, October 21, 2017

Cooking Cheese

It's called Baked Brie. 
I was recently on an engagement in Chicago and, arriving just after midnight very early one Monday morning, asked the hotel night staff where I might get some food. Luckily for me, they recommended Bijan's Bistro, which was just around the corner, one block from the hotel.


So, if you take a look at Chicago on Google Maps and compare it with the clip above, you'll find it. Baked Brie was a dish I already knew from living in Germany; I'd very seldom had it here in the USA, but the restaurant proclaimed itself European, so I tried it.
Very very good!
It's also very simple, so here's how I reproduced it:
Ingredients (for two people)
  • One small Brie, cut into eight pieces.
  • Two good handfuls of almonds. Half will be toasted.
  • A couple of dozen grapes.
  • Honey to taste.
  • Soppressata salami, sourdough bread, arugula (rocket) 
Method
  • Start the oven at 350F (175C)
  • Put four eighths of the Brie into each of two bowls. Make Maltese Crosses of them in order to allow space for the cheese to melt and run.
  • Sprinkle a quarter of the nuts over each bowl
  • Drizzle honey onto the cheese. 
  • Bake the bowls of cheese for about 10 minutes, until the cheese starts to run.
  • Slice the fresh bread and soppressata to fill out the meal
  • Chop fresh green salad (I like rocket because it's sour, contrasting with the cheese and honey)
  • Toast the rest of the almonds
  • When the cheese starts to run, take the bowls out of the oven
  • Sprinkle the rest of the nuts and some of the grapes over the bowls



Arrange the ingredients around the bowls (which will be hot! Warn your guests!!!).

Serve.

Enjoy.






TTFN





Saturday, October 14, 2017

Interconnectivity

[Tech]
So I'm on this project to move a (big!!) database to a new home. We're upgrading the hardware and RDBMS too, but mainly splitting it up into a more logically planned set of databases (plural), instead of schemas all within one database.
Sounds easy, doesn't it - especially to data-ninjas like all you out there reading this.
Yeah, well, not so much.
The thing that everyone looks at is size - if it's huge (tens of terabytes or more, say) then it must be difficult.
But that's not really so. It's kinda like logistics - once you've done it once then you know how, so doing it again is easy ... it's organising doing it 10 billion times at the same time that takes skill.
In this case it isn't the size of the data that makes things hard - it would be as hard with 1 MB or 1 GB or 1 TB or 1 EB of data - but it's the interconnectivity between objects that's the killer.
It's not when you move something in one lump - that's when the size becomes a pain - it's when you move things relative to each other that things get really awkward!
You see, if you have two things sitting next to each other, so to speak, in (let's say) Schema_a, then one can refer to the other directly by just the name - like this:
select Field_a, Field_b from Table_a
If you move Table_a to another Schema (Schema_b, say), then you need to say
select Field_a, Field_b from Schema_b.Table_a
to distinguish it from the version that stayed home, back in Schema_a. Then you take it to another database in the same instance
select Field_a, Field_b from Database_b.Schema_b.Table_a
and then move it off to another server .....
select Field_a, Field_b from Server_b.Database_b.Schema_b.Table_a

All this seems very simple - and it is. However, there's a certain amount of work involved to change from the first to the fourth example:
1. Find where the external reference is
2. Open the code for editing
3. Edit the code (correctly!)
4. Save and close the code (compiling it as needed)
Which also is simple, albeit taking a little longer than just realising that it needs to be done.

In fact, the difficult pieces are not the ones in that list above, but these:
1. Locating all the places where the code will have to be changed, missing none.
2. Doing it all within the inevitably unrealistic time-constraints imposed on you by people who agreed to the time scheduling without knowing what had to be done.

So the first thing you do is write some code. This code finds every piece of code that refers to any of the objects that are going to move. That could easily be just a few, but if you're working with a database that's grown slowly over the last ten years or so, it's quite possible for the number to run into the thousands.
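As a rough illustration (not the code I actually used), here's a sketch in Python that assumes the views, procedures and functions have been scripted out to .sql files; the directory and object names are placeholders for your own list:

import re
from collections import Counter
from pathlib import Path

moving_objects = ["Table_a", "Table_b"]      # placeholder: the objects scheduled to move
src_dir = Path("database_scripts")           # placeholder: scripted-out views/procs/functions

hits = Counter()
for sql_file in src_dir.rglob("*.sql"):
    text = sql_file.read_text(errors="ignore")
    for obj in moving_objects:
        # \b stops Table_a from also matching Table_a_archive
        count = len(re.findall(rf"\b{re.escape(obj)}\b", text, flags=re.IGNORECASE))
        if count:
            hits[obj] += count
            print(f"{sql_file}: {obj} referenced {count} time(s)")

print("Total references per moving object:", dict(hits))

The per-object counts also give you a first, rough measure of the interconnectivity discussed next.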

Then there's the number of connections - what you would call the interconnectivity. Just how many of the queries in your code depend on information that's suddenly going to be in a new place? Will it be hundreds (lucky you!), or tens of thousands (unlucky you!)?

Either way, upping your Courage Vitamin might be a good idea.

Good luck!





Sunday, October 08, 2017

I need Data, but I mustn't have Data!

[ Tech ]
That's a strange title, but it describes a fairly common dilemma these days in software development: developers need access to realistic data to be able to test their general algorithms, and also to exceptional data (such as incorrect and "illegal" or impermissible values), but, on the other hand, they cannot be given access to really "real" data because of privacy and security concerns.

Normally the answer to the privacy and security concerns is to encrypt the data with an algorithm that can be reversed, so that "Fred Bloggs" is stored as "&*@#$G KJF^^S" but is decrypted back to "Fred Bloggs" every time. However, a lot of the software will still have to deal with the "Fred Bloggs" version for functions like mail composition, and for checking that someone isn't trying to attack the system (entering their name as "Fred ';drop table patient;", for example).

For development there's a very lucky freebie - "real" real data doesn't have to be used - so long as it appears to be real then that's fine. So, we can take "real" real data and transform it into something that developers can use ("fake" real data ?), providing that the transformation cannot be reversed.

So here's a suggestion of how to anonymise your data [Please think hard about the way you implement this and, if you realise any flaws, please tell me!].

1. Determine the fields that are to be anonymised. Obviously things like names, addresses, phone numbers, and identifiers like Social Security IDs have to be altered, but other, less obvious items can be used to link to individuals. For example, if you leave intact the claim number of a claim for medical treatment, then the recipient of that treatment could be located (the fact that such activity would be illegal doesn't matter - it's still possible).

2. Make some tables to store these data. For example, one table for the person, one for their address(es), another for phone(s), etc., all linked so that a phone number will be linkable to the correct person. Additionally, each table must also have a unique key (just an integer). (There's a small sketch of such a structure after this list.)

3. Each item in (2) above must also include a reference to the table(s) and field(s) where that piece of data is located.

4. When you have collected all the data that will need to be altered, make a complete copy of this metadata. This copy will be altered to be anonymous, and then used to replace the occurrences of the original data. Having both available will allow a comparison to verify that the anonymisation has worked.

5. There are a number of types of data. Determine an anonymisation algorithm for each type.
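Here is the sketch promised in step 2 - a minimal, hypothetical layout using an in-memory SQLite database. The table and column names are mine, not from any real system:

import sqlite3

# In-memory stand-in for the metadata store: one table per kind of item, each with
# its own surrogate key, child tables linked back to the person, and columns
# recording which real table/field each value came from (step 3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meta_person (
    person_key  INTEGER PRIMARY KEY,    -- surrogate key
    first_name  TEXT,
    last_name   TEXT,
    gender      TEXT,                   -- recorded for matching, never altered
    src_table   TEXT,                   -- where this value lives in the real database
    src_field   TEXT
);
CREATE TABLE meta_phone (
    phone_key    INTEGER PRIMARY KEY,   -- surrogate key
    person_key   INTEGER REFERENCES meta_person(person_key),
    phone_number TEXT,
    src_table    TEXT,
    src_field    TEXT
);
""")
conn.commit()

In a real project a single metadata record can map to several table/field pairs (as in the example further down), so the source locations would themselves live in a child table rather than in two columns.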

Overall plan. 
The way that I have successfully done this has been to utilise the data itself for substitute values. For example, for first names, I chose a minimal offset that was the number of instances of the most common first name. Then, each time I wanted to alter a first name, I obtained a random number, added the minimal offset to it, and used the first name from that many records further on down the table. If the search went over the end then I simply continued at the top of the table. One note here: you need to ensure that the replacement values that you are obtaining for some fields come from a record having the same gender as the record receiving them - someone called Bill Smith is unlikely to be undergoing a hysterectomy, for example (not impossible, but exceptions like this will be very rare in real life).
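To make that concrete, here is a minimal Python sketch of the selection idea. Everything in it is illustrative: the records are an in-memory list of dicts, and min_offset is assumed to have been computed elsewhere as the count of the most common first name.

import random

def pick_replacement_first_name(records, index, min_offset):
    """Pick a first name from a record min_offset plus a random number of places
    further down the table, wrapping to the top, and matching gender."""
    target = records[index]
    n = len(records)
    jump = min_offset + random.randint(1, n - 1)
    for step in range(n - 1):                       # walk down the table, wrapping at the end
        candidate = records[(index + jump + step) % n]
        # skip the record being altered; require the same gender
        if candidate is not target and candidate["gender"] == target["gender"]:
            return candidate["first_name"]
    return target["first_name"]                     # no same-gender record found; leave for fix-up

# Tiny illustration with made-up data:
people = [
    {"first_name": "Anna",  "gender": "F"},
    {"first_name": "Bill",  "gender": "M"},
    {"first_name": "Clara", "gender": "F"},
    {"first_name": "Dave",  "gender": "M"},
]
print(pick_replacement_first_name(people, 0, min_offset=1))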

Now for suggestions for dealing with particular types
  • First Names. As mentioned above, select names of same gender as the receiving record (this means that gender must be recorded in the metadata, but not altered).
  • Names. You may have to take national origin into account for both first and last names. This may be needed in order to allow software testing that "looks" real. Having someone with a name that is obviously of Indian origin assigned an English birthplace is not unreasonable; Alaska, however, despite being nearer, is lots less likely. 
  • National IDs. In the USA the format is 999-99-9999; in the UK it's XX999999X. In either case replace the three sections separately, picking replacements from three different source records (there's a small sketch of this after the list).
  • Addresses. These can be complicated! The postal code has to match the country and local area. The street name has to match the country. For example, 90453 is a Postleitzahl for Nürnberg in Germany, but is (at the time of writing) not an assigned value in the US zip code system. Thus simply grabbing a number that looks correct can trigger warnings later on when software tries to use it! Similarly, you might need some sort of street lookup system to make sure a street really exists, if the street name and city are not taken together from the same source record. House numbers, of course, can be changed, but should not be raised with abandon if address checking is part of the software being tested.
  • References to other items. These need to remain contextually correct, but the reference identities must change. For example, as pointed out above, assigning a claim for a female-only procedure to a patient tagged with another gender should raise errors that you probably don't want to see. Not altering the ID code of the claim will offer a viewer an easy way of locating the original claimant, thus making all your work pointless!
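Here is the National ID sketch mentioned above. The function and data are made up; a real version would pull the three sections from the metadata table rather than from a hard-coded list.

import random

def replacement_national_id(ids, exclude_index):
    """Compose a new 999-99-9999 ID from the three sections of three different donor records."""
    donor_indexes = random.sample([i for i in range(len(ids)) if i != exclude_index], 3)
    # take the area, group and serial sections from three different donors
    parts = [ids[donor].split("-")[section] for section, donor in enumerate(donor_indexes)]
    return "-".join(parts)

ids = ["123-45-6789", "987-65-4321", "555-12-3456", "222-33-4444"]
print(replacement_national_id(ids, exclude_index=0))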

Example

Lastly, a description of how I actually did a project like this.
  1. This was for a health insurance claims system from auto insurance, so HIPAA was involved from the start. Also, overseas drivers were a possibility, so names and addresses, etc., had to be handled with care.
  2. The metadata included 
    1. Surrogate key
    2. First name
    3. Middle name(s)
    4. Last name
    5. Gender
    6. National ID (e.g. SSN)
    7. Address Street Number
    8. Address Street Name
    9. Address City
    10. Address County
    11. Address State
    12. Address Postal Code
    13. Address Country
    14. Claim Insurer  (multiple, in child table)
    15. Claim Identifier (multiple, in same child table)
    16. Some other fields as well
  3. The metadata items were mapped back to their sources, so that a single record in the metadata would map to a varying number of fields in data tables, depending on the field in the metadata table. For example, National ID would only be mapped to one field in one table, whereas First Name and Last Name might appear in several other tables. 
  4. In addition, text within the claims tables might contain the name of the individual, so this also had to be searched and possibly altered.
  5. The data above was extracted from all the basic tables of the database. The surrogate key here was available for linking an individual record (which is what this was) with a table of claim records. 
  6. Once the metadata was assembled it was retained as a Source Copy, and copied as a Destination Copy, which would store the alterations.  
  7. Finally the data in the Destination Copy would be used to alter the data in the database that was to be used for development and sales purposes.

The driver pseudocode below uses these variables:
  • StartRecord - place in the metadata table to start; in the example it starts at the beginning.
  • RunLength - number of records to process in this run.
  • RecordNr - variable holding the surrogate key of the metadata table.
  • Offset - number of records to skip down the metadata table before starting to search for an appropriate record from which to use data.


StartRecord = 0                     // first batch starts at the top of the metadata table
RunLength = SystemDate.seconds      // batch size taken from the system clock
Wait(random number of seconds, seeded by part of the system time)
Offset = SystemDate.seconds         // search offset, read again after the wait
For RecordNr from (StartRecord + 1) to (StartRecord + RunLength)
Begin
      ProcessRecord(RecordNr, Offset)
End
//Save RecordNr as the StartRecord for the next batch.

Within ProcessRecord the code would search for the first record in the Source Table after the record to be altered where the Gender and the Address Country matched the record to be altered. The search would start a random number of records forward - as determined by the Offset value. One part of the located record was used to update the Destination Table, and the field marked as changed in a log table. If the search reached the end of the Source Table without finding a Country/Gender match then the search would resume at the start of the table, but only to the record before the Source Record of the individual being altered.
The alterations made are described above: if the National ID was being altered then three progressive searches would be made, each returning one part of the ID. In this way the identity value was scrambled but still composed of valid parts.
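For illustration, here is a much-reduced Python sketch of what ProcessRecord does for a single field, with in-memory lists standing in for the Source and Destination tables. All the names are mine; the real version worked against database tables.

def process_record(record_nr, offset, source, destination, field="last_name"):
    """Copy one field into the Destination record from the first Source record,
    starting `offset` places further on, that matches on gender and country."""
    target = source[record_nr]
    n = len(source)
    # Wrap at the end of the table and stop just before coming back round
    # to the record being altered.
    for step in range(n - 1):
        candidate = source[(record_nr + offset + step) % n]
        if (candidate is not target
                and candidate["gender"] == target["gender"]
                and candidate["country"] == target["country"]):
            destination[record_nr][field] = candidate[field]
            return True                 # field changed; the real version logged this
    return False                        # no match; left for the manual fix-up pass

The real version also wrote a row to a log table for every field it changed, which is what the manual fix-up pass at the end relied on.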

After processing one batch of records the next batch would be processed using different basic values as obtained by using the system time.
Once this was completed for one small group of fields then the process was restarted for the next group of fields.

Finally, the log would be examined to locate any records with fields that had not been altered. These were altered by hand.

The randomising agent was the system time, which would not recur. Because no precise record of when the operation took place was kept, it would not be reasonable to expect someone to be able to work backwards from the transformed Destination Data to obtain the Source Data. The transformed data itself, though, looked perfectly normal when viewed, so much so that it caused a major panic when demonstrators first used it and believed that they were showing real data!

So, as I pointed out above, this process does take some time, but yields you some very real-looking data that you can then use for demonstrations and for development purposes.  As I also asked above, if you notice something that is missing or a flaw in the logic, please tell me (in the comments would be a good place!), as leaving a known flaw for others to adopt in ignorance would be bad.

TTFN
