Wednesday, March 18, 2020

A CDC Example


The idea of CDC (Change Data Capture) is to capture the changes undergone by the data in a table so that those changes can be used elsewhere. Very often this is to update a data warehouse with just the changes rather than completely refreshing it. This post is designed to lead you through a simple example so that you get a good feel for how to create and use CDC.
A word of warning before we start off - this is going to be a long one!

Create Your Environment


For this example we start by creating a new database and a new table.

1.     Create the database

create database NWM

2.     Make sure that the SQL Server Agent is running for the database instance (a T-SQL check is sketched just after this list)

3.     Enable CDC for that new database. The procedure acts on the current database, so switch context first:

use NWM
go
exec sys.sp_cdc_enable_db

4.     Check results

select name, is_cdc_enabled from sys.databases

5.     You will get a list of databases and an is_cdc_enabled flag; the NWM database should now show the flag set to 1.


6.     Next, create a table to set up for CDC. The table must have a primary key. For convenience here we make it an identity field.
    create table nwm_cdc
    ( id int identity(1,1),
      field_1 varchar(32),
      field_2 varchar(32),
      constraint [PK_nwm_cdc] primary key clustered ([id] asc) on [PRIMARY]
    )

7.     Add some data to the table:
    insert into nwm_cdc (field_1, field_2)
    values ('one', 'ein')

8.     Check the list of CDC-enabled tables in the database. At this point an empty result set is returned - nothing is CDC-enabled yet.
    select * from cdc.change_tables
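
A side note on step 2: if you would rather check from T-SQL that the Agent service is running than open the Configuration Manager, the sys.dm_server_services DMV gives a quick view (available from SQL Server 2008 R2 SP1; it requires the VIEW SERVER STATE permission):

    select servicename, status_desc, last_startup_time
    from sys.dm_server_services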



Set up CDC for your table

In this section you prepare the data table where changes will cause CDC activity.

1.     Set up CDC for the table by running this system procedure:

exec sys.sp_cdc_enable_table
      @source_schema = N'dbo',               --schema of the table with the data
      @source_name = N'nwm_cdc',             --name of the table with the data
      @role_name = null,                     --role gating access to the change data (null = none)
      @captured_column_list = N'id, field_2',--columns to capture when a row changes
      @capture_instance = null,              --name of the CDC "capture instance" (null = default)
      @filegroup_name = N'PRIMARY',          --filegroup to keep the change table in
      @supports_net_changes = 1              --1 = also expose net changes (one row per changed key)

2.     Re-run step 8 above. This time a row describing the new capture instance is returned. Notice that many of the values come from the command in step 1 just above.
       a.     source_object_id is the id of the source table (nwm_cdc), as found in the sys.tables view.
       b.     object_id is the id of the change table created for this capture instance (cdc.dbo_nwm_cdc_CT).
       c.     schema_id is the id of the source table's schema, as found in the sys.schemas view.
       d.     capture_instance is the name of the CDC setup that we have just created. There can be at most two per data table, and each must have a unique name – the default is the schema name and table name joined with an underscore, here dbo_nwm_cdc.

       3.     Now look at the System Tables branch under your database in the SSMS Object Explorer. You will see that there are a number of tables in the cdc schema. The one that you have just made by performing step 1 just above is cdc.dbo_nwm_cdc_CT, and this will collect the data generated by the CDC system whenever the source table is changed.


       4.     Now go further down in the Object Explorer and open up the SQL Server Agent tree. Two new jobs have been created to hold the code necessary for processing the CDC operations: the capture job (cdc.NWM_capture), which harvests changes from the transaction log, and the cleanup job (cdc.NWM_cleanup), which prunes old rows from the change tables.
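
At this point the capture pipeline is live, so it is worth a quick sanity check before going further. The sketch below is not part of the original steps: it assumes the default capture instance name dbo_nwm_cdc (which is what @capture_instance = null gave us) and that the capture job has had a few seconds to harvest the transaction log. The two table-valued functions used here were generated by CDC when we enabled the table.

    -- make some changes for CDC to pick up
    update nwm_cdc set field_2 = 'eins' where id = 1
    insert into nwm_cdc (field_1, field_2) values ('two', 'zwei')

    -- give the capture job a moment to read the log
    waitfor delay '00:00:10'

    declare @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_nwm_cdc')
    declare @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn()

    -- every individual change in the interval; field_1 is absent because
    -- it was left out of @captured_column_list
    select * from cdc.fn_cdc_get_all_changes_dbo_nwm_cdc(@from_lsn, @to_lsn, N'all')

    -- one net row per changed key - available because @supports_net_changes = 1
    select * from cdc.fn_cdc_get_net_changes_dbo_nwm_cdc(@from_lsn, @to_lsn, N'all')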


Note     If you wish to disable CDC on a table, use sys.sp_cdc_disable_table, providing the @source_schema, @source_name, and @capture_instance values to identify the capture instance.
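
For example, to drop the capture instance created above (using its default name):

    exec sys.sp_cdc_disable_table
          @source_schema = N'dbo',
          @source_name = N'nwm_cdc',
          @capture_instance = N'dbo_nwm_cdc'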

Some Rules to Live By?

Once upon a time I worked for this really cool company in Norway.

We had a company handbook, like everywhere, but this one started a little differently.

Work: You spend at least half of your waking time at work – get the most out of it!
Solutions: Do not choose the easiest solution; choose the one you think is right!
Work pressure: The reward is usually proportional to the difficulties.
Things you dislike: Do something about them; improve them if they are important enough.
Work instructions: Until you are certain someone else has taken over the responsibility, it is your own.
Colleagues: Find out which ones are important to you (organizational chart disregarded) and treat them accordingly.
Instincts: Be skeptical of some instincts, do some of the things you dislike the most, talk to some of the persons you dislike the most.
Performance: If you are honest with yourself, you are the best judge.
Improvements: You are allowed to propose improvements, even if you are not perfect yourself.
Obedience: If you are convinced that you are right, stick to it.
Personality: Be yourself. Like yourself. Improve yourself.
Mistakes: Admit them.
Chances: Take them.




Monday, March 16, 2020

Moving to the Cloud.

Tech
No, this isn't going to be all about sneezing while making cake icing - sorry! It'll be about a project I was on recently and why this sort of thing, while time-consuming, really is worthwhile.

The company I was working with had a rather old Pick system completely custom-written for all customer-facing functions, newer warehouse controls, and a variety of reporting systems. They already had an on-premises SQL Server instance to provide an acceptable interface for all the reporting software, and the Pick Basic programmers had created the output software to fill it with data.

Their problem had been that the Pick system was becoming more demanding of maintenance in its old age, so they had decided to upgrade their whole system by replacing the Pick system with a Microsoft Dynamics 365 system, which, of course, lives in the cloud.

Happily, one of the features of the Dynamics system is that pretty much any entity within it can be set to export values every so many seconds (or minutes, ...) to an external instance of Azure SQL Database (a feature known as "BYOD" - Bring Your Own Database!). The database records (stored in tables defined in the export) include indications of source, date, and time of creation. This allows the developer of an SSIS package on-premises to pull the data, transform it, and use it to update tables in a local SQL Server instance.

The transformation processes can, of course, include not just aggregation by time or by customer, supplier, warehouse, article, etc., but also the change of units - singles to dozens or kg to lbs and oz! For example, Dynamics might record that 12 items were picked from the warehouse whereas the reporting systems might work in units of a dozen.
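
In the load queries themselves such a conversion is often just an expression. A small illustration - the table and column names here are made up for the purpose:

    -- hypothetical staging data: Dynamics records single units,
    -- the reporting side wants dozens
    select ItemId,
           sum(QtyPicked)        as QtySingles,
           sum(QtyPicked) / 12.0 as QtyDozens
    from stg.PickTransactions
    group by ItemId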

So, to convert to using the Dynamics system one must first connect it to the various electronic reporting systems from warehouses, etc. But... connect it in such a way that the data it produces for any specific event is identical to what the legacy system produced. This sounds simple, but that's as far as the simplicity goes!

So in mid-2018 Microsoft announced an add-on to Dynamics 365 called "BYOD". This would be an Azure SQL Database instance, running beside the Dynamics instance. Entities within Dynamics could be modified to maintain tables in the BYOD with their values, updating at regular time intervals.
The result of this is that an on-prem system can be fitted with SQL Server jobs, written using SSIS, that pull data from the BYOD database. The data is pulled down, loaded into local staging tables, and then transformed into the formats expected by the programs running off the on-prem system. Once you've got all that designed, implemented, tested, and working, you're ready to move over from your old system to the new cloud one.
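To make the plumbing concrete, here is a rough sketch of the kind of incremental pull such a job might run. Every name in it (etl.Watermark, stg.SalesLine, SalesLineEntity, SyncStartDateTime) is a hypothetical stand-in: BYOD adds its own change-tracking columns to the exported tables, and the real names depend on which entities you export.

    -- read the high-water mark left by the previous run (hypothetical bookkeeping table)
    declare @last_pull datetime2 =
        (select LastPull from etl.Watermark where SourceTable = 'SalesLineEntity')

    -- pull only the rows the Dynamics export has written since then
    -- (the source would be an SSIS connection to the BYOD database)
    insert into stg.SalesLine (ItemId, QtyPicked, SyncStart)
    select ItemId, QtyPicked, SyncStartDateTime
    from SalesLineEntity
    where SyncStartDateTime > @last_pull

    -- move the watermark forward; a production job would use the maximum
    -- timestamp actually pulled rather than the clock, to avoid gaps
    update etl.Watermark
    set LastPull = sysutcdatetime()
    where SourceTable = 'SalesLineEntity'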
The key word here is designed - that includes determining all the pieces of data that the downstream programs (reporting, etc.) need and then finding them in the new cloud data system. Once you've done that, the job is relatively simple, although tedious and needing a lot of patience.

[UPDATE] March 2020: It looks like I'm going to be doing another similar project soon ... more info to come. Just keep away from CoViD-19!!
