Archive for the ‘Automation’ Category

Database Development Life Cycle (DDLC)

Tuesday, July 10th, 2012

The Gator dropped by today and suggested I have a little discussion on the Database Development Life Cycle. “Don’t you mean the Software Development Life Cycle?” Nope… databases are a lot different from software, although software does run against and interact with the database. The database is a living, breathing thing that must be fed and nurtured to get the best performance out of it and the software that uses it. An executable or web service can be rebuilt from scratch every time, but once a database is being used in a production capacity you are left with modification in situ.

At Humana, I describe the Database Development Lifecycle thus:

  • Data Refresh
    • Metadata Exchange
    • Security Sync
  • Code Sync
    • Verification
  • Code Turn
    • Verification
  • Database Use
    • QA – UAT
    • Test – System/Integration Testing
    • Dev – Development

As I break down these items, please keep in mind that not all elements are going to exist in every database environment.

Data Refresh – Data in lower environments (not production) grows stale over time. Through use and testing, the data drifts further from the real world of production. To combat this stagnation, the lower environments should occasionally be refreshed from the production source. More than likely a straight data replacement will not work: in most modern systems a certain amount of environmentally sensitive data exists and must be retained in the target database. To dig deeper into managing that metadata, read about SQLDataCompare. Additionally, you will probably need to correct/sync security so that the target database has the correct security for the environment you are refreshing rather than the environment that the data came from (production).

Code Sync – When you refresh the data in the target environment, more than likely you will copy over the existing database and then start modifying it to get it where it is supposed to be. Modern DBMSs usually intermix data and code objects in the same storage system (it is possible to create boundaries between the two, but that is way beyond this conversation). So once you’ve done the ‘restore’ of production data on the target database, the code objects will need to be re-synced to get the target environment back to where it was before the data refresh.

Example: Production is on release 11.08.12, QA is on 11.09.12, and Test/Dev is on 11.10.12. So if you refresh Test, its code base would be that of production (11.08.12), but it was on 11.10.12, so Test would need its database code resynced (e.g., rerun the turn scripts for 11.09.12 and 11.10.12).
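The bookkeeping behind the resync can be sketched in a few lines of Python (the release numbers are from the example; the helper function is purely illustrative):

```python
# Ordered list of turn-script releases, oldest first (from the example).
releases = ["11.08.12", "11.09.12", "11.10.12"]

def scripts_to_rerun(restored_release, prior_release):
    """After a refresh, the target carries the restored (production) code
    base. Rerun every turn script newer than the restored release, up to
    and including the release the target was on before the refresh."""
    start = releases.index(restored_release) + 1
    end = releases.index(prior_release) + 1
    return releases[start:end]

# Refreshing Test: production code (11.08.12) restored, Test was on 11.10.12.
print(scripts_to_rerun("11.08.12", "11.10.12"))  # → ['11.09.12', '11.10.12']
```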

Once you think you’re sync’d then you need to perform verification to ensure that you know where your starting point is.

Code Promotion – Refreshes usually come at the start of a development cycle, which means that you will probably run (test, prove, etc.) the next release’s code base (scripts). If you are dealing with QA, the Turn Center will actually do the turn via a TCW. Code turns should ALWAYS be verified, regardless of who does them.

Database Use – This is why the database exists… utilization by an end-use process. QA and Test have the same end-use processes; just the audience is different. In Dev, the end-use process is the actual development and database script creation.

That is it in a nutshell. So… what do you consider the DDLC?

Redgate versus Aliens from Planet SQL

Tuesday, July 3rd, 2012

Data refreshes are one of the hardest things to accomplish in an Agile environment: The downtime between sprints is minuscule, and most modern systems have quite a bit of metadata that must be contained within a specific environment so that ‘unwanted’ interactions do not occur. “Like what?” I can hear you say. How about sending account cancellations to active members during a test. Yeah… I can hear the gasps of horror from here.

This is the Redgate Tools Team, a small software company out of Cambridge, England. How small? Here’s the entire group…..

Ahh, the English, their Easter Egg and their dry sense of humor. Anyway, Redgate builds the best SQL management tools I’ve ever seen: Fast, reliable, and easy to use. They have quite a tool selection, but my favorites are SQLCompare and SQLDataCompare.

SQLCompare is a schema-level comparison tool that can either report on the differences between two databases or alter the target database to match the source database… VERY POWERFUL. There are a few gotchas here and there (it has a hard time with circular relationships, for example), but when you are in a hurry it’s a great time saver.

While SQLCompare is centered on structure, SQLDataCompare is all about what is in those structures, namely data. Given two databases, it can compare every row on every table with a corresponding row on a corresponding table. Quick…real quick. The configuration can be intimidating, but once you have a handle on it you can get a lot of work done quickly and reliably.

And the best part of Redgate products is that once a project is created and configured it can be run from the command line, which in turn is easily integrated into a schedule, SSIS, or FinalBuilder automation.
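As a sketch of that kind of integration, a scheduled job can drive a saved SQL Data Compare project from a script. The paths, project names, and command-line switches below are illustrative assumptions — check the switches documented for your Redgate version:

```python
import subprocess
from pathlib import Path

# Hypothetical locations; adjust for your installation and project files.
SQLDATACOMPARE = Path(r"C:\Program Files\Red Gate\SQL Data Compare\sqldatacompare.exe")
PROJECT = Path(r"D:\Refresh\Projects\GatherMetadata_TEST.sdc")
LOGFILE = Path(r"D:\Refresh\Logs\GatherMetadata_TEST.txt")

def build_command(exe, project, log):
    """Assemble an unattended project run. /Project runs a saved project,
    /Sync applies the differences, and /Out captures a report for later
    auditing (switch names assumed from the Redgate documentation)."""
    return [str(exe), f"/Project:{project}", "/Sync", f"/Out:{log}"]

cmd = build_command(SQLDATACOMPARE, PROJECT, LOGFILE)
print(" ".join(cmd))

# Only attempt the run where the tool is actually installed.
if SQLDATACOMPARE.exists():
    subprocess.run(cmd, check=True)
```

The same command list drops straight into a scheduled task, an SSIS Execute Process task, or a FinalBuilder action.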

So here’s the scenario:

  • Four environments must have refreshed data from PROD: DEV, TEST, QA, and PROOF.
  • Each environment has 5 major databases with a footprint of about 500GB.
  • > 100 tables hold environmentally sensitive data spread out over 5 databases.
  • There can be NO ENVIRONMENTALLY SENSITIVE DATA LEAKAGE! (e.g., Data from PROD can’t exist in TEST if it could cause unwanted results or damage.)
  • Any one environment can be down no longer than 24 hours.
  • All the refreshes have to be done in a calendar week.

So…Here’s the game plan:

We created a new environment (What’s that? Number 7?) that will be used as a staging area where the data refresh preparations will be conducted. This environment will hold an extra database that will gather and hold all environmentally sensitive data from all the environments using SQLDataCompare.

  1. Each Saturday a FinalBuilder project is scheduled to gather all the environmentally sensitive data from each environment and store it in the hold area.
  2. Each Sunday a SQL job restores a production copy of all databases down onto this refresh preparation environment.
  3. A FinalBuilder project is scheduled for 8AM each day:
    1. It recovers non-essential space from the databases in the form of audit, deprecated, and staging tables.
    2. Environmentally sensitive data is pulled from the holding area and overlaid on the PROD data depending on the day:
      1. Monday = DEV
      2. Tuesday = TEST
      3. Wednesday = QA
      4. Thursday = PROOF
    3. The data is scrubbed of personal/sensitive data.

On Monday, which is the first time the space recovery and cleansing routines run, it takes about 6 to 7 hours. Each subsequent day takes about 5 minutes.

  4. The Production DBAs backup the refresh environment and restore it to the target environment.
  5. The Production DBAs resync the security on the target environment. (So that it works the way it did before the refresh.)
  6. Conduct any necessary code promotions (either through automation or a request, depending on your process).

There…All done. A data refresh used to take all week without the ability to do anything else. Now…we could do all environments in a single week and still have time for additional tasks. Gotta love Redgate.

Why can’t people just make things simpler?

Monday, July 2nd, 2012

Previously I discussed how to automate a SQL Build. One of the things I mentioned was the inability to get a clean list of source code file changes from FinalBuilder (FB) without all the shelveset gibberish. I mean, who really wants to parse a huge set of text trying to find the nuggets of information you need if you don’t have to? Looks like I need to break out Visual Studio!

FinalBuilder comes with a set of APIs to customize ANYTHING so that it can be used with FinalBuilder. This is a two-step process (which I will oversimplify):

  1. Write a .NET compliant DLL that:
    1. References the necessary VSoft API.
    2. Inherits from one of the core abstract classes.
  2. Create an Action Package in FinalBuilder Action Studio so that:
    1. FinalBuilder can call the custom DLL.
    2. The custom DLL can get parameters passed to it.

The first thing you need to know is that VSoft’s documentation on their API is less than stellar…way less…OK, it’s non-existent. It took me a couple of weeks, and several emails to the VSoft blokes Down-Under, to figure out how to get an action compiled, set it up in action studio, and deployed in a FB project. The API allows three basic actions, which come with matching abstract classes:

  • ExecuteProgramAction – Used to launch external processes from a FB project.
  • StandardAction – Used to interact with a custom .NET DLL.
  • IteratorAction – Used to build a list to iterate through (similar to a For-Each loop) from .NET.

As far as my DLL is concerned, I needed an iterator that interfaced directly with TFS. This is how it’s done:

  1. Create a DLL project in Visual Studio.
  2. Add necessary references to the VSoft API.
  3. Create a new class by inheriting from the abstract IteratorAction class.
  4. Add necessary TFS API references.
  5. Inherit the abstract class and implement the required methods; in this case the constructor, InitializeIterator(), Validate(), and GetCurrentValue().
    1. I used Validate() to insert code necessary to validate the properties being sent to my DLL.
    2. InitializeIterator() housed the main logic used to gather the necessary source code files (TFS Items). It looked something like this (condensed for brevity)…
// Define TFS variables.
private TeamFoundationServer _tfs;
private VersionControlServer _vcs;
// Create a list to contain the source code file references.
private List<Item> _itemSet = new List<Item>();

// Create objects for TFS and the client services.
_tfs = new TeamFoundationServer("");
_vcs = (VersionControlServer)_tfs.GetService(typeof(VersionControlServer));

// Pull required properties from FB.
_projectPath = ExpandProperty("ProjectPath", true);
bool bValid = DateTime.TryParse(ExpandProperty("SetFromDate", true), out _lastBuildDT);

// Given a path within TFS, find all source code files within that path.
ItemSet localSet = _vcs.GetItems(_projectPath, RecursionType.Full);

// Loop through the source code files, keeping those altered after the given date,
foreach (Item i in localSet.Items)
{
    if ((i.ItemType != ItemType.Folder) && (i.CheckinDate.CompareTo(_lastBuildDT) > 0))
    {
        // and add them to the list container.
        _itemSet.Add(i);
    }
}
  6. Once initialized, FB will use the GetCurrentValue method to pull items by position. The latest version of FB uses a 1-based (vs. 0-based) starting position for the iterator, so code the method appropriately:

    // Return the string (TFS path) of the TFS Server Item.
    return _itemSet[iteration - 1].ServerItem.ToString();


  7. The only other thing we need is to get the parameters from FB when called, and that is accomplished through the protected ExpandProperty method from the base class BaseAction. This is probably the biggest gotcha and is not explained very well. In a nutshell, the properties you define in Action Studio are retrieved via ExpandProperty to get a value from the FB interface into your DLL. ExpandProperty can be called anytime in the life of your object, even in the constructor. There are also ways to SetProperty, SendMessage, GetFileSet, GetString… all of which are available from the base class BaseAction.
  8. Once I compiled the DLL, I had to move the following files to \\[machine/drive]\Program Files\FinalBuilder 7\ActionDefs\Assemblies\2.0:
    1. FBIntegration.DLL (My new DLL.)
    2. Supporting DLL’s (DLL’s that are being referenced by my DLL…In this case the TFS DLL’s.)

OK…So the DLL is built, now we have to use the DLL to create an action that a FB project can use.

  1. Open FB Action Studio and create a new action package.
  2. While in the new package, create a new action.
  3. Add all the properties that will be passed to the DLL via ExpandProperty; in this case I required:
    1. ProjectPath – TFS path to the set of files I want to check.
    2. SinceDate – A date that represents a cutoff point for changes to be included in the list.
  4. Create a property page (interface form). This will be the representation of your action while interacting with the FB project. I added three controls to the page:
    1. Last Date Time – Corresponds to the SinceDate property being sent to the DLL.
    2. TFS Path – Corresponds to the ProjectPath property being sent to the DLL.
    3. Iterator Variable – Variable that will be stuffed with the GetCurrentValue method’s return value.
  5. On the Action Details section, add two entries (found on the bottom of the page). These tell the action which DLL to use and which class is associated with the action.
    1. Namespace.Classname – In my case this was something like Company.Unit.FBIntegration.ChangedItemIterator.
    2. Assembly – The name of the DLL, FBIntegration.dll.
  6. Save the package, using an appropriate name for both the package and action.

Done! Now if I open FB I can see a new category with a new action item in it (TFS Change Iterator). The best part is that this can be done with anything that can be coded in .NET, so the possibilities are pretty much endless. In fact, this might be a good candidate for a VSoft packaged component. If you do happen to go down this road, I would be happy to guide you and help out.

Process Documentation

Monday, July 2nd, 2012

Once I took over from the previous App DBA, it became apparent rather rapidly that builds were an exercise in patience… and they took a lot of time. Developers requested scripts be run in multiple environments at what seemed to be random intervals. The master turn list was informal and updated manually. Some of the scripts ran out of order, sometimes with devastating effect. Some of the scripts were environment specific without a way to manage their operation. In a word… CHAOS!

So I told my boss that I was going to automate the SQL builds…”You can do that?!??” My response: “With the correct tools and a documented process you can automate anything.”

Based on what we discovered doing the Process Inventory, we knew we could get a big payoff from automating the SQL Build. And from what we covered in my series on documentation, you can’t automate what you don’t understand. So it looks like we need to break down and document a process that will do the following:

  • Keep track of changes to the SQL source code.
  • Ensure that a script is written so that it can run in every environment.
  • Ensure that a script runs in every environment.
  • Verify the scripts after they are run.

So with that in mind, here’s the starting scenario: Two development teams in an Agile framework using Microsoft TFS. After several meetings and some chalk talk with the key developers and managers, we came up with the following initial outline.

So this procedure was documented and put into place (with the exception of the FinalBuilder automation). One of the first things we noticed was that we could actually plan a build without worrying about losing some dependent SQL code. The second thing we noticed was that the process was cumbersome. So we got rid of some of the quality gates (How many gates do you really need?) and started coding some customized tools. In this case FinalBuilder.

Segue: If you are doing builds of any kind, YOU NEED FINALBUILDER! Why?

  • It’s quick to setup and use.
  • It integrates with most open-source platforms and source-code control systems.
  • It’s completely customizable, including the ability to use in-house DLL’s. (I’ve done this and it works great!)
  • A license is cheap.

OK….So I get back together with the development teams and we revise the process, which turns into this:

Based on this I write the process documentation, paying particular attention to the detail needed to provide the necessary information for:

  • Code Promotion Requests
  • Production DBA Support
  • Sprint Team Feedback
  • Build List Verification

Once the team felt comfortable with what I had, we started the automation process in earnest. Until the automation was completed, the existing process would stay in place. Once the FinalBuilder SQL Build was tested, we cut over and never looked back.

There is quite a bit of the unsaid in this article: I apologize for that. I, unfortunately, have limited time that I can dedicate to writing and this subject is quite detailed and extensive. If you would like to have a fuller explanation of any items, please contact me.

First step of Automation: Process Inventory

Friday, June 29th, 2012

One of the first steps to initiating a maturity model program is to take an inventory of what processes exist. There are generally more processes than initially thought in any unit and it takes a bit of time to sit down and hash out what they are.

  1. proc·ess/ˈpräˌses/

    Verb: Perform a series of mechanical or chemical operations on (something) in order to change or preserve it: “the stages in processing the wool”.

    Noun: A series of actions or steps taken to achieve an end. 

With this definition in mind, how many processes (the noun form above) does your unit have? Probably more than you think. Anything can be automated as long as you understand the process steps and you have the necessary resources to accomplish the automation. Let’s take a look at a simple example: Spending Accounts (SPAA) Database Management. Keep in mind this is just an example and not an all-encompassing, cohesive list. Here are some of the processes that existed for managing the database in my old position:

Area               Task
Disk Management    Database Growth Estimation
Disk Management    Performance Monitoring
Build Management   Data Refresh
Build Management   Database Builds
Build Management   Script Builds
Turn Process       Build Validation
Turn Process       TCW Preparation
Code Management    Code Reviews
Code Management    Source Control Audits
Code Management    Code Branching

This is a short list of the processes that are required, but I’d like to keep it simple for this exercise. Next, let’s categorize the processes by major and minor effort (i.e., how much effort each takes to accomplish – this could be considered the resource savings, or the REWARD), the dependency type of the process (internal, external, mixed – i.e., will I need outside resources to finish the task?), and the automation effort (i.e., how much effort it would take to automate – this could be considered the cost, or the RISK). This is a much tougher task with a lot of variability. It is best done with a small group of knowledgeable associates; any group larger than 4 is going to lose focus and the discussion will break down pretty fast.

So let’s say this is the result (keep in mind this is hypothetical):

Area               Task                         Current Effort (Hours/Month)   Dependency   Automation Effort (Effort Months)
Disk Management    Database Growth Estimation    1                             Internal      2
Disk Management    Performance Monitoring       10                             External      6
Build Management   Data Refresh                 60                             Mixed         3
Build Management   Database Builds              40                             Internal      2
Build Management   Script Builds                30                             Internal      2
Turn Process       Build Validation              4                             Internal      2
Turn Process       TCW Preparation               4                             Mixed         1
Turn Process       TCW Turn                      4                             External      2
Code Management    Code Reviews                 20                             Internal     10
Code Management    Source Control Audits         4                             Internal      1
Code Management    Code Branching                4                             Internal      1

So with these numbers we can start the evaluation/prioritization step. Remember, we are looking to get the biggest bang for the buck. There are many measures you could use, but the one I like is the breakeven point: at what point does the effort pay off? I understand that this is highly subjective and imprecise, but you have to start somewhere. Here are the results:

Area               Task                         Breakeven Point (Months)   Priority
Disk Management    Database Growth Estimation   320                         7
Disk Management    Performance Monitoring        96                         6
Build Management   Data Refresh                   8                         2
Build Management   Database Builds                8                         1
Build Management   Script Builds                 11                         3
Turn Process       Build Validation              80                         5
Turn Process       TCW Preparation               40                         4
Turn Process       TCW Turn                      80                         5
Code Management    Code Reviews                  80                         5
Code Management    Source Control Audits         40                         4
Code Management    Code Branching                40                         4

So I can see that Data Refresh and Database Builds are taking most of my time. They also happen to be the fastest payoff for an automation investment, so I’ll target them first. I also notice that while these two tasks require similar automation effort, one has mixed (external) process dependencies, so I might want to lower that task’s priority until I can arrange the necessary outside resources.
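The breakeven figures can be reproduced from the effort table, assuming one effort-month is about 160 working hours (the conversion factor is my assumption; it happens to match the published numbers). A quick Python sketch:

```python
# Breakeven point (months) = automation cost / monthly savings, with the
# cost converted from effort-months to hours. One effort-month is assumed
# to be 160 working hours.
HOURS_PER_EFFORT_MONTH = 160

# (task, current effort in hours/month, automation effort in effort-months)
tasks = [
    ("Database Growth Estimation",  1,  2),
    ("Performance Monitoring",     10,  6),
    ("Data Refresh",               60,  3),
    ("Database Builds",            40,  2),
    ("Script Builds",              30,  2),
    ("Build Validation",            4,  2),
    ("TCW Preparation",             4,  1),
    ("TCW Turn",                    4,  2),
    ("Code Reviews",               20, 10),
    ("Source Control Audits",       4,  1),
    ("Code Branching",              4,  1),
]

breakeven = {
    name: round(effort_months * HOURS_PER_EFFORT_MONTH / hours_saved)
    for name, hours_saved, effort_months in tasks
}

# Smallest breakeven first: the fastest payoffs float to the top.
for name, months in sorted(breakeven.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {months:4d} months")
```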

Process improvement is a never ending endeavor: On a regular basis (e.g., once a quarter) this exercise should be conducted again to see what has changed and how it impacts existing maturity efforts. Business processes change, new technology becomes available, associates come up with new ideas for automation…These will impact what is possible and how they can be employed to improve the existing processes.

Next we’ll discuss breaking down a process targeted for automation.


Maturity Models

Thursday, June 28th, 2012

I really appreciate, in general, maturity models. They provide a framework from which organizations can contemplate and improve, and in doing so provide a healthier environment to work in while becoming more efficient and profitable. Probably the most well known is the Capability Maturity Model, which evolved directly out of the software development process. There are some concerns with models like CMM in certain circles. But if you disregard the ideological hype and the consultant-generating/copyrighted/patented materials, you can create a very cohesive structure that can effectively meet the necessary business objectives for a lower cost in a shorter timeframe.

Maturity models outline a framework that provides:

  • A starting point for process improvement.
  • A shared lexicon from which information can be exchanged.
  • A way to define and document processes.
  • A structure for prioritizing actions.
  • A way to measure improvement.

I’m not married to a particular maturity model and the belief that goes with it. Rather I prefer to follow a common sense approach that can fit the organization’s idiosyncrasies. Additionally, following an unofficial model allows an individual to institute changes from the bottom-up rather than the top-down.

When it comes right down to it, process improvement is having a process that can be documented, executed, measured, and then evaluated for improvement. The best way to accomplish this is to automate all the repeatable steps in the process.

When I look toward process improvement, I take the following path:

  1. I take an inventory of all known processes and assess the complexity of possible automation (the Risk) and the resources I could save (the Reward).
  2. I prioritize the inventory list based on the Risk/Reward factor and available/desired tools and resources.
  3. Using the prioritized list as a backlog of projects:
    1. Document and verify the process steps.
    2. Investigate and evaluate possible forms of automation, identifying which parts can and can’t be automated.
    3. Generate the appropriate design documentation (blueprint) required to automate the process.
    4. Develop the automation and supporting process measurements.
    5. Test and validate the automation.
    6. Integrate the automation into the existing process, updating the process documentation.
  4. Once the process(es) are automated, reports can be generated to evaluate the process, find trouble areas and provide feedback back into the backlog.

We’ll look at more details as we progress…The next entry will discuss the Process Inventory.

Data Refreshes & Metadata

Tuesday, June 12th, 2012

Data refreshes are a vital part of preparing an environment for testing. During any rigorous testing phase, two things occur to the data in the test environment: 1) the data gets modified (or even mangled) when testing the extremes of the possible data ranges; and, 2) the data gets stale (further from the current production state). This also occurs in the development environment, but to a lower extent. Normally all you have to do is grab a production backup, push it down to the target environment and overlay the immediate build. But what if you have intermixed environmentally sensitive data with the regular data?

“What the heck is ‘environmentally sensitive data’?” I can hear you say. OK… here’s the scenario: You have tables in the application’s database that contain URLs to the environment’s web service host. You restore the production backup onto the testing server, but now you have production URL pointers in the test environment. If testing proceeded without changing the URLs to the test web service host, there could be major issues including, but not limited to:

  • Invalid testing results.
  • Misreported defects.
  • Updating the production system during testing.
  • Inappropriate outbound communications. (Yes this does happen!)

What other types of data could cause similar issues? Email addresses, redirecting URLs, server names, drive URIs, account numbers, etc. In fact, up to the time I left Spending Accounts, new types of metadata were being integrated into the databases every release. Each new piece of metadata increased the possibility of an issue turning into a full-blown production support incident or a PR problem.

Needless to say, we would want to minimize these types of occurrences, so ensuring the correct metadata is in place is vital to both release testing and system integrity. So with this in mind I set out a strategy that would be generic enough to work regardless of the actual platform or number of environments.


The construct’s rules are as follows:

  1. Data is only pushed down to a lower environment (Production -> UAT -> Testing -> Development).
  2. Code is only pushed up to a higher environment (Development -> Testing -> UAT -> Production).
  3. Metadata is maintained only in the environment that it is targeted for.

Sound easy enough? In general, it is pretty easy to grab a backup of a production database and restore it to a lower environment (taking into consideration any size limitations). And it is also pretty easy to overlay a backup with whatever builds (source code) have not yet been applied.

But the metadata… that is a little tougher. It must be updated ‘in place’, as it were: You must save it off prior to the restore of the production data, and then overwrite whatever was brought over from production. And to make it a little tougher, the records are not always a one-to-one match. You might also have issues with surrogate key consistency that must be taken into account. So when I first started to look at possible solutions I could:

  • Write DML scripts to completely update the data: A little tedious, even if you write code to create the DML. I would have to generate the scripts before every refresh for every environment. Also there is the problem of metadata interspersed in primarily non-metadata tables… bother!
  • Keep a separate copy of all metadata containing tables, by environment, segregated from the other environments. Then overlay the restored database with the metadata. Sounds much simpler….But is it?
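To make the save-then-overlay sequence concrete, here is a toy Python model of it; the table, keys, and URLs are invented purely for illustration, and rows are matched on a natural key since surrogate keys may not survive the restore:

```python
# Toy model of one environmentally sensitive table, keyed by service name.
# All names and values here are illustrative.

# 1. Before the refresh: save off the environment's sensitive rows.
test_env = {
    "member_portal": "https://test.example.com/portal",
    "claims_api":    "https://test.example.com/claims",
}
saved_metadata = dict(test_env)

# 2. Restore production over the environment (simulated as replacement).
production = {
    "member_portal": "https://www.example.com/portal",
    "claims_api":    "https://www.example.com/claims",
    "new_service":   "https://www.example.com/new",   # added since last refresh
}
test_env = dict(production)

# 3. Overlay the saved metadata. Rows are matched by key rather than
#    replaced wholesale, because the two sides are not always one-to-one.
for key, url in saved_metadata.items():
    if key in test_env:
        test_env[key] = url
# Rows new to production ("new_service") keep their production values until
# someone supplies an environment-specific value for them.

print(test_env)
```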

After a little research, I turned to an old friend…the SQLToolbelt from Redgate Software. Specifically, I used the SQL Data Compare tool. This allowed me to:

  • Create projects that poll data from a target environment and store it in a common database.
  • Create projects that sync data from the common database back to the target environment.
  • Prepare the projects for command-line execution for open-system automation.
  • Write reports to a file (both for later auditing/accountability and for concatenation into an overall system health report).

I had SQLToolbelt certified here at Humana and we purchased a couple of licenses. We used this tool to automate the data cleansing and refresh process. Later on I’ll give a detailed outline of the actual data refresh process for Spending Accounts.

Primum Non Nocere

Tuesday, June 12th, 2012

One of the items I ran into when I took over as the App DBA for SPAA was the complete disregard of source code control for SQL objects. At times there could be two, or maybe three, different scripts for the same object in the same build. So when the build occurred, it was possible to overwrite change A with change B, which in turn usually resulted in a defect. It was frustrating and it had to change.

So to put an end to this issue we instituted formal source code control for all SQL objects. The rules were:

  • Each SQL object would have a single source code, or SQL script, file.
  • Each SQL object script would contain all DDL, including security grants, so that they would be whole and complete.
  • Each SQL object script would be written in such a way as to Do No Harm (or in Latin: Primum Non Nocere).

Following these rules allowed us to ensure that all changes to a single object were contained within a single file, and that file would have a history that could be tracked. They also ensured that if the file was executed, it contained everything necessary to allow proper operation of the SQL object. Lastly, they allowed the script to be run in any environment, at any time, while ensuring that the database would be in the proper state.

Do No Harm

The Do No Harm philosophy had two main paths: table scripts, and everything else. The reason table scripts have to be treated differently is that they are objects with permanent state. All the other objects (stored procedures, functions, views, etc.) are stateless. Regardless of which type of script was being managed, all scripts followed this simple rule:

All DDL manipulation occurred within the confines of a controlled database transaction.

This ensured that the script worked completely, or returned the object to its original state.

So for the stateless objects the rule was drop and recreate. Doing this provided several advantages:

  1. You could query the system tables for when the object was created. This works well if you, say, want to verify a turn the Turn Center just completed.
  2. If the object didn’t exist, the script would just create it and move on. Before, it was possible to get an error when you ALTERed an object that didn’t exist.
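The drop-and-recreate rule is easy to template. Below is a Python sketch that generates such a script for a stored procedure; the emitted T-SQL is a generic pattern (existence guard, recreate, re-grant, all inside one controlled transaction), not the actual template we used, and the object and role names are made up:

```python
def do_no_harm_script(proc_name, body):
    """Wrap a stored procedure definition in a Do No Harm style script:
    drop the object if it exists, recreate it, and re-grant security,
    all inside a controlled transaction so the script either works
    completely or leaves the object in its original state."""
    return f"""\
BEGIN TRANSACTION;
BEGIN TRY
    IF OBJECT_ID(N'dbo.{proc_name}', N'P') IS NOT NULL
        EXEC(N'DROP PROCEDURE dbo.{proc_name}');
    EXEC(N'CREATE PROCEDURE dbo.{proc_name} AS {body}');
    EXEC(N'GRANT EXECUTE ON dbo.{proc_name} TO app_role');
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;  -- return the database to its original state
    THROW;
END CATCH;
"""

script = do_no_harm_script("usp_GetMember", "SELECT 1;")
print(script)
```

A generator like this is also the seed of the code-base extraction described later: point it at the system tables and emit one guarded script per object.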

For objects with permanent state (tables), you have to maintain the state. Therefore you have to alter in place…And this is where it gets tricky. So for table scripts the following rules were followed:

  1. Tables are created once, so you had to check for existence before creating the table.
  2. DDL could be added to the script, but existing DDL was never altered.
  3. As new DDL is added to the script, the appropriate requirement checks have to be made prior to execution. Say, for example, you wanted to add a column: that DDL was segregated from the original CREATE statement, and you checked whether the column existed prior to ALTERing the table and adding it.

It did take a little time to train the developers to follow this philosophy, but the payoffs were well worth it:

  • It allowed HEAVY automation, including automated SQL builds to the lower environments.
  • It reduced downtime during turns.
  • It reduced turn preparation.
  • It allowed automation of turn verification.
  • It reduced defects.

Now all we need is a starting code base….

Creating the Code Base

To establish the original code base files, we needed a way to create ~12,000 individual SQL scripts. No small task by any stretch of the imagination. So we used the Do No Harm templates as a base to construct a program that read the system tables of the PROD databases. The program extracted all the necessary information to reverse-engineer the T-SQL to a file; then that file was pushed into the code base structure of TFS. Easy!

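The extraction program itself isn’t shown here, but for the code objects the heavy lifting can be done with a catalog query along these lines (a sketch, assuming SQL Server 2005 or later):

```sql
-- The definitions of code objects (procs, views, functions, triggers)
-- live in the system catalog; each row here could be written out to
-- its own .sql file in the code base.
SELECT S.name + '.' + O.name AS ObjectName
     , O.type_desc
     , M.definition              -- full T-SQL source of the object
FROM sys.sql_modules M
INNER JOIN sys.objects O
    ON M.object_id = O.object_id
INNER JOIN sys.schemas S
    ON O.schema_id = S.schema_id
ORDER BY O.type_desc, ObjectName
```

Tables have no stored definition, so those scripts get reverse-engineered from sys.tables, sys.columns, and friends instead.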
The question you have to ask yourself is this: Why isn’t everyone doing this?

Environment Management in an Agile World

Monday, June 11th, 2012

As Agile goes into a full court press, many seem to downplay the importance of environmental management, which is also sometimes known as configuration management. This is usually to the detriment of all concerned.

The focus of Agile is generally on the various aspects of development: Defining what to develop, how to develop, when to develop. But ask yourself this: How many times has anyone talked about environmental management? Automated testing? Production support? I heard nothing about these until my co-worker Keron and I started asking very sharp questions about migration times, data refreshes, and build verifications. Even the instructors for the SCRUM class we attended were befuddled by our questions. It was like it never occurred to anyone that configuration management was important to a successful Agile implementation.

So why is it important? Any derivative of an Agile methodology must include a framework of continuous integration into target environments. This puts the product (process, code, intellectual property, etc.) into a place where you get the most benefit from it in the least amount of time. From a graphical point of view, our ideal SCRUM process flow looks something like this….

Bob Johnson called this diagram the Left Brain/Right Brain Scrum explanation. The actual start and end points are not important; what is important is the understanding of the two very different functions that promote fast turn-around times for any development process. Note also that the ‘team’ goes beyond those in the SCRUM.

The Right Brain is responsible for creativity and pattern recognition: This is the domain of the Development Team. They work the backlog at the discretion of the Product Owner. They get feedback from the Production Support team and make changes as assigned.

The Left Brain is responsible for process and sequencing: This is the domain of the Environment Team. They manage the output from the SCRUM team(s) in a proven, repeatable manner that is easily automated. They gather data about the process to:

  • Implement continuous process improvement.
  • Provide feedback to the Product Owner and SCRUM team(s).

Case in point: one of the recurring tasks in the environment was a data refresh. The refresh was problematic on several fronts:

  1. Data had to drop down from the PROD environment to the lower environment.
  2. Code had to be promoted up an environment.
  3. The environment specific metadata had to stay in the same environment.

Sounds easy enough, but when you’re talking about 5 databases, >1000 tables, and a ~1.5TB footprint, nothing is easy. And, oh, by the way…you have to do it between Sprints and not slow down development. Without getting in the weeds, we set out a plan (which I’ll write about later in detail) that allowed continuous process improvement to evolve the process naturally and incrementally. The outcome?

A little planning, a little automation and, BAM!, a process that takes a couple of hours. Don’t know about you, but could you imagine the savings of time and resources if you did this with just 10% of your company’s IT processes? (Sniff, sniff) I smell shareholder value!

“I’ve got an order for table 4!”

Thursday, June 7th, 2012

One of the most useful things anyone dealing with a database can learn is the system tables. Period. And there are many reasons for this, most of which revolve around getting tedious work done fast and correctly.

Ever wanted to automate table DDL scripts so that they run in the correct order? If you know how the system tables work, you could just ask the database.

Case in point: I automated SQL builds for TEST, QA, Proof, and Prod. One of the issues was that I needed to know what order the table scripts needed to be run in so that I didn’t get a foreign key construction error. My solution was to query the DEV box that the developers used to create/modify the scripts. The steps to make this work:

  1. Get the scripts that were created/modified since the start of the release.
  2. Get the order of all tables from DEV database based on the Parent-Child relationships (foreign keys).
  3. Order the scripts based on where they fell in the list and execute in that order.

Easy! And I can use it to automatically execute the scripts (as I did in TEST), or I can use it to create the properly ordered DDL script for a TCW request (as I did for QA and PROD). The script is below:

Declare @Continue   int
      , @TableLevel int

Create table #TableOrder
( [Order]      [int] IDENTITY (1, 1) NOT NULL
, [TableName]  [varchar] (250) NOT NULL
, [TableLevel] [int] DEFAULT (0)
)

select @Continue = 1
     , @TableLevel = 0

-- Each pass picks up the tables whose parents (via foreign keys) are
-- already in #TableOrder; level 0 is the tables with no parents at all.
while @Continue > 0
begin
    Insert Into #TableOrder
    ( TableName
    , TableLevel
    )
    select S.name + '.' + SO.name
         , @TableLevel
    from sys.objects SO
    inner join sys.schemas S
        on SO.schema_id = S.schema_id
    where SO.type = 'U'
    and S.name + '.' + SO.name not in ( Select TableName
                                        From #TableOrder )
    and SO.object_id not in
        ( -- child tables that still reference a parent not yet in the list
          select distinct SOR.object_id
          from sys.foreign_key_columns SFK
          inner join sys.objects SOR           -- referencing (child) table
              on SFK.parent_object_id = SOR.object_id
          inner join sys.objects SOP           -- referenced (parent) table
              on SFK.referenced_object_id = SOP.object_id
          inner join sys.schemas SP
              on SOP.schema_id = SP.schema_id
          where SFK.parent_object_id <> SFK.referenced_object_id   -- ignore self-references
          and SP.name + '.' + SOP.name not in ( Select TableName
                                                From #TableOrder )
        )

    select @Continue   = @@rowcount
         , @TableLevel = @TableLevel + 1
end

-- Parents first: run the table scripts in ascending TableLevel order.
select [Order]
     , TableName
     , TableLevel
from #TableOrder
order by TableLevel
       , [Order]

drop table #TableOrder