Archive for the ‘Tools’ Category

Database Development Life Cycle (DDLC)

Tuesday, July 10th, 2012

The Gator dropped by today and suggested I have a little discussion on the Database Development Life Cycle. “Don’t you mean the Software Development Life Cycle?” Nope… databases are a lot different from software, although software does run against and interact with the database. The database is a living, breathing thing that must be fed and nurtured to get the best performance out of it and the software that uses it. An executable or web service can be rebuilt from scratch every time, but once a database is being used in a production capacity you are left with modification in situ.

At Humana, I describe the Database Development Lifecycle thus:

  • Data Refresh
    • Metadata Exchange
    • Security Sync
  • Code Sync
    • Verification
  • Code Turn
    • Verification
  • Database Use
    • QA – UAT
    • Test – System/Integration Testing
    • Dev – Development

As I break down these items, please keep in mind that not all elements are going to exist in every database environment.

Database Refresh – Data in lower environments (not production) grows stale over time. This stagnation allows the data, through use and testing, to get further from the real world of production. To combat this data stagnation, the lower environments should occasionally be refreshed from the production source. More than likely a straight data replacement will not work: in most modern systems a certain amount of environmentally sensitive data exists and must be retained in the target database. To dig deeper into managing metadata, read about SQLDataCompare. Additionally, you will probably need to correct/sync security so that the target database has the correct security for the environment you are refreshing rather than the environment the data came from (production).

Code Sync – When you refresh the data in the target environment, more than likely you will copy over the existing database, and then start modifying it to get it where it is supposed to be. Modern DBMS’s usually have a mix of data and code objects intermixed in the same storage system (It is possible to create boundaries between the two, but it is way beyond this conversation). So once you’ve done the ‘restore’ of production data on the target database, the code objects will need to be “sync’d” so that you can get the target environment back to where it was before the data refresh.

Example: Production is on release 11.08.12, QA is on 11.09.12, and Test/Dev is on 11.10.12. So if you refresh Test, its code base would be that of production, or 11.08, but it was on 11.10, so Test would need to resync the database code (e.g., rerun the turn scripts for 11.09 and 11.10).

Once you think you’re sync’d, you need to perform verification to ensure that you know where your starting point is.
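Verification can be as simple as checking a release marker once the turn scripts have been rerun. Here is a minimal sketch of that idea; the dbo.ReleaseVersion table, its columns, and the server name are hypothetical stand-ins for however your databases record their code level.

using System;
using System.Data.SqlClient;

class SyncVerification
{
    static void Main()
    {
        // Hypothetical: the target environment and the release it should now be on.
        const string connStr = "Server=TEST-SQL01;Database=AppDb;Integrated Security=SSPI";
        const string expectedRelease = "11.10.12";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT TOP 1 ReleaseNumber FROM dbo.ReleaseVersion ORDER BY AppliedDate DESC", conn))
        {
            conn.Open();
            string actualRelease = (string)cmd.ExecuteScalar();

            // If the marker doesn't match, the code sync didn't land where you thought it did.
            Console.WriteLine(actualRelease == expectedRelease
                ? "Starting point verified: " + actualRelease
                : "MISMATCH: expected " + expectedRelease + " but found " + actualRelease);
        }
    }
}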

Code Promotion – Refreshes are usually at the start of a development cycle, which means that you will probably run (test, prove, etc.) the next release’s code base (scripts). If you are dealing with QA, the Turn Center will actually do the turn via a TCW. Code turns should ALWAYS be verified, regardless of who does them.

Database Use – This is why the database exists… utilization by an end-use process. QA and Test have the same end-use processes; just the audience is different. In Dev, the end-use process is the actual development and database script creation.

That is it in a nutshell. So… what do you consider the DDLC?

Redgate versus Aliens from Planet SQL

Tuesday, July 3rd, 2012

Data refreshes are one of the hardest things to accomplish in an Agile environment: The downtime between sprints is minuscule, and most modern systems have quite a bit of metadata that must be contained within a specific environment so that ‘unwanted’ interactions do not occur. “Like what?” I can hear you say. How about sending account cancellations to active members during a test? Yeah… I can hear the gasps of horror from here.

This is the Redgate Tools Team, a small software company out of Cambridge, England. How small? Here’s the entire group…..

Ahh, the English, their Easter Egg and their dry sense of humor. Anyway, Redgate builds the best SQL management tools I’ve ever seen: Fast, reliable, and easy to use. They have quite a tool selection, but my favorites are SQLCompare and SQLDataCompare.

SQLCompare is a schema-level comparison tool that can either report on the differences between two databases or alter the target database to match the source database… VERY POWERFUL. There are a few gotchas here and there (it has a hard time with circular relationships, for example), but when you are in a hurry it’s a great time saver.

While SQLCompare is centered on structure, SQLDataCompare is all about what is in those structures, namely data. Given two databases, it can compare every row on every table with a corresponding row on a corresponding table. Quick…real quick. The configuration can be intimidating, but once you have a handle on it you can get a lot of work done quickly and reliably.

And the best part of Redgate products is that once a project is created and configured it can be run from the command line, which in turn is easily integrated into a schedule, SSIS, or FinalBuilder automation.
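For instance, a scheduled job, SSIS package, or FinalBuilder step can simply shell out to the compare engine. A rough sketch is below; the install path, server, and database names are placeholders, and the SQLCompare.exe switch names are from memory, so verify them against the Redgate documentation for your version.

using System;
using System.Diagnostics;

class SchemaSyncStep
{
    static void Main()
    {
        // Hypothetical install path and server/database names.
        var psi = new ProcessStartInfo
        {
            FileName = @"C:\Program Files (x86)\Red Gate\SQL Compare 10\SQLCompare.exe",
            // Compare TEST against the build source and make TEST match it.
            Arguments = "/server1:BUILD-SQL01 /database1:AppDb_Gold " +
                        "/server2:TEST-SQL01 /database2:AppDb /synchronize",
            UseShellExecute = false
        };

        using (var p = Process.Start(psi))
        {
            p.WaitForExit();
            // A non-zero exit code is the cue for the calling job to flag the step for review.
            Console.WriteLine("SQLCompare exit code: " + p.ExitCode);
        }
    }
}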

So here’s the scenario:

  • Four environments must have refreshed data from PROD: DEV, TEST, QA, and PROOF.
  • Each environment has 5 major databases with a footprint of about 500GB.
  • > 100 tables hold environmentally sensitive data spread out over 5 databases.
  • There can be NO ENVIRONMENTALLY SENSITIVE DATA LEAKAGE! (e.g., Data from PROD can’t exist in TEST if it could cause unwanted results or damage.)
  • Any one environment can be down no longer than 24 hours.
  • All the refreshes have to be done in a calendar week.

So…Here’s the game plan:

We created a new environment (What’s that? Number 7?) to be used as a staging area where the data refresh preparations are conducted. This environment holds an extra database that uses SQLDataCompare to gather and store all the environmentally sensitive data from all the environments.

  1. Each Saturday a FinalBuilder project is scheduled to gather all the environmentally sensitive data from each environment and store it in the holding area.
  2. Each Sunday a SQL job restores a production copy of all databases down onto this refresh preparation environment.
  3. A FinalBuilder project is scheduled for 8AM each day:
    1. It recovers non-essential space from the databases in the form of audit, deprecated, and staging tables.
    2. Environmentally sensitive data is pulled from the holding area and overlaid on the PROD data depending on the day (see the sketch after these steps):
      1. Monday = DEV
      2. Tuesday = TEST
      3. Wednesday = QA
      4. Thursday = PROOF
    3. The data is scrubbed of personal/sensitive data.
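As a small illustration of the “depending on the day” step in the daily FinalBuilder job, the selection logic boils down to something like the sketch below (the environment names come from the scenario above; the class and method names are just for illustration).

using System;

class RefreshTargetForToday
{
    // Maps the weekday to the environment whose saved metadata gets overlaid
    // on the freshly restored PROD data (per the Monday-Thursday schedule above).
    static string TargetEnvironment(DayOfWeek day)
    {
        switch (day)
        {
            case DayOfWeek.Monday:    return "DEV";
            case DayOfWeek.Tuesday:   return "TEST";
            case DayOfWeek.Wednesday: return "QA";
            case DayOfWeek.Thursday:  return "PROOF";
            default:                  return null;   // no refresh prep on other days
        }
    }

    static void Main()
    {
        string target = TargetEnvironment(DateTime.Today.DayOfWeek);
        Console.WriteLine(target ?? "Nothing scheduled today.");
    }
}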

On Monday, which is the first time the space recovery and cleansing routines run, it takes about 6 to 7 hours. Each subsequent day takes about 5 minutes.

  4. The Production DBAs back up the refresh environment and restore it to the target environment.
  5. The Production DBAs resync the security on the target environment. (So that it works the way it did before the refresh.)
  6. Conduct any necessary code promotions (either through automation or a request, depending on your process).

There…All done. A data refresh used to take all week without the ability to do anything else. Now…we could do all environments in a single week and still have time for additional tasks. Gotta love Redgate.

Why can’t people just make things simpler?

Monday, July 2nd, 2012

Previously I discussed how to automate a SQL Build. One of the things I mentioned was the inability to get a clean list of source code file changes from FinalBuilder (FB) without all the shelveset gibberish. I mean, who really wants to parse a huge set of text trying to find the nuggets of information you need if you don’t have to? Looks like I need to break out Visual Studio!

FinalBuilder comes with a set of APIs for customizing ANYTHING so that it can be used with FinalBuilder. This is a two-step process (which I will oversimplify):

  1. Write a .NET compliant DLL that:
    1. References the necessary VSoft API.
    2. Inherits from one of the core abstract classes.
  2. Create an Action Package in FinalBuilder Action Studio so that:
    1. FinalBuilder can call the custom DLL.
    2. The custom DLL can get parameters passed to it.

The first thing you need to know is that VSoft’s documentation on their API is less than stellar… way less… OK, it’s non-existent. It took me a couple of weeks, and several emails to the VSoft blokes Down Under, to figure out how to get an action compiled, set up in Action Studio, and deployed in an FB project. The API allows three basic actions, which come with matching abstract classes:

  • ExecuteProgramAction – Used to launch external processes from an FB project.
  • StandardAction – Used to interact with a custom .NET DLL.
  • IteratorAction – Used to build a list to iterate through (similar to a For-Each loop) from .NET.

As far as my DLL is concerned, I needed an iterator that interfaced directly with TFS. This is how it’s done:

  1. Create a DLL project in Visual Studio.
  2. Add necessary references to the VSoft API.
  3. Create a new class by inheriting from the abstract IteratorAction class.
  4. Add necessary TFS API references.
  5. Inherit the abstract class and implement the required methods. In this case: the constructor, InitializeIterator(), Validate(), and GetCurrentValue().
    1. I used Validate() to insert code necessary to validate the properties being sent to my DLL.
    2. InitializeIterator() housed the main logic used to gather the necessary source code files (TFS Items). It looked something like this (condensed for brevity)…
// Condensed for brevity; assumes references to the TFS 2010-era client assemblies
// (Microsoft.TeamFoundation.Client and Microsoft.TeamFoundation.VersionControl.Client).
using System;
using System.Collections.Generic;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;

// Define TFS variables (class-level fields).
private TeamFoundationServer _tfs;
private VersionControlServer _vcs;
private string _projectPath;
private DateTime _lastBuildDT;

// Create a list to contain the source code file references.
private List<Item> _itemSet = new List<Item>();

// --- Inside InitializeIterator() ---
// Create objects for TFS and the client services.
_tfs = new TeamFoundationServer("tfs.humana.com");
_vcs = (VersionControlServer)_tfs.GetService(typeof(VersionControlServer));

// Pull the required properties from FB.
_projectPath = ExpandProperty("ProjectPath", true);
bool bValid = DateTime.TryParse(ExpandProperty("SetFromDate", true), out _lastBuildDT);

// Given a path within TFS, find all source code files within that path.
ItemSet localSet = _vcs.GetItems(_projectPath, RecursionType.Full);

// Loop through all the source code files...
foreach (Item i in localSet.Items)
{
    // ...keeping only the files that were altered after the given date...
    if ((i.ItemType != ItemType.Folder) && (i.CheckinDate.CompareTo(_lastBuildDT) > 0))
    {
        // ...and add them to the list container.
        _itemSet.Add(i);
    }
}
  6. Once initialized, FB will use the GetCurrentValue method to pull an item based on its position. The latest version of FB uses a 1-based (vs. 0-based) starting position for the iterator, so code the method appropriately:

    // Return the string (TFS path) of the TFS Server Item.
    return _itemSet[iteration - 1].ServerItem.ToString();
  7. The only other thing we need is to get the parameters from FB when the action is called, and that is accomplished through the protected ExpandProperty method of the base class BaseAction. This is probably the biggest gotcha, and it is not explained very well. In a nutshell, the properties you define in Action Studio are read via ExpandProperty to get a value from the FB interface into your DLL (see the sketch after this list). ExpandProperty can be called at any time in the life of your object, even in the constructor. There are also SetProperty, SendMessage, GetFileSet, GetString, and so on, all of which are available from the base class BaseAction.
  8. Once I compiled the DLL, I had to move the following files to \\[machine/drive]\Program Files\FinalBuilder 7\ActionDefs\Assemblies\2.0:
    1. FBIntegration.DLL (my new DLL).
    2. Supporting DLLs (DLLs that are referenced by my DLL; in this case, the TFS DLLs).
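For reference, here is a sketch of the kind of property validation described above. I do not reproduce the exact Validate() signature from the VSoft abstract class (check their assembly for that); this is just a hypothetical helper the override could call, using the same ExpandProperty calls and property names as the iterator code.

// Hypothetical helper; the real Validate() override would call something like this.
protected bool PropertiesAreValid()
{
    // Pull the raw property values from the FB interface.
    string projectPath = ExpandProperty("ProjectPath", true);
    string sinceDate = ExpandProperty("SetFromDate", true);

    // The TFS path must be present and look like a server path ($/...).
    if (string.IsNullOrEmpty(projectPath) || !projectPath.StartsWith("$/"))
    {
        return false;
    }

    // The cutoff date must parse to a valid DateTime.
    DateTime lastBuild;
    return DateTime.TryParse(sinceDate, out lastBuild);
}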

OK… so the DLL is built; now we have to use it to create an action that an FB project can use.

  1. Open FB Action Studio and create a new action package.
  2. While in the new package, create a new action.
  3. Add all the properties that will be passed to the DLL via ExpandProperty; in this case I required:
    1. ProjectPath – TFS path to the set of files I want to check.
    2. SinceDate – A date that represents a cutoff point for changes to be included in the list.
  4. Create a property page (interface form). This will be the representation of your action while interacting with the FB project. I added three controls to the page:
    1. Last Date Time – Corresponds to the SinceDate property being sent to the DLL.
    2. TFS Path – Corresponds to the ProjectPath property being sent to the DLL.
    3. Iterator Variable – Variable that will be stuffed with the GetCurrentValue method’s return value.
  5. In the Action Details section, add two entries (found at the bottom of the page). These tell the action which DLL to use and which class is associated with the action.
    1. Namespace.Classname – In my case this was something like Company.Unit.FBIntegration.ChangedItemIterator.
    2. Assembly – The name of the DLL, in this case FBIntegration.dll.
  6. Save the package, using an appropriate name for both the package and action.

Done! Now if I open FB, I can see a new category with a new action item in it (TFS Change Iterator). The best part is that this can be done with anything that can be coded in .NET, so the possibilities are pretty much endless. In fact, this might be a good candidate for a VSoft packaged component. If you do happen to go down this road, I would be happy to guide you and help out.

Process Documentation

Monday, July 2nd, 2012

Once I took over from the previous App DBA, it became apparent rather rapidly that builds were an exercise in patience… and they took a lot of time. Developers requested that scripts be run in multiple environments at what seemed to be random intervals. The master turn list was informal and updated manually. Some of the scripts ran out of order, sometimes with devastating effect. Some of the scripts were environment specific without a way to manage their operation. In a word… CHAOS!

So I told my boss that I was going to automate the SQL builds…”You can do that?!??” My response: “With the correct tools and a documented process you can automate anything.”

Based on what we discovered doing the Process Inventory, we knew that we could get a big payoff from automating the SQL Build. And from what we covered in my series on documentation, you can’t automate what you don’t understand. So it looks like we need to break down and document a process that will do the following:

  • Keep track of changes to the SQL source code.
  • Ensure that a script is written so that it can run in every environment.
  • Ensure that a script runs in every environment.
  • Verify the scripts after they were run.

So with that in mind, here’s the starting scenario: Two development teams in an Agile framework using Microsoft TFS. After several meetings and some chalk talk with the key developers and managers, we came up with the following initial outline.

So this procedure was documented and put into place (with the exception of the FinalBuilder automation). One of the first things we noticed was that we could actually plan a build without worrying about losing some dependent SQL code. The second thing we noticed was that the process was cumbersome. So we got rid of some of the quality gates (How many gates do you really need?) and started coding some customized tools. In this case FinalBuilder.

Segue: If you are doing builds of any kind, YOU NEED FINALBUILDER! Why?

  • It’s quick to setup and use.
  • It integrates with most open-source platforms and source-code control systems.
  • It’s completely customizable, including the ability to use in-house DLLs. (I’ve done this and it works great!)
  • A license is cheap.

OK….So I get back together with the development teams and we revise the process, which turns into this:

Based on this I write the process documentation, paying particular attention to the detail needed to provide the necessary information for:

  • Code Promotion Requests
  • Production DBA Support
  • Sprint Team Feedback
  • Build List Verification

Once the team felt comfortable with what I had, we started the automation work in earnest. Until the automation was completed, the existing process would stay in place. Once testing of the FinalBuilder SQL Build was complete, we cut over and never looked back.

There is quite a bit left unsaid in this article: I apologize for that. I, unfortunately, have limited time that I can dedicate to writing, and this subject is quite detailed and extensive. If you would like a fuller explanation of any item, please contact me.

Data Refreshes & Metadata

Tuesday, June 12th, 2012

Data refreshes are a vital part of preparing an environment for testing. During any rigorous testing phase, two things happen to the data in the test environment: 1) the data gets modified (or even mangled) when testing the extremes of the possible data ranges; and 2) the data gets stale (further from the current production state). This also occurs in the development environment, but to a lesser extent. Normally all you have to do is grab a production backup, push it down to the target environment, and overlay the current build. But what if you have intermixed environmentally sensitive data with the regular data?

“What the heck is ‘environmentally sensitive data’?” I can hear you say. OK… here’s the scenario: You have tables in the application’s database that contain URLs to the environment’s web service host. You restore the production backup onto the testing server, but now you have production URL pointers in the test environment. If testing proceeded without changing the URLs to the test web service host, there could be major issues including, but not limited to:

  • Invalid testing results.
  • Misreported defects.
  • Updating the production system during testing.
  • Inappropriate outbound communications. (Yes this does happen!)

What other types of data could cause similar issues? Email addresses, redirect URLs, server names, drive URIs, account numbers, etc. In fact, up to the time I left Spending Accounts, new types of metadata were being integrated into the databases every release. Each new piece of metadata increased the possibility of an issue turning into a full-blown production support incident or a PR problem.

Needless to say, we would want to minimize these types of occurrences, so ensuring the correct metadata is in place is vital to both release testing and system integrity. So with this in mind I set out a strategy that would be generic enough to work regardless of the actual platform or number of environments.


The construct’s rules are as follows:

  1. Data is only pushed down to a lower environment (Production -> UAT -> Testing -> Development).
  2. Code is only pushed up to a higher environment (Development -> Testing -> UAT -> Production).
  3. Metadata is maintained only in the environment that it is targeted for.

Sound easy enough? In general, it is pretty easy to grab a backup of a production database and restore it to a lower environment (taking into consideration any size limitations). And it is also pretty easy to overlay a backup with whatever builds (source code) have not yet been applied.
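For example, here is a minimal sketch of the “push down” step, assuming a SQL Server backup sitting on a share; the server, database, logical file names, and paths are all placeholders that would need to match your environment.

using System;
using System.Data.SqlClient;

class RefreshRestore
{
    static void Main()
    {
        // Hypothetical server, database, and file names for illustration only.
        const string connStr = "Server=TEST-SQL01;Database=master;Integrated Security=SSPI";
        const string restoreSql = @"
            RESTORE DATABASE [AppDb]
            FROM DISK = N'\\backupshare\PROD\AppDb.bak'
            WITH REPLACE,
                 MOVE N'AppDb_Data' TO N'E:\Data\AppDb.mdf',
                 MOVE N'AppDb_Log'  TO N'F:\Log\AppDb.ldf';";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(restoreSql, conn))
        {
            cmd.CommandTimeout = 0;   // large restores can run for a long time
            conn.Open();
            cmd.ExecuteNonQuery();    // overlay the lower environment with the PROD backup
        }
    }
}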

But the metadata… that is a little tougher. It must be updated ‘in place’, as it were: You must save it off prior to the restore of the production data, and then overwrite whatever was brought over from production. And to make it a little tougher, the records are not always a one-to-one match. You might also have some issues with surrogate key consistency that must be taken into account. So when I first started to look at possible solutions, I saw that I could:

  • Write DML scripts to completely update the data: A little tedious, even if you write code to create the DML. I would have to generate the scripts before every refresh for every environment. Also there is the problem of interspersed metadata in primarily non-metadata tables…bother!
  • Keep a separate copy of all metadata containing tables, by environment, segregated from the other environments. Then overlay the restored database with the metadata. Sounds much simpler….But is it?

After a little research, I turned to an old friend…the SQLToolbelt from Redgate Software. Specifically, I used the SQL Data Compare tool. This allowed me to:

  • Create projects that allowed the polling and storage of data from a target environment to a common database.
  • Create projects that allowed the syncing of data from the common database to the target environment.
  • Prepare for command-line execution for open-system automation (sketched below).
  • Create reports to a file (both for later auditing/accountability and for concatenation into an overall system health report).
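As a rough illustration of how those pieces fit together, here is a sketch of the pre-restore “store” run and the post-restore “sync” run driven from the command line. The project file names are made up, and the SQLDataCompare.exe switches are from memory, so check them against the Redgate documentation for your version.

using System;
using System.Diagnostics;

class MetadataRoundTrip
{
    // Hypothetical install path; switch names should be verified against your
    // version of the SQL Data Compare command line.
    const string ExePath =
        @"C:\Program Files (x86)\Red Gate\SQL Data Compare 10\SQLDataCompare.exe";

    static int RunProject(string projectFile)
    {
        var psi = new ProcessStartInfo
        {
            FileName = ExePath,
            Arguments = "/project:\"" + projectFile + "\" /synchronize",
            UseShellExecute = false
        };

        using (var p = Process.Start(psi))
        {
            p.WaitForExit();
            return p.ExitCode;   // non-zero exit codes flag warnings/errors for the caller
        }
    }

    static void Main()
    {
        // 1) Before the restore: copy the environment's sensitive data into the common database.
        RunProject(@"D:\Refresh\Projects\Store_QA_Metadata.sdc");

        // 2) ...the production backup is restored over the environment here (outside this sketch)...

        // 3) After the restore: push the saved metadata back over the PROD data.
        RunProject(@"D:\Refresh\Projects\Sync_QA_Metadata.sdc");
    }
}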

I had SQLToolbelt certified here at Humana and we purchased a couple of licenses. We used this tool to automate the data cleansing and refresh process. Later on I’ll give a detailed outline of the actual data refresh process for Spending Accounts.