Archive for July, 2012

Reverse Engineering

Tuesday, July 10th, 2012

I support several units outside of the Enterprise Data Warehouse to retrieve, process, and export information. Most of this work is done using straight PL/SQL (Oracle's procedural SQL language). However, there are quite a few processes on a platform that we no longer want to support. These processes have a minimum (and I do mean minimum) amount of architectural documentation. This leaves me in the unenviable position of having to reverse engineer these processes and draw up a plan to convert them to a new platform.

Reverse Engineering – The process of discovering the technological principles of a device, object, or system through analysis of its structure, function, and operation.

Just to be clear: The only reason this is necessary is that the original architectural documentation was a) lost, or b) never completed. If the person who originally created the process, or someone who understands the process, had been available to write the architectural documentation, reverse engineering would not be necessary. This, however, was not the case I found myself in.

So…How do you reverse engineer something? Here are the steps I’m using:

  • Get access to the current system.
  • Use the current system to create a sandbox area.
    • If a sandbox already exists, ensure that the sandbox has the most up-to-date source code from the production environment.
  • Familiarize yourself with the existing platform: You need just enough knowledge to navigate the environment and read the processes. You don’t need to program it.
  • Start at a very high level and work your way down to the details. Each level you delve into gives insight and clarity to the next. During this iterative process, create the following documents:
    • Diagram of the major items. (These items will get smaller and more detailed the further down the iterative path you get.)
    • Description of the major items.
    • Listing of each item's code elements and locations.
    • Listing of major business rules.
    • Any other information that clarifies the system.
  • Analyze documentation and look for areas for efficiency gains and process streamlining. Modify documentation where appropriate.

Yeah…This is boring and tedious and VERY TIME CONSUMING. Example: My current documentation for the DMExpress jobs I’ve targeted for conversion is just north of 140 pages! Think of it…how much is it costing the company when this type of thing happens?

What it is not is a waste of time. Once completed, I can sit down with a group of developers and break this documentation down into Agile stories ready for development. And this is exactly what I plan on doing.

I will return to my regular subjects shortly.

Database Development Life Cycle (DDLC)

Tuesday, July 10th, 2012

The Gator dropped by today and suggested I have a little discussion on the Database Development Life Cycle. "Don't you mean the Software Development Life Cycle?" Nope…databases are a lot different from software, although software does run and interact with the database. The database is a living, breathing thing that must be fed and nurtured to get the best performance out of it and the software that uses it. An executable or web service can be rebuilt from scratch every time, but once a database is being used in a production capacity you are left with modification in situ.

At Humana, I describe the Database Development Lifecycle thus:

  • Data Refresh
    • Metadata Exchange
    • Security Sync
  • Code Sync
    • Verification
  • Code Turn
    • Verification
  • Database Use
    • QA – UAT
    • Test – System/Integration Testing
    • Dev – Development

As I break down these items, please keep in mind that not all elements will exist in every database environment.

Database Refresh – Data in the lower environments (anything that is not production) grows stale over time. Through use and testing, the data drifts further and further from the real world of production. To combat this stagnation, the lower environments should occasionally be refreshed from the production source. More than likely a straight data replacement will not work: most modern systems contain a certain amount of environmentally sensitive data that must be retained in the target database. (To dig deeper into managing this metadata, read about SQLDataCompare later in this archive.) Additionally, you will probably need to correct/sync security so that the target database carries the correct security for the environment you are refreshing rather than for the environment the data came from (production).
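For illustration, here is a minimal T-SQL sketch of that retain-and-overlay idea. Every database, table, and column name here is hypothetical, and in practice a tool like SQLDataCompare does the heavy lifting, but the mechanics are roughly:

-- 1. Before the refresh, stash the environment-specific rows somewhere the
--    restore will not touch (a holding database on the same server).
SELECT ConfigKey, ConfigValue
INTO   HoldingDB.dbo.AppConfig_TEST
FROM   TargetDB.dbo.AppConfig
WHERE  ConfigKey IN ('SmtpServer', 'PaymentGatewayUrl');

-- 2. Refresh: restore the production backup over TargetDB.

-- 3. After the refresh, overlay the preserved values on top of the production copy.
UPDATE t
SET    t.ConfigValue = h.ConfigValue
FROM   TargetDB.dbo.AppConfig       AS t
JOIN   HoldingDB.dbo.AppConfig_TEST AS h
       ON h.ConfigKey = t.ConfigKey;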

Code Sync – When you refresh the data in the target environment, more than likely you will copy over the existing database and then start modifying it to get it where it is supposed to be. Modern DBMSs usually have data and code objects intermixed in the same storage system (it is possible to create boundaries between the two, but that is way beyond this conversation). So once you've done the 'restore' of production data onto the target database, the code objects need to be re-synced to get the target environment back to where it was before the data refresh.

Example: Production is on release 11.08.12, QA is on 11.09.12, and Test/Dev are on 11.10.12. If you refresh Test, its code base drops back to production's (11.08.12) even though it was on 11.10.12, so Test's database code would need to be resynced (e.g., rerun the turn scripts for 11.09.12 and 11.10.12).

Once you think you're synced, you need to perform verification so that you know exactly where your starting point is.
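What "verification" looks like depends on your conventions. As a minimal sketch, assuming your turn scripts log themselves to a SchemaVersion table (a team convention I'm assuming here, not a built-in feature), the check might be as simple as:

-- Which release does the target database now claim to be on?
SELECT TOP (1) ReleaseNumber, AppliedDate
FROM   dbo.SchemaVersion
ORDER  BY AppliedDate DESC;

-- Spot-check that the expected code objects came back with the resync.
SELECT name, type_desc, modify_date
FROM   sys.objects
WHERE  type IN ('P', 'V', 'FN', 'TF')   -- procedures, views, functions
ORDER  BY modify_date DESC;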

Code Promotion – Refreshes usually come at the start of a development cycle, which means you will probably run (test, prove, etc.) the next release's code base (scripts). If you are dealing with QA, the Turn Center will actually do the turn via a TCW. Code turns should ALWAYS be verified, regardless of who does them.

Database Use – This is why the database exists: utilization by an end-use process. QA and Test have the same end-use processes; only the audience is different. In Dev, the end-use process is the actual development and database script creation.

That is it in a nutshell. So…what do you consider the DDLC?

A view that calls a view that calls a view that calls a view that….

Monday, July 9th, 2012

OK, I have a serious pet peeve: views calling views. Who the heck started that? Why would we bury simple code so deep that you can't immediately see an issue? How can you properly optimize the thing?!

I’ve had developers give me quite a few reasons why they stack views (views calling views). The reasons generally fall into these categories:

  • It reduces the code we have to write.
  • Easier to maintain.
  • Provides a layer of abstraction.

Views are not an arbitrary feature that databases have; they serve a purpose. Views exist to:

  • Simplify security.
  • Simplify coding.
  • Simplify presentation.

The key word here is SIMPLIFY….

sim·pli·fy (sĭm′plə-fī′) tr.v. sim·pli·fied, sim·pli·fy·ing, sim·pli·fies

To make simple or simpler, as:

a. To reduce in complexity or extent.

b. To reduce to fundamental parts.

c. To make easier to understand.

So, in what way is an architecture where views are calling views simplifying?

Reasons why stacking views is misguided:

SQL is not OO. Having stacks of views does not make the code more efficient or faster; any measured gain would be coincidental.

  • Views are compiled based on position. Changing a referenced view can affect the views that reference it; to prevent this you have to refresh the views in referencing order.
  • Dependency verification is not straightforward. The standard system views that detail dependencies may not show you every dependency.
  • Code obfuscation: a view abstracts complexity (i.e., simplification), so when you use one abstraction to create another abstraction you start to lose sight of what is really going on (i.e., you add complexity back).
  • Duplicate references: once the complexity is hidden, the same table or column gets referenced multiple times without it being readily apparent.
  • It is difficult to properly optimize a view embedded in another view; ultimately you end up trying to optimize multiple views without understanding the effect on other referencing objects (see the contrived example below).
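To make the point concrete, here is a contrived T-SQL sketch of the pattern (every object name is invented):

-- Layer 1: a simple filter.
CREATE VIEW dbo.vw_ActiveMembers AS
    SELECT MemberID, PlanID, Status
    FROM   dbo.Member
    WHERE  Status = 'A';
GO
-- Layer 2: a view on a view.
CREATE VIEW dbo.vw_ActiveMemberPlans AS
    SELECT m.MemberID, p.PlanName
    FROM   dbo.vw_ActiveMembers AS m
    JOIN   dbo.HealthPlan       AS p ON p.PlanID = m.PlanID;
GO
-- Layer 3: a view on a view on a view.
CREATE VIEW dbo.vw_ActiveMemberBilling AS
    SELECT mp.MemberID, mp.PlanName, b.Balance
    FROM   dbo.vw_ActiveMemberPlans AS mp
    JOIN   dbo.Billing              AS b ON b.MemberID = mp.MemberID;
GO

-- Change dbo.vw_ActiveMembers and the outer layers may need refreshing, in order:
EXEC sp_refreshview 'dbo.vw_ActiveMemberPlans';
EXEC sp_refreshview 'dbo.vw_ActiveMemberBilling';

-- The flattened equivalent says the same thing in one statement the optimizer
-- (and the next developer) can see in full:
SELECT m.MemberID, p.PlanName, b.Balance
FROM   dbo.Member     AS m
JOIN   dbo.HealthPlan AS p ON p.PlanID   = m.PlanID
JOIN   dbo.Billing    AS b ON b.MemberID = m.MemberID
WHERE  m.Status = 'A';

Three layers in, dbo.Member is being touched indirectly at every level, and nobody tuning the outermost view can see that without unravelling the whole stack.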

This is by no means an exhaustive list and each situation is a little different, but in general this covers most of the issues. Views, as with any feature, should not be abused. Just because something is possible doesn’t make it a good idea.

Battlespace Management

Monday, July 9th, 2012

We’ve all been there: A project that has gone south and needs to be put back on track quickly. Resources get thrown into the breach to bolster customer and management confidence. When this situation occurs the manager dealing with the issue can get myopic pretty quickly. How can he do anything but focus like a laser on the one thing that everyone is looking at?

But focus comes at a cost. Other issues can quickly rise out of the turmoil and outflank a manager's ability to see and deal with them effectively. Some of the available resources do not get fully utilized. There is dead space in the feedback loop. This is where Battlespace Management comes into play:

Battlespace Management (BM) – The adaptive means and measures that enable the dynamic synchronization of activities and resources to provide the desired outcome. BM has always been important but the nature of modern operations requires ever lower levels of command to plan and execute increasingly complex BM.

Yes, it is a military definition but can easily translate to business. In fact many consulting firms use military comparisons regularly. Whole books are written on the subject.

Battlespace Management has four main principles:

  • Universal Application – BM is applied at all levels, the only difference being the means and measures. All applied activities require coordination, synchronization and prioritization.
    • Strategic Management – Focuses on resource acquisition and management.
    • Tactical Management – Focuses on resource assignment and task measurement.
  • Interaction – Resources interact; therefore activities will (eventually) impact each other.
  • Coordination and Control – As resources interact, control must be asserted to maximize benefit and minimize interruptions.
  • Collaboration – Seldom can activities remain compartmentalized. Collaboration occurs both up and down the command chain, as well as across environmental boundaries.

So how could this understanding benefit an IT unit?

  • Standard Operating Procedures: Clear-cut procedures, kept current through process improvement, that allow any member of a team to easily initiate required work.
  • Task Integration vs. Responsibility Silos: Each member of a team should be familiar (though maybe not an expert) with all aspects of the unit's area of responsibility.
  • Liaisons: Have members of the team assigned to interact with supporting units. These liaisons reduce the 'friction' that naturally occurs when coordinating tasks and resources, without the need for supervisory involvement.
  • Training and Preparation: Each member should be competent in all platforms the unit operates in and a training regime should exist to support this.

If a unit followed these ideas it would be a lot harder to be surprised (flanked) by new developments within their area of responsibility. Has anybody seen a related case-study in one of their MBA classes?

“Technical skill is mastery of complexity, while creativity is mastery of simplicity” – Erik Christopher Zeeman

Tuesday, July 3rd, 2012

In the past I've been called technical…usually in the phrase, "Look, I don't want you to get too technical in this meeting." Heck, if I didn't know any better I would think that was an insult.

But I don't think I'm that technical. There are plenty of people a lot more technical than I am. I think of myself as procedural…as in, "A standard procedure could remove the complexity of what we are trying to do and make it simpler…and more efficient!"

To wit, the situation I've seen with turns in the Enterprise Data Warehouse: they are complex…very complex, with many rules and exceptions. At least that is how it appears, and in general, appearance (perception) is reality. This is further complicated by having each developer do their own turn requests, which in turn is complicated by turnover, especially among the contracted/offshore resources.

So what is the cost of all this complexity?

  • Miscommunications
  • Delays
  • Defects
  • Rework/Resubmissions

It’s like throwing sand in a transfer case! So what is the answer? My proposed solution would be twofold:

  1. Create a living document that consolidates all the turn request processes/knowledge/lessons learned into a single, cohesive standard operating procedure.
    1. Create a logic-branched procedure so that someone could literally ‘walk through’ the process…step by step.
    2. Cover all aspects of a turn: Sign-offs, security requests, document transfers, code file staging, job templates, etc.
    3. Create examples for the most common cases.
  2. Have a turn Subject Matter Expert (and a backup) who is responsible for all turns within a unit.
    1. They would serve as the liaison between the DBA group and the unit doing the work.
    2. They would advise/facilitate all turns.
    3. They would provide quality control and feedback for the procedure outlined above.
    4. They would automate whatever pieces of the process can be automated.

And the best part? This is not theory. This is how my old unit operates day in and day out, regular turn or emergency turn. It works, it works well, and it has allowed the unit to split off some of its resources to work on other projects (like Mobile Computing!). Gotta love simplicity!

Redgate versus Aliens from Planet SQL

Tuesday, July 3rd, 2012

Data refreshes are one of the hardest things to accomplish in an Agile environment: the downtime between sprints is minuscule, and most modern systems have quite a bit of metadata that must stay contained within a specific environment so that 'unwanted' interactions do not occur. "Like what?" I can hear you say. How about sending account cancellations to active members during a test? Yeah…I can hear the gasps of horror from here.

This is the Redgate Tools Team, a small software company out of Cambridge, England. How small? Here’s the entire group…..

Ahh, the English, their Easter Egg and their dry sense of humor. Anyway, Redgate builds the best SQL management tools I’ve ever seen: Fast, reliable, and easy to use. They have quite a tool selection, but my favorites are SQLCompare and SQLDataCompare.

SQLCompare is a schema-level comparison tool that can either report on the differences between two databases or alter the target database to match the source database…VERY POWERFUL. There are a few gotchas here and there (circular relationships, for example…it has a hard time with those), but when you are in a hurry it's a great time saver.

While SQLCompare is centered on structure, SQLDataCompare is all about what is in those structures, namely data. Given two databases, it can compare every row of every table with the corresponding row in the corresponding table. Quick…real quick. The configuration can be intimidating, but once you have a handle on it you can get a lot of work done quickly and reliably.
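Conceptually (and this is just a hand-rolled illustration of the idea, not how Redgate implements it), a row-level compare of a single table boils down to a pair of EXCEPT queries:

-- Rows that exist (or differ) in the source but not in the target...
SELECT * FROM SourceDB.dbo.Member
EXCEPT
SELECT * FROM TargetDB.dbo.Member;

-- ...and the reverse direction.
SELECT * FROM TargetDB.dbo.Member
EXCEPT
SELECT * FROM SourceDB.dbo.Member;

Now imagine doing that for every table while handling keys, identity columns, and row filters along the way, and you can see why a purpose-built tool earns its keep.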

And the best part of the Redgate products is that once a project is created and configured it can be run from the command line, which in turn is easily integrated into a schedule, SSIS, or FinalBuilder automation.

So here’s the scenario:

  • Four environments must have refreshed data from PROD: DEV, TEST, QA, and PROOF.
  • Each environment has 5 major databases with a footprint of about 500GB.
  • > 100 tables hold environmentally sensitive data spread out over 5 databases.
  • There can be NO ENVIRONMENTALLY SENSITIVE DATA LEAKAGE! (e.g., Data from PROD can’t exist in TEST if it could cause unwanted results or damage.)
  • Any one environment can be down no longer than 24 hours.
  • All the refreshes have to be done in a calendar week.

So…Here’s the game plan:

We created a new environment (what's that, number 7?) to be used as a staging area where the data refresh preparations are conducted. This environment holds an extra database that gathers and holds all environmentally sensitive data from all the environments using SQLDataCompare.
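There is nothing exotic about the holding database; conceptually it is a per-environment copy of each sensitive table, along these lines (all names are hypothetical, and in our case SQLDataCompare and FinalBuilder do the actual copying):

-- One holding database, one schema per environment.
CREATE DATABASE RefreshHolding;
GO
USE RefreshHolding;
GO
CREATE SCHEMA TESTENV;
GO
-- Saturday's gather step copies each environmentally sensitive table into the
-- schema for its environment (shown here as plain T-SQL for illustration).
SELECT *
INTO   RefreshHolding.TESTENV.MemberContactPrefs
FROM   TEST_MemberDB.dbo.MemberContactPrefs;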

  1. Each Saturday a FinalBuilder project is scheduled to gather all the environmentally sensitive data from each environment and store it in the holding area.
  2. Each Sunday a SQL job restores a production copy of all databases down onto this refresh preparation environment.
  3. A FinalBuilder project is scheduled for 8 AM each day:
    1. It recovers non-essential space from the databases in the form of audit, deprecated, and staging tables.
    2. Environmentally sensitive data is pulled from the holding area and overlaid on the PROD data depending on the day:
      1. Monday = DEV
      2. Tuesday = TEST
      3. Wednesday = QA
      4. Thursday = PROOF
    3. The data is scrubbed of personal/sensitive data.
On Monday, the first time the space recovery and cleansing routines run, the process takes about 6 to 7 hours. Each subsequent day takes about 5 minutes.

  4. The Production DBAs back up the refresh environment and restore it to the target environment.
  5. The Production DBAs resync the security on the target environment (so that it works the way it did before the refresh).
  6. Conduct any necessary code promotions (either through automation or a request, depending on your process).

There…All done. A data refresh used to take all week, with no ability to do anything else. Now we can do all environments in a single week and still have time for additional tasks. Gotta love Redgate.

Why can’t people just make things simpler?

Monday, July 2nd, 2012

Previously I discussed how to automate a SQL build. One of the things I mentioned was the inability to get a clean list of source code file changes from FinalBuilder (FB) without all the shelveset gibberish. I mean, who really wants to parse a huge blob of text trying to find the nuggets of information you need if you don't have to? Looks like I need to break out Visual Studio!

FinalBuilder comes with a set of APIs for customizing ANYTHING so that it can be used with FinalBuilder. This is a two-step process (which I will oversimplify):

  1. Write a .NET compliant DLL that:
    1. References the necessary VSoft API.
    2. Inherits from one of the core abstract classes.
  2. Create an Action Package in FinalBuilder Action Studio so that:
    1. FinalBuilder can call the custom DLL.
    2. The custom DLL can get parameters passed to it.

The first thing you need to know is that VSoft's documentation on their API is less than stellar…way less…OK, it's non-existent. It took me a couple of weeks, and several emails to the VSoft blokes Down Under, to figure out how to get an action compiled, set up in Action Studio, and deployed in an FB project. The API allows three basic actions, which come with matching abstract classes:

  • ExecuteProgramAction – Used to launch external processes from an FB project.
  • StandardAction – Used to interact with a custom .NET DLL.
  • IteratorAction – Used to build a list to iterate through (similar to a For-Each loop) from .NET.

As far as my DLL is concerned, I needed an iterator that interfaced directly with TFS. This is how it's done:

  1. Create a DLL project in Visual Studio.
  2. Add necessary references to the VSoft API.
  3. Create a new class by inheriting from the abstract IteratorAction class.
  4. Add necessary TFS API references.
  5. Inherit the abstract class and implement the required methods. In this case the constructor, InitializeIterator(), Validate(), and GetCurrentValue.
    1. I used Validate() to insert code necessary to validate the properties being sent to my DLL.
    2. InitializeIterator() housed the main logic used to gather the necessary source code files (TFS Items). It looked something like this (condensed for brevity)…
// Define TFS variables.
private TeamFoundationServer _tfs;
private VersionControlServer _vcs;

// Create a list to contain the source code file references.
List<Item> _itemSet = new List<Item>();

// Create objects for TFS and the client services.
_tfs = new TeamFoundationServer("tfs.humana.com");
_vcs = (VersionControlServer)_tfs.GetService(typeof(VersionControlServer));

// Pull required properties from FB.
_projectPath = ExpandProperty("ProjectPath", true);
bool bValid = DateTime.TryParse(ExpandProperty("SetFromDate", true), out _lastBuildDT);

// Given a path within TFS, find all source code files within that path.
ItemSet localSet = _vcs.GetItems(_projectPath, RecursionType.Full);

// Loop through all the source code files...
foreach (Item i in localSet.Items)
{
    // ...keep only files (not folders) altered after the given date...
    if ((i.ItemType != ItemType.Folder) && (i.CheckinDate.CompareTo(_lastBuildDT) > 0))
    {
        // ...and add them to the list container.
        _itemSet.Add(i);
    }
}
    3. Once initialized, FB will use the GetCurrentValue method to pull an item based on its position. The latest version of FB uses a 1-based (vs. 0-based) starting position for the iterator, so code the method appropriately:

    // Return the string (TFS path) of the TFS Server Item.
    return _itemSet[iteration - 1].ServerItem.ToString();

    4. The only other thing we need is to get the parameters from FB when the action is called, and that is accomplished through ExpandProperty on the base class BaseAction. This is probably the biggest gotcha and is not explained very well: in a nutshell, the properties you define in Action Studio are retrieved via ExpandProperty to get a value from the FB interface into your DLL. ExpandProperty can be called anytime in the life of your object, even in the constructor. There are also SetProperty, SendMessage, GetFileSet, GetString, and more, all available from the base class BaseAction.
  6. Once I compiled the DLL, I had to move the following files to \\[machine/drive]\Program Files\FinalBuilder 7\ActionDefs\Assemblies\2.0:
    1. FBIntegration.DLL (my new DLL).
    2. Supporting DLLs (the DLLs referenced by my DLL…in this case the TFS DLLs).

OK…So the DLL is built; now we have to use it to create an action that an FB project can use.

  1. Open FB Action Studio and create a new action package.
  2. While in the new package, create a new action.
  3. Add all the properties that will be passed to the DLL via ExpandProperty. In this case I required:
    1. ProjectPath – TFS path to the set of files I want to check.
    2. SinceDate – A date that represents a cutoff point for changes to be included in the list.
  4. Create a property page (interface form). This will be the representation of your action while interacting with the FB project. I added three controls to the page:
    1. Last Date Time – Corresponds to the SinceDate property being sent to the DLL.
    2. TFS Path – Corresponds to the ProjectPath property being sent to the DLL.
    3. Iterator Variable – Variable that will be stuffed with the GetCurrentValue method’s return value.
  5. On the Action Details section, add two entries (found on the bottom of the page). These tell the action which DLL to use and which class is associated with the action.
    1. Namespace.Classname – In my case this was something like Company.Unit.FBIntegration.ChangedItemIterator.
    2. Assembly – The name of the DLL, in this case FBIntegration.dll.
  6. Save the package, using an appropriate name for both the package and action.

Done! Now if I open FB I can see a new category with a new action item in it (TFS Change Iterator). The best part is that this can be done with anything that can be coded in .NET, so the possibilities are pretty much endless. In fact, this might be a good candidate for a VSoft packaged component. If you do happen to go down this road, I would be happy to guide you and help out.

Process Documentation

Monday, July 2nd, 2012

Once I took over from the previous App DBA, it became apparent rather rapidly that builds were an exercise in patience…and they took a lot of time. Developers requested that scripts be run in multiple environments at what seemed to be random intervals. The master turn list was informal and updated manually. Some of the scripts ran out of order, sometimes with devastating effect. Some of the scripts were environment-specific without a way to manage their operation. In a word…CHAOS!

So I told my boss that I was going to automate the SQL builds…”You can do that?!??” My response: “With the correct tools and a documented process you can automate anything.”

Based on what we discovered doing the Process Inventory, we knew we could get a big payoff from automating the SQL build. And as I covered in my series on documentation, you can't automate what you don't understand. So it looks like we need to break down and document a process that will do the following:

  • Keep track of changes to the SQL source code.
  • Ensure that every script is written so that it can run in every environment (see the sketch after this list).
  • Ensure that every script actually runs in every environment.
  • Verify the scripts after they are run.
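On the "run in every environment" point above: in practice that means every script must be re-runnable and environment-agnostic. A typical guard pattern, with a hypothetical table and column, looks like this:

-- Re-runnable: the column is only added if it is not already there, so the same
-- script can be applied to DEV, TEST, QA, and PROD in any order, any number of times.
IF NOT EXISTS (SELECT 1
               FROM   sys.columns
               WHERE  object_id = OBJECT_ID('dbo.Claim')
               AND    name      = 'AdjudicationDate')
BEGIN
    ALTER TABLE dbo.Claim ADD AdjudicationDate DATETIME NULL;
END
GO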

So with that in mind, here's the starting scenario: two development teams in an Agile framework using Microsoft TFS. After several meetings and some chalk talk with the key developers and managers, we came up with the following initial outline.

So this procedure was documented and put into place (with the exception of the FinalBuilder automation). One of the first things we noticed was that we could actually plan a build without worrying about losing some dependent SQL code. The second thing we noticed was that the process was cumbersome. So we got rid of some of the quality gates (how many gates do you really need?) and started building some customized tooling, in this case with FinalBuilder.

Segue: If you are doing builds of any kind, YOU NEED FINALBUILDER! Why?

  • It’s quick to setup and use.
  • It integrates with most open-source platforms and source-code control systems.
  • It's completely customizable, including the ability to use in-house DLLs. (I've done this and it works great!)
  • A license is cheap.

OK….So I get back together with the development teams and we revise the process, which turns into this:

Based on this, I write the process documentation, paying particular attention to the detail needed to provide the necessary information for:

  • Code Promotion Requests
  • Production DBA Support
  • Sprint Team Feedback
  • Build List Verification

Once the team felt comfortable with what I had, we started the automation process in earnest. Until the automation was completed, the existing process stayed in place. Once testing of the FinalBuilder SQL build was complete, we cut over and never looked back.

There is quite a bit left unsaid in this article, and I apologize for that. I unfortunately have limited time to dedicate to writing, and this subject is quite detailed and extensive. If you would like a fuller explanation of any item, please contact me.