Data Refreshed & Metadata

Data refreshes are a vital part of preparing an environment for testing. During any rigorous testing phase, two things occur to the data in the test environment: 1) the data gets modified (or even mangled) when testing the extremes of the possible data ranges; and, 2) the data gets stale (further from the current production state). This also occurs in the development environment, but to a lower extent. Normally all you have to do is grab a production backup, push it down to the target environment and overlay the immediate build. But what if you have intermixed environmentally sensitive data with the regular data?

“What the heck is ‘environmentally sensitive data’?” I can here you say. OK…here’s the scenario: You have tables in the application’s database that contain URL’s to the environment’s web service host. You restore the production back onto the testing server, but now you have production URL pointers in the test environment. If testing proceeded without changing the URL’s to the test web service host there could be major issues including, but not limited to:

  • Invalid testing results.
  • Misreported defects.
  • Updating production system during testing.
  • Inappropriate outbound communications. (Yes this does happen!)

What other types of data could cause similar issues: Email addresses, redirecting URL’s, server names, drive URI’s, account numbers, etc. In fact, up to the time I left Spending Accounts, new types of metadata were being integrated into the databases every release. Each new piece of metadata increased the possibility of an issue turning into a full blown production support incident or a PR problem.

Needless to say, we would want to minimize these types of occurrences, so ensuring the correct metadata is in place is vital to both release testing and system integrity. So with this in mind I set out a strategy that would be generic enough to work regardless of the actual platform or number of environments.


The construct’s rules are as follows:

  1. Data is only pushed down to a lower environment (Production -> UAT -> Testing -> Development).
  2. Code is only pushed up to a higher environment (Development -> Testing -> UAT -> Production).
  3. Metadata is maintained only in the environment that it is targeted for.

Sound easy enough? In general, it is pretty easy to grab a backup from a production database and restore it to a lower environment (taking into consideration any size limitations). And it is also pretty easy to overlay a backup with whatever builds (source code) that has not yet been applied.

But the metadata….That is a little tougher. It must be updated ‘in-place’ as it were: You must save it off prior to restore of the production data, and then overwrite whatever was brought over from production. And to make it a little tougher, the records are not always a one to one match. You might also have some issues with surrogate key consistencies that must be taken into account. So when I first started to look at the possible solutions I could:

  • Write DML scripts to completely update the data: A little tedious, even if you write code to create the DML. I would have to generate the scripts before every refresh for every environment. Also there is the problem of interspersed metadata in primarily non-metadata tables…bother!
  • Keep a separate copy of all metadata containing tables, by environment, segregated from the other environments. Then overlay the restored database with the metadata. Sounds much simpler….But is it?

After a little research, I turned to an old friend…the SQLToolbelt from Redgate Software. Specifically, I used the SQL Data Compare tool. This allowed me to:

  • Create projects that allowed the polling and storage of data from a target environment to a common database.
  • Create projects that allowed the syncing of data from the common database to the target environment.
  • Prepare for command-line execution for open-system automation.
  • Create reports to a file (for both later auditing/accountability and concatenate into overall system health report).

I had SQLToolbelt certified here at Humana and we purchased a couple of licenses. We used this tool to automate the data cleansing and refresh process. Later on I’ll give a detailed outline of the actual data refresh process for Spending Accounts.


Leave a Reply