Dataleaf Technologies, Inc: HR System migration, Archiving, Analysis, Modeling, Interim HRIS
Whenever a Dataleaf® data mart is updated, data quality information is compiled and becomes an integral part of the data mart.
A user can instantly retrieve the rows that show any data quality problem, or only those that show specific problems. "New-problem" and "fixed-problem" records and totals can be viewed in the same way.
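One way to picture "new-problem" and "fixed-problem" records is as set differences between two successive loads. The following is only a minimal sketch under that assumption, not Dataleaf's actual mechanism; the snapshot format, the `compare_snapshots` function, and the test names are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: a set of (row_id, test_name) pairs,
# one pair per data quality exception found during a load.
Snapshot = Set[Tuple[int, str]]


def compare_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, Snapshot]:
    """Classify exceptions as new, fixed, or still open between two loads."""
    return {
        "new": current - previous,    # present now, absent before
        "fixed": previous - current,  # present before, absent now
        "open": current & previous,   # unresolved across both loads
    }


# Made-up example data for two successive loads.
previous_load = {(101, "missing_hire_date"), (102, "salary_out_of_range")}
current_load = {(102, "salary_out_of_range"), (103, "missing_hire_date")}

result = compare_snapshots(previous_load, current_load)
totals = {kind: len(rows) for kind, rows in result.items()}
print(totals)
```

Keeping the full sets, rather than only the counts, is what would let a user drill down from a total to the individual rows behind it.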
Trouble Tests are defined at the client level to control monitoring of metadata discrepancies, sanity-check exceptions, and administrative errors.
Typically, HR organizations define 10 to 50 Trouble Tests in addition to any XML validation, as in the following example ...
The specification of a Trouble Test may include instructions for load-time "triage" (that is, actual load-time alterations accompanied by a log entry) to be carried out on data that flunks the test. Both the original and the "fixed" data values are always retained. Trouble Test definitions without such triage properties are more common, however.
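The triage behavior described above can be sketched as follows. This is an illustration under stated assumptions rather than Dataleaf's definition format: the `TroubleTest` and `LoadResult` classes, the `load_row` function, and the two sample tests are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class TroubleTest:
    """Hypothetical Trouble Test: a check on one column, with optional triage."""
    name: str
    column: str
    passes: Callable[[Any], bool]                  # True when the value is acceptable
    triage: Optional[Callable[[Any], Any]] = None  # optional load-time alteration


@dataclass
class LoadResult:
    row: Dict[str, Any]                                      # values after any triage
    originals: Dict[str, Any] = field(default_factory=dict)  # values before triage
    log: List[str] = field(default_factory=list)             # one entry per failed test


def load_row(row: Dict[str, Any], tests: List[TroubleTest]) -> LoadResult:
    """Apply each test; alter failing values only when the test defines triage."""
    result = LoadResult(row=dict(row))
    for test in tests:
        value = result.row.get(test.column)
        if test.passes(value):
            continue
        if test.triage is None:
            # The more common case: record the exception, leave the data alone.
            result.log.append(f"{test.name}: flagged {test.column}={value!r}")
        else:
            fixed = test.triage(value)
            result.originals.setdefault(test.column, value)  # retain the original
            result.row[test.column] = fixed
            result.log.append(f"{test.name}: {test.column} {value!r} -> {fixed!r}")
    return result


tests = [
    TroubleTest("state_code_case", "state",
                passes=lambda v: isinstance(v, str) and v.isupper(),
                triage=lambda v: str(v).upper()),
    TroubleTest("salary_positive", "salary",
                passes=lambda v: isinstance(v, (int, float)) and v > 0),
]

loaded = load_row({"employee_id": 7, "state": "ny", "salary": -1}, tests)
print(loaded.row)
print(loaded.originals)
print(loaded.log)
```

In this sketch the first test alters the value and logs the change, while the second only flags the exception, so both the original and the "fixed" value remain available afterwards.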
The graphic display shows a historical record of data quality problems in a single large database. This particular display shows all DQ problems defined in a group of 30 client-specific Trouble Tests. Dark columns are fixed problems; light columns are new problems; the line marked 'curr' (axis to the right) is the total frequency of all data quality exceptions. The numbers on the X axis are the months of the year. About two and a half years of history are shown.
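The relationship between the three series can be sketched as a running total, assuming that every open exception either persists, is fixed, or is new in a given month, and that the history starts with no open exceptions. The monthly counts below are made up for illustration; they are not values read from the display.

```python
from itertools import accumulate

# Illustrative monthly counts only (hypothetical, not taken from the chart).
new_per_month = [40, 5, 3, 60, 4]     # light columns
fixed_per_month = [0, 30, 6, 2, 55]   # dark columns

# Net change in open exceptions for each month.
net_change = [new - fixed for new, fixed in zip(new_per_month, fixed_per_month)]

# Running total of open exceptions, analogous to the 'curr' line.
curr = list(accumulate(net_change))
print(curr)
```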
Overall, it appears that data quality exceptions rise sharply every few months with an influx of new records, and that most of the new problems are corrected in the following month.
What is interesting is that, once such an influx of bad data has been handled, only a low level of correction needs to occur.
Correction of data quality exceptions (the dark columns in the Dataleaf® illustration above) does seem to occur in bursts.