Page 245 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 12: Metadata and Warehouse Quality




             The Challenge
             Rapidly processing large quantities of information is a key to Trans Union’s success, but
             that information must also be in an accessible format. “Some of our input was a jumbled
             mess of data that we couldn’t get at,” said Trans Union’s database construction team
             leader.
             The database team knew it had a wealth of information buried within the comment
             fields of many records. More specifically, they were looking for information on consumer
             ownership of large consumer durables. Most important were boats, recreational vehicles,
             motor homes and motor vehicles. The challenge, therefore, was to scrutinize the comment
             field, investigate and parse out the valued data components, and create and populate new
             fields with data derived from the comment field content.
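
The parsing step described above can be illustrated with a minimal sketch. It assumes each record is a dictionary with a free-form `comment` field; the keyword table, the field name `durable_categories`, and the function names are hypothetical and are not taken from Trans Union's actual system.

```python
import re

# Hypothetical keyword table; real categories and spellings would come from the data.
CATEGORY_PATTERNS = {
    "BOAT": re.compile(r"\b(boat|yacht|sailboat)\b", re.IGNORECASE),
    "RV": re.compile(r"\b(rv|recreational vehicle|camper)\b", re.IGNORECASE),
    "MOTOR_HOME": re.compile(r"\bmotor\s*home\b", re.IGNORECASE),
    "MOTOR_VEHICLE": re.compile(r"\b(auto|car|truck|motorcycle)\b", re.IGNORECASE),
}


def derive_categories(comment):
    """Return the sorted list of categories whose pattern appears in the comment."""
    return sorted(
        name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(comment)
    )


def enrich(record):
    """Return a copy of the record with a new 'durable_categories' field."""
    enriched = dict(record)
    enriched["durable_categories"] = derive_categories(record.get("comment", ""))
    return enriched
```

Calling `enrich({"id": 1, "comment": "Loan for 1996 sailboat and used truck"})` would add a `durable_categories` field listing `BOAT` and `MOTOR_VEHICLE`, leaving the original comment untouched.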

             The targeted quantity of data was large, but not enormous by Trans Union standards: 27
             million individual records containing comment fields. Fast turnaround of the data on a
             Windows NT platform was the goal, because new data is always in demand. Trans Union
             also didn’t want to get into a protracted code-writing exercise that would tax its time and
             resources. It needed a versatile data-cleansing solution that could be implemented very
             quickly.
             The database team realized that they needed a solution that could scan free-form text,
             standardize and transform the extracted data, and create new fields that would be
             populated with intelligence gathered during the cleansing process. The solution would
             have to be robust enough to handle large volumes of data and simple enough for Trans
             Union to quickly learn.
             Trans Union wanted to develop a standardized set of enterprise business rules for data
             quality management that could be shared across existing and future platforms. Specifically,
             the company needed a versatile tool that could reach deep within the complex product
             data and provide repeatable and reusable business rules for data cleansing.
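
One way to picture repeatable, reusable business rules is to keep them as plain data that any job can apply in the same order. The sketch below assumes this approach; the rule list, the abbreviations it expands, and the `standardize` function are hypothetical examples, not rules from the case study.

```python
import re

# Hypothetical rule set: each rule is (description, pattern, replacement).
# Keeping rules as plain data lets the same list be reused by different jobs.
RULES = [
    ("collapse whitespace", r"\s+", " "),
    ("expand 'MTR' abbreviation", r"\bMTR\b", "MOTOR"),
    ("expand 'REC VEH' abbreviation", r"\bREC VEH\b", "RECREATIONAL VEHICLE"),
    ("split 'MOTORHOME' into two words", r"\bMOTORHOME\b", "MOTOR HOME"),
]


def standardize(text, rules=RULES):
    """Uppercase and trim the text, then apply each rule in order."""
    result = text.upper().strip()
    for _description, pattern, replacement in rules:
        result = re.sub(pattern, replacement, result)
    return result
```

Because the rules are applied in list order, the whitespace rule runs first so that later multi-word patterns such as `REC VEH` match consistently.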

             The Solution
             Trans Union chose the Trillium Software System® for its ability to clean and standardize
             large volumes of generalized data from multiple sources. The Trillium Software System’s
             user-defined transformation and data-filling capabilities, as well as its data element repair
             facilities, are unique among solutions that operate in multiplatform environments.
             It was the Trillium Software System’s specific ability to understand, elementize and create
             a distribution of words and phrases from within floating, free-form text that made it a
             perfect fit for Trans Union. The company understood that the Trillium Software System
             could meet the immediate need for data reengineering and fill an expanded role in future
             projects. Trans Union initially assigned a team of three to the project: a project manager, a
             programmer and a research analyst. The team’s first step was to compile a comprehensive
             listing of boats, RVs, motor homes and other vehicles made and sold in the previous ten
             years, and enter them into tables and parameters for data comparisons.
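
The idea of elementizing free-form text and building a distribution of words and phrases can be sketched generically as follows. This is not the Trillium Software System's method or API; the tokenizer, the `distribution` function, and the two-word phrase limit are assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(comment):
    """Split a comment into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", comment.lower())


def distribution(comments, max_phrase_len=2):
    """Count every word and every phrase of up to max_phrase_len adjacent words."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        for n in range(1, max_phrase_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

A distribution like this helps an analyst see which words and phrases occur most often, which in turn suggests entries for the lookup tables used in later comparisons.
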

             The Results
             As a result of its initial data reengineering, Trans Union has created a new suite of products
             that allows the company to identify owners of specific types of major consumer durables.
             Trans Union went live with its data-cleansing project within one week of initial training
             on the system.
             “We were able to identify 14 million records—a full 50 percent more than what we had
             imagined—that had vehicle category information we could append to our customer
             database,” the database construction team leader stated.



                                           Lovely Professional University                                   239