Page 245 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 12: Metadata and Warehouse Quality
The Challenge
Rapidly processing large quantities of information is a key to Trans Union’s success, but
that information must also be in an accessible format. “Some of our input was a jumbled
mess of data that we couldn’t get at,” said Trans Union’s database construction team
leader.
The database team knew it had a wealth of information buried within the comment
fields of many records. Specifically, it was looking for information on consumer
ownership of large consumer durables, most importantly boats, recreational vehicles,
motor homes and motor vehicles. The challenge, therefore, was to scrutinize the comment
field, identify and parse out the valuable data components, and create and populate new
fields with data derived from the comment field content.
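The parsing task described above can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not the method Trans Union or the Trillium Software System actually used. The keyword table, the field names (`comment`, `durable_category`) and the sample records are all hypothetical.

```python
import re

# Hypothetical keyword-to-category table; the real business rules
# are not given in the case study.
CATEGORY_KEYWORDS = {
    "boat": "BOAT",
    "rv": "RECREATIONAL_VEHICLE",
    "recreational vehicle": "RECREATIONAL_VEHICLE",
    "motor home": "MOTOR_HOME",
    "motorhome": "MOTOR_HOME",
    "auto": "MOTOR_VEHICLE",
    "truck": "MOTOR_VEHICLE",
}

def normalize(text):
    """Lowercase the comment and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def extract_category(comment):
    """Return the first matching category for a free-form comment, or None."""
    padded = f" {normalize(comment)} "
    # Try longer phrases first so multi-word keys are checked before short ones.
    for phrase in sorted(CATEGORY_KEYWORDS, key=len, reverse=True):
        if f" {phrase} " in padded:
            return CATEGORY_KEYWORDS[phrase]
    return None

def add_category_field(record):
    """Return a copy of the record with a new 'durable_category' field."""
    enriched = dict(record)
    enriched["durable_category"] = extract_category(record.get("comment", ""))
    return enriched

# Hypothetical sample records with free-form comment fields.
records = [
    {"id": 1, "comment": "LOAN FOR 1996 BOAT/TRAILER"},
    {"id": 2, "comment": "secured - Motor Home"},
    {"id": 3, "comment": "paid as agreed"},
]
enriched_records = [add_category_field(r) for r in records]
```

Matching on whole, space-delimited phrases keeps a short key such as "rv" from firing inside an unrelated word. A production rule set would also need to handle misspellings and abbreviations, which this sketch ignores.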
The targeted quantity of data was large, but not enormous by Trans Union standards: 27
million individual records containing comment fields. Fast turnaround of the data on a
Windows NT platform was the goal, because new data is always in demand. Trans Union
also didn’t want to get into a protracted code-writing exercise that would tax its time and
resources. It needed a versatile data-cleansing solution that could be implemented very
quickly.
The database team realized that it needed a solution that could scan free-form text,
standardize and transform the extracted data, and create new fields that would be
populated with intelligence gathered during the cleansing process. The solution would
have to be robust enough to handle large volumes of data and simple enough for Trans
Union to quickly learn.
Trans Union wanted to develop a standardized set of enterprise business rules for data
quality management that could be shared across existing and future platforms. Specifically,
the company needed a versatile tool that could reach deep within the complex product
data and provide repeatable and reusable business rules for data cleansing.
The Solution
Trans Union chose the Trillium Software System® for its ability to clean and standardize
large volumes of generalized data from multiple sources. The Trillium Software System’s
user-defined transformation and data-filling capabilities, as well as its data element repair
facilities, are unique among solutions that operate in multiplatform environments.
It was the Trillium Software System’s specific ability to understand, elementize and create
a distribution of words and phrases from within floating, free-form text that made it a
perfect fit for Trans Union. The company understood that the Trillium Software System
could meet the immediate need for data reengineering and fill an expanded role in future
projects. Trans Union initially assigned a team of three to the project: a project manager, a
programmer and a research analyst. The team’s first step was to compile a comprehensive
listing of boats, RVs, motor homes and other vehicles made and sold in the previous ten
years, and enter them into tables and parameters for data comparisons.
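To make the idea of elementizing free-form text and building a distribution of words and phrases concrete, here is a small Python sketch. It assumes simple word tokenization and counts one-word and two-word phrases. The function names, the reference terms and the sample comments are hypothetical and do not reflect how the Trillium Software System works internally.

```python
from collections import Counter
import re

def elementize(comment):
    """Split a free-form comment into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", comment.lower())

def phrase_distribution(comments, max_words=2):
    """Count every run of 1..max_words consecutive tokens across all comments."""
    counts = Counter()
    for comment in comments:
        tokens = elementize(comment)
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical reference listing, standing in for the compiled tables
# of boats, RVs, motor homes and other vehicles.
REFERENCE_TERMS = {"boat", "motor home", "rv"}

# Hypothetical sample comments.
sample_comments = [
    "Motor home loan",
    "BOAT loan, paid",
    "motor home - refinanced",
]

distribution = phrase_distribution(sample_comments)

# Keep only the reference terms that actually occur in the comments.
matched = {term: distribution[term] for term in REFERENCE_TERMS if distribution[term] > 0}
```

Comparing the distribution against a reference listing shows which known terms occur and how often. The remaining high-frequency phrases are candidates for new entries in the tables.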
The Results
As a result of its initial data reengineering, Trans Union has created a new suite of products
that allows the company to identify owners of specific types of major consumer durables.
Trans Union went live with its data-cleansing project within one week of initial training
on the system.
“We were able to identify 14 million records—a full 50 percent more than what we had
imagined—that had vehicle category information we could append to our customer
database,” the database construction team leader stated.
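The figures quoted above can be put in proportion with a little arithmetic. The sketch below assumes that "50 percent more than imagined" means the identified count is 1.5 times the team's original expectation. That reading is an interpretation, not a figure stated in the case study.

```python
# Figures taken from the case study.
total_records = 27_000_000   # records containing comment fields
identified = 14_000_000      # records with vehicle category information

# Assumption: "50 percent more than imagined" means identified = 1.5 * expected.
implied_expectation = identified / 1.5
share_identified = identified / total_records

print(f"Implied original expectation: {implied_expectation:,.0f} records")
print(f"Share of records enriched: {share_identified:.1%}")
```

Under that assumption the team had expected about 9.3 million usable records, and the 14 million actually identified amount to roughly 52 percent of the 27 million records processed.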
Lovely Professional University