Page 258 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
Creating a unified customer view within the warehouse demanded high-quality, consistent
data. The stakes were high: with a potential telecommunications market exceeding 100 million
customers, even 99 percent accuracy in customer data would still result in more than a
million faulty records.
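The arithmetic behind that figure can be checked with a short sketch. The function name is illustrative, and the customer count is simply the round 100 million quoted above; a market larger than that pushes the result past one million.

```python
def faulty_records(total_customers: int, accuracy_percent: int) -> int:
    """Return how many records are expected to be wrong at a given accuracy."""
    # Integer arithmetic avoids floating-point rounding on large counts.
    return total_customers * (100 - accuracy_percent) // 100


# 1 percent of 100 million customers
print(faulty_records(100_000_000, 99))  # 1000000
```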
The Challenge
Data quality wasn’t a new idea at the company. Until recently, however, AT&T had no
convenient way to reconcile different versions of the same consumer from one product
or department to the next. Although the problem was hardly unique to the carrier,
the ramifications of duplication and inaccuracy were significant given the size of its
marketplace.
Integrating these systems to present a unified customer view was far from straightforward.
In the telecommunications industry, the problem was even more challenging. Unlike other
businesses, such as consumer banks, AT&T might serve the same customer at multiple
addresses and with multiple phone numbers.
Another problem was data volatility: anyone could become a customer at any time.
Moreover, if the customer moved to another locale, the source and the content of the data
for that particular customer could change, too.
As for data sources, there are more than 1,500 local phone service providers across the U.S.
The content, format, and quality of name and address data can vary sharply between one
local exchange provider and the next.
The manager of the AT&T Integrated Customer View project explained, “We needed a data
cleansing system that could reach for the ‘knowledge’ in our address base, and perform
parsing and matching according to a standardized process.”
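To make the idea of parsing and matching under a standardized process concrete, here is a minimal sketch. It is not the method AT&T or any vendor product used: the abbreviation table, the function names, the source labels, and the sample records are all hypothetical, and a production system would rely on far richer postal reference data and fuzzy matching rather than exact keys.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; real systems use much larger postal references.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def standardize(text):
    """Lowercase, replace punctuation with spaces, and map words to one abbreviation."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def match_key(name, address):
    """Build an exact-match key from the standardized name and address."""
    return standardize(name), standardize(address)


def group_records(records):
    """Group source systems that appear to describe the same customer."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record["name"], record["address"])].append(record["source"])
    return dict(groups)


# Hypothetical records for one customer held by different departments.
records = [
    {"source": "long_distance", "name": "Pat Q. Smith", "address": "12 Main Street, Apt. 4"},
    {"source": "wireless", "name": "PAT Q SMITH", "address": "12 Main St Apartment 4"},
    {"source": "local", "name": "Lee Jones", "address": "9 Oak Avenue"},
]

for key, sources in group_records(records).items():
    print(key, sources)
```

In this sketch the first two records collapse to a single key because their names and addresses standardize to the same strings, while the third remains separate.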
The Solution
The AT&T consumer division’s Integrated Customer View project included a team with
roughly a dozen core members, each of whom had years of experience in customer
identification and systems development. That core group was supplemented by a larger
group of business analysts from across the enterprise.
The team ruled out custom development because of stiff maintenance requirements
and rejected out-of-the-box direct mail software packages because they weren’t precise
enough. Ultimately, the team chose the Trillium Software System® for data identification,
standardization, and postal correction. Trillium Software’s solution was the only
package that could deliver the necessary functionality in both UNIX and mainframe
environments.
Multiplatform support was critical because, although the company was committed to
client/server migration, legacy systems would likely coexist through the foreseeable
future. Initially, the system would consist of two parts: a terabyte-sized master repository
residing on an IBM mainframe, and an Oracle-based data warehouse maintained on an AT&T
UNIX server.
The AT&T project team spent nine months creating and testing data cleansing rules for an
initial loading of data into the data warehouse. Because of its unique data requirements,
the team developed an automated name and address matching process that resulted in a
large number of permutations. According to the AT&T project manager, “The problem was
so complex that it was beyond the ability of a single individual to handle.”
Lovely Professional University