
Data Warehousing and Data Mining




                    Notes            Creating a unified customer view within the warehouse demanded high-quality, consistent
                                     data. The stakes were high: with a potential telecommunications market exceeding 100 million
                                     customers, even 99% accuracy in customer data would still result in more than a
                                     million faulty records.
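The arithmetic behind that figure can be checked with a short sketch. The function name `faulty_records` and the use of whole-number percentages are illustrative assumptions, not part of the case study.

```python
def faulty_records(customers: int, accuracy_percent: int) -> int:
    """Return the number of inaccurate records implied by an accuracy rate.

    Integer arithmetic is used so the result is exact for whole percentages.
    """
    return customers * (100 - accuracy_percent) // 100


# 100 million customers at 99% accuracy leaves 1% of records faulty.
baseline = faulty_records(100_000_000, 99)
```

Because the market exceeded 100 million customers, the actual count of faulty records at 99% accuracy would be somewhat above this baseline.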
                                     The Challenge
                                     Data quality wasn’t a new idea at the company. Until recently, there wasn’t a convenient
                                     way for AT&T to reconcile different versions of the same consumer from one product
                                     or department to the next. Although the problem was hardly unique to the carrier,
                                     the ramifications of duplication and inaccuracy were significant given the size of its
                                     marketplace.
                                     Integrating these systems to present a unified customer view was far from straightforward.
                                     In the telecommunications industry, the problem is even more challenging. Unlike other
                                     businesses, such as consumer banks, AT&T might serve the same customer at multiple
                                     addresses and with multiple phone numbers.

                                     Another problem was data volatility: anyone could become a customer at any time.
                                     Moreover, if the customer moved to another locale, the source and the content of the data
                                     for that particular customer could change, too.
                                     As for data sources, there are more than 1,500 local phone service providers across the U.S.
                                     The content, format, and quality of name and address data can vary sharply between one
                                     local exchange provider and the next.
                                     The manager of the AT&T Integrated Customer View project explained, “We needed a data
                                     cleansing system that could reach for the ‘knowledge’ in our address base, and perform
                                     parsing and matching according to a standardized process.”
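As a rough illustration of what parsing and matching "according to a standardized process" can mean, the sketch below normalizes name and address strings and groups records that share the same normalized key. It is a minimal example under stated assumptions, not the method AT&T or its vendor used: the abbreviation table, the record fields (`source`, `name`, `address`), and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a production system would rely on
# postal reference data rather than a hand-written dictionary.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "apartment": "apt",
}


def standardize(text: str) -> str:
    """Lowercase, strip punctuation, and apply standard abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def match_key(name: str, address: str) -> tuple[str, str]:
    """Build a comparison key from the standardized name and address."""
    return (standardize(name), standardize(address))


def group_duplicates(records: list[dict]) -> dict:
    """Map each match key to the source systems that hold a matching record.

    Only keys found in more than one record are returned.
    """
    groups = defaultdict(list)
    for record in records:
        key = match_key(record["name"], record["address"])
        groups[key].append(record["source"])
    return {key: sources for key, sources in groups.items() if len(sources) > 1}


# Made-up sample records from three hypothetical source systems.
records = [
    {"source": "long_distance", "name": "Jane Q. Doe", "address": "12 Main Street, Apt. 4"},
    {"source": "wireless", "name": "JANE Q DOE", "address": "12 Main St Apartment 4"},
    {"source": "local", "name": "John Roe", "address": "9 Oak Avenue"},
]
duplicates = group_duplicates(records)
```

Exact matching on a normalized key is a deliberate simplification. It would not by itself link one customer served at multiple addresses or with multiple phone numbers, which is part of why the matching problem described here is harder than this sketch suggests.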
                                     The Solution
                                     The AT&T consumer division’s Integrated Customer View project included a team with
                                     roughly a dozen core members, each of whom had years of experience in customer
                                     identification and systems development. That core group was supplemented by a larger
                                     group of business analysts from across the enterprise.
                                     The team ruled out custom development because of stiff maintenance requirements
                                     and rejected out-of-the-box direct mail software packages because they weren’t precise
                                     enough. Ultimately, the team chose the Trillium Software System® for data identification,
                                     standardization, and postal correction. Trillium Software’s solution was the only
                                     package that could deliver the necessary functionality in both UNIX and mainframe
                                     environments.
                                     Multiplatform support was critical because, although the company was committed to
                                     client/server migration, legacy systems would likely coexist with the new ones for the
                                     foreseeable future. Initially, the system would consist of two parts: a terabyte-sized
                                     master repository residing on an IBM mainframe, and an Oracle-based data warehouse
                                     maintained on an AT&T UNIX server.
                                     The AT&T project team spent nine months creating and testing data cleansing rules for an
                                     initial loading of data into the data warehouse. Because of its unique data requirements,
                                     the team developed an automated name and address matching process that resulted in a
                                     large number of permutations. According to the AT&T project manager, “The problem was
                                     so complex that it was beyond the ability of a single individual to handle.”

                                                                                                         Contd...







          252                              Lovely Professional University