SPSS® Statistics for Data Analysis and Visualization








Keith McCormick

Jesus Salcedo


with Jon Peck and Andrew Wheeler















We would like to dedicate this book to Jon Peck, who retired after
more than 30 years with SPSS and IBM while this book was
in its final stages. We wish him the best of retirements even
though he probably won't be able to resist staying in the
SPSS community in some form.

About the Authors

Keith McCormick is a data mining consultant, trainer, and speaker. A passionate user of SPSS for 25 years, he has trained thousands on how to effectively use SPSS Statistics and SPSS Modeler. He blogs at keithmccormick.com.

Jesus Salcedo is an independent statistical consultant. He is a former SPSS Curriculum Team Lead and Senior Education Specialist, who has written numerous SPSS training courses and trained thousands of users.

Jon Peck, recently retired from IBM and SPSS, was instrumental in developing and introducing the R and Python connections to the SPSS community. This expertise made him uniquely qualified to produce Chapter 18. He is the author of all the extension commands discussed in that chapter and has a patent pending on the algorithm in the SPSSINC TURF procedure discussed there. He can be reached at jkpeck@gmail.com.

Andrew Wheeler is a professor of criminology at the University of Texas at Dallas and a former crime analyst. The application of geospatial techniques in his research created the opportunity for a powerful real-world example in Chapter 8. He has used SPSS for over 10 years, and often blogs SPSS tutorials at andrewpwheeler.wordpress.com.

About the Technical Editors

Jon Peck, now retired from IBM, was a senior engineer, statistician, and product strategy person for SPSS and IBM for 32 years. He earned a Ph.D. in economics from Yale University, and taught econometrics and statistics there for 13 years before joining SPSS. He designed and contributed to many features of SPSS Statistics and has consulted with and trained many users. He remains active on social media and in consulting.

Terry Taerum has fifteen years’ experience as a statistician at the University of Alberta, fifteen years as a data analyst at SPSS Inc., and five years as a predictive analyst and consultant with IBM Inc.

Credits

Project Editor

Tom Dinse

Technical Editors

Jon Peck

Terry Taerum

Production Editor

Dassi Zeidel

Copy Editor

Kim Cofer

Production Manager

Katie Wisor

Manager of Content Development & Assembly

Mary Beth Wakefield

Marketing Manager

Christie Hilbrich

Professional Technology & Strategy Director

Barry Pruett

Business Manager

Amy Knies

Executive Editor

Jim Minatel

Project Coordinator, Cover

Brent Savage

Proofreader

Nancy Carrasco

Indexer

Johnna VanHoose Dinse

Cover Designer

Wiley

Cover Image

iStock.com/agsandrew

Acknowledgments

Keith and Jesus are especially proud to have worked with Bob Elliot before he retired. Our good friend Dean Abbott recommended Keith to Bob when Bob was seeking out a follow-up to Dean’s excellent Applied Predictive Analytics, but specifically in SPSS Statistics. Without both of them, this book would not have been created.

Terry’s and Jon’s contributions extended well beyond technical reviewing. We consider both of them mentors and friends. Jon took over technical reviewing when Terry took on a new role with a return to IBM. Jon, in particular, was an interlocutor and trusted advisor, and we produced a better book as a result.

Tom, our project editor, had to be patient with us. Deadlines slipped, contributors became unavailable, and Bob retired before the book was complete. Whenever it seemed that something wasn’t quite as it should be, it was often Tom who ultimately made it right. He deserves credit for multiple roles, and we thank him.

We would also like to thank all of the many SPSSers that we turn to when we have a question even if they haven’t heard from us in a while. We love the sense of community that we have all managed to maintain even when so many have moved on to other roles. And we thank Jason for capturing that sense of community in his foreword.

Foreword

In my various roles at SPSS and IBM I met Keith and Jesus many years ago. They both have over 20 years of statistical consulting experience, and they both have been training people on statistics and how to use SPSS for many years. Each has in fact trained thousands of students. They are uniquely qualified to bring the message and content of this book to you, and they have done so with rigor and grace. SPSS has so many techniques and procedures to perform both simple and complex analysis, and Keith and Jesus will introduce you to this rich tapestry so that it pays dividends in benefiting your endeavors in driving societal change based on data and analytics for years to come. This book goes beyond the elementary treatments found in most of the other books on SPSS Statistics but is written for users who do not necessarily have an advanced statistical background. It can make the reader a better analyst by expanding their toolkit to include powerful techniques that they might not otherwise consider but that can have a big payoff in increased insight.

Keith and Jesus’ outstanding new book on SPSS Statistics has brought back so many thoughts about this great product and the influence it has had on so many people that I thought I would briefly reminisce.

I first became involved with this software when I went to work for SPSS in 1995 as Director of Quality Assurance. A year earlier, SPSS had released its first Microsoft Windows product—which, while solid, did not really take advantage of the amazing possibilities a true graphical interface could provide. This was a huge and important time for the company as the SPSS team was hard at work revolutionizing both the front-end user interface and the output to create a standard that is still in place and considered best of breed today. These innovations enabled sophisticated pivot table output as well as much more customized graphical output than had ever been attempted before. Indeed, in the years to come it was that spirit of always getting ahead of every technological trend that would keep this software right in the heart of what the data analysis community demanded.

When I say the heart of the data analysis community I am not in any way exaggerating. This software has been used by hundreds of thousands of students in college and graduate school and by similar numbers in government and commercial environments worldwide. Over the years I have literally had hundreds, if not thousands, of people say to me “I used SPSS in college” when I introduced myself. And of course, I can’t leave out the bootleg copies I have seen in innumerable places during my travels and personally purchased on the streets of Santiago and Beijing.

Impressive? Absolutely. But of course the real question is … WHY is SPSS so heavily used and so well loved? WHY has its community of users stayed vibrant and loyal even eight years after the company itself was acquired by IBM?

The answer is the combination of power and simplicity combined with elegance. This is a big statement. To back this up—and apropos of the subject matter—I’ll contribute a data point as my best evidence. A few years ago, when I was still with IBM (which acquired SPSS in 2009), we hired a summer intern who had used our software for a semester in college. After about a month on the job, we debriefed her on the progress of her user interface design assignment. She discussed at length the challenges she was having coming up with a design that was up to the standard of the rest of the product in terms of simplicity, backed by immense power. This led to a discussion of the first time she used the product as a student. Of course, opening a “statistics” product for the first time filled this iPhone-using millennial with much trepidation; however, as she described to us, within just a few minutes she was loading and manipulating data, building predictive models, and producing output for her class. In just a short time beyond that she was digging into the depths of some of the power the product provided. Even a user nearly born and bred with the beautiful user designs of the smartphone consumer era was right at home using SPSS. What an amazing statement in and of itself. Think about it! This is made even more extraordinary because this same student had interactions with professors and researchers on her campus who were using—in fact, relying on—that very same product to do their cutting-edge work. As I said, the answer is the combination of power and simplicity combined with elegance.

This amazing simplicity does not come at the expense of power. As Keith and Jesus make clear in this book, SPSS Statistics is an incredibly powerful tool for data analysis and visualization. Even today there is no tool that works with its users of any level (novice, intermediate, or expert) to uncover meanings and relationships in data as powerfully as SPSS does. Further, once the data has been prepared, the models built, and the analysis done, there is no software available that is better at explaining the results to non-data analysts who have to act on it. This increases the value of the tool immeasurably—since it creates the understanding and confidence to deploy its insights into the real world to create real value. Having seen this done so many times, by so many people, in so many domains, I can say to those starting with this product for the first time that I truly envy you—you are about to start on a journey of learning and getting results that will amaze you—and the people you work with.

Let’s put this all in perspective. This product is now in its sixth decade of existence. That’s right—it first came out in the late 1960s. How many products can you name that have survived and prospered for that long? Not many. The Leica M camera and the Porsche 911 car with their classic timeless designs come to mind, but not much else. How many COMPUTER products? Even fewer; perhaps only the venerable IBM mainframe, in fact. But here we have IBM SPSS Statistics—not only surviving but still as relevant and vital as ever—right in the midst of the new age of big data and machine learning, heavily used by experts who dig deep into data and model building, but usable by novices in the iPhone era as well.

Now, let us switch our focus from celebrating the vibrancy and staying power of the SPSS journey to the heart of what Keith and Jesus have addressed in this book. This is first and foremost a book for data analysis practitioners at intermediate and advanced levels. The question this raises is how this product can help that audience create the most value in the modern era.

Unlike the world of the late 1960s when SPSS was created, we now live in an age where there are many tools to do quick and fast analysis of datasets. For example, Tableau is a fine tool for more business-oriented users with less data analysis training to get immediate and useful visual insights from their data. So what then is the need for IBM SPSS Statistics in this new world?

To answer that question, let me take you back several years to a conference called “MinneAnalytics,” sponsored by a Minnesota-based organization of analytic professionals, where I delivered a presentation on Advanced Analytics called “What’s Your World View?” In that presentation, I envisioned a rapidly approaching new age where “big data” would meet advanced analytic techniques running in real time and that combination would drive every decision-making aspect of how our society would work. I compared the importance of this movement to previous huge steps that changed the very foundation of society—including the invention of the automobile and the invention of assembly-line production for manufacturing many different types of goods.

Well, a mere three years later that “future” society is here already—right now. It is happening all around us. Analytics on big data is driving decision making and processes everywhere you look. Hospitals apply real-time analytics to data feeds from patient-monitoring instruments in intensive care units to message doctors automatically that their patient in the ICU will shortly take a turn for the worse. Firms managing trucking use analytics to intervene proactively when the system tells them one of their drivers is predicted to have an accident. Airplanes and cars apply real-time analytics to engine sensors to predict failure and inform the pilots and drivers to take action before such failure occurs. Indeed, big data analytics has become one of the most disruptive forces in business history and is unleashing new value creation quite literally wherever you look. All of these examples clearly show a fundamental point—quick visual understanding is one thing—but deep insight yielding confidence in a predictive model that is deployed in real time at critical decision points at vast scale is quite another. It is in this realm of confirmation and confidence that SPSS Statistics shines like no other.

Mass deployment of advanced analytics will create benefits for society that are for all intents and purposes unimaginable. Assuming, of course, that the deployed analytics are in fact correct (and with the right tweaking and trade-offs between accuracy and stability) and deployed properly. It is the almost unique benefit of SPSS that no matter how those analytics are built (SPSS, R, Python, supervised or unsupervised, standard or machine learning, executed programmatically or through visual interfaces, or any other variant you can think of) the product can be used to confirm that the desired results will be achieved and to understand the risks involved. It can also be used to explain the results to others in the enterprise, aligning those who need to be in the know on exactly and precisely how analytics drive their new business models. There is no better “hub” for data scientists to practice their craft and contribute their value to the creation of a new world—a new world of staggering rates of change guided or driven by data and analytics.

IBM SPSS Statistics is the perfect tool for this new world when used by well-trained analysts who can put all the data and all the insights together without mistakes to create the most value. People who can take the output of machine learning, add traditional data and then other new forms of data (like sensors and social media for example), to get insights well beyond those quick insights from Tableau and other surface-level tools. People who know how to use the advanced capabilities of the tool, such as the ability to do mixed model analysis of data at different levels (for example, within a hierarchy to find even deeper insights). Such a tool, in the hands of such people—well-trained data scientists—can drive us into this new remarkable world with both confidence and safety. To become one of those who drive this societal transformation using SPSS you can benefit from having this book as your guide.

Enjoy the book…and enjoy the next 50 years of IBM SPSS Statistics as well!

 —  Jason Verlen

Jason Verlen is currently Senior Vice President of Product Management and Marketing at CCC Information Services, based in Chicago. Before moving to CCC he spent 20 years at SPSS and then IBM (after its acquisition of SPSS) in various roles ending with being named Vice President of Big Data Analytics at IBM.