Cover
About the Authors
Acknowledgments
Foreword for Smarter Data Science
Epigraph
Preamble
1. Why You Need This Book
2. What You'll Learn
CHAPTER 1: Climbing the AI Ladder
1. Readying Data for AI
2. Technology Focus Areas
3. Taking the Ladder Rung by Rung
4. Constantly Adapt to Retain Organizational Relevance
5. Data-Based Reasoning Is Part and Parcel in the Modern Business
6. Toward the AI-Centric Organization
7. Summary
CHAPTER 2: Framing Part I: Considerations for Organizations Using AI
1. Data-Driven Decision-Making
2. Democratizing Data and Data Science
3. Aye, a Prerequisite: Organizing Data Must Be a Forethought
4. Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time
5. Quae Quaestio (Question Everything)
6. Summary
CHAPTER 3: Framing Part II: Considerations for Working with Data and AI
1. Personalizing the Data Experience for Every User
2. Context Counts: Choosing the Right Way to Display Data
3. Ethnography: Improving Understanding Through Specialized Data
4. Data Governance and Data Quality
5. Ontologies: A Means for Encapsulating Knowledge
6. Fairness, Trust, and Transparency in AI Outcomes
7. Accessible, Accurate, Curated, and Organized
8. Summary
CHAPTER 4: A Look Back on Analytics: More Than One Hammer
1. Been Here Before: Reviewing the Enterprise Data Warehouse
2. Drawbacks of the Traditional Data Warehouse
3. Paradigm Shift
4. Modern Analytical Environments: The Data Lake
5. Elements of the Data Lake
6. The New Normal: Big Data Is Now Normal Data
7. Schema-on-Read vs. Schema-on-Write
8. Summary
CHAPTER 5: A Look Forward on Analytics: Not Everything Can Be a Nail
1. A Need for Organization
2. Data Topologies
3. Expanding, Adding, Moving, and Removing Zones
4. Enabling the Zones
5. Summary
CHAPTER 6: Addressing Operational Disciplines on the AI Ladder
1. A Passage of Time
2. Create
3. Execute
4. Operate
5. The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps
6. Summary
CHAPTER 7: Maximizing the Use of Your Data: Being Value Driven
1. Toward a Value Chain
2. Curation
3. Data Governance
4. Integrated Data Management
5. Summary
CHAPTER 8: Valuing Data with Statistical Analysis and Enabling Meaningful Access
1. Deriving Value: Managing Data as an Asset
2. Accessibility to Data: Not All Users Are Equal
3. Providing Self-Service to Data
4. Access: The Importance of Adding Controls
5. Ranking Datasets Using a Bottom-Up Approach for Data Governance
6. How Various Industries Use Data and AI
7. Benefiting from Statistics
8. Summary
CHAPTER 9: Constructing for the Long-Term
1. The Need to Change Habits: Avoiding Hard-Coding
2. Extending the Value of Data Through AI
3. Polyglot Persistence
4. Benefiting from Data Literacy
5. Summary
CHAPTER 10: A Journey's End: An IA for AI
1. Development Efforts for AI
2. Essential Elements: Cloud-Based Computing, Data, and Analytics
3. Driving Action: Context, Content, and Decision-Makers
4. Keep It Simple
5. The Silo Is Dead; Long Live the Silo
6. Taxonomy: Organizing Data Zones
7. Capabilities for an Open Platform
8. Summary
Appendix: Glossary of Terms
Index
End User License Agreement

List of Illustrations

Chapter 1
1. Figure 1-1: The AI Ladder to achieve a full complement of data and analytics...
2. Figure 1-2: The ladder is part of a repetitive climb to continual improvemen...
3. Figure 1-3: Current state ⇦ future state ⇦ current state
4. Figure 1-4: Ends and means model
Chapter 2
1. Figure 2-1: Trust matrix
2. Figure 2-2: Breadth and depth slivers
3. Figure 2-3: Grading
4. Figure 2-4: Data and AI democratization
5. Figure 2-5: Recognizing that the ability to skillfully ask questions is the ...
Chapter 3
1. Figure 3-1: Monitors at Mission Control Center for the International Space S...
2. Figure 3-2: A closer view
3. Figure 3-3: Monitors in a hospital emergency room
4. Figure 3-4: An electrocardiogram pattern showing normal and abnormal heartbe...
5. Figure 3-5: Data governance
6. Figure 3-6: An ontological model
7. Figure 3-7: Inference
8. Figure 3-8: Blood test results showing normalcy, part A
9. Figure 3-9: Blood test results showing normalcy, part B
10. Figure 3-10: Recognizing preconditions
Chapter 4
1. Figure 4-1: Reviewing atomic data
2. Figure 4-2: Simplified information architecture for an EDW
3. Figure 4-3: YARN architecture
Chapter 5
1. Figure 5-1: Starter set zones
2. Figure 5-2: Consistent to Active
3. Figure 5-3: Consistent to Inactive
4. Figure 5-4: Consistent to CISCO_INACTIVE_IND
5. Figure 5-5: Consistent to Status
6. Figure 5-6: Consistent to Worksheets
7. Figure 5-7: Consistent to Implied Meaning
8. Figure 5-8: Consistent to Metadata
9. Figure 5-9: Misrepresenting the nature of data governance
10. Figure 5-10: Core elements of a data topology
11. Figure 5-11: Primitive zone types
12. Figure 5-12: Data lakes, data ponds, and data puddles
Chapter 6
1. Figure 6-1: Challenges
2. Figure 6-2: Seven practices
3. Figure 6-3: Building to an MVP
4. Figure 6-4: DevOps shift-left approach
5. Figure 6-5: Core capabilities for DevOps and MLOps
6. Figure 6-6: Identifying DataOps stakeholders
7. Figure 6-7: Building blocks for AIOps
Chapter 7
1. Figure 7-1: Data to wisdom
2. Figure 7-2: Dialing data governance
Chapter 8
1. Figure 8-1: Data value chain
2. Figure 8-2: Skewness
3. Figure 8-3: Kurtosis
4. Figure 8-4: Identifying outliers
5. Figure 8-5: Gaussian distribution
6. Figure 8-6: Gaussian histogram plot
7. Figure 8-7: Gaussian distribution with low and high variance
Chapter 9
1. Figure 9-1: Caesar's Entertainment operating company asset values
2. Figure 9-2: What color are clouds?
3. Figure 9-3: Digitization misses more data than is actually collected.
Chapter 10
1. Figure 10-1: Cloud topography
2. Figure 10-2: Compute and storage capabilities
3. Figure 10-3: Analytic intensity
4. Figure 10-4: Communication flows
5. Figure 10-5: Flight paths for model execution
6. Figure 10-6: Driving prediction, automation, and optimization
7. Figure 10-7: Transitive closure and access privileges
8. Figure 10-8: A proliferation of lines serves to highlight the need for line ...
9. Figure 10-9: Taxonomic representation
10. Figure 10-10: Virtualized data zones

Praise For This Book

The authors have obviously explored the paths toward an efficient information architecture. There is value in learning from their experience. If you have responsibility for or influence over how your organization uses artificial intelligence you will find Smarter Data Science an invaluable read. It is noteworthy that the book is written with a sense of scope that lends it credibility. So much written about AI technologies today seems to assume a technical vacuum. We are not all working in startups! We have legacy technology that needs to be considered. The authors have created an excellent resource that acknowledges that enterprise context is a nuanced and important problem. The ideas are presented in a logical and clear format that is suitable to the technologist as well as the businessperson.

Christopher Smith, Chief Knowledge Management and Innovation Officer, Sullivan & Cromwell, LLC

It has always been a pleasure to learn from Neal. The stories and examples that urge every business to stay "relevant" served to provide my own source of motivation. The concepts presented in this book helped to resolve issues that I have been having to address. This book teaches almost all aspects of the data industry. The experiences, patterns, and anti-patterns are thoroughly explained. This work provides benefit to a variety of roles, including architects, developers, product owners, and business executives. For organizations exploring AI, this book is the cornerstone to becoming successful.

Harry Xuegang Huang Ph.D., External Consultant, A.P. Moller – Maersk (Denmark)

This is by far one of the best and most refreshing books on AI and data science that I have come across. The authors seek and speak the truth and they penetrate into the core of the challenge most organizations face in finding value in their data: moving focus away from a tendency to connect the winning dots by ‘magical’ technologies and overly simplified methods. The book is laid out in a well-considered and mature approach that is grounded in deliberation, pragmatism, and respect for information. By following the authors' advice, you will unlock true and long-term value and avoid the many pitfalls of the fashionistas and false prophets who have come to dominate the narrative in AI.

Jan Gravesen, M.Sc., IBM Distinguished Engineer, Director and Chief Technology Officer, IBM

Most of the books on data analytics and data science focus on tools and techniques of the discipline and do not provide the reader with a complete framework to plan and implement projects that solve business problems and foster competitive advantage. Just because machine learning and new methodologies learn from data and do not require a preconceived model for analysis does not eliminate the need for a robust information management program and required processes. In Smarter Data Science, the authors present a holistic model that emphasizes how critical data and data management are in implementing successful value-driven data analytics and AI solutions. The book presents an elegant and novel approach to data management and explores its various layers and dimensions (from data creation/ownership and governance to quality and trust) as a key component of a well-integrated methodology for value-adding data sciences and AI. The book covers the components of an agile approach to data management and information architecture that fosters business innovation and can adapt to ever-changing requirements and priorities. The many examples of recent data challenges facing diverse businesses make the book extremely readable and relevant for practical applications. This is an excellent book for both data officers and data scientists to gain deep insights into the fundamental relationship between data management, analytics, machine learning, and AI.

Ali Farahani, Ph.D., Former Chief Data Officer, County of Los Angeles; Adjunct Associate Professor, USC

There are many different approaches to gaining insights with data given the new advances in technology today. This book encompasses more than the technology that makes AI and machine learning possible; it truly depicts the process and foundation needed to prepare that data to make AI consumable and actionable. I thoroughly enjoyed the section on data governance and the importance of accessible, accurate, curated, and organized data for any sort of analytics consumption. The discussion of zones and data preparation, and of why they matter, also makes some fantastic points that should be highly considered in any sort of analytics project. The authors' ability to describe best practices from a true journey of data within an organization in relation to business needs and information outcomes is spot on. I would highly recommend this book to anyone learning, playing, or working in the wonderful space of Data & AI.

Phil Black, VP of Client Services for Data and AI, TechD

The authors have pieced together data governance, data architecture, data topologies, and data science in a perfect way. Their observations and approach have paved the way towards achieving a flexible and sustainable environment for advanced analytics. I am adopting these techniques in building my own analytics platform for our company.

Svetlana Grigoryeva, Manager Data Services and AI, Shearman and Sterling

This book is a delight to read and provides many thought-provoking ideas. This book is a great resource for data scientists, and everyone who is involved with large scale, enterprise-wide AI initiatives.

Simon Seow, Managing Director, Info Spec Sdn Bhd (Malaysia)

Having worked in IT as a Vice President at MasterCard and as a Global Director at GM, I learned long ago about the importance of finding and listening to the best people. Here, the authors have brought a unique and novel voice that resonates with verve about how to be successful with data science at an enterprise scale. With the explosive growth of big data, computer power, cheap sensor technology, and the awe-inspiring breakthroughs with AI, Smarter Data Science also instills in us that without a solid information architecture, we may fall short in our work with AI.

Glen Birrell, Executive IT Consultant

In the 21st century the ability to use metadata to empower cross-industry ecosystems and exploit a hierarchy of AI algorithms will be essential to maximize stakeholder value. Today's data science processes and systems simply don't offer enough speed, flexibility, quality or context to enable that. Smarter Data Science is a very useful book as it provides concrete steps towards wisdom within those intelligent enterprises.

Richard Hopkins, President, Academy of Technology, IBM (UK)

A must read for everyone who curates, manages, or makes decisions on data. Lifts a lot of the mystery and magical thinking out of “Data Science” to explain why we're underachieving on the promise of AI. Full of practical ideas for improving the practice of information architecture for modern analytical environments using AI or ML. Highly recommended.

Linda Nadeau, Information Architect, Metaphor Consulting LLC

In this book, the authors “unpack” the meaning of data as a natural resource for the modern corporation. Following on Neal's previous book that explored the role of data in enterprise transformation, the authors construct and lead the reader through a holistic approach to drive business value with data science. This book examines data, analytics, and the AI value chain across several industries describing specific use and business cases. This book is a must read for Chief Data Officers as well as accomplished or aspiring data scientists in any industry.

Boris Vishnevsky, Principal, Complex Solutions and Cyber Security, Slalom; Adjunct Professor, TJU

As an architect working with clients on highly complex projects, all of my new projects involve vast amounts of data, distributed sources of data, cloud-based technologies, and data science. This book is invaluable for my real-world enterprise scale practice. The anticipated risks, complexities, and the rewards of infusing AI are laid out in a well-organized manner that is easy to comprehend, taking the reader out of the scholastic endeavor of fact-based learning and into the real world of data science. I would highly recommend this book to anyone wanting to be meaningfully involved with data science.

John Aviles, Federal CTO Technical Lead, IBM

I hold over 150 patents and work as a data scientist on creating some of the most complex AI business projects, and this book has been of immense value to me as a field guide. The authors have established the need as to why IA must be part of a systematic maturing approach to AI. I regard this book as a “next generation AI guidebook” that your organization can't afford to be without.

Gandhi Sivakumar, Chief Architect and Master Inventor, IBM (Australia)

A seminal treatment for how enterprises must leverage AI. The authors provide a clear and understandable path forward for using AI across cloud, fog, and mist computing. A must read for any serious data scientist and data manager.

Raul Shneir, Director, Israel National Cyber Directorate (Israel)

As a professor at Wharton who teaches data science, I often tell my students about emerging analytical tools such as AI that can provide valuable information to business decision makers. I also encourage them to keep abreast of such tools. Smarter Data Science will definitely make my recommended readings list. It articulates clearly how an organization can build a successful information architecture, capitalizing on the benefits of AI technologies. The authors have captured many intricate themes that are relevant for my students to carry with them into the business world. Many of the ideas presented in this book will benefit those working directly in the field of data science or those that will be impacted by data science. The book also includes many critical thinking tools to ready the worker of tomorrow … and realistically, today.

Dr. Josh Eliashberg, Sebastian S. Kresge Professor of Marketing, Professor of Operations, Information, and Decisions, The Wharton School

This is an excellent guide for the data-driven organization that must build a robust information architecture to continuously deliver greater value through data science or be relegated to the past. The book will enable organizations to complete their transformative journey to sustainably leverage AI technologies that incorporate cloud-based AI tools and dueling neural networks. The guiding principles that are laid out in the book should result in the democratization of data, a data literate workforce, and a transparent AI revolution.

Taarini Gupta, Behavioral Scientist/Data Scientist, Mind Genomics Advisors

Published simultaneously in Canada

ISBN: 978-1-119-69341-3
ISBN: 978-1-119-69438-0 (ebk)
ISBN: 978-1-119-69342-0 (ebk)

Manufactured in the United States of America

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2020933636

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Foreword for Smarter Data Science

There have been remarkable advances in artificial intelligence over the past decade, owing to a perfect storm at the confluence of three important forces: the rise of big data, the exponential growth of computational power, and the discovery of key algorithms for deep learning. IBM's Deep Blue beat the world's best chess player, Watson bested every human on Jeopardy, and DeepMind's AlphaGo and AlphaZero have dominated the field of Go and videogames. On the one hand, these advances have proven useful in commerce and in science: AI has found an important role in manufacturing, banking, and medicine, to name a few domains. On the other hand, these advances raise some difficult questions, especially with regard to privacy and the conduct of war.

While discoveries in the science of artificial intelligence continue, the fruits of that science are now being put to work in the enterprise in very tangible ways, ways that are not only economically interesting but that also contribute to the human condition. As such, enterprises that want to leverage AI must turn their focus to engineering pragmatic systems of value that contain cognitive components.

That's where Smarter Data Science comes in.

As the authors explain, data is not an afterthought in building such systems; it is a forethought. To leverage AI for predicting, automating, and optimizing enterprise outcomes, the science of data must be made an intentional, measurable, repeatable, and agile part of the development pipeline. Here, you'll learn about best practices for collecting, organizing, analyzing, and infusing data in ways that make AI real for the enterprise. What I celebrate most about this book is that not only are the authors able to explain these best practices from a foundation of deep experience, they do so in a manner that is actionable. Their emphasis on results-driven methodology that is agile yet enables a strong architectural framework is refreshing.

I'm not a data scientist; I'm a systems engineer, and increasingly I find myself working with data scientists. Believe me, this is a book that has taught me many things. I think you'll find it quite informative as well.

Grady Booch
ACM, IEEE, and IBM Fellow

Preamble

“What I'm trying to do is deliver results.”

Lou Gerstner

Business Week

Why You Need This Book

“No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely…”

So begins H. G. Wells' The War of the Worlds (Harper & Brothers, 1898). In the last years of the 20th century, such disbelief also prevailed. But unlike the fictional watchers of the 19th century, the watchers of the late 20th century were real: pioneering, digitally enabled corporations. In The War of the Worlds, simple bacteria proved to be a defining weapon for both offense and defense. Today, the ultimate weapon is data. A corporate entity that misuses data can implode; one that uses data appropriately can thrive.

Ever since the establishment of hieroglyphs and alphabets, data has been useful. The term business intelligence (BI) can be traced as far back as 1865 (ia601409.us.archive.org/25/items/cyclopaediacomm00devegoog). However, it wasn't until Herman Hollerith, whose company would eventually become known as International Business Machines, developed the punched card that data could be harvested at scale. Hollerith initially developed his punched card–processing technology for the 1890 U.S. government census. Later, in 1937, the U.S. government contracted IBM to use its punched card–reading machines for a new, massive bookkeeping project that involved 26 million Social Security numbers.

In 1965, the U.S. government built its first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic computer tape. With the advent of the Internet, and later mobile devices and IoT, it became possible for private companies to truly use data at scale, building massive stores of consumer data based on the growing number of touchpoints they now shared with their customers. On average, more than 1.7MB of data is created every second for every person (www.domo.com/solution/data-never-sleeps-6). That equates to approximately 154,000,000,000,000 punched cards. By coupling the volume of data with the capacity to meaningfully process that data, data can be used at scale for much more than simple record keeping.
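The text does not show how the 154 trillion figure is derived. One reading that reproduces it is sketched below in Python. It is a back-of-the-envelope check under our own assumptions, not the authors' calculation: we assume that 1MB is 1,000,000 bytes, that one 80-column punched card holds 80 bytes, and that the world population is roughly 7.25 billion. Under those assumptions, the figure corresponds to the number of cards needed to hold one second of worldwide data creation.

```python
# Back-of-the-envelope check of the punched-card comparison.
# Assumptions (not stated in the text): 1 MB = 1,000,000 bytes,
# one 80-column punched card stores 80 bytes (one character per column),
# and the world population is roughly 7.25 billion people.

BYTES_PER_MB = 1_000_000
MB_PER_PERSON_PER_SECOND = 1.7  # rate cited in the text
BYTES_PER_CARD = 80             # assumption: standard 80-column card
WORLD_POPULATION = 7.25e9       # assumption: approximate population


def cards_per_second(population: float) -> float:
    """Punched cards needed to store one second of data created by `population` people."""
    bytes_per_second = MB_PER_PERSON_PER_SECOND * BYTES_PER_MB * population
    return bytes_per_second / BYTES_PER_CARD


# Cards needed for a single person's data in a single second.
per_person = MB_PER_PERSON_PER_SECOND * BYTES_PER_MB / BYTES_PER_CARD

print(f"Cards per person per second: {per_person:,.0f}")
print(f"Cards worldwide per second:  {cards_per_second(WORLD_POPULATION):,.0f}")
```

Under these assumptions, one person's 1.7MB per second fills about 21,250 cards, and the whole population's output for a single second comes to roughly 154 trillion cards. A different population estimate or card capacity shifts the result proportionally.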

Clearly, our world is firmly in the age of big data. Enterprises are scrambling to integrate capabilities that can address advanced analytics such as artificial intelligence and machine learning in order to best leverage their data. The need to draw out insights to improve business performance in the marketplace is nothing less than mandatory. Recent data management concepts such as the data lake have emerged to help guide enterprises in storing and managing data. In many ways, the data lake was a stark contrast to its forerunner, the enterprise data warehouse (EDW). Typically, the EDW accepted data that had already been deemed useful, and its content was organized in a highly systematic way.

When misused, a data lake serves as nothing more than a hoarding ground for terabytes and petabytes of unstructured and unprocessed data, much of it never to be used. However, a data lake can be meaningfully leveraged for the benefit of advanced analytics and machine learning models.

But, are data warehouses and data lakes serving their intended purpose? More succinctly, are enterprises realizing the business-side benefit of having a place to hoard data?

The global research and advisory firm Gartner has provided sobering analysis. It has estimated that more than half of the enterprise data warehouses that were attempted have been failures and that the new data lake has fared even worse. At one time, Gartner analysts projected that the failure rate of data lakes might reach as high as 60 percent (blogs.gartner.com/nick-heudecker/big-data-challenges-move-from-tech-to-the-organization). However, Gartner has now dismissed that number as being too conservative. Actual failure rates are thought to be much closer to 85 percent (www.infoworld.com/article/3393467/4-reasons-big-data-projects-failand-4-ways-to-succeed.html).

Why have initiatives such as the EDW and the data lake failed so spectacularly? The short answer is that developing a proper information architecture isn't simple.

For much the same reason that the EDW failed, many of the approaches taken by data scientists have failed to recognize the following considerations:

The nature of the enterprise
The business of the organization
The stochastic and potentially gargantuan nature of change
The importance of data quality
How different techniques applied to schema design and information architecture can affect the organization's readiness for change

Analysis reveals that the higher failure rate for data lakes and big data initiatives is attributable not to the technology itself but, rather, to how technologists have applied it (datazuum.com/5-data-actions-2018/).

These facets quickly become self-evident in conversations with our enterprise clients. When we discuss data warehousing and data lakes, the answer is often, “Which one? We have many of each.” It often happens that a department within an organization needs a repository for its data, but its requirements are not satisfied by previous data storage efforts. So instead of attempting to reform or update older data warehouses or lakes, the department creates a new data store. The result is a hodgepodge of data storage solutions that don't always play well together, resulting in lost opportunities for data analysis.

Obviously, new technologies can provide many tangible benefits, but those benefits cannot be realized unless the technologies are deployed and managed with care. Unlike the design of a building in traditional architecture, information architecture is not a set-it-and-forget-it prospect.

While an organization can control how data is ingested, it can't always control how the data it needs changes over time. Organizations tend to be fragile in that they can break when circumstances change. Only flexible, adaptive information architectures can adjust to new environmental conditions. Designing and deploying solutions against a moving target is difficult, but the challenge is not insurmountable.

Many IT professionals treat the glib assertion “garbage in, garbage out” as passé. In truth, garbage data has plagued analytics and decision-making for decades, and mismanaged data and inconsistent representations will remain a red flag for every AI project you undertake.

The level of data quality demanded by machine learning and deep learning can be significant. Like a coin with two sides, low data quality can have two separate and equally devastating impacts. On the one hand, low-quality historical data can distort the training of a predictive model. On the other, low-quality new data can distort what the model produces and negatively impact decision-making.
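The two impacts can be made concrete with a toy Python sketch. Everything in it is hypothetical and chosen only for illustration: the five data points, the “model” (an ordinary least-squares line), and the defect (a single value keyed with an extra zero). Real models and real data-quality problems are far messier, but the two failure points are the same: bad data at training time and bad data at prediction time.

```python
# Toy illustration of low data quality hurting a model at two points.
# The data, the defect (an extra zero typed into a value), and the use of a
# simple least-squares line as the "model" are all hypothetical.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Clean historical data: y is exactly 2 * x.
xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]

# Impact 1: one mis-keyed historical record (100 instead of 10) distorts training.
dirty_ys = clean_ys[:-1] + [100]
clean_model = fit_line(xs, clean_ys)
dirty_model = fit_line(xs, dirty_ys)

# Impact 2: a correctly trained model still misleads when new input is bad.
good_input, bad_input = 6, 60  # 60 is the same value with an extra zero

print("clean model (slope, intercept):", clean_model)
print("model trained on one bad record:", dirty_model)
print("prediction for good input:", predict(clean_model, good_input))
print("prediction for bad input:", predict(clean_model, bad_input))
```

With clean data, the fitted slope is 2. One historical record keyed as 100 instead of 10 pushes the slope to 20, and even the correctly trained model returns 120 instead of 12 when a new input carries the same kind of error.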

As a sharable resource, data is exposed across your organization through layers of services. When data quality is poor, those layers can behave like a virus, indiscriminately affecting everyone who touches the data. Therefore, an information architecture for artificial intelligence must be able to mitigate traditional issues associated with data quality, foster the movement of data, and, when necessary, provide isolation.

The purpose of this book is to provide you with an understanding of how the enterprise must approach the work of building an information architecture in order to make way for successful, sustainable, and scalable AI deployments. The book includes a structured framework and advice that is both practical and actionable toward the goal of implementing an information architecture that's equipped to capitalize on the benefits of AI technologies.

What You'll Learn

We'll begin in Chapter 1, “Climbing the AI Ladder,” with a discussion of the AI Ladder, an illustrative device developed by IBM to demonstrate the steps, or rungs, an organization must climb to realize sustainable benefits with the use of AI. From there, Chapter 2, “Framing Part I: Considerations for Organizations Using AI,” and Chapter 3, “Framing Part II: Considerations for Working with Data and AI,” cover an array of considerations data scientists and IT leaders must be aware of as they make their way up the ladder.

In Chapter 4, “A Look Back on Analytics: More Than One Hammer,” and Chapter 5, “A Look Forward on Analytics: Not Everything Can Be a Nail,” we'll explore some recent history: data warehouses and how they've given way to data lakes. We'll discuss how data lakes must be designed in terms of topography and topology. This will flow into a deeper dive into data ingestion, governance, storage, processing, access, management, and monitoring.

In Chapter 6, “Addressing Operational Disciplines on the AI Ladder,” we'll discuss how DevOps, DataOps, and MLOps can enable an organization to better use its data in real time. In Chapter 7, “Maximizing the Use of Your Data: Being Value Driven,” we'll delve into the elements of data governance and integrated data management. We'll cover the data value chain and the need for data to be accessible and discoverable in order for the data scientist to determine the data's value.

Chapter 8, “Valuing Data with Statistical Analysis and Enabling Meaningful Access,” introduces different approaches to data access, because different roles within the organization need to interact with data in different ways. The chapter also furthers the discussion of data valuation, with an explanation of how statistics can assist in ranking the value of data.

In Chapter 9, “Constructing for the Long-Term,” we'll discuss some of the things that can go wrong in an information architecture and the importance of data literacy across the organization to prevent such issues.

Finally, Chapter 10, “A Journey's End: An IA for AI,” will bring everything together with a detailed overview of developing an information architecture for artificial intelligence (IA for AI). This chapter provides practical, actionable steps that will bring the preceding theoretical backdrop to bear on real-world information architecture development.

Succeeding with Enterprise-Grade Data and AI Projects
