
Table of Contents
 
Title Page
Copyright Page
Dedication
Preface
Acknowledgments
 
Chapter 1 - Understanding the Information Economy
 
DID THE INTERNET CREATE THE INFORMATION ECONOMY?
ORIGINS OF ELECTRONIC DATA STORAGE
STOCKS AND FLOWS
BUSINESS DATA
CHANGING BUSINESS MODELS
INFORMATION SHARING VERSUS INFRASTRUCTURE SHARING
GOVERNING THE NEW BUSINESS
SUCCESS IN THE INFORMATION ECONOMY
NOTES
 
Chapter 2 - The Language of Information
 
STRUCTURED QUERY LANGUAGE
STATISTICS
XQUERY LANGUAGE
SPREADSHEETS
DOCUMENTS AND WEB PAGES
KNOWLEDGE, COMMUNICATIONS, AND INFORMATION THEORY
NOTES
 
Chapter 3 - Information Governance
 
INFORMATION CURRENCY
ECONOMIC VALUE OF DATA
GOALS OF INFORMATION GOVERNANCE
ORGANIZATIONAL MODELS
OWNERSHIP OF INFORMATION
STRATEGIC VALUE MODELS
REPACKAGING OF INFORMATION
LIFE CYCLE
NOTES
 
Chapter 4 - Describing Structured Data
 
NETWORKS AND GRAPHS
BRIEF INTRODUCTION TO GRAPHS
RELATIONAL MODELING
RELATIONAL CONCEPTS
CARDINALITY AND ENTITY-RELATIONSHIP DIAGRAMS
NORMALIZATION
IMPACT OF TIME AND DATE ON RELATIONAL MODELS
APPLYING GRAPH THEORY TO DATA MODELS
DIRECTED GRAPHS
NORMALIZED MODELS
NOTE
 
Chapter 5 - Small Worlds Business Measure of Data
 
SMALL WORLDS
MEASURING THE PROBLEM AND SOLUTION
ABSTRACTING INFORMATION AS A GRAPH
METRICS
INTERPRETING THE RESULTS
NAVIGATING THE INFORMATION GRAPH
INFORMATION RELATIONSHIPS QUICKLY GET COMPLEX
USING THE TECHNIQUE
NOTE
 
Chapter 6 - Measuring the Quantity of Information
 
DEFINITION OF INFORMATION
THERMAL ENTROPY
INFORMATION ENTROPY
ENTROPY VERSUS STORAGE
ENTERPRISE INFORMATION ENTROPY
DECISION ENTROPY
CONCLUSION AND APPLICATION
NOTES
 
Chapter 7 - Describing the Enterprise
 
SIZE OF THE UNDERTAKING
ENTERPRISE DATA MODELS ARE ALL OR NOTHING
THE DATA MODEL AS A PANACEA
METADATA
THE METADATA SOLUTION
MASTER DATA VERSUS METADATA
THE METADATA MODEL
XML TAXONOMIES
METADATA STANDARDS
COLLABORATIVE METADATA
METADATA TECHNOLOGY
DATA QUALITY METADATA
HISTORY
EXECUTIVE BUY-IN
NOTES
 
Chapter 8 - A Model for Computing Based on Information Search
 
FUNCTION-CENTRIC APPLICATIONS
AN INFORMATION-CENTRIC BUSINESS
ENTERPRISE SEARCH
SECURITY
METADATA SEARCH REPOSITORY
BUILDING THE EXTRACTS
THE RESULT
NOTE
 
Chapter 9 - Complexity, Chaos, and System Dynamics
 
EARLY INFORMATION MANAGEMENT
SIMPLE SPREADSHEETS
COMPLEXITY
CHAOS THEORY
WHY INFORMATION IS COMPLEX
EXTENDING A PROTOTYPE
SYSTEM DYNAMICS
DATA AS AN ALGORITHM
VIRTUAL MODELS AND INTEGRATION
CHAOS OR COMPLEXITY
NOTES
 
Chapter 10 - Comparing Data Warehouse Architectures
 
DATA WAREHOUSING
CONTRASTING THE INMON AND KIMBALL APPROACHES
QUANTITY IMPLICATIONS
USABILITY IMPLICATIONS
HISTORICAL DATA
SUMMARY
NOTES
 
Chapter 11 - Layered View of Information
 
INFORMATION LAYERS
ARE THEY REAL?
TURNING THE LAYERS INTO AN ARCHITECTURE
THE USER INTERFACE
SELLING THE ARCHITECTURE
 
Chapter 12 - Master Data Management
 
PUBLISH AND SUBSCRIBE
ABOUT TIME
GRANULARITY, TERMINOLOGY, AND HIERARCHIES
RULE 1: CONSISTENT TERMINOLOGY
RULE 2: EVERYONE OWNS THE HIERARCHIES
RULE 3: CONSISTENT GRANULARITY
RECONCILING INCONSISTENCIES
SLOWLY CHANGING DIMENSIONS
CUSTOMER DATA INTEGRATION
EXTENDING THE METADATA MODEL
TECHNOLOGY
 
Chapter 13 - Information and Data Quality
 
SPREADSHEETS
REFERENCING
FIT FOR PURPOSE
MEASURING STRUCTURED DATA QUALITY
A SCORECARD
METADATA QUALITY
EXTENDED METADATA MODEL
NOTES
 
Chapter 14 - Security
 
CRYPTOGRAPHY
PUBLIC KEY CRYPTOGRAPHY
APPLYING PKI
PREDICTING THE UNPREDICTABLE
PROTECTING AN INDIVIDUAL’S RIGHT TO PRIVACY
SECURING THE CONTENT VERSUS SECURING THE REFERENCE
 
Chapter 15 - Opening Up to the Crowd
 
A TAXONOMY FOR THE FUTURE
POPULATING THE STAKEHOLDER ATTRIBUTES
REDUCING E-MAIL TRAFFIC WITHIN PROJECTS
MANAGING CUSTOMER E-MAIL
GENERAL E-MAIL
PREPARING FOR THE UNKNOWN
THIRD-PARTY DATA CHARTERS
INFORMATION IS DYNAMIC
POWER OF THE CROWD CAN IMPROVE YOUR DATA QUALITY
NOTE
 
Chapter 16 - Building Incremental Knowledge
 
BAYESIAN PROBABILITIES
INFORMATION FROM PROCESSES
THE MIT BEER GAME
HYPOTHESIS TESTING AND CONFIDENCE LEVELS
BUSINESS ACTIVITY MONITORING
NOTE
 
Chapter 17 - Enterprise Information Architecture
 
WEB SITE INFORMATION ARCHITECTURE
EXTENDING THE INFORMATION ARCHITECTURE
BUSINESS CONTEXT
USERS
CONTENT
TOP-DOWN/BOTTOM-UP
PRESENTATION FORMAT
PROJECT RESOURCING
INFORMATION TO SUPPORT DECISION MAKING
NOTES
 
Looking to the Future
About the Author
Index


To A, I, and M with love.

Preface
This book is aimed at anyone who is in any way responsible for information. Executives, managers, and technical staff all need to understand how to manage this most valuable resource.
I wrote this book based on the observation that the concept of information overload is permeating every business that I deal with. At the same time, the global economy is moving from products to services that are described almost entirely electronically. Even those businesses that are traditionally associated with making things are less concerned with the management of the manufacturing process (which is largely outsourced) than they are with the management of their intellectual property. Increasingly, information doesn’t provide a window on the business. It is the business.
It’s a simple equation. Intellectual property is tied up in the data on computers. If it is the subject of focused management, then greater value is extracted from that data. If the intellectual property is a significant proportion of the value of the business, then such a focused effort will have a dramatic effect on the value of the business as a whole. Such an effort will also make the organization much more enjoyable to work in, with less time lost searching for information that should be readily available and less time spent sifting through irrelevant data that should never have hit the e-mail inbox.
As business has become more complex, techniques are appearing almost every day that seek to simplify the task of managing a large, multifaceted organization. Their quest is similar to that of a physicist looking for the single unifying equation that will define the universe. Any approach that recommends focusing on one part of the business must use a limited set of measures that aggregate complex data from across the enterprise. In providing a simple answer, such an approach must lose detail and differentiation.
A simple set of metrics by itself is no longer enough to sum up the millions or billions of moving parts that define the enterprise. Perhaps, then, it is time to gain a better understanding of the role of information in business.
While large quantities of information have been with us for as long as humans have gathered in groups, information has taken on a whole new dynamic form. The quantity of data has grown dramatically since the cost of computer storage fell sharply at the end of the twentieth century. The growth has taken business management by surprise, and the techniques that we use have not been able to keep up.
With little differentiation in the bricks-and-mortar assets, business needs to enhance its service and differentiate using the informational resources at its disposal. The winners tailor their product to the needs of their markets. Successful leaders have a deep insight into the running of their business. Such an insight can come only from accurate information.
In almost every organization, one or more executives have been assigned accountability for information governance, quality, or records. Similarly, technologists are being asked to make sense of the mountains of data that exist in databases, file systems, and other repositories. This is a book about becoming an information-centric business and achieving significant benefits as a result.
Over many years, I have had the opportunity to work with hundreds of organizations in the private and government sectors. The issues that they face handling business information have a common theme of complexity. Questions that should be simple to answer take too long, reconciliations that should be exact aren’t, privacy that should be perfect isn’t, and security that should be tight is porous.
Treating information as something that needs to be managed in its own right allows a profession of information managers to develop a common approach to information management. Without common techniques, many organizations have been ad hoc in their approach. The most successful, though, have borrowed approaches from other disciplines and been part of the evolution of a form of professional consensus.
For that reason, I have been pleased over a number of years to be part of the leadership of the MIKE2.0 initiative. MIKE2.0 (Method for the Implementation of a Knowledge Enterprise) is an open collaboration of information management professionals from a variety of organizations seeking to develop a common approach. The content is entirely free under the Creative Commons licensing model. MIKE2.0 can be found at www.openmethodology.org.
I have applied the techniques in this book in some of the world’s largest companies and government departments. They have also been effectively adopted in midsized and even small businesses. As a field grows in sophistication, the knowledge needed by practitioners also increases. This book provides sufficient detail to allow anyone who deals with information to identify the right approach to apply without trying to be a step-by-step guide. Armed with the knowledge within these pages, the reader can then adopt comprehensive methodologies like MIKE2.0 to develop detailed project plans or establish programs of work.
Each chapter introduces a concept and in many cases provides both strategic and tactical advice. The strategic advice will help shape the future enterprise. The tactical advice will help solve immediate challenges. The reader should be left with the overwhelming message that information management is not the responsibility of the information technology department, nor is it able to be governed by any one line of business. Information is an asset with a very real economic value. It is the responsibility of everyone who in any way creates, handles, stores, or exploits this asset to ensure that they achieve the greatest possible value for the enterprise as a whole.
This is not the final book that will be written on this subject. The discipline will continue to develop as we all find better and more effective ways to run organizations to better create, handle, and exploit information. There is no single answer to the question of how you should manage your information resources, so apart from the MIKE2.0 site, I also encourage readers of this book to check in at www.infodrivenbusiness.com where additional references and comments will be posted.

Acknowledgments
Many people have helped to review draft manuscripts, supported the process of getting it published, and constantly challenged me to think deeply about all aspects of information and data management. I’d like to specifically thank, in no particular order, Robin Hillard, Michelle Pearce, Professor David Arnott, Sean McClowry, Professor Graeme Shanks, Dr. Gregory Hill, Frank Farrall, Gerhard Vorster, Giam Swiegers, Brian Romer, and Michael Tarlinton.

Chapter 1
Understanding the Information Economy
Managing information has become as important to the enterprise as managing financial information has been to the accounting functions of a business. Information now pervades every aspect of an organization, including reporting, marketing, product development, and resource allocation. In the last twenty years, business reports to management and investors have become much more dependent on information derived from nonfinancial sources than ever before.
In fact, as the economy increasingly depends on information, the old assumptions about what is important have changed. The value that business saw in scale due to shared functions and infrastructure has been turned on its head by business process outsourcing (BPO), which is the outsourcing of a business function that might previously have been done within the organization. Examples include the processing of invoices, payroll, or even customer contact through call centers.
BPO is only possible because of advances in the storage, communication, and description of complex information at a cost that is much lower than imaginable even twenty years ago. At the same time, the value that business might previously have seen in owning infrastructure (such as manufacturing plants) has been overtaken by the value of the knowledge of the manufacturing process.
Everywhere we look, we see examples of how the management and exchange of intangible information has become more important than the trade in physical resources. An information economy has been created describing the exchange of information among organizations and between individuals and departments within a single organization.
To extract the greatest possible value from the concept of the information economy, it is worth looking at its origins.
 
We should be investing in the new electronic superhighways—satellite and telecommunications technology that is the nerve centre of a new Information Economy—doing for the next century what roads and railways have done for this one.
—Tony Blair, Labour Party Conference, 1994
 
Blair, like most politicians, saw services trading in information as being driven by the Internet and its supporting communications infrastructure. By 1990, however, the networking technologies that drove the Internet were already well established and mature. So why wasn’t the economy already online?

DID THE INTERNET CREATE THE INFORMATION ECONOMY?

The concept of electronic or information superhighways appeared as early as the 1970s. Artist Nam June Paik, who is well known for his electronic and video work, appears to be the first person to have used information superhighway as a term, in 1974. Certainly, by the 1980s, there were many references to the term. Newsweek carried an article on January 3, 1983, that used the term with reference to networks being built to connect northeastern cities such as New York, Washington, DC, and Boston. Al Gore (Vice President of the United States from 1993 to 2001) and Bill Gates (cofounder of Microsoft) did much to popularize the term in the 1990s.
 
The United States could benefit greatly—in research, in education, in economic development, and in scores of other areas—by efficiently processing and dealing with information that is available but unused. What we need is a nationwide network of information superhighways, linking scientists, business people, educators, and students by fiber-optic cable.
—Al Gore, “Information Superhighways: The Next Information Revolution,” The Futurist, 1991
Now that computing is astoundingly inexpensive and computers inhabit every part of our lives, we stand at the brink of another revolution. This one will involve unprecedentedly inexpensive communications. All the computers will join together to communicate with us and for us. Interconnected globally, they’ll form a large interactive network, which is sometimes called the information superhighway.
—Bill Gates, The Road Ahead, 1995
 
The consistent theme of speeches and commentary from the era is that the Internet combined with ubiquitous connectivity would drive economic activity and a new way of doing business. What most commentators of the time missed, however, was that the Internet was not a creation of the U.S. government but rather an inevitable consequence of a business and consumer need created by a new phenomenon: mass computer storage.

ORIGINS OF ELECTRONIC DATA STORAGE

In the 1940s and 1950s, the U.S. Navy was undertaking a computer project titled “Whirlwind.” Whirlwind was designed to support flight simulation for pilot training.
While this would be an easy task today, it was revolutionary in many respects then. Most problems that were tackled using computers at that time were based on individual equations that needed to be applied many times (such as the repetitive calculation of artillery range tables). Flight simulations required complex algorithms with large amounts of data to be shared between the steps.
Apart from the many new and complex tasks involved, the output was time dependent. Until that time, all computing had been undertaken in batches with the only driver for speed being the time it took to get the final result.
The project was run by Jay W. Forrester who realized that existing technology was not able to deliver information to the flight-simulator environment quickly enough to be useful. He also realized that it wasn’t processing power that was holding up the system; rather, it was the ability to access information from the archaic technologies in use at the time to store variables.
Forrester leveraged the work of An Wang, a physicist who was developing a technique to use magnetic fields to store individual bits of data. The high speed of this nonmechanical approach was exactly what Whirlwind needed. As a result of this collaboration, Wang’s core memory (referred to as core because each bit is held in a small magnetic core) became the standard form of memory until the 1970s when silicon memory manufacture took over.
Previous forms of computer memory had been so inefficient that the concept of data was limited to variables explicitly set by the programmer at the time of computation. There was no need for any relationship to be described between any of these discrete variables.
With the introduction of core memory, however, digital computers could move into the mainstream of industry. They became business as well as mathematical tools capable of handling clerical, data-centric functions such as banking account balances, retail stock control, and financial ledgers.
Once the computer moved out of the purely mathematical world, the handling of complex data became possible, driving even greater storage needs, which in turn spawned developments in both memory and computer disk technologies. This insatiable need for data drove technological development at such a dramatic pace that cofounder of Intel Gordon Moore wrote in 1965:
The complexity for minimum component costs has increased at a rate of roughly a factor of two per year … Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.1
This statement was later generalized into Moore’s Law and extended by others to support the ongoing doubling every twelve to eighteen months of all types of computer storage and processing capacity.
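The arithmetic behind Moore’s projection can be checked with a short sketch. The starting figure of roughly 64 components per circuit in 1965 is an assumption made here for illustration; it is not stated in the quotation above. The function name and parameters are likewise hypothetical.

```python
# Illustrative arithmetic only. The 1965 starting count is an assumption,
# chosen because doubling it yearly lands near Moore's quoted figure.
START_YEAR = 1965
START_COMPONENTS = 2 ** 6  # assumed: about 64 components per circuit


def components(year, doubling_period_years=1.0):
    """Projected component count under a fixed doubling period."""
    elapsed = year - START_YEAR
    return START_COMPONENTS * 2 ** (elapsed / doubling_period_years)


# Ten annual doublings from the assumed starting point.
projected_1975 = round(components(1975))
print(projected_1975)  # 65536, close to the "65,000" in the quotation

# A slower eighteen-month doubling period gives a much smaller figure.
slower_1975 = round(components(1975, doubling_period_years=1.5))
print(slower_1975)
```

Under these assumptions, ten annual doublings account for the 65,000 figure, and the gap between the two results shows how sensitive any long-range projection is to the doubling period chosen.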

STOCKS AND FLOWS

Economists deal with complex systems with elements that accumulate or reduce as a function of activity or time. The elements that accumulate are often referred to as stocks because they represent an amount that builds up (a stock) and can be drawn from in the future. A good example of a stock is wealth. To accumulate or reduce a stock, something needs to be added or removed. This process is called a flow. Spending money is a good example of a flow because it reduces the stock of wealth (see Figure 1.1).
Figure 1.1 Stock of Wealth Reduced by Flow of Spending
In the 1950s, the same Professor Jay W. Forrester who was central to the development of magnetic computer storage applied the principles of stocks and flows to create the discipline of system dynamics, which describes complex systems (of which the economy is a perfect example) by describing every element in terms of stocks or flows. The author has previously applied Forrester’s principles of system dynamics to data warehouse systems in particular (see Hillard, Blecher, and O’Donnell, “The Implications of Chaos Theory on the Management of a Data Warehouse”2). Chapter 9 introduces system dynamics and its application in more detail.
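The relationship between a stock and its flows can be sketched in a few lines. This is a minimal illustration of the wealth and spending example, not a system dynamics model; the function name and the numbers are hypothetical.

```python
def simulate_stock(initial_stock, inflow, outflow, periods):
    """Return the stock level at the end of each period.

    Each period the stock rises by the inflow and falls by the outflow.
    """
    levels = []
    stock = initial_stock
    for _ in range(periods):
        stock += inflow - outflow
        levels.append(stock)
    return levels


# A stock of wealth with no income, reduced by a constant flow of spending.
wealth = simulate_stock(initial_stock=1000, inflow=0, outflow=100, periods=3)
print(wealth)  # [900, 800, 700]
```

The same function describes any accumulating quantity, such as data held on a server, by treating new records as the inflow and archived or deleted records as the outflow.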
The Internet can be similarly described in terms of stocks and flows. Each server on the network accumulates information, while the routers direct the flow of information around the system.
Which is more valuable: the stock or the flow? Without the flow of the Internet, there is no ability to access information on individual servers. Without the stocks of information on the servers, there is no reason for the flow of the Internet to exist. Therefore, it can be said that stocks and flows are of equal value. To function, the Internet needs storage capacity and connectivity. Although the network technology for the information superhighway was available before the 1990s, the Internet did not come into existence until there were enough valuable stores of data that people wanted to access.

BUSINESS DATA

With the availability of practical technology for the storage of data, business enthusiastically adopted computing through the 1970s and 1980s; however, the cost of storage remained a substantial impediment to unfettered application and accumulation of business history. Computing historians can show this by many measures, but none is more dramatic than the Y2K problem. Companies building systems during these years were so concerned with conserving storage that they reserved only two characters for the year of any date (e.g., 1985 became 85).
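The consequence of the two-character year can be shown with a small sketch. The windowing rule below, with a pivot of 50, is one hypothetical remediation chosen for illustration; real systems used a variety of approaches and pivots.

```python
def years_elapsed_two_digit(start_yy, end_yy):
    """Naive subtraction on two-digit years, as a storage-constrained system might do."""
    return end_yy - start_yy


def expand_year(yy, pivot=50):
    """Hypothetical windowing rule: values below the pivot become 20xx, others 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy


# An account opened in 1985 and reviewed in 2000, both stored as two digits.
naive = years_elapsed_two_digit(85, 0)
corrected = expand_year(0) - expand_year(85)
print(naive)      # -85
print(corrected)  # 15
```

The naive result is negative because the century was never stored, so the calculation cannot tell 2000 from 1900.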
By the end of the 1980s, Moore’s Law began to catch up with the latent content generated by business. By the early 1990s, the price of semipermanent storage had reached the psychologically important threshold of US$1 per megabyte.
For the first time, business systems did not need to be so Spartan in deciding what data to keep. In fact, more and more programmers postponed the development of archive routines, knowing that Moore’s Law would outpace the growth in their databases. Of even more benefit, business analysts could now require the collection of data that was ancillary to the core transaction, building up a context for every business relationship. The business system had become a data repository of value.
The Internet had existed in some form for decades, with the foundations laid in ARPANET in the 1970s and widely used local area networks (LANs) in the 1980s. The network technology was robust, but public and business interest in applying it further was limited by the lack of content. To leverage the stock and flow metaphor, there was no demand for the flow of information in the absence of any significant stocks of data.
Low-cost storage enabled the stocks of data to build in business and the wider community. Gradually hubs of content built up with proprietary access, such as bulletin boards, AOL (America Online), and many other similar services. The networking technology was mature and so it was inevitable that it would standardize.
A useful comparison is the introduction of telephones at the end of the nineteenth century. Initially, the technology was applied to pairs or to small numbers of businesses that needed to connect several of their locations. Even though the technology initially had some minor differences between suppliers, there was a very quick jump to exchanges and then interfaces between different exchanges. Today, we consider it historically inevitable that the telephone would quickly standardize to one network across the globe.

CHANGING BUSINESS MODELS

Historically, business has been heavily decentralized. A very good and illustrative example is the banking industry in which a bank branch manager in the 1970s and 1980s had considerable executive authority and prestige. The advent of centralized information has allowed the head office to take over the day-to-day running, approval, and review of transactions, ultimately leading to today’s branch manager generally having a greatly reduced role and responsibility.
Access to complex information covering all aspects of business has coincided with a tectonic shift to centralized power and control in almost every industry sector, from retail through manufacturing, logistics, telecommunications, and financial services. Of course, one of the problems of this approach is the ability for small head-office errors to be magnified many times. An error in a ledger at a branch is limited to a small part of the business. A centralized error can be a material proportion of the business.
Robin Morgan, a feminist writer, once said that “Information is power.” Armed for the first time with masses of information, head-office business executives have wielded previously unimaginable power, taking over not only broad strategy but the minutiae of transaction review and approval. Morgan’s hypothesis was that those armed with information are tempted to conceal it from others and use it to exercise control. Many staff in large organizations today regularly complain about their access to information and the lack of discretion they are permitted in the fulfillment of their jobs. The excuse most commonly given for the concealment of information is market regulation (such as the prohibition of insider trading) or commercial sensitivities (such as those used by government to avoid disclosing dealings with the private sector).
It is worth considering whether the reason some information is hidden from wider view may be due to a lack of confidence in its quality. This is particularly relevant if published results are derived from the detail and there could be a genuine fear that independent analysis (even within their own ranks) of the data could yield different and challenging results.
The question that any organization needs to ask itself is whether it is using information to create the most dynamic, responsive, and adaptable enterprise possible or to satisfy the need for power of a privileged few.

INFORMATION SHARING VERSUS INFRASTRUCTURE SHARING

Companies, like any social network, gain scale because there is an advantage to their constituent parts. Companies, like countries, break apart when the constituent parts are able to realize more value without the parent entity.
During the majority of the twentieth century, conglomerates formed with the express purpose of providing back-end and management scale. By being part of the one entity, constituent businesses were able to share capital, administration services, logistic hubs, office space, and other traditional infrastructure. Business trends through the last decades have created third-party services that can provide such facilities more effectively and usually more cheaply than in-house equivalents.
The growth of superannuation and other pension funds has created cash box investments looking to provide working capital for high-growth business.
Large-scale services firms have standardized the provision of administrative services such as payroll, accounts, and even more hands-on services such as call centers.
The privatization of traditional postal services has combined with much more entrepreneurial transport businesses to provide outsourced warehousing, distribution, and global integration at unit costs that are less than anything available to even the largest conglomerate.
Commercial office space is much more commoditized, serving a mobile workforce that expects facilities in the location or locations where they choose to work rather than an employer that requires them to travel daily to a supercampus.
In short, the infrastructural reasons for conglomerate businesses to exist have been dramatically reduced over time and the capital markets punish companies that have failed to realize this.
There is, however, a new and even more powerful reason for conglomerate companies to exist. While they are more complex to manage than their simplified competitors, they also have access to equally complex data about their stakeholders and operations. To justify its existence, a conglomerate cannot rely on back-end infrastructure sharing; rather, it must be able to demonstrate that it is generating growth and cash flow through active sharing of information between every division of its constituent businesses. It can only demonstrate this effectively to its stakeholders by measuring the equivalent of gross domestic product (GDP) in the terms of its own internal information economy.
There is no better example of this than the attempts by media companies, such as Rupert Murdoch’s News Corporation, to establish their role in the information economy. Small media companies see the Internet as an opportunity to get their product to market without needing expensive infrastructure. Large companies like News Corporation need to find a way to use their extensive content to aggregate more effectively and offer consumers a product for which they are prepared to pay a premium.

GOVERNING THE NEW BUSINESS

Like information economy, the term information governance has been misused and misunderstood. Most organizations, pressed by regulatory compliance or other oversight, have introduced some form of information governance, but in general it is seen as a committee-based audit process resulting in some score and identification of issues to be resolved.
Human review and intervention are seldom sustainable without permanent intervention by an outside authority. Even when this happens, in the absence of a crisis, the review becomes superficial and compliance driven.
To use information to achieve business outcomes, organizations need to motivate their staff to use information for the greater good of the organization rather than for individual gain or power. Using Forrester’s stocks and flows metaphor of the enterprise, it would appear that if the only use of information is to cement power, then information will naturally flow into a few locations without natural dissemination to the wider enterprise.
Centralized and mandated initiatives seldom work, with most economists agreeing that groups will seek to serve the greater good only when there is a currency that they are exchanging and that results in some type of personal gain (even if it is only in terms of credit or well-being). For this reason, the business that seeks to model itself to achieve its business goals must assign value for information and, even more important, a currency to recognize its exchange. Information is neither free nor unlimited.
It is the role of information governance to track the creation of information, understand the value it provides to the organization, reward its sharing, and understand its depreciation through use or time. It should come as no surprise that many of the activities of information governance are founded in economics and the management of the information economy.
Information governance and information management are sometimes confused. Information governance is concerned with supervising and motivating information activities without necessarily accessing the content. Information management describes the activities themselves and involves directly interacting with the information materials.
Chapter 3 tackles this challenge in detail, including using the concept of an “information currency” as a way to challenge existing business models and more effectively leverage the information asset.
While it is simple to understand, monetary value is often not enough to reward information exchange. Information budgets (a little like the carbon credits proposed in response to global warming) allow groups to become experts in the generation of relevant content. Breaking the budget into strategic categories allows the company to build a balanced approach to its business goals and encourages exchange between departments to meet targets in each of the relevant categories.
Unfortunately, the internal information economy is too complex for a small number of rules to be applied universally across the enterprise. Markets are required at the individual product line level. For instance, the sharing of data about a customer across product groups (such as is found in telecommunications or financial services companies) requires a benefit to flow between them. Since the objective is to reward further business, such a motivation could be achieved by aligning permitted customer discounts to the sharing of detailed customer data. (Information governance and information currency are described further in Chapter 3.)

SUCCESS IN THE INFORMATION ECONOMY

If the reader is willing to accept the premise that content is more important than transmission (i.e., stocks are more important than flows), and considers carefully if information is being used to better the enterprise rather than control it, then it is possible to begin to look at organizational success in terms of the information economy.
The first step is to understand what success actually means. It isn’t obvious because every organization has its own objectives. For a government enterprise, success is usually defined in terms of service or public good. For a company or other type of firm, success has to be aligned to strategic goals, such as positioning for future growth, extracting maximum cash flow from an asset, or responding to disruptive competitive events in the market. In each of these cases, information is critical, but how it is used will differ. When the enterprise’s business goals are understood, decisions can then be made about how information should be used, as shown in the following examples.
The company that is trying to maximize cash flow, for instance, is likely to put a high priority on discipline and will not foster innovation at the frontline (after all, the main obstacle to maximizing cash is its diversion to new initiatives). This type of organization will usually use information most effectively to drive centralized control by a small number of business executives.
The government enterprise that is seeking to maximize its stakeholder (public) service will often seek to use information to empower individual line service staff to make the right decision for their direct client while at the same time monitoring compliance with government policy and good budgetary discipline.
The firm that is seeking to differentiate through innovation will try to maximize its business talent pool by creating a culture of collaboration across the business that is not dependent on hierarchy. In a meritocracy, business leaders must be prepared to promote initiatives that they find neither intuitive nor comfortable but that are thoroughly examined through modeling and peer review.
Each of the preceding examples is a generalization and represents only a subset of the possible permutations of business need and the application of information. If the reader understands how information should be applied to achieve his or her strategic goals within an organization, then the next question to ask is how these principles can be introduced and governed. The answer is a properly structured internal economy based on an information currency and appropriate governance.

NOTES

1 Gordon E. Moore (1965), “Cramming More Components onto Integrated Circuits,” Electronics Magazine, 38(8).
2 R. Hillard, P. Blecher, and P. O’Donnell (1999), “The Implications of Chaos Theory on the Management of a Data Warehouse,” Proceedings of the International Society of Decision Support Systems (ISDSS).

Chapter 2
The Language of Information
The study of any subject has to start with an introduction to the vernacular of the discipline. In chemistry, this means understanding the periodic table; in mathematics, the language of algebra; in accounting, the meaning of price-to-earnings ratios, amortization, depreciation, and so on.
In the discipline of information management there are still many different ways of describing information, including the meaning of terms like metadata and document. Because much of the field has developed rapidly in response to new technologies, there has been no opportunity for a consensus on definitions, terms, and language to develop.
One of the more succinct definitions of information has been suggested by Robert M. Losee1 and will be discussed in more detail in Chapter 6:
Information is produced by all processes and it is the values of characteristics in the processes’ output that are information.
Without a common definition, practitioners who work with information in disciplines as diverse as computer science, communications, and library management don’t have a linguistic foundation to support important discussions around the content that overlaps all of their areas of expertise. Over time, many aspects of the language associated with information management need to be standardized by professional consensus.
The current lack of a common language is a significant issue faced by the information management profession. Because practitioners have few standards they can use when discussing information concepts, there is little cross-pollination of ideas among the different domains of information management. Compare this to the field of accounting, in which the same principles are applied across industries and accounting specializations.
In information management, there have been some ambitious and, in a couple of cases, very successful attempts to provide languages in specific domains. However, these efforts are largely immature, and the field is waiting for practitioners to reach consensus, which takes time.
There is a wide range of stakeholders in such a language. The librarian profession is responsible for information storage and retrieval for both corporate clients and the general public. Communications engineers develop solutions that transmit messages between machines and people. Data modelers design database structures that hold operational and analytical data for many corporate applications. Chief data officers and data stewards manage the corporate information asset on behalf of the business. Chief information officers and technology managers look after the computer systems that store and retrieve the data. Knowledge managers act as corporate information coaches, helping organizations to realize the breadth of their capabilities and reduce their dependencies on individuals.
Before these widely disparate groups of professionals can agree on a common language, they must establish a foundation for how to differentiate data, information, and knowledge. At present, there is not even agreement about whether the word data is singular or plural, with popular and academic use differing in some countries.
The word data is derived from Latin and is the plural of datum. The word datum has a long heritage in the English language and continues to be used by many disciplines, such as surveying and engineering, to mean a reference point. This established meaning appears to be the historical reason that datum is seldom used in the context of information management, and to avoid confusion it is advisable not to use it there.
The debate about whether data should be treated as plural or singular is ongoing. Language scholars appear to have a preference for the plural form, for example:
These data were retrieved from the computer.
Such an approach is often used in general language in the United States, but it appears to be less common in countries that derive their English more directly from Britain. Overall, the most common use appears to assume that data is a singular mass noun, similar to water; thus the following two sentences have the same structure:
The water was retrieved from the bucket.
The data was retrieved from the computer.
Given that information is a singular mass noun, and that treating data the same way is consistent with the growing consensus, it is likely that the singular form will prevail.
The use of the word knowledge in the context of information usually refers to codifying interpretations of data. Knowledge is generally subjective, interpretive, and depends on the experiences of the organization and the individual.
The discipline of Knowledge Management, which is definitely a branch of Information Management, distinguishes two types of knowledge: tacit and explicit. Tacit knowledge is commonly understood and often regarded as obvious, but it is difficult to describe in a prescriptive manner. Explicit knowledge, on the other hand, is clear and often has parameters that can be recorded. For example, a group of sales staff may exercise their judgment on when to offer a discount to potential customers. The judgment they make is not documented and is based on their tacit knowledge of when it will have the most impact on the potential sale. In another organization, the decision on discounting might be clearly documented and applied based on specific volume thresholds; in this case, the knowledge of when to apply the discount is described as being explicit.
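The difference can be made concrete with a small sketch. Explicit knowledge of the kind described above is precisely what can be written down as a rule, whereas the sales staff's tacit judgment has no such form. The thresholds, rates, and function name below are hypothetical and chosen only for illustration; they do not come from any real discounting policy.

```python
# Hypothetical volume thresholds (minimum units ordered, discount rate).
# The figures are illustrative assumptions, listed from largest to smallest.
DISCOUNT_THRESHOLDS = [
    (1000, 0.10),
    (500, 0.05),
    (100, 0.02),
]


def explicit_discount(units_ordered: int) -> float:
    """Return the documented discount rate for an order of the given size."""
    for minimum_units, rate in DISCOUNT_THRESHOLDS:
        if units_ordered >= minimum_units:
            return rate
    # Orders below the smallest threshold receive no discount.
    return 0.0


if __name__ == "__main__":
    for units in (50, 100, 750, 1000):
        print(units, explicit_discount(units))
```

Because the rule is recorded as data, it can be reviewed, audited, and changed without relying on any individual's experience, which is the practical benefit of making knowledge explicit.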
Figure 2.1 Wisdom or Knowledge Pyramid
During the 1980s, computer-based trading became popular. Some investment funds promoted themselves as using such an approach exclusively. One of the reasons it was attractive to investors was that the decision-making process of fund managers has often been based on tacit knowledge, which is difficult to quantify. By moving to a rules-based algorithm, the investment knowledge became explicit. Automated algorithms are still used to a large extent today, but they are almost always paired with the tacit knowledge of an expert, as the global stock market crash of 1987 taught investors that there is significant value in tacit knowledge that is too complex to codify as explicit knowledge.
Conventional knowledge management definitions imply that information is derived from data and that knowledge is derived from information. This is sometimes described in terms of a pyramid, as shown in Figure 2.1. As shown in this diagram, some practitioners have gone further and are defining wisdom in terms of its derivation from knowledge. It isn’t known who first drew the pyramid in this way, but it has been popular for many years.
Such a pyramid could be used, for instance, to illustrate the decision-making process used by a department store to buy next season’s fashions. In this example, the raw data could correspond to the retail sales transactions associated with existing merchandise. The information might be this data summarized into sales performance by color and style. Explicit knowledge, based on this information, can be the apportionment of sizes based on the sales trend information. Finally, wisdom can take many forms, including insight into whether it is possible to draw conclusions on next year’s fashion demand based on this year’s sales.
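The lower layers of the department store example can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions rather than a description of any real retail system: the transaction records, field order, and function names are all hypothetical. It treats the transactions as raw data, a summary by color and style as information, and a size apportionment rule as explicit knowledge.

```python
from collections import Counter, defaultdict

# Raw data: hypothetical retail sales transactions as (color, style, size).
transactions = [
    ("red", "dress", "S"),
    ("red", "dress", "M"),
    ("red", "dress", "M"),
    ("blue", "dress", "M"),
    ("blue", "shirt", "L"),
    ("blue", "shirt", "L"),
    ("blue", "shirt", "M"),
    ("red", "shirt", "S"),
]


def sales_by_color_and_style(rows):
    """Information: units sold for each (color, style) combination."""
    return Counter((color, style) for color, style, _size in rows)


def size_apportionment(rows):
    """Explicit knowledge: the share of each size to buy within each style."""
    per_style = defaultdict(Counter)
    for _color, style, size in rows:
        per_style[style][size] += 1
    return {
        style: {size: count / sum(sizes.values()) for size, count in sizes.items()}
        for style, sizes in per_style.items()
    }


if __name__ == "__main__":
    print(sales_by_color_and_style(transactions))
    print(size_apportionment(transactions))
```

Nothing in the sketch corresponds to the top of the pyramid. Whether this year's proportions say anything about next year's demand is a judgment that the code cannot make.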
It is interesting that T.S. Eliot appears to have anticipated this discussion more than half a century beforehand:
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?2
While the inclusion of wisdom is very appealing, it is difficult to describe in a meaningful way and little, if any, direct benefit has been gained from having it in this type of model. For that reason, the loose concept of wisdom is not widely regarded as being a legitimate component of information management.
The relationship between knowledge and information is useful. It provides an explicit description of the role of both tacit and explicit knowledge in realizing the economic benefit of the information asset. The relationship between information and data is much more troubling given the lack of a clear differentiation between the two concepts.
Most people tend to talk generically about information when referring to anything that informs, whether it be a raw set of numbers or a spreadsheet document with an advanced level of interpretation. It would be extremely arrogant for a profession that has not yet settled its own scope or definition to try to mandate a change to the popular usage of the term information.
Broadly, popular usage appears to treat data as a set of numbers or a very unprocessed list of textual items. Information is an umbrella that includes data together with all documents, Web pages, and anything else that is absorbed by the senses through a computer interface. Although it is implied, there is no requirement in any widely accepted definition that information be derived from data.
Although both data and information are generally mass nouns, there are still many different ways of describing specific collections of data and information. In statistics, such a collection is called a set. In database theory, a logical grouping is called an entity and a physical grouping is called a table. In its raw form, when extracted from a table, the grouping becomes known as a data set. In content and knowledge management, the most common grouping is a document. In communications, engineers think about data being combined into messages.

STRUCTURED QUERY LANGUAGE

There is a way of phrasing questions about data contained in structured databases. It is called structured query language (SQL) and is officially pronounced es queue el3