Everyone in this audience is familiar to some extent with the basic concepts of information privacy and security. The diagram below captures the essence of these issues.
Figure 1: The Essential Elements of Information Privacy
Legal and regulatory environments, quite rightly, are of utmost concern to privacy professionals because they shape the debate about what companies and individuals must do to ensure compliance. Law also drives many administrative decisions about how to implement processes that protect customer information, ensure proper governance, and guard the corporate brand.
The final side in this triad is the technology that puts privacy and security into action. The technology can be as diverse as programs, storage, transmission protocols, and access controls. One of the most important technical considerations is the way that sensitive information is modeled, stored, and archived in enterprise databases. This is an area that offers many opportunities for both intentional and accidental breach of law and procedure.
One of the greatest challenges at the data level is that most companies developed and implemented large numbers of databases before there was a common understanding of what privacy requires. At most companies, privacy controls have been retrofitted or added on to these databases without a fundamental rethinking of the overall requirements.
In most companies, sensitive customer data is spread across many departmental databases such as those in marketing, sales, finance, order processing and service. In government organizations, sensitive data may reside in multiple databases, in many departments, within diverse agencies. Integrating, storing and archiving this information can be a never-ending technical challenge.
A strategic question for many companies is how and where to store sensitive information and the privacy preferences that determine how it may be used. In practical terms, however, these two kinds of information are very different. A privacy profile may be required to determine how to access sensitive information, but is not, in itself, particularly sensitive. A customer's telephone and credit card numbers are highly sensitive and must be carefully protected. The fact that he or she does not wish to be contacted at home by telephone is much less sensitive because it is much less useful to another party. Steal a person's Social Security number and you have the keys to the financial kingdom; steal a privacy profile and you experience only a slight frisson of naughtiness.
One key idea in privacy modeling is a "privacy profile of record" that consolidates customer preferences, codifies company responsibilities, and can be shared as required across various databases. Defining such a standard can be difficult. In the U.S., for example, the sectoral approach to privacy has meant that companies think about this issue in terms of communications through a specific channel (telephone, direct mail, retail store, etc.), to a certain demographic (age, geography, citizenship, etc.), at a specific time (point of sale, before 8 pm, when a contract term expires, etc.), for a specific product or service (credit card, pharmaceutical, toy, etc.). Each of these dimensions may be regulated separately and be supported by a unique company database. Taking a holistic view of privacy goes against years of corporate practice.
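As a minimal sketch of what a privacy profile of record might look like in code, the following Python fragment models preferences along two of the dimensions mentioned above (channel and product). All names here (Preference, PrivacyProfile, customer_key, the "*" wildcard) are hypothetical, and the default-deny rule is an assumption rather than a legal requirement; a real profile would also need to cover the demographic and time dimensions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Preference:
    """One stated preference (hypothetical structure)."""
    channel: str           # e.g. "telephone", "direct_mail", "email"
    product: str           # e.g. "credit_card"; "*" means any product
    allowed: bool          # True if contact is permitted
    captured_at: datetime  # when the preference was recorded


@dataclass
class PrivacyProfile:
    """A consolidated profile of record for one customer."""
    customer_key: str  # assumed to be a non-sensitive surrogate key
    preferences: list[Preference] = field(default_factory=list)

    def is_allowed(self, channel: str, product: str) -> bool:
        # Collect preferences that match the channel and either the
        # specific product or the "any product" wildcard.
        matches = [
            p for p in self.preferences
            if p.channel == channel and p.product in (product, "*")
        ]
        if not matches:
            return False  # assumption: no recorded preference means no contact
        # Assumption: the most recently captured preference wins.
        return max(matches, key=lambda p: p.captured_at).allowed


profile = PrivacyProfile("c-1001", [
    Preference("telephone", "*", False,
               datetime(2005, 1, 10, tzinfo=timezone.utc)),
    Preference("email", "credit_card", True,
               datetime(2005, 3, 2, tzinfo=timezone.utc)),
])
print(profile.is_allowed("email", "credit_card"))
print(profile.is_allowed("telephone", "toy"))
```

Keeping the profile keyed on a single customer identifier is what allows it to be shared across the channel-specific and product-specific databases described above.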
Database professionals often use an approach called dimensional modeling to develop analytical databases (for data warehouses, decision support systems, business intelligence, etc.). The following discussion uses this modeling terminology to look at some key decisions in implementing a privacy-sensitive data model. Throughout this article, we will consider a marketing database as an example of such a model. Other corporate databases have different design centers and functionality, but most of the following considerations remain relevant.
Figure 2 illustrates some of the common security and privacy issues in any customer communication in a marketing context.
Figure 2: Common Database Privacy Issues
At a fundamental level, privacy concerns inbound and outbound communications between a company and a current or prospective customer. Inbound communications from a customer (or data subject) may involve many things. One of these may well be a requested change to the customer's privacy preferences. In light of current privacy laws, these changes must be made quickly, and in most cases, they must be confirmed to the customer.
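The two obligations described above (apply the change quickly, then confirm it to the customer) can be sketched roughly as follows. This is an illustration only: PreferenceStore, apply_change, and the pending_confirmations queue are hypothetical names, and an in-memory dictionary stands in for whatever database actually holds the preferences.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PreferenceStore:
    """In-memory stand-in for a privacy preference table (hypothetical)."""
    # (customer_key, channel) -> True if contact is allowed
    preferences: dict = field(default_factory=dict)
    # Changes that still need to be confirmed back to the customer
    pending_confirmations: list = field(default_factory=list)

    def apply_change(self, customer_key: str, channel: str,
                     allowed: bool, received_at: datetime) -> None:
        # Apply the requested change immediately...
        self.preferences[(customer_key, channel)] = allowed
        # ...and queue a confirmation so the customer can verify it.
        self.pending_confirmations.append(
            (customer_key, channel, allowed, received_at)
        )


store = PreferenceStore()
store.apply_change("c-1001", "telephone", False,
                   datetime(2005, 4, 1, 9, 30, tzinfo=timezone.utc))
print(store.preferences)
print(len(store.pending_confirmations))
```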
Outbound communications, on the other hand, must check known privacy preferences to ensure compliance with the data subject's wishes. Inbound and outbound communications are often related (an inbound customer response to an outbound marketing offer, for example) as the diagram illustrates. Companies often think about checking a customer's privacy profile before sending an outbound communication. However, they less frequently scan inbound communications for changes the customer has requested to his or her privacy preferences.
Checking for permission to send an outbound message is not all that must be done. Security is also an issue. Many customers (both individuals and companies) have preferences or requirements about the method of communication. This is not simply a matter of accepting communications by email but not telephone; it also involves the message protocol. Is it encrypted or clear text? Can the message be sent to a work address? Is a return receipt permissible? Can it be sent to a mobile telephone? Privacy preference and security configuration go hand-in-hand in most customer communications.
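One way to picture the combined check is a single function that vets an outbound message against both the privacy preference (which channels are permitted) and the security configuration (encryption, work address, return receipt). The sketch below is a simplified illustration; ContactRules, OutboundMessage, and violations are hypothetical names, and real rules would likely vary by product and jurisdiction as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactRules:
    """Privacy and security rules for one customer (hypothetical)."""
    allowed_channels: frozenset
    require_encryption: bool = False
    allow_work_address: bool = True
    allow_return_receipt: bool = True


@dataclass(frozen=True)
class OutboundMessage:
    channel: str
    encrypted: bool
    to_work_address: bool = False
    return_receipt: bool = False


def violations(msg: OutboundMessage, rules: ContactRules) -> list[str]:
    """Return every rule the message would break; an empty list means OK."""
    problems = []
    if msg.channel not in rules.allowed_channels:
        problems.append(f"channel '{msg.channel}' not permitted")
    if rules.require_encryption and not msg.encrypted:
        problems.append("encryption required")
    if msg.to_work_address and not rules.allow_work_address:
        problems.append("work address not permitted")
    if msg.return_receipt and not rules.allow_return_receipt:
        problems.append("return receipt not permitted")
    return problems


rules = ContactRules(frozenset({"email"}),
                     require_encryption=True,
                     allow_work_address=False)
print(violations(OutboundMessage("email", encrypted=True), rules))
print(violations(OutboundMessage("telephone", encrypted=False,
                                 to_work_address=True), rules))
```

Returning the full list of violations, rather than stopping at the first, is a design choice that makes it easier to compile the compliance reports discussed elsewhere in this article.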
At the database level, privacy and security are not simple entities or lists of facts. Figure 3 illustrates some of the typical dimensions of a privacy profile.
Figure 3: Typical Dimensions of a Privacy Profile
At the bottom of the diagram are the obvious aspects of privacy: the customer's preferences, the date and time they were captured and last updated, and compliance information that must be compiled for corporate and legal governance reports. Moving up the diagram illustrates some of the growing complexity of managing privacy information.
There are types of privacy requirements that may vary by communications channel, product, or geographic region. Many channels have evolved elaborate means for maintaining privacy requirements (the Platform for Privacy Preferences Project, or P3P, standards being developed by the World Wide Web Consortium, for example). Standards for other communications channels may be less advanced or inconsistent based on the disparate legal requirements in different geographic regions.
Since a customer may reside in one area, but be governed by the laws of another jurisdiction, it is often important to store not just a mailing address but also the legal country of residence. Determining the relevant legal jurisdiction for an email communication from the U.S. to a British citizen living in Australia and using a mailbox hosted in Canada may provide much fruitful work for the legal community, but it also involves a set of complex choices for the database designer.
A similar situation exists in the dimension of interested parties. The privacy profile may include parental information for a customer who is a minor; it may include a key to patient history or health provider information; and it certainly may include information about government, regulatory, or law enforcement agencies.
People's preferences change over time, and their memories are not necessarily accurate indicators of what they previously authorized. Knowingly or not, many companies maintain multiple, conflicting profiles about customers. One may grant the company wide discretion to communicate, another may retract such permission, and others may restrict contact to a specific channel. The time stamps on such data may overlap and the customer's intentions may not be clear.
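A simple way to surface such conflicts is to group the records from every source system by channel, pick a provisional answer, and flag any channel where the sources disagree. The sketch below assumes a "most recent record wins" rule purely for illustration; because the customer's intentions may not be clear, the flagged channels are the ones that would need follow-up. PreferenceRecord and reconcile are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class PreferenceRecord(NamedTuple):
    source: str            # originating database, e.g. "marketing"
    channel: str           # e.g. "email", "telephone"
    allowed: bool          # True if contact is permitted
    captured_at: datetime  # time stamp on the record


def reconcile(records):
    """Return (resolved, conflicts).

    resolved maps each channel to a provisional answer (assumption:
    the most recently captured record wins); conflicts is the set of
    channels where source systems disagree.
    """
    by_channel = defaultdict(list)
    for record in records:
        by_channel[record.channel].append(record)

    resolved, conflicts = {}, set()
    for channel, recs in by_channel.items():
        latest = max(recs, key=lambda r: r.captured_at)
        resolved[channel] = latest.allowed
        if len({r.allowed for r in recs}) > 1:
            conflicts.add(channel)
    return resolved, conflicts


records = [
    PreferenceRecord("marketing", "email", True, datetime(2004, 5, 1)),
    PreferenceRecord("service", "email", False, datetime(2005, 2, 1)),
    PreferenceRecord("sales", "telephone", False, datetime(2003, 1, 1)),
]
resolved, conflicts = reconcile(records)
print(resolved)   # {'email': False, 'telephone': False}
print(conflicts)  # {'email'}
```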
By default, most companies have distributed such privacy information across multiple tables in multiple databases. The same is true of most sensitive information. The question is: should they continue to do so? The following chart describes some of the tradeoffs in deciding whether to distribute or consolidate this kind of information.
Figure 4: Criteria Concerning Centralizing or Distributing Sensitive Information
As is often the case in technology, developers must balance performance, security, ease of use, and maintenance. Clearly there is no universal recommendation to be made. One way to look at the issue is to determine what kind of privacy breach is the most probable. This frequently comes down to whether the greatest privacy risk lies in accidental or intentional disclosure of sensitive information. All companies face a risk of accidental disclosure. In some industries (banking and government are two obvious examples), there can also be a high risk for intentional or malicious actions.
When accidental disclosure is the principal concern, a centralized approach is preferred. Keeping sensitive data in a small number of dedicated tables or repositories simplifies updates and maintenance. However, it also creates a prime target for attacks.
Encrypting information, not just during transmission, but also on dynamic storage devices such as disks, is expensive in terms of computing cycles. It is also a logical way to prevent unwanted access and exposure of sensitive information. Is it less expensive to encrypt a small number of specialized tables or to encrypt information that is distributed across multiple tables? Again, there is no universal answer. Such a decision is closely tied to the usage model. Is there a high volume of accesses that are highly structured and repetitive? Is the use model more unpredictable and unstructured?
One issue that is typically not open to debate is the kind of keys that are used to access tables that contain privacy details or sensitive data. The key itself should not be sensitive information such as a Social Security number, contact name, etc. In these kinds of applications, it is best to use surrogate keys or another unique identifier that is neither a piece of sensitive information nor a derivative of it.
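As a rough illustration of the surrogate key idea, the sketch below assigns each customer a random identifier that carries no information about the sensitive value it replaces. SurrogateKeyRegistry and key_for are hypothetical names; uuid.uuid4 comes from the Python standard library and generates a random UUID, so the key is not a derivative of the natural identifier (unlike, say, a hash of it). Note that the lookup table itself still holds the sensitive value and would need the same protection as any other sensitive table.

```python
import uuid


class SurrogateKeyRegistry:
    """Maps a sensitive natural identifier to a random surrogate key."""

    def __init__(self):
        # natural identifier -> surrogate key; in practice this mapping
        # would live in a tightly controlled table, not in memory.
        self._by_natural = {}

    def key_for(self, natural_id: str) -> str:
        # Issue a new random key the first time an identifier is seen,
        # and return the same key on every later lookup.
        if natural_id not in self._by_natural:
            self._by_natural[natural_id] = uuid.uuid4().hex
        return self._by_natural[natural_id]


registry = SurrogateKeyRegistry()
key = registry.key_for("123-45-6789")  # fictitious example value
# Other tables store only `key`, never the sensitive identifier.
print(key == registry.key_for("123-45-6789"))  # True: stable per customer
```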
As we have seen, managing privacy preferences can be a daunting task, both logistical and technical, but it is only one thing the database designer must consider. Figure 5 illustrates some of the common security issues concerning storage and communication of this kind of information.
Figure 5: Typical Dimensions of a Security Configuration
A security configuration typically contains complementary dimensions about data subjects. These concern how the data is stored, how authorized users are authenticated, who may access it, how its integrity is maintained, and how and to whom it may be transferred.
Highly sensitive personal information like Social Security number, credit card, and order history is typically transmitted, and usually archived, in encrypted form. This is less likely to happen when such data is stored on disk. As discussed earlier, encryption at this level can decrease performance, and many companies prefer to invest in finer-grained access controls to limit unauthorized use.
Access controls are the topic for another discussion. Even the most sophisticated physical, procedural and electronic authentication and access controls cannot guarantee total security. This is particularly true given the liquidity of information in the typical company. Sometimes, even the best controls cannot overcome the habits of careless or malicious employees.
Privacy and security controls may be built into the original procedures and systems where data was captured and stored, but corporate data has a noticeable leakage factor over time. Personally identifiable information and sensitive data often find their way into secondary systems like a corporate data warehouse, a business intelligence tool, or a sales force automation system, and can spread in summary form through many systems and departments. Maintaining the confidentiality and integrity of personal data is not a static, batch process. The company must be ever vigilant about data migration and the institutional amnesia that often arises as data ages.
While the technical issues that concern data privacy and security can be daunting, inaction is not an option. Regardless of company size, industry, geography or business model, there are some obvious starting points.
First, it is important to inventory the sources of and repositories for sensitive data and the customer's stated privacy preferences. These are often held in a variety of databases across many departments, and they may conflict. Many European companies have completed such a process in preparation for the implementation of the European data directive. Similarly, any U.S. company that has undergone a process of self-certification for a Safe Harbor listing with the Department of Commerce will be very familiar with this investigative process. One of the key concerns at this stage is to document all assumptions, processes and decisions. In the event of a legal inquiry, such an audit trail will prove essential.
Second, in most cases, it is important to define a privacy profile of record that is the company's best estimate of the customer's true intentions. As discussed earlier, this can sometimes be a challenge. Where security permits, the benefits of a centralized approach are compelling. However, when dealing with privacy data and other sensitive information, security is the ultimate trump card. Whatever the decision, the customer should be contacted to confirm the validity of his or her privacy record.
Third, after these records have been standardized, all communications to a customer must be vetted against the profile. Similarly, all inbound communications must be checked for any change requests to the privacy record.
Fourth, there is the issue of how and where privacy and security information should be stored in a database. Our recommendation is typically that there be a common privacy repository to which other systems have access. This creates a central holding place for privacy data that is the company's best record of customer preferences. In many large companies, it is impractical to hold such information in a single, physical database. The same is true of sensitive information. The goal here is typically to consolidate information, develop detailed access controls and encrypt as necessary. In both cases, the repository may consist of several federated databases or may reside in a data warehouse from which other databases receive regular feeds.
Finally, a strategic look at sensitive data assets should include a discussion of when and how to delete such information. Some regions, such as the EU, have begun to implement regulations that limit data storage to the term of the original business purpose. In other words, if a company captures customer information through a specific campaign or product purchase, such data should be deleted when it is no longer needed (at the end of a warranty period or service contract, for example). Other laws, such as the EU Data Retention Directive, mandate that Internet service providers and telephone and mobile telephone operators keep records of their customers' communications for up to two years.
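The tension between these two kinds of rules (delete when the purpose ends, but keep when the law requires) can be sketched as a simple date comparison. In the example below, StoredRecord, purpose_ends, mandated_until, and due_for_deletion are hypothetical names, and the rule that a mandated retention period overrides purpose-based deletion is an assumption made for illustration, not a statement of what any particular law requires.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StoredRecord:
    customer_key: str
    purpose: str                           # e.g. "warranty"
    purpose_ends: date                     # end of the business purpose
    mandated_until: Optional[date] = None  # legally required retention, if any


def due_for_deletion(records, today: date):
    """Return records whose business purpose and mandated retention
    (if any) have both expired as of `today`."""
    due = []
    for record in records:
        keep_until = record.purpose_ends
        # Assumption: a legal retention mandate extends the keep date.
        if record.mandated_until and record.mandated_until > keep_until:
            keep_until = record.mandated_until
        if today > keep_until:
            due.append(record)
    return due


records = [
    StoredRecord("c-1", "warranty", date(2006, 1, 31)),
    StoredRecord("c-2", "service_contract", date(2006, 1, 31),
                 mandated_until=date(2007, 6, 30)),
    StoredRecord("c-3", "campaign", date(2006, 12, 31)),
]
due = due_for_deletion(records, today=date(2006, 6, 1))
print([r.customer_key for r in due])  # ['c-1']
```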
Privacy law is complex and evolving rapidly. There has been much discussion in the U.S. recently about passage of a national privacy policy that codifies and supersedes existing law. However, waiting is not an option. Regardless of the specific legal or regulatory environment, many technical investments (such as those described here) can be made immediately. A single breach of acceptable practice can cost the company millions of dollars in legal penalties and brand damage.