JOINT MEETING

 

                              OF

 

             THE FORUM ON TECHNOLOGY & INNOVATION

 

                              AND

 

                  ALLIANCE FOR HEALTH REFORM

 
This transcript was produced from tapes provided by The Forum on Technology & Innovation.


                     P-R-O-C-E-E-D-I-N-G-S

              SENATOR ROCKEFELLER:  ‑‑ through what is a little bit of history here, because this is the first time we've ever had a joint meeting of our Alliance for Health Reform and our IT Forum.  And it's appropriate.  Hopefully, it will happen again; I suspect it will.  And this is an absolutely terrific group of people.

              We're going to be talking about medical privacy, genetics, and the impact of technology.  Obviously, that brings the Alliance and the IT Forum, which are both 501(c)(3)s, together in a way that it should.  And as technology spreads in both directions, as I said, I think we're probably going to be doing more of this in the future.

              Bill and I, or Senator Frist and I, Chair both of the groups, so we're quite comfortable here working together.  Our rules, for those of you who have not been here before, are that we take no, nor do the Forums take any point of view on any issue.  Senator Frist doesn't, I don't.  We simply facilitate.  And then we bring in speakers who have different points of view.  They present those different points of view, and then the whole burden is upon you through questioning, the green cards that you have, microphones if we have them ‑‑ there's a couple in the back ‑‑ is to challenge, is to go for it, just to probe them.

              And again, for those of you who haven't been to this before, our philosophy underlying this is that the information best come from staff, because staff are most likely to understand, and then will get aggressive with their senators and congressmen, force them to engage themselves more in the issues.

              So, you're the audience; you're who we care about; you're why both of these groups are formed, and we're very happy about it.

              Today's topic has not been addressed, really, with major legislation.  In 1996, we even set a deadline to force the creation of medical privacy rules, but we haven't done anything about it.  We did that through HIPAA, but we haven't done anything about that.  We haven't taken any action in terms of medical privacy.

              So, the recent completion of the mapping of the human genome raises the stakes even more.  And with a lot of promise that comes from medical advancement also comes the whole question of what happens to privacy, and that was in the papers this morning.  It's on the minds of people ‑‑ genetic discrimination, whatever.

              It's interesting that if you look at the 500 top Fortune Magazine companies, most of them ask questions before they hire about medical history.  Most of them do that.  So, the question is what are they going to have available to them before they make these kinds of decisions?  What will be available to them, and will genetic information be one of those things, which is available to them?

              And as more and more individuals use the Internet for health information services, we have to be careful about on-line privacy.  Third-party selling is a fairly common thing.  People say they don't, but they do.  We've seen that, again, in the papers people who say they won't do it or that's the whole opt-in, opt-out question.  There's a very big difference, and the stakes, again, for privacy are enormous.

              So, that's what we're going to talk about.  We've got excellent panelists to give you different points of view.  We want you to be aggressive.

              And Senator Bill Frist.

              SENATOR FRIST:  Thank you, Jay, and it is my pleasure to welcome everybody, as well, to this combined meeting of the Tech Forum and the Alliance for Health Reform.  I am very excited about the next few minutes, the next hour, because, again, as I've said before, in these meetings, one of the real gems, I think, of the Alliance and the Tech Forum is that we're going to all leave here a lot smarter than when we came in.  And there are many meetings in Washington, D.C. ‑‑ I won't say most ‑‑ but many meetings where you just can't say that.  So, this is a great one.

              For the speakers, I'm going to go ahead and introduce the four speakers.  And I apologize, we're at eye level now, so I will ask the speakers in their initial presentation to use the podium just so that people can see you make your presentation.

              I want to remind everybody to fill out their cards and/or come to a microphone with their questions.

              I'll be introducing the speakers in turn ‑‑ or in the order that they will be presenting.  First, Dr. Kari Stefansson who is President, Chairman, and CEO of deCODE genetics.  deCODE genetics is constructing a comprehensive database of medical and genetic information of nearly every citizen of Iceland under contract with the Icelandic government.  Dr. Stefansson is both a medical doctor and a Ph.D. research physician and has held faculty appointments at Harvard Medical School and University of Chicago Medical School.

              George Lundberg is Editor in Chief of Medscape, a leading Internet site for health information.  A pioneer in on-line medical information, he has been called the medicine man of the Internet.  He has helped found both Medscape General Medicine and CBS Healthwatch.com.  Dr. Lundberg is a physician with academic appointments at Harvard and Northwestern Universities.

              Latanya Sweeney is a Professor of Computer Science and Public Policy at Carnegie Mellon University.  She, too, is a pioneer, a pioneer in the new area of computer science, termed computational disclosure control.  She is an expert in data privacy and has developed systems to protect individual privacy for electronic databases.  Professor Sweeney has lectured and written widely on data protection policy particularly as it relates to personal medical information.

              Janlori Goldman is Director of the Health Privacy Project at Georgetown University.  I guess it was a year ago Janlori that we were all together and heard your excellent presentation at that time.  She is an expert in medical privacy, a forceful advocate of a more robust privacy protection in the healthcare arena.

              She co-founded the Center for Democracy and Technology, which is a non-profit organization committed to preserving free speech and privacy on the Internet.  She has also worked at the Electronic Frontier Foundation and is a past Director of the Privacy and Technology Project at the American Civil Liberties Union.

              Our plans are for each of our speakers to make approximately eight to ten minutes of remarks.  We will be putting the hook out if you run later than ten minutes, and we can fill in with the questions if you are not quite finished with your presentation.  We'll have a roundtable discussion following their formal presentations, at which time we will entertain as many questions as possible from the audience.

              Let's begin with Dr. Stefansson.  We'll proceed in the order that I introduced.  Dr. Stefansson, welcome, and appreciate you being with us today.

              STEFANSSON:  I would first like to begin by thanking the senators for inviting me to talk here.  I want also to begin by making a little bit of a correction.  The centralized database on healthcare that is being constructed in Iceland ‑‑ according to a law that was passed in Iceland in December of 1998, it is simply a centralized database on healthcare information produced in the process of delivering healthcare.  It is not a genetic database.

              This is terribly important and becomes sort of a central point in what I'm going to discuss with you here today.  I'm not going to elaborate on technical terms or specifically what we are doing in Iceland.  I'm simply going to discuss privacy in the context particularly of biomedical research, privacy as a right balanced against what I believe are obligations when it comes to the healthcare system.

              And remember that privacy, as it relates to what we can do in biomedical research, may for some of us eventually decide between early deaths and longer lives.  It's nothing more and nothing less.  It may eventually have a bearing on rights that at least in the minds of some transcend privacy in importance, such as the right to life.

              Having said this, I want to emphasize that in my mind privacy is an important right that should be cherished and should be protected.

              The basic issue when it comes to healthcare information used in the delivery of healthcare, as well as in research such as genetic research ‑‑ and privacy is a societal issue not a technical one.  And keep in mind that this comes from the mouth of a man who runs a company that, for example, wants to market technical solutions when it comes to protection of privacy.  It is an issue of how society looks at a right to good healthcare in the context of our obligation to make our contributions to the improvement of the same.

              In my mind, the debate on biomedical research and privacy is of utmost importance, but it, I think, has been led a little bit astray by those who believe that the right to ‑‑ that they have a right to healthcare that is completely unequivocal, but we have no obligations, no particular obligations to contribute information about ourselves to research aimed at maintaining and furthering the quality of healthcare.  And I will come back to this point.  It's a terribly important point and has to do with the balance between our right in our society and our obligations.

              There are two issues here that are closely related but clearly distinct.  The first issue is how we make sure that we can make use of important data on healthcare and genetics to mine for new knowledge about the nature of disease and health in order to develop new methods to diagnose and treat and prevent disease without violating privacy.

              The second issue is how and to what extent society decides to protect the privacy of the clients of the healthcare system.

              These two issues constitute a specific case of the difference between generation of knowledge on one hand and the use of knowledge on the other, and there is a clear line of distinction between the two.

              Let's begin by examining the issue of the discovery of knowledge.  There are two fundamental kinds of data that are used in the act of discovering new knowledge on the nature of health and disease.  One kind is data that are generated in the process of delivering healthcare for the purpose of delivering healthcare.  The others are data that are generated specifically for the purpose of research.  These are totally different kinds of data sets.

              Let's begin on the data collected in the process of delivering healthcare.  These are data that are collected about us when we enter hospital or clinics when we are sick.  These data about us are then placed in the context of knowledge that was discovered by taking advantage of healthcare data on people who came before us.

              It is, however, by some considered to be a right to deny science the use of information about us to develop knowledge so that those who follow us ‑‑ our children and their children ‑‑ can enjoy the same quality of healthcare as we do.  The right to decline to have information about us used in biomedical research is considered to be a part of our right for self-determination, a part of our autonomy.

              Fortunately, the voices that oppose the use of healthcare information produced in the process of delivering healthcare using presumed consent have not prevailed.  We have been allowed to use this information without explicit consent.

              And I am pretty convinced ‑‑ I am absolutely convinced that if we would have been held to the use of explicit, informed consent for the use of healthcare information of this sort, we would not have healthcare as we know it today.  There is no question about it.

              The definition of the right to healthcare without obligation of contributing to science the information that is generated through the delivery and acceptance of healthcare breeds predatory behavior.  You accept the gifts of others who came before you without any obligation to contribute to those who follow you.

              I think it is clearly possible, I think it is almost certain that the great English poet, John Donne, was writing in anticipation of the debate on privacy and healthcare when he composed the following poem of singular beauty.  I have to recite at least one poem over Americans.  "No man is an island entire by himself.  Every man is a piece of the continent, a part of the main.  When a clod be washed away, Europe is the less, as well as if a manor of a friend or thine own were.  Every man's death diminishes me, because I'm part of mankind.  And therefore never ask for whom the bell tolls; it tolls for thee."

              Remember that every time we go to the healthcare system and use its services, we are benefiting from the fact that the bell has tolled for others who came before us.  And let's make it certain, let's make it obligatory that when the bell tolls for us, it will be at least potentially beneficial for those who follow us.  I think it is very important.  I think this is a question of the connection between right and obligation.

              Let us now look at the data that are generated for the sole purpose of doing research.  In my mind, such data should not be generated without explicit, informed consent.  And it was actually research of this sort, generation of data of this sort, that was dealt with at the Helsinki Convention when it was convened after the Second World War for the purpose of preventing the crimes committed by the Third Reich from being committed again, when people were committing crimes under the disguise of scientific research.  The goal was to protect the autonomy of people.  They should not participate in biomedical research unless they wanted to.  The instrument that was instituted was informed consent.

              And let's now look at how the use of informed consent has sort of taken changes, how it has evolved over the years.  And I'm only going to mention one of the developments, which I'm pretty concerned about.  This development has to do with the fact that there's some part of the ‑‑ or a part of the bioethics community has come to the conclusion that it is bad to allow people to give broad consent.  Possibly if I would go to one of the senators and ask them to give me ten cc's of blood, to isolate DNA, to look at variations in the genome to study one disease, the senator would be able to give me such a consent.  If I would ask the senator to give me consent to use the data from this to study every disease, the senator would be told by the bioethics community, "No, you cannot do that, because your consent would not be informed."

              So, all of a sudden informed consent, which was an instrument, has become a goal.  It was an instrument to protect autonomy, but all of a sudden it has become a goal, and it's used to limit the very autonomy it was meant to protect.  And I think it is terribly important to make sure that people can give as broad consent for the use of data on them as possible if it doesn't constitute a threat to their lives and if the only risk that is taken is informational risk.  I think this is quintessential.

              And this brings me to the second of the two original issues I raised ‑‑ the issue of the difference between the generation of knowledge and the use of knowledge.  And when it comes to the use of knowledge, I think it is probably best to give a real live example from genetics.

              There are two breast cancer genes that have been discovered, and if you have a mutation in either one of them, you have an increased probability of developing breast cancer.  And an awful lot has been written in the American press about the possibility that insurance companies will abuse this knowledge.  They will demand from women that a mutational test be done before they're insured.  And if they have a mutation, they will either be declined insurance or the premiums will be raised.

              This would, in my mind, constitute violation of two very important rights:  The right not to know ‑‑ you don't have to know if you don't want to whether you have a mutation or not ‑‑ and the other is the right to equal treatment irrespective of your genetic background.

              This raises the question as to whether we should not have made this discovery, because it can be abused.  And my answer to that question is it would have constituted crime against humanity to suppress the discovery of these genes, because they will eventually be used to save lives, save lives for women from bad diseases.

              We should, however, have very low threshold to pass a law to set regulation that forbids the abuse of knowledge.  But it is very important to recognize that you are not going to control this world by controlling the discovery of new knowledge, because new knowledge is new, and you don't know what it is until you have it in hand.

              In conclusion, I'm going to sort of appeal ‑‑ or my plea to these great senators that we have here would be the following:  Make sure ‑‑ do whatever you can to make sure that scientists in the States, your wonderful country, will be allowed to use healthcare information produced in the process of delivering healthcare with the use of presumed consent.  Also make sure that ‑‑ and this is important to remember, because it is never a crime when this information is used to discover new knowledge, and it is almost always a crime when it is not.

              Secondly, make sure that broad consent will be allowed.  We should allow people to contribute as much as they can to this discovery of new knowledge.  And in the end, you know, keep in mind that compassion for the sick and the wounded is a quality of good people.

              Thank you.

              (Applause.)

              SENATOR ROCKEFELLER:  Thank you, Dr. Stefansson.

              Dr. Lundberg.

              DR. LUNDBERG:  Thank you very much, Senator Rockefeller, Senator Frist, Mr. Howard, and members of the Alliance and the Forum, fellow speakers, ladies and gentlemen.

              I appreciate the opportunity to participate in this Tech Forum today on the important issue of privacy in the Information Age.  Two currently very hot topics:  money privacy and medical privacy.

              When I was a child growing up in rural South Alabama, my father used to listen to a great comedian on the radio called Jack Benny.  And I would listen to that Jack Benny and laugh with my father.  And there's this marvelous joke that Benny once told that has to do with this issue of money and medicine.

              Jack Benny is confronted by a thief.  The thief says, "Put up your hands."  He says, "I got them up."  The thief says, "All right, your money or your life."  Silence.  He said, "Didn't you hear me:  Your money or your life."  Jack Benny says, "I'm thinking, I'm thinking."  Okay, privacy ‑‑ money, privacy, life, the very elements of medicine.

              Some human facts, activities or events are so personal, so private that many human beings prefer that they not be shared with anyone else.  But sometimes in the course of human experiences it is essential to share such information with another who can be trusted not to use this information against the person.  The other to whom I refer has over centuries come to be known as a learned professional.

              Learned professionals traditionally are physicians, attorneys, and the clergy, and they exist because we as people need them from time to time to share the most intimate knowledge of our lives, our minds, our bodies, and our souls.  We must trust them then not to hurt us when we are most humanly vulnerable.

              Actions such as doing surgery, defending a person in a lawsuit or hearing a confession of sin are examples of professional actions.  But in the course of these activities information is generated.  Information may even be a central element of the professional relationships, because much of the practice of medicine is only information.

              I have always taught my medical students, residents, and medical technology trainees that the patient-physician relationship was sacred, hallowed, and information exchanged between them was the business of only them.  That I believe is ideal.  What's reality?  Fast forward.

              Medical records are kept, paper or electronic ‑‑ they are the same.  Other human beings have routine access to those records, not just the doctor and the patient.  What are we going to do about that?

              Insurance companies pay bills.  They demand to know what they're paying for, like your mental illness.  There are huge banks of patient-specific data on your health if you have insurance.  What are we going to do about that?

              Pharmacies dispense drugs.  Some are disease-specific, like acyclovir for your herpes genitalis.  There are huge databanks of pharmacies that can tell who's taking what for what.  What are we going to do about that?

              There are national chains of clinical laboratories that do millions of laboratory tests.  The results say, yes, you have syphilis today.  And there are major banks of that.  What are we going to do about that?

              Most hospital records are still on paper, and if you go to the record room ‑‑ if you haven't been to the record room of a local hospital, I invite you to do that.  If you go down to that record room, you'll see mountains and mountains of piles of paper, about 20 percent misfiled, and the lowest paid people in the hospital chain are taking care of those records.  Any one of those pieces of paper could be copied by any one of them and sent anywhere.

              There are no secrets in hospitals.  Sometimes there are efforts to keep secrets.  My most famous pathology consult was when I was a professor at the University of California doing consultative forensic toxicology, and the autopsy report that I was asked to make an opinion on was on a patient from Memphis, Tennessee, Dr. Frist, and his name was Aron Sivle.

              Well, the question was did Aron Sivle die of heart disease or of multiple drug effects, and the license of a physician licensed in the state of Tennessee was going to depend in part on the interpretation of this autopsy.

              Well, it turns out this was a primitive effort by a hospital in Memphis to conceal the private information of a fairly famous person named Elvis Presley whose middle name was Aron, and Sivle is Elvis spelled backwards.  So, the autopsy record on Elvis Presley is under Aron Sivle.  A primitive effort, but then again Elvis was fairly primitive himself, so I guess maybe that might not be inappropriate.

              And now we have genetic tests, many, revealing the chemical essence of our very being.  Having grown up in the South, as I indicated, I remember that we were taught that certain anatomic structures, which many of you probably know about, were referred to as our "private parts."  I would suggest that our genes are our really private parts.

              Now, let's segue to what this conference is all about, including what I've been talking about, and it's supposed to be about the Internet.  Okay, I earn my living, such as it is, with the Internet, but I would ask you to only think of it this way.  The Internet is the medium; it is not the message.  It's simply a way of transmitting information.  Different in some ways but the same as every other method of transmitting information.  Don't get hung up on the Internet; it's just a medium.

              How does one legislate or regulate or self-regulate the privacy of all this medical information, which ought to remain private, be it spoken, written, electronic, paper or Internet?  Now, fortunately, the medical Internet did not spring from a vacuum.

              We have five rich intellectual banks to draw from, from which we have developed over the past six months the ethics of the medical Internet.  These rich data sources are ‑‑ the medical Internet is medicine, so we work from medical ethics.  It is journalism; we work from journalism ethics.  It is sometimes, and some of us hope it will become more so, a business, so you work from business ethics, and it's not necessarily inappropriate to put those two words together, "business ethics;" there are such things.  And the medical Internet is medical journals, so you work from the ethics of medical journals.  And it's also medical education, so you work from the ethics of continuing medical education.

              A handout that was referred to that you will have at the door ‑‑ I brought 50 copies, but there are many more of you than that, for which I apologize; I didn't bring more than that ‑‑ but it will have printouts of key parts from the Council on Ethical and Judicial Affairs of the American Medical Association, which creates the ethics for American medicine, the International Committee of Medical Journal Editors, which creates the ethics for how medical journals ought to behave, and then it has the key parts on privacy from the eHealth Code that we rolled out here on Capitol Hill about a month ago, and another one from Hi-Ethics that was rolled out in San Francisco from a slightly different group about a month before that.  And finally, a set of the privacy rules from the Journal of the American Medical Association written almost entirely by my former staff, since I worked there for a long time in JAMA, and published in May of this year.

              All of these are there.  They deal with very specific areas of ‑‑ now, to the table of contents of the book of medical ethics from AMA.  There are more headings and more policies regarding confidentiality and privacy than anything else.  It's the top one.  This has been around as an issue for a long time, a lot of people thinking about it.

              On a final statement, you go on our site, www.medscape.com, and you see our privacy policy, which is relatively simple.  It says, "Medscape does not provide or release names or e-mail addresses of members to any third party without the member's explicit permission.  Medscape does not and will not use cookies or any other technology to track or report on member activity when they are not on Medscape nor pass member data on to other web sites."

              Finally, our Company, now merged, called MedicaLogic/Medscape, favors regulation and legislation regarding privacy of medical records and has expressed willingness to work with the Congress to develop such if the Congress wishes us to.  We only have one stipulation:  You don't just regulate the Internet, you regulate information no matter which way it might be transmitted.

              Thank you very much.

              (Applause.)

              SENATOR ROCKEFELLER:  Thank you, Dr. Lundberg.

              Professor Sweeney.

              PROFESSOR SWEENEY:  Since I'm using technology here, they told me to stand over here.

              I want to thank you for this opportunity to be here and to address this issue.  The primary thing that I'd like to add to the mix in terms of my introduction is basically what is going on with data now:  data that are publicly available, that is, data that you or I or anyone else pretty much can get for a nominal amount of money, or semi-publicly available, meaning that there's a slight barrier.  That barrier might be an additional fee, but for the most part it's pretty regularly available.  What does it look like in the nature of health data, and what happens when that health data meets genetic data?

              Due to brevity of time, I'm just going to make only brief major points:  that we are under data surveillance; that there has been a tremendous explosion in personal collected data; some of the problems that result because of an inability of our policies and practices to understand the technology; and what happens when genetic data is added to the mix.

              A couple of years ago in an attempt to characterize the amount of information that's been collected on individuals, I introduced a new term called "disk storage per person."  This is basically the amount of rigid disk drive space sold in a year divided by the adult world population in that year.  Currently, what you see on the far left is a graph of that over time.  The elbow happens around the mid-1990s.  We're well on the exponential growth.  This chart happens to correlate with access to inexpensive computers with large storage capacities, which tends to be bringing forth this revolution.
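The "disk storage per person" measure defined above is a single division, and a minimal sketch may make the definition concrete.  The function name and the input figures below are assumptions chosen only to show the arithmetic; they are not figures from the presentation.

```python
def disk_storage_per_person(disk_bytes_sold: float, adult_population: float) -> float:
    """Disk drive space sold in a year divided by that year's adult world population."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    return disk_bytes_sold / adult_population


# Hypothetical inputs (NOT real sales or census figures), used only to show the arithmetic.
example_bytes_sold = 4.0e15   # assumed: 4 petabytes of disk space sold in the year
example_adults = 4.0e9        # assumed: 4 billion adults in the world that year

bytes_per_person = disk_storage_per_person(example_bytes_sold, example_adults)
print(f"{bytes_per_person / 1e6:.1f} MB per adult")
```

Computing this value for each year and plotting it over time would give the kind of growth curve the speaker describes.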

              I'm going to quickly show you, in brief instances from the state of Illinois, how this has made a dramatic increase in data.  At the time I was born, as well as almost everyone in this room, these 15 fields were all the fields that were collected on your birth certificate.  Today, in almost every hospital in the United States, over 200 fields ‑‑ that's 226 ‑‑ are collected on each birth, each time, stored in a database, and in some cases made available on the Internet.  These are from on-line birth certificates from the state of California in certain counties.

              Another explosion that we've seen is in hospital visits.  Hospital data doesn't stay in the confines of the hospital nor is it only located with the insurance company.  And just to give you a sense of that, in over 40 states in the United States a copy of the hospital information, like the fields that I'm beginning to show you now, is collected on each hospital visit and then made publicly available.  It includes patient demographics, such as age or in this case date of birth, along with various diagnosis and procedure codes and various charges that are specific to the care.

              We can all relate, of course, to grocery data; that is, the grocery store can know if we use their loyalty card, exactly what we purchased, and so forth.  And those are only three very simple examples.  They don't include visual data surveillance and others.

              Let me give you a sense of what this means.  What kind of problems does it really mean for trying to look at this in terms of policy?  One of the confusions is that all of our laws, practices, and regulations continue to be confused by the idea that if I remove the explicit identifiers, such as name, address or Social Security number, or somehow encrypt them, the result is anonymous.  And, so by those policies, regulations, and so forth, we would consider these three fields sufficiently anonymous to be made publicly available.  And you may believe that especially if I tell you these three fields are part of a very large and diverse database.

              But if I subsequently tell you that 33171 is a zip code primarily of a retirement community, then there are going to be very few people of such a young age living there.  02657 is the zip code for Provincetown, Massachusetts, and reportedly there are only five Black women who live there year round.  20612 may have only one Asian family.  And notice it is information outside the data that helped to identify these individuals.

              Let me give you a very quick example from the state of Massachusetts.  The Group Insurance Commission is the group responsible for purchasing insurance for state employees, their families, and retirees.  They collected the type of data that I showed you earlier ‑‑ this is a subset of those fields ‑‑ and then copies were made available to researchers, and additional copies sold to industry.

              For $20, I went to the City of Cambridge and purchased the Cambridge voter list, and it came on two floppy diskettes.  In fact, all of the examples that I'll be giving you are examples that use only standard computer technology with standard off-the-shelf software.  The voter list, as you can see, also has a zip code, birth date, and gender, along with the name.  Clearly, the idea is to take this believed-to-be-anonymous data and reidentify it by linking on zip code, birth date, and gender.
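
              (Illustration, not part of the remarks:  A minimal sketch of the linking step described above, assuming two small in-memory tables.  Every record, name, and field name below is invented for the example; none of it comes from the GIC release or the Cambridge voter list.  The sketch only shows the general idea of joining a release that has no names to a named list on zip code, birth date, and gender, and keeping the matches that point to exactly one person.)

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, but ZIP code,
# birth date, and gender are still present alongside the medical data.
health_records = [
    {"zip": "02138", "birth_date": "1950-03-14", "gender": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1978-11-02", "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1981-06-21", "gender": "F", "diagnosis": "migraine"},
]

# Hypothetical public voter list: same three fields, plus a name.
voter_list = [
    {"name": "Voter A", "zip": "02138", "birth_date": "1950-03-14", "gender": "M"},
    {"name": "Voter B", "zip": "02139", "birth_date": "1978-11-02", "gender": "F"},
    {"name": "Voter C", "zip": "02139", "birth_date": "1981-06-21", "gender": "F"},
    {"name": "Voter D", "zip": "02139", "birth_date": "1981-06-21", "gender": "F"},
]

KEY_FIELDS = ("zip", "birth_date", "gender")


def key_of(record):
    """Build the linking key from the three shared fields."""
    return tuple(record[field] for field in KEY_FIELDS)


def link(health, voters):
    """Return (name, diagnosis) pairs where the key matches exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[key_of(voter)].append(voter["name"])

    matches = []
    for record in health:
        candidates = names_by_key.get(key_of(record), [])
        if len(candidates) == 1:  # an unambiguous match is a reidentification
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link(health_records, voter_list)
for name, diagnosis in reidentified:
    print(name, diagnosis)
```

              (In this made-up data the third health record matches two voters, so it is left unlinked; only keys that single out one person produce a reidentification.)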

              The question, of course, is how unique would such a linking be?  Cambridge, Massachusetts is a little unusual in that it houses both MIT and Harvard, so there's a skew in the population to the early 20s.  But even with that, the numbers are quite revealing.  Birth date alone, that's month, day, and year of birth, is unique for 12 percent of the population.  That means when those people go and they visit a web site and the web site only asks what city do they live in and their birth date that could be enough to uniquely identify them.  Birth date and gender, 29 percent; birth date and the five-digit zip code, 69 percent; birth date and the full postal code, 97 percent.  Note that this is only one- and two-way combinations, not three-way and beyond.
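
              (Illustration, not part of the remarks:  A minimal sketch of how a uniqueness figure like the ones above could be computed, assuming a list of records held in memory.  The six-person population and the field names are invented, so the fractions it prints are not the Cambridge numbers.  For a chosen combination of fields, it counts how many people share each combination of values and reports the share of people whose combination occurs exactly once.)

```python
from collections import Counter

# Hypothetical six-person population, invented for illustration only.
population = [
    {"zip": "02138", "birth_date": "1950-03-14", "gender": "M"},
    {"zip": "02138", "birth_date": "1950-03-14", "gender": "F"},
    {"zip": "02139", "birth_date": "1950-03-14", "gender": "M"},
    {"zip": "02139", "birth_date": "1978-11-02", "gender": "F"},
    {"zip": "02139", "birth_date": "1981-06-21", "gender": "F"},
    {"zip": "02139", "birth_date": "1981-06-21", "gender": "F"},
]


def fraction_unique(records, fields):
    """Share of records whose values on `fields` are shared with no one else."""
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


for fields in [
    ("birth_date",),
    ("birth_date", "gender"),
    ("birth_date", "zip"),
    ("birth_date", "gender", "zip"),
]:
    print(fields, round(fraction_unique(population, fields), 2))
```

              (Even in this toy population, each added field raises the share of people who are singled out, which is the pattern the percentages above describe.)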

              I chose Cambridge, Massachusetts because William Weld was the Governor of Massachusetts, and he lived in Cambridge.  Only six people ‑‑ and his medical data was in the GIC release.  Only six people had his birth date, only three of them were men, and he was the only one in his five-digit zip code.

              In subsequent experiments, the numbers have been replicated throughout the United States, and recently we tried to ‑‑ we went to figure out how many people in the United States are identifiable on which characteristics.  We found that 87 percent of the U.S. population is identified uniquely by birth date, gender, and zip code, five-digit zip code; in some cases, the entire state.

              This is another quick example.  This is a release under a FOIA request from a cancer registry in a particular state.  The data that you see under diagnosis and zip code has been made up to protect both the identity of the state as well as the identity of the patients.

              This was supposed to be one of the most difficult cases to reidentify, because neuroblastoma is a cancer found primarily in children, and not only that there is no ICD-9 diagnosis code that is neuroblastoma.  So, even if you get the health data, you can't look and say who had neuroblastoma.  It can only be inferred from a preponderance of care and a preponderance of diagnoses.

              The diagnosis data is only the month and year of the diagnosis and the five-digit zip code.  The copies were released to me from the cancer registry, and after again using standard software and publicly available data, I was able to reidentify them with 100 percent accuracy.

              You might say how did I do it?  I sort of put together this chart.  Any path from the top of the chart down to the bottom is a possible way to reidentify those children.

              Another problem that I see a lot in the data sharing is that a tremendous amount of attention is given to the person who collects the data.  So, in hospital data, it's usually the hospital.  And we try to put boundaries around who it is that they are allowed to give the data to, and then after that we don't care.  And, so we see a lot of uses where then subsequent releases of the data are totally not controlled at all nor is any attempt made.

              That gives you a sense of the space of the kind of problems that show up.

              In more recent work, I've been working with Brad Malin who is a graduate student of mine, a very gifted graduate student at Carnegie Mellon.  We've been looking at the, what we call, second-generation DNA databases.  These are generations of databases that are appearing throughout the United States in hospitals.  And there has been a lot of discussion within the medical community should they be considered a part of the medical record and therefore just distributed the way hospital data is distributed or should they in fact just be distributed by themselves as a research database autonomously?  And after all, how could they be reidentified?

              And, so what we've been doing ‑‑ this diagram shows on the left side the privately held data, on the right side, the data that would be found in the public that's publicly available.  So, one thing that we sort of, I think, you know, intuitively but one of the things we had to quantify is how much additional risk is brought in when DNA data is added?  So, we have some measurements called "gross maximal risk," and what have you.  You can see the numbers.  This is again from the state of Illinois.

              The public health data is the current risk of the society in the state based on their practices that they currently engage in.  And when the DNA data is also released, you can see how the graph grows quite large.  There are a lot of still privately held data that's right in the middle of changing, getting ready ‑‑ where people are getting ready to release it.

              The last thing I want to point out is that we've been looking at this program called Clean Gene, which is a program that we've created that one time infers how is it that if you have only a DNA database that you could actually reidentify the person who's the subject of the DNA sequence?

              And, so what happens in step one is we may or may not be able to ‑‑ we usually are able to identify gender to the DNA sequence.  What happens in step two, depending on how it was sequenced, we basically can infer particular diseases, which in fact was the reason that the DNA was collected by the hospital in the first place in these second generation databases.  And, what's happening in step three and step four is our linking basically to the hospital data that I described earlier, such as the GIC data.

              And, so we've been able to show that we can actually go both ways.  We can take the DNA sequence data and infer what would have to be true in the health data, and we can take the health data and draw inferences and limit which DNA sequences most possibly match it.

              So, in closing, I just wanted to say ‑‑ make three points.  One is that we are having an explosion in data, that there are a lot of problems with our policies and practices, because we don't really understand the identifiability of data, and that when genetic data is added to the mix, it does increase the risk tremendously.

              Thank you.

              (Applause.)

              SENATOR FRIST:  Professor, thank you very much.  That was quite enlightening.  As we were sitting up here ‑‑ I mean everybody in this room was saying that somebody out there is looking at them this very second as we go through.

              But thank all four of our speakers.  Oh, of course, our veteran here who has been probably the most visible on this particular topic in the three years that I've been dealing with genetic information, probably the most visible and most active, a real advocate as somebody who has testified is the veteran of our panelists here, Janlori Goldman.

              DR. GOLDMAN:  Well, Senator, I have to say you played on one of my greatest fears which is that by inviting me back you were hoping and praying that I wouldn't say the same thing, that I might have something new to say, but maybe with the assumption I was going to just repeat myself, we can get right to the questions.

              But I have to say there is nothing better than following Latanya Sweeney on a panel ‑‑ nothing.  Because what she does, I think, is make all of us get in our gut, that no matter what we know, no matter how many articles we read, no matter how much we study and research ourselves, what she is doing is proving in some ways what our true fears are, that no matter how anonymized we think the databases are and no matter how many promises we hear, and we hear many promises ‑‑ don't worry, trust me, it's going to be fine, I'm only going to use it for this one thing, it's never going to be used for another purpose, it will be absolutely non-identifiable.

              If any of you have dealt with this issue as a staffer or as I have as an advocate, you hear that a lot, and I think Latanya's work is critical in helping us understand that we are still vulnerable.

              The Health Privacy Project, as some of you may know, was created a number of years ago to try to create greater privacy protections in the healthcare area, and that privacy ‑‑ our view is that privacy is critical to improving the quality of care in this country and to broadening access to care.  What we want to do is to provide the greatest resources for you in looking at this issue and to provide the kind of information that you need to make good policy judgments.

              What we've done is develop a set of best principles that we did with a working group of diverse stakeholders.  We did a survey of state health privacy laws, health privacy statutes.  We have put together a primer for consumers on health privacy.  All of these reports are available at our site, at healthprivacy.org.  Feel free to read them, download them, share them, do with them what you will.

              And we also start from the position that the technology that is available today should be used to harness the opportunity that we have to protect privacy to a greater extent, to put better security in place, and that while certainly there are greater risks and the magnitude of the risks are much greater with the Internet, with databases, with genetic testing, that we can also use that as an opportunity to build privacy in up-front into our policies and into our technologies.

              Another thing that we found in the last few years in trying to be more focused on creating an empirical basis for understanding how the lack of privacy affects healthcare and affects how people seek care, is we've been involved in a number of studies, empirical studies, polling to try to understand the impact.

              And what we have found in a very broad sense is that about one out of every six people will do something when they're seeking healthcare or deciding whether to seek healthcare to protect their privacy.  That out of fear that the information may fall into the hands of employers or insurers or family members or that they may just be in some ways embarrassed by the release of certain information or especially with genetics, that it may affect their family members, that the release of their information may somehow affect future generations, that people are withdrawing from full participation in their own care.

              They're leaving information out when they see their doctor, or they're paying out-of-pocket for certain care that they're entitled to reimbursement for.  Maybe they're afraid to seek care at all, testing, especially for stigmatized illnesses and conditions.

              We know this about mental health, we know it about communicable diseases, certain kinds of cancers.  We certainly know it in the genetics area that the lack of privacy is a major barrier to people seeking care.  In the genetics area, the studies have shown that the lack of privacy is the number one barrier today to people seeking testing and counseling.

              There was a recent CNN/Time poll that showed that half the people in this country think that mapping the human genome is immoral.  Now, that's terribly troubling given the incredible advance that ‑‑ scientific advances and medical advances that can come from that.  But 75 ‑‑ and I think there's a link here ‑‑ 75 to 80 percent of the public are worried that the information will be used by insurers, will be used by their employers to make decisions to deny them jobs and deny them benefits.

              So, the Congress has been working, I think, pretty hard the last few years to try to pass anti-discrimination legislation targeted at genetics to try to create protections in employment and insurance.  And I think that's critically important, but it's only half a solution.

              We can tell employers and we can tell insurers that they can't discriminate on the basis of certain information, but the temptation will still be there.  The risk will still be there as long as we allow that information to get into their hands.  If we put privacy protections in place and say, "You don't have a need for this information, you shouldn't have access to it in the first place," then we create, I think, a much more comprehensive set of protections.

              Now, the Internet is a whole other problem, because even though, as Dr. Lundberg said, it's just the medium, let's not kind of attack it as the message, the truth is the Internet is different in that it is built into the capacity and design of the Internet to gather information invisibly and seamlessly.

              There was an article, and there's an article everyday, but I can just talk about today's piece in the front page of the Washington Post, about how there is software built into it.  We have designed some software in here to let us know what you're doing and what products you're using and what's useful to you.  So, the Internet is different, and it does pose some serious challenges.

              We also know from polling that's been done that lack of privacy is the number one barrier keeping people off the Internet.  And in a study that we did that came out in February, we looked specifically at health web sites.  We looked at their privacy policies and their privacy practices, and we found across the board that the privacy policies were inadequate, and that even where they did exist the practices were inconsistent with what the sites said they were actually doing.  So, information was being gathered when the policy said we are not gathering or sharing information.  Information was being made available to others without people's knowledge, without their permission.

              And, so even in this area of high sensitivity, even where we have opportunities to improve health through the Internet by making information more accessible to consumers, by allowing people to talk more freely with each other with other people who might have similar conditions, allowing people to buy prescriptions on-line and I think with some illusion that there's anonymity, we see some very serious vulnerabilities that have not, I think, been addressed.

              I want to just suggest that we have actually learned some lessons that while we have a lot of information now about some of the vulnerabilities and some of the risks, there are some lessons that we've learned from looking at how past privacy issues have been addressed that might be helpful to us.

              We can say, at least I think I can say, maybe you'll agree, that there will be temptation to use information that was gathered for a very specific purpose ‑‑ there will be temptations to use that information in other ways that were not anticipated at the outset, that were not thought about at the outset, and certainly that were not kind of publicized or made clear to the consumer at the outset, and that those temptations will almost surely overcome whatever privacy interests might be raised.

              And I'm just going to put out two examples:  The Framingham Heart Study, which again was gathered in a research context for public health purposes, has recently been made available for commercial purposes.  There's a debate going, an ethical debate, about should we notify the people who participated in this study?  There's genetic information in that study.  Should we notify them?  How can we notify them?  Should we get their permission?  The initial consent form did not anticipate commercial use of the data.  They're telling people that it's going to be anonymous.  How do we know?

              Second example, I think, would be the Icelandic database that again was gathered for a particular research purpose.  That was the expectation of the population of Iceland.  And while some of us might be moved by this notion of individuals contributing to the greater good, how can we stand in the way of contributing to the greater good?  If we know that people's fears about how their information will be used will keep them from fully participating in their own care, that lack of privacy, that lack of trust and confidence will keep people from being honest, it will keep people from fully participating, and it will in many ways directly affect their own quality of care.

              So, what we have to do in order, I would say, to deal with what we know will be future temptations that will be overcome by whatever privacy concern is raised at the time is put rules in place today that say how information should be used, who should get access to it, under what circumstances, and that the societal expectations are etched in stone in our public policy and that they're enforceable.

              Now, we do have a legal response that is coming down the pike.  As you know, when the Portability Act was passed, it did put in this timeline, this deadline, for legislation or regulations.  Draft regulations were issued by the Secretary in the fall.  They're due to be finalized ‑‑ the health privacy regulations are due to be finalized in August or September of this year.

              I don't want to suggest by any means that they are comprehensive and that they are strong enough.  As I said to Senator Frist at the beginning, we'll take whatever we can get, but we're not going to be satisfied.  It takes us part of the way.

              It will cover health plans and healthcare providers that gather information from patients, and it will put rules in place about how that information can flow.  What it won't do is to cover those entities such as researchers or law enforcement officials or some that are doing research but aren't considered research under the federal laws that are gathering information from patients.

              Most of the health web sites right now will not be covered, because they are not considered in the traditional sense providers or plans.  Many of the genetic databases that will be in the commercial realm may not be covered, because they're not traditionally providers or plans.

              So, there's still a lot of work to do, and while we have seen a number of ethical codes and voluntary guidelines with an effort to head off, I would say, regulation, that self-regulation should be enough.  It is not enough.  What self-regulation does is it says to those of us that are concerned about these issues, yes, we know what the right thing is to do.  We, the good actors in this area, know what the right thing is, and we're willing to do it, but it doesn't bind the bad actors, and it is not enforceable as to anybody.  So, we can use them as a guide and as a model as to what we can put in place, but it's just the beginning.

              And I would also suggest that as we're building these rules and as we're looking to create a set of enforceable expectations for the public, that we try to distinguish between the information that is collected in the healthcare context, that we ask the question what is needed to treat the patients, what information do we have to have to treat people and pay for their care, and that we may have a set of rules for those kinds of uses.  But outside of those core healthcare purposes we should have, I think, a much more skeptical and careful eye that involves people directly in making decisions about how that information should be used.

              Thank you very much.

              (Applause.)

              SENATOR ROCKEFELLER:  As I said, we have microphones in the back.  Those aren't obviously convenient for everybody.  And I hope that you'll stay, because the questions, with all due respect to our four speakers, are usually the most interesting part.

              One I have here is, would you be in favor of legislation that would prohibit U.S. law enforcement agencies from developing and using information technology collection devices for genetic information?  That's not addressed to any particular person.  And then you can respond, Kari.  One of the panelists has to respond.  That's the deal.

              DR. GOLDMAN:  I guess I'm warmed up, I don't know.  It's a very broad question, but I think that we need to be extremely careful.  I mean many states have already passed laws that are mandating DNA databases.  And, again, DNA databases are different than genetic databases.  They tend to have more of an identifier as opposed to medical information in them.  But they are intended to be collected from people who have already been convicted of crimes as a way of searching those databases in the event another crime is committed.  It's meant to deal with recidivism.

              And while I have many concerns about those databases, I think that law enforcement should not be in a position of collecting genetic information and storing it.  I think that is a totally different circumstance, and that there is just no justification for it.

              SENATOR ROCKEFELLER:  I think Mr. Stefansson wanted to respond to something that you had said.  So, you go ahead.  There was a disagreement between the two of you.

              DR. STEF