JOINT MEETING OF THE FORUM ON TECHNOLOGY & INNOVATION
AND ALLIANCE FOR HEALTH REFORM
This transcript was produced from tapes provided
by The Forum on Technology & Innovation.
P-R-O-C-E-E-D-I-N-G-S
SENATOR
ROCKEFELLER: ‑‑ through
what is a little bit of history here, because this is the first time we've ever
had a joint meeting of our Alliance for Health Reform and our IT Forum. And it's appropriate. Hopefully, it will happen again; I suspect
it will. And this is an absolutely
terrific group of people.
We're going to be talking about medical privacy, genetics, and the impact of technology. Obviously, that brings the Alliance and the IT Forum, which are both 501(c)(3)s, together in a way that it should. And as technology spreads in both directions, as I said, I think we're probably going to be doing more of this in the future.
Bill and I, or Senator Frist and I, chair both of the groups, so we're quite comfortable here working together. Our rules, for those of you who have not been here before, are that we take no point of view on any issue, nor do the Forums. Senator Frist doesn't, I don't. We simply facilitate. And then we bring in speakers who have different points of view. They present those different points of view, and then the whole burden is upon you, through questioning, the green cards that you have, and microphones if we have them ‑‑ there are a couple in the back ‑‑ to challenge them, to go for it, to probe them.
And again, for those of you who haven't been to this before, our philosophy underlying this is that the information is best directed to staff, because staff are most likely to understand, and then staff will get aggressive with their senators and congressmen and force them to engage themselves more in the issues.
So,
you're the audience; you're who we care about; you're why both of these groups
are formed, and we're very happy about it.
Today's topic has not really been addressed with major legislation. In 1996, we even set a deadline to force the creation of medical privacy rules, but we haven't done anything about it. We did that through HIPAA, but we haven't done anything about that. We haven't taken any action in terms of medical privacy.
So, the recent completion of the mapping of the human genome raises the stakes even more. And with all the promise that comes from medical advancement also comes the whole question of what happens to privacy, and that was in the papers this morning. It's on the minds of people ‑‑ genetic discrimination, whatever.
It's interesting that if you look at the Fortune 500 companies, most of them ask questions about medical history before they hire. Most of them do that. So, the question is, what are they going to have available to them before they make these kinds of decisions? What will be available to them, and will genetic information be one of those things?
And as more and more individuals use the Internet for health information services, we have to be careful about on-line privacy. Third-party selling is a fairly common thing. People say they don't do it, but they do. We've seen that, again, in the papers: people who say they won't do it. Then there's the whole opt-in, opt-out question. There's a very big difference, and the stakes, again, for privacy are enormous.
So,
that's what we're going to talk about.
We've got excellent panelists to give you different points of view. We want you to be aggressive.
And
Senator Bill Frist.
SENATOR FRIST: Thank you, Jay, and it is my pleasure to welcome everybody, as well, to this combined meeting of the Tech Forum and the Alliance for Health Reform.
I am very excited about the next few minutes, the next hour, because,
again, as I've said before, in these meetings, one of the real gems, I think,
of the Alliance and the Tech Forum is that we're going to all leave here a lot
smarter than when we came in. And there
are many meetings in Washington, D.C. ‑‑ I won't say most ‑‑
but many meetings where you just can't say that. So, this is a great one.
For
the speakers, I'm going to go ahead and introduce the four speakers. And I apologize, we're at eye level now, so
I will ask the speakers in their initial presentation to use the podium just so
that people can see you make your presentation.
I want to remind everybody to fill out their cards and/or come to a microphone with their questions.
I'll
be introducing the speakers in turn ‑‑ or in the order that they
will be presenting. First, Dr. Kari
Stefansson, who is President, Chairman, and CEO of deCODE genetics. deCODE genetics is constructing a
comprehensive database of medical and genetic information of nearly every
citizen of Iceland under contract with the Icelandic government. Dr. Stefansson is both a medical doctor and
a Ph.D. research physician and has held faculty appointments at Harvard Medical
School and University of Chicago Medical School.
George Lundberg is Editor in Chief of Medscape, a leading Internet site for health information. A pioneer in on-line medical information, he has been called the medicine man of the Internet. He helped found both Medscape General Medicine and CBS Healthwatch.com.
Dr. Lundberg is a physician with academic appointments at Harvard and
Northwestern Universities.
Latanya
Sweeney is a Professor of Computer Science and Public Policy at Carnegie Mellon
University. She, too, is a pioneer, a
pioneer in the new area of computer science, termed computational disclosure
control. She is an expert in data
privacy and has developed systems to protect individual privacy for electronic
databases. Professor Sweeney has
lectured and written widely on data protection policy particularly as it
relates to personal medical information.
Janlori Goldman is Director of the Health Privacy Project at Georgetown University. I guess it was a year ago, Janlori, that we were all together and heard your excellent presentation. She is an expert in medical privacy and a forceful advocate of more robust privacy protection in the healthcare arena.
She
co-founded the Center for Democracy and Technology, which is a non-profit
organization committed to preserving free speech and privacy on the
Internet. She has also worked at the
Electronic Frontier Foundation and is a past Director of the Privacy and
Technology Project at the American Civil Liberties Union.
Our plans are for each of our speakers to make approximately eight to ten minutes of remarks. We will be putting the hook out if you run longer than ten minutes, and we can fill in with questions if you are not quite finished with your presentation. We'll have a roundtable discussion following the formal presentations, at which time we will entertain as many questions as possible from the audience.
Let's
begin with Dr. Stefansson. We'll
proceed in the order that I introduced.
Dr. Stefansson, welcome, and appreciate you being with us today.
DR. STEFANSSON: I would first like to begin by thanking the senators for inviting me to talk here. I also want to begin by making a little bit of a correction. The centralized database on healthcare that is being constructed in Iceland, according to a law that was passed in Iceland in December of 1998, is simply a centralized database of healthcare information produced in the process of delivering healthcare. It is not a genetic database.
This
is terribly important and becomes sort of a central point in what I'm going to
discuss with you here today. I'm not
going to elaborate on technical terms or specifically what we are doing in
Iceland. I'm simply going to discuss
privacy in the context particularly of biomedical research, privacy as a right
balanced against what I believe are obligations when it comes to the healthcare
system.
And remember that privacy, as it relates to what we can do in biomedical research, may for some of us eventually decide between early death and longer life. It's nothing more and nothing less. It may eventually have a bearing on rights that, at least in the minds of some, transcend privacy in importance, such as the right to life.
Having
said this, I want to emphasize that in my mind privacy is an important right
that should be cherished and should be protected.
The basic issue when it comes to privacy and healthcare information used in the delivery of healthcare, as well as in research such as genetic research, is a societal issue, not a technical one. And keep in mind that this comes from the mouth of a man who runs a company that, for example, wants to market technical solutions for the protection of privacy. It is an issue of how society looks at a right to good healthcare in the context of our obligation to contribute to its improvement.
In my mind, the debate on biomedical research and privacy is of utmost importance, but I think it has been led a little bit astray by those who believe that they have a completely unequivocal right to healthcare, but no particular obligation to contribute information about themselves to research aimed at maintaining and furthering the quality of healthcare. And I will come back to this point. It's a terribly important point and has to do with the balance between our rights in our society and our obligations.
There are two issues here that are closely related but clearly distinct. The first issue is how we make sure that we can use important data on healthcare and genetics to mine for new knowledge about the nature of disease and health, in order to develop new methods to diagnose, treat, and prevent disease, without violating privacy.
The
second issue is how and to what extent society decides to protect the privacy
of the clients of the healthcare system.
These
two issues constitute a specific case of the difference between generation of
knowledge on one hand and the use of knowledge on the other, and there is a
clear line of distinction between the two.
Let's begin by examining the issue of the discovery of knowledge. There are two fundamental kinds of data that are used in discovering new knowledge about the nature of health and disease. One kind is data that are generated in the process of delivering healthcare, for the purpose of delivering healthcare. The other is data that are generated specifically for the purpose of research. These are totally different kinds of data sets.
Let's begin with the data collected in the process of delivering healthcare. These are data that are collected about us when we enter hospitals or clinics when we are sick. These data about us are then placed in the context of knowledge that was discovered by taking advantage of healthcare data on people who came before us.
It is, however, considered by some to be a right to deny science the use of information about us to develop knowledge so that those who follow us ‑‑ our children and their children ‑‑ can enjoy the same quality of healthcare as we do. The right to decline to have information about us used in biomedical research is considered to be a part of our right to self-determination, a part of our autonomy.
Fortunately, the voices that oppose using presumed consent for healthcare information produced in the process of delivering healthcare have not prevailed. We have been allowed to use this information without explicit consent.
And I am pretty convinced ‑‑ I am absolutely convinced ‑‑ that if we had been held to the use of explicit, informed consent for the use of healthcare information of this sort, we would not have healthcare as we know it today. There is no question about it.
Defining the right to healthcare without an obligation to contribute to science the information that is generated through the delivery and acceptance of healthcare breeds predatory behavior. You accept the gifts of others who came before you without any obligation to contribute to those who follow you.
I think it is clearly possible, I think it is almost certain, that the great English poet John Donne was writing in anticipation of the debate on privacy and healthcare when he composed the following poem of singular beauty. I have to recite at least one poem to Americans. "No man is an island entire by himself. Every man is a piece of the continent, a part of the main. When a clod be washed away, Europe is the less, as well as if a manor of a friend or thine own were; every man's death diminishes me, because I'm part of mankind. And therefore never ask for whom the bell tolls; it tolls for thee."
Remember
that every time we go to the healthcare system and use its services, we are
benefiting from the fact that the bell has tolled for others who came before
us. And let's make it certain, let's
make it obligatory that when the bell tolls for us, it will be at least
potentially beneficial for those who follow us. I think it is very important.
I think this is a question of the connection between right and
obligation.
Let us now look at the data that are generated for the sole purpose of doing research. In my mind, such data should not be generated without explicit, informed consent. And it was actually research of this sort, generation of data of this sort, that was dealt with in the Declaration of Helsinki, which was drawn up after the Second World War to prevent a repetition of the crimes committed by the Third Reich under the disguise of scientific research. The goal was to protect the autonomy of people. They should not participate in biomedical research unless they wanted to. The instrument that was instituted was informed consent.
And let's now look at how the use of informed consent has changed, how it has evolved over the years. And I'm only going to mention one of the developments, which I'm pretty concerned about. This development has to do with the fact that some part of the ‑‑ or a part of the bioethics community has come to the conclusion that it is bad to allow people to give broad consent. For example, if I were to go to one of the senators and ask for ten cc's of blood, to isolate DNA, to look at variations in the genome to study one disease, the senator would be able to give me such a consent. If I were to ask the senator to give me consent to use the data from this to study every disease, the senator would be told by the bioethics community, "No, you cannot do that, because your consent would not be informed."
So,
all of a sudden informed consent, which was an instrument, has become a
goal. It was an instrument to protect
autonomy, but all of a sudden it has become a goal, and it's used to limit the
very autonomy it was meant to protect.
And I think it is terribly important to make sure that people can give
as broad consent for the use of data on them as possible if it doesn't
constitute a threat to their lives and if the only risk that is taken is
informational risk. I think this is
quintessential.
And
this brings me to the second of the two original issues I raised ‑‑
the issue of the difference between the generation of knowledge and the use of
knowledge. And when it comes to the use
of knowledge, I think it is probably best to give a real-life example from
genetics.
There are two breast cancer genes that have been discovered, and if you have a mutation in either one of them, you have an increased probability of developing breast cancer. And an awful lot has been written in the American press about the possibility that insurance companies will abuse this knowledge. They will demand that women have a mutation test done before they're insured. And if they have a mutation, they will either be declined insurance or have their premiums raised.
This would, in my mind, constitute a violation of two very important rights: the right not to know ‑‑ you don't have to know whether you have a mutation or not if you don't want to ‑‑ and the right to equal treatment irrespective of your genetic background.
This raises the question as to whether we should not have made this discovery, because it can be abused. And my answer to that question is that it would have constituted a crime against humanity to suppress the discovery of these genes, because they will eventually be used to save lives, to save women's lives from bad diseases.
We should, however, have a very low threshold for passing laws and setting regulations that forbid the abuse of knowledge. But it is very important to recognize that you are not going to control this world by controlling the discovery of new knowledge, because new knowledge is new, and you don't know what it is until you have it in hand.
In conclusion, my appeal, or my plea, to these great senators that we have here would be the following: Make sure ‑‑ do whatever you can to make sure that scientists in the States, your wonderful country, will be allowed to use healthcare information produced in the process of delivering healthcare with the use of presumed consent. And remember ‑‑ this is important ‑‑ it is never a crime when this information is used to discover new knowledge, and it is almost always a crime when it is not.
Secondly, make sure that broad consent will be allowed. We should allow people to contribute as much as they can to this discovery of new knowledge. And in the end, keep in mind that compassion for the sick and the wounded is a quality of good people.
Thank
you.
(Applause.)
SENATOR
ROCKEFELLER: Thank you, Dr. Stefansson.
Dr.
Lundberg.
DR.
LUNDBERG: Thank you very much, Senator
Rockefeller, Senator Frist, Mr. Howard, and members of the Alliance and the
Forum, fellow speakers, ladies and gentlemen.
I appreciate the opportunity to participate in this Tech Forum today on the important issue of privacy in the Information Age. There are two currently very hot topics: money privacy and medical privacy.
When
I was a child growing up in rural South Alabama, my father used to listen to a
great comedian on the radio called Jack Benny.
And I would listen to that Jack Benny and laugh with my father. And there's this marvelous joke that Benny
once told that has to do with this issue of money and medicine.
Jack
Benny is confronted by a thief. The
thief says, "Put up your hands."
He says, "I got them up."
The thief says, "All right, your money or your life." Silence.
He said, "Didn't you hear me?
Your money or your life."
Jack Benny says, "I'm thinking, I'm thinking." Okay, privacy ‑‑ money, privacy,
life, the very elements of medicine.
Some
human facts, activities or events are so personal, so private that many human
beings prefer that they not be shared with anyone else. But sometimes in the course of human
experience, it is essential to share such information with another who can be
trusted not to use this information against the person. The other to whom I refer has over centuries
come to be known as a learned professional.
Learned
professionals traditionally are physicians, attorneys, and the clergy, and they
exist because we as people need them from time to time to share the most
intimate knowledge of our lives, our minds, our bodies, and our souls. We must trust them then not to hurt us when
we are most humanly vulnerable.
Actions
such as doing surgery, defending a person in a lawsuit or hearing a confession
of sin are examples of professional actions.
But in the course of these activities information is generated. Information may even be a central element of
the professional relationships, because much of the practice of medicine is
only information.
I
have always taught my medical students, residents, and medical technology
trainees that the patient-physician relationship was sacred, hallowed, and
information exchanged between them was the business of only them. That I believe is ideal. What's reality? Fast forward.
Medical
records are kept, paper or electronic ‑‑ they are the same. Other human beings have routine access to
those records, not just the doctor and the patient. What are we going to do about that?
Insurance
companies pay bills. They demand to
know what they're paying for, like your mental illness. There are huge banks of patient-specific
data on your health if you have insurance.
What are we going to do about that?
Pharmacies dispense drugs. Some are disease-specific, like acyclovir for your genital herpes. There are huge pharmacy databanks that can tell who's taking what for what. What are we going to do about that?
There
are national chains of clinical laboratories that do millions of laboratory
tests. The results say, yes, you have
syphilis today. And there are major
banks of that. What are we going to do
about that?
Most
hospital records are still on paper, and if you go to the record room ‑‑
if you haven't been to the record room of a local hospital, I invite you to do
that. If you go down to that record
room, you'll see mountains and mountains of piles of paper, about 20 percent
misfiled, and the lowest-paid people in the hospital chain are taking care of
those records. Any one of those pieces
of paper could be copied by any one of them and sent anywhere.
There
are no secrets in hospitals. Sometimes
there are efforts to keep secrets. My
most famous pathology consult was when I was a professor at the University of
California doing consultative forensic toxicology, and the autopsy report that
I was asked to make an opinion on was on a patient from Memphis, Tennessee, Dr.
Frist, and his name was Erin Sivle.
Well, the question was whether Erin Sivle died of heart disease or of multiple drug effects, and the license of a physician licensed in the state of Tennessee was going to depend in part on the interpretation of this autopsy.
Well,
it turns out this was a primitive effort by a hospital in Memphis to conceal
the private information of a fairly famous person named Elvis Presley, whose
middle name was Erin, and Sivle is Elvis spelled backwards. So, the autopsy record on Elvis Presley is
under Erin Sivle. A primitive effort,
but then again Elvis was fairly primitive himself, so I guess maybe that might
not be inappropriate.
And
now we have genetic tests, many, revealing the chemical essence of our very
being. Having grown up in the South, as
I indicated, I remember that we were taught that certain anatomic structures,
which many of you probably know about, were referred to as our "private parts." I would suggest that our genes are our
really private parts.
Now,
let's segue to what this conference is all about, including what I've been
talking about, and it's supposed to be about the Internet. Okay, I earn my living, such as it is, with the
Internet, but I would ask you to only think of it this way. The Internet is the medium; it is not the
message. It's simply a way of
transmitting information. Different in
some ways but the same as every other method of transmitting information. Don't get hung up on the Internet; it's just
a medium.
How
does one legislate or regulate or self-regulate the privacy of all this medical
information, which ought to remain private, be it spoken, written, electronic,
paper or Internet? Now, fortunately, the
medical Internet did not spring from a vacuum.
We
have five rich intellectual banks to draw from, from which we have developed
over the past six months the ethics of the medical Internet. These rich data sources are ‑‑
the medical Internet is medicine, so we work from medical ethics. It is journalism; we work from journalism
ethics. It is sometimes, and some of us
hope it will become more so, a business, so you work from business ethics, and
that's not necessarily inappropriate to put those two words together,
"business ethics;" there are such things. And the medical Internet is medical journals, so you work from
the ethics of medical journals. And
it's also medical education, so you work from the ethics of continuing medical
education.
A handout that was referred to, which you will have at the door ‑‑ I brought 50 copies, but there are many more of you than that, and I apologize for not bringing more ‑‑ will have printouts of key parts from the Council on Ethical and Judicial Affairs of the American Medical Association, which creates the ethics for American medicine; the International Committee of Medical Journal Editors, which creates the ethics for how medical journals ought to behave; and then the key parts on privacy from the eHealth Code that we rolled out here on Capitol Hill about a month ago, and another one from Hi-Ethics that was rolled out in San Francisco by a slightly different group about a month before that. And finally, a set of the privacy rules from the Journal of the American Medical Association, written almost entirely by my former staff, since I worked at JAMA for a long time, and published in May of this year.
All of these are there. They deal with very specific areas of ‑‑ now, look at the table of contents of the AMA's book of medical ethics. There are more headings and more policies regarding confidentiality and privacy than anything else. It's the top one. This has been around as an issue for a long time, with a lot of people thinking about it.
As a final statement, if you go to our site, www.medscape.com, you see our privacy policy, which is relatively simple. It says, "Medscape does not provide or release names or e-mail addresses of members to any third party without the member's explicit permission. Medscape does not and will not use cookies or any other technology to track or report on member activity when they are not on Medscape nor pass member data on to other web sites."
Finally, our newly merged company, called MedicaLogic/Medscape, favors regulation and legislation regarding privacy of medical records and has expressed willingness to work with the Congress to develop such if the Congress wishes us to. We only have one stipulation: don't just regulate the Internet; regulate information no matter which way it might be transmitted.
Thank
you very much.
(Applause.)
SENATOR
ROCKEFELLER: Thank you, Dr. Lundberg.
Professor
Sweeney.
PROFESSOR
SWEENEY: Since I'm using technology
here, they told me to stand over here.
I want to thank you for this opportunity to be here and to address this issue. The primary thing that I'd like to add to the mix in terms of my introduction is basically what is going on with data now: data that are publicly available, that is, data that you or I or pretty much anyone else can get for a nominal amount of money, or semi-publicly available, meaning that there's a slight barrier. That barrier might be an additional fee, but for the most part it's pretty regularly available. What does this look like for health data, and what happens when that health data meets genetic data?
Because time is brief, I'm going to make only a few major points: that we are under data surveillance; that there has been a tremendous explosion in collected personal data; some of the problems that result because our policies and practices fail to understand the technology; and what happens when genetic data is added to the mix.
A couple of years ago, in an attempt to characterize the amount of information that's been collected on individuals, I introduced a new term called "disk storage per person." This is basically the amount of rigid disk drive space sold in a year divided by the adult world population in that year. What you see on the far left is a graph of that over time. The elbow happens around the mid-1990s. We're well into exponential growth. This growth correlates with access to inexpensive computers with large storage capacities, which is helping to bring about this revolution.
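Since "disk storage per person" is a simple ratio, a short sketch makes the arithmetic explicit. This is a minimal illustration, not Professor Sweeney's own code: the function names are hypothetical, and the inputs are whatever yearly disk-sales and adult-population figures one supplies (the talk gives none). A year-over-year ratio that stays well above 1 is what the exponential stretch of such a chart looks like.

```python
from typing import Dict


def disk_storage_per_person(disk_bytes_sold: float, adult_population: int) -> float:
    """Rigid disk space sold in one year divided by that year's adult world population."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    return disk_bytes_sold / adult_population


def dsp_series(disk_sold_by_year: Dict[int, float],
               adults_by_year: Dict[int, int]) -> Dict[int, float]:
    """Compute the measure for every year that appears in both inputs."""
    years = sorted(disk_sold_by_year.keys() & adults_by_year.keys())
    return {y: disk_storage_per_person(disk_sold_by_year[y], adults_by_year[y])
            for y in years}


def growth_factors(series: Dict[int, float]) -> Dict[int, float]:
    """Ratio of each year's value to the previous available year's value."""
    years = sorted(series)
    return {later: series[later] / series[earlier]
            for earlier, later in zip(years, years[1:])}
```

Years missing from either input are simply skipped rather than guessed, which keeps the series honest when sales or population figures are incomplete.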
I'm going to quickly show you a few brief instances, from the state of Illinois, of how this has led to a dramatic increase in data. At the time when I, like almost everyone in this room, was born, these 15 fields were all the fields that were collected on your birth certificate. Today, in almost every hospital in the United States, over 200 fields, 226 in all, are collected on each birth, stored in a database, and in some cases made available on the Internet. These are from on-line birth certificates from the state of California in certain counties.
Another explosion that we've seen is in hospital visits. Hospital data doesn't stay within the confines of the hospital, nor is it only located with the insurance company. And just to give you a sense of that, in over 40 states in the United States a copy of the hospital information, like the fields that I'm beginning to show you now, is collected on each hospital visit and then made publicly available. It includes patient demographics, such as age or, in this case, date of birth, along with various diagnosis and procedure codes and various charges that are specific to the care.
We can all relate, of course, to grocery data; that is, the grocery store can know, if we use their loyalty card, exactly what we purchased, and so forth. And those are only three very simple examples. They don't include visual data surveillance and other forms.
Let me give you a sense of what this means. What kinds of problems does it really pose for policy? One source of confusion is that all of our laws, practices, and regulations continue to rest on the idea that if I remove the explicit identifiers, such as name, address, or Social Security number, or somehow encrypt them, the result is anonymous. And so, by those policies, regulations, and so forth, we would consider these three fields sufficiently anonymous to be made publicly available. And you may believe that, especially if I tell you these three fields are part of a very large and diverse database.
But if I subsequently tell you that 33171 is a zip code primarily of a retirement community, then there are going to be very few people of such a young age living there. 02657 is the zip code for Provincetown, Massachusetts, and reportedly there are only five Black women who live there year round. 20612 may have only one Asian family. And notice that it is information outside the data that helps to identify these individuals.
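The point about the three fields can be sketched in a few lines: strip the explicit identifiers, then count how many records share each remaining combination of zip code, birth date, and gender. Any combination held by a single record is exactly the kind of case that one outside fact, such as a retirement community or a small year-round population, can turn into a name. The field names (`name`, `address`, `ssn`, `zip`, `dob`, `sex`) are hypothetical, and this is an illustration only, not a description of any agency's actual release procedure.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical field names, for illustration only.
EXPLICIT_IDENTIFIERS = {"name", "address", "ssn"}
QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def strip_explicit(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Naive de-identification: drop explicit identifiers and keep every other field."""
    return [{k: v for k, v in r.items() if k not in EXPLICIT_IDENTIFIERS}
            for r in records]


def group_sizes(records: Iterable[Dict[str, str]],
                keys: Tuple[str, ...] = QUASI_IDENTIFIERS) -> Counter:
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def singled_out(records: Iterable[Dict[str, str]],
                keys: Tuple[str, ...] = QUASI_IDENTIFIERS) -> List[Tuple[str, ...]]:
    """Combinations held by exactly one record in the release."""
    return [combo for combo, n in group_sizes(records, keys).items() if n == 1]
```

Nothing in `strip_explicit` touches the remaining fields, which is the gap the talk describes: the release looks anonymous while the combinations reported by `singled_out` still point at individuals.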
Let
me give you a very quick example from the state of Massachusetts. The Group Insurance Commission is the group
responsible for purchasing insurance for state employees, their families, and
retirees. They collected the type of
data that I showed you earlier ‑‑ this is a subset of those fields ‑‑
and then copies were made available to researchers, and additional copies sold
to industry.
For $20, I went to the City of Cambridge and purchased the Cambridge voter list, and it came on two floppy diskettes. In fact, all of the examples that I'll be giving you use only standard computer technology with standard off-the-shelf software. The voter list, as you can see, also has a zip code, birth date, and gender, along with the name. Clearly, the idea is to take this supposedly anonymous data and reidentify it by linking on zip code, birth date, and gender.
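The linkage described here amounts to a join on the three shared fields. A minimal sketch, assuming both tables are lists of Python dicts with hypothetical keys `zip`, `dob`, and `sex`, plus a `name` field on the voter rows; it attaches a name only when exactly one voter matches, which is the situation where reidentification is certain rather than a guess.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, str, str]


def link_key(row: Row) -> Key:
    """The three fields shared by the medical release and the voter list."""
    return (row["zip"], row["dob"], row["sex"])


def reidentify(medical: List[Row], voters: List[Row]) -> List[Tuple[str, Row]]:
    """Attach a voter name to each medical row that matches exactly one voter."""
    index: Dict[Key, List[Row]] = {}
    for voter in voters:
        index.setdefault(link_key(voter), []).append(voter)

    matches: List[Tuple[str, Row]] = []
    for row in medical:
        candidates = index.get(link_key(row), [])
        if len(candidates) == 1:  # a unique match is a confident reidentification
            matches.append((candidates[0]["name"], row))
    return matches
```

Requiring a unique match is a conservative choice for the sketch; rows with two or three candidates are left unnamed here, though they still leak a good deal.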
The question, of course, is how unique would such a linking be? Cambridge, Massachusetts is a little unusual in that it houses both MIT and Harvard, so the population is skewed toward people in their early 20s. But even with that, the numbers are quite revealing. Birth date alone, that's month, day, and year of birth, is unique for 12 percent of the population. That means that when those people visit a web site that asks only what city they live in and their birth date, that could be enough to uniquely identify them. Birth date and gender, 29 percent; birth date and the five-digit zip code, 69 percent; birth date and the full postal code, 97 percent. Note that these are only one- and two-way combinations, not three-way and beyond.
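Each percentage above is the share of people whose value, or pair of values, appears exactly once in the population. A sketch of that calculation, assuming records are dicts with hypothetical field names; by default it covers one- and two-way combinations, matching the comparison in the talk. It illustrates the measure and is not the procedure used to produce those figures.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def fraction_unique(records: List[Row], keys: Sequence[str]) -> float:
    """Share of records whose values for `keys` appear exactly once."""
    if not records:
        return 0.0
    values = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(records)


def uniqueness_table(records: List[Row], fields: Sequence[str],
                     max_width: int = 2) -> Dict[Tuple[str, ...], float]:
    """fraction_unique for every combination of up to `max_width` fields."""
    return {combo: fraction_unique(records, combo)
            for width in range(1, max_width + 1)
            for combo in combinations(fields, width)}
```

Raising `max_width` to 3 gives the three-way combinations the talk sets aside; since adding fields can only split groups further, those fractions can only be equal or higher.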
I
chose Cambridge, Massachusetts because William Weld was the Governor of
Massachusetts, he lived in Cambridge, and his medical data was in the GIC
release.  Only six people had his birth
date, only three of them were men, and he was the only one in his five-digit
zip code.
In
subsequent experiments, the numbers have been replicated throughout the United
States, and recently we set out to determine how many
people in the United States are identifiable by which characteristics.  We found that 87 percent of the U.S.
population is uniquely identified by birth date, gender, and five-digit zip code; in some cases, the entire state.
This
is another quick example.  This is a
release, under a FOIA request, from a cancer registry in a
particular state. The data that you see
under diagnosis and zip code has been made up to protect both the identity of
the state as well as the identity of the patients.
This
was supposed to be one of the most difficult cases to reidentify, because
neuroblastoma is a cancer found primarily in children, and, not only that, there
is no ICD-9 diagnosis code for neuroblastoma.  So, even if you get the health data, you can't look and say who
had neuroblastoma. It can only be
inferred from a preponderance of care and a preponderance of diagnoses.
The
diagnosis data include only the month and year of diagnosis and the five-digit
zip code.  The copies were released to
me by the cancer registry, and, again using only standard software and
publicly available data, I was able to reidentify them with 100 percent
accuracy.
You
might ask, how did I do it?  I put together this chart.  Any path from
the top of the chart down to the bottom is a possible way to reidentify those
children.
Another
problem that I see a lot in data sharing is that a tremendous amount of
attention is given to the person who collects the data.  In hospital data, it's usually the
hospital.  We try to put boundaries
around whom they are allowed to give the data to, and after that
we don't care.  And so we see a lot of
cases where subsequent releases of the data are not controlled at
all, nor is any attempt made to control them.
That
gives you a sense of the kinds of problems that show up.
In
more recent work, I've been working with Brad Malin, a very gifted graduate student
of mine at Carnegie Mellon.  We've been looking at what we call
second-generation DNA databases.  These
are a generation of databases that are appearing in hospitals throughout the United States.  And there has been a lot of
discussion within the medical community about whether they should be considered part of the
medical record, and therefore distributed the way hospital data is distributed,
or whether they should be distributed on their own as autonomous research databases.  And after all, how could
they be reidentified?
And
so, here is what we've been doing: this diagram shows on the left side the
privately held data and, on the right side, the data that is
publicly available.  One thing that I think we know
intuitively, but that we had to quantify, is how much additional risk is introduced when DNA data is
added.  So, we have some measurements,
such as "gross maximal risk."  You can see the numbers.  This is again from the state of Illinois.
The
public health data show the current risk to people in the state, based on the
practices the state currently engages in.
When the DNA data is also released, you can see how large the graph grows.  A lot of data is still
privately held but is right in the middle of changing, where people are getting ready to release it.
The
last thing I want to point out is a program
called CleanGene, which we created to show how,
if you have only a DNA database, you could actually
reidentify the person who is the subject of a DNA sequence.
In
step one, we are usually, though not always, able to infer gender from the DNA sequence.  In step two, depending on how
it was sequenced, we can infer particular diseases, which in fact were
the reason the hospital collected the DNA in the first place in
these second-generation databases.  In step three and step four, we link
those results to the hospital data that I described earlier, such as the GIC data.
And
so we've been able to show that we can actually go both ways.  We can take the DNA sequence data and infer
what would have to be true in the health data, and we can take the health data
and draw inferences that limit which DNA sequences could possibly match it.
So,
in closing, I just want to make three points.  First, we are having an explosion in
data.  Second, there are a lot of problems with our policies and practices, because
we don't really understand the identifiability of data.  And third, when genetic
data is added to the mix, it increases the risk tremendously.
Thank
you.
(Applause.)
SENATOR
FRIST: Professor, thank you very
much.  That was quite enlightening.  As we were sitting up here, I
think everybody in this room was thinking that somebody out there is looking at
them this very second.
But
thanks to all four of our speakers.  Oh, and of
course, the veteran of our panelists here, who has probably been the most visible and most active person on this
particular topic in the three years that I've been dealing with genetic
information, a real advocate who has testified: Janlori
Goldman.
DR.
GOLDMAN: Well, Senator, I have to say
you played on one of my greatest fears, which is that by inviting me back you were
hoping and praying that I wouldn't say the same thing, that I might have
something new to say.  But maybe, on the assumption that I'm just going to repeat
myself, we can get right to the questions.
But
I have to say there is nothing better than following Latanya Sweeney on a panel
‑‑ nothing.  Because what
she does, I think, is make all of us feel it in our gut: no matter what we
know, no matter how many articles we read, no matter how much we study and
research ourselves, she is proving in some ways that our true
fears are justified, no matter how anonymized we think the databases are and no
matter how many promises we hear.  And we hear many promises:
don't worry, trust me, it's going to be fine, I'm only going to use it for this
one thing, it's never going to be used for another purpose, it will be
absolutely non-identifiable.
If
any of you have dealt with this issue as a staffer, or, as I have, as an advocate,
you've heard that a lot, and I think Latanya's work is critical in helping us
understand that we are still vulnerable.
The
Health Privacy Project, as some of you may know, was created a number of years
ago to try to create greater privacy protections in the healthcare area.  Our view is that privacy is critical to improving
the quality of care in this country and to broadening access to care.  What we want to do is provide the
best resources for you in looking at this issue and provide the kind of
information that you need to make good policy judgments.
What
we've done is develop a set of best principles with a working group
of diverse stakeholders.  We did a
survey of state health privacy laws, health privacy statutes. We have put together a primer for consumers
on health privacy. All of these reports
are available at our site, at healthprivacy.org. Feel free to read them, download them, share them, do with them
what you will.
And
we also start from the position that the technology available today
should be used to protect privacy to a
greater extent and to put better security in place, and that while there
are certainly greater risks, and the magnitude of the risks is much greater with the
Internet, with databases, and with genetic testing, we can also use that as an
opportunity to build privacy up front into our policies and into our
technologies.
Another
thing: in the last few years, in trying to be more focused on
creating an empirical basis for understanding how the lack of privacy affects
healthcare and affects how people seek care, we've been involved in a number
of studies, empirical studies and polling, to try to understand the impact.
And
what we have found in a very broad sense is that about one out of every six
people will do something to protect their privacy when they're seeking healthcare or deciding whether to
seek it.
Out of fear that the information may fall into the hands of
employers or insurers or family members, or that they may be
embarrassed by the release of certain information, or, especially with genetics,
that the release of their information may affect their family members and
future generations, people are withdrawing from full
participation in their own care.
They're
leaving information out when they see their doctor, or they're paying
out-of-pocket for care that they're entitled to be reimbursed for.  Some are afraid to seek care or
testing at all, especially for stigmatized illnesses and conditions.
We
know this about mental health, we know it about communicable diseases and certain
kinds of cancers.  We certainly know
in the genetics area that the lack of privacy is a major barrier to people
seeking care.  In fact, studies have shown that the lack of privacy is the number one barrier today to
people seeking genetic testing and counseling.
There
was a recent CNN/Time poll that showed that half the people in this country
think that mapping the human genome is immoral.  Now, that's terribly troubling given the incredible
scientific and medical advances that can come from it.  But, and I think there's a
link here, 75 to 80 percent of the public are worried that the
information will be used by insurers and employers to make
decisions to deny them jobs and deny them benefits.
So,
the Congress has been working, I think, pretty hard the last few years to try
to pass anti-discrimination legislation targeted at genetics, to create
protections in employment and insurance.
And I think that's critically important, but it's only half a solution.
We
can tell employers and we can tell insurers that they can't discriminate on the
basis of certain information, but the temptation will still be there. The risk will still be there as long as we
allow that information to get into their hands. If we put privacy protections in place and say, "You don't
have a need for this information, you shouldn't have access to it in the first
place," then we create, I think, a much more comprehensive set of
protections.
Now,
the Internet is a whole other problem, because even though, as Dr. Lundberg
said, it's just the medium, and we shouldn't attack it as though it were the message, the
truth is that the Internet is different: the capacity to gather information invisibly and seamlessly
is built into its design.
There
was an article, and there's an article every day, but I can just talk about
today's piece on the front page of the Washington Post, about software
built in that says, in effect, "We have
designed some software in here to let us know what you're doing, what
products you're using, and what's useful to you."  So, the Internet is different, and it does pose some serious
challenges.
We
also know from polling that's been done that lack of privacy is the number one
barrier keeping people off the Internet.
And in a study that we did that came out in February, we looked
specifically at health web sites. We
looked at their privacy policies and their privacy practices, and we found
across the board that the privacy policies were inadequate, and that even where
they did exist, the practices were inconsistent with what the sites said they
were doing.  Information
was being gathered when the policy said, "We are not gathering or sharing
information."  Information was being made
available to others without people's knowledge, without their permission.
And
so, even in this area of high sensitivity, even where we have opportunities to
improve health through the Internet by making information more accessible to
consumers, by allowing people to talk more freely with other
people who might have similar conditions, and by allowing people to buy prescriptions
on-line, with, I think, some illusion of anonymity, we see some very
serious vulnerabilities that have not, I think, been addressed.
I
just want to suggest that, while we now
have a lot of information about some of the vulnerabilities and some of the
risks, there are also some lessons we've learned from looking at how past
privacy issues have been addressed that might be helpful to us.
We
can say, at least I think I can say, and maybe you'll agree, that when information is gathered for a very specific purpose,
there will be temptations to use that information in other ways that were not
anticipated at the outset, that were not thought about at the outset, and
certainly that were not publicized or made clear to the consumer at the
outset, and that those temptations will almost surely overcome whatever privacy
interests might be raised.
And
I'm just going to put out two examples.
The Framingham Heart Study, whose data was gathered in a research
context for public health purposes, has recently been made available for
commercial purposes.  There's an ethical debate
going on about whether we should notify the people who participated in
this study.  There's genetic information
in that study.  Should we notify
them?  How can we notify them?  Should we get their permission?  The initial consent form did not anticipate
commercial use of the data.  They're
telling people that it's going to be anonymous.  How do we know?
Second
example, I think, would be the Icelandic database, which, again, was gathered for a
particular research purpose.  That was
the expectation of the population of Iceland.
And some of us might be moved by this notion of individuals
contributing to the greater good: how can we stand in the way of contributing
to the greater good?  But we know that
people's fears about how their information will be used will keep them from
fully participating in their own care.  That lack of privacy, that lack of trust
and confidence, will keep people from being honest, it will keep people from
fully participating, and it will in many ways directly affect their own quality
of care.
So,
what we have to do, I would say, to deal with future temptations that we know will
overcome whatever privacy concerns are raised
at the time, is put rules in place today that say how information should be
used, who should get access to it, and under what circumstances, so that the
societal expectations are etched in stone in our public policy and are
enforceable.
Now,
we do have a legal response that is coming down the pike. As you know, when the Portability Act was
passed, it did put in this timeline, this deadline, for legislation or
regulations. Draft regulations were
issued by the Secretary in the fall.
The health privacy
regulations are due to be finalized in August or September of this year.
I
don't want to suggest by any means that they are comprehensive and that they
are strong enough. As I said to Senator
Frist at the beginning, we'll take whatever we can get, but we're not going to
be satisfied. It takes us part of the
way.
It
will cover health plans and healthcare providers that gather information from
patients, and it will put rules in place about how that information can
flow.  What it won't do is cover
entities such as researchers or law enforcement officials, or some that
are doing research that isn't considered research under federal law, that are
gathering information from patients.
Most
of the health web sites right now will not be covered, because they are not
providers or plans in the traditional sense.  Many of the genetic databases that will be
in the commercial realm may not be covered, because they're not traditionally
providers or plans.
So,
there's still a lot of work to do.  We have seen a number of ethical
codes and voluntary guidelines, in an effort, I would say, to head off
regulation, on the argument that self-regulation should be enough.  It is not enough.  What
self-regulation does is say to those of us who are concerned about these
issues, yes, we know what the right thing to do is.  We, the good actors in this area, know what the right thing is,
and we're willing to do it.  But it doesn't bind the bad actors, and it is not
enforceable against anybody.  So, we can
use them as a guide and as a model as to what we can put in place, but it's
just the beginning.
And
I would also suggest that as we're building these rules and as we're looking to
create a set of enforceable expectations for the public, that we try to
distinguish the information that is collected in the healthcare
context: that we ask what is needed to treat patients, what
information we have to have to treat people and pay for their care, and that
we have a set of rules for those kinds of uses.  But outside of those core healthcare purposes, we should have, I
think, a much more skeptical and careful eye, one that involves people directly in
making decisions about how that information should be used.
Thank
you very much.
(Applause.)
SENATOR
ROCKEFELLER: As I said, we have
microphones in the back.  Those obviously
aren't convenient for everybody.  And
I hope that you'll stay, because the questions, with all due respect to our
four speakers, are usually the most interesting part.
One
I have here is, would you be in favor of legislation that would prohibit U.S.
law enforcement agencies from developing and using information technology
collection devices for genetic information?
That's not addressed to any particular person.  And then you can respond, Kari.
One of the panelists has to respond.
That's the deal.
DR.
GOLDMAN: I guess I'm warmed up, I don't
know. It's a very broad question, but I
think that we need to be extremely careful.
I mean, many states have already passed laws that are mandating DNA
databases.  And, again, DNA databases
are different from genetic databases.
They tend to contain more of an identifier, as opposed to medical
information.  But they are
intended to hold samples collected from people who have already been convicted of crimes, as
a way of searching those databases in the event another crime is committed.  It's meant to deal with recidivism.
And
while I have many concerns about those databases, I think that law enforcement
should not be in a position of collecting genetic information and storing
it. I think that is a totally different
circumstance, and that there is just no justification for it.
SENATOR
ROCKEFELLER: I think Mr. Stefansson
wanted to respond to something that you had said. So, you go ahead. There
was a disagreement between the two of you.
DR. STEF