Economics of Information Security Paper Reviews and Notes
Below are my
write-ups and notes for the papers I've been reading in the "Economics of
Information Security" class I'm enrolled in. I'm guessing most of my readers
won't get much out of them unless they have read, or plan to read, the same
papers. More to come as the class continues.
Week 1
This write-up is for three documents, to which I will give numbers for clarity when I refer to them later in the assignment:
1. Commercial Data Privacy and Innovation in the Internet Economy: A Dynamic
Policy Framework
2. Information Privacy and Innovation in the Internet Economy
3. Federal Trade Commission privacy report.
We were asked to answer three open
questions. Here are my choices:
1. Should baseline commercial data privacy principles, such as comprehensive
FIPPs, be enacted by statute or through other formal means to address how
current privacy law is enforced?
This is labeled as 1.a in document 1 (it’s also in document 2), but my answer applies in some part to all of the items under section one. While I would like to see my own personal data being treated with respect in terms of my privacy, I’m not sure the FTC or a
government entity would set reasonable rules. Even common language terms seem to
differ, as mentioned in our first class (using the term “cyber” is often
regarded with derision in hacker/security practitioner communities). I don’t have a lot of faith in politicians or bureaucrats understanding technology well enough to make informed decisions. That said, no other entity may have the power
to force commercial interests to “play fair”. I’m not sure a free market
solution would work in this case since most people seem to care little about
their own privacy until something egregious happens (just look at the actions of most social network users). I err on the side of not giving out information, but I also know this is not a workable option for some (or even for me at times). I guess I just don’t have a good answer to privacy problems in general, and anything coming from politicians or bureaucrats I’d like to give the hairy eyeball before I support it.
5 (Doc 2). What is the best way of promoting transparency so as to promote
informed choices? The Task Force is especially interested in comments that
address the benefits and drawbacks of legislative, regulatory, and voluntary
private sector approaches to promoting transparency.
The problem with informed choice is
that it requires the consumer to take action to become informed. As the old
saying goes, you can lead a horse to water but you can’t make them drink.
Besides the obvious benefit of making sure a privacy policy is prominently
displayed and easy to find, it might be useful to control how policies can be
changed. When you start doing business with a company, let’s say Facebook for
example, you may have a certain privacy policy in place. However, an
organization may change their privacy policy after a consumer starts doing
business with them in such a way that, had the consumer known, they would not have agreed to do business with them in the first place. As such, it would be helpful if notifications of privacy policy changes had to be made in advance so that a consumer has a chance to stop doing business with the company before the policy takes effect. To be effective, along with advance notification, some
measures would need to be taken so that the organization must expunge data about
the consumer after they have left. Without this expunging of data, the
consumer’s privacy may still be violated by future actions on the part of the
organization.
20 (Doc 2). Are technologies available to allow consumers to verify that their
personal information is used in ways that are consistent with their
expectations?
21 (Doc 2). Are technologies available to help companies monitor their data use,
to support internal accountability mechanisms?
While we were asked to answer only
three questions, I have put 20 and 21 together as they can be seen as closely
related. The crux of the two questions could be restated as: Do technologies exist that allow the auditing of who accesses information? If a company can
monitor for how data is used, then they should be able to allow consumers to see
the filtered reports of data use for just their records (with some pain,
granted). This is why I’m treating these two questions as one. Do technologies for data use monitoring exist? Certainly, to name a few: file access logs (if enabled), ACLs, data exfiltration prevention systems (ZixMail for example), and policies that disable saving to removable storage could all be seen as such technologies. The important things to ask are how effective
are these technologies, how easy are they to implement, how difficult are they
to get around, and how comprehensive are they? Measures can be taken inside of a
company to track where data goes, and who accesses it, but there are so many
variables that I would doubt that any system could reliably say “we know where
the data has been used, how, and who has accessed it”. There are so many ways
data can be exfiltrated that blocking all possible avenues seems unlikely.
That’s not to say that reasonable precautions should not be enforced, but
there’s not much that can be done to keep an insider threat from bringing in a
camera and photographing the data even if other vectors are locked down. I guess it all depends on the expectations of the person asking the questions above.
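As a concrete illustration of the kind of per-consumer reporting I have in mind, here is a minimal sketch that filters a hypothetical data-access audit log down to the entries touching one consumer's records. The CSV layout, field names, and file name are all made up for the example; real audit systems vary widely.

    import csv

    def report_for_consumer(log_path, consumer_id):
        # Return the audit entries that touched the given consumer's records.
        # Expected (hypothetical) columns: timestamp, employee, consumer_id, action
        with open(log_path, newline="") as handle:
            return [row for row in csv.DictReader(handle)
                    if row["consumer_id"] == consumer_id]

    for entry in report_for_consumer("access_log.csv", "customer-1234"):
        print(entry["timestamp"], entry["employee"], entry["action"])

Of course, a report like this only covers the accesses the logging happens to capture, which is exactly the insider-with-a-camera limitation mentioned above.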
Week 2
This write-up is based on the following two readings.
1. Why Information Security is Hard - An Economic Perspective
Ross Anderson
2. The Economics of Information Security
Ross Anderson and Tyler Moore
I plan to make it a general
commentary on some of the points that provoked thought. Both papers focus on
decision factors that may affect security in a negative way. Ultimately, if we assume the decision maker is self-interested and rational, there are certain
goals they will pursue that do not take security as the first and foremost
factor. This is not necessarily a bad thing; businesses exist to make money, and a business that is not profitable may not exist for long no matter how secure its products are. Also, people decide what to buy based on the utility it gives them, or stated another way, its usefulness. A stone is far more secure than my smartphone, but not nearly as useful for information exchange (unless
you count throwing it at someone). Put another way, some factors precede
security in the hierarchy of needs. Now, most of the above assumes the decision
maker is also looking out for other stakeholders (the vendor wants to sell a product consumers will buy, the consumer wants to buy something they see as useful); however,
perverse incentives may cause further problems when an agent wishes to do
something solely out of self-interest.
A few of the given examples of economic incentives that trump security:
1. The majority of the cost of the security measure is borne by someone other than the decision maker (an externality). An example given is anti-DDoS software on end users’ computers. My thoughts: This seems like a weak example to me since most anti-malware packages are intended to look for the sort of bots used in DDoS attacks. Granted, the person installs the anti-malware package to protect
themselves, but they do protect others/the network as well by installing it. A
positive externality.
2. Being first to market, and the advantages thereof, may take precedence over
security. Ship now, patch later. My thoughts: I agree this is an incentive that
gets in the way of security, though I think the advantage of being first to market may be overplayed. Not all companies that are first necessarily lock up the market. For example, these are a few technologies where the first to market did not end up being the leader for very long: tablets, PDAs, social networks, personal computers. I’m actually struggling to think of someone that was first to market and ended up holding onto the dominant position for more than five years.
3. Vendor lock-in caused by closed or less than documented protocols and
specifications. My thoughts: I think this is less of a problem than it once was.
It seems to me that others have gotten better at reverse engineering protocols
and making things interoperate. I’m more worried about laws like the DMCA or patent trolls making it illegal to reverse engineer software/electronics for compatibility (or just knowledge). Then again, the first paper was written about ten years ago, so the landscape was a bit different.
I do think things have swung somewhat
the other way since the first paper was written. With all the problems caused by adware/spyware/worms/etc., security has come to be seen as a market differentiator by
some users who aren’t even that techie. This is one reason the use of Firefox
has taken off, and one of the reasons some people buy Macs (though I have my doubts about the inherent security of OS X being the reason; its niche status until more recently meant malware writers had less incentive to target it). Also, perhaps for PR reasons, Microsoft seems to have improved greatly, sometimes to the point of causing problems with backwards compatibility and ease of use (the use of DEP, driver signing, UAC and other features in Vista and newer, for example).
In the second paper, I wish they had
gone more into how/why UK banks are spending more than US banks on security
measures. Something to do with wanting to make the machines “so secure” that
they can place blame for any fraud at the feet of the users? I also think the
second paper seems to miss the point of why some P2P systems did not take off
(Eternity, Freenet, Chord, Pastry, and OceanStore) and others did (Gnutella and
Kazaa). I don’t think forcing you to share resources is what made the first group less popular; BitTorrent encourages you to share resources to get faster downloads and it has taken off quite well (understatement). The first group (Eternity, Freenet, Chord, Pastry, and OceanStore) is about reliability/scalability/privacy, something most Internet users would not care much about. The second group (Gnutella and Kazaa) is largely about getting
free stuff, which is widely popular with just about everyone.
The second paper also seems to indicate OS X is of FreeBSD lineage, which is not quite right (though admittedly the Unix family tree is a bit like a genealogy from Arkansas).
I’m also not sure the car analogy in
the second paper really applies. With cars, there is a very significant cost
associated with duplication (making more than one car). Making a good car means
more than just good design, but also good parts that must be duplicated. It
makes sense that the producer would not want to produce a better car than they
can sell at a profit. With software, pretty much all of the cost is in R&D/coding (and, unfortunately, perhaps marketing); the cost of duplicating the software is almost negligible (media pressings in large quantity, online distribution, etc.). A
software company can spend time on the security and reliability of the product,
and use that as a market differentiator to demand a premium price.
Page 8 of the
second paper also seems to indicate that making open bug markets would encourage
people to find more bugs. I have a hard time seeing how this can be a bad thing
in the grand scheme. Better that some debugging whiz makes money off of selling the bug to the company that made the product or to a security-product-related company than keeping it to themselves or selling it to a malware producer.
Week 3
For this week the readings are as follows.
1. System Reliability and Free Riding
Hal R. Varian
2. Incentive-Centered Design for Information Security (abstract)
Rick Wash and Jeffrey K. MacKie-Mason
3. The Changing Nature Of U.S. Card Payment Fraud: Issues For Industry And
Public Policy
Richard J. Sullivan
4. Information Security Policy in the U.S. Retail Payments Industry
Mark MacCarthy
I will focus on papers one and two,
with perhaps some influence from the others.
For the first
paper, I take it there are some people that find the economic ideas easier to
express using algebra and calculus than in examples? Perhaps I can manage a synopsis of the three prototypical cases using my own examples; please let me know if I get it right:
Total effort: The reliability of a system comes from the combined efforts of all
responsible individuals. For an example in the security field I’d need to come
up with a scenario where the work is similar across the board, no matter who is
contributing. At first I thought “analyzing a protocol for security”, but I can
see where this could fall into the “Best shot” case if one individual is far
beyond the rest in a particular expertise. Perhaps a mix net for privacy would be a better example? Even if one node only puts in a few resources, just having another node increases the anonymity set, even if its contribution is far less than that of other nodes.
Weakest link: The reliability of the
system depends on the individual that puts in the least effort. I could see a scenario for this much more easily. Let’s say you, PayPal, some online vendor and the
credit card company are all responsible for protecting you from identity theft
involving your credit card number. The one with the weakest security may likely
be the one where the number becomes compromised, especially if an attacker knows
which link is the weakest so they can target it first. This is sometimes
referred to as going after the low hanging fruit in my circles.
Best shot: The reliability of the
system depends on the person that does the maximum effort. I’m having a tough
time thinking of a scenario where this would truly fit. So many things are
dependent on the efforts of more than one entity. I guess the example of an
encryption key that is split amongst multiple parties (keys to the root DNS
servers are an example of this as I understand it) might work: as long as a single entity holds out and does not expose their key, the system is safe. Then again, if
the system implementers screwed up, maybe the attacker could get away with fewer
keys to crack a system, or find some side channel avenues of attack. I think the
best way to use the “Best shot” prototypical case is when explaining why
something is not “Best shot” even when people treat it like it is. For example,
some treat their networks like a “Best shot” prototypical case, relying on one
or two things to stop an attacker (Example: Hey, I don’t need to worry about
securing this box, the firewall will block it). Without “defense in depth”
networks can become like candy: hard coating, soft gooey center.
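To keep the three prototypical cases straight in my own head, here is a toy sketch (my own simplification, not Varian's notation) where system reliability is driven by a different summary of everyone's effort in each case:

    # Hypothetical effort levels on a 0-to-1 scale.
    efforts = {"alice": 0.9, "bob": 0.4, "carol": 0.7}

    def total_effort(e):
        # Reliability tracks the combined (here, average) effort of everyone.
        return sum(e.values()) / len(e)

    def weakest_link(e):
        # Reliability tracks whoever puts in the least effort.
        return min(e.values())

    def best_shot(e):
        # Reliability tracks whoever puts in the most effort.
        return max(e.values())

    print(total_effort(efforts))  # ~0.67
    print(weakest_link(efforts))  # 0.4
    print(best_shot(efforts))     # 0.9

Looked at this way, defense in depth is basically an argument for not treating a network as a best-shot system when attackers get to pick the weakest link.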
Now for the third paper. I was exposed to quite a bit about US law and banking/credit companies by reading this paper. Some of the PCI information I had encountered before, but quickly forgot after the job interview was over. Many of the rules are of interest to me from the standpoint of how to balance flexibility with workable requirements. For
example, if the rules merely say data must be encrypted, can I use the Caesar
Cipher (I ROT26ed this whole write-up) and still be in compliance (I assume PCI
requirement 4 has a footnote against this)? On the opposite side, does it always
make sense to force someone to run Anti-Virus software (PCI requirement 5)? What
if the system is so obscure that there is no AV package for it, or the AV
software is almost useless to the point of just being there to take up CPU
cycles? PCI requirement 11 comes up a lot amongst security folks when they talk
about what is a pen-test vs. a vulnerability scan.
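To illustrate why “data must be encrypted” needs more teeth than that, here is a quick toy example (mine, nothing to do with actual PCI DSS test procedures) showing how a Caesar cipher falls to a brute force over its entire 26-key space:

    def caesar(text, shift):
        # Shift each letter by `shift` positions, leaving other characters alone.
        out = []
        for ch in text:
            if ch.isalpha():
                base = ord("A") if ch.isupper() else ord("a")
                out.append(chr((ord(ch) - base + shift) % 26 + base))
            else:
                out.append(ch)
        return "".join(out)

    ciphertext = caesar("Cardholder data", 7)   # a hypothetical "protected" record
    for shift in range(26):                     # every possible key, tried instantly
        print(shift, caesar(ciphertext, -shift))

Technically the data is “encrypted”; practically, anyone can read it, which is why the requirement has to spell out acceptable algorithms and key strengths.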
Other random thoughts and questions concerning paper three:
As a side note, how is the CVV
calculated? If it is mathematically based on the card number and expiration date
(which may be guessable), can the CVV just be calculated if the attacker has the
information? I need to look up more information on how this is done, as the
paper does not seem to say.
Associating the cost with the breached entity may be hard. Sometimes you know who let the PII slip, but I imagine most times you don’t. An attacker grabs the
account information one place, and uses it at another. I guess the issuing
companies can look for where the cards have been used in the past to see if they
can correlate the suspected leak.
I like that the paper mentions the
costs to the card holder that are not directly financial. The card holder may
not be liable for false charges, but they still pay for credit card fraud in
time, worry, identity management and other costs.
Page 28 mentions the cost of PCI being less for smaller businesses. Is this percentage-wise, or merely in dollar amount? I imagine smaller businesses could have a higher cost percentage because
of their lack of infrastructure (or this could make it cheap to switch in some
cases).
If merchants think that PCI is
offloading security costs onto them instead of the credit companies, they may be
right, but what are their expectations for the credit companies to fix security
issues? Changes in the credit company’s security infrastructure would seem
likely to trickle down to the merchants in the cost of new equipment needed to use the card system (chip and PIN for example).
The paper seems to focus almost exclusively on card-present transactions. I almost never do card-present transactions, just online shopping. What are they doing to protect this online
activity? I suppose they could issue smartcards which can display one time PINs
via an LCD or e-Ink, but that would raise the costs of the cards.
Week 4
As I’m not sure
which papers, if any, this assignment is meant for I’ll just try my best to
answer the questions. I have something else written for the papers assigned so
far, but those may be reassigned for next week so I’ll hold onto it for now.
What are transactions costs, production costs and network externalities?
I’ll use buying a PC as my example
case for illustrating the concepts as best as I can, and as best as I understand
them. Transactions costs are costs incurred in getting the product from producer to consumer that are institutional in nature and don’t directly figure into the creation of the product. In my example, the time it takes me to figure
out what to buy is a cost, or the shipping fees I incur may be considered
transaction costs. Fees paid to middle men may also count. Production costs are
the cost incurred directly related to the production of a product. In the case
of a PC, the cost of the parts and labor that went into making the PC would be
production costs. A normal externality would be a cost incurred by a third party (or sometimes a benefit) outside of the decision making of the buyer and seller,
not reflected in the price paid by the buyer. An example might be the
environmental costs incurred if I decide to dump my PC in a stream after I no
longer have a use for it (see RoHS). For a network externality we normally think
of something positive that makes the product more valuable the more people that
have it. In the case of a PC, the more people that have it the more people you
can exchange compatible documents and software with, which would be a positive
network externality. A negative network externality might be the case of a
prestige item, where the more people that have it the less someone might value
it because of lack of rarity (I wonder how much my Apple IIe is worth?). To use
the PC example instead, I suppose if there are enough PCs created, and they are
all networked and cause congestion to the point of being useless then that would
be a negative network externality. Perhaps having a platform monoculture that makes malware creators’ jobs easier might also be a negative network externality.
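As a toy way to put numbers on the “more valuable the more people have it” idea, here is a sketch using the common n(n-1)/2 pairwise-connection rule of thumb; this is my own illustration, not something taken from the readings:

    def exchange_pairs(n_users):
        # Number of distinct pairs of PC owners who could exchange documents.
        return n_users * (n_users - 1) // 2

    for n in (10, 100, 1000):
        print(n, "owners ->", exchange_pairs(n), "potential exchange pairs")

The owner count grows 100-fold here, but the potential exchange pairs grow by more than 10,000-fold, which is the flavor of a positive network externality.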
How do these influence investment in networked goods?
I’d imagine transaction and
production costs figure the most directly, in that the buyer figures out what
they are willing to pay for a good, and the producer decides how many they are
willing to make for a given price. Of course as time goes by, if a positive
network effect is in play, the buyers may value the product more because they
can use it with others that have the same or similar product, and perhaps the
producer could benefit from economies of scale to make the product cheaper (or
they may make it a prestige item, sell less, and try for a higher markup).
Which categories of security products can be considered networked, and what can
be considered stand-alone?
This one is a little tougher for me
to answer. I’ll consider email encryption software to be a security
product/feature that would benefit from network effect. The more people that
enable or support email encryption software, the more people who could or would
use it. I imagine a privacy mix net would be a good example, as the more people
who join the network the more anonymous each person becomes in the crowd.
Anti-virus software may be an example of a standalone security product, as most of the benefit goes to the person that runs it on their system. However, there
is a positive externality there as well, having AV on their system may benefit
others because hopefully their system won’t get infected and attack other
systems. Perhaps exploit code that is written and kept to one author could be a
standalone product, but I don’t know if you would categorize this as a security
product? I’m having a hard time thinking of a security product that has no
positive externalities; even a firewall meant for one organization’s network may
benefit others in small ways. Though a firewall’s positive externality may not be a network one, as I don’t believe one person’s firewall becomes more valuable just because others have one (though many folks using the same firewall may make
support easier). Perhaps hard drive encryption software could be considered a
standalone product, since only the user receives the benefit, and other than the
chance of bug being more easily spotted there is not much of a positive effect
in others using the exact same or compatible software. Then again, if the person is using hard drive encryption to protect others’ PII there may still be a positive externality (though perhaps not a network one?).
Week 5
This write-up will be for the readings:
1. The Economics of Networks, Nicholas Economides
2. What really matters in auction design, Klemperer
3. Bug Auctions: Vulnerability Markets Reconsidered, Ozment
Since I already
covered Economics of Networks in the extra credit write-up from last week I’ll
focus on the two auction papers. I had not thought of some of the forms of
collusion put forth in the Klemperer paper before. I get the picture in my head of wolves nipping at each other over who gets the best parts of a caribou carcass, using largely nonverbal collusion. The fight between McLeod and U.S. West over the Minnesota spectrum was interesting: you let me have what I want, and I won’t bid up prices where you have the most interest.
The “Winner’s
Curse” has a nice corresponding idea in warfare: Pyrrhic victory. King Pyrrhus
of Epirus won battles against the Romans, but the losses were so great his army
was vastly weakened. I imagine the “Winner’s Curse” being a larger problem for firms with fewer resources; larger firms may be willing to take the losses to drive out the competition via attrition (sort of like the Romans to Pyrrhus or the Russians to the Germans). This is sort of like predatory pricing, with the hope
being to someday be a monopoly. Also, the stories of promised Pyrrhic victories
from page 174, especially Pacific Telephone hiring a prominent auction theorist
to give seminars to the competition on the Winner’s Curse, were amusing.
On the subject
of reserve prices: I wonder how often the values are set at the wrong levels in government-run auctions? I’d have liked to see more information on this, and how good reserve prices should be chosen. Consulting experts, I imagine, and
comparing similar auctions (which I’ve been told caused part of the bubble in
one speculation market: Comic books). A few less than rational actors bid up a
similar item, and then others expect to be able to sell their items for similar
amounts.
The Bug Auctions paper’s use of the abbreviation VM for vulnerability market causes a name collision in my head; I’m not sure that is the best term to go with, but it works for the paper. Looking over page two of the Bug Auctions paper, there is another contest I can think of that may be interesting to look at, Pwn2Own from TippingPoint:
http://dvlabs.tippingpoint.com/blog/2010/02/15/pwn2own-2010
In this case
it’s not the product vendor directly offering the prize, but a third party
security company. In the case of auctions, I wonder how third party security
companies would come into the mix; their influence did not seem to be a major consideration of the paper. For example, an IDS company may be interested in
paying for exploits so they can write signatures and offer/market better
protection to their customers. In some cases the third party security company
may have deeper pockets than the original producers (especially with open source
software projects). How should rules be set up in regards to the sharing of
information goods? Is the third party security company obligated to pass on the information to the producer if they buy it first? How often is it better for the third party security company to buy the bug for themselves, as opposed to letting the producer buy it and then reverse engineering the patch to make a signature or mitigation?
Some of the
assumptions about the black market I’m not sure are true. I’m not sure a risk
neutral bug finder will generally sell to the producer and not the black market
for fear of legal action, that depends on how the bug finder evaluates the
effectiveness of the law and the producer’s investigative abilities. Attribution
of exploit code can be a tough thing to do. That said, if the bug finder does
not already have ties to the black markets I’d imagine he would just sell to the
producer since the point of contact would be easier. I’m not sure the “sleeps with the fishes” scenario (page 8) is as likely as the paper’s author suggests; it might make more rational sense to put the vulnerability researcher on retainer than to kill the goose that lays the golden eggs (at least until the vulnerability researcher is found moonlighting for someone else). Also, some of the mitigations put forth in the paper to avoid cheating may make the vulnerability researcher just skip the whole official process if it makes it too laborious to
get paid. A bug finder may accept a lesser payout in exchange for a simpler
process.
Not really the
focus of the paper, but an analysis of entry costs for vulnerability researchers
might be interesting. A large Information Technology company will have more
resources than some guy living in his mom’s basement eating Cheetos and drinking
Code Red, but the basement dweller would have lower opportunity costs. I say that with all love to basement dwellers; I’d be one if I had a basement. The large Information Technology company may have better opportunities to make money
than finding bugs in some other company’s software.
I have one final
discussion point, this time on the subject of copyright infringement. If the producer takes out features in test copies for fear of piracy, doesn’t that mean that there are features that have not been tested for vulnerabilities? It’s sort of self-defeating to the process to release software to the testers that is too crippled. Expiration systems could be put in place instead, but people who are good vulnerability researchers also have the skills to be good software crackers if so inclined.
Week 6
The papers for this week are:
1. An Empirical Approach to Understanding Privacy Valuation
Luc Wathieu, Allan Friedman
2. Privacy, Economics, and Price Discrimination on the Internet
Andrew Odlyzko
3. Pricing Security
L. Jean Camp, Catherine Wolfram
4. Impact of Software Vulnerability Announcements on the Market Value of
Software Vendors – an Empirical Investigation
Rahul Telang, Sunil Wattal
Let me start by quoting the general
hypothesis of the first paper: “consumers are capable of expressing
differentiated levels of concerns in the presence of changes that suggest
indirect consequences of information transmission”. I also love the term “homo
economicus” for a rational self-interested actor. All in all, I don’t have much
to say on this paper. While not a direct criticism of the research methodology,
I wonder how the results would be different if the message had come from a real
alumni association, and not simulated by sending it to people who were paid five
dollars to participate. My guess is the results might be the same for a real
notice for those few that read it, but that most people would not even have
bothered reading the information that was sent to them (I know I round file most
things the university sends me unless it’s a bill or a check). Also, while they
say the subject pool was diverse, I’d still like more information. Another
point, it seems rational that people would be more concerned about privacy with
“No personal benefit/Dissemination” then just “No personal benefit”, but what
about the untested “Explicitly gained personal benefit/Dissemination”? What I
mean by this is the notice said that they “may” receive a benefit, or a modified
one said they did not, but none as I read it explicitly said “you received this
benefit”. The positive feeling associated with getting something might have made
people value their privacy differently (think of the “you just won a free iPad” scams on the Internet).
For paper two, the title “Privacy,
Economics, and Price Discrimination on the Internet” seems odd as it has little
to directly do with Privacy or the Internet. As a primer on price discrimination it seems great; it’s just that the examples given don’t seem closely tied to Privacy or
the Internet. The airplane example does involve privacy to a degree, as
government regulations about identity may make it harder to game the system in
ticket buying by using different names. Perhaps finding an explicit example
where “frequent flyer miles” were used to judge someone’s flight patterns and
price based on that would have helped. The Dell example I’ve seen firsthand, but
it’s also not about privacy really, or the Internet other than as a vector for
ordering. The Dell example seems to be more of a “cost of information/opportunity cost” and “lazy shopping of the user” issue. For
example, I have the time and already know from shopping around that I’d have a
hard time building a base system for cheaper than Dell offers it, but that any
upgrade (RAM, hard drive, better video card, etc.) would cost way more from Dell
then I could get it for if I bought it someplace else and put it in myself
(granted, I guess not everyone can do this). I’ve also learned from personal
experience that how you got into the store (business/education/home user) makes
a big difference on the deals you receive for essentially the same hardware. I’d
guess some users may not have the time to research the best buys, or their
opportunity costs are higher (some people may be better off making money at
their job than spending hours online looking for the best deals). On a less
flattering note, some people are just lazy shoppers who click buy on a whim. On
the bright side for the Internet, it can also make price discrimination harder
since the prices are generally show in the open and it takes little time to
compare one online shop to another (you don’t have to drive around town price
checking). If you don’t have the time, there are even sites that specialize in
finding the best deals, or listing competing shops’ prices side by side. There
is the issue of bundling, but even that does not seem much related to privacy
nor the Internet specifically. Did I miss something in the article, or were its ties to Privacy and the Internet as weak as I think? I really don’t think they made a convincing argument for “Privacy appears to be declining largely in order to facilitate differential pricing”; I’d still say the main drivers are
targeting ads and predicting user behavior.
As this write-up is already running
long, I will cover the last two papers lightly. First the Pricing Security
paper. On the subject of government subsidies for research, I got to be in the
room when Peiter "Mudge" Zatko announced this:
https://www.infosecisland.com/blogview/11614-DARPA-Seeks-Innovation-from-Hacker-Community.html
DARPA’s Cyber Fast Track program will be something to watch, especially as the hacker community has far different methods than academia when it comes to research (not that there aren’t some people with a foot in both realms).
The paper defines security
vulnerabilities in part by saying they “enable unauthorized access.” How about
Denial of Service vulnerabilities, would they not be considered part of the
proposed market? Not all DoS attacks are based on merely overwhelming a host
with traffic from many other hosts (DDoS), some are the result of software
errors or the unexpected outcomes of odd data that can be fixed in code. A few
examples:
http://ha.ckers.org/slowloris/
http://insecure.org/sploits/ping-o-death.html
http://www.iss.net/security_center/advice/Exploits/TCP/SYN_flood/default.htm
http://www.pentics.net/denial-of-service/white-papers/smurf.cgi
(Granted the Smurf Attack is a DDoS, but one caused by someone else’s network
misconfiguration)
Also, just a little nitpick: depending on the video card and what you plan to do with it, the benefit can be far greater than adding a general purpose CPU. I have a few friends who run a business cracking passwords, and their core box uses the parallel processing power of several video cards to speed up the process to levels far beyond what a standard CPU can do for the task.
For the vulnerability impact article,
I found the following interesting and have theories as to why they got some of
the results they did. I perfectly understand why having the vulnerabilities mentioned in the press caused more of an effect than a CERT announcement did; how many investors read anything from CERT? As to why the loss was greater when the vendor found the vulnerabilities than when a third party did, perhaps the press follows vendors’ press releases far more closely than security researchers’ (I imagine Newsweek follows all press releases coming from Microsoft, but I doubt they follow many from HD Moore). Thus, vendor releases of the information may mean higher press coverage. As to why Microsoft is less
affected than others market share wise when a vulnerability is announced, I
imagine many people have one or both of two attitudes:
1. Well, it’s Windows/Office, I have to use it anyway (not exactly true, but I
think the thought process still applies).
2. Another vulnerability in a Microsoft product? In other news, ice is cold. It just does not get that much attention anymore, especially with the clockwork nature of “Patch Tuesdays”.
Week 7
The readings for this week are as follows:
1. Judgment Under Uncertainty: Heuristics and Biases
D. Kahneman, Paul Slovic & Amos Tversky
2. The Economic Consequences of Sharing Security Information
Esther Gal-or and Anindya Ghose
3. Network Security: Vulnerabilities and Disclosure Policy
Jay Pil Choi, Chaim Fershtman, Neil Gandal
4. Competitive and Strategic Effects in the Timing of Patch Release, Fifth
Workshop on the Economics of Information Security
Ashish Arora and Christopher M. Forman and Anand Nandkumar and Rahul Telang
The first paper on judgment may best be summarized as “people are not good at
judging probability in their head”. The main cognitive biases covered are:
Representativeness
I looked up similar scenarios and this seems close to the “Base rate neglect” and “Stereotyping” biases, at least as the example is given in the article. Given certain traits that may or may
not be related to the outcome/category, the person judging will give undue
weight to the traits (especially if stereotypical) and ignore the base rate of
each possible outcome. Still, if a trait is highly correlated to a given
outcome/category, I’d have a hard time faulting the user for selecting it.
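To make the base-rate point concrete, here is a quick worked sketch with made-up numbers (not from the paper): even a fairly diagnostic trait can leave the odds close to a coin flip when the base rates are lopsided.

    # Hypothetical numbers: 30% engineers, 70% lawyers, and a description
    # that fits 80% of engineers but also 30% of lawyers.
    p_eng, p_law = 0.30, 0.70
    p_desc_given_eng, p_desc_given_law = 0.80, 0.30

    p_desc = p_desc_given_eng * p_eng + p_desc_given_law * p_law
    p_eng_given_desc = p_desc_given_eng * p_eng / p_desc
    print(round(p_eng_given_desc, 2))  # ~0.53, far from the ~0.8 intuition suggests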
Availability heuristic
The items that can be remembered are the ones that get the greater weight. For example, if a list of names is split 50/50 between male and female names, but one gender’s names are those of famous people, the subject may be biased toward that gender when asked to estimate the split percentages.
Anchoring effect
Sometimes this can be just a number
that is first mentioned to the subject that they then bias for when making a
judgment. It’s sort of like the power of suggestion in a way. In the case of
multi-event chance calculations, the tendency is to judge the outcome to be closer to the starting probability.
I wonder, evolutionarily speaking,
what were the reasons for these biases to develop in human cognitive function?
It could be just happenstance, but my intuition causes me to doubt that (paper-referential joke). The sorts of problems given are not the kind that an ape would normally face; I wonder if these biases confer some sort of survival advantage in common natural decisions?
I’m not sure what to say about the
“The Economic Consequences of Sharing Security Information”. They say of their
model “To answer these questions, we analyze a market consisting of two firms
producing a differentiated product in a two-stage non-cooperative game.” I’m
still not sure how this game/simulation was run, or if I’d give its results much
weight. Maybe if I had a better grasp of the kind of experiment design they were
doing, I’d be able to follow their clarifications better. As it stands, I can’t
say I know if they proved their point or not.
For “Network Security:
Vulnerabilities and Disclosure Policy” I’ll see if I can come up with some
questions that do not seem to be answered by the paper. My biggest issue would
be how do you judge the probability that a “hacker” (as the word is used in the
paper) will find and use the vulnerability first? It seems that if a criminal
had the exploit code, they would keep it a secret as a competitive advantage. I
don’t think that choosing an accurate probability for exploitation is an easy
task, but this framework relies on having that probability to make an optimizing
decision. As the first paper this week points out, sometimes people are terrible intuitive statisticians. I also wonder, in the case of the not-patching scenario,
if sales will be lost because of some customers demanding that the code be
maintained and going elsewhere if it is not. I have a hard time feeling sorry
for those that don’t patch their systems; then again, I’m a geek. Perhaps more can be done by the companies to make the patch process less painful, which I think Microsoft has done a lot of work on since XP SP2, but automatic patching
of third party apps is still a pain. A scenario I don’t remember being mentioned
in the paper is “stealth patching”, where a fix is rolled out with other things.
It’s frowned on in the industry it seems, but it’s an option to look at in a
paper like this. As an aside, I’d generally be against government mandating of
disclosure, but I also hate the idea of laws like the DMCA being used to stifle
the release of the information when it comes from a third party.
The fourth paper seems to largely
reinforce items that seem intuitive, at least to my mind. To summarize:
1. When there is competition, vendors patch earlier. They don’t want customers to jump ship or their competition to use a bug as part of a negative marketing campaign.
2. The threat of disclosure causes vendors to patch faster. It lights a fire, so to speak.
Neither of those two points seems shocking, but it’s nice to have a study based
on real world data to point to. Some papers seem to be like the saying “Well
yes, it works in practice, but will it work in theory?” I need to find a good
attribution for that line.
Week 8
The readings for this week are as follows:
1. Economics of Security Patch Management
Huseyin Cavusoglu and Hasan Cavusoglu and Jun Zhang
2. Honey Pots, Impact of Vulnerability Disclosure and Patch Availability
Ashish Arora, Ramayya Krishnan, Anand Nandkumar, Rahul Telang and Yubao Yang
3. Windows of Vulnerability: A Case Study Analysis
William A. Arbaugh and William L. Fithen and John McHugh
The first thought that came to my
mind when I read the “Economics of Security Patch Management” paper was the same
thought as previous papers that are in a similar vein: How do you accurately
estimate risk? Without a somewhat accurate estimation of risk, mathematical
models for optimal patch policy seem of little use. Another thought came when
the idea of shifting more of the cost of patching to the vendor was mentioned:
How do you avoid perverse incentives? If the vendor knows that they will have to
share more of the costs, then won’t they avoid releasing a patch at all (unless
there is wide media attention about the vulnerability forcing their hand) to
avoid having to incur the costs associated with it? Another thing to address is
the use of patches in load balanced systems and methods of patching that can
mean less down time (translating to less cost).
For the paper “Honey Pots, Impact of
Vulnerability Disclosure and Patch Availability”, let me start by reiterating
their key findings:
1. The disclosure of vulnerabilities increases the number of attacks on hosts,
while the availability of patches reduces the number of attacks. Keeping
vulnerabilities secret may also result in fewer attacks.
2. Vulnerabilities for which patches are released earlier are attacked less than vulnerabilities whose patches are relatively new.
3. Open source software vendors seem to patch faster than closed source vendors,
and large vendors tend to be more responsive to the vulnerabilities disclosed in
their products.
Some of these statements may seem
self-opposing. For example, open source software projects patch faster in
general than closed source, and large vendors patch quicker than small vendors
on average, but how often is an open source project something from a large
vendor? Here is another interesting duality: the existence of patches reduces
attacks, keeping vulnerabilities secret reduces attacks, but the existence of a
patch also generally means the vulnerability is no longer a secret. These
conclusions should probably be read as there being a balancing act, where one
effect overwhelms the other for a certain time period or under certain
conditions. This is alluded to in the closing remarks of the paper.
Much of this paper I don’t think I
have the ability to analyze thoroughly. I need to brush up on statistical terms,
commonly used variables and symbols. One thing I would like some clarification
on is the meaning of “secret” vulnerabilities. If a vulnerability is “secret”,
there would be no signature to test against. Given that they had pcaps for the
different time periods, I’m assuming that a vulnerability was considered
“secret” if it was undisclosed and unpatched at the time the pcap was made, but
the signature that was used to find it came out at a later date.
Normally I don’t bother with
grammar/typo issues, but there are some in this paper that were curiously missed
considering it lists five authors. I’d assume this would mean five people
proofreading each other’s work. I miss a lot of my own mistakes, in part because
when I proofread I see in my head what I meant to write, not necessarily what I
put down on paper. For example, on page eight, what does “The probability of a
vulnerability being exploited is a function of the attacker’s fixed cost to
attack relative to the gains from attacking relative to the gains from
attacking” mean? I’m guessing the sentence was amended at some point without
noticing the redundancy. On page thirteen, footnote fourteen, I’m guessing
“lunch” type should be “launch” type? On page fifteen, paragraph three, “loss
hat” is probably “lost that”. I look forward to folks finding similar issues in
this write-up, as I’m sure I have them.
One final point, and this is
something they do mention in the paper, not all vulnerabilities are equal. Some
are easier to exploit, or may cause greater damage, and future research should
look into this aspect.
I think the last paper “Windows of
Vulnerability: A Case Study Analysis” might best be read as a history piece,
showing how we got to the current thoughts people have on patching and
vulnerabilities. Many things have changed in the last ten years. A few things I
would note: The step referred to as “scripting” might more often be referred to
as “weaponizing” in modern times. On the fourth page of the article (55) they
state: “We rarely encounter cases with CERT/CC’s preferred ordering: Following a
carefully controlled initial disclosure, a modification or configuration change
corrects the vulnerability, a public advisory reveals that the problem has been
corrected, the vulnerability is never scripted, and it dies quickly. Death
eventually followed years later.” These days, if a vulnerability can be scripted, it seems to me that it likely will be. The Metasploit project and ExploitDB are quite good at releasing exploits for known vulnerabilities. Sometimes, in the case of ExploitDB, the exploit code may not be fully weaponized but just pops up calc.exe or the like. This, however, can be enough to get an attacker on their way if they know a little about shellcode. Another historical footnote: “As a
result, attackers could execute arbitrary commands on the Web server at the
privilege level of the HTTP server daemon—usually root, which is the most
privileged user in Unix systems.” I’d hope that most HTTP services are not
usually running with root privileges anymore. It’s at least no longer the
default on most systems I see.
Week 9
The readings for this week are as follows:
1. When 25 Cents is too much: An Experiment on Willingness-To-Sell and
Willingness-To-Protect Personal Information
Jens Grossklags, Alessandro Acquisti
2. The Red and the Black: Mental Accounting of Savings and Debt
Prelec and Loewenstein
3. Valuating Privacy, Fourth Workshop on the Economics of Information Security
Bernardo A. Huberman and Eytan Adar and Leslie R. Fine
4. Incentive Design for “Free” but “No Free Disposal” Services: The Case of
Personalization under Privacy Concerns
Ramnath K. Chellappa, Shivendu Shivendu
5. Why We Cannot Be Bothered to Read Privacy Policies
Tony Vila and Rachel Greenstadt and David Molnar
I read papers two and three before
paper one, so my thoughts on paper one (When 25 Cents is too much: An Experiment
on Willingness-To-Sell and Willingness-To-Protect) are influenced by reading
those first. The greater willingness to reveal weight related private data in
paper one seems odd, given that people demand on average $74.06 in the third
paper. Perhaps this is a case of anchoring? Given that the third paper
constructed the auction as being up to $100, maybe that anchored participant’s
prices high? This is alluded to in the discussion section of paper one in
section 5.2. The results seem to indicate people have a greater willingness to
sell their data at a given price than to protect it.
For the paper “The Red and the Black:
Mental Accounting of Savings and Debt” I found a few key concepts interesting. The effect of “coupling”, where the costs of a benefit are not directly mentally tied to the use of the benefit, is a useful shorthand for explaining many things. Why do some people buy so much on their credit card when they don’t
have the cash? The example given of a sports car being made less enjoyable each
time a payment had to be made was good. One line I would like to nitpick: “The
rationale for such a feeling is somewhat unclear, since paying off the loan
doesn’t diminish the real opportunity cost of purchasing the car: However he
pays for the car, Jones has less wealth, which will inevitably require some
sacrifice in future consumption.” Not having to pay the interest part of the
payment may be a huge benefit, depending on the future value of the money. For
example, let’s say I bought a house at an interest rate of 6%, but Certificates
of Deposit are returning about 1.3% right now. Am I not better off keeping a
small “just in case” fund and using the rest of my liquid capital to pay off the house? If rates stay the same, in the long run less money comes out of my pocket in total.
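A rough back-of-the-envelope version of that comparison, with a made-up $10,000 of liquid capital and ignoring taxes, amortization schedules, and compounding:

    capital = 10_000.00
    mortgage_rate = 0.06   # interest avoided by paying down the loan
    cd_rate = 0.013        # interest earned by leaving the money in a CD

    interest_avoided = capital * mortgage_rate   # about $600 per year
    interest_earned = capital * cd_rate          # about $130 per year
    print(f"Paying down the loan comes out ahead by roughly "
          f"${interest_avoided - interest_earned:.0f} per year.")

So the interest differential is a real reason to prefer retiring the debt, separate from any psychological relief.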
Paper two indicates that people are debt averse, even to the point of not making the best utility-maximizing choices, and thus are less than rational. One example given is: “Contrary to the
economic prediction that consumers should prefer to pay, at the margin, for what
they consume, our model predicts that consumers will find it less painful to pay
for, and hence will prefer, flat-rate pricing schemes such as unlimited Internet
access at a fixed monthly price, even if it involves paying more for the same
usage.” Perhaps the payer is still being rational however. They may not know
what their usage will be, and want to avoid overage charges. Some service
providers may be quite draconian when it comes to overage charges, and while it’s not an ISP example, think of charges for texting on phones even when the true costs of the overage incurred by the phone company are minuscule. Could debt aversion be largely explained by fear of the unknown?
In the paper “Valuating Privacy,
Fourth Workshop on the Economics of Information Security” the point I found most
interesting is the context of the information, as in who it is given to. People
were more willing to reveal BMI information to people they don’t know (the “phenomenon of the stranger” effect) than to people they associate with. Would
other data, like financial or contact information, be as easily given to a
stranger? I would guess that people are more willing to give out information
that is slightly embarrassing to strangers than data that could more readily be
used to directly harm them. This “phenomenon of the stranger” may however
explain why people are willing to tell things to bartenders, and why people with
the perceived anonymity of the Internet will say outlandish things on forums.
As a final thought on this paper, I’d
like to note that BMI is a problematic metric depending on the group being
observed. BMI does not take into account muscle mass, and as such many people
who are “gym rats” are labeled as obese. If the survey were done with mostly
bodybuilders, I imagine most would be more than willing to reveal their BMI since they know it is of little value in their field. A better metric might be body fat percentage.
For the last two papers I’ll give
short synopses.
The “Incentive Design for ‘Free’ but ‘No Free Disposal’ Services: The Case of Personalization under Privacy Concerns” paper concerns personalization services offered via browser toolbars and other means.
The consumer’s benefit is the personalization (custom search, new features) and
the vendor’s benefit is preference information obtained about the consumers. A
“No Free Disposal” property occurs when more of something than is desired causes
a disutility for the consumer. In other words, more is not necessarily better
and there may be a point at which more is actually worse. This paper attempts to
find models for optimizing “No Free Disposal” personalization vs. privacy goods
so the vendor knows what to offer the consumer.
In “Why We Cannot Be Bothered to Read
Privacy Policies” the key focus is on the asymmetric nature of information as it
concerns privacy on websites. As a result of this asymmetric information concerning what the owners of the site will do with the data, they decided to look into it as a “lemons market with testing” where what people expect causes the sellers to offer only the least valuable good (lack of privacy). They wished to find out the effect of signals in the market, specifically privacy policies, and the costs for consumers of testing whether a site meets their personal requirements. They find that the market does not move directly to an equilibrium point, that there is fluctuation in the amount of privacy offered and expected, and that the equilibrium point may change with time.
Week 10
The readings for this week are as follows:
1. Who Signed Up for the Do-Not-Call List?
Hal Varian and Fredrik Wallenberg and Glenn Woroch
2. On the Viability of Privacy-Enhancing Technologies in a Self-Regulated Business-to-Consumer Market: Will Privacy Remain a Luxury Good?
Rainer Bohme and Sven Koble
3. Who Gets Spammed?
Il-Horn Hann, Kai-Lung Hui, Yee-Lin Lai, S.Y.T. Lee and I.P.L. Png
4. Spamscatter: Characterizing Internet Scam Hosting Infrastructure
David S. Anderson, Chris Fleizach, Stefan Savage and Geoffrey M. Voelker
The key pursuit of the paper “Who
Signed Up for the Do-Not-Call List?” is to find demographic information for
those who enrolled in the Federal Trade Commission’s Do-Not-Call list. Some key
information they were looking for includes the monetary value that households
attach to being on the Do-Not-Call list and the effects of different
registration vectors (phone vs. web). They also tried ascertaining racial and ethnic statistics about who signed up for the Do-Not-Call list, but since the signup did not ask for this information, they attempted to figure it out based on US Census data concerning the areas where people had signed up. Amongst the findings:
1. Figures for the value of the Do-Not-Call list varied based on the assumed
level of knowledge about State-run programs that were already in existence,
their costs, and what people who never signed up would have possibly valued it
at. Figures varied from $7.5 million to an upper bound of $48 million per year
if what people are willing to pay is the metric used to determine value. If one
instead assumes each unwanted call imposes a disutility of $0.10, then the value
of the Do-Not-Call would be closer to $3.6 Billion per year. There seems to be
quite a disparity between what people might be willing to pay, and the benefits
they receive. With the free signups more people took advantage of the FTC
program than would likely have otherwise.
2. Low income households had a lower probability of signing up for the
Do-Not-Call list.
3. On page 12 it is stated that counties with a high fraction of Internet users had higher signup rates, though “not by a dramatic amount”. I’m a little
confused however by the line on page 2 that read “However, there are some
surprises, as in the case of Internet penetration rates which appear to be
negatively, if weakly, related to sign-up frequencies.”
4. Racial statistics are given on page 10. The statistics may be somewhat rough
because of how the data had to be sourced. Whites seemed more likely to sign up than blacks; Asian and multiracial households also seemed to have higher signup rates than average.
5. Larger households seemed to have lower signup rates. It was conjectured that
this may be because the answering of unwanted calls was spread out amongst the
household.
In the paper “On the Viability of
Privacy-Enhancing Technologies in a Self-Regulated Business-to-Consumer Market:
Will Privacy Remain a Luxury Good?” they tried to develop a model to compare
government vs. market driven privacy demands and people’s willingness to pay for
privacy. The crux of it is: can a seller gain more revenue by offering privacy and hopefully attracting more privacy-valuing customers than by using information about the buyer to price discriminate? According to their
analysis most sellers can increase revenues by supporting privacy enhancing
technologies. However, this depends on how many buyers they would lose from not offering privacy and the potential gains from price discrimination, so different markets will vary. It should also be pointed out that the desire for privacy can itself be the trait that price discrimination is based on.
The papers “Who Gets Spammed?” and
“Spamscatter: Characterizing Internet Scam Hosting Infrastructure” both concern
spam of course, but the type of spam seems to differ a little.
The “Who Gets Spammed?” paper points
out that there is little to discourage spammers monetarily since most of the
cost is incurred by third parties. The researchers signed up for multiple accounts at multiple mail providers and chose different settings as to privacy.
Some accounts’ email addresses were also posted on the web where they could be
scraped, while others were kept somewhat private. Some key findings were that
Hotmail accounts got spammed more than the others, with the rest in descending
order from most spammed to least spammed being Lycos, Excite, and then Yahoo. Of
course, those that had their email address posted publicly on the web got more
spam than those that did not. Accounts that declared interests also seemed to
receive more spam. In this paper most of the spam originated from the email
services providers and their marketing collaborators, that does not seem to be
the type of spam focused on in the next paper.
The paper “Spamscatter:
Characterizing Internet Scam Hosting Infrastructure” describes its focus in the
title. The spam here is pretty clearly not from the email service providers or
their marketing collaborators. They used various image processing techniques to
correlate related scams based on graphical similarity, so even if the pages
differ somewhat they could still hopefully tell whether two pages were part of
the same scam (a toy sketch of this kind of screenshot comparison follows the
list below). They found many interesting points, a few of which are:
1. Spam relays were more likely to be transitory than scam hosts, but this makes
sense since web pages need to stay up to be viewed, while once the emails are
sent the relays are no longer as important. There was also only about a 9.7%
overlap between spam relays and scam hosts.
2. Most individual scams were hosted on a single IP, but may have had different
URLs and vhosts in use to keep from being blacklisted based on the URL string.
3. A single server may host more than one scam. They say this suggests that
individual scammers may have multiple scams going on at one time, or that some
hosts are more accommodating to scammers and as such get business from multiple
scammers.
4. Malicious scams (phishing/malware/etc.) had a shorter lifetime, and more
mundane shopping scams had longer lifetimes.
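As a rough illustration of grouping screenshots by visual similarity, here is a toy sketch using a simple perceptual average hash; it stands in for the paper’s more robust image-shingling approach, and the Pillow dependency, paths, and threshold are my own assumptions.

from PIL import Image  # requires Pillow

def average_hash(path: str, size: int = 8):
    # Downscale to a tiny grayscale image; each bit is pixel >= mean brightness.
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]

def hamming(h1, h2) -> int:
    return sum(a != b for a, b in zip(h1, h2))

def likely_same_scam(shot_a: str, shot_b: str, threshold: int = 10) -> bool:
    # Heuristic: near-identical hashes suggest the two pages share a template.
    return hamming(average_hash(shot_a), average_hash(shot_b)) <= threshold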
Week 11
The readings for this week are as follows:
1. Proof-of-Work Proves Not to Work
Ben Laurie and Richard Clayton
2. Proof of Work can Work
Debin Liu and L Jean Camp
3. Adverse Selection in Online 'Trust' Certifications
Benjamin Edelman
4. Privacy-Aware Architecture for Sharing Web Histories
Alex Tsow, Camilo Viecco, and L. Jean Camp
The first two papers will be the
focus of this write-up; they deal with the feasibility of using proof-of-work
algorithms to combat spam. One of the core problems of spam is that it costs
next to nothing for the spammer to send their emails. The idea has been floated
in the past to make emails cost some form of postage, but few seem to like the
idea of real money being used. This is where proof-of-work comes in. The core
idea of proof-of-work is to make the initiator solve some computational problem
that is hard to solve but easy to verify. In the case of email, the sender has
to solve the problem and send the right solution before the recipient will
accept the email. Since the problem is hard to solve but easy to verify, the
sender takes the brunt of the computational and time burden. Hashcash is one
such algorithm. Part of the idea is that legitimate users send few emails
relative to spammers and have enough idle CPU time available that it should not
be a burden to them, but it will be a burden for spammers who send massive
amounts of email.
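A minimal hashcash-style sketch of that idea (illustrative only; the stamp format below is made up and not the real Hashcash header): minting is the expensive step done by the sender, while verification costs the recipient a single hash.

import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    # Count how many zero bits the digest starts with.
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint(recipient: str, bits: int = 20) -> str:
    # Expensive step (sender): grind nonces until the hash is hard enough.
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp: str, recipient: str, bits: int = 20) -> bool:
    # Cheap step (recipient): one hash plus a prefix check.
    return (stamp.startswith(recipient + ":") and
            leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits)

With 20 required bits a sender grinds through roughly a million hashes per message on average, which is negligible for someone sending a handful of emails but adds up quickly for someone sending millions.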
The paper “Proof-of-Work Proves Not
to Work” tries to ascertain how hard the problem would need to be, time-wise, to
be a big enough burden to make spamming economically unfeasible. The first
paper’s conclusion is that it is currently impossible to discourage spammers
sufficiently with proof-of-work systems without an unacceptable effect on
legitimate users. They calculate what the proof-of-work burden (C) would have to
be to drop the average spam per person below a certain fraction (S). To achieve
an S of 0.01, for example, a C of 346 seconds may be needed, depending on
assumptions about the spammer’s infrastructure and ability to control a large
botnet. A few of the problems highlighted by the first paper are:
1. The disparity in processing speeds between mail agents (a desktop PC vs. a
PDA, for example). Some of this has been offset by work on algorithms that are
memory bound instead of CPU bound, as memory speeds vary less than CPU speeds.
However, the paper states: “To address this problem, Dwork et al. [9] have
recently proposed puzzles that rely on accessing large amounts of random access
memory.” My question is: how likely is a system with a slow CPU to have large
amounts of memory?
2. How do you handle mailing lists, where a single email is sent to a “list
exploder” that would then have to figure out the proof-of-work problem for each
recipient? Since this is clearly impractical, the authors assume the list
exploder is delegated some authority by those signed up for the list to check
the sent mail once on their behalf.
3. Spammers would likely use botnets to send their mass mailings, so they would
not have to pay directly for the hardware costs of intensive proof-of-work
operations.
The second paper proposes that
proof-of-work could work if it were part of a reputation system and combined
with current commercial anti-spam technologies. The proof-of-work burden could
be adjusted based on reputation: new hosts would have a higher proof-of-work
burden than those that have been around for a while and are well behaved. If a
formerly well-behaved host starts to misbehave and spam is caught coming from
it, its proof-of-work burden can be increased until its reputation as a
trustworthy sender has been built up again.
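One way to picture that adjustment, purely as an illustration and not the paper’s actual formula, is to map a sender’s reputation score to the difficulty demanded by the proof-of-work check:

def required_bits(reputation: float, base_bits: int = 10, max_extra: int = 14) -> int:
    # reputation in [0.0, 1.0]; unknown or misbehaving senders get more work.
    reputation = min(max(reputation, 0.0), 1.0)
    return base_bits + round((1.0 - reputation) * max_extra)

# A brand-new host (reputation 0.0) must produce 24 leading zero bits,
# while a long-standing well-behaved host (reputation 1.0) only needs the base 10.
# Catching spam from a host lowers its reputation and so raises its burden.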
The third paper, “Adverse Selection
in Online 'Trust' Certifications”, wishes to ascertain the real value of trust
authority certifications like TRUSTe and BBBOnline. Is there adverse selection
from companies that are not trustworthy seeking to be certified to gain a veneer
of respectability? Also analyzed is the trustworthiness of the top organic
results from search engines. Edelman used SiteAdvisor’s test results (he is on
its Advisory Board) to judge the meaningfulness of the certifications and search
results. Sites with a TRUSTe certification were only 94.6% trustworthy, vs.
97.5% of all sites tested, by SiteAdvisor’s determination, for the general web
population. This would seem to indicate that TRUSTe has adverse selection in
effect, and perhaps seeing a TRUSTe certification should be a bad sign to end
users. BBBOnline-certified sites, however, seemed to be more trustworthy than a
random cross-section of sites, though BBBOnline’s means of enrolling new sites
to certify may not scale well. Organic search results also did not seem to
suffer from adverse selection, but sponsored search engine ads did.
The fourth paper, “Privacy-Aware
Architecture for Sharing Web Histories”, concerns the Net Trust system. Net
Trust is a rating system that uses multiple sources of data to give the user
information so they can decide whether a site should be trusted. One of the main
sources of information comes from the user’s Net Trust social network, using
both the implicit browsing history of peers and explicit ratings. This is
implemented in part as a browser toolbar for managing social networks and
personas, and for viewing ratings. The user can manage different personas so
they don’t have to share possibly embarrassing or unrelated information with the
wrong social group. Efforts are also made to keep the server that distributes
rating information from revealing too much about the users, and the social
network connections are managed locally and not on the server. The rating
server’s protocol is kept thin in hopes of migrating to a p2p system in the
future (a distributed hash table, perhaps?).
Week 12
The readings for this week are as follows:
1. The Privacy Jungle: On the Market for Data Protection in Social Networks
Joseph Bonneau and Soren Preibusch
2. Imagined Communities: Awareness, Information Sharing, and Privacy on the
Facebook
Alessandro Acquisti and Ralph Gross
3. HIPAA Compliance: An Examination of Institutional and Market Forces
Ajit Appari, Denise Anthony and Eric Johnson
4. Data Hemorrhages in the Health-Care Sector
M. Eric Johnson
5. Information Explosion. Confidentiality, Disclosure, and Data Access: Theory
and Practical Applications for Statistical Agencies
Latanya Sweeney
The first two papers focus on social
networks and their security/privacy ramifications. “The Privacy Jungle”
evaluated forty-five social networking sites using two hundred and sixty
criteria. One of the points they make is that while many social networks seem to
vaunt their privacy practices, the policies themselves are not easily
understandable by the average user without a legal background. They point out
that user surveys consistently show a high sense of concern for privacy;
however, observed user actions on the networks seem to contradict this. They
divided their 45 selected sites into general (MySpace, Facebook) and niche sites
(Linkedin, Habbo, and, strangely to my mind, Twitter, though they point out it
may be in a niche by itself) but excluded content-sharing focused sites
(YouTube, Flickr). They evaluated the data collected by the sites, and when they
signed up for services they filled out consistent data to the fullest extent
possible, but withheld data that was not mandatory. After their evaluation some
of their conclusions were:
1. They found strong evidence that social networks were failing to provide
adequate privacy controls.
2. Evidence that one of the main problems was lack of accessible information for
users.
They suggest “privacy nutrition
labels” to help standardize communication with users, and further research into
how privacy policies could be made easier for users to understand. To increase
consumer choice they recommend reducing social network lock-in by making data
portable so users can move to a new network (this has its own privacy concerns,
but that’s not a focus of the paper).
The second paper (Imagined
Communities) focuses on Facebook. It is from the time period when Facebook was
focused on colleges and high schools, so a few of the points like requiring a .edu
email address no longer apply, and the demographics may differ now. Like the
previous paper they point out that while user survey results show a high sense
of concern for privacy, observed user actions on the network seem to contradict
this. They also found evidence that users have misconceptions about how visible
their data on Facebook is. They compared survey responses with the data they
could scrape from members’ profiles to see how “what they say” and “what they
do” match up. A few of the interesting findings:
1. Privacy concerns may drive some older and senior college members away from
Facebook; however, for undergrads, having reported higher privacy concerns did
not seem to be driving them away from being on Facebook.
2. Non-members seemed to have higher privacy concerns in general than members.
3. What Facebook members said they used the network for differed greatly from
what they thought other members were using it for. As an example: few said they
were using it to find dates, but many respondents reported that they thought
others were using it for that purpose. Perhaps those that responded were
applying their own motives to others?
4. In expressed attitudes vs. behavior, members did not seem to be consistent.
Even if a user said they cared about the privacy of their politics/sexual
orientation/relationship status, they were still likely to put it in their
profile.
Also of interest to me, page 20
spells out some of the changes users made to their profiles after taking the
survey (site scrapes were taken before and after). It seems just taking the
survey made some people think about what was in their profile.
Paper three (HIPAA Compliance) aims
to develop “a regulatory compliance model by drawing insights from the
institutional theory literature to identify the key drivers influencing HIPAA
compliance, both institutional and market forces”. They put forth nine
hypotheses to test, most of which were at least somewhat supported by their
dataset after evaluation. For the sake of brevity paper three will not be a
focus of this write-up (no sense in repeating all of the hypotheses), but I
would like to point out one of the hypotheses that they indicated came back
negative:
H6: Hospitals employing external consultants will exhibit a higher tendency to
become HIPAA compliant.
They point out that having an outside
consultant is negatively correlated to being “compliant,” but from my reading of
the paper it seems compliance is self-reported by the institution. Could it be
that institutions which have had external consultants look into their
information systems are simply more aware of where they are lacking and
responded more honestly? As the mantra from pen-testers goes, “compliance !=
security”. It
could be that those institutions that reported themselves as being less
compliant may actually have better security and privacy controls in place than
those that self-reported as highly compliant. If there is some place in the
paper that points out a more rigorous test for compliance besides
self-assessment please let me know, but on page 11 it states “self-reported
level of compliance to HIPAA privacy, and security rules”.
Paper four is titled “Data
Hemorrhages in the Health-Care Sector”. Section two covers how health-related
data leaks can be used, and gives real world examples. A few of the misuses and
ramifications of leaked data include:
1. General privacy violations.
2. Insurance fraud for un-rendered services.
3. Insurance fraud for rendered services, but rendered to someone who is
stealing the identity.
4. Use of identities to access multiple prescriptions and then sell the drugs
(or feed a habit).
5. They mention the possibility of life-threatening wrong information being put
on a medical record because of identity theft, but don’t seem to go into it in
depth. An example I could think of is someone stealing an identity listing an
allergy or contraindication that the real person does not have, so that the real
person does not receive the best possible treatment when they go in for their
own problems. As pointed out in class, having the wrong blood type put on your
record would also be problematic.
The paper briefly covers different
ways breaches occur, but focuses on data made available by being inadvertently
placed on peer-to-peer file sharing networks. They used p2p searching software
from Tiversa to scour the Gnutella, FastTrack, Ares, and eDonkey networks for
Microsoft Office documents (Word, PowerPoint, Excel, and Access) containing
certain medically related key terms. They kept the terms somewhat limited to cut
down on the false positives they would have to weed through, but still had to
manually throw out a lot of results (look at the bottom of page 9 and the top of
page 10 for some of the weeding statistics).
Just some of the interesting documents found include:
1. Medical documents used for tax purposes.
2. Employment documents.
3. A spreadsheet of recent hires at a hospital, including names, Social Security
numbers, contact information, etc.
4. A medical card detailing prescription information.
5. From a medical testing lab, a 1,718-page document containing Social Security
numbers and other information for almost 9,000 patients.
6. Two spreadsheets from a hospital containing Social Security numbers and
other information for over 20,000 patients.
7. Psychiatric evaluations.
I could go on, but I think that makes
the point. Some of their conclusions and concerns are:
1. Under HIPAA, it can be hard to correct information after an identity has been
stolen.
2. HIPAA may help stop leaks from the health care providers directly, but does
little to stop patients from inadvertently sharing the information themselves.
3. More tamper-proof photo IDs issued by the insurance companies may help (or
at least raise the bar).
Page 16 is also interesting as it
shows some of the searches other p2p users were doing for medical information.
An idea for further research: the viability of a honeypot for people doing
searches for such information?
Paper five (Information Explosion) I
will only mention in brief; it concerns the growing trend to collect more and
more data. This is aided by the fact that storage is so much cheaper today than
it has been in the past. Mention is made of how more information is now
collected on birth certificates, via “loyalty cards” at retailers, and on
employment paperwork than in the past. A few of the data-collecting behaviors
and trends the author points out are:
Collect more – if data is already being collected from a
person, add more fields.
Collect specifically – if data in the past was aggregate, make it specific to
the individual. Loyalty cards are a great example of this: in the past,
retailers may only have known the aggregate sales at a store; now they can try
to make predictions based on what an individual with certain demographics
bought.
Collect if you can – given the opportunity, collect data. Examples given include
new hire paperwork and immunization registries.
Also mentioned is the balance of
providing enough information for researchers to do useful work for society,
while still providing for privacy to the individual. Datasets can be somewhat
anonymized, but there is the risk of useful information being removed or
obfuscated in the process.
Week 13
The readings for this week are as follows:
1. Digital Rights Management and the Pricing of Digital
Products
Yooki Park and Suzanne Scotchmer
2. Competing with Free: The Impact of Movie Broadcasts on DVD Sales and Internet
Piracy
Michael Smith and Rahul Telang
3. A contribution to the understanding of illegal copying of software: empirical
and analytical evidence against conventional wisdom
C. Osorio
4. The Simple Economics of Open Source
Josh Lerner & Jean Tirole
Paper one, as the title spells out,
concerns DRM and pricing models. Specifically it seems to be focusing on
products you download without receiving physical media, though it would have
been helpful if it had made this clearer from the start of the paper. After a
brief synopsis of the history of Digital Rights Management the paper goes into
the effects of content vendors maintaining their own DRM systems directly, or
through a shared protection platform. Two important things to consider with a
shared protection system are:
a. Whether the vendors can set their prices independently.
b. How costs for the shared system are allocated to the member firms.
If the vendors can choose their own
prices, they can choose ones that maximize profits. If the shared system does
not allow a vendor to choose the price, they could still possibly undercut
others via rebates outside of the shared system. Decisions would also be based
on how the costs of a shared system are allocated: as a percentage of the
selling price or as a per-download cost. If the company plans to issue rebates,
a “per download” scheme may be preferred when an item has a high enough selling
price.
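A quick toy calculation, with made-up numbers that are not from the paper, showing why a flat per-download fee becomes more attractive to the vendor as the selling price rises:

def percentage_fee(price: float, rate: float = 0.10) -> float:
    # Shared-system cost taken as a cut of the selling price.
    return rate * price

def per_download_fee(flat: float = 1.00) -> float:
    # Shared-system cost taken as a flat charge per download.
    return flat

for price in (5.00, 10.00, 25.00):
    pct, flat = percentage_fee(price), per_download_fee()
    better = "per-download" if flat < pct else "percentage"
    print(f"price ${price:5.2f}: {pct:4.2f} vs {flat:4.2f} -> prefer {better}")

At a 10% rate and a $1 flat fee the break-even price is $10; above that, paying per download costs the vendor less and leaves more room for rebates issued outside the shared system.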
One of the paper’s conclusions is
that with a separate DRM system prices will be lower than if there were perfect
legal enforcement, and the vendors will also be burdened with the cost of the
DRM system. However, perfect legal enforcement does not seem very likely to my
mind, though the RIAA/MPAA and others seem to lobby for laws to support it. A
shared DRM system may or may not be less costly than a firm controlling their
own, and it may or may not lead to higher prices. It depends on many variables.
In the paper’s opinion, it is not clear whether a shared DRM system would be
more efficient than the vendor running their own.
Paper two looks at the impact of
movie broadcasts on the sale of DVDs and unlicensed downloads of movies. They
used the broadcast of the selected movies as an “exogenous demand shock” and
attempt to measure the change in sales of the DVDs before and after the
broadcast, as well as attempting to track the downloads of the movies from
popular BitTorrent trackers. Several points were brought up in the paper:
a. Movie studios are becoming more reliant on media sales for their profits.
b. Movie content may be more prone to single use consumption (watch once and
never again).
c. Extras on DVDs that normally do not come with downloaded movies
(commentaries, “making of” featurettes) may have an effect on people deciding to
buy them.
d. Effects of ease of downloading and storage of the content.
e. Media types vary, and the effects of illegal downloading on the marketplace
for one type of media (music) may not be the same as for another type of media
(movies, games, books, etc.).
f. Technology changes with time, so what is true now about the market structure
for a given media type may change in the future.
Because of the points I listed above,
and others, the conclusions from this paper should probably be seen as applying
to a snapshot in time, or as the authors say: “our findings may change in the
future if the environment surrounding piracy changes”. With those caveats, some
of their findings include:
1. Broadcasting movies on TV provided a boost in DVD sales.
2. Broadcasting movies on TV also boosted illegal downloads.
3. Over the air broadcasts had more of an effect than cable broadcasts.
Since both the illegal copying and
DVD sales of a given movie went up they also looked to see if the availability
of the movie on popular BitTorrent trackers (Piratebay and Mininova) had an
effect on the resulting DVD sales after the broadcast. Using movies that were
and were not easily available on the chosen trackers, along with DVD sales
information, they did a regression analysis to see how they were related. From
their results they concluded that the illegal copiers and the DVD buyers were
two separate segments without much crossover, and the availability of the movie
on the BitTorrent trackers did not significantly affect the DVD sales.
Neither paper three nor paper four
lends itself to what I would consider a concise write-up, with so many points
being made in each. Instead I’ll cover what I consider the more interesting
points.
Paper three looks into the effects of
illegal copying on the software market. It points out that software is a quasi
non-excludable good: one person having a copy does not keep someone else from
having a copy (unlike, let’s say, a physical item). Osorio points out that illegal
copying can cause positive network effects. More users, even the illegally
copying kind, may help lead to the software being bought because of word of
mouth and the usefulness of having a shared platform with others. He points out
overestimates on the cost of illegal copying that stem from the faulty
assumption that illegal copiers would have bought the software if they could not
have gotten it for free. Osorio’s review of the literature points to four common
hypotheses about why software is illegally copied. So as to not quote the text
verbatim I’ll summarize them roughly as:
1. Cost: Copying is cheaper than buying the software legitimately and there
could be an affordability or income issue.
2. Legal or Worldview: Not all societies/cultures look at intellectual property
and copying the same way, and may see nothing wrong in the copying of software.
3. Fit: The software does not fit local needs.
4. Support and Services: There may not be enough local support and complementary
services for the software.
Of these hypotheses, Osorio found:
1. Seems supported by his data: the lower the income in an area, the higher the
rate of illegal copying.
2. His synopsis of the results for hypothesis two is not clear to me. From
Table 1 and Figure 1 it looks like there is a correlation between legal
framework and illegal downloads.
3. Seems to have some support.
4. Also seems to be supported.
Paper four looks at the economics of
Open Source projects. The three projects they focus on are Apache, Perl, and
Sendmail, though Linux seems to pop up a lot in their discussions also. One
question commonly asked is why someone would develop software for free: what are
they getting out of it? One of the points made in paper four is that the utility
function for an Open Source developer is not necessarily money; there is also
ego gratification and reputation amongst peers. Some developers also just enjoy
developing their projects for the fun of it. The reputation gained from working
on an Open Source project could also lead to a job, which might be a monetarily
compelling reason for some. A company may support an Open Source project in
hopes of making money from support, getting good PR, or selling hardware that
uses the software. In the case of company-sponsored Open Source projects the
developers’ motives can be salary-based.
They mention a few pitfalls that can affect Open Source projects, the two I find
most interesting are:
1. Forking if there is not strong confidence in the leadership. This may have
the effect of pulling developers away from the core project. Then again, maybe
the spinoff project will be compelling in its own right. To my mind forking has
sort of an undeserved negative connotation in some circles/contexts.
2. Problems getting people to develop the “boring parts”. One common example of
this is software that is great technically, but has very little documentation
because no one finds that task interesting/fun/rewarding.
Also covered in the paper are the
leadership structures of Open Source projects. Some projects like Linux have a
recognized leader (Linus Torvalds), while others like Apache have more of a
committee system. Not mentioned in the paper is Larry Wall’s title as the leader of the
Perl project: “Benevolent Dictator for Life of the Perl project”. Something to
look at for further research would be the changes in structure of Open Source
projects and corporate support since the paper was written in 2000.
Week 14
The readings for this week are as follows:
1. Predictors of Home-Based Wireless Security
Matthew Hottell, Drew Carter and Matthew Deniszczuk
2. Practice & Prevention of Home-Router Mid-Stream Injection Attacks
Steven Myers and Sid Stamm
3. Information Disclosure as a light-weight regulatory mechanism
Deirdre K. Mulligan
4. Mandatory Disclosure As a Solution to Agency Problems
Paul G. Mahoney
Paper one tries to determine the factors
that contribute to whether or not security features are enabled on a
WiFi network. Some of the demographic information they tried to correlate with
wireless security included education level, income, and housing density. They
conducted their own wardrive, and used US Census data for demographic
information about the neighborhoods they were scanning. Wireless networks were
marked as secured if they had WEP or WPA enabled. They had three main hypotheses
to test. In the end they could find no statistical support for any of their
hypotheses. I’ll quote their hypotheses and summarize their resulting
conclusions below:
“Hypothesis 1: higher education level is a predictor of higher levels of
wireless security.”
I find this quote from the paper
troubling: “These findings indicate that investing resources in large-scale
educational campaigns to raise awareness of wireless security may be a bad
decision since education seems to have little effect.” I don’t see how this is a
good conclusion. Just because someone has a bachelor’s degree does not mean that
they would have a specific skill set, or an awareness of a given issue. Being
aware that you should secure your wireless network and knowing how is pretty
specific. We might question the usefulness of user awareness education for other
reasons, but not based on this data concerning the number of college graduates
in a given neighborhood.
“Hypothesis 2: Higher income indicates a greater likelihood of secured wireless
access points.”
As stated before, they found no support for this hypothesis either.
“Hypothesis 3: Higher population density predicts better levels of wireless
security.”
Much the same, they found no support for this hypothesis either.
One factor that did seem to have an
effect on whether a network was secured was its SSID. Some SSIDs were
associated with services like 2Wire that enabled encryption by default on most
of their deployments (330 out of 340). About 88 percent of the routers that used
the SSID “Linksys Secure Easy Setup” were secure, presumably because the install
walked the user through setting up encryption. After taking these two SSIDs out,
about 55 percent of the remaining routers were configured to be secure.
I have many, possibly minor, nitpicks
about the paper. First, on page 5 as an example of the cost of implementing
security on a wireless network they went into what someone would have to do to
set up MAC filtering. This is not the best example for two main reasons:
1. Setting up MAC filtering is much more labor intensive to do than turning on
WEP/WPA/WPA2, which is what they tested for. This sets up somewhat of a straw
man as to the level of difficulty. I’m not saying setting up WEP/WPA/WPA2 is
necessarily easy for the average user, but using MAC filtering as the example
for the difficulty of setting up a secure network is setting the bar too high as
far as difficulty.
2. Not only is MAC filtering harder to configure than WEP/WPA/WPA2, it is
ineffective. All an attacker needs to do is put their WiFi card in monitor mode,
sniff for active connections, and clone a MAC address. All that trouble, for
little to no benefit when turning on encryption is so much easier.
Another issue some technical readers
may have with the paper is the definition of security (WEP or WPA on). Did they
look for wireless networks that allowed anyone to connect, but would not let the
person route traffic anywhere without connecting to a VPN first? This was a
commonly seen configuration on commercial/education networks for many years
since WEP was so horribly broken and required a shared key. They say on page 7
that they weeded out commercial access points as identified by SSID or maker
(hopefully they excluded University ones as well), so perhaps this is not as big
an issue since the VPN configuration is not as likely to be seen for home users.
They also used Netstumbler, which will only see access points that respond to
probes or send beacons. Kismet would have been a better choice as it can find “cloaked SSIDs”
using monitor mode as long as there is association traffic to be seen. Depending
on how it was set up (past versions have logged more than just management frames
by default), this may run afoul of some wiretap laws (A good example story:
http://arstechnica.com/tech-policy/reviews/2011/04/judge-was-wifi-packet-sniffing-by-google-street-view-spying.ars
). Then again, I doubt many people enable SSID cloaking as it does not add much
real security and can cause connectivity issues. Still, use of cloaked SSIDs is
something to look into if the study is repeated.
If this study were done again,
WiGLE.net would be a good resource (assuming you trust the users to upload valid
results). WiGLE has a database of found access points from around the world, and
is fairly easy to query. You can also query for just the access points you have
found. It unfortunately lumps WPA and WEP into the same category, but this would
not be a problem considering this paper’s definition of secured wireless. I can
say that people do seem to be more conscious about home wireless security now
than they were then, or perhaps there are more routers that turn on encryption
by default. Of the 9943 access points I’ve found just this year (within 0.5
degrees of 38.22,-85.75), 7871 have used WEP or WPA (79%). I did not, however,
factor out commercial installations.
Paper two from Myers and Stamm
concerns mid-stream injection attacks and countermeasures. Specifically it
focuses on the injecting of scripts into unencrypted webpages by compromised
home routers, but the mitigations should also be applicable to other mid-stream
injection vectors such as inline proxies and routers (these are more of a threat
to my mind as I will cover later). The core motivation for this research is that
the use of TLS/SSL can be too costly CPU-wise, which is why some sites avoid
using it for all but the most confidential information (passwords, form data,
etc.). This leads some sites to use “secure post”, where only the submitted
data is sent using TLS/SSL. The downside to this is that the HTML form itself
is sent in the clear without the signing advantages of TLS/SSL, so an attacker
in the middle of a connection can modify the returned form page to include
JavaScript that forks the form into sending its data to more than one receiver
(the attacker’s collection box, for example). The paper proposes using
obfuscated scripts and cryptographic hashes to have the browser check that an
attacker has not inserted new code mid-stream. The obfuscation used will have to
be changed from time to time so the attacker has to keep coming up with new
countermeasures. Both the client and the server do their own checks of the HTML
form, and depending on where z (the cryptographic hash of the canonical HTML
form plus JavaScript, etc.) is checked, there may be other problems.
For example, the server side may know that the form has been changed because z
does not equal z’ and can warn the users, but the data may already have been
submitted to the attacker. About notifying the user of an attack, the paper
says: “This notification should be through alternative channels, as in this case
the web channel may be compromised.” This is true since if an attacker can modify
the page to add new content, they can also modify alerts coming in from the
server over the same HTTP connection. However, what alternate channels should be
used to avoid great time delays? Are they intending to email the user and say
“hey, I think someone is messing with your connection”?
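A minimal sketch of the server-side half of that check, assuming the client’s obfuscated script recomputes the same canonical hash z' and submits it with the form data; the canonicalization rule and names here are illustrative, not taken from the paper.

import hashlib
import hmac
import re

def canonicalize(form_html: str) -> bytes:
    # Collapse whitespace so cosmetic differences don't change the hash.
    return re.sub(r"\s+", " ", form_html.strip()).encode("utf-8")

def expected_z(form_html: str) -> str:
    # z: hash of the canonical form plus its script, as originally served.
    return hashlib.sha256(canonicalize(form_html)).hexdigest()

def form_untampered(served_form_html: str, submitted_z_prime: str) -> bool:
    # Compare z to the client-reported z'; a mismatch suggests mid-stream
    # injection, though as noted above the data may already have leaked.
    return hmac.compare_digest(expected_z(served_form_html), submitted_z_prime)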
While not a perfect solution, it is
hoped that these mitigations will make it too costly to carry out the attack.
The paper is pretty detailed as to how this can be done by a developer, but a
library for making it easy to implement and to avoid “roll your own” security
problems would be helpful. As stated in the paper: “Thus, our countermeasure is
not computationally secure in a cryptographic sense, and we do not dispute that
a dedicated hacker could overcome the solution with enough resources. Our goal
however is more preventative: to make script injection unattractive and
non-profitable, so that fraudsters do not attempt the attack since it will not
be financially profitable (at least on those sites that bother to invest in
countermeasures).” This is where I have the most nitpicks. The mid-stream
injection attacks seem like they could be a big concern if the injection is
happening on an ISP’s router, a shared proxy (think of a rogue Tor exit point)
or even a wireless router at a location that has a lot of patrons (coffee shop,
library, etc.). The early focus on home routers seems a little out of line with
the “not be financially profitable” quote above as I don’t think the
installation of mid-stream injection software on the routers is quite as easy as
the authors make it out to be. Yes, it can be done for a large fraction of home
routers, but at what time and effort cost to the attacker? They answer some
objections on the second and third pages, but I still have my doubts. Yes,
OpenWRT (and its cousin DD-WRT) supports a wide range of routers, but getting it
installed on a large number of home routers would be a non-trivial automation
task (automating being the only way to make it profitable, unless we are
considering a targeted attack). An attacker could drive around neighborhoods
installing it, but that would be cost and time prohibitive. Installing a new
firmware remotely across the Internet via something akin to a CSRF attack could
be attempted, but this seems non-trivial. A few of the things a mass attacker
would have to get around if they wanted to install a new firmware remotely
include:
1. Yes, a lot of routers do use default passwords (estimates of 25% to 35% are
given in the paper), but they vary, and that does cut down on targets if you are
an attacker going for numbers.
2. Assuming you know the default passwords, newer browsers have mitigations in
place to make requests to http://root:password@someip less transparent than they
used to be. This sort of attack was shown in the Grossman presentation referenced
in the paper, and is a little harder now depending on the browser in question
(in my tests, Firefox 4 gives a warning, IE 8 fails completely).
3. CSRF attacks could be used against some vulnerable routers to first make the
admin interface available on an Internet-facing IP, but uploading a new firmware
from across the Internet seems like an idea prone to bricking routers.
4. You have to detect the router type so you choose the right version of the
firmware to upload, but this is not nearly as hard as the other issues and I’ve
seen code to do it.
5. This is not really an issue, just an observation. If you are going to bother
with installing OpenWRT, install TCPDump/Ettercap/Dsniff and harvest
passwords/data from protocols besides just HTTP.
Yes, I suppose these issues can all
be gotten around, but it seems to me that a lot of work has to be done, it would
be error prone, and it would not be attractive to an attacker looking for large
aggregate numbers. I see other vectors of mid-stream injection as being far more
likely than a home router being backdoored. The modification of a router by
someone with direct wireless access might still be profitable if:
A. It is specifically targeted.
B. The router has a lot of users to aggregate semi-valuable data from.
C. The intention is to do identity theft against a small number of individuals
and to draw as much profit as possible from each of them.
As I’m running way long already, I will attempt to use the third and fourth
papers to answer the question proposed in the reading list:
“The argument against privacy is that disclosure is a lightweight effective
market mechanism. Is this consistent with the arguments for or against mandatory
disclosure?”
Well, as I read the papers, this is
an argument against a specific type of privacy, not privacy in general. For
certain economic transactions to take place there has to be trust, and many
times that means verification of information that may be considered private
under certain circumstances. If I write someone a check, they want to have a
fair degree of certainty that I am who I say I am, and will want my name and
contact information. If I am a promoter of a certain stock, potential investors
should want to know what my interests are and whether I might have any perverse
incentives. The question becomes what information they need to know. Certainly
in the case of a stock promoter an investor may want to know the promoter’s
financial interests in the company and related assets, but they would not need
to know the promoter’s blood type. That example is silly and clear, but there
may be other finer points about what needs to be exposed and what can remain
private. Mandatory disclosure of certain information may make the market more
efficient since each investor will not have to do all the research themselves.
In the case of information breaches, breach notifications could help victims of
data theft mitigate and watch out for misuses of their data. I suppose I would
need more clarification about what remains private and what has to be revealed,
but yes, it can be consistent with arguments for mandatory disclosure.
Week 15
The readings for this week are as follows:
1. Data Breaches and Identity Theft: When is Mandatory Disclosure Optimal?
Sasha Romanosky, Richard Sharp and Alessandro Acquisti
2. Stopping Spyware at the Gate: A User Study of Privacy, Notice and Spyware
Good, N., Dhamija, R., Grossklags, J., Thaw, D., Aronowitz, S., Mulligan, D.,
and Konstan., J.
3. The Role of Internet Service Providers in Botnet Mitigation: An Empirical
Analysis Based on Spam Data
Michel van Eeten, Johannes M. Bauer, Hadi Asghari, Shirin Tabatabaie and Dave
Rand
Like many of the papers we have read
over the last few weeks the “Data Breaches and Identity Theft” paper spells out
its core question in the subtitle. To reduce the effects of data breaches on
consumer losses many states have enacted mandatory disclosure laws to help
victims mitigate misuse of their data. However, according to the paper the
effects of these laws have not been rigorously tested, and there is some fear
that the laws may impose a burden on data holders and consumers that is not
commensurate with the results. The paper seeks to set up models for judging the
usefulness of these laws, and see what would have optimal effects on firm,
consumer and especially social costs.
Three common policy approaches are mentioned, relating to the chronological
order of events surrounding the breach:
1. Ex ante regulations: These are preventative rules, put in place to try to
keep the breach from happening in the first place.
2. Information disclosure: The hope is that the threat of internalizing costs
will incentivize the data holders to be more careful. It is also hoped that the
disclosure will help victims to take precautions to prevent further losses.
3. Ex post liability: These are recovery mechanisms. One example might be a
victim suing the data holder for loss compensation.
One of the more interesting sidelines
of the paper is the concept of consumer under-reaction and over-reaction. Will
some consumers become hardened to breach notifications and start ignoring them?
Will some over-react and cause themselves more cost than is justifiable
considering the potential loss? An example of an over-reaction might be “I’m
going to switch all of my accounts” when the chance of loss is less than the
time cost involved in switching. Another example might be people who decide to
never use a type of service again, even when it is normally to their economic
benefit (shopping online comes to mind).
In the end, the paper claimed to show two major effects of mandatory disclosure
laws:
1. It changes the care model from unilateral to bilateral. Both the firm and the
consumer can take action to mitigate losses.
2. The laws impose costs on firms in two distinct ways:
a. Direct costs from disclosure: fines, fees, lost business, etc. (what they
refer to as disclosure tax).
b. They can force the firm to internalize some portion of consumer loss
(consumer redress).
They indicate both will cause a firm
to increase its level of care, but only the disclosure tax is a deadweight
loss. However, if consumer redress is low, some disclosure tax may be necessary
to reduce social cost.
The second paper, “Stopping Spyware
at the Gate”, tries to determine what influences user choices when they install
software. The subjects were given the scenario of a friend asking them to help
set up a new computer. The researchers gave five real world applications for the
test subjects to choose to install (Google Toolbar, Webshots, Weatherscope,
KaZaA and Edonkey). They split thirty-one test subjects into three groups.
1. Ten were in the control group, and only received the EULA before the install
commenced.
2. Ten were in the “Generic Microsoft SP2 Short Notice+EULA” group.
3. Finally, eleven were in the “Customized Short Notice + EULA” group. The short
notices were generated in a standardized way, outlined at the top right corner
of page 5, and include items like (copied directly from the paper):
• The name of the company
• The purpose of the data processing
• The recipients or categories of recipients of the data
• Whether replies to questions are obligatory or voluntary, as well as the
possible consequences of failure to reply
• The possibility of transfer to third parties
• The right to access, to rectify and oppose
Strangely, table 1 on page 6 gives
the breakdown of groupings, and somehow the total is listed as 30 even though
10+10+11=31. I assume they just made a typo, unless I’m missing something. They
also interviewed the participants after the installations to gain insight into
why they installed what they chose. A few of the key observations from the
authors include:
1. While users knew they were agreeing to a contract when they clicked through,
they felt they had a limited understanding of the EULA. They also expressed
distaste for reading long notices.
2. Many users expressed regret about their installation decisions after the fact.
3. Short notices did improve understanding of the consequences of installation.
4. Short notices did not, however, have a statistically significant effect on
installation.
5. If the utility of an application was seen as high, users would install it
despite bundled software.
6. Brand plays a part in deciding what to install. For example, Google was seen
as a trusted name, but Weatherscope sounded too much like Weatherbug so people
distrusted it. Users also had a somewhat negative response to the name KaZaA,
with some choosing to install Edonkey instead (at least in the EULA Only group).
However, this effect on KaZaA/Edonkey was somewhat reversed in the next point.
7. Too little information in short notices may cause an unwarranted impression
of increased security. Since KaZaA had less information in its short notice than
Edonkey, some in the “Generic + EULA” group thought it was less scary (see table
4 and sections 5.2.8 and 5.2.9).
There are a few things that I think
may skew the results and could be improved upon in future studies, some of which
the authors also allude to as possible weaknesses (sections 4 and 5.5):
1. The subjects were asked to install the applications on a test system, not
their own computer. There may be some question as to the level of care they took
since it was not their computer they would be affecting.
2. The users knew they were being observed as part of a study, even if they did
not know the study was about spyware. This may influence their thought
processes.
3. They mention the weakness of the small sample size when it comes to students,
but what about the installable applications? To get a better idea of the effects
of brand, future tests could include more installable apps from big names like
Google or Microsoft. Also, there are many other apps that could fill the niches
listed, especially when it comes to file sharing, so testing the effects of many
other EULAs and short notices based on them would be helpful to obtain a broader
sample.
Paper three, “The Role of Internet
Service Providers in Botnet Mitigation”, attempts to determine whether ISPs can
be an effective control point for botnet mitigation. Of special concern is the
practicality of rules imposed upon ISPs, and whether they would be effective
given the number of ISPs out there, where they are located, and the amount of
botnet traffic each one is associated with. To gain empirical data on the number
of bots in the wild, and which ISPs the infected hosts were on, two approaches
were considered. One would be to gain access to the command and control channel
of a botnet and see which hosts were communicating. However, this approach only
gives a snapshot of the botnets they have infiltrated. Another approach would be
to set up hosts to act as honeypots, in this case spam traps, to look for
suspicious contact. The spam trap approach also has issues, such as potential
false positives, but it may give a more representative sample from a wider range
of botnets than the first approach. They went with the spam trap approach, and
used IPs to tie hosts to ASNs and countries. They then used the ASNs to try to
figure out which ISP each bot belonged to, with some difficulty since they did
not have a ready database of mappings (a toy sketch of this attribution step
follows the findings list below). More complete details on the problems of
attribution can be found on page 6 of the paper. To summarize just some of their
findings:
1. While the total number of ISPs is high (somewhere between 4,000 and 100,000
actors, they say), many are bit players. Just 50 ISPs account for over half of
the infected sources. This seems to indicate that even if not all ISPs could be
brought under collective action (government interventions or public-private
sector cooperation, for example), getting the larger ones on board could still
have a significant effect.
2. As competition is mostly driven by price, even if consumers care about
security there are “no adequate market signals” that can reliably guide them in
choosing better ISPs from a security standpoint.
3. As might be expected, size was a major factor in the number of infections at
a given ISP. A larger user base means a larger number of potential bot hosts.
However, per user, larger ISPs seemed to do better security-wise than smaller
ISPs (fewer infections per capita). It was conjectured that this may have been
because of better automation. Even then, amongst ISPs of similar size there
could be an order of magnitude difference in the number of detected infections.
4. Higher rates of illegally copied software were associated with higher botnet
activity.
5. If level of education is used as a proxy for technical competency, it seems
technical competency has a negative effect on bot numbers (fewer total
infections).
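Here is a toy version of that attribution step, not the authors’ pipeline: spam-source IPs are matched against a small made-up prefix-to-ASN table and counted per ASN; a real study would use full BGP routing data.

import ipaddress
from collections import Counter

PREFIX_TO_ASN = {  # hypothetical routing-table fragment
    ipaddress.ip_network("203.0.113.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
    ipaddress.ip_network("192.0.2.0/24"): 64502,
}

def asn_for(ip: str):
    addr = ipaddress.ip_address(ip)
    for prefix, asn in PREFIX_TO_ASN.items():
        if addr in prefix:
            return asn
    return None  # unattributable, much like the mapping gaps the authors describe

spam_sources = ["203.0.113.7", "203.0.113.9", "198.51.100.20", "192.0.2.1"]
per_asn = Counter(asn_for(ip) for ip in spam_sources)
print(per_asn.most_common())  # [(64500, 2), (64501, 1), (64502, 1)]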
In the end, the concentrated nature
of responsible ISPs seems to indicate that they could act as a critical control
point to mitigate botnets, even under current market conditions.