Economics of Information Security Paper Reviews and Notes
Below are my write-ups and notes for the papers I've been reading in the "Economics of Information Security" class I'm enrolled in. I'm guessing most of my readers won't get much out of them unless they have read, or plan to read, the same papers. More to come as the class continues.
This write-up is for three documents, which I will number for clarity when I refer to them later in the assignment:
1. Commercial Data Privacy and Innovation in the Internet Economy: A Dynamic
Policy Framework
2. Information Privacy and Innovation in the Internet Economy
3. Federal Trade Commission privacy report.
We were asked to answer three open
questions. Here are my choices:
1. Should baseline commercial data privacy principles, such as comprehensive
FIPPs, be enacted by statute or through other formal means to address how
current privacy law is enforced?
This is labeled as 1.a in document 1 (it's also in document 2), but my answer applies in some part to all of the items under section one. While I would like to see my own personal data being
treated with respect in terms of my privacy, I’m not sure the FTC or a
government entity would set reasonable rules. Even common language terms seem to
differ, as mentioned in our first class (using the term “cyber” is often
regarded with derision in hacker/security practitioner communities). I don’t
have a lot of faith in politicians or bureaucrats to understand technology well
enough to make informed decisions. That said, no other entity may have the power
to force commercial interests to “play fair”. I’m not sure a free market
solution would work in this case since most people seem to care little about
their own privacy until something egregious happens (just look at the actions of most social network users). I err on the side of not giving out information, but
I also know this is not a workable option for some (or even for me at times). I guess
I just don’t have a good answer to privacy problems in general, and anything
coming from politicians or bureaucrats I'd like to give the hairy eyeball before
I support it.
5 (Doc 2). What is the best way of promoting transparency so as to promote
informed choices? The Task Force is especially interested in comments that
address the benefits and drawbacks of legislative, regulatory, and voluntary
private sector approaches to promoting transparency.
The problem with informed choice is
that it requires the consumer to take action to become informed. As the old
saying goes, you can lead a horse to water but you can’t make them drink.
Besides the obvious benefit of making sure a privacy policy is prominently
displayed and easy to find, it might be useful to control how policies can be
changed. When you start doing business with a company, let’s say Facebook for
example, you may have a certain privacy policy in place. However, an
organization may change their privacy policy after a consumer starts doing
business with them in such a way that if the consumer had known, they would not
have agreed to do business with them in the first place. As such, it would be helpful if notifications of privacy policy changes had to be made in advance so that a consumer has a chance to stop doing business with the company before the policy takes effect. To be effective, along with advance notification, some
measures would need to be taken so that the organization must expunge data about
the consumer after they have left. Without this expunging of data, the
consumer’s privacy may still be violated by future actions on the part of the
organization.
20 (Doc 2). Are technologies available to allow consumers to verify that their
personal information is used in ways that are consistent with their
expectations?
21 (Doc 2). Are technologies available to help companies monitor their data use,
to support internal accountability mechanisms?
While we were asked to answer only
three questions, I have put 20 and 21 together as they can be seen as closely
related. The crux of the two questions could be restated as: Do technologies
exist that allow the auditing of who accesses information? If a company can
monitor for how data is used, then they should be able to allow consumers to see
the filtered reports of data use for just their records (with some pain,
granted). This is why I’m treating these two questions as one. Do technologies
for data use monitoring exist? Certainly, to name a few: file access logs (if enabled), access control lists (ACLs), data exfiltration prevention systems (ZixMail for example), and policies that disable saving to removable storage could all be seen as such technologies. The important things to ask are how effective
are these technologies, how easy are they to implement, how difficult are they
to get around, and how comprehensive are they? Measures can be taken inside of a
company to track where data goes, and who accesses it, but there are so many
variables that I would doubt that any system could reliably say “we know where
the data has been used, how, and who has accessed it”. There are so many ways
data can be exfiltrated that blocking all possible avenues seems unlikely.
That’s not to say that reasonable precautions should not be enforced, but
there’s not much that can be done to keep an insider threat from bringing in a
camera and photographing the data even if other vectors are locked down. I guess it all depends on the expectations of the person asking the questions above.
This write-up is based on the following two readings.
1. Why Information Security is Hard - An Economic Perspective
Ross Anderson
2. The Economics of Information Security
Ross Anderson and Tyler Moore
I plan to make it a general
commentary on some of the points that provoked thought. Both papers focus on
decision factors that may affect security in a negative way. Ultimately, if we
assume the decision maker is self-interested and rational, there are certain
goals they will pursue that do not take security as the first and foremost
factor. This is not necessarily a bad thing; businesses exist to make money, and
a business that is not profitable may not exist for long no matter how secure
its products are. Also, people decide on what to buy based on the utility it
gives them, or stated another way, its usefulness. A stone is far more secure
than my smartphone, but not nearly as useful for information exchange (unless
you count throwing it at someone). Put another way, some factors precede
security in the hierarchy of needs. Now, most of the above assumes the decision
maker is also looking out for other stakeholders (vendor wants to sell a product
consumers will buy, consumers want to buy something they see as useful); however,
perverse incentives may cause further problems when an agent wishes to do
something solely out of self-interest.
A few of the given examples of
economic incentives that trump security:
1. The majority of the cost of a security measure is borne by someone other than the decision maker (an externality). An example given is anti-DDoS software on end users' computers. My thoughts: This seems like a weak example to me since most anti-malware packages are intended to look for the sort of bots used in DDoS attacks. Granted, the person installs the anti-malware package to protect
themselves, but they do protect others/the network as well by installing it. A
positive externality.
2. Being first to market, and the advantages thereof, may take precedence over
security. Ship now, patch later. My thoughts: I agree this is an incentive that
gets in the way of security, though I think the advantage of being first to
market may be overplayed. Not all companies that are first necessarily lock up
the market. For example, these are a few technologies where the first to market
did not end up being the leader for very long: tablets, PDAs, Social Networks,
Personal Computers. I'm actually trying to think of someone that was first to market and ended up holding onto the dominant position for more than five
years.
3. Vendor lock-in caused by closed or less than documented protocols and
specifications. My thoughts: I think this is less of a problem than it once was.
It seems to me that others have gotten better at reverse engineering protocols
and making things interoperate. I’m more worried about laws like the DMCA or
patent trolls causing it to be illegal to reverse software/electronics for
compatibility (or just knowledge). Then again, the first paper was written about ten years ago, so the landscape was a bit different.
I do think things have swung somewhat
the other way since the first paper was written. With all the problems caused by adware/spyware/worms/etc., security has come to be seen as a market differentiator by
some users who aren’t even that techie. This is one reason the use of Firefox
has taken off, and one of the reasons some people buy Macs (though I have my
doubts about the inherent security of OS X being the reason; its niche status
until more recently meant malware writers had less incentive to target it).
Also, perhaps for PR reasons Microsoft seems to have improved greatly, sometimes
to the point of problems with backwards compatibility and ease of use (use of DEP, driver signing, UAC and other features in Vista and newer, for example).
In the second paper, I wish they had
gone more into how/why UK banks are spending more than US banks on security
measures. Something to do with wanting to make the machines “so secure” that
they can place blame for any fraud at the feet of the users? I also think the
second paper seems to miss the point of why some P2P systems did not take off
(Eternity, Freenet, Chord, Pastry, and OceanStore) and others did (Gnutella and
Kazaa). I don’t think it was about forcing you to share resources that made the
first group less popular; BitTorrent encourages you to share resources to get
faster downloads and it has taken off quite well (understatement). The first
group (Eternity, Freenet, Chord, Pastry, and OceanStore) is about reliability/scalability/privacy issues, something most Internet users would not
care much about. The second group (Gnutella and Kazaa) is largely about getting
free stuff, which is widely popular with just about everyone.
The second paper also seems to
indicate OS X is of FreeBSD lineage, which is not quite right (though
admittedly the Unix family tree is a bit like a genealogy from Arkansas).
I’m also not sure the car analogy in
the second paper really applies. With cars, there is a very significant cost
associated with duplication (making more than one car). Making a good car means
more than just good design, but also good parts that must be duplicated. It
makes sense that the producer would not want to produce a better car than they
can sell at profit. With software, pretty much all of the cost is in R&D/coding
and, unfortunately, perhaps marketing; duplication of the software is almost
negligible (media pressings in large quantity, online distribution, etc.). A
software company can spend time on the security and reliability of the product,
and use that as a market differentiator to demand a premium price.
Page 8 of the
second paper also seems to indicate that making open bug markets would encourage
people to find more bugs. I have a hard time seeing how this can be a bad thing
in the grand scheme. Better that some debugging wiz makes money off of selling the bug to the company that made the product or to a security-product-related company than keeping it to themselves or selling it to a malware producer.
Week 3
For this week the readings are as follows.
1. System Reliability and Free Riding
Hal R. Varian
2. Incentive-Centered Design for Information Security (abstract)
Rick Wash and Jeffrey K. MacKie-Mason
3. The Changing Nature Of U.S. Card Payment Fraud: Issues For Industry And
Public Policy
Richard J. Sullivan
4. Information Security Policy in the U.S. Retail Payments Industry
Mark MacCarthy
I will focus on papers one and two,
with perhaps some influence from the others.
For the first
paper, I take it there are some people that find the economic ideas easier to
express using algebra and calculus than in examples? Perhaps I can manage a
synopsis of the three prototypical cases, using my own examples; please let me know if I get it right:
Total effort: The reliability of a system comes from the combined efforts of all
responsible individuals. For an example in the security field I’d need to come
up with a scenario where the work is similar across the board, no matter who is
contributing. At first I thought “analyzing a protocol for security”, but I can
see where this could fall into the “Best shot” case if one individual is far
beyond the rest in a particular expertise. Perhaps a mix net for privacy would work as an example? Even if one node only puts in a few resources, just having another node increases the anonymity set, even if its contribution is far less than that of other nodes.
Weakest link: The reliability of the
system depends on the individual that puts in the least effort. I could see a scenario for this much more easily. Let's say you, PayPal, some online vendor and the
credit card company are all responsible for protecting you from identity theft
involving your credit card number. The one with the weakest security may likely
be the one where the number becomes compromised, especially if an attacker knows
which link is the weakest so they can target it first. This is sometimes
referred to as going after the low hanging fruit in my circles.
Best shot: The reliability of the
system depends on the person that puts in the most effort. I'm having a tough
time thinking of a scenario where this would truly fit. So many things are
dependent on the efforts of more than one entity. I guess the example of an
encryption key that is split amongst multiple parties (keys to the root DNS
servers are an example of this as I understand) might work, as long as a single
entity holds out and does not expose their key, the system is safe. Then again, if
the system implementers screwed up, maybe the attacker could get away with fewer
keys to crack a system, or find some side channel avenues of attack. I think the
best way to use the “Best shot” prototypical case is when explaining why
something is not “Best shot” even when people treat it like it is. For example,
some treat their networks like a “Best shot” prototypical case, relying on one
or two things to stop an attacker (Example: Hey, I don’t need to worry about
securing this box, the firewall will block it). Without “defense in depth”
networks can become like candy: hard coating, soft gooey center.
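To make sure I have the three cases straight, here is a toy sketch of my own (not Varian's notation or functional forms): each party contributes an effort level between 0 and 1, and system reliability is some function of the sum, the minimum, or the maximum of those efforts.

```python
# Toy encoding of Varian's three prototypical cases (my own illustration,
# not the paper's model). Effort values are hypothetical.

def total_effort(efforts):
    # Reliability driven by everyone's combined contribution (here, the average).
    return sum(efforts) / len(efforts)

def weakest_link(efforts):
    # Reliability capped by the least careful contributor.
    return min(efforts)

def best_shot(efforts):
    # Reliability set by the single strongest contributor.
    return max(efforts)

efforts = [0.9, 0.6, 0.2]      # three parties, one of them sloppy
print(total_effort(efforts))   # ~0.57
print(weakest_link(efforts))   # 0.2 - the sloppy party drags everyone down
print(best_shot(efforts))      # 0.9 - the expert carries the system
```

The credit card example above is the weakest-link function: the attacker only has to beat min(efforts), so improving anyone other than the weakest party buys very little.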
Now for the third paper. I was
exposed to quite a bit about US law and banking/credit companies by reading this
paper. Some of the PCI information I had encountered before, but quickly forgot
after the job interview was over. Many of the rules are of interest to me from the standpoint of how to balance flexibility with workable requirements. For
example, if the rules merely say data must be encrypted, can I use the Caesar
Cipher (I ROT26ed this whole write-up) and still be in compliance (I assume PCI
requirement 4 has a footnote against this)? On the opposite side, does it always
make sense to force someone to run Anti-Virus software (PCI requirement 5)? What
if the system is so obscure that there is no AV package for it, or the AV
software is almost useless to the point of just being there to take up CPU
cycles? PCI requirement 11 comes up a lot amongst security folks when they talk
about what is a pen-test vs. a vulnerability scan.
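To illustrate the Caesar Cipher point, here is a toy sketch of mine (nothing from the PCI documents): data shifted by a secret key of 7 is technically "encrypted," yet every possible key can be tried in a blink, which is why a requirement that just says "encrypt it" is not worth much unless it names acceptable algorithms and key lengths.

```python
# Toy Caesar cipher "encryption" and its instant brute force; purely an
# illustration of why "must be encrypted" needs to name real algorithms.
def caesar(text, shift):
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

ciphertext = caesar("Cardholder Data", 7)    # "encrypted" with secret key 7
for key in range(26):                        # exhaustive "cryptanalysis"
    print(key, caesar(ciphertext, -key))     # key 7 reveals the plaintext
```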
Other random thoughts and question
concerning paper three:
As a side note, how is the CVV
calculated? If it is mathematically based on the card number and expiration date
(which may be guessable), can the CVV just be calculated if the attacker has the
information? I need to look up more information on how this is done, as the
paper does not seem to say.
Associating the cost with the breached entity may be hard. Sometimes you know who
let the PII slip, but I imagine most times you don’t. An attacker grabs the
account information one place, and uses it at another. I guess the issuing
companies can look for where the cards have been used in the past to see if they
can correlate the suspected leak.
I like that the paper mentions the
costs to the card holder that are not directly financial. The card holder may
not be liable for false charges, but they still pay for credit card fraud in
time, worry, identity management and other costs.
Page 28 mentions the cost of PCI
being less for smaller businesses. Is this percentage-wise, or merely in dollar amount? I imagine smaller businesses could have a higher cost percentage because
of their lack of infrastructure (or this could make it cheap to switch in some
cases).
If merchants think that PCI is
offloading security costs onto them instead of the credit companies, they may be
right, but what are their expectations for the credit companies to fix security
issues? Changes in the credit company’s security infrastructure would seem
likely to trickle down to the merchants in the cost of new equipment needed to
use the card system (chip and PIN, for example).
The paper seems to focus on card
present transactions, almost exclusively. I almost never do these card present
transactions, just online shopping. What are they doing to protect this online
activity? I suppose they could issue smartcards which can display one-time PINs
via an LCD or e-Ink, but that would raise the costs of the cards.
As I’m not sure
which papers, if any, this assignment is meant for I’ll just try my best to
answer the questions. I have something else written for the papers assigned so
far, but those may be reassigned for next week so I’ll hold onto it for now.
What are transaction costs, production costs and network externalities?
I’ll use buying a PC as my example
case for illustrating the concepts as best as I can, and as best as I understand
them. Transactions costs are the cost incurred trying to get the product from
producer to consumers that are institutional in nature and don’t directly figure
into the creation of the product. In my example, the time it takes me to figure
out what to buy is a cost, or the shipping fees I incur may be considered
transaction costs. Fees paid to middlemen may also count. Production costs are the costs incurred directly in the production of a product. In the case
of a PC, the cost of the parts and labor that went into making the PC would be
production costs. A normal externality would be a cost incurred by a third party
(or sometimes a benefit) outside of the decision making of the buyer and seller,
not reflected in the price paid by the buyer. An example might be the
environmental costs incurred if I decide to dump my PC in a stream after I no
longer have a use for it (see RoHS). For a network externality we normally think
of something positive that makes the product more valuable the more people that
have it. In the case of a PC, the more people that have it the more people you
can exchange compatible documents and software with, which would be a positive
network externality. A negative network externality might be the case of a
prestige item, where the more people that have it the less someone might value
it because of lack of rarity (I wonder how much my Apple IIe is worth?). To use
the PC example instead, I suppose if there are enough PCs created, and they are
all networked and cause congestion to the point of being useless then that would
be a negative network externality. Perhaps having a platform monoculture that
makes malware creators' jobs easier might also be a negative network
externality.
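As a rough sketch of the positive-then-negative idea (my own made-up numbers, not from any of the documents), per-user value could be modeled as a standalone benefit plus a term that grows with the number of other users, minus a congestion penalty that eventually dominates:

```python
# Hypothetical per-user value with a network externality and congestion.
# All parameter names and numbers are invented for illustration.
def value_per_user(n, standalone=10.0, network_benefit=0.5, congestion=0.0005):
    others = n - 1
    return standalone + network_benefit * others - congestion * others ** 2

for n in (1, 100, 1000, 2000):
    print(n, round(value_per_user(n), 2))
# 1 -> 10.0, 100 -> 54.6, 1000 -> 10.5, 2000 -> -988.5 (congestion wins)
```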
How do these influence investment in networked goods?
I’d imagine transaction and
production costs figure the most directly, in that the buyer figures out what
they are willing to pay for a good, and the producer decides how many they are
willing to make for a given price. Of course as time goes by, if a positive
network effect is in play, the buyers may value the product more because they
can use it with others that have the same or similar product, and perhaps the
producer could benefit from economies of scale to make the product cheaper (or
they may make it a prestige item, sell less, and try for a higher markup).
Which categories of security products can be considered networked, and what can
be considered stand-alone?
This one is a little tougher for me
to answer. I’ll consider email encryption software to be a security
product/feature that would benefit from network effect. The more people that
enable or support email encryption software, the more people who could or would
use it. I imagine a privacy mix net would be a good example, as the more people
who join the network the more anonymous each person becomes in the crowd.
Anti-Virus software may be an example of a standalone security product, as most
of the benefit goes to the person that runs it on their system. However, there
is a positive externality there as well, having AV on their system may benefit
others because hopefully their system won’t get infected and attack other
systems. Perhaps exploit code that is written and kept to one author could be a
standalone product, but I don’t know if you would categorize this as a security
product? I’m having a hard time thinking of a security product that has no
positive externalities; even a firewall meant for one organization’s network may
benefit others in small ways. Though a firewall’s positive externality may not
be a network one, as I don't believe one person's firewall becomes more valuable just because others have one (though many folks using the same firewall may make
support easier). Perhaps hard drive encryption software could be considered a
standalone product, since only the user receives the benefit, and other than the
chance of a bug being more easily spotted there is not much of a positive effect
in others using the exact same or compatible software. Then again, if the person
is using hard drive encryption to protect others' PII there may still be a
positive externality (though perhaps not a network one?).
Week 5
This paper will be for the readings:
1. The Economics of Networks, Nicholas Economides
2. What really matters in auction design, Klemperer
3. Bug Auctions: Vulnerability Markets Reconsidered, Ozment
Since I already covered Economics of Networks in the extra credit write-up from last week, I'll focus on the two auction papers. I had not thought of some of the forms of collusion put forth in the Klemperer paper before. I get the picture in my head of wolves nipping at each other over who gets the best parts of a caribou carcass, using largely nonverbal collusion. The fight between McLeod and U.S. West over the Minnesota spectrum was interesting: you let me have what I want, and I won't bid up prices where you have the most interest.
The “Winner’s Curse” has a nice corresponding idea in warfare: the Pyrrhic victory. King Pyrrhus of Epirus won battles against the Romans, but the losses were so great his army was vastly weakened. I imagine the “Winner’s Curse” being a larger problem for firms with fewer resources; larger firms may be willing to take the losses to drive out the competition via attrition (sort of like the Romans to Pyrrhus or the Russians to the Germans). This is sort of like predatory pricing, with the hope being to someday be a monopoly. Also, the stories of promised Pyrrhic victories from page 174, especially Pacific Telephone hiring a prominent auction theorist to give seminars to the competition on the Winner’s Curse, were amusing.
On the subject of reserve prices: I wonder how often the values are set at the wrong levels in government-run auctions? I'd have liked to see more information on this, and on how good reserve prices should be chosen. Consulting experts, I imagine, and comparing similar auctions (which I've been told caused part of the bubble in one speculation market: comic books). A few less-than-rational actors bid up a similar item, and then others expect to be able to sell their items for similar amounts.
The Bug Auctions paper’s use of the abbreviation VM for vulnerability market causes a name collision in my head; I'm not sure that is the best term to go with, but it works for the paper. Looking over page two of the Bug Auctions paper, there is another contest I can think of that may be interesting to look at, Pwn2Own from TippingPoint:
http://dvlabs.tippingpoint.com/blog/2010/02/15/pwn2own-2010
In this case it's not the product vendor directly offering the prize, but a third party security company. In the case of auctions, I wonder how third party security companies would come into the mix; their influence did not seem to be a major consideration in the paper. For example, an IDS company may be interested in paying for exploits so they can write signatures and offer/market better protection to their customers. In some cases the third party security company may have deeper pockets than the original producers (especially with open source software projects). How should rules be set up in regards to the sharing of information goods? Is the third party security company obligated to pass on the information to the producer if they buy it first? How often is it better for the third party security company to buy the bug for themselves, as opposed to letting the producer buy it and then reverse engineering the patch to make a signature or mitigation?
I'm not sure some of the assumptions about the black market are true. I'm not sure a risk-neutral bug finder will generally sell to the producer and not the black market for fear of legal action; that depends on how the bug finder evaluates the effectiveness of the law and the producer's investigative abilities. Attribution of exploit code can be a tough thing to do. That said, if the bug finder does not already have ties to the black markets I'd imagine he would just sell to the producer since the point of contact would be easier. I'm not sure the “sleeps with the fishes” scenario (page 8) is as likely as the paper's author suggests; it might make more rational sense to put the vulnerability researcher on retainer than to kill the goose that lays the golden eggs (at least till the vulnerability researcher is found moonlighting for someone else). Also, some of the mitigations put forth in the paper to avoid cheating may make the vulnerability researcher just skip the whole official process if it makes it too laborious to get paid. A bug finder may accept a lesser payout in exchange for a simpler process.
Not really the focus of the paper, but an analysis of entry costs for vulnerability researchers might be interesting. A large Information Technology company will have more resources than some guy living in his mom's basement eating Cheetos and drinking Code Red, but the basement dweller would have lower opportunity costs. I say that with all love to basement dwellers; I'd be one if I had a basement. The large Information Technology company may have better opportunities to make money than finding bugs in some other company's software.
I have one final
discussion point, this time on the subject of copyright infringement. If the
producer takes out features in test copies for fear of piracy, doesn't that mean
that there are features that have not been tested for vulnerabilities? It’s sort
of self-defeating to the process to release software to the testers that is too
crippled. Expiration systems could be put in place instead, but people who are
good vulnerability researchers also have the skills to be good software crackers
if so inclined.
The papers for this week are:
1. An Empirical Approach to Understanding Privacy Valuation
Luc Wathieu, Allan Friedman
2. Privacy, Economics, and Price Discrimination on the Internet
Andrew Odlyzko
3. Pricing Security
L. Jean Camp, Catherine Wolfram
4. Impact of Software Vulnerability Announcements on the Market Value of
Software Vendors – an Empirical Investigation
Rahul Telang, Sunil Wattal
Let me start by quoting the general
hypothesis of the first paper: “consumers are capable of expressing
differentiated levels of concerns in the presence of changes that suggest
indirect consequences of information transmission”. I also love the term “homo
economicus” for a rational self-interested actor. All in all, I don’t have much
to say on this paper. While not a direct criticism of the research methodology,
I wonder how the results would be different if the message had come from a real
alumni association, and not simulated by sending it to people who were paid five
dollars to participate. My guess is the results might be the same for a real
notice for those few that read it, but that most people would not even have
bothered reading the information that was sent to them (I know I round file most
things the university sends me unless it’s a bill or a check). Also, while they
say the subject pool was diverse, I’d still like more information. Another
point, it seems rational that people would be more concerned about privacy with
“No personal benefit/Dissemination” than just “No personal benefit”, but what
about the untested “Explicitly gained personal benefit/Dissemination”? What I
mean by this is the notice said that they “may” receive a benefit, or a modified
one said they did not, but none as I read it explicitly said “you received this
benefit”. The positive feeling associated with getting something might have made
people value their privacy differently (think of the “you just won a free iPad”
scams on the Internet).
For paper two, the title “Privacy,
Economics, and Price Discrimination on the Internet” seems odd as it has little
to directly do with Privacy or the Internet. As a primer on price discrimination
it seems great; it's just that the examples given don't seem closely tied to Privacy or
the Internet. The airplane example does involve privacy to a degree, as
government regulations about identity may make it harder to game the system in
ticket buying by using different names. Perhaps finding an explicit example
where “frequent flyer miles” were used to judge someone’s flight patterns and
price based on that would have helped. The Dell example I’ve seen firsthand, but
it’s also not about privacy really, or the Internet other than as a vector for
ordering. The Dell example seems to be more of a “cost of
information/opportunity cost” and “lazy shopping of the user“ issue. For
example, I have the time and already know from shopping around that I’d have a
hard time building a base system for cheaper than Dell offers it, but that any
upgrade (RAM, hard drive, better video card, etc.) would cost way more from Dell
than I could get it for if I bought it someplace else and put it in myself
(granted, I guess not everyone can do this). I’ve also learned from personal
experience that how you get into the store (business/education/home user) makes
a big difference on the deals you receive for essentially the same hardware. I’d
guess some users may not have the time to research the best buys, or their
opportunity costs are higher (some people may be better off making money at
their job than spending hours online looking for the best deals). On a less
flattering note, some people are just lazy shoppers who click buy on a whim. On
the bright side for the Internet, it can also make price discrimination harder
since the prices are generally shown in the open and it takes little time to
compare one online shop to another (you don’t have to drive around town price
checking). If you don’t have the time, there are even sites that specialize in
finding the best deals, or listing competing shops' prices side by side. There
is the issue of bundling, but even that does not seem much related to privacy
nor the Internet specifically. Did I miss something in the article, or were its ties to Privacy and the Internet as weak as I think? I really don't think they
made a convincing argument for “Privacy appears to be declining largely in order
to facilitate differential pricing”, I’d still say the main drivers are
targeting ads and predicting user behavior.
As this write-up is already running
long, I will cover the last two papers lightly. First the Pricing Security
paper. On the subject of government subsidies for research, I got to be in the
room when Peiter "Mudge" Zatko announced this:
https://www.infosecisland.com/blogview/11614-DARPA-Seeks-Innovation-from-Hacker-Community.html
DARPA’s Cyber Fast Track program will
be something to watch, especially as the hacker community has far different methods than academia when it comes to research (not that there aren't some
people with a foot in both realms).
The paper defines security
vulnerabilities in part by saying they “enable unauthorized access.” How about
Denial of Service vulnerabilities, would they not be considered part of the
proposed market? Not all DoS attacks are based on merely overwhelming a host
with traffic from many other hosts (DDoS); some are the result of software
errors or the unexpected outcomes of odd data that can be fixed in code. A few
examples:
http://ha.ckers.org/slowloris/
http://insecure.org/sploits/ping-o-death.html
http://www.iss.net/security_center/advice/Exploits/TCP/SYN_flood/default.htm
http://www.pentics.net/denial-of-service/white-papers/smurf.cgi
(Granted the Smurf Attack is a DDoS, but one caused by someone else’s network
misconfiguration)
Also, just a little nitpick,
depending on the video card and what you plan to do with it, the performance gain can be far greater than from adding a general-purpose CPU. I have a few friends who run a
business cracking passwords, and their core box uses the parallel processing
power of several video cards to speed up the process to levels greater than a
standard CPU can do the task.
For the vulnerability impact article,
I found the following interesting and have theories as to why they got some of
the results they did. I perfectly understand why having the vulnerabilities
mentioned in the press caused more of an effect than that from a CERT
announcement; how many investors read anything from CERT? As to why the loss was greater when the vendor found the vulnerabilities than when a third party did, perhaps the press follows the vendors' press releases far more closely than the security researchers' (I imagine Newsweek follows all press releases coming from
Microsoft, but I doubt they follow many from HD Moore). Thus, vendor releases of
the information may mean higher press coverage. As to why Microsoft is less
affected than others, market-share-wise, when a vulnerability is announced, I
imagine many people have one or both of two attitudes:
1. Well, it’s Windows/Office, I have to use it anyway (not exactly true, but I
think the thought process still applies).
2. Another vulnerability in a Microsoft product? In other news, ice is cold. It
just does not get that much attention anymore, especially with the clockwork
nature of “Patch Tuesdays”.
The readings for this week are as follows:
1. Judgment Under Uncertainty: Heuristics and Biases
D. Kahneman, Paul Slovic & Amos Tversky
2. The Economic Consequences of Sharing Security Information
Esther Gal-or and Anindya Ghose
3. Network Security: Vulnerabilities and Disclosure Policy
Jay Pil Choi, Chaim Fershtman, Neil Gandal
4. Competitive and Strategic Effects in the Timing of Patch Release, Fifth
Workshop on the Economics of Information Security
Ashish Arora and Christopher M. Forman and Anand Nandkumar and Rahul Telang
The first paper on judgment may best be summarized as “people are not good at
judging probability in their head”. The main cognitive biases covered are:
Representativeness
I looked up similar scenarios and
this seems close to the “Base rate neglect” and “Stereotyping” biases, at least as the example is given in the article. Given certain traits that may or may
not be related to the outcome/category, the person judging will give undue
weight to the traits (especially if stereotypical) and ignore the base rate of
each possible outcome. Still, if a trait is highly correlated to a given
outcome/category, I’d have a hard time faulting the user for selecting it.
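A worked example of the base rate point, with numbers I made up rather than anything from the paper: suppose a description ("quiet, tidy, loves books") fits 90% of librarians but also 10% of everyone else, and only 1 person in 1,000 in the population is a librarian. Bayes' rule says the description is still very weak evidence:

```python
# Base rate neglect illustration (hypothetical numbers).
p_librarian = 0.001                 # base rate: 1 in 1,000
p_desc_given_librarian = 0.90       # the stereotype fits most librarians
p_desc_given_other = 0.10           # ...but also plenty of non-librarians

p_desc = (p_desc_given_librarian * p_librarian
          + p_desc_given_other * (1 - p_librarian))
p_librarian_given_desc = p_desc_given_librarian * p_librarian / p_desc
print(round(p_librarian_given_desc, 4))   # ~0.0089: the base rate dominates
```

Even with the stereotype fitting well, the posterior is under one percent, which is exactly the sort of thing intuition tends to get badly wrong.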
Availability heuristic
The items that can be remembered are
the ones that get the greater weight. For example, if a list of names is 50/50
split between male and female names, but one gender of names is composed of
famous people, the gender with the famous names may be overweighted when the subject is asked to estimate the split percentages.
Anchoring effect
Sometimes this can be just a number
that is first mentioned to the subject, which they then bias toward when making a judgment. It's sort of like the power of suggestion in a way. In the case of multi-event chance calculations, the tendency is to judge the combined outcome to be closer to the single-event starting probability.
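A quick worked example of the conjunctive case, in the spirit of the paper's examples but with my own numbers: a plan that needs seven independent steps to all go right, each with a 90% chance, succeeds only

$$P(\text{all seven steps succeed}) = 0.9^{7} \approx 0.48,$$

yet someone anchored on the 0.9 per-step figure will tend to guess well above one half.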
I wonder, evolutionarily speaking,
what were the reasons for these biases to develop in human cognitive function?
It could be just happenstance, but my intuition causes me to doubt that (paper-referential joke). The sorts of problems given are not the kind that an ape would normally face; I wonder if these biases confer some sort of survival advantage in more common natural decisions?
I'm not sure what to say about “The Economic Consequences of Sharing Security Information”. They say of their
model “To answer these questions, we analyze a market consisting of two firms
producing a differentiated product in a two-stage non-cooperative game.” I’m
still not sure how this game/simulation was run, or if I’d give its results much
weight. Maybe if I had a better grasp of the kind of experiment design they were
doing, I’d be able to follow their clarifications better. As it stands, I can’t
say I know if they proved their point or not.
For “Network Security:
Vulnerabilities and Disclosure Policy” I’ll see if I can come up with some
questions that do not seem to be answered by the paper. My biggest issue would
be how do you judge the probability that a “hacker” (as the word is used in the
paper) will find and use the vulnerability first? It seems that if a criminal
had the exploit code, they would keep it a secret as a competitive advantage. I
don’t think that choosing an accurate probability for exploitation is an easy
task, but this framework relies on having that probability to make an optimizing
decision. As the first paper this week points out, sometimes people are terrible intuitive statisticians. I also wonder, in the case of the not-patching scenario,
if sales will be lost because of some customers demanding that the code be
maintained and going elsewhere if it is not. I have a hard time feeling sorry
for those that don’t patch their systems; then again I’m a geek. Perhaps more
can be done by the companies to make the patch process less painful, which I
think Microsoft has done a lot of work on since XP SP2, but automatic patching
of third party apps is still a pain. A scenario I don’t remember being mentioned
in the paper is “stealth patching”, where a fix is rolled out with other things.
It’s frowned on in the industry it seems, but it’s an option to look at in a
paper like this. As an aside, I’d generally be against government mandating of
disclosure, but I also hate the idea of laws like the DMCA being used to stifle
the release of the information when it comes from a third party.
The fourth paper seems to largely
reinforce items that seem intuitive, at least to my mind. To summarize:
1. When there is competition, vendors patch earlier. They don't want customers to jump ship or their competition to use a bug as part of a negative marketing campaign.
2. The threat of disclosing causes vendors to patch faster. Lights a fire so to
speak.
Neither of those two points seems shocking, but it’s nice to have a study based
on real world data to point to. Some papers seem to be like the saying “Well
yes, it works in practice, but will it work in theory?” I need to find a good
attribution for that line.
The readings for this week are as follows:
1. Economics of Security Patch Management
Huseyin Cavusoglu and Hasan Cavusoglu and Jun Zhang
2. Honey Pots, Impact of Vulnerability Disclosure and Patch Availability
Ashish Arora, Ramayya Krishnan, Anand Nandkumar , Rahul Telang and Yubao Yang
3. Windows of Vulnerability: A Case Study Analysis
William A. Arbaugh and William L. Fithen and John McHugh
The first thought that came to my
mind when I read the “Economics of Security Patch Management” paper was the same
thought as previous papers that are in a similar vein: How do you accurately
estimate risk? Without a somewhat accurate estimation of risk, mathematical
models for optimal patch policy seem of little use. Another thought came when
the idea of shifting more of the cost of patching to the vendor was mentioned:
How do you avoid perverse incentives? If the vendor knows that they will have to
share more of the costs, then won’t they avoid releasing a patch at all (unless
there is wide media attention about the vulnerability forcing their hand) to
avoid having to incur the costs associated with it? Another thing to address is
the use of patches in load-balanced systems and methods of patching that can mean less downtime (translating to less cost).
For the paper “Honey Pots, Impact of
Vulnerability Disclosure and Patch Availability”, let me start by reiterating
their key findings:
1. The disclosure of vulnerabilities increases the number of attacks on hosts,
while the availability of patches reduces the number of attacks. Keeping
vulnerabilities secret may also result in fewer attacks.
2. Vulnerabilities for which patches are released earlier are attacked less than
vulnerabilities whose patches are relatively new.
3. Open source software vendors seem to patch faster than closed source vendors,
and large vendors tend to be more responsive to the vulnerabilities disclosed in
their products.
Some of these statements may seem
self-opposing. For example, open source software projects patch faster in
general than closed source, and large vendors patch quicker than small vendors
on average, but how often is an open source project something from a large
vendor? Here is another interesting duality: the existence of patches reduces
attacks, keeping vulnerabilities secret reduces attacks, but the existence of a
patch also generally means the vulnerability is no longer a secret. These
conclusions should probably be read as there being a balancing act, where one
effect overwhelms the other for a certain time period or under certain
conditions. This is alluded to in the closing remarks of the paper.
Much of this paper I don’t think I
have the ability to analyze thoroughly. I need to brush up on statistical terms,
commonly used variables and symbols. One thing I would like some clarification
on is the meaning of “secret” vulnerabilities. If a vulnerability is “secret”,
there would be no signature to test against. Given that they had pcaps for the
different time periods, I’m assuming that a vulnerability was considered
“secret” if it was undisclosed and unpatched at the time the pcap was made, but
the signature that was used to find it came out at a later date.
Normally I don’t bother with
grammar/typo issues, but there are some in this paper that were curiously missed
considering it lists five authors. I’d assume this would mean five people
proofreading each other’s work. I miss a lot of my own mistakes, in part because
when I proofread I see in my head what I meant to write, not necessarily what I
put down on paper. For example, on page eight, what does “The probability of a
vulnerability being exploited is a function of the attacker’s fixed cost to
attack relative to the gains from attacking relative to the gains from
attacking” mean? I’m guessing the sentence was amended at some point without
noticing the redundancy. On page thirteen, footnote fourteen, I’m guessing
“lunch” type should be “launch” type? On page fifteen, paragraph three, “loss
hat” is probably “lost that”. I look forward to folks finding similar issues in
this write-up, as I’m sure I have them.
One final point, and this is
something they do mention in the paper, not all vulnerabilities are equal. Some
are easier to exploit, or may cause greater damage, and future research should
look into this aspect.
I think the last paper “Windows of
Vulnerability: A Case Study Analysis” might best be read as a history piece,
showing how we got to the current thoughts people have on patching and
vulnerabilities. Many things have changed in the last ten years. A few things I
would note: The step referred to as “scripting” might more often be referred to
as “weaponizing” in modern times. On the fourth page of the article (55) they
state: “We rarely encounter cases with CERT/CC’s preferred ordering: Following a
carefully controlled initial disclosure, a modification or configuration change
corrects the vulnerability, a public advisory reveals that the problem has been
corrected, the vulnerability is never scripted, and it dies quickly. Death
eventually followed years later.” These days, if a vulnerability can be scripted,
it seems to me that it likely will be. The Metasploit project and ExploitDB are
quite good at releasing exploits for known vulnerabilities. Sometimes, in the
case of ExploitDB, the exploit code may not be fully weaponized but just pop up calc.exe or the like. This however can be enough to get an attacker on their way
if they know a little about shellcode. Another historical footnote: “As a
result, attackers could execute arbitrary commands on the Web server at the
privilege level of the HTTP server daemon—usually root, which is the most
privileged user in Unix systems.” I’d hope that most HTTP services are not
usually running with root privileges anymore. It’s at least no longer the
default on most systems I see.
The readings for this week are as follows:
1. When 25 Cents is too much: An Experiment on Willingness-To-Sell and
Willingness-To-Protect Personal Information
Jens Grossklags, Alessandro Acquisti
2. The Red and the Black: Mental Accounting of Savings and Debt
Prelec and Loewenstein
3. Valuating Privacy, Fourth Workshop on the Economics of Information Security
Bernardo A. Huberman and Eytan Adar and Leslie R. Fine
4. Incentive Design for “Free” but “No Free Disposal” Services: The Case of
Personalization under Privacy Concerns
Ramnath K. Chellappa, Shivendu Shivendu
5. Why We Cannot Be Bothered to Read Privacy Policies
Tony Vila and Rachel Greenstadt and David Molnar
I read papers two and three before
paper one, so my thoughts on paper one (When 25 Cents is too much: An Experiment
on Willingness-To-Sell and Willingness-To-Protect) are influenced by reading
those first. The greater willingness to reveal weight-related private data in
paper one seems odd, given that people demand on average $74.06 in the third
paper. Perhaps this is a case of anchoring? Given that the third paper
constructed the auction as being up to $100, maybe that anchored participants' prices high? This is alluded to in the discussion section of paper one in
section 5.2. The results seem to indicate people have a greater willingness to
sell their data at a given price than to protect it.
For the paper “The Red and the Black:
Mental Accounting of Savings and Debt” I found a few key concepts interesting. The effect of “coupling”, where the costs of a benefit are not as directly mentally tied to the use of the benefit, is a useful shorthand for explaining many things. Why do some people buy so much on their credit card when they don't
have the cash? The example given of a sports car being made less enjoyable each
time a payment had to be made was good. One line I would like to nitpick: “The
rationale for such a feeling is somewhat unclear, since paying off the loan
doesn’t diminish the real opportunity cost of purchasing the car: However he
pays for the car, Jones has less wealth, which will inevitably require some
sacrifice in future consumption.” Not having to pay the interest part of the
payment may be a huge benefit, depending on the future value of the money. For
example, let’s say I bought a house at an interest rate of 6%, but Certificates
of Deposit are returning about 1.3% right now. Am I not better off keeping a
small “just in case” fund and using the liquid capital to pay off the house? If rates stay the same, less comes out of my pocket in total over the long run.
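To put my nitpick in numbers (mine, not the paper's): on a hypothetical $10,000, paying down a 6% loan avoids roughly $600 a year in interest, while parking the same money in a 1.3% CD earns only about $130 a year,

$$10{,}000 \times 0.06 = 600 \qquad \text{vs.} \qquad 10{,}000 \times 0.013 = 130,$$

so there is about a $470-per-year difference per $10,000, before taxes, in favor of paying the loan down. The opportunity cost is not symmetric the way the quoted line implies.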
Paper two indicates that people are
debt averse, even to the point of not making the best utility-maximizing
choices, and thus are less than rational. One example given is: “Contrary to the
economic prediction that consumers should prefer to pay, at the margin, for what
they consume, our model predicts that consumers will find it less painful to pay
for, and hence will prefer, flat-rate pricing schemes such as unlimited Internet
access at a fixed monthly price, even if it involves paying more for the same
usage.” Perhaps the payer is still being rational however. They may not know
what their usage will be, and want to avoid overage charges. Some service
providers may be quite draconian when it comes to overage charges, and while
it’s not an ISP example, think of charges for texting on phones even when the
true costs of the overage incurred by the phone company are minuscule. Could debt aversion be largely explained by fear of the unknown?
In the paper “Valuating Privacy,
Fourth Workshop on the Economics of Information Security” the point I found most
interesting is the context of the information, as in who it is given to. People
were more willing to reveal BMI information to people they don't know, the “phenomenon of the stranger” effect, than to people they associate with. Would
other data, like financial or contact information, be as easily given to a
stranger? I would guess that people are more willing to give out information
that is slightly embarrassing to strangers than data that could more readily be
used to directly harm them. This “phenomenon of the stranger” may however
explain why people are willing to tell things to bartenders, and why people with
the perceived anonymity of the Internet will say outlandish things on forums.
As a final thought on this paper, I’d
like to note that BMI is a problematic metric depending on the group being
observed. BMI does not take into account muscle mass, and as such many people
who are “gym rats” are labeled as obese. If the survey were done with mostly
bodybuilders, I imagine most would be more than willing to reveal their BMI
since they know it is of little value in their field. A better metric might be
body fat percentage.
For the last two papers I’ll give
short synopses.
The “Incentive Design for “Free” but
“No Free Disposal” Services: The Case of Personalization under Privacy Concerns”
concerns personalization services offered via browser toolbars and other means.
The consumer’s benefit is the personalization (custom search, new features) and
the vendor’s benefit is preference information obtained about the consumers. A
“No Free Disposal” property occurs when more of something than is desired causes
a disutility for the consumer. In other words, more is not necessarily better
and there may be a point at which more is actually worse. This paper attempts to
find models for optimizing “No Free Disposal” personalization vs. privacy goods
so the vendor knows what to offer the consumer.
In “Why We Cannot Be Bothered to Read
Privacy Policies” the key focus is on the asymmetric nature of information as it
concerns privacy on websites. Because of this asymmetric information about what the owners of a site will do with the data, the authors decided to analyze it as a “lemons market with testing”, where what people expect causes the sellers to offer only the least valuable good (lack of privacy). They wished to find out the effect of signals in the market, specifically privacy policies, and the costs to consumers of testing whether a site meets their personal requirements. They find that the market does not move directly to an equilibrium point, that there is fluctuation in the amount of privacy offered and expected, and that the equilibrium point may change with time.
The readings for this week are as follows:
1. Who Signed Up for the Do-Not-Call List?
Hal Varian and Fredrik Wallenberg and Glenn Woroch
2. On the Viability of Privacy-Enhancing Technologies in a Self-Regulated Business-to-Consumer Market: Will Privacy Remain a Luxury Good?
Rainer Bohme and Sven Koble
3. Who Gets Spammed?
Il-Horn Hann, Kai-Lung Hui, Yee-Lin Lai, and S.Y.T. Lee and I.P.L. Png
4. Spamscatter: Characterizing Internet Scam Hosting Infrastructure
David S. Anderson, Chris Fleizach, Stefan Savage and Geoffrey M. Voelker
The key pursuit of the paper “Who
Signed Up for the Do-Not-Call List?” is to find demographic information for
those who enrolled in the Federal Trade Commission’s Do-Not-Call list. Some key
information they were looking for includes the monetary value that households
attach to being on the Do-Not-Call list and the effects of different
registration vectors (phone vs. web). They also tried ascertaining racial and
ethnic statistics about who signed up for the Do-Not-Call list, but since the signup did not ask for this information they attempted to figure it out based on US Census data for the areas where people had signed up. Amongst the findings:
1. Figures for the value of the Do-Not-Call list varied based on the assumed level of knowledge about State-run programs that were already in existence, their costs, and what people who never signed up might have valued it at. Figures varied from $7.5 million to an upper bound of $48 million per year if what people are willing to pay is the metric used to determine value. If one instead assumes each unwanted call imposes a disutility of $0.10, then the value of the Do-Not-Call list would be closer to $3.6 billion per year (see the rough arithmetic after this list). There seems to be quite a disparity between what people might be willing to pay and the benefits they receive. With free signup, more people took advantage of the FTC program than likely would have otherwise.
2. Low income households had a lower probability of signing up for the
Do-Not-Call list.
3. On page 12 it is stated that counties with a high fraction of Internet users
had higher signup rates, though “not by a dramatic amount”. I'm a little confused, however, by the line on page 2 that reads “However, there are some
surprises, as in the case of Internet penetration rates which appear to be
negatively, if weakly, related to sign-up frequencies.”
4. Racial statistics are given on page 10. The statistics may be somewhat rough
because of how the data had to be sourced. Whites seemed more likely to sign up than blacks; Asian and multiracial households also seemed to have higher signup rates than average.
5. Larger households seemed to have lower signup rates. It was conjectured that
this may be because the answering of unwanted calls was spread out amongst the
household.
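As a rough back-of-envelope reading of the $0.10 figure in item 1 (my arithmetic, not the paper's), the $3.6 billion valuation implies something on the order of

$$\frac{\$3.6 \times 10^{9}\ \text{per year}}{\$0.10\ \text{per call}} \approx 3.6 \times 10^{10}\ \text{unwanted calls avoided per year},$$

which gives a sense of the call volume the disutility assumption is implicitly working with.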
In the paper “On the Viability of
Privacy-Enhancing Technologies in a Self-Regulated Business-to-Consumer Market:
Will Privacy Remain a Luxury Good?” they tried to develop a model to compare
government vs. market driven privacy demands and people’s willingness to pay for
privacy. The crux of it is: can a seller gain more revenue by offering privacy
and hopefully obtaining more privacy valuing customers vs. the gain obtained by
using information about the buyer to price discriminate. According to their
analysis most sellers can increase revenues by supporting privacy enhancing
technologies. However, this is dependent on how many buyers they would lose by not offering privacy, and the potential gains from price discrimination, so
different markets will vary. It should also be pointed out that the desire for
privacy can be the trait that price discrimination could be based on.
The papers “Who Gets Spammed?” and
“Spamscatter: Characterizing Internet Scam Hosting Infrastructure” both concern
spam of course, but the type of spam seems to differ a little.
The “Who Gets Spammed?” paper points
out that there is little to discourage spammers monetarily since most of the
cost is incurred by third parties. The researchers signed up for multiple
accounts at multiple mail providers and chose different settings as to privacy.
Some accounts’ email addresses were also posted on the web where they could be
scraped, while others were kept somewhat private. Some key findings were that
Hotmail accounts got spammed more than the others, with the rest in descending
order from most spammed to least spammed being Lycos, Excite, and then Yahoo. Of
course, those that had their email address posted publicly on the web got more
spam than those that did not. Accounts that declared interests also seemed to
receive more spam. In this paper most of the spam originated from the email
service providers and their marketing collaborators, which does not seem to be
the type of spam focused on in the next paper.
The paper “Spamscatter:
Characterizing Internet Scam Hosting Infrastructure” describes its focus in the
title. The spam here is pretty clearly not from the email service providers or
their marketing collaborators. The authors used image processing techniques to
correlate related scams based on graphic similarity, so even if two pages differ
somewhat they could still hopefully tell whether the pages were part of the same
scam (a toy illustration of this sort of visual matching follows the list
below). They found many interesting points, a few of which are:
1. Spam relays were more likely to be transitory than scam hosts, which makes
sense: web pages need to stay up to be viewed, but once the emails are sent the
relays are no longer as important. There was also only a 9.7% overlap between
spam relays and scam hosts.
2. Most individual scams were hosted on a single IP, but may have had different
URLs and vhosts in use to keep from being blacklisted based on the URL string.
3. A single server may host more than one scam. They say this suggests that
individual scammers may have multiple scams going on at one time, or that some
hosts are more accommodating to scammers and as such get business from multiple
scammers.
4. Malicious scams (phishing/malware/etc.) had a shorter lifetime, and more
mundane shopping scams had longer lifetimes.
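As promised above, here is a toy illustration of matching near-duplicate scam
pages by appearance. The paper's actual technique (an image shingling approach
over screenshots) is different; this is just a simple perceptual “average hash”
comparison, it requires Pillow, and the screenshot filenames are hypothetical:

# Minimal perceptual-hash comparison of two page screenshots. This is NOT
# the paper's image-shingling technique, just a simple illustration of
# matching near-duplicate scam pages by appearance. Requires Pillow.
from PIL import Image

def average_hash(path, hash_size=8):
    """Shrink to hash_size x hash_size grayscale and threshold on the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a, b):
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    # Hypothetical screenshot files of two spam-advertised pages.
    h1 = average_hash("scam_page_1.png")
    h2 = average_hash("scam_page_2.png")
    # Small distances suggest the pages are probably part of the same scam.
    print("hamming distance:", hamming_distance(h1, h2))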
The readings for this week are as follows:
1. Proof-of-Work Proves Not to Work
Ben Laurie and Richard Clayton
2. Proof of Work can Work
Debin Liu and L Jean Camp
3. Adverse Selection in Online 'Trust' Certifications
Benjamin Edelman
4. Privacy-Aware Architecture for Sharing Web Histories
Alex Tsow, Camilo Viecco, and L. Jean Camp
The first two papers will be the
focus of this write-up; they deal with the feasibility of using proof-of-work
algorithms to combat spam. One of the core problems of spam is that it costs
next to nothing for the spammer to send their emails. The idea has been floated
in the past to make emails cost some form of postage, but few seem to like the
idea of real money being used. This is where proof-of-work comes in. The core
idea of proof-of-work is to make the initiator solve some computational problem
that is hard to solve but easy to verify. In the case of email, the sender has
to solve the problem and send the right solution before the recipient will
accept the email. Since the problem is hard to solve but easy to verify, the
sender takes the brunt of the computational and time burden. Hashcash is one
such algorithm. Part of the idea is that legitimate users send few emails
relative to spammers and have enough idle CPU time available that it should not
be a burden to them, but it will be a burden for spammers who send massive
amounts of email.
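As an aside, here is a minimal sketch of the hard-to-solve, easy-to-verify idea
(my own toy version, not Hashcash's actual stamp format): the sender grinds
through nonces until a hash over the recipient, a message ID, and the nonce has
a given number of leading zero bits, and the recipient checks the result with a
single hash.

# Minimal proof-of-work sketch in the spirit of Hashcash (not the real
# Hashcash stamp format): the sender searches for a nonce such that
# SHA-256(recipient:message-id:nonce) starts with `bits` zero bits.
# Finding the nonce is expensive; checking it is one hash.
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mint(recipient: str, message_id: str, bits: int = 20) -> int:
    """Sender side: brute-force a nonce (expensive)."""
    for nonce in count():
        digest = hashlib.sha256(f"{recipient}:{message_id}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce

def verify(recipient: str, message_id: str, nonce: int, bits: int = 20) -> bool:
    """Recipient side: a single hash (cheap)."""
    digest = hashlib.sha256(f"{recipient}:{message_id}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits

if __name__ == "__main__":
    nonce = mint("alice@example.com", "msg-0001", bits=20)  # takes a moment
    print("nonce:", nonce,
          "valid:", verify("alice@example.com", "msg-0001", nonce, bits=20))

Raising the bit count roughly doubles the sender's work per bit while the check
stays a single hash, which is the asymmetry both papers lean on.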
The paper “Proof-of-Work Proves Not
to Work” tries to ascertain how hard, time-wise, the problem would have to be to
impose a big enough burden to make spamming economically unfeasible. The first
paper’s conclusion is that it is currently impossible to discourage spammers
sufficiently with proof-of-work systems without an unacceptable effect on
legitimate users. They calculate what the proof-of-work burden (C) would have to
be to drop the average spam per person below a certain fraction (S). To achieve
an S of 0.01, for example, a C of 346 seconds may be needed, depending on
assumptions about the spammer’s infrastructure and ability to control a large
botnet. A few of the problems highlighted by the first paper are:
1. The disparity in processing speeds between mail agents (a desktop PC vs. a
PDA, for example). Some of this has been offset by work on algorithms that are
memory bound instead of CPU bound, as memory speeds vary less than CPU speeds.
However, the paper states: “To address this problem, Dwork et al. [9] have
recently proposed puzzles that rely on accessing large amounts of random access
memory.” My question is: how likely is a system with a slow CPU to have large
amounts of memory?
2. How do you handle mailing lists, where a single email is sent to a “list
exploder” that would then have to work out the proof-of-work problem for each
recipient? Since this is clearly impractical, the authors assume the list
exploder is delegated some authority by those signed up for the list to check
the sent mail once on their behalf.
3. Spammers would likely use botnets to send their mass mailings, so they would
not have to pay directly for the hardware costs of intensive proof-of-work
operations.
The second paper proposes that
proof-of-work could work if it were part of a reputation system and combined
with current commercial anti-spam technologies. The proof-of-work burden could
be adjusted based on reputation. New hosts would have a higher proof-of-work
burden than those that have been around for a while and are well behaved. If a
formerly well-behaved host starts to misbehave and spam is caught coming from
it, its proof-of-work burden can be increased until its reputation as a
trustworthy sender has been built up again.
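A quick sketch of what that adjustment might look like, building on the
proof-of-work idea above. The reputation scale and bit thresholds are numbers I
made up for illustration; the paper does not prescribe them:

# Sketch of scaling the proof-of-work burden by sender reputation, in the
# spirit of the second paper. The reputation scale and thresholds here are
# invented for illustration, not taken from the paper.

def required_bits(reputation: float) -> int:
    """Map a reputation score in [0, 1] to a number of leading zero bits.
    Unknown/new senders (low reputation) pay the most work; long-standing
    well-behaved senders pay little or none."""
    if reputation >= 0.9:
        return 0          # trusted: no puzzle required
    if reputation >= 0.5:
        return 12         # known but unproven: light burden
    return 22             # new or recently misbehaving: heavy burden

def on_spam_detected(reputation: float, penalty: float = 0.3) -> float:
    """Knock a formerly well-behaved sender back down; its burden rises
    until the reputation is rebuilt."""
    return max(0.0, reputation - penalty)

if __name__ == "__main__":
    rep = 0.95
    print("trusted sender bits:", required_bits(rep))
    rep = on_spam_detected(rep)            # caught spamming
    print("after incident bits:", required_bits(rep))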
The third paper, “Adverse Selection
in Online 'Trust' Certifications”, seeks to ascertain the real value of trust
authority certifications like TRUSTe and BBBOnline. Is there adverse selection,
with companies that are not trustworthy seeking to be certified to gain a veneer
of respectability? Also analyzed is the trustworthiness of the top organic
results from search engines. Edelman used SiteAdvisor’s (he is on its Advisory
Board) test results to judge the meaningfulness of the certifications and search
results. Sites with a TRUSTe certification were only 94.6% trustworthy, vs.
97.5% of all sites tested being trustworthy by SiteAdvisor’s determination for
the general web population. This would seem to indicate that TRUSTe has adverse
selection in effect, and perhaps seeing a TRUSTe certification should be a bad
sign to end users. BBBOnline-certified sites, however, seemed to be more
trustworthy than a random cross-section of sites. However, BBBOnline’s means of
enrolling new sites to certify may not scale well. Organic search results also
did not seem to suffer from adverse selection, but paid search engine ads did.
The fourth paper, “Privacy-Aware
Architecture for Sharing Web Histories”, concerns the Net Trust system. Net
Trust is a rating system that uses multiple sources of data to give the user
information so they can decide whether a site should be trusted. One of the main
sources of information is the user’s Net Trust social network, using both the
implicit browsing history of peers and explicit ratings. This is implemented in
part as a browser toolbar for managing social networks and personas, and for
viewing ratings. The user can manage different personas so they don’t have to
share possibly embarrassing or unrelated information with the wrong social
group. Efforts are also made to keep the server that distributes rating
information from telling too much about the users, and the social network
connections are managed locally rather than on the server. The rating server’s
protocol is kept thin in hopes of migrating to a p2p system in the future
(a distributed hash table, perhaps?).
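Just to give a flavor of combining implicit history with explicit ratings per
persona, here is a toy sketch. This is emphatically not the actual Net Trust
algorithm or wire protocol, just my own illustration of the idea:

# Illustration only: one way a toolbar might combine implicit browsing
# history and explicit ratings from the peers in the active persona.
# This is not the actual Net Trust algorithm or protocol.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class PeerSignal:
    visits: int = 0                      # implicit: how often the peer visited
    rating: Optional[float] = None       # explicit: -1.0 (bad) .. 1.0 (good)

@dataclass
class Persona:
    name: str
    peers: Dict[str, Dict[str, PeerSignal]] = field(default_factory=dict)

def site_score(persona: Persona, site: str) -> float:
    """Average explicit ratings, falling back to a weak positive signal
    from repeat visits when no explicit rating exists."""
    scores = []
    for signals in persona.peers.values():
        sig = signals.get(site)
        if sig is None:
            continue
        if sig.rating is not None:
            scores.append(sig.rating)
        elif sig.visits >= 3:
            scores.append(0.25)          # repeat visits as a mild endorsement
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    work = Persona("work", peers={
        "bob":   {"example-bank.com": PeerSignal(visits=12, rating=0.8)},
        "carol": {"example-bank.com": PeerSignal(visits=5)},
    })
    print("score:", round(site_score(work, "example-bank.com"), 2))

Keeping a structure like this per persona is what lets the user avoid leaking
one social group's browsing signals to another.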
The readings for this week are as follows:
1. The Privacy Jungle: On the Market for Data Protection in Social Networks
Joseph Bonneau and Soren Preibusch
2. Imagined Communities: Awareness, Information Sharing, and Privacy on the
Facebook
Alessandro Acquisti and Ralph Gross
3. HIPAA Compliance: An Examination of Institutional and Market Forces
Ajit Appari, Denise Anthony and Eric Johnson
4. Data Hemorrhages in the Health-Care Sector
M. Eric Johnson
5. Information Explosion. Confidentiality, Disclosure, and Data Access: Theory
and Practical Applications for Statistical Agencies
Latanya Sweeney
The first two papers focus on social
networks and their security/privacy ramifications. “The Privacy Jungle”
evaluated forty-five social networking sites against two hundred sixty criteria.
One of the points they make is that while many social networks seem to vaunt
their privacy practices, the policies themselves are not easily understandable
by the average user without a legal background. They point out that user surveys
consistently show a high sense of concern for privacy, yet observed user actions
on the networks seem to contradict this. They divided their 45 selected sites
into general (MySpace, Facebook) and niche sites (LinkedIn, Habbo, and, strangely
selected to my mind, Twitter, though they point out it may be in a niche by
itself) but excluded content-sharing focused sites (YouTube, Flickr). They
evaluated the data collected by sites, and when they signed up for services they
filled out consistent data to the fullest extent possible, but withheld data
that was not mandatory. After their evaluation some of their conclusions were:
1. They found strong evidence that social networks were failing to provide
adequate privacy controls.
2. Evidence that one of the main problems was lack of accessible information for
users.
They suggest “privacy nutrition
labels” to help standardize communication with users, and further research into
how privacy policies could be made easier for users to understand. To increase
consumer choice they recommend reducing social network lock-in by allowing data
to be portable so users can move to a new network (this has its own privacy
concerns, but that’s not a focus of the paper).
The second paper (Imagined
Communities) focuses on Facebook. It is from the time period when Facebook was
focused on colleges and high schools, so a few of the points, like requiring a
.edu email address, no longer apply and the demographics may differ now. Like
the previous paper, they point out that while user survey results show a high
sense of concern for privacy, observed user actions on the network seem to
contradict this. They also found evidence that users have misconceptions about
how visible their data on Facebook is. They compared survey responses with the
data they could scrape from members’ profiles to see how “what they say” and
“what they do” match up. A few of the interesting findings:
1. Privacy concerns may drive some older and senior college members away from
Facebook; for undergrads, however, reporting higher privacy concerns did not
seem to drive them away from being on Facebook.
2. Non-members seemed to have higher privacy concerns in general than members.
3. What Facebook members said they used the network for differed greatly from
what they thought other members were using it for. As an example: few said they
were using it to find dates, but many respondents reported that they thought
others were using it for that purpose. Perhaps those that responded were
applying their own motives to others?
4. In expressed attitudes vs. behavior, members did not seem to be consistent.
Even if a user said they cared about the privacy of their politics/sexual
orientation/relationship status, they were still likely to put it in their
profile.
Also of interest to me, page 20
spells out some of the changes users made to their profiles after taking the
survey (site scrapes were taken before and after). It seems just taking the
survey made some people think about what was in their profile.
Paper three (HIPAA Compliance) aims
to develop “a regulatory compliance model by drawing insights from the
institutional theory literature to identify the key drivers influencing HIPAA
compliance, both institutional and market forces”. They put forth nine
hypotheses to test with their dataset, most of which were at least somewhat
supported after they did their evaluation. For the sake of brevity paper three
will not be a focus of this write-up (no sense in repeating all of the
hypotheses), but I would like to point out one of the hypotheses that they
indicated came back negative:
H6: Hospitals employing external consultants will exhibit a higher tendency to
become HIPAA compliant.
They point out that having an outside
consultant is negatively correlated to being “compliant,” but from my reading of
the paper it seems compliance is self-reported by the institution. Could it be
that those institutions that have had external consultants look into their
information system are just more aware of where they are lacking and responded
more honestly? As the mantra from pen-testers goes “compliance != security”. It
could be that those institutions that reported themselves as being less
compliant may actually have better security and privacy controls in place than
those that self-reported as highly compliant. If there is some place in the
paper that points out a more rigorous test for compliance besides
self-assessment please let me know, but on page 11 it states “self-reported
level of compliance to HIPAA privacy, and security rules”.
Paper four is titled “Data
Hemorrhages in the Health-Care Sector”. Section two covers how health-related
data leaks can be used, and gives real world examples. A few of the misuses and
ramifications of this data include:
1. General privacy violations.
2. Insurance fraud for un-rendered services.
3. Insurance fraud for rendered services, but provided to someone who is
stealing the identity.
4. Use of identities to obtain multiple prescriptions and then sell the drugs
(or feed a habit).
5. They mention the possibility of life-threatening wrong information being put
on a medical record because of identity theft, but don’t seem to go into it in
depth. An example I could think of: someone stealing an identity lists an
allergy or contraindication that the real person does not have, and as a result
the real person does not receive the best possible treatment when they go in for
their own problems. As pointed out in class, having the wrong blood type put on
your record would also be problematic.
The paper briefly covers different
ways breaches occur, but focuses on data made available by being inadvertently
placed on peer-to-peer file sharing networks. They used p2p searching software
from Tiversa to scour the Gnutella, FastTrack, Ares, and eDonkey networks for
Microsoft Office documents (Word, PowerPoint, Excel, and Access) containing
certain medically related key terms. They kept the terms somewhat limited to cut
down on the false positives they would have to weed through, but still had to
manually throw out a lot of results (look at the bottom of page 9 and the top of
page 10 for some of the weeding statistics; a rough sketch of this kind of term
filtering appears after the list below).
Just some of the interesting documents found include:
1. Medical documents used for tax purposes.
2. Employment documents.
3. A spreadsheet of recent hires at a hospital, including name, Social Security
number, contact information, etc.
4. Medical card detailing prescription information.
5. From a medical testing lab, a 1,718-page document containing Social Security
numbers and other information for almost 9,000 patients.
6. Two spreadsheets from a hospital containing Social Security numbers and
other information for over 20,000 patients.
7. Psychiatric evaluations.
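As promised above, here is a rough sketch of that kind of term filtering. The
keyword lists are stand-ins of my own, not the terms Johnson used, and as the
paper makes clear a lot of manual review would still be needed:

# Rough sketch of keyword-based triage of p2p search hits. The term lists
# are my own stand-ins, not the ones used in the paper, and real weeding
# still required a lot of manual review.

MEDICAL_TERMS = {"diagnosis", "patient", "prescription", "hipaa", "insurance id"}
FALSE_POSITIVE_HINTS = {"homework", "lecture", "sample form", "template"}

def looks_medical(filename: str, snippet: str) -> bool:
    text = f"{filename} {snippet}".lower()
    if not any(term in text for term in MEDICAL_TERMS):
        return False
    return not any(hint in text for hint in FALSE_POSITIVE_HINTS)

if __name__ == "__main__":
    hits = [
        ("patient_billing.xls", "insurance id and diagnosis codes"),
        ("bio101_lecture.ppt", "patient case study homework"),
        ("vacation_photos.zip", "beach trip"),
    ]
    for name, snippet in hits:
        print(name, "->", "keep" if looks_medical(name, snippet) else "discard")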
I could go on, but I think that makes
the point. Some of their conclusions and concerns are:
1. Under HIPAA, it can be hard to correct information after an identity has
been stolen.
2. HIPAA may help stop leaks from the health care providers directly, but does
little to stop patients from inadvertently sharing the information themselves.
3. More tamper-proof photo IDs issued by the insurance companies may help (or at
least raise the bar).
Page 16 is also interesting as it
shows some of the searches other p2p users were doing for medical information.
Idea for further research: the viability of a honey pot for people doing
searches for such information?
Paper five (Information Explosion) I
will only mention in brief; it concerns the growing trend to collect more and
more data. This is aided by the fact that storage is so much cheaper today than
it has been in the past. Mention is made of how more information is now
collected on birth certificates, via “loyalty cards” at retailers, and on
employment paperwork than in the past. A few of the data collecting behaviors
and trends the author points out are:
Collect more – if data is already being collected from a
person, add more fields.
Collect specifically – If data in the past was aggregate, make it specific to
the individual. Loyalty cards are a great example of this. In the past,
retailers may only have known the aggregate sales at a store; now they can try
to predict purchases based on what an individual with certain demographics
bought.
Collect if you can – given the opportunity, collect data. Examples given include
new hire paper work and immunization registries.
Also mentioned is the balance of
providing enough information for researchers to do useful work for society,
while still providing for privacy to the individual. Datasets can be somewhat
anonymized, but there is the risk of useful information being removed or
obfuscated in the process.
The readings for this week are as follows:
1. Digital Rights Management and the Pricing of Digital
Products
Yooki Park and Suzanne Scotchmer
2. Competing with Free: The Impact of Movie Broadcasts on DVD Sales and Internet
Piracy
Michael Smith and Rahul Telang
3. A contribution to the understanding of illegal copying of software: empirical
and analytical evidence against conventional wisdom
C. Osorio
4. The Simple Economics of Open Source
Josh Lerner & Jean Tirole
Paper one, as the title spells out,
concerns DRM and pricing models. Specifically it seems to be focusing on
products you download without receiving physical media, though it would have
been helpful if it had made this clearer from the start of the paper. After a
brief synopsis of the history of Digital Rights Management the paper goes into
the effects of content vendors maintaining their own DRM systems directly, or
through a shared protection platform. Two important things to consider with a
shared protection system are:
a. Whether the vendors can set their prices independently.
b. How costs for the shared system are allocated to the member firms.
If the vendor can choose their own
price, they can choose one to maximize profits. If the shared system does not
allow the vendor to choose the price, they could still possibly undercut others
via rebates issued outside of the shared system. Decisions would also be based
on how the costs of a shared system are allocated: as a percentage of the
selling price, or as a per-download fee. If the company plans to issue rebates,
a per-download scheme may be preferable to them if an item has a high enough
selling price.
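A quick made-up arithmetic comparison of the two allocation schemes when a
rebate is planned (all numbers are invented, not from the paper):

# Made-up numbers comparing the two cost-allocation schemes for a shared
# DRM platform when the vendor plans to issue a rebate outside the system.

list_price = 30.00       # price charged through the shared platform
rebate = 5.00            # paid to the buyer outside the platform
pct_fee = 0.10           # scheme A: 10% of the selling price
per_download_fee = 2.00  # scheme B: flat fee per download

net_pct = list_price - rebate - pct_fee * list_price
net_flat = list_price - rebate - per_download_fee

print(f"net per sale, percentage fee:   ${net_pct:.2f}")
print(f"net per sale, per-download fee: ${net_flat:.2f}")
# With a high enough list price the flat per-download fee leaves the vendor
# more per sale, which is the preference noted above.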
One of the paper’s conclusions is
that with a separate DRM system prices will be lower than if there were perfect
legal enforcement, and the vendors will also be burdened with the cost of the
DRM system. However, perfect legal enforcement does not seem very likely to my
mind, though the RIAA/MPAA and others seem to lobby for laws to support it. A
shared DRM system may or may not be less costly than a firm controlling its own,
and it may or may not lead to higher prices; it depends on many variables. In
the paper’s opinion, it is not clear whether a shared DRM system would be more
efficient than the vendor running their own.
Paper two looks at the impact of
movie broadcasts on the sale of DVDs and unlicensed downloads of movies. They
used the broadcast of the selected movies as an “exogenous demand shock” and
attempt to measure the change in sales of the DVDs before and after the
broadcast, as well as attempting to track the downloads of the movies from
popular BitTorrent trackers. Several points were brought up in the paper:
a. Movie studios are becoming more reliant on media sales for their profits.
b. Movie content may be more prone to single use consumption (watch once and
never again).
c. Extras on DVDs that normally do not come with downloaded movies
(commentaries, “making of” featurettes) may have an effect on people deciding to
buy them.
d. Effects of ease of downloading and storage of the content.
e. Media types vary, and the effects of illegal downloading on the market place
for one type of media (music) may not be the same as for another type of media
(movies, games, books, etc).
f. Technology changes with time, so what is true now about the market structure
for a given media type may change in the future.
Because of the points I listed above,
and others, the conclusions from this paper should probably be seen as applying
to a snapshot in time, or as the authors say: “our findings may change in the
future if the environment surrounding piracy changes”. With those caveats, some
of their findings include:
1. Broadcasting movies on TV provided a boost in DVD sales.
2. Broadcasting movies on TV also boosted illegal downloads.
3. Over the air broadcasts had more of an effect than cable broadcasts.
Since both the illegal copying and
DVD sales of a given movie went up, they also looked to see whether the
availability of the movie on popular BitTorrent trackers (The Pirate Bay and
Mininova) had an effect on the resulting DVD sales after the broadcast. Using
movies that were and were not easily available on the chosen trackers, along
with DVD sales information, they did a regression analysis to see how they were
related. From their results they concluded that the illegal copiers and the DVD
buyers were two separate segments without much crossover, and that the
availability of a movie on the BitTorrent trackers did not significantly affect
its DVD sales.
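The general shape of that analysis can be sketched with a simple least-squares
regression on synthetic data; the variable names and coefficients below are
mine, not the paper's:

# Sketch of the kind of regression described above, on synthetic data.
# Variables and coefficients are invented; they are not the paper's data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
broadcast = rng.integers(0, 2, n)        # 1 = observation after a TV broadcast
on_tracker = rng.integers(0, 2, n)       # 1 = movie easily found on trackers
noise = rng.normal(0, 50, n)

# Synthetic "true" model: broadcasts lift sales; tracker availability doesn't.
dvd_sales = 500 + 120 * broadcast + 0 * on_tracker + noise

X = np.column_stack([np.ones(n), broadcast, on_tracker, broadcast * on_tracker])
coef, *_ = np.linalg.lstsq(X, dvd_sales, rcond=None)
for name, value in zip(["intercept", "broadcast", "on_tracker", "interaction"], coef):
    print(f"{name:11s} {value:8.1f}")

A near-zero coefficient on the tracker term, with a clearly positive broadcast
term, is the shape of result they report.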
Neither paper three nor paper four lends
itself to what I would consider a concise write-up, with so many points being
made in each. Instead I’ll cover what I consider the more interesting points.
Paper three looks into the effects of
illegal copying on the software market. It points out that software is a quasi
non-excludable good: one person having a copy does not keep someone else from
having a copy (unlike, let’s say, a physical item). Osorio points out that illegal
copying can cause positive network effects. More users, even the illegally
copying kind, may help lead to the software being bought because of word of
mouth and the usefulness of having a shared platform with others. He points out
overestimates on the cost of illegal copying that stem from the faulty
assumption that illegal copiers would have bought the software if they could not
have gotten it for free. Osorio’s review of the literature points to four common
hypotheses about why software is illegally copied. So as to not quote the text
verbatim I’ll summarize them roughly as:
1. Cost: Copying is cheaper than buying the software legitimately and there
could be an affordability or income issue.
2. Legal or Worldview: Not all societies/culture look at intellectual property
and copying the same way, and may see nothing wrong in the copying of software.
3. Fit: The software does not fit local needs.
4. Support and Services: There may not be enough local support and complementary
services for the software.
Of these hypotheses, Osorio found:
1. Seems supported by his data: the lower the income in an area, the higher the
illegal copying.
2. His synopsis of the results for hypothesis two is not clear to me. From Table
1 and Figure 1 it looks like there is a correlation between legal framework and
illegal downloads.
3. Seems to have some support.
4. Also seems to be supported.
Paper four looks at the economics of
Open Source projects. The three projects they focus on were Apache, Perl and
Sendmail, though Linux seems to pop up a lot in their discussions also. One
question commonly asked is why someone would develop software for free: what
are they getting out of it? One of the points made in paper four is that the
utility function for an Open Source developer is not necessarily money; there is
also ego gratification and reputation amongst peers. Some developers also just
enjoy developing their projects for the fun of it. The reputation gained from
working on an Open Source project could also lead to a job, which might be a
monetarily compelling reason for some. A company may support an Open Source
project in hopes of making money from support, for good PR, or to sell hardware
that uses the software. In the case of company-sponsored Open Source projects
the developers’ motives can be salary based.
They mention a few pitfalls that can affect Open Source projects, the two I find
most interesting are:
1. Forking if there is not strong confidence in the leadership. This may have
the effect of pulling developers away from the core project. Then again, maybe
the spinoff project will be compelling in its own right. To my mind forking has
sort of an undeserved negative connotation in some circles/contexts.
2. Problems getting people to develop the “boring parts”. One common example of
this is software that is great technically, but has very little documentation
because no one finds that task interesting/fun/rewarding.
Also covered in the paper are the
leadership structures of Open Source projects. Some projects like Linux have a
recognized leader (Linus Torvalds); others like Apache have more of a committee
system. Not mentioned in the paper is Larry Wall’s title as leader of the Perl
project: “Benevolent Dictator for Life”. Something to look at for further
research would be the changes in the structure of Open Source projects and
corporate support since the paper was written in 2000.
The readings for this week are as follows:
1. Predictors of Home-Based Wireless Security
Matthew Hottell, Drew Carter and Matthew Deniszczuk
2. Practice & Prevention of Home-Router Mid-Stream Injection Attacks
Steven Myers and Sid Stamm
3. Information Disclosure as a light-weight regulatory mechanism
Deirdre K. Mulligan
4. Mandatory Disclosure As a Solution to Agency Problems
Paul G. Mahoney
Paper one tries to determine factors
that contribute to whether or not security features have been enabled on a WiFi
network. Some of the demographic information they tried to correlate with
wireless security was education level, income, and housing density. They
conducted their own wardrive, and used US Census data for demographic
information about the neighborhoods they were scanning. Wireless networks were
marked as secured if they had WEP or WPA enabled. They had three main hypotheses
to test, and in the end they could find no statistical support for any of them.
I’ll quote their hypotheses and summarize the resulting conclusions below:
“Hypothesis 1: higher education level is a predictor of higher levels of
wireless security.”
I find this quote from the paper
troubling: “These findings indicate that investing resources in large-scale
educational campaigns to raise awareness of wireless security may be a bad
decision since education seems to have little effect.” I don’t see how this is a
good conclusion. Just because someone has a bachelor’s degree does not mean that
they would have a specific skill set, or an awareness of a given issue. Being
aware that you should secure your wireless network and knowing how is pretty
specific. We might question the usefulness of user awareness education for other
reasons, but not based on this data concerning the number of college graduates
in a given neighborhood.
“Hypothesis 2: Higher income indicates a greater likelihood of secured wireless
access points.”
As stated before, they found no support for this hypothesis either.
“Hypothesis 3: Higher population density predicts better levels of wireless
security.”
Much the same, they found no support for this hypothesis either.
One factor that did seem to have an
effect on whether a network was secured was its SSID. Some SSIDs were associated
with services like 2Wire that enabled encryption by default on most of their
deployments (330 out of 340). About 88 percent of the routers that used the SSID
“Linksys Secure Easy Setup” were secured, presumably because the install walked
the user through setting up encryption. After taking these two SSIDs out, about
55 percent of the routers found were configured to be secure.
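That tallying step is simple enough to sketch; the access point records below
are made up, not the paper's wardrive data:

# Sketch of the tallying step: count secured access points (WEP or WPA on),
# then recompute after excluding SSIDs whose vendors enable encryption by
# default. The records below are made up, not the paper's wardrive data.

aps = [
    {"ssid": "2WIRE123", "encrypted": True},
    {"ssid": "linksys", "encrypted": False},
    {"ssid": "Linksys Secure Easy Setup", "encrypted": True},
    {"ssid": "default", "encrypted": False},
    {"ssid": "home-net", "encrypted": True},
]

DEFAULT_ENCRYPTED_SSIDS = ("2WIRE", "Linksys Secure Easy Setup")

def secured_fraction(records):
    return sum(r["encrypted"] for r in records) / len(records)

print(f"all APs: {secured_fraction(aps):.0%}")
rest = [r for r in aps
        if not any(r["ssid"].startswith(prefix) for prefix in DEFAULT_ENCRYPTED_SSIDS)]
print(f"excluding default-encrypted SSIDs: {secured_fraction(rest):.0%}")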
I have many, possibly minor, nitpicks
about the paper. First, on page 5 as an example of the cost of implementing
security on a wireless network they went into what someone would have to do to
set up MAC filtering. This is not the best example for two main reasons:
1. Setting up MAC filtering is much more labor intensive to do than turning on
WEP/WPA/WPA2, which is what they tested for. This sets up somewhat of a straw
man as to the level of difficulty. I’m not saying setting up WEP/WPA/WPA2 is
necessarily easy for the average user, but using MAC filtering as the example
for the difficulty of setting up a secure network is setting the bar too high as
far as difficulty.
2. Not only is MAC filtering harder to configure than WEP/WPA/WPA2, it is
ineffective. All an attacker needs to do is put their WiFi card in monitor mode,
sniff for active connections, and clone a MAC address. All that trouble, for
little to no benefit when turning on encryption is so much easier.
Another issue some technical readers
may have with the paper is the definition of security (WEP or WPA on). Did they
look for wireless networks that allowed anyone to connect, but would not let the
person route traffic anywhere without connecting to a VPN first? This was a
commonly seen configuration on commercial/education networks for many years
since WEP was so horribly broken and required a shared key. They say on page 7
that they weeded out commercial access points as identified by SSID or maker
(hopefully they excluded University ones as well), so perhaps this is not as big
an issue since the VPN configuration is not as likely to be seen for home users.
They also used NetStumbler, which will only see access points that respond to
probes or that beacon. Kismet would have been a better choice as it can find
“cloaked SSIDs” using monitor mode, as long as there is association traffic to
be seen. Depending
on how it was set up (past versions have logged more than just management frames
by default), this may run afoul of some wiretap laws (A good example story:
http://arstechnica.com/tech-policy/reviews/2011/04/judge-was-wifi-packet-sniffing-by-google-street-view-spying.ars
). Then again, I doubt many people enable SSID cloaking as it does not add much
real security and can cause connectivity issues. Still, use of cloaked SSIDs is
something to look into if the study is repeated.
If this study were done again,
WiGLE.net would be a good resource (assuming you trust the users to upload valid
results). WiGLE has a database of found access points from around the world, and
is fairly easy to query. You can also query for just the access points you have
found. It unfortunately lumps WPA and WEP into the same category, but this would
not be a problem considering this paper’s definition of secured wireless. I can
say that people do seem to be more conscious about home wireless security now
than they were then, or perhaps there are more routers that turn on encryption
by default. Of the 9,943 access points I’ve found just this year (within 0.5
degrees of 38.22,-85.75), 7,871 have used WEP or WPA (79%). I did not, however,
factor out commercial installations.
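For what it's worth, that sanity check amounts to something like the following;
the CSV column names are hypothetical stand-ins for whatever a wardriving or
WiGLE export actually uses:

# Sketch of that sanity check: filter an export of found networks to a
# bounding box and count how many report encryption. The column names here
# are hypothetical stand-ins, not a documented export format.
import csv

CENTER_LAT, CENTER_LON, BOX = 38.22, -85.75, 0.5

def encrypted_share(path):
    total = secured = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            lat, lon = float(row["lat"]), float(row["lon"])
            if abs(lat - CENTER_LAT) > BOX or abs(lon - CENTER_LON) > BOX:
                continue
            total += 1
            if row["encryption"].upper() not in ("", "NONE", "OPEN"):
                secured += 1
    return secured, total

if __name__ == "__main__":
    secured, total = encrypted_share("my_networks.csv")  # hypothetical export file
    if total:
        print(f"{secured}/{total} encrypted ({secured / total:.0%})")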
Paper two from Myers and Stamm
concerns mid-stream injection attacks and countermeasures. Specifically it
focuses on the injecting of scripts into unencrypted webpages by compromised
home routers, but the mitigations should also be applicable to other mid-stream
injection vectors such as inline proxies and routers (these are more of a threat
to my mind, as I will cover later). The core motivation for this research is
that the use of TLS/SSL can be too costly CPU-wise, which is why some sites
avoid using it for all except the most confidential information (passwords, form
data, etc.). This leads some sites to use “secure post”, where only the
submitted data is sent using TLS/SSL. The downside to this is that the HTML form
itself is sent in the clear without the signing advantages of TLS/SSL, so an
attacker in the middle of the connection can modify the returned form page to
include JavaScript that forks the form into sending its data to more than one
receiver (the attacker’s collection box, for example). The paper proposes using
obfuscated scripts and cryptographic hashes to have the browser check to make
sure that an attacker has not inserted new code mid-stream. The obfuscation used
will have to be changed from time to time so the attacker will have to come up
with new countermeasures. Both the client and the server do their own checks of
the HTML form, and depending on where z (the cryptographic hash of the canonical
HTML form, JavaScript, etc.) is checked there may be other problems. For
example, the server side may know that the form has been changed because z does
not equal z’ and can warn the user, but the data may already have been submitted
to the attacker. The paper says this about notifying the user of an attack:
“This notification should be through alternative channels, as in this case the
web channel may be compromised.” This is true since if an attacker can modify
the page to add new content, they can also modify alerts coming in from the
server over the same HTTP connection. However, what alternate channels should be
used to avoid great time delays? Are they intending to email the user and say
“hey, I think someone is messing with your connection”?
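Stripping away the JavaScript obfuscation layer the paper builds on top, the
basic hash comparison looks something like the sketch below. This is my own
simplification with both roles collapsed into one Python file, not the authors'
scheme, and the form/script strings are placeholders:

# Stripped-down sketch of the integrity check, without the JavaScript
# obfuscation layer the paper layers on top: hash the canonical form as
# served (z) and compare it to the hash the client reports for what it
# actually rendered (z'). A mismatch suggests mid-stream modification.
import hashlib
import hmac

def canonical_hash(form_html: str, script: str) -> str:
    return hashlib.sha256((form_html + script).encode("utf-8")).hexdigest()

CANONICAL_FORM = '<form action="https://example.com/login" method="post">...</form>'
CANONICAL_SCRIPT = "/* integrity-checking script as served */"

def server_check(z_prime_from_client: str) -> bool:
    z = canonical_hash(CANONICAL_FORM, CANONICAL_SCRIPT)
    # Constant-time comparison; if it fails, warn the user out of band,
    # though as noted the data may already have gone to the attacker.
    return hmac.compare_digest(z, z_prime_from_client)

if __name__ == "__main__":
    tampered = CANONICAL_FORM.replace(
        "</form>", '<script src="evil.js"></script></form>')
    print("clean page passes:",
          server_check(canonical_hash(CANONICAL_FORM, CANONICAL_SCRIPT)))
    print("injected page passes:",
          server_check(canonical_hash(tampered, CANONICAL_SCRIPT)))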
While not a perfect solution, it is
hoped that these mitigations will make it too costly to carry out the attack.
The paper is pretty detailed as to how this can be done by a developer, but a
library for making it easy to implement and to avoid “roll your own” security
problems would be helpful. As stated in the paper: “Thus, our countermeasure is
not computationally secure in a cryptographic sense, and we do not dispute that
a dedicated hacker could overcome the solution with enough resources. Our goal
however is more preventative: to make script injection unattractive and
non-profitable, so that fraudsters do not attempt the attack since it will not
be financially profitable (at least on those sites that bother to invest in
countermeasures).” This is where I have the most nitpicks. The mid-stream
injection attacks seem like they could be a big concern if the injection is
happening on an ISP’s router, a shared proxy (think of a rogue Tor exit point)
or even a wireless router at a location that has a lot of patrons (coffee shop,
library, etc.). The early focus on home routers seems a little out of line with
the “not be financially profitable” quote above as I don’t think the
installation of mid-stream injection software on the routers is quite as easy as
the authors make it out to be. Yes, it can be done for a large fraction of home
routers, but at what time and effort cost to the attacker? They answer some
objections on the second and third pages, but I still have my doubts. Yes,
OpenWRT (and its cousin DD-WRT) supports a wide range of routers, but getting it
installed on a large number of home routers would be a non-trivial automation
task (automating being the only way to make it profitable, unless we are
considering a targeted attack). An attacker could drive around neighborhoods
installing it, but that would be cost and time prohibitive. Installing a new
firmware remotely across the Internet via something akin to a CSRF attack could
be attempted, but this seems non-trivial. A few of the things a mass attacker
would have to get around if they wanted to install a new firmware remotely
include:
1. Yes, a lot of routers do use default passwords (estimates of 25% to 35% are
given in the paper), but they vary, and it does cut down on targets if you are
an attacker going for numbers.
2. Assuming you know the default passwords, newer browsers have mitigations in
place to make requests to http://root:password@someip less transparent than they
used to be. This sort of attack was shown in the Grossman presentation
referenced in the paper, and is a little harder now depending on the browser in
question (in my tests, Firefox 4 gives a warning and IE 8 fails completely).
3. CSRF attacks could be used against some vulnerable routers to first make the
admin interface available on an Internet facing IP, but uploading a new firmware
from across the internet seems like an idea prone to bricking routers.
4. You have to detect the router type so you choose the right version of the
firmware to upload, but this is not nearly as hard as the other issues and I’ve
seen code to do it.
5. This is not really an issue, just an observation. If you are going to bother
with installing OpenWRT, install TCPDump/Ettercap/Dsniff and harvest
passwords/data from protocols besides just HTTP.
Yes, I suppose these issues can all
be gotten around, but it seems to me that a lot of work has to be done, it would
be error prone, and it would not be attractive to an attacker looking for large
aggregate numbers. I see other vectors of mid-stream injection as being far more
likely than a home router being backdoored. The modification of a router by
someone with direct wireless access might still be profitable if:
A. It is specifically targeted.
B. The router has a lot of users to aggregate semi-valuable data from.
C. The intention is to do identity theft against a small number of individuals
and draw as much profit as possible from fewer people.
As I’m running way long already, I will attempt to use the third and fourth
papers to answer the question proposed in the reading list:
“The argument against privacy is that disclosure is a lightweight effective
market mechanism. Is this consistent with the arguments for or against mandatory
disclosure?”
Well, as I read the papers, this is
an argument against a specific type of privacy, not privacy in general. For
certain economic transactions to take place there has to be trust, and many
times that means verification of information that may be considered private
under certain circumstances. If I write someone a check, they want to have a
fair degree of certainty that I am who I say I am, and will want my name and
contact information. If I am a promoter of a certain stock, potential investors
should want to know what my interests are and whether I might have any perverse
incentives. The question becomes what information they need to know. Certainly
in the case of a stock promoter an investor may want to know the financial
interests of the promoter in the company and related assets, but they would not
need to know the promoter’s blood type. That example is silly and clear, but
there may be other, finer points about what needs to be exposed and what can
remain private. Mandatory disclosure of certain information may make the market
more efficient, since each investor will not have to do all the research for
themselves. In the case of information breaches, breach notifications could help
victims of data theft mitigate and look out for misuses of their data. I suppose
I would need more clarification about what remains private and what has to be
revealed, but yes, it can be consistent with arguments for mandatory
disclosure.
The readings for this week are as follows:
1. Data Breaches and Identity Theft: When is Mandatory Disclosure Optimal?
Sasha Romanosky, Richard Sharp and Alessandro Acquisti
2. Stopping Spyware at the Gate: A User Study of Privacy, Notice and Spyware
Good, N., Dhamija, R., Grossklags, J., Thaw, D., Aronowitz, S., Mulligan, D.,
and Konstan., J.
3. The Role of Internet Service Providers in Botnet Mitigation: An Empirical
Analysis Based on Spam Data
Michel van Eeten, Johannes M. Bauer, Hadi Asghari, Shirin Tabatabaie and Dave
Rand
Like many of the papers we have read
over the last few weeks the “Data Breaches and Identity Theft” paper spells out
its core question in the subtitle. To reduce the effects of data breaches on
consumer losses many states have enacted mandatory disclosure laws to help
victims mitigate misuse of their data. However, according to the paper, the
effects of these laws have not been rigorously tested, and there is some fear
that the laws may impose a burden on data holders and consumers that is not
commensurate with the results. The paper seeks to set up models for judging the
usefulness of these laws, and to see what would have optimal effects on firm,
consumer, and especially social costs.
Three common policy approaches are mentioned, relating to the chronological
order of events surrounding the breach:
1. Ex ante regulations: These are preventative rules, put in place to try to
keep the breach from happening in the first place.
2. Information disclosure: The hope is that the threat of internalizing costs
will incentivize the data holders to be more careful. It is also hoped that the
disclosure will help victims to take precautions to prevent further losses.
3. Ex post liability: These are recovery mechanisms. One example might be a
victim suing the data holder for loss compensation.
One of the more interesting sidelines
of the paper is the concept of consumer under-reaction and over-reaction. Will
some consumers become hardened to breach notifications and start ignoring them?
Will some over-react and cause themselves more cost than is justifiable given
the potential loss? An example of an over-reaction might be “I’m going to switch
all of my accounts” when the chance of loss is less than the time cost involved
in switching. Another example might be people who decide to never use a type of
service again, even when it is normally to their economic benefit (shopping
online comes to mind).
In the end, the paper claimed to show two major effects of mandatory disclosure
laws:
1. It changes the care model from unilateral to bilateral. Both the firm and the
consumer can take action to mitigate losses.
2. The laws impose costs on firms in two distinct ways:
a. Direct costs from disclosure: fines, fees, lost business, etc. (what they
refer to as disclosure tax).
b. They can force the firm to internalize some portion of consumer loss
(consumer redress).
They indicate both will cause a firm
to increase its level of care, but only the disclosure tax is a deadweight loss.
However, if consumer redress is low, some disclosure tax may be necessary to
reduce social cost.
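A toy numerical illustration of that point, using a little model I made up
rather than the paper's: the firm picks the care level that minimizes its own
cost, redress transfers consumer loss onto the firm, and the disclosure tax
shows up as deadweight in the social cost:

# Toy illustration, not the paper's model: the firm picks a level of care
# to minimize its own cost; redress transfers consumer loss to the firm,
# while the disclosure tax is a deadweight loss that raises social cost.

CARE_LEVELS = [0, 1, 2, 3, 4]          # abstract units of care
CONSUMER_LOSS = 100.0                  # expected consumer harm per breach

def breach_prob(care):                 # more care, fewer breaches (made up)
    return 0.5 / (1 + care)

def care_cost(care):
    return 10.0 * care

def firm_cost(care, tax, redress_share):
    return care_cost(care) + breach_prob(care) * (tax + redress_share * CONSUMER_LOSS)

def social_cost(care, tax):
    # Redress is a transfer, so it drops out of social cost; the tax does not.
    return care_cost(care) + breach_prob(care) * (CONSUMER_LOSS + tax)

for tax, redress in [(0, 0.0), (0, 1.0), (60, 0.0)]:
    chosen = min(CARE_LEVELS, key=lambda c: firm_cost(c, tax, redress))
    print(f"tax={tax:3d} redress={redress:.0%} -> firm picks care {chosen}, "
          f"social cost {social_cost(chosen, tax):.1f}")

In this toy run both the tax and the redress push the firm to take more care,
but only the redress case lowers the social cost, which matches the intuition
above.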
The second paper, “Stopping Spyware
at the Gate”, tries to determine what influences user choices when they install
software. The subjects were given the scenario of a friend asking them to help
set up a new computer. The researchers gave five real world applications for the
test subjects to choose to install (Google Toolbar, Webshots, Weatherscope,
KaZaA and Edonkey). They split thirty-one test subjects into three groups.
1. Ten were in the control group, and only received the EULA before the install
commenced.
2. Ten were in the “Generic Microsoft SP2 Short Notice+EULA” group.
3. Finally, eleven were in the “Customized Short Notice + EULA”. The short
notices were generated in a standardized way, outlined at the top right corner
of page 5, and include items like (copied directly from the paper):
• The name of the company
• The purpose of the data processing
• The recipients or categories of recipients of the data
• Whether replies to questions are obligatory or voluntary, as well as the
possible consequences of failure to reply
• The possibility of transfer to third parties
• The right to access, to rectify and oppose
Strangely, table 1 on page 6 gives
the breakdown of groupings, and somehow lists the total as 30 even though
10+10+11=31. I assume they just made a typo, unless I’m missing something. They
also interviewed the
participants after the installations to gain insight into why they installed
what they chose. A few of the key observations from the authors include:
1. While users knew they were agreeing to a contract when they clicked through,
they felt they had a limited understanding of the EULA. They also expressed
distaste for reading long notices.
2. Many users expressed regret about their installation decisions after the
fact.
3. Short notices did improve understanding of the consequences of installation.
4. Short notices did not however have a statistically significant effect on
installation.
5. If the utility of an application was seen as high, users would install it
despite bundled software.
6. Brand plays a part in deciding what to install. For example, Google was seen
as a trusted name, but Weatherscope sounded too much like Weatherbug so people
distrusted it. Users also had a somewhat negative response to the name KaZaA,
with some choosing to install Edonkey instead (at least in the EULA-only group).
However, this effect on KaZaA/Edonkey was somewhat reversed in the next point.
7. Too little information in short notices may cause an unwarranted impression
of increased security. Since KaZaA had less information in its short notice than
Edonkey, some in the “Generic+EULA” group thought it was less scary (see table
4 and sections 5.2.8 and 5.2.9).
There are a few things that I think
may skew the results and could be improved upon in future studies, some of which
the authors also allude to as possible weaknesses (sections 4 and 5.5):
1. The subjects were asked to install the applications on a test system, not
their own computer. There may be some question as to the level of care they took
since it was not their computer they would be affecting.
2. The users knew they were being observed as part of a study, even if they did
not know the study was about spyware. This may influence their thought
processes.
3. They mention the weakness of the small sample size when it comes to
students, but what about installable applications? To get a better idea of the
effects of brand, future tests could include more installable apps from big
names like Google or Microsoft. Also, there are many other apps that could fill
the niches listed, especially when it comes to file sharing, so testing the
effects of many other EULAs and short notices based on them would be helpful to
obtain a broader sample.
The third paper, “The Role of Internet
Service Providers in Botnet Mitigation”, attempts to determine whether ISPs can
be an effective control point for botnet mitigation. Of special concern is the
practicality of rules imposed upon ISPs, and if they would be effective given
the number of ISPs out there, where they are located, and the amount of botnet
traffic each one is associated with. To gain empirical data on the number of
bots in the wild, and what ISPs the infected hosts were on, two approaches were
considered. One would be to gain access to the command and control channel of a
botnet and see which hosts were communicating. However, this approach only
gives a snapshot of the botnets they have infiltrated. Another approach would be
set up hosts to act as honeypots, in this case spam traps, to look for
suspicious contact. The spam trap approach also has issues, such as potential
false positives, but it may give a more representative sample from a wider range
of botnets than the first approach. They went with the spam trap approach, and
used IPs to tie hosts to ASNs and countries. They then used the ASNs to try to
figure out which ISP each bot belonged to, with some difficulty owing to the
lack of a database of mappings (a rough sketch of this sort of tallying follows
the findings list below). More complete details on the problems of attribution
can be found on page 6 of the paper. To summarize just some of their findings:
1. While the total number of ISPs is high (somewhere between 4000 and 100000
actors they say), many are bit players. Just 50 ISPs account for over half of
the infected sources. This seems to indicate that even if not all ISPs could be
brought under collective action (government interventions or public-private
sector cooperation for example), getting the larger ones on board could still
have a significant effect.
2. As competition is mostly driven by price, even if consumers care about
security there are “no adequate market signals” that can reliably guide them in
choosing better ISPs from a security standpoint.
3. As might be expected, size was a major factor in the number of infections at
a given ISP. A larger user base means a larger number of potential bot hosts.
However, per user, larger ISPs seemed to do better security-wise than smaller
ISPs (fewer infections per capita). It was conjectured that this may be because
of better automation. Even then, amongst ISPs of similar size there could be an
order of magnitude difference in the number of detected infections.
4. Higher rates of illegally copied software were associated with higher botnet
activity.
5. If level of education is used as a proxy for technical competency, it seems
technical competency has a negative effect on bot numbers (fewer total
infections).
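As promised above, here is a sketch of the aggregation and concentration step
with synthetic numbers (not the paper's data): roll infected-source counts up
from ASN to ISP, then report the share covered by the largest 50 ISPs.

# Sketch of the aggregation step with synthetic numbers (not the paper's
# data): roll infected-source counts up from ASN to ISP, then see what
# share of all infections the largest 50 ISPs account for.
from collections import Counter
import random

random.seed(1)

# Hypothetical ASN -> ISP mapping and per-ASN infected-host counts.
asn_to_isp = {asn: f"isp-{asn % 300}" for asn in range(1000, 2000)}
infections_per_asn = {asn: int(random.paretovariate(1.2)) for asn in asn_to_isp}

per_isp = Counter()
for asn, count in infections_per_asn.items():
    per_isp[asn_to_isp[asn]] += count

total = sum(per_isp.values())
top50 = sum(count for _, count in per_isp.most_common(50))
print(f"{len(per_isp)} ISPs, top 50 account for {top50 / total:.0%} of infected sources")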
In the end, the concentrated nature
of responsible ISPs seems to indicate that they could act as a critical control
point to mitigate botnets, even under current market conditions.