Using Crowdsourcing For Collecting Information About Security Vulnerabilities

The term "crowdsourcing" was coined by Jeff Howe in 2006. Crowdsourcing is the act of outsourcing a job, previously done by employees, to a large group of people in the form of an open call. Nowadays, crowdsourcing is used by commercial, nonprofit, and public organizations. A widely known example of a nonprofit organization using crowdsourcing is the Wikimedia Foundation, Inc., an American nonprofit charitable organization that maintains the online encyclopedia Wikipedia. Wikipedia allows any internet user to edit its articles, and its users are not paid for their contributions. Nevertheless, by July 2012, the English version of Wikipedia contained 3,988,490 articles. Several studies indicate that the quality of Wikipedia articles is similar to the quality of paid encyclopedias (Clauson, Polen, Kamel Boulos, Dzenowagis, 2008; Leithner, Maurer-Ertl, Glehr, Friesenbichler, Leithner, Windhager, 2010; Wood and Struthers, 2010).
The quality of the articles published on Wikipedia shows that platforms using crowdsourcing may constitute a reliable source of information. Crowdsourcing may therefore be used not only for editing and publishing articles, but also for collecting information that is beneficial to commercial, nonprofit, and public organizations. The present contribution explores the possibilities of using crowdsourcing to collect information about security vulnerabilities, such as software bugs. In particular, the article discusses online competitions in which participants try to find security vulnerabilities in software applications (Section 2), collecting information about security vulnerabilities from consumers (Section 3), and collecting information about security vulnerabilities from the Web (Section 4). Finally, a conclusion is drawn (Section 5).
Online competitions in which participants try to find security vulnerabilities

In the context of finding security vulnerabilities in software applications, the proverb “many eyes are better than one pair” makes sense. By providing software producers with diverse feedback, online competitions allow them to find and remove a wide range of security vulnerabilities from their systems. For example, the website TopCoder (www.topcoder.com), which is based on the crowdsourcing principle, regularly organizes testing competitions. During these competitions, anyone can submit information about security vulnerabilities found in software provided by a company or by TopCoder. In competitions in which a company will use the winning submissions, the winners are required to transfer the intellectual property rights in those submissions to TopCoder. In exchange, participants may receive a financial award.
Other examples of websites organizing online competitions include Hex-Rays (https://www.hex-rays.com/contests/) and Volatility Labs (http://volatility-labs.blogspot.be/2013/01/the-1st-annual-volatility-framework.html).

Collecting information about security vulnerabilities from consumers

Ranker

The website Ranker (www.ranker.com) allows visitors to post and answer questions. A visitor may add an answer to a list and/or rank the answers provided by other visitors. A company wishing to collect information about security vulnerabilities from consumers may use Ranker to obtain a detailed list of consumer responses ordered by rank. Currently, Ranker has over 4 million monthly unique visitors.

Ushahidi

The Ushahidi platform is a free and open source platform that allows anyone to collect distributed data via SMS, email, voicemail, Facebook, and Twitter, and to visualize it on a map or timeline. Ushahidi Inc. has also developed smartphone apps for the platform. The apps for iPhone, Android, and Java-enabled phones can be freely downloaded from the Internet.
The initial purpose of the Ushahidi platform was to create a simple way of aggregating information from the public for use in crisis response. An Ushahidi-based platform was first used in the aftermath of Kenya’s disputed 2007 presidential election. The election was manipulated by both candidates’ parties, which led to widespread hostility and ethnic violence. The platform collected eyewitness reports of violence that were sent in via email and text messages and then plotted on Google Maps. Ushahidi-based platforms have since been deployed in over forty countries for a wide range of uses, including civil resistance, disaster response, election observation, environmental-impact reporting, and human rights monitoring.
In the context of collecting information about information security incidents, Ushahidi can be used, for example, in cases when a computer virus affects hundreds of computer systems.
Through the Ushahidi platform, the operators of the affected computers may provide information about the security vulnerabilities that the virus exploited. Because an Ushahidi-based platform can collect information through SMS, it is an appropriate tool for collecting information even when the affected computer systems are not working.

Collecting information about security vulnerabilities from the Web

Information about security vulnerabilities can be collected from the Web. In this regard, it is worth mentioning that the success of Google’s PageRank algorithm (the core of the Google search engine) stems from the use of crowdsourcing. Instead of relying on appointed employees, Google lets the crowd determine the ranking of the search results it displays. To be more specific, the PageRank algorithm assigns a value to each web page based on the number and importance of the pages that link to it.
Most processes of automatic or semi-automatic collection and analysis of data from the Web fall within the scope of data mining. Data mining can be defined as “the process of discovering patterns, automatically or semi-automatically, in large quantities of data - and the patterns must be useful” (Witten, Frank, Hall, 2011). Data mining on the Web falls into three categories, namely, (1) content mining, (2) structure mining, and (3) usage mining. Content mining refers to collecting and analyzing the content data available in web documents. Structure mining refers to collecting and analyzing link information, for instance the way in which different web documents are linked together. Finally, usage mining refers to analyzing the transaction data that is logged when users interact with the Web.
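The idea of usage mining can be illustrated with a minimal sketch. It assumes that the transaction data is a web server access log in the Common Log Format and that requests containing a few well-known attack fragments hint at attempts to exploit a vulnerability. The function name mine_usage_log, the pattern list, and the sample log lines are hypothetical and serve only as an illustration; a real deployment would need a far more careful set of indicators.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed, illustrative indicators of probing; not an exhaustive list.
SUSPICIOUS_PATTERNS = {
    "path_traversal": re.compile(r"\.\./"),
    "sql_injection": re.compile(
        r"union(%20|\+)select|(%20|\+)or(%20|\+)1=1", re.IGNORECASE
    ),
    "script_injection": re.compile(r"<script|%3Cscript", re.IGNORECASE),
}


def mine_usage_log(lines):
    """Count suspicious pattern matches per category and per client host."""
    by_category = Counter()
    by_host = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        path = match.group("path")
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(path):
                by_category[name] += 1
                by_host[match.group("host")] += 1
    return by_category, by_host


# Hypothetical sample data using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2012:13:56:01 +0000] "GET /files/../../etc/passwd HTTP/1.1" 404 512',
    '203.0.113.9 - - [10/Oct/2012:13:56:07 +0000] "GET /search?q=%3Cscript%3Ealert(1) HTTP/1.1" 200 1024',
    '198.51.100.7 - - [10/Oct/2012:13:57:12 +0000] "GET /item?id=1%20or%201=1 HTTP/1.1" 500 87',
]

if __name__ == "__main__":
    categories, hosts = mine_usage_log(SAMPLE_LOG)
    print(dict(categories))
    print(hosts.most_common(3))
```

Grouping the matches both by category and by client host is a design choice made for this sketch: the first view suggests which kind of weakness is being probed, and the second shows whether the probing comes from a few sources or from many.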
Because usage mining involves mining web server logs, it is sometimes referred to as “log mining.”
Through various data mining applications, companies may, for example, monitor publicly available sources containing user-generated content. Such sources include, but are not limited to, online forums, comments posted under newspaper articles, and social networks. The data collected from these sources may contain information about security vulnerabilities. Hootsuite (http://www.hootsuite.com/), Trackur (http://www.trackur.com/), and Crimson Hexagon (http://www.crimsonhexagon.com/) are three well-known social media monitoring tools.

Conclusion

Crowdsourcing is an inexpensive and efficient option for collecting information about security vulnerabilities. This article has shown that companies may use crowdsourcing to collect such information in at least three ways, namely, through online competitions, from consumers, and from the Web. By organizing online competitions in which the participants try to find security vulnerabilities, a company may receive different solutions to a problem. This may bring new perspectives and shed new light on old problems. By collecting information directly from consumers, a company may receive, at little or no cost, information about security vulnerabilities from a large number of people. By collecting information from the Web, companies may analyze huge amounts of data and find the information about security vulnerabilities that is of interest to them.

References

Clauson, K., Polen, H., Kamel Boulos, M., Dzenowagis, J., ‘Scope, completeness, and accuracy of drug information in Wikipedia’, Ann Pharmacother 42 (12), 2008.

Crowe, A., ‘Disasters 2.0: The Application of Social Media Systems for Modern Emergency Management’, Taylor & Francis Group, LLC, 2012.

Deegan, M., McCarthy, W., ‘Collaborative Research in the Digital Humanities’, Ashgate Publishing, 2012.

Felstiner, A., ‘Working the Crowd: Employment and Labor Law in the Crowdsourcing Industry’, Berkeley Journal of Employment and Labor Law 32 (1), 2011.

Foth, M., ‘From Social Butterfly to Engaged Citizen: Urban Informatics, Social Media, Ubiquitous Computing, and Mobile Technology to Support Citizen Engagement’, MIT Press, 2011.

Han, J., Kamber, M., Pei, J., ‘Data Mining: Concepts and Techniques’, Elsevier, 2011.

Horton, J., Chilton, L., ‘The Labor Economics of Paid Crowdsourcing’, Proceedings of the 11th ACM Conference on Electronic Commerce, 2010. Available at SSRN: http://ssrn.com/abstract=1596874.

Howe, J., ‘Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business’, Crown Business, 15 September 2009.

Huang, Y., Singh, P., Srinivasan, K., ‘Crowdsourcing New Product Ideas Under Consumer Learning’, December 2011. Available at SSRN: http://ssrn.com/abstract=1974211 or http://dx.doi.org/10.2139/ssrn.1974211.

Leithner, A., Maurer-Ertl, W., Glehr, M., Friesenbichler, J., Leithner, K., Windhager, R., ‘Wikipedia and osteosarcoma: a trustworthy patients’ information?’, Journal of the American Medical Informatics Association: JAMIA 17 (4), 2010, pp. 373-374.

Okolloh, O., ‘Ushahidi, or “testimony”: Web 2.0 tools for crowdsourcing crisis information’, Participatory Learning and Action 59, 2009, pp. 65-70.

Surowiecki, J., ‘The Wisdom of Crowds’, Anchor Books, 2005.

Van Wel, L., Royakkers, L., ‘Ethical issues in web data mining’, Ethics and Information Technology 6, 2004, pp. 129-140. Available at http://alexandria.tue.nl/repository/freearticles/612259.

Whitla, P., ‘Crowdsourcing and Its Application in Marketing Activities’, Contemporary Management Research 5 (1), 2009, p. 26. Available at http://www.cmr-journal.org/article/viewFile/1145/2641.

Witten, I., Frank, E., Hall, M., ‘Data Mining: Practical Machine Learning Tools and Techniques’, Elsevier, 2011.

Wood, A., Struthers, K., ‘Pathology education, Wikipedia and the Net generation’, Medical Teacher 32 (7), 2010, p. 618.