Harvesting social media raises legal and ethical issues that must undergo careful consideration and risk assessment before the creation of a collecting program. The legal challenges faced involve social media user rights, whereas the ethical challenges concentrate on a larger question: just because we can archive social media, does that mean we should? Though copyright and ethical concerns contain expected ambiguities on the topic of social media collecting, fair use under copyright law likely provides cultural heritage institutions with a safety net amongst the uncertainty. Having a foundation of understanding for the scope of these issues should help guide the creation of policies and practices surrounding social media harvesting and preservation.
Legal queries raised center around issues of copyright and privacy. Current laws fail to directly address social media, and discussions and conflicts regarding their legal standing are ongoing in the courts. As legal battles over social media continue, laws and legal implications are subject to change. The authors referenced offer a glimpse into the assorted ways social media does and does not fit within the United States’ current legal framework.
Intellectual property rights pervade the discussion on legal dilemmas that researchers, archivists, librarians, and others confront when harvesting social media data. Digital platforms have become increasingly complex, and social media use has escalated, creating new avenues of research data. Organizations must keep pace with research demand “in a rapidly changing environment characterized by new distribution mechanisms, expanding copyright monopolies, ever-greater technology dependencies, and changing user expectations.” Legal precedent for social media is still being created in the courts; current intellectual property regulations are not easily applied, since much of copyright law remains largely unchanged and outdated in an increasingly digital age. Intellectual property, interpreted as “a means of protection for owners of creative works,” includes copyright, as defined in Article 1, Section 8 of the United States Constitution as “[that which serves…to] promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” These exclusive rights for original works include the rights to reproduce, distribute, display, or prepare derivative works.
Current court cases related to intellectual property in social media predominantly deal with tweets and photography. For example, Morel vs. Agence France-Presse, a case brought in the U.S. District Court for the Southern District of New York, decided that photographs posted on Twitter can be copyrighted. The case involved photographs of the disastrous 2010 earthquake in Haiti that Daniel Morel posted on Twitter. The defendants, Agence France-Presse and its American distributor, Getty Images, argued that they could use the photographs under Twitter’s Terms of Service and tried to blame the plaintiff, Daniel Morel, for exposing his photographs to theft. The two companies sold and distributed the images to clients, many of which were news sources. The ruling, which sided with Morel, defended intellectual property rights for social media content.
Unlike photographs, social media posts, such as those found on Twitter, may, in some cases, be insufficiently creative to qualify for copyright protection. A 2008 report sponsored by the United States Copyright Office and the National Digital Information Infrastructure and Preservation Program of the Library of Congress stated that, “Copyright is limited in time and scope, is subject to a number of exceptions and limitations, and contains ‘built-in First Amendment accommodations.’ Only creative expression is protectable; ideas, facts, systems, processes, and procedures are not.” For this reason, posts that are or include original photographs, rather than brief statements, are more easily defended as intellectual property. Photographs may prove more copyrightable, depending on the level of creativity and inventiveness, as compared to textual social media postings that may be considered circumstantial expressions rooted in conveying information. If posts meet these conditions, current copyright law grants creators federal copyright automatically without having to publish or register the work with the copyright office. Can one assume that social media posts that are original and creative, and that do not simply convey facts or ideas, are copyrighted?
Creative works clearly exist on social media platforms. An interesting example is a short story written solely on Twitter by acclaimed author David Mitchell. No one has claimed that this innovative use of social media lacks copyright protection. However, somewhere between the clear cases of short stories and short updates lie many posts that may not fall into either category. Twitter approaches this uncertainty by allowing users to file takedown requests when they see others posting their content without attribution. Usually the takedown requests are for more obviously copyrighted content, like photographs or videos. Recently, however, Twitter has honored takedown requests against people copying other users’ jokes. As courts continue to make decisions regarding copyright on social media, the ambiguity should decrease over time.
Despite ambiguity about what content is copyrightable on social media, fair use remains a significant defense against claims of copyright infringement. Like other copyrighted works in archival collections, copyrighted social media contents may be used without permission in cases permitted by fair use. U.S. copyright law allows for the fair use of copyrighted material, which occurs when a work is reproduced “for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, [and] is not an infringement of copyright.” When making fair use decisions, courts weigh the effects of the four factors of fair use. These are: “1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes, 2) the nature of the work, 3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole, 4) the effect of the use upon the potential market for or value of the copyrighted work.”
While the fact-specific nature of fair use may seem intimidating, recent judicial decisions related to the Google Books project and HathiTrust suggest that non-profit library archiving is not only compatible with fair use, but may be considered “quintessentially transformative” when used to support library practice. This Toolkit cannot answer all fair use questions; no one can until a court rules on a specific fact pattern. However, archivists often take calculated fair use risks when using copyrighted material in support of their professional mission. As Peter B. Hirtle suggests, educating oneself on the legal environment for digital collecting goes a long way toward working comfortably within a shifting legal environment. Librarians seeking to collect social media materials may find useful guidance in a variety of sources, including Principle Eight of the Code of Best Practices in Fair Use for Academic and Research Libraries. This principle, which describes practices for “collecting material posted on the world wide web and making it available,” can provide a framework for individual fair use analysis.
Another issue faced when collecting social media is that of privacy. Privacy concerns blur the boundaries between legal and ethical considerations. Advances in computer technology make it easier to accumulate data and connect various data points about an individual, which is more difficult, though not impossible, to do with written records. Individual privacy as a legally defined right was outlined by Samuel Warren and Louis Brandeis in 1890. They advocated for privacy protection because they thought an invasion of privacy “subjected [an individual] to mental pain and distress, far greater than could be inflicted by mere bodily agony.” They defined privacy as the right “to be let alone.” In 1960, William Prosser added the four classifications of interests protected under privacy law. These include 1) the intrusion upon a person’s seclusion or private affairs, 2) public disclosure of embarrassing private facts, 3) publicity which places a person in a false light, and 4) misappropriation or the use of a person’s name and image for commercial advantage.
Though not directly related to social media, one relevant case concerned the public disclosure of private facts on the internet via a digitization program at Cornell University. The case, Vanginderen vs. Cornell University, arose when Kevin Vanginderen claimed that Cornell University’s historic student newspaper digitization project had disclosed private information by publishing a 1983 newspaper online. An issue of The Cornell Chronicle named him as the suspect in a third degree burglary case. He eventually was found innocent of the crime. Vanginderen claimed that the digitization of the newspaper harmed his reputation because it was libelous and constituted the publication of private facts. The court dismissed the case twice. In a press release, a librarian at Cornell said that “this is a real victory for the library in terms of being able to make documentary material accessible… I do share concerns that individuals might have potentially embarrassing material made accessible via the Internet but I don’t think you can go back and distort the public record.” This case suggests that similar privacy and defamation concerns would not apply to publicly available social media posts because information already accessible to the public cannot be claimed as private.
The privacy implications that exist when harvesting social media arise because these platforms are dynamic, i.e., posts can be updated or deleted, and content is rarely created with use by researchers in mind. Despite the legal considerations, harnessing social media content in research “is the frontier of social science--experiments on people who may never even know they are subjects of study, let alone explicitly consent.” This implies that privacy concerns within social media collecting are better addressed within the realm of ethics.
The enigmatic ethical questions posed by social media collecting often lack clear answers, and sometimes only lead to more questions. Some of them even appear to be transplants from traditional archiving and researching. Such queries include: What social media content is ethical to preserve? What, if any, information is it ethical to extract from this content? What procedures for harvesting and preserving are ethical? How does one balance her personal and professional ethical responsibilities?
To assist archivists in approaching these questions, regardless of media or format, the Society of American Archivists (SAA) has developed the “Core Values of Archivists” and “Code of Ethics for Archivists.” The core values consist of access, use, accountability, advocacy, diversity, history, memory, preservation, professionalism, custody, selection, service, and social responsibility. These values help outline archival professional responsibilities and can act as a code of conduct. The code of ethics, which consists of professional relationships, judgment, authenticity, security, protection, access, use, privacy, and trust, acts as the “principles of the profession.” In archives, or other cultural heritage institutions, ethical questions sometimes require balancing a donor’s right to privacy and a researcher’s right to access. This difficult responsibility is often at the forefront of ethical issues in archives. For example, processing collections can reveal sensitive information about donors and their family members that remaining heirs might not want publicized or known; at the same time, restrictions hinder access to research data. Archivists have to navigate the right to access and donor privacy while working within the ethical code outlined by the SAA.
In addition to the SAA code of ethics and core values, SAA members have written case studies about ethical dilemmas they have faced and the aspects of the Code of Ethics to which they relate. Currently, the four case studies provided on the SAA website demonstrate that archivists need to document all of their selection actions and decisions when curating collections. Archivists should also consult with other relevant sources, be they peers or communities of interest, to ensure that all relevant perspectives inform their selection behaviors. Additionally, archives need to be cognizant of culturally sensitive material within their collections and know how to ethically manage it.
Another resource that describes the various ethical situations faced by archives is Karen Benedict’s Ethics and the Archival Profession, which provides forty different case studies. They address issues regarding donor relations, copyright, professional development, professional conduct, deeds of gift, and responsibility to the employing institution.
Outside of the archival profession, several groups have attempted to address ethical concerns with social media collecting. As of August 2014, conferences on digital research ethics were being planned at Stanford and MIT, and academic journals started working on issues devoted to ethics.
A recent controversy on the topic of privacy involved researchers manipulating Facebook users’ news feeds in order to study which differences correlated with an increase in voting behaviors. This example highlights the need to establish and maintain research ethics when using new digital platforms like social media sites. When this study was published, the public was outraged over the manipulation of user content and Facebook’s disregard for user privacy. A New York Times article provided a list of questions that researchers should consider when creating a study, especially one based in the online realm. These questions include: “What are the benefits of this test? Who will be affected? How would you feel if you were a subject of the experiment? Is there a better way to achieve the same result?”
While privacy leads the discussion of ethics for social media harvesting and preservation, it rarely stands alone. Frequently associated with privacy are consent, treatment of users and their creations, security, access, responsibility and control of content, transparency, and use, as demonstrated by the Facebook research controversy mentioned above. Currently, there is no standard for navigating this web of ethical quandaries; however, that could change very soon. Bryan Lewis and Caitlin Rivers await peer review of their co-authored paper “Ethical Research Standards in a World of Big Data,” which presents an ethical model for researchers using Transparency, Anonymity, Control, Tracking, Institutional Review Board, Context, referred to as “TACTICs.” Following TACTICs, researchers would make transparent studies that are publicly available, respect tweets’ contexts, secure data which might be capable of revealing tweeters’ identities, refrain from using Twitter data to garner more information about tweeters from elsewhere, be obligated to receive Institutional Review Board clearance for studies necessitating the collection of data from individuals, and honor users’ privacy settings. Though the TACTICs model was not created with collecting institutions in mind, cultural heritage organizations could advocate for its wider adoption and may expect their researchers to conform to the model.
As more archival repositories collect social media, and more researchers use social media data in their research, the legal and ethical landscape will change. For example, a recently published volume, Rights in the Digital Era, contains four articles discussing at length the legal, ethical, and practical concerns of collecting and providing access to digital content. The book is an excellent resource to consult for an in-depth analysis of the concerns discussed above and also contains some professional best practices on digital content in archives, though it does not specifically address social media.
Archivists and researchers alike face the challenge of applying already vague ethical and legal practices to the less familiar landscape of social media. Discussing the main ethical and legal issues here should make clear that there is no one-size-fits-all solution to the problems that present themselves when collecting social media. Yet archivists and researchers face sizable ethical and legal considerations when making collecting, research, and publishing decisions. The same thought put into decisions made for traditional content should also be applied when harvesting and preserving social media content. For this reason, familiarizing oneself with the ethical and legal landscape is an important first step in developing social media collection programs.
 Daniel Greenstein et al., Access in the Future Tense (Washington D.C.: Council on Library and Information Resources, April 2004), vi.
 Maria A. Pallante. “The Register’s Call for Updates to U.S. Copyright Law.” United States House of Representatives: 113th Congress, 1st Session, May 2013. http://www.copyright.gov/regstat/2013/regstat03202013.html#_ftn1
 Tiffany Miao, “Access Denied: How Social Media Accounts Fall Outside the Scope of Intellectual Property Law and into the Realm of the Computer Fraud and Abuse Act,” Fordham Intellectual Property, Media & Entertainment Law Journal, 2013, http://tinyurl.com/ounubc4.
 U.S. Constitution, Art. 1, § 8.
 17 U.S. Code §106
 James Estrin. “Haitian Photographer Wins Major U.S. Copyright Victory.” The New York Times, November 23, 2013. http://nyti.ms/1qCZG5
 Rebecca Haas, Twitter: New Challenges to Copyright Law in the Internet Age, 10 J. Marshall Rev. Intell. Prop. L. 230 (2010); Brock Shinen, “Twitterlogical, The Misunderstandings of Ownership,” 2009 http://canyoucopyrightatweet.com.
 Laura N. Gasaway, et al., The Section 108 Study Group Report (Washington D.C.: Library of Congress, March 2008), 11, http://www.section108.gov/docs/Sec108StudyGroupReport.pdf.
 James Estrin. “Haitian Photographer Wins Major U.S. Copyright Victory.”
 Brock Shinen, “Twitterlogical, The Misunderstandings of Ownership,” 2009. http://canyoucopyrightatweet.com.
 17 U.S. Code §301.
 Ian Crouch. “The Great American Twitter Novel.” The New Yorker, July 2014, accessed June 11, 2015. http://www.newyorker.com/books/page-turner/great-american-twitter-novel
 Dante D’Orazio, “Twitter is deleting stolen jokes on copyright grounds,” The Verge, July 25, 2015, accessed July 30, 2015.
 17 U.S. Code §107.
 17 U.S. Code §107. Note that the courts have specific interpretations for each of these four factors.
 Authors Guild Inc., et al. v. Google, Inc., No. 12-3200 (2d Cir. 2013).
 Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014).
 Peter B. Hirtle, “Introduction,” in Rights in the Digital Era, ed. Menzi L. Behrnd-Klodt & Christopher J. Prom. (Chicago: Society of American Archivists, 2015), 4-5.
 Heather MacNeil, Without Consent: The Ethics of Disclosing Personal Information in Public Archives (Chicago: Society of American Archivists, 1992), 9-11.
 MacNeil, Without Consent, 1.
 Samuel D. Warren and Louis D. Brandeis, “The Right to Privacy,” Harvard Law Review 4, no. 5 (1890), http://faculty.uml.edu/sgallagher/Brandeisprivacy.htm
 William L. Prosser. “Privacy,” California Law Review 48, no. 3 (1960): 383.
 Digital Media Law Project, “Vanginderen vs. Cornell University,” Berkman Center for Internet and Society. August 29, 2008. http://www.dmlp.org/threats/vanginderen-v-cornell#description
Michael Stratford, “Judge Dismisses Libel Suit Against Cornell,” The Cornell Daily Sun. January 29, 2009. http://cornellsun.com/blog/2009/01/23/judge-dismisses-libel-suit-against-cornell/
 Michael Stratford, “Judge Dismisses Libel Suit Against Cornell.”
 Vindu Goel, “As Data Overflows Online, Researchers Grapple with Ethics,” New York Times, August 12, 2014, accessed March 27, 2015, http://www.nytimes.com/2014/08/13/technology/the-boon-of-online-data-puts-social-science-in-a-quandary.html
 Society of American Archivists, “SAA Core Values Statement and Code of Ethics,” last modified 2012, accessed October 3, 2014, http://www2.archivists.org/statements/saa-core-values-statement-and-code-of-ethics.
 Society of American Archivists, “SAA Core Values Statement.”
 Timothy Ericson, “Privacy: Case Twenty-Nine,” in Ethics and the Archival Profession: Introduction and Case Studies, ed. Karen Benedict (Chicago: Society of American Archivists, 2003), 62-63; Mark Greene, “ Privacy: Case Thirty,” Ethics and the Archival Profession: Introduction and Case Studies, ed. Karen Benedict (Chicago: Society of American Archivists, 2003), 64-65.
 Society of American Archivists, “SAA Core Values Statement.”
 Nancy Freeman and Robert B Riter, “An Online Exhibit: A Tale of Triumph and Tribulation,” Society of American Archivists: Case Studies in Archival Ethics (May 2014), http://www2.archivists.org/sites/all/files/AnOnlineExhibit-SAA-CaseStudy_0.pdf.
 Ellen Ryan, “Identifying Culturally Sensitive American Indian Material in a Non-tribal Institution,” Society of American Archivists: Case Studies in Archival Ethics (September 2014), http://www2.archivists.org/sites/all/files/AmericanIndianMaterial_CEPC-CaseStudy3.pdf.
 Benedict, Ethics and the Archival Profession.
 Goel, “As Data Overflows.”
 Goel, “As Data Overflows.”
 Vindu Goel, “Facebook Promises Deeper Review of Research, but is Short on the Particulars” New York Times, October 2, 2014, accessed March 27, 2015, http://www.nytimes.com/2014/10/03/technology/facebook-promises-a-deeper-review-of-its-user-research.html.
 Bryan Lewis and Caitlin Rivers, “Ethical Research Standards in a World of Big Data,” [v2; ref status: approved with reservations 2, http://f1000r.es/3vc] F1000Research (August 21, 2014), accessed March 27, 2015, doi: 10.12688/f1000research.3-38.v2.
 Menzi L. Behrnd-Klodt and Christopher J. Prom, eds., Rights in the Digital Era (Chicago: Society of American Archivists, 2015).