Data Science and Visualization Institute for Librarians Schedule

April 24 - 28, 2017

James B. Hunt, Jr. Library

NC State University

Monday, April 24th

8:00 am                      Transportation from hotel to Duke Energy Hall, Hunt Library

8:30 am - 9:30 am      Breakfast, Welcome and Learning Norms

9:30 am - 12:00 pm    Data Description, Sharing, and Reuse (Thu-Mai Christian, UNC)

This workshop will introduce descriptive, administrative, and technical metadata that facilitates discovery and reuse of existing data. Explore unique identifiers and best practices for version control. Discuss data sensitivity and tools for managing privacy and access controls, including data sharing agreements between researchers, de-identification, and encryption. Demonstrate methods of sharing large files among researchers and making them available, such as file transfer and posting in general and subject-matter repositories.

12:00 pm - 1:00 pm     Lunch

1:00 pm - 2:00 pm       Version Control with Git and GitHub (Bret Davidson and Heidi Tebbe, NCSU Libraries)

General overview of the definition, history, and purpose of version control systems, with examples of use in software development and data management. Walkthrough of the key features of the GitHub web interface and desktop client. Participants will do hands-on exercises using a rubric to evaluate open source software on GitHub.

2:00 pm - 2:30 pm       Break

2:30 pm - 4:30 pm       Bibliometric Network Analysis Using Sci2 and Gephi (Kris Alpi and Danica Lewis, NCSU Libraries)

Demonstration and discussion of co-authorship analysis using Web of Science data, Sci2 and Gephi with hands-on activities.

4:45 pm                        Transportation back to hotel

5:30 pm - 7:30 pm       DSVIL Welcome Reception, Sheraton Hotel


Tuesday, April 25th

8:00 am                     Transportation from hotel to Hunt Library

8:30 am - 9:00 am    Breakfast and Small Group Reflection

9:00 am - 12:00 pm   Fundamentals of Data (Emily Griffith, NCSU)

This session will focus on the first steps of data exploration and analysis. Topics covered will include the nature and types of measurement, descriptive statistics, and guidance on how to choose an appropriate analysis method based on the research question and data types. Demonstrations and hands-on exercises will use popular statistical analysis packages such as SAS and Stata. We’ll discuss the role of statistical consultants and some typical data analysis challenges.

12:00 pm - 1:00 pm   Lunch

1:00 pm - 4:30 pm    Data Cleaning and Preparation (Jerry Waller, Elon University)

Working with messy data requires skills that cross disciplines. Fortunately, there are tools and strategies that make cleaning and working with this data easier. This workshop's goals are to provide techniques for identifying patterns and to explore tools for cleaning and exporting well-formatted datasets. OpenRefine, a free and open-source utility that anyone can download, will be the primary tool used for exercises, but other utilities and strategies will be investigated. Participants will leave with a virtual Swiss Army knife of tips, tricks, and apps, some of which can be--literally--carried in their pockets.

4:45 pm                     Transportation back to hotel


Wednesday, April 26th

8:00 am                     Transportation from hotel to Hunt Library

8:30 am - 9:00 am    Breakfast and Small Group Reflection

9:00 am - 11:00 am  Web Scraping: Gathering Data from Websites, Parsing HTML & JSON, Orchestrating APIs, and Gathering Twitter Streams (John Little, Duke University)

Preexisting and clean data sets such as the General Social Survey (GSS) or Census data are readily available, cover long periods of time, and have well-documented codebooks. Meanwhile, researchers increasingly want to gather their own data from websites, which adds a layer of complexity; accessing content from these sources requires different tools and new techniques. In this workshop we will use an open-source data wrangling tool (OpenRefine) to gather and clean data from webpages and "crawl" whole websites, discuss and use Application Programming Interfaces (APIs), and give examples of how APIs are used with social media sources such as Twitter.

11:00 am - 12:00 pm  Data Use Agreements and Open Data (Kris Alpi, Will Cross, Lillian Rigling, NCSU Libraries)

Discussion of how data is addressed in publisher copyright agreements and instructions to authors, with a focus on how data use terms have evolved along with funder mandates for data access.

12:00 pm - 1:00 pm    Lunch

1:00 pm - 3:00 pm     Preparing Data for Qualitative Analysis (Denae Ford, NC State University)

Handling data can be tricky business, especially when you’re dealing with personally identifiable information from participants. In this session we’ll discuss the ethics of cleaning data and how to prepare data for qualitative analysis, and give a brief overview of tools that facilitate the analysis, such as R.

3:00 pm - 4:30 pm     Mapping and Geospatial Visualization with QGIS (Jeff Essic, NCSU Libraries)

QGIS is free and open-source software for analyzing and visualizing your geospatial data, also known as GPS, georeferenced, geocoded, or GIS (Geographic Information System) data. QGIS can be installed on multiple operating systems and has greater analytical capabilities than online mapping sites. We'll learn some GIS fundamentals, how QGIS compares with other GIS software, and how to get started using it.

4:45 pm                      Transportation back to hotel


Thursday, April 27th

8:00 am                      Transportation from hotel to Hunt Library

8:30 am - 9:00 am      Breakfast and Small Group Reflection

9:00 am - 12:00 pm    Data Visualization, Part 1 (Angela Zoss, Duke University)

In this full-day course, participants will be introduced to a wide range of visualization types that can be created with Excel and easy-to-use, free software like Tableau Public, CartoDB, and Raw. We will focus on skills and techniques that apply well to both personal visualization projects and patron requests for help creating visualizations. Almost entirely hands-on, this workshop will cover the process of making data visual from start to finish--knowing what visualizations work well with particular types of data, knowing what tools can produce different types of visualizations, and knowing how to adjust the design or layout of a figure to create professional output.

12:00 pm - 1:00 pm     Lunch

1:00 pm - 4:00 pm      Data Visualization, Part 2 (Angela Zoss, Duke University)

4:15 pm                       Transportation back to hotel


Friday, April 28th

8:00 am                       Transportation from hotel to Hunt Library

8:30 am - 9:00 am       Breakfast and Small Group Reflection

9:00 am - 10:30 am     Tracks

Track 1: Breakout Sessions

Track 2: Intermediate Tableau (Alison Blaine, NCSU Libraries)

In this workshop we will cover how to use groups, calculated fields, filtering, and combination charts to analyze and present data.

10:30 am - 10:45 am    Break

10:45 am - 12:00 pm    Lightning Talks

12:00 pm - 1:00 pm     Lunch & Wrap Up

1:15 pm                        Transport to RDU airport or hotel