Since I started posting network tutorials on this site, people have occasionally written to ask me about the included example datasets. I also get e-mails from people asking where they might find network data to use for a project or in teaching. It seems like a good idea to post a quick reply here.
The datasets included in my tutorials are mostly synthetic (or trimmed and heavily manipulated) so that they illustrate various visualization aspects in a manageable way. Feel free to use those datasets (citing or linking to the source is appreciated), but keep in mind that they are artificially generated and not the result of actual data collection. When I do use empirical data, the download files include documentation (if the data was collected by me) or clearly point to the source (if the data was collected by someone else).
If you are looking for network data, large or small, there are a number of excellent open online repositories that you can take a look at. Below is a short list (feel free to e-mail me if you have other good links, and I will add them here).
- Stanford Large Network Dataset Collection
- UCI Network Data Repository
- Network Data Repository
- ASU social computing data repository
- Indiana University CNetS data
- The Koblenz Network Collection
- The Nexus Network Repository
- SocioPatterns Datasets
- Ucinet Datasets
- Pajek Datasets
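Several of these repositories distribute networks as plain-text edge lists, with one pair of node IDs per line and sometimes a few comment lines at the top. Formats vary, so check the documentation of each dataset. As a rough sketch of how such a file could be read, here is a small Python example using only the standard library. The function name `read_edge_list`, the `#` comment marker, and the toy data are assumptions made up for illustration and are not taken from any particular repository.

```python
from collections import defaultdict


def read_edge_list(lines, comment="#"):
    """Build an undirected adjacency map from whitespace-separated edge lines."""
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (comment marker is an assumption).
        if not line or line.startswith(comment):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore malformed lines with fewer than two node IDs
        source, target = parts[0], parts[1]
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Made-up toy network, standing in for the contents of a downloaded file.
sample = """\
# hypothetical toy network
a\tb
b\tc
c\ta
c\td
""".splitlines()

graph = read_edge_list(sample)
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

For a real file, the same function could be given an open file handle instead of the `sample` list. Directed or weighted networks would need the extra columns and edge direction handled separately.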
If you are looking for network data to use in teaching, I would also recommend having students collect social media data. For graduate students, R packages like twitteR and Rfacebook may be a good way to do this. For undergraduate students, I recommend NodeXL, an intuitive, easy-to-use Excel add-on that can grab data from Facebook, Twitter, YouTube, and other sources.