Data Visualization

In 2022, the ORCID US Community partnered with the Drexel University LEADING program to create a set of resources for visualizing publication collaboration activity based on public data from researchers’ ORCID records and Crossref DOI publication metadata. Thanks to the two fellows who worked on this project, Negeen Aghassibake and Olivia Given Castello, anyone can use the resulting resources to gather collaboration data and create a collaboration map using open tools. Please see the README section of our ORCID Data Visualization GitHub repository for more information.

Creating Your Own Visualization

To create a data visualization for your own organization, please follow these instructions:

  1. Install R and download RStudio, a free and open tool for running R scripts. RStudio requires a separate R installation to run.
  2. Access the R script provided in our ORCID Data Visualization GitHub repository. Read the comments and follow the instructions in the script to fill in the search values for your organization, then run the script in RStudio. When the script finishes, you should have a CSV file named orcid-data containing all of the data it retrieved. Additional details and tips for running the script can be found in our README file.
  3. Go to Tableau Public, a free tool for creating data visualizations, and sign up for a free Tableau Public account.
  4. Follow the instructions in the Customizing your own Tableau dashboard documentation (provided in our ORCID Data Visualization GitHub repository) to make a copy of our dashboard template, load your own data into your copy of the dashboard, and customize the resulting visualization.
  5. Once you have created your visualization, email orcidus@lyrasis.org to let us know about your experience. We would love to hear from you!

Using Your Data Visualization

Once you have produced a visualization for your organization, you can explore the data to investigate publication collaborations that have taken place at your organization within the time frame that you specified in the R script. Keep in mind that the visualization does not capture all collaboration activity: some researchers do not have ORCID iDs, some ORCID records are missing data, and the script currently considers only publications with Crossref DOIs. However, the resulting visualization can be used to:

  • Demonstrate the value of Persistent Identifiers (PIDs) - the collaboration data and map are based solely on the existence and usage of ORCID iDs for individuals (with employment affiliations filled out in individual ORCID records) and DOIs for publications. Because DOIs can be included in individuals’ ORCID records, and ORCID iDs can be included in DOI metadata, PIDs enable us to find out, at scale, which individuals from which institutions are collaborating on which publications. Without PIDs and the underlying interoperable research infrastructure that PIDs provide, it would be much harder, if not impossible, to gather and visualize this data.
  • Gain insight into current ORCID usage at your organization - because the underlying data is based on public information within ORCID records, the size of the resulting dataset and visualization can reveal how widely researchers at an organization are using their ORCID record as a place to keep information about their works and other activities. If the resulting dataset is smaller than expected, it could indicate that researchers are not yet using ORCID to its fullest extent and that more outreach and education are needed.
  • Encourage more usage of ORCID amongst researchers - individual researchers will only show up in the data and on the map if they have an ORCID iD, have their employment information filled out in their ORCID record, and have one or more works (with Crossref DOIs) in their ORCID record. Individuals who do not have an ORCID iD will not show up on the map, so researchers who want to see themselves represented will need to get and use their ORCID iD. Individuals who are present in the data can use information from the visualization to support promotion and tenure documentation, research reporting, and related activities.
  • Encourage more usage of the ORCID member API at your organization and amongst ORCID member organizations - discrepancies in the data, such as misspellings and typos in organization names, can be avoided when institutions use the ORCID member API to write employment affiliations to researchers’ ORCID records. This is considered a best practice, because organizations can standardize the way their organization appears on their researchers’ ORCID records and lend more authority to the information in those records.
  • Illustrate the potential impact of research and publication activity at your organization - although the data presented are incomplete, the visualization can be a tool for showing research collaboration activity at an organization, which could help in assessing “research impact.”
  • Draw attention to collaboration activity at your organization - the map reveals where collaborations are taking place geographically and which other organizations the collaborating researchers belong to. Conversely, the map can also expose where collaboration is not taking place.
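Two of the points above, dataset size as a signal of ORCID usage and inconsistent organization names, can be checked directly in the CSV before building the dashboard. The Python sketch below is illustrative only and is not part of the project's tooling. The column names (home_orcid, doi, collab_org) are hypothetical placeholders, so compare them against the header row of your own orcid-data file, and the name normalization is deliberately crude: it only ignores case, punctuation, and extra whitespace.

```python
import csv
import io
import re
from collections import defaultdict

# HYPOTHETICAL column names: check the header row of your own
# orcid-data file and change these to match.
HOME_ORCID_COL = "home_orcid"
DOI_COL = "doi"
COLLAB_ORG_COL = "collab_org"


def normalize_org(name: str) -> str:
    """Crude normalization: lowercase, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def summarize(rows):
    """Return (distinct home authors, distinct DOIs, org-name spelling variants)."""
    home_ids = set()
    dois = set()
    variants = defaultdict(set)
    for row in rows:
        home_ids.add(row[HOME_ORCID_COL].strip())
        dois.add(row[DOI_COL].strip().lower())  # DOIs are case-insensitive
        org = row[COLLAB_ORG_COL].strip()
        if org:
            variants[normalize_org(org)].add(org)
    # Keep only normalized names that appear under more than one spelling
    inconsistent = {k: sorted(v) for k, v in variants.items() if len(v) > 1}
    return len(home_ids), len(dois), inconsistent


if __name__ == "__main__":
    # Made-up sample rows; in practice, open your own CSV file instead.
    sample = io.StringIO(
        "home_orcid,doi,collab_org\n"
        "0000-0002-1825-0097,10.1000/ABC,Drexel University\n"
        "0000-0002-1825-0097,10.1000/abc,Drexel university\n"
        "0000-0001-5109-3700,10.1000/xyz,Drexel University.\n"
    )
    n_people, n_dois, variants = summarize(csv.DictReader(sample))
    print(n_people, n_dois)  # 2 2
    print(variants)
    # {'drexel university': ['Drexel University', 'Drexel University.', 'Drexel university']}
```

A group with several spellings of the same institution is the kind of discrepancy that writing affiliations through the ORCID member API is meant to prevent.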

Technical Details

The R script is the underlying resource that gathers data to be used in the visualization. As an overview, the R script:

  • Retrieves current ORCID iDs for researchers who have a current, publicly visible employment affiliation for a home institution on their ORCID record
  • Unpacks the publicly visible works information present on each ORCID record
  • Retrieves Crossref DOI metadata for every work that has a Crossref DOI included on the ORCID work citation
  • Unpacks the list of co-authors included in the Crossref DOI metadata for each work
  • Retrieves the ORCID iD for every co-author, if available
  • Checks the current employment affiliation on the ORCID record of every co-author
  • Gets location information for the co-authors' institutions
  • Repackages the data into a CSV file containing home author ORCID iDs, co-author ORCID iDs, institutional affiliations and geographic locations, and publication DOIs
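The project's script performs the co-author steps above in R (via rcrossref and rorcid). Purely as an illustration, the Python sketch below shows what unpacking co-authors and retrieving their ORCID iDs, if available, amounts to for a single, already-downloaded work record. It assumes the JSON shape returned by the Crossref REST API /works/{doi} endpoint, where each author may carry an optional ORCID field holding an orcid.org URL. The checksum test is an extra step not listed above: it applies the ISO 7064 MOD 11-2 check character that ORCID uses as the last character of every iD. The function names are hypothetical.

```python
import json
import re

# Matches a bare ORCID iD at the end of a string, with or without an
# http(s)://orcid.org/ prefix in front of it.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the final check character of an ORCID iD (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def coauthors_from_crossref(work_json: str):
    """Yield (given, family, orcid_or_None) for each author of one work.

    Assumes the Crossref REST API /works/{doi} response shape:
    {"message": {"author": [{"given": ..., "family": ..., "ORCID": ...}]}}
    """
    message = json.loads(work_json)["message"]
    for author in message.get("author", []):
        orcid = None
        match = ORCID_RE.search(author.get("ORCID", ""))
        if match and orcid_checksum_ok(match.group(1)):
            orcid = match.group(1)
        yield author.get("given", ""), author.get("family", ""), orcid


if __name__ == "__main__":
    # Made-up record; Josiah Carberry is ORCID's fictitious test researcher.
    sample = json.dumps({"message": {"DOI": "10.1000/example", "author": [
        {"given": "Josiah", "family": "Carberry",
         "ORCID": "http://orcid.org/0000-0002-1825-0097"},
        {"given": "Jane", "family": "Doe"},
    ]}})
    for given, family, orcid in coauthors_from_crossref(sample):
        print(given, family, orcid)
    # Josiah Carberry 0000-0002-1825-0097
    # Jane Doe None
```

Co-authors who come back with None are the gaps described above as missing ORCID iDs; in the actual workflow, each iD that is found would then be used to look up that co-author's current employment affiliation.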

While the current script retrieves a substantial amount of data, possible future improvements include:

  • Query DOI metadata from DataCite as well as Crossref
  • Make time period of interest more flexible
  • Do more to resolve names and fill in blank ORCID iD data
  • Clean department affiliation data, so that the dashboard could visualize collaborations by discipline
  • Create a version of the script that individuals could run to retrieve their own personal collaboration data
  • Include citation metrics for each DOI
  • Create a version of the script that could be run by funding organizations or others to retrieve collaborations for researchers from multiple organizations, based on an existing list of researcher ORCID iDs
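As one possible direction for a more flexible time period, the sketch below filters works by a start year and an end year passed in as parameters. It is a hypothetical illustration, not a description of how the current R script handles dates. It assumes Crossref work records (the message object returned by the REST API), reads the publication year from the first entry of the issued field's date-parts, and treats works without a year as out of range.

```python
def issued_year(message: dict):
    """Return the year from a Crossref work's "issued" date-parts, or None."""
    parts = message.get("issued", {}).get("date-parts", [[None]])
    if parts and parts[0] and parts[0][0] is not None:
        return int(parts[0][0])
    return None


def in_period(message: dict, start_year: int, end_year: int) -> bool:
    """True if the work's issued year falls within [start_year, end_year]."""
    year = issued_year(message)
    return year is not None and start_year <= year <= end_year


if __name__ == "__main__":
    # Made-up work records in the Crossref "message" shape.
    works = [
        {"DOI": "10.1000/a", "issued": {"date-parts": [[2019, 4]]}},
        {"DOI": "10.1000/b", "issued": {"date-parts": [[2022]]}},
        {"DOI": "10.1000/c", "issued": {"date-parts": [[None]]}},
    ]
    selected = [w["DOI"] for w in works if in_period(w, 2020, 2023)]
    print(selected)  # ['10.1000/b']
```

Keeping the years as function parameters, rather than fixed values in the code, is what would let each organization choose its own window.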

Additional Resources

Credits

The R script for this project, created by Olivia Given Castello, uses the rorcid and rcrossref packages developed by Scott Chamberlain, co-founder of rOpenSci, and builds on code by Clarke Iakovakis that has been used for the FORCE11 Scholarly Communication Institute course Working with Scholarly Literature in R.

The Tableau data visualization template and documentation were created by Negeen Aghassibake.

The Drexel University LEADING program is supported by funding from the Institute of Museum and Library Services. 

Usage License

Collaboration Data Visualization © 2022 by Lyrasis is licensed under CC BY-SA 4.0