In 2022, the ORCID US Community partnered with the Drexel University LEADING program to create a set of resources for visualizing publication collaboration activity based on public data from researchers’ ORCID records and Crossref DOI publication metadata. Thanks to the two fellows who worked on this project, Negeen Aghassibake and Olivia Given Castello, anyone can use the resulting resources to gather collaboration data and create a collaboration map using open tools. Please see the README section of our ORCID Data Visualization GitHub repository for more information.
Creating your own Visualization
To create a data visualization for your own organization, please follow these instructions:
- Download RStudio, a free and open tool for running R scripts.
- Access the R script provided in our ORCID Data Visualization GitHub repository. Read the comments and follow the instructions in the R script to fill in search values for your organization, then run the script in RStudio. When the script has finished running, you should have a CSV file named orcid-data containing all of the data retrieved by the script.
- Go to Tableau Public, a free and open tool for creating data visualizations, and create a free Tableau Public account by clicking the button to sign up for Tableau Public.
- Follow the instructions in the Customizing your own Tableau dashboard documentation (provided in our ORCID Data Visualization GitHub repository) to: make a copy of our dashboard template, load your own data into your copy of the dashboard, and customize your resulting visualization.
Using your Data Visualization
Once you have produced a visualization for your organization, you can explore the data to investigate publication collaborations that have taken place at your organization, within the time frame that you specified in the R script. Keep in mind that not all collaboration activity is represented in the visualization, due to missing ORCID iDs and missing data in ORCID records, and the fact that the script is currently based on publications with Crossref DOIs only. However, the resulting visualization can be used to:
- Demonstrate the value of Persistent Identifiers (PIDs) - the collaboration data and map are based solely on the existence and usage of ORCID iDs for individuals (with employment affiliations filled out in individual ORCID records) and DOIs for publications. Because DOIs can be included in individuals’ ORCID records, and ORCID iDs can be included in DOI metadata, PIDs enable us to find out, at scale, which individuals from which institutions are collaborating on which publications. Without PIDs and the underlying interoperable research infrastructure that PIDs provide, it would be much harder, if not impossible, to gather and visualize this data.
- Encourage more usage of ORCID amongst researchers - individual researchers will only show up in the data and on the map if they have an ORCID iD and have their employment information filled out in their ORCID record. Individuals who do not have an ORCID iD will not show up on the map. So, if individuals want to see themselves represented, they will need to get and use their ORCID iD. Individuals who are present in the data can use their information from the visualization to support promotion and tenure documentation, research reporting, and related activities.
- Encourage more usage of the ORCID member API at your organization/amongst ORCID member organizations - discrepancies, such as misspellings and typos in organization names, can be avoided when institutions use the ORCID member API to write employment affiliations to researchers’ ORCID records. This is considered a best practice, because organizations can standardize the way that their organization shows up on their researchers’ ORCID records, and lend more authority to the information in those records.
- Illustrate potential impact of research/publication activity at your organization - although the data presented are incomplete, the visualization can be a tool for showing research activity at an organization, which could be helpful towards the goal of determining “research impact.”
- Draw attention to collaboration activity at your organization - the map reveals where collaborations are taking place geographically, and with researchers at which other organizations. Conversely, the map can expose where collaboration is not taking place.
The R script is the underlying resource that gathers data to be used in the visualization. As an overview, the R script:
- Retrieves current ORCID profiles for a home institution
- Unpacks the Works list for every ORCID profile
- Retrieves Crossref data for every Work DOI
- Unpacks the Crossref co-author list
- Retrieves the ORCID profile for every co-author ORCID iD
- Checks the current Employment affiliation for every co-author
- Gets location information for the co-author's institution
- Repackages the data into a CSV file of individual home author/co-author/DOI collaborations
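The final repackaging step in the list above can be illustrated with a short sketch. This is not the project's R script: it is written in Python, uses small hypothetical in-memory data in place of real ORCID and Crossref responses, and the names `home_author_works`, `doi_authors`, `collaboration_rows`, and `rows_to_csv` are invented for illustration. It also skips co-authors with no ORCID iD, which is a simplification and not necessarily what the script does.

```python
import csv
import io

# Hypothetical sample data standing in for the retrieval steps.
# Real ORCID and Crossref responses are richer; these shapes are assumptions.
home_author_works = {
    "0000-0001-0000-0001": ["10.1234/alpha", "10.1234/beta"],
    "0000-0001-0000-0002": ["10.1234/beta"],
}
doi_authors = {
    "10.1234/alpha": ["0000-0001-0000-0001", "0000-0002-0000-0009"],
    # None marks a co-author whose ORCID iD is missing from the DOI metadata.
    "10.1234/beta": ["0000-0001-0000-0001", "0000-0001-0000-0002", None],
}

FIELDS = ["home_orcid", "coauthor_orcid", "doi"]


def collaboration_rows(home_author_works, doi_authors):
    """Build one row per home author / co-author / DOI combination."""
    rows = []
    for home_id, dois in home_author_works.items():
        for doi in dois:
            for coauthor_id in doi_authors.get(doi, []):
                # Skip missing iDs and the home author themselves.
                if coauthor_id is None or coauthor_id == home_id:
                    continue
                rows.append(
                    {"home_orcid": home_id, "coauthor_orcid": coauthor_id, "doi": doi}
                )
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = collaboration_rows(home_author_works, doi_authors)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Each row represents a single collaboration, so a publication shared by two home authors appears once from each author's perspective. A table in this shape is what a mapping tool can aggregate by co-author institution and location.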
While the current script is capable of retrieving significant information, possible future improvements to the code include:
- Query DOI metadata from DataCite as well as Crossref
- Make time period of interest more flexible
- Do more to resolve names and fill in blank ORCID iD data
- Clean department affiliation data, so that dashboard could visualize collaborations by discipline
- Create a version of the script that individuals could run to retrieve their own personal collaboration data
- Re-code to include citation metrics for each DOI
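As one illustration of the improvements listed above, a more flexible time period could be handled by filtering publication records against optional start and end years. The sketch below is in Python rather than R and is not taken from the project's script. It assumes each record carries a Crossref-style `issued` field with nested `date-parts`; the function names `issued_year` and `in_period` and the sample records are hypothetical.

```python
from typing import Optional


def issued_year(record: dict) -> Optional[int]:
    """Return the year from a Crossref-style 'issued' field, or None if absent."""
    date_parts = record.get("issued", {}).get("date-parts", [])
    if date_parts and date_parts[0] and date_parts[0][0] is not None:
        return int(date_parts[0][0])
    return None


def in_period(record: dict,
              start_year: Optional[int] = None,
              end_year: Optional[int] = None) -> bool:
    """Check whether a record falls inside an open-ended or bounded year range."""
    year = issued_year(record)
    if year is None:
        # Records without a usable year are excluded in this sketch.
        return False
    if start_year is not None and year < start_year:
        return False
    if end_year is not None and year > end_year:
        return False
    return True


# Hypothetical records; only the fields used above are included.
sample_records = [
    {"DOI": "10.1234/alpha", "issued": {"date-parts": [[2019, 6, 1]]}},
    {"DOI": "10.1234/beta", "issued": {"date-parts": [[2021]]}},
    {"DOI": "10.1234/gamma", "issued": {"date-parts": [[None]]}},
    {"DOI": "10.1234/delta"},
]

recent = [r["DOI"] for r in sample_records if in_period(r, start_year=2020)]
bounded = [r["DOI"] for r in sample_records if in_period(r, 2018, 2020)]
print(recent, bounded)
```

Leaving either bound unset keeps the range open on that side, so the same function covers "since 2020", "up to 2020", and a fixed window.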
Watch our Data Visualization Dashboard Demo.
The R script for this project, created by Olivia Given Castello, uses the rorcid and rcrossref packages developed by Scott Chamberlain, co-founder of rOpenSci, and builds on code by Clarke Iakovakis that has been used for the FORCE11 Scholarly Communications Institute course Working with Scholarly Literature in R.
The data visualization template and documentation were created by Negeen Aghassibake.
The Drexel University LEADING program is supported by funding from the Institute of Museum and Library Services.