-
How to Scrape 4 Million Research Papers
Alex Chan
What does it take to scrape 4 million research papers? More patience than code. This talk distills hard-won lessons from building an instrument extraction pipeline across the chemistry literature. We'll cover the problem space—why existing research databases lack the structured metadata we needed—and the technical approach combining OpenAlex, PDF scraping, and AI-powered extraction. But the real focus is on what went wrong and how we adapted: discovering misclassified documents, learning to validate early and often, and accepting that data quality trumps data quantity. Attendees will leave with a practical framework for approaching large-scale research data projects and realistic expectations for timeline and effort.
-
Research Data Services
Michael Murphy
Introduction to the scope of services and offerings of the Seton Hall University Research Data Services team. This unit, located in the University Libraries and first formed in 2020, assists faculty, students, and staff with finding, analyzing, managing, and visualizing data for research.