Love Data Week


How to Scrape 4 Million Research Papers

Alex Chan

What does it take to scrape 4 million research papers? More patience than code. This talk distills hard-won lessons from building an instrument extraction pipeline across the chemistry literature. We'll cover the problem space—why existing research databases lack the structured metadata we needed—and the technical approach combining OpenAlex, PDF scraping, and AI-powered extraction. But the real focus is on what went wrong and how we adapted: discovering misclassified documents, learning to validate early and often, and accepting that data quality trumps data quantity. Attendees will leave with a practical framework for approaching large-scale research data projects and realistic expectations for timeline and effort.
Powering AI Within Excel

Nalin Johri

Microsoft Copilot and Excel use natural language (no coding!) to automate tasks, generate formulas, create visualizations, and provide data insights, making complex analysis accessible to all users. Using a variety of data, this hands-on session will get users started on: getting data ready for use with Excel and Copilot; using the AI-powered chat interface; creating formulas in your own words; making changes to data; and analyzing data and text.
Research Data Services

Michael Murphy

Introduction to the scope of services and offerings of the Seton Hall University Research Data Services team. This unit, located in the University Libraries and first formed in 2020, is designed to assist faculty, students, and staff with finding, analyzing, managing, and visualizing data for research.
Grammar of Graphic Programming: Tufte and Scientific Coding

Jason Bundy
Research Data Services at Seton Hall University

Michael Murphy
Love Data Week 2021

Lisa DeLuca
Love Data Week at Seton Hall University Libraries

Lisa DeLuca
University Libraries Sponsors Second Love Data Week Series

Michael Giorgio
University Libraries Sponsors Love Data Week

Michael Giorgio

