Extracting references from PDF files and comparing with a specified list

Member_2_8044579 asked on
Databases, PDF, Microsoft Excel, * extract
Hi all,
I have a series of PDF files with scientific articles (95% created electronically, not scanned), and for a non-commercial bibliographic database we need:
1. to extract all references (i.e. bibliography, works cited) from each one. The references have no fixed format, although most use the author's surname-comma-name format (sometimes with several authors), then the year (sometimes in parentheses), then the title;
2. to compare the extracted data with a list of previously selected articles and authors, so I can see whether the PDF articles cite one or more of the titles in the list;
3. to present those citations in an easily readable way, such as an Excel table with one column for the citing document and further columns for the cited ones.
Doing all these steps manually would take years, so I was wondering whether any expert here could help us.
I have a related question that I will post separately.
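
To make the three steps above concrete, here is a minimal Python sketch of one possible approach. It is not a tested solution, and it rests on several assumptions: the text of each PDF has already been extracted to a string (a library such as pypdf could do that for electronically created files, but that step is not shown), the reference list follows a heading such as "References", "Bibliography" or "Works Cited", and each entry starts with a capitalised surname followed by a comma. All function names (`extract_references`, `find_cited`, `write_report`) and the sample data are hypothetical. Matching is exact after normalisation, so misspelled or abbreviated titles would be missed, and the report is written as CSV, which Excel can open, rather than as a native workbook.

```python
import csv
import io
import re

# Headings that commonly introduce a reference list (assumed; extend as needed).
HEADING = re.compile(r"^\s*(references|bibliography|works cited)\s*$",
                     re.IGNORECASE | re.MULTILINE)
# Assumed entry start: a capitalised surname followed by a comma, e.g. "Smith, J."
ENTRY_START = re.compile(r"^[A-Z][^\s,]*,\s")


def normalize(text):
    """Lowercase and reduce punctuation/whitespace runs to single spaces."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def extract_references(full_text):
    """Split the text after the last reference heading into individual entries."""
    headings = list(HEADING.finditer(full_text))
    if not headings:
        return []
    entries = []
    for line in full_text[headings[-1].end():].splitlines():
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)        # a new entry begins here
        else:
            entries[-1] += " " + line   # continuation of a wrapped entry
    return entries


def find_cited(entries, wanted_titles):
    """Return the wanted titles that occur in at least one reference entry."""
    normalized_entries = [f" {normalize(e)} " for e in entries]
    return [t for t in wanted_titles
            if any(f" {normalize(t)} " in e for e in normalized_entries)]


def write_report(texts_by_document, wanted_titles, out):
    """Write one CSV row per citing document: its name, then each cited title."""
    writer = csv.writer(out)
    writer.writerow(["citing_document", "cited_titles"])
    for name, text in texts_by_document.items():
        cited = find_cited(extract_references(text), wanted_titles)
        writer.writerow([name, *cited])


# Made-up sample text standing in for the extracted text of one PDF.
sample = """Introduction
Some body text.
References
Smith, J. (2019). Reference extraction from
scholarly PDFs. Journal of Examples.
Doe, A., & Roe, B. 2020. Another made-up title. Example Press.
"""
wanted = ["Reference Extraction from Scholarly PDFs", "A title nobody cites"]
buffer = io.StringIO()
write_report({"paper1.pdf": sample}, wanted, buffer)
print(buffer.getvalue())
```

To produce a file for Excel instead of an in-memory buffer, `write_report` could be given a file opened with `open("report.csv", "w", newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding is suggested here only because it usually helps Excel recognise accented author names. Matching on authors, or tolerating small spelling differences, would need a looser comparison than the substring test used in `find_cited`.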