tomfolinsbee

Excel function to retrieve data from www.clinicaltrials.gov

Hello Experts!

I'm looking for a way to connect a list of clinical trials in Excel to the related trial data at ClinicalTrials.gov.

The attached Excel file has a list of clinical trial IDs in the first column. I would like to retrieve selected data from www.clinicaltrials.gov using the available API.

API documentation is available at https://clinicaltrials.gov/api/gui/ref/api_urls

I am using Office 365 ProPlus with Excel for Office 365 MSO, 32-bit.

Would Excel's Get & Transform tools work?  Or could we use VBA to write a custom function?

I don't anticipate having more than 500 trials and 20 fields.   

EE-Excel clinical trial API 20200923.xlsx 

Thank you! 

Ryan Chong

Would Excel's Get & Transform tools work? 
that will probably work.

for example, see the attached.

EE-Excel-clinical-trial-API-2020092_b.xlsx
tomfolinsbee

ASKER

Thanks Ryan. I tried the Get & Transform web view and can see how you imported the field list.

I'm now trying to write the URL that will fetch the 250+ trial records and selected fields.

Would this approach make sense: Write a vba script that creates a URL for each NCTId and selected fields, then automate the Get & Transform step to import each row (in this test file, approx 250+ trial records). Or any other approach? 

Study Fields (demo)
ClinicalTrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId,Condition,BriefTitle
Returns values from selected API fields for a large set of study records. Select the fields returned using the fields parameter (shown in the Query Parameters table). For a complete list of fields, use the Study Fields List info URL.
Returns 20 study records by default.
Returns up to 1,000 study records when minimum rank and maximum rank parameters are set.
Returns in JSON or CSV format when the format parameter is set (fmt=JSON or fmt=CSV, respectively).
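To make the URL-building idea concrete, here is a minimal sketch in Python of how one request could cover a whole batch of trial IDs instead of one URL per row. It only assembles the URL and does not call the service. The endpoint and the expr, fields and fmt parameters come from the documentation excerpt above; the parameter names min_rnk and max_rnk, and joining IDs with OR inside expr, are my assumptions about how that API spells the rank limits and combines search terms, so check them against the API reference (and that the endpoint is still served) before relying on this. The helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the documentation excerpt above.
BASE_URL = "https://clinicaltrials.gov/api/query/study_fields"


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_study_fields_url(nct_ids, fields, fmt="CSV"):
    """Build one study_fields URL for a batch of trial IDs.

    Assumptions (not confirmed by the thread): IDs can be combined
    with OR inside `expr`, and the rank limits are named
    `min_rnk` / `max_rnk`.
    """
    params = {
        "expr": " OR ".join(nct_ids),
        "fields": ",".join(fields),
        "min_rnk": 1,
        "max_rnk": len(nct_ids),
        "fmt": fmt,
    }
    # safe="," keeps the field list readable instead of %2C-encoded.
    return BASE_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    ids = ["NCT00000102", "NCT00000104"]
    wanted = ["NCTId", "Condition", "BriefTitle"]
    # The excerpt mentions an upper limit of 1,000 records per call,
    # so a longer list would be split into batches first.
    for batch in chunked(ids, 1000):
        print(build_study_fields_url(batch, wanted))
```

With around 250 trials, a single batch would be enough if the assumptions hold, which avoids automating 250 separate Get & Transform imports.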

Would this approach make sense: Write a vba script that creates a URL for each NCTId and selected fields, then automate the Get & Transform step to import each row (in this test file, approx 250+ trial records). Or any other approach? 

I have never done such complex manipulation before, but if it's all within Excel, I guess it's feasible...

Are you familiar with other languages? This may be easier to do outside of Excel, where you can program the logic, store arrays, and so on. In VBA you should be able to do this with an XMLHTTP request (for example, the MSXML2.XMLHTTP object). I am not well versed in VBA but have done this in VBS, and I am pretty sure the two are similar.

Ryan has a similar example: https://www.experts-exchange.com/questions/29190386/extract-field-in-xml-using-vba.html#a43134101. With VBA or VBS you have to extract the XML manually, and JSON can be troublesome. Just about every other language automates this part.
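As an illustration of what "automated" looks like outside VBA, here is a small Python sketch that turns a study_fields JSON reply into flat rows ready for a worksheet. The sample payload is invented for the example, and its layout (a StudyFieldsResponse object holding a StudyFields list, with every field value wrapped in a list) is my assumption about the response shape, not something shown in this thread.

```python
import json

# Hypothetical sample payload; the IDs, titles and conditions are made up.
SAMPLE_RESPONSE = """
{
  "StudyFieldsResponse": {
    "StudyFields": [
      {"Rank": 1,
       "NCTId": ["NCT00000001"],
       "Condition": ["Condition A", "Condition B"],
       "BriefTitle": ["Example Trial One"]},
      {"Rank": 2,
       "NCTId": ["NCT00000002"],
       "Condition": [],
       "BriefTitle": ["Example Trial Two"]}
    ]
  }
}
"""


def flatten_study_fields(payload, fields, sep="; "):
    """Return one row per study, one cell per requested field.

    Multi-valued fields are joined with `sep`; fields that are
    missing or empty become an empty string.
    """
    studies = payload["StudyFieldsResponse"]["StudyFields"]
    rows = []
    for study in studies:
        rows.append([sep.join(study.get(name, [])) for name in fields])
    return rows


if __name__ == "__main__":
    data = json.loads(SAMPLE_RESPONSE)
    for row in flatten_study_fields(data, ["NCTId", "Condition", "BriefTitle"]):
        print(row)
```

Joining multi-valued fields into one cell keeps a one-row-per-trial layout, which is usually what a master list in Excel expects.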
Not sure I will have time until next week, but I may have some VBA code that did something similar using JSON data stream.  I'll take a look when I can...


»bp
Okay, got some done on this today. It's not foolproof, but it seemed to work for the data you had.

I used an open source JSON parsing module I had used previously, but did a little manipulation of the returned JSON stream before I parsed it to make things a little simpler.

Also no error handling built in at this point, or any fancy formatting of the sheet after it is populated.

I left the data in the sheet I got when I ran it, took a minute or two to populate.

You can run the macro named "GetTrialData" and it will erase the data on the first sheet and then pull it again.  Be patient while it works along, not a lot of visual indicators that it's working other than the cursor.

Hope this might be useful as a starting point, or maybe even good enough for your needs.

EE29195591.xlsm


»bp
Guys, thanks so much for this. It's been over six years since I used Experts Exchange, and it's great to be back. I've already got my subscription paid for with this solution.

Bill, can I ask you to help make a few tweaks to the solution?
1. Format dates as dates.
2. Populate the column containing the NCT_IDs by pointing to a column in another sheet (may or may not be in the same workbook), and deduplicating the NCT_IDs. This way I have option to integrate the solution with my master file.
3. If the script encounters an invalid NCT_ID not recognized by the ClinicalTrials.gov API, then either highlight or otherwise indicate which records were not retrieved. [not critical]
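For reference, the three tweaks above can be sketched in Python roughly as follows. This is only an illustration of the logic, not the macro from the attached workbook. It assumes an NCT ID is "NCT" followed by eight digits, and that the API returns dates as text such as "September 23, 2020" or "September 2020"; the date formats in particular are a guess, and all function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed ID shape: the letters NCT followed by exactly eight digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")
# Assumed date spellings; extend this tuple if the API returns others.
DATE_FORMATS = ("%B %d, %Y", "%B %Y")


def clean_ids(raw_ids):
    """Trim, upper-case and deduplicate IDs, keeping first-seen order.

    Returns (well_formed, malformed).
    """
    seen, well_formed, malformed = set(), [], []
    for raw in raw_ids:
        nct_id = str(raw).strip().upper()
        if not nct_id or nct_id in seen:
            continue
        seen.add(nct_id)
        if NCT_PATTERN.fullmatch(nct_id):
            well_formed.append(nct_id)
        else:
            malformed.append(nct_id)
    return well_formed, malformed


def missing_ids(requested, returned):
    """IDs that were requested but did not come back from the API."""
    returned_set = set(returned)
    return [nct_id for nct_id in requested if nct_id not in returned_set]


def parse_trial_date(text):
    """Convert a date string to a date object, or None if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    ids, bad = clean_ids(["NCT00000102", " nct00000102 ", "NCT123", "NCT00000104"])
    print(ids, bad)
    print(missing_ids(ids, ["NCT00000102"]))
    print(parse_trial_date("September 23, 2020"))
```

The IDs reported by missing_ids are the ones that would be highlighted in the sheet. A month-only date falls back to the first day of that month, which may or may not be what you want in a date column.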
Bill, can I ask you to help make a few tweaks to the solution?

I'll digest those and come back with potential changes, or questions if needed.


»bp
ASKER CERTIFIED SOLUTION
Bill Prew

Gentlemen, many thanks for your interest in this question.  I struggled with it for so long until I remembered Experts-Exchange. I think the solution can help a lot of other people that also need to access clinical trial records. The API has over 300 fields, compared to only 20 fields that can be downloaded via the website. Again thank you! 
Thanks Bill, the solution works great and I really appreciate you adding in the formatting and error correction.  And you are absolutely right, it's better for me to generate the input list separately.

Thanks for the feedback, glad that was useful.

If you have any issues you can always message me on the site...


»bp