• Status: Open
• Priority: Medium
• Security: Public

Enriching a dataset with another dataset while ensuring no fields are blank.

I'm attempting to create a dataset in which an email list is enriched with each person's metadata, such as department, manager, state, etc. Data set A has a field of email addresses. Data set B also has a field of email addresses, along with multiple rows of metadata for each person. As illustrated in the attached image, a value may be missing in some records but present in others. Supposing that the attached data set is for a John Smith, how can I produce a final report in which John Smith has a value for every metadata field? I need to do this in Python, and the metadata should come from the latest data record first, falling back to the other records when values are missing. Also, the metadata field names may change, so it's better to refer to the fields by their column index.
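A minimal sketch of the "latest record first, then fall back to older records" merge, in plain Python using only the standard library. Since the real files are not shown, it assumes that in data set B column 0 holds the email address, column 1 holds a record date that identifies the latest record, and every column from index 2 onward is metadata; that "blank" means `None` or an empty/whitespace-only string; and that email addresses match case-insensitively. The names `enrich`, `is_blank`, `EMAIL_IDX`, `DATE_IDX` and `META_START`, and the sample rows, are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Assumed layout of each data set B row (hypothetical; adjust to the real files):
EMAIL_IDX = 0    # column holding the email address
DATE_IDX = 1     # column holding the record date, used to find the latest record
META_START = 2   # metadata fields occupy every column from this index onward


def is_blank(value):
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def enrich(emails, records):
    """Return one row per address in `emails`: [email, meta_0, meta_1, ...].

    Each metadata field is taken from the newest record with a non-blank
    value in that column; fields blank in every record stay None.
    """
    # Group data set B rows by normalized (trimmed, lower-cased) email address.
    by_email = defaultdict(list)
    for row in records:
        by_email[row[EMAIL_IDX].strip().lower()].append(row)

    # Number of metadata columns, based on the widest row.
    n_meta = max((len(r) for r in records), default=META_START) - META_START

    report = []
    for email in emails:
        # Newest record first.
        rows = sorted(by_email.get(email.strip().lower(), []),
                      key=lambda r: r[DATE_IDX], reverse=True)
        merged = [None] * n_meta
        for i in range(n_meta):
            col = META_START + i
            # Walk from newest to oldest until a non-blank value turns up.
            for row in rows:
                if col < len(row) and not is_blank(row[col]):
                    merged[i] = row[col]
                    break
        report.append([email] + merged)
    return report


# Made-up sample data: columns are email, record date, department, manager, state.
records = [
    ["john.smith@example.com", date(2018, 1, 5), "Sales", "", "NY"],
    ["john.smith@example.com", date(2018, 3, 9), "", "Jane Doe", ""],
    ["John.Smith@example.com", date(2017, 11, 2), "Finance", "Bob Lee", "CA"],
]
emails = ["john.smith@example.com", "nobody@example.com"]

report = enrich(emails, records)
for row in report:
    print(row)
# ['john.smith@example.com', 'Sales', 'Jane Doe', 'NY']
# ['nobody@example.com', None, None, None]
```

Because the metadata is addressed only by column position (everything from `META_START` on), renamed headers do not affect the logic, although added or reordered columns would. If pandas is available, a similar result can likely be had by converting blank strings to NaN, sorting data set B by date in descending order, and calling `groupby(...).first()` (which takes the first non-null value in each column) before merging the result with data set A.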
1 Comment
Could you give a slightly more concrete example of each data set? (About three entries each should be enough, if they cover all the cases.)
Where does the data come from?

Isn't this essentially the same problem as in https://www.experts-exchange.com/questions/29088374/Ingesting-and-analysing-file.html?
