Crazy Horse asked:

How to check if item in an array exists in an array of objects

I fetch data from an API every few hours and save it in the database. Sometimes the API returns data that is already stored, in which case I don't want to add the record again and end up with duplicates. Currently I fetch the existing records from the database and build an array that holds one of their unique identifiers, like this:


const databaseArticles = await News.find({});
const databaseArticleSourceUrls = databaseArticles.map((item) => {
  return item.sourceUrl;
});


The array of objects from the API looks something like this:

    const newsItems = [
      {
        title: titleOne,
        sourceUrl: urlOne,
      },
      {
        title: titleTwo,
        sourceUrl: urlTwo,
      },
    ];


    const newsForDb = newsItems.map((item) => {
      return {
        title: item.title,
        sourceUrl: item.sourceUrl,
      };
    });



So basically, if a URL from the API already exists in the database, I want to skip that item and move on to the next one, then finally insert only the new items so that there are no duplicates.


I am not sure if there is an easy way to do this with Mongoose/MongoDB or if I need to use JavaScript to compare the two and then create a new array that I insert into the database.


For reference this is the insert:


     try {
       await News.insertMany(newsForDb);
     } catch (err) {
       console.log(err);
     }
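
For the plain JavaScript route mentioned above, a minimal sketch might look like the following. It assumes the array of stored URLs (like `databaseArticleSourceUrls`) and the item shape from the question; `filterNewItems` and the example URLs are hypothetical names made up for illustration.

```javascript
// Hypothetical helper: keep only API items whose sourceUrl is not already stored.
function filterNewItems(newsItems, existingUrls) {
  // A Set gives constant-time lookups instead of scanning the array for each item.
  const seen = new Set(existingUrls);
  return newsItems.filter((item) => {
    if (seen.has(item.sourceUrl)) return false;
    seen.add(item.sourceUrl); // also drops duplicates within the API batch itself
    return true;
  });
}

// Made-up sample data in the shape used in the question.
const existingUrls = ['https://example.com/a'];
const apiItems = [
  { title: 'A', sourceUrl: 'https://example.com/a' },
  { title: 'B', sourceUrl: 'https://example.com/b' },
  { title: 'B again', sourceUrl: 'https://example.com/b' },
];

const fresh = filterNewItems(apiItems, existingUrls);
console.log(fresh); // → [ { title: 'B', sourceUrl: 'https://example.com/b' } ]
```

The filtered array could then be passed to `News.insertMany` in place of `newsForDb`.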


ASKER CERTIFIED SOLUTION
Michel Plungjan
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Well, start with a clean database design. Your description suggests that it is not properly normalized and that you're missing some candidate key definitions.
Crazy Horse

ASKER

@Michel, thanks for that!

@ste5an, not sure what the database design has to do with it as I can't control if the API has been updated or not when the cron job runs?
When you don't want duplicates in a database table, the first and most important tool is a UNIQUE constraint. A UNIQUE constraint, which is a candidate key, ensures that a row cannot have duplicates.

This would ensure that you will never have duplicates in your table. Without such a constraint, there is no guarantee that you won't insert a duplicate.

Furthermore, such a constraint would raise an error when you try to insert a duplicate. You can handle it and act accordingly.
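
As a rough illustration of handling that error in the MongoDB/Mongoose setting of this thread, the sketch below inserts a batch and tolerates duplicate-key failures. It assumes `model` is a Mongoose model with a unique index on `sourceUrl`, and that bulk failures expose per-document errors on `err.writeErrors`; the exact error shape can vary between driver and Mongoose versions, so treat it as an assumption to verify. `insertSkippingDuplicates` is a hypothetical helper name.

```javascript
// MongoDB reports violations of a unique index with error code 11000.
const DUPLICATE_KEY = 11000;

// Hypothetical helper: insert docs and return how many were skipped as duplicates.
// `model` is assumed to be a Mongoose model with a unique index on sourceUrl.
async function insertSkippingDuplicates(model, docs) {
  try {
    // ordered: false lets the server continue past a document that fails.
    await model.insertMany(docs, { ordered: false });
    return 0;
  } catch (err) {
    // Assumption: per-document failures are listed on err.writeErrors.
    const writeErrors = err.writeErrors || [];
    const onlyDuplicates =
      writeErrors.length > 0 &&
      writeErrors.every((e) => e.code === DUPLICATE_KEY);
    if (!onlyDuplicates) throw err; // anything else is a real failure
    return writeErrors.length;
  }
}
```

With the model from the question this might be called as `await insertSkippingDuplicates(News, newsForDb)`.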

Whether you add further duplicate checks in the consumer depends on how this table is used, because certain concurrency issues under load can only be solved with a UNIQUE constraint. Testing only in the consumer can lead to duplicates, or to rows not being inserted, if the testing code reads outdated data. In the latter case you need to decide which outcome is the bearable one, or you must switch to serialized data processing instead.
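
One way to avoid the separate read-then-insert step in MongoDB is to let the server decide per document with an upsert. The sketch below only builds the operations; it assumes the item shape from the question, and `buildUpsertOps` is a hypothetical helper name. A unique index on `sourceUrl` is still assumed, since concurrent upserts without one can create duplicates.

```javascript
// Hypothetical helper: turn API items into upsert operations keyed on sourceUrl.
function buildUpsertOps(newsItems) {
  return newsItems.map((item) => ({
    updateOne: {
      // Match on the unique identifier; on insert this field comes from the filter.
      filter: { sourceUrl: item.sourceUrl },
      // $setOnInsert writes only when a new document is created,
      // so existing records are left untouched.
      update: { $setOnInsert: { title: item.title } },
      upsert: true,
    },
  }));
}

// Possible usage with the Mongoose model from the question (not executed here):
// await News.bulkWrite(buildUpsertOps(newsItems), { ordered: false });
```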
Thanks for the clarity, ste5an. In my Mongoose schema I have set a particular field to be unique, so there would be an error if a duplicate were added to the database. But I wanted to handle it when the API call is made and check the database entries before trying to add records to the database. Without this check the insert would fail because of the unique flag on the schema.
Norie

ste5an

I think Crazy Horse is working with MongoDB which is a NoSQL database.
Yup, saw it too late. But basically the same principles apply here too.

The additional round trip to the server can lead to concurrency issues. Another thing to mention is that a consumer-side approach must carefully do an actual reload of the data to avoid using cached and thus possibly outdated information.
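
If the consumer-side check is kept, a sketch of reloading only the relevant data immediately before the insert could look like this. It assumes `model` is a Mongoose model such as `News` and relies on `Model.distinct(field, conditions)`; `findNewItems` is a hypothetical helper name. It narrows the window for stale reads but does not close it, so the unique index is still the real safeguard.

```javascript
// Hypothetical helper: ask the database, right before inserting,
// which URLs from this batch already exist, and return only the new items.
async function findNewItems(model, newsItems) {
  const urls = newsItems.map((item) => item.sourceUrl);
  // Query only the URLs in this batch instead of loading every stored article.
  const existing = await model.distinct('sourceUrl', {
    sourceUrl: { $in: urls },
  });
  const known = new Set(existing);
  return newsItems.filter((item) => !known.has(item.sourceUrl));
}

// Possible usage with the model from the question (not executed here):
// const newsForDb = await findNewItems(News, newsItems);
// await News.insertMany(newsForDb);
```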