How to check if item in an array exists in an array of objects
I get data from an API and then save it in the database. I do this every few hours. Sometimes the data from the API is the same, in which case I don't want to add the record to the database again and end up with duplicate records. So currently I am fetching the data from the database and creating an array which holds one of the unique identifiers, like this:
const databaseArticles = await News.find({});
const databaseArticleSourceUrls = databaseArticles.map((item) => {
  return item.sourceUrl;
});
The array of objects from the API looks something like this:
const newsItems = [
  {
    title: titleOne,
    sourceUrl: urlOne,
  },
  {
    title: titleTwo,
    sourceUrl: urlTwo,
  },
];
const newsForDb = newsItems.map((item) => {
  return {
    title: item.title,
    sourceUrl: item.sourceUrl,
  };
});
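Once both arrays are in hand, the comparison on the JavaScript side is straightforward: put the known URLs in a `Set` and filter the API items against it. A minimal, self-contained sketch (the sample data here is illustrative, standing in for the database and API results above):

```javascript
// Sample data standing in for the database and API results.
const databaseArticleSourceUrls = ['https://example.com/a'];
const newsItems = [
  { title: 'Article A', sourceUrl: 'https://example.com/a' }, // already stored
  { title: 'Article B', sourceUrl: 'https://example.com/b' }, // new
];

// Known URLs from the database; a Set gives O(1) lookups.
const knownUrls = new Set(databaseArticleSourceUrls);

// Keep only the API items whose sourceUrl is not already stored.
const newsForDb = newsItems
  .filter((item) => !knownUrls.has(item.sourceUrl))
  .map(({ title, sourceUrl }) => ({ title, sourceUrl }));

console.log(newsForDb); // only Article B remains
```

`newsForDb` can then be passed to `insertMany` as before. Note that this check is done client-side, which is exactly the concurrency concern raised further down in this thread.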
So, basically, if a URL from the API already exists in the database, I want to ignore it and move on to the next one, and then finally insert all the new ones into the database, preventing duplicates.
I am not sure whether there is an easy way to do this with Mongoose/MongoDB, or whether I need to use JavaScript to compare the two arrays and then create a new array that I insert into the database.
For reference this is the insert:
try {
  await News.insertMany(newsForDb);
} catch (err) {
  console.log(err);
}
ASKER
@ste5an, I'm not sure what the database design has to do with it, since I can't control whether the API has been updated or not when the cron job runs?
This would ensure that you never have duplicates in your table. Without such a constraint, there is no guarantee that you won't insert a duplicate.
Furthermore, such a constraint raises an error when you try to insert a duplicate; you can handle it and act accordingly.
Whether you add additional duplicate checks in the consumer depends on how this table is used, because certain concurrency issues under load can only be solved with a UNIQUE constraint. Checking only in the consumer can lead to duplicates, or to rows not being inserted, if the checking code reads outdated data. In the latter case you need to decide which outcome is the bearable one, or you must switch to serialized data processing instead.
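In MongoDB/Mongoose, the equivalent of such a constraint is a unique index on `sourceUrl`. A sketch under that assumption (the schema shape here is inferred from the question, not taken from the asker's actual code): with `{ ordered: false }`, `insertMany` continues inserting the non-duplicate documents and surfaces the duplicate-key errors (MongoDB error code 11000) instead of stopping at the first one.

```javascript
const mongoose = require('mongoose');

// unique: true tells Mongoose to build a unique index, so MongoDB
// itself rejects any document whose sourceUrl is already stored.
const newsSchema = new mongoose.Schema({
  title: String,
  sourceUrl: { type: String, unique: true },
});
const News = mongoose.model('News', newsSchema);

async function saveNews(newsForDb) {
  try {
    // ordered: false -> duplicates are skipped, the rest are inserted.
    await News.insertMany(newsForDb, { ordered: false });
  } catch (err) {
    // 11000 is MongoDB's duplicate-key error code; duplicates are
    // expected here, anything else should still propagate.
    if (err.code !== 11000) throw err;
  }
}
```

One caveat: `unique` only works once the index actually exists, and building it on a collection that already contains duplicate `sourceUrl` values will fail until those are removed.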
ASKER
I think Crazy Horse is working with MongoDB, which is a NoSQL database.
The additional round trip to the server can lead to concurrency issues. Another thing to mention is that a consumer-side approach must carefully reload the data to avoid using cached, and thus possibly outdated, information.