Daniel Gillett asked:

Help! Parsing a Json file/result from the browser's LocalStorage.

Hello,

I need some help parsing a JSON object.
The JSON comes from the browser's localStorage.

I have the data already, but I am having trouble parsing it properly and I am now pressed for time.

I can evaluate and retrieve the data from localStorage:

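const fs = require("fs");

let json = await page.evaluate(() => {
    // getItem returns the stored value as a string (or null if the key is missing)
    const storage = localStorage.getItem("VistaOmnichannelComponents::browsing-domain-store");

    return storage;
});

console.log(json);
// json is already a JSON string, so write it as-is instead of stringifying it a second time
fs.writeFileSync(filepath, json);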


   To follow my tracks:
  1. Go to https://www.odeon.co.uk/cinemas/andover/
  2. Open Google Chrome dev tools -> Local Storage to look at the data.
  3. Use the evaluate code above (use Puppeteer if you need to write the evaluate function).
I can send the data here if you need it.  It is a .json file of about 60 KB.

User generated image

I need to extract:
  • sitesById,
  • filmById
  • censoryRating
  • genresById
  • filmIdsForQuery

The data that I get is difficult to parse as it's all in one big object.
FYI - I had to wrap the data in [ ] just to start getting anything out of it, but after hours I'm not far enough along and don't understand where or how to parse it correctly (many levels?).
User generated image
Here are a few sample data objects I created, ready for the data from the LocalStorage:
[sitesById.json]
{
    "id": 1234,
    "name": "London Leicester Square",
    "location": {
        "latitude": 51.5184,
        "longitude": -0.129
    },
    "contactDetails": {
        "phoneNumbers": [
            {"number": "123456789"}
        ],
        "email": "",
        "address": [
            {"line1": "30 Elmer Avenue"}, 
            {"line2":""}, 
            {"city": "London"}
        ],
        "ianaTimeZoneName": "Europe/London"
    }
}


One more sample [filmsById] ... (I can finish it once I get the parsing working for each group)
{
    "id":"HO00001344",
    "title":"Black Widow",
    "synopsis":"The MCU’s Phase 4 kicks off in cinemas with Scarlett Johansson’s Black Widow returning for a prequel solo adventure that delves into the Avenger’s prehistory as a Soviet assassin. It also introduces a mysterious and deadly new supervillain, Taskmaster, and prepares the way for new heroes to rise...",
    "shortSynopsis":"Black Widow confronts the darker parts of her ledger when a dangerous conspiracy with ties to her past arises.",
    "censorRatingId":"0000000001",
    "censorRatingNote":null,
    "releaseDate":"2021-07-07",
    "runtimeInMinutes":134,
    "genreIds": [
        {"0":"0000000001"},
        {"1":"0000000003"},
        {"2":"0000000022"}
    ],
    "categories": [
        {"0": "NowShowing"}
    ],
    "moviexchangeReleaseId":"509a78b4-fe5b-418d-88aa-db333855dfb4",
    "hopk":"HO00001344",
    "hoCode":"A000001558",
    "distributorName":"WALT DISNEY STUDIOS INTERNTL"
}


I'm new to this site.  I really need the urgent support.

Hope you're out there!

Cheers!
Julian Hansen

Use JSON.parse
const data = JSON.parse(localStorage.getItem('VistaOmnichannelComponents::browsing-domain-store'));
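
When the page is driven from Puppeteer rather than the browser console, the same idea might look like the sketch below. It assumes `page` is a Puppeteer Page, whose `evaluate(fn, ...args)` runs `fn` inside the browser; `readLocalStorageJson` is a hypothetical helper name, not something from the site or from Puppeteer.

```javascript
// Hypothetical helper: read one localStorage key from the page and parse it in Node.
// `page` is expected to behave like a Puppeteer Page (page.evaluate(fn, ...args)).
async function readLocalStorageJson(page, key) {
  // The callback runs inside the browser; only the returned string travels back to Node.
  const raw = await page.evaluate((k) => localStorage.getItem(k), key);
  if (raw === null) return null; // the key is not present on this page
  return JSON.parse(raw);
}
```

A call such as `await readLocalStorageJson(page, "VistaOmnichannelComponents::browsing-domain-store")` would then return the parsed store, or null when the key has not been written yet.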


Daniel Gillett (ASKER)

Thanks Julian,

I've done that.  It doesn't parse it all the way.  Strange that my browser can parse and display it as well as other editors, but in code - it doesn't work.

I can't get any further than this...

User generated image
Dot.Name doesn't work.
item['myKey'] also does not work.
...Lots of different looping and I still can't figure it out.
Been at this for 2 days now.  grr

Thanks for your time and consideration!
Question: do you own this page and want a script on it to pull the localStorage data
OR
are you mining the page for the data?

This works from the console for extracting sites
(function() { 
  const data = JSON.parse(localStorage.getItem('VistaOmnichannelComponents::browsing-domain-store'));
  const sites = [];
  for(let e in data['sitesById']) sites.push(data['sitesById'][e].data.payload);
  console.log(sites);
})();


Not sure how you want to consume the data.
The other data items you mention are similarly accessible.
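
As a rough illustration of "similarly accessible", the loop above could be generalised into one helper that works for any dataset name. This is only a sketch: it assumes every dataset wraps its records in the same `{ data: { payload } }` shape that sitesById uses, and `extractPayloads` and the sample store are made up for the example.

```javascript
// Hypothetical helper: collect the payload of every entry in one dataset.
// Assumes each entry looks like { data: { payload: ... } }, as sitesById does.
function extractPayloads(store, dataSetName) {
  const dataSet = (store && store[dataSetName]) || {};
  return Object.values(dataSet)
    .filter((entry) => entry && entry.data && entry.data.payload !== undefined)
    .map((entry) => entry.data.payload);
}

// Tiny hand-made store (not real Odeon data).
const sampleStore = {
  sitesById: {
    "001": { loadingState: "Success", data: { payload: { id: "001", name: "Site A" } } },
    "002": { loadingState: "Success", data: { payload: { id: "002", name: "Site B" } } },
  },
};

console.log(extractPayloads(sampleStore, "sitesById"));
```

In the browser console the first argument would be the parsed localStorage value, and an unknown dataset name simply yields an empty array.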
Hi Julian!

Excellent!  ...well, mostly!  There is still the address  [object] to work out.
I haven't done this level of data/json in a very long time. lol.
I think we're mostly there!  Just need the address [Object].

Do you think the code will be identical for the other section I need?
  • sitesById <-- complete
  • filmById
  • censoryRating
  • genresById
  • filmIdsForQuery

Odeon is interesting because of the "filmIdsForQuery".  It's a narrow list of ids that reference the cinema.  You have to go to that cinema or film page to get that list specifically, so the last part for me is to loop back over all of the sites and create these datasets.  From there I will use Mongo or something similar so that I can query the different datasets I want to use.
Q: Are you mining the page for the data?
A. Yes, but I'm not profiting.  It's mostly academic.

I'll get started on trying the items.
Do you think you can get the address [object] to work?  :-D

There is still the address  [object] to work out. 
In my output the address is there:
User generated image
As for the rest of the items - yes do as you would.

What I did was dump the localStorage value to the console (all the text) and then use https://codebeautify.org/jsonviewer (tree view) to view the data.
Makes it easy to see what is where.
Q: Are you mining the page for the data?
A. Yes, but I'm not profiting.  It's mostly academic.

My question was more about the solution. If you just need the data then just dump it to the console and copy it out (I think you have that already) - then use an online JSON to Excel tool (or similar - take a look at the link in my last post) to extract the data.

Easier than writing console JavaScript for this.
Hi Julian,
Thanks for your help yesterday.  I wasn't able to get back to you as soon as I wanted to.

I'm still struggling with the address info.  Not sure why.  Can you take a look?  I've modified the code slightly, but I haven't changed the actual solution that you provided.
To keep things DRY, I turned it into a generic block of code.  It seems that the address level is deeper than the code accounts for.
Any thoughts?

  // saveFile (a flag) and system.saveFile (a helper) are assumed to be defined elsewhere
  async evaluatePageStorage(filename, dataSetName) {
    try {
      const list = [];
      // Runs in the page context; the parsed store is serialised back to Node
      const data = await this.page.evaluate(() => {
        return JSON.parse(
          localStorage.getItem(
            "VistaOmnichannelComponents::browsing-domain-store"
          )
        );
      });

      // Each entry in a dataset wraps its record in data.payload
      for (const e in data[dataSetName])
        list.push(data[dataSetName][e].data.payload);

      if (saveFile)
        await system.saveFile(data, `data/${filename}-${dataSetName}`);
      else console.log(data);

      return list;
    } catch (error) {
      console.log(error);
    }
  },




The Odeon site is a real test!  Their data is separated across more than five datasets.  For example, it is not easy to get the list of movies playing, nor the showtimes for each movie at each location: you have to go to that particular cinema, and the data there is only a reference to the SiteId.  Basically, you have to crawl the entire site before you can make sense or use of the (relational) data.

  • sitesById <-- complete
  • filmById
  • censoryRating
  • genresById
  • filmIdsForQuery
  • showtimesById
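
Because the datasets only reference each other by id, putting them back together is essentially a lookup join. The sketch below shows one way that could work once the film records and a list of film ids have been extracted. The record shapes, the second film title, and the helper names `indexById` and `resolveFilms` are assumptions for illustration, not the site's actual structure.

```javascript
// Hypothetical join: turn a list of film ids into film records via an id lookup.
function indexById(records) {
  return new Map(records.map((r) => [r.id, r]));
}

function resolveFilms(filmIds, filmsById) {
  // Ids with no matching record are skipped rather than returned as undefined.
  return filmIds.filter((id) => filmsById.has(id)).map((id) => filmsById.get(id));
}

// Hand-made sample records (only the first title comes from the thread).
const films = [
  { id: "HO00001344", title: "Black Widow" },
  { id: "HO00001386", title: "Example Film" },
];
const filmsById = indexById(films);

console.log(resolveFilms(["HO00001344", "HO00009999"], filmsById));
```

The same pattern would apply to genres or censor ratings: index the referenced dataset once, then resolve ids against it.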

Not sure what you are looking for - here is an example of creating a new array of objects with the site id and address.
(function() {
  const data = JSON.parse(localStorage.getItem('VistaOmnichannelComponents::browsing-domain-store'));
  const sites = [];
  for(let e in data['sitesById']) sites.push(data['sitesById'][e].data.payload);
  const siteAddress = sites.map(s => ({id:s.id, address: s.contactDetails.address}));
  console.log(siteAddress);
})();




Thank you again for your time!  
The new line you put in:
const siteAddress = sites.map(s => ({id:s.id, address: s.contactDetails.address}));


Will that return the address info 'inside' the sitesById dataset?
Hi Julian,

There's one other problem.  Same as before, but the data is different.  I believe the id is the key: {object}.
Here's a sample...
User generated image
As you can see, I get a list of arrays with a list of objects inside each array.
Any ideas?
My address issue looks like this:

I have a generic evaluate function.
Then I have another block of code that takes the parsed data...
User generated image
User generated image
Will that return the address info 'inside' the sitesById dataset? 
It can - in this case I am returning the siteId and the address.

Can we establish what the real problem is - the data is there - so is the problem that in the output you are producing you are seeing the word "Object" instead of the data you expect or is there another problem?

I am a bit confused as to what target we are aiming at. I have demonstrated the data is there and accessible - so I'm not sure what still needs to be done.

PS: I have already seen your evaluatePageStorage - so no need to post that again. Work from the point where you have the data in localStorage, retrieve it and parse it - you can merge the result back into your final solution later.
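
If the symptom is the literal text [Object] or [Array] in Node's output, the data has usually been parsed correctly and is only being abbreviated: console.log stops expanding nested objects after a couple of levels by default. A minimal sketch with a hand-made site record (not real data) shows two ways to see everything:

```javascript
// Hand-made record, nested deep enough that console.log abbreviates it.
const sites = [
  {
    id: "001",
    contactDetails: { address: { line1: "30 Elmer Avenue", city: "London" } },
  },
];

console.log(sites); // the address level is printed as [Object]

// Option 1: serialise the whole structure; JSON.stringify has no depth limit.
const asJson = JSON.stringify(sites, null, 2);
console.log(asJson);

// Option 2: ask console.dir to expand every level.
console.dir(sites, { depth: null });
```

Either way the underlying object is unchanged, so `sites[0].contactDetails.address.line1` is readable in code even when the default log output hides it.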
I'm really sorry to confuse you.  I've had 3 hours sleep since Tuesday. haha
Some obvious things are escaping me.
Can we establish what the real problem is - the data is there - so is the problem that in the output you are producing you are seeing the word "Object" instead of the data you expect or is there another problem? 
Yes.  It's the object.  I know the data is there but I can't get my head around parsing it and keeping it all together, ready to go into a database (probably Mongo or MySQL, another time).

I really need to finish. I've got the address issue.
Then I have the second issue, as the data has a different structure and your fix doesn't work with it.  I posted an image above with this problem.  The first bit of data is the key (query_siteIds-009) and then it's [object] and, again, I can't seem to parse it.  :-|
Images don't really help as I will need to type your code out again - rather post the code.
You seem to be explicitly assigning properties from sub-properties of the data that already have those properties.

Post the code from the image and let's go from there.
Hi Julian,
I'm not sure exactly what you mean by "explicitly assigning properties from sub-properties".  What I am doing is using Puppeteer to crawl all of the Odeon pages.  Then, through localStorage, I am using evaluate to extract data (that you helped me parse).  Once it is parsed, I work out the properties that I need/want in the getOdeonSitesByIdFromLocalStorage() routine, because this data will be used in another data structure/system that I have.


Here is the code. (issue extracting address info)
  async evaluatePageStorage(filename, dataSetName) {
    try {
      const list = [];
      const data = await this.page.evaluate(() => {
        return JSON.parse(
          localStorage.getItem(
            "VistaOmnichannelComponents::browsing-domain-store"
          )
        );
      });

      for (let e in data[dataSetName])
        list.push(data[dataSetName][e].data.payload);

      if (saveFile) await system.saveFile(data, `raw/${filename}-raw`);
      else console.log(data);

      return list;
    } catch (error) {
      console.log(error);
    }
  },


The next block of code is the 'process' of the returned data from the 1st evaluate function.
  async getOdeonSitesByIdFromLocalStorage() {
    try {
      const sitesById = await this.evaluatePageStorage(
        "sitesById-data",
        "sitesById"
      );

      // Rebuild each site in the shape the downstream system expects
      const sites = [];
      for (const site of sitesById) {
        const properties = {};
        properties.id = site.id;
        properties.name = site.name ? site.name : "";
        properties.location = {
          latitude: site.location.latitude,
          longitude: site.location.longitude,
        };
        properties.contactDetails = {
          phoneNumbers: [{ number: site.contactDetails.number }],
          email: site.contactDetails.email,
          address: [
            { line1: site.contactDetails.address.line1 },
            { line2: site.contactDetails.address.line2 },
            { city: site.contactDetails.address.city },
            { postcode: site.contactDetails.address.postcode },
          ],
          ianaTimeZoneName: site.ianaTimeZoneName,
        };
        sites.push(properties);
      }
      await system.saveFile(sites, "/data/sitesById-data");
    } catch (error) {
      console.log(error);
    }
  },






The second problem...
User generated image
This code...
      for (let e in data[dataSetName])
        list.push(data[dataSetName][e].data.payload);


...doesn't parse the data because the data is a different structure.  When I try to parse it, I get just the list of ids, without the parent key to identify the list of ids with.

The lists of movies need their parent identifier.   ...here's the data:
{
  'query_siteIds-755': {
    loadingState: 'Success',
    data: { payload: [Array], expiresAt: [Object] }
  },
  'query_siteIds-845': {
    loadingState: 'Success',
    data: { payload: [Array], expiresAt: [Object] }
  },
  'query_siteIds-009': {
    loadingState: 'Success',
    data: { payload: [Array], expiresAt: [Object] }
  }
}
[
  'HO00001344', 'HO00001386', 'HO00001379',
  'HO00001413', 'HO00000369', 'HO00000909',
  'HO00000679', 'HO00001364', 'HO00001358',
  'HO00001464', 'HO00001711', 'HO00001965',
  'HO00001579', 'HO00001114', 'HO00001912',
  'HO00001909', 'HO00000406', 'HO00000367',
  'HO00001417', 'HO00000386', 'HO00000986',
  'HO00001421', 'HO00001466', 'HO00000395',
  'HO00001487', 'HO00001472', 'HO00000393',
  'HO00001415', 'HO00000409', 'HO00000388',
  'HO00001347', 'HO00000955'
]
[
  'HO00001344', 'HO00001386', 'HO00001379',
  'HO00001413', 'HO00000369', 'HO00000909',
  'HO00000679', 'HO00001364', 'HO00001358',
  'HO00001464', 'HO00001711', 'HO00001965',
  'HO00001579', 'HO00001114', 'HO00001912',
  'HO00001909', 'HO00000406', 'HO00000367',
  'HO00001417', 'HO00000386', 'HO00000986',
  'HO00001421', 'HO00001466', 'HO00000395',
  'HO00001487', 'HO00001472', 'HO00000393',
  'HO00001415', 'HO00000409', 'HO00001932',
  'HO00000388', 'HO00001347', 'HO00000955'
]
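
One possible way to keep the parent identifier with each list is to iterate over key/value pairs instead of pushing bare payloads. The sketch below assumes the dataset has the shape printed above (each key maps to an entry with `data.payload`); `payloadsByKey` is a hypothetical name and the sample is cut down to a few ids.

```javascript
// Keep each query key next to its payload instead of pushing bare payloads.
function payloadsByKey(dataSet) {
  const result = {};
  for (const [key, entry] of Object.entries(dataSet || {})) {
    // Entries without data are kept with an undefined payload rather than throwing.
    result[key] = entry && entry.data ? entry.data.payload : undefined;
  }
  return result;
}

// Cut-down sample in the same shape as the output above.
const filmIdsForQuery = {
  "query_siteIds-755": { loadingState: "Success", data: { payload: ["HO00001344", "HO00001386"] } },
  "query_siteIds-845": { loadingState: "Success", data: { payload: ["HO00001344"] } },
};

console.log(payloadsByKey(filmIdsForQuery));
```

The result is an object keyed by the original query key, so each array of film ids stays attached to the query it came from.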




ASKER CERTIFIED SOLUTION
Julian Hansen

This solution is only available to members.
Hi Julian!

You are correct.  I should have known better!  how silly!

So now I just have the one problem with that different data structure.  I should be able to do it, but I can't seem to keep the parent key.  :-O


You are correct.  I should have known better!  how silly! 
@Daniel apologies if that was the implication - if I had a nickel for each time I had a similar experience I would be a wealthy man.

I should be able to do it, but I can't seem to keep the parent key. 
If you give me some context I can assist.
Hi Julian!

Thanks for getting back to me again.  The last issue I have, which maybe you already solved, is that other data structure: the filmIdsForQuery dataset.
User generated image
In this case, filmIdsForQuery only gets populated as you visit a cinema's venue page.  If you go to more venues, more of these filmIdsForQuery objects appear.  My problem is that the 'key' is the first object, and its payload contains only the arrays of ids.
I'm sure you will do this easily.  Here's the code...

async evaluateFilmQueries(filename, dataSetName) {
    try {
      const list = [];
      const data = await this.page.evaluate(() => {
        return JSON.parse(
          localStorage.getItem(
            "VistaOmnichannelComponents::browsing-domain-store"
          )
        );
      });

      console.log(list);
      if (saveFile) await system.saveFile(data, `raw/${filename}-raw`);
      else console.log(data);

      return list;
    } catch (error) {
      console.log(error);
    }
  },



The data, again, is an array of objects.  But I have lost the parent 'key' that these lists belong to.

[
  'HO00001344', 'HO00001386', 'HO00001379',
  'HO00001413', 'HO00000369', 'HO00000909',
  'HO00000679', 'HO00001364', 'HO00001358',
  'HO00001464', 'HO00001711', 'HO00001965',
  'HO00001579', 'HO00001114', 'HO00001912',
  'HO00001909', 'HO00000406', 'HO00000367',
  'HO00001417', 'HO00000386', 'HO00000986',
  'HO00001421', 'HO00001466', 'HO00000395',
  'HO00001487', 'HO00001472', 'HO00000393',
  'HO00001415', 'HO00000409', 'HO00000388',
  'HO00001347', 'HO00000955'
]
[
  'HO00001344', 'HO00001386', 'HO00001379',
  'HO00001413', 'HO00000369', 'HO00000909',
  'HO00000679', 'HO00001364', 'HO00001358',
  'HO00001464', 'HO00001711', 'HO00001965',
  'HO00001579', 'HO00001114', 'HO00001912',
  'HO00001909', 'HO00000406', 'HO00000367',
  'HO00001417', 'HO00000386', 'HO00000986',
  'HO00001421', 'HO00001466', 'HO00000395',
  'HO00001487', 'HO00001472', 'HO00000393',
  'HO00001415', 'HO00000409', 'HO00001932',
  'HO00000388', 'HO00001347', 'HO00000955'
]


I tried to modify the code you did, but I still can't get it right.  ...it's been a long, tiring week. lol
       for (let e in data[dataSetName])
         list.push(data[dataSetName][e].data.payload);
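
Since the end goal mentioned in the thread is a database, another option is to flatten the dataset into one row per cinema/film pair, so the parent key travels with every id. This is only a sketch: it assumes the part after the last hyphen in a key such as query_siteIds-755 is the site id, which the thread does not confirm, and `toSiteFilmRows` is a hypothetical name.

```javascript
// Assumption: keys look like "query_siteIds-755", where "755" is the site id.
function toSiteFilmRows(dataSet) {
  const rows = [];
  for (const [key, entry] of Object.entries(dataSet || {})) {
    const siteId = key.split("-").pop();
    const filmIds = (entry && entry.data && entry.data.payload) || [];
    for (const filmId of filmIds) rows.push({ siteId, filmId });
  }
  return rows;
}

// Cut-down sample in the same shape as the filmIdsForQuery output in the thread.
const sampleQueries = {
  "query_siteIds-755": { loadingState: "Success", data: { payload: ["HO00001344", "HO00001386"] } },
  "query_siteIds-845": { loadingState: "Success", data: { payload: ["HO00001344"] } },
};

console.log(toSiteFilmRows(sampleQueries));
```

Rows of this kind map directly onto a join table in MySQL or a flat collection in Mongo, and can be matched against sitesById and the film records by id.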
