Vlearns

perl regex question

I have strings of the format

{text1},{text2},{text3},{text4}

I want to separate them such that

var 1 = {text1}
var 2 = {text2}
var 3 = {text3}
var 4 = {text4}

The problem is that any of text1 through text4 can itself contain nested { and }, and those nested braces have to be skipped when splitting.

I have access to Regexp-Common-2013031301 if that's needed.

thanks!
ozo

Can you give an example of nested text that we have to skip, and tell us what result you would want in such cases?
Vlearns

ASKER

{"ns:common.document.updated":["\"2011-06-24T09:06:14.859442Z\"^^xsd:dateTime"],"key:freebase.mid":["ns:m.0gwmqkf"],"ns:freebase.acre_doc.handler":["\"Script\"@ns:m.040c1n9"],"ns:common.document.content":["ns:m.0gwmxcr"],"\u003chttp://www.w3.org/1999/02/22-rdf-syntax-ns#type\u003e":["ns:common.document"],"ns:common.document.text":["\"/* -------------------------------------------------------- *\\n* RABJ Javascript Library (server side)\\n* -------------------------------------------------------- */\\n\\nvar rabj_store_path \u003d \\\"/rabj/store/\\\";\\nvar rabj_serve_path \u003d \\\"/rabj/serve/\\\";\\n\\nvar rabj_queues_path \u003d rabj_store_path + \\\"queues/\\\";       // fetch queue data\\nvar rabj_questions_path \u003d rabj_store_path + \\\"questions/\\\"; // fetch question data\\n\\nvar rabj_question_path \u003d rabj_serve_path + \\\"question\\\";    // serve questions\\nvar rabj_answer_path \u003d rabj_serve_path + \\\"answer\\\";        // submit answers\\n\\n/**\\n* Create a new RABJ queue.\\n*/\\nfunction make_queue(user_id, name, tags, access_key, mql, votes, callback) {\\n  \\n  if (user_id \u003d\u003d null || user_id \u003d\u003d \\\"\\\") {\\n    throw new Error(\\\"Must specify a valid Freebase user ID\\\");\\n  }\\n  \\n  if (name \u003d\u003d null || name \u003d\u003d \\\"\\\") {\\n    throw new Error(\\\"Must specify a name for the queue\\\");\\n  }\\n  \\n  tags \u003d tags || \\\"\\\";\\n  \\n  if (tags instanceof Array) {\\n    t \u003d tags;\\n  } else {\\n    var t \u003d tags.split(\\\" \\\");\\n    for (var i \u003d 0; i \u003c t.length; i++) {\\n      t[i] \u003d t[i].strip();\\n    }\\n  }\\n  \\n  if (mql \u0026\u0026 mql !\u003d \\\"\\\") {    \\n    var parsed_mql \u003d JSON.parse(mql);\\n    if (typeof parsed_mql.length \u003d\u003d \\\"undefined\\\") {\\n      parsed_mql.limit \u003d 1;\\n      var query \u003d [ parsed_mql ];\\n    } else {\\n      parsed_mql[0].limit \u003d 1\\n        var query \u003d parsed_mql;\\n    }\\n    
var result \u003d acre.freebase.mqlread(query);\\n    if (result.code !\u003d \\\"/api/status/ok\\\") {\\n      throw new Error(\\\"MQL query is invalid: \\\" + JSON.stringify(result));\\n    }\\n  }\\n  \\n  var payload \u003d {\\n    queue: {\\n      name: name,\\n      owner: user_id,\\n      access_key: null,\\n      tags: t\\n    }\\n  };\\n  \\n  if (votes instanceof Object) {\\n    payload.queue.votes \u003d votes;\\n  } else {\\n    payload.queue.votes \u003d parseInt(votes || 1);\\n  }\\n  \\n  if (access_key) {\\n    payload.queue.access_key \u003d access_key;\\n  }\\n  \\n  if (callback) {\\n    payload.queue.callback \u003d callback;\\n  }\\n  \\n  if (typeof parsed_mql !\u003d \\\"undefined\\\") {\\n    payload.queue.mql \u003d parsed_mql;\\n  }\\n  \\n  var headers \u003d {\\n    \\\"content-type\\\" : \\\"application/json\\\"\\n  };\\n  \\n  var url \u003d \\\"http://\\\" + _get_rabj_host() + rabj_queues_path;\\n  \\n  return JSON.parse(acre.urlfetch(url, \\\"POST\\\", headers, JSON.stringify(payload)).body);  \\n}\\n\\n/**\\n* Update the info of the given queue\\n*/\\nfunction update_queue_info(queue_info, access_key) {\\n  \\n  var headers \u003d {\\n    \\\"content-type\\\" : \\\"application/json\\\"\\n  };\\n  \\n  var url \u003d queue_info.url;\\n  \\n  // remove the stuff that we add on our end and that RABJ doesn\u0027t expect\\n  delete queue_info.url;\\n  delete queue_info.judgments;\\n  delete queue_info.complete;\\n  delete queue_info.incomplete;\\n  delete queue_info.questions;\\n  delete queue_info.created;\\n  delete queue_info.updated;\\n  delete queue_info.creator;\\n  delete queue_info.updater;\\n  delete queue_info.type;\\n  delete queue_info.mql;\\n  \\n  var payload \u003d {\\n    \\\"queue\\\" : queue_info,\\n    \\\"access_key\\\" : access_key\\n  };\\n  \\n  return JSON.parse(acre.urlfetch(url, \\\"PUT\\\", headers, JSON.stringify(payload)).body);\\n}\\n\\n/**\\n* Return a list of all the public RABJ queues.\\n*/\\nfunction 
get_public_queues() {\\n  var url \u003d \\\"http://\\\" + _get_rabj_host() + rabj_queues_path + \\\"public\\\";\\n  return _json_request(url);\\n}\\n\\n/**\\n* Find all the RABJ queues that match the given tags.\\n* Tags can be an array or a space-separated string.\\n* If no tag is provided, list all the queues that can be accessed\\n* with this apps\u0027s access key\\n*/\\nfunction get_queues(tags, with_info) {\\n  if (typeof tags !\u003d \\\"undefined\\\") {\\n    if (typeof tags \u003d\u003d \\\"string\\\") {\\n      tags \u003d tags.split(\u0027 \u0027);\\n    }\\n    var url \u003d \\\"http://\\\" + _get_rabj_host() + rabj_queues_path + \\\"tags?\\\" + _get_access_key_param();\\n    for each (var t in tags) {\\n      url +\u003d \\\"\u0026tag\u003d\\\" + acre.form.quote(t);\\n    }\\n  } else {\\n    var url \u003d \\\"http://\\\" + _get_rabj_host() + rabj_queues_path + \\\"access_key?\\\" + _get_access_key_param();\\n  }\\n  var result \u003d _json_request(url);\\n  if (with_info) {\\n    for each (var q in result) {\\n      var queue_id \u003d q.id;\\n      q.qid \u003d queue_id.split(\\\"/\\\").pop();\\n      q.info \u003d get_queue_info(q, true);\\n    }\\n  }\\n  return result;\\n}\\n\\n/**\\n* Filter out the queues with the given tag\\n*/\\nfunction filter_queues(tag, queues) {\\n  var filtered_queues \u003d [];\\n  for each (var q in queues) {\\n    var tags \u003d q.tags;\\n    var found \u003d false;\\n    for each (var t in q.tags) {\\n      if (t \u003d\u003d tag) found \u003d true;\\n    }\\n    if (!found) filtered_queues.push(q);\\n  }\\n  return filtered_queues;\\n}\\n\\n/**\\n* Obtain information about the queue with the given ID\\n*/\\nfunction get_queue_info(q, with_status, callback) {\\n  \\n  if (typeof with_status \u003d\u003d \\\"undefined\\\") with_status \u003d true;\\n  var queue_id \u003d (typeof q \u003d\u003d \u0027string\u0027) ? 
q : q.id;\\n  var url \u003d \\\"http://\\\" + _get_rabj_host() + _get_queue_path(queue_id);\\n  \\n  if (typeof callback \u003d\u003d \u0027function\u0027) {\\n    var build_wrapper \u003d function(callback, queue_id, url) {\\n      return function(result) {\\n        callback(_compute_queue_info(queue_id, url, result));\\n      };\\n    };\\n    _json_request(url + ((with_status) ? \\\"/status?\\\" : \\\"?\\\") + _get_access_key_param(), \\\"GET\\\", null, build_wrapper(callback, queue_id, url));\\n  } else {\\n    return _compute_queue_info(queue_id, url, _json_request(url + ((with_status) ? \\\"/status?\\\" : \\\"?\\\") + _get_access_key_param()));\\n  }\\n}\\n\\n/**\\n* Obtain a summary }


ozo

And what result would you want in that case?
Vlearns

ASKER

Is it possible in general to tokenize on the outer {} by splitting on the commas?

thanks!
Vlearns

ASKER

{hello},{hello{}{}},{xxxxxx},{jjjjjjj}

var1 = {hello}
var2 = {hello{}{}}
var3 = {xxxxxx}
var4 = {jjjjjjj}
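
A minimal sketch of one way to get that result in Perl 5.10 or later, using a recursive pattern. The helper name `split_outer_braces` is made up for illustration, and the sketch assumes the braces are balanced and never appear inside quoted strings.

```perl
use strict;
use warnings;

# Hypothetical helper: return each top-level {...} group from a string,
# treating nested braces as part of the enclosing group.
sub split_outer_braces {
    my ($s) = @_;
    my @groups;
    # (?1) recurses into capture group 1, so nesting of any depth is matched.
    # [^{}]++ consumes runs of non-brace characters without backtracking.
    while ($s =~ /(\{(?:[^{}]++|(?1))*\})/g) {
        push @groups, $1;
    }
    return @groups;
}

my @vars = split_outer_braces('{hello},{hello{}{}},{xxxxxx},{jjjjjjj}');
print "var$_ = $vars[$_ - 1]\n" for 1 .. @vars;
```

Because the pattern matches the groups rather than splitting on the separators, commas inside a group stay with that group. Regexp::Common's `$RE{balanced}{-parens=>'{}'}` pattern is aimed at the same job and may be worth trying, since that distribution is already available.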
ozo

Simply splitting on , might work for the example in http:#a39557886, but that won't work if you mean to ignore a , inside of {}.
The example in http:#a39557882 contains what looks like JavaScript. If you mean to parse it as JavaScript, you may want to ignore { or } inside quoted strings, and to ignore quotes that are inside comments or that are escaped. It also looks like your data has an additional layer of escaping on top of that.

Maybe you want to use JSON::Parse
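
A rough sketch of the JSON route follows. It uses the core module JSON::PP rather than JSON::Parse, purely so that it runs without installing anything. The sample string is made up for illustration: it reuses one key from the posted data and adds a short string value containing braces and a comma.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module, swapped in here for JSON::Parse

# Made-up sample modelled on the posted data. The braces and the comma inside
# the second value are ordinary characters once the JSON has been decoded.
my $json = '{"key:freebase.mid":["ns:m.0gwmqkf"],'
         . '"ns:common.document.text":["function f() { return 1, 2; }"]}';

my $data = decode_json($json);

# Each key maps to an array reference; print the first element of each.
for my $key (sort keys %$data) {
    print "$key => $data->{$key}[0]\n";
}
```

A JSON decoder already knows about quoted strings and escape sequences, so no brace counting is needed. Note that a comma-separated run of objects such as {..},{..} is not a JSON document by itself; if each piece is valid JSON, wrapping the whole string in [ and ] should turn it into an array that can be decoded in one call.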
Vlearns

ASKER

But would it not be possible to just tokenize the string based on commas and outer braces?
ozo

Yes, but your example in http:#a39557882 suggests that tokenizing on commas and outer braces alone may not be adequate. Quoted strings and escape sequences may also have to be considered, and I'm not sure what else.
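
To illustrate the concern about quoted strings and escapes, here is a hedged sketch of a character-by-character tokenizer. The helper name `split_top_level` is hypothetical. It assumes only double-quoted strings with backslash escapes (as in JSON) and balanced braces; it does not handle comments or the extra layer of escaping mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas at brace depth 0, ignoring braces and
# commas inside double-quoted strings and honouring backslash escapes.
sub split_top_level {
    my ($s) = @_;
    my @parts;
    my ($depth, $in_string, $current) = (0, 0, '');
    my @chars = split //, $s;
    while (@chars) {
        my $c = shift @chars;
        if ($in_string) {
            $current .= $c;
            if ($c eq '\\') {
                # Copy the escaped character verbatim so \" does not end the string.
                $current .= shift @chars if @chars;
            }
            elsif ($c eq '"') {
                $in_string = 0;
            }
            next;
        }
        if    ($c eq '"') { $in_string = 1 }
        elsif ($c eq '{') { $depth++ }
        elsif ($c eq '}') { $depth-- }
        elsif ($c eq ',' && $depth == 0) {
            # Top-level comma: finish the current token and drop the comma.
            push @parts, $current;
            $current = '';
            next;
        }
        $current .= $c;
    }
    push @parts, $current if length $current;
    return @parts;
}

# The first group hides "},{" inside a string; the second hides an escaped
# quote followed by "}" inside a string.
my @parts = split_top_level('{"a":"x},{y"},{"b":{"c":"\\"}"}}');
print "$_\n" for @parts;
```

A plain brace-counting pattern would cut the first group at the } inside the string, which is the kind of failure the tokenizer is meant to avoid.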
ASKER CERTIFIED SOLUTION
ozo