MarkMcAndrews

A complicated character extraction in R

I've got a character vector with many elements that look like this:

katimajjutiksaq+{kati:katit/1v}{ma:ma/1vv}{jjuti:jjut/1vn}{ksaq:ksaq/1nn}
tuksiarnirmut+{tuksiar:tuksiaq/1v}{nir:niq/2vn}{mut:mut/tn-dat-s}
mista+{mista:mista/1n}
and so on. Each element consists of a whole Inuktitut word, followed by a +, followed by any number of curly-braced tags in the format {part of the original word:some letters/a combination of letters, numbers and dashes}.

I would like to extract only the character strings between : and } so that each one becomes its own element. For the lines above, the result would look like this:

katit/1v
ma/1vv
jjut/1vn
ksaq/1nn
tuksiaq/1v
niq/2vn
etc

I have tried using strsplit and regular expressions but my code is clumsy and long.  Can you suggest a good way?
ASKER CERTIFIED SOLUTION
Anwar Saiah


Edited version (tested!). Note that this is PHP rather than R:

<?php
$string = "katimajjutiksaq+{kati:katit/1v}{ma:ma/1vv}{jjuti:jjut/1vn}{ksaq:ksaq/1nn}
tuksiarnirmut+{tuksiar:tuksiaq/1v}{nir:niq/2vn}{mut:mut/tn-dat-s}
mista+{mista:mista/1n}";

// Return the text between the next $start and the following $end, searching
// from $offset. $offset is advanced past the match so the next call continues
// from there. Returns null when no further $start/$end pair exists.
function get_string_between($string, $start, $end, &$offset) {
    $ini = strpos($string, $start, $offset);
    if ($ini === false) return null;
    $ini += strlen($start);
    $stop = strpos($string, $end, $ini);
    if ($stop === false) return null;
    $offset = $stop + strlen($end);
    return substr($string, $ini, $stop - $ini);
}

// Walk through the string once and print every tag on its own line.
// Scanning by offset (instead of deleting each match with str_replace)
// keeps repeated tags, which str_replace would remove all at once.
$offset = 0;
while (($parsed = get_string_between($string, ":", "}", $offset)) !== null) {
    echo $parsed . "<br>";
}
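
Since the question asks about R, here is a minimal sketch of the same extraction in base R. It is not part of the original answer. It assumes the data sits in a character vector (called words here, a made-up name) and that the tags themselves never contain a stray : or }. The pattern uses lookarounds, so perl = TRUE is required.

```r
# Sample input copied from the question; "words" is a hypothetical name.
words <- c(
  "katimajjutiksaq+{kati:katit/1v}{ma:ma/1vv}{jjuti:jjut/1vn}{ksaq:ksaq/1nn}",
  "tuksiarnirmut+{tuksiar:tuksiaq/1v}{nir:niq/2vn}{mut:mut/tn-dat-s}",
  "mista+{mista:mista/1n}"
)

# Match each run of non-"}" characters that is preceded by ":" and
# followed by "}". The lookbehind/lookahead keep ":" and "}" out of the match.
pattern <- "(?<=:)[^}]+(?=\\})"

# gregexpr finds all matches per element, regmatches extracts them as a list,
# and unlist flattens the list so each tag is its own element.
tags <- unlist(regmatches(words, gregexpr(pattern, words, perl = TRUE)))
print(tags)
```

Dropping the unlist() call keeps the result as a list with one entry per original word, which may be handy if the tags need to stay grouped by word.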
MarkMcAndrews

ASKER

Awesome, thanks!
I've requested that this question be closed as follows:

Accepted answer: 0 points for MarkMcAndrews's comment #a40781600

for the following reason:

It outputs the desired output.
"Awesome, thanks! It outputs the desired output."

And then you close the question with no points!!!

OOOOObjection!!
How can the reason be "It outputs the desired output" when the question was resolved by an expert's solution? Please review the rules for closing on your own comment.
Sorry, I don't know what I did wrong. I thought I clicked on "accept solution" for your answer. Thanks so much - it was a great answer and I really appreciate your help. Sorry that I messed up somehow. I hope that I fixed the problem by accepting your solution. My bad!
Glad to have helped.

Think nothing of it, no harm done ;)
Good luck with your code.