MarkMcAndrews
A complicated character extraction in R
I've got a character vector with many elements that look like this:
katimajjutiksaq+{kati:kati t/1v}{ma:m a/1vv}{jju ti:jjut/1v n}{ksaq:ks aq/1nn}
tuksiarnirmut+{tuksiar:tuk siaq/1v}{n ir:niq/2vn }{mut:mut/ tn-dat-s}
mista+{mista:mista/1n}
etc., where each line consists of a whole Inuktitut word, followed by a +, followed by any number of curly-braced tags in the format {part of the original word:some letters/a combination of letters, numbers and dashes}.
I would like to extract only the character strings between : and } so each one is its own element. Using the lines above, it would look like this:
katit/1v
ma/1vv
jjut/1vn
ksaq/1nn
tuksiaq/1v
niq/2vn
etc
I have tried using strsplit and regular expressions but my code is clumsy and long. Can you suggest a good way?
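One possible approach, offered as a minimal sketch rather than the accepted answer: base R's gregexpr() and regmatches() can pull out every substring that sits between a : and the next }. The variable names (x, pattern, tags) are made up for illustration, and the final step assumes the stray spaces inside the tags are unwanted, since the desired output shown above has none.

```r
# Sample input copied from the question; the stray spaces inside the tags
# are kept exactly as posted.
x <- c(
  "katimajjutiksaq+{kati:kati t/1v}{ma:m a/1vv}{jju ti:jjut/1v n}{ksaq:ks aq/1nn}",
  "tuksiarnirmut+{tuksiar:tuk siaq/1v}{n ir:niq/2vn }{mut:mut/ tn-dat-s}",
  "mista+{mista:mista/1n}"
)

# Match everything after a ':' up to (but not including) the next '}'.
# The lookbehind and lookahead need perl = TRUE.
pattern <- "(?<=:)[^}]*(?=\\})"

# gregexpr() locates every match in each element; regmatches() extracts them
# as a list, which unlist() flattens into one character vector.
tags <- unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))

# Assumption: spaces inside the tags are unwanted, so strip them.
tags <- gsub(" ", "", tags, fixed = TRUE)

print(tags)
```

With the three sample lines this gives eight elements, starting with katit/1v, ma/1vv, jjut/1vn and ksaq/1nn. If the spaces are meaningful in the real data, drop the gsub() line.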
ASKER CERTIFIED SOLUTION
ASKER
Awesome, thanks!
ASKER
I've requested that this question be closed as follows:
Accepted answer: 0 points for MarkMcAndrews's comment #a40781600
for the following reason:
It outputs the desired output.
And then close the question with no points!!!
OOOOObjection!!
How can the closing reason be "It outputs the desired output." when the problem was resolved by an EE expert's solution? Please review the rules for closing on your own comment.
ASKER
Sorry, I don't know what I did wrong. I thought I clicked on "accept solution" for your answer. Thanks so much - it was a great answer and I really appreciate your help. Sorry that I messed up somehow. I hope that I fixed the problem by accepting your solution. My bad!
Glad to have helped.
Think nothing of it, no harm done ;)
Good luck with your code.
$string = "katimajjutiksaq+{kati:kat
tuksiarnirmut+{tuksiar:tuk
mista+{mista:mista/1n}";
function get_string_between($string
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
$tmp = substr($string,$ini,$len);
return $tmp;
}
//$tmp = get_string_between($string
//echo $string."</br>";
//echo str_replace(":".$tmp."}","
while(get_string_between($
{
$parsed = get_string_between($string
$string=str_replace(":".$p
echo $parsed."<br>";
}