Explore SharePoint 2016, the web-based, collaborative platform that integrates with Microsoft Office to provide intranets, secure document management, and collaboration so you can develop your online and offline capabilities.

Hi all,

I think this is a pretty interesting algorithm problem so I'll post a link under 'programming' as well. But I'm hoping for a python solution, so it's going to live here.

I have a list of items like this, *as strings*.

C.x

C.p

C.T

C.delta[1]

C.delta[2]

C.delta[3]

C.sat.x

C.sat.p

C.h

C.delta[5]

and I would like to convert that into something more readable, like

C: x, p, T, delta[1-3,5], h

C.sat: x, p

* values in brackets can be numbers, in which case they can be assumed integers, and sequences of integers should be condensed like [1,2,3,5] to [1-3,5]

* values in brackets can also be 'symbols', in which case they are just listed out, eg [symbol1, symbol2, symbol3].

* brackets can act as a suffix for any node, not just the final one

* nodes are either array member nodes like x in "A[1]", or symbol nodes, like "A[symbol1]" or named nodes, like "A.x".

* the algorithm should zipper together as much as possible, as shown

Optional: it would be cool if

A.x[1].p

A.x[2].p

could be output as "A: x[1-2].p". Obviously this makes the problem quite a bit harder.

I thought that it might be possible to tackle this using python's list.reduce(func) method. But I haven't had any luck yet. The only way this is going to be solvable seems to be to re-construct the 'tree' then reduce it somehow with a top-down approach of some sort.

Cheers

JP

I think this is a pretty interesting algorithm problem so I'll post a link under 'programming' as well. But I'm hoping for a python solution, so it's going to live here.

I have a list of items like this, *as strings*.

C.x

C.p

C.T

C.delta[1]

C.delta[2]

C.delta[3]

C.sat.x

C.sat.p

C.h

C.delta[5]

and I would like to convert that into something more readable, like

C: x, p, T, delta[1-3,5], h

C.sat: x, p

* values in brackets can be numbers, in which case they can be assumed integers, and sequences of integers should be condensed like [1,2,3,5] to [1-3,5]

* values in brackets can also be 'symbols', in which case they are just listed out, eg [symbol1, symbol2, symbol3].

* brackets can act as a suffix for any node, not just the final one

* nodes are either array member nodes like x in "A[1]", or symbol nodes, like "A[symbol1]" or named nodes, like "A.x".

* the algorithm should zipper together as much as possible, as shown

Optional: it would be cool if

A.x[1].p

A.x[2].p

could be output as "A: x[1-2].p". Obviously this makes the problem quite a bit harder.

I thought that it might be possible to tackle this using python's list.reduce(func) method. But I haven't had any luck yet. The only way this is going to be solvable seems to be to re-construct the 'tree' then reduce it somehow with a top-down approach of some sort.

Cheers

JP

Do more with

EXPERT OFFICE^{®} is a registered trademark of EXPERTS EXCHANGE^{®}

We could take this:

C.x[12].pqzed.o

and as it looped through, it would be stored as follows:

C -> it's a letter, store it in sub1 then check next value

. -> it's a dot, so we'll take it as a separation

x -> it's a letter, so store it in sub2 and check next value

[ -> ooh it's a bracket, so we store not only the letter before it but also the element(s) inside it as well as the number itself in

] -> end the bracket

. -> next term

p -> letter, check next...

q -> letter, check next...

z -> letter, check next...

e -> letter, check next...

d -> letter, check next...

. -> terminate the last string and start new...

o -> letter, end of term...

Archive the term and start fresh for the next one...

Once you've achieved this, you'll want to do data comparisons...

Check each of the sub-terms to see if they're the same or at least similar, differing only by numbers... and if so, order the numbers...

As for the 1-7, that's a simple counting algorithm... compare the numerical values, and increment to see if they're in sequence... if so, continue until that sequence is broken, and once it's broken you have the starting term and the term that breaks the cycle(minus one) and just separate them with a dash... then, continue for the rest of them... I could probably write this up in a day or two although I write in C/C++ more than Python...

I hope this helps.

-aNthrax

from operator import itemgetter

def fold(data):

""" fold sorted numeric sequence data into ranged representation:

>>> fold([1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28])

'[1,4-6,10,15-18,22,25-28]

"""

folded = []

for k, g in groupby(enumerate(data), lambda (i,x):i-x):

seq = map(itemgetter(1), g)

if len(seq) > 1:

x = '%s-%s' % (seq[0], seq[-1])

else:

x = str(seq[0])

folded.append(x)

return folded and '[%s]' % ','.join(folded) or ''

def reduce(names):

"""reduce a list of items nto something more readable:

>>> data = 'C.x C.p C.T C.delta[1] C.delta[2] C.delta[3] C.sat.x C.sat.p C.h C.delta[5]'.split()

>>> res = reduce(data)

>>> for k in sorted(res):

... print '%s: %s' % (k, res[k])

C: T, delta[1-3,5], h, p, x

C.sat: p, x

"""

data = sorted([n.split('.') for n in names], key=len)

res = {}

for k, g in groupby(data, lambda x: len(x)):

item = g.next()

assert len(item) == k

key = '.'.join(item[:-1])

indexed = {}

seq = set(get(indexed, item))

for item in g:

seq.add(get(indexed, item))

res[key] = ', '.join(i+fold(indexed.get(

return res

def get(indexed, item):

item = item[-1]

if item.endswith(']'):

item, idx = item[:-1].split('[')

indexed.setdefault(item, []).append(int(idx))

return item

Enjoy!

key = '.'.join(item[:-1]) or ''

seq = set([get(indexed, item)])

Global names will be grouped under ''. Put whatever name you want (eg. 'GLOBAL') instead of ''.

def fold(data):

""" fold sorted numeric sequence data into ranged representation:

>>> fold([1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28])

'[1,4-6,10,15-18,22,25-28]

"""

folded = []

for k, g in groupby(enumerate(sorted(d

seq = map(itemgetter(1), g)

if len(seq) > 1:

x = '%s-%s' % (seq[0], seq[-1])

else:

x = str(seq[0])

folded.append(x)

return folded and '[%s]' % ','.join(folded) or ''

def reduce(names):

"""reduce a list of items nto something more readable:

>>> data = 'C.x C.p C.T C.delta[1] C.delta[2] C.delta[3] C.sat.x C.sat.p C.h C.delta[5]'.split()

>>> res = reduce(data)

>>> for k in sorted(res):

... print '%s: %s' % (k, res[k])

C: T, delta[1-3,5], h, p, x

C.sat: p, x

"""

data = sorted([n.split('.') for n in sorted(names)], key=len)

res = {}

for k, g in groupby(data, lambda x: len(x)):

if k == 1:

indexed = {}

seq = set(get(indexed, item) for item in g)

res['[global]'] = ', '.join(i+fold(indexed.get(

else:

for key, g1 in groupby(g, lambda x: '.'.join(x[:-1])):

indexed = {}

seq = set(get(indexed, item) for item in g1)

res[key] = ', '.join(i+fold(indexed.get(

return res

## Premium Content

You need an Expert Office subscription to comment.Start Free Trial