Solved

Joining numpy arrays in Python

Posted on 2010-09-21
4
1,190 Views
Last Modified: 2012-05-10
Greetings all:

Imagine I have two 2-dim numpy arrays:

xs = array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4]
])

[
    [x11, x12, x13, x14, x15],
    [x21, x22, x23, x24, x25],
    [x31, x32, x33, x34, x35],
    [x41, x42, x43, x44, x45],
    [x51, x52, x53, x54, x55]
]

ys= array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5],
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0],
    [2945, 122.0, 123.0, 124.0]
])

[
    [y11, y12, y13, y14, y15],
    [y21, y22, y23, y24, y25],
    [y31, y32, y33, y34, y35],
    [y41, y42, y43, y44, y45],
    [y51, y52, y53, y54, y55]
]

Assume the first column of each matrix is a timestamp which can be compared to each other (e.g. TS0 < TS1 == True). I'm using the minutes and seconds only for sake of brevety (e.g. 59 is 59 seconds after the hour/minute; 1727 is 17 minutes 27 seconds after the hour, etc)

The return value must be a third matrix, nm, where the first timestamp column will survive and the remaining columns will be joined in one of two ways:

1. First way is to join the arrays if either element has changed.

So the result should look like this:

([
    [159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5],
    [1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5],
    [1452, 24.5, 25.5, 26.5, 4, 122.0, 123.0, 124.0],
    [1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0, 124.0],
    [2252, 22.0, 23.0, 24.0, 4, 122.0, 123.0, 124.0]
])
[
    [y11, x12, x13, x14, x15, y12, y13, y14], # element in y changes; x does not
    [y21, x12, x13, x14, x15, y22, y23, y24], # element in y changes; x does not
    [y41, x12, x13, x14, x15, y42, y43, y44], # element in y changes; x does not
    [x21, x22, x23, x24, x25, y42, y43, y44], # element in y does not change; x does
    [x41, x42, x43, x44, x45, y42, y43, y44], # element in y does not change; x does
]

2. Second way is to join the arrays if both elements have changed.

So the result should look like this:

([
    [159, 24.5, 25.5, 26.5, 4, 124.5, 125.5],
    [1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0]
])

([
    [y11, x12, x13, x14, x15, y12, y13, y14],
    [x21, x22, x23, x24, x25, y42, y43, y44]
])

I've made far too many attempts to post code here, but essentially I've been able to return a matrix to (kind of) match case 1:

[
    [159 124.5 125.5 126.5],
    [1227 121.5 122.5 123.5],
    [1452 122.0 123.0 124.0],
    [1727 21.5 22.5 23.5 9],
    [2252 22.0 23.0 24.0 4]
]

but this is not complete of course because it does not actually "join the arrays".
0
Comment
Question by:strimp101
  • 2
  • 2
4 Comments
 
LVL 9

Expert Comment

by:zaghaghi
ID: 33731924
I think that it's not well defined! Please make it clear.
0
 

Author Comment

by:strimp101
ID: 33734148
Thanks for the response - this is a bit difficult for me to explain, but here goes:

The arrays represent price series for derivative (credit default swaps) securities. The goal of the project is to use the joined matrix to run robust regressions (OLS rejection, winsor, ORD, etc.).

The securities trade infrequenty without a "closing price" like stocks so we cannot compare them arbitrarily by lining them in up in series. So the two methods I outlined are ways to join the series in ways suitable for regression analysis.

For sake of argument, use the first two columns of the matricies - column 1 representing a timestamp and column 2 representing the price.

X starts at price 24.5 at time 59 (first row) and does not change until it changes to price 21.5 at time 1727 (second row).

In the meantime, price Y changes from 124.5 at time 159 (first row) to 121.5 at time 1227 (second row).

So at this point, nm should have two rows:

159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5;
1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5;

Column 1 is the timestamp from matrix y.
Columns 2 - 5 are the values from matrix x (note the value is the same in both rows because the value did not change between time 159 and 1227).
Columns 6 - 8 are the values from matrix y (note the values differ in each row because the y values change at time 159 and 1227).

Y does not change again until time 1452 to price 122.

So Y changes three total times.

While Y is changing from time 159, x is not changing.

So if we were to put time times in order (column 1 of both matricies), the times would be:

59 from x < the x value from this timestamp persists at each y value change below until at least the next x change at time 1840
159 from y
1227 from y
1340 from y
1452 from y
1840 from x
2252 from x
2445 from x
2945 from y

The only time we record a price is when one or the other change from the previous value.
0
 
LVL 9

Accepted Solution

by:
zaghaghi earned 500 total points
ID: 33736545
Hi,

After an hour, i think that i understand what you want and you can find out the script needed for the first case.

#import what we need for array operations
from numpy import array,  concatenate,  insert,  delete,  allclose

#first array
xs = array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4]
])

#second array
ys = array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5], 
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0], 
    [2945, 122.0, 123.0, 124.0]
])

# we walk on time, start from min timestamp to max timestamp
#x array contains last xs[xi] that we pass its time
x = array([])
#x array contains last ys[yi] that we pass its time
y = array([])
#contains un filtered result, i.e. contanins additional rows that are not needed
result = array([[0., 0., 0., 0., 0., 0., 0., 0.]])
#xi and yi are indices for xs and xy respectively
xi = 0
yi = 0
#here we start, until one of xs or ys array completely  checked
while xi < len(xs) and yi < len(ys):
    #if time stamp of xs is smaller than ys, then we update result array by inserting a new row to it containing timestamp, x and y
    if xs[xi][0] < ys[yi][0]:
        x = xs[xi][1:]
        #this if checks that y is not empty(i.e for the first time)
        if len(y) > 0:
            # make new row with smallest timestamp x and y
            r = concatenate(([xs[xi][0]] ,  x ,  y))
            #insert the new row to the result array
            result = insert(result, len(result),   r,  axis=0)
        #increase xi by one
        xi = xi + 1
    else:
        y = ys[yi][1:]
        #this if checks that x is not empty(i.e for the first time)
        if len(x) > 0:
            # make new row with smallest timestamp x and y
            r = concatenate(([ys[yi][0]] ,  x ,  y))
            #insert the new row to the result array
            result = insert(result, len(result),   r,  axis=0)
        #increase yi by one
        yi = yi + 1

#if xs does not completely checked, this while help us to check remaining items in it
while xi < len(xs):
        x = xs[xi][1:]
        r = concatenate(([xs[xi][0]] ,  x ,  y))
        result = insert(result, len(result),   r,  axis=0)
        xi = xi + 1
#and if ys does not completely checked, this while help us to check remaining items in it, note that only one of these two whiles executed!
while yi < len(ys):
        y = ys[yi][1:]
        r = concatenate(([ys[yi][0]] ,  x ,  y))
        result = insert(result, len(result),   r,  axis=0)
        yi = yi + 1

#remove row number i if its x and y is the same as row number i-1.
for i in reversed(range(1, len(result))):
    if allclose(result[i][1:], result[i-1][1:]):
        result = delete(result,  i,  axis=0)

#delete the first row i.e. [0.,0.,0.,0.....
result = delete(result,  0,  axis=0)

print(result)

Open in new window

0
 

Author Closing Comment

by:strimp101
ID: 33771684
Nice.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
drawing animated level bar based on numbers 3 86
move one pixel 4 58
C# code editing and collaboration 3 98
Not needed 13 93
Displaying an arrayList in a listView using the default adapter is rarely the best solution. To get full control of your display data, and to be able to refresh it after editing, requires the use of a custom adapter.
Computer science students often experience many of the same frustrations when going through their engineering courses. This article presents seven tips I found useful when completing a bachelors and masters degree in computing which I believe may he…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…

914 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now