Greetings all:

Imagine I have two 2-dim numpy arrays:

xs = array([

[59, 24.5, 25.5, 26.5, 4],

[1727, 21.5, 22.5, 23.5, 9],

[1840, 21.5, 22.5, 23.5, 9],

[2252, 22.0, 23.0, 24.0, 4],

[2445, 22.0, 23.0, 24.0, 4]

])

[

[x11, x12, x13, x14, x15],

[x21, x22, x23, x24, x25],

[x31, x32, x33, x34, x35],

[x41, x42, x43, x44, x45],

[x51, x52, x53, x54, x55]

]

ys = array([

[159, 124.5, 125.5, 126.5],

[1227, 121.5, 122.5, 123.5],

[1340, 121.5, 122.5, 123.5],

[1452, 122.0, 123.0, 124.0],

[2945, 122.0, 123.0, 124.0]

])

[

[y11, y12, y13, y14],

[y21, y22, y23, y24],

[y31, y32, y33, y34],

[y41, y42, y43, y44],

[y51, y52, y53, y54]

]

Assume the first column of each matrix is a timestamp and that timestamps can be compared to each other (e.g. TS0 < TS1 == True). I'm using the minutes and seconds only for the sake of brevity (e.g. 59 is 59 seconds after the hour; 1727 is 17 minutes 27 seconds after the hour, etc.).
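The MMSS-style stamps compare correctly as plain numbers, but they are not evenly spaced in time. If elapsed time is ever needed, a minimal sketch of a conversion to seconds could look like this (the helper name `mmss_to_seconds` is made up for illustration, and it assumes the stamps are whole numbers with the seconds part below 60):

```python
import numpy as np

def mmss_to_seconds(ts):
    """Convert MMSS-encoded stamps (e.g. 1727 = 17 min 27 s) to seconds past the hour."""
    ts = np.asarray(ts, dtype=np.int64)
    minutes, seconds = np.divmod(ts, 100)  # split 1727 into (17, 27)
    return minutes * 60 + seconds

stamps = np.array([59, 1727, 1840, 2252, 2445])  # first column of xs
secs = mmss_to_seconds(stamps)
print(secs)
```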

The return value must be a third matrix, nm, where the first timestamp column will survive and the remaining columns will be joined in one of two ways:

1. The first way is to emit a joined row whenever either element has changed.

So the result should look like this:

([

[159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5],

[1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5],

[1452, 24.5, 25.5, 26.5, 4, 122.0, 123.0, 124.0],

[1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0, 124.0],

[2252, 22.0, 23.0, 24.0, 4, 122.0, 123.0, 124.0]

])

[

[y11, x12, x13, x14, x15, y12, y13, y14], # element in y changes; x does not

[y21, x12, x13, x14, x15, y22, y23, y24], # element in y changes; x does not

[y41, x12, x13, x14, x15, y42, y43, y44], # element in y changes; x does not

[x21, x22, x23, x24, x25, y42, y43, y44], # element in y does not change; x does

[x41, x42, x43, x44, x45, y42, y43, y44], # element in y does not change; x does

]
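One possible way to sketch case 1 in NumPy, assuming both arrays are already sorted by timestamp and that "changed" means any value column differs from the previous row. The function name `join_on_either_change` is hypothetical, and the exact float comparison is an assumption that only suits prices that repeat exactly:

```python
import numpy as np

xs = np.array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4],
])
ys = np.array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5],
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0],
    [2945, 122.0, 123.0, 124.0],
])

def join_on_either_change(xs, ys):
    # All timestamps at which either series reports, sorted and de-duplicated.
    times = np.union1d(xs[:, 0], ys[:, 0])
    # Index of the latest row at or before each time (-1 means "no row yet").
    ix = np.searchsorted(xs[:, 0], times, side="right") - 1
    iy = np.searchsorted(ys[:, 0], times, side="right") - 1
    # Only keep times at which both series already have a value.
    valid = (ix >= 0) & (iy >= 0)
    times, ix, iy = times[valid], ix[valid], iy[valid]
    joined = np.column_stack([times, xs[ix, 1:], ys[iy, 1:]])
    # Drop rows whose values are identical to the row before them.
    values = joined[:, 1:]
    changed = np.ones(len(joined), dtype=bool)
    changed[1:] = np.any(values[1:] != values[:-1], axis=1)
    return joined[changed]

nm = join_on_either_change(xs, ys)
print(nm)
```

The forward fill comes from `searchsorted` with `side="right"`, which picks the last known row of each series at every timestamp. The repeated rows at 1340, 1840, 2445 and 2945 drop out because their values match the row before them.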

2. The second way is to emit a joined row only when both elements have changed.

So the result should look like this:

([

[159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5],

[1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0, 124.0]

])

([

[y11, x12, x13, x14, x15, y12, y13, y14],

[x21, x22, x23, x24, x25, y42, y43, y44]

])
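Case 2 could be sketched with a simple event loop. This reads "both elements have changed" as "x and y have each changed at least once since the last row that was emitted", which is my interpretation of the expected output above. The function name `join_when_both_change` is hypothetical:

```python
import numpy as np

xs = np.array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4],
])
ys = np.array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5],
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0],
    [2945, 122.0, 123.0, 124.0],
])

def join_when_both_change(xs, ys):
    # Tag every row with its source (0 = xs, 1 = ys) and walk them in time order.
    events = [(row[0], 0, row[1:]) for row in xs]
    events += [(row[0], 1, row[1:]) for row in ys]
    events.sort(key=lambda e: (e[0], e[1]))

    last = [None, None]     # latest values seen for x and for y
    dirty = [False, False]  # has x / y changed since the last emitted row?
    rows = []
    for t, src, vals in events:
        if last[src] is None or not np.array_equal(last[src], vals):
            last[src] = vals
            dirty[src] = True
        if dirty[0] and dirty[1]:
            # Both series moved: emit the current time with the latest x and y values.
            rows.append(np.concatenate(([t], last[0], last[1])))
            dirty = [False, False]
    width = xs.shape[1] + ys.shape[1] - 1
    return np.array(rows).reshape(-1, width)

nm2 = join_when_both_change(xs, ys)
print(nm2)
```

With the sample data this emits a row at 159 (x first seen at 59, y first seen at 159) and at 1727 (y moved at 1227 and 1452, x moved at 1727). After that only x changes, at 2252, so no further row appears.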

I've made far too many attempts to post code here, but essentially I've been able to return a matrix to (kind of) match case 1:

[

[159 124.5 125.5 126.5],

[1227 121.5 122.5 123.5],

[1452 122.0 123.0 124.0],

[1727 21.5 22.5 23.5 9],

[2252 22.0 23.0 24.0 4]

]

but this is not complete of course because it does not actually "join the arrays".

The arrays represent price series for derivative (credit default swaps) securities. The goal of the project is to use the joined matrix to run robust regressions (OLS rejection, winsor, ORD, etc.).

The securities trade infrequently and have no "closing price" like stocks, so we cannot compare them arbitrarily by lining them up in series. The two methods I outlined are ways to join the series that are suitable for regression analysis.

For the sake of argument, use the first two columns of the matrices: column 1 representing a timestamp and column 2 representing the price.

X starts at price 24.5 at time 59 (first row) and does not change until it changes to price 21.5 at time 1727 (second row).

In the meantime, price Y changes from 124.5 at time 159 (first row) to 121.5 at time 1227 (second row).

So at this point, nm should have two rows:

159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5;

1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5;

Column 1 is the timestamp from matrix y.

Columns 2 - 5 are the values from matrix x (note the value is the same in both rows because the value did not change between time 159 and 1227).

Columns 6 - 8 are the values from matrix y (note the values differ in each row because the y values change at time 159 and 1227).

Y does not change again until time 1452 to price 122.

So Y changes three total times.

While Y is changing from time 159, x is not changing.

So if we were to put the times in order (column 1 of both matrices), the times would be:

59 from x (the x value from this timestamp persists at each y value change below until the next x change at time 1727)

159 from y

1227 from y

1340 from y

1452 from y

1727 from x

1840 from x

2252 from x

2445 from x

2945 from y

The only time we record a price is when one or the other changes from its previous value.
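To make "changes from its previous value" concrete, here is a small sketch that filters a single series down to its change rows. The helper name `change_rows` is made up, and it treats a row as a change if any value column differs from the row before it:

```python
import numpy as np

def change_rows(series):
    """Keep the first row plus every row whose values differ from the previous row."""
    values = series[:, 1:]  # everything except the timestamp column
    keep = np.ones(len(series), dtype=bool)
    keep[1:] = np.any(values[1:] != values[:-1], axis=1)
    return series[keep]

xs = np.array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4],
])
ys = np.array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5],
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0],
    [2945, 122.0, 123.0, 124.0],
])

x_changes = change_rows(xs)
y_changes = change_rows(ys)
print(x_changes[:, 0], y_changes[:, 0])
```

On the sample data this leaves x changes at 59, 1727 and 2252, and y changes at 159, 1227 and 1452, which matches the three Y changes described above.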

After an hour, I think that I understand what you want, and you can find the script needed for the first case.
