Solved

# Joining numpy arrays in Python

Posted on 2010-09-21
1,199 Views
Greetings all:

Imagine I have two 2-dim numpy arrays:

xs = array([
[59, 24.5, 25.5, 26.5, 4],
[1727, 21.5, 22.5, 23.5, 9],
[1840, 21.5, 22.5, 23.5, 9],
[2252, 22.0, 23.0, 24.0, 4],
[2445, 22.0, 23.0, 24.0, 4]
])

[
[x11, x12, x13, x14, x15],
[x21, x22, x23, x24, x25],
[x31, x32, x33, x34, x35],
[x41, x42, x43, x44, x45],
[x51, x52, x53, x54, x55]
]

ys= array([
[159, 124.5, 125.5, 126.5],
[1227, 121.5, 122.5, 123.5],
[1340, 121.5, 122.5, 123.5],
[1452, 122.0, 123.0, 124.0],
[2945, 122.0, 123.0, 124.0]
])

[
[y11, y12, y13, y14, y15],
[y21, y22, y23, y24, y25],
[y31, y32, y33, y34, y35],
[y41, y42, y43, y44, y45],
[y51, y52, y53, y54, y55]
]

Assume the first column of each matrix is a timestamp which can be compared to each other (e.g. TS0 < TS1 == True). I'm using the minutes and seconds only for sake of brevety (e.g. 59 is 59 seconds after the hour/minute; 1727 is 17 minutes 27 seconds after the hour, etc)

The return value must be a third matrix, nm, where the first timestamp column will survive and the remaining columns will be joined in one of two ways:

1. First way is to join the arrays if either element has changed.

So the result should look like this:

([
[159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5],
[1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5],
[1452, 24.5, 25.5, 26.5, 4, 122.0, 123.0, 124.0],
[1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0, 124.0],
[2252, 22.0, 23.0, 24.0, 4, 122.0, 123.0, 124.0]
])
[
[y11, x12, x13, x14, x15, y12, y13, y14], # element in y changes; x does not
[y21, x12, x13, x14, x15, y22, y23, y24], # element in y changes; x does not
[y41, x12, x13, x14, x15, y42, y43, y44], # element in y changes; x does not
[x21, x22, x23, x24, x25, y42, y43, y44], # element in y does not change; x does
[x41, x42, x43, x44, x45, y42, y43, y44], # element in y does not change; x does
]

2. Second way is to join the arrays if both elements have changed.

So the result should look like this:

([
[159, 24.5, 25.5, 26.5, 4, 124.5, 125.5],
[1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0]
])

([
[y11, x12, x13, x14, x15, y12, y13, y14],
[x21, x22, x23, x24, x25, y42, y43, y44]
])

I've made far too many attempts to post code here, but essentially I've been able to return a matrix to (kind of) match case 1:

[
[159 124.5 125.5 126.5],
[1227 121.5 122.5 123.5],
[1452 122.0 123.0 124.0],
[1727 21.5 22.5 23.5 9],
[2252 22.0 23.0 24.0 4]
]

but this is not complete of course because it does not actually "join the arrays".
0
Question by:strimp101
• 2
• 2

LVL 9

Expert Comment

ID: 33731924
I think that it's not well defined! Please make it clear.
0

Author Comment

ID: 33734148
Thanks for the response - this is a bit difficult for me to explain, but here goes:

The arrays represent price series for derivative (credit default swaps) securities. The goal of the project is to use the joined matrix to run robust regressions (OLS rejection, winsor, ORD, etc.).

The securities trade infrequenty without a "closing price" like stocks so we cannot compare them arbitrarily by lining them in up in series. So the two methods I outlined are ways to join the series in ways suitable for regression analysis.

For sake of argument, use the first two columns of the matricies - column 1 representing a timestamp and column 2 representing the price.

X starts at price 24.5 at time 59 (first row) and does not change until it changes to price 21.5 at time 1727 (second row).

In the meantime, price Y changes from 124.5 at time 159 (first row) to 121.5 at time 1227 (second row).

So at this point, nm should have two rows:

159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5;
1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5;

Column 1 is the timestamp from matrix y.
Columns 2 - 5 are the values from matrix x (note the value is the same in both rows because the value did not change between time 159 and 1227).
Columns 6 - 8 are the values from matrix y (note the values differ in each row because the y values change at time 159 and 1227).

Y does not change again until time 1452 to price 122.

So Y changes three total times.

While Y is changing from time 159, x is not changing.

So if we were to put time times in order (column 1 of both matricies), the times would be:

59 from x < the x value from this timestamp persists at each y value change below until at least the next x change at time 1840
159 from y
1227 from y
1340 from y
1452 from y
1840 from x
2252 from x
2445 from x
2945 from y

The only time we record a price is when one or the other change from the previous value.
0

LVL 9

Accepted Solution

zaghaghi earned 500 total points
ID: 33736545
Hi,

After an hour, i think that i understand what you want and you can find out the script needed for the first case.

``````#import what we need for array operations
from numpy import array,  concatenate,  insert,  delete,  allclose

#first array
xs = array([
[59, 24.5, 25.5, 26.5, 4],
[1727, 21.5, 22.5, 23.5, 9],
[1840, 21.5, 22.5, 23.5, 9],
[2252, 22.0, 23.0, 24.0, 4],
[2445, 22.0, 23.0, 24.0, 4]
])

#second array
ys = array([
[159, 124.5, 125.5, 126.5],
[1227, 121.5, 122.5, 123.5],
[1340, 121.5, 122.5, 123.5],
[1452, 122.0, 123.0, 124.0],
[2945, 122.0, 123.0, 124.0]
])

# we walk on time, start from min timestamp to max timestamp
#x array contains last xs[xi] that we pass its time
x = array([])
#x array contains last ys[yi] that we pass its time
y = array([])
#contains un filtered result, i.e. contanins additional rows that are not needed
result = array([[0., 0., 0., 0., 0., 0., 0., 0.]])
#xi and yi are indices for xs and xy respectively
xi = 0
yi = 0
#here we start, until one of xs or ys array completely  checked
while xi < len(xs) and yi < len(ys):
#if time stamp of xs is smaller than ys, then we update result array by inserting a new row to it containing timestamp, x and y
if xs[xi][0] < ys[yi][0]:
x = xs[xi][1:]
#this if checks that y is not empty(i.e for the first time)
if len(y) > 0:
# make new row with smallest timestamp x and y
r = concatenate(([xs[xi][0]] ,  x ,  y))
#insert the new row to the result array
result = insert(result, len(result),   r,  axis=0)
#increase xi by one
xi = xi + 1
else:
y = ys[yi][1:]
#this if checks that x is not empty(i.e for the first time)
if len(x) > 0:
# make new row with smallest timestamp x and y
r = concatenate(([ys[yi][0]] ,  x ,  y))
#insert the new row to the result array
result = insert(result, len(result),   r,  axis=0)
#increase yi by one
yi = yi + 1

#if xs does not completely checked, this while help us to check remaining items in it
while xi < len(xs):
x = xs[xi][1:]
r = concatenate(([xs[xi][0]] ,  x ,  y))
result = insert(result, len(result),   r,  axis=0)
xi = xi + 1
#and if ys does not completely checked, this while help us to check remaining items in it, note that only one of these two whiles executed!
while yi < len(ys):
y = ys[yi][1:]
r = concatenate(([ys[yi][0]] ,  x ,  y))
result = insert(result, len(result),   r,  axis=0)
yi = yi + 1

#remove row number i if its x and y is the same as row number i-1.
for i in reversed(range(1, len(result))):
if allclose(result[i][1:], result[i-1][1:]):
result = delete(result,  i,  axis=0)

#delete the first row i.e. [0.,0.,0.,0.....
result = delete(result,  0,  axis=0)

print(result)
``````
0

Author Closing Comment

ID: 33771684
Nice.
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

### Suggested Solutions

A short article about a problem I had getting the GPS LocationListener working.
Whether you've completed a degree in computer sciences or you're a self-taught programmer, writing your first lines of code in the real world is always a challenge. Here are some of the most common pitfalls for new programmers.
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplicâ€¦
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, Tâ€¦