Solved

Joining numpy arrays in Python

Posted on 2010-09-21
4
1,183 Views
Last Modified: 2012-05-10
Greetings all:

Imagine I have two 2-dim numpy arrays:

xs = array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4]
])

[
    [x11, x12, x13, x14, x15],
    [x21, x22, x23, x24, x25],
    [x31, x32, x33, x34, x35],
    [x41, x42, x43, x44, x45],
    [x51, x52, x53, x54, x55]
]

ys= array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5],
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0],
    [2945, 122.0, 123.0, 124.0]
])

[
    [y11, y12, y13, y14, y15],
    [y21, y22, y23, y24, y25],
    [y31, y32, y33, y34, y35],
    [y41, y42, y43, y44, y45],
    [y51, y52, y53, y54, y55]
]

Assume the first column of each matrix is a timestamp which can be compared to each other (e.g. TS0 < TS1 == True). I'm using the minutes and seconds only for sake of brevety (e.g. 59 is 59 seconds after the hour/minute; 1727 is 17 minutes 27 seconds after the hour, etc)

The return value must be a third matrix, nm, where the first timestamp column will survive and the remaining columns will be joined in one of two ways:

1. First way is to join the arrays if either element has changed.

So the result should look like this:

([
    [159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5],
    [1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5],
    [1452, 24.5, 25.5, 26.5, 4, 122.0, 123.0, 124.0],
    [1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0, 124.0],
    [2252, 22.0, 23.0, 24.0, 4, 122.0, 123.0, 124.0]
])
[
    [y11, x12, x13, x14, x15, y12, y13, y14], # element in y changes; x does not
    [y21, x12, x13, x14, x15, y22, y23, y24], # element in y changes; x does not
    [y41, x12, x13, x14, x15, y42, y43, y44], # element in y changes; x does not
    [x21, x22, x23, x24, x25, y42, y43, y44], # element in y does not change; x does
    [x41, x42, x43, x44, x45, y42, y43, y44], # element in y does not change; x does
]

2. Second way is to join the arrays if both elements have changed.

So the result should look like this:

([
    [159, 24.5, 25.5, 26.5, 4, 124.5, 125.5],
    [1727, 21.5, 22.5, 23.5, 9, 122.0, 123.0]
])

([
    [y11, x12, x13, x14, x15, y12, y13, y14],
    [x21, x22, x23, x24, x25, y42, y43, y44]
])

I've made far too many attempts to post code here, but essentially I've been able to return a matrix to (kind of) match case 1:

[
    [159 124.5 125.5 126.5],
    [1227 121.5 122.5 123.5],
    [1452 122.0 123.0 124.0],
    [1727 21.5 22.5 23.5 9],
    [2252 22.0 23.0 24.0 4]
]

but this is not complete of course because it does not actually "join the arrays".
0
Comment
Question by:strimp101
  • 2
  • 2
4 Comments
 
LVL 9

Expert Comment

by:zaghaghi
Comment Utility
I think that it's not well defined! Please make it clear.
0
 

Author Comment

by:strimp101
Comment Utility
Thanks for the response - this is a bit difficult for me to explain, but here goes:

The arrays represent price series for derivative (credit default swaps) securities. The goal of the project is to use the joined matrix to run robust regressions (OLS rejection, winsor, ORD, etc.).

The securities trade infrequenty without a "closing price" like stocks so we cannot compare them arbitrarily by lining them in up in series. So the two methods I outlined are ways to join the series in ways suitable for regression analysis.

For sake of argument, use the first two columns of the matricies - column 1 representing a timestamp and column 2 representing the price.

X starts at price 24.5 at time 59 (first row) and does not change until it changes to price 21.5 at time 1727 (second row).

In the meantime, price Y changes from 124.5 at time 159 (first row) to 121.5 at time 1227 (second row).

So at this point, nm should have two rows:

159, 24.5, 25.5, 26.5, 4, 124.5, 125.5, 126.5;
1227, 24.5, 25.5, 26.5, 4, 121.5, 122.5, 123.5;

Column 1 is the timestamp from matrix y.
Columns 2 - 5 are the values from matrix x (note the value is the same in both rows because the value did not change between time 159 and 1227).
Columns 6 - 8 are the values from matrix y (note the values differ in each row because the y values change at time 159 and 1227).

Y does not change again until time 1452 to price 122.

So Y changes three total times.

While Y is changing from time 159, x is not changing.

So if we were to put time times in order (column 1 of both matricies), the times would be:

59 from x < the x value from this timestamp persists at each y value change below until at least the next x change at time 1840
159 from y
1227 from y
1340 from y
1452 from y
1840 from x
2252 from x
2445 from x
2945 from y

The only time we record a price is when one or the other change from the previous value.
0
 
LVL 9

Accepted Solution

by:
zaghaghi earned 500 total points
Comment Utility
Hi,

After an hour, i think that i understand what you want and you can find out the script needed for the first case.

#import what we need for array operations
from numpy import array,  concatenate,  insert,  delete,  allclose

#first array
xs = array([
    [59, 24.5, 25.5, 26.5, 4],
    [1727, 21.5, 22.5, 23.5, 9],
    [1840, 21.5, 22.5, 23.5, 9],
    [2252, 22.0, 23.0, 24.0, 4],
    [2445, 22.0, 23.0, 24.0, 4]
])

#second array
ys = array([
    [159, 124.5, 125.5, 126.5],
    [1227, 121.5, 122.5, 123.5], 
    [1340, 121.5, 122.5, 123.5],
    [1452, 122.0, 123.0, 124.0], 
    [2945, 122.0, 123.0, 124.0]
])

# we walk on time, start from min timestamp to max timestamp
#x array contains last xs[xi] that we pass its time
x = array([])
#x array contains last ys[yi] that we pass its time
y = array([])
#contains un filtered result, i.e. contanins additional rows that are not needed
result = array([[0., 0., 0., 0., 0., 0., 0., 0.]])
#xi and yi are indices for xs and xy respectively
xi = 0
yi = 0
#here we start, until one of xs or ys array completely  checked
while xi < len(xs) and yi < len(ys):
    #if time stamp of xs is smaller than ys, then we update result array by inserting a new row to it containing timestamp, x and y
    if xs[xi][0] < ys[yi][0]:
        x = xs[xi][1:]
        #this if checks that y is not empty(i.e for the first time)
        if len(y) > 0:
            # make new row with smallest timestamp x and y
            r = concatenate(([xs[xi][0]] ,  x ,  y))
            #insert the new row to the result array
            result = insert(result, len(result),   r,  axis=0)
        #increase xi by one
        xi = xi + 1
    else:
        y = ys[yi][1:]
        #this if checks that x is not empty(i.e for the first time)
        if len(x) > 0:
            # make new row with smallest timestamp x and y
            r = concatenate(([ys[yi][0]] ,  x ,  y))
            #insert the new row to the result array
            result = insert(result, len(result),   r,  axis=0)
        #increase yi by one
        yi = yi + 1

#if xs does not completely checked, this while help us to check remaining items in it
while xi < len(xs):
        x = xs[xi][1:]
        r = concatenate(([xs[xi][0]] ,  x ,  y))
        result = insert(result, len(result),   r,  axis=0)
        xi = xi + 1
#and if ys does not completely checked, this while help us to check remaining items in it, note that only one of these two whiles executed!
while yi < len(ys):
        y = ys[yi][1:]
        r = concatenate(([ys[yi][0]] ,  x ,  y))
        result = insert(result, len(result),   r,  axis=0)
        yi = yi + 1

#remove row number i if its x and y is the same as row number i-1.
for i in reversed(range(1, len(result))):
    if allclose(result[i][1:], result[i-1][1:]):
        result = delete(result,  i,  axis=0)

#delete the first row i.e. [0.,0.,0.,0.....
result = delete(result,  0,  axis=0)

print(result)

Open in new window

0
 

Author Closing Comment

by:strimp101
Comment Utility
Nice.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

This article is meant to give a basic understanding of how to use R Sweave as a way to merge LaTeX and R code seamlessly into one presentable document.
This is an explanation of a simple data model to help parse a JSON feed
An introduction to basic programming syntax in Java by creating a simple program. Viewers can follow the tutorial as they create their first class in Java. Definitions and explanations about each element are given to help prepare viewers for future …
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now