Solved

VBA Array Reshape/Interpolate/Condense Input Data

Posted on 2015-02-15
7
315 Views
Last Modified: 2016-02-10
I'm not sure how to explain my problem in terms of general VBA functions.  But here's what I want.  Say I have an input data like this:

X      Y
0      1
3      7
6      31
9      73
12      133
15      211
18      307
21      421

But I want to reshape/interpolate/condense/adjust the step size of the data to look like this:

X      Y
0      1
5      21
10      91
15      211
20      381

Ignore the fact that I used a formula to generate the above results.  The data I will be manipulating is being read via text file, not excel.

I want to be able to specify at what delta/step size I'd like the new array to have.  Some additional details: linear interpolation of the results is fine, spline would be nice, but definitely not a deal breaker.  The data is always continuous and sorted in the X axis, starting at 0 going to some maximum, say 8500.  The data is exported from a vendor software that I use, and it could be in arbitrary step sizes from 0.1 to 10 based on the settings that the user had set at the time of export.  The text files can approach 5 MB, and I typically only want certain specific (but changing) values from the text file.  The values I want are typically multiples of 100, say 1000, 3000, 4500, 6600, etc.  For this particular application, I could be looking up ~15 values from 15 different large text files, and I know that using the current data as exported will take too long.   In order to keep from looking up/interpolating the specific values I want from a large text file, I'd like to be able to cut out a lot of the data and resave it to a much smaller text file to make it easier to handle on an as needed basis.  Whenever a value is needed in my particular application, I'll just look it up in the newer, much smaller text file.

I could probably figure it out over some time, but I'd like for the solution to be efficient, and I'm well aware that I'm not enough of a wizard to create a speedy solution for this particular problem.  Any help is much, much appreciated.
0
Comment
Question by:steller03
  • 2
  • 2
  • 2
  • +1
7 Comments
 
LVL 18

Expert Comment

by:SimonAdept
ID: 40611768
You could import the text files and then delete all records where x MOD [the value you want to step by] <>0

e.g. Delete from tblImportedTextFile where x mod 3 <>0

Or you retain the imported text file but generate your lookup tables as need using

select * into myLookupTable from tblImportedTextFile where x mod 5=0
0
 
LVL 12

Expert Comment

by:FarWest
ID: 40611775
do you mean that
X      Y
0      1
5      21
10      91
15      211
20      381
represents lines (as is) in different text files and not by formula,?
 the required  code  will:
1) only take step value to select all lines that match exact steps and store it in a shrink file
2) the code will select only one line from each data file above to build the result array
is this right?
also when you said
The values I want are typically multiples of 100, say 1000, 3000, 4500, 6600, etc.
do you mean by value the "Y" value?
0
 
LVL 84
ID: 40611922
Have you tried importing those text files to temporary tables, and then using standard VBA/SQL to query the table to retrieve the values you need? You could still create the array, or you could just dump the data into another temporary table.
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 

Author Comment

by:steller03
ID: 40613428
You could import the text files and then delete all records where x MOD [the value you want to step by] <>0

The problem with this solution is that I want to retain the overall resolution of the curve I'm resaving the data for.  For example, say the step size is 9 units.  Here, the mod 9<> 0 will occur every tenth data point.  Say I resave this data to a text file, then I want to look up some value at a given X.  If this X value is not in the shrink file, I will have to linearly interpolate between values, but I will be interpolating between every 10th value from the raw data, which would most likely result in a reporting error outside the scope of what is acceptable for this application.

Say for example the raw data has y values for x=5, 8, 11, 14, 17, 20, etc. but I want to save multiples of 5, so x=10 & x=15.   I don't want to have to delete the data where mod5<>0, and then interpolate.  This would result in interpolating between x=5 and x=20 for my x=10 & x=15 values.  I want to interpolate between x=8 and x=11 for x=10 and interpolate between x=14 and x=17 for x=15.  The order of operation matters is important in this instance.  

represents lines (as is) in different text files and not by formula,?

Here's an example pasted from a file export.  It's just a tab/ vbCr delimted text file with misc header info throughout.  The data is raw output values for a given x-axis, no formulas involved.

40            57.7034
45            53.9547
50            50.6544
55            47.7154

do you mean by value the "Y" value?

No, I mean the X-value is typically a multiple of 100.  Although, some exceptions to this rule are 1440, 4680,  5940, 7020, & 7029.

Have you tried importing those text files to temporary tables, and then using standard VBA/SQL to query the table to retrieve the values you need?

I've been hestitant to use temporary tables because of performance issues.  I know to delete temp tables, but I'm still worried about database bloat.
0
 
LVL 18

Accepted Solution

by:
SimonAdept earned 500 total points
ID: 40613527
Have you seen this Excel-based interpolation thread (I Googled "vba interpolation"):
http://www.vbaexpress.com/forum/showthread.php?41522-Linear-Interpolation
It appears to cover the same subject area, and Mike Rickson's UDF could be re-written to work in Access.

>I've been hestitant to use temporary tables because of performance issues.  I know to delete temp tables, but I'm still worried about database bloat.

Database bloat is a distant secondary consideration to getting the right result and acceptable performance, unless your environment has severe constraints on storage. It's trivial to compact the database when required, and presumably you always have the original text files to refer to.

There are also products like MatLab and OriginLab that have algorithms for reducing the number of points used in plots
See http://uk.mathworks.com/products/?s_tid=gn_ps
0
 
LVL 84
ID: 40613777
I know to delete temp tables, but I'm still worried about database bloat.
Many use a separate, external database when working with temporary tables. In some cases, where I have a LOT of activity going on in those external tables, I'll delete/recreate the entire database as needed. That pretty much stops bloat, and the performance hit for creating a new database is minimal.
0
 

Author Closing Comment

by:steller03
ID: 40632250
I saw the post on interpolation and was hesitant to use if at the get go.  I shouldn't have been.  Turns out that solution is really fast as long as your data is sorted.  And mine is, and always will be, so I just modified that for my needs.  Thanks for the help.
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

This article is meant to give a basic understanding of how to use R Sweave as a way to merge LaTeX and R code seamlessly into one presentable document.
Describes a method of obtaining an object variable to an already running instance of Microsoft Access so that it can be controlled via automation.
An introduction to basic programming syntax in Java by creating a simple program. Viewers can follow the tutorial as they create their first class in Java. Definitions and explanations about each element are given to help prepare viewers for future …
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now