Link to home
Start Free TrialLog in
Avatar of Ian Bell
Ian BellFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Can Stata be a substitute for Excel ?

For years I have been frustrated with the limitations of Excel.
I need to perform  'IF, AND, OR' etc formulas down 200,000+ rows by 10 columns wide, all cells in that range
contain formulas. Excel can accommodate around 40,000 rows making that 400,000 cells of formulas in an
acceptable time period. I would run about 20-30 simulations in the space of one hour.
When using Excel and rows of 70k + the CPU would run to 100%  
I would like to know if Stata can carry out the above as I purchased version 10 about three years ago and it is
currently being unused. I was a user of Stata a few years ago but on pre programmed code. I am not a programmer
but willing to learn the basics required to perform these tasks.
The computer spec is as follows: Dell 7500 Inspiron Workstation with memory riser 24GB Ram and 8 processors.
Hard drive 450GB with 330GB free. I would consider increasing the RAM to say 96GB if it would speed things up.
Any advice you can give would be much appreciated.
Thank You
Ian
Avatar of Subodh Tiwari (Neeraj)
Subodh Tiwari (Neeraj)
Flag of India image

I suggest that insert the formulas in the sheet through VBA code and then convert them into values.
You may have a button to do this whenever you need to recalculate the formulas.
I guess it depends on the type of calculations. Stata is targeted to statistical calculations. Excel is a general spreadsheet analysis tool.

I can think about using Access, when you're formulas are not that complex.
Another option would be using 64bit  Excel, when it is a performance problem caused by memory pressure.

And depending on how you run your simulations, maybe an approach would be to automate these.
Avatar of Ian Bell

ASKER

I have turned the automatic calculation off and use the F9 key.
The types of calculations are shown in the formulas on the earlier
attached spreadsheet. I am using both Excel 64 Bit 2013 and Excel 32 Bit 2010
I don't see any difference between the two performance wise.
Nope, no attachment.
Oops sorry, I thought I had. Here it is.
ExcelSample.xlsx
The formulas are not that complex.

How do your simulations differ from each other?

I would think about using Access as calculation tool.

Imho Stata is already overkill, when you don't further statistical analysis or visualizations.
Here is what I was talking about.
In the attached workbook, click on the button Calculate to place the formulas in all the columns and then covert them into values.
The formulas are dynamic and depends the number of rows in col. A. So if you add more data down the rows and click on Calculate button, the code will calculate the values as per the formulas for each column and place the values in them.
Compare this with your original sample to see if you are getting the correct output for each column.
ExcelSample.xlsm
ASKER CERTIFIED SOLUTION
Avatar of Ejgil Hedegaard
Ejgil Hedegaard
Flag of Denmark image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi Ejgil,

Fantastic job !  it worked brilliantly. I was able to copy down 150,000 rows
in less than one minute.
CPU usage around 23-38 max  physical memory approx 25% even with live streaming
of the rugby in the background  :)
I can see the logic where a cell is equal to the above cell saves a huge amount of time.
Thank You
Ian
oops!!  apologies. I discovered I had set the auto calc off. It took "less than one minute" to refresh the file
and after pressing F9 it took about 3 hours to calculate even with Ejgil's fine contribution. Seems like a
lost cause. I may need to look at other software for the job. If there are no other last minute comments then I will close the question and thank you all for your contributions.
Ian
All of the large calculations depend on the event number in column A.
The events are sorted, so all events with the same number follow each other.
Instead of calculating for 150,000 rows it is only necessary to use the range with the same event number.
In attached sheet I have added 2 columns, S and T, for the start and end row for the event.
Then used the Indirect function to set the ranges in the formulas.

The formula in G4 would then be
=IF(A4="","",IF(A4=A3,G3,MAX(INDEX((INDIRECT("A"&S4&":A"&T4)=A4)*(INDIRECT("D"&S4&":D"&T4)),0))))
See sheet for the others.

Copy the formulas in S4 and T4 down all 150.000 rows before copying the formulas in column G to R.
That ensures that the ranges are defined for the formulas.
ExcelSample-2.xlsx
Hi Ejgil,
That is perfect !! I was able to copy down 192,275 rows in 56 seconds.
You're my hero :)  I'd almost given up on Excel.
Thank you so much !!
Ian
question now closed
Thanks to all. That was a long time ago.
I will now raise a new question on how to incorporate Ejgil's formula into a new one.