Can Stata be a substitute for Excel ?

For years I have been frustrated with the limitations of Excel.
I need to perform  'IF, AND, OR' etc formulas down 200,000+ rows by 10 columns wide, all cells in that range
contain formulas. Excel can accommodate around 40,000 rows making that 400,000 cells of formulas in an
acceptable time period. I would run about 20-30 simulations in the space of one hour.
When using Excel and rows of 70k + the CPU would run to 100%  
I would like to know if Stata can carry out the above as I purchased version 10 about three years ago and it is
currently being unused. I was a user of Stata a few years ago but on pre programmed code. I am not a programmer
but willing to learn the basics required to perform these tasks.
The computer spec is as follows: Dell 7500 Inspiron Workstation with memory riser 24GB Ram and 8 processors.
Hard drive 450GB with 330GB free. I would consider increasing the RAM to say 96GB if it would speed things up.
Any advice you can give would be much appreciated.
Thank You
Ian
Ian BellretiredAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
I suggest that insert the formulas in the sheet through VBA code and then convert them into values.
You may have a button to do this whenever you need to recalculate the formulas.
ste5anSenior DeveloperCommented:
I guess it depends on the type of calculations. Stata is targeted to statistical calculations. Excel is a general spreadsheet analysis tool.

I can think about using Access, when you're formulas are not that complex.
Another option would be using 64bit  Excel, when it is a performance problem caused by memory pressure.

And depending on how you run your simulations, maybe an approach would be to automate these.
Ian BellretiredAuthor Commented:
I have turned the automatic calculation off and use the F9 key.
The types of calculations are shown in the formulas on the earlier
attached spreadsheet. I am using both Excel 64 Bit 2013 and Excel 32 Bit 2010
I don't see any difference between the two performance wise.
Protecting & Securing Your Critical Data

Considering 93 percent of companies file for bankruptcy within 12 months of a disaster that blocked access to their data for 10 days or more, planning for the worst is just smart business. Learn how Acronis Backup integrates security at every stage

ste5anSenior DeveloperCommented:
Nope, no attachment.
Ian BellretiredAuthor Commented:
Oops sorry, I thought I had. Here it is.
ExcelSample.xlsx
ste5anSenior DeveloperCommented:
The formulas are not that complex.

How do your simulations differ from each other?

I would think about using Access as calculation tool.

Imho Stata is already overkill, when you don't further statistical analysis or visualizations.
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
Here is what I was talking about.
In the attached workbook, click on the button Calculate to place the formulas in all the columns and then covert them into values.
The formulas are dynamic and depends the number of rows in col. A. So if you add more data down the rows and click on Calculate button, the code will calculate the values as per the formulas for each column and place the values in them.
Compare this with your original sample to see if you are getting the correct output for each column.
ExcelSample.xlsm
Ejgil HedegaardCommented:
Another way is to reduce the number of "real" calculations, and reuse results.
The result of the formula in G5 is equal to G4 when A4 is equal to A3, so no need to calculate again for all 150,000 rows, just use the result in G4.
So the formula for G4 would be
=IF(A4="","",IF(A4=A3,G3,MAX(INDEX(($A$4:$A$150000=A4)*($D$4:$D$150000),0))))
Similar for H4
=IF(A4=A3,H3,SUM(INDEX(($A$4:$A$150000=A4)*($D$4:$D$150000),0)))
Some of the formulas in columns L, N and O use 2 conditions, but if those 2 conditions are equal to the row above, just use that result.
For L4
=IF(ISBLANK($A4),"",IF(AND(A4=A3,K4=K3),L3,SUMPRODUCT(($A$4:$A$150000=$A4)*(K4<$K$4:$K$150000))+1))
For N4
=IF(AND(A4=A3,L4=L3),N3,SUMIFS($I$4:$I$150000,$A$4:$A$150000,"="&A4,$L$4:$L$150000,"<="&L4))
For O4
=IF(AND(A4=A3,L4=L3),O3,SUMIFS($M$4:$M$150000,$A$4:$A$150000,"="&A4,$L$4:$L$150000,"<="&L4))

On this small dataset a full calculation Ctrl+Alt+F9 reduce the calculation time with a factor of more than 10.

See file.
ExcelSample-1.xlsx

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ian BellretiredAuthor Commented:
Hi Ejgil,

Fantastic job !  it worked brilliantly. I was able to copy down 150,000 rows
in less than one minute.
CPU usage around 23-38 max  physical memory approx 25% even with live streaming
of the rugby in the background  :)
I can see the logic where a cell is equal to the above cell saves a huge amount of time.
Thank You
Ian
Ian BellretiredAuthor Commented:
oops!!  apologies. I discovered I had set the auto calc off. It took "less than one minute" to refresh the file
and after pressing F9 it took about 3 hours to calculate even with Ejgil's fine contribution. Seems like a
lost cause. I may need to look at other software for the job. If there are no other last minute comments then I will close the question and thank you all for your contributions.
Ian
Ejgil HedegaardCommented:
All of the large calculations depend on the event number in column A.
The events are sorted, so all events with the same number follow each other.
Instead of calculating for 150,000 rows it is only necessary to use the range with the same event number.
In attached sheet I have added 2 columns, S and T, for the start and end row for the event.
Then used the Indirect function to set the ranges in the formulas.

The formula in G4 would then be
=IF(A4="","",IF(A4=A3,G3,MAX(INDEX((INDIRECT("A"&S4&":A"&T4)=A4)*(INDIRECT("D"&S4&":D"&T4)),0))))
See sheet for the others.

Copy the formulas in S4 and T4 down all 150.000 rows before copying the formulas in column G to R.
That ensures that the ranges are defined for the formulas.
ExcelSample-2.xlsx
Ian BellretiredAuthor Commented:
Hi Ejgil,
That is perfect !! I was able to copy down 192,275 rows in 56 seconds.
You're my hero :)  I'd almost given up on Excel.
Thank you so much !!
Ian
question now closed
Ian BellretiredAuthor Commented:
Thanks to all. That was a long time ago.
I will now raise a new question on how to incorporate Ejgil's formula into a new one.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.