Quick data sort for ranges

Ian Bell
Ian Bell used Ask the Experts™
on
Hi,

I have a very large database which I would like to divide into eight separate sheets. Reason being it is too large for the tasks I want to  perform.
For simplicity sake lets call the columns, Gender, Age, ID

The sort would be the following

1) Male, (age) <20, (id) <10
2) Male (age) <20, (id) >9
3) Male (age) >19,(id) <10
4) Male (age) >19, (id) >9
5) Female, (age) <20, (id) <10
6) Female (age) <20, (id) >9
7) Female (age) >19,(id) <10
8) Female (age) >19, (id) >9

The way I would normally do it takes forever.
If there is a much quicker way I'd be very appreciative.

Many Thanks

Ian
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
nociSoftware Engineer
Distinguished Expert 2018

Commented:
Create an extra column where you Calculate the sort key: and split accodring to this column.

Gender 1st character (M/F) + Age>19 (F/T) + id >9 (F/T)   So your sort key is MFF, MFT, MTF, MTT, FFF....
Gender = Female (F/T) + Age>19 (F/T) + id >9 (F/T)   so your disect key = FFF, FFT, FTF, FTT, TFF, ...

F will become before M... if order is important reverse the logic where needed.
Tomas Helgi JohannssonDatabase Administrator / Software Engineer

Commented:
Hi,

Are the columns Gender, Age and id in an index (all columns in the same index and in this order) on the table in the database ? Such index would speed up your query.

Regards,
    Tomas Helgi
Ian Bellretired

Author

Commented:
Noci, Would you mind placing your formulas in the attached sheet it would be of immense help.
Thanks
Ian
Sort-Ranges.xlsx
Exploring SQL Server 2016: Fundamentals

Learn the fundamentals of Microsoft SQL Server, a relational database management system that stores and retrieves data when requested by other software applications.

Ian Bellretired

Author

Commented:
Not in an index Tomas. I have attached a sheet can you please see if and how it can be done
Thanks
Martin Liss"Life would be infinitely happier if we could only be born at the age of eighty and gradually approach eighteen." - Mark Twain
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I assume that your main aim is to split up the data and the attached workbook will do that. There's also a Private Sub named 'CreateData' that will allow you to create as much data as you need.
29166429.xlsm
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
I'm confused.  Are you wanting the sorting from the database or inside the spreadsheet?

If in the database, please provide the database type and version.
Ian Bellretired

Author

Commented:
TBH I am not fussy.
All I want is to relieve the pressure from my CPU that is calculating 180k+ rows risking shut downs and loss of data.
4 sheets would be ideal as per first post.
Martin Liss"Life would be infinitely happier if we could only be born at the age of eighty and gradually approach eighteen." - Mark Twain
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
It looks like you said 8, not 4.
Ian Bellretired

Author

Commented:
oh yes I meant 8 sorry
Martin Liss"Life would be infinitely happier if we could only be born at the age of eighty and gradually approach eighteen." - Mark Twain
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Okay. Did you try my wotkbook?
Ian Bellretired

Author

Commented:
Not yet but need to go to bed as feeling unwell due to a stomach virus. Will look later. Thanks Martin
nociSoftware Engineer
Distinguished Expert 2018

Commented:
Formula: =CONCAT(LEFT(A2);IF(B2>19;"T";"F");IF(C2>9;"T";"F"))
in a column select behind the first 3.
Tom FarrarConsultant

Commented:
If the reason for splitting the data into 8 sheets is due to pressure on, your CPU, resulting in possible shutdown, you might want to look into Excel's Power Query (In 2016 called "Get and Transform").  The functionality is such as to deal with large number of records, with efficient calculations, while minimizing the size of the spreadsheet.  Just another thought..
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Personally, I would let the database do the sorting.  Churning through and manipulating data is what they are good at.
Ian Bellretired

Author

Commented:
Thanks guys for all of your suggestions. I'm going to have to delay this for several days as I am unwell. Hopefully I can get back to you later in the week or weekend.
I will be bed bound till then.
Ian
Rob HensonFinance Analyst

Commented:
Hope you're feeling better.

I think you can use a feature of pivot tables to create your extra sheets.

In column D of the sample I have used the following formula to group, as suggested by noci:
=LEFT(A2,1)&IF(B2<20,"M","S")&IF(C2<10,"F","S")

I have assumed a slight error in your grouping criteria as they don't quite make sense. "Age < 20" OR "Age > 19". I have assumed that 20 is the break point. Likewise the ID group doesn't make sense, "ID < 10" or "ID > 9"; again I have assumed break point at 10.

This creates groups
FMF  (Female - Minor - First)
FMS (Female - Minor - Second)
FSF (Female - Senior - First)
FSS
MMF
MMS
MSF
MSS

You can then create a pivot on the data including this column and add that column as a page filter.

With the pivot that is created you can then use the feature to split it into separate sheets according to the Page filter values,

To do the split, select the pivot table and an extra group will appear on the tool bar with "PivotTable Tools" header. Select the Analyse tab and at the left hand end there is a button for Pivot Table Options. Use the dropdown next to the button and select "Show Report Filter Pages". This will then show a pop up with the headers that have been used for Page filters listed, in this instance it is just "Group", click OK and a new sheet will be added with a pivot table filtered for each group.

I have recreated a table of data that is more than 180k rows with random ages (1 to 95) and IDs (1 to 30). The pivot handled the size of the data very quickly and created the 8 pages within a matter of seconds. No sorting needed as the data gets sorted in the pivots.

One caveat, where there is a repetition of age and ID within a group, the pivot has grouped them together and given a count for that combination.

I have tried uploading the recreated file but it is just under 5mb so having issues doing so.
Ian Bellretired

Author

Commented:
Hi Rob I'm up out of bed but not recovered yet.
What is wrong with >19 or <20   doesn't <20 include 19 and same with ID ?

I hope to address these helpful solutions over the next 2-3 days. Thanks for your patience.

Ian

ps........ it would he helpful if you could place the formulas and other suggestions on the sheet I provided.
Rob HensonFinance Analyst

Commented:
Stripped out the majority of the random data I inserted and reduced to only 1000 rows so file size is much smaller.

See attached

Using the two criteria for age (and ID) just doesn't sound right although it does give the same result.
<20 will include 1 to 19
>19 will include 20 and above
Sort-Ranges.xlsx
nociSoftware Engineer
Distinguished Expert 2018

Commented:
<20 & >19    is more work from a testing perspective, programming you mention two values for one border item.
Having one border item is also less work if you once need to change things.   And the result is only equal for integers (whole discrete numbers).
>19   is true   for 19.5
<20 is true for 19.5

so <20   & >=20 is a beter formulation.    in boolean "<20"  equals "not (>= 20)" ... so one crtierium, true (or false) for all numbers.

Also the 20 could be a reference to some other data, then code like
     <(age)   or >(age -1) makes the 2nd definitely more work. (two operations (subtract then compare) in stead of one(compare) ).
where
    <age    or >=age       both are the same compare, just a different interpretation of the result.

Spreadsheets work on all numbers .. so 19 might actually be 19.0000000000000000000000001 in one case and 18.999999999999999999999 in another.. with clearly different result.   the rounding will cause both to be shown as 19. The difference may stem from calculations, entering in spreadsheet screen, another from import from CSV etc.
Ian Bellretired

Author

Commented:
Feeling a little better today so looking at the spreadsheets provided so far by
Martin and Rob.
I am wondering if putting the data into SQL might be quicker as there are approximately 900,000 rows in total
and to be divided into around 42 files. My mention of 8 files was only part of the project.

Martin & Rob, If I dumped all 900k rows into your worksheet would it handle it assuming I make the necessary changes to the VBA code ?
Martin Liss"Life would be infinitely happier if we could only be born at the age of eighty and gradually approach eighteen." - Mark Twain
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
This processes 900,000 rows in 12 seconds.
29166429a.xlsm
Ian Bellretired

Author

Commented:
Sounds great but those are only the sort columns. There are around 150 columns that form the database. I'm sure that would take a heck of a lot longer.
Martin Liss"Life would be infinitely happier if we could only be born at the age of eighty and gradually approach eighteen." - Mark Twain
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I think the only way to know is to try it. I suggest that you save your workbook as an xlsb file first. If you use any of my code and you need any explanation then please ask.
Ian Bellretired

Author

Commented:
Thanks Martin, First I need to generate the file which may take several days.
So the output will be produced in say 42 sheets in the one workbook is that right ?
That would be quite cumbersome.
Martin Liss"Life would be infinitely happier if we could only be born at the age of eighty and gradually approach eighteen." - Mark Twain
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
My code won't automatically generate 42 sheets but adapting it to do that won't be hard. Cumbersome is in the eye of the beholder and if you see it that way then maybe it can be done in another way (a few separate workbooks?), but your original question seems to require separate sheets so I'm a little confused.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
In the original question you mentioned "database".  Excel is NOT a database and shouldn't be used as one.

If you are wanting to do analysis, I would strongly suggest at least Access.  Again, chewing through data is what databases do.

If you see this as an option, I would normalize the data into multiple tables then perform your analysis.
Ian Bellretired

Author

Commented:
Slightwv, I wondered as per one of my earlier post if SQL would be a better vehicle for this.
I have with the help of a friend been sorting data to SQL on other large worksheet data.
This question was posed to see if there was a quick way of sorting the data but now thinking about
it creating views in SQL may make life a lot simpler.
Most Valuable Expert 2012
Distinguished Expert 2018
Commented:
When you say SQL, should I assume SQL Server?  Not sure you need to make that jump.  Access has many tools like graphs that are pretty much the same as Excel.  It also has reporting capability.

I would suggest moving everything to a database and forgetting about Excel all together.
Ian Bellretired

Author

Commented:
Thanks slightwv I will do just that. I already have data stored in SQL Server. I just thought Excel might have been a viable option.
Ian Bellretired

Author

Commented:
Thanks guys for all your assistance, it would appear that SQL is the way to go.
I have taken onboard your suggestions and will use them on smaller projects.
Cheers
Ian

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial