Solved

Use Excel to solve somewhat complex data sorting and analysis

Posted on 2015-02-15
35
85 Views
Last Modified: 2016-02-11
I am work for a fire department and we are trying to do some data analysis from our call information. We have been successful using this data in Excel to answer simple questions, but one question has been vexing us for some time. That's the ability to count how many times per day/month/year/etc we have certain units out of service all at the same time.

Attached is a small amount of data with some items redacted from our raw data dump (which is about 40k lines per year). Column U has a description of which unit is assigned, and columns V and Y have the start and end time of the calls. I'm trying to answer questions like, how many times in a given time period are BAM1, BAM5, BAM6 all unavailable at the same time (meaning that the start and stop times of the calls they are on all would have them all unavailable at the same time).

We know this happens a lot, and there are other sets of units we want to apply this logic to, but I just don't know how to use this data to answer this question. I'm hoping to get a number over a year for my larger data sets and then know many times per day different unit sets are all unavailable.
Sample-FD-Data-Set.xlsx
0
Comment
Question by:ssittig
  • 13
  • 11
  • 11
35 Comments
 
LVL 29

Expert Comment

by:gowflow
ID: 40611041
Not clear can you further clarify what is needed ?

Mre your request is for units BAM1, BAM5, BAM6 but the data sample is missing BAM5 at least provide the minimum so if we build a formula we are bale to test it.

and appreciate in the sample you provide give us a clear example like as row so and so = so then we should get abc ... etc..

This would very much expedite an answer for you.

gowflow
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40611079
Also when you say all at the same time unavailable what exactly in this data shows this information. Col V and Y is dispatch time and finished time ?

gowflow
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40611083
if you put this formula in AP2 and drag it down
=COUNTIFS(U:U,"=BAM6",V:V,">=" & V2,Y:Y,"<=" & Y2)
it should give you a count each time BAM6 is not available

if you put this in AQ2 and drag down
=COUNTIFS(U:U,"=BAM1",U:U,"=BAM5",U:U,"=BAM6",V:V,">=" & V2,Y:Y,"<=" & Y2)
it should give you a count each time all of BAM1, BAM5 and BAM6 unavailable.

As you notice for the latter as there is not data for BAM5 you are getting a 0 all the way.
You may replace BAM6 by BAM1 in the first formula and see the results.

gowflow
0
 
LVL 1

Author Comment

by:ssittig
ID: 40611249
Gowflow,

Thanks for taking a shot at this. I can put up a bigger data sample if that's helpful. The whole data set for a year is about a 13MB excel file.

To clarify what is needed, here is what I'm looking for:

If BAM1 (Column U) goes out on a call (Column B) from 0900 (Column V) to 01000 (Column Y) and BAM6 goes out on a call from 0930 to 1030, then they are both unavailable (out of service in our terminology) from 0930 to 1000. This happens I would count this as 1 time that BAM1 and BAM6 are both unavailable. This takes calculating between multiple rows. I dont think you have to calculate Column B, only overlapping out of service times.

I'm hoping to use this logic to develop several counting situations (BAM1/BAM5/BAM6), (BAE1/BAE2), and (BAE4,BAE5,BAE6,BAE7) for starters and maybe some more. Those are the most important ones.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40611429
Do you need the instances or the exact time slots where all three are out on call?

It would be nice to see some BAM5 rows.

Do you have MS Access?
0
 
LVL 1

Author Comment

by:ssittig
ID: 40611484
I just need the number of instances, not the actual time slot.

Attached is a months worth of data so you will have all kinds of stuff in there to play around with.

Yes, I have Access, and I realize that it may be the trick, but I similarly don't know what kind of Query i would use to produce this.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40611489
Your file upload didn't work
0
 
LVL 1

Author Comment

by:ssittig
ID: 40611534
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40611644
I imported the Excel data into an Access database, creating an autonumber field (ID) in the process.  When I ran the following query, no rows were returned.
SELECT Q_28617288.[Inc Num], Q_28617288.[Apparatus Assigned], Q_28617288.[Dispatch Date], Q_28617288.[Available Date], Q_28617288_1.[Inc Num], Q_28617288_1.[Apparatus Assigned], Q_28617288_1.[Dispatch Date], Q_28617288_1.[Available Date], Q_28617288_2.[Inc Num], Q_28617288_2.[Apparatus Assigned], Q_28617288_2.[Dispatch Date], Q_28617288_2.[Available Date]

FROM Q_28617288, Q_28617288 AS Q_28617288_1, Q_28617288 AS Q_28617288_2

WHERE (
(Q_28617288.[Apparatus Assigned] In ("BAM1","BAM5","BAM6")) 

AND Q_28617288_1.[Apparatus Assigned] In ("BAM1","BAM5","BAM6") 
And Q_28617288.ID<[Q_28617288_1].ID 
AND (
	Q_28617288_1.[Dispatch Date] Between [Q_28617288].[Dispatch Date] And [Q_28617288].[Available Date]
	OR
	Q_28617288_1.[Available Date] Between [Q_28617288].[Dispatch Date] And [Q_28617288].[Available Date]
) 

AND Q_28617288_2.[Apparatus Assigned] In ("BAM1","BAM5","BAM6") 
And Q_28617288_1.ID<[Q_28617288_2].ID 
AND (
	Q_28617288_2.[Dispatch Date] Between [Q_28617288_1].[Dispatch Date] And [Q_28617288_1].[Available Date]
	OR
	Q_28617288_2.[Available Date] Between [Q_28617288_1].[Dispatch Date] And [Q_28617288_1].[Available Date]
)

)

ORDER BY Q_28617288.[Dispatch Date]

Open in new window

0
 
LVL 1

Author Comment

by:ssittig
ID: 40611677
Just to validate your logic, if you run that query with just BAE1 and BAE2 do you get a count greater than zero?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40612070
This two-item version of the query:
SELECT Q_28617288.[Inc Num], Q_28617288.[Apparatus Assigned], 
Q_28617288.[Dispatch Date], Q_28617288.[Available Date], 
Q_28617288_1.[Inc Num], Q_28617288_1.[Apparatus Assigned], 
Q_28617288_1.[Dispatch Date], Q_28617288_1.[Available Date]

FROM Q_28617288, Q_28617288 AS Q_28617288_1

WHERE (
(Q_28617288.[Apparatus Assigned] In ("BAE1","BAE2")) 

AND Q_28617288_1.[Apparatus Assigned] In ("BAE1","BAE2") 
And Q_28617288.ID<[Q_28617288_1].ID 
AND (
	Q_28617288_1.[Dispatch Date] Between [Q_28617288].[Dispatch Date] And [Q_28617288].[Available Date]
	OR
	Q_28617288_1.[Available Date] Between [Q_28617288].[Dispatch Date] And [Q_28617288].[Available Date]
) 

)

ORDER BY Q_28617288.[Dispatch Date]

Open in new window

Yielded the following three row result:
Q_28617288.Inc Num	Q_28617288.Apparatus Assigned	Q_28617288.Dispatch Date	Q_28617288.Available Date	Q_28617288_1.Inc Num	Q_28617288_1.Apparatus Assigned	Q_28617288_1.Dispatch Date	Q_28617288_1.Available Date
BA13000092      	BAE1                        	1/2/13 17:07            	1/2/13 17:11            	BA13000092          	BAE2                        	1/2/13 17:07            	1/2/13 17:11
BA13000633       	BAE1                        	1/12/13 15:45            	1/12/13 15:53            	BA13000633          	BAE2                        	1/12/13 15:45            	1/12/13 15:47
BA13001653      	BAE1                        	1/31/13 11:44            	1/31/13 11:51            	BA13001653          	BAE2                        	1/31/13 11:44            	1/31/13 11:55

Open in new window

0
 
LVL 29

Expert Comment

by:gowflow
ID: 40612410
ok I am working for you on a different kind of solution that you will be able to apply for all your unit and that is flexible but I need sometime if you are interested.
Let me know
gowflow
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40612624
In this approach, I used a tally table (numbers table) to iterate the minutes in the time intervals.  In this example, the tallytable has values between 1 and 1000.  While I could have been as precise as one second in my iteration, performance and human readability (too many rows) were degraded.
SELECT DateAdd("n",[tallytable].[ID]-1,[dispatch date]) AS IntervalMin
FROM Q_28617288, tallytable
WHERE (((Q_28617288.[Apparatus Assigned]) In ("BAE1","BAE2")) AND ((DateAdd("n",[tallytable].[ID]-1,[dispatch date])) Between [dispatch date] And [available date]))
GROUP BY DateAdd("n",[tallytable].[ID]-1,[dispatch date])
HAVING (((Count(Q_28617288.[Apparatus Assigned]))=2));

Open in new window

The query produced the following results.
IntervalMin
1/2/2013 5:07:02 PM
1/2/2013 5:08:02 PM
1/2/2013 5:09:02 PM
1/2/2013 5:10:02 PM
1/12/2013 3:45:17 PM
1/12/2013 3:46:17 PM
1/31/2013 11:44:54 AM
1/31/2013 11:45:54 AM
1/31/2013 11:46:54 AM
1/31/2013 11:47:54 AM
1/31/2013 11:48:54 AM
1/31/2013 11:49:54 AM
1/31/2013 11:50:54 AM

Open in new window

From these results, we see that both vehicles were out between these times.
1/2/2013 5:07:02 PM and 1/2/2013 5:10:02 PM
1/12/2013 3:45:17 PM and 1/12/2013 3:46:17 PM
1/31/2013 11:44:54 AM and 1/31/2013 11:50:54 AM

It would be fairly easy to create a VBA routine that would create and execute a similar query, based on the size of the array passed to it.
Note: the BAM1,5,6 test returned no rows.

For more information on tally tables, read my EE article: http:A_5410.html
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40612628
If you wanted to use a table for your vehicle list, this result might not need any VBA code.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40612636
To illustrate this, I created a Q_28617288_Vehicles with a single text column for the vehicle name.  I populated the table with "BAE1" and "BAE2" row data.  The query can be simplified to the following and does not need to be created with VBA.
SELECT DateAdd("n",[tallytable].[ID]-1,[dispatch date]) AS IntervalMin
FROM Q_28617288, tallytable, Q_28617288_Vehicles
WHERE (((Q_28617288.[Apparatus Assigned]) = Q_28617288_Vehicles.VehicleName) 
AND ((DateAdd("n",[tallytable].[ID]-1,[dispatch date])) Between [dispatch date] And [available date]))
GROUP BY DateAdd("n",[tallytable].[ID]-1,[dispatch date])
HAVING (((Count(Q_28617288.[Apparatus Assigned]))=(Select Count(*) From Q_28617288_Vehicles)));

Open in new window

0
 
LVL 1

Author Comment

by:ssittig
ID: 40612656
GowFlow,

I'm certainly interested in any Excel solution that is really simple to deploy. That's a little easier for me to use on the fly.

This being said:

Aikimark,

Your results are exactly what I'm looking to do. I'm not surprised that you don't have any results for BAM1/BAM5/BAM6, we've had some deployment changes that have that scenario occurring quite a bit more frequently but we have been unable to quantify what that looks like. This is Jan 13 data, the Jan 15 data is quite a bit different for this combination.

And bravo to you sir, that's amazing work. I'll read your article that you posted and see if I can figure it out. I'm not going to close this question quite yet as I'd like to see what GowFlow comes up with and I may have some follow up questions on how to get your queries into my Access.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40612875
I looked at the max interval in the posted data and see it is less than 5000 seconds.  If you created a tally table with enough values to cover the worst/longest case dispatch, you could calculate with one second accuracy and not take too much of a performance hit.

I'm tempted to try to remove the seconds part of the datetime values, since we are at the minutes level of granularity.  I'll wait to see what your testing discovers.
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 29

Expert Comment

by:gowflow
ID: 40612972
Here is my solution but for sure it is more visual and can be built on further but this will give you a quick overview on what are the units that overlap others per day. Based on this data one could just create graph or other formulas that would be more cater to specific needs.

This is meant for a very fast overview and time is up or minus the 5 minutes. like I put intervals be 5 minutes and tried to reduce the length so it would minimize as much as possible scrolling.

Simply run the macro called DisplayHoursMinutes and it will build the new layout basis the data in sheet1. Make sure sheet result is always there. It will clean it at every run.

gowflow
January-2013-Fire-Call-Data.xlsm
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40614010
This is an enhanced version whereas the unit dispatched will be colored in blue if it is simultaneously sent with an other unit you will see blue for those who time is overlapping the ones that are not blue means they were sent on time where no other unit during that specific day time overlapped.

Appreciate your comments.
gowflow
January-2013-Fire-Call-Data-V01.xlsm
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40614215
Here is a VBA routine to show the everyone-dispatched condition
Sub Q_28617288(parmVehicles)
    'parmVehicles should be an array of vehicle names
    Dim rng As Range
    Dim dicMinutes As Object
    Dim dicVehicles As Object
    Dim vItem As Variant
    Dim strTime As String
    Dim lngMins As Long
    Dim dtStart As Date, dtEnd As Date
    Set dicMinutes = CreateObject("scripting.dictionary")
    Set dicVehicles = CreateObject("scripting.dictionary")
    For Each vItem In parmVehicles
        If dicVehicles.exists(vItem) Then
        Else
            dicVehicles.Add vItem, 1
        End If
    Next
    
    ActiveSheet.UsedRange.Sort ActiveSheet.Range("V1"), xlAscending, header:=xlYes
    
    For Each rng In ActiveSheet.Range(ActiveSheet.Range("U1"), ActiveSheet.Range("U1").End(xlDown))
        If dicVehicles.exists(rng.Value) Then
            dtStart = DateAdd("s", -Second(rng.Offset(0, 1).Value), rng.Offset(0, 1).Value)
            dtEnd = DateAdd("s", -Second(rng.Offset(0, 4).Value), rng.Offset(0, 4).Value)
            For lngMins = 0 To DateDiff("n", dtStart, dtEnd)
                strTime = Format(DateAdd("n", lngMins, dtStart), "yymmddhhnn")
                If dicMinutes.exists(strTime) Then
                    dicMinutes(strTime) = dicMinutes(strTime) + 1
                Else
                    dicMinutes.Add strTime, 1
                End If
            Next
        End If
    Next
    'List the times (to the minute) when all vehicles were dispatched
    For Each vItem In dicMinutes
        If dicMinutes(vItem) = dicVehicles.Count Then
            Debug.Print Format(DateSerial(Mid(vItem, 1, 2), Mid(vItem, 3, 2), Mid(vItem, 5, 2)) + TimeSerial(Mid(vItem, 7, 2), Mid(vItem, 9, 2), 0), "m/d/yyyy hh:nn")
        End If
    Next

End Sub

Open in new window


When I invoke the routine like this:
Q_28617288 array("BAE1","BAE2")

It produces the following results in the Immediate window:
1/2/2013 17:07
1/2/2013 17:08
1/2/2013 17:09
1/2/2013 17:10
1/2/2013 17:11
1/5/2013 02:10
1/12/2013 15:45
1/12/2013 15:46
1/12/2013 15:47
1/12/2013 15:56
1/26/2013 17:24
1/31/2013 11:44
1/31/2013 11:45
1/31/2013 11:46
1/31/2013 11:47
1/31/2013 11:48
1/31/2013 11:49
1/31/2013 11:50
1/31/2013 11:51

Open in new window

0
 
LVL 1

Author Comment

by:ssittig
ID: 40615886
Gowflow,

I like the idea of your spreadsheet, but I'm not sure that its giving me an a count of how many time this is happening. And I don't think the 5 min intervals work for what we are doing.

Aikimark,

You clearly have the solution. I'll admit that I'm not able to follow the entire path of how you get from the data table I gave you, to where you are able to calculate the number of instances. But you have exactly what I'm looking for. Would you mind sharing the DB you made so I can see if I can mentally reverse engineer it? I'm an IT guy, not a DB/programmer guy and I've learned just how amazingly far apart those two things are over the years.
0
 
LVL 29

Assisted Solution

by:gowflow
gowflow earned 250 total points
ID: 40616101
Well I already mentioned it this is not final this is a start this is more visual and this is not specific to certain units but gives an total overview of the whole data.

now if you want specifics we can build on that by adding a dropdown and combining more VBA to reach to what you specifically need. The +/- 5min is for visuals but not for specifics.

You had mentioned 1 scenario of BAM1,BAM5 and BAM6 but you had also mentioned couple more scenarios meaning that his was variable which for sure need catering for.

Here you go !!
I like challenges.

In this version I added a listbox in sheet result that will populate all the distinct units and you have 2 buttons
Button Clear:
Will clear selections that you do in a list box you can select several items to process if they follow press shift and select them if they don't follow keep pressing CTRL on every item you select.

Button Compute
When activate it will compute the data you selected in the list and here you have 2 options to display the data the check box that says exact selection if it is ticked then it will give you ONLY the instances of all items selected if found in sheet compute. If it is not selected then it will display all options at least where there are 2 units that overlap or more.

Pls play with it and let me know.
gowlfow
January-2013-Fire-Call-Data-V02.xlsm
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40616360
@ssittig

Have you tried my VBA code example?
Does it give you the correct results?

Do you want to see the database just for learning purposes?
0
 
LVL 45

Accepted Solution

by:
aikimark earned 250 total points
ID: 40616391
Here is a database with the tables and revised query.
Q-28617288.mdb
0
 
LVL 1

Author Closing Comment

by:ssittig
ID: 40617815
Superb work.

You both have come up with great solutions for what I'm trying to do here. I'm still working on how to automate a count of those numbers of times. I'd love to award you both full points, but I'll have to settle for splitting them. Truly amazing work, and a nice foundation for me to learn how to use this stuff. Thank you.
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40618365
Good lad we could help.

As far as your comment:
I'm still working on how to automate a count of those numbers of times.

What do you mean by count as number of times they all unavailable at the same time ? If in the solution I provided it is an easy formula to add at the end. Case you need this pls advise I will incorporate it or else then pls clarify.

gowflow
0
 
LVL 1

Author Comment

by:ssittig
ID: 40618415
I mean a number of instances that it occurs.

For example when I chose BAE1 and BAE2 on your spreadhsheet and hit compute, I have 3 results on the compute page where this phenomenon is occurring.

-S
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40618445
ok got it. Easy try this version will give you what you want.
gowflow
January-2013-Fire-Call-Data-V03.xlsm
0
 
LVL 1

Author Comment

by:ssittig
ID: 40618510
gowflow,

That's totally what I'm looking for. I'd like to ask if its possible for one additional refinement:

Attached is a larger subset of data. If you pick a combination of three units (lets just say BAM1, BAM5, BAM6), it will return results where all three are out, but also results where you have BAM6 out and then BAM5 out twice concurrently. (run this dataset against your spreadsheet and then see 1/11/2013 in the results).

How can we restrict the results to only show when the selected units are all out together?
2013-January-Unabridged.xlsx
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40618725
Yes I can see that but looking at the data they are legitimate. Ie it is not a fault but you now are saying that you only want to see the instance of each unit per day once ? or I am not reading you correctly ?

Lets take the examples:
BAM1      1/10/2013 10:27      1/10/2013 12:19
BAM5      1/10/2013 11:51      1/10/2013 11:56
BAM5      1/10/2013 11:56      1/10/2013 12:09

the second dispatch of BAM5 is still in the same slot as you can see BAM1 avail only at 12:19 and BAM5 out on 11:56 still legitimate as in 12:09 and still within the limit.
So what do you want to see here ? this one rejected coz BAM6 is not there I suppose right ?

This I would tend to agree that we need to look to make sure all selected items are out at least once. This would be ok with me to fix,

Now here comes the other scenario
BAM6      1/15/2013 18:45      1/15/2013 20:14
BAM5      1/15/2013 18:55      1/15/2013 19:07
BAM5      1/15/2013 19:14      1/15/2013 19:22
BAM1      1/15/2013 19:17      1/15/2013 19:31

what do you want to see here ? is it ok like this to show twice BAM5 as also legitimate here as well.

Let me know.
gowflow
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40618876
This version has been adjusted as found that there was a small hick-up in the previous as the date were wrongly compared.

Still the issue described in my previous post is not yet handled as waiting for your comments. Meanwhile you can see now that the results for BAM1, BAM5, BAM6 are now more precise (you had 29 occurrences now you have 45)

Please advise re my comment to finalize.

Visuals have been adjusted to prevent resizing of controls as well.
gowlfow
January-2013-Fire-Call-Data-V04.xlsm
0
 
LVL 1

Author Comment

by:ssittig
ID: 40620351
gowflow.

You are correct with your assumptions. Its OK to have one vehicle listed twice, but not having all of the selected vehicles would be grounds for that not to be an occurrence. I suspect there will be more than one occurrence a day for many of days, so I wouldn't want to limit it to once per day.
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40620688
ok great then we agree if we choose a selection of say 4 units and ask for all items check box then we will only list items where there are at least the 4 units (regardless if same unit is repeated twice in the time frame selected)

Prior comment I meant time frame selected and not same day actually it work by time frame.

I think you will like this version !!!
Let me know.
gowflow
January-2013-Fire-Call-Data-V05.xlsm
0
 
LVL 1

Author Comment

by:ssittig
ID: 40622135
Thats exactly right. Fantastic work gowflow. You are a true genius!

THANK YOU!
0
 
LVL 29

Expert Comment

by:gowflow
ID: 40622713
Your welcome and glad I old help.
Please feel free to put a link in here for any other question you may need help with.

BTW I was impressed with the figures when running BAM1,BAM5,BAM6 so many instances when same units go back and forth in the same time frame !!!

gowflow
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Learn more about how the humble email signature can be used as more than just an electronic business card. When used correctly, a signature can easily be tailored for different purposes by different departments within an organization.
Find out what Office 365 Transport Rules are, how they work and their limitations managing Office 365 signatures.
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now