asked on
How to create (and prepare the data for) a Sankey diagram
I deal in the basics of an ERP system, we take in data, and produce reports, lists, lists, lists. Mostly operational type reports. Every now and again we will throw up a pie chart or bar chart and everyone is like blown away because it’s colorful …
The Dean now wants a Sankey diagram for changes to Academic Majors – so this is a flow of initial selection of major to final graduation major. Most will stick with their original choice.
I see on Google it’s a flow style chart (https://developers.google.com/chart/interactive/docs/gallery/sankey), like the classic steam engine measurement, and even Napoleon’s invasion of Russia (https://en.wikipedia.org/wiki/Sankey_diagram).
I’m looking for tips on how to produce this. Excel looks like it’s up to the task (https://mychartguide.com/how-to-draw-sankey-diagram-in-excel/?unapproved=1719&moderation-hash=f3ea3edb766fc2dd17971a864b195541#comment-1719).
If Excel is up to the task, then my challenge is preparing the data. Right now I have an audit trail of prior major, newly selected major, and Student Id. So at the core it’s three fields: Student Id, Old Major, newly selected major, plus of course Race-Ethn and gender.
I have no idea if this data structure will work, but I’m thinking the software could wind it’s way through Student Ids and Majors for a flow line, showing original majors on the left and final majors on the right.
Most people actually stick with their original major. Some however, change 2-4 times.
So just looking for some tips on software and data prep
ASKER
I put the data in Pivot tables and that worked great ... sample here:
EAS | ECE | EEE | EGE | EME | ENA | |
EAS | 36 | 5 | 1 | |||
ECE | 2 | 8 | ||||
EEE | 2 | 8 | 9 | |||
EGE | 4 | |||||
EME | 2 | 1 | 76 | |||
ENA | 1 | 4 |
So I was happy to get it done technically ... the numbers above are real, so they're really too small to gauge anything, and the numbers actually give you more of a picture than the Sankey (imo)
As far as the Race-Ethn, in the USA there are 5 "races":
- Black/Afr-American
- White (technically this is not a race, it's Caucasian which cover some people of color [Yes]
- Asian
- Native Hawaiian/Other Pacific Islander
- Native American / Alaska Native
Doesn't really account for the rest of the world ...
But thank you!
ASKER
here's my raw data ... can I get this to a Sankey data flow?