Link to home
Start Free TrialLog in
Avatar of richsark
richsarkFlag for United States of America

asked on

re-structure a csv file and output a new file

I have a CVS file with thousands of lines and columns. I need to focus on column AM named or any column named  "EA-domain_list" in that row, I have several domains that are separated by commas in there.

Is there a way via perl, or some type of script that will feed this spreadsheet named master.csv in and take all the EA-domain_list that are in commas, and make new columns for each domain. Basically for example in summary

you look in the attached spreadsheet and look an AM you see mobi.sark.net, ne.sark.net,sark123.test.com I need what ever is in there to strip each domain into a new column like AS would be called EA-domain_list an in there would have ne.sark.net, the another one row AT would have sark123.test.com. Keep in mind some times there are 20 of  then in there. So then I would like to spit out a new file called adjusted_domain_list.csv

Also, I am sorry I didn't mention previously that I am using a Macbook
Example-Master.csv
Avatar of Bill Prew
Bill Prew

In the new adjusted_domain_list.csv file, all the other columns stay the same and are output as well?


»bp
That really isn't a properly formatted CSV (more than one header)
Is this supposed to be tab separated ?
Avatar of richsark

ASKER

Yeah. It’s comma separated. It’s a big file. Over 900.000 lines.  There are many headers through out. Is there something you need for me to explain more ?
Also to comment on Bill. The output on the EA section can be anywhere on the outputed file. The key is the EA name.
Can you provide a sample of the output you want for the example input file you posted earlier?

And what you are saying is don't look in column 39 (AM) for the data to split, but rather look at any header that is found looking for the column labeled "EA-domain_list" and split the data in that to multiple rows?


»bp
A CSV has 1 header and several data lines
You can't use comma's in the data unless surrounded by quotation marks
The fields with embeded double quotes or commas do seem to be handled properly David, surounded by double quotes, and any embedded double quotes "escapes" by doubling them up.

"NPC*MLAN*BYOD*Wellesley BYOD \"" Wireless*vl1191"
"167.222.96.234,167.222.96.188,160.254.96.234,160.254.96.212"


»bp
Hi bill. Yes. Anything with EA- domain_list needs to be split out into separate EA- domain_list Colums
Can you provide a sample of the output you want for the example input file you posted earlier?


»bp
Hi richsark,

I agree with Bill - it would be helpful to have the output file you would expect from the input data.
Could you please provide new sample input data with at least 12 rows instead of just 6, to give us a better idea of how more data will look.  I'm also not completely clear on how the new headings should look exactly, especially where different "Network" rows have different numbers of items in the "EA-domain_list" column to split, so please provide test data with different numbers of items in that column.

And in future, I suggest that unless you're asking a very simple question, or one which doesn't involve data, you provide sample input & expected output data in your first post.  That may take you a bit more time in the first place, but will quite likely save you and experts (who are spending their time trying to help you for free), a lot of time with questions and answers, and helps us to know when we've got our solution working, which means you probably get a much faster answer.  Make it easy for us and we'll probably be able to make it easy for you.

Questions from me:
Q1. Just to clarify your answer to one of Bill's questions, are you saying that it is definitely NOT good enough to simply look for column 39 (AM), because the position of "EA-domain_list" could change in future, and you don't want to have to change the script as a result?
Q2. There's definitely going to be only 1 column with the "EA-domain_list" heading, right?
Q3. You say there will be multiple headers throughout, so for a given input file, will the "EA-domain_list" column always be in the same column number?
Helo guys, sorry for the delay.

I have attached two files. One showing the master, the second file is what I expect it to be.

Notice the EA-domain_list section wast simply made into each individual EA=domain_list column no matter how many it had listed separated by a commas
Example-Master-2.csv
Example-Master-2-what-file-should_lo.csv
ASKER CERTIFIED SOLUTION
Avatar of Bill Prew
Bill Prew

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hello bill. Thanks for that vbs. May I ask what do you mean I have to add those manually later?
I don’t suppose a bash or perl can be used too?  Just a thought
Hi richsark,

I might write a Perl solution for you, if you answer my clarification questions.

Your sample output data has ne.sark.net & sark123.test.com in columns AO & AP, but your original post says:
"I need what ever is in there to strip each domain into a new column like AS would be called EA-domain_list an in there would have ne.sark.net, the another one row AT would have sark123.test.com."
Q4. So do you want to have these split values go into columns starting at AS as per your description, or starting at AO as per your sample output data?  If it's AS, please correct your sample output data and repost it.

Please also answer Q1 - Q3 from my post above.

Thanks.
tel2
I mean you will have to edit the header lines if you want to add the extra EA-domain_list column headers for the added columns.

Run mt script and look at the output, you'll see what I mean.


»bp
Hello bill. Just one clarification. If I use a Mac, will the syntax change or procedures?

And tel2 I’ll respond to you shortly. I did oost a new sample.  To clarify, I’ll get back to you ASAP.
If I use a Mac, will the syntax change or procedures?

My solution involved a VBS script based on the Windows Scripting Engine, which does not exist on a Mac.  So it will not run there.


»bp
Hi richsark,

What happened above with Bill shows the importance of specifying your operating system up front (i.e. Mac), and acceptable programming languages.  Yes, Bill could have asked if VBS would be OK for you, but technically he shouldn't have had to, because you asked "Is there a way via perl, or some type of script...", and VB Script, is a "type of script".
A few extra seconds from you when asking your question can save hours for people who are volunteering their time to try to help you.  Writing a script as long and complex as Bill's takes a lot of time.  And as you can see above, that would have saved you more time later, because you wouldn't have had to communicate back and forth with Bill later.  So it would be win-win for everyone involved.
So please bear that in mind when asking future questions.

Looking forward to your answers to my questions and new output sample.

Thanks.
tel2
Hello Bill And Tel2, Bill I am sorry I didn't mention my oS type. I am sorry you may have waisted your time. I am deeply sorry.

Tel2, I have uploaded 2 new samples. My goal is to take the Example-Master-2.csv( type network which is named Header-Network from A4)  file and take however the length and amount of domain names listed as commas of whats in Column AM which is named EA-domain_list and spit out a new file that basically takes each and every one of the EA-domain_list and make a new column for each domain name. For me as long as its labeled as EA-domain_list it can reside at the very end of a row. I don't have to squeeze it in the middle, as long as each domain is moved to a separate column named EA-domain_list for type Header-Network and under network as seen in A1 and A4 from the Example-Master-2.csv file

I hope this makes sense. If not, I can try better.

Again, I appreciate everyones help on this. I apologize for not mentioning I am using a Macbook.
Example-Master-2.csv
Example-Master-2-what-file-should_lo.csv
Not a problem, I should have picked up on it from the topics you tagged in the original question post, so I certainly could have / should have asked.  No big deal, it happens.

I don't do much with Perl (and I don't think AWK handles quoted fields in delimited files well at all) so I'll leave you in tel2's hands, good luck.


»bp
Hi richsark,

Thanks for the info.

Looking at the latest sample input & output data you've supplied, I'm confused, because:
- Cell AM12 in the input contains: "mobi.sark.net, ne.sark.net,sark123.test.com,ns1.am1.com,ns2.amm.com"
- Cell AM12 in the output has: "mob-34-yt1i.sark.net"
Where did "mob-34-yt1i.sark.net" come from?  I was expecting "mobi.sark.net".
I see similar problems for cells AM16, AM17 & AM18.  There may be many more issues, but I haven't taken the time to look.
Or have I misunderstood something?
Please take the time to carefully provide appropriate output data for the input data you've supplied.

Also, I know I've only asked a few times, but I'm still waiting for answers to my questions Q1 - Q3 in post 42567850.

Thanks.
Hello Tel2,
I have modified the input and output csv files. The file has many columns hence trying to simulate what I am looking for in the output.

Basically, the input file has "EA-domain_list" all together separated with a comma. The output I am seeking is to take them and just put them in separate columns labeled as "EA-domain_list"  The EA-domain_list" can be moved anywhere at the end or in the middle. as long as the named "EA-domain_list" is named that way. to make it easier, perhaps at the very end on the column like i showed in my example "Example-Master-2.1-what-file-should_look-like"

As far as your answer,
The column on the input file may move around depending if someone adds any new options, or data in the system. So perhaps make it flotable? The key is always "EA-domain_list" where ever that is located on the master, is where we need to separate them on the output into separate columns .

I hope this helps and I am sorry for the late response.  I truly appreciate everyone's help
Example-Master-2.1.csv
Example-Master-2.1-what-file-should_.csv
Hi richsark,

Thanks for the updated files.  More questions:

Q5. In the output file, are cells AO15 to AR15 intentionally empty, or should they contain "EA_domain-list" headings?  If they should be empty, please explain why.  If they should contain headings, please provide a corrected output file and be careful to get everything as you want it, so we don't have to do this again.

Q6. I doubt this matters to my mission, but are you aware that some of the other columns seem to have data in the wrong place?  For example, column R (lease_time) usually contains 5 digit numbers like 14400, but in rows 5 & 6 that column is blank and 14400 appears in column T or U.  Same kind of thing with column AG & AH for rows 5 & 6.

Thanks for the answer to Q1.  Please also answer Q2 & Q3.  (Q2 relates to the input file, not the output file.)

Please number your answers to Q2, Q3, Q5 & Q6 accordingly, for clarity.

Please see if you can get all the above sorted out in one go, because each time I discover unanswered questions or problems in the data, it can cost me 30 mins or more to analyse it and post feedback.  I could make assumptions re the outstanding issues, but I see no need to, because you know the answers, and I could also end up assuming wrong and wasting more time.

Thanks.
Hello tel2,

My answers are inline

Q5. In the output file, are cells AO15 to AR15 intentionally empty, or should they contain "EA_domain-list" headings?  If they should be empty, please explain why.  If they should contain headings, please provide a corrected output file and be careful to get everything as you want it, so we don't have to do this again.

A: so how this exported spreadsheet works is you have a type of import header. For example “Header-Network” which is a type of data that belongs to settings in type “Network” same as “header-dhcp” which has settings for “dhcprange” the type of stuff I’m looking to split is in "EA_domain-list" which is found under “header-network” when I import this stuff in my system, all the dhcp ranges it creates a domain list specifiing all the domain lists for that range. So when dhcp is given to an end user and you type ipconfig -all you see all the domain you can search for when using short names.  Maybe too much details but I thought maybe you would benefit from it.


Q6. I doubt this matters to my mission, but are you aware that some of the other columns seem to have data in the wrong place?  For example, column R (lease_time) usually contains 5 digit numbers like 14400, but in rows 5 & 6 that column is blank and 14400 appears in column T or U.  Same kind of thing with column AG & AH for rows 5 & 6.

A: that’s ok. The lease time seconds is how it export, it will be ok.


Please number your answers to Q2, Q3, Q5 & Q6 accordingly, for clarity.

Q2. There's definitely going to be only 1 column with the "EA-domain_list" heading, right?

A: yes. For the input file there will be always one "EA-domain_list" which will be all the domains listed sperated with commas.


Q3. You say there will be multiple headers throughout, so for a given input file, will the "EA-domain_list" column always be in the same column number?
A: it depends, more options exported for that type, it will move "EA-domain_list" to a different column.


All in all all we need to figure out is to take all "EA-domain_list" in the in the master file that are all bunched up in one colum with commas and split then out. I can tell you that in my file that showes what it shirked look like. The “EA-domain_list" under header-network can be written anywhere. For simplicity sake, I just thought we add it to the very end of each row for ease of review and identification.

Thanks again for your help.
Hi richsark,

Thanks for your answers.

Re your answer to Q5, I'm not sure you understand me.  Allow me to rephrase.
In the last output file you supplied, there are no new "EA-domain_list" headings in cells AO15 to AR15 to go above the split up domain names.  Why not?
- If this is an error, please provide a corrected output file.  I expect this most likely is an error, but I don't want to have to assume that kind of thing since you know the facts.
- If this is not an error, I need to get a good understanding of the rules if I'm going to have any chance of programming it.  I don't think I need you to re-explain the rules you've already explained several times.  I need you to answer the above question.
Now please go and re-read my original Q5 for extra understanding.
Do you understand what I meant in my post above, richsark?
Hello Tel2,

Yes you are right, My file was not prepared right. Have a look a my new enhanced version of what the out should look like.

I hope this sheds better light
Allnetworks-Ranges-2.7.csv
Hi richsark,

Weeks ago I did most of the programming for your original specifications.  That took me hours, and now I’m regretting that.  I didn't finalise or post my code because there were some unclear requirements, which I spent more hours communicating with you about, and rechecking your various versions of sample data, none of which corresponded with your written specifications (as you seem to have agreed with in your last post where you said "Yes you are right, My file was not prepared right").  I also had to ask you several times to answer my clearly numbered questions.

Now you've provided a new file, which has these problems:
a) It seems to be quite different from the previous versions.  For example, the previous versions had "Header-Network" rows which included a "EA-domain_list" column.  This version has "header-network" (lower case) rows which don't include any "EA-domain_list" columns.
b) Where is the corresponding input data?  The old input data doesn't correspond to this format.
c) It no longer seems to line up with the specifications you previously provided, and you haven't provided any new specifications.

So you're giving us volunteers a near impossible task.  The programming is hard enough without having to try to guess what you could possibly mean.  You should be taking the time to make your question as clear as possible, with:
i) Clear specifications.
ii) Correct sample input data which corresponds to those specifications.
iii) Correct sample output data which corresponds to the input data and the specifications.
So that experts don't have to waste time guessing what you mean.  Yes, we can assume/guess things, but no we shouldn't have to because it's a waste of time which we are not getting paid for.

Where to from here?  Please answer these questions:
Q7: Do you have any interest in getting my solution based on your previous requirements?  If so, please provide the expected output data which I requested in my 2nd-to-last post?
Q8: Do you want a solution based on the latest sample data you've provided?  If so, I suggest you open a new question and provide everything I've listed above (i, ii & iii), up front this time, and make sure you specify your operating system and acceptable programming languages.  Maybe someone else will do it for you.

Thanks.
tel2
Hi tel2. I’m sorry. I never want to seem like I taking advantage of you or anyone. I value your time spent on this. My mistake was that I took a new file and made it ) I thought simpler for you guys. Please give me another chance to correct this. If it means anything now, whatever the input file looks like im just focused in the EA-domain_list from any input file. The cell has the EA-domain_list separated by commas. It’s usually just on the dhcp-range lines where I need them separated into there own cell labeled as EA-domain_list.

However I know it’s late.  I can’t sleep even at 3:30am. Going to hospital to get stitches on my right hand wasn’t what I had planned.

Please allow me to fix this if the above (I hope helps for now)

 I take off my hat to you for putting up with me. I value your time and patients greatly
OK richsark.

Please answer Q7 & Q8 when you are ready.  In case it's not clear, Q7 refers to you providing expected output data which corresponds to your Example-Master-2.1.csv input data & specifications.  I don't think I'll need to hear the same specifications again, so you can focus on getting the expected output data right.
Hello Tel2, please see my reply to your questions:
Q7: Do you have any interest in getting my solution based on your previous requirements?  If so, please provide the expected output data which I requested in my 2nd-to-last post?

I am very interested in getting this script. Attached is the corrected output based on my input. The Example Master

Q8: Do you want a solution based on the latest sample data you've provided?  If so, I suggest you open a new question and provide everything I've listed above (i, ii & iii), up front this time, and make sure you specify your operating system and acceptable programming languages.  Maybe someone else will do it for you.

Yes, I would like the script done on the original csv. I have attached it again with corrections on the output logic. I admit, I have fat figured the output not matching the input contents correctly. I have corrected it so it now matches the master input.

Thank you agian
Example-Master-2.csv
Example-Master-2-what-file-should_lo.csv
Hi richsark,

> "Yes, I would like the script done on the original csv."
I wasn't referring to the "original" csv, richsark.  I said:
"Q8: Do you want a solution based on the latest sample data you've provided?"  That latest sample (Allnetworks-Ranges-2.7.csv) was quite different from the original, as I've pointed out.  Please answer Q8 again with that in mind.

However, thanks for providing new sample input & output data.  Unfortunately, I can't see how to get from the input to the output based on the specifications you've provided.

Example 1: Cell AM9
- Input: "mobi.sark.net, ne.sark.net,sark123.test.com"
- Output: Where did ns1.am1.com come from in AQ9?

Example 2: Cells AO15 - AR15
- Why are there no "EA-domain_list" headings?  Is that intentional?  If so, how does it relate to your specifications and what is the logic?

Example 3: Cell AM12
- Input: "mobi.sark.net, ne.sark.net,sark123.test.com,ns1.am1.com,ns2.amm.com"
- Output: mob-34-yt1i.sark.net
Where did the "-34-yt1i" part come from?

I've raised some of the above kinds of issues before, but they're still occurring.

The above are just examples.  There are more errors, so please don't waste more of my time and yours by just fixing those 3 and assuming everything's correct.

As mentioned before, each time you give me mismatching data costs me another 30+ mins for checking and responding.  So if you really do value my time, all you have to do is take a copy of your input data, carefully manually change it to make your output data according to your specifications, and upload it here.  Should only take 5-10 mins, which is a lot less than the time it takes me to deal with these mismatches.  If you can't do it by yourself, I suggest you get someone else to help you.  If your next attempt still has errors then I'm sorry but I might just have to abandon this, because this could go on forever and is taking too much of my time...and even some of yours.
Hello Tel2,
Let me try to make this more simple to help with this. I was trying to create an output file manually to show what I am expecting. I guess my matching of words is not the best.  Can we look at it on a different angle.

The header "EA-domain_list" is a trigger in my import utility "Hence the words" EA, I can have any EA name brought in as long as I define the word in my product before I import it. I can have EA-richsark if I wanted too, as long as I pre define it as type list. It can be type IP-Address etc... there lots of options.  In my case its type list.

All I am searching for is to take my master file row that has EA-domain_list with all the different domains in one cell separated by commas  mobi.sark.net, ne.sark.net,sark123.test.com
to make a new home for them as separate row and column for each of the domains. The modification can be anywhere on the new output csv file. As long as the word EA is there we can insert them anywhere in the new file.

rather then me trying to create a sample output and messing up with the original. lets just please focues on the master file where "EA-domain_list" for "Network" rows that have how every many domain names separated by commas to have each EA-domain_list in commas to have a new row and column labeled as "EA-domain_list" and domain name, then same with the next comma and so on and so fourth.
Hi, Just wanted to touch base on my above comment to simplify it better and to reduce inconsistency
Hi richsark,

I am happy to focus on all the requirements of your original request.  It ain't rocket science.

However, I think that if you are not willing & able to provide expected output data which shows correctly & exactly what the program should produce, based on specifications and sample input you've provided, then why should I put any more effort into this?  We're talking about minutes for you, and hours for me, and I've already spent enough hours.
Getting those 3 things correct (i.e. spec's, sample input data, expected output data) are basic disciplines in making your requirements clear, so it's easier for programmers to meet your needs quickly and efficiently (they can usually tell when they have met them because their output matches yours), and I think this is the least you can do considering the fact that we are volunteering out time.  The above posts have not been quick or efficient, and I probably should have abandoned this long ago, but I'm thinking long term and hoping that if you eventually provide what has been asked for, twice by Bill, and many times by me, then hopefully you'll start to get into that habit and make life easier and more efficient for all those involved in future (including yourself).  From my experience, most askers provide expected (correct) output data either in their original post or on request.

But I'd like to understand your side of this, so here are some questions for you:
Q9. What is stopping you from spending maybe 5-10 mins providing correct expected output data?  Even if it takes you an hour, to properly check it, it's a tiny fraction of the time I've spent so far.
Q10. "Where did the "-34-yt1i" part come from?" in cell AM12 of the last output data you provided?  I didn't see it in your input data, and if you're using the method I suggested, this kind of text should not just be appearing.
Q11. Do you understand the method I gave you in my last post for providing expected output data, i.e. "...all you have to do is take a copy of your input data, carefully manually change it to make your output data according to your specifications, and upload it here."?

I await your answers to questions Q9 - Q11.

Thanks.
tel2
Hello Tel2, The output file that I was trying to simulate had just too many mistakes trying to get it right. For me, it was the not working right with the many mistakes and details. I figured that the output file will be the same as the input file except to have the EA-domain_list put into different columns. honestly thats the only change needed.

Q10. "Where did the "-34-yt1i" part come from?" in cell AM12 of the last output data you provided?  I didn't see it in your input data, and if you're using the method I suggested, this kind of text should not just be appearing.

A: It was fat fingered during the creation of the output file. my issue is attention to detail. However, I would just like to re-due the master file on. the EA-domain_list" where all the contents in the cell that a separated by a comma, to have then separated into there own "EA-domain_list" row and column for each given domain
Hi richsark,

If you are unable to provide specifications, sample input data and expected output data, which all accurately correspond to each other, then that could slow you, and programmers who are doing work for you, down a lot, now and in the future.  It sure has in this case.  If you really can't do it, then as previously suggested, I suggest you get someone else at your end (who has attention to detail) to do so, and check their results to make sure they're doing what you want.  2 pairs of eyes are better than one.  Or get such a person to check the sample output data that you have prepared.  Someone has to be able to give attention to detail when asking a programmer to do a job, because programmers are working with details, and you will presumably want every detail of the output to be correct.

Q12. If you really can't tell whether the expected output that you've manually created is wrong before you post it here, then how are you going to tell if the output from my program is wrong?

Yes, I think I understand the requirements, but after spending probably 8+ hours on this, I think it's better to get you to meet the normal requirements for a programming task like this, because the right habit has got to start sometime, and I'm hoping that will be now, and this should save a lot of people a lot of time in future, including yourself, if you continue with that habit.

So at the risk of sounding like a broken record, please provide expected output data which corresponds to your Example-Master-2.1.csv input data & specifications.  I don't think I'll need to hear the same specifications again, so you can focus on getting the expected output data right.

Please answer Q12 above, also.

Thanks.
tel2
Hi Tel2,
I took my time and made the corrections on both files. For my sanity, I renamed the two files to keep myself straight.  You were right it did not make sence, I loaded the Example-Master-2.1.1-what-file-should_look-like just to make sure its right. And it loaded :)

See (2) attached and it looks correct too.

Thanks for bearing with me

/Rich
Example-Master-2.1.1.csv
Example-Master-2.1.1-what-file-shoul.csv
Still doesn't look right, richsark.

There are differences between the 2 files which don't even relate to the specifications you've provided, e.g.
- Cell C6: Input = "BN-Rich".  Output = blank.
- Cell D6: Input = blank.  Output = "BN".

Questions:
Q13. Is my program meant to make the above changes?
Q14. How & why did you make those changes?  I thought the only columns that would be changed were far to the right (e.g. AM).
Q15. Did you use the method I suggested for creating the sample output data?
Q16. And did you get someone else to check your work?
Q17. Are you preparing your sample output data in a spreadsheet, or what?

There are a lot more errors of the types I have mentioned in previous posts, relating to columns AN, etc.  I won't waste more time trying to list them.

Please answer Q13 - Q17.

tel2
Hello Tel2,

My Answers are inine

here are differences between the 2 files which don't even relate to the specifications you've provided, e.g.
- Cell C6: Input = "BN-Rich".  Output = blank.
- Cell D6: Input = blank.  Output = "BN".

A: My mistake. Input C6 and Output D6 should of been "BN"

Questions:
Q13. Is my program meant to make the above changes?
No, just the EA-domain_list part

Q14. How & why did you do that?

Q15. Did you use the method I suggested for creating the sample output data?
I copy and pasted the input and output and I meant to extend the EA-domain_list row onto separate columns

Q16. And did you get someone else to check your work?
Yes, I had my intern check it

Q17. Are you preparing your sample output data in a spreadsheet, or what?

Yes, the product spits it into a csv file, I then massage it and import it back into the product
Hi richsark.

What is the answer to Q14, which is:
"Q14. How & why did you make those changes?   [ I was referring to cell C6 & D6, which are in an area of the spreadsheet which you should not be coming anywhere near to.]  I thought the only columns that would be changed were far to the right (e.g. AM)."

Q18. Are you using the method I suggested twice, which is "all you have to do is take a copy of your input data, carefully manually change it to make your output data according to your specifications, and upload it here."?  You don't need to copy & paste the spreadsheet contents.  Just copy the entire CSV file.

Q19. In previous versions you were putting the split up domains to the right of the EA-subnet_org column.  In this latest version you've put them to the left.  Are you intentionally changing the requirements?

Q20. Can you see the problems in columns AN, etc, when you open the input and output data at the same time in 2 different windows, and compare them?
Hello Tel2,

I see below answers:

What is the answer to Q14, which is:
"Q14. How & why did you make those changes?   [ I was referring to cell C6 & D6, which are in an area of the spreadsheet which you should not be coming anywhere near to.]  I thought the only columns that would be changed were far to the right (e.g. AM)."

A: actually the keywords "EA*" can float anywhere, it does not need to be in a specific row. Keep note, that if any changes or updates done to the system will produce different output. Hence if we can have the script only key in to EA-domain_list and just break that up would be key.  Everything else stays where it is.

Q18. Are you using the method I suggested twice, which is "all you have to do is take a copy of your input data, carefully manually change it to make your output data according to your specifications, and upload it here."?  You don't need to copy & paste the spreadsheet contents.  Just copy the entire CSV file.

A:  I understand.

Q19. In previous versions you were putting the split up domains to the right of the EA-subnet_org column.  In this latest version you've put them to the left.  Are you intentionally changing the requirements?

A: as mentioned above the words "EA*" can float anywhere on the spreadsheet. That makes it easier to spit it at the far left if needed.

Q20. Can you see the problems in columns AN, etc, when you open the input and output data at the same time in 2 different windows, and compare them?

Yes, I had to shift it to insert the different EA-domain_list into there own row and column
Hi richsark.

Thanks for your answers, but unfortunately they raise more questions than they answer.


'A: actually the keywords "EA*" can float anywhere, it does not need to be in a specific row. Keep note, that if any changes or updates done to the system will produce different output. Hence if we can have the script only key in to EA-domain_list and just break that up would be key.  Everything else stays where it is.'

I  know about the floating EA* thing, but that's not my point in Q14.  I'm trying to understand how and why you could possibly be changing cells C6 & D6, between the input & output files.  If we don't understand that, it could well continue to happen, which could make this process even longer.  Your initial answer was 'My mistake. Input C6 and Output D6 should of been "BN"', but that doesn't tell me how and why it's happening.  Please answer Q14 again with that in mind.  How and why?


'A:  I understand.'

This does not answer my question Q18.  The question is "Are you using the method I suggested twice...?".  What is the answer, please?  (I'm trying to understand how all these errors could possibly be creeping in.)


'A: as mentioned above the words "EA*" can float anywhere on the spreadsheet. That makes it easier to spit it at the far left if needed.'

I don't see how the fact that it can float is relevant to my question Q19.  Previous versions of the sample output had the "EA-subnet_org" column between the original "EA-domain_list" column and the added "EA-domain_list" columns.
Q19a. How many columns should there be between the first (original) EA-domain_list and the new split up ones?
Q19b. Why do you have no columns between them now, but 1 column in previous versions?


'Yes, I had to shift it to insert the different EA-domain_list into there own row and column

I don't think that answers my Q20.  I'm meaning errors like missing EA-domain_list headings (which should have been added, e.g. cell AP7), and missing info under some of the new headings (e.g. cell AN10).  Do you see the problems?


Q21. How many more attempts do you think it will take you to give me accurate expected output data which you would actually want from a program which processes your sample input data?


Please answer questions Q14, Q18, Q19, Q19a, Q19b, Q20 again.


The above has taken over 30 mins more to write.  So we're probably up to about 9 hours of my time now.  I'm not sure how long I can afford to continue with simply getting accurate expected output data.

Thanks.
tel2
Hi  there,

It seems were going about it in loops. Can we simplify this a different way please. Lets adapt it to where the input file can be anything in terms of data and rows and columns. The product depending on what options are used will always increase the column and rows. Hence, if another DHCP option is added will start to shift the whole file over. Same with adding more networks and or dhcp ranges or even more EA* stuff. The script should not care what the input file looks like. It needs to only look for the EA-domain_list and split it up into separate columns for each domain. I am sorry I cant seem to copy and paste the info, however the data I am providing is sorta masking the actual names and IP addresses. Hence the reason the copy and paste tactic is not working well.

In summery, the script should be adaptable to any input, and spit out an output with just an extended row and column for EA-domain_list.

I also really appreciate your attention to details and effort. I would say... let me run it on my file and see if it spits it out how I need it rather then trying to create a duplicate of what it should look like. Basically, all we are really doing is adding several more rows and columns with all the different domain names that were in the input file in one column separated with a comma.

I look forward in running your script.
Hi richsark,

I'm not looking for new specifications, and I already know the EA* columns can float.

> "I am sorry I cant seem to copy and paste the info, however the data I am providing is sorta masking the actual names and IP addresses."
Q22. What are you trying to copy and paste?  The entire contents of the spreadsheet, or individual cells one at a time?

Q23. Is this masking because the cells are not wide enough in your spreadsheet?  If so, do you know how to widen them?  In Excel, click in top-left corner (just to the left of the "A" column heading and above the "1`" row heading) to select the entire sheet, then double-click the separator line between any of the headings (e.g. between "A" & "B").  That should spread all columns out so they are wide enough to fit the widest cell of data in their column.  Alternatively you can widen specific cells manually.

tel2
Hi
Q22.
 just trying to
Copy a mix amount of examples.


Q23. I do know how to widen them. I spprrviztd it. When can we give your script s whril
I can't understand your English enough to know exactly what you mean, sorry richsark.
Please read your last post carefully, fix the spelling and grammar errors, and repost.  Make sure you have complete clear sentences.  For example, I don't know what "spprrviztd" means.
Regarding copying a mix of examples, I have no problem with your sample input data, so you don't need to keep creating new versions of that.  All I want is expected output data which accurately corresponds to the sample input data, and I have no intention of providing code until I get this.  I've already explained how to create it.  Get someone else to check it carefully before submitting it, please, so we don't have to keep going back and forth with this.  Please make sure there are no differences between the input and output files, to the left of the "EA*" columns (this is a problem your last samples had).

I suggest you always read your posts before you post them, because if it doesn't make sense then it may just waste more time for experts and you.

Thanks.
tel2
Hello Tel2, sorry for the word "spprrviztd" It was to read supervised. Anyway, I feel I have provided you what I am looking for.

All we need is to take the input file with the EA-domain_list row that is filled in with domain names separated by commas. The output show create another file with the addition of columns labeled for each domain in its own column.  As you said I think we have all the details of the requirements. I dont want to over complicate this with trying to muck up a file of what it should look like since the the script wont care of what the input file look like since the data can change, The script should not care of more rows and columns with data. It just need to breakup the contents in the "EA-domain_list"

Lastly the EA-domain_list can float anywhere in the input file since data can always change. With the output, the EA-domain_list can be placed at the tail end of that header or anywhere any "EA's" start.

I hope I cleared this up, don't want to over complicate this

Regards
Hi richsark.

> "All we need is to take the input file with the EA-domain_list row..."
I thought the EA-domain_list was a column.  Are you changing the spec's?  An expected output file would help clear this kind of thing up.

> "I dont want to over complicate this with trying to muck up a file of what it should look like since the the script wont care of what the input file look like since the data can change,"
I don't want you to muck it up either.  I want you to get it right.  If you can't get it right, get someone who has sufficient attention to detail to help.  It looks as if that intern you used before doesn't qualify, based on the issues I raised about the output that person helped with.  It should be a 5-10 minute job for someone with attention to detail.

I think I've explained before why I want an expected output file.  Let me know if you want me to repeat it.

Please re-read the underlined sentence in my previous post.

Thanks.
tel2
Hello. I would like to ask if I could please try what you have this far. I’m sure it would be useful and perfect. I hope you agree and understand.
As much as I think I understand your requirements and would love to give you the code, richsark, I'm not planning to do so until I get the accurate expected output data, because after all the hours I spent on this so far, I think that is the least you can do.  Having such data should also give me an instant way of seeing whether my code is working properly, without possibly having to waste more time going back and forth with you, because my code's output should be identical to what you provided.

Most askers can provide accurate sample input & expected output data, and some even provide it in their first post without being asked.  It's a good habit to get into for efficient requests in future.  If you'd done that in the first place, you could have had your solution with a single post.

If you can't do it by yourself, I suggest you get someone else who has attention to detail to do it with you.  All they have to do is copy one of the (good) versions of sample input data you've already provided, edit it to make it look like what you want the program to change it to, save it as expected output data and post it here.  Should be a 5-10 minute job.

Thanks for your understanding.
tel2
Hello all and Tel2,sorry for the delay, had a death in the family. I have attached 2 new csv's that haveshow"as-is" and "converted" my hope i that it makes sense now for all of us

Thanks again Tel2
Master-file-as-is.csv
Master-file-Converted.csv
Hi richsark,

Sorry to hear of the death in your family.


Questions:

Q24. Why does cell AM4 contain domains in the input but that same cell is blank in the output?  Is it meant to be that way?

Q25. Why are there no "EA-domain_list" headings in row 5 of the input & output?  That is a "Header-Network" row.  Is it meant to be that way?

Q26. Why do you keep spending time creating new input data, when all you have to do is pick up some of the input data you have previously supplied (e.g. Example-Master-2.csv), manually convert it to how you want it to look, then post both input and output here?

Q27. Did you "get someone else who has attention to detail to do this with you", as I previously requested?  Was it the same intern that helped you last time, which resulted in errors?

The above questions are not intended to be rhetorical, so please answer them and number your answers accordingly for clarity.  That could save me from having to ask them again.

Thanks.
tel2
Q24:
Why does cell AM4 contain domains in the input but that same cell is blank in the output?  Is it meant to be that way?

A: Yes its meant to be that way. output data is blank because they need to be separated in individual cells like I mentioned I need to hav happen on the output

Q25. Why are there no "EA-domain_list" headings in row 5 of the input & output?  That is a "Header-Network" row.  Is it meant to be that way?

A: Yes in the EA-domain_list on input data, they are all in one cell, on the output, the are split, If one EA-domain_list is blank its ok, because it is filled on the next row, I was just showing a possibility.

Q26. Why do you keep spending time creating new input data, when all you have to do is pick up some of the input data you have previously supplied (e.g. Example-Master-2.csv), manually convert it to how you want it to look, then post both input and output here?

A: The file is huge, and I want to try to capture all the various possibilities.
Thanks for your responses, richsark.

Before I respond to those, please answer Q27, which you seem to have missed.

tel2
Hello tel2.
On q27
I did it.

Hope we can move forward. I’m  anxious to get your script and test it  

Really appreciate your time and I’m sorry for the long time you spent on this
Hi richsark,

I will reply to your responses to my questions, below.

Q24: Why does cell AM4 contain domains in the input but that same cell is blank in the output?  Is it meant to be that way?
A: Yes its meant to be that way. output data is blank because they need to be separated in individual cells like I mentioned I need to hav happen on the output.

Q24a. Well usually you leave the 1st domain in the column it is (as you still are doing now for AM2 & AM3), and split the 2nd, 3rd, etc domains into the columns to the right of that, so why are you not following the same process with AM4?  (In case you're planning to respond with "because it contains more than 1 domain", please don't.  That's not how it's been happening before.)

Q25. Why are there no "EA-domain_list" headings in row 5 of the input & output?  That is a "Header-Network" row.  Is it meant to be that way?
A: Yes in the EA-domain_list on input data, they are all in one cell, on the output, the are split, If one EA-domain_list is blank its ok, because it is filled on the next row, I was just showing a possibility.

Q25a. When you say "filled on the next row", do you mean "filled on the next column"?
Q25b. Your answer does not make sense to me because I'm talking about "Header-Network" rows and you seem to be talking about split things, which is what happens in "Network" rows.  Please re-read Q25 slowly and carefully and look at row 5 of the input and output data you have supplied, then answer Q25 again.  I said 'That is a "Header-Network" row'.  Usually those rows contain "EA-domain_list" somewhere.  This one doesn't.  As I understand it, the "EA-domain_list" header is what we should be looking for, to decide what column to find the domains in, e.g. you told Bill "Anything with EA- domain_list needs to be split out into separate EA- domain_list Colums".

Q27. Did you "get someone else who has attention to detail to do this with you", as I previously requested?  Was it the same intern that helped you last time, which resulted in errors?
A: I did it.

Q27a. Well I think you've already admitted that your attention to detail isn't sufficient for this task, and that's why you got help from an intern, but unfortunately their attention to detail also proved to be insufficient, so why have you gone back to doing it by yourself, rather than doing what I requested by getting someone with good attention to detail to help?  If you value my time, please prove it by getting someone who has the attention to detail to do the job, to help.  (If I valued my time enough, I would have given up long ago.)

Q28. Have you actually opened the last 2 CSV files you posted, in a spreadsheet, side-by-side, so you can see the issues I'm talking about?

Please answer the above new questions, i.e. Q24a, Q25a, Q25b, Q27a & Q28.
See if you can answer all questions clearly first time, so we can progress more quickly.  Clear communication is key to getting this job done.

Thanks.
tel2
To expand on what David Johnson said in his first comment...

Many of the files examples I looked at have multiple header lines.

This is broken CSV formatting. The results of parsing a file like this are unknown.

Likely first header line will be used + other header lines turned into data records.

To start with, you'll either have to produce a valid CSV file + use a parser like PERL's Text::CSV_XS module.

Or you'll have to write a custom CSV parser.

After that, all the original questions you had are easy to do in PERL or any scripting language.

After your (valid) CSV is parse, you can delete columns or parse existing columns + create new columns.

You can produce any changes to your array of hashes which represent your CSV file + then just dump your CSV file back out to another file.

And, your starting point is still a valid CSV file or writing your own custom parser.
Thanks for that, Raja.

Hi David,
I'll comment on some of your points, below:

> "This is broken CSV formatting. The results of parsing a file like this are unknown."
The results of parsing such a CSV file can be known by writing code which caters for it, right?

> "Likely first header line will be used + other header lines turned into data records."
If you're talking about using a module which is limited to working with 1-line headers, then I'd agree.  But we can write our code to handle multi-line headers.

> "To start with, you'll either have to produce a valid CSV file + use a parser like PERL's Text::CSV_XS module."
richsark's CSV file is already valid enough to use with Text::CSV_XS.  The fact that it has more than 1 header line at the top, and more header lines throughout the file, is not a problem, even with modules like Perl's Text::CSV_XS.  All we have to do is, instead of using the special header features of the module, we read each record as a data record, and process records as headers if they look like headers.  In this case that isn't hard.  Shortly after this question was opened, I came up with solutions which will hopefully do the job (a one-liner which does most of it, and a full script which does all of it), but on principle I don't want to finalise or release my code until richsark has provided sample input and expected output data which corresponds to his specifications, and I can see whether my code does the job before I post it.  It's just the pig-headed kind of guy I am...sometimes.  I used the Perl module Text::CSV, which I understand is a wrapper for Text::CSV_XS.

> "Or you'll have to write a custom CSV parser."
Would you call my approach (above) a custom CSV parser?  It uses a generic CSV parser, but doesn't use many of the features of that parser, including the header feature.

> "After that, all the original questions you had are easy to do in PERL or any scripting language."
True, and hopefully you'll get to see what I came up with after richsark does his bit.  Not that it's anything impressive...trust me.
Hi richsark,

A feedback would be appreciated..
Kindly let us know whether you still wish to pursue this issue as we haven't heard back from you for a long time..

Regards,
Raja Jegan R.
EE Solution Guide.
Gave up with no solution
Workable solution. Not exactly what I needed but a solution