Question

Smart Batch File

Asked by: netcmh

Hello,

I need help writing a batch file that will:

1. take a csv as the input
2. go through that entire file, line by line
3. look for anything out of the ordinary and
4. correct it

Not really that difficult, Not asking for AI. :)

That csv will always have a standard format. Sometimes however, I've noticed that it has some garbled lines. I've attached a sample, with a line of error.

Thank you

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2009-10-19 at 12:56:48ID24824867
Tags

DOS Batch File

Topics

Windows Batch Scripting

,

MS DOS

Participating Experts
4
Points
500
Comments
13

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. 2D Fighters - AI Techniques
    Hello, I've been programming for a few years now, and have only recently decided to do much Game Development. I have created simple games like Pong, etc, no problem at all; I am now planning to develop a slightly more advanced (and more graphic) game. My plans are to create...
  2. Batch Files
    Is there a way to create a batch file to open notepad, then open an .csv file, then save the file as text?
  3. BATCH FILE
    Hi Does anyone know how to fire a PHP script using a BATCH file? Thanks
  4. Batch File
    Hi... I'm trying to create a batch file that is supposed to check a directory for specific file extensions and then delete them or move them to a different folder. After that it is supposed to send a message to the administrator or send an email to the administrator. Is t...
  5. converting a batch of illustrator (ai files) to jpg
    I have a lot of ai images that I need converted to jpg. I don't want to convert them in illustrator (unless you know of a batch process) Does anyone know of a graphic converter that will convert from ai to jpg?

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: peetjhPosted on 2009-10-19 at 17:17:03ID: 25610046

What is wrong with the line that starts with PC-2-City?

 

by: BillDLPosted on 2009-10-20 at 00:49:32ID: 25611767

Hi netcmh

OK, so what the annotations in your attached csv file SEEM to imply is that you have 5 fields:

PC-Num, YY-MM-YY HH:MM:SS.000, YY-MM-YY HH:MM:SS, 0.0, 0.0

The first HH:MM:SS are all set to Zero, but the SS in the 2nd HH:MM:SS is always rounded to Zero seconds.

You are saying that it is OK to have some extra numbers following the PC-Num but that they must be in the same field?

You are also saying that the two fields that follow the 2nd instance of HH:MM:SS ie. 0.0,0.0 must be consistently 0.0,0.0 and that if any other values are added in there, they should be corrected?

I suppose the easiest way to deal with this is to use the FOR command to grab each comma-separated "token" and verify that the important ones match your rule.  You tell the FOR command to use the comma as the field delimeter and you can break up each line into as many tokens as it finds like this:

for /f "tokens=1-6 delims=," %%a in (4.csv) do (
    set FIELD1=%%a
    set FIELD2=%%b
    set FIELD3=%%c
    set FIELD4=%%d
    set FIELD5=%%e
)

So, for each line parsed in the file "4.csv" you have 6 variables that you can test against a specific rule with each pass.  If the values held in the variables being tested don't match what you want them to match, then you reset the variable to whatever value you do want and then echo all the variables to another file.

Take the first line in your example file.
PC-20,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
When parsed and split into tokens, the following variables should hold the data shown:

FIELD1 = PC-20
FIELD2 = 2009-10-19 00:00:00.000
FIELD3 = 2009-08-20 11:30:00
FIELD4 = 0.0
FIELD5 = 0.0

Taking your last "wrong" line as the next example
PC-8,2009-10-18 00:00:00.000,2009-08-20 11:30:00,0.0,0.0PC-9,2009-10-19 00:00:00.000,
the following would be the case when parsed by the same FOR command:

FIELD1 = PC-8
FIELD2 = 2009-10-19 00:00:00.000
FIELD3 = 2009-08-20 11:30:00
FIELD4 = 0.0
FIELD5 = 0.0PC-9

Because we haven't stored the 6th token (2009-10-19 00:00:00.000), and because the 7th field of your csv file would just be a blank cell in an excel spreadsheet because there is nothing after that terminating comma, we could just output the "corrected" line with 5 fields:

PC-8,2009-10-18 00:00:00.000,2009-08-20 11:30:00,0.0,0.0

using the command:

echo %FIELD1%,%FIELD2%,%FIELD3%,%FIELD4%,%FIELD5:~0,3%

that %FIELD5:~0,3% variable modifier just starts at the first character and includes only 3 characters, so 0.0PC-9 would be modified to 0.0

There are just too many unknowns in your question to know what you would want corrected.

I mean, if you consistently want to have the HH:MM:SS part of your 2nd comma-delimeted field set to 00:00:00.000, and have the 4th and 5th fields consistently set to 0.0, then you could tell the FOR command to use spaces and commas as delimeters (using 7 tokens now) and only get tokens 1, 2, and 4 and 5 into variables.  You could SET tokens 3, 6, and 7 to the static values you always want to have in your file.  The following example would do this:

-------------------------------------------------
@echo off
set TOKEN3=00:00:00.000
set TOKEN6=0.0
set TOKEN7=0.0

for /f "tokens=1-7 delims=, " %%a in (4.csv) do (
    set TOKEN1=%%a
    set TOKEN2=%%b
    set TOKEN4=%%d
    set FIELD5=%%e
)

echo %TOKEN1%,%TOKEN1% %TOKEN3%,%TOKEN4% %TOKEN5%,%TOKEN6%,%TOKEN7% >> NewFile.csv

del 4.csv
ren NewFile.csv 4.csv

EXIT

---------------------------------------------

So, there are methods that can be used, but we would have to know EXACTLY what your rules actually are.  As peetjh has asked, "What is wrong with the line that starts with PC-2-City?".  In fact, could you please explain why all the lines you marked as "wrong" don't fit your rules.

Bill

 

by: BillDLPosted on 2009-10-20 at 01:07:11ID: 25611834

Actually, the echo command should be inside the parentheses containing the FOR statement.  It is a looping function, so with each pass the line would have to be echoed out before parsing the next line and doing the same.  We can fix that if you can explain more.

 

by: t0t0Posted on 2009-10-20 at 05:28:23ID: 25613284

What a lovelly problem. I love challenges like these.

Before looking for a solution to this problem, my first instinct would be to turn to the mechanism responsible for constructing the errornous text file in the first place. It may be easier to prevent these (and other errors) creeping in rather than attempting to correct them afterwards! Anyway....

Parsing the file would be simple if we could assume fields 2, 3, 4 and 5 are constants as we could use token-matching starting from the first 'PC-'.

Even if the data is different for each line, token-by-token pattern-matching can be used to reconstruct each line again, starting from the first 'PC-'.

I will attempt to include some code.

 

by: billprewPosted on 2009-10-20 at 05:40:31ID: 25613373

As several have mentioned, the mechanical part of parsing the file and working with it are the easy part of this question.  The much harder part is what conditions do you want to look for, and what action is appropriate to "correct" them when found?

I would actually be more concerned though about why the problem is occurring in the first place.  I'm not sure what the source of the file is, or how it is being transmitted, but clearly somewhere along the line it is getting damaged.  Based on your example it seems like some data may be getting lost or "dropped", but what if some data is being repeated in error.  In that case some of the repeated lines might pass as valid format, but cause problems downstream.

You mention that

PC-3-farm 1120 1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0

is OKAY, but the following is WRONG.

PC-2-city,1120,1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0

This appears to be a case of extra commas being inserted by the program that created the file, which is hard to imagine.  Or the commas are somehow being added after the file is produced initially, which is also hard to imagine.

You also mention that the following can occur.

PC-8,2009-10-18 00:00:00.000,2009-08-20 11:30:00,0.0,0.0PC-9,2009-10-19 00:00:00.000,
2009-08-20 11:30:00,0.0,0.0

Which probably should be:

PC-8,2009-10-18 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-9,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0

But that problem would be caused by a loss of the end of line characters.  Well, couldn't a data loss like that occur in other locations in the file, and what if real data was actually dropped?

I know it's not always possible, but I would look hard at solving these problems at their root cause, not after the fact.  In my experience these band aid solutions often get complicated and confusing over time, and can introduce errors of their own.

~bp

 

by: t0t0Posted on 2009-10-20 at 07:07:45ID: 25614253

billprew

I totally agree with what you've written. I especially would likek to know what processes are involved in the making of the original file - addressing the problems at this stage would be a far better solution.

If, on the other hand, the file is made up from various sources say, multiple data entry personel, then the validation processes employed need reviewing.


netcmh

Please state how the file is created. If it is created programatically then please consider allowing us to examine the code responsible if it is available.

In the meantime, I apply token-by-token predictive pattern-matching logic to validate the file however, this is likely to take a little while due to my other comitments.

Perhaps Billprew could chuck in some ideas of his own.
 

 

by: netcmhPosted on 2009-10-20 at 08:32:13ID: 25615193

Replying from BB.

csv created from bcp dump
fixed format, dunno why corruption occurs
thank you all for writing in with your ideas
will get back to you as soon as i can
thank you again

 

by: billprewPosted on 2009-10-20 at 10:50:02ID: 25616637

This level of string parsing and interogation is going to be somewhat hard in DOS Batch syntax, would you be opposed to an AWK solution?

~bp

 

by: billprewPosted on 2009-10-20 at 12:41:48ID: 25617697

I decided to play around with an AWK solution to this, since the DOS batch approach is in my mind the wrong tool for this job.  

I'm attaching the AWK script I wrote, as well as your test input and the cleaned output from the AWK program, it seems to handle the cases you identified.  If you want to provide a larger test file I'm happy to wring out some of the other problems that could occur.

To run it I used the following command:

gawk -f EE24824867.awk EE24824867.txt >EE24824867.out

If you don't have a windows version of AWK you can get a free one here (I tested with version 3.1.6).

http://sourceforge.net/projects/gnuwin32/files/gawk/

Hope this helps.

~bp

------------------- BEGIN EE24824867.AWK ---------------------
{
   # for each input line call validation routine to check for bad formats
   Validate($0)
}
 
function Validate(LineIn, _Tokens, _Fields, _I, _Rest) {
   # parse passed in input line on commas
   _Fields = split(LineIn, _Tokens, ",")
 
   # if 5 fields found assume good
   if (_Fields == 5) {
      print LineIn
      return
   }
 
   # Found less than 5 fields
   if (_Fields < 5) {
      # What do we do with these bad records?
      print LineIn>>"EE24824867.err"
      return
   }
 
   # Found more than 5 fields, might be able to correct some problems
   _I = index(substr(LineIn, 2), "PC-") + 1
   # Was a second line concatenated on the end of this line?
   if (_I > 1) {
      # Try to process the first line (up to the "PC-")
      Validate(substr(LineIn, 1, _I-1))
      # Then get the next line and try to process it with the rest of the line (after the "PC-"
      getline
      Validate(substr(LineIn, _I) $0)
      return
   }
 
   # Too many fields, but not a "double" line, see if it looks like "PC-2-city,1120,1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0"
   if ((_Tokens[2] ~ /[0-9]+/) && (_Tokens[3] ~ /[0-9]+/) && (_Tokens[4] ~ /[0-9]+-[0-9]+-[0-9]/)) {
      # replace the first 2 commas with spaces and try again
      sub(",", " ", LineIn)
      sub(",", " ", LineIn)
      Validate(LineIn)
      return
   }
 
   # if we got here unhandled format errors were encountered
   print LineIn>>"EE24824867.err"
   return
}
-------------------- END EE24824867.AWK ----------------------
 
 
------------------- BEGIN EE24824867.IN ----------------------
PC-20,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-21,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-22,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-23,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-3-farm 1120 1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-4,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-5,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-6,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-7,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-8,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-9,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-2-city,1120,1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-8,2009-10-18 00:00:00.000,2009-08-20 11:30:00,0.0,0.0PC-9,2009-10-19 00:00:00.000,
2009-08-20 11:30:00,0.0,0.0
-------------------- END EE24824867.IN -----------------------
 
 
------------------- BEGIN EE24824867.OUT ---------------------
PC-20,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-21,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-22,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-23,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-3-farm 1120 1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-4,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-5,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-6,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-7,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-8,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-9,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-2-city 1120 1178,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-8,2009-10-18 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
PC-9,2009-10-19 00:00:00.000,2009-08-20 11:30:00,0.0,0.0
-------------------- END EE24824867.OUT ----------------------
                                              
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:
63:
64:
65:
66:
67:
68:
69:
70:
71:
72:
73:
74:
75:
76:
77:
78:
79:
80:
81:
82:
83:
84:
85:

Select allOpen in new window

 

by: BillDLPosted on 2009-10-20 at 13:44:43ID: 25618383

I am going to wait until netcmh has the time to return and provide a fuller explanation.  There are too many unexplained elements.  Extract data drom databases with a flawed command and you get inconsistent data.  Extract data from an inconsistent database and you get bad data.  As t0t0 and billprew have both suggested, it would be far more productive to try and troubleshoot the reasons why the extracted data is inconsistent or bad than to try and create a universal method of fixing it.

Personally I don't see anything wrong with a batch file that backs up the original file and then writes out the good and corrected fields to a new file, but it all hinges on WHY the stated lines don't comply with the "rules".

netcmh, all we really need to know is whether any of the comma-separated fields or space-separated groups within those fields in your example can be assumed to always have the same syntax and always have the exact same values for every csv file you would need to "correct".  ie. Are there any constants.

 

by: BillDLPosted on 2009-10-20 at 13:57:55ID: 25618523

.... and before anyone tells me that my batch file examples are wrong, I know.  They were off the top of my head and meant to demonstrate a general idea rather than as tested code ;-)

 

by: netcmhPosted on 2009-11-04 at 10:41:44ID: 31643090

Thank you all for your help. Unfortunately, that project's been shelved. I'll have to open up another when we need it. Thanks again.

 

by: BillDLPosted on 2009-11-05 at 00:59:12ID: 25747695

Thank you netcmh.

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...