Solved

Need a regular expression to pre-process a CSV file

Posted on 2008-10-23
Last Modified: 2010-04-21
( PHP 5.2 / Apache / WinXP )
I need to clean up a CSV file line by line before I process it.
The file has been mangled by Excel.

I'm new to regular expressions, so I need some help.
The CSV file is actually a tab-separated .txt file.
I need to remove any tabs, commas and single quotes that appear inside a pair of double quotes,
and remove the double quotes themselves as well,
so that I end up with a clean line separated by tabs.

For example, this:
C05110200      "Trish, Ruducheerry"      Cantonon      TH18312      1973/0726
should become:
C05110200      Trish Ruducheerry      Cantonon      TH18312      1973/0726

Please provide a code example.
Question by:Matthew_Way
LVL 27

Expert Comment

by:ddrudik
ID: 22793417
You mention single quotes ' but not double quotes ", but your result example above shows the double quotes gone.
If you want the single quotes removed:
<?php
$str="C05110200\t\"Trish, \tRuducheerry\"\tCantonon\tTH18312\t1973/0726";
echo "<pre>$str";
$str=preg_replace_callback('/"[^"]*"/','repfunc',$str);
function repfunc($match){
  return preg_replace("/[\t,']/",'',$match[0]);
}
echo "<br>$str";
?>

If you want the double quotes removed:
<?php
$str="C05110200\t\"Trish, \tRuducheerry\"\tCantonon\tTH18312\t1973/0726";
echo "<pre>$str";
$str=preg_replace_callback('/"[^"]*"/','repfunc',$str);
function repfunc($match){
  return preg_replace('/[\t,"]/','',$match[0]);
}
echo "<br>$str";
?>

Author Comment

by:Matthew_Way
ID: 22793445
Okay, let me reword:

Remove all single and double quotes.
Remove a tab character only if it appears within a pair of double quotes.
LVL 27

Expert Comment

by:ddrudik
ID: 22793477
Please confirm that you want to remove all single and double quotes, regardless of where they appear in the text, such as:
C05110200      "Trish Ruducheerry's Name"      Cantonon's Test      TH18312      1973/0726

Also confirm that the single quotes or double quotes to be removed are not escaped in any way in the text, such as this:
C05110200      "the following is a \"quote\" that someone said"      Cantonon's Test      TH18312      1973/0726
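
If the quotes inside a field do turn out to be backslash-escaped, as in the second line above, a pattern along these lines might cope with it. This is only a sketch under that assumption: the function names `strip_escaped_field` and `clean_escaped_line` are made up for illustration, and nothing in the question confirms that the file really uses `\"` escaping.

```php
<?php
// Hypothetical callback: cleans one double-quoted field that may contain \" pairs.
function strip_escaped_field($match)
{
    // Drop tabs, commas, single quotes, double quotes and the escaping backslashes.
    return preg_replace('/[\t,\'"\\\\]/', '', $match[0]);
}

// Hypothetical helper: cleans one tab-separated line.
// ASSUMPTION: an embedded double quote is written as \" (backslash + quote).
function clean_escaped_line($line)
{
    // A quoted field is: ", then any run of non-quote/non-backslash characters
    // or backslash-escaped characters, then the closing ".
    $pattern = '/"(?:[^"\\\\]|\\\\.)*"/';
    $line = preg_replace_callback($pattern, 'strip_escaped_field', $line);
    // Outside the quoted fields, remove any remaining single or double quotes.
    return preg_replace('/[\'"]/', '', $line);
}
```

Calling `clean_escaped_line()` on the second sample line should leave the tabs between fields in place and remove the quotes, the backslashes and the apostrophe in "Cantonon's".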
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 22793522
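Consider this example:
<?php
// Sample line with a quoted field plus a stray single and double quote
$str="C05110200\t\"Trish, \tRuducheerry\"\tCantonon\tTH1'8312\t1973/07\"26";
echo "<pre>$str";
// First clean inside each double-quoted field, then strip any remaining quotes
$str=preg_replace('/["\']/','',preg_replace_callback('/"[^"]+"/','repfunc',$str));
// Callback: remove tabs, commas and quotes from one quoted field
function repfunc($match){
  return preg_replace("/[\t,'\"]/",'',$match[0]);
}
echo "<br>$str";
// it may appear in your output that the last tab was deleted, but it's there:
echo '<br>'.preg_replace('/\t/',',',$str);
?>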
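
Since the question asks for the file to be cleaned line by line, the same two-step replacement could be wrapped up roughly as follows. This is a sketch rather than tested production code: `strip_quoted_field`, `clean_line` and `clean_file` are hypothetical names, the input and output paths are whatever you pass in, and it assumes a quoted field never spans more than one line.

```php
<?php
// Hypothetical callback: strips tabs, commas and quotes from one quoted field.
function strip_quoted_field($match)
{
    return preg_replace("/[\t,'\"]/", '', $match[0]);
}

// Hypothetical helper: the same two-step cleanup as above, for a single line.
function clean_line($line)
{
    $line = preg_replace_callback('/"[^"]+"/', 'strip_quoted_field', $line);
    // Remove any single or double quotes that were not part of a quoted field.
    return preg_replace('/["\']/', '', $line);
}

// Hypothetical helper: reads $inPath line by line and writes cleaned lines to $outPath.
// ASSUMPTION: a quoted field never contains a line break.
function clean_file($inPath, $outPath)
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }
    // fgets() keeps the line ending, and clean_line() leaves it untouched.
    while (($line = fgets($in)) !== false) {
        fwrite($out, clean_line($line));
    }
    fclose($in);
    fclose($out);
    return true;
}
```

Writing to a separate output file keeps the original export intact, so the result can be compared against it before the real processing step runs.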


Author Closing Comment

by:Matthew_Way
ID: 31509504
Thank you very much.
LVL 27

Expert Comment

by:ddrudik
ID: 22802979
Thanks for the question and the points.