Solved

PHP Read CSV File AND Allow For Line Either Mac or Windows Line Breaks

Posted on 2014-10-03
23
1,634 Views
Last Modified: 2014-10-04
Using the code from PHP.NET to read a csv file, I can make this work when using excel for Mac and Saving as Windows Comma CSV.  When just selecting CSV, the line breaks are not detected correctly.

 http://php.net/manual/en/function.fgetcsv.php
<?php
$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c=0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}
?>

Open in new window


In the fopen manual page http://php.net/manual/en/function.fopen.php it talks about using the b or t flags, but I don't see anything on that page about what these are.  Is this a hidden thing for PHP or am I looking in the wrong place.

My goal is to be able to allow somebody to just upload a csv file (to my windows server) and not make the user try and figure out if they are saving as mac csv or windows csv.  Is there some way I can detect this or will the b or t flags just make this work for any file?
0
Comment
Question by:Scott Fell,  EE MVE
  • 7
  • 6
  • 5
  • +1
23 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
It's there, just further down the page in the descriptions.  You want the \b flag in almost all cases.  Basically this is the situation.  There are three kinds of line endings.  Quoth the man page:

"Different operating system families have different line-ending conventions. When you write a text file and want to insert a line break, you need to use the correct line-ending character(s) for your operating system. Unix based systems use \n as the line ending character, Windows based systems use \r\n as the line ending characters and Macintosh based systems use \r as the line ending character."

The \t flag tells PHP to translate \n to \r\n but this carries the (small) risk that your data will be munged.  It seems to me that you could use PHP str_replace() like this.

$str = str_replace("\r", "\r\n", $str);
$str = str_replace("\r\n\n", PHP_EOL, $str);

My thinking is that if you get Mac with \r the first str_replace() will give you \r\n.  And if you get PC with \r\n, the first str_replace() will give you \r\n\n and the second will give you the standard PHP end-of-line character.  You might also try to read the file once line-by-line with fgets() and use rtrim() to remove the end-of--line characters, then write each line back into the server file system.
0
 
LVL 51

Expert Comment

by:Julian Hansen
Comment Utility
Given PHP is C based the b and t I would imagine are borrowed from the C language. They are flags to indicate whether the file should be opened as a binary file (no translation) or a text file (translate as discussed above)

Having said that I put some code together to try and replicate the problem. First the code to create the file.
<?php
$str = array();
$str[] = "This,is,the,first,test\n";
$str[] = "This,is,the,second,test\r\n";
$str[] = "This,is,the,third,test\r";
$str[] = "This,is,the,fourth,test\n\r";

foreach($str as $s) {
  file_put_contents('t720.txt', $s, FILE_APPEND);
}

Open in new window

Next the code to read it
<?php
if (($handle = fopen("t720.txt", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
print_r($data);	
    }
    fclose($handle);
}

Open in new window

I tried with the "r", "rb" and "rt" flags for the fopen - in all cases the output was as expected
[output]
Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => first
    [4] => test
)
Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => second
    [4] => test
)
Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => third
    [4] => test
This
    [5] => is
    [6] => the
    [7] => fourth
    [8] => test
)
[/output]
I also verified the data file in a HEX editor to ensure that the line endings were as expected.

Maybe your problem is elsewhere? For instance you might have a CR/LF in your data (before the line end) - I have often come across this - which causes a false positive on line ending - and subsequently messes up the row.

I would look at your data to check that this is not happening - in a text editor - open the file and scroll down looking at the left hand column - check for any anomalies where there is suddenly data that appears out of place.
0
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
I thought we already went thru this.  Maybe it was someone else.  Post a short example file that shows the problem.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
Comment Utility
Dave, this is similar to my other question.  

I am finding some odd things in the excel file after doing this.  After the the last row of "real" data, there is a row of blank commas, then another 500 or so rows of what must be carriage returns.

It's probably has something to do with Mac/Excel.  

With the blanks, I am using
$expected_columns = 10
$num = count($data);
if ($num === $expected_columns) {
   //good
}
ELSE
{
  // bad
}

Open in new window

I'm using this to import data and only import if it is a good row of data. Then look at each field and check it for the correct data type and length.

Let me obfuscate this excel file and output a few rows to csv and you can see what I have.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
Comment Utility
This is the bottom of the file.

When I just tested the 2 csv files in my upload, the mac file fails but I have not done anything with the flags.
test-windows.csv
test-mac.csv
test.xlsx
0
 
LVL 51

Assisted Solution

by:Julian Hansen
Julian Hansen earned 100 total points
Comment Utility
Not sure if this on target but usually in Excel if you have entered data into rows and then cleared them excel records that those rows have data and when you dump them to a text (csv / tab delimited) you end up with a bunch of "dead" rows at the end of the file.

Before saving, if I have messed with the file I select the last 20 or so empty rows and delete them.

Bottom line - users are gifted in making sure whatever they upload is not what you were expecting - I have had this problem with users for many years and nothing seems to work - they always find a way to fubar it - not saying there isn't a solution but maybe part of it is to impose some rules for uploading - if you detect a bad file just reject it rather than trying to fix it.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
Comment Utility
Yea, understand the potential for bad data and I think I have that covered in testing rows and individual fields, then spitting out a table with red squares for bad data and not allowing the import to work until they have fixed up the bad data.

As far as the line breaks, I am going to try what Ray suggested.  

Thank you all!
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 200 total points
Comment Utility
You might want to look into "auto-detect-line-endings" but I think this process will be a bit more foolproof.
Script: http://iconoun.com/demo/temp_padas.php
Output: http://iconoun.com/demo/storage/padas_new.csv
<?php // demo/temp_padas.php
error_reporting(E_ALL);
echo '<pre>';

// SEE: http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28530947.html
// REF: http://www.php.net/manual/en/function.fputcsv.php
// REF: http://www.php.net/manual/en/filesystem.configuration.php#107333
// REF: http://www.php.net/manual/en/filesystem.configuration.php#ini.auto-detect-line-endings

// INPUT IS A CSV, SAVED FROM THE EXCEL SPREADSHEET
// http://filedb.experts-exchange.com/incoming/2014/10_w40/875735/test.xlsx
$url = 'http://iconoun.com/demo/storage/padas.csv';
$fpr = fopen($url, 'rb');
if (!$fpr) trigger_error("UNABLE TO OPEN $url", E_USER_ERROR);

// OUTPUT IS A CSV ON MY SERVER
$out = 'storage/padas_new.csv';
$fpw = fopen($out, 'wb');
if (!$fpw) trigger_error("UNABLE TO OPEN $out", E_USER_ERROR);

while (!feof($fpr))
{
    // READ EACH ROW AND TRIM THE EOL CHARACTERS
    $row = fgets($fpr);
    $row = trim($row);

    // CHECK FOR EMPTY ROWS
    $arr = explode(',', $row);
    $chk = implode(NULL, $arr);
    if (empty($chk)) continue;

    // WRITE THE EDITED CSV FILE
    $num = fputcsv($fpw, $arr);
    if ($num === FALSE) trigger_error("FPUTCSV() FAILED FOR $out", E_USER_ERROR);
}
fclose($fpr);
fclose($fpw);

// CHECK THE PROCESS
$fpp = fopen($out, 'rb');
while (!feof($fpp))
{
    $row = fgetcsv($fpp);
    print_r($row);
}

Open in new window

0
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
The two CSV files you posted are not in my experience legitimate CSV formats.  I haven't even gotten to my Mac yet.  I opened the XLSX file in LibreOffice and did a CSV export and that turned out perfect, no extra characters or garbage.  So I tried the same thing with Excel 2007.  It always exported 368 rows and it seems to think there is something in one of the apparently blank rows.

Attached is a 'correct' CSV file.  I generate and use CSV files and upload them using phpMyAdmin in this format fairly frequently although I usually double-quote text fields.
test-LibreOffice.csv
0
 
LVL 82

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 200 total points
Comment Utility
Ray's demo above puts out 'standard CSV format' files.  I copied your code above and it acted as you described.  I added "ini_set('auto_detect_line_endings',TRUE);" as recommended in http://us3.php.net/manual/en/function.fgetcsv.php and then it read both the Windows and Mac files the same.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
Comment Utility
Thanks you both.   I'm going to give it a go.  

>The two CSV files you posted are not in my experience legitimate CSV formats.

Dave, I received an excel file to test.  Then simply opened in excel, and a) save as csv b) save as windows csv.  I'm not the one that will ultimately upload this or create the excel file/csv so the scripting needs to clean up as best it can and then point out errors.

My only real regret about Mac is using excel is really not a good experience when you are used to it on the PC.   I heard the newest versions are a lot better but I don't plan on buying it.  Been 5  years and thinking of going back to Windows or Ubuntu next.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
What version of Excel?  Excel:Mac?  Which version?  My old Mac Office 2004 won't open 'xlsx' files unfortunately.

When I exported to CSV from Windows Excel 2007, it did it 'correctly' but it exported all of the rows including 366 blank ones.

Before you are done, you will need to get files from the customer/client and see what they are doing.  The CSV import in phpMyAdmin will flag most errors and refuse to put them in the database.

Ubuntu will be Open Office or Libre Office, not Microsoft Office.  If you need business compatibility, Windows is still the way to go.  I run Libre Office on my MacBook Pro and on my Windows machines that don't have Microsoft Office which is most of them.
0
 
LVL 52

Author Closing Comment

by:Scott Fell, EE MVE
Comment Utility
Thanks to all.    

Ray's code did the trick and Dave's tip made it work for Mac and Julian's comment is on the money.
0
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
You're welcome, glad to help.
0
 
LVL 51

Expert Comment

by:Julian Hansen
Comment Utility
If you want to use fgetcsv then this should work on Mac CSV as well
ini_set('auto_detect_line_endings',TRUE);
if (($handle = fopen("t7204.csv", "rt")) !== FALSE) {
 // On my system Excel delimits using ; - change as required
  while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    $linetest = trim(implode("", $data));
    if (!empty($linetest)) {
      print_r($data);  
    }
  }
  fclose($handle);
}

Open in new window

0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
The really great followon question might be, "How do I read an XLSX file and write a CSV?"  I think there is an "unzip" step involved, but I've never explored it.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
Comment Utility
This bit of code is adding an extra blank line at the end (windows server)

 $out = 'C:\inetpub\mysite.com\httpdocs\upload\filename_'.$now.'.csv';
    $fpw = fopen($out, 'wb');
    if (!$fpw) die("UNABLE TO OPEN FILE 2");

    while (!feof($fpr)) {

        // READ EACH ROW AND TRIM THE EOL CHARACTERS
        $row = fgets($fpr);
        $row = trim($row);

        // CHECK FOR EMPTY ROWS
        $arr = explode(',', $row);
        $chk = implode(null, $arr);
        if (empty($chk)) continue;


        // WRITE THE EDITED CSV FILE

            $num = fputcsv($fpw, $arr);
            if ($num === FALSE) die("WRITE FAILED");



    }
    fclose($fpr);
    fclose($fpw);

Open in new window

0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
Windows is always a nuisance!  This is untested but it's unlikely to cause any side-effects and may help.  After this line, add a test for an empty row

$row = trim($row);
if (empty($row)) continue;

A possible explanation lies in the way that Windows creates line endings.  Windows uses \r\n for a line-ending character.  *nix uses only \n.  What this means is that files created on Windows servers may appear to have blank lines, since the \r character on *nix will cause a "carriage return" effect.  You may want to experiment with the auto-detect-line-endings settings or try turning them off.  When I have read Windows files, I have often had to read all the lines into an array with file(), then use foreach() to iterate over the array, to trim() each of the lines before processing the data.  Nuisance, but usually an acceptable workaround for small files, perhaps anything under a million lines.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
Comment Utility
I think I have it fixed.

 $out = 'C:\inetpub\mysite.com\httpdocs\upload\filename_'.$now.'.csv';
    $fpw = fopen($out, 'wb');
    if (!$fpw) die("UNABLE TO OPEN FILE 2");

    while (!feof($fpr)) {

        // READ EACH ROW AND TRIM THE EOL CHARACTERS
        $row = fgets($fpr);
        $row = trim($row);

        // CHECK FOR EMPTY ROWS
        $arr = explode(',', $row);
        $chk = implode(null, $arr);
        if (empty($chk)) continue;


        // WRITE THE EDITED CSV FILE

            $num = fputcsv($fpw, $arr);
            if ($num === FALSE) die("WRITE FAILED");



    }

     $stat = fstat($fpw);
    ftruncate($fpw, $stat['size']-1);

    fclose($fpr);
    fclose($fpw);

Open in new window

0
 
LVL 51

Expert Comment

by:Julian Hansen
Comment Utility
I think there is an "unzip" step involved, but I've never explored it.
Depends on version .xlsx files are zipped files with xml and resource files inside. Older versions of office are a proprietary binary format. There are libraries to do that but if you can export to CSV why go to all the trouble.

Seems to me the problem here was that fgetcsv was not working for csv files created with a Mac export to CSV of an Excel file.
a) There were dead rows created at the end of the doc
b) The Mac line separator is a 0x0d \r

The first problem can be solved by imploding the created array with null separators, trimming the result and comparing to the empty string.
The second problem can be solved by the ini_set('auto_detect_line_endings',TRUE);

At the risk of being pedantic trying to work out why we are dropping fgetcsv and now using fputcsv? Might have missed a step but it seems that the original question was around fgetcsv and different CSV formats?
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
You may want to add this to the mix: http://www.php.net/manual/en/function.clearstatcache.php
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
if you can export to CSV why go to all the trouble
Good question.  Almost everyone can click "save" but not everyone can understand and remember "save as CSV."  It may seem like a little thing, but just wait till you have to start debugging errors caused by wrong file formats.  It's one of those non-value-added activities that steals time, costs money, and (from the perspective of most managers) "just shouldn't happen."  So as I think about the larger picture here with a non-technical Excel user creating a file for consumption by a PHP script, I think "let's simplify as much as possible on the user end of things."
0
 
LVL 51

Expert Comment

by:Julian Hansen
Comment Utility
Ja, but I have been through the whole - read the raw format file as well - at least with CSV it strips out all the other formatting and junk in the file that is not needed.

My best results have been from training users of the site to do it properly rather than trying to write an AI system to work around their mistakes.

However, each to his own.
0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

Deprecated and Headed for the Dustbin By now, you have probably heard that some PHP features, while convenient, can also cause PHP security problems.  This article discusses one of those, called register_globals.  It is a thing you do not want.  …
This article discusses four methods for overlaying images in a container on a web page
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now