Solved

PHP Read CSV File AND Allow For Line Either Mac or Windows Line Breaks

Posted on 2014-10-03
23
1,787 Views
Last Modified: 2014-10-04
Using the code from PHP.NET to read a csv file, I can make this work when using excel for Mac and Saving as Windows Comma CSV.  When just selecting CSV, the line breaks are not detected correctly.

 http://php.net/manual/en/function.fgetcsv.php
<?php
$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c=0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}
?>

Open in new window


In the fopen manual page http://php.net/manual/en/function.fopen.php it talks about using the b or t flags, but I don't see anything on that page about what these are.  Is this a hidden thing for PHP or am I looking in the wrong place.

My goal is to be able to allow somebody to just upload a csv file (to my windows server) and not make the user try and figure out if they are saving as mac csv or windows csv.  Is there some way I can detect this or will the b or t flags just make this work for any file?
0
Comment
Question by:Scott Fell,  EE MVE
  • 7
  • 6
  • 5
  • +1
23 Comments
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40360435
It's there, just further down the page in the descriptions.  You want the \b flag in almost all cases.  Basically this is the situation.  There are three kinds of line endings.  Quoth the man page:

"Different operating system families have different line-ending conventions. When you write a text file and want to insert a line break, you need to use the correct line-ending character(s) for your operating system. Unix based systems use \n as the line ending character, Windows based systems use \r\n as the line ending characters and Macintosh based systems use \r as the line ending character."

The \t flag tells PHP to translate \n to \r\n but this carries the (small) risk that your data will be munged.  It seems to me that you could use PHP str_replace() like this.

$str = str_replace("\r", "\r\n", $str);
$str = str_replace("\r\n\n", PHP_EOL, $str);

My thinking is that if you get Mac with \r the first str_replace() will give you \r\n.  And if you get PC with \r\n, the first str_replace() will give you \r\n\n and the second will give you the standard PHP end-of-line character.  You might also try to read the file once line-by-line with fgets() and use rtrim() to remove the end-of--line characters, then write each line back into the server file system.
0
 
LVL 54

Expert Comment

by:Julian Hansen
ID: 40360505
Given PHP is C based the b and t I would imagine are borrowed from the C language. They are flags to indicate whether the file should be opened as a binary file (no translation) or a text file (translate as discussed above)

Having said that I put some code together to try and replicate the problem. First the code to create the file.
<?php
$str = array();
$str[] = "This,is,the,first,test\n";
$str[] = "This,is,the,second,test\r\n";
$str[] = "This,is,the,third,test\r";
$str[] = "This,is,the,fourth,test\n\r";

foreach($str as $s) {
  file_put_contents('t720.txt', $s, FILE_APPEND);
}

Open in new window

Next the code to read it
<?php
if (($handle = fopen("t720.txt", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
print_r($data);	
    }
    fclose($handle);
}

Open in new window

I tried with the "r", "rb" and "rt" flags for the fopen - in all cases the output was as expected
[output]
Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => first
    [4] => test
)
Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => second
    [4] => test
)
Array
(
    [0] => This
    [1] => is
    [2] => the
    [3] => third
    [4] => test
This
    [5] => is
    [6] => the
    [7] => fourth
    [8] => test
)
[/output]
I also verified the data file in a HEX editor to ensure that the line endings were as expected.

Maybe your problem is elsewhere? For instance you might have a CR/LF in your data (before the line end) - I have often come across this - which causes a false positive on line ending - and subsequently messes up the row.

I would look at your data to check that this is not happening - in a text editor - open the file and scroll down looking at the left hand column - check for any anomalies where there is suddenly data that appears out of place.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40360528
I thought we already went thru this.  Maybe it was someone else.  Post a short example file that shows the problem.
0
Announcing the Most Valuable Experts of 2016

MVEs are more concerned with the satisfaction of those they help than with the considerable points they can earn. They are the types of people you feel privileged to call colleagues. Join us in honoring this amazing group of Experts.

 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40360560
Dave, this is similar to my other question.  

I am finding some odd things in the excel file after doing this.  After the the last row of "real" data, there is a row of blank commas, then another 500 or so rows of what must be carriage returns.

It's probably has something to do with Mac/Excel.  

With the blanks, I am using
$expected_columns = 10
$num = count($data);
if ($num === $expected_columns) {
   //good
}
ELSE
{
  // bad
}

Open in new window

I'm using this to import data and only import if it is a good row of data. Then look at each field and check it for the correct data type and length.

Let me obfuscate this excel file and output a few rows to csv and you can see what I have.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40360575
This is the bottom of the file.

When I just tested the 2 csv files in my upload, the mac file fails but I have not done anything with the flags.
test-windows.csv
test-mac.csv
test.xlsx
0
 
LVL 54

Assisted Solution

by:Julian Hansen
Julian Hansen earned 100 total points
ID: 40360600
Not sure if this on target but usually in Excel if you have entered data into rows and then cleared them excel records that those rows have data and when you dump them to a text (csv / tab delimited) you end up with a bunch of "dead" rows at the end of the file.

Before saving, if I have messed with the file I select the last 20 or so empty rows and delete them.

Bottom line - users are gifted in making sure whatever they upload is not what you were expecting - I have had this problem with users for many years and nothing seems to work - they always find a way to fubar it - not saying there isn't a solution but maybe part of it is to impose some rules for uploading - if you detect a bad file just reject it rather than trying to fix it.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40360617
Yea, understand the potential for bad data and I think I have that covered in testing rows and individual fields, then spitting out a table with red squares for bad data and not allowing the import to work until they have fixed up the bad data.

As far as the line breaks, I am going to try what Ray suggested.  

Thank you all!
0
 
LVL 109

Accepted Solution

by:
Ray Paseur earned 200 total points
ID: 40360757
You might want to look into "auto-detect-line-endings" but I think this process will be a bit more foolproof.
Script: http://iconoun.com/demo/temp_padas.php
Output: http://iconoun.com/demo/storage/padas_new.csv
<?php // demo/temp_padas.php
error_reporting(E_ALL);
echo '<pre>';

// SEE: http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28530947.html
// REF: http://www.php.net/manual/en/function.fputcsv.php
// REF: http://www.php.net/manual/en/filesystem.configuration.php#107333
// REF: http://www.php.net/manual/en/filesystem.configuration.php#ini.auto-detect-line-endings

// INPUT IS A CSV, SAVED FROM THE EXCEL SPREADSHEET
// http://filedb.experts-exchange.com/incoming/2014/10_w40/875735/test.xlsx
$url = 'http://iconoun.com/demo/storage/padas.csv';
$fpr = fopen($url, 'rb');
if (!$fpr) trigger_error("UNABLE TO OPEN $url", E_USER_ERROR);

// OUTPUT IS A CSV ON MY SERVER
$out = 'storage/padas_new.csv';
$fpw = fopen($out, 'wb');
if (!$fpw) trigger_error("UNABLE TO OPEN $out", E_USER_ERROR);

while (!feof($fpr))
{
    // READ EACH ROW AND TRIM THE EOL CHARACTERS
    $row = fgets($fpr);
    $row = trim($row);

    // CHECK FOR EMPTY ROWS
    $arr = explode(',', $row);
    $chk = implode(NULL, $arr);
    if (empty($chk)) continue;

    // WRITE THE EDITED CSV FILE
    $num = fputcsv($fpw, $arr);
    if ($num === FALSE) trigger_error("FPUTCSV() FAILED FOR $out", E_USER_ERROR);
}
fclose($fpr);
fclose($fpw);

// CHECK THE PROCESS
$fpp = fopen($out, 'rb');
while (!feof($fpp))
{
    $row = fgetcsv($fpp);
    print_r($row);
}

Open in new window

0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40360758
The two CSV files you posted are not in my experience legitimate CSV formats.  I haven't even gotten to my Mac yet.  I opened the XLSX file in LibreOffice and did a CSV export and that turned out perfect, no extra characters or garbage.  So I tried the same thing with Excel 2007.  It always exported 368 rows and it seems to think there is something in one of the apparently blank rows.

Attached is a 'correct' CSV file.  I generate and use CSV files and upload them using phpMyAdmin in this format fairly frequently although I usually double-quote text fields.
test-LibreOffice.csv
0
 
LVL 83

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 200 total points
ID: 40360769
Ray's demo above puts out 'standard CSV format' files.  I copied your code above and it acted as you described.  I added "ini_set('auto_detect_line_endings',TRUE);" as recommended in http://us3.php.net/manual/en/function.fgetcsv.php and then it read both the Windows and Mac files the same.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40360794
Thanks you both.   I'm going to give it a go.  

>The two CSV files you posted are not in my experience legitimate CSV formats.

Dave, I received an excel file to test.  Then simply opened in excel, and a) save as csv b) save as windows csv.  I'm not the one that will ultimately upload this or create the excel file/csv so the scripting needs to clean up as best it can and then point out errors.

My only real regret about Mac is using excel is really not a good experience when you are used to it on the PC.   I heard the newest versions are a lot better but I don't plan on buying it.  Been 5  years and thinking of going back to Windows or Ubuntu next.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40360800
What version of Excel?  Excel:Mac?  Which version?  My old Mac Office 2004 won't open 'xlsx' files unfortunately.

When I exported to CSV from Windows Excel 2007, it did it 'correctly' but it exported all of the rows including 366 blank ones.

Before you are done, you will need to get files from the customer/client and see what they are doing.  The CSV import in phpMyAdmin will flag most errors and refuse to put them in the database.

Ubuntu will be Open Office or Libre Office, not Microsoft Office.  If you need business compatibility, Windows is still the way to go.  I run Libre Office on my MacBook Pro and on my Windows machines that don't have Microsoft Office which is most of them.
0
 
LVL 52

Author Closing Comment

by:Scott Fell, EE MVE
ID: 40360813
Thanks to all.    

Ray's code did the trick and Dave's tip made it work for Mac and Julian's comment is on the money.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40360818
You're welcome, glad to help.
0
 
LVL 54

Expert Comment

by:Julian Hansen
ID: 40360916
If you want to use fgetcsv then this should work on Mac CSV as well
ini_set('auto_detect_line_endings',TRUE);
if (($handle = fopen("t7204.csv", "rt")) !== FALSE) {
 // On my system Excel delimits using ; - change as required
  while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    $linetest = trim(implode("", $data));
    if (!empty($linetest)) {
      print_r($data);  
    }
  }
  fclose($handle);
}

Open in new window

0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40361195
The really great followon question might be, "How do I read an XLSX file and write a CSV?"  I think there is an "unzip" step involved, but I've never explored it.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40361203
This bit of code is adding an extra blank line at the end (windows server)

 $out = 'C:\inetpub\mysite.com\httpdocs\upload\filename_'.$now.'.csv';
    $fpw = fopen($out, 'wb');
    if (!$fpw) die("UNABLE TO OPEN FILE 2");

    while (!feof($fpr)) {

        // READ EACH ROW AND TRIM THE EOL CHARACTERS
        $row = fgets($fpr);
        $row = trim($row);

        // CHECK FOR EMPTY ROWS
        $arr = explode(',', $row);
        $chk = implode(null, $arr);
        if (empty($chk)) continue;


        // WRITE THE EDITED CSV FILE

            $num = fputcsv($fpw, $arr);
            if ($num === FALSE) die("WRITE FAILED");



    }
    fclose($fpr);
    fclose($fpw);

Open in new window

0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40361234
Windows is always a nuisance!  This is untested but it's unlikely to cause any side-effects and may help.  After this line, add a test for an empty row

$row = trim($row);
if (empty($row)) continue;

A possible explanation lies in the way that Windows creates line endings.  Windows uses \r\n for a line-ending character.  *nix uses only \n.  What this means is that files created on Windows servers may appear to have blank lines, since the \r character on *nix will cause a "carriage return" effect.  You may want to experiment with the auto-detect-line-endings settings or try turning them off.  When I have read Windows files, I have often had to read all the lines into an array with file(), then use foreach() to iterate over the array, to trim() each of the lines before processing the data.  Nuisance, but usually an acceptable workaround for small files, perhaps anything under a million lines.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40361243
I think I have it fixed.

 $out = 'C:\inetpub\mysite.com\httpdocs\upload\filename_'.$now.'.csv';
    $fpw = fopen($out, 'wb');
    if (!$fpw) die("UNABLE TO OPEN FILE 2");

    while (!feof($fpr)) {

        // READ EACH ROW AND TRIM THE EOL CHARACTERS
        $row = fgets($fpr);
        $row = trim($row);

        // CHECK FOR EMPTY ROWS
        $arr = explode(',', $row);
        $chk = implode(null, $arr);
        if (empty($chk)) continue;


        // WRITE THE EDITED CSV FILE

            $num = fputcsv($fpw, $arr);
            if ($num === FALSE) die("WRITE FAILED");



    }

     $stat = fstat($fpw);
    ftruncate($fpw, $stat['size']-1);

    fclose($fpr);
    fclose($fpw);

Open in new window

0
 
LVL 54

Expert Comment

by:Julian Hansen
ID: 40361253
I think there is an "unzip" step involved, but I've never explored it.
Depends on version .xlsx files are zipped files with xml and resource files inside. Older versions of office are a proprietary binary format. There are libraries to do that but if you can export to CSV why go to all the trouble.

Seems to me the problem here was that fgetcsv was not working for csv files created with a Mac export to CSV of an Excel file.
a) There were dead rows created at the end of the doc
b) The Mac line separator is a 0x0d \r

The first problem can be solved by imploding the created array with null separators, trimming the result and comparing to the empty string.
The second problem can be solved by the ini_set('auto_detect_line_endings',TRUE);

At the risk of being pedantic trying to work out why we are dropping fgetcsv and now using fputcsv? Might have missed a step but it seems that the original question was around fgetcsv and different CSV formats?
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40361254
You may want to add this to the mix: http://www.php.net/manual/en/function.clearstatcache.php
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40361257
if you can export to CSV why go to all the trouble
Good question.  Almost everyone can click "save" but not everyone can understand and remember "save as CSV."  It may seem like a little thing, but just wait till you have to start debugging errors caused by wrong file formats.  It's one of those non-value-added activities that steals time, costs money, and (from the perspective of most managers) "just shouldn't happen."  So as I think about the larger picture here with a non-technical Excel user creating a file for consumption by a PHP script, I think "let's simplify as much as possible on the user end of things."
0
 
LVL 54

Expert Comment

by:Julian Hansen
ID: 40361264
Ja, but I have been through the whole - read the raw format file as well - at least with CSV it strips out all the other formatting and junk in the file that is not needed.

My best results have been from training users of the site to do it properly rather than trying to write an AI system to work around their mistakes.

However, each to his own.
0

Featured Post

Live: Real-Time Solutions, Start Here

Receive instant 1:1 support from technology experts, using our real-time conversation and whiteboard interface. Your first 5 minutes are always free.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Generating table dynamically is the most common issue faced by php developers.... So it seems there is a need of an article that explains the basic concept of generating tables dynamically. It just requires a basic knowledge of html and little maths…
Developers of all skill levels should learn to use current best practices when developing websites. However many developers, new and old, fall into the trap of using deprecated features because this is what so many tutorials and books tell them to u…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

808 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question