Link to home
Start Free TrialLog in
Avatar of Evan Cutler
Evan CutlerFlag for United States of America

asked on

get specific line in large csv file using php

Greetings,
I have a file.
It is standard with <email>,<tag>
There are over a million lines in the file like that.

<email>,<tag>
<email>,<tag>
..continued.

no headers.
If I have a random value, say 124535,

How can I get the 124535th line in the file without reading the whole thing into memory?
I can call local commands, but the execution order must be in PHP.
Thanks
Avatar of Ray Paseur
Ray Paseur
Flag of United States of America image

You can do this easily.  Just read up to the 124,535th line.  That's only about 1/8 of the lines. If you load the file into a database table, you can use the LIMIT clause to get the correct line.  Use this function: http://php.net/manual/en/function.fgets.php
Avatar of duncanb7
duncanb7

Following code  might be what you need for complete PHP code to read content at certain line-number,(Note: where 4096 in code area, you can change it with adding some margin if you know the max characters in one line of your file to speed up the code running time)  

Hope I understand your questionc completely, if no, please point it out


Duncan

php code , it is tested at my linux server
========================================
<?php
  $handle = @fopen("../../yourfile.csv", "r");
if ($handle) {
   // while (($buffer = fgets($handle, 4096)) !== false) { echo $buffer; }
   // if (!feof($handle)) {   echo "Error: unexpected fgets() fail\n";}
   $num=123;
	$line=readLine($num,$handle); ///123 is just exmple line-number;
	echo "Line Num:".$num."-----------------\n";
	echo $line."\n";
    fclose($handle);
}

  function readLine ($linenum,$fh) {
        $line = fgets ($fh, 4096);
        $pos = -1;
        $i = 0;
        while (!feof($fh) && $i<($linenum-1)) {
            $char = fgetc($fh);
            if ($char != "\n" && $char != "\r") {
                fseek($fh, $pos, SEEK_SET);
                $pos ++;
            }
            else $i ++; 
        }
        $line = fgets($fh);
        return $line;
    } 
?>

Open in new window

Please examine the descriptions of these functions:

fgetc()
fgets()
fseek()

It always pays to read the manual :-)
Avatar of Tintin
Depending on whether you have safe_mode enabled, you can always "cheat" and do:

<?php
$line = `sed -n 124535p filename`;
echo $line
?>

Open in new window

Dear All,

Thanks Ray_Paseur's reminder, I used slower function that is not related to 4096
and it also run much slower since using fgetc from last code.
After Tintin's suggestion code that is really good code and really fast & easy, and I try to
improve my code as follows code ,as same speed as "sed"

 arcee123, hope you try this all code including Tintin's code for speed at your side.
I think I have no more room to improve the speed  faster by fgets.

Duncan

<?php
  $f = fopen ("../../data/yourfile.csv", "r");  $i= 0;// where 5000 is line number
  while ($line= fgets ($f)) { if ($i==5000) {print ($line);break;}; ++$i; }   fclose ($f);
?>

Open in new window

Avatar of Evan Cutler

ASKER

Ok guys.Thanks much.
What happens if I need to read somewhere beyond 800000?  Doesn't it cost alt to get to the 800000 if I have to readline to that point?
Yes, it does cost.  But as long as it is a text file without any indexes that you can use to skip the unwanted lines, that is what you have to do.  Which is why Ray suggested that you load the CSV file into a database because databases do create indexes and provide the ability to search much more quickly than reading a text file line by line.

Years ago when I had to write a program that used a CSV file for a 'database', the first thing I did when the program started was to read it all into memory and create indexes to find the data I needed.  That worked then because the data was read many times while the program was running.  If you are doing this on a web site and only reading one line each time, you would be Much better off putting it a database because it would be Much quicker.
ASKER CERTIFIED SOLUTION
Avatar of Ray Paseur
Ray Paseur
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Using @gr8gonzo's method the results are substantially faster.

WRITE LINES STOPPED  3,467.103 ms
READ LINES STOPPED  0.057 ms

This also appeared to be independent of the actual record chosen - line #200,000 was as fast as line #800,000.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Dave: I read the entire file ...

$arr = file('RAY_temp_arcee123.txt', FILE_IGNORE_NEW_LINES);

... and used an iterator to get to the selected line.  Using or omitting the iterator was undetectable in my tests -- iterating over hundreds of thousands of lines with foreach() did not even show up in the timing results.  The time to read the 45 million bytes was where the 344 milliseconds came from.

It appears "much faster" to use fseek() if you know what you want to locate.  And (serve this with a helping of mouse milk) we are talking about taking down something that needs 1/3 second to run ;-)
1.7 seconds is Ok.  It's having to read for 2 minutes that's the problem. I am trying it now.
Thanks
Oh... Ok.
Hey, thanks guys.  This got me where I was going.
Appreciate it so much.
Thanks for the points and thanks for using EE, ~Ray
If this gets re-opened for any reason, I shouldn't get any points for that post because, as Ray pointed out, I was wrong.
Dave: It was still a useful contribution to the conversation.  Happy Thanksgiving, ~Ray