Solved progressbar max value for number of lines in a file

Posted on 2005-04-29
Last Modified: 2012-05-05
I have a file that I am reading and manipulating. I am not sure of the maximum number of lines in the file, so is there a way to find out how many lines it has first?

Right now I am looping a line at a time. Is there a file.linecount or anything I could use?
Question by: bman9111
    LVL 5

    Expert Comment

    Well there really is no such thing as 'lines' in the file, just a long string of characters (which may have newline characters).

    If you are using a StreamReader, or similar, you can read StreamReader.BaseStream.Length to get the total length of the file in bytes.

    Compare this to the Position property, and you can see the percentage of the file that has been read.
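
    A minimal sketch of that idea, assuming VB 2005 or later (for the Using statement) and a console program; the module name, the PercentRead helper, and taking the file path from args(0) are illustrative choices, not anything from the original posts. One caveat: a StreamReader reads ahead in buffer-sized chunks, so BaseStream.Position advances in jumps, and for a small file it can report 100% after the first line.

    ```vbnet
    Imports System.IO

    Module ProgressByPosition
        ' Returns how far the reader's underlying stream has advanced, as 0-100.
        Function PercentRead(ByVal reader As StreamReader) As Integer
            Dim total As Long = reader.BaseStream.Length
            If total = 0 Then Return 100
            Return CInt(reader.BaseStream.Position * 100 \ total)
        End Function

        Sub Main(ByVal args As String())
            ' args(0) is the path of the text file to read (hypothetical usage).
            Using reader As New StreamReader(args(0))
                While reader.Peek() <> -1
                    Dim line As String = reader.ReadLine()
                    ' On a form, assign the percentage to the progress bar's Value instead.
                    Console.WriteLine("{0}% {1}", PercentRead(reader), line)
                End While
            End Using
        End Sub
    End Module
    ```

    With this approach the progress bar's Maximum can stay fixed at 100 no matter how many lines the file has.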


    LVL 8

    Author Comment

    Yeah, right now I have a file that looks like this:

    100 students
    just in time

    What I was trying to do is create a progress bar that moves as each line is read, with the progress bar's maximum set to 3. Keep in mind that the maximum may be 12, 100, or 300.

    That's why I wanted to see if it could count lines.

    Are there any better alternatives?

    LVL 5

    Expert Comment

    Well it is not possible to know the number of lines in the file without reading the entire file first.  

    But for small files like yours I don't know why you wouldn't read the entire thing anyway. It would happen instantaneously, so the progress bar wouldn't serve much purpose.

    You can read the entire file, then do a Split() on the newline character to get an array of lines.

    But there is really no way to get the number of lines faster than reading the entire file, unless of course you write the number of lines at the top of the file. The first line could be '3' in your example file.
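
    As a sketch of the "read the file once to count" option, assuming VB 2005 or later; CountLines and the module name are hypothetical. It never holds more than one line in memory, at the cost of reading the file twice overall.

    ```vbnet
    Imports System.IO

    Module LineCounter
        ' Reads the whole file once and returns how many lines ReadLine yields.
        Function CountLines(ByVal fileName As String) As Integer
            Dim count As Integer = 0
            Using reader As New StreamReader(fileName)
                While reader.ReadLine() IsNot Nothing
                    count += 1
                End While
            End Using
            Return count
        End Function
    End Module
    ```

    The result could then be assigned to the progress bar's Maximum before the real processing loop starts, with Value increased by one per line.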
    LVL 8

    Author Comment

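    Well, which is better?

    ' Option 1: read one line at a time
    Dim lines As String
    While filetoread.Peek() <> -1
        lines = filetoread.ReadLine()
    End While


    ' Option 2: read everything at once, then split into lines
    Dim i As Integer
    Dim lines As String
    Dim linesplit() As String

    lines = filetoread.ReadToEnd()
    ' Split() from Microsoft.VisualBasic splits on the whole CRLF pair
    linesplit = Split(lines, vbCrLf)

    For i = 0 To linesplit.Length - 1
        MsgBox(linesplit(i))
    Next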

    Some text files will be really, really big. I'm not sure which is better for memory: reading everything at once or a line at a time.

    Where this is going: if I use ReadToEnd, I can take the Length of the split array and get how many lines there are, but I don't want to ruin performance if using Peek is better.
    LVL 85

    Expert Comment

    by:Mike Tomlinson
    If your files will be "really really big" then reading the entire thing into a string and using split is not a good idea.  Reading line by line will take longer...but will not eat up lots of memory.
    LVL 2

    Accepted Solution

      Unfortunately, the modern way with files seems to have settled on them containing a stream of bytes only. The filesystem interface to your computer language on your system may offer some call that will tell you the file size in bytes (or whatever unit), and with a guess as to the average number of bytes/line you could make an estimate of the number of lines. Alternatively, your progress percentage could be based on (bytes read)/FileSizeInBytes*100% for an equivalent effect. Your accounting will need to be sure whether the end-of-record marker is included in the various counts. It may be CR, LF, or CRLF or LFCR, ha ha, but the question is whether the filesize includes them (probably does) and whether the READ statement's result does (probably doesn't), thus you need to add one, or maybe two to the bytes read of each input record. Be cautious as to the treatment of trailing spaces and tabs, etc. as well.
       Other file systems offer files with record lengths and know how many records there are in a file, so your requirement could be met directly.
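
    The byte accounting described above might look like the sketch below. It rests on assumptions that are not in the original posts: the file is UTF-8 or plain ASCII, every line ends in CRLF (hence two extra bytes per line), and the module and variable names are made up for illustration. Because the last line may have no terminator, the percentage is clamped to 100.

    ```vbnet
    Imports System.IO
    Imports System.Text

    Module ProgressByBytes
        Sub Main(ByVal args As String())
            ' args(0) is the path of the text file to read (hypothetical usage).
            Dim fileName As String = args(0)
            Dim totalBytes As Long = New FileInfo(fileName).Length
            Dim bytesRead As Long = 0
            ' Assumed: UTF-8/ASCII text with CRLF line endings.
            Dim enc As Encoding = Encoding.UTF8
            Const NewlineBytes As Integer = 2

            Using reader As New StreamReader(fileName, enc)
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' ReadLine strips the terminator, so add it back to the count.
                    bytesRead += enc.GetByteCount(line) + NewlineBytes
                    Dim percent As Integer = 0
                    If totalBytes > 0 Then
                        percent = CInt(Math.Min(100L, bytesRead * 100 \ totalBytes))
                    End If
                    Console.WriteLine("{0}%", percent)
                    line = reader.ReadLine()
                End While
            End Using
        End Sub
    End Module
    ```

    Unlike watching the stream position, this count moves with every line read, though it is only as accurate as the encoding and line-ending assumptions.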
    LVL 1

    Expert Comment

    Your application could look for another file (say your text file is named "DATA.TXT" then this other file could be called "DATA.LEN" or something) and if it exists, read the number of lines from it before reading in the lines. If this extra file is missing or older than the data file (in case some other program changed it), then your application would count the number of lines as it reads the file, and then create a new line count file. Whenever the data file is written to, the line count file should be updated.

    This would speed up operation in the long run. Some professional programs use similar methods, eg. CoolEdit, which is an audio editor that stores some complementary data in a "peak file" so loading is faster. If the peak file is removed or the .WAV file changed by another program then loading is slower the next time (and a new peak file is created).
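
    A rough sketch of the line count file, assuming VB 2005 or later; GetLineCount and the ".LEN" extension follow the naming suggested above, but the code itself is only an illustration. It compares last-write times and falls back to counting when the cached file is missing, older than the data file, or unreadable.

    ```vbnet
    Imports System.IO

    Module LineCountCache
        ' Returns the line count for dataPath, using the sidecar file when it is current.
        Function GetLineCount(ByVal dataPath As String) As Integer
            Dim countPath As String = Path.ChangeExtension(dataPath, ".LEN")

            ' Trust the cached count only if it is at least as new as the data file.
            If File.Exists(countPath) AndAlso _
               File.GetLastWriteTimeUtc(countPath) >= File.GetLastWriteTimeUtc(dataPath) Then
                Dim cached As Integer
                If Integer.TryParse(File.ReadAllText(countPath).Trim(), cached) Then
                    Return cached
                End If
            End If

            ' Cache missing, stale, or unreadable: count the lines and rewrite it.
            Dim count As Integer = 0
            Using reader As New StreamReader(dataPath)
                While reader.ReadLine() IsNot Nothing
                    count += 1
                End While
            End Using
            File.WriteAllText(countPath, count.ToString())
            Return count
        End Function
    End Module
    ```

    Any code that writes to the data file would still need to refresh the count file afterwards, as described above.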

    If you prefer seeking and reading fragments from the data file to reading the whole thing into memory, then you might even want to store a complete index of all the lines in your complementary file (which in this case I'll call the "index file"). If you use a 4-byte integer for each line position (a Long in VB6, an Integer in VB.NET), then the length of the index file divided by 4 will be the number of lines. To get the position of a specific line, you'd first seek in the index file to the line number (starting at 0) times four and read the position stored there. As with the line count file described above, whenever the index file is missing or the dates of the index file and data file don't match, and whenever you change the data file, you'd have to rebuild the index.
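
    One possible shape for such an index file, as a sketch only. The module and function names are hypothetical, and it assumes lines end in LF or CRLF and that the data file is under 2 GB so that every offset fits in a 4-byte Integer.

    ```vbnet
    Imports System.IO

    Module LineIndex
        ' Writes one 4-byte Integer per line: the byte offset where that line starts.
        Sub BuildIndex(ByVal dataPath As String, ByVal indexPath As String)
            Using data As New FileStream(dataPath, FileMode.Open, FileAccess.Read), _
                  index As New BinaryWriter(File.Create(indexPath))
                Dim atLineStart As Boolean = True
                Dim b As Integer = data.ReadByte()
                While b <> -1
                    If atLineStart Then
                        ' Position is already past b, so the line began one byte back.
                        index.Write(CInt(data.Position - 1))
                        atLineStart = False
                    End If
                    ' LF ends a line (this also covers CRLF).
                    If b = 10 Then atLineStart = True
                    b = data.ReadByte()
                End While
            End Using
        End Sub

        ' The index length divided by 4 is the number of lines.
        Function LineCount(ByVal indexPath As String) As Integer
            Return CInt(New FileInfo(indexPath).Length \ 4)
        End Function

        ' Returns the byte offset of a zero-based line number.
        Function LineOffset(ByVal indexPath As String, ByVal lineNumber As Integer) As Integer
            Using index As New BinaryReader(File.OpenRead(indexPath))
                index.BaseStream.Seek(CLng(lineNumber) * 4, SeekOrigin.Begin)
                Return index.ReadInt32()
            End Using
        End Function
    End Module
    ```

    To read a given line, seek the data file's stream to the value returned by LineOffset and read up to the next line terminator.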

    Hope this is of some help to you.
    LVL 8

    Author Comment

    So is the file.Peek <> -1 loop a better way to read files?
