Combined String/File Handling

I've prototyped a program in Python that groups files (i.e., a tar replacement) and I'm about to start on the real version, which will be in C. For reference, this is the header for the archives:

0#(creation date)#(creation time)#(creator name)#
(number of files)#(file 1's size)#(file 2's size)#...#(file n's size)#
0#(file 1's path)#(file 2's path)#...#(file n's path)#

Then it's a straight file system of the files. And the footer is just "## (eof):".
What I need to do is split the archive by that weird line in the header then take the line with the file sizes and use that to split up the actual data of the archive itself. (Ex: if file 1 is 30 bytes long, it reads 30 bytes into the data and gets the filename from line 3 of the header and writes the file.)

What I need from everyone else is either links to a good site to learn string handling in C, or sample code on string handling. I'll accept C++, but I'm trying to stay with C. I'll put the name of everyone who helps in the credits. :) (I'm calling this mar, and so far it makes smaller, more efficient files than tar.)
Simple: use strchr() to find the position of a #, then strncpy() to extract substrings, then atoi() to convert the numeric text to integers.
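A minimal sketch of that approach, assuming each header line is already in memory as a NUL-terminated string. The names get_field and get_number_field are hypothetical helpers invented for this illustration, and atol() stands in for atoi() because a file size may not fit in an int.

```c
#include <stdlib.h>
#include <string.h>

/* Copy the field_index-th '#'-terminated field (0-based) of line into out.
   Returns 0 on success, -1 if the field is missing or does not fit. */
int get_field(const char *line, int field_index, char *out, size_t out_size)
{
    const char *start = line;
    const char *end;
    size_t len;
    int i;

    for (i = 0; i < field_index; i++) {
        start = strchr(start, '#');     /* find the next separator */
        if (start == NULL)
            return -1;
        start++;                        /* step past the '#' */
    }
    end = strchr(start, '#');           /* separator that closes the field */
    if (end == NULL)
        return -1;

    len = (size_t)(end - start);
    if (len >= out_size)
        return -1;
    strncpy(out, start, len);           /* copies len bytes, no terminator */
    out[len] = '\0';
    return 0;
}

/* Read a field as a decimal number, e.g. a file size. Returns -1 on error. */
long get_number_field(const char *line, int field_index)
{
    char text[32];

    if (get_field(line, field_index, text, sizeof text) != 0)
        return -1;
    return atol(text);
}
```

With the size line "3#30#1200#7#", get_number_field(line, 1) returns 30 (file 1's size) and get_number_field(line, 2) returns 1200.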

Also, you might want to think about these issues:

(1) How are you going to ensure the file's integrity? (Think: checksum.)

(2) How are you going to handle files that may have had their end-of-line codes translated, mangled, or space-trimmed in transmission?
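For point (1), one possible checksum is Adler-32, sketched below. This is only an illustration of the idea: the thread does not say which checksum to use, and the function name adler32_update is made up for the example. A checksum stored per file would also expose the damage described in point (2), since translated line endings change the bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold len bytes of data into a running Adler-32 value.
   Start with adler = 1 and feed the file in as many chunks as needed. */
uint32_t adler32_update(uint32_t adler, const unsigned char *data, size_t len)
{
    uint32_t a = adler & 0xFFFFu;          /* low half: sum of the bytes */
    uint32_t b = (adler >> 16) & 0xFFFFu;  /* high half: sum of the sums */
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;        /* 65521 is the largest prime below 2^16 */
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

The archiver would compute the value while copying each file in, store it in the header or directory, and compare it on extraction.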

Kent Olsen (DBA) commented:

Smaller files than tar, huh?  That's a pretty good motivation!  :)

You'll want to think very carefully about your header structure.  As you've indicated, you can place the directory structure at the beginning of the file, but no matter where you place it, there will be trade-offs.

*  The archive starts with the header/directory as you've indicated.

What happens when you want to mar(1) a lot of files?  You have to scan all of the directory entries, build the header, and then copy the files.  What if some of the files are dynamic?  If /var/log/messages is one of the files, it could easily be a different size when you go to copy the data than when you built the header.  There goes any hope of restoring the file or any other file on the archive that was written after this file.

*  The archive has an archive header, and a file header is written immediately prior to each file.

This solves the length mismatch issue.  But listing the contents of an archive or searching an archive for a particular file becomes very inefficient since you'll have to walk through the archive, potentially reading from disk for every header.

*  The directory is written at the end of the file, after all of the file contents are recorded.

If data integrity is important (and it should be) this is probably the easiest.  It also allows you to scan the directory just as quickly as if it were the first item on the file.

The archive contents could look something like this:

#archive header

dp is the random address (byte offset) of the directory.  To access the directory:

  handle = open(ArchiveName, O_RDONLY);
  lseek(handle, -(off_t)sizeof(long), SEEK_END);

Now read and process it as if it were at the beginning of the file.

Are you going to compress these files as you record them?

Kent Olsen (DBA) commented:

Sorry.  Got ahead of myself...

dp is the random address (byte offset) of the directory.  To access the directory:

  handle = open(ArchiveName, O_RDONLY);
  lseek(handle, -(off_t)sizeof(long), SEEK_END);  /* the last long in the file holds dp */
  read(handle, &dp, sizeof(long));
  lseek(handle, dp, SEEK_SET);                    /* jump to the start of the directory */
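Both sides of the directory-at-the-end layout can be sketched with portable stdio calls instead of open/lseek. The function names write_archive and read_directory are hypothetical, and the directory is assumed to be a short text string. Storing a raw long is an assumption carried over from the snippet above; its size and byte order differ between platforms, so a real format would probably want a fixed-width encoding.

```c
#include <stdio.h>
#include <string.h>

/* Write the file data, then the directory, then the directory's byte
   offset (dp) as the last thing in the archive. Returns 0 on success. */
int write_archive(const char *path, const char *data, const char *directory)
{
    FILE *f = fopen(path, "wb");
    long dp;
    int ok;

    if (f == NULL)
        return -1;
    ok = fwrite(data, 1, strlen(data), f) == strlen(data);
    dp = ftell(f);                         /* the directory starts here */
    ok = ok && dp >= 0
            && fwrite(directory, 1, strlen(directory), f) == strlen(directory)
            && fwrite(&dp, sizeof dp, 1, f) == 1;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Locate the directory through the trailing offset and copy it into out
   as a NUL-terminated string. Returns 0 on success. */
int read_directory(const char *path, char *out, size_t out_size)
{
    FILE *f = fopen(path, "rb");
    long dp, end, len;
    int ok = 0;

    if (f == NULL)
        return -1;
    if (fseek(f, -(long)sizeof dp, SEEK_END) == 0) {
        end = ftell(f);                    /* the directory ends where dp is stored */
        if (end >= 0 && fread(&dp, sizeof dp, 1, f) == 1 && dp >= 0 && dp <= end) {
            len = end - dp;
            if ((size_t)len < out_size && fseek(f, dp, SEEK_SET) == 0
                && fread(out, 1, (size_t)len, f) == (size_t)len) {
                out[len] = '\0';
                ok = 1;
            }
        }
    }
    fclose(f);
    return ok ? 0 : -1;
}
```

Because the directory is written last, its sizes can be the byte counts actually copied, which avoids the length mismatch described earlier.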


It seems that you are a good programmer...
Open the header file string.h to see what the C string library provides.

As an example, here is sample code that extracts a substring from a larger string, as in your case:


char  buffer[] = "0#date#date#myfile.txt#";
char  buffer_1[64];

// Now suppose you know that you want to get the first filename.
// Use the following code:

char *temp = buffer;
int   separator_count = 0;
int   start = 0, end = 0;

for (; *temp != '\0'; temp++) {     // stop at the end of the string
    if (*temp != '#') continue;
    separator_count++;
    if (separator_count == 3) start = temp - buffer;            // '#' before the filename
    if (separator_count == 4) { end = temp - buffer; break; }   // '#' after the filename
}

// Now copy the required string into buffer_1.
// start + 1 skips the leading '#'; the length excludes both '#' characters.
memcpy(buffer_1, buffer + start + 1, end - start - 1);

// Now terminate buffer_1 with '\0'.
buffer_1[end - start - 1] = '\0';

Malevolyn (Author) commented:
Hm...thanks for the help. You see, I'm a big time Python programmer trying to move into compiled languages from interpreted (despite compiling Python and Perl scripts into exe's on a daily basis without embedding manually) and, of course, going from pretty syntax to angry syntax is a killer. What I don't understand is how I can bang out PHP no problem but can't do C at all...

Oh, and I changed the name to a3f.

And about the /var/log/messages thing, wouldn't you have similar problems with other file grouping algorithms? I could always make a3f cache the files before writing them...but that's not an issue anymore. I changed the format of the resulting archives. I was having trouble in the prototype with how it reads a given number of bytes. Files are now separated by the same line that separates the header from the data. So there's no need to worry about having the incorrect file size in the header anymore, but I'm going to keep the code there for a few reasons. I'm too lazy to remove it and it makes the header look cooler. =D

My aim is to make the same exact code work on POSIX and Win32. At least on a prototype level. The C will obviously be different. As I type this, I'm realizing that I'm basically asking this community to write this program for me. Which is explainable in that I don't know C very well. But as I said, everyone will get credited for their help. Hopefully a3f will become a popular grouping format. My POSIX Python module distributions could be printnn-! =D

Malevolyn (Author) commented:
Forgot to answer one question: No, I'm not going to compress them. I might do some slight compression work later (replacing spaces at the beginning of lines in non-binary files), but that's for another day...

Kent Olsen (DBA) commented:

If you open the file (with lock) and then stat() it, you'll be able to copy it intact and get the correct length without fear of other processes changing the file.  Of course, this has its own pitfalls in that you will have changed the file's access timestamp before you stat() it.  Perhaps two stat() calls are required?  Or maybe just the one prior to opening the file is sufficient.  (It's possible for another process to access the file between the stat() and the open().)  Then again, if you count the bytes as you copy them, the file length won't be a problem, huh?

struct stat StatBefore;
stat(FileName, &StatBefore);                   /* record size and timestamps first */
handle = open(FileName, O_RDONLY | O_BINARY);  /* O_BINARY exists on Win32 only */

With Perl you can easily change file attributes such as the access timestamps.  With C it's not so easy.  And since you're opening the file for read, you may want to reset the timestamps to their "user" values.
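One way to combine these ideas on a POSIX system is sketched below: stat() the file first, count the bytes while copying, then put the saved timestamps back with utime(). The function name copy_counting is hypothetical, and the Win32 equivalent would need different calls, so treat this as an illustration rather than portable code.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <utime.h>

/* Append the contents of path to out and return the number of bytes
   actually copied, or -1 on error. The count, not st_size, is what
   should be recorded in the archive. */
long copy_counting(const char *path, FILE *out)
{
    struct stat before;
    struct utimbuf times;
    char buf[4096];
    size_t n;
    long total = 0;
    FILE *in;

    if (stat(path, &before) != 0)          /* remember the timestamps */
        return -1;
    in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(in);
            return -1;
        }
        total += (long)n;                  /* count what was really copied */
    }
    if (ferror(in)) {
        fclose(in);
        return -1;
    }
    fclose(in);

    /* Best effort: restore the access and modification times seen before
       the read. This can fail if the caller does not own the file. */
    times.actime = before.st_atime;
    times.modtime = before.st_mtime;
    utime(path, &times);

    return total;
}
```

If the file grows or shrinks between the stat() and the copy, the returned count still matches the bytes written to the archive.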
