Peter Chan asked:

Problem to search

Hi,
Searching a file such as

https://dl.dropboxusercontent.com/u/40211031/flout_w.bin

works with the exe file built from the following code:
// 
//

#pragma warning (disable: 4996) 
#include "stdafx.h"
#include <set>
#include <sys/stat.h>
#include <string>
#include <fstream>
#include <sstream>
#include <atlbase.h>
#include <ctype.h>
#include <process.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include "..\..\include\nameval.h"
#include <iomanip>
#include <Windows.h>
struct stat fs = { 0 };
int ret; //
int numRecords;
nameval binrec;
bool LessComp(const nameval& a1, const nameval& a2)
{
	if (strcmp(a1.fld_nm, a2.fld_nm) < 0) return true;
	if (strcmp(a1.fld_nm, a2.fld_nm) > 0) return false;
	if (a1.fld_val < a2.fld_val) return true;
	return false;
}
int _tmain(int argc, _TCHAR* argv[])
{
	if (argc < 1)
	{
		return ERROR;
	}
	unsigned int nbegin = 0;
	unsigned int nend = numRecords - 1;
	unsigned int nmid;
	unsigned int nstop = 0;
	char nm_got[100];
	unsigned int val_got;
	time_t timev, currtime;
	float sec;
	timev = time(0);
	std::ifstream inputfiles;
	nameval names = { 0 };
		std::ostringstream filename;
		filename << "c:\\dp4\\flout_w.bin";
		std::set<nameval> records;
		std::set<nameval>::iterator iter;
		inputfiles.open(filename.str().c_str(), std::ios::binary | std::ios::in);
		if (!inputfiles.is_open())
			return -3; //
		if (!inputfiles.read((char*)&names, sizeof(nameval)))
			return -4; //
		ret = stat(filename.str().c_str(), &fs);
		if (ret != 0)
			return -4;
		numRecords = (int)(fs.st_size / sizeof(nameval));
		nbegin = 0;
		nend = numRecords - 1;
		nstop = 0;
		char szArgv2[512] = { 0 };
		size_t ncharsConverted = 0;
		wcstombs(szArgv2, argv[1], sizeof(szArgv2));
		while (nbegin <= nend && nstop != -1)
		{
			nmid = (nbegin + nend) / 2;
			nameval rec = { 0 };
			inputfiles.seekg(nmid* sizeof(nameval));
			inputfiles.read((char*)&rec, sizeof(nameval));
			if (strcmp(szArgv2, rec.fld_nm)<0)
			{
				nend = nmid - 1;
				nmid = (nbegin + nend) / 2;
			}
			else
			{
				if (strcmp(szArgv2, rec.fld_nm)>0)
				{
					nbegin = nmid + 1;
					nmid = (nbegin + nend) / 2;
				}
				else
				{
					nstop = -1;
					strcpy(rec.fld_nm, nm_got);
					val_got = rec.fld_val;
				}
			}
		}
		if (nstop == -1)
		{
			std::cout << "\nFound it!\n";
			std::cout << "(From vector record: " << nm_got
				<< ' ' << val_got << ")\n";
		}
		else std::cout << "\nDidn't find it within file -'" << filename.str().c_str() << "'!\n";
	return 0;
}

and here is what I get:

C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzwBdUCSIZpiPajxmVV"

Found it!
(From vector record: Éï? 588081)

but when I search a big file with the same structure, I get this:
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzzzOMoXmtyPzuCfXEJ"

even though the string does exist in the file.

Here is the .h file:
// nameval.h
#ifndef NAME_VAL_H
#define NAME_VAL_H

struct nameval
{
     char fld_nm[21];
     long long    fld_val;

     int  get_len() { return (int)min(strlen(fld_nm), sizeof(fld_nm) ) ; }
     void get_uni_nm(wchar_t nm_uni[], int sizfld)
     {
            //mbstowcs_s(nm_uni, fld_nm, min(sizfld-1, strlen(fld_nm)));
			size_t ncharsConverted = 0;
			mbstowcs_s(&ncharsConverted, nm_uni, sizfld, fld_nm, min(sizfld-1, (int)strlen(fld_nm)));
     }
     bool operator< (const nameval & a2) const
     {
           if(strcmp(fld_nm, a2.fld_nm) < 0) return true;
           if(strcmp(fld_nm, a2.fld_nm) > 0) return false;
           if (fld_val < a2.fld_val) return true;
           return false;
     }
};

#endif

ozo
Is the file you are searching sorted on fld_nm?
Peter Chan (ASKER)

Yes, it is already sorted.
It looks like strcpy(rec.fld_nm, nm_got);
should be strcpy(nm_got, rec.fld_nm);
No, the search works fine on the smaller file. Why does the problem arise only with the bigger file?
(From vector record: Éï? 588081)
does not seem to be working fine.  Shouldn't it have been
(From vector record: zzzwBdUCSIZpiPajxmVV 588081)

strcpy from an uninitialized variable would give undefined behavior
which may manifest in inconsistent ways.

If a file is large enough for 2*numRecords to exceed the range of an int, or for st_size to exceed the range of off_t, that could cause problems to arise, but it should still report Didn't find it within file -
which I don't see in your output.
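
To illustrate the index concerns raised here, the following is a minimal sketch of the same binary search over an in-memory array. `Rec`, `make_rec`, and `find_record` are hypothetical names that do not appear in the posted code; `Rec` only mimics the layout of `nameval`. The sketch uses signed 64-bit indices and a half-open range, so the midpoint cannot overflow and the upper bound never has to drop below zero (in the posted code, `nend = nmid - 1` on an `unsigned int` wraps around when `nmid` is 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the thread's nameval record.
struct Rec {
    char fld_nm[21];
    long long fld_val;
};

// Builds a zero-filled record and copies in a name (truncated to fit).
Rec make_rec(const char* name, long long val)
{
    Rec r{};
    std::strncpy(r.fld_nm, name, sizeof(r.fld_nm) - 1);
    r.fld_val = val;
    return r;
}

// Returns the index of the record whose name equals key, or -1 if absent.
// Searches the half-open range [lo, hi), so no index goes negative.
std::int64_t find_record(const std::vector<Rec>& recs, const char* key)
{
    std::int64_t lo = 0;
    std::int64_t hi = static_cast<std::int64_t>(recs.size());
    while (lo < hi) {
        const std::int64_t mid = lo + (hi - lo) / 2;  // cannot overflow
        const int cmp = std::strcmp(key, recs[static_cast<std::size_t>(mid)].fld_nm);
        if (cmp == 0) return mid;
        if (cmp < 0) hi = mid;      // key sorts before recs[mid]
        else         lo = mid + 1;  // key sorts after recs[mid]
    }
    return -1;
}
```

The same loop shape carries over to the file-based search by replacing the array access with a seek and read of record `mid`.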
> strcpy from an uninitialized variable would give undefined behavior
> which may manifest in inconsistent ways.

Thanks. What should I adjust in the code above?

> If a file is large enough for 2*numRecords to exceed the range of an int, or for st_size to exceed the range of off_t, that could cause problems to arise, but it should still report Didn't find it within file -
> which I don't see in your output.

How can I enhance the code to read a big file?
strcpy(rec.fld_nm, nm_got); should be strcpy(nm_got, rec.fld_nm);

For very large files, you may need
long numRecords,nbegin,nend,nmid;
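
One caveat on this suggestion: with the Microsoft compiler, `long` is 32 bits wide even in an x64 build, so it gains nothing over `int` there. A minimal sketch using a fixed-width type instead follows; `RecordIndex` and `record_offset` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Fixed-width index type: 64 bits on every platform, unlike long.
typedef std::int64_t RecordIndex;

// Byte offset of a record, computed entirely in 64-bit arithmetic
// so that offsets beyond 2 GB do not overflow.
std::int64_t record_offset(RecordIndex index, std::size_t rec_size)
{
    return index * static_cast<std::int64_t>(rec_size);
}
```

For example, record 100000000 of a file with 32-byte records starts at byte 3200000000, which no 32-bit signed type can hold.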
Sorry, if there is a problem with the strcpy line, why does searching the file mentioned above work? Thanks.
Searching the file mentioned above does not work correctly either.
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzwBdUCSIZpiPajxmVV"

Found it!
(From vector record: Éï? 588081)
should be
(From vector record: zzzwBdUCSIZpiPajxmVV 588081)

Also, with undefined behavior, anything at all can happen, including failing randomly, or accidentally appearing fine.
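
As a small illustration of the argument order, here is a sketch of a bounded copy with the destination first, as in `strcpy(dest, src)`. `copy_name` is a hypothetical helper, not something from the posted code; it uses `std::snprintf` so the result is always terminated and never overruns the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Copies src into dest (destination first, like strcpy), truncating if
// needed. std::snprintf always writes a terminating '\0' when
// dest_size is greater than zero.
void copy_name(char* dest, std::size_t dest_size, const char* src)
{
    std::snprintf(dest, dest_size, "%s", src);
}
```

With this helper, the line in the search loop would read `copy_name(nm_got, sizeof(nm_got), rec.fld_nm);`.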
Thanks a lot.

I've made the change below:
// 
//

#pragma warning (disable: 4996) 
#include "stdafx.h"
#include <set>
#include <sys/stat.h>
#include <string>
#include <fstream>
#include <sstream>
#include <atlbase.h>
#include <ctype.h>
#include <process.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include "..\..\include\nameval.h"
#include <iomanip>
#include <Windows.h>
struct stat fs = { 0 };
int ret; //
long numRecords;
nameval binrec;
bool LessComp(const nameval& a1, const nameval& a2)
{
	if (strcmp(a1.fld_nm, a2.fld_nm) < 0) return true;
	if (strcmp(a1.fld_nm, a2.fld_nm) > 0) return false;
	if (a1.fld_val < a2.fld_val) return true;
	return false;
}
int _tmain(int argc, _TCHAR* argv[])
{
	if (argc < 1)
	{
		return ERROR;
	}
	long nbegin = 0;
	long nend = numRecords - 1;
	long nmid;
	unsigned int nstop = 0;
	char nm_got[100];
	unsigned int val_got;
	time_t timev, currtime;
	float sec;
	timev = time(0);
	std::ifstream inputfiles;
	nameval names = { 0 };
		std::ostringstream filename;
		filename << "c:\\dp4\\flout_w.bin";
		std::set<nameval> records;
		std::set<nameval>::iterator iter;
		inputfiles.open(filename.str().c_str(), std::ios::binary | std::ios::in);
		if (!inputfiles.is_open())
			return -3; //
		if (!inputfiles.read((char*)&names, sizeof(nameval)))
			return -4; //
		ret = stat(filename.str().c_str(), &fs);
		if (ret != 0)
			return -4;
		numRecords = (int)(fs.st_size / sizeof(nameval));
		nbegin = 0;
		nend = numRecords - 1;
		nstop = 0;
		char szArgv2[512] = { 0 };
		size_t ncharsConverted = 0;
		wcstombs(szArgv2, argv[1], sizeof(szArgv2));
		while (nbegin <= nend && nstop != -1)
		{
			nmid = (nbegin + nend) / 2;
			nameval rec = { 0 };
			inputfiles.seekg(nmid* sizeof(nameval));
			inputfiles.read((char*)&rec, sizeof(nameval));
			if (strcmp(szArgv2, rec.fld_nm)<0)
			{
				nend = nmid - 1;
				nmid = (nbegin + nend) / 2;
			}
			else
			{
				if (strcmp(szArgv2, rec.fld_nm)>0)
				{
					nbegin = nmid + 1;
					nmid = (nbegin + nend) / 2;
				}
				else
				{
					nstop = -1;
					strcpy(nm_got, rec.fld_nm);
					val_got = rec.fld_val;
				}
			}
		}
		if (nstop == -1)
		{
			std::cout << "\nFound it!\n";
			std::cout << "(From vector record: " << nm_got
				<< ' ' << val_got << ")\n";
		}
		else std::cout << "\nDidn't find it within file -'" << filename.str().c_str() << "'!\n";
	time(&currtime);
	sec = difftime(currtime, timev);
	std::cout << "Search finishes with only " << sec << " seconds";
	system("pause>null");
	return 0;
}

but I still get this:
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzzzOMoXmtyPzuCfX
EJ"

C:\ReadBinaryFile\x64\Release>

when searching the big file. I can share the big file with you, if possible.
"zzzzzOMoXmtyPzuCfX
EJ"
looks like 21 characters.  If it is really contained in the file, it would either overflow  char fld_nm[21]; or be unterminated, either of which would again cause undefined behavior.
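
To make the length concern concrete, here is a minimal sketch of two guards, assuming the 21-byte `fld_nm` field from nameval.h. `kNameSize`, `key_fits`, and `compare_name` are hypothetical names. The comparison uses `std::strncmp` limited to the field size, so it never reads past the field even if a record lacks a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Matches char fld_nm[21] in nameval.h.
constexpr std::size_t kNameSize = 21;

// True if key fits in the field with room for the terminating '\0'.
bool key_fits(const char* key)
{
    return std::strlen(key) < kNameSize;
}

// Compares key with a field that might lack a terminator.
// At most kNameSize characters are examined.
int compare_name(const char* key, const char (&field)[kNameSize])
{
    return std::strncmp(key, field, kNameSize);
}
```

Counted without the line wrap, the key `zzzzzOMoXmtyPzuCfXEJ` is 20 characters long, so it does fit in the field together with its terminator.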

Also, if the file is too big, you are still casting to (int)
Yes, the string definitely exists within the 5GB file. I'm afraid I may not be able to upload it, as it is still 2.8 GB after zipping.

> Also, if the file is too big, you are still casting to (int)

What do you mean by this?
2.8GB/sizeof(nameval) should not overflow a signed 32 bit int.

But a 21 character string existing in a fld_nm[21] would cause undefined behavior
Does your ifstream handle >32 bit streampos?  What is tellg after the seekg?
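
One way to answer the tellg question is to check the position right after each seek. The following is a minimal sketch; `seek_to_record` is a hypothetical helper, and the check only confirms that the stream reports the requested offset, not that the offset lies inside the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>

// Seeks to the start of record `index` and reports whether the stream
// is still good and tellg() matches the requested byte offset.
bool seek_to_record(std::ifstream& in, std::int64_t index, std::size_t rec_size)
{
    const std::streamoff target =
        static_cast<std::streamoff>(index) * static_cast<std::streamoff>(rec_size);
    in.clear();                       // a previously failed read would block seekg
    in.seekg(target, std::ios::beg);
    return in.good() && static_cast<std::streamoff>(in.tellg()) == target;
}
```

If this returns false for a large `index`, the stream position type or the offset arithmetic is the place to look.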
> Does your ifstream handle >32 bit streampos?  What is tellg after the seekg?

It is an x64 project. What do I need to show you so that you can check it? Thanks.
What return value did you get from the program?
I get nothing, like this:
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzzzOMoXmtyPzuCfX
EJ"

C:\ReadBinaryFile\x64\Release>

Did you get a return value from one of your
return ERROR;
return -3;
 return -4;
return 0;
statements?
If not, what was the system return value?
If you check it with
echo $?
or
echo %errorlevel%
that should give you a clue to where your program is failing.
I did not get it.
Are you saying the exit status was 0 ?
Sorry, the question is: when I run it against the big file, it shows no output. What should I adjust in the code?
What was the exit status?
Sorry, I run the exe file to do the search. How do I adjust the code to show the exit status?
echo %errorlevel%
I get

-4
So it looks like it came from one of
            if (!inputfiles.read((char*)&names, sizeof(nameval)))
                  return -4; //
            ret = stat(filename.str().c_str(), &fs);
            if (ret != 0)
                  return -4;
How can I identify the problem?
It would seem that either the read failed, or the stat failed.
How do I correct the code (regarding the -4) so that it works?
If the read failed, you might check the state flags eofbit, failbit, badbit
If the stat failed, you might check errno
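
A minimal sketch of both checks follows. `stream_state` and `stat_errno` are hypothetical helper names; the sketch uses plain `stat`, so it shows how to read the error rather than how to avoid it. Giving the read and the stat different return codes (they both return -4 in the posted code) would also show directly which one failed.

```cpp
#include <cassert>
#include <cerrno>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <sys/stat.h>

// Lists the error flags currently set on a stream, or "good" if none.
std::string stream_state(const std::istream& in)
{
    const std::ios::iostate st = in.rdstate();
    if (st == std::ios::goodbit) return "good";
    std::string s;
    if (st & std::ios::eofbit)  s += "eofbit ";
    if (st & std::ios::failbit) s += "failbit ";
    if (st & std::ios::badbit)  s += "badbit ";
    return s;
}

// Calls stat() and returns 0 on success, or the errno value on failure.
int stat_errno(const char* path)
{
    struct stat info;
    errno = 0;
    return stat(path, &info) == 0 ? 0 : errno;
}
```

The value returned by `stat_errno` can be passed to `std::strerror` to print a readable message.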
Can I have more details on how to check these? Thanks.
stat can fail due to
EOVERFLOW
    path or fd refers to a file whose size, inode number, or number of blocks cannot be represented in, respectively, the types off_t, ino_t, or blkcnt_t. This error can occur when, for example, an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bytes.
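
A plausible cause, not confirmed in this thread, is that plain `stat` reports the file size in a 32-bit field with the Microsoft runtime even in an x64 build, so it fails on a 5GB file; the runtime also offers `_stat64` for that case. As a portable alternative, the sketch below sidesteps `stat` and derives the record count from the stream position at the end of the file. `count_records` is a hypothetical helper, and it assumes the platform's `std::streamoff` is 64 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>

// Returns the number of whole records in the file, or -1 if the file
// cannot be opened, its size cannot be read, or rec_size is zero.
std::int64_t count_records(const char* path, std::size_t rec_size)
{
    if (rec_size == 0) return -1;
    // std::ios::ate opens the file positioned at its end.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in.is_open()) return -1;
    const std::streamoff bytes = in.tellg();
    if (bytes < 0) return -1;
    return static_cast<std::int64_t>(bytes) / static_cast<std::int64_t>(rec_size);
}
```

In the posted program this would replace the `stat` call and the `(int)` cast, for example `numRecords = count_records(filename.str().c_str(), sizeof(nameval));` with `numRecords` declared as a 64-bit integer.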
The project is already an x64 project.
OK. Thanks.