Peter Chan
asked on
Problem to search
Hi,
It is fine to search a file like
https://dl.dropboxusercontent.com/u/40211031/flout_w.bin
using the exe file generated from
//
//
#pragma warning (disable: 4996)
#include "stdafx.h"
#include <set>
#include <sys/stat.h>
#include <string>
#include <fstream>
#include <sstream>
#include <atlbase.h>
#include <ctype.h>
#include <process.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include "..\..\include\nameval.h"
#include <iomanip>
#include <Windows.h>
struct stat fs = { 0 };
int ret; //
int numRecords;
nameval binrec;
bool LessComp(const nameval& a1, const nameval& a2)
{
    if (strcmp(a1.fld_nm, a2.fld_nm) < 0) return true;
    if (strcmp(a1.fld_nm, a2.fld_nm) > 0) return false;
    if (a1.fld_val < a2.fld_val) return true;
    return false;
}
int _tmain(int argc, _TCHAR* argv[])
{
    if (argc < 1)
    {
        return ERROR;
    }
    unsigned int nbegin = 0;
    unsigned int nend = numRecords - 1;
    unsigned int nmid;
    unsigned int nstop = 0;
    char nm_got[100];
    unsigned int val_got;
    time_t timev, currtime;
    float sec;
    timev = time(0);
    std::ifstream inputfiles;
    nameval names = { 0 };
    std::ostringstream filename;
    filename << "c:\\dp4\\flout_w.bin";
    std::set<nameval> records;
    std::set<nameval>::iterator iter;
    inputfiles.open(filename.str().c_str(), std::ios::binary | std::ios::in);
    if (!inputfiles.is_open())
        return -3; //
    if (!inputfiles.read((char*)&names, sizeof(nameval)))
        return -4; //
    ret = stat(filename.str().c_str(), &fs);
    if (ret != 0)
        return -4;
    numRecords = (int)(fs.st_size / sizeof(nameval));
    nbegin = 0;
    nend = numRecords - 1;
    nstop = 0;
    char szArgv2[512] = { 0 };
    size_t ncharsConverted = 0;
    wcstombs(szArgv2, argv[1], sizeof(szArgv2));
    while (nbegin <= nend && nstop != -1)
    {
        nmid = (nbegin + nend) / 2;
        nameval rec = { 0 };
        inputfiles.seekg(nmid* sizeof(nameval));
        inputfiles.read((char*)&rec, sizeof(nameval));
        if (strcmp(szArgv2, rec.fld_nm)<0)
        {
            nend = nmid - 1;
            nmid = (nbegin + nend) / 2;
        }
        else
        {
            if (strcmp(szArgv2, rec.fld_nm)>0)
            {
                nbegin = nmid + 1;
                nmid = (nbegin + nend) / 2;
            }
            else
            {
                nstop = -1;
                strcpy(rec.fld_nm, nm_got);
                val_got = rec.fld_val;
            }
        }
    }
    if (nstop == -1)
    {
        std::cout << "\nFound it!\n";
        std::cout << "(From vector record: " << nm_got
            << ' ' << val_got << ")\n";
    }
    else std::cout << "\nDidn't find it within file -'" << filename.str().c_str() << "'!\n";
    return 0;
}
and here is what I get
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzwBdUCSIZpiPajxmVV"
Found it!
(From vector record: Éï? 588081)
but when I search a big file having the same structure, I get this
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzzzOMoXmtyPzuCfXEJ"
while the string does exist within the file.
Here is the .h file
// nameval.h
#ifndef NAME_VAL_H
#define NAME_VAL_H
struct nameval
{
    char fld_nm[21];
    long long fld_val;
    int get_len() { return (int)min(strlen(fld_nm), sizeof(fld_nm) ) ; }
    void get_uni_nm(wchar_t nm_uni[], int sizfld)
    {
        //mbstowcs_s(nm_uni, fld_nm, min(sizfld-1, strlen(fld_nm)));
        size_t ncharsConverted = 0;
        mbstowcs_s(&ncharsConverted, nm_uni, sizfld, fld_nm, min(sizfld-1, (int)strlen(fld_nm)));
    }
    bool operator< (const nameval & a2) const
    {
        if(strcmp(fld_nm, a2.fld_nm) < 0) return true;
        if(strcmp(fld_nm, a2.fld_nm) > 0) return false;
        if (fld_val < a2.fld_val) return true;
        return false;
    }
};
#endif
Is the file you are searching sorted on fld_nm?
ASKER
Yes, sorted already.
It looks like strcpy(rec.fld_nm, nm_got);
should be strcpy(nm_got, rec.fld_nm);
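For reference, strcpy takes the destination first and the source second. Here is a minimal sketch of the corrected copy, written as a bounded helper. The name copy_name and its signature are my own invention for illustration, not something from the posted program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy a fixed-width name field into another buffer.
// Argument order mirrors strcpy: destination first, source second.
// The copy is bounded by both sizes, so dest is always terminated even
// if the source field was stored without a trailing '\0'.
void copy_name(char* dest, std::size_t dest_size,
               const char* src, std::size_t src_size)
{
    assert(dest != nullptr && src != nullptr && dest_size > 0);
    const std::size_t limit =
        (src_size < dest_size - 1) ? src_size : dest_size - 1;
    std::size_t len = 0;
    while (len < limit && src[len] != '\0')
        ++len;
    std::memcpy(dest, src, len);
    dest[len] = '\0';
}
```

In the posted program the call would look like copy_name(nm_got, sizeof(nm_got), rec.fld_nm, sizeof(rec.fld_nm));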
ASKER
No, it works fine when searching a smaller file. Why does the problem arise with a bigger file?
(From vector record: Éï? 588081)
does not seem to be working fine. Shouldn't it have been
(From vector record: zzzwBdUCSIZpiPajxmVV 588081)
strcpy from an uninitialized variable would give undefined behavior
which may manifest in inconsistent ways.
If a file is large enough for 2*numRecords to exceed the size of an int, or for st_size to exceed the size of off_t, that could cause problems to arise, but it should still report Didn't find it within file -
which I don't see in your output.
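To illustrate the overflow concern: (nbegin + nend) / 2 forms the sum before dividing, and that sum is what can exceed a 32-bit integer. Below is a minimal sketch of the usual workaround, assuming non-negative 64-bit indices. The helper names midpoint and record_count are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Midpoint of [lo, hi] without ever forming lo + hi.
// Assumes 0 <= lo <= hi, as in a binary search over record indices.
std::int64_t midpoint(std::int64_t lo, std::int64_t hi)
{
    assert(0 <= lo && lo <= hi);
    return lo + (hi - lo) / 2;
}

// Number of whole records in a file, kept in 64 bits from start to end
// (no cast to int along the way).
std::int64_t record_count(std::int64_t file_size_bytes, std::int64_t record_size)
{
    assert(record_size > 0);
    return file_size_bytes / record_size;
}
```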
ASKER
strcpy from an uninitialized variable would give undefined behavior
which may manifest in inconsistent ways.
Thanks. What should I adjust in the above code?
If a file is large enough for 2*numRecords to exceed the size of an int, or for st_size to exceed the size of off_t, that could cause problems to arise, but it should still report Didn't find it within file -
which I don't see in your output.
How can I enhance the code to read a big file?
strcpy(rec.fld_nm, nm_got); should be strcpy(nm_got, rec.fld_nm);
For very large files, you may need
long numRecords,nbegin,nend,nmid;
ASKER
Sorry, if there is a problem with the strcpy line, why is it fine to search the file mentioned above? Thanks.
It is not fine to search the file mentioned above.
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzwBdUCSIZpiPajxmVV"
Found it!
(From vector record: Éï? 588081)
should be
(From vector record: zzzwBdUCSIZpiPajxmVV 588081)
Also, with undefined behavior, anything at all can happen, including failing randomly, or accidentally appearing fine.
ASKER
Thanks a lot.
I've done the change below
//
//
#pragma warning (disable: 4996)
#include "stdafx.h"
#include <set>
#include <sys/stat.h>
#include <string>
#include <fstream>
#include <sstream>
#include <atlbase.h>
#include <ctype.h>
#include <process.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include "..\..\include\nameval.h"
#include <iomanip>
#include <Windows.h>
struct stat fs = { 0 };
int ret; //
long numRecords;
nameval binrec;
bool LessComp(const nameval& a1, const nameval& a2)
{
    if (strcmp(a1.fld_nm, a2.fld_nm) < 0) return true;
    if (strcmp(a1.fld_nm, a2.fld_nm) > 0) return false;
    if (a1.fld_val < a2.fld_val) return true;
    return false;
}
int _tmain(int argc, _TCHAR* argv[])
{
    if (argc < 1)
    {
        return ERROR;
    }
    long nbegin = 0;
    long nend = numRecords - 1;
    long nmid;
    unsigned int nstop = 0;
    char nm_got[100];
    unsigned int val_got;
    time_t timev, currtime;
    float sec;
    timev = time(0);
    std::ifstream inputfiles;
    nameval names = { 0 };
    std::ostringstream filename;
    filename << "c:\\dp4\\flout_w.bin";
    std::set<nameval> records;
    std::set<nameval>::iterator iter;
    inputfiles.open(filename.str().c_str(), std::ios::binary | std::ios::in);
    if (!inputfiles.is_open())
        return -3; //
    if (!inputfiles.read((char*)&names, sizeof(nameval)))
        return -4; //
    ret = stat(filename.str().c_str(), &fs);
    if (ret != 0)
        return -4;
    numRecords = (int)(fs.st_size / sizeof(nameval));
    nbegin = 0;
    nend = numRecords - 1;
    nstop = 0;
    char szArgv2[512] = { 0 };
    size_t ncharsConverted = 0;
    wcstombs(szArgv2, argv[1], sizeof(szArgv2));
    while (nbegin <= nend && nstop != -1)
    {
        nmid = (nbegin + nend) / 2;
        nameval rec = { 0 };
        inputfiles.seekg(nmid* sizeof(nameval));
        inputfiles.read((char*)&rec, sizeof(nameval));
        if (strcmp(szArgv2, rec.fld_nm)<0)
        {
            nend = nmid - 1;
            nmid = (nbegin + nend) / 2;
        }
        else
        {
            if (strcmp(szArgv2, rec.fld_nm)>0)
            {
                nbegin = nmid + 1;
                nmid = (nbegin + nend) / 2;
            }
            else
            {
                nstop = -1;
                strcpy(nm_got, rec.fld_nm);
                val_got = rec.fld_val;
            }
        }
    }
    if (nstop == -1)
    {
        std::cout << "\nFound it!\n";
        std::cout << "(From vector record: " << nm_got
            << ' ' << val_got << ")\n";
    }
    else std::cout << "\nDidn't find it within file -'" << filename.str().c_str() << "'!\n";
    time(&currtime);
    sec = difftime(currtime, timev);
    std::cout << "Search finishes with only " << sec << " seconds";
    system("pause>null");
    return 0;
}
but I still get this
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzzzOMoXmtyPzuCfX
EJ"
C:\ReadBinaryFile\x64\Release>
when searching against the big file. I can also show you the big file, if possible.
"zzzzzOMoXmtyPzuCfX
EJ"
looks like 21 characters. If it is really contained in the file, it would either overflow char fld_nm[21]; or be unterminated, either of which would again cause undefined behavior.
Also, if the file is too big, you are still casting to (int)
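On the unterminated case: if a name fills all 21 bytes of fld_nm there is no room left for the terminating '\0', and strcmp would read past the array. Below is a minimal sketch of a bounded comparison, assuming the 21-byte field from nameval.h. The constant kNameWidth and the helper compare_name are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t kNameWidth = 21;  // sizeof(nameval::fld_nm) in the question

// Compare a NUL-terminated search key with a fixed-width field.
// strncmp stops at the first '\0' or after kNameWidth characters,
// whichever comes first, so it never reads beyond the field.
int compare_name(const char* key, const char (&field)[kNameWidth])
{
    return std::strncmp(key, field, kNameWidth);
}
```

Note that a key longer than 21 characters compares equal to a field holding its first 21 characters, so the key length may need a separate check.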
ASKER
Yes, the string definitely does exist within the 5GB file. I'm afraid I may not be able to upload it, as it is still 2.8 GB after zipping.
Also, if the file is too big, you are still casting to (int)
What do you mean by this?
2.8GB/sizeof(nameval) should not overflow a signed 32 bit int.
But a 21 character string existing in a fld_nm[21] would cause undefined behavior
Does your ifstream handle >32 bit streampos? What is tellg after the seekg?
ASKER
Does your ifstream handle >32 bit streampos? What is tellg after the seekg?
It is an x64 project. What do I need to show you to check it? Thanks.
What return value did you get from the program?
ASKER
I get nothing like
C:\ReadBinaryFile\x64\Release>ReadBinaryFile "zzzzzOMoXmtyPzuCfX
EJ"
C:\ReadBinaryFile\x64\Release>
Did you get a return value from one of your
return ERROR;
return -3;
return -4;
return 0;
statements?
If not, what was the system return value?
If you check it with
echo $?
or
echo %errorlevel%
that should give you a clue to where your program is failing.
ASKER
I did not get it.
Are you saying the exit status was 0 ?
ASKER
Sorry, the question is: when running it against the big file, it does not show any output. What should I adjust in the code?
What was the exit status?
ASKER
Sorry, I run the exe file to do the search. How do I adjust the code to show the exit status?
echo %errorlevel%
ASKER
I get
-4
So it looks like it came from one of
if (!inputfiles.read((char*)&names, sizeof(nameval)))
return -4; //
ret = stat(filename.str().c_str(), &fs);
if (ret != 0)
return -4;
ASKER
How to identify the problem?
It would seem that either the read failed, or the stat failed.
ASKER
How do I correct the code (regarding -4) to make sure it works?
If the read failed, you might check the state flags eofbit, failbit, badbit
If the stat failed, you might check errno
ASKER
Can I have more details on how to check these? Thanks.
stat can fail due to
EOVERFLOW
path or fd refers to a file whose size, inode number, or number of blocks cannot be represented in, respectively, the types off_t, ino_t, or blkcnt_t. This error can occur when, for example, an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bytes.
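One way to sidestep a 32-bit st_size is to skip stat and ask the stream for the file length instead. Here is a minimal sketch using only standard iostreams. file_size_bytes is a hypothetical helper, and whether tellg can report offsets beyond 2 GB depends on the standard library, so treat that as an assumption to verify on your toolchain.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>

// Return the file length in bytes, or -1 if it cannot be determined.
std::int64_t file_size_bytes(const char* path)
{
    assert(path != nullptr);
    // std::ios::ate positions the stream at the end right after opening.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in.is_open())
        return -1;
    const std::streampos end = in.tellg();
    if (end == std::streampos(-1))
        return -1;
    return static_cast<std::int64_t>(static_cast<std::streamoff>(end));
}
```

The record count would then be file_size_bytes(path) / sizeof(nameval), kept in a 64-bit variable. With the Microsoft runtime, the _stat64 variant of stat is another option, since its st_size field is 64 bits wide.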
ASKER
The project is already an x64 project.
ASKER
OK. Thanks.