Peter Chan
asked on
Way to improve it
Hi,
Further to this thread
https://www.experts-exchange.com/questions/28629782/Details-to-do-the-search.html?anchorAnswerId=40686952#a40686952
I tried having 10 items in the struct below
struct rec_struc
{
item items[10];
rec_struc(bool bfill = false);
};
but the speed of the project is rather slow. Is there a way to improve it?
I would use something like Telerik JustTrace to analyse my code and determine where the most time is spent. I'd combine JustTrace with perfmon or some other system performance monitoring tool and look for disk and memory usage.
Each time you create an instance of 'rec_struc', you are creating 10 instances of 'item'.
Do you ALWAYS need 10 instances of 'item'?
If not, you can improve performance by changing your structure from an array of 'item' objects to an array of pointers to 'item' objects.
struct rec_struc
{
item* pItems[10];
rec_struc(bool bfill = false);
};
Even better would be to use a dynamic array object. That way you're not locked into only 10 pointers. For example, I use MFC, which includes a CPtrArray. So I might have code that looks like this:
struct rec_struc
{
CPtrArray Items;
~rec_struc(); //Destructor
};
//This destructor makes sure that all 'item' objects placed into the Items array are deleted when the structure goes out of scope
rec_struc::~rec_struc()
{
    for(int i=0; i<Items.GetCount(); i++)
        delete (item*)Items.GetAt(i); //CPtrArray stores void*, so cast before delete
}
...
rec_struc r;
item* pItem;
pItem = new item();
r.Items.Add( pItem ); //The CPtrArray will auto-grow the array as needed
...
pItem = (item*) r.Items.GetAt(i);
HooKooDooKu, the structures of the program are used for storing binary data to records of a file, so it is not possible to use pointers or classes/structures that cannot be made persistent by a flat copy.
in the previous thread (see the link in the original post), we used a structure fouritems (which has an array of 4 items, not 10 as here) and stored 80,000,000 records (320 million items) to a binary file. an item consists of a name (char[21]), a 64-bit number, and a description (wchar_t[100]). for name and number there are additional index files which refer to the record numbers of the data file (meaning any item name and item number - which are supposed to be unique - points to a 4-item record of the data file). the index files are sorted and allow a binary search in a second program for either name or number.
I don't know why the number of items per record should now be increased to 10; my suggestion was to use 1 item per record. but anyhow, the biggest drag on performance comes from the std::set containers which are used to get the name and number indices sorted. in the current program (a 64-bit program, because it needs more than 4 gb of virtual memory) there are 2 std::set containers: one stores 10 million name indices (a 21-byte name + unsigned int record number) and the other 10 million number indices (a 64-bit integer + unsigned int record number). after the std::set containers were filled, they were stored to a temporary file. currently, 8 index files of each kind are built and finally merged into one big index file containing all 80 million indices of that kind.
my suggestion to improve performance is to decrease the number of entries per temporary index file and increase the number of temporary files to merge, since currently the virtual memory used for the std::set containers exceeds what is available, which leads to extreme swapping and an enormous number of page faults. on the other hand, the merge operation of the index files doesn't need a relevant amount of memory, and reading 100k records from each of 100 files is as fast as reading 1 million records from each of 10 files. so, if the speed improves dramatically - as I would expect - it might be worth a try to go to 1000 files of 10,000 records each.
Hua, would you please post the latest code of savebinaryfile here in a code window such that other experts have a fair chance to participate?
Sara
ASKER
Many thanks all.
Sara,
Using these
const unsigned int NUM_FILES = 100;
const unsigned int NUM_RECORDS = 80000;
...
it took 7 hours to generate 100 name files and 100 number files. Can the speed of this be improved? Have a great weekend!
SOLUTION
ASKER
Many thanks Sara.
When running "ReadBinaryFile", do we put
Name=...
with the name and then press Enter to start the search? Is this correct?
yes. to find valid names you may open any of the name index files within visual studio.
Sara
ASKER
Good day Sara,
I encounter an open error when running ReadBinaryFile:

Enter your Search Criteria: (CTRL-Z to stop)
Name=PvPHLtNDJAJMeENgbGZ
open error
Enter your Search Criteria: (CTRL-Z to stop)
you may output 'errno' or GetLastError() with the "open error".
error code 2: file or path not found
error code 3: wrong directory
error code 4: too many open files
error code 5: access denied
for 2 and 3 you may output (or debug) the path passed to the open call.
for 4 (less likely) you would have to reduce the number of files opened (or find out why files no longer in use are not being closed).
if the error is 5, it is likely that another program opened the files and either is still running (look in task manager) or did not close the file properly. if you opened a file in visual studio to look into it, this may also prevent readbinaryfile from opening it.
Sara
ASKER
Many thanks Sara.
Where to put "GetLastError() "?
SOLUTION
ASKER
Many thanks Sara. How can I get the full error message? I now only get this

Enter your Search Criteria: (CTRL-Z to stop)
Name=PvPHLtNDJAJMeENgbGZ
open error 2
Enter your Search Criteria: (CTRL-Z to stop)

after I've added lines like

if (!indexfile.is_open())
{
    std::cout << " open error " << GetLastError() << std::endl;
    return false;
}
...
ASKER CERTIFIED SOLUTION
ASKER
Yes, I do use createFilename. Since it is now showing the full message after the change I applied to the code, what should I adjust further?
ASKER
Good day Sara,
I recently made no change to ReadBinaryFile but have adjusted the number of files and the number of records within SaveBinaryFile. Many thanks.
which produces the same output.
the speed is mostly determined by the size of the std::set containers, so the above should increase speed dramatically.
a second thing you could do is to move the variable-length descriptions to a separate data file. that way you would store less than half of the current data to disk.
Sara