Peter Chan (Hong Kong) asked:

Way to improve it

Hi,
Further to this thread
https://www.experts-exchange.com/questions/28629782/Details-to-do-the-search.html?anchorAnswerId=40686952#a40686952

I have tried having 10 items in the struct below

struct rec_struc
{
      item items[10];
      rec_struc(bool bfill = false);

};

but the program is now rather slow. Is there a way to improve it?
sarabande replied:

you may try with

const unsigned int NUM_FILES = 100;
const unsigned int NUM_RECORDS = 80000;



which produces the same output.

the speed is mostly determined by the size of the std::set containers, so the above should increase speed dramatically.

a second thing you could do is to move the variable-length descriptions to a separate data file. that way you would store less than half of the current data to disk, as sketched below.
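
A minimal sketch of that idea, using hypothetical names (the real layout lives in SaveBinaryFile): keep only a fixed-size offset and length in the main record and append the description text to a second binary file.

#include <cstdint>
#include <fstream>
#include <string>

// hypothetical fixed-size record: instead of storing wchar_t[100] inline,
// store only where the description lives inside descriptions.dat
struct item_fixed
{
    char     name[21];     // as in the original item
    int64_t  number;       // the 64-bit number
    uint64_t desc_offset;  // byte offset into descriptions.dat
    uint32_t desc_len;     // description length in characters
};

// append one description and return its offset (sketch, no error handling);
// descFile is opened once with std::ios::binary and written sequentially
uint64_t append_description(std::ofstream& descFile, const std::wstring& text)
{
    uint64_t offset = static_cast<uint64_t>(descFile.tellp());
    descFile.write(reinterpret_cast<const char*>(text.data()),
                   text.size() * sizeof(wchar_t));
    return offset;
}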

Sara
I would use something like Telerik JustTrace to analyse my code and determine where the maximum time is spent. I'd combine JustTrace with perfmon or some other system performance monitoring tool and look at disk and memory usage.
HooKooDooKu replied:

Each time you create an instance of 'rec_struc', you are creating 10 instances of 'item'.

Do you ALWAYS need 10 instances of 'item'?

If not, you can improve performance by changing your structure from an array of 'item' objects to an array of pointers to 'item' objects.

struct rec_struc
{
      item* pItems[10];
      rec_struc(bool bfill = false);

};

Even better would be to use a dynamic array object.  That way you're not locked into only 10 pointers.  For example, I use MFC, which includes a CPtrArray.  So I might have code that looks like this:

struct rec_struc
{
      CPtrArray Items;
      ~rec_struc();    //Destructor

};

//This destructor makes sure that all 'item' objects placed into the Items array are deleted when the structure goes out of scope
rec_struc::~rec_struc()
{
    for(int i=0; i<Items.GetCount(); i++)
        delete (item*)Items.GetAt(i);    //cast the void* back to item* before deleting
}

...

rec_struc r;
item* pItem;
pItem = new item();
r.Items.Add( pItem );    //The CPtrArray will auto-grow the array as needed

...

pItem = (item*) r.Items.GetAt(i);
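
For completeness, a standard-library sketch of the same idea without MFC (the item fields below are taken from the layout described later in this thread; everything else is an assumption): std::unique_ptr deletes the items automatically, so no hand-written destructor is needed.

#include <memory>
#include <vector>

// minimal stand-in for the item type used in the thread
struct item
{
    char      name[21];
    long long number;
    wchar_t   description[100];
};

struct rec_struc
{
    // grows as needed, like CPtrArray, but owns and frees its items
    std::vector<std::unique_ptr<item>> items;
};

int main()
{
    rec_struc r;
    r.items.push_back(std::make_unique<item>());   // auto-grows like CPtrArray::Add
    item* pItem = r.items[0].get();                // borrow a raw pointer when needed
    (void)pItem;
}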
HooKooDooKu, the structures of the program are used for storing binary data to records of a file, so it is not possible to use pointers or classes/structures which cannot be made persistent by a flat copy.

in the previous thread (see the link in the original post), we used a structure fouritems (which has an array of 4 items, not 10 as here) and stored 80,000,000 records (320 million items) to a binary file. an item consists of a name (char[21]), a 64-bit number, and a description (wchar_t[100]). for name and number there are additional index files which refer to the record numbers of the data file (meaning any item name and item number - which are supposed to be unique - points to a 4-item record of the data file). the index files are sorted and allow a binary search for either name or number in a second program.
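
A minimal sketch of that plain-old-data layout, with hypothetical names; only such flat structs can be written to and read back from the binary files with a single write/read:

#include <cstdint>

// item layout as described: fixed-size name, 64-bit number, fixed-size description
struct item
{
    char    name[21];
    int64_t number;
    wchar_t description[100];
};

// one data-file record holds a fixed array of items (4 in the previous thread)
struct fouritems
{
    item items[4];
};

// index entries map a unique key to a record number in the data file
struct name_index_entry
{
    char     name[21];
    uint32_t record;
};

struct number_index_entry
{
    int64_t  number;
    uint32_t record;
};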

I don't know why the number of items per record should now be increased to 10; my suggestion was to use 1 item per record. but anyhow, the most negative influence on the performance is due to the std::set containers which are used to get the name and number indices sorted.

with the current program (it is a 64-bit program because it needs more than 4 GB of virtual memory) there are 2 std::set containers: one stores 10 million name indices (a 21-byte name + unsigned int record number) and the other 10 million number indices (a 64-bit integer + unsigned int record number). after the std::set containers were filled, they were stored to a temporary file. currently, 8 index files of each kind were built and finally merged into one big index file which contains all 80 million indices of that kind.

my suggestion to improve performance is to decrease the number of entries per temporary index file and increase the number of temporary files to merge, since currently the virtual memory used for the std::set containers is more than what is available, which leads to extreme swapping and a zillion page faults. on the other hand, the merge operation of the index files doesn't need a relevant amount of memory, and the speed of reading 100k records from 100 files is the same as reading 1 million records from 10 files. so, if the speed improves dramatically - as I would expect - it might be worth a try to go to 1000 files of 10,000 records each.
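
A minimal sketch of that chunking idea, with hypothetical names (the real code lives in SaveBinaryFile): fill a std::set up to a fixed limit, flush it - already sorted - to its own temporary index file, clear it and continue, so the set never outgrows physical memory; the sorted temporary files are then merged as before.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <set>

struct name_index_entry
{
    char     name[21];
    uint32_t record;
    bool operator<(const name_index_entry& other) const
    {
        return std::memcmp(name, other.name, sizeof(name)) < 0;
    }
};

// flush one sorted chunk into its own temporary index file
void flush_chunk(const std::set<name_index_entry>& chunk, unsigned fileNo)
{
    char fname[64];
    std::snprintf(fname, sizeof(fname), "name_index_%04u.tmp", fileNo);
    std::ofstream out(fname, std::ios::binary);
    for (const name_index_entry& e : chunk)
        out.write(reinterpret_cast<const char*>(&e), sizeof(e));
}

// usage sketch inside the record loop:
//     chunk.insert(entry);
//     if (chunk.size() == NUM_RECORDS) { flush_chunk(chunk, fileNo++); chunk.clear(); }
// after the loop, flush the remaining entries and merge the sorted files as before.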

Hua, would you please post the latest code of savebinaryfile here in a code window such that other experts have a fair chance to participate?

Sara
Peter Chan (Asker):

Many thanks all.
Sara,
Using these
const unsigned int NUM_FILES = 100;
const unsigned int NUM_RECORDS = 80000;
...



it took 7 hours to generate 100 name files and 100 number files. Can the speed of this be improved? Have a great weekend!
SOLUTION by sarabande (available to members only)
Many thanks Sara.

When running "ReadBinaryFile", do we put

Name=...

with the name and then press Enter to start the search? Is this correct?
yes. to find valid names you may open any of the name index files within visual studio.

Sara
Good day Sara,
I encounter an open error like

Enter your Search Criteria: (CTRL-Z to stop)
Name=PvPHLtNDJAJMeENgbGZ
open error
Enter your Search Criteria: (CTRL-Z to stop)



when running ReadBinaryFile
you may output 'errno' or GetLastError() with the "open error".

error code 2: file or path not found
error code 3: wrong directory
error code 4: too many open files
error code 5: access denied

for 2 and 3 you may output (or debug) the path passed to the open call.
for 4 (less likely) you would have to reduce the number of files opened (or check why files that are no longer in use are not closed).
if the error is 5, it is likely that another program opened the file and either is still running (look into the task manager) or did not close the file properly. if you opened a file in visual studio to look into it, this may also prevent it from being opened in ReadBinaryFile.
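
A minimal sketch of such a check, with hypothetical names (assuming the index files are opened through std::ifstream as in ReadBinaryFile); on most implementations the failed open sets errno underneath the stream, and strerror turns it into a readable message:

#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

// try to open one index file and report why it failed
bool open_index(const char* path, std::ifstream& indexfile)
{
    indexfile.open(path, std::ios::binary);
    if (!indexfile.is_open())
    {
        std::cout << "open error " << errno << " (" << std::strerror(errno)
                  << ") for path " << path << std::endl;
        return false;
    }
    return true;
}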

Sara
Many thanks Sara.
Where do I put "GetLastError()"?
SOLUTION (available to members only)
Many thanks Sara. How do I get the full error message? I now only get this

Enter your Search Criteria: (CTRL-Z to stop)
Name=PvPHLtNDJAJMeENgbGZ
 open error 2

Enter your Search Criteria: (CTRL-Z to stop)



after I've added the following lines:
	if (!indexfile.is_open())
	{
		std::cout << " open error " << GetLastError() << std::endl;
		return false;
	}
	...


ASKER CERTIFIED SOLUTION (available to members only)
Yes, I do use createFilename. As it is now showing the full message, per the change I applied to the code, what should I adjust further?
Good day Sara,
I recently made no changes to ReadBinaryFile but have adjusted the number of files and the number of records within SaveBinaryFile. Many thanks.