?
Solved

Minimizing the Number of Disc Accesses

Posted on 2003-03-07
17
Medium Priority
?
239 Views
Last Modified: 2010-04-15
Hello,

I am facing a quandary of late of no easy solution, or so it would appear, and was wandering whether anybody could assist me. I wish to know how to query an operating system or some other unholy thing as to a computer's page size. To make sure we all mean the same by saying "page", I took the liberty of quoting the following from Introduction to Algorithms (heavier than the bible, twice as useful) by Thomas H. Cormen, Charles E. Leiserson and Ronald L. Rivest, page 382:

"The read/write arm can position the head at different distances from the center of the disk. When the head is stationary, the surface that passes underneath it is called a _track_. The information stored on each track is often divided into a fixed number of equal sized _pages_; for a typical disk, a page might be 2048 bits in length. The basic unit of information storage and retrieval is usually a page of information - that is, disk reads and write are usually of entire pages."

Some applications, such as B-Trees, which wish to minimize the number of disk accesses, use page sized nodes. My question actually consists of 3 sub questions:
a) How am I to query whatever blasphemous thing governs my computer as to my computer's page size? (Using C, obviously.)
b) Is there a better way of working with pages than the ubiquitous fread and fwrite?
c) Could I, perchance, have the computer allot me an entire disk track ("It's mine, all mine!") so as to minimize disk entries, or is there a way to transcend matter altogether?
d) Why are you even bothering to read my fourth sub question when I specifically told you that there were only three? I’m greatly offended.

My sincerest thanks,

Itai
0
Comment
Question by:Itaifi
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 3
  • +1
17 Comments
 

Author Comment

by:Itaifi
ID: 8092222
It has just occurred to me that the following detail might be of interest to whoever chooses to answer this question: I (am frightfully sorry) use (oh dear) Windows (please forgive me). I have been able to find functions that get the page size for both Linux and Mac. Gosh, isn't that surprising.
0
 
LVL 6

Accepted Solution

by:
gj62 earned 2000 total points
ID: 8092553
Well, just how much work, and how hardware specific, do you want to go?  I will assume this is a real-world question, and not a rhetorical one.

I can tell you that we wrote (about 7 years ago) routines to access over 1TB of information through Windows - and we played around with darn near everything to get it as fast as possible.  

Here's what I can tell you...

Use memory-mapped files:  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_manamemo.asp  

give you a good background...

This will get you 90% of the gain, for 10% of the effort.  BTW, MSFT currently uses 4K pages...

Beyond that, be prepared to write in assembly and create your own disk I/O subsystem if you really want to squeeze the last nth of performance out of your system.  By far, you get more benefit out of reading as much as you can into memory and managing it there, rather than trying to read the exact right amount.

Another thing is that no matter what function you use in C, the amount of data read off the disk is up to the OS.  That's right.  Even if you do an fread of 2048, the OS may choose to read 4K, or even more.  I know of no easy way around that other than to skip the OS and go direct to the disk.

So even though Linux and Mac (though I am not as sure on the Mac), may give you page-size functions (never looked, actually), I'm pretty sure when you do a read they are going to read whatever the OS says is what they want - and that is NOT disk specific.

So what would I recommend? Again, not knowing your application, all of this is just off the cuff, but I would devise your disk access to work in powers of 2, starting at 8K pages.  NOT that your structures are that size, etc, just that when you read data off the disk, that's the multiple you use.  You can then tune your application by increasing the multiplier for 16, 32 or even 64K at a time.  If you've used memory mapping for your files, you will also see a significant gain.

Hope this helps...
0
 
LVL 7

Expert Comment

by:billious
ID: 8094647
a) Depends on platform. Sector sizes range from 128 bytes (CP/M) to 64Kbytes (IIRC on Microsoft systems). On a Microsoft system, get it from int 21H/function 1Ch - it appears in CX on return. The standard sector size for NT is 4K, as already stated.

In essence, use RAM (as already stated) or ignore it - it's an old concept, and outdated by modern drives' hardware read-ahead buffering.

OTOH, a 2**n size for your pages is heaps better than any other size...

b) read/write in any size of 2**n...probably the bigger the better. Of course, there's a trade-off between large I/O sizes and number of times you want to write - especially if you're committing updates. Perhaps a big read and a small write could be better in theory, but the drive's buffering will probably over-ride any theoretical gains.

c) Old, Old, Old concept supported on OS/360 and systems of that vintage, but not supported on modern systems. Sure, contiguous data is nice - just create a really big file and defrag. All is OS-dependent!

d) Take offence if it suits you. C programmers start counting at zero, you realise.

...Bill


0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:Itaifi
ID: 8094981
0. Thanks a million to the two of you. The assembly function seems promising enough, although my assembly references speaks of FAT cluster size, while I, using Windows XP - did I mention I was sorry? - use NTFS. I'm not sure whether that would make a difference. Also, I really should read the MSDN reference thoroughly. I'm not sure whether it'd be proper Expert etiquette, but It's getting late around here, so I think I'll go over both these options tomorrow and only then select one as an answer. I do hope that's allowed. Will you exit() me if is isn't?
1. As for my application, a very basic database - unless the Lord God reveals himself/herself/itself/a talking carrot to me and says differently, no SQL and nothing fancy - only worthwhile mentioning because of its size, which I estimate to be approximately 1 GB (perhaps more, hopefully less. Maybe by some feat of rand() luck my computer will start using qubits, although I have an inkling that long beforehand I will find my computer filled with dead cats). !(A TB of information, mind you.)

Once again, my sincerest thanks.
0
 
LVL 6

Expert Comment

by:gj62
ID: 8095291
1) Take your time - ask more questions in this thread too if you need more info.
2) The FASTEST data access is using fixed-length fields and records (remember xBase/dBase/FoxPro?)- of course, that is not the most efficient for storage.  Depending upon what you are doing, fixed length fields coupled with bitmap indicies can make for blazing query and extraction (but terrible xaction processing...).  On almost .75TB and 200 million rows by 100 fields of ranged data, we could query on 8 to 10 fields in less than 1 second... Not bad on $40K of hardware 5 years ago...
0
 

Author Comment

by:Itaifi
ID: 8122299
Well, just in case someone reviews this question later on - I got to this site after seeing numerous references on Google, what with the PAQ file being thus accessible, and can only assume that, theoretically, someone might tread upon this question in the future, poor sod - I decided to accept gj62's answer because he/she - in German you can apparently use "it", which I think is awfully convenient, as well as expresses my personal feeling that all humans save myself are merely talking plants - gave me the answer I _needed_. Billious gave me the answer I _asked_ for, and the one you need in case you stumbled upon this question. The answer I _wanted_, mind you, is: "I don't know anything about this computer thingy, but I hold you to be my one and only Lord and Sovereign. Incidentally, I'm an incredibly good looking human female." <sigh>
0
 
LVL 6

Expert Comment

by:gj62
ID: 8122441
I have been castigated for offhandedly referring to a poster as "he", but given my limited typing skills, and even more limited ability to reword sentences without using such a pronoun, I "takes my chances" <grin>.

He/She/It is too cumbersome, and the "it" portion will undoubtedly offend someone.  "Est" I believe is the German neuter pronous, but while I think it is OK to refer to someone's pet as a neuter, you get into problems with people (and it is just too darn confusing for other nouns - I seem to remember that knife, fork and spoon and male, female and neuter nouns - now how do you remember that!)...

Maybe I will settle on the "royal" 'they'.  Sounds a bit removed, but alas, I see no other easy route...



0
 

Author Comment

by:Itaifi
ID: 8122729
This discussion is getting linguistic. Argh. Anyway, English does have its advantages. Very much like Arabic, French, possibly German and many other languages, Hebrew - my native language - has different adjectives and nouns for males and females. A male teacher, for instance, is a "Moré", whereas a female is a "Mora". This means that expressing one self in Hebrew while remaining both politically correct and coherent (before you jump to conclusions, let me add that English got around this by leaving out the coherency part) is an art form mastered by few and understood by fewer. The only logical conclusion to be derived from this is that the best way to express one self in Hebrew is by remaining perfectly silent.
0
 

Expert Comment

by:SpideyMod
ID: 8122885
A request for a split has been made at:
http://www.experts-exchange.com/Community_Support/Q_20548415.html

However, we typically do not redice the amount of points awarded to an expert after they have already been awarded unless the expert is OK with it. Note: All splits should be requested before accepting an answer

Points splits take a lot of work and we require the questioner to do it (unless the questioner has left the site or is not responding).

The request is to let billious have 444 points.

If you would like billious to have 444 points, they will need to come out of your pocket.

Here's how to do it:

1) post a new question in the same topic area as this one (http://www.experts-exchange.com/Programming/Programming_Languages/C/).

Title it "points for billious re: Q_20543085"
Assign a value of 444 points

Grab the URL for the new question (make sure it has the question number in it).

Return back to this question and create a comment that says:
billious, points for you at: http://www.experts-exchange.com/Programming/Programming_Languages/C/00000000.html (of course this will be the actual link where you put the question)

There's where we are.

Let me know if you need further help.

SpideyMod
Community Support Moderator @Experts Exchange

0
 
LVL 6

Expert Comment

by:gj62
ID: 8123009
Sure, I will share, but why does Itaifi want to send more 444 points - I thought it was worth 500?  A split would be 250 right?  Besides, that person is confused - above they claim to be girl, but in the post to split points, refer to themself as a "guy".

<grin>...

billious - 250 each way sound OK to you?
0
 

Expert Comment

by:SpideyMod
ID: 8123082
I'm just as confused.  I don't know if an additional 444 is desired to be spent, but I do thank you for your willingness to share the points.  We'll wait to see what the questioner wants.  I can do it either way now that we have your OK.  That's what I like to see...experts behaving like experts (I do sometimes get to see the uglier side of things as you can imagine)

SpideyMod
Community Support Moderator @Experts Exchange

To recap: 2 choices here:
1) spend another 444 points (which I don't believe is the best thing at this point)
2) Reduce the points down to 250 for gj62, create a 250 "points for billious" question as described above

0
 
LVL 7

Expert Comment

by:billious
ID: 8126168
gj62:

Itaifi said

The answer I _wanted_, mind you, is: "I don't know anything about this computer thingy, but I hold you to be my one and only Lord and Sovereign. Incidentally, I'm an incredibly good looking human female." <sigh>

iow, Itaifi would have preferred  a reply from

an incredibly good looking human female

in the form

"I don't know anything about this computer thingy, but I hold you to be my one and only Lord and Sovereign"

so, I conclude that Itaifi, expecting to be described as "only Lord" is male, or if not prefers to be regarded as such.

Be warned, Itaifi - beauty fades. Well, physical beauty at any rate.

I've got to agree with the general sentiment that PC practices obstruct effective communication. I don't bother with them. 'He' is the default personal pronoun in English. If you don't like it, learn Chinese, or whatever takes your fancy.

So, as far as I can see, Spidey wants to split the points, and gj62 wants me to bet 250 points each way - but on what animal, at what track, and when?

(Spidey's suggestion OK with me!)

...Bill
0
 

Author Comment

by:Itaifi
ID: 8129485
<Generally confused, not quite following, absentmindedly eating a banana>

All right! Everybody put down your weapons and hold still. Good. Gj62, thank you very much, off you go. SpideyMod, hold this banana for me. Billious, I'll post a new question in this area and title it "points for billious re: Q_20543085", if only because "points for those who fancy them" may be ambiguous. It’s not like I need the points, and this is all part of my secret plan to take over the world. Now, where's my banana?
0
 

Author Comment

by:Itaifi
ID: 8129493
P.S. I have never claimed to be a girl. I have claimed to be a penguin of indeterminate sex, but nobody seemed to believe me, even after seeing the beak.
0
 

Expert Comment

by:SpideyMod
ID: 8129534
Do the lies never end?  Penguins don't eat bananas,you can't fool me, so I've taken the liberty of finishing it off for you. LOL


BTW, billious, your points are here:
http://www.experts-exchange.com/Programming/Programming_Languages/C/Q_20549635.html

SpideyMod
Community Support Moderator @Experts Exchange
0
 
LVL 6

Expert Comment

by:gj62
ID: 8129539
Itaifi, I sit corrected - my reading is almost as poor as my typing.

Thank you *sir*...

0
 
LVL 7

Expert Comment

by:billious
ID: 8132659
Itaifi,
  now you're happy that someone's holding your banana, but perplexed that no-one can determine the sex of your penguin?

It gets dark at times, so I'd suggest it's unsafe to make such a determination on the basis of the size of its beak.

Bill
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to use…
The goal of this video is to provide viewers with basic examples to understand and use pointers in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use conditional statements in the C programming language.
Suggested Courses
Course of the Month11 days, left to enroll

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question