Solved

Help on big oh analysis

Posted on 2013-01-02
471 Views
Last Modified: 2013-01-03
Hi,

I am trying to understand big oh notations. Please correct me.

Say there is a program that accesses a file on disk randomly, and say the size of the file is n.
So, the big oh notation for this program is O(n), because it may access the file randomly at n positions. There is a possibility that the program accesses the data for the file from the cache, but let's ignore this for now.

If the program accesses the file sequentially (one record after the other), the big oh notation is O(1).

Is this right?

Thank you very much.
Question by:zizi21
16 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 38739130
O(1) time for a single access.
O(n) time to access all of the file.
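For example, a minimal C sketch of the two cases (the file name, record size, and record number are made up for illustration):

#include <stdio.h>

#define RECSIZE 128  /* assumed fixed record size, for illustration */

int main(void)
{
    char rec[RECSIZE];
    FILE *fp = fopen("data.bin", "rb");  /* hypothetical file name */
    if (fp == NULL)
        return 1;

    /* O(1): a single access -- seek straight to record 42 and read it */
    fseek(fp, 42L * RECSIZE, SEEK_SET);
    fread(rec, RECSIZE, 1, fp);

    /* O(n): accessing all of the file -- one read per record, n reads total */
    rewind(fp);
    while (fread(rec, RECSIZE, 1, fp) == 1)
        ;  /* process one record here */

    fclose(fp);
    return 0;
}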
 

Author Comment

by:zizi21
ID: 38739152
Doesn't making random accesses to disk make a difference to the big oh notation?
 

Author Comment

by:zizi21
ID: 38739156
Thanks ozo. I thought random accesses influence big oh notation?
 
LVL 84

Expert Comment

by:ozo
ID: 38739158
It can.  
What disk operations does the program require?
 

Author Comment

by:zizi21
ID: 38739164
Thanks ozo.

The program accesses the data in the file at random positions... But how do I write that in big oh notation?
 
LVL 84

Assisted Solution

by:ozo
ozo earned 300 total points
ID: 38739187
You probably wouldn't.

But there could be some situations where it would make a difference.
For example, if n is the size of the file, random access to some record in the middle might be O(1),
whereas if you need to sequentially scan the file from the beginning to get to a record, it might be O(n).
Or, if the actual random access method works more like O(sqrt(n)) because seeking gets slower in a larger file, then randomly seeking every record in the file might be O(n*sqrt(n)),
whereas if going from one record to the next sequential record was O(1), then reading the entire file sequentially might be O(n).
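A rough C sketch of that first contrast, assuming fixed-size records (the helper names are mine, purely for illustration):

#include <stdio.h>

/* Direct random access: jump straight to record k.
 * One seek + one read, so O(1) per record under the simple model. */
void read_direct(FILE *fp, long k, char *buf, size_t recsize)
{
    fseek(fp, k * (long)recsize, SEEK_SET);
    fread(buf, recsize, 1, fp);
}

/* No random access: scan from the beginning until record k is reached.
 * Up to n reads to fetch one record, so O(n) per record. */
void read_by_scanning(FILE *fp, long k, char *buf, size_t recsize)
{
    long i;
    rewind(fp);
    for (i = 0; i <= k; i++)
        fread(buf, recsize, 1, fp);
}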
 

Author Comment

by:zizi21
ID: 38739202
Thank you very much for the reply.

Does this mean that if seeking gets slower in large files, it is always sqrt(n)? But why sqrt(n) and not any other number? Seeking would get slower in larger files because the files would not be cached in RAM. But if the file is small, it is cached in RAM and accessing it is fast.

For example, if the size of the file is larger than the available RAM, we can say the file is not cached in RAM. And in this situation, making random accesses is expensive.
 
LVL 84

Assisted Solution

by:ozo
ozo earned 300 total points
ID: 38739239
sqrt(n) was just an example.  For many practical purposes, you may be able to treat random access as O(1), independent of the size of the file.
But in actuality, there are physical constraints on zettabyte files that would not be a concern with kilobyte files.
At some point, large enough files have to require more physical space, and there's a speed-of-light limit to how fast you can transmit information across that space.
Since storage capacity scales with volume, the distance a signal has to travel scales with the cube root of that volume, which might imply O(cube root(n)) access time.
Or, ultimately, when your storage is so vast that it would collapse into a black hole if you kept the density constant, you'd be back to O(sqrt(n)).
But I doubt you'd have to be concerned in practice with a theoretically unbounded n, and I was thinking more in terms of physically moving a read head across a 2-dimensional disk.
You said you were ignoring the possibility of caching, and in principle that should only change the constant factor, which O notation ignores, not the asymptotic behaviour, which O notation describes.  But if you are concerned in particular with behaviour across the range of n where files of some sizes are cached and files of other sizes are not, then you would have to examine the particular characteristics of your cached and non-cached storage in order to do your big oh analysis.
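If it helps to see how differently those cost models grow, here is a throwaway C sketch (the constants and the range of n are arbitrary; only the growth rates matter):

#include <stdio.h>
#include <math.h>

/* Toy comparison of the per-access cost models discussed above:
 * O(1), O(cbrt(n)) (speed of light across 3-D storage) and
 * O(sqrt(n)) (2-D disk surface / constant-density limit). */
int main(void)
{
    double n;
    for (n = 1e3; n <= 1e12; n *= 1000.0)
        printf("n = %.0e   O(1) = 1   cbrt(n) = %8.0f   sqrt(n) = %8.0f\n",
               n, cbrt(n), sqrt(n));
    return 0;
}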
 
LVL 32

Accepted Solution

by:phoffric
phoffric earned 200 total points
ID: 38739307
Here is my analysis for a specific scenario.
1. The file is large, taking up almost an entire single dedicated disk.
2. The data file is written in contiguous sectors, and when a cylinder is filled, writing continues on the next adjacent cylinder.
3. The random access system allows the OS and disk system to compute the location (cylinder, platter, and sector within the platter) of the record to access.
4. As per the OP, caching is ignored.
5. The drive is a spinning hard disk, not a solid-state disk.

The largest time factor is moving the head to the correct cylinder. Suppose you have n records filling up N cylinders. To access a record, the worst-case head movement occurs when you have to move the head from cylinder 1 to cylinder N. So that access takes N-1 time units (a time unit is the time to move from one cylinder to an adjacent one). Assume, for simplicity, that seek time is linear: travelling twice the distance takes twice as long.

In general, the average seek time is approximately N/2 time units.
If you double the amount of data to 2n records, that will fill 2N cylinders, so now the average seek time is 2N/2 = N.

Conclusion: by doubling the amount of data, the access time (the head seek time) doubles, on average and in the worst case. Since N is proportional to n, accessing a random record is an O(N) = O(n) operation.

If you read the data sequentially, it will take twice as long to read the entire file if the data doubles, so sequential reading is also an O(n) operation.
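To see that linearity numerically, a throwaway C sketch of the seek model (the cylinder counts are made-up figures):

#include <stdio.h>

/* Toy model of the scenario above: the average seek to a random record
 * is about N/2 cylinder-to-cylinder time units, so doubling the data
 * (and therefore the cylinder count N) doubles the average seek time. */
int main(void)
{
    long N;
    for (N = 1000; N <= 8000; N *= 2)
        printf("N = %4ld cylinders   average seek ~ N/2 = %4ld time units\n",
               N, N / 2);
    return 0;
}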
 

Author Comment

by:zizi21
ID: 38739454
Thank you very much for all the replies. I am studying it now.
 

Author Comment

by:zizi21
ID: 38739515
"If you read the data sequentially, it will take twice as long to read the entire file if the data doubles, so sequential reading is also an O(n) operation. "

Thank you very much for your reply. If I understand this correctly, accessing each record takes O(1) time (for sequential access), and reading the entire file takes O(n).
 
LVL 84

Expert Comment

by:ozo
ID: 38739531
Under that scenario, reading 1 record is O(1), and reading n records is O(n).
 

Author Comment

by:zizi21
ID: 38739543
Thank you very much....
 

Author Closing Comment

by:zizi21
ID: 38739549
Thank you .
 

Author Comment

by:zizi21
ID: 38739550
I spent days trying to understand this, and all of you helped a lot... Thank you very much.
 
LVL 32

Expert Comment

by:phoffric
ID: 38741253
For very large files, I was saying that because additional cylinders cause the average seek time for accessing a single record to increase linearly as the number of cylinders increases, reading one record is O(n), not O(1).

Suppose a file is small, and that additional records are placed in the same cylinder. In that case, reading one record is O(1).