Solved

# Help on big oh analysis

Posted on 2013-01-02
Hi,

I am trying to understand big oh notations. Please correct me.

Say there is a program that accesses a file on disk randomly, and say the size of the file is n.
So, the big-O notation for this program is O(n), because it may access the file at any of n positions. However, there is a possibility that the program accesses the data from the cache, but let's ignore this for now.

If the program accesses the file sequentially (one record after the other), the big-O notation is O(1).

Is this right?

Thank you very much.
Question by:zizi21

LVL 84

Expert Comment

O(1) time for a single access.
O(n) time to access all of the file.

Author Comment

Doesn't making random accesses to disk make a difference to the big-O notation?

Author Comment

Thanks ozo. I thought random accesses influence big-O notation?

LVL 84

Expert Comment

It can.
What disk operations does the program require?

Author Comment

Thanks ozo.

The program accesses the data in the file at random positions... But how do I write that in big-O notation?

LVL 84

Assisted Solution

ozo earned 300 total points
You probably wouldn't.

But there could be some situations where it would make a difference.
For example, if n is the size of the file, random access to some record in the middle might be O(1),
whereas if you need to sequentially scan the file from the beginning to get to a record, it might be O(n).
Or, if the actual random access method works more like O(sqrt(n)) because seeking gets slower in a larger file, then randomly seeking every record in the file might be O(n*sqrt(n)),
whereas if going from one record to the next sequential record were O(1), then reading the entire file sequentially might be O(n).
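The two cost models ozo describes can be sketched with a toy calculation. This is an illustration of the asymptotics, not a benchmark; the function names and the per-access unit costs are assumptions made up for the example:

```python
# Toy cost model (illustrative, not a benchmark).
# Assumptions: in an n-record file, each random seek costs sqrt(n)
# time units, while stepping to the next sequential record costs 1.

import math

def sequential_read_cost(n):
    # n records at 1 unit each -> O(n)
    return n * 1

def random_read_cost(n):
    # n records at sqrt(n) units per seek -> O(n * sqrt(n))
    return n * math.sqrt(n)

for n in (1_000, 4_000):
    print(n, sequential_read_cost(n), random_read_cost(n))
```

Under this model, quadrupling n quadruples the sequential cost but multiplies the random-access cost by eight, which is the gap between O(n) and O(n*sqrt(n)).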

Author Comment

Thank you very much for the reply.

Does this mean that if seeking gets slower in large files, it is always sqrt(n)? But why sqrt(n) and not some other number? Seeking would get slower in larger files because the files would not be cached in RAM. But if the file is small, it is cached in RAM and accessing it is fast.

For example, if the size of the file is larger than the available RAM, we can say the file is not cached in RAM. And in this situation, making random accesses is expensive.

LVL 84

Assisted Solution

ozo earned 300 total points
sqrt(n) was just an example. For many practical purposes, you may be able to treat random access as O(1), independent of the size of the file.
But in actuality, there are physical constraints on zettabyte files that would not be a concern with kilobyte files.
At some point, large enough files have to require more physical space, and there's a speed-of-light limit to how fast you can transmit information across that space,
which might imply O(cube root(n)) access time.
Or, ultimately, when your storage is so vast that it would collapse into a black hole if you kept the density constant, you'd be back to O(sqrt(n)).
But I doubt you'd have to be concerned in practice with a theoretically unbounded n,
and I was thinking more in terms of physically moving a read head across a 2-dimensional disk.
You said you were ignoring the possibility of caching, and in principle that should only change the constant factor which O notation ignores, not the asymptotic behaviour which O notation describes. But if you are concerned in particular with behaviour across the range of n where files of some sizes are cached and files of other sizes are not,
then you would have to examine the particular characteristics of your cached and non-cached storage in order to do your big-O analysis.

LVL 32

Accepted Solution

phoffric earned 200 total points
Here is my analysis for a specific scenario.
1. The file is large, taking up almost an entire dedicated disk.
2. The data file is written in contiguous sectors, and when a cylinder is filled, writing continues on the next adjacent cylinder.
3. The random access system allows the OS and disk system to compute the (cylinder, platter, sector within the platter) address of the record.
4. As per the OP, caching is ignored.
5. The drive is a conventional hard disk, not a solid-state disk.

The largest time factor is moving the head to the correct cylinder. Suppose you have n records filling up N cylinders. To access a record, the worst-case head movement occurs when you must move from cylinder 1 to cylinder N. So that access takes N-1 time units (a time unit is the time to move from one cylinder to an adjacent one). Assume, for simplicity, that seek time is linear: travelling twice the distance takes twice as long.

In general, the average seek time is approximately N/2 time units.
If you double the amount of data to 2n records, that will fill 2N cylinders, so now the average seek time is (2N/2) = N.

Conclusion: by doubling the amount of data, the worst-case access time (the head seek time) doubles. So accessing a random record is an O(N) operation, and since n is proportional to N, it is also O(n).

If you read the data sequentially, it will take twice as long to read the entire file if the data doubles, so sequential reading of the whole file is also an O(n) operation.
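The seek-time argument above can be checked with a small simulation (an illustrative sketch; the function names and parameters are made up for this example). It picks random target cylinders and measures the average head travel. For uniformly random start and target cylinders the simulated average comes out near N/3 rather than N/2, but the conclusion is the same: doubling the number of cylinders doubles the average seek distance, so random access under this model is O(N).

```python
# Sketch of the cylinder-seek model (names are illustrative).
# Assumptions: records spread evenly over num_cylinders cylinders;
# each random access moves the head from its current cylinder to the
# target, and seek time is proportional to the distance in cylinders.

import random

def average_seek(num_cylinders, accesses=100_000, seed=1):
    rng = random.Random(seed)
    head = 0
    total = 0
    for _ in range(accesses):
        target = rng.randrange(num_cylinders)
        total += abs(target - head)  # head travel for this access
        head = target
    return total / accesses

small = average_seek(1_000)   # average travel for N = 1000 cylinders
large = average_seek(2_000)   # doubling N roughly doubles the travel
print(small, large, large / small)
```

The ratio printed at the end comes out close to 2, matching the conclusion that seek time, and hence random-access time, grows linearly with the file size.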

Author Comment

Thank you very much for all the replies. I am studying it now.

Author Comment

"If you read the data sequentially, it will take twice as long to read the entire file if the data doubles, so sequential reading is also an O(n) operation. "

Thank you very much for your reply. If I understand this correctly, accessing each record takes O(1) time (for sequential access), and reading the entire file takes O(n).

LVL 84

Expert Comment

Under that scenario, reading 1 record is O(1), and reading n records is O(n).

Author Comment

Thank you very much....

Author Closing Comment

Thank you .

Author Comment

I spent days trying to understand this, and all of you helped lots... Thank you very much.

LVL 32

Expert Comment

For very large files, I was saying that because additional cylinders cause the average seek time for accessing a single record to increase linearly with the number of cylinders, reading one record is O(n), not O(1).

Suppose a file is small, and the additional records are placed in the same cylinder. In that case, reading one record is O(1).
