Help on big oh analysis

Hi,

I am trying to understand big oh notation. Please correct me.

Say there is a program that accesses a file on disk randomly, and say the size of the file is n.
So the big oh notation for this program is O(n), because it may access the file randomly at any of n positions. However, there is a possibility that the program reads the data for the file from the cache, but let's ignore this for now.

If the program accesses the file sequentially (one record after the other), the big oh notation is O(1).
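
To make the two access patterns concrete, here is roughly what I have in mind (just a sketch; the file name, record size and record index are made up):

#include <stdio.h>

#define RECORD_SIZE 128   /* made-up record size, just for illustration */

int main(void)
{
    FILE *f = fopen("data.bin", "rb");   /* hypothetical data file */
    if (!f) { perror("fopen"); return 1; }

    char buf[RECORD_SIZE];

    /* Random access: jump straight to an arbitrary record. */
    long record = 12345;                        /* some arbitrary record index */
    fseek(f, record * RECORD_SIZE, SEEK_SET);
    fread(buf, 1, RECORD_SIZE, f);

    /* Sequential access: read the records one after the other. */
    rewind(f);
    while (fread(buf, 1, RECORD_SIZE, f) == RECORD_SIZE) {
        /* process one record */
    }

    fclose(f);
    return 0;
}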

Is this right?

Thank you very much.
zizi21Asked:
 
phoffricCommented:
Here is my analysis for a specific scenario.
1. The file is large, taking up almost an entire dedicated disk.
2. The data file is written in contiguous sectors, and when a cylinder is filled, writing continues on the next adjacent cylinder.
3. The random access system allows the OS and disk system to compute the (cylinder, platter, sector within the platter) needed to access a record.
4. As per the OP, caching is ignored.
5. The storage is a conventional hard disk drive, not a solid state disk.

The largest time factor is moving the head to the correct cylinder. Suppose you have n records filling up N cylinders. To access a record, the worst-case head movement occurs when the head has to travel from cylinder 1 to cylinder N, so that access is N-1 time units (a time unit is the time to move from one cylinder to the next). Assume, for simplicity, that seek time is linear: travelling twice the distance takes twice as long.

In general, the average seek time is approximately N/2.
If you double the amount of data to 2n, that will fill 2N cylinders, so the average seek time is now 2N/2 = N.

Conclusion: by doubling the amount of data, the access time (the head seek time) doubles. Since N is proportional to n, accessing a random record is an O(n) operation.

If you read the data sequentially, it will take twice as long to read the entire file if the data doubles, so sequential reading is also an O(n) operation.
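
To check that numerically, here is a toy simulation of the average head travel under the assumptions above (the cylinder counts and the uniform-random access pattern are made up; the exact constant is not the point, only how it scales when N doubles):

#include <stdio.h>
#include <stdlib.h>

/* Toy model: average head travel for random record accesses, assuming
 * the records fill cylinders 0..N-1 and seek cost is proportional to
 * the number of cylinders crossed. */
static double average_seek(long cylinders, long trials)
{
    long head = 0;
    double total = 0.0;
    for (long i = 0; i < trials; i++) {
        long target = rand() % cylinders;   /* cylinder of the requested record */
        total += labs(target - head);       /* linear seek cost */
        head = target;
    }
    return total / trials;
}

int main(void)
{
    srand(1);
    printf("cylinders N   avg seek (cylinder moves)\n");
    for (long n = 1000; n <= 8000; n *= 2)
        printf("%11ld   %.1f\n", n, average_seek(n, 1000000));
    return 0;
}

Doubling N should roughly double the average, which is the O(n) behaviour described above.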
 
ozoCommented:
O(1) time for a single access.
O(n) time to access all of the file.
 
zizi21Author Commented:
Doesn't making random accesses to disk make a difference to the big oh notation?
 
zizi21Author Commented:
Thanks ozo. I thought random accesses influence the big oh notation?
 
ozoCommented:
It can.  
What disk operations does the program require?
 
zizi21Author Commented:
Thanks ozo.

The program accesses the data in the file at random positions... But how do I write that in big oh notation?
 
ozoCommented:
You probably wouldn't.

But there could be some situations where it would make a difference.
For example, if n is the size of the file, random access to some record in the middle might be O(1),
whereas if you need to sequentially scan the file from the beginning to get to a record, it might be O(n).
Or, if the actual random access method works more like O(sqrt(n)) because seeking gets slower in a larger file, then randomly seeking to every record in the file might be O(n*sqrt(n)),
whereas if going from one record to the next sequential record is O(1), then reading the entire file sequentially might be O(n).
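
To put rough numbers on those example scenarios (a toy cost comparison only; the cost models are just the illustrative ones above, not measurements of any real disk):

#include <stdio.h>
#include <math.h>

/* Total cost of touching all n records under different per-access cost models. */
int main(void)
{
    for (double n = 1e3; n <= 1e9; n *= 1000) {
        double seek_o1   = n;             /* O(1) per random seek          -> O(n) total         */
        double seek_sqrt = n * sqrt(n);   /* O(sqrt(n)) per random seek    -> O(n*sqrt(n)) total  */
        double scan_each = n * n;         /* O(n) scan from start per read -> O(n^2) total        */
        double seq_read  = n;             /* O(1) per sequential step      -> O(n) total          */
        printf("n=%.0e  O(1)-seek=%.1e  sqrt-seek=%.1e  scan-each=%.1e  sequential=%.1e\n",
               n, seek_o1, seek_sqrt, scan_each, seq_read);
    }
    return 0;
}

(Compile with -lm.) The point is only how the totals grow as n grows, not the absolute values.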
 
zizi21Author Commented:
Thank you very much for the reply.

Does this mean that if seeking gets slower in large files, it is always sqrt(n)? But why sqrt(n) and not some other function? Seeking would get slower in larger files because the files would not be cached in RAM. But if the file is small, it is cached in RAM and accessing it is fast.

For example, if the size of the file is larger than the available RAM, we can say the file is not cached in RAM. And in this situation, making random accesses is expensive.
 
ozoCommented:
sqrt(n) was just an example.  For many practical purposes, you may be able to treat random access as O(1), independent of the size of the file.
But in actuality, there are physical constraints on zettabyte files that would not be a concern with kilobyte files.
At some point, large enough files have to require more physical space, and there's a speed of light limit to how fast you can transmit information across that space,
which might imply O(cube root (n)) access time.
Or, ultimately, when your storage is so vast that it would collapse into a black hole if you kept the density constant, you'd be back to O(sqrt(n)).
But I doubt you'd have to be concerned in practice with a theoretically unbounded n,
and I was thinking more in terms of physically moving a read head across a 2 dimensional disk.
You said you were ignoring the possibility of caching, and in principle that should only change the constant factor which O notation ignores, and not the asymptotic behaviour which O notation describes.  But if you are concerned in particular with behaviour across the range of n where files of some sizes are cached and files of other sizes are not,
then you would have to examine the particular characteristics of your cached and non-cached storage in order to do your big oh analysis.
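
If you do want to examine that empirically, one crude approach is to time random reads on files of increasing size and watch where the behaviour changes. This is a rough sketch only: "bigfile.bin", the record size and the counts are made up, and a real measurement would need dropped caches, repetition, and so on.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RECORD_SIZE 4096

/* Average wall-clock time of 'samples' random record reads from 'path'. */
static double time_random_reads(const char *path, long nrecords, int samples)
{
    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); exit(1); }
    char buf[RECORD_SIZE];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < samples; i++) {
        long r = rand() % nrecords;
        fseek(f, r * (long)RECORD_SIZE, SEEK_SET);
        fread(buf, 1, RECORD_SIZE, f);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(f);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs / samples;
}

int main(void)
{
    /* "bigfile.bin" is a hypothetical test file created beforehand;
       nrecords should match its size divided by RECORD_SIZE. */
    printf("avg time per random read: %.6f s\n",
           time_random_reads("bigfile.bin", 1000000, 10000));
    return 0;
}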
 
zizi21Author Commented:
Thank you very much for all the replies. I am studying it now.
 
zizi21Author Commented:
"If you read the data sequentially, it will take twice as long to read the entire file if the data doubles, so sequential reading is also an O(n) operation. "

Thank you very much for your reply. If I understand this correctly, accessing each record of data takes O(1) time (for sequential access) and O(n) for the entire file.
 
ozoCommented:
Under that scenario, reading 1 record is O(1), and reading n records is O(n).
 
zizi21Author Commented:
Thank you very much....
 
zizi21Author Commented:
Thank you .
 
zizi21Author Commented:
I spent days trying to understand this, and all of you helped a lot... Thank you very much.
 
phoffricCommented:
For very large files, I was saying that because additional cylinders cause the average seek time for accessing a single record to increase linearly as the number of cylinders increases, reading one record is O(n), not O(1).

Suppose a file is small, and that additional records are placed in the same cylinder. In that case, reading one record is O(1).