Unreliable OS/2 Booting in Dual-Boot Setup

When we install OS/2 on a Pentium 166 MHz or faster machine for dual-boot operation (DOS and OS/2), the machine hangs after we select the OS/2 option from the Boot Manager menu. This happens randomly; often the system boots smoothly to OS/2. When the problem occurs, the screen is blank with the cursor blinking in the top-left corner. The usual white square followed by the text "OS/2" does not appear.

However, the DOS option from the Boot Manager menu works 100% of the time.

We tried various timeouts for the boot menu selection using OS/2 FDISK (including a value of 0, meaning always boot directly to OS/2). When this problem occurs, we have to reset the PC and try again.

We have OS/2 Warp 3.0 with FixPack 17.
a012223a Commented:
We have seen this problem time and again. The machine gets somewhere between the BIOS configuration and the start of the OS configuration, then hangs with a blinking cursor in the top-left corner.
Do the following two things:

Get the latest BIOS revision and load it.

Get your revision levels for the OS up to 36.
If you don't get the "white square", OS/2 has not started to boot,
possibly due to I/O errors on the disk.
If so, I hope you have a good backup of your files.
Tim Holman Commented:
  Is this after a cold boot, a warm boot, a reboot from DOS, or something else?
  Are you sure you're not getting a little white box?
  Do you have an extra video card in your system? If so, what result do you get on the VGA port?
  It sounds like this error happens well before OS/2 even starts to load, so maybe there is a problem with your partitioning?
> Get your revision levels for the OS up to 36.

Easy to say, but hard to do, especially when the
operating system cannot be booted (to allow downloading
Fix Pack #36 for OS/2 Warp 3),
and probably not the correct "solution"
(changing software to fix a hardware problem?).

P.S. Why '36'?

Fix Pack #38 is the most recent.

Tim Holman Commented:
  Does this happen on several machines, all with the same specification?
  OS/2 interrogates hardware far more vigorously than DOS ever could, and it will pick up system quirks and hang if the hardware platform is not fully supported.
  Warp 4 offers far broader hardware support.
  It's worth checking whether your PC is on the compatibility list for Warp 3.0.

If this is the same problem we see, the system will boot correctly only intermittently (as described in the original question). The solution I gave works on the 10,000+ workstations and 2,000+ servers we administer at work every day.

Additionally, the reason I chose 36 rather than 38 is that 36 is what we have verified to upgrade correctly without adding several patches. 37 and 38 are fix packs that we have not verified, and we consider them volatile until we do. Just because IBM says they work doesn't make it so.

Hardware often will not work correctly if the software is outdated. This includes the system board, since the BIOS is software, as well as the OS, which determines how the hardware will be used.
> the reason I chose 36 rather than 38 is because 36 is
> what we have verified to upgrade correctly without
> adding several patches. 37 & 38 are fixpacks that we
> have not verified and we consider volatile until we do.

Note that Fix Pack #36 is no longer available from IBM's FTP server,
so FP#37 and FP#38 are the only choices available to 'JSPL'.

However, if the computer won't boot, how can *any* Fix Pack
be applied to a "non-functional" system?

OTTA: IBM FTP sites are not the only sources of fix packs. Several engineers, including me, receive TechConnect CDs from IBM containing these fix packs and other patches.

I will once again refer you to the original question and my previous response. Additionally, I will state this so there's no doubt in your mind. This person is saying that sometimes the P.C. will hang booting up. Other times the P.C. boots as expected. Therefore, one would boot the P.C. and reboot the P.C. until it does not hang and boots up normally, and then apply the fix pack. Alternatively, we have fixed this problem using multiple CID diskettes, downloaded to the user's on-site server, created by the user at their server, and placed in the P.C. that hangs. The first and second diskettes usually set up the P.C. to operate on the LAN. The third diskette usually finishes this configuration and calls a .cmd (REXX) file on the CID server, which performs the fix pack upgrade, asks the user to remove the diskette from the floppy drive, and reboots the P.C. Of course, we have already tested the diskettes in the lab to ensure that they perform as expected.
jspl (Author) Commented:
Thanks to all of you for suggesting solutions to my problem. I am currently evaluating these
suggestions as follows:
1. Trying the same hard disk on a different machine with a different BIOS and display card.
2. Trying a different hard disk with OS/2 on the machine where this problem currently occurs.
3. Applying FixPack 36 (I already have it but had not tried it because I did not know how stable it is).

This problem occurs regardless of whether it is a cold boot, a warm boot, or a boot following a previous boot from the DOS option.

Let me clarify that this problem occurs about 5-10% of the time on this machine. So I can certainly install FixPack 36, which has now proven to be stable and free of major bugs.
The issue is that we run an application (from the Startup folder) that must run continuously, 24 hours a day, 7 days a week. If there is a power failure, the machine is expected to boot to OS/2 when power returns. The PC will be kept at unattended stations. Hence I need a solution that guarantees 100% successful booting to OS/2, which is not happening at present!

I have one more question about this problem:

What is the boot sequence from Boot Manager to the loaded OS/2 desktop?
Which executable (OS2LDR, OS2KRNL, etc.) displays the WHITE BOX followed by "OS/2" in the top-left corner?
I have verified that this white box does not appear and the cursor simply blinks
when there is a boot problem. The PC stays hung at this blank screen!

Tim Holman Commented:
  The boot process!
  If only I had a machine near me... it should all be in one of the .INF files in OS2\BOOK, but it's in Warp Unleashed, which is at home somewhere.
  The little white box comes from OS2LDR, a hidden file in the root directory.
  I was thinking maybe it is a problem with ABIOS?
  Do you have a separate BIOS partition, or do you have ABIOS files on the system?
  That's the only other thing I can think of that loads before OS2LDR but after Boot Manager.
  I'll look in my manuals tonight.

> What is the booting process from boot manager to OS/2 desktop loaded ?

When you see the "white-box" at "top-left",
press ALT+F2, and watch the "details" of the boot-process.

> I will state this so there's no doubt in your mind.
> This person is saying that sometimes the P.C. will hang
> booting up. Other times the P.C. boots as expected.
> Therefor, one would boot the P.C. and reboot the P.C.
> until it does not hang and boots up normally and apply the fixpack.

Let me state this, in small words, so that you "get" it.
If the PC does not boot, then applying a Fix Pack will not help.
There's something wrong with the hardware,
and no changes you make in the "software" area
can possibly fix a "hardware" problem.

Instead, look elsewhere, i.e., configure Boot Manager
to wait a few more seconds before auto-booting OS/2,
so that the disk can "spin-up" to full-speed.
Try the disk in some other computer,
to see if there are hardware-problems with the disk.

A heavy "exercise" of the disk, by applying the Fix Pack,
may only aggravate any hardware-problem,
and may cause the disk to *never* boot again.

Backup, backup, backup!

OTTA: I will not debate the issue any further than this. Empirical evidence (meaning evidence that cannot be disputed): this (meaning a hang a small percentage of the time at bootup) happens at work often, especially when a user loads an application on their P.C. that is not standard for the company I work for. When this happens, we do two things: flash the BIOS with the latest tested and accepted upgrade, and upgrade the OS with the latest tested and accepted fix pack. We still have 2.x in the field with the original BIOS, so we have to cover all aspects of the problem. A small percentage of the time (about 0.5%, and this is a rough estimate), the BIOS upgrade will crap out the board. Given that percentage, we feel safe having the upgrade performed before requiring a hardware upgrade. In addition, on a few occasions (noticeably less than 0.5%) we can correct an OS crap-out by having the user create diskettes from the server and boot the P.C. in question into an automatically loading program that uses REXX as the autoload language. The P.C. then gets its BIOS upgrade and OS upgrade.

On occasion we do have the processor board replaced. Since upgrades cost nothing and processor boards do cost money, we use the cheapest method first.

Additionally, we use much the same diskette boot sequence to upgrade servers and P.C.s for Y2K, and testing has shown no ill effects.

Bottom line: this corrective action has worked for us for years. The problem is related to the BIOS and OS/2 setup. I know it works, and I will not debate the issue any further. The person who asked the question will evaluate my answer and determine whether it worked for them.

Admittedly, we use only IBM hardware, and the effect may be different on hardware manufactured in another plant or under a different brand name, but this user is looking for any possible way to fix the problem. I supplied the answer that worked for us. Debate on whether it will work for this user is closed. The user will have the last word.
> Debate on whether it will work for this user is closed.

Who appointed you as g*d of this forum?  
jspl (Author) Commented:
I loaded FixPack 36 on my development machine. The hang did not appear in about 25 subsequent trials. However, our actual application did not run properly with FixPack 36, so for now I have reverted to FixPack 17 to make sure the application runs again. I will need some time to find out why my application does not run properly with FixPack 36 (most likely a file-reading problem). After debugging that, I will run rigorous tests to conclude whether FixPack 36 solves my problem.
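As a rough check on how much 25 clean boots can prove, here is a minimal Python sketch. It assumes (this is not established in the thread) that each boot hangs independently at a fixed rate, taking the 5-10% figure reported for the affected machine. The function names `p_all_clean` and `trials_needed` are hypothetical, written only for this illustration.

```python
import math

# Assumption: each boot hangs independently with a fixed probability,
# using the 5-10% hang rate reported for the affected machine.


def p_all_clean(hang_rate: float, trials: int) -> float:
    """Probability that `trials` independent boots all succeed
    when each boot hangs with probability `hang_rate`."""
    return (1.0 - hang_rate) ** trials


def trials_needed(hang_rate: float, confidence: float = 0.95) -> int:
    """Smallest run of consecutive clean boots that would occur with
    probability at most (1 - confidence) if the hang rate were unchanged."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        print(f"hang rate {rate:.0%}: "
              f"P(25 clean boots anyway) = {p_all_clean(rate, 25):.3f}, "
              f"clean boots needed = {trials_needed(rate)}")
```

Under those assumptions, 25 clean boots would still happen by chance about 28% of the time at a 5% hang rate (about 7% at 10%), so ruling the hang out at 95% confidence would take roughly 59 (or 29) consecutive clean boots. This only sizes the test; it says nothing about the cause.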

Dear OTTA, your suggestion of INCREASING the Boot Manager timeout needs to be tested as well (unfortunately, I had tried reducing it from 30 seconds to 0 seconds). In fact, I will shortly be visiting our client, who owns the machines where this problem occurs.
I have only one machine in the company where the same hang occurs. So, with the bagful of solutions you have suggested, I will visit the client and make a comparative study of all of them.

Thanks again all of you !
Take a screwdriver with you, and open-up the computer's case,
and make sure that all the data-cables are tightly inserted,
and all the auxiliary-boards are properly seated into the
slots on the motherboard.
OTTA: No one appointed me G*D of this or any other forum. I do, however, retain the right to disengage from debate of any kind or description at any point I choose, especially when one ignores parts of the question or responses and attempts to engage me in some sort of "I'm the prima donna" crap. I am not the prima donna and don't want to be. I was not aware that this would qualify one as G*D. Additionally, I was answering the question with what I knew worked in the past and works presently for us; that's not conjecture, it's fact. I did not attack your answers or anyone else's. As a matter of fact, I have previously seen, and also see in your comments on this problem, a logical thought process in attempting to determine the root of a problem. I, however, did not need to determine the root of the problem, as I had seen this before and know what we do to fix it. I did not and do not attempt to attack people who disagree with my thought process. I am not going to verbally spar with people over the answer to a problem. I barely have time to get on the forum and look around, much less waste my time with people who act like they are the only ones who can correctly answer a problem. If you want to be the prima donna, please do so, but please don't try to beat any more dead horses with me. I will respond to some questions, but when it gets ridiculous, I will not dignify the ridiculous with an answer any more.
> I am not going to verbally spar with people over the answer to a problem.

Curious!  You say one thing, but then do the opposite.

Anyway, JSPL has some ideas to try, which is the point of posting his/her question,
and it is he/she who needs to find the "best" answer for his/her circumstances.
Tim Holman Commented:
All this bickering!
One would think you didn't have any other problems to fix.
If either of you really is that bored, flick through the New Users queue or something!

Point taken, and I am now going to do what I said I was going to do some time ago: not discuss this issue any more.
JSPL: Since you have a non-production unit to test with, try uninstalling the application, installing the fix pack, and then reinstalling the application. We have found this to work on about 15% of problems of that type. If the uninstall and reinstall are generally easy, it might be worth it. If they would be time-consuming, I would make this a last-resort option.
I would *NOT* try "uninstalling" any applications,
because I suspect that _WRITING_ to the hard-drive
may only _CREATE_ more problems.

Instead, since the "white-box" is not appearing,
I would take the hard-drive to another computer,
and copy all the files to another hard-drive,
and then install the "copy" of the hard-drive
into the original computer, updating the BIOS-settings,
if the new hard-drive is a different manufacturer or size
than the original hard-drive.

jspl (Author) Commented:
Dear Everybody

I installed FixPack 36 on one of my machines, which showed this problem very rarely. I also fixed the bug that prevented my application from running after FixPack 36. I have tested the machine repeatedly for the last 15 days or so and found that it ALWAYS boots successfully to OS/2. However, as I have already mentioned, the unreliable booting is more prominent on about 7 machines belonging to my client. Since I have concluded that my application runs with FixPack 36, I will first take this hard disk with FixPack 36 to the client site. After testing there, and based on the results, I will try loading FixPack 36 on the client's machines.

Coming to OTTA's suggestions: the machines are new, so there is only a remote possibility that all the hard disks are weak. But your suggestion about increasing the boot-up delay to more than 30 seconds is worth trying at the client's site first!

I will tell you the results as soon as I have visited the client.

Thanks a lot for all your suggestions.
If you are using an IBM machine, flash your BIOS (or upgrade the BIOS, for other vendors). The problem is that the machine hangs when it calls your BIOS and loads values that hang OS/2.

If this doesn't solve the problem, try applying FixPak 38.
> Try applying Fix Pack 38.

As of February 10, Fix Pack 40 is available:
