Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)


Ubuntu 13.10 installation error ??? ???

The end goal is to build an Ubuntu 13.10 Compute unit for CUDA 5.5 using the nVidia Tesla K40.

Hardware:-

ASUS P9X79 Motherboard
64GB RAM (8 x 8GB Kingston Hyper Beast KHX24C11T3K2/16X)
Intel I7 4960X CPU
EVGA nVidia GT610 GPU
nVidia K40 Tesla
1 x OWC Mercury Accelsior 1TB PCI-e  SSD
2 x SanDisk Ultra Plus SSD 256GB (SDSSDHP256G)

(RAID1 attached to motherboard in IRST mode, because we have found that RST does not function at all correctly!)

All firmwares have been updated, and this rig above was soak tested for 72 hours, with Windows 7 with no issues. (including running SETI on the K40!).

The configuration required is to use the two SanDisk SSDs in RAID1 for the OS, and use the 1TB PCI-e card (this can only fit in one slot!) for data.

BIOS set to defaults, except SATA changed to RAID (RAID1) in IRST mode, because RST mode is not compatible with Ubuntu.

Install performed from USB pen and CDROM using Ubuntu 13.10 Desktop.

PC is connected to internet, tried selecting Download Updates whilst installing and install third party software.

Disks have been erased and wiped using DBAN.

Selecting Erase Disk and Install Ubuntu continues, recognizes the SATA RAID, continuing to the Time Zone (location) selection...

it fails with Error ??? ???

I have read that this could mean you need to create the partition tables manually. That has worked on this machine, but we then had another issue (see later).

So why the error?

The later issue, and the reason we are trying to re-install, is that the RAID1 configuration on the motherboard reset, resulting in the OS being lost (requiring a re-install)!

(We have two identical machines, and they behave the same. If we remove ALL the SSDs in RAID1, the ??? ??? goes away, e.g. when installing on the 1TB PCI-e card.)

This error does not occur with 12.04 LTS. The installation completes successfully, but the installed system does NOT boot!

Comments, welcome, and points for a fix!
gheist

CUDA 5.5 supports ONLY one currently supported Ubuntu version: 12.04.3 LTS (NOT 12.04.4 with kernel 3.11).
Note that 12.04.2 and 13.04 are the oldest releases that will install on a UEFI system (like your motherboard).
CentOS 6 (called RHEL6 there) is another option that is still supported.
Also, start with the minimal version; you can use Ubuntu tasksel to add Unity later.
Make sure you don't soak test, but run memtest86(+) for those 3 days.
And after you install Linux, run (yes > /dev/null) & on each CPU for a day or two.
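
For reference, the CPU burn suggested above could be scripted roughly as follows. This is only a sketch: start_burn, stop_burn and BURN_PIDS are made-up names for illustration, and it assumes that getconf _NPROCESSORS_ONLN reports the number of online CPUs, as it does on Linux.

```shell
#!/bin/sh
# Hypothetical helpers: start one busy loop per CPU, then stop them again.
# BURN_PIDS is an illustrative variable name, not part of any tool.
BURN_PIDS=""

start_burn() {
    # $1 = number of workers; defaults to the online CPU count
    n="${1:-$(getconf _NPROCESSORS_ONLN)}"
    i=0
    while [ "$i" -lt "$n" ]; do
        # Each "yes" process keeps one core busy writing to /dev/null
        yes > /dev/null &
        BURN_PIDS="$BURN_PIDS $!"
        i=$((i + 1))
    done
}

stop_burn() {
    # Left unquoted on purpose so the PID list splits into separate words
    for pid in $BURN_PIDS; do
        kill "$pid" 2>/dev/null
    done
    BURN_PIDS=""
}
```

After sourcing the file, call start_burn, leave the machine for a day or two while watching temperatures and the system log, then call stop_burn.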

ASKER

Thanks for your reply, but it does not get me any nearer a solution at present. We've spent three man-days on this, and we are likely to start dumping the SSDs (RAID1) and the 1TB PCI-e card just to get an installation of Ubuntu 13.10 (and then the fun starts with CUDA, and we may have to drop to 13.04/12.04).

Yes, we are aware of the limitation of CUDA 5.5 on 13.10, but this is the requirement, we are told at present!

(We are nowhere near this at present; that's more fun to come, because of these issues with installation.)

12.04 as noted, installs, no error, but does not boot.

13.10 causes these errors as above, but when SSDs are removed, it works on the 1TB card, but this was supposed to be for storage not OS.
12.04.2 (aka kernel 3.5) is the minimum needed to boot on UEFI.
See here 12.04.4:
http://releases.ubuntu.com/precise/

You cannot run CUDA on 13.10 (kernel 3.11)
It will run on 13.04/12.04.3 (kernel 3.8)

Whatever you are told just will not work at all.
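
To see which side of that line an installed system falls on, a small shell check along these lines could help. The function name kernel_at_most is hypothetical, and the 3.8 cut-off is simply the figure quoted in this reply, not something taken from nVidia's documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string is at or below
# a given major.minor version.
kernel_at_most() {
    # $1 = kernel release (e.g. the output of uname -r)
    # $2 = major version limit, $3 = minor version limit
    ver="${1%%-*}"        # strip the "-29-generic" style suffix
    major="${ver%%.*}"    # text before the first dot
    rest="${ver#*.}"      # text after the first dot
    minor="${rest%%.*}"   # text before the next dot
    [ "$major" -lt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -le "$3" ]; }
}
```

For example, kernel_at_most "$(uname -r)" 3 8 && echo "kernel is 3.8 or older" reports whether the running kernel is within the range mentioned above.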
So 13.10 should work?

Why the ??? ??? on installation? Why do the ??? ??? disappear on removal of the SSDs?
At present we need to install 13.10. It does not install with the system as above.

This is the requirement.

If CUDA 5.5 does not function as seen in some threads we will then have to advise. But that is Part 2 of the issue.
Use the server or alternate media to install in text mode. The error looks like a graphical toolkit problem, not anything technical (and it also shows that non-LTS versions are not so well polished).

Are you really going to bin a $6000 compute card? I can give you my address to send it to for proper disposal...
We have two K40s.

If 13.10 cannot be made to comply with CUDA 5.5 we will advise client accordingly but cannot at present provide working proof without working 13.10 system.

I suspect install issue is storage or BIOS related.
Have you tried breaking the RAID and installing on a single disk first, to isolate the RAID hardware?
Yes, that does not work; same error.

If we remove all the SanDisk SSDs and use the PCI-E 1TB card, it works okay, and vice versa, but combined it does not work.
Is there an interrupt conflict or memory space conflict between the 2 in BIOS?

Can you just install the system with just the RAID first, configure it and make sure it boots?  If it works all the way through, then maybe you can add the  PCI-E 1 TB card after everything is installed.
Do it the IBM way: install on a USB stick.
Or the HP way: use the (micro-)SD card ;)
Why would you mirror a system boot disk that takes 10-20 min to reinstall?
@serialband Yes, we've tried that as well. After inserting the PCI-E 1 TB card, all is well with a working system, and then on the 3rd reboot later, the RAID configuration on the motherboard "resets", leaving a blank system.

@gheist, there is no option on the motherboard for SD card or USB installation, and it's also part of the brief and requirement for Mirrored RAID 1 OS installation.
I am not being demagogic, but IBM ships the OS on USB, and HP servers have an SD card slot inside for booting the system (so as not to waste too much money on it).

Your motherboard has software (aka FAKE) RAID, so it is better to enable it in Linux. There is no difference: the RST Windows driver reads the RAID config from the BIOS, and nobody gets hurt if Linux reads it from the last sector of the disk.
https://help.ubuntu.com/community/FakeRaidHowto
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
The solution provided and accepted is the Answer to this Question. nVidia may state that CUDA 5.5.0 is not supported on Ubuntu 13.10, but this shows it does compile and function correctly without issue.