NORSAR

asked on

Solaris 10 SPARC bge issues: repeated link-up messages

I have several Sun Fire V240s with identical hardware and identical operating systems: Solaris 10 10/08 with the Recommended and Security patches. The quad onboard Ethernet (Broadcom) has two ports in use (0 and 1), on different subnets. Storage is connected via iSCSI. The host is connected to a Cisco switch, with speed/duplex set to auto-sense.

/var/adm/messages:

/var/adm/messages:May  9 09:02:34 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  3 01:15:47 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  4 13:11:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  4 19:09:15 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 11:59:37 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:10:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:16:01 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:16:23 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:16:45 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:18:22 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:25:38 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:26:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 16:13:54 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 18:35:13 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 19:17:32 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
........

The network interfaces never log a link-down; there are only link-up messages.
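To see how often each interface is affected, a small filter over the rotated logs can tally the link-up notices per instance. This is a minimal sketch that assumes the NOTICE format shown above (`bgeN: link up ...`); the function name `count_link_ups` is hypothetical. It also accepts the file-prefixed lines that grep prints when searching several files, as in the listing above, because that prefix never matches the instance pattern.

```shell
#!/bin/sh
# Hypothetical helper: tally "link up" notices per bge instance.
# Reads syslog text on stdin, e.g.:
#   cat /var/adm/messages /var/adm/messages.* | count_link_ups
count_link_ups() {
    awk '/bge[0-9]+: link up/ {
        # Find the field naming the instance ("bge0:", "bge1:", ...).
        # The driver tag "bge:" has no digit, so it is never counted.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^bge[0-9]+:$/) {
                ifc = substr($i, 1, length($i) - 1)   # drop the trailing colon
                n[ifc]++
            }
        }
    }
    END { for (ifc in n) print ifc, n[ifc] }' | sort
}
```

The output is one line per instance in the form `bge0 <count>`, sorted by interface name, which makes it easy to compare hosts or log rotations side by side.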

# uname -a
SunOS iscsi-freya 5.10 Generic_138888-06 sun4u sparc SUNW,Sun-Fire-V240

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 10.10.11.10 netmask ffffff00 broadcast 10.10.11.255
        ether 0:3:ba:9f:a2:21
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 10.10.10.3 netmask ffffff00 broadcast 10.10.10.255
        ether 0:3:ba:9f:a2:22

# kstat -m bge | grep chip_type
        chip_type                       5794
        chip_type                       5794
        chip_type                       5794
        chip_type                       5794
      
# modinfo | grep bge
141 7b6e0000  148f8 202   1  bge (BCM579x driver v0.56)

Any ideas on how to fix this?
dalesit

There's a thread about this at http://osdir.com/ml/solaris.solarisx86/2005-02/msg01484.html, although that involves generic Dell hardware rather than Sun hardware. Have you asked Sun about this? Have you applied the latest patch cluster?

Cheers,

Joel
Is there anything in the logs on the Cisco switch? I'd stop auto-sensing and set both the switch port and the NIC to 1000/Full, as there have been a lot of issues around auto-sensing in the past.
NORSAR

ASKER

I have installed the latest Security and Recommended patch cluster from Sun. The problem remains the same.
Set both the switch and the host to 1000Mbps Full-Duplex.
NORSAR

ASKER

I'm very reluctant to disable autonegotiation on the switch and the Sun Fire...
Why?
I agree with elf_bin; I have seen a lot of issues with autonegotiation in the past. Besides, the next natural step in diagnosing the issue might well be to try forcing the link speed and duplex settings.
NORSAR

ASKER

I tried forcing 1000 Mbps full duplex on the bge interfaces. On my Cisco 4006:

   conf t
   int g4/7
   speed 1000
   int g4/15
   speed 1000
   ^Z
   wr mem

Then I edited /platform/sun4u/kernel/drv/bge.conf. First I changed

   adv_autoneg_cap       = 0;
   adv_1000fdx_cap       = 1;

and rebooted. The network did not come up at all: there was no link on the corresponding switch ports.

Then I reset bge.conf to its defaults and changed

   speed                 = 1000;
   full-duplex           = 1;

and rebooted. No network this time either...

I thought you had to supply all the options in bge.conf on one line, as in:
adv_autoneg_cap=0 adv_1000fdx_cap=1 adv_1000hdx_cap=0 adv_100fdx_cap=0 adv_100hdx_cap=0 adv_10fdx_cap=0 adv_10hdx_cap=0;
But perhaps not....
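Since the thread tries two layouts for the same properties (one `name = value;` per line, and several pairs on one line), it can help to list exactly which `adv_*` values a given bge.conf sets before rebooting. This is a minimal sketch with a hypothetical function name, `show_bge_caps`; it only normalises the text with sed, tr and grep and does not reproduce how the driver itself parses driver.conf.

```shell
#!/bin/sh
# Hypothetical helper: list the adv_* properties a bge.conf file sets,
# whether they are written one per line or several on one line.
# Usage: show_bge_caps /platform/sun4u/kernel/drv/bge.conf
show_bge_caps() {
    # 1. Drop "#" comments.
    # 2. Remove blanks around "=" so "adv_autoneg_cap = 0" becomes "adv_autoneg_cap=0".
    # 3. Split on spaces, tabs and ";" so each name=value pair gets its own line.
    # 4. Keep only the adv_* pairs.
    sed -e 's/#.*//' -e 's/[[:space:]]*=[[:space:]]*/=/g' "$1" |
        tr -s ' \t;' '\n\n\n' |
        grep '^adv_'
}
```

Both layouts used in this thread produce the same normalised `name=value` lines, so a quick diff of the output before and after an edit shows whether the intended values are actually in the file.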
Personally, I prefer using dladm rather than ndd on Solaris.
What do dladm show-dev and dladm show-linkprop now show?
NORSAR

ASKER

I have not tried putting everything on one line yet... I will have a go later.

# dladm show-dev
bge0            link: up        speed: 1000  Mbps       duplex: full
bge1            link: up        speed: 1000  Mbps       duplex: full
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown

# dladm show-linkprop now
dladm: cannot get link property 'zone': invalid argument

# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE            
bge0         zone            --             --             --                  
bge1         zone            --             --             --                  
bge2         zone            --             --             --                  
bge3         zone            --             --             --                  
NORSAR

ASKER

Everything on one line in bge.conf gave the same result...

# dladm show-dev
bge0            link: down      speed: 0     Mbps       duplex: unknown
bge1            link: down      speed: 0     Mbps       duplex: unknown
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown
# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE
bge0         zone            --             --             --
bge1         zone            --             --             --
bge2         zone            --             --             --
bge3         zone            --             --             --
# ndd -get /dev/bge0 adv_autoneg_cap
0
# ndd -get /dev/bge0 adv_1000fdx_cap
1

Cisco 4006 switches, Broadcom network devices and Solaris 10: no good...
Hmm, interesting...

In your first post (28/05/09 12:46 AM), bge0 and bge1 are up at 1000 Mbps full duplex. The other interfaces (bge2 and bge3) show as unknown, so perhaps they are not configured?
In the second post (28/05/09 04:29 AM), bge0 and bge1 are down.
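Comparing `dladm show-dev` listings by eye is easy to get wrong. The sketch below, with the hypothetical name `check_links`, reads `dladm show-dev` output on stdin and prints every link that is not up at 1000 Mbps full duplex. It assumes the label/value layout shown in this thread (`link:`, `speed:`, `duplex:`); unconfigured ports such as bge2 and bge3 are reported too, since their state is unknown.

```shell
#!/bin/sh
# Hypothetical helper: flag links in `dladm show-dev` output that are not
# up at 1000 Mbps full duplex. Input is read on stdin, e.g.:
#   dladm show-dev | check_links
# Exit status is 1 if any link was flagged, 0 otherwise.
check_links() {
    awk '
    NF == 0 { next }   # skip blank lines
    {
        state = ""; speed = ""; duplex = ""
        # Each value follows its label: "link: up", "speed: 1000", "duplex: full".
        for (i = 2; i < NF; i++) {
            if ($i == "link:")   state  = $(i + 1)
            if ($i == "speed:")  speed  = $(i + 1)
            if ($i == "duplex:") duplex = $(i + 1)
        }
        if (state != "up" || speed != "1000" || duplex != "full") {
            print $1, state, speed, duplex
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }'
}
```

Run against the output captured with autonegotiation on, only bge2 and bge3 would be listed; run against the output with autonegotiation off, bge0 and bge1 would appear as `down 0 unknown`.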
I'd suggest you stick to dladm rather than the configuration file, to simplify the environment. It may be that dladm writes to the bge configuration file, but I don't know, as I don't have a bge interface to test with.

The first posting indicates that bge0 and bge1 have set themselves to 1000 FDX and marked the link up. You should check that from the Cisco side as well: from memory, you go into privileged exec mode (enable), type sh int status, and confirm that those ports are set to 1000 FDX.

If you've definitely set BOTH ends to 1000 FDX and the link still won't come up, you have either a broken port on the switch (least likely), a broken network card, or a broken/incorrect cable (most likely).

In the past, we've had to switch off 'linktrap' on some of our high-end Nortel routers to connect them to Solaris SPARC boxes.
NORSAR

ASKER

Oops, sorry for the confusion.

To clarify:

The first dladm output is from when bge and the switch had been set back to auto; the network works fine.

The second dladm output is with bge set to no autosense and 1000 FDX.

I have tested on two identical V240 hosts and different Cisco ports, with the same result.

sh int on the Cisco reports the link as down when autosense is off and the speed is set to 1000 FDX on the Sun. The link LEDs on the V240 are off.

BR,

Nils
Are you using auto-sense rather than auto-negotiate on the switch?
Auto-sense is a proprietary method (well, there are several) and usually doesn't handle duplex settings.
I'd recommend you set it to auto-negotiate and hence comply with the Ethernet standard.
So when both ends are set to auto-negotiate, the link is up but flaps (goes up and down) occasionally, and when you manually set both sides to 1000 FDX you get no link. Is that right? If it is, then that makes no sense whatsoever. If the IDLE frames are not detected by the receiver when you manually match both sides of the link, then you've got one of:
1) Broken switch
2) Broken NIC
3) Broken cable
4) Broken driver
If you're satisfied that none of the ports, NICs or cables are broken, then it must be the driver.
I'd suggest you ask Cisco or your Sun supplier about this.
On second thought, there may be some global setting on the Cisco that overrides per-port manual configuration, but I don't know enough about Cisco (I haven't touched them for over ten years now) to be of any help.
Sorry.
NORSAR

ASKER


http://www.cisco.com/en/US/tech/tk870/tk877/tk880/technologies_tech_note09186a008011a218.shtml

I have done the following:

/usr/sbin/ndd -set /dev/ip ip_path_mtu_discovery 0
/usr/sbin/ndd -set /dev/tcp tcp_mss_max_ipv4 1460

It will take some days before I see the results...

NORSAR

ASKER

The ndd commands did not help... From /var/adm/messages:

Jun  7 23:29:54 iscsi-freya genunix: [ID 936769 kern.info] rsm0 is /pseudo/rsm@0
Jun  8 10:42:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun  8 11:03:02 iscsi-freya /etc/rc2.d/S68nettune - NKS changed network parameters using ndd
Jun  8 16:19:48 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun  9 08:11:39 iscsi-freya genunix: [ID 408114 kern.info] /pci@1f,700000/network@2 (bge0) online
Jun  9 08:12:06 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge2: link down (initialized)
Jun  9 08:12:10 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge3: link down (initialized)
SOLUTION
elf_bin

NORSAR

ASKER

My guess is that there is some kind of incompatibility between the QuadGB Broadcom card and our Cisco switch.

We installed a Sun X4445A QuadGB card this morning. I'll see how this behaves.

ASKER CERTIFIED SOLUTION