NORSAR
asked on
Solaris 10 Sparc bge issues - Repeated link up messages
I have several SunFire V240s with identical hardware and identical operating systems: Solaris 10 10/08 release with the Recommended & Security patches. Quad onboard Ethernet (Broadcom) with two ports used (0 & 1) on different subnets. iSCSI-connected storage. Each host is connected to a Cisco switch. Auto-sense speed/duplex.
var/adm/messages:
/var/adm/messages:May 9 09:02:34 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 3 01:15:47 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 4 13:11:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 4 19:09:15 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 11:59:37 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 14:10:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 14:16:01 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 14:16:23 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 16:16:45 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 16:18:22 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 16:25:38 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May 6 16:26:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 16:13:54 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 18:35:13 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 19:17:32 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
........
The network interface never logs a link down event; only link up messages appear.
# uname -a
SunOS iscsi-freya 5.10 Generic_138888-06 sun4u sparc SUNW,Sun-Fire-V240
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
inet 10.10.11.10 netmask ffffff00 broadcast 10.10.11.255
ether 0:3:ba:9f:a2:21
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
inet 10.10.10.3 netmask ffffff00 broadcast 10.10.10.255
ether 0:3:ba:9f:a2:22
# kstat -m bge | grep chip_type
chip_type 5794
chip_type 5794
chip_type 5794
chip_type 5794
# modinfo | grep bge
141 7b6e0000 148f8 202 1 bge (BCM579x driver v0.56)
Any ideas on how to fix this?
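A quick way to see how often each interface logs these notices is to tally them per interface and state. The following is a minimal sketch, not something from the original post: the function name `count_link_events` is made up for illustration, and it assumes a POSIX shell and a POSIX awk (on Solaris 10 that would likely mean substituting nawk or /usr/xpg4/bin/awk for the old /usr/bin/awk).

```shell
# Tally bge "link up" / "link down" notices per interface.
# Reads syslog lines on stdin, so it also works on a saved copy of the logs.
# count_link_events is a hypothetical helper name, not a Solaris command.
count_link_events() {
    awk '/NOTICE: bge[0-9]+: link (up|down)/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "NOTICE:") {
                ifname = $(i + 1)          # e.g. "bge1:"
                sub(/:$/, "", ifname)      # strip the trailing colon
                state = $(i + 3)           # "up" or "down"
                count[ifname " " state]++
            }
        }
    }
    END { for (k in count) print k, count[k] }' | sort
}
```

For example, `cat /var/adm/messages* | count_link_events` prints one line per interface and state with its count. If only "up" rows appear, that matches the observation that no link down is ever logged.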
Is there anything in the logs on the Cisco switch? I'd stop auto-sensing and set both the switch port and the NIC to 1000/Full, as there have been a lot of issues around that in the past.
ASKER
I have installed the latest sec. and rec. patch cluster from Sun. Problem remains the same.
Set both the switch and the host to 1000Mbps Full-Duplex.
ASKER
I'm very reluctant to disable autoneg on switch and SunFire...
Why?
Agree with Elf_bin; I have seen a lot of issues with autoneg in the past. Besides, the next natural step in your course of action to diagnose the issue might well be to force the link speed/duplex settings.
ASKER
Tried to force fdx/1000 on the bge-interfaces. On my Cisco 4006:
conf t
int g4/7
speed 1000
int g4/15
speed
^Z
wr mem
Then edited /platform/sun4u/kernel/drv/bge.conf. First I changed
adv_autoneg_cap = 0;
adv_1000fdx_cap = 1;
and rebooted. The network did not come up at all - no link on the interfaces on the switch.
Then I reset the bge.conf to default and changed
speed = 1000;
full-duplex = 1;
and rebooted. No network this time either...
I thought you had to supply all the options in the bge file in one line as in:
adv_autoneg_cap=0 adv_1000fdx_cap=1 adv_1000hdx_cap=0 adv_100fdx_cap=0 adv_100hdx_cap=0 adv_10fdx_cap=0 adv_10hdx_cap=0;
But perhaps not....
Personally I prefer using dladm in Solaris rather than ndd.
What do dladm show-dev and dladm show-linkprop now show?
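To confirm what the driver actually picked up after a reboot, the advertised-capability parameters named above can be read back one by one with ndd. This is only a sketch under assumptions: `show_bge_caps` is a hypothetical helper name, the `NDD` override variable exists purely so the loop can be dry-run without a bge device, and a shell with `$(...)` support (ksh or bash rather than the old /bin/sh) is assumed.

```shell
# Print the advertised-capability settings of one bge instance.
# show_bge_caps and the NDD override are illustrative, not part of Solaris.
# The parameter names are the ones quoted in the bge.conf line above.
show_bge_caps() {
    dev=${1:-/dev/bge0}
    for p in adv_autoneg_cap adv_1000fdx_cap adv_1000hdx_cap \
             adv_100fdx_cap adv_100hdx_cap adv_10fdx_cap adv_10hdx_cap
    do
        # ${NDD:-ndd} falls back to the real ndd when NDD is unset.
        printf '%s=%s\n' "$p" "$(${NDD:-ndd} -get "$dev" "$p")"
    done
}
```

Running `show_bge_caps /dev/bge0` and `show_bge_caps /dev/bge1` as root gives one `name=value` line per parameter, which makes it easy to compare against what was written in bge.conf.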
ASKER
I have not tried to put everything on one line... Will have a go later.
# dladm show-dev
bge0 link: up speed: 1000 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: unknown speed: 0 Mbps duplex: unknown
bge3 link: unknown speed: 0 Mbps duplex: unknown
# dladm show-linkprop now
dladm: cannot get link property 'zone': invalid argument
# dladm show-linkprop
LINK PROPERTY VALUE DEFAULT POSSIBLE
bge0 zone -- -- --
bge1 zone -- -- --
bge2 zone -- -- --
bge3 zone -- -- --
ASKER
Everything on one line in bge.conf gave the same result...
# dladm show-dev
bge0 link: down speed: 0 Mbps duplex: unknown
bge1 link: down speed: 0 Mbps duplex: unknown
bge2 link: unknown speed: 0 Mbps duplex: unknown
bge3 link: unknown speed: 0 Mbps duplex: unknown
# dladm show-linkprop
LINK PROPERTY VALUE DEFAULT POSSIBLE
bge0 zone -- -- --
bge1 zone -- -- --
bge2 zone -- -- --
bge3 zone -- -- --
# ndd -get /dev/bge0 adv_autoneg_cap
0
# ndd -get /dev/bge0 adv_1000fdx_cap
1
Cisco 4006 switches, Broadcom network devices and Solaris 10 - no good...
Hmm, interesting...
In your first post (28/05/09 12:46 AM), bge0 and bge1 are 1000 fdx and up. The other interfaces (bge2 and bge3) are unknown (so perhaps not configured?)
In the second post (28/05/09 04:29 AM), bge0 and bge1 are down.
I'd suggest you stick to using dladm and not the configuration file (to simplify the environment; it may be that dladm writes to the bge config file, but I don't know, as I don't have a bge to test with).
The first posting indicates that bge0 and bge1 have set themselves to 1000FDX and marked the link up. You should check that from the Cisco side of things (from memory, you go into exec mode (enable), type sh int status, and confirm you've marked those ports as 1000FDX).
If you've definitely marked BOTH ends as 1000FDX and the link still won't come up, you have either a broken port on the switch (least likely), a broken network card, or a broken/incorrect cable (most likely).
I know that we've had to switch off 'linktrap' on some high end Nortel routers that we have here to connect to Solaris SPARC boxes in the past.
ASKER
Oops, sorry for the confusion.
To clarify:
The first dladm output is when bge/switch has been set back to auto; network works fine.
The second dladm is from bge set to no autosense and 1000fdx.
Have tested on two identical 240 hosts and different Cisco ports; same thing.
sh int on the Cisco states link down when autosense is off and speed is set to 1000fdx on Sun. The link LEDs on the 240 are off.
BR,
Nils
Are you using auto-sense rather than auto-negotiate on the switch?
Auto-sense is a proprietary method (well there are several) and doesn't usually do duplex settings.
I'd recommend you set it to auto-negotiate and hence, comply with the Ethernet standard.
So when you're set to auto-negotiate at both ends, the link is up and flaps (goes up and down) occasionally. When you manually set both sides to 1000 FDX, you get no link. Is that right? If it is, then that makes no sense whatsoever. If, when you manually match both sides of the link, the IDLE frames are not detected by the receiver, then you've got one of:
1) Broken switch
2) Broken NIC
3) Broken cable
4) Broken driver
If, as you say, you're happy that none of the ports, NICs or cables are broken, then it must be the driver.
I'd suggest you ask Cisco or your Sun supplier about this.
On second thought, there may be some global setting on the Cisco that overrides per-port manual configuration, but I don't know enough about Cisco (haven't touched them for over ten years now) to be of any help.
Sorry.
ASKER
http://www.cisco.com/en/US/tech/tk870/tk877/tk880/technologies_tech_note09186a008011a218.shtml
Have done the following:
/usr/sbin/ndd -set /dev/ip ip_path_mtu_discovery 0
/usr/sbin/ndd -set /dev/tcp tcp_mss_max_ipv4 1460
Will take some days before I see the results...
ASKER
The ndd commands did not help... From /var/adm/messages:
Jun 7 23:29:54 iscsi-freya genunix: [ID 936769 kern.info] rsm0 is /pseudo/rsm@0
Jun 8 10:42:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun 8 11:03:02 iscsi-freya /etc/rc2.d/S68nettune - NKS changed network parameters using ndd
Jun 8 16:19:48 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun 9 08:11:39 iscsi-freya genunix: [ID 408114 kern.info] /pci@1f,700000/network@2 (bge0) online
Jun 9 08:12:06 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge2: link down (initialized)
Jun 9 08:12:10 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge3: link down (initialized)
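To check a change like this without reading the log by eye, one option is to print only the link up notices that follow the line marking when the change was applied. The sketch below is an illustration, not part of the original post: `link_ups_after` is a hypothetical helper name, the default marker is simply the S68nettune entry from the log above, and a POSIX awk with `-v` support is assumed (nawk or /usr/xpg4/bin/awk on Solaris 10).

```shell
# Print bge "link up" notices that appear after the first line matching a marker.
# link_ups_after is a hypothetical helper; the marker is an awk regular expression.
link_ups_after() {
    marker=${1:-S68nettune}
    awk -v marker="$marker" '
        seen && /NOTICE: bge[0-9]+: link up/ { print }   # only after the marker
        $0 ~ marker { seen = 1 }                         # marker line found
    '
}
```

Fed the excerpt above (for example `link_ups_after < /var/adm/messages`), it should print the bge0 link up entry from Jun 8 16:19:48, which is logged after the ndd change and is why the change is judged not to have helped.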
ASKER
My guess is that there is some kind of incompatibility between the QuadGB Broadcom card and our Cisco switch.
We installed a Sun X4445A QuadGB card this morning. I'll see how this behaves.
ASKER CERTIFIED SOLUTION
Cheers,
Joel