Solved

Solaris 10 Sparc bge issues - Repeated link up messages

Posted on 2009-05-12
Last Modified: 2013-12-27
I have several Sun Fire V240 servers with identical hardware and identical operating systems: Solaris 10 10/08 with the Recommended and Security patches applied. Each has quad onboard Ethernet (Broadcom), with two ports in use (0 and 1) on different subnets, and iSCSI-connected storage. Each host is connected to a Cisco switch with auto-sensed speed/duplex.

From /var/adm/messages:

/var/adm/messages:May  9 09:02:34 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  3 01:15:47 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  4 13:11:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  4 19:09:15 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 11:59:37 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:10:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:16:01 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:16:23 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:16:45 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:18:22 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:25:38 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:26:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 16:13:54 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 18:35:13 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 19:17:32 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
........

The interfaces never log a link-down event; only link-up messages appear.
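
To see how often each interface logs the notice, the messages can be tallied per interface. The following is a minimal sketch, not something from the original post; the function name count_link_up is made up for illustration, and it assumes an awk that supports sub() (on Solaris 10 that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
#!/bin/sh
# Count bge "link up" notices per interface from syslog-style lines on stdin.
# Output: one "interface count" pair per line, sorted by interface name.
# count_link_up is a hypothetical helper name, not part of Solaris.
count_link_up() {
    awk '
        /NOTICE: bge[0-9]+: link up/ {
            # Find the field that looks like "bgeN:" and strip the colon.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^bge[0-9]+:$/) {
                    name = $i
                    sub(/:$/, "", name)
                    count[name]++
                }
            }
        }
        END {
            for (name in count) print name, count[name]
        }
    ' | sort
}
```

Typical use would be something like: cat /var/adm/messages* | count_link_up. Lines with a file name prefix (as in the grep output above) are handled the same way, because only the "bgeN:" field is inspected.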

# uname -a
SunOS iscsi-freya 5.10 Generic_138888-06 sun4u sparc SUNW,Sun-Fire-V240

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 10.10.11.10 netmask ffffff00 broadcast 10.10.11.255
        ether 0:3:ba:9f:a2:21
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 10.10.10.3 netmask ffffff00 broadcast 10.10.10.255
        ether 0:3:ba:9f:a2:22

# kstat -m bge | grep chip_type
        chip_type                       5794
        chip_type                       5794
        chip_type                       5794
        chip_type                       5794
      
# modinfo | grep bge
141 7b6e0000  148f8 202   1  bge (BCM579x driver v0.56)

Any ideas on how to fix this?
Question by:NORSAR
20 Comments
 
LVL 12

Expert Comment

by:dalesit
ID: 24372161
There's a thread about this at http://osdir.com/ml/solaris.solarisx86/2005-02/msg01484.html although that is using generic Dell hardware, not Sun hardware. Have you asked Sun about this? Have you added the latest patch cluster?

Cheers,

Joel
 
LVL 10

Expert Comment

by:elf_bin
ID: 24384069
Is there anything in the logs on the Cisco switch? I'd stop auto-sensing and set both the switch port and the NIC to 1000/Full, as there have been a lot of issues around that in the past.
 

Author Comment

by:NORSAR
ID: 24428921
I have installed the latest sec. and rec. patch cluster from Sun. Problem remains the same.
 
LVL 10

Expert Comment

by:elf_bin
ID: 24429091
Set both the switch and the host to 1000Mbps Full-Duplex.
 

Author Comment

by:NORSAR
ID: 24429220
I'm very reluctant to disable autoneg on switch and SunFire...
 
LVL 10

Expert Comment

by:elf_bin
ID: 24431289
Why?
 
LVL 13

Expert Comment

by:Rowley
ID: 24431442
Agree with elf_bin; I have seen a lot of issues with autoneg in the past. Besides, the next natural step in your course of action to diagnose the issue might well be to try forcing link/speed settings.
 

Author Comment

by:NORSAR
ID: 24490014
Tried to force fdx/1000 on the bge-interfaces. On my Cisco 4006:

   conf t
   int g4/7
   speed 1000
   int g4/15
   speed
   ^Z
   wr mem

Then edited /platform/sun4u/kernel/drv/bge.conf. First I changed

   adv_autoneg_cap       = 0;
   adv_1000fdx_cap       = 1;

and rebooted. The network did not come up at all - no link on the interfaces on the switch.

Then I reset the bge.conf to default and changed

   speed                 = 1000;
   full-duplex           = 1;

and rebooted. No network this time either...

 
LVL 10

Expert Comment

by:elf_bin
ID: 24490321
I thought you had to supply all the options in the bge.conf file on one line, as in:
adv_autoneg_cap=0 adv_1000fdx_cap=1 adv_1000hdx_cap=0 adv_100fdx_cap=0 adv_100hdx_cap=0 adv_10fdx_cap=0 adv_10hdx_cap=0;
But perhaps not....
Personally I prefer using dladm in Solaris rather than ndd.
What does dladm show-dev and dladm show-linkprop now show?
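
One thing that may be worth ruling out: 1000BASE-T over copper relies on auto-negotiation to settle master/slave clocking, so switching negotiation off completely can leave a gigabit link down. A common alternative is to leave negotiation on and advertise only 1000 Mbps full duplex. The sketch below only prints the ndd commands for that, using the parameter names quoted above; the function name print_1000fdx_only is hypothetical, and whether this driver revision accepts every one of these parameters through ndd is an assumption. The switch port would normally stay on auto as well.

```shell
#!/bin/sh
# Print (do not run) ndd commands that leave auto-negotiation enabled on a
# bge instance but advertise only 1000 Mbps full duplex.
# Parameter names are the ones quoted in this thread; support for all of
# them via ndd on a given driver revision is an assumption.
print_1000fdx_only() {
    dev="/dev/$1"
    # Withdraw every capability except 1000 Mbps full duplex.
    for param in adv_1000hdx_cap adv_100fdx_cap adv_100hdx_cap \
                 adv_10fdx_cap adv_10hdx_cap
    do
        echo "ndd -set $dev $param 0"
    done
    # Keep 1000 Mbps full duplex and negotiation itself switched on.
    echo "ndd -set $dev adv_1000fdx_cap 1"
    echo "ndd -set $dev adv_autoneg_cap 1"
}
```

Running print_1000fdx_only bge0 lists seven commands that can be reviewed before being pasted into a root shell. Settings made with ndd do not survive a reboot, which makes them convenient for a trial.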
 

Author Comment

by:NORSAR
ID: 24490480
I have not tried putting everything on one line... Will have a go later.

# dladm show-dev
bge0            link: up        speed: 1000  Mbps       duplex: full
bge1            link: up        speed: 1000  Mbps       duplex: full
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown

# dladm show-linkprop now
dladm: cannot get link property 'zone': invalid argument

# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE            
bge0         zone            --             --             --                  
bge1         zone            --             --             --                  
bge2         zone            --             --             --                  
bge3         zone            --             --             --                  
 

Author Comment

by:NORSAR
ID: 24491759
Everything on one line in bge.conf gave the same result...

# dladm show-dev
bge0            link: down      speed: 0     Mbps       duplex: unknown
bge1            link: down      speed: 0     Mbps       duplex: unknown
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown
# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE
bge0         zone            --             --             --
bge1         zone            --             --             --
bge2         zone            --             --             --
bge3         zone            --             --             --
# ndd -get /dev/bge0 adv_autoneg_cap
0
# ndd -get /dev/bge0 adv_1000fdx_cap
1

Cisco 4006 switches, Broadcom network devices and Solaris 10 - no good...
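
When repeating this test across several hosts, the down links can be picked out of the dladm output automatically. This is a minimal sketch that assumes the Solaris 10 "dladm show-dev" layout pasted above (device name, then "link:", then the state); the function name links_down is made up for illustration.

```shell
#!/bin/sh
# Read `dladm show-dev` output (Solaris 10 layout, as pasted in this thread)
# on stdin and print the devices whose link state is "down".
# Devices reported as "unknown" (unplumbed ports) are skipped.
# links_down is a hypothetical helper name.
links_down() {
    awk '$2 == "link:" && $3 == "down" { print $1 }'
}
```

Usage would be along the lines of: dladm show-dev | links_down. With the output above it would list bge0 and bge1.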
 
LVL 10

Expert Comment

by:elf_bin
ID: 24492091
Hmm, interesting...

In your first post (28/05/09 12:46 AM), bge0 and bge1 are up at 1000 FDX. The other two (bge2 and bge3) are unknown, so perhaps not configured?
In the second post (28/05/09 04:29 AM), bge0 and bge1 are down.
I'd suggest you stick to using dladm and not the configuration file, to simplify the environment. It may be that dladm writes to the bge config file, but I don't know, as I don't have a bge to test with.

The first posting indicates that the bge0 and 1 cards have set themselves to 1000FDX and marked the link up.  You should check that from the Cisco side of things (from memory you go into exec mode (enable) and type sh int status and confirm you've marked those ports as 1000FDX).

If you've definitely marked BOTH ends as 1000FDX and the link still won't come up, you have one of: a broken port on the switch (least likely), a broken network card, or a broken/incorrect cable (most likely).

I know that we've had to switch off 'linktrap' on some high end Nortel routers that we have here to connect to Solaris SPARC boxes in the past.
 

Author Comment

by:NORSAR
ID: 24492147
Oops, sorry for the confusion.

To clarify:

The first dladm output is from after bge and the switch had been set back to auto; the network works fine.

The second dladm output is from bge set to no autosense and 1000fdx.

Have tested on two identical 240 hosts and different Cisco ports; same thing.

sh int on the Cisco states link down when autosense is off and speed is set to 1000fdx on Sun. The link LEDs on the 240 are off.

BR,

Nils
 
LVL 10

Expert Comment

by:elf_bin
ID: 24493339
Are you using auto-sense rather than auto-negotiate on the switch?
Auto-sense is a proprietary method (well there are several) and doesn't usually do duplex settings.
I'd recommend you set it to auto-negotiate and hence, comply with the Ethernet standard.
So when you're set to auto-negotiate at both ends, the link is up and flaps (goes up and down) occasionally. When you manually set both sides to 1000 FDX, you get no link. Is that right? If it is, then that makes no sense whatsoever. If, when you manually match both sides of the link, the IDLE frames are not detected by the receiver, then you've got one of:
1) Broken switch
2) Broken NIC
3) Broken cable
4) Broken driver
If you're happy that none of the ports, NICs or cables are broken, then it must be the driver.
I'd suggest you ask Cisco or your Sun supplier about this.
 
LVL 10

Expert Comment

by:elf_bin
ID: 24493568
On second thought, there may be some global setting on the Cisco that overrides per-port manual configuration, but I don't know enough about Cisco (haven't touched them for over ten years now) to be of any help.
Sorry.
 

Author Comment

by:NORSAR
ID: 24570624

http://www.cisco.com/en/US/tech/tk870/tk877/tk880/technologies_tech_note09186a008011a218.shtml

Have done the following:

/usr/sbin/ndd -set /dev/ip ip_path_mtu_discovery 0
/usr/sbin/ndd -set /dev/tcp tcp_mss_max_ipv4 1460

Will take some days before I see the results...
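
Since values set with ndd are lost at reboot, a multi-day test needs them reapplied at boot. A rough sketch of such a script follows, in the spirit of the /etc/rc2.d/S68nettune entry that shows up in the log further down; the actual contents of that script are not shown in this thread, so the function name apply_net_tuning and the NDD override are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical boot-time tuning script that reapplies the two ndd settings
# from this post. NDD can be overridden (for example NDD=echo) for a dry run.
NDD=${NDD:-/usr/sbin/ndd}

apply_net_tuning() {
    # Stop on the first failure so a rejected parameter is not silently ignored.
    "$NDD" -set /dev/ip ip_path_mtu_discovery 0 &&
    "$NDD" -set /dev/tcp tcp_mss_max_ipv4 1460
}
```

Calling apply_net_tuning from an rc script (or by hand as root) applies both values; with NDD=echo it only prints the arguments it would pass.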

 

Author Comment

by:NORSAR
ID: 24578566
The ndd commands did not help... From /var/adm/messages:

Jun  7 23:29:54 iscsi-freya genunix: [ID 936769 kern.info] rsm0 is /pseudo/rsm@0
Jun  8 10:42:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun  8 11:03:02 iscsi-freya /etc/rc2.d/S68nettune - NKS changed network parameters using ndd
Jun  8 16:19:48 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun  9 08:11:39 iscsi-freya genunix: [ID 408114 kern.info] /pci@1f,700000/network@2 (bge0) online
Jun  9 08:12:06 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge2: link down (initialized)
Jun  9 08:12:10 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge3: link down (initialized)
 
LVL 10

Assisted Solution

by:elf_bin
elf_bin earned 250 total points
ID: 24681095
Sorry for the delay in posting, have been on my jollies :o)

Are we right in saying that the problem with having manual settings on both sides (i.e. on the Sun V240 and the Cisco switch) is that the link will not come up? If so (which is what I can glean from these postings), that has very little to do with TCP/IP. As I've said before, if you've correctly configured BOTH the Cisco and the Sun NIC(s) to use manual Ethernet settings and cannot get the link up, then you must have one of:
1) Broken switch
2) Broken NIC
3) Broken cable
4) Broken driver
 

Author Comment

by:NORSAR
ID: 24743077
My guess is that there is some kind of incompatibility between the QuadGB Broadcom card and our Cisco switch.

We installed a Sun X4445A QuadGB card this morning. I'll see how this behaves.

 

Accepted Solution

by:
NORSAR earned 0 total points
ID: 24933209
The machine has run without problems for approximately three weeks with the Sun X4445A QuadGB card.
