Solved

Solaris 10 Sparc bge issues - Repeated link up messages

Posted on 2009-05-12
Last Modified: 2013-12-27
I have several SunFire V240 servers with identical hardware and identical operating systems: Solaris 10 10/08 with the recommended and security patches. Each has quad onboard Ethernet (Broadcom) with two ports in use (0 and 1) on different subnets, plus iSCSI-connected storage. The hosts are connected to a Cisco switch with auto-sense speed/duplex.

/var/adm/messages:

/var/adm/messages:May  9 09:02:34 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  3 01:15:47 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  4 13:11:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  4 19:09:15 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 11:59:37 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:10:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:16:01 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 14:16:23 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:16:45 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:18:22 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:25:38 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.0:May  6 16:26:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 16:13:54 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 18:35:13 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
/var/adm/messages.1:Apr 25 19:17:32 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex
........

The network interfaces never log a link down; only link up messages appear.
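
To see how often each port logs these notices, the messages files can be tallied with a short script. This is a minimal Python sketch, not part of the original post. It assumes the notices look exactly like the lines pasted above; `LINK_RE` and `tally_link_events` are names made up for illustration.

```python
import re
from collections import Counter

# Matches bge link notices such as
# "NOTICE: bge1: link up 1000Mbps Full-Duplex".
# The pattern is an assumption based on the lines quoted in this thread.
LINK_RE = re.compile(r"NOTICE: (bge\d+): link (up|down)")

def tally_link_events(lines):
    """Count (interface, state) pairs across syslog lines."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Three of the lines quoted above, used as sample input.
sample = [
    "/var/adm/messages:May  9 09:02:34 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge1: link up 1000Mbps Full-Duplex",
    "/var/adm/messages.0:May  3 01:15:47 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex",
    "/var/adm/messages.0:May  4 13:11:41 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex",
]

counts = tally_link_events(sample)
for (iface, state), n in sorted(counts.items()):
    print(iface, state, n)
```

A "down" count of zero next to a growing "up" count would confirm the pattern described here: repeated link up notices with no matching link down.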

# uname -a
SunOS iscsi-freya 5.10 Generic_138888-06 sun4u sparc SUNW,Sun-Fire-V240

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 10.10.11.10 netmask ffffff00 broadcast 10.10.11.255
        ether 0:3:ba:9f:a2:21
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 10.10.10.3 netmask ffffff00 broadcast 10.10.10.255
        ether 0:3:ba:9f:a2:22

# kstat -m bge | grep chip_type
        chip_type                       5794
        chip_type                       5794
        chip_type                       5794
        chip_type                       5794
      
# modinfo | grep bge
141 7b6e0000  148f8 202   1  bge (BCM579x driver v0.56)

Any ideas on how to fix this?
Question by:NORSAR
20 Comments
 
LVL 12

Expert Comment

by:dalesit
ID: 24372161
There's a thread about this at http://osdir.com/ml/solaris.solarisx86/2005-02/msg01484.html although that is using generic Dell hardware, not Sun hardware. Have you asked Sun about this? Have you added the latest patch cluster?

Cheers,

Joel
 
LVL 10

Expert Comment

by:elf_bin
ID: 24384069
Is there anything in the logs on the Cisco switch?  I'd stop auto-sensing and set both the switch port and the NIC to 1000/Full, as there have been a lot of issues with auto-sensing in the past.
 

Author Comment

by:NORSAR
ID: 24428921
I have installed the latest security and recommended patch cluster from Sun. The problem remains the same.
 
LVL 10

Expert Comment

by:elf_bin
ID: 24429091
Set both the switch and the host to 1000Mbps Full-Duplex.
 

Author Comment

by:NORSAR
ID: 24429220
I'm very reluctant to disable autoneg on switch and SunFire...
 
LVL 10

Expert Comment

by:elf_bin
ID: 24431289
Why?
 
LVL 13

Expert Comment

by:Rowley
ID: 24431442
Agree with elf_bin; I have seen a lot of issues with autoneg in the past. Besides, the next natural step in your course of action to diagnose the issue might well be to try forcing the link speed/duplex settings.
 

Author Comment

by:NORSAR
ID: 24490014
Tried to force fdx/1000 on the bge-interfaces. On my Cisco 4006:

   conf t
   int g4/7
   speed 1000
   int g4/15
   speed
   ^Z
   wr mem

Then edited /platform/sun4u/kernel/drv/bge.conf. First I changed

   adv_autoneg_cap       = 0;
   adv_1000fdx_cap       = 1;

and rebooted. The network did not come up at all - no link on the interfaces on the switch.

Then I reset the bge.conf to default and changed

   speed                 = 1000;
   full-duplex           = 1;

and rebooted. No network this time either...

 
LVL 10

Expert Comment

by:elf_bin
ID: 24490321
I thought you had to supply all the options in the bge file in one line as in:
adv_autoneg_cap=0 adv_1000fdx_cap=1 adv_1000hdx_cap=0 adv_100fdx_cap=0 adv_100hdx_cap=0 adv_10fdx_cap=0 adv_10hdx_cap=0;
But perhaps not....
Personally I prefer using dladm in Solaris rather than ndd.
What does dladm show-dev and dladm show-linkprop now show?
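
If you want to try the single-line form, a small helper can build it from a table of settings. This is only a Python sketch: the property names and values are the ones quoted above, `one_line_entry` is a made-up name, and nothing in this thread confirms that bge actually requires the one-line form.

```python
def one_line_entry(props):
    """Join name=value pairs with spaces and end the entry with a semicolon."""
    return " ".join(f"{name}={value}" for name, value in props.items()) + ";"

# Advertise only 1000 Mbps full duplex with autonegotiation off,
# mirroring the example line quoted above.
forced_1000fdx = {
    "adv_autoneg_cap": 0,
    "adv_1000fdx_cap": 1,
    "adv_1000hdx_cap": 0,
    "adv_100fdx_cap": 0,
    "adv_100hdx_cap": 0,
    "adv_10fdx_cap": 0,
    "adv_10hdx_cap": 0,
}

line = one_line_entry(forced_1000fdx)
print(line)
```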
 

Author Comment

by:NORSAR
ID: 24490480
I have not tried putting everything on one line... Will have a go later.

# dladm show-dev
bge0            link: up        speed: 1000  Mbps       duplex: full
bge1            link: up        speed: 1000  Mbps       duplex: full
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown

# dladm show-linkprop now
dladm: cannot get link property 'zone': invalid argument

# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE            
bge0         zone            --             --             --                  
bge1         zone            --             --             --                  
bge2         zone            --             --             --                  
bge3         zone            --             --             --                  
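
To compare dladm show-dev snapshots between configurations, the output can be parsed into a small table. This is a minimal Python sketch that assumes the Solaris 10 output format shown in this post; `DEV_RE`, `parse_show_dev` and `links_not_up` are hypothetical names used only for illustration.

```python
import re

# One line of "dladm show-dev" output as pasted in this thread, e.g.
# "bge0            link: up        speed: 1000  Mbps       duplex: full"
DEV_RE = re.compile(
    r"^(?P<name>\S+)\s+link:\s+(?P<link>\S+)\s+speed:\s+(?P<speed>\d+)"
    r"\s+Mbps\s+duplex:\s+(?P<duplex>\S+)\s*$"
)

def parse_show_dev(text):
    """Return {device: {"link", "speed_mbps", "duplex"}} for each matching line."""
    devices = {}
    for line in text.splitlines():
        match = DEV_RE.match(line.strip())
        if match:
            devices[match.group("name")] = {
                "link": match.group("link"),
                "speed_mbps": int(match.group("speed")),
                "duplex": match.group("duplex"),
            }
    return devices

def links_not_up(devices, expected):
    """List the expected devices whose link state is anything other than "up"."""
    return [name for name in expected if devices.get(name, {}).get("link") != "up"]

snapshot = """\
bge0            link: up        speed: 1000  Mbps       duplex: full
bge1            link: up        speed: 1000  Mbps       duplex: full
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown
"""

devices = parse_show_dev(snapshot)
print(links_not_up(devices, ["bge0", "bge1"]))
```

Running the same check on a snapshot taken after a bge.conf change shows at a glance which of the two ports in use lost link.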
 

Author Comment

by:NORSAR
ID: 24491759
Everything on one line in bge.conf gave the same result...

# dladm show-dev
bge0            link: down      speed: 0     Mbps       duplex: unknown
bge1            link: down      speed: 0     Mbps       duplex: unknown
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown
# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE
bge0         zone            --             --             --
bge1         zone            --             --             --
bge2         zone            --             --             --
bge3         zone            --             --             --
# ndd -get /dev/bge0 adv_autoneg_cap
0
# ndd -get /dev/bge0 adv_1000fdx_cap
1

Cisco 4006 switches, Broadcom network devices and Solaris 10 - no good...
 
LVL 10

Expert Comment

by:elf_bin
ID: 24492091
Hmm, interesting...

In your first post (28/05/09 12:46 AM), bge0 and bge1 are 1000 fdx and up.  The other bge interfaces (bge2 and bge3) are unknown (so perhaps not configured?)
In the second post (28/05/09 04:29 AM), bge0 and bge1 are down.
I'd suggest you stick to using dladm and not the configuration file (to simplify the environment; it may be that dladm writes to the bge config file, but I don't know, as I don't have a bge to test with).

The first posting indicates that the bge0 and 1 cards have set themselves to 1000FDX and marked the link up.  You should check that from the Cisco side of things (from memory you go into exec mode (enable) and type sh int status and confirm you've marked those ports as 1000FDX).

If you've definitely marked BOTH ends as 1000FDX and the link still won't come up, you have one of: a broken port on the switch (least likely), a broken network card, or a broken/incorrect cable (most likely).

I know that we've had to switch off 'linktrap' on some high end Nortel routers that we have here to connect to Solaris SPARC boxes in the past.
 

Author Comment

by:NORSAR
ID: 24492147
Oops, sorry for the confusion.

To clarify:

The first dladm output is from when the bge interfaces and the switch had been set back to auto; the network works fine.

The second dladm output is from when the bge interfaces were set to no autosense and 1000fdx.

I have tested on two identical V240 hosts and different Cisco ports; same result.

sh int on the Cisco states link down when autosense is off and speed is set to 1000fdx on Sun. The link LEDs on the 240 are off.

BR,

Nils
 
LVL 10

Expert Comment

by:elf_bin
ID: 24493339
Are you using auto-sense rather than auto-negotiate on the switch?
Auto-sense is a proprietary method (well there are several) and doesn't usually do duplex settings.
I'd recommend you set it to auto-negotiate and hence, comply with the Ethernet standard.
So when you're set to auto-negotiate at both ends, the link is up and flaps (goes up and down) occasionally.  When you manually set both sides to 1000 FDX you get no link.  Is that right?  If it is, then that makes no sense whatsoever.  If, when you manually match both sides of the link, the IDLE frames are not detected by the receiver, then you've got one of:
1) Broken switch
2) Broken NIC
3) Broken cable
4) Broken driver
If you're happy that none of the ports, NICs or cables are broken, then it must be the driver.
I'd suggest you ask Cisco or your Sun supplier about this.
 
LVL 10

Expert Comment

by:elf_bin
ID: 24493568
On second thought, there may be some global setting on the Cisco that overrides per-port manual configuration, but I don't know enough about Cisco (haven't touched them for over ten years now) to be of any help.
Sorry.
 

Author Comment

by:NORSAR
ID: 24570624

http://www.cisco.com/en/US/tech/tk870/tk877/tk880/technologies_tech_note09186a008011a218.shtml

Have done the following:

/usr/sbin/ndd -set /dev/ip ip_path_mtu_discovery 0
/usr/sbin/ndd -set /dev/tcp tcp_mss_max_ipv4 1460

Will take some days before I see the results...
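
For reference, the 1460 in the second command is the usual TCP maximum segment size for a 1500-byte MTU, assuming 20-byte IPv4 and TCP headers with no options. A quick check in Python (the constant names are just for illustration):

```python
ETHERNET_MTU = 1500  # bytes, the mtu that ifconfig reports for bge0 and bge1
IPV4_HEADER = 20     # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20      # bytes, TCP header without options (assumption)

# The payload left for TCP data in one full-sized frame.
mss = ETHERNET_MTU - IPV4_HEADER - TCP_HEADER
print(mss)
```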

 

Author Comment

by:NORSAR
ID: 24578566
The ndd commands did not help... From /var/adm/messages:

Jun  7 23:29:54 iscsi-freya genunix: [ID 936769 kern.info] rsm0 is /pseudo/rsm@0
Jun  8 10:42:02 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun  8 11:03:02 iscsi-freya /etc/rc2.d/S68nettune - NKS changed network parameters using ndd
Jun  8 16:19:48 iscsi-freya bge: [ID 801593 kern.notice] NOTICE: bge0: link up 1000Mbps Full-Duplex
Jun  9 08:11:39 iscsi-freya genunix: [ID 408114 kern.info] /pci@1f,700000/network@2 (bge0) online
Jun  9 08:12:06 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge2: link down (initialized)
Jun  9 08:12:10 iscsi-freya bge: [ID 801725 kern.info] NOTICE: bge3: link down (initialized)
 
LVL 10

Assisted Solution

by:elf_bin
elf_bin earned 250 total points
ID: 24681095
Sorry for the delay in posting, have been on my jollies :o)

Are we right in saying that the problem with having manual settings on both sides (i.e. on the Sun V240 and the Cisco switch) is that LINK will not come up?  If so (which is what I can glean from these postings), that has very little to do with TCP/IP.  As I've said before, if you've correctly configured BOTH the Cisco and the Sun NIC(s) with manual Ethernet settings and cannot get the link up, then you must have one of:
1) Broken switch
2) Broken NIC
3) Broken cable
4) Broken driver
 

Author Comment

by:NORSAR
ID: 24743077
My guess is that there is some kind of incompatibility between the QuadGB Broadcom card and our Cisco switch.

We installed a Sun X4445A QuadGB card this morning. I'll see how this behaves.

 

Accepted Solution

by:
NORSAR earned 0 total points
ID: 24933209
The machine has run without problems for approx. three weeks with the Sun X4445A QuadGB card.