emherman

asked on

IP Masquerade problems in RH Linux

I have a RH Linux 6.2 box, configured to be an Internet gateway/firewall. I have the default IP Masquerading modules that came with 6.2. I dial out to the ISP using "ifup ppp0".

I have five machines that can access the gateway and the Internet (one/two at a time - slowly). My XP, w2000, and NT4  boxes all reach the Net through the gateway just fine. I have downloaded ISO CD's from them too.

My problem comes with the RH 7.2 server and RH 7.2 workstations. I can "surf" the net (through the RH 6.2 gateway) but frequently lock the gateway up. I then have to "ifdown ppp0"/"ifup ppp0" the gateway to get the 7.2 boxes to continue on the Internet. I can't download any files over about 50KB (via http) without locking up the gateway.

I need to stop this RH7.2 to 6.2 lockup problem.
ASKER CERTIFIED SOLUTION
jlevie

emherman

ASKER

I'll take a crack at your solutions tonight. thanks.
listening ..
OK this is what I know:

Using the Gateway (brand) GP6-350 (which is my Linux workstation), the BIOS was set to "Plug and Play O/S -- NO". I had both motherboard com ports enabled, but I disabled one of them (the one that represents com 2). The motherboard has an embedded Ensoniq sound chip which I use. The computer has a USR 56k ISA hardware modem with jumpers set to "plug and pray".

---------------------------------

Here are the results of "dmesg | grep -i irq":

PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at port 0x02f8 (irq = 3) is a 16550A
PIIX4: not 100% native mode: will probe irqs later
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
PCI: Found IRQ 9 for device 00:07.2
usb-uhci.c: USB UHCI at I/O 0x1440, IRQ 9
PCI: Found IRQ 10 for device 00:0f.0
eth0: Lite-On 82c168 PNIC rev 33 at 0xd087f000, 00:A0:CC:3D:19:98, IRQ 10.
PCI: Found IRQ 11 for device 00:0c.0
es1371: found es1371 rev 4 at io 0x1400 irq 11

------------------------------

Here are the results of "cat /proc/interrupts":

           CPU0      
  0:    3858229          XT-PIC  timer
  1:        384          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
  9:          0          XT-PIC  usb-uhci
 10:      30526          XT-PIC  eth0
 11:       3029          XT-PIC  es1371
 12:      32664          XT-PIC  PS/2 Mouse
 14:      18617          XT-PIC  ide0
 15:      79719          XT-PIC  ide1
NMI:          0
ERR:          0

---------------------------------

Here are the results of "cat /proc/pci":

PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 3).
      Master Capable.  Latency=64.  
      Prefetchable 32 bit memory at 0xf8000000 [0xfbffffff].
  Bus  0, device   1, function  0:
    PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 3).
      Master Capable.  Latency=128.  Min Gnt=140.
  Bus  0, device   7, function  0:
    ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
  Bus  0, device   7, function  1:
    IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1).
      Master Capable.  Latency=64.  
      I/O at 0x1460 [0x146f].
  Bus  0, device   7, function  2:
    USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1).
      IRQ 9.
      Master Capable.  Latency=64.  
      I/O at 0x1440 [0x145f].
  Bus  0, device   7, function  3:
    Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2).
      IRQ 9.
  Bus  0, device  12, function  0:
    Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 4).
      IRQ 11.
      Master Capable.  Latency=96.  Min Gnt=12.Max Lat=128.
      I/O at 0x1400 [0x143f].
  Bus  0, device  15, function  0:
    Ethernet controller: Lite-On Communications Inc LNE100TX (rev 33).
      IRQ 10.
      Master Capable.  Latency=64.  
      I/O at 0x1000 [0x10ff].
      Non-prefetchable 32 bit memory at 0xf4000000 [0xf40000ff].
  Bus  1, device   0, function  0:
    VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X/2X (rev 92).
      Master Capable.  Latency=66.  Min Gnt=8.
      Non-prefetchable 32 bit memory at 0xf5000000 [0xf5ffffff].
      I/O at 0x9000 [0x90ff].
      Non-prefetchable 32 bit memory at 0xf4100000 [0xf4100fff].


------------------------------------

Before I go any further, I don't know how to tell if there is anything but an obvious interference there. How does it look to you?
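
For what it's worth, one rough way to look for IRQ sharing in a /proc/interrupts listing like the one above is to pick out the lines that name more than one device. This is only a sketch: it assumes the kernel lists devices sharing an IRQ on one line separated by commas (e.g. "eth0, usb-uhci"), and `shared_irqs` is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# shared_irqs [FILE]: print the IRQ numbers that FILE (default
# /proc/interrupts) shows as used by more than one device.
# Assumption: devices sharing an IRQ appear comma-separated on one line.
# shared_irqs is a hypothetical helper name, not a standard tool.
shared_irqs() {
    awk '/^ *[0-9]+:/ && index($0, ",") > 0 {
        irq = $1
        sub(/:$/, "", irq)   # strip the trailing colon from the IRQ field
        print irq
    }' "${1:-/proc/interrupts}"
}
```

Running `shared_irqs` with no argument reads the live /proc/interrupts. No output means no line lists two devices. It says nothing about ISA cards whose driver has not claimed an interrupt yet, so an empty result is a hint, not proof.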

Well, the most obvious thing that leaps out at me is that I see nothing that looks like eth1. So there's definitely something wrong with the hardware configuration.

Since I can't see any resources assigned for eth1, I'd guess that it's "hiding behind something". Could I see what 'ifconfig eth1' shows? The interrupt ought to be in that output.
Avatar of The--Captain
Jlevie's comment seems on the level (sorry, I've been waiting to make that pun for ages).  Another thing to try - disable that sound card and USB controller (if you are not using them), and any other hardware that is not in use.  I am also interested in the output of ifconfig -a, but for different reasons...  I have seen boxes in the past that give excessive ethernet collisions/errors (but only when talking to specific other machines) until enough hardware was swapped out of them to make them behave - I am wondering if this is one of those cases.

Cheers,
-Jon
These are the results of "/sbin/ifconfig -a". The reason that /sbin/ifconfig eth1 didn't work is that the ethernet card is assigned to eth0. BTW - these results are coming from the RH7.2 workstation (known as "pig"). "Cow" is the 7.2 server and "troll" is the RH6.2 gateway.

eth0      Link encap:Ethernet  HWaddr 00:A0:CC:3D:19:98
          inet addr:192.168.1.17  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3750 errors:1 dropped:0 overruns:0 frame:0
          TX packets:2963 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1043305 (1018.8 Kb)  TX bytes:287599 (280.8 Kb)
          Interrupt:10 Base address:0xf000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:182 errors:0 dropped:0 overruns:0 frame:0
          TX packets:182 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:12410 (12.1 Kb)  TX bytes:12410 (12.1 Kb)
192.168.1.17 is the address for pig, 192.168.1.1 is the address for cow, 192.168.1.5 is the address for troll
Hmm, I think there's been a bit of confusion here. The data that I asked to see should have all come from the gateway box. I was wrong in asking about eth1. Looking back at the question I see that you are using PPP for the Internet link. For some reason I got it into my head that the gateway had two ethernets.

What does 'ifconfig -a' on the gateway show?
eth0      Link encap:Ethernet  HWaddr 00:A0:CC:D0:9F:8F  
          inet addr:192.168.1.5  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:281529 errors:0 dropped:0 overruns:0 frame:0
          TX packets:238860 errors:3 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0xd400

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

ppp0      Link encap:Point-to-Point Protocol  
          inet addr:xxx.xxx.xxx.xxx P-t-P:xxx.xxx.xxx.xxx Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:12508 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13670 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10

I "xxx.xxx.xxx.xxx" the internet address because they had valid external IP addresses. If you need them I can e-mail them to you.
I'm streaming real audio to my w2000 box now and operating as I normally do with my mail client on the Linux workstation. If I get the connection to fail, I'll get another "ifconfig -a" on the gateway (troll).
Hmm, you are showing carrier loss/TX errors on the ethernet controller. That's not normal and is probably an indicator of something wrong.

What are you using in the inside network? A hub or switch and what make/model? You could have a bad port or cable, could you swap ports and/or cables?
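
To keep an eye on those TX error and carrier counters without reading the whole ifconfig output each time, a small filter along these lines could be used. It is a sketch that assumes the old net-tools layout shown in this thread ("TX packets:N errors:N ... carrier:N" on one line); `tx_errors` is a made-up name.

```shell
#!/bin/sh
# tx_errors: read ifconfig output for one interface on stdin and print
# "packets errors carrier" taken from its TX line.
# Assumption: the net-tools format seen in this thread, e.g.
#   TX packets:238860 errors:3 dropped:0 overruns:0 carrier:3
# tx_errors is a hypothetical helper name.
tx_errors() {
    sed -n 's/.*TX packets:\([0-9]*\) errors:\([0-9]*\).*carrier:\([0-9]*\).*/\1 \2 \3/p'
}
```

On the gateway it would be fed with `/sbin/ifconfig eth0 | tx_errors`. Comparing the three numbers before and after a stall shows whether the error count moves when the link locks up.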
I run three machines on the lower floor into a Netsurf 8 port "10/100 switch hub". I connect the "uplink" port (lower) to port 1 (upper) on another Netsurf 8 port switch (different model) and link to three more machines on the upper floor. Yeah they are no-name switches.

Currently I have 248862 TX packets and still only three errors. I'd like to wait to see the failure again to see what it does. Then I'll swap ports/cables.

Let me try to download something right quick..
OK, I knew I could make it fail easily.

I went to download a program (http) 902k in size using Netscape 6.2. I got 40k of it and then it blew the Internet connection. When this happens I telnet to the gateway "troll" (I know I need to run SSH) and the login prompt stalls (only when this happens). It will take two to three minutes to get the login prompt (versus about one second) from a remote computer. Once I get that, I can login and "ifdown ppp0" and "ifup ppp0" and things will go again. Sometimes I need to restart the gateway (troll) to get it to go.

If I run to the lower level, I can access the box directly, and login immediately... even when stalled.

All internet access is now stopped until I reset the ppp0 connection.

250565 TX packets and still only three errors.

** This is one of the times where I can't reestablish the ppp0 connection and I have to reboot...

On the completion of a reboot, I have 59 TX packets and 1 error and 1 carrier.

After the reboot, everything is fine
The login delay is most likely due to a DNS timeout. Since the Internet link is hosed the gateway can't get to DNS to check for a reverse lookup of the IP of the telnet client.

From what you described it doesn't sound like it's an ethernet problem. This sounds more like a software problem.

Has the gateway had all applicable RedHat errata applied to it? Or is it still running the 'as installed' packages?
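
If the slow login really is a reverse-lookup timeout, listing the LAN machines in the gateway's /etc/hosts should let the lookup be answered locally while the link is down. That assumes the resolver on the gateway consults files before DNS, which is the usual default but has not been checked here. The helper below only tests whether an address already has an entry; `has_hosts_entry` is a made-up name.

```shell
#!/bin/sh
# has_hosts_entry IP [FILE]: succeed (exit 0) if FILE (default /etc/hosts)
# has a non-comment line whose first field is exactly IP.
# has_hosts_entry is a hypothetical helper name.
has_hosts_entry() {
    awk -v ip="$1" '$1 == ip { found = 1 } END { exit !found }' "${2:-/etc/hosts}"
}

# Example check for the workstation address given in this thread:
#   has_hosts_entry 192.168.1.17 || echo "pig is not in /etc/hosts"
```

A missing machine would then be added as a line such as "192.168.1.17 pig", using the addresses posted earlier (192.168.1.1 for cow, 192.168.1.5 for troll).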
Packages are as installed. However, I shut down several unneeded services. How do I get updates with a text based box? I'm still a point and click kind of guy. :-)
It's easy enough to get the updates, the problem comes in manually installing them. If you can give me a day or so I'll bring my 6.2 update script up to the current set of errata. It makes the job of applying the updates fairly easy. Send me an email (jim@entrophy-free.net) and I'll return the script to you.

Downloading the updates takes a while. You can be working on that in the meantime. Pick someplace on the 6.2 box where you have about 500MB of free space, preferably other than / or /usr. Then do:

# mkdir /where-theres-room/updates
# cd /where-theres-room/updates
# ncftp ftp.redhat.com
...
ncftp / > cd /pub/redhat/linux/updates/6.2/en/os
Directory successfully changed.
ncftp ...inux/updates/6.2/en/os >get -RT i386 i586 i686 images noarch

That will effectively mirror those dirs that contain errata that could be needed for your system. Not everything that will be downloaded will be used, but since I can't tell ahead of time exactly what is installed the script will intelligently attempt all of the updates, skipping any that don't correspond to an installed package.
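
For readers without the script: the "skip what isn't installed" behaviour can be approximated with rpm's freshen mode (`rpm -F`), which only upgrades packages that already have an older version installed. The sketch below is not the script mentioned above, just an illustration under that assumption. `list_update_rpms` is a made-up helper that gathers the downloaded files, and the rpm calls are left commented out on purpose.

```shell
#!/bin/sh
# list_update_rpms UPDATES_DIR ARCH...: print the .rpm files found under
# each ARCH subdirectory of UPDATES_DIR, one path per line, sorted.
# Subdirectories that do not exist are skipped.
# list_update_rpms is a hypothetical helper name.
list_update_rpms() {
    dir=$1
    shift
    for arch in "$@"; do
        [ -d "$dir/$arch" ] || continue
        find "$dir/$arch" -name '*.rpm' -type f
    done | sort
}

# Example, as root and only after reviewing the list (--test is a dry run):
#   rpm -Fvh --test $(list_update_rpms /where-theres-room/updates i386 i586 noarch)
#   rpm -Fvh        $(list_update_rpms /where-theres-room/updates i386 i586 noarch)
```

Kernel packages and packages shipped for several architectures are usually handled by hand rather than in a bulk freshen, so the `--test` pass is worth reading before the real run.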

If you're interested I also have a script for 7.2.
OK on the 2.1GB drive I have 797MB available in the /usr directory. This is really a single function server so there is only one user on it.
...ncftp!  That's pretty cool!!!
I have AMD K6 333 on that box and it shows as an Intel 586 so I opted NOT to get the i686 files. If I need them please let me know.
FYI - I have the gateway box (troll) FTPing as you had said. No masquerading. The FTP process is working fine.
The update script can be modified to not require the 686 files, so it's okay not to download them.
Your problem looks pretty similar to one I fixed a couple of weeks ago: in my case the NIC was the culprit.
I have also seen similar behaviour with some switches/hubs.

I know of a problem with the Linux driver for Intel NICs up to kernel 2.4.12 (probably 2.4.15).
Not sure what your
> PCI: Found IRQ 10 for device 00:0f.0
> eth0: Lite-On 82c168 PNIC rev 33 at 0xd087f000, 00:A0:CC:3D:19:98, IRQ 10.
is, but it might be worth just replacing your NIC and then try again.

FYI: I have also seen a NIC which started unexpectedly flooding my switch with millions of different MACs, so the switch stopped working properly (it behaves like a hub then). I didn't dig deeper into whether it was a hardware problem of the NIC or a driver problem. Replacing the NIC solved it.
I'm having problems downloading the files that you asked (jlevie). I did just change network cards from a Netgear FA310TX (Lite-On) to a Zonet (Realtek compatible) NIC. I'll try to resume downloading the updates and see how things go.
I also changed the gateway's (troll) port in the switch from #3 to #8. Cable looked undamaged and was prefabricated cat-5.
AFAIK both NICs are low(est) cost, just keep in mind ...

BTW, there was a very interesting test of NICs in the German magazine c't 6/2002. The benchmark compares several commonly used NICs on Linux and Windoze.
Yeah, I realize they are cheap NIC's. I have an (ISA) Intel PRO NIC with a chip code of "FA82595TX" on the shelf. I was tempted to drop that one in. So far so good on the Realtek.
OK - an update. I downloaded the updated 6.2 files for the gateway (troll) and I have run the install script emailed to me from jlevie. I am testing the network in regards to my downloading problem and getting the 7.2 workstation files via Red Hat's "up2date". I'll update as I know something new.
Check your email... There's a significant problem with the updates of your 6.2 system.
I ran the updated script sent by jlevie. The first time through, the updates were "successful", however, they didn't stop the problem. I had some dependency problems to fix and was sent a second script. I hastily ran the second script (not following the notes at the beginning of the script) and managed to create a non-bootable computer. :-(

Unfortunately, since this is a production computer for my small LAN, I had to get it up and running quickly. I ended up reinstalling the RH6.2 OS and starting over. I had to write over my /usr directory to have enough space on the 2.1GB drive. This means that I have to get the updates again from Red Hat (56k).

On the RH 7.2 workstation, I did successfully run "up2date" and got it updated to "2.4.9-31" (i686). However, this did not change the problem.

Some observations that may (or may not) change your thinking of the problem:

- I downloaded, via FTP at a command prompt (no X), the entire pile of updates for the gateway (troll) with only five disconnects... which could have come from the ISP. This was direct from the gateway to Red Hat.

- I downloaded under KDE, all the files for the RH7.2 workstation with about the same number of disconnects... which also could have come from the ISP. I was using the Red Hat "up2date" program in KDE. This was masqueraded from the workstation, through the gateway, to Red Hat

- The gateway provides IP Masquerading for all of the workstations. NT, w98, and w2000 boxes do not appear to have the lockup problem. It only appears to happen when the 7.2 workstations try to access the Internet via browsers. I'm not sure if getting the mail using the Linux boxes causes any lockups.

- I would get the problem when trying to get an HTTP (I think) download for a Netscape 6.2 program. I can also get it when actively "surfing", but it is not as predictable. I can get it using Konqueror too.

- I had to buy the CD for Netscape 6.2 since I could not successfully download it.

- I have problems downloading files from ANY web site using the Linux based browsers.

- When the lockup occurs, I telnet in to the gateway from the 7.2 workstation and reset the dial-up connection to the ISP. It takes about a minute or two to get a login prompt.
My network situation changed to the point that this question is no longer valid.

Cable became available in the area and I connected to it and dropped the dial-up. I went with other firewall options so the whole Linux 6.2 thing is also no longer applicable.

I would like to delete this question so I don't close out the question and have an erroneous one that others might purchase.

I thought there was an easy way to delete a question, but I can't seem to find it. Can someone tell me how to do it?

BTW - Thank you for all who have helped me on this question..
emherman:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1 
EXPERTS:
Post your closing recommendations!  No comment means you don't care.