Solved: IP Masquerade problems in RH Linux

emherman

ASKER

I'll take a crack at your solutions tonight. Thanks.

ahoffmann

listening ..

emherman

ASKER

OK this is what I know:

Using the Gateway (brand) GP6-350 (which is my Linux workstation), the BIOS was set to "Plug and Play O/S -- NO". I had both motherboard COM ports enabled, but I disabled one of them (the one that represents COM2). The motherboard has an embedded Ensoniq sound chip, which I use. The computer has a USR 56k ISA hardware modem with jumpers set to "plug and pray".

---------------------------------

Here are the results of "dmesg | grep -i irq":

PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at port 0x02f8 (irq = 3) is a 16550A
PIIX4: not 100% native mode: will probe irqs later
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
PCI: Found IRQ 9 for device 00:07.2
usb-uhci.c: USB UHCI at I/O 0x1440, IRQ 9
PCI: Found IRQ 10 for device 00:0f.0
eth0: Lite-On 82c168 PNIC rev 33 at 0xd087f000, 00:A0:CC:3D:19:98, IRQ 10.
PCI: Found IRQ 11 for device 00:0c.0
es1371: found es1371 rev 4 at io 0x1400 irq 11

------------------------------

Here are the results of "cat /proc/interrupts":

CPU0
0: 3858229 XT-PIC timer
1: 384 XT-PIC keyboard
2: 0 XT-PIC cascade
8: 1 XT-PIC rtc
9: 0 XT-PIC usb-uhci
10: 30526 XT-PIC eth0
11: 3029 XT-PIC es1371
12: 32664 XT-PIC PS/2 Mouse
14: 18617 XT-PIC ide0
15: 79719 XT-PIC ide1
NMI: 0
ERR: 0

---------------------------------

Here are the results of "cat /proc/pci":

PCI devices found:
Bus 0, device 0, function 0:
Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 3).
Master Capable. Latency=64.
Prefetchable 32 bit memory at 0xf8000000 [0xfbffffff].
Bus 0, device 1, function 0:
PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 3).
Master Capable. Latency=128. Min Gnt=140.
Bus 0, device 7, function 0:
ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
Bus 0, device 7, function 1:
IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1).
Master Capable. Latency=64.
I/O at 0x1460 [0x146f].
Bus 0, device 7, function 2:
USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1).
IRQ 9.
Master Capable. Latency=64.
I/O at 0x1440 [0x145f].
Bus 0, device 7, function 3:
Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2).
IRQ 9.
Bus 0, device 12, function 0:
Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 4).
IRQ 11.
Master Capable. Latency=96. Min Gnt=12.Max Lat=128.
I/O at 0x1400 [0x143f].
Bus 0, device 15, function 0:
Ethernet controller: Lite-On Communications Inc LNE100TX (rev 33).
IRQ 10.
Master Capable. Latency=64.
I/O at 0x1000 [0x10ff].
Non-prefetchable 32 bit memory at 0xf4000000 [0xf40000ff].
Bus 1, device 0, function 0:
VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X/2X (rev 92).
Master Capable. Latency=66. Min Gnt=8.
Non-prefetchable 32 bit memory at 0xf5000000 [0xf5ffffff].
I/O at 0x9000 [0x90ff].
Non-prefetchable 32 bit memory at 0xf4100000 [0xf4100fff].

------------------------------------

Before I go any further: beyond an obvious conflict, I don't know how to tell whether there is an interference problem here. How does it look to you?

jlevie

Well, the most obvious thing that leaps out at me is that I see nothing that looks like eth1. So there's definitely something wrong with the hardware configuration.

Since I can't see any resources assigned for eth1, I'd guess that it's "hiding behind something". Could I see what 'ifconfig eth1' shows? The interrupt ought to be in that output.
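
A quick way to check listings like the ones above for an interrupt conflict is to look for /proc/interrupts lines that name more than one device; the kernel lists devices sharing an IRQ separated by commas. `shared_irqs` below is a hypothetical helper, not something posted in the thread. It assumes the single-CPU layout shown above, and it only sees interrupts claimed by a loaded driver, so a serial port that isn't open won't appear (which is likely why IRQs 3 and 4 are missing from the listing).

```shell
#!/bin/sh
# shared_irqs (hypothetical helper): print IRQs that list more than one
# device in /proc/interrupts. Assumes the single-CPU layout:
#   IRQ:  count  controller  device[, device...]
# Usage: shared_irqs [file]   (defaults to /proc/interrupts; "-" reads stdin)
shared_irqs() {
    awk '
        $1 ~ /^[0-9]+:$/ {
            irq = $1
            sub(/:$/, "", irq)
            devs = ""
            # Fields 4..NF are the device names; devices sharing one IRQ
            # are separated by ", ".
            for (i = 4; i <= NF; i++)
                devs = devs (devs == "" ? "" : " ") $i
            if (devs ~ /,/)
                print irq ": " devs
        }
    ' "${1:-/proc/interrupts}"
}
```

Run with no argument, it reads /proc/interrupts directly. Against the listing posted above it prints nothing, since no IRQ there is claimed by more than one driver.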

The--Captain

Jlevie's comment seems on the level (sorry, I've been waiting ages to make that pun). Another thing to try: disable the sound card and USB controller if you are not using them, along with any other hardware that is not in use. I am also interested in the output of ifconfig -a, but for different reasons. I have seen boxes in the past that produced excessive Ethernet collisions/errors (but only when talking to specific other machines) until enough hardware was swapped out of them to make them behave. I am wondering if this is one of those cases.

Cheers,
-Jon

emherman

ASKER

These are the results of "/sbin/ifconfig -a". "/sbin/ifconfig eth1" didn't work because the Ethernet card is assigned to eth0. BTW, these results come from the RH7.2 workstation (known as "pig"). "cow" is the RH7.2 server and "troll" is the RH6.2 gateway.

eth0 Link encap:Ethernet HWaddr 00:A0:CC:3D:19:98
inet addr:192.168.1.17 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3750 errors:1 dropped:0 overruns:0 frame:0
TX packets:2963 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:1043305 (1018.8 Kb) TX bytes:287599 (280.8 Kb)
Interrupt:10 Base address:0xf000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:182 errors:0 dropped:0 overruns:0 frame:0
TX packets:182 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:12410 (12.1 Kb) TX bytes:12410 (12.1 Kb)

emherman

ASKER

192.168.1.17 is the address for pig, 192.168.1.1 for cow, and 192.168.1.5 for troll.

jlevie

Hmm, I think there's been a bit of confusion here. The data that I asked to see should have all come from the gateway box. I was wrong in asking about eth1. Looking back at the question, I see that you are using PPP for the Internet link. For some reason I got it into my head that the gateway had two Ethernet interfaces.

What does 'ifconfig -a' on the gateway show?

emherman

ASKER

eth0 Link encap:Ethernet HWaddr 00:A0:CC:D0:9F:8F
inet addr:192.168.1.5 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:281529 errors:0 dropped:0 overruns:0 frame:0
TX packets:238860 errors:3 dropped:0 overruns:0 carrier:3
collisions:0 txqueuelen:100
Interrupt:10 Base address:0xd400

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:3924 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

ppp0 Link encap:Point-to-Point Protocol
inet addr:xxx.xxx.xxx.xxx P-t-P:xxx.xxx.xxx.xxx Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:12508 errors:0 dropped:0 overruns:0 frame:0
TX packets:13670 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:10

I replaced the Internet addresses with "xxx.xxx.xxx.xxx" because they are valid external IP addresses. If you need them, I can e-mail them to you.

emherman

ASKER

I'm streaming RealAudio to my W2000 box now and working as I normally do with my mail client on the Linux workstation. If I can get the connection to fail, I'll grab another "ifconfig -a" from the gateway (troll).

jlevie

Hmm, you are showing carrier loss/TX errors on the ethernet controller. That's not normal and is probably an indicator of something wrong.

What are you using in the inside network? A hub or switch and what make/model? You could have a bad port or cable, could you swap ports and/or cables?
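
To track whether those counters move when the link drops, here is a hypothetical helper (not from the thread) that pulls the TX packet, TX error and carrier counts for each interface out of `ifconfig -a` output. It assumes the older net-tools format shown above (`TX packets:N errors:N ... carrier:N`); newer ifconfig versions print these fields differently.

```shell
#!/bin/sh
# tx_error_summary (hypothetical helper): for each interface in old-style
# net-tools "ifconfig -a" output, print: name TX-packets TX-errors carrier
# Usage: tx_error_summary [file]   (defaults to stdin)
tx_error_summary() {
    awk '
        # An interface block starts with the line containing "Link encap".
        /Link encap/ { iface = $1 }
        /TX packets:/ {
            pk = 0; er = 0; ca = 0
            for (i = 1; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == "packets") pk = kv[2]
                else if (kv[1] == "errors") er = kv[2]
                else if (kv[1] == "carrier") ca = kv[2]
            }
            print iface, pk, er, ca
        }
    ' "${1:--}"
}
```

For example, `/sbin/ifconfig -a | tx_error_summary` on troll before and after a failure shows whether the eth0 error and carrier counts rise with the outage or stay flat while TX packets keep climbing.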

emherman

ASKER

I run three machines on the lower floor into a Netsurf 8-port "10/100 switch hub". I connect the "uplink" port (lower) to port 1 (upper) on another Netsurf 8-port switch (different model) and link to three more machines on the upper floor. Yeah, they are no-name switches.

Currently I have 248862 TX packets and still only three errors. I'd like to wait to see the failure again to see what it does. Then I'll swap ports/cables.

Let me try to download something right quick..

emherman

ASKER

OK, I knew I could make it fail easily.

I went to download a program (http) 902k in size using Netscape 6.2. I got 40k of it and then it blew the Internet connection. When this happens I telnet to the gateway "troll" (I know I need to run SSH) and the login prompt stalls (only when this happens). It will take two to three minutes to get the login prompt (versus about one second) from a remote computer. Once I get that, I can login and "ifdown ppp0" and "ifup ppp0" and things will go again. Sometimes I need to restart the gateway (troll) to get it to go.

If I run to the lower level, I can access the box directly, and login immediately... even when stalled.

All internet access is now stopped until I reset the ppp0 connection.

250565 TX packets and still only three errors.

** This is one of the times where I can't reestablish the ppp0 connection and I have to reboot...

After the reboot completes, the counters show 59 TX packets, with 1 error and 1 carrier error.

After the reboot, everything is fine
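
The manual recovery described above (ifdown ppp0, then ifup ppp0) could be scripted. The sketch below is a hypothetical one-shot check, not something anyone in the thread ran: it pings a host through the link and bounces ppp0 if the ping fails. The target host is an assumption; passing an IP address rather than a hostname keeps the check independent of DNS, which sits on the far side of the PPP link.

```shell
#!/bin/sh
# ppp_check_and_reset (hypothetical helper): one-shot version of the manual
# recovery. If a single ping to TARGET fails, restart ppp0 with the Red Hat
# ifdown/ifup scripts. TARGET is an assumption: any Internet host (by IP)
# that normally answers ping.
ppp_check_and_reset() {
    target="$1"
    if ping -c 1 "$target" >/dev/null 2>&1; then
        echo "ppp0 ok"
        return 0
    fi
    echo "ppp0 not passing traffic; restarting"
    ifdown ppp0
    ifup ppp0
}
```

It could be called from root's crontab every few minutes. As described above, ifdown/ifup does not always bring the link back (sometimes only a reboot helped), so this would be a stopgap rather than a fix.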

jlevie

The login delay is most likely due to a DNS timeout. Since the Internet link is hosed, the gateway can't reach DNS to do a reverse lookup on the telnet client's IP.

From what you described it doesn't sound like it's an ethernet problem. This sounds more like a software problem.

Has the gateway had all applicable RedHat errata applied to it? Or is it still running the 'as installed' packages?
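
If the DNS explanation is right, one common workaround (an assumption here, not something tried in this thread) is to list the LAN clients in /etc/hosts on the gateway, so the reverse lookup at telnet login is answered locally, provided the resolver consults files before DNS (the `order` line in /etc/host.conf, or the `hosts:` line in /etc/nsswitch.conf). The helpers below are hypothetical and only check which of the thread's three LAN addresses already have an entry.

```shell
#!/bin/sh
# hosts_has_ip (hypothetical helper): succeed if IP appears as the address
# field of a line in a hosts-format file (default /etc/hosts).
hosts_has_ip() {
    ip="$1"
    file="${2:-/etc/hosts}"
    # Comment lines start with "#", so their first field never equals an IP.
    awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"
}

# check_lan_hosts (hypothetical): report which of the LAN machines named in
# this thread (cow, troll, pig) lack an entry.  Usage: check_lan_hosts [file]
check_lan_hosts() {
    for ip in 192.168.1.1 192.168.1.5 192.168.1.17; do
        hosts_has_ip "$ip" "${1:-/etc/hosts}" || echo "missing: $ip"
    done
}
```

A line such as `192.168.1.17   pig` in troll's /etc/hosts is the kind of entry it looks for.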

emherman

ASKER

Packages are as installed. However, I shut down several unneeded services. How do I get updates on a text-based box? I'm still a point-and-click kind of guy. :-)

jlevie

It's easy enough to get the updates; the problem comes in manually installing them. If you can give me a day or so, I'll bring my 6.2 update script up to date with the current set of errata. It makes applying the updates fairly easy. Send me an email (jim@entrophy-free.net) and I'll return the script to you.

Downloading the updates takes a while, so you can be working on that in the meantime. Pick someplace on the 6.2 box where you have about 500MB of free space, preferably somewhere other than / or /usr. Then do:

# mkdir /where-theres-room/updates
# cd /where-theres-room/updates
# ncftp ftp.redhat.com
...
ncftp / > cd /pub/redhat/linux/updates/6.2/en/os
Directory successfully changed.
ncftp ...inux/updates/6.2/en/os >get -RT i386 i586 i686 images noarch

That will effectively mirror those dirs that contain errata that could be needed for your system. Not everything that will be downloaded will be used, but since I can't tell ahead of time exactly what is installed the script will intelligently attempt all of the updates, skipping any that don't correspond to an installed package.
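
jlevie's script itself isn't reproduced in the thread. As a rough, hypothetical sketch of the "skip anything not installed" idea only (not his script), the functions below pick the downloaded RPMs whose package name appears in a list of installed package names. They assume the usual name-version-release.arch.rpm file naming and make no attempt to choose between i386/i586/i686 builds of the same package. For the install step itself, rpm's freshen mode (`rpm -Fvh *.rpm`) upgrades only packages that are already installed.

```shell
#!/bin/sh
# rpm_name (hypothetical helper): strip version, release and arch from an
# RPM file name, e.g. "foo-devel-1.2-3.i386.rpm" -> "foo-devel".
rpm_name() {
    basename "$1" | sed -e 's/\.[^.]*\.rpm$//' -e 's/-[^-]*-[^-]*$//'
}

# updates_for_installed DIR INSTALLED_LIST  (hypothetical helper)
# Print each DIR/*.rpm whose package name is listed (one per line) in
# INSTALLED_LIST, e.g. a list made with: rpm -qa --qf '%{NAME}\n'
updates_for_installed() {
    dir="$1"
    installed="$2"
    for f in "$dir"/*.rpm; do
        [ -e "$f" ] || continue      # glob matched nothing
        name=$(rpm_name "$f")
        if grep -qxF -- "$name" "$installed"; then
            printf '%s\n' "$f"
        fi
    done
}
```

On the gateway that might look like `rpm -qa --qf '%{NAME}\n' > /tmp/installed.txt` followed by `updates_for_installed /where-theres-room/updates/i386 /tmp/installed.txt`.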

If you're interested I also have a script for 7.2.

emherman

ASKER

OK, on the 2.1GB drive I have 797MB available in /usr. This is really a single-function server, so there is only one user on it.

emherman

ASKER

...ncftp! That's pretty cool!!!

emherman

ASKER

I have an AMD K6 333 in that box and it shows up as an Intel 586, so I opted NOT to get the i686 files. If I need them, please let me know.

emherman

ASKER

FYI - I have the gateway box (troll) FTPing as you had said. No masquerading. The FTP process is working fine.

jlevie

The update script can be modified to not require the 686 files, so it's okay not to download them.
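
The reasoning behind skipping i686 can be written down as a small lookup. This is a hypothetical helper, not part of jlevie's script: given the CPU class that `uname -m` reports (i586 for the K6, per the asker), it prints which of the errata directories from the ncftp command above apply, since rpm generally won't install packages built for a higher arch. It leaves out "images" on the assumption that boot images aren't needed for the update step.

```shell
#!/bin/sh
# update_dirs_for (hypothetical helper): map a "uname -m" value to the
# errata directories worth downloading for that CPU class.
update_dirs_for() {
    case "$1" in
        i686) echo "i386 i586 i686 noarch" ;;
        i586) echo "i386 i586 noarch" ;;
        *)    echo "i386 noarch" ;;      # i386/i486 or anything unrecognised
    esac
}
```

For example, `update_dirs_for "$(uname -m)"` on troll would be expected to print `i386 i586 noarch`.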

ahoffmann

your problem looks pretty similar to one I fixed a couple of weeks ago: in my case the NIC was the culprit.
I have also seen similar behaviour with some switches/hubs.

I know of a problem with the Linux driver for Intel NICs in kernels up to 2.4.12 (possibly 2.4.15).
Not sure what your
> PCI: Found IRQ 10 for device 00:0f.0
> eth0: Lite-On 82c168 PNIC rev 33 at 0xd087f000, 00:A0:CC:3D:19:98, IRQ 10.
is, but it might be worth just replacing your NIC and trying again.

FYI: I have also seen a NIC which started unexpectedly flooding my switch with millions of different MACs, so the switch stopped working properly (it behaved like a hub). I didn't dig deeper into whether it was a hardware problem in the NIC or a driver problem. Replacing the NIC solved it.

emherman

ASKER

I'm having problems downloading the files you asked for (jlevie). I just changed network cards from a Netgear FA310TX (Lite-On) to a Zonet (Realtek-compatible) NIC. I'll try to resume downloading the updates and see how things go.

emherman

ASKER

I also changed the gateway's (troll) port in the switch from #3 to #8. Cable looked undamaged and was prefabricated cat-5.

ahoffmann

AFAIK both NICs are low(est) cost, just keep in mind ...

BTW, there was a very interesting test of NICs in the German magazine c't 6/2002. The benchmark compares several commonly used NICs on Linux and Windoze.

emherman

ASKER

Yeah, I realize they are cheap NICs. I have an (ISA) Intel PRO NIC with a chip code of "FA82595TX" on the shelf. I was tempted to drop that one in. So far so good on the Realtek.

emherman

ASKER

OK - an update. I downloaded the updated 6.2 files for the gateway (troll) and I have run the install script emailed to me from jlevie. I am testing the network in regards to my downloading problem and getting the 7.2 workstation files via Red Hat's "up2date". I'll update as I know something new.

jlevie

Check your email... There's a significant problem with the updates of your 6.2 system.

emherman

ASKER

I ran the update script sent by jlevie. The first time through, the updates were "successful"; however, they didn't stop the problem. I had some dependency problems to fix and was sent a second script. I hastily ran the second script (without following the notes at the beginning of the script) and managed to create a non-bootable computer. :-(

Unfortunately, since this is a production computer for my small LAN, I had to get it up and running quickly. I ended up reinstalling the RH6.2 OS and starting over. I had to write over my /usr directory to have enough space on the 2.1GB drive, which means I have to get the updates from Red Hat again (over 56k).

On the RH 7.2 workstation, I did successfully run "up2date" and got it updated to "2.4.9-31" (i686). However, this did not change the problem.

Some observations that may (or may not) change your thinking of the problem:

- I downloaded, via FTP at a command prompt (no X), the entire pile of updates for the gateway (troll) with only five disconnects... which could have come from the ISP. This was direct from the gateway to Red Hat.

- I downloaded under KDE all the files for the RH7.2 workstation, with about the same number of disconnects... which also could have come from the ISP. I was using the Red Hat "up2date" program in KDE. This was masqueraded from the workstation, through the gateway, to Red Hat.

- The gateway provides IP Masquerading for all of the workstations. The NT, W98, and W2000 boxes do not appear to have the lockup problem. It only appears to happen when the 7.2 workstations try to access the Internet via browsers. I'm not sure whether getting mail on the Linux boxes causes any lockups.

- I would get the problem when trying to do an HTTP (I think) download of the Netscape 6.2 program. I can also get it when actively "surfing", but it is not as predictable. I can get it using Konqueror too.

- I had to buy the CD for Netscape 6.2 since I could not successfully download it.

- I have problems downloading files from ANY web site using the Linux based browsers.

- When the lockup occurs, I telnet in to the gateway from the 7.2 workstation and reset the dial-up connection to the ISP. It takes about a minute or two to get a login prompt.

emherman

ASKER

My network situation changed to the point that this question is no longer valid.

Cable became available in the area and I connected to it and dropped the dial-up. I went with other firewall options so the whole Linux 6.2 thing is also no longer applicable.

I would like to delete this question rather than close it out and leave an erroneous one that others might purchase.

I thought there was an easy way to delete a question, but I can't seem to find it. Can someone tell me how to do it?

BTW, thank you to all who have helped me on this question.
