morgan_bear


Routing on a 2D Torus Topology

Hello.  I am currently setting up a 2D torus network topology with 9 hosts and 1 server.  Each host has four network cards, and the IP addresses are distributed over 19 subnetworks, including the one to the server.  Each node is directly connected to its adjacent nodes, and the nodes at the ends of each row and column wrap around to connect to each other.  Here is a diagram to better describe the topology.
                   
Server
|
1 --- 2 --- 3 ---
|     |     |
4 --- 5 --- 6 ---
|     |     |
7 --- 8 --- 9 ---
|     |     |

Note: the diagram might be misaligned.  2 is connected to 5, 5 to 8, 3 to 6, and 6 to 9.  Hope you can make sense of it!


The server is connected to 1.  All packets that are sent to any node on the network have to travel through 1.  The lines in the diagram represent connections.  The lines at the ends of the rows coming from 3, 6, and 9 are connected to the nodes at the beginning of the rows (3 connected to 1, 6 connected to 4, and 9 connected to 7).  The lines below the nodes at the bottom of the columns are connected to the top of the columns (7 connected to 1, 8 connected to 2, and 9 connected to 3).

We have configured our subnets and IP addresses correctly.  All nodes that are directly connected to each other can communicate.  We have been able to route to a node that is at most 2 hops away from another.  The server can ping 2, the server can ping 4, and the server can ping 7.  1 can ping 5 through 4, and 2 can ping 9 through 3.

The problem comes when we try to communicate with a node that is 3 hops away, for example server to 5.  What we want to do is go from the server to 5 through 1 and 4.  The farthest we have been able to get is pinging, from the server, the interface on 4 that is directly connected to 5.  Since we can ping that interface from the server, and it is on the same network as 5, we are trying to use it as the gateway to 5.  When we set it up that way we still cannot communicate with 5 from the server.
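As a quick sanity check of the topology described above, here is a minimal Python sketch that computes each node's wrap-around neighbours on a 3x3 torus and counts the point-to-point links. The function names are hypothetical, and the row-major numbering 1 to 9 is taken from the diagram; nothing here comes from the poster's actual setup scripts.

```python
def torus_neighbors(node, rows=3, cols=3):
    """Return the nodes directly linked to `node` on a rows x cols torus.

    Nodes are numbered 1..rows*cols in row-major order, as in the diagram.
    """
    r, c = divmod(node - 1, cols)
    up = ((r - 1) % rows) * cols + c + 1
    down = ((r + 1) % rows) * cols + c + 1
    left = r * cols + (c - 1) % cols + 1
    right = r * cols + (c + 1) % cols + 1
    return sorted({up, down, left, right})


def torus_links(rows=3, cols=3):
    """Return the set of undirected links (each one is its own subnet)."""
    links = set()
    for node in range(1, rows * cols + 1):
        for neighbor in torus_neighbors(node, rows, cols):
            links.add(frozenset((node, neighbor)))
    return links


print(torus_neighbors(1))      # [2, 3, 4, 7]
print(len(torus_links()) + 1)  # 18 torus links + 1 server link = 19
```

The count agrees with the 19 subnetworks mentioned in the question, and node 1 ends up with four torus neighbours plus the server.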

Not the problem:

- Subnet IP addresses are configured correctly.
- The route the packet has to take is not a one-way street.  To get to 5 from the server, the gateway is the interface on 4 that is connected to 5, and to get from 5 to the server, the gateway is the interface on 1 that is connected to the server.
- The routes persist across reboots because they are placed in the static-routes file, which is scanned to populate the routing table at boot.
- From the server we can ping the interface on 4 that is connected to 5, and from 5 we can ping the interface on 1 that is connected to the server.

Currently, when we want to communicate with 5 from 1, the gateway to 5 is the interface on 4 that is connected to 5.  That method has been working for us successfully for 2-hop communication.  The bottom line is:  How do we communicate from one node to another when the communication requires more than 2 hops?  How do we set up the route, and are we configuring the gateways correctly?

P.S. - Will we be able to collaborate on this question without me losing any more points?
morgan_bear

ASKER

Edited text of question.
I guess I am slightly confused.  How can 1 only have 4 cards in it if it is connected to the following:

Server
2
4
3
7

It would appear that 1 has to have 5 NICs or there is a broken link somewhere.
I define "default gateway" as the IP address in my current subnet that takes packets to other subnets when I don't have a static route.

Although I have never set up a network like this, I would attempt the following:

Server has a default gateway of 1
1 has a default gateway of 4
4 has a default gateway of 7
7 has a default gateway of Server
2 and 3 have a default gateway of 1
5 and 6 have a default gateway of 4
8 and 9 have a default gateway of 7

In this configuration, the only static routes you would have are for the local subnets on each machine, which use the local interfaces.

Is this a real-world application?

I wonder what would happen if you set up "routed".
Can I sell you some network cards, wire and connectors? :)

Could you give an example of how at least 3 machines are configured as far as IP, netmask, gateway and routing? If not too verbose is there a scheme to the IP setup?

Why send all traffic to the server thru host 1?

I agree with baird: where is the missing card?

I am glad to see at least one mind bender on here!
You guys are correct in your observation, baird and lewisq: node 1 does have 5 network cards in it.

dcavanaugh - we never attempted to use a default routing scheme, but we probably will depending on what we discover.  In theory it would seem that a default gateway scheme would produce the same result, since we have explicitly expressed all of the static routes on the network.  I will add default gateways to the routing table and observe the result.  A default gateway scheme would probably be more effective, since I know for sure that a message is going to be sent to the default gateway until it reaches its destination.  I'll try it and get back to you.
This networking scheme is being done as a research project at my university in collaboration between the Computer Science department (where my colleague and I are from) and the School of Electrical Engineering (the people who get the grants to pay me).  The bottom line is to produce a parallel computing machine, using commodity off-the-shelf components and Linux as the OS, that can solve highly computationally intensive problems almost as efficiently as a supercomputer.  Right now we are playing with different networking topologies.  We have already done switch and hub topologies (non-trivial) and successfully run some Computational Fluid Dynamics codes on the entire machine (it ran at about 350 Mflops.  That's pretty good for 10 nodes, but our goal is a Gigaflop or more).  Search "Beowulf Clustered Computing" or "Beowulf Parallel Computing" for a lot more information.
We are supposedly running routed on 1, 3, 4, 6, 7, 9 and gated on 2, 5, 8.  Does that make a difference?  Should we reload 2, 5, and 8 with routed?  We changed them to gated because we thought that routed was not letting us route at all (long story).  To make it short... we didn't know then what we know now.

lewisq:

I'm going to give you the IP address of 4 machines and a couple of routes that we took and that I know for sure are working.

the netmask is 255.255.255.0

Server:
eth0 - 192.168.1.254

1:

eth0 -  192.168.1.1
eth1 -  192.168.2.1
eth2 -  192.168.3.1
eth3 -  192.168.4.1
eth4 -  192.168.5.1

4:

eth0 - 192.168.4.2
eth1 - 192.168.11.1
eth2 - 192.168.12.1
eth3 - 192.168.13.1

5:

eth0 - 192.168.11.2
eth1 - 192.168.7.2
eth2 - 192.168.14.1
eth3 - 192.168.15.1

routes-

(addresses below are abbreviated; the 192.168. prefix is omitted)

server - 4:  the gateway is 1.1.  The server can ping eth0 on 4.  The routing table entry is:
Dest            Gateway
4.0             1.1

Since I can ping 4.2, I assume that I can make 4.2 the gateway to 11.0 so I can ping eth0 on 5.  I haven't been successful.

What is even more perplexing to me is that 1 can ping 5.

1 - 5:  the gateway is 4.2. 1 can ping eth0 on 5

If the route to 5 from the server uses 4.2 as the gateway to network 11.0, shouldn't I be able to ping 5 from the server?  That's how I have it in the routing table.
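One plausible reading of the symptom, offered as an assumption since the accepted solution is not visible in this thread: a static route's gateway normally has to be a next hop on a network the sending machine is directly attached to, and 4.2 is not on any network the server has an interface on. The Python sketch below is a simplified model of hop-by-hop forwarding using the addresses posted above. The `trace` helper and both route tables are hypothetical illustrations, not the poster's real tables.

```python
import ipaddress

# Interface addresses as posted in the thread (netmask 255.255.255.0).
INTERFACES = {
    "server": ["192.168.1.254"],
    "1": ["192.168.1.1", "192.168.2.1", "192.168.3.1", "192.168.4.1", "192.168.5.1"],
    "4": ["192.168.4.2", "192.168.11.1", "192.168.12.1", "192.168.13.1"],
    "5": ["192.168.11.2", "192.168.7.2", "192.168.14.1", "192.168.15.1"],
}


def net(ip):
    """Return the /24 network that contains `ip`."""
    return ipaddress.ip_network(ip + "/24", strict=False)


def owner(ip):
    """Return the node that owns `ip`, or None if it is not modelled."""
    for node, addresses in INTERFACES.items():
        if ip in addresses:
            return node
    return None


def directly_connected(node, ip):
    """True if `node` has an interface on the same /24 as `ip`."""
    return any(ipaddress.ip_address(ip) in net(a) for a in INTERFACES[node])


def trace(routes, src, dst, max_hops=8):
    """Follow static routes hop by hop; return (reached, nodes visited)."""
    node, path = src, [src]
    for _ in range(max_hops):
        if dst in INTERFACES[node]:
            return True, path
        if directly_connected(node, dst):
            next_hop = dst
        else:
            next_hop = routes.get(node, {}).get(str(net(dst)))
            # Modelling assumption: an off-link gateway cannot be used.
            if next_hop is None or not directly_connected(node, next_hop):
                return False, path
        node = owner(next_hop)
        if node is None:
            return False, path
        path.append(node)
    return False, path


# Routes as described in the post: the server points at 4.2 for network 11.0.
as_posted = {
    "server": {"192.168.4.0/24": "192.168.1.1", "192.168.11.0/24": "192.168.4.2"},
    "1": {"192.168.11.0/24": "192.168.4.2"},
}
# Hypothetical alternative: every hop names its directly connected neighbour,
# and the return path (5 -> 4 -> 1 -> server) is filled in the same way.
next_hop_only = {
    "server": {"192.168.4.0/24": "192.168.1.1", "192.168.11.0/24": "192.168.1.1"},
    "1": {"192.168.11.0/24": "192.168.4.2"},
    "4": {"192.168.1.0/24": "192.168.4.1"},
    "5": {"192.168.1.0/24": "192.168.11.1"},
}

print(trace(as_posted, "server", "192.168.11.2"))      # (False, ['server'])
print(trace(next_hop_only, "server", "192.168.11.2"))  # (True, ['server', '1', '4', '5'])
print(trace(next_hop_only, "5", "192.168.1.254"))      # (True, ['5', '4', '1', 'server'])
```

In this model the 2-hop case from 1 to 5 works because 4.2 really is a neighbour of 1, which matches the observation that 1 can ping 5. A ping also needs the reply to find its way back, so the return routes on 5 and 4 matter as much as the forward ones.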

One thing I can definitely say is that my routing tables are changing.  We restart all hosts in the route every time that we make a change in the routing table.  When I came in today I couldn't ping eth1 from the server, but I'm positive I could do it yesterday and the day before.  It seems like we're missing one minor detail.

The scheme for the IP setup is as follows.  Our topology has 19 subnets, one per direct connection.  Since each connection is point to point, the last byte of the IP address is 1 for the start of the link and 2 for the end of the link (we gave the server 254 as its last byte just so we could make it special :-) ).  Each connection is given a number.  There was no set way we numbered them; we just made a diagram and counted the links.  The number of the connection on the diagram is the number we use in the third byte of the IP address.

Link number 4 connects 1 and 4; that's why 1 has the IP address 192.168.4.1 (start of link number 4) and 4 has the IP address 192.168.4.2 (end of link number 4).  You can then deduce that the connection between 4 and 5 is number 11: the IP address on 4 is 192.168.11.1 (start of link number 11) and the IP address on 5 is 192.168.11.2 (end of link number 11).  The entire network is organized in this manner.

I know you would love to take a gander at this diagram we have.  It would clear up a lot of basic things for you.  If you have a fax machine you can see it.  I want to let you see the routing tables, but it would double the size of this message.  If you need to see them I can take care of that, though.
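The numbering scheme just described is mechanical enough to write down. This is a minimal sketch with a hypothetical helper name; it covers only the general rule, not the server's special .254 address on link 1.

```python
def link_addresses(link_number):
    """Return (start_ip, end_ip) for a point-to-point link in the 192.168.N.x scheme."""
    if not 1 <= link_number <= 254:
        raise ValueError("link number must fit in the third byte of the address")
    return (f"192.168.{link_number}.1", f"192.168.{link_number}.2")


print(link_addresses(4))   # ('192.168.4.1', '192.168.4.2')   link between 1 and 4
print(link_addresses(11))  # ('192.168.11.1', '192.168.11.2') link between 4 and 5
```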
Another thing for lewisq:  We are sending all of the traffic through 1 for this reason.  Even though we really have made a Linux LAN with a server and network information system, it is going to be used for a different purpose than you probably expect.  I mentioned in the previous message that this is really a parallel computing machine (even though it can be used as a Linux network).  We have to look at all the hosts on this LAN as just processors and nothing more.  The server is the only machine that will have a keyboard and mouse hooked up to it when this machine is completed.  So you should look at this machine as a parallel computing machine with 9 processors and 1 master computer, which can use these processors in parallel to run some very computation-intensive programs (interpreting satellite data, electromagnetic problems, computational fluid dynamics, advanced rendering; stuff you can't do on a PC or workstation).  The master (server) must have access to all the nodes on the system.

To be able to use our processors in parallel we are using the Message Passing Interface (MPI).  MPI is a library that can be used in C or Fortran programs to send certain portions of a program to certain processors (hosts).  I can write a program in C that will take 9 hours to complete on one computer.  I can then write that program again using MPI to break it up and run it on our system, and in theory the program will take one hour to complete, since nine computers will be working in parallel (excluding some network overhead).  These programs will only reside on the server.  When I run the program on our machine, the server sends the messages over the network to the appropriate processors.  In the topology we are using, the server can only have a connection to one processor on the machine.
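The nine-hours-to-one-hour figure above is the ideal case of perfect division of work. As a rough sketch (the function and the overhead value are hypothetical, not measurements from this cluster), the arithmetic looks like this:

```python
def ideal_runtime(serial_hours, workers, overhead_hours=0.0):
    """Best-case runtime when work splits evenly, plus a fixed communication cost."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return serial_hours / workers + overhead_hours


print(ideal_runtime(9, 9))       # 1.0
print(ideal_runtime(9, 9, 0.5))  # 1.5, with a made-up half hour of network overhead
```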
I suspect that routed and/or gated are interfering with the manual changes you are making to the routing tables.  This would explain your observation about the unexpected changes in routing.  If you are going to determine the routes yourself, there should be no need for routed or gated.  In any event, rebooting after a route change should <not> be necessary.

My previous comments suggesting routed/gated were based on my assumption that you would want a strategy that allowed you to dynamically add/delete nodes (which appears to NOT be the case).

I used to work with Cisco routers, and we would reboot them when we made a mistake in configuration and routed would create bogus routes that we were too lazy to delete.
I think dcavanaugh has hit the problem on the head with his suspicion that routed/gated are screwing you up. The net3-4-howto contains the following:
"The dynamic routing daemon will automatically modify your routing table to adjust to changes in your network."
Since your network does not change, you don't need routed, with the increased overhead and RIP packets that it entails.  Also, an old TCP/IP book I have makes quite a bit of to-do about count-to-infinity problems in RIP systems that have router loops.

I would remove routed & gated and instead implement static routes (which I think you already have set up).
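For the static-route approach, here is a hedged Python sketch that prints net-tools style `route add -net ... gw ...` commands for every machine along the server, 1, 4, 5 path, using the addresses posted earlier in the thread. The helper name is hypothetical, and the rule that each gateway must be the directly connected neighbour is my assumption about what the tables need, not something confirmed by the poster.

```python
import ipaddress

# Each link on the path: (left node, left IP, right node, right IP).
PATH = [
    ("server", "192.168.1.254", "1", "192.168.1.1"),
    ("1", "192.168.4.1", "4", "192.168.4.2"),
    ("4", "192.168.11.1", "5", "192.168.11.2"),
]


def static_routes_for_path(path):
    """Return {node: [route commands]} covering both directions along the path."""
    nodes = [path[0][0]] + [link[2] for link in path]
    commands = {node: [] for node in nodes}
    for i, node in enumerate(nodes):
        for j, (_, left_ip, _, _) in enumerate(path):
            if j in (i - 1, i):
                continue  # this link is directly attached to the node
            network = ipaddress.ip_network(left_ip + "/24", strict=False)
            if j > i:
                gateway = path[i][3]      # neighbour on the node's right-hand link
            else:
                gateway = path[i - 1][1]  # neighbour on the node's left-hand link
            commands[node].append(
                f"route add -net {network.network_address} "
                f"netmask {network.netmask} gw {gateway}"
            )
    return commands


for node, lines in static_routes_for_path(PATH).items():
    print(node)
    for line in lines:
        print("   ", line)
```

Note that the server's route to 192.168.11.0 names 192.168.1.1 as the gateway, and node 5's route back to 192.168.1.0 names 192.168.11.1. Only node 1 uses 192.168.4.2.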

Very cool application!
Thank you.  So what you're saying is that there is no need for routed or gated in order to use machines as routers.  Is there any logical explanation for what was causing routed and gated to only allow 2 hops?
ASKER CERTIFIED SOLUTION
dcavanaugh

This is not an answer but it helps.
Did you remove routed and gated and did that fix the problem?

The explanation you are looking for about the 2-hop problem may be found in the count-to-infinity problem that looped routing systems appear to have.  In the reading I did (which I did not totally understand), it appears that RIP-based routing systems (routed) that are configured in loops (like yours) can quickly count up to the protocol's hop limit (RIP treats a metric of 16 as unreachable).
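To make count-to-infinity concrete, here is a minimal textbook-style Python sketch with two routers and no split horizon. It is not a model of the torus in this thread and does not show that this was the cause of the 2-hop limit. Router B has just lost its direct link to a network, router A still advertises its old 2-hop route, and each believes the other in turn. The function name is hypothetical.

```python
RIP_INFINITY = 16  # RIP treats a metric of 16 as "unreachable"


def count_to_infinity(a_metric=2):
    """Return the sequence of (router, metric) updates after B loses its link.

    A starts with a stale route of `a_metric` hops that actually went via B.
    """
    updates = []
    while True:
        # B hears A's advertisement and installs it with one more hop.
        b_metric = min(a_metric + 1, RIP_INFINITY)
        updates.append(("B", b_metric))
        # A's route goes via B, so A follows B's new, larger metric.
        a_metric = min(b_metric + 1, RIP_INFINITY)
        updates.append(("A", a_metric))
        if a_metric == RIP_INFINITY and b_metric == RIP_INFINITY:
            return updates


for router, metric in count_to_infinity():
    print(router, metric)
```

The metrics climb 3, 4, 5, ... until both routers reach 16, and only then does the dead route disappear. Until that point, packets for the lost network bounce between the two routers.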

Anyway - I would like to know if you had any success.