The kernel ate my packets

Some time ago I had a problem with a server that had two Ethernet interfaces connected to different VLANs. Most of the traffic went through the default gateway in the first VLAN, but a service was also listening on the other interface.

Everything worked until we tried to reach the second interface from a node outside the second VLAN but close to it. There seemed to be no connectivity, yet tcpdump showed that the traffic was arriving. The test was simple: I ran a ping from the other node (10.1.2.55) and captured traffic on the second interface (10.10.1.62):

[root@blackdog ~]# tcpdump -w /tmp/inc-eth1-ping.pcap -i eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
20 packets captured
20 packets received by filter
0 packets dropped by kernel
[root@blackdog ~]# tcpdump -nnr /tmp/inc-eth1-ping.pcap
reading from file /tmp/inc-eth1-ping.pcap, link-type EN10MB (Ethernet)
01:35:15.751507 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 78, length 64
01:35:16.759271 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 79, length 64
01:35:17.767223 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 80, length 64
01:35:18.775153 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 81, length 64

So the ping packets reached the server, but no replies left through this interface. I also captured traffic on the other interface, and there were no replies there either:

[root@blackdog ~]# tcpdump -nnr /tmp/inc-eth0-ping.pcap |grep 10.1.2.55
[root@blackdog ~]#
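When the captures are longer than a few lines, it can help to count the echo requests and replies for the remote node instead of reading the output by eye. This is a minimal Python sketch that assumes the text format printed by `tcpdump -nn` (or `tcpdump -nnr file`) shown above; the function name `count_echo` is just illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches ICMP echo lines as printed by `tcpdump -nn`, e.g.
# "01:35:15.751507 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 78, length 64"
ICMP_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+) > (?P<dst>\d+\.\d+\.\d+\.\d+): "
    r"ICMP echo (?P<kind>request|reply)"
)


def count_echo(lines, peer):
    """Count echo requests sent by `peer` and echo replies addressed to `peer`."""
    counts = Counter()
    for line in lines:
        m = ICMP_LINE.search(line)
        if not m:
            continue  # not an ICMP echo line
        if m["kind"] == "request" and m["src"] == peer:
            counts["requests_from_peer"] += 1
        elif m["kind"] == "reply" and m["dst"] == peer:
            counts["replies_to_peer"] += 1
    return counts


# Two lines from the eth1 capture above; the eth0 capture had no matching lines.
eth1_capture = [
    "01:35:15.751507 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 78, length 64",
    "01:35:16.759271 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 65466, seq 79, length 64",
]
eth0_capture = []

for name, capture in (("eth1", eth1_capture), ("eth0", eth0_capture)):
    c = count_echo(capture, "10.1.2.55")
    print(name, c["requests_from_peer"], "requests,", c["replies_to_peer"], "replies")
```

Requests arriving on one interface with zero replies on any interface is exactly the pattern that points at the kernel dropping the packets before any reply is generated.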

OK, this was the cause:

[root@blackdog ~]# cat /proc/sys/net/ipv4/conf/all/rp_filter
1

And one solution, switching to loose mode:

[root@blackdog ~]# echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter
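The ip-sysctl.txt documentation of newer kernels states that the value actually used on an interface is the maximum of conf/all/rp_filter and conf/{interface}/rp_filter, which is why writing 2 to "all" is enough even if eth1 still has 1. A value written to /proc is also lost on reboot; `sysctl -w net.ipv4.conf.all.rp_filter=2` does the same, and the setting can be made permanent in /etc/sysctl.conf. Here is a small Python sketch to check the effective value of an interface, assuming the usual /proc/sys/net/ipv4/conf layout (the helper names are hypothetical):

```python
from pathlib import Path

# Usual location of the per-interface IPv4 sysctls on Linux.
CONF_DIR = Path("/proc/sys/net/ipv4/conf")

# Meaning of each value according to ip-sysctl.txt in 2.6.32 and later kernels.
MODES = {0: "no source validation", 1: "strict", 2: "loose"}


def read_rp_filter(name, conf_dir=CONF_DIR):
    """Return the rp_filter value of `name` ("all", "default" or an interface)."""
    return int((Path(conf_dir) / name / "rp_filter").read_text().strip())


def effective_rp_filter(iface, conf_dir=CONF_DIR):
    """Return the mode applied on `iface`: the max of conf/all and conf/<iface>."""
    return max(read_rp_filter("all", conf_dir), read_rp_filter(iface, conf_dir))


def describe(iface, conf_dir=CONF_DIR):
    """Human-readable summary, e.g. for printing on the server."""
    mode = effective_rp_filter(iface, conf_dir)
    return f"{iface}: rp_filter={mode} ({MODES.get(mode, 'unknown')})"
```

On the server, `print(describe("eth1"))` should report loose mode after the change, whatever value is left in conf/eth1/rp_filter.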

Let's look at the incoming packets on eth1 again:

[root@blackdog ~]# tcpdump -nnr /tmp/inc-eth1-ping.pcap|grep 10.1.2.55
01:47:00.322056 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 42171, seq 1, length 64
01:47:01.323834 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 42171, seq 2, length 64
01:47:02.324601 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 42171, seq 3, length 64
01:47:03.325823 IP 10.1.2.55 > 10.10.1.62: ICMP echo request, id 42171, seq 4, length 64

And the outgoing packets on eth0, where the replies now leave following the default route:

[root@blackdog ~]# tcpdump -nnr /tmp/inc-eth0-ping.pcap|grep 10.1.2.55
01:47:18.969567 IP 10.10.1.62 > 10.1.2.55: ICMP echo reply, id 42427, seq 1, length 64
01:47:19.970800 IP 10.10.1.62 > 10.1.2.55: ICMP echo reply, id 42427, seq 2, length 64
01:47:20.969751 IP 10.10.1.62 > 10.1.2.55: ICMP echo reply, id 42427, seq 3, length 64
01:47:21.968764 IP 10.10.1.62 > 10.1.2.55: ICMP echo reply, id 42427, seq 4, length 64
01:47:22.968705 IP 10.10.1.62 > 10.1.2.55: ICMP echo reply, id 42427, seq 5, length 64

What happened here? As explained in this Red Hat note, the rp_filter kernel parameter became stricter than in previous kernel versions, so the value “1” now has a different meaning. For example, in the 2.6.18 kernel documentation (/usr/share/doc/kernel-doc-2.6.18/Documentation/networking/ip-sysctl.txt) you can read:

        1 - do source validation by reversed path, as specified in RFC1812
            Recommended option for single homed hosts and stub network
            routers. Could cause troubles for complicated (not loop free)
            networks running a slow unreliable protocol (sort of RIP),
            or using static routes.

And in 2.6.32 and later kernels:

        1 - Strict mode as defined in RFC3704 Strict Reverse Path 
            Each incoming packet is tested against the FIB and if the interface
            is not the best reverse path the packet check will fail.
            By default failed packets are discarded.
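The same file describes mode 2 as loose mode (RFC3704 Loose Reverse Path): the packet only fails if its source address is not reachable via any interface. To see why the echo requests were dropped, here is a simplified Python model of both checks, not the kernel's actual implementation. It assumes a routing table with only two entries: the second VLAN, assumed here to be 10.10.1.0/24 on eth1, and the default route via eth0. It ignores policy routing, multipath routes and the other details the real FIB lookup takes into account.

```python
import ipaddress

# Hypothetical routing table modelled on this server: (prefix, interface).
# The /24 for the second VLAN is an assumption.
ROUTES = [
    ("10.10.1.0/24", "eth1"),  # second VLAN, directly connected
    ("0.0.0.0/0", "eth0"),     # default gateway in the first VLAN
]


def best_route(table, addr):
    """Longest-prefix match: interface used to reach `addr`, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in table
        if ip in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_check(table, src, in_iface, mode):
    """Simplified reverse path filter: True if the packet would be accepted."""
    if mode == 0:
        return True  # no source validation
    back = best_route(table, src)
    if mode == 1:
        return back == in_iface  # strict: the route back must use the ingress interface
    if mode == 2:
        return back is not None  # loose: the source must be reachable via some interface
    raise ValueError(f"unknown rp_filter mode: {mode}")


for mode in (1, 2):
    ok = rp_check(ROUTES, "10.1.2.55", "eth1", mode)
    print(f"rp_filter={mode}: request from 10.1.2.55 on eth1 {'accepted' if ok else 'dropped'}")
```

With mode 1 the request arriving on eth1 is dropped, because the best route back to 10.1.2.55 is the default route via eth0; with mode 2 it is accepted. The same lookup explains why, after the change, the replies left through eth0.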

Of course, there is another (more elegant) solution: using multiple routing tables, so that traffic addressed to the second interface is answered through that same interface.
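A minimal sketch of that approach, using Python only to build the iproute2 commands: a separate routing table for the second interface, and a rule that sends traffic sourced from 10.10.1.62 to that table. The /24 prefix, the gateway 10.10.1.1 and the table number 100 are hypothetical values, not taken from the real setup; a name can be used instead of a number if it is declared in /etc/iproute2/rt_tables.

```python
import ipaddress


def policy_routing_commands(dev, addr_cidr, gateway, table):
    """Build iproute2 commands so traffic sourced from `dev`'s address uses its own table."""
    iface = ipaddress.ip_interface(addr_cidr)
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not on {iface.network}")
    return [
        # Connected route and default route for the second VLAN, in their own table.
        f"ip route add {iface.network} dev {dev} src {iface.ip} table {table}",
        f"ip route add default via {gw} dev {dev} table {table}",
        # Packets with this source address are routed using that table.
        f"ip rule add from {iface.ip} table {table}",
    ]


# Hypothetical values: the /24, gateway 10.10.1.1 and table 100 are assumptions.
for cmd in policy_routing_commands("eth1", "10.10.1.62/24", "10.10.1.1", 100):
    print(cmd)
```

With such a rule, replies from 10.10.1.62 should leave through eth1, and even the strict rp_filter check should pass, since the kernel's reverse-path lookup also goes through the policy rules. The commands must be run as root and, like the echo into /proc, do not survive a reboot unless added to the distribution's network configuration.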

Thanks again to Rafa Serrada from HPE for giving me the clue that solved the problem :-)