MPLS, MP-BGP and /32 loopbacks

Many sources say that MP-BGP peering in an MPLS network should use /32 loopbacks, and some say it is required.

So what, if anything, happens when we set up MP-BGP peering using loopbacks with /24 addresses?

We will test this on a simple topology with two PE routers, two CE routers and one P router.

PE1 and PE2 have BGP peering configured using their loopbacks, 10.1.1.1/24 and 10.1.2.2/24.
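
For reference, a minimal sketch of what this peering might look like on PE1. The original configuration is not shown, so treat this as an assumption: AS 24, OSPF process 1 and the Fa0/0 core interface come from the outputs later in this post, while the OSPF area and the network statement are guesses.

```
! PE1 (hypothetical sketch; PE2 would mirror it with 10.1.2.2 and neighbor 10.1.1.1)
interface Loopback0
 ip address 10.1.1.1 255.255.255.0
!
interface FastEthernet0/0
 ! core link towards P1; LDP is enabled per interface
 mpls ip
!
router ospf 1
 ! area 0 is an assumption; network statements for the core links are omitted
 network 10.1.1.1 0.0.0.0 area 0
!
router bgp 24
 neighbor 10.1.2.2 remote-as 24
 neighbor 10.1.2.2 update-source Loopback0
 !
 address-family vpnv4
  neighbor 10.1.2.2 activate
  neighbor 10.1.2.2 send-community extended
 exit-address-family
```

Sourcing the session from Loopback0 makes the loopback address the BGP next hop for the VPNv4 routes, so it is the prefix that needs a working LSP between the PEs.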

The customer sites are placed in VRF RED, run BGP in AS 65015 with the PEs, and have the following networks configured:

CE1
50.0.0.0/24
50.0.1.0/24

CE2
60.0.0.0/24
60.0.1.0/24
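
The customer side on PE1 could be sketched as follows. RD 1:1, RT 1:1, the 120.0.0.0/24 link and the neighbor address come from the outputs below. The interface name is hypothetical, the network statement is inferred from the locally originated 120.0.0.0/24 route, and as-override is an assumption: with both sites in AS 65015, either that on the PEs or allowas-in on the CEs is needed for the CEs to accept each other's routes.

```
! PE1 (hypothetical sketch)
ip vrf RED
 rd 1:1
 route-target export 1:1
 route-target import 1:1
!
interface FastEthernet0/1
 ! interface name is an assumption
 ip vrf forwarding RED
 ip address 120.0.0.1 255.255.255.0
!
router bgp 24
 address-family ipv4 vrf RED
  network 120.0.0.0 mask 255.255.255.0
  neighbor 120.0.0.2 remote-as 65015
  neighbor 120.0.0.2 activate
  ! as-override is an assumption, see the note above
  neighbor 120.0.0.2 as-override
 exit-address-family
```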

As we can see, the MP-BGP session comes up and prefixes are exchanged:

PE1#sh bgp vpnv4 unicast all sum | b Nei
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.2.2        4           24       6       7       10    0    0 00:01:04        3
120.0.0.2       4        65015      10      11       10    0    0 00:05:57        3

PE1#sh bgp vpnv4 unicast all | b Net
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:1 (default for vrf RED)
 *>  50.0.0.0/24      120.0.0.2                0             0 65015 ?
 *>  50.0.1.0/24      120.0.0.2                0             0 65015 ?
 *>i 60.0.0.0/24      10.1.2.2                 0    100      0 65015 ?
 *>i 60.0.1.0/24      10.1.2.2                 0    100      0 65015 ?
 *   120.0.0.0/24     120.0.0.2                0             0 65015 ?
 *>                   0.0.0.0                  0         32768 i
 *>i 121.0.0.0/24     10.1.2.2                 0    100      0 i

PE2#sh bgp vpnv4 unicast all sum | b Nei
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.1.1        4           24       7       6       10    0    0 00:01:46        3
121.0.0.2       4        65015       6       7       10    0    0 00:01:14        3

PE2#sh bgp vpnv4 unicast all | b Net
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:1 (default for vrf RED)
 *>i 50.0.0.0/24      10.1.1.1                 0    100      0 65015 ?
 *>i 50.0.1.0/24      10.1.1.1                 0    100      0 65015 ?
 *>  60.0.0.0/24      121.0.0.2                0             0 65015 ?
 *>  60.0.1.0/24      121.0.0.2                0             0 65015 ?
 *>i 120.0.0.0/24     10.1.1.1                 0    100      0 i
 *   121.0.0.0/24     121.0.0.2                0             0 65015 ?
 *>                   0.0.0.0                  0         32768 i

We can also see that each of the customer sites learns the routes from the other site:

CE1#sh ip route bgp | b sub
      60.0.0.0/24 is subnetted, 2 subnets
B        60.0.0.0 [20/0] via 120.0.0.1, 00:02:11
B        60.0.1.0 [20/0] via 120.0.0.1, 00:02:11
      121.0.0.0/24 is subnetted, 1 subnets
B        121.0.0.0 [20/0] via 120.0.0.1, 00:02:12

CE2#sh ip route bgp | b sub
      50.0.0.0/24 is subnetted, 2 subnets
B        50.0.0.0 [20/0] via 121.0.0.1, 00:01:47
B        50.0.1.0 [20/0] via 121.0.0.1, 00:01:47
      120.0.0.0/24 is subnetted, 1 subnets
B        120.0.0.0 [20/0] via 121.0.0.1, 00:01:47

But what happens when we try to ping from one customer site to the other?


CE1#ping 60.0.0.1 source l50
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 60.0.0.1, timeout is 2 seconds:
Packet sent with a source address of 50.0.0.1
.....
Success rate is 0 percent (0/5)

Ping fails.

How about traceroute:


CE1#traceroute 60.0.0.1 source l50
Type escape sequence to abort.
Tracing the route to 60.0.0.1
VRF info: (vrf in name/id, vrf out name/id)
  1 120.0.0.1 5 msec 2 msec 5 msec
  2  *  *  *
  3  *  *  *

The trace gets no further than PE1, so that is where we focus the investigation. We know that PE1 received a route to 60.0.0.0/24 over MP-BGP:


PE1#sh bgp vpnv4 unicast all 60.0.0.0/24
BGP routing table entry for 1:1:60.0.0.0/24, version 8
Paths: (1 available, best #1, table RED)
  Advertised to update-groups:
     1
  65015
    10.1.2.2 (metric 21) from 10.1.2.2 (10.1.2.2)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:1:1
      mpls labels in/out nolabel/203

The next hop is 10.1.2.2, and the routing table has the following entry for it:


PE1#sh ip route 10.1.2.2
Routing entry for 10.1.2.2/32
  Known via "ospf 1", distance 110, metric 21, type intra area
  Last update from 140.1.11.1 on FastEthernet0/0, 00:23:30 ago
  Routing Descriptor Blocks:
  * 140.1.11.1, from 10.1.2.2, 00:23:30 ago, via FastEthernet0/0
      Route metric is 21, traffic share count is 1

The corresponding label for this prefix:


PE1#sh mpls forwarding-table 10.1.2.2
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
102    301         10.1.2.2/32       0          Fa0/0      140.1.11.1

All seems to be in order, so let's move to the next hop, 140.1.11.1, which is P1:


P1#sh ip route 10.1.2.2
Routing entry for 10.1.2.2/32
  Known via "ospf 1", distance 110, metric 11, type intra area
  Last update from 140.1.12.2 on FastEthernet0/1, 00:25:02 ago
  Routing Descriptor Blocks:
  * 140.1.12.2, from 10.1.2.2, 00:25:02 ago, via FastEthernet0/1
      Route metric is 11, traffic share count is 1

P1#sh mpls forwarding-table 10.1.2.2
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
301    Untagged    10.1.2.2/32       3983       Fa0/1      140.1.12.2

We can see that P1 has the route to 10.1.2.2 and that it points to PE2, as it should. However, the LFIB has no outgoing label for this prefix: the entry is Untagged. P1 therefore strips the whole label stack, VPN label 203 included, and forwards a plain IP packet, which PE2 can no longer associate with VRF RED. So what's wrong here? PE2 owns this prefix, so it should have advertised a label for it. Let's move to PE2 and see whether the label gets generated:


PE2#sh mpls ldp bindings 10.1.2.2 32
  tib entry: 10.1.2.2/32, rev 11
        remote binding: tsr: 10.1.10.10:0, tag: 301

PE2 has no local binding for 10.1.2.2/32, only the remote binding it learned from P1. By default LDP generates local labels for the IGP and connected prefixes in the routing table, so here is what the routing table shows on PE2:


PE2#sh ip route 10.1.2.2
Routing entry for 10.1.2.0/24
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via Loopback0
      Route metric is 0, traffic share count is 1

PE2#sh mpls ldp bindings 10.1.2.0 24
  tib entry: 10.1.2.0/24, rev 4
        local binding:  tag: imp-null

Wait, PE2 has no route for 10.1.2.2/32, so it generates no label for the /32. Instead it has a connected route for 10.1.2.0/24, the mask we configured on Lo0, and its only local binding, implicit-null, is for that /24. The other routers never learn the /24 route; they learn 10.1.2.2/32 through OSPF, and no binding from PE2 matches it. This breaks the LSP at P1.

The reason for this discrepancy is the default behaviour of OSPF when advertising loopback addresses. OSPF ignores the mask configured on a loopback interface and advertises the address as a /32 host route instead.

We have two possible solutions to this problem:

Change the mask on the loopback interfaces to /32, so that the connected route matches what OSPF advertises.
Add "ip ospf network point-to-point" to the loopback's configuration. This tells OSPF to advertise the prefix with its configured mask.
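
As a sketch, the two options would look like this on PE2 (PE1 is the same with 10.1.1.1). Loopback0 matches the interface shown in PE2's routing table above; apply one option, not both.

```
! Option 1: give the loopback a /32 mask
interface Loopback0
 ip address 10.1.2.2 255.255.255.255
!
! Option 2: keep the /24 and have OSPF advertise the configured mask
interface Loopback0
 ip address 10.1.2.2 255.255.255.0
 ip ospf network point-to-point
```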

For the sake of this example we will change the OSPF network type, to show that /24 loopback addresses can work in this scenario.

After the configuration change, all of the routers in the path have consistent LIB and LFIB entries for 10.1.2.0/24:


PE1#sh mpls forwarding-table 10.1.2.2
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
106    302         10.1.2.0/24       0          Fa0/0      140.1.11.1

PE1#sh mpls ldp bindings 10.1.2.0 24
  tib entry: 10.1.2.0/24, rev 13
        local binding:  tag: 106
        remote binding: tsr: 10.1.10.10:0, tag: 302

P1#sh mpls forwarding-table 10.1.2.2
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
302    Pop tag     10.1.2.0/24       254        Fa0/1      140.1.12.2

P1#sh mpls ldp bindings 10.1.2.0 24
  tib entry: 10.1.2.0/24, rev 13
        local binding:  tag: 302
        remote binding: tsr: 10.1.2.2:0, tag: imp-null
        remote binding: tsr: 10.1.1.1:0, tag: 106

PE2#sh mpls ldp bindings 10.1.2.0 24
  tib entry: 10.1.2.0/24, rev 4
        local binding:  tag: imp-null
        remote binding: tsr: 10.1.10.10:0, tag: 302

As a final test, another ping and traceroute between CE1 and CE2:


CE1#ping 60.0.0.1 source l50

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 60.0.0.1, timeout is 2 seconds:
Packet sent with a source address of 50.0.0.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 80/85/92 ms

CE1#traceroute 60.0.0.1 source l50

Type escape sequence to abort.
Tracing the route to 60.0.0.1

  1 120.0.0.1 20 msec 36 msec 8 msec
  2 140.1.11.1 [MPLS: Labels 302/203 Exp 0] 60 msec 60 msec 60 msec
  3 121.0.0.1 [AS 24] [MPLS: Label 203 Exp 0] 44 msec 40 msec 40 msec
  4 121.0.0.2 [AS 24] 72 msec 68 msec 60 msec

CE2#ping 50.0.1.1 source l61

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 50.0.1.1, timeout is 2 seconds:
Packet sent with a source address of 60.0.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 60/72/84 ms

CE2#traceroute 50.0.1.1 source l61

Type escape sequence to abort.
Tracing the route to 50.0.1.1

  1 121.0.0.1 40 msec 40 msec 20 msec
  2 140.1.12.1 [MPLS: Labels 300/104 Exp 0] 68 msec 60 msec 60 msec
  3 120.0.0.1 [AS 24] [MPLS: Label 104 Exp 0] 52 msec 48 msec 48 msec
  4 120.0.0.2 [AS 24] 88 msec 80 msec 60 msec

As we can see, bidirectional connectivity has been achieved with MP-BGP peering configured on /24 loopbacks.

We can now appreciate why the use of /32 loopbacks is recommended for LDP/MP-BGP/MPLS peerings. This practice keeps label generation in sync with the IGP, and it can save us a lot of troubleshooting in the future.