Table of Contents

Foreword

Introduction

Chapter 1: Getting Started

Session Negotiation and Capabilities

Chapter 2: BGP/MPLS IP-VPN

Basic Configuration

Prefix Dissemination

Extensions for IPv6 VPN (6VPE)

Multi-AS Backbones (Inter-AS)

Chapter 3: Using BGP in VPLS

BGP Auto-Discovery with LDP Signaling

BGP Auto-Discovery and Signaling

BGP Multi-Homing

Chapter 4: BGP Signaling for VPWS

BGP VPWS

Dynamic Multi-Segment Pseudowire

Chapter 5: Labeled Unicast IPv4

Seamless MPLS

Inter-AS Type C

Carriers' Carrier

Notes

Chapter 6: Reconvergence

Advertisement of Multiple Paths

Best External

Next-Hop Tracking

Prefix Independent Convergence (PIC)

Minimum Route Advertisement Interval

BGP Anycast

Chapter 7: Multicast

Inter-Domain IPv4-IPv6 PIM

Multicast in MPLS/BGP IP-VPNs

Chapter 8: Graceful Restart and Error Handling

Graceful Restart Mechanism

Error Handling

Chapter 9: Security

FlowSpec

Remote Triggered Blackholing

Generalized TTL Security Mechanism

Auto-Generation of Filters for BGP Peers

Chapter 10: General Applicability

IPv6 PE Router (6PE)

Load-Balancing

IGP Shortcuts

Split Horizon

Peer Groups

BGP in Residential Broadband Networks

QoS Policy Propagation Using BGP

Route Policy Framework

Notes

Chapter 11: Looking Ahead

Ethernet VPN (EVPN)

Control-Plane-Only Route-Reflection

Prefix Origin Validation

Link State Information Distribution Using BGP

Appendix A: Path Selection Process

Best-Path Selection Algorithm

Always-Compare-MED

Deterministic MED

References and Glossary

References

Glossary

To my wife, whose patience knows no bounds.

Credits

Executive Editor
Carol Long
 
Project Editor
Martin V. Minner
 
Senior Production Editor
Kathleen Wisor
 
Copy Editor
Martin V. Minner
 
Editorial Manager
Mary Beth Wakefield
 
Freelancer Editorial Manager
Rosemarie Graham
 
Associate Director of Marketing
David Mayhew
 
Marketing Manager
Ashley Zurcher
 
Business Manager
Amy Knies
 
Vice President and Executive Group Publisher
Richard Swadley
 
Associate Publisher
Jim Minatel
 
Project Coordinator, Cover
Todd Klemme
 
Proofreader
Nancy Carrasco
 
Indexer
Johnna VanHoose Dinse
 
Cover Image
© Lars Ruecker/Getty Images
 
Cover Designer
Alcatel-Lucent

About the Author

Colin Bookham is a consulting engineer at Alcatel-Lucent with more than 20 years of experience in the telecommunications industry. He has designed or helped to design, and supported, many large IP networks across a broad range of market segments in EMEA.

Prior to working at Alcatel-Lucent, Colin spent a number of years working in IP design and architecture for a UK operator. Before that, he spent the early years of his career studying communications in the Royal Navy. Colin lives in Guildford, UK, and can be reached at colin.bookham@alcatel-lucent.com.

Acknowledgments

Thanks to Adam Simpson, Ian Cowburn, Jorge Rabadan, and Gilles Geerts for their valuable input in validating the content for technical correctness and relevance. Special thanks to Adam for his endless support and for never tiring of my questions.

Thanks to all those people who helped in some way in the technical reviews and helped to fix errors in the manuscript: Walter de Smedt, Guiu Fabregas, Bert Todts, Patrick Colman, Rafa Portillo, Patrick Lynchehaun, Ian Dodds, Rob Shakir, Ilya Varlashkin, and Bruno Decraene.

I also want to express my gratitude to Karyn Lennon and Stephanie Chasse who guided me along this publishing initiative.

Finally, thanks to members of the Alcatel-Lucent senior management team—Wim Henderickx, Rudy Hoebeke and Barry Denroche—for supporting me in the writing of this book.

Foreword

Over the past decade we have witnessed an exciting evolution of Internet Protocol (IP) networks from best-effort networks providing basic Internet access services to true multi-service networks providing fixed residential and mobile broadband services, business Virtual Private Network (VPN) services, cloud services, and carrying more and more mission-critical applications.

IP networks have gradually replaced most of the legacy networks of the past, resulting in more efficient and converged network infrastructures. This is not only the case for service provider and enterprise networks; the same evolution applies to strategic industry networks such as defense, energy, health care, transportation, and government networks.

When it comes to IP networking, there is arguably no protocol more important or successful than the Border Gateway Protocol (BGP)—it is the protocol that has tied the Internet together over the course of its impressive development in the past 20-plus years.

As the scope of IP networking has evolved over time, so has BGP. BGP has been extended to enable new services such as IP VPN (BGP/MPLS IPv4 VPN and 6VPE) and Layer 2 VPN (Virtual Private LAN, Virtual Leased Line, and BGP/MPLS based Ethernet VPN) services, to support network optimizations such as those provided by large-scale MPLS network designs—now commonly known as seamless MPLS—to simplify operations, to enhance network security and to improve network stability, resiliency and reconvergence performance. There is no other protocol that carries such a large and varied set of networking information and that is so central to many networking functions and services, both internally and between Autonomous Systems (AS).

This book deals with all aspects of this evolved BGP in a practical, hands-on manner, using the Alcatel-Lucent Service Router OS (SR-OS) implementation of BGP as the basis for a wealth of configuration examples. It's a great reference for networking engineers who require a comprehensive and current review on BGP and the specifics of the BGP implementation in SR-OS. I hope you will enjoy reading this book as much as I have.

Rudy Hoebeke
Vice President of Product Management
Alcatel-Lucent IP Routing & Transport Division

Introduction

As defined in the base specification for the Border Gateway Protocol (BGP), the primary function of a BGP speaking system is to exchange network reachability information with other BGP speakers while including information on the list of Autonomous Systems that the reachability information traverses. This information can be used to construct a graph of AS connectivity for this reachability, while at the same time removing routing loops and providing operators the ability to implement local policy.

The intention was clear. At its conception, BGP was to be used for exchanging Internet routes between Autonomous Systems/Internet Service Providers. As a result, the protocol was built with characteristics that above all provided a level of stability among the constant churn of the Internet routing table.

During the last 15 or so years the use of BGP has evolved significantly. From a deployment perspective, operators have learned from experience and shared those experiences with the wider community to everybody's mutual benefit. BGP is well understood and is considered a mature protocol. From a service delivery perspective, the evolution is two-fold:

So, while BGP remains the primary protocol for inter-domain route exchange, its use for delivery of intra-domain services has increased significantly. The base protocol has been extended many times to provide the ability to carry new reachability information. It thereby enables Service Providers to effectively deliver new services with minimum impact on their existing IP infrastructure using a known and deployed protocol. In addition, the protocol is evolving into new areas such as Data Centers with the advent of Ethernet VPN.

While this is happening and BGP is being used more and more to deliver business critical services, other base characteristics have changed. BGP is historically a slow-converging protocol, but fast-reconvergence upon failure has become an absolute requirement for delivery of high-profile services. Many potential consumers use fast reconvergence upon failure as a measuring stick of network performance. Incidents that result in the failure of BGP have become totally unacceptable, and so the base protocol has had to become more robust than early implementations.

Objective

The purpose of this book is to provide you with an all-encompassing single reference guide to the BGP implementation within Alcatel-Lucent SR-OS. It aims to equip you with sufficient knowledge to feel competent and confident about the technology you are addressing, and be able to maximize and optimize your implementation of BGP using SR-OS.

The book looks at how services can be delivered and how efficient routing can be achieved in both native IP networks and MPLS networks. It covers how you can use BGP to provide services such as Layer-2 VPNs and Layer-3 VPNs, as well as native or VPN-aware multicast and IPv6. At the core infrastructure layer, it looks at how you can use BGP to deliver scalable IP/MPLS networks using inter-AS and inter-domain scenarios.

In addition, the book covers techniques that you can use to improve path visibility and improve reconvergence times. It also looks at how procedures for error handling have evolved from the base BGP specification. It aims to detail the implications and considerations for each technology, and it gives design tips where appropriate.

For each feature, function, or technology that the book covers, the aim is to provide an overview of what it is and how it operates at a protocol level. The book then details the configuration requirements with CLI and debug outputs used to aid understanding. The objective is that you have a full understanding of the technology in question together with the knowledge of how to implement it in SR-OS.

Audience

The book is primarily intended for IP design and engineering communities. Familiarity with Alcatel-Lucent SR-OS is not a requirement, although readers who are familiar with SR-OS will recognize configuration examples and Command Line Interface (CLI) outputs.

You can read each chapter as a standalone chapter if, for example, you need some guidance on how to implement and configure a particular service and/or function, or even just to learn how a particular technology works. On the other hand, an avid reader passionate about BGP may choose to read from cover to cover.

To keep this book to a manageable size, I do not discuss the basic operation of BGP as a path vector protocol. Numerous other reference books provide this introductory information, and I assume that knowledge to be a prerequisite.


Chapter 1

Getting Started

Although this book does not discuss the operation of BGP as a path-vector protocol, it's worth a quick recap on how a BGP speaker processes and stores routes in the Routing Information Bases (RIBs). The RIB within a BGP speaker is made up of three distinct parts: the Adj-RIB-In, the Loc-RIB, and the Adj-RIB-Out. The Adj-RIB-In stores routing information learned from inbound UPDATE messages advertised by peers to the local router. The routes in the Adj-RIB-In represent routes that are available to the path decision process. The Loc-RIB contains routing information the local router selected after applying policy to the routing information contained in the Adj-RIB-In. These are the routes that will be used by the local router. The Adj-RIB-Out stores information the local router selected for advertisement to its peers. This information is carried in UPDATE messages sourced by this router when advertising to peers. In summary, the Adj-RIB-In contains unprocessed routing information advertised by peers to the local router, the Loc-RIB contains the routes that have been selected by the local BGP speaker's best-path decision process, and the Adj-RIB-Out contains the routes for advertisement to peers in UPDATE messages. I'll use this terminology throughout the book, and may interchangeably use Adj-RIB-In or simply RIB-In, and Adj-RIB-Out or simply RIB-Out.
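The relationship between the three RIBs can be sketched in a few lines of Python. This is a toy model and not a description of SR-OS internals: the `Route` class, the second peer address, the export rule, and the simplified selection rule (highest LOCAL_PREF, then shortest AS_PATH) are all illustrative assumptions. The real decision process is covered in Appendix A.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    peer: str          # peer the route was learned from
    local_pref: int
    as_path: tuple

def select_best(candidates):
    # Simplified stand-in for the decision process (assumption):
    # highest LOCAL_PREF wins, then shortest AS_PATH.
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))

def build_loc_rib(adj_rib_in):
    # adj_rib_in maps peer -> list of routes received from that peer
    by_prefix = {}
    for routes in adj_rib_in.values():
        for route in routes:
            by_prefix.setdefault(route.prefix, []).append(route)
    return {prefix: select_best(cands) for prefix, cands in by_prefix.items()}

def build_adj_rib_out(loc_rib, peer):
    # Toy export rule (assumption): never send a route back to its source peer.
    return {p: r for p, r in loc_rib.items() if r.peer != peer}

adj_rib_in = {
    "192.168.0.2": [Route("172.16.0.0/20", "192.168.0.2", 100, (64510,))],
    "192.168.0.6": [Route("172.16.0.0/20", "192.168.0.6", 100, (64511, 64510))],
}
loc_rib = build_loc_rib(adj_rib_in)
print(loc_rib["172.16.0.0/20"].peer)   # route via the shorter AS_PATH is selected
```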

Enabling BGP in its most basic form is a very simple exercise. All you need is an IP interface toward a BGP peer and some minimal BGP configuration. For conciseness, Output 1-1 does not show the IP interface configuration. For exchange of IPv4 reachability, the only parameters required are an Autonomous System (AS) number defined within the global router context (or Virtual Private Routed Network [VPRN] context), an IP address for the peer, and a peer AS number. The IP address and peer AS number are entered in a BGP group context, often referred to as a peer group. Peer groups allow you to group together a set of peers that have a common administrative configuration, and are discussed further in Chapter 10.


Output 1-1: Basic BGP Configuration

    router
        autonomous-system 64496
        bgp
            group "EBGP"
                neighbor 192.168.0.2
                    peer-as 64510
                exit
            exit
            no shutdown
        exit
    exit

Session Negotiation and Capabilities

A Finite State Machine (FSM) is maintained for each BGP peer, and there are six possible states in the FSM. Initially, the FSM for the BGP peer is in the Idle state. In this state, the router listens for a TCP connection initiated by the remote peer or initiates the TCP connection itself. The second state is the Connect state, where the FSM is waiting for the TCP three-way handshake to be completed. If the TCP connection is not successfully established, the state is changed to Active and a further attempt is made to establish the TCP connection to the remote peer. (If the connection continues to fail, the FSM reverts to the Idle state.) If the TCP connection is successfully established, the FSM completes the BGP initialization, generates an OPEN message toward the peer, and changes its state to OpenSent. If an OPEN message is also received from the remote peer and the parameters contained in the OPEN message are acceptable, the router generates a KEEPALIVE message and changes its state to OpenConfirm. If the parameters of the OPEN message are not acceptable, a NOTIFICATION message is sent with the appropriate error code, and the state is reverted to Idle. While in the OpenConfirm state, if the router receives a KEEPALIVE message from the remote peer, it moves to the Established state. In the Established state, peers can send UPDATE messages to exchange routing information.
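The state transitions described above can be summarized as a lookup table. The following is a minimal sketch: the event names are illustrative labels chosen for this example (they are not the event numbers from RFC 4271), and treating any unlisted event as a reset to Idle is a simplifying assumption.

```python
# States follow the text; event names are hypothetical labels for this sketch.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Idle",
    ("OpenSent", "open_ok"): "OpenConfirm",
    ("OpenSent", "open_bad"): "Idle",          # NOTIFICATION sent, back to Idle
    ("OpenConfirm", "keepalive"): "Established",
    ("Established", "keepalive"): "Established",
    ("Established", "update"): "Established",
}

def run(events, state="Idle"):
    for event in events:
        # Simplification: an event not listed for the current state resets to Idle.
        state = TRANSITIONS.get((state, event), "Idle")
    return state

print(run(["start", "tcp_established", "open_ok", "keepalive"]))
```

Feeding the sequence `start`, `tcp_failed`, `tcp_established`, `open_bad` through `run` shows the failure path: the session passes through Active and OpenSent and ends back in Idle.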

The OPEN message sent by each peer contains its AS number, Hold Time, BGP identifier, and some optional parameters. The notable optional parameter is the Capabilities parameter. The Capabilities parameter is defined in RFC 5492 and allows BGP speakers to exchange capability sets in the OPEN exchange. If both peers advertise a given capability, the peers can use that advertised capability on the peering. If either peer did not advertise the capability, it cannot be used.

The Capabilities parameter is encoded as a code, a length, and a value. The output in Debug 1-1 is taken from an OPEN negotiation between an SR-OS router and a test device. The SR-OS router sends its OPEN message with capability codes indicating support for IPv4 unicast Multi-Protocol (MP)-BGP, Route-Refresh, and 4-byte ASN. The capability code for MP-BGP encodes a value (0x0 0x1 0x0 0x1) that represents an Address Family Identifier (AFI) of IPv4 (0x0 0x1) and a Subsequent Address Family Identifier (SAFI) of unicast (0x0 0x1), indicating support only for IPv4 unicast MP-BGP. (The use of the AFI and SAFI for Multi-Protocol BGP is discussed in further detail later in this chapter.) The capability code for 4-Octet ASN also encodes a value indicating its 4-byte Autonomous System number. In this case the SR-OS router only has a 2-byte Autonomous System number; therefore, it is converted into a 4-byte Autonomous System number by setting the two high-order octets of the 4-octet field to zero.

Figure 1-1 Finite State Machine

image

Conversely, the test device peer sends its OPEN message indicating support for IPv4 unicast MP-BGP, IPv6 unicast MP-BGP, and Route Refresh. In this OPEN message the capability code for MP-BGP appears twice; each occurrence contains a different capability value. The first occurrence indicates support for IPv4 unicast. The second occurrence, with value (0x0 0x2 0x0 0x1), represents an AFI of IPv6 (0x0 0x2) and a SAFI of unicast (0x0 0x1).


Debug 1-1: OPEN message with Capabilities Negotiation

135 2013/04/18 14:47:00.98 BST MINOR: DEBUG #2001 Base BGP
"BGP: OPEN
Peer 1: 192.168.0.2 - Send (Active) BGP OPEN: Version 4
   AS Num 64496: Holdtime 90: BGP_ID 192.0.2.46: Opt Length 16
   Opt Para: Type CAPABILITY: Length = 14: Data:
     Cap_Code MP-BGP: Length 4
       Bytes: 0x0 0x1 0x0 0x1
     Cap_Code ROUTE-REFRESH: Length 0
     Cap_Code 4-OCTET-ASN: Length 4
       Bytes: 0x0 0x0 0x11 0xed
"
137 2013/04/18 14:47:00.97 BST MINOR: DEBUG #2001 Base BGP
"BGP: OPEN
Peer 1: 192.168.0.2 - Received BGP OPEN: Version 4
   AS Num 64510: Holdtime 30: BGP_ID 192.168.0.2: Opt Length 16
   Opt Para: Type CAPABILITY: Length = 14: Data:
     Cap_Code MP-BGP: Length 4
       Bytes: 0x0 0x1 0x0 0x1
     Cap_Code MP-BGP: Length 4
       Bytes: 0x0 0x2 0x0 0x1
     Cap_Code ROUTE-REFRESH: Length 0
"

This asymmetric capability negotiation is acceptable from the perspective of the peering session, provided that the only optional capabilities used are IPv4 MP-BGP and Route-Refresh. If, for example, the peer advertises an IPv6 prefix using MP-BGP, this results in a NOTIFICATION message being sent. The integrity of the peering session thereafter depends on the supported and configured error handling capabilities. Standard capability codes are maintained by the Internet Assigned Numbers Authority (IANA) at www.iana.org/assignments/capability-codes/capability-codes.xml, but vendor-specific capability codes are in widespread use. During capability exchange, a BGP speaker should ignore any capability codes it does not recognize.
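To make the code/length/value encoding concrete, the sketch below decodes the 14 octets of capability data from the received OPEN in Debug 1-1. The function and table names are assumptions made for this example. Only the three capability codes seen in the debug output are named (1, 2, and 65); any other code is reported as unknown rather than rejected, in line with the guidance that unrecognized capabilities are ignored.

```python
import struct

CAP_NAMES = {1: "MP-BGP", 2: "ROUTE-REFRESH", 65: "4-OCTET-ASN"}
AFI_NAMES = {1: "IPv4", 2: "IPv6"}

def parse_capabilities(data):
    """Split Capabilities parameter data into (name, detail) pairs."""
    caps, offset = [], 0
    while offset < len(data):
        code, length = data[offset], data[offset + 1]
        value = data[offset + 2: offset + 2 + length]
        offset += 2 + length
        name = CAP_NAMES.get(code, f"unknown({code})")
        if code == 1:
            # MP-BGP value: 2-octet AFI, 1 reserved octet, 1-octet SAFI
            afi, _reserved, safi = struct.unpack("!HBB", value)
            caps.append((name, (AFI_NAMES.get(afi, afi), safi)))
        elif code == 65:
            # 4-octet AS number
            caps.append((name, struct.unpack("!I", value)[0]))
        else:
            caps.append((name, None))
    return caps

# Capability data from the received OPEN in Debug 1-1
received = bytes([1, 4, 0, 1, 0, 1,  1, 4, 0, 2, 0, 1,  2, 0])
for cap in parse_capabilities(received):
    print(cap)
```

The two MP-BGP entries decode to IPv4 unicast and IPv6 unicast, followed by Route-Refresh with an empty value.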


Output 1-2: Local/Remote Capabilities

*A:R1# show router bgp neighbor 192.168.0.2 | match expression "Local|Remote"
Local AS             : 64496            Local Port           : 179
Local Address        : 192.168.0.1
Local Family         : IPv4
Remote Family        : IPv4 IPv6
Local Capability     : RtRefresh MPBGP 4byte ASN
Remote Capability    : RtRefresh MPBGP
Local AddPath Capabi*: Disabled
Remote AddPath Capab*: Send - None

The Hold Times negotiated in the OPEN exchange do not have to be the same for the BGP session to be established. The BGP speaker calculates the active Hold Time value by using the smaller of its configured value and the value received in the OPEN message. In the OPEN exchange shown in Debug 1-1, SR-OS uses the default Hold Time of 90 seconds while the peer advertises a Hold Time of 30 seconds. This exchange results in both peers using a Hold Time of 30 seconds, with KEEPALIVE messages exchanged every (30/3) 10 seconds.
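The calculation is simple enough to express directly. This is a minimal sketch with a hypothetical function name. It assumes the keepalive interval is one third of the active Hold Time, as in the example above, and it applies the base specification's rule that a Hold Time must be either zero or at least three seconds.

```python
def negotiate_hold_time(local, remote):
    """Return (active_hold_time, keepalive_interval) in seconds."""
    hold = min(local, remote)
    if hold == 0:
        # A Hold Time of zero means no KEEPALIVE messages are exchanged.
        return 0, 0
    if hold < 3:
        raise ValueError("Hold Time must be 0 or at least 3 seconds")
    return hold, hold // 3

print(negotiate_hold_time(90, 30))   # values from the OPEN exchange in Debug 1-1
```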

As previously described, when a BGP speaker has sent an OPEN message it moves to the OpenSent state. When it receives the corresponding OPEN message from its peer and is satisfied with the contents, it responds with a KEEPALIVE message and moves to the OpenConfirm state. When each BGP speaker has sent and received an OPEN message and a KEEPALIVE message, both move to the Established state and can then exchange reachability information.

UPDATE Messages

This book does not explicitly detail all BGP message formats, but it's useful to review the basic BGP UPDATE format so you can understand the differences between it and the general format of Multi-Protocol BGP UPDATE messages. The Withdrawn Routes field contains a list of IP prefixes in the form <length, prefix> that are being withdrawn from service. The Network Layer Reachability Information (NLRI) field contains a list of IP prefixes, again in the form <length, prefix>, that can be reached from a given BGP speaker (subject to policy).
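The <length, prefix> form carries the prefix length in bits followed by only as many octets as are needed to hold that many bits. The sketch below shows this for IPv4 using Python's standard `ipaddress` module; the function names are assumptions made for this example.

```python
import ipaddress

def encode_prefix(prefix):
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8      # only the significant octets are sent
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]

def decode_prefixes(data):
    """Decode a run of IPv4 <length, prefix> tuples."""
    prefixes, offset = [], 0
    while offset < len(data):
        plen = data[offset]
        nbytes = (plen + 7) // 8
        raw = data[offset + 1: offset + 1 + nbytes].ljust(4, b"\x00")
        prefixes.append(f"{ipaddress.IPv4Address(raw)}/{plen}")
        offset += 1 + nbytes
    return prefixes

wire = encode_prefix("172.16.0.0/20")
print(wire.hex(), decode_prefixes(wire))
```

A /20 needs three prefix octets plus the length octet, so 172.16.0.0/20 occupies four octets on the wire.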


Output 1-3: Active Hold Time

*A:R1# show router bgp neighbor 192.168.0.2 | match "Hold Time"
Hold Time            : 90               Keep Alive :         30
Min Hold Time        : 0
Active Hold Time     : 30               Active Keep Alive :  10

Figure 1-2 UPDATE Message Format

image

The Path Attributes field contains a sequence of attributes associated with the NLRI, and each attribute falls into one of four categories: well-known mandatory, well-known discretionary, optional transitive, and optional non-transitive. Non-transitive simply means that the attribute may be advertised into an AS but may not leave that AS.

Mandatory attributes must be present in the UPDATE message if NLRI is present (that is, the UPDATE does not carry only Withdrawn Routes) and include the ORIGIN, AS_PATH, and NEXT_HOP attributes. Examples of well-known discretionary attributes include LOCAL_PREF and ATOMIC_AGGREGATE.


Debug 1-2: UPDATE Message with NLRI

1 2013/06/09 09:07:10.11 BST MINOR: DEBUG #2001 Base Peer 1: 192.168.0.2
"Peer 1: 192.168.0.2: UPDATE
Peer 1: 192.168.0.2 - Received BGP UPDATE:
    Withdrawn Length = 0
    Total Path Attr Length = 18
    Flag: 0x40 Type: 1 Len: 1 Origin: 0
    Flag: 0x40 Type: 3 Len: 4 Nexthop: 192.168.0.2
    Flag: 0x40 Type: 2 Len: 4 AS Path:
        Type: 2 Len: 1 < 64510 >
    NLRI: Length = 4
        172.16.0.0/20
"

At the beginning of the path attribute field there is a 2-octet field that contains an Attribute Flags octet followed by the Attribute Type Code octet as shown in Figure 1-3.

Figure 1-3 Path Attribute Flags

image

The Attribute Type Code is a value defining the type of attribute. Within the Attribute Flags octet the high-order bit (bit 0) is the Optional bit and defines whether the attribute is optional (1) or well-known (0). Bit 1 is the Transitive bit and defines whether an optional attribute is transitive (1) or non-transitive (0). Bit 2 is the Partial bit and defines whether an optional transitive attribute is recognized by a BGP speaker when advertising it to peers (0), or unrecognized (1). Note that if a BGP speaker recognizes the optional transitive attribute (and would therefore set the Partial bit to 0), but the Partial bit has already been set to 1 by some other AS, it must not be set back to zero by the processing speaker. In effect, when set, the Partial bit provides visibility that some BGP speaker along the path didn't recognize the attribute. Bit 3 is the Extended Length bit and defines whether the Attribute Length field is one octet (0) or two octets (1). Bits 4-7 are reserved and should be set to zero. (Some early Internet drafts on error handling for optional-transitive attributes proposed the use of bit 4, but this proposal was largely superseded through widespread adoption of other error handling drafts discussed in Chapter 8.)
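As an illustration, the flag values that appear in the debug outputs in this chapter (0x40 and 0x80) can be decoded with a few bit masks. This is a sketch with hypothetical function names. Bit 0 is the high-order bit of the octet, so it corresponds to mask 0x80. The Extended Length bit (bit 3, mask 0x10) is taken from the base BGP specification.

```python
def decode_attr_flags(flags):
    return {
        "optional":        bool(flags & 0x80),  # bit 0 (high-order bit)
        "transitive":      bool(flags & 0x40),  # bit 1
        "partial":         bool(flags & 0x20),  # bit 2
        "extended_length": bool(flags & 0x10),  # bit 3
    }

def category(flags):
    f = decode_attr_flags(flags)
    if not f["optional"]:
        return "well-known"
    return "optional transitive" if f["transitive"] else "optional non-transitive"

for value in (0x40, 0x80, 0xC0):
    print(hex(value), category(value))
```

A value of 0x40 is therefore a well-known attribute (such as ORIGIN or AS_PATH), 0x80 is optional non-transitive, and 0xC0 is optional transitive.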

Examples of optional non-transitive attributes include the MED, ORIGINATOR_ID, CLUSTER_LIST, MP_REACH_NLRI, and MP_UNREACH_NLRI attributes, while examples of optional transitive attributes include the AGGREGATOR and COMMUNITY attributes.

In order to withdraw a route from service once it has been advertised, the IP prefix previously advertised as NLRI in the UPDATE message can be advertised in the Withdrawn Routes field of an UPDATE message, or a replacement route with the same NLRI can be advertised. Equally, if the BGP session between two peers is closed, all routes advertised to each other are implicitly removed. If an UPDATE message carries only Withdrawn Routes and no NLRI, the mandatory attributes such as NEXT_HOP, ORIGIN, and AS_PATH need not be present.

NOTIFICATION Messages

A NOTIFICATION message is sent when an error condition is detected, and it causes the BGP session to close. The NOTIFICATION message contains an error code, an error sub-code associated with that error code, and a data field that provides some indication of the error condition. Error codes and sub-codes are listed in section 4.5 of RFC 4271, updated by RFC 4486 (Subcodes for BGP Cease NOTIFICATION Message).


Debug 1-3: UPDATE Message with Withdrawn Routes

3 2013/06/09 09:09:06.50 BST MINOR: DEBUG #2001 Base Peer 1: 192.168.0.2
"Peer 1: 192.168.0.2: UPDATE
Peer 1: 192.168.0.2 - Received BGP UPDATE:
    Withdrawn Length = 4
        172.16.0.0/20
    Total Path Attr Length = 0"

Error conditions that require a NOTIFICATION message to be sent are categorized into three types: message header errors, OPEN message errors, and UPDATE message errors.

When the BGP session is closed, the associated TCP connection is closed, the RIB-In entries learned from the peer are cleared, and all resources allocated to that particular peer are released. Errors in the BGP message header are uncommon and indicate a fairly fundamental problem. Errors in the OPEN message are typically due to misconfiguration of peer parameters. However, errors in UPDATE messages are not uncommon, and have the potential to be extremely disruptive.


Debug 1-4: NOTIFICATION Message

11 2013/06/09 09:14:03.48 BST MINOR: DEBUG #2001 Base Peer 1: 192.168.0.2
"Peer 1: 192.168.0.2: NOTIFICATION
Peer 1: 192.168.0.2 - Received BGP NOTIFICATION: Code = 6 (CEASE) Subcode
= 4 (Administrative Reset)
  Data Length = 16  Data: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0 0x0 0x0"
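The body of a NOTIFICATION message can be decoded with a pair of lookup tables. The sketch below uses the error codes from RFC 4271 and the Cease sub-codes from RFC 4486; the function name and return format are assumptions made for this example, and sub-codes for the other error codes are left numeric for brevity.

```python
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}

def parse_notification(body):
    """body is the NOTIFICATION payload that follows the BGP message header."""
    code, subcode, data = body[0], body[1], body[2:]
    code_name = ERROR_CODES.get(code, f"unknown({code})")
    if code == 6:
        sub_name = CEASE_SUBCODES.get(subcode, f"unknown({subcode})")
    else:
        sub_name = str(subcode)
    return code_name, sub_name, len(data)

# Code 6, sub-code 4, and 16 octets of zeroed data, as in Debug 1-4
print(parse_notification(bytes([6, 4]) + bytes(16)))
```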

The original BGP specification called for a NOTIFICATION message to be generated under a number of conditions during error checking of attributes within UPDATE messages. More recent work (draft-ietf-grow-ops-reqs-for-bgp-error-handling) has called for alternative measures to be implemented under these circumstances in order to avoid this level of disruption. This point is discussed further in Chapter 8.

Multi-Protocol BGP

The Multi-Protocol extensions to BGP defined in RFC 4760 provide the capability for BGP to carry routing information for multiple network layer protocols such as IPv6, VPN-IPv4, VPN-IPv6, L2VPN, and Multicast-VPN, to name but a few. To identify individual network layer protocols and be able to associate them with Next-Hop information and the semantics of the NLRI, the extensions to Multi-Protocol BGP specified the use of the Address Family Identifier (AFI) and the Subsequent Address Family Identifier (SAFI).

AFI and SAFI assignments are administered by IANA at www.iana.org/assignments/address-family-numbers/address-family-numbers.xhtml and www.iana.org/assignments/safi-namespace/safi-namespace.xhtml. By way of example, a VPN-IPv4 prefix is represented as AFI 1 (IPv4), SAFI 128 (MPLS-labeled VPN address).

Two optional non-transitive attributes were introduced to support Multi-Protocol extensions to BGP: Multi-Protocol Reachable NLRI and Multi-Protocol Unreachable NLRI. The Multi-Protocol Reachable NLRI (MP_REACH_NLRI) is used to carry the set of reachable destination prefixes together with the Next-Hop information to be used for forwarding to those destination prefixes. Each MP_REACH_NLRI attribute contains a single Next-Hop address and a list of NLRIs associated with that Next-Hop address.

At a minimum, an UPDATE message that carries the MP_REACH_NLRI must also carry the ORIGIN and AS_PATH attributes in both EBGP and IBGP, and the LOCAL_PREF attribute in IBGP. The Next-Hop information is carried inside the MP_REACH_NLRI attribute itself.

In contrast, Multi-Protocol Unreachable NLRI (MP_UNREACH_NLRI) is used to withdraw one or more unfeasible routes and has much the same format as the MP_REACH_NLRI attribute without the requirement to signal Next-Hop information.

Figure 1-4 MP_REACH_NLRI Encoding

image

Figure 1-5 MP_UNREACH_NLRI Encoding

image

Unlike the MP_REACH_NLRI, an UPDATE message containing the MP_UNREACH_NLRI attribute is not required to carry any other path attributes.

The capability to support Multi-Protocol BGP is negotiated in the OPEN exchange on an Address Family basis. By default, SR-OS signals the Multi-Protocol BGP capability for AFI/SAFI unicast IPv4 only. If other Address Families are added or removed at BGP/group/neighbor level, the OPEN exchange is renegotiated. To illustrate the encoding of the Multi-Protocol BGP MP_REACH_NLRI, Debug 1-5 shows an UPDATE message for IPv6 prefix 2001:db8:1b00::/48. Note that the Address Family, Next-Hop information, and prefix are all contained within the single MP_REACH_NLRI attribute.

The introduction of Multi-Protocol BGP was significant. BGP was already considered a very flexible protocol and relatively lightweight to support, and with the introduction of Multi-Protocol BGP AFI/SAFI and different NLRI it had become extensible to support any other network layer as you'll see in the following chapters.

UPDATE or MP_REACH, and Withdraw or MP_UNREACH are referred to interchangeably throughout this book.


Debug 1-5: UPDATE with MP_REACH_NLRI Attribute

1 2013/05/02 13:54:46.39 BST MINOR: DEBUG #2001 Base Peer 1: 192.168.0.2
"Peer 1: 192.168.0.2: UPDATE
Peer 1: 192.168.0.2 - Received BGP UPDATE:
    Withdrawn Length = 0
    Total Path Attr Length = 42
    Flag: 0x40 Type: 1 Len: 1 Origin: 0
    Flag: 0x40 Type: 2 Len: 4 AS Path:
        Type: 2 Len: 1 < 64510 >
    Flag: 0x80 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
        Address Family IPV6
        NextHop len 16 Global NextHop 2001:db8:1C00::3
        2001:db8:1B00::/48"
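The attribute length of 28 reported in Debug 1-5 can be reproduced by building the MP_REACH_NLRI attribute by hand. The sketch below follows the field layout in Figure 1-4 for IPv6 unicast with a single global Next-Hop and a single prefix; the function name is an assumption made for this example.

```python
import ipaddress
import struct

def build_mp_reach_ipv6(next_hop, prefix):
    net = ipaddress.ip_network(prefix)
    nh = ipaddress.IPv6Address(next_hop).packed
    nbytes = (net.prefixlen + 7) // 8
    nlri = bytes([net.prefixlen]) + net.network_address.packed[:nbytes]
    value = (
        struct.pack("!HB", 2, 1)      # AFI 2 (IPv6), SAFI 1 (unicast)
        + bytes([len(nh)]) + nh       # Next-Hop length (16) and address
        + b"\x00"                     # reserved octet
        + nlri                        # <length, prefix>
    )
    # Flags 0x80 (optional, non-transitive), type code 14, 1-octet length
    return bytes([0x80, 14, len(value)]) + value

attr = build_mp_reach_ipv6("2001:db8:1c00::3", "2001:db8:1b00::/48")
print(attr[2], len(attr))   # attribute length, total octets including the 3-octet header
```

The value is 28 octets: 3 for AFI and SAFI, 1 for the Next-Hop length, 16 for the Next-Hop, 1 reserved, and 7 for the /48 prefix. Adding ORIGIN (4 octets) and AS_PATH (7 octets) to the 31-octet attribute gives the Total Path Attr Length of 42 shown in the debug.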

Chapter 2

BGP/MPLS IP-VPN

The framework for building BGP/Multi-Protocol Label Switching (BGP/MPLS) based IP Virtual Private Networks (IP-VPNs) relies on Multi-Protocol BGP (RFC 4760) and the optional-transitive BGP Extended Communities (RFC 4360) attribute “Route Target.”

Multi-Protocol BGP is used for advertising of VPN-IPv4/VPN-IPv6 prefixes, and, because both are labeled prefixes, they follow the encoding of labeled BGP (RFC 3107), where the prefix is constructed of an 8-byte Route-Distinguisher followed by a 4-byte IPv4 prefix or 16-byte IPv6 prefix. The purpose of the RD is to allow the concatenation of RD and IPv4/IPv6 prefixes to create a unique VPN-IPv4/VPN-IPv6 prefix.

For VPN-IPv4 the AFI is 1 (IPv4), and for VPN-IPv6 the AFI is 2 (IPv6). Both VPN-IPv4 and VPN-IPv6 use a SAFI of 128 (MPLS-labeled VPN address).
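A short sketch may help to visualize the resulting VPN-IPv4 NLRI. It assumes a type 0 Route-Distinguisher (2-octet type, 2-octet ASN, 4-octet assigned number) and the labeled-prefix layout of RFC 3107, in which the length field counts the bits of the label, the RD, and the prefix together. The function names and the label value 131070 are hypothetical and chosen only for illustration.

```python
import ipaddress
import struct

def encode_rd_type0(asn, assigned):
    # Type 0 RD: 2-octet type (0), 2-octet ASN, 4-octet assigned number
    return struct.pack("!HHI", 0, asn, assigned)

def encode_vpn_ipv4_nlri(label, rd, prefix):
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    # 3-octet label field: 20-bit label, 3 unused bits, bottom-of-stack bit set
    label_field = ((label << 4) | 1).to_bytes(3, "big")
    length_bits = 24 + 64 + net.prefixlen      # label + RD + prefix, in bits
    return (bytes([length_bits]) + label_field + rd
            + net.network_address.packed[:nbytes])

rd = encode_rd_type0(64496, 4001)
nlri = encode_vpn_ipv4_nlri(131070, rd, "172.31.100.0/24")
print(rd.hex(), nlri.hex())
```

Two VPNs that both use 172.31.100.0/24 produce different NLRI as long as their RDs differ, which is exactly the uniqueness property the RD is there to provide.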

Figure 2-1 VPN-IPv4/IPv6 NLRI Encoding

image

When a route is redistributed into VPN-IPv4, a Route Target Extended Community is attached to the prefix. The Route Target Extended Community is a transitive attribute (RFC 4360) used to define the set of sites belonging to a given VPN. When a VPN-IPv4 prefix is received at a Provider Edge (PE) router, the router parses the Route Target value and checks whether any locally configured VRFs have an import policy that matches that value. If one does, the route is imported into that VPRN. If none does, the route is not imported into any VPRN. In short, associating a particular Route Target attribute with a prefix allows that route to be placed into VRFs serving that VPN. If ten sites in a VPN all have a common export and import Route Target value, the result is an “any-to-any” VPN.
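The import decision amounts to a set intersection between the Route Targets carried by the route and the import targets of each local VRF. The following sketch models only that matching step; the function name, the VRF names, and the data layout are assumptions made for this example and do not reflect how SR-OS stores its configuration.

```python
def import_route(route_targets, vrf_import_targets):
    """Return the names of the VRFs whose import targets match the route."""
    rts = set(route_targets)
    return sorted(
        name for name, imports in vrf_import_targets.items()
        if rts & set(imports)
    )

vrfs = {
    "vprn-4001": {"target:64496:4001"},
    "vprn-4002": {"target:64496:4002"},
}
print(import_route(["target:64496:4001"], vrfs))   # imported into one VRF
print(import_route(["target:64496:9999"], vrfs))   # no match, not imported
```

A route carrying several Route Targets is imported into every VRF that matches at least one of them, which is how topologies other than any-to-any can be built from different import and export values.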

Basic Configuration

Output 2-1 shows the base level of configuration required in order to configure a VPRN. The route-distinguisher (RD) is a required parameter when configuring a VPRN, and the VPRN will not become operational until it is configured. When a VPRN is configured with a Route-Distinguisher but without any Route Target parameters, the VPRN does not rely on any BGP/MPLS IP-VPN control plane for learning prefixes but simply creates a separate routing context frequently referred to as “VRF-lite.” The route-distinguisher command is followed by a value that can take three formats but typically uses the type 0 format of a 2-byte ASN subfield followed by a 4-byte assigned number subfield (the remaining 2 bytes are used to define the actual type).

To participate in the BGP/MPLS IP-VPN control plane, the definition of Route Target values is required for import and export of VPN-IPv4 prefixes. The simplest method is using the vrf-target command followed by a Route Target value that has the same format as the Route Distinguisher. The vrf-target command allows for definition of a single value applicable to import and export Route Targets as shown in Output 2-1, or it allows for definition of different import and export Route Target values using the export and import keywords after the vrf-target command, followed by the relevant Route Target values. An alternative to the vrf-target approach for defining Route Target values is to use the vrf-import and vrf-export commands to reference policies constructed within the policy framework.

When prefixes are learned in VPN-IPv4, the receiving PE router must resolve the BGP Next-Hop to a GRE or MPLS tunnel before the prefix is considered valid. The auto-bind command tells the system to automatically bind the Next-Hop to an LSP in the LSP tunnel-table, and the keyword mpls means to use any form of LSP, with a preference for RSVP over LDP, and LDP over BGP.


Output 2-1: VPRN Base Configuration

    service
        vprn 4001
            autonomous-system 64496
            route-distinguisher 64496:4001
            auto-bind mpls
            vrf-target target:64496:4001
            no shutdown

One last optional parameter is the definition of an autonomous-system number in the VPRN. This parameter is required only if BGP is used as a PE-CE routing protocol. This parameter is used as the source ASN in the OPEN exchange unless the local-as parameter is also configured, in which case the ASN defined as the local-as is used in the OPEN exchange. (This also applies to the use of local-as in the global BGP context.)

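At face value, both the VPRN autonomous-system ASN and local-as ASN appear to serve the same purpose of mimicking an ASN that differs from the global ASN defined in the router context. In fact, they can have different impacts on the AS_PATH of UPDATE messages propagated to connected CE routers depending on two things: whether the two parameters are configured on their own or together and, when they are configured together, the setting of the local-as no-prepend-global-as parameter.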

If either is configured on its own (that is, they do not co-exist), the VPRN-level ASN or local-as ASN is prepended to the AS_PATH advertised to the CE and overrides the global ASN. If they co-exist, the behavior depends on the setting of the local-as no-prepend-global-as parameter. If no-prepend-global-as is disabled, the local-as AS number is prepended to the AS_PATH along with the VPRN-level AS number, provided the two differ. If no-prepend-global-as is enabled, the local-as AS number overrides the VPRN-level AS number.

The local-as parameter can be considered useful if a VPRN context needs to appear to be more than one ASN to its peers. If not, the VPRN-level ASN is sufficient. To consolidate the various options, consider the topology in Figure 2-2 where CE1 is in AS 64509 and advertises IPv4 prefix 172.31.100.0/24 to PE1, which in turn propagates the prefix to PE2, which in turn propagates the prefix to CE2. The AS_PATH as seen at CE2 with different configurations is shown in Table 2.1.

Figure 2-2 AS_PATH Encoding

image

Table 2.1 AS_PATH Encoding with VRF ASN and Local-AS

image

Prefix Dissemination

When a PE router belongs to a particular VPN, it learns some of that VPN's routes from attached CE routers using static or dynamic routing. These routes are installed in the VRF associated with that CE router and are converted to VPN-IPv4/IPv6 routes for export into BGP so that other PEs belonging to that VPN can learn those routes. These routes can be disseminated to other PE routers of the same VPN through a number of methods; some use an implicit flood model while others use an explicit send-only-if-required model. This section discusses the varying approaches that can be adopted for both prefix dissemination and route-table updates following a local policy change.

Automatic Route Filtering

To help scale BGP/MPLS based IP-VPNs, PE routers do not by default retain in the RIB-IN prefixes that are not associated with any configured VRFs. When a PE router receives a VPN-IPv4/IPv6 prefix with a Route Target value that is not associated with any VRFs on that PE, the prefix is simply discarded (unless, of course, the PE is a Route-Reflector). This approach is known as Automatic Route Filtering (ARF) and ensures that PE routers hold routes only for VRFs that are actually configured on that PE. It is enabled by default and requires no configuration. In the example illustrated in Debug 2-1, the PE router receives VPN-IPv4 prefix 64496:30:172.16.100.0/24 with Extended Community Route Target 64496:30 but has no configured VRFs with that Route Target value. The PE router therefore silently discards the prefix as shown in Output 2-2.


Debug 2-1: Automatic Route Filtering

1 2013/04/22 15:29:36.77 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.11
"Peer 1: 192.0.2.11: UPDATE
Peer 1: 192.0.2.11 - Received BGP UPDATE:
    Withdrawn Length = 0
    Total Path Attr Length = 75
    Flag: 0x90 Type: 14 Len: 32 Multiprotocol Reachable NLRI:
        Address Family VPN_IPV4
        NextHop len 12 NextHop 192.0.2.13
        172.16.100.0/24 RD 64496:30 Label 262139
    Flag: 0x40 Type: 1 Len: 1 Origin: 0
    Flag: 0x40 Type: 2 Len: 0 AS Path:
    Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
    Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
    Flag: 0x80 Type: 10 Len: 4 Cluster ID:
        192.0.2.11
    Flag: 0xc0 Type: 16 Len: 8 Extended Community:
        target:64496:30"

 


Output 2-2: ARF and RIB-IN

*A:PE1# show router bgp routes vpn-ipv4 172.16.100.0/24
==========================================================================
 BGP Router ID:192.0.2.22      AS:64496       Local AS:64496
==========================================================================
 Legend -
 Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
 Origin codes : i - IGP, e - EGP, ? - incomplete, > - best, b - backup
==========================================================================
BGP VPN-IPv4 Routes
==========================================================================
Flag  Network                              LocalPref         MED
      Nexthop                              Path-Id           Label
      As-Path
--------------------------------------------------------------------------
No Matching Entries Found
==========================================================================
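The ARF decision illustrated by Debug 2-1 and Output 2-2 can be modeled as a simple retention check. This is a minimal sketch under stated assumptions: `arf_accept` is a hypothetical function, and the locally configured Route Targets (64496:10 and 64496:20) are invented for the example.

```python
def arf_accept(route_targets, local_import_rts, is_route_reflector=False):
    """Return True if a received VPN route should be retained in the RIB-IN."""
    if is_route_reflector:
        # A Route-Reflector must keep routes it does not import itself.
        return True
    return bool(set(route_targets) & set(local_import_rts))


# Assumed local configuration: VRFs importing 64496:10 and 64496:20 only.
local_import_rts = {"target:64496:10", "target:64496:20"}

# The prefix from Debug 2-1 carries target:64496:30, so a PE discards it.
print(arf_accept({"target:64496:30"}, local_import_rts))                           # False
print(arf_accept({"target:64496:30"}, local_import_rts, is_route_reflector=True))  # True
```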

Route Refresh

ARF is useful for optimizing memory consumption, but what happens if a PE router's policy changes, and it now requires a VPN-IPv4 prefix that it had previously discarded? What is needed is a mechanism to allow the PE router to reevaluate all learned routes against the modified policy. This is the purpose of Route Refresh. Route Refresh (RFC 2918) capability is negotiated during the OPEN exchange and allows a BGP speaker to dynamically request a readvertisement of the Adj-RIB-OUT from a BGP peer. Once the peer readvertises the Adj-RIB-OUT, it can be reevaluated against the new policy. In SR-OS, every time a VPRN import policy is modified, either through the route-policy framework or through modification of the VPRN vrf-target syntax, a Route Refresh is generated for the VPN-IPv4 and VPN-IPv6 Address Families as shown in Debug 2-2. Not shown for conciseness is the Adj-RIB-OUT readvertised to the speaker generating this refresh.


Debug 2-2: Route Refresh

2 2013/04/22 15:41:49.81 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.11
"Peer 1: 192.0.2.11: ROUTE REFRESH
Peer 1: 192.0.2.11 - Send BGP ROUTE REFRESH: Address Family AFI_IPV4: Sub
AFI SAFI_VPN"
 
3 2013/04/22 15:41:49.81 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.11
"Peer 1: 192.0.2.11: ROUTE REFRESH
Peer 1: 192.0.2.11 - Send BGP ROUTE REFRESH: Address Family AFI_IPV6: Sub
AFI SAFI_VPN"

When the Route Refresh has been generated, all routes of the affected Address Families (in this case VPN-IPv4 and VPN-IPv6) are marked as “stale” and must be refreshed before the system “purge timer” expires. By default, this timer is set to 10 minutes, and any prefixes not refreshed before it expires are deleted. There should be no need to modify this timer, but it can be changed using the purge-timer command within the BGP context.
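The stale-marking and purge behavior can be sketched as a small state machine. The `StaleTracker` class and its method names are hypothetical, time is passed in explicitly so the example is deterministic, and the 600-second default simply mirrors the 10-minute timer mentioned above.

```python
class StaleTracker:
    """Toy model of stale marking after a Route Refresh is sent."""

    def __init__(self, purge_timer_s=600.0):
        self.purge_timer_s = purge_timer_s
        self.routes = {}      # prefix -> True if currently marked stale
        self.deadline = None  # time at which stale routes are purged

    def learn(self, prefix):
        """A received (or re-received) prefix is no longer stale."""
        self.routes[prefix] = False

    def send_route_refresh(self, now):
        """Mark everything stale and (re)start the purge timer."""
        self.routes = dict.fromkeys(self.routes, True)
        self.deadline = now + self.purge_timer_s

    def tick(self, now):
        """Purge routes still stale once the timer has expired."""
        if self.deadline is None or now < self.deadline:
            return []
        purged = sorted(p for p, stale in self.routes.items() if stale)
        for prefix in purged:
            del self.routes[prefix]
        self.deadline = None
        return purged


rib = StaleTracker()
rib.learn("172.16.10.0/24")
rib.learn("172.16.100.0/24")
rib.send_route_refresh(now=0)
rib.learn("172.16.10.0/24")  # readvertised by the peer
print(rib.tick(now=599))     # []
print(rib.tick(now=600))     # ['172.16.100.0/24']
```

Calling `send_route_refresh` again before the deadline restarts the timer in this model.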

Route Refresh provides a dynamic way to refresh policy, but its method is somewhat sub-optimal because the mechanism essentially says to its peer(s) “give me every prefix again and I will reevaluate all of those prefixes against my new RIB-IN policy.” This consumes resources not only on the receiving PE, but also on the peer(s) that must transmit all of the Adj-RIB-OUT UPDATE messages again. In environments with high provisioning activity, these Route Refreshes can be frequent and often overlapping (each one resetting the purge timer), resulting in a high control plane load. Less of an issue, but still worth highlighting, is that the age of a prefix is reset when the route is refreshed. Operations teams often use the age of a prefix in a routing table when troubleshooting, so resetting it during a Route Refresh loses potentially useful information.

Outbound Route Filtering

Where the Route Refresh mechanism refreshes all prefixes of a given Address Family, a better approach could be to ask only for those prefixes a PE router knows that it needs. This is the premise of Outbound Route Filtering (ORF). ORF allows a BGP speaker to send to a BGP peer a set of filters that the peer should apply on its Adj-RIB-OUT.

ORF entries are carried within Route Refresh messages, and each entry is encoded as a tuple of AFI/SAFI, ORF-type, Action, Match, and ORF-value (RFC 5291).

SR-OS supports the Extended Community (Route Target) ORF-type, which is enabled by configuring the outbound-route-filtering context followed by the command extended-community and an option of either accept-orf or send-orf. In a typical BGP/MPLS IP-VPN environment involving Route Reflection, PE routers are configured to send-orf values, while Route-Reflectors are configured to accept-orf values. This causes the Route-Reflector(s) to apply filters to their Adj-RIB-OUT such that only requested Route Target values are advertised to their peers.

After ORF has been enabled, any time the VPRN import policy is modified (again either through the route-policy framework or through modification of the VPRN vrf-target syntax), Route Refresh messages are generated that remove any existing filter policy and then apply the modified filter policy. Debug 2-3 shows a simple example in which a VPRN import policy is created to allow Route Target 64496:20.


Output 2-3: ORF Configuration

    router
        bgp
            outbound-route-filtering
                extended-community
                    accept-orf|send-orf
                exit
            exit
            no shutdown
        exit
    exit

 


Debug 2-3: Route-Refresh with ORF

25 2013/04/22 16:27:36.17 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.11
"Peer 1: 192.0.2.11: ORF
Peer 1: 192.0.2.11 - Send BGP (ROUTE_REFRESH) ORF: AFI 1, Sub AFI 128
    When-to-refresh: DEFER
    ORF Type: Extended Community
    ORF Len: 1 Bytes
        ORF Action: REMOVE-ALL
        ORF Match: PERMIT
"
 
26 2013/04/22 16:27:36.17 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.11
"Peer 1: 192.0.2.11: ORF
Peer 1: 192.0.2.11 - Send BGP (ROUTE_REFRESH) ORF: AFI 1, Sub AFI 128
    When-to-refresh: IMMEDIATE
    ORF Type: Extended Community
    ORF Len: 9 Bytes
        ORF Action: ADD
        ORF Match: PERMIT
        Extended Community : 0.2.0.64496.0.0.0.20
"

ORF allows for implementation of a more explicit “request only what you need” model, but support for the different ORF-types varies between vendors. For example, SR-OS supports the Extended Community ORF-type, but other implementations have historically favored the Address-Prefix-based ORF-type.
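The ORF Len values of 1 and 9 bytes seen in Debug 2-3 can be reproduced with a short sketch. It assumes the RFC 5291 entry layout (a leading byte holding a 2-bit Action, a 1-bit Match, and 5 reserved bits, taken here as high-order bits first) and the two-octet-AS Route Target encoding of RFC 4360 (type 0x00, sub-type 0x02). The function names are hypothetical, and the sketch does not attempt to reproduce SR-OS's exact wire output.

```python
import struct

ACTIONS = {"ADD": 0, "REMOVE": 1, "REMOVE-ALL": 2}
MATCHES = {"PERMIT": 0, "DENY": 1}


def encode_route_target(asn, value):
    """8-byte Route Target: type 0x00, sub-type 0x02, 2-byte ASN, 4-byte value."""
    return struct.pack("!BBHI", 0x00, 0x02, asn, value)


def encode_orf_entry(action, match, ext_community=b""):
    """One ORF entry: Action/Match byte plus the community, if any."""
    first = (ACTIONS[action] << 6) | (MATCHES[match] << 5)
    if action == "REMOVE-ALL":
        # REMOVE-ALL carries no type-specific part.
        return bytes([first])
    return bytes([first]) + ext_community


remove_all = encode_orf_entry("REMOVE-ALL", "PERMIT")
add_rt = encode_orf_entry("ADD", "PERMIT", encode_route_target(64496, 20))

print(len(remove_all))  # 1
print(len(add_rt))      # 9
```

The 9-byte entry is simply the 1-byte Action/Match field followed by the 8-byte Extended Community for Route Target 64496:20.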

Soft Reconfiguration

One way to avoid Route Refresh entirely is so-called “soft reconfiguration.” With soft reconfiguration, the router retains all of the Multi-Protocol BGP prefixes that it receives, regardless of whether they are imported into a VRF. For prefixes that are imported into a VRF, normal behavior applies, but prefixes with Route Target values that are not associated with any VRF on the router are retained in the RIB-IN and marked as invalid/rejected. Thereafter, when local policy on the router is modified, it does not send any Route Refresh messages but simply scans the prefixes in the RIB-IN against the modified policy.

Soft reconfiguration for BGP/MPLS IP-VPN is enabled with the mp-bgp-keep command under the global BGP context. The advantages over Route Refresh should be self-evident, but the significant disadvantage is that memory is consumed by prefixes that the router doesn't need. If memory is not an issue, soft reconfiguration is a good mechanism for reducing control plane activity.
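The trade-off can be seen in a toy model in which the RIB-IN keeps every received prefix and a policy change triggers only a local re-scan. The `RibIn` class and its methods are hypothetical, and the prefixes and Route Targets are illustrative values.

```python
class RibIn:
    """Toy RIB-IN that retains all VPN prefixes (soft reconfiguration)."""

    def __init__(self, import_rts):
        self.import_rts = set(import_rts)
        self.routes = {}  # prefix -> Route Targets; nothing is discarded

    def receive(self, prefix, route_targets):
        self.routes[prefix] = set(route_targets)

    def valid_routes(self):
        """Prefixes matching the current import policy; the rest stay rejected."""
        return sorted(p for p, rts in self.routes.items() if rts & self.import_rts)

    def change_import_policy(self, import_rts):
        """Local re-scan only; no Route Refresh is sent to any peer."""
        self.import_rts = set(import_rts)
        return self.valid_routes()


rib = RibIn({"target:64496:10"})
rib.receive("172.16.10.0/24", {"target:64496:10"})
rib.receive("172.16.100.0/24", {"target:64496:30"})

print(rib.valid_routes())
# ['172.16.10.0/24']
print(rib.change_import_policy({"target:64496:10", "target:64496:30"}))
# ['172.16.10.0/24', '172.16.100.0/24']
```

The memory cost is visible in `routes`, which holds the second prefix even while no VRF imports it.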

Route Target Constraint

Constrained Route Target distribution for BGP/MPLS IP-VPNs (RFC 4684) builds on the concept of cooperative route filtering by propagating required Route Target membership information. Route Target membership information received by BGP speakers is then used to dynamically build outbound filters so that VPN-IPv4/IPv6 UPDATE messages are propagated only to peers that have advertised the respective Route Target. In effect, Route Target Constraint is used to create a controlled flooding distribution graph.
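The filter-building idea can be sketched as a mapping from Route Target to the set of peers that have advertised membership of it. This is a simplified model with hypothetical names (`RtcSpeaker`, `receive_rt_membership`, and so on); it ignores details of RFC 4684 such as the membership NLRI encoding and default-route behavior.

```python
from collections import defaultdict


class RtcSpeaker:
    """Toy model of a speaker building outbound filters from RT membership."""

    def __init__(self):
        self.membership = defaultdict(set)  # Route Target -> interested peers

    def receive_rt_membership(self, peer, route_target):
        self.membership[route_target].add(peer)

    def withdraw_rt_membership(self, peer, route_target):
        self.membership[route_target].discard(peer)

    def peers_for_update(self, route_targets):
        """Peers that advertised at least one of the route's Route Targets."""
        peers = set()
        for rt in route_targets:
            peers |= self.membership.get(rt, set())
        return sorted(peers)


rr = RtcSpeaker()
rr.receive_rt_membership("PE1", "target:64496:20")
rr.receive_rt_membership("PE2", "target:64496:20")
rr.receive_rt_membership("PE2", "target:64496:30")

print(rr.peers_for_update({"target:64496:30"}))  # ['PE2']
print(rr.peers_for_update({"target:64496:20"}))  # ['PE1', 'PE2']
print(rr.peers_for_update({"target:64496:99"}))  # []
```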