BACKGROUND
[0001] Typical physical networks contain several physical routers to perform L3 forwarding
(i.e., routing). When a first machine wants to send a packet to a second machine located
on a different IP subnet, the packet is sent to a router that uses a destination IP
address of the packet to determine through which of its physical interfaces the packet
should be sent. Larger networks will contain multiple routers, such that if one of
the routers fails, the packets can be routed along a different path between the first
machine and the second machine. Both within a contained network and across network
boundaries, routing protocols are used to advertise routes through the network. That
is, a first router peers with a second router and sends messages to the second router
indicating which addresses it can reach through its other interfaces and how far away
those addresses are. The first router also receives corresponding information from
the second router, and uses this information to determine how to route packets.
[0002] In logical networks implemented in a datacenter, user-defined data compute nodes
(e.g., virtual machines) on different subnets may need to communicate with each other,
as well as with machines external to the datacenter. In this case, tenants may define
a network for virtualization that includes both logical switches and logical routers.
Methods for implementing the logical routers to adequately serve such virtualized
logical networks in datacenters are needed, including methods that allow for a similar
route exchange with routers of the external physical network.
[0003] In the background art,
US 2015/0063360 A1 (Thakkar et al.) describes a network controller that manages a plurality of logical networks, wherein
the network controller receives a specification of a logical network that includes
a logical router.
BRIEF SUMMARY
[0004] The present invention is defined according to the independent claims. Additional
features will be appreciated from the dependent claims and the description herein.
Any embodiments which are described but which do not fall within the scope of the
claims are to be interpreted merely as examples useful for a better understanding
of the invention.
[0005] Some embodiments provide a method for implementing a dynamic routing protocol for
a logical router that interfaces with an external network (e.g., external to the datacenter
in which the logical router is implemented). In some embodiments, the logical router
has multiple interfaces with the external network, each of which is implemented in
a separate gateway host machine. When selecting the gateway host machines to implement
these interfaces, a network controller of some embodiments selects one of the gateway
host machines to also implement a dynamic routing protocol control plane. Each of
the interfaces operates as a separate component, advertising routes to the external
network and receiving dynamic routing protocol information (e.g., BGP or OSPF packets,
or data for any other dynamic routing protocol). The various interfaces at the gateway
host machines forward the dynamic routing protocol packets to the single control plane,
which performs route calculation to update routing tables for the gateway host machines
to use in implementing the logical router interfaces.
[0006] In some embodiments, the logical router is implemented in a managed network (e.g.,
a datacenter) in both distributed and centralized fashion. Specifically, the management
plane of some embodiments (implemented, e.g., in a network controller) receives a
logical router configuration (e.g., through an API) and defines multiple routing components
for the logical router. In some embodiments, when the logical router connects to an
external network the management plane defines one distributed routing component for
the logical router, and one centralized routing component for each interface of the
logical router that connects to the external network. Each of these centralized routing
components is then assigned to a gateway host machine that implements the corresponding
interface. In some embodiments, the management plane generates a routing table for
each of the centralized routing components and configures them with these routing
tables.
[0007] In addition, the user (e.g., network administrator) that configures the logical router
may specify for the router to advertise one or more public IP subnets to the external
network, in order to attract traffic directed to that subnet. As mentioned, some embodiments
select one of the gateway host machines that implements a centralized routing component
to implement a dynamic routing protocol control plane (e.g., in a same virtual machine
or other data compute node that implements the centralized routing component, in a
different virtual machine or other data compute node, etc.).
[0008] All of the centralized routing components advertise the specified public IP subnets,
and receive dynamic routing protocol packets advertising routes from the external
routers to which they connect. Rather than processing these packets locally (which
would often result in duplicative processing), the centralized components are configured
to pass these packets to the selected gateway host machine, which removes duplicative
information and updates the respective routing tables.
[0009] In some embodiments, all of the gateway host machines that implement centralized
components for a logical router are configured with the ability to run the dynamic
routing protocol control plane. The protocol stack running on the selected gateway
host machine operates as the master, and only if that machine fails does one of the
others take over. In this case, the standby control plane takes over in a manner similar
to graceful restart for a standard router. That is, the newly-determined master would
indicate to the physical external router (through packets sent from the several centralized
routing components) to send all of its routes, which would enable the control plane
to recalculate the updated routing tables for the centralized components.
[0010] The preceding Summary is intended to serve as a brief introduction to some embodiments
of the invention. It is not meant to be an introduction or overview of all inventive
subject matter disclosed in this document. The Detailed Description that follows and
the Drawings that are referred to in the Detailed Description will further describe
the embodiments described in the Summary as well as other embodiments. Accordingly,
to understand all the embodiments described by this document, a full review of the
Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject
matter is not to be limited by the illustrative details in the Summary, Detailed
Description and the Drawings, but rather is to be defined by the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The novel features of the invention are set forth in the appended claims. However,
for purpose of explanation, several embodiments of the invention are set forth in
the following figures.
Figure 1 illustrates a configuration view of a logical network that includes a logical router,
which represents the logical network as designed by a user.
Figure 2 illustrates a management plane view of the logical network of Figure 1.
Figure 3 illustrates a physical implementation of the logical router of Figure 1.
Figure 4 conceptually illustrates a process of some embodiments for configuring SRs to implement
the uplinks of a logical router, with one of the SRs assigned to operate a control
plane for a dynamic routing protocol.
Figure 5 illustrates the start of BGP operations and establishment of adjacency with an external
router.
Figure 6 conceptually illustrates a process of some embodiments performed by an SR that does
not host the routing protocol control plane, upon receipt of a packet.
Figure 7 illustrates the receipt of BGP updates by the three SRs of Figure 5.
Figure 8 conceptually illustrates a process of some embodiments performed by the dynamic routing
protocol control plane operating as a route server to update routing tables for all
of the SRs of the logical router.
Figure 9 illustrates the data distributed by the routing protocol control plane operating
on one SR to the other SRs, based on received updates.
Figure 10 illustrates the processing of a packet by an SR using newly distributed routing information.
Figure 11 illustrates the failover of a BGP control plane for a set of SRs.
Figure 12 conceptually illustrates an electronic system with which some embodiments of the
invention are implemented.
DETAILED DESCRIPTION
[0012] Some embodiments provide a method for implementing a dynamic routing protocol for
a logical router that interfaces with an external network (e.g., external to the datacenter
in which the logical router is implemented). In some embodiments, the logical router
has multiple interfaces with the external network, each of which is implemented in
a separate gateway host machine. When selecting the gateway host machines to implement
these interfaces, a network controller of some embodiments selects one of the gateway
host machines to also implement a dynamic routing protocol control plane. Each of
the interfaces operates as a separate component, advertising routes to the external
network and receiving dynamic routing protocol information (e.g., Border Gateway Protocol
(BGP) or Open Shortest Path First (OSPF) packets). The various interfaces at the gateway
host machines forward the dynamic routing protocol packets to the single control plane,
which performs route calculation to update routing tables for the gateway host machines
to use in implementing the logical router interfaces.
[0013] In some embodiments, the logical router is implemented in a managed network (e.g.,
a datacenter) in both distributed and centralized fashion. Specifically, the management
plane of some embodiments (implemented, e.g., in a network controller) receives a
logical router configuration (e.g., through an application programming interface (API))
and defines multiple routing components for the logical router. In some embodiments,
when the logical router connects to an external network the management plane defines
one distributed routing component for the logical router (referred to as a distributed
router, or DR), and one centralized routing component for each interface of the logical
router that connects to the external network (referred to as service routers, or SRs).
Each of these SRs is then assigned to a gateway host machine that implements the corresponding
interface. In some embodiments, the management plane generates a routing table for
each of the SRs and configures them with these routing tables. The management plane
operations to define multiple routing components for a logical router are described
in further detail in
U.S. Provisional Application 62/110,061, filed 1/30/2015, and
U.S. Patent Application 14/814,473, filed 7/30/2015.
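Purely as a non-limiting illustration, the following Python sketch shows how a management plane might derive one distributed routing component and one centralized routing component per uplink from a received logical router configuration. The data structures and names used here (e.g., Uplink, RoutingComponent, define_routing_components) are hypothetical and do not correspond to any actual management plane API.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Uplink:
        name: str
        ip: str                  # uplink IP address on the external subnet
        external_subnet: str

    @dataclass
    class RoutingComponent:
        kind: str                # "DR" or "SR"
        name: str
        uplink: Optional[Uplink] = None

    def define_routing_components(router_name: str, uplinks: List[Uplink]):
        """Define one DR for the logical router plus one SR per uplink."""
        components = [RoutingComponent(kind="DR", name=router_name + "-dr")]
        for uplink in uplinks:
            components.append(RoutingComponent(
                kind="SR", name=router_name + "-sr-" + uplink.name, uplink=uplink))
        return components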
[0014] In addition, the user (e.g., network administrator) that configures the logical router
may specify for the router to advertise one or more public IP subnets to the external
network, in order to attract traffic directed to that subnet. As mentioned, some embodiments
select one of the gateway host machines that implements an SR to implement a dynamic
routing protocol control plane (e.g., in a same virtual machine or other data compute
node that implements the SR, in a different virtual machine or other data compute
node, etc.).
[0015] All of the SRs advertise the specified public IP subnets and receive dynamic routing
protocol packets advertising routes from the external routers to which they connect.
Rather than processing these packets locally (which would often result in duplicative
processing), the SRs are configured to pass these packets to the selected gateway
host machine, which removes duplicative information and updates the respective routing
tables.
[0016] In some embodiments, all of the gateway host machines that implement SRs for a logical
router are configured with the ability to run the dynamic routing protocol control
plane. The protocol stack running on the selected gateway host machine operates as
the master, and only if that machine fails does one of the others take over. In this
case, the standby control plane takes over in a manner similar to graceful restart
for a standard router. That is, the newly-determined master would indicate to the
physical external router (through packets sent from the several SRs) to send all of
its routes, which would enable the control plane to recalculate the updated routing
tables for the SRs.
[0017] The above introduces the concept of having one of several gateways between a logical
network and an external physical network acting as a route server for the logical
router that interfaces with the external physical network. In the following, Section
I introduces the logical routers of some embodiments and their physical implementation.
Next, Section II describes the operation of a centralized routing component in route
server mode according to some embodiments. Finally, Section III describes the electronic
system with which some embodiments of the invention are implemented.
I. LOGICAL ROUTER AND PHYSICAL IMPLEMENTATION
[0018] The following discussion describes the design of logical routers for some embodiments
as well as the implementation of such logical routers by the network controllers of
some embodiments. Logical routers, in some embodiments, exist in three different forms.
The first of these forms is the API view, or configuration view, which is how the
logical router is defined by a user (e.g., a datacenter provider or tenant). The second
view is the control plane, or management plane, view, which is how the network controller
internally defines the logical router. Finally, the third view is the physical realization,
or implementation of the logical router, which is how the logical router is actually
implemented in the datacenter. That is, the logical router is an abstraction describing
a set of functionalities (e.g., routing, NAT, etc.) that a user configures for the
logical router. The logical router is then implemented by various machines in the
datacenter based on instructions distributed to those machines by a set of network
controllers, with the instructions generated by the network controllers according
to the configuration provided by the user.
[0019] In the control plane view, the logical router of some embodiments may include one
or both of a single DR and one or more SRs. The DR, in some embodiments, spans managed
forwarding elements (MFEs) that couple directly to VMs or other data compute nodes
that are logically connected, directly or indirectly, to the logical router. The DR
of some embodiments also spans the gateways to which the logical router is bound.
The DR, in some embodiments, is responsible for first-hop distributed routing between
logical switches and/or other logical routers that are logically connected to the
logical router. The SRs of some embodiments are responsible for delivering services
that are not implemented in a distributed fashion (e.g., some stateful services).
[0020] In some embodiments, the physical realization of a logical router always has a DR
(i.e., for first-hop routing). A logical router will have SRs if either (i) the logical
router connects to external physical networks or (ii) the logical router has services
configured that do not have a distributed implementation (e.g., NAT, load balancing,
DHCP in some embodiments), or both. The present subject matter relates to logical
routers that connect to external physical networks, and which do so in a uniform manner
(i.e., all of the interfaces of the logical router with the external physical network
have the same L3 connectivity).
[0021] Figures 1-3 illustrate the three different views of an implementation for a logical router 115
that connects to an external network 120.
Figure 1 specifically illustrates the configuration view, which represents a logical network
100 as designed by a user. As shown, the logical router 115 is part of a logical network
100 that includes the logical router 115 and two logical switches 105 and 110. The
two logical switches 105 and 110 each have VMs that connect to logical ports. While
shown as VMs in these figures, it should be understood that other types of data compute
nodes (e.g., namespaces, etc.) may connect to logical switches in some embodiments.
In some embodiments, in fact, the user may simply configure these VMs as workloads,
allowing the system to determine how to implement the workloads (e.g., as VMs, namespaces,
physical machines, etc.).
[0022] The logical router 115 also includes three ports (referred to as uplinks) that connect
to the external physical network 120. Specifically, each of these three uplinks connects
to the same pair of routers 125 and 130. As mentioned, some embodiments require the
same L3 connectivity for all of the uplinks of a logical router. Other embodiments,
however, allow different uplinks to connect to different sets of external routers,
possibly on different subnets (and thus the uplinks are on different subnets from
each other). In various different embodiments, the three uplinks of the logical router
115 may be on the same VLAN, or different VLANs.
[0023] Figure 2 illustrates the management plane view 200 of the logical network 100. The logical
switches 105 and 110 are the same in this view as the configuration view, but the
network controller has created three service routers 205-215 for the logical router
115, as well as a distributed router 220 and a transit logical switch 225. The DR
220 includes a southbound interface for each of the logical switches 105 and 110,
and a single northbound interface to the transit logical switch 225 (and through this
to the SRs). The SRs 205-215 each include a single southbound interface to the transit
logical switch 225 (used to communicate with the DR 220, as well as each other in
certain situations). Each SR 205-215 also corresponds to an uplink port of the logical
router (that connects to the external network), and thus each of the SRs has a single
such interface. Each of these northbound interfaces connects to both of the physical
routers 125 and 130, as in the configuration view of the logical network 100.
[0024] The detailed configuration of the northbound and southbound interfaces of the various
router constructs 205-220 and their connections with the transit logical switch 225
are described in detail in
U.S. Provisional Application 62/110,061 and
U.S. Patent Application 14/814,473, as well as in
U.S. Patent Application 14/871,968, filed 9/30/2015. In some embodiments, the management plane generates separate routing information
bases (RIBs) for each of the router constructs 205-220. That is, in addition to having
separate objects created in the management/control plane, each of the router constructs
205-220 is treated as a separate router with a separate routing table. Some embodiments
define a subnet for the transit logical switch from a pool of available subnets for
internal use, and define the internal interfaces of the router constructs 205-220
as having IP addresses in that subnet. In addition, the management plane assigns MAC
addresses to each of the internal interfaces. The RIB (and thus the FIB, after RIB
to FIB conversion) for the DR 220 of some embodiments is defined with a default route
pointing to any of the three southbound interfaces of the SRs 205-215 (which the implementation
would choose among using equal-cost multi-path (ECMP) principles). In addition, the
user would typically configure a static default route for the logical router pointing
to the external routers 125 and 130, which would be automatically added to the RIBs
(and thus the FIBs, after RIB to FIB conversion) for each of the three SRs 205-215.
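As a minimal sketch only (assuming, as in this example, that the DR's default route points to the SR southbound interfaces and that each SR receives the user-configured static default route toward the external routers), the initial RIB generation described above may be modeled as follows; the dictionary-based route representation is purely illustrative:

    import ipaddress

    def build_initial_ribs(sr_southbound_ips, external_router_ips, num_srs):
        """The DR default route uses ECMP across the SR southbound interfaces; each
        SR receives the static default route toward the external routers."""
        dr_rib = [{
            "prefix": ipaddress.ip_network("0.0.0.0/0"),
            "next_hops": list(sr_southbound_ips),     # chosen among via ECMP
        }]
        sr_ribs = [[{
            "prefix": ipaddress.ip_network("0.0.0.0/0"),
            "next_hops": list(external_router_ips),   # e.g., routers 125 and 130
        }] for _ in range(num_srs)]
        return dr_rib, sr_ribs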
Figure 3 illustrates a physical implementation of the logical router 115. As shown, each of
the VMs that couples to one of the logical switches 105 and 110 in the logical network
100 resides on a host machine 305. These VMs, though shown in this case on separate
host machines, may reside on fewer than four host machines in some cases (i.e., with
two or more VMs on the same host machine).
[0026] Managed forwarding elements (MFEs) 310 also operate on these host machines 305, in
order to implement the distributed aspects of the logical network 100. These MFEs
310, in some embodiments, are software virtual switches (e.g., Open vSwitch (OVS),
ESX) that operate within the hypervisors or other virtualization software on the host
machines. Though the MFEs are software virtual switches, they may be referred to as
physical forwarding elements in order to differentiate them from the logical forwarding
elements 105-115, which are abstract elements defined as a network configuration,
and which are implemented on the physical forwarding elements. These MFEs 310 perform
first-hop switching and routing to implement the logical switches 105 and 110, and
the logical router 115, for packets sent by the VMs of the logical network 100. The
MFEs 310 (or a subset of them) also may implement logical switches (and distributed
logical routers) for other logical networks if the other logical networks have VMs
that reside on the host machines 305 as well.
The three SRs 205-215 each operate on different gateway machines 315-325. The gateway
machines 315-325 are host machines similar to the machines 305 in some embodiments
(e.g., x86 boxes), but host SRs rather than user VMs. In some embodiments, MFEs 310
also operate on the gateway machines 315-325, to handle logical switching as well
as routing for the DR 220. For instance, packets sent from the external network 120
may be routed by the SR routing table on one of the gateway machines and then subsequently
switched and routed (according to the DR routing table) by the MFE on the same gateway.
In addition, the MFE provides the connections to the physical NICs on the gateway
machines 315-325. Each of the MFEs 310 in the gateway machines 315-325 connects to
both of the external routers 125 and 130 as well as to the other MFEs that implement
the logical network in the datacenter (e.g., through tunnels). For differentiation
purposes in this figure, tunnels between the edge MFEs (that connect directly to the
user VMs) and gateway MFEs (to which the SR VMs 330-340 directly connect) are shown
as straight dotted lines, while tunnels between the gateway MFEs are shown as orthogonal
solid lines. In addition, the connections from the gateway MFEs to the external routers
125 and 130 are shown as straight dashed/dotted lines.
[0028] The SRs may be implemented in a namespace, a virtual machine, or as a VRF in different
embodiments. In this example, the SRs 205-215 are implemented as virtual machines
330-340. While some embodiments allow two SRs operating in active-standby mode (e.g.,
when the SRs provide stateful services such as firewalls), the examples described
herein operate in active-active mode (enabling ECMP routing for both ingress and egress
traffic).
As shown, one of the SR VMs (specifically, the VM 335 that hosts the SR 210) also hosts
a BGP control plane. This BGP control plane is a BGP protocol stack that (i) receives
routing protocol data from each of the other SRs (when the SRs receive this data from
the external routers 125 and 130) and (ii) updates the routing tables of all of the
SRs using the routing protocol data. In some embodiments, each of the SRs 205-215
opens BGP (or other routing protocol) sessions with each of the external routers 125
and 130. The SRs originate their own BGP packets advertising routes (e.g., for the
subnets defined for the logical switches 105 and 110, if public), enabling the routers
125 and 130 to use ECMP routing for packets directed to these subnets.
[0030] In addition, as part of the BGP session, the external routers 125 and 130 send BGP
packets to each of the SRs 205-215, advertising routes for the networks behind them.
For example, the northbound ports of the two routers 125 and 130 might be on different
subnets, and would therefore advertise different administrative distances to the different
subnets. The SR VMs 330 and 340 receive these packets and pass them on to the VM 335,
where the BGP control plane operates. The VM 335 also receives these packets from
the routers 125 and 130, and processes them internally. The BGP protocol stack operating
in the VM 335 uses all of these BGP packets to identify new routes for the SRs 205-215,
and updates its local routing table for SR 210 in addition to sending the routing
table updates to the other VMs 330 and 340.
[0031] In some embodiments, local network controllers (not shown) operate on each of the
gateway host machines, for the purpose of receiving configuration data from a centralized
network controller (e.g., as a set of formatted data tuples) and converting those
data tuples into configuration data useable by the MFE and SR VM. In some embodiments,
the local network controller on a particular one of the gateway machines receives
the RIB for its local SR from the network controller, and converts this into a forwarding
information base (FIB), which it uses to install the routing table on the VM to implement
the SR. In some such embodiments, the BGP control plane operating on the VM 335 sends
an updated RIB to each of these local controllers when updates are received from the
routers 125 and 130. The local controllers then calculate an updated FIB and configure
the routing table of their respective SR VM with the updated routing table.
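The RIB to FIB conversion performed by the local controller may be sketched, under simplifying assumptions, as selecting the best route per prefix and ordering the result for longest-prefix-match lookup; the exact conversion used in any given embodiment may differ, and the route representation below is hypothetical:

    import ipaddress

    def rib_to_fib(rib):
        """Keep, for each prefix, the route with the lowest administrative distance,
        then order routes from most specific to least specific so the SR can perform
        longest-prefix-match lookups in order."""
        best = {}
        for route in rib:
            prefix = ipaddress.ip_network(route["prefix"])
            current = best.get(prefix)
            if current is None or route.get("distance", 0) < current.get("distance", 0):
                best[prefix] = route
        return sorted(best.values(),
                      key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen,
                      reverse=True)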
[0032] In the example shown in
Figures 1-3, the logical router that connects to the external network also connects directly
to the logical switches. In some embodiments, two tiers of logical routers are defined
within a logical network. Provider logical routers (PLRs) provide a connection between
the logical network implemented in a datacenter and the external network, and are
often administered by the owner of the datacenter. Multiple tenant logical routers
(TLRs) may connect to the southbound interfaces of PLRs, allowing different tenants
of a datacenter to configure their own logical routers (and logical switches). In
the two-tiered case of some embodiments, the PLRs implement BGP (or other routing
protocols) in the manner described herein, in order to exchange routes with the external
network. In some such cases, the logical switches that connect to the TLRs may be
public subnets, and the PLR advertises routes for these logical switch subnets. The
two tiers of logical routers are described in further detail in
U.S. Provisional Application 62/110,061 and
U.S. Patent Application 14/814,473.
II. OPERATION OF SR IN ROUTE SERVER MODE
[0033] As indicated above, in some embodiments a network controller selects multiple gateway
host machines for the multiple SRs of a logical router that interfaces with an external
network. In addition, some embodiments select one of these gateway host machines to
serve as a master routing protocol control plane for all of the SRs. Other embodiments
use an entity external to the gateway host machines (e.g., a central controller) to
act as the master routing protocol control plane for all of the SRs. Each of these
SRs appears to the external network as a separate interface (e.g., a separate line
card), advertising routes to the external network and receiving dynamic routing protocol
information from the external network. However, rather than processing the routing
protocol data themselves, all of the SRs forward the data to the master control plane,
which identifies any updates based on the data, and updates the SR routing tables.
A. SR Configuration
[0034] Figure 4 conceptually illustrates a process 400 of some embodiments for configuring SRs to
implement the uplinks of a logical router, with one of the SRs assigned to operate
a control plane for a dynamic routing protocol (e.g., BGP). In some embodiments, the
process 400 is performed by a network controller that manages the logical router.
That is, in some embodiments, a network control system for managing a network in a
datacenter may include numerous network controllers, with different controllers assigned
to manage different logical networks or different logical forwarding elements. In
this case, the network controller that manages a particular logical router will generate
the configuration data for the logical routing constructs (DR, SRs, transit logical
switch) of the particular logical router and distribute the configuration data to
the host machines that implement the logical router. In some embodiments, the network
controller distributes the configuration data to local controllers operating on the
host machines, which translate the configuration data into a format used to configure
the local software switches or VM routing tables.
[0035] As shown, the process 400 begins by receiving (at 405) a configuration for a logical
router with multiple uplinks connecting to an external physical network. In some embodiments,
a network administrator defines the logical router through a management application
user interface, which in turn generates API commands to the network controller based
on the user configuration. Thus, the network controller receives the logical router
configuration as one or more API commands (e.g., to create a logical router, create
interfaces, create static routes, etc.). In some embodiments, the logical router may
have 0 or more uplinks (e.g., with a maximum of 8, 16, etc. uplinks). A logical router
with 0 uplinks will not communicate with either the external network or other logical
routers; in this case, the router would serve primarily as a means for several logical
switches or other logical routers to communicate with each other.
[0036] Some embodiments require that the uplinks all have the same L3 connectivity, while
other embodiments allow different L3 connectivity for different uplinks. However,
if different uplinks connect to different external routers, then not only will the
different SRs receive different routing protocol information, but a single control
plane will also need to create different routing table updates for the different SRs,
and the computation benefits of having only the single control plane will be diminished.
That is, the single routing protocol control plane would perform one set of updates
for a first SR based on routing protocol data received from the routers to which the
first SR connects, then a second set of updates for a second SR based on routing protocol
data received from the routers to which the second SR connects, and so on. However,
when multiple uplinks share the same L3 connectivity, then some embodiments will aggregate
the routing protocol control plane for these SRs, even if other uplinks of the logical
router have different L3 connectivity and run a separate control plane.
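The aggregation decision described above can be sketched as grouping uplinks by their L3 connectivity, with each group sharing a single routing protocol control plane; representing connectivity as a set of external router identifiers is an assumption made only for illustration:

    from collections import defaultdict

    def group_uplinks_by_connectivity(uplinks):
        """Uplinks with the same set of external router peers share one routing
        protocol control plane; uplinks with different connectivity get their own."""
        groups = defaultdict(list)
        for uplink in uplinks:
            key = frozenset(uplink["external_routers"])
            groups[key].append(uplink["name"])
        return list(groups.values())

    uplinks = [{"name": "u1", "external_routers": ["router-125", "router-130"]},
               {"name": "u2", "external_routers": ["router-125", "router-130"]},
               {"name": "u3", "external_routers": ["router-140"]}]
    # group_uplinks_by_connectivity(uplinks) -> [["u1", "u2"], ["u3"]]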
[0037] After receiving the configuration, the process 400 defines and configures (at 410)
a DR for the logical router and one SR for each uplink of the logical router. Though
not discussed here in detail, some embodiments allow multiple uplinks to be assigned
to the same SR. In the case that all the uplinks have the same configuration (e.g.,
there are no stateful services defined on any of the uplinks) and the same L3 connectivity,
then assigning two uplinks to the same SR would just result in that SR receiving twice
as much traffic as the other SRs, with no benefit. The definition and configuration
of the DR and SRs, including routing table configuration, is described in further
detail in
U.S. Provisional Patent Application 62/110,061 as well as
U.S. Patent Applications 14/814,473 and
14/871,968.
[0038] The process also selects (at 415) a host machine to host each SR. In some embodiments,
the datacenter includes sets of host machines (e.g., clusters) that are specifically
allocated as gateway host machines, for hosting SRs. Some embodiments allow numerous
SRs (for different logical routers) to be hosted on each gateway host machine, while
other embodiments allow only one (or a small number) of SRs per gateway host machine.
In some embodiments, the network controllers load balance the SRs for numerous logical
routers across the gateway host machines in a cluster. However, when only a single
PLR is defined for a datacenter, then only one SR will be assigned to each gateway
host machine, assuming the SRs for a specific logical router are all assigned to different
host machines.
[0039] After selecting the set of host machines for the SRs, the process 400 selects (at
420) one of the host machines (i.e., one of the host machines selected to host an
SR) to run a dynamic routing protocol control plane for the logical router. In some
embodiments, this choice is random, or designed to approximate a random distribution
(e.g., by calculating a hash value of a set of configuration inputs and using the
hash value to assign the routing protocol control plane to one of the host machines).
Other embodiments use the locations of the host machines relative to each other, assigning
the routing protocol control plane to the host machine with the shortest distance
to all of the other host machines in the set selected for the SRs. As mentioned, some
embodiments use a controller (e.g., the controller performing the process 400) to
run the dynamic routing protocol control plane instead of one of the host machines
of an SR.
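The approximately random selection mentioned above might, for example, be implemented by hashing a set of configuration inputs; the following sketch is illustrative only, and the particular hash function and input values are assumptions:

    import hashlib

    def select_control_plane_host(gateway_hosts, config_inputs):
        """Hash a set of configuration inputs and use the digest to pick one of the
        gateway host machines already selected to host the SRs."""
        digest = hashlib.sha256("|".join(sorted(config_inputs)).encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(gateway_hosts)
        return gateway_hosts[index]

    # e.g., select_control_plane_host(["gw-315", "gw-320", "gw-325"],
    #                                 ["plr-1", "uplink-1", "uplink-2", "uplink-3"])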
[0040] Having selected host machines and generated the required configuration data, the
process then distributes (at 425) the SR configuration data for the various SRs to
each of the selected host machines and (at 430) the dynamic routing protocol configuration
and SR location information to the particular host machine selected to operate the
dynamic routing protocol control plane. As indicated above, some embodiments distribute
the SR configuration data for a particular SR to a local controller operating on the
host machine to which the SR is assigned. This local controller is responsible for
configuring the SR on the host machine, which may include calculating a routing table
for the SR to use based on a received RIB. The local controller also configures the
MFE on the host machine in some embodiments to implement the DR of the logical router
(based on configuration data received from the centralized network controller), as
well as any other logical forwarding elements in the network (e.g., other logical
routers, logical switches, etc.).
[0041] In some embodiments, the dynamic routing protocol configuration that is distributed
to the selected host machine includes the routing information base for the SRs. If
L3 connectivity is the same for all of the SRs, then the SRs should all have the same
RIB, unless the administrator configured certain static routes to be output via a particular
one of the uplinks. The south-facing routes all have the northbound interface of
the DR as their next hop address, and the north-facing routes should also be the same
in the different SRs. As such, in these situations, the network controller distributes
one RIB for configuration of the dynamic routing protocol control plane, as well as
information indicating the locations of the other SRs to which RIB updates will be
distributed.
[0042] In addition, the network controller distributes configuration data that indicates
to the local controller on the particular host machine that it will be hosting the
routing protocol control plane. As mentioned, in some embodiments the SRs are implemented
as VMs, with the routing protocol operating within the same VM. In other embodiments,
a second VM is instantiated on the host machine to perform the routing protocol operations.
Other embodiments implement the SR in other form factors besides a VM (e.g., as a
VRF directly in the datapath of the MFE, as a namespace or other non-VM data compute
node, etc.). The control plane may operate as a separate VM or other data compute
node in some of these embodiments.
B. Routing Protocol Operation
[0043] Once the SRs are configured, the logical router (and the rest of the logical network)
may begin operations.
Figure 5 conceptually illustrates a portion of a network 500 that will be used throughout
this section as an example. Specifically,
Figure 5 illustrates the start of BGP operations and establishment of adjacency with an external
router over two stages 501-502. In this case, the network 500 includes three SRs 505-515
of a logical router. These three SRs 505-515 operate on separate host machines (e.g.,
as VMs) in a datacenter. For simplicity, the host machines are not shown, nor are
the MFEs that operate on the host machines in some embodiments. The BGP control plane
operates on the SR 510 in this example, based on selection by the network controller
that manages the logical router to which these SRs belong.
[0044] The SRs 505-515 include connections to each other as well as to a physical router
520 that provides a connection to the network external to the datacenter (e.g., to
the Internet). As described above, the connections between the SRs, in some embodiments,
are actually tunnels between the MFEs that operate on the respective host machines
of the SRs. Similarly, the connections between the SRs and the external router also
pass through the MFEs on the host machines of the SRs (with the MFE handling packet
delivery to, and receipt from, the NICs on these host machines).
[0045] To begin operation and establish adjacency with the external routers, in some embodiments
the SR on which the BGP control plane operates initiates routing protocol sessions
with each external router to which the SRs connect. In the first stage 501 of this
example, the SR 510 sends a BGP Open message 525 to the external router 520, with
its own IP address in the message. In addition, the SR 510 generates BGP Open messages
530 and 535 for the SRs 505 and 515, to be sent to the router. However, these messages
are tunneled to the respective SRs at this stage (the encapsulation is not shown in
the figure). In the second stage 502, the SRs 505 and 515 decapsulate the BGP Open
messages 530 and 535, respectively, and forward these onto the external router 520.
Once the SRs detect that these are BGP packets, they skip any further processing and
forward them on to the peer router, such that they effectively act simply as interfaces
for the single router operating at the SR 510 with the control plane.
[0046] This process assumes that the BGP control plane has negotiated a successful TCP connection
with the external router 520, and thus is in the Connect state of the standard BGP
state machine. After sending the BGP Open messages, the BGP state machine transitions
to the OpenSent state. In some embodiments, the BGP control plane manages a separate
BGP state machine for each SR, while in other embodiments the BGP control plane manages
a single state machine for its adjacency with the external router. Assuming no errors,
the SRs 505 and 515 would each receive an Open message in return, which they would
forward via tunnel to the SR 510 (which should also receive such a message). The BGP
control plane at SR 510 would then send Keepalive messages to the external router
520 through each SR (transitioning to the OpenConfirm state), and listen for Keepalive
messages from the external router (at which point it would transition to the Established
state, so that routes can be exchanged between the peers).
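For the embodiments in which the BGP control plane manages a separate state machine per SR, the per-session bookkeeping might be sketched as follows; the state names follow the standard BGP finite state machine, while the class and method names are purely illustrative:

    class SrBgpSession:
        """Tracks the BGP state machine for the adjacency established through one SR."""
        def __init__(self, sr_name):
            self.sr_name = sr_name
            self.state = "Connect"        # TCP connection assumed to be established

        def open_sent(self):
            self.state = "OpenSent"       # Open message sent out through this SR

        def open_received(self):
            self.state = "OpenConfirm"    # peer's Open forwarded back by the SR

        def keepalive_received(self):
            self.state = "Established"    # routes may now be exchanged

    sessions = {sr: SrBgpSession(sr) for sr in ("sr-505", "sr-510", "sr-515")}
    for session in sessions.values():
        session.open_sent()               # after the Open messages of stages 501-502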
[0047] The route exchange from the SRs to the external router happens as is normal for BGP.
That is, the SRs send Update messages (or forward messages generated by the control
plane) indicating the reachable subnets, which are those that the user(s) of the logical
network have opted to make public (and for which public IP addresses have been assigned).
These messages indicate the sending SR uplink as the next hop IP address, and have
a low administrative distance (assuming the logical network is all within a single
autonomous system), as they only include routes for the logical network subnets. Even
if an SR connects to multiple routers, in some embodiments the SR will not advertise
routes learned from one router to another router, so as to avoid having to process
traffic not sent to or from the logical network. As such, the BGP updates sent through
the SRs should only change when new public subnets are added to the logical network.
[0048] However, BGP updates may be received regularly from the external routers, as the
external network will generally be more subject to changes that affect the routes
advertised.
Figure 6 conceptually illustrates a process 600 of some embodiments performed by an SR that
does not host the routing protocol control plane upon receipt of a packet. While this
process 600 is performed by the SR, in some embodiments a similar process that discriminates
between data packets for processing by the SR and routing protocol packets may be
performed by the MFE operating on the host machine with the SR. For example, if the MFE
is a flow-based virtual switch (e.g., Open vSwitch), some embodiments include flow
entries that match on the fields that indicate that the packet is a BGP (or other
routing protocol) update packet, and automatically forward those packets through a
tunnel to the correct host machine. In other embodiments, the MFE forwards the packet
to the SR based on its destination address, and the SR identifies that the packet
is an update and sends the packet to the correct host machine (via the MFE).
[0049] As shown, the process 600 receives (at 605) a packet at the SR from the external
network. This packet could be a data packet intended for a particular user VM (or
a public IP address that corresponds to multiple user VMs). For instance, if a datacenter
tenant operates a web server in the datacenter, this web server would likely send
and receive large amounts of traffic with clients in the external network. Incoming
traffic would pass through the SR for routing in this case. In addition, external
routers with which the BGP control plane has established adjacency through the SR
will also send BGP packets (e.g., Open messages, Keepalive messages, updates, etc.)
to the SR.
[0050] The process determines (at 610) whether the received packet is a routing protocol
packet. In some embodiments, prior to performing any additional processing, the SR
performs a check to determine whether the packet is a routing protocol packet that
should be passed along to the SR that runs the control plane for the routing protocol.
BGP packets (or packets for other routing protocols) will (i) have a destination address
of the SR itself, rather than a workload in the logical network (e.g., a user VM),
and (ii) identify the routing protocol in their headers. Thus, Update, Open, Keepalive,
etc. messages will be received by the SR (when the routing protocol is BGP), and should
be forwarded to the control plane (as they relate to the establishment and maintenance
of the peering).
[0051] Thus, when the received packet is not a routing protocol packet, the process 600
processes (at 615) the packet at the SR. If the packet is a packet for another routing
protocol maintained at the SR, the SR performs the actions required based on such
a packet. The packet could also be a standard data packet (e.g., a TCP segment, UDP
datagram, etc.), in which case the SR routes the packet according to its FIB and performs
any other required processing.
[0052] On the other hand, when the packet is a routing protocol packet, the process forwards
(at 620) the packet through a tunnel to the host machine at which the routing protocol
control plane operates. That is, once the SR identifies that the packet is a BGP packet
(by looking at its header), the SR encapsulates and forwards the packet without any
further processing. In some embodiments, the SR is configured to modify the destination
IP and/or MAC address of the packet to be that of the SR with the control plane. The
SR then sends the packet back to its local MFE, which tunnels the packet to the MFE
at the remote host machine where the routing protocol control plane resides. The process
then ends.
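A minimal sketch of the check described in process 600 is given below, assuming BGP is identified by the well-known TCP port 179 together with a destination address equal to the SR's own uplink address; the packet representation, field names, and addresses are hypothetical:

    BGP_TCP_PORT = 179

    def classify_packet_at_sr(packet, sr_ip):
        """Routing protocol packets addressed to the SR itself are tunneled to the
        host machine running the control plane; all other packets are processed
        (routed) locally by the SR."""
        is_bgp = (packet.get("dst_ip") == sr_ip and
                  packet.get("ip_proto") == "tcp" and
                  BGP_TCP_PORT in (packet.get("src_port"), packet.get("dst_port")))
        return "forward_to_control_plane" if is_bgp else "process_locally"

    # e.g., a BGP Update arriving from an external router at one of the SRs:
    decision = classify_packet_at_sr(
        {"dst_ip": "192.0.2.11", "ip_proto": "tcp", "src_port": 179, "dst_port": 54231},
        sr_ip="192.0.2.11")    # -> "forward_to_control_plane"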
[0053] Figure 7 illustrates the receipt of BGP updates by the three SRs 505-515 over two stages 705
and 710. In the first stage 705, the external physical router 520 sends BGP updates
to the three SRs 505-515. The first SR 505 receives an update 715 with information
about a route for the prefix 1.1.10.0/28, while the second and third SRs 510 and 515
receive updates 720 and 725 respectively, which both provide the same information
about the prefix 1.1.11.0/28. As with any standard BGP update, these provide information
about the reachability of the indicated IP prefixes, noting the number of hops (or
number of autonomous systems) needed to reach the IP address through the router that
sends the update message (i.e., the router 520).
As shown in the second stage 710, the first SR 505 and third SR 515 send their update
packets 715 and 725 to the second SR 510 that operates the BGP control plane for the
three SRs. That is, because these SRs do not process dynamic routing updates to the
routing table themselves, they do not do anything with the packets 715 and 725 beyond
forwarding them to the SR 510. The SR 510 does not have to forward the packet 720
that it receives from the external router 520, as it will process the packet internally
(along with the other updates that it receives from the other SRs).
[0055] Figure 8 conceptually illustrates a process 800 of some embodiments performed by the dynamic
routing protocol control plane operating as a route server to update routing tables
for all of the SRs of a logical router. This process may be performed at one of the
SRs of the logical router (e.g., the SR 510 in the above example) in some embodiments,
or at a central controller that manages the SRs in other embodiments. When the routing
protocol control plane operates within the SR, this process is performed by the SR
itself. However, the process could also be performed by a separate VM that operates
on the same host machine as the SR to perform the routing protocol control plane operations
in some embodiments. Furthermore, in other embodiments, the local network controller
on the host machine performs the routing protocol control plane operations.
[0056] As shown, the process 800 begins by receiving (at 805) a routing protocol update
packet at the protocol control plane. This could be a packet received directly from
an external router (e.g., the packet 720) or a packet received by a different SR and
forwarded to the SR that runs the routing protocol control plane. For the BGP protocol,
the update packets indicate a routable prefix (or prefixes) for which data traffic
can be sent to the router from which the packet was received, the autonomous system
number of the sending router, and the reachability distance for each routable prefix.
[0057] Based on the received update packet, the process updates (at 810) its routing table.
As described above by reference to
Figure 4, in some embodiments the routing protocol control plane is configured with an initial
routing table (i.e., RIB) for the SRs generated by the centralized network controller.
As the routing protocol control plane learns routes from external routers, it updates
this routing table.
[0058] When a new route is received, the control plane of some embodiments determines whether
the routing table already has a route with the same prefix and next hop (i.e., whether
it has already received an advertisement for the same prefix from the same external
router). When this is the case, the control plane updates this route entry to reflect
the new data (e.g., a different administrative distance). If the control plane routing
table has a route for the same prefix but with a different next hop, then in some
embodiments it stores both of the routes (as the routes might both be used if the
administrative distance is the same). However, some embodiments also identify an optimal
path for the route by choosing the route with the lowest cost (e.g., lowest administrative
distance). Thus, when the SRs connect to multiple external routers, the computational
savings of maintaining the one control plane to compare routes for the same prefix
are increased. By performing all of the updates at the single routing protocol control
plane, the duplicative updates need not be processed separately.
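A simplified sketch of the route update logic at operation 810 follows; it assumes a routing table keyed by prefix, with administrative distance as the only cost metric, which is a simplification of what an actual BGP implementation would track:

    def apply_update(routing_table, prefix, next_hop, distance):
        """Refresh an existing entry for the same prefix and next hop, otherwise store
        the additional route; keep entries sorted so the lowest-cost path is preferred."""
        routes = routing_table.setdefault(prefix, [])
        for route in routes:
            if route["next_hop"] == next_hop:
                route["distance"] = distance        # same prefix and next hop: update
                break
        else:
            routes.append({"next_hop": next_hop, "distance": distance})
        routes.sort(key=lambda r: r["distance"])    # best (lowest distance) first
        return routes

    table = {}
    apply_update(table, "1.1.10.0/28", "router-125", 20)
    apply_update(table, "1.1.11.0/28", "router-125", 20)
    apply_update(table, "1.1.11.0/28", "router-125", 20)   # duplicative: no new entry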
[0059] The process then distributes (at 815) the updated routing configuration (i.e., the
routing table updates) to all of the SRs. This includes the SR that is local to the
control plane performing the process 800, any remote SRs that sent updates to the
control plane, and any other remote SRs for the logical router. In some embodiments,
the routing protocol control plane provides the updated routing table to the local
network controllers at all of the gateway host machines that host the SRs, which allows
these controllers to calculate updated FIBs with which to provision their respective local SRs.
In other embodiments, the SR itself (e.g., the VM) performs the route traversal process
to generate the FIB based on the updated RIB. In still other embodiments, the routing
protocol control plane actually performs the route traversal process to generate an
updated FIB, and this is what is then distributed to each of the SRs.
[0060] In addition to distributing the updated routing configuration to the SRs, the process
800 also provides (at 820) the updated route information to the centralized network
controller for incorporation into the routing table of the DR of the logical router,
and subsequent distribution to the MFEs that implement the DR (including the MFEs
on the gateway host machines at which the SRs reside). Some embodiments use the local
controller on the gateway host machine that runs the routing protocol control plane
to pass this information up to the network controller that manages the logical router,
which incorporates the route updates into the RIB for the DR. This information is
then sent to the local controllers at the various host machines that implement the
logical network (e.g., the machines 305 in
Figure 3), which configure the MFEs that implement the DR (e.g., the MFEs 310). In addition,
the centralized controller sends the information regarding the DR routing table to
the local controllers at the host machines with the SR, which configure the MFEs there
that also implement the DR.
[0061] Figure 9 illustrates the data distributed by the routing protocol control plane operating
on the SR 510 to the other SRs 505 and 515, based on the updates received in
Figure 7. Specifically, as shown in
Figure 7, the routing protocol control plane received updates regarding the prefixes 1.1.10.0/28
and 1.1.11.0/28. The BGP control plane then determines whether these updates reflect
new information, and if so performs the computations to update its routing table.
For example, in this case the control plane discards the duplicative updates for the
route 1.1.11.0/28, and adds new routes for the two prefixes. The BGP control plane
then distributes these updates to the SRs 505-515. Specifically, in some embodiments
the BGP control plane distributes these updates to the local controllers (not shown)
operating on the host machines on which these SRs reside. These local network controllers
then recalculate the FIB for their respective SRs, and configure their SRs with the
new routing configuration.
[0062] Figure 10 illustrates the processing of a packet 1000 by one of the SRs 505-515 using the newly
distributed routing information, over two stages 1005 and 1010. As shown, in the first
stage 1005 the SR 505 receives a packet sent by a user VM 1015 (e.g., a VM logically
attached to a logical switch that in turn attaches to the logical router to which
the SRs 505-515 belong). In order for the SR 505 to receive the packet 1000, in some
embodiments, the user VM sends the packet to its local MFE, which performs first-hop
processing on the packet. This first-hop processing at the MFE processes the packet
through pipelines for the logical switch to which the VM connects, then the DR, and
then the transit logical switch. The transit logical switch identifies the southbound
SR interface as the destination for the packet, and thus determines that the packet
should be tunneled to the host machine on which the SR 505 resides. The MFE local to the SR 505
then completes the transit logical switch processing to deliver the packet to the
SR. The data processing pipelines of some embodiments are described in greater detail
in
U.S. Provisional Application 62/110,061 and
U.S. Patent Application 14/814,473.
[0063] Once the SR 505 receives the packet 1000, it routes the packet according to its routing
table. In this case, the routing table now has a route indicating that packets with
destination IPs in the range 1.1.11.0/28 should be sent to the external router 520.
As such, in the second stage 1010, the SR sends the packet (through its local MFE
again, in some embodiments) out of the physical interface that corresponds to the
uplink, to the external router. In this example, with only a single external router,
the dynamic routing would most likely not be needed, as the logical router (and thus
the SRs) would typically be configured with a default static route (i.e., for 0.0.0.0/0)
to send all otherwise unrouted packets to the external router. However, when the SRs
connect to multiple routers, then the default route for a particular SR might point
to a first one of the routers, whereas the route for a specific subnet (such as 1.1.11.0/28)
might point to a second router.
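The routing decision at the SR in this example amounts to a longest-prefix-match lookup over the newly distributed routing information; the following sketch is illustrative only, with the second router's address (203.0.113.2) being an assumed value:

    import ipaddress

    def lookup_next_hop(fib, dst_ip):
        """The FIB is assumed to be ordered from most specific to least specific
        prefix (see the earlier RIB-to-FIB sketch)."""
        addr = ipaddress.ip_address(dst_ip)
        for route in fib:
            if addr in ipaddress.ip_network(route["prefix"]):
                return route["next_hop"]
        return None

    fib = [{"prefix": "1.1.11.0/28", "next_hop": "203.0.113.2"},   # learned via BGP
           {"prefix": "0.0.0.0/0",   "next_hop": "203.0.113.1"}]   # static default route
    assert lookup_next_hop(fib, "1.1.11.5") == "203.0.113.2"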
C. Failover of Control Plane
[0064] With the routing protocol control plane running on only one of several SRs of a logical
router, but controlling the other SRs, failure of the control plane affects the other,
still-operating SRs. In the case that each SR operates its own BGP (or other protocol)
control plane, then failure of the BGP process on a particular SR simply means that
the SR will not attract traffic from the external routers, and the other SRs will
receive additional traffic. Similarly, the failure of the SR itself will result in
the other SRs for the logical router taking over the ingress and egress traffic, as
well as any policies configured on the uplink implemented by the failed SR. More detailed
failure scenarios are described in
U.S. Provisional Application 62/110,061 and
U.S. Patent Application 14/814,473.
[0065] When the SR that operates the control plane fails, some embodiments select one of
the other SRs to operate the routing protocol control plane. As described above, the
routing protocol process already runs on the other SRs in order to establish adjacencies
with the external routers; however, these processes do not store the routing table
to update based on incoming routes. Instead, as described in the previous sections,
the routing table is only updated by the protocol control plane that operates on one
of the SRs. The newly selected SR therefore builds up the control plane by using
the graceful restart capability of most routing protocols. That is, all of the SRs
re-establish their adjacencies as though they had crashed and restarted, which causes
the external router to re-send all of its routes to the SRs, thereby enabling the
new protocol control plane to quickly build up its routing table. In other embodiments,
the backup SRs also run the routing protocol control plane, but use higher costs when
sending out updates. This way, the external physical router will already have the
routes for an adjacency with the other SR (or other SRs) as the master control plane,
but will not use these routes due to the higher cost until the adjacency with the
original master is lost.
[0066] Figure 11 illustrates the failover of the BGP control plane for the SRs 505-515 over two stages
1105 and 1110. As shown in the first stage 1105, the SR 510 that operates the BGP
control plane for the three SRs has failed. This may be due to the VM crashing, the
entire gateway host crashing, one or more of the tunnels that connect the SR to the
other SRs (or the user VMs) going down, the connection to the physical network going
down, etc.
[0067] At this point, the other two SRs 505 and 515 identify that the second SR 510 has
crashed, and that they need to take over not only its interfaces but also the
BGP control plane. Some embodiments use a ranking system to identify which of the
other SRs takes over for a failed SR. In some embodiments, each of the SRs is assigned
a ranking at the time they are set up (e.g., by the management plane running in a
centralized controller). The SR with the next highest ranking from the failed SR then
takes over its interfaces, as well as the routing protocol control plane. In this
case, the first SR 505 has the next highest ranking compared to that of the failed
SR 510, and therefore takes over the BGP control plane.
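A minimal sketch of this ranking-based takeover follows; it assumes, purely for illustration, that a lower number denotes a higher ranking and that the rankings were assigned by the management plane at setup time. The function name and data layout are hypothetical.

# Hypothetical rankings assigned by the management plane at setup time.
# Assumption for this sketch: lower number = higher ranking.
SR_RANKS = {"SR-505": 1, "SR-510": 2, "SR-515": 3}

def select_successor(failed_sr, alive_srs):
    """Pick the surviving SR whose ranking is next highest after the failed SR's."""
    candidates = sorted(alive_srs, key=lambda sr: SR_RANKS[sr])
    # Prefer the closest higher-ranked survivor; if none exists, wrap around
    # to the highest-ranked survivor overall.
    better = [sr for sr in candidates if SR_RANKS[sr] < SR_RANKS[failed_sr]]
    return better[-1] if better else candidates[0]

print(select_successor("SR-510", ["SR-505", "SR-515"]))  # -> SR-505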
[0068] Therefore, as shown at the second stage 1110, the VM for the first SR 505 now operates
the BGP control plane for the two remaining SRs. In some embodiments, the local controller
on the host machine where the SR 505 resides identifies the failure of the SR 510
and configures the control plane process to begin running on the VM. In addition,
the local controllers on both of the host machines for the remaining SRs 505 and 515
initiate the restart process for their respective routing protocol processes. Thus,
as shown, the two SRs 505 and 515 re-establish adjacency with the external router
520 by sending new BGP Open messages. These messages include a restart state bit that
indicates this is a graceful restart. In some embodiments, this induces the router
520 to send its full list of routes to each of the SRs, allowing the control plane operating at the first SR to update its routing table.
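In BGP terms, the restart state bit described above is the "R" flag of the Graceful Restart capability (capability code 64, RFC 4724) carried in the Open message. The sketch below shows one way such a capability could be encoded; the helper is hypothetical and the embodiments do not depend on this particular encoding.

import struct

def graceful_restart_capability(restart_state: bool, restart_time: int,
                                afi: int = 1, safi: int = 1) -> bytes:
    """Encode a BGP Graceful Restart capability (RFC 4724) value.

    The first 16 bits carry 4 flag bits (the high bit is the Restart State
    "R" bit) followed by a 12-bit restart time in seconds; each supported
    address family then adds an (AFI, SAFI, flags) tuple.
    """
    flags_and_time = ((0x8 if restart_state else 0x0) << 12) | (restart_time & 0x0FFF)
    value = struct.pack("!H", flags_and_time) + struct.pack("!HBB", afi, safi, 0)
    # Wrap as a capability TLV: code 64 (Graceful Restart), length, then value.
    return struct.pack("!BB", 64, len(value)) + value

# Capability an SR could advertise when re-opening its session after failover.
print(graceful_restart_capability(restart_state=True, restart_time=120).hex())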
III. ELECTRONIC SYSTEM
[0069] Many of the above-described features and applications are implemented as software
processes that are specified as a set of instructions recorded on a computer readable
storage medium (also referred to as computer readable medium). When these instructions
are executed by one or more processing unit(s) (e.g., one or more processors, cores
of processors, or other processing units), they cause the processing unit(s) to perform
the actions indicated in the instructions. Examples of computer readable media include,
but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc.
The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.
[0070] In this specification, the term "software" is meant to include firmware residing
in read-only memory or applications stored in magnetic storage, which can be read
into memory for processing by a processor. Also, in some embodiments, multiple software
inventions can be implemented as sub-parts of a larger program while remaining distinct
software inventions. In some embodiments, multiple software inventions can also be
implemented as separate programs. Finally, any combination of separate programs that
together implement a software invention described here is within the scope of the
invention. In some embodiments, the software programs, when installed to operate on
one or more electronic systems, define one or more specific machine implementations
that execute and perform the operations of the software programs.
[0071] Figure 12 conceptually illustrates an electronic system 1200 with which some embodiments of
the invention are implemented. The electronic system 1200 can be used to execute any
of the control, virtualization, or operating system applications described above.
The electronic system 1200 may be a computer (e.g., a desktop computer, personal computer,
tablet computer, server computer, mainframe, a blade computer etc.), phone, PDA, or
any other sort of electronic device. Such an electronic system includes various types
of computer readable media and interfaces for various other types of computer readable
media. Electronic system 1200 includes a bus 1205, processing unit(s) 1210, a system
memory 1225, a read-only memory 1230, a permanent storage device 1235, input devices
1240, and output devices 1245.
[0072] The bus 1205 collectively represents all system, peripheral, and chipset buses that
communicatively connect the numerous internal devices of the electronic system 1200.
For instance, the bus 1205 communicatively connects the processing unit(s) 1210 with
the read-only memory 1230, the system memory 1225, and the permanent storage device
1235.
[0073] From these various memory units, the processing unit(s) 1210 retrieve instructions
to execute and data to process in order to execute the processes of the invention.
The processing unit(s) may be a single processor or a multi-core processor in different
embodiments.
[0074] The read-only-memory (ROM) 1230 stores static data and instructions that are needed
by the processing unit(s) 1210 and other modules of the electronic system. The permanent
storage device 1235, on the other hand, is a read-and-write memory device. This device
is a non-volatile memory unit that stores instructions and data even when the electronic
system 1200 is off. Some embodiments of the invention use a mass-storage device (such
as a magnetic or optical disk and its corresponding disk drive) as the permanent storage
device 1235.
[0075] Other embodiments use a removable storage device (such as a floppy disk, flash drive,
etc.) as the permanent storage device. Like the permanent storage device 1235, the
system memory 1225 is a read-and-write memory device. However, unlike storage device
1235, the system memory is a volatile read-and-write memory, such as a random access
memory. The system memory stores some of the instructions and data that the processor
needs at runtime. In some embodiments, the invention's processes are stored in the
system memory 1225, the permanent storage device 1235, and/or the read-only memory
1230. From these various memory units, the processing unit(s) 1210 retrieve instructions
to execute and data to process in order to execute the processes of some embodiments.
[0076] The bus 1205 also connects to the input and output devices 1240 and 1245. The input
devices enable the user to communicate information and select commands to the electronic
system. The input devices 1240 include alphanumeric keyboards and pointing devices
(also called "cursor control devices"). The output devices 1245 display images generated
by the electronic system. The output devices include printers and display devices,
such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments
include devices such as a touchscreen that function as both input and output devices.
[0077] Finally, as shown in
Figure 12, bus 1205 also couples electronic system 1200 to a network 1265 through a network
adapter (not shown). In this manner, the computer can be a part of a network of computers
(such as a local area network ("LAN"), a wide area network ("WAN"), or an Intranet),
or a network of networks, such as the Internet. Any or all components of electronic
system 1200 may be used in conjunction with the invention.
[0078] Some embodiments include electronic components, such as microprocessors, storage
and memory that store computer program instructions in a machine-readable or computer-readable
medium (alternatively referred to as computer-readable storage media, machine-readable
media, or machine-readable storage media). Some examples of such computer-readable
media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs
(CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g.,
DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM,
DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards,
etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray®
discs, ultra density optical discs, any other optical or magnetic media, and floppy
disks. The computer-readable media may store a computer program that is executable
by at least one processing unit and includes sets of instructions for performing various
operations. Examples of computer programs or computer code include machine code, such
as is produced by a compiler, and files including higher-level code that are executed
by a computer, an electronic component, or a microprocessor using an interpreter.
[0079] While the above discussion primarily refers to microprocessors or multi-core processors
that execute software, some embodiments are performed by one or more integrated circuits,
such as application specific integrated circuits (ASICs) or field programmable gate
arrays (FPGAs). In some embodiments, such integrated circuits execute instructions
that are stored on the circuit itself.
[0080] As used in this specification, the terms "computer", "server", "processor", and "memory"
all refer to electronic or other technological devices. These terms exclude people
or groups of people. For the purposes of the specification, the terms "display" or "displaying" mean displaying on an electronic device. As used in this specification, the terms
"computer readable medium," "computer readable media," and "machine readable medium"
are entirely restricted to tangible, physical objects that store information in a
form that is readable by a computer. These terms exclude any wireless signals, wired
download signals, and any other ephemeral signals.
[0081] This specification refers throughout to computational and network environments that
include virtual machines (VMs). However, virtual machines are merely one example of
data compute nodes (DCNs) or data compute end nodes, also referred to as addressable
nodes. DCNs may include non-virtualized physical hosts, virtual machines, containers
that run on top of a host operating system without the need for a hypervisor or separate
operating system, and hypervisor kernel network interface modules.
[0082] VMs, in some embodiments, operate with their own guest operating systems on a host
using resources of the host virtualized by virtualization software (e.g., a hypervisor,
virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose
which applications to operate on top of the guest operating system. Some containers,
on the other hand, are constructs that run on top of a host operating system without
the need for a hypervisor or separate guest operating system. In some embodiments,
the host operating system uses name spaces to isolate the containers from each other
and therefore provides operating-system level segregation of the different groups
of applications that operate within different containers. This segregation is akin
to the VM segregation that is offered in hypervisor-virtualized environments that
virtualize system hardware, and thus can be viewed as a form of virtualization that
isolates different groups of applications that operate in different containers. Such
containers are more lightweight than VMs.
[0083] A hypervisor kernel network interface module, in some embodiments, is a non-VM DCN
that includes a network stack with a hypervisor kernel network interface and receive/transmit
threads. One example of a hypervisor kernel network interface module is the vmknic
module that is part of the ESXi™ hypervisor of VMware, Inc.
[0084] It should be understood that while the specification refers to VMs, the examples
given could be any type of DCNs, including physical hosts, VMs, non-VM containers,
and hypervisor kernel network interface modules. In fact, the example networks could
include combinations of different types of DCNs in some embodiments.
[0085] While the invention has been described with reference to numerous specific details,
one of ordinary skill in the art will recognize that the invention can be embodied
in other specific forms. In addition, a number of the figures (including
Figures 4,
6, and
8) conceptually illustrate processes. The specific operations of these processes may
not be performed in the exact order shown and described. The specific operations may
not be performed in one continuous series of operations, and different specific operations
may be performed in different embodiments. Furthermore, the process could be implemented
using several sub-processes, or as part of a larger macro process. Thus, one of ordinary
skill in the art would understand that the invention is not to be limited by the foregoing
illustrative details, but rather is to be defined by the appended claims.
[0086] In one example there has been described a method for configuring a logical router
that interfaces with an external network, the method comprising: receiving a configuration
for a logical network comprising a logical router with a plurality of interfaces that
connect to at least one physical router external to the logical network; selecting
a separate host machine to host a centralized routing component for each of the interfaces;
and selecting a particular one of the host machines for operating a dynamic routing
protocol control plane that receives routing protocol data from each of the centralized
routing components and updates routing tables of each of the centralized routing components.
[0087] In one example, the configuration for the logical network comprises a set of logical
switch subnets to advertise via the dynamic routing protocol to the at least one physical
router. In one example, each of the interfaces connects to a same set of external
physical routers. In one example, the plurality of separate host machines are located
within a cluster of host machines designated for hosting centralized routing components
of logical routers. In one example, the method further comprises generating an initial
routing table for each of the centralized routing components. In one example, the
centralized routing components operate on the host machines as virtual machines. In
one example, the dynamic routing protocol control plane operates on the virtual machine
operating on the particular host machine. In one example, the dynamic routing protocol
control plane operates on a second virtual machine operating on the particular host
machine separate from the centralized routing component that operates on the particular
host machine. In one example, when a first centralized routing component operating on a
first host machine that is not the particular host machine receives a routing protocol
packet from an external physical router, the first centralized routing component forwards
the packet to the particular host machine through a tunnel between the first host
machine and the particular host machine. In one example, upon receiving a routing
protocol packet from at least one of (i) an external physical router and (ii) a centralized
routing component forwarding the routing protocol packet from an external physical
router, the dynamic routing protocol control plane calculates updates to the routing
tables of each of the centralized routing components and distributes the updates to
the centralized routing components. In one example, the particular host machine is
designated as an active machine for the dynamic routing protocol control plane, wherein
each of the other host machines is designated as a standby machine for the dynamic
routing protocol control plane. In one example, when the particular host machine crashes,
a designated one of the standby machines operates the dynamic routing protocol control
plane as an active machine.
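As a purely illustrative sketch of the configuration flow summarized in these examples (the data structures, names, and selection policy below are hypothetical and not part of the described method), a management plane that receives the logical router configuration, selects a gateway host per uplink interface, and designates one host for the dynamic routing protocol control plane could be modeled as follows.

from dataclasses import dataclass, field

@dataclass
class LogicalRouterConfig:
    """Hypothetical model of the received logical network configuration."""
    uplink_interfaces: list            # interfaces that connect to external routers
    advertised_subnets: list           # logical switch subnets to advertise

@dataclass
class Deployment:
    sr_hosts: dict = field(default_factory=dict)   # interface -> gateway host
    control_plane_host: str = ""                   # host running the routing protocol control plane

def configure_logical_router(config: LogicalRouterConfig, gateway_hosts: list) -> Deployment:
    deployment = Deployment()
    # Select a separate gateway host to run the centralized routing component
    # (SR) for each uplink interface.
    for iface, host in zip(config.uplink_interfaces, gateway_hosts):
        deployment.sr_hosts[iface] = host
    # Pick one of those hosts to also run the dynamic routing protocol control
    # plane (active); the remaining hosts act as standbys.
    deployment.control_plane_host = next(iter(deployment.sr_hosts.values()))
    return deployment

cfg = LogicalRouterConfig(uplink_interfaces=["U1", "U2", "U3"],
                          advertised_subnets=["10.0.1.0/24", "10.0.2.0/24"])
print(configure_logical_router(cfg, ["gw-host-1", "gw-host-2", "gw-host-3"]))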