Spanning Tree Protocol
The Spanning Tree Protocol is a network protocol that builds a loop-free logical topology for Ethernet networks. The basic function of STP is to prevent bridge loops and the broadcast radiation that results from them. Spanning tree also allows a network design to include backup links providing fault tolerance if an active link fails.
As the name suggests, STP creates a spanning tree that characterizes the relationship of nodes within a network of connected layer-2 bridges, and disables those links that are not part of the spanning tree, leaving a single active path between any two network nodes. STP is based on an algorithm that was invented by Radia Perlman while she was working for Digital Equipment Corporation.
In 2001, the IEEE introduced Rapid Spanning Tree Protocol as 802.1w. RSTP provides significantly faster recovery in response to network changes or failures, introducing new convergence behaviors and bridge port roles to do this. RSTP was designed to be backwards-compatible with standard STP.
STP was originally standardized as IEEE 802.1D but the functionality of spanning tree, rapid spanning tree, and multiple spanning tree has since been incorporated into IEEE 802.1Q-2014.
Protocol operation
The need for the Spanning Tree Protocol arose because switches in local area networks are often interconnected using redundant links to improve resilience should one connection fail. However, this configuration creates a switching loop, resulting in broadcast radiation and MAC table instability. If redundant links are used to connect switches, then switching loops need to be avoided.
To avoid the problems associated with redundant links in a switched LAN, STP is implemented on switches to monitor the network topology. Every link between switches, and in particular every redundant link, is catalogued. The spanning-tree algorithm then blocks forwarding on redundant links by setting up one preferred link between switches in the LAN. This preferred link is used for all Ethernet frames unless it fails, in which case a non-preferred redundant link is enabled. When implemented in a network, STP designates one layer-2 switch as root bridge. All switches then select their best connection towards the root bridge for forwarding and block other redundant links. All switches constantly communicate with their neighbors in the LAN using Bridge Protocol Data Units.
Provided there is more than one link between two switches, the STP root bridge calculates the cost of each path based on bandwidth. STP will select the path with the lowest cost, that is the highest bandwidth, as the preferred link. STP will enable this preferred link as the only path to be used for Ethernet frames between the two switches, and disable all other possible links by designating the switch ports that connect the preferred path as root port.
After STP-enabled switches in a LAN have elected the root bridge, all non-root bridges assign one of their ports as root port. This is either the port that connects the switch to the root bridge, or if there are several paths, the port with the preferred path as calculated by the root bridge. Because not all switches are directly connected to the root bridge, they communicate with each other using STP Bridge Protocol Data Units. Each switch adds the cost of its own path to the cost received from the neighboring switches to determine the total cost of a given path to the root bridge. Once the costs of all possible paths to the root bridge have been added up, each switch assigns as root port the port that connects to the path with the lowest cost, or highest bandwidth, that will eventually lead to the root bridge.
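The cost accumulation described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of the standard: the `PortOffer` type, the port names and the cost values are all hypothetical, and ties between equal-cost paths are ignored here.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class PortOffer:
    """What a switch knows about one of its ports (hypothetical helper type)."""
    port: str
    advertised_root_cost: int  # root path cost received from the neighbor's BPDU
    link_cost: int             # cost of this switch's own link on that port


def root_path_cost(offer: PortOffer) -> int:
    # Total cost = cost advertised by the neighbor + cost of the local link.
    return offer.advertised_root_cost + offer.link_cost


def choose_root_port(offers: list[PortOffer]) -> PortOffer:
    # Lowest total cost wins; tie-breakers are deliberately left out of this sketch.
    return min(offers, key=root_path_cost)


offers = [
    PortOffer("port1", advertised_root_cost=0, link_cost=19),  # slow direct link to the root
    PortOffer("port2", advertised_root_cost=4, link_cost=4),   # two faster hops
]
best = choose_root_port(offers)
print(best.port, root_path_cost(best))
```

With these illustrative costs the two-hop path over faster links is preferred to the slower direct link, because only the summed cost matters.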
Path cost
The STP path cost default was originally calculated by the formula 1 Gbit/s / bandwidth. When faster speeds became available, the default values were adjusted, as otherwise speeds above 1 Gbit/s would have been indistinguishable by STP. Its successor RSTP uses a similar formula with a larger numerator: 20 Tbit/s / bandwidth. These formulas lead to the sample values in the table.
Port states
All switch ports in the LAN where STP is enabled are categorized.
; Blocking: A port that would cause a switching loop if it were active. To prevent the use of looped paths, no user data is sent or received over a blocking port. BPDU data is still received in blocking state. A blocked port may go into forwarding mode if the other links in use fail and the spanning tree algorithm determines the port may transition to the forwarding state.
; Listening: The switch processes BPDUs and awaits possible new information that would cause it to return to the blocking state. It does not populate the MAC table and it does not forward frames.
; Learning: While the port does not yet forward frames, it does learn source addresses from frames received and adds them to the MAC table.
; Forwarding: A port in normal operation receiving and forwarding frames. The port monitors incoming BPDUs that would indicate it should return to the blocking state to prevent a loop.
; Disabled: A network administrator has manually disabled the switch port.
When a device is first attached to a switch port, it will not immediately start to forward data. It will instead go through a number of states while it processes BPDUs and determines the topology of the network. The port attached to a host such as a computer, printer or server always goes into the forwarding state, albeit after a delay of about 30 seconds while it goes through the listening and learning states. The time spent in the listening and learning states is determined by a value known as the forward delay. If another switch is connected, the port may remain in blocking mode if it is determined that it would cause a loop in the network.
Topology Change Notification (TCN) BPDUs are used to inform other switches of port changes. TCNs are injected into the network by a non-root switch and propagated to the root. Upon receipt of the TCN, the root switch will set the Topology Change flag in its normal BPDUs. This flag is propagated to all other switches and instructs them to rapidly age out their forwarding table entries.
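The delay before a host port starts forwarding can be worked through with a small sketch. It assumes a forward delay of 15 seconds, a value chosen here because it matches the "about 30 seconds" mentioned above. The function names are hypothetical.

```python
def time_to_forwarding(forward_delay_s: float = 15.0) -> float:
    """Listening and learning each last one forward delay (assumed 15 s default)."""
    return 2 * forward_delay_s


def port_timeline(forward_delay_s: float = 15.0) -> list:
    """Return (state, time entered in seconds) for a port that is allowed to forward."""
    return [
        ("listening", 0.0),
        ("learning", forward_delay_s),
        ("forwarding", 2 * forward_delay_s),
    ]


for state, entered_at in port_timeline():
    print(f"{entered_at:5.1f} s  {state}")
```

A port that would create a loop never follows this timeline; it stays in the blocking state instead.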
Configuration
Before configuring STP, the network topology should be carefully planned. Basic configuration requires that STP be enabled on all switches in the LAN and the same version of STP chosen on each. The administrator may determine which switch will be the root bridge and configure the switches appropriately. If the root bridge goes down, the protocol will automatically assign a new root bridge based on bridge ID. If all switches have the same bridge priority, such as the default priority, and the root bridge goes down, a tie situation arises and the protocol will assign one switch as root bridge based on the switch MAC addresses.
Once the switches have been assigned a bridge ID and the protocol has chosen the root bridge, the best path to the root bridge is calculated based on port cost, path cost and port priority. Ultimately STP calculates the path cost on the basis of the bandwidth of a link; however, links between switches may have the same bandwidth. Administrators can influence the protocol's choice of the preferred path by configuring the port cost: the lower the port cost, the more likely it is that the protocol will choose the connected link as root port for the preferred path. How other switches in the topology choose their root port, or the least-cost path to the root bridge, can be influenced by the port priority. A numerically higher port priority value means the path will ultimately be less preferred. If all ports of a switch have the same priority, the port with the lowest number is chosen to forward frames.
Root bridge and the bridge ID
The root bridge of the spanning tree is the bridge with the smallest bridge ID. Each bridge has a configurable priority number and a MAC address; the bridge ID is the concatenation of the bridge priority and the MAC address. For example, the ID of a bridge with the default priority is 32768 followed by that bridge's MAC address. The bridge priority default is 32768 and can only be configured in multiples of 4096. When comparing two bridge IDs, the priority portions are compared first and the MAC addresses are compared only if the priorities are equal. The switch with the lowest priority of all the switches will be the root; if there is a tie, then the switch with the lowest MAC address among those with the lowest priority will be the root. For example, if switches A and B both have a priority of 32768 and switch A has the lower MAC address, then switch A will be selected as the root bridge. If the network administrators would like switch B to become the root bridge, they must set its priority to be less than 32768.
Path to the root bridge
The sequence of events to determine the best received BPDU is:
- Lowest root bridge ID - Determines the root bridge
- Lowest cost to the root bridge - Favors the upstream switch with the least cost to root
- Lowest sender bridge ID - Serves as a tie breaker if multiple upstream switches have equal cost to root
- Lowest sender port ID - Serves as a tie breaker if a switch has multiple links to a single upstream switch
where:
- Bridge ID = priority + locally assigned system ID extension + MAC address; the default bridge priority is 32768, and
- Port ID = priority + port number; the default port priority is 128.
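The comparison order listed above maps naturally onto lexicographic tuple comparison. The following Python sketch assumes that representation; the `Bpdu` type, the `bridge_id` helper and all addresses are hypothetical, and the system ID extension is omitted for brevity.

```python
from typing import NamedTuple, Tuple


def bridge_id(priority: int, mac: str) -> Tuple[int, int]:
    """Bridge ID as (priority, MAC as integer); priority is compared first."""
    return (priority, int(mac.replace(":", ""), 16))


class Bpdu(NamedTuple):
    # Field order mirrors the comparison sequence, so plain tuple ordering applies.
    root_id: Tuple[int, int]
    root_path_cost: int
    sender_id: Tuple[int, int]
    sender_port_id: Tuple[int, int]  # (port priority, port number)


root = bridge_id(32768, "02:00:00:00:11:11")
a = Bpdu(root, 4, bridge_id(32768, "02:00:00:00:22:22"), (128, 1))
b = Bpdu(root, 4, bridge_id(32768, "02:00:00:00:33:33"), (128, 1))
c = Bpdu(root, 19, bridge_id(4096, "02:00:00:00:44:44"), (128, 2))

# Lowest tuple wins: same root, then lowest cost, then lowest sender bridge ID.
best = min([a, b, c])
print(best)

# Root election uses the same idea: the smallest bridge ID becomes the root.
candidates = [bridge_id(32768, "02:00:00:00:22:22"), bridge_id(32768, "02:00:00:00:11:11")]
print(min(candidates) == root)
```

Here `c` loses on cost even though its sender has the lowest priority, because root path cost is compared before the sender bridge ID.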
Breaking ties in selecting the path to the root bridge
Breaking ties for designated ports: When the root bridge has more than one port on a single LAN segment, the bridge ID is effectively tied, as are all root path costs. The designated port then becomes the port on that LAN segment with the lowest port ID. It's put into Forwarding mode while all other ports on the root bridge on that same LAN segment become non-designated ports and are put into blocking mode. Not all bridge/switch manufacturers follow this rule, instead making all root bridge ports designated ports, and putting them all in forwarding mode. A final tie-breaker is required as noted in the section "The final tie-breaker."
When more than one bridge on a segment leads to a least-cost path to the root, the bridge with the lower bridge ID is used to forward messages to the root. The port attaching that bridge to the network segment is the designated port for the segment. In the diagram on the right there are two least-cost paths from network segment d to the root, one going through bridge 24 and the other through bridge 92. The lower bridge ID is 24, so the tie breaker dictates that the designated port is the port through which network segment d is connected to bridge 24. If the bridge priorities were equal, then the bridge with the lowest MAC address would have the designated port. In either case, the loser sets the port as being blocked.
The final tie-breaker. In some cases, there may still be a tie, as when the root bridge has multiple active ports on the same LAN segment with equally low root path costs and bridge IDs, or, in other cases, multiple bridges are connected by multiple cables and multiple ports. In each case, a single bridge may have multiple candidates for its root port. In these cases, candidates for the root port have already received BPDUs offering equally-low root path costs and equally-low bridge IDs, and the final tie breaker goes to the port that received the lowest port priority ID, or port ID.
Bridge Protocol Data Units
The above rules describe one way of determining what spanning tree will be computed by the algorithm, but the rules as written require knowledge of the entire network. The bridges have to determine the root bridge and compute the port roles with only the information that they have. To ensure that each bridge has enough information, the bridges use special data frames called Bridge Protocol Data Units to exchange information about bridge IDs and root path costs.
A bridge sends a BPDU frame using the unique MAC address of the port itself as a source address, and a destination address of the STP multicast address.
There are two types of BPDUs in the original STP specification:
- Configuration BPDU, used for Spanning Tree computation
- Topology Change Notification BPDU, used to announce changes in the network topology
Bridge Protocol Data Unit fields
IEEE 802.1D and IEEE 802.1aq BPDUs have the following format:
1. Protocol ID: 2 bytes
2. Version ID: 1 byte
3. BPDU Type: 1 byte
4. Flags: 1 byte
bits : usage
1 : 0 or 1 for Topology Change
2 : 0 or 1 for Proposal in RST/MST/SPT BPDU
3-4 : 00 or
01 for Port Role Alternate/Backup in RST/MST/SPT BPDU
10 for Port Role Root in RST/MST/SPT BPDU
11 for Port Role Designated in RST/MST/SPT BPDU
5 : 0 or 1 for Learning in RST/MST/SPT BPDU
6 : 0 or 1 for Forwarding in RST/MST/SPT BPDU
7 : 0 or 1 for Agreement in RST/MST/SPT BPDU
8 : 0 or 1 for Topology Change Acknowledgement
5. Root ID: 8 bytes
bits : usage
1-4 : Root Bridge Priority
5-16 : Root Bridge System ID Extension
17-64 : Root Bridge MAC Address
6. Root Path Cost: 4 bytes
7. Bridge ID: 8 bytes
bits : usage
1-4 : Bridge Priority
5-16 : Bridge System ID Extension
17-64 : Bridge MAC Address
8. Port ID: 2 bytes
9. Message Age: 2 bytes in 1/256 secs
10. Max Age: 2 bytes in 1/256 secs
11. Hello Time: 2 bytes in 1/256 secs
12. Forward Delay: 2 bytes in 1/256 secs
13. Version 1 Length: 1 byte
14. Version 3 Length: 2 bytes
The TCN BPDU includes fields 1-3 only.
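The field list above can be turned into a small parser. The sketch below is illustrative only: it covers fields 1 to 12 (35 bytes), assumes big-endian encoding, and assumes that "bit 1" of the flags byte is the least-significant bit. The function names and the sample addresses are hypothetical.

```python
import struct

# Fields 1-12 of the list above: 2+1+1+1+8+4+8+2+2+2+2+2 = 35 bytes.
CONFIG_BPDU = struct.Struct(">HBBB8sI8sHHHHH")


def split_bridge_id(raw: bytes) -> dict:
    """Split an 8-byte ID into 4-bit priority, 12-bit extension and 48-bit MAC."""
    (head,) = struct.unpack(">H", raw[:2])
    mac = ":".join(f"{b:02x}" for b in raw[2:8])
    return {"priority": head & 0xF000, "system_id_ext": head & 0x0FFF, "mac": mac}


def parse_config_bpdu(frame: bytes) -> dict:
    (protocol_id, version, bpdu_type, flags, root_id, root_path_cost,
     bridge_id, port_id, message_age, max_age, hello_time,
     forward_delay) = CONFIG_BPDU.unpack(frame[:CONFIG_BPDU.size])
    return {
        "protocol_id": protocol_id,
        "version": version,
        "type": bpdu_type,
        "topology_change": bool(flags & 0x01),      # assumed: bit 1 = least-significant bit
        "topology_change_ack": bool(flags & 0x80),  # assumed: bit 8 = most-significant bit
        "root": split_bridge_id(root_id),
        "root_path_cost": root_path_cost,
        "bridge": split_bridge_id(bridge_id),
        "port_id": port_id,
        # Timers are carried in units of 1/256 second.
        "message_age": message_age / 256,
        "max_age": max_age / 256,
        "hello_time": hello_time / 256,
        "forward_delay": forward_delay / 256,
    }


# Build a made-up frame and parse it back.
root_raw = struct.pack(">H", 0x8000 | 1) + bytes.fromhex("020000001111")
sender_raw = struct.pack(">H", 0x8000 | 1) + bytes.fromhex("020000002222")
sample = CONFIG_BPDU.pack(0, 0, 0, 0x01, root_raw, 4, sender_raw, 0x8001,
                          1 * 256, 20 * 256, 2 * 256, 15 * 256)
parsed = parse_config_bpdu(sample)
print(parsed["root"], parsed["hello_time"])
```

Masking with `0xF000` rather than shifting keeps the priority in the familiar multiples of 4096.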
Spanning Tree Protocol standards
The first spanning tree protocol was invented in 1985 at the Digital Equipment Corporation by Radia Perlman. In 1990, the IEEE published the first standard for the protocol as 802.1D, based on the algorithm designed by Perlman. Subsequent versions were published in 1998 and 2004, incorporating various extensions. The original Perlman-inspired Spanning Tree Protocol, called DEC STP, is not a standard and differs from the IEEE version in message format as well as timer settings. Some bridges implement both the IEEE and the DEC versions of the Spanning Tree Protocol, but their interworking can create issues for the network administrator, as illustrated by the problem discussed in an on-line Cisco document.
Different implementations of a standard are not guaranteed to interwork, due for example to differences in default timer settings. The IEEE encourages vendors to provide a "Protocol Implementation Conformance Statement", declaring which capabilities and options have been implemented, to help users determine whether different implementations will interwork correctly.
Rapid Spanning Tree Protocol
In 2001, the IEEE introduced Rapid Spanning Tree Protocol as 802.1w. RSTP provides significantly faster spanning tree convergence after a topology change, introducing new convergence behaviors and bridge port roles to do this. RSTP was designed to be backwards-compatible with standard STP.
While STP can take 30 to 50 seconds to respond to a topology change, RSTP is typically able to respond to changes within 3 × Hello times or within a few milliseconds of a physical link failure. The Hello time is an important and configurable time interval that is used by RSTP for several purposes; its default value is 2 seconds.
Standard IEEE 802.1D-2004 incorporates RSTP and obsoletes the original STP standard.
Rapid Spanning Tree Operation
RSTP adds new bridge port roles in order to speed convergence following a link failure. The number of states a port can be in has been reduced to three instead of STP's original five.
RSTP bridge port roles:
- Root - A forwarding port that is the best port from non-root bridge to root bridge
- Designated - A forwarding port for every LAN segment
- Alternate - An alternate path to the root bridge. This path is different from using the root port
- Backup - A backup/redundant path to a segment where another bridge port already connects
- Disabled - Not strictly part of STP; a network administrator can manually disable a port
RSTP port states:
- Discarding - No user data is sent over the port
- Learning - The port is not forwarding frames yet, but is populating its MAC-address-table
- Forwarding - The port is fully operational
RSTP operational details:
- Detection of root switch failure is done in 3 hello times, which is 6 seconds if the default hello times have not been changed.
- Ports may be configured as edge ports if they are attached to a LAN that has no other bridges attached. These edge ports transition directly to the forwarding state. RSTP still continues to monitor the port for BPDUs in case a bridge is connected. RSTP can also be configured to automatically detect edge ports. As soon as the bridge detects a BPDU coming to an edge port, the port becomes a non-edge port.
- RSTP classifies the connection between two or more switches by its link type. A port that operates in full-duplex mode is assumed to be a point-to-point link, whereas a half-duplex port is considered a shared port by default. This automatic link type setting can be overridden by explicit configuration. RSTP improves convergence on point-to-point links by reducing the Max-Age time to 3 times the Hello interval, removing the STP listening state, and exchanging a handshake between two switches to quickly transition the port to the forwarding state. RSTP does not do anything differently from STP on shared links.
- Unlike in STP, RSTP will respond to BPDUs sent from the direction of the root bridge. An RSTP bridge will "propose" its spanning tree information to its designated ports. If another RSTP bridge receives this information and determines this is the superior root information, it sets all its other ports to discarding. The bridge may send an "agreement" to the first bridge confirming its superior spanning tree information. The first bridge, upon receiving this agreement, knows it can rapidly transition that port to the forwarding state bypassing the traditional listening/learning state transition. This essentially creates a cascading effect away from the root bridge where each designated bridge proposes to its neighbors to determine if it can make a rapid transition. This is one of the major elements that allows RSTP to achieve faster convergence times than STP.
- As discussed in the port role details above, RSTP maintains backup details regarding the discarding status of ports. This avoids timeouts if the current forwarding ports were to fail or BPDUs were not received on the root port in a certain interval.
- RSTP will revert to legacy STP on an interface if a legacy version of an STP BPDU is detected on that port.
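Some of the rules in the list above are simple enough to express directly. The following Python sketch is a loose illustration rather than the RSTP state machine: the function names are hypothetical, and it only captures failure detection after three missed hellos, link-type inference from duplex mode, and which ports are eligible for a rapid transition.

```python
from typing import Optional


def failure_detection_time(hello_time_s: float = 2.0, missed_hellos: int = 3) -> float:
    """Root failure is detected after three hello intervals without a BPDU."""
    return hello_time_s * missed_hellos


def link_type(full_duplex: bool, override: Optional[str] = None) -> str:
    """Infer the link type from duplex mode unless explicitly configured."""
    if override is not None:
        return override
    return "point-to-point" if full_duplex else "shared"


def eligible_for_rapid_transition(is_edge_port: bool, link: str) -> bool:
    # Edge ports go straight to forwarding; the proposal/agreement handshake
    # applies on point-to-point links. Shared links behave as in STP.
    return is_edge_port or link == "point-to-point"


print(failure_detection_time())
print(link_type(full_duplex=False))
print(eligible_for_rapid_transition(False, link_type(full_duplex=True)))
```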
Spanning Tree Protocol standards for VLANs
Proprietary Spanning Tree VLAN standards
Before the IEEE published a Spanning Tree Protocol standard for VLANs, a number of vendors who sold VLAN-capable switches developed their own Spanning Tree Protocol versions that were VLAN capable. Cisco developed, implemented and published the Per-VLAN Spanning Tree (PVST) proprietary protocol using its own proprietary Inter-Switch Link for VLAN encapsulation, and PVST+, which uses 802.1Q VLAN encapsulation. Both protocols implement a separate spanning tree for every VLAN. Cisco switches now commonly implement PVST+ and can only implement Spanning Trees for VLANs if the other switches in the LAN implement the same VLAN STP protocol. Very few switches from other vendors support Cisco's various proprietary protocols. HP provides PVST and PVST+ compatibility in some of its network switches. Some devices from Force10 Networks, Alcatel-Lucent, Extreme Networks, Avaya, Brocade Communications Systems and BLADE Network Technologies support PVST+. Extreme Networks does so with two limitations: lack of support on ports where the VLAN is untagged/native, and also on the VLAN with ID 1. PVST+ can tunnel across an MSTP Region.
The switch vendor Juniper Networks in turn developed and implemented its VLAN Spanning Tree Protocol (VSTP) to provide compatibility with Cisco's PVST, so that the switches from both vendors can be included in one LAN. The VSTP protocol is only supported by the EX and MX Series from Juniper Networks. There are two restrictions to the compatibility of VSTP:
- VSTP supports only 253 different spanning-tree topologies. If there are more than 253 VLANs, it is recommended to configure RSTP in addition to VSTP, and VLANs beyond 253 will be handled by RSTP.
- MVRP does not support VSTP. If this protocol is in use, VLAN membership for trunk interfaces must be statically configured.
Cisco also published a proprietary version of Rapid Spanning Tree Protocol. It creates a spanning tree for each VLAN, just like PVST. Cisco refers to this as Rapid Per-VLAN Spanning Tree.
Multiple Spanning Tree Protocol
The Multiple Spanning Tree Protocol, originally defined in IEEE 802.1s-2002 and later merged into IEEE 802.1Q-2005, defines an extension to RSTP to further develop the usefulness of virtual LANs.
In the standard, a spanning tree that maps one or more VLANs is called a multiple spanning tree (MST). If MSTP is implemented, a spanning tree can be defined for individual VLANs or for groups of VLANs. Furthermore, the administrator can define alternate paths within a spanning tree. VLANs must be assigned to a so-called multiple spanning tree instance. Switches are first assigned to an MST region, then VLANs are mapped against or assigned to this MST. A Common Spanning Tree (CST) is an MST to which several VLANs are mapped; this group of VLANs is called an MST Instance. CSTs are backward compatible with the STP and RSTP standards. An MST that has only one VLAN assigned to it is an Internal Spanning Tree.
Unlike some proprietary per-VLAN spanning tree implementations, MSTP includes all of its spanning tree information in a single BPDU format. Not only does this reduce the number of BPDUs required on a LAN to communicate spanning tree information for each VLAN, but it also ensures backward compatibility with RSTP. MSTP does this by encoding additional region information after the standard RSTP BPDU as well as a number of MSTI messages. Each of these MSTI configuration messages conveys the spanning tree information for each instance. Each instance can be assigned a number of configured VLANs and frames assigned to these VLANs operate in this spanning tree instance whenever they are inside the MST region. In order to avoid conveying their entire VLAN to spanning tree mapping in each BPDU, bridges encode an MD5 digest of their VLAN to instance table in the MSTP BPDU. This digest is then used by other MSTP bridges, along with other administratively configured values, to determine if the neighboring bridge is in the same MST region as itself.
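The idea of comparing a digest instead of the full VLAN-to-instance table can be sketched as follows. This is a simplification under stated assumptions: the table encoding (4096 two-byte entries, unmapped VLANs in instance 0) and the use of plain MD5 via `hashlib` are choices made for this illustration, the standard specifies the exact computation, and the region name and revision used as the "other administratively configured values" are hypothetical.

```python
import hashlib


def vlan_table_digest(vlan_to_instance: dict) -> bytes:
    """Digest of a VLAN-to-instance table (illustrative encoding, plain MD5)."""
    table = bytearray()
    for vlan in range(4096):
        instance = vlan_to_instance.get(vlan, 0)  # unmapped VLANs fall into instance 0
        table += instance.to_bytes(2, "big")
    return hashlib.md5(bytes(table)).digest()


def region_identity(name: str, revision: int, vlan_to_instance: dict) -> tuple:
    """Values a bridge would compare with those advertised by its neighbor."""
    return (name, revision, vlan_table_digest(vlan_to_instance))


mapping = {10: 1, 20: 1, 30: 2}
local = region_identity("campus", 1, mapping)
neighbor_same = region_identity("campus", 1, dict(mapping))
neighbor_other = region_identity("campus", 1, {10: 1, 20: 2, 30: 2})

print(local == neighbor_same)   # identical configuration: same region
print(local == neighbor_other)  # one VLAN mapped differently: different region
```

Only the 16-byte digest travels in each BPDU, so a single differing VLAN assignment is enough to place two bridges in different regions.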
MSTP is fully compatible with RSTP bridges, in that an MSTP BPDU can be interpreted by an RSTP bridge as an RSTP BPDU. This not only allows compatibility with RSTP bridges without configuration changes, but also causes any RSTP bridges outside of an MSTP region to see the region as a single RSTP bridge, regardless of the number of MSTP bridges inside the region itself. In order to further facilitate this view of an MST region as a single RSTP bridge, the MSTP protocol uses a variable known as remaining hops as a time to live counter instead of the message age timer used by RSTP. The message age time is only incremented once when spanning tree information enters an MST region, and therefore RSTP bridges will see a region as only one "hop" in the spanning tree. Ports at the edge of an MST region connected to either an RSTP or STP bridge or an endpoint are known as boundary ports. As in RSTP, these ports can be configured as edge ports to facilitate rapid changes to the forwarding state when connected to endpoints.
Shortest path bridging (SPB)
The IEEE approved the IEEE 802.1aq standard, also known as Shortest Path Bridging (SPB), in May 2012. SPB allows redundant links between switches to be active through multiple equal-cost paths, supports much larger layer-2 topologies, converges faster, and improves the use of mesh topologies through increased bandwidth between all devices by allowing traffic to load-share across all paths on a mesh network.
SPB consolidates multiple existing functionalities, including Spanning Tree Protocol, Multiple Spanning Tree Protocol, Rapid Spanning Tree Protocol, link aggregation, and Multiple MAC Registration Protocol, into a single link-state protocol. SPB is designed to virtually eliminate human error during configuration and preserves the plug-and-play nature that established Ethernet as the de facto protocol at layer 2.
System ID Extension
The bridge ID, or BID, is a field inside a BPDU packet. It is eight bytes in length. The first two bytes are the bridge priority, an unsigned integer of 0-65,535. The last six bytes are a MAC address supplied by the bridge. Prior to IEEE 802.1D-2004, the first two bytes gave a 16-bit bridge priority. Since IEEE 802.1D-2004, the first four bits are a configurable priority, and the last twelve bits carry the bridge system ID extension. In the case of MST, the bridge system ID extension carries the MSTP instance number. Some vendors set the bridge system ID extension to carry a VLAN ID allowing a different spanning tree per VLAN, such as Cisco's PVST.
Disadvantages and current practice
Spanning tree is an older protocol with a longer default hold-down time that governs convergence of the protocol state. Improper use or implementation can contribute to network disruptions. The idea of blocking links is something that customers these days do not accept as a proper high-availability solution. Modern networks can make use of all connected links by use of protocols that inhibit, control or suppress the natural behavior of logical or physical topology loops.
Switch virtualization techniques like HPE IRF, Aruba VSF and Cisco VSS combine multiple switches into a single logical entity. A multi-chassis link aggregation group works like a normal LACP trunk, only distributed through multiple switches. Conversely, partitioning technologies compartmentalize a single physical chassis into multiple logical entities.
On the edge of the network, loop-detection is configured to prevent accidental loops by users.