www.pa2jfx.nl

QoS stands for Quality of Service. The term is being used by Cisco to refer to IP-based features that allow specification and delivery of services much like the Quality of Service features in ATM.

When you get right down to it, there isn't all that much a router can do to control traffic, since it is not the originator of most of the traffic. The router can drop traffic -- although we'd prefer it didn't do so. It can put some queued frames out an interface before others. It can be selective about accepting traffic -- another form of dropped traffic. And, with TCP, it can selectively drop the occasional packet as an indirect signal to slow down. With cooperative hosts, the router can try to accept reservations and hold bandwidth for applications that need it.

Acronyms, features or topics that fall under QoS include: Priority Queuing (PQ), Custom Queuing (CQ), Fair and Weighted Fair Queuing (WFQ), Random Early Detection (RED) and its Distributed, Weighted variant (DWRED), Resource Reservation Protocol (RSVP), Traffic Shaping, Committed Access Rate (CAR), Policy Routing, QoS Policy Propagation via BGP (QPPB), NetFlow, and Cisco Express Forwarding (CEF).

The first set of functions relate to queuing, to managing congestion. They are sometimes referred to as "Fancy Queuing". These include Priority Queuing (PQ), Custom Queuing (CQ), and Weighted Fair Queuing (WFQ). These features allow the router to control which frames are sent first on an interface. If there are too many frames (congestion), then we are, in effect, also selecting which frames get dropped.

These functions, Priority Queuing (PQ), Custom Queuing (CQ), and Weighted Fair Queuing (WFQ), are the subject of this article. They are also discussed as one small part of the Cisco certified ACRC course.

The next feature on the list, Weighted Random Early Detection, is intended to prevent or reduce congestion -- trying to reduce problems, rather than mitigating the consequences once the problem has already occurred.

RSVP allows for applications to reserve bandwidth, primarily WAN bandwidth. It is designed to work with WFQ or Traffic Shaping on the outbound interface.

Traffic Shaping and Committed Access Rate (CAR) control traffic. It seems like a better acronym could have been chosen: CAR controls traffic? Anyway, CAR controls the rate of inbound traffic, allowing specification of what to do with traffic that is coming in faster than policy. Traffic Shaping paces outbound traffic, controlling use of bandwidth. Traffic Shaping also allows matching the speed of the output access link across a WAN cloud, so that a faster central hub access circuit doesn't cause carrier or remote link congestion.

CAR, Policy Routing, and QPPB can also set the IP precedence bits (TOS bits), which are used by some of the above mechanisms to favor some traffic over other traffic.

Finally, NetFlow and CEF are switching techniques used in high-performance routers. They assist in providing QoS by delivering packets efficiently and by supplying traffic statistics that can be used to manage traffic flow, size trunks, and design the network.

Priority Queuing is the oldest of the queuing techniques. Traffic is prioritized with a priority-list, applied to an interface with a priority-group command. The traffic goes into one of four queues: high, medium, normal, or low priority. When the router is ready to transmit a packet, it searches the high queue for a packet. If there is one, it gets sent. If not, the medium queue is checked. If there is a packet, it is sent. If not, the normal, and finally the low priority queues are checked. For the next packet, the process repeats. If there is enough traffic in the high queue, the other queues may get starved: they never get serviced.
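The servicing order described above can be sketched in a few lines of Python. This is only an illustration of the search order, not router code; the packet names and the next_packet helper are made up for the example.

```python
from collections import deque

# Queues are searched in this fixed order before every transmission.
QUEUE_ORDER = ("high", "medium", "normal", "low")

def next_packet(queues):
    """Return the next packet to send, or None if every queue is empty."""
    for name in QUEUE_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None

queues = {name: deque() for name in QUEUE_ORDER}
queues["low"].append("ftp-1")
queues["high"].extend(["dlsw-1", "dlsw-2"])
queues["normal"].append("web-1")

sent = []
while (pkt := next_packet(queues)) is not None:
    sent.append(pkt)

print(sent)  # ['dlsw-1', 'dlsw-2', 'web-1', 'ftp-1']
```

Because the search restarts from the high queue every time, a steady supply of high-priority packets would keep the loop from ever reaching the lower queues, which is the starvation effect mentioned above.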

You can regard Priority Queuing as being drastic. It says that the high priority traffic must go out the interface at all costs, and any other traffic can be dropped. It is generally intended for use on low bandwidth links.

To assign traffic meeting certain characteristics to a queue (high, medium, normal, or low), use one of the following commands:

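priority-list list-number protocol protocol-name {high | medium | normal | low} queue-keyword keyword-value

priority-list list-number interface interface-type interface-number {high | medium | normal | low}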

The first of these takes a protocol, like ip, ipx, appletalk, rsrb, dlsw, etc., to classify traffic. The queue-keyword can be one of: fragments, gt, lt, list, tcp, and udp. The keyword-value specifies the port for tcp or udp, or the size for gt (greater than) and lt (less than). The word list allows you to specify an access list characterizing the traffic. And fragments means just that, IP fragments (which should probably get expedited handling, so as to not have to retransmit all the fragments again if one is lost).

The second command above is similar, but classifies traffic based on the interface it arrived on.

The list-number is any number in the range 1-16. All statements in one policy use the same number.

To change the default queue for all other traffic:

priority-list list-number default {high | medium | normal | low}

To change the queue sizes from the defaults of 20, 40, 60, and 80 packets (don't go overboard on this: if you see output drops, you may make things worse):

priority-list list-number queue-limit high-limit medium-limit normal-limit low-limit

To apply the priority queueing policy for outbound packets on an interface:

priority-group list-number

Relevant EXEC Commands

show queueing priority

The following configuration sets up a priority list where DLSw traffic goes into the high priority queue, as do Telnet transmissions. The remaining IP traffic that matches access list 101 goes to the medium queue, and anything else goes in the low queue. (Standard joke: you've planned to send your boss's traffic into the low queue, to make sure the congestion gets noticed). You've mildly upped the default queue sizes. And this policy is in effect for packets being sent out serial 0.

Custom Queuing uses 17 queues to divide up bandwidth on an interface. Queue 0, the system queue, is always serviced first. It is used for keepalives and other critical interface traffic. The remaining traffic can be assigned to queues 1 through 16. These queues are serviced in round-robin fashion.

Here's how it works. Packets are sent from each queue in turn. As each packet is sent, a byte counter is incremented. When the byte counter exceeds the default or configured threshold for the queue, transmission moves on to the next queue. The byte count total for the queue that just finished has the threshold value subtracted from it, so that it starts its next turn penalized by the number of bytes that it went over its quota. This provides additional fairness to the mechanism.

If you think about it, you can't send half of a packet. That's why this mechanism might well exceed quota on any given round of transmission from a queue. But on the next round, the queue is penalized for taking more than its fair share, so in the long run it averages out.
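The byte-counter bookkeeping can be illustrated with a small Python simulation. It is a sketch under stated assumptions: the function name, the packet sizes, and the choice to reset the counter when a queue runs dry are all invented for the example, and only queues 1 and 2 are modeled.

```python
from collections import deque

def custom_queue_round(queues, thresholds, counters):
    """Service every queue once and return a list of (queue, size) sent."""
    sent = []
    for q in sorted(queues):
        # Keep sending whole packets until the byte counter reaches the threshold.
        while queues[q] and counters[q] < thresholds[q]:
            size = queues[q].popleft()
            counters[q] += size
            sent.append((q, size))
        if counters[q] >= thresholds[q]:
            counters[q] -= thresholds[q]  # carry the overshoot into the next turn
        else:
            counters[q] = 0  # queue ran dry (assumption: nothing is carried over)
    return sent

queues = {1: deque([1000] * 5), 2: deque([500] * 10)}
thresholds = {1: 1500, 2: 1500}
counters = {1: 0, 2: 0}

per_round = []
for _ in range(2):
    sent = custom_queue_round(queues, thresholds, counters)
    per_round.append({q: sum(s for qq, s in sent if qq == q) for q in thresholds})

print(per_round)  # [{1: 2000, 2: 1500}, {1: 1000, 2: 1500}]
```

Queue 1 overshoots by 500 bytes in the first round because its packets are 1000 bytes each, then sends only one packet in the second round, so both queues have sent 3000 bytes after two rounds.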

Custom Queuing is aimed at fair division of bandwidth. For instance, you might set it up to allow IP roughly 50% of a link, DLSw 25%, and IPX 25%. When congestion is taking place, the limits are enforced. If there is unused bandwidth, say from IPX, it is divided equally among any excess traffic from the other classes of traffic, IP and DLSw. To implement this, you would tweak the thresholds for the relevant queues, say making them 3000, 1500, and 1500 bytes respectively. Some fine tuning to average packet MTU size can make this more precise.

The commands for CQ are very similar to those for PQ. The difference is that you put the traffic into queues numbered 1-16, rather than named high, medium, normal, low. Hence we build our CQ policy with:

queue-list list-number protocol protocol-name queue-number queue-keyword keyword-value

queue-list list-number interface interface-type interface-number queue-number

You can specify the default queue, the one that receives any unmatched traffic, with the command:

queue-list list-number default queue-number

(The default default queue is 1).

You can specify the number of packets allowed in any queue with the command:

queue-list list-number queue queue-number limit limit-number

The threshold for a queue can be changed with the following command:

queue-list list-number queue queue-number byte-count byte-count-number

The default threshold for the queues is 1500 bytes.

And the CQ policy is applied to outbound frames on an interface with:

custom-queue-list list-number

type number

The following configuration is similar to that for PQ, except that we're not making DLSw and Telnet traffic top priority any more. Instead, we're using four (4) queues (since default traffic goes to queue 10). The thresholds are 1500, 1500, 3000, and 1500, so Telnet in queue 3 gets 3000/7500 = 40% of the bandwidth, and the other queues get 20% each.
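The percentage arithmetic is easy to check with a few lines of Python. The helper name is hypothetical, and the use of queues 1 and 2 for the two non-Telnet, non-default queues is an assumption made for the example; only queue 3 (Telnet) and queue 10 (default) come from the text.

```python
def bandwidth_shares(byte_counts):
    """Map each queue to its fraction of the total byte-count threshold."""
    total = sum(byte_counts.values())
    return {queue: count / total for queue, count in byte_counts.items()}

# Thresholds of 1500, 1500, 3000, and 1500 bytes.
shares = bandwidth_shares({1: 1500, 2: 1500, 3: 3000, 10: 1500})

for queue, share in shares.items():
    print(f"queue {queue}: {share:.0%}")
```

These shares only hold under congestion and only approximately, since whole packets are sent and actual packet sizes rarely divide the thresholds evenly.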

Weighted fair queuing automatically sorts among individual traffic streams without requiring that you first define access lists. It can manage one-way or two-way streams of data: traffic between pairs of applications, or voice and video. It automatically smooths out bursts to reduce average latency.

In WFQ, packets are sorted in weighted order of arrival of the last bit, to determine transmission order. Using order of arrival of last bit emulates the behavior of Time Division Multiplexing (TDM), hence "fair". In Frame Relay, FECN, BECN, and DE bits will cause the weights to be automatically adjusted, slowing flows if needed.

From one point of view, the effect of this is that WFQ classifies sessions as high- or low-bandwidth. Low-bandwidth traffic gets priority, with high-bandwidth traffic sharing what's left over. If the traffic is bursting ahead of the rate at which the interface can transmit, new high-bandwidth traffic gets discarded after the configured or default congestive-messages threshold has been reached. However, low-bandwidth conversations, which include control-message conversations, continue to enqueue data.

Weighted fair queuing uses some parts of the protocol header to determine flow identity. For IP, WFQ uses the Type of Service (TOS) bits, the IP protocol code, the source and destination IP addresses (if not a fragment), and the source and destination TCP or UDP ports.
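To make the idea of flow identity concrete, here is a minimal Python sketch that maps the header fields listed above onto one of the dynamic queues. The CRC32 hash is purely a stand-in chosen for the example; the hash actually used by the IOS is not described here, and the function name and sample addresses are invented.

```python
import zlib

def flow_queue(tos, protocol, src_ip, dst_ip, src_port, dst_port,
               dynamic_queues=256):
    """Pick a dynamic queue index for a flow (CRC32 is an illustrative hash)."""
    key = f"{tos}|{protocol}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % dynamic_queues

# Two packets of the same Telnet session land in the same queue.
first = flow_queue(0, 6, "10.1.1.1", "10.2.2.2", 1025, 23)
second = flow_queue(0, 6, "10.1.1.1", "10.2.2.2", 1025, 23)
print(first == second)  # True
```

The point is only that packets sharing all of these fields are treated as one conversation, while a change in any field (a different port, say) may place the traffic in a different queue.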

Distributed WFQ is available in IOS 12.0 on high-end interfaces and router models.

fair-queue [congestive-discard-threshold [dynamic-queues [reservable-queues]]]

congestive-discard-threshold: Number of messages allowed in each queue in the range 1 to 4096, default 64.

dynamic-queues: Number of dynamic queues used for best-effort conversations. Values are 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096. The default is 256.

reservable-queues: Number of reservable queues used for reserved (RSVP) conversations, range 0 to 1000. The default is 0. If RSVP is enabled on a WFQ interface with reservable-queues set to 0, the reservable queue size is automatically set to bandwidth divided by 32 Kbps. Specify a reservable-queue size other than 0 if you wish different behavior.

Fair queuing is enabled by default for physical interfaces whose bandwidth is less than or equal to 2.048 Mbps, except for Link Access Procedure, Balanced (LAPB), X.25, or Synchronous Data Link Control (SDLC) encapsulations. Enabling custom queuing or priority queuing on an interface disables fair queueing. Fair queuing is automatically disabled if you enable autonomous or SSE switching on a 7000 model. Fair queueing is now enabled automatically on multilink PPP interfaces. WFQ is not supported on tunnels.

When congestion occurs, the weight for a class or group specifies the percentage of the output bandwidth allocated to that group. A weight of 60 gives 60% of the bandwidth during congestion periods.

Start by specifying what type of fair queuing is in effect on an interface:

[no] fair-queue [ tos | qos-group ]

If you omit tos and qos-group, you get flow-based WFQ. Otherwise you get TOS (precedence)-based or QoS-group based WFQ on the interface. You then set the total number of buffered packets on the interface. Below this limit, packets will not be dropped. Default is based on bandwidth and memory space available.

fair-queue aggregate-limit <aggregate-limit>

You also specify the limit for each queue. Default is half the aggregate limit.

fair-queue individual-limit <individual-limit>

The documentation suggests you not alter the queue limits without a good reason. To specify the depth of queue for a class of traffic:

fair-queue { tos <0-7> | qos-group <0-99> } limit <queue-limit>

Finally, to specify weight (percentage of the link) for a class of traffic:

fair-queue { tos <0-7> | qos-group <0-99> } weight <weight>

The percentages on an interface must add up to no more than 99 (percent).

interface

This restores the defaults on a T1 serial link.

The following configuration sets up two QoS groups, 2 and 6, corresponding to precedences 2 and 6. It then specifies WFQ in terms of those two QoS groups.

The following configuration directly specifies WFQ based on precedences 1, 2, and 3:

Part of the new IP QoS tool kit in routers and switches is the IP Precedence bits. Devices at the edge of the network may classify traffic as deserving a certain Class of Service using these bits. Core devices can then use the bits to provide differing types of service to different flavors of traffic.

The IP Precedence bits are the first 3 bits of the IP header Type of Service (TOS) field. They are followed by 4 TOS bits and an unused bit. The Cisco documentation relating to QoS seems to sometimes be a bit sloppy as to precedence versus TOS bits, sometimes using TOS to refer to the TOS field in a rather general way. The IETF Diff-Serv group is looking at using the Precedence bits for ISP/carrier purposes, so it is not safe to assume these bits will make it through your service provider unscathed -- they may well change the bits for their own purposes.
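The layout of the TOS field can be shown with a short Python sketch that splits the byte into its three parts. The function name is invented for the example; the bit positions follow the description above (3 precedence bits, 4 TOS bits, 1 unused bit).

```python
def split_tos_byte(tos_byte):
    """Split the 8-bit TOS field into (precedence, tos_bits, unused_bit)."""
    precedence = (tos_byte >> 5) & 0x7  # first (most significant) 3 bits
    tos_bits = (tos_byte >> 1) & 0xF    # next 4 bits
    unused = tos_byte & 0x1             # last bit
    return precedence, tos_bits, unused

print(split_tos_byte(0xA0))  # (5, 0, 0): precedence 5, no TOS bits set
print(split_tos_byte(0x10))  # (0, 8, 0): precedence 0, first TOS bit set
```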

Cisco keyword values for precedence: critical, flash, flash-override, immediate, internet, network, priority, routine. Note that this list is alphabetical, not in numeric order. Following RFC 791, the numeric values are: routine 0, priority 1, immediate 2, flash 3, flash-override 4, critical 5, internet 6, and network 7. The keywords may be found in the descriptions of the IP extended access list variants.

TOS keyword values: max-reliability, max-throughput, min-delay, min-monetary-cost, normal. TOS is not in much use today. It is up to an application to ask for the Type of Service it needs. The standing joke is that by default IP traffic is unreliable, slow, and costly, since none of the relevant TOS bits is usually turned on. Even were applications to use these bits, no-one that I know of is using them to provide differentiated service, although the IP extended access lists in the Cisco IOS do allow you to do so if you wish. This is perhaps somewhat of a chicken-and-egg problem: no reason to use TOS to provide better service if there are no customers, and no reason for applications to set the bits if it isn't going to make a difference.

Random Early Detection (RED) is a high-speed congestion avoidance mechanism. It is not intended as a congestion management mechanism, the way the queuing techniques (PQ, CQ, WFQ) are. It is also more appropriate for long-haul trunks with many traffic flows, e.g. trans-oceanic links, rather than campus networks.

When enabled, RED responds to congestion by dropping packets at the selected rate. This is recommended only for TCP/IP networks with mostly TCP traffic. The drops are intended to cause TCP to back off its transmission rate.

TCP normally adapts its transmission rate to the rate the network can support. Each TCP flow repeats a cycle of ramping up to approximately the available bandwidth, then slowing to either near zero or near half the bandwidth, depending on the implementation. Thus a typical TCP flow may average between 1/2 and 3/4 of the available bandwidth, in the absence of any other traffic.
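The 1/2 and 3/4 figures follow from averaging a sawtooth. As a rough check, assuming the rate climbs linearly from its restart point back up to the full available bandwidth (a simplification of real TCP behavior), the average is just the midpoint of the ramp. The function name is invented for the example.

```python
def sawtooth_average(restart_fraction):
    """Average throughput, as a fraction of available bandwidth, of a linear
    ramp from restart_fraction up to 1.0 (assumed linear for illustration)."""
    return (restart_fraction + 1.0) / 2.0

print(sawtooth_average(0.0))  # 0.5  (flow restarts from near zero)
print(sawtooth_average(0.5))  # 0.75 (flow restarts from half the bandwidth)
```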

Multiple TCP flows tend to become synchronized, speeding up and slowing down in synchronization. This behavior is sometimes called "porpoising", because the flows surface and dive in unison, like a pod of porpoises. When congestion occurs, all TCP sessions normally get slowed down simultaneously, resulting in periods where link capacity is underutilized. By randomly slowing one TCP session, the others benefit, resulting in better goodput.

Note that dropping packets does not work with most other protocols, including AppleTalk and Novell.

When RSVP is also configured, packets from other flows are dropped before those from RSVP flows, when possible. We'll look at RSVP in a later article.

Weighted RED (WRED) allows you to specify a RED policy in combination with IP precedence, so that different types of packets are dropped at different rates and levels of congestion. You can set it so precedence is ignored, or you can set it so that lower precedence packets are more likely to be dropped. WRED is an IOS 11.1 CC or 12.0 feature.

Distributed Weighted RED (DWRED) is available in IOS 12.0 for hardware that supports it. The Distributed WRED (DWRED) feature uses the VIP rather than the RSP to perform the queuing. It requires a Cisco 7500 series router or Cisco 7200 series router with RSP.

The default is for RED to be disabled on an interface. RED is only useful on interfaces where most of the traffic is TCP. Random early detection cannot be configured on an interface already configured with custom, priority, or fair queueing. To enable RED on an interface, configure:

random-detect

You may also configure

random-detect exponential-weighting-constant constant

Here constant is a number in the range 1 to 16 used to determine the rate that packets are dropped when congestion occurs. The default is 10. The number is an exponent used in the exponential decay rate for the weighted queue size calculation used in RED. It is suggested that you change the default with caution. A big value means the queue size measurement changes slowly, making RED less responsive. The formula used for tracking queue size is:

average = (old_average * (1-1/2^n)) + (current_queue_size * 1/2^n)

Where n is the exponential weighting constant.
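The formula above translates directly into Python. This is a minimal sketch; the function name and the sample queue sizes are invented for the example.

```python
def update_average(old_average, current_queue_size, n=10):
    """One step of the RED average queue size calculation.

    n is the exponential weighting constant (default 10).
    """
    weight = 1 / 2 ** n
    return old_average * (1 - weight) + current_queue_size * weight

# Small n: the average chases the instantaneous queue size quickly.
print(update_average(0, 40, n=1))     # 20.0
print(update_average(20, 40, n=1))    # 30.0

# Default n = 10: a queue of 1024 packets moves the average by only 1.
print(update_average(0, 1024, n=10))  # 1.0
```

This shows why a big value makes RED less responsive: each new sample contributes only 1/2^n of its value to the average.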

To configure WRED on an interface, configure:

random-detect precedence <0..7> <min-thresh> <max-thresh> <mark-probability-denom>

In this command, precedence refers to the IP precedence, a number from 0 to 7. The min-thresh value is the minimum threshold in packets, from 1 to 4096. When the average queue length reaches this number, RED begins to drop packets with the specified IP precedence. The max-thresh value is the maximum threshold in packets, from 1 to 4096. When the average queue length exceeds this number, WRED drops all packets with the specified IP precedence. Finally, mark-probability-denom is the denominator for the fraction of packets dropped when the average queue depth is at max-thresh, in the range 1 to 65536. If the denominator is 512, one out of every 512 packets is dropped when the average queue is at max-thresh. The default is 10.

The per-precedence min-threshold defaults are 9/18, 10/18, ... 16/18 of the max-threshold size, for precedences 0 through 7 respectively. The max-threshold is determined based on interface speed and output buffering capacity.
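The thresholds and the mark probability denominator can be tied together in a short Python sketch. The default minimum thresholds come from the 9/18 through 16/18 rule above. The linear ramp between the two thresholds is an assumption taken from the usual description of RED rather than from anything stated here, and the function names and the max-threshold of 36 packets are invented for the example.

```python
def default_min_threshold(precedence, max_thresh):
    """Default min-threshold: 9/18 of max for precedence 0 up to 16/18 for 7."""
    return max_thresh * (9 + precedence) / 18

def drop_probability(avg, min_thresh, max_thresh, mark_prob_denom=10):
    """Drop probability for a given average queue size (linear ramp assumed)."""
    if min_thresh >= max_thresh:
        raise ValueError("min_thresh must be below max_thresh")
    if avg < min_thresh:
        return 0.0
    if avg > max_thresh:
        return 1.0  # above the maximum threshold everything is dropped
    return (avg - min_thresh) / (max_thresh - min_thresh) / mark_prob_denom

max_thresh = 36
low = default_min_threshold(0, max_thresh)   # 18.0
high = default_min_threshold(7, max_thresh)  # 32.0

# At an average queue of 27 packets, precedence 0 is already being dropped
# occasionally while precedence 7 is untouched.
print(drop_probability(27, low, max_thresh))
print(drop_probability(27, high, max_thresh))
```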

RED configuration

WRED configuration

Committed Access Rate (CAR) has two functions:

Packet Classification, using IP Precedence and QoS group setting
Access Bandwidth Management, through rate limiting

So CAR is basically the input side of Traffic Shaping (which we've talked about somewhat in a prior Frame Relay article).

Traffic is sequentially classified using pattern matching specifications, just like access lists, on a first-match basis. The pattern matched specifies what action policy rule to use, based on whether the traffic conforms. That is, if traffic is within the specified rate, it conforms, and is treated one way. Non-conforming (excess) traffic can be treated differently, usually either by giving it lower priority or by dropping it. If no rule is matched, the default is to transmit the packet. This allows you to use rules to rate limit some traffic, and allow the rest to be transmitted without any rate controls.

The possible action policy rules:

transmit
drop
continue (go to next rate-limit rule on the list)
set IP Precedence bits and transmit
set IP Precedence bits and continue
set QoS group and transmit
set QoS group and continue

IP Precedence uses the 3 bit precedence field in the IP header. This gives up to 6 Classes of Service (CoS): 0-5 can be used, but 6 and 7 are reserved per RFC791.

QoS group is an identifier within the router only. It can be set by CAR or by QPPB (see elsewhere). The QoS group is a number in the range 0 to 99, with 0 the default for unassigned packets (and not usable in assignments of QoS group).

The configurable parameters include:

committed rate (bits/second) -- in increments of 8 Kbps
normal burst size (bytes) -- how many bytes are handled in a burst above the committed rate limit without a penalty
extended burst size (bytes) -- number of bytes in an extended burst -- beyond this, packets are dropped

For traffic falling between normal and extended burst sizes, selected packets are dropped using a RED-like managed drop policy. (See RED, elsewhere).
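The conform/exceed decision can be illustrated with a simple token bucket in Python. This is a deliberately simplified sketch, not the CAR algorithm itself: it models only the committed rate and the normal burst, leaves out the extended burst and its RED-like drop region, and uses a class name and traffic pattern invented for the example.

```python
class RateLimiter:
    """Simplified token bucket: committed rate plus normal burst only."""

    def __init__(self, bps, normal_burst):
        self.rate_bytes = bps / 8    # committed rate in bytes per second
        self.burst = normal_burst    # bucket depth in bytes
        self.tokens = normal_burst   # start with a full bucket
        self.last = 0.0

    def classify(self, now, size):
        """Return 'conform' or 'exceed' for a packet of size bytes at time now."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes)
        if size <= self.tokens:
            self.tokens -= size
            return "conform"
        return "exceed"

limiter = RateLimiter(bps=64000, normal_burst=8000)
burst = [limiter.classify(0.0, 1500) for _ in range(6)]
later = limiter.classify(1.0, 1500)

print(burst)  # five 'conform' results, then 'exceed'
print(later)  # 'conform' once the bucket has refilled
```

Whatever is returned would then select the conform-action or exceed-action rule for the packet.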

It's mostly one long command, repeated over and over with various rule specifications:

[no] rate-limit {input|output}
    [access-group [rate-limit] acl-index | qos-group number]
    bps normal-burst extended-burst
    conform-action { drop|
        transmit|
        continue|
        set-prec-transmit new-prec|
        set-prec-continue new-prec|
        set-qos-group-transmit new-qos-group|
        set-qos-group-continue new-qos-group }
    exceed-action  { drop|
        transmit|
        continue|
        set-prec-transmit new-prec|
        set-prec-continue new-prec|
        set-qos-group-transmit new-qos-group|
        set-qos-group-continue new-qos-group }

The arguments bps, normal-burst, extended-burst are as noted prior to this section (committed rate in bps and burst sizes in bytes).

Traffic matches can be specified using access-lists:

access-list rate-limit acl-index {precedence | mac-address | mask prec-mask}

where acl-index is the access list number: from 1 to 99 classifies packets by precedence or precedence mask, from 100 to 199 classifies by MAC address.

And mask prec-mask is the IP precedence mask; a two-digit hexadecimal number. This is used to assign multiple precedences to the same rate-limit access list. (Precedences map to bits: precedence 0 is the 1 bit, precedence 1 the 2 bit, etc.).
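The bit mapping for prec-mask is easy to get wrong by hand, so here is a small Python sketch that builds the two-digit hexadecimal mask from a list of precedences. The function name is invented for the example.

```python
def precedence_mask(precedences):
    """Build the two-digit hex mask: precedence p sets the bit with value 2**p."""
    mask = 0
    for p in precedences:
        if not 0 <= p <= 7:
            raise ValueError(f"precedence out of range: {p}")
        mask |= 1 << p
    return f"{mask:02X}"

print(precedence_mask([0, 1, 2]))  # 07
print(precedence_mask([7]))        # 80
print(precedence_mask([1, 3, 5]))  # 2A
```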

acl-index

interface

Here's a simple sample:

And a more complex one:

Traffic Shaping comes in two forms: Generic Traffic Shaping and Frame Relay Traffic Shaping. These are found in IOS 11.2 and later.

Traffic Shaping allows you to control how fast packets are sent out an interface, any interface. You might want to do this to avoid congestion either locally or elsewhere in your network, for example if you have a network with different access rates or if you are restricting some traffic to a fraction of the available bandwidth. For example, if one end of the link in a Frame Relay network is 256 Kbps and the other end of the link is only 128 Kbps, sending packets at 256 Kbps at the very least causes congestion. Somewhere.

You can traffic shape all traffic on an interface, or use an access list to specify certain traffic. On Frame Relay interfaces, additional per-virtual-circuit features are available with Frame Relay Traffic Shaping.

Traffic shaping is not supported with optimum, distributed, or flow switching. If you enable traffic shaping, all interfaces will revert to fast switching.

traffic-shape rate bit-rate [burst-size [excess-burst-size]]

traffic-shape group access-list bit-rate [burst-size [excess-burst-size]]

The former command traffic shapes all traffic on an interface. The latter uses an access-list to specify which traffic is to be traffic shaped.

bit-rate: Bit rate that traffic is shaped to in bits per second.

burst-size: Sustained number of bits that can be transmitted per interval. The default is the bit-rate divided by 8.

excess-burst-size: Maximum number of bits that can exceed the burst size in the first interval in a congestion event. The default is equal to the burst-size.

The measurement interval is calculated by dividing the burst-size (if non-zero) by the bit rate. If the burst-size is zero, the excess-burst-size is used (if non-zero).
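The defaults and the interval calculation can be worked through in Python. This is a sketch of the arithmetic as described above; the function name is invented, and the use of integer division for the default burst size is an assumption.

```python
def shaping_parameters(bit_rate, burst_size=None, excess_burst_size=None):
    """Return (burst_size, excess_burst_size, interval_in_seconds)."""
    if burst_size is None:
        burst_size = bit_rate // 8        # default: bit-rate divided by 8
    if excess_burst_size is None:
        excess_burst_size = burst_size    # default: equal to the burst-size
    if burst_size:
        interval = burst_size / bit_rate
    elif excess_burst_size:
        interval = excess_burst_size / bit_rate
    else:
        raise ValueError("burst_size and excess_burst_size are both zero")
    return burst_size, excess_burst_size, interval

# Shaping to 128 Kbps with default burst sizes.
print(shaping_parameters(128000))  # (16000, 16000, 0.125)
```

With the defaults, the interval always works out to 1/8 of a second, because the burst size is one eighth of the bit rate.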

For Frame Relay, you can use:

traffic-shape adaptive [bit-rate]

This command uses the configured bit rate as a lower bound, with the bit rate specified by the traffic-shape rate command as the upper bound for bandwidth. The actual rate that the traffic is shaped to lies between those two rates. It should be configured at both ends of the link because it also configures the devices to reflect forward explicit congestion notifications (FECNs) as BECNs, enabling the faster end of the link to adjust to congestion at the other end.

interface

The Frame Relay traffic shaping allows

rate enforcement per PVC or SVC
dynamic traffic throttling in response to BECN packets
custom or priority queuing per virtual circuit

The intent is to allow guaranteed bandwidth for each type of traffic. The queuing features let us prioritize per-circuit, and the rate enforcement makes sure that we won't have a burst on one virtual circuit denying access line bandwidth to the others.

Policy routing is the name given to the use of a route map on packets to influence the routing decision. The routing next hop or output interface can be chosen based on inbound interface, source, or type of traffic. The IP precedence can also be set by the route map. If you're choosing outbound interface or next hop in response to destination, then you're doing normal routing, subject to some policy perhaps. Policy routing in the Cisco world refers specifically to routing based on source or other traffic characteristics, other than destination. Policy routing has a performance impact, since it is process or fast switched, so use it only where needed and appropriate. It is therefore suitable for setting precedence at low speed edge routers, but not elsewhere.

To specify use of a route-map for policy routing on an interface, configure:

ip policy route-map map-tag

The route map blocks then are defined using:

route-map map-tag [permit | deny] [sequence-number]

Route-map match conditions used for policy routing can match either packet length or an IP extended access list.

To match the Layer 3 length of the packet, use:

match length min max

To match IP sources and destinations based on standard or extended access list(s):

match ip address {access-list-number | name} [...access-list-number | name]

The route-map block's set conditions can specify precedence value, next-hop for IP routing, or output interface.

To set the precedence value in the IP header:

set ip precedence value

To specify the next hop to which to route the packet (it need not be adjacent):

set ip next-hop ip-address [... ip-address]

To specify the output interface(s) for the packet:

set interface type number [... type number]

To specify the default route next hop for use when there is no explicit route:

set ip default next-hop ip-address [... ip-address]

To specify the default output interface(s) for use when there is no explicit route:

set default interface type number [... type number]

Fast-switched policy routing supports all of the match commands and most of the set commands, except for the set ip default command and some use of the set interface command. The set interface command is supported only over point-to-point links, unless a route-cache entry exists using the same interface specified in the set interface command in the route map. When process switching policy routing, the routing table is used to check output interface sanity. During fast switching, if the packet matches, the software blindly forwards the packet to the specified interface. To configure fast-switched policy routing on an interface:

ip route-cache policy

Packets generated by the router are not normally policy-routed. To enable local policy routing of such packets, specify the route map to use. This is a global configuration mode command.

ip local policy route-map map-tag

The following example provides two sources with equal access to two different service providers. Packets arriving on serial interface 1 from 1.1.1.1 are sent to the next hop 3.3.3.3 if there is no explicit route for the packet's destination. Packets arriving from 2.2.2.2 are sent to the next hop 4.4.4.4 if there is no explicit route for the packet's destination. All other packets for which the router has no explicit route to the destination are discarded.
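The forwarding decision described in this example can be restated as a few lines of Python. This is only a model of the logic, not a router configuration; the function name and the sample routed next hop 9.9.9.9 are invented for illustration.

```python
# Default next hops per source, used only when no explicit route exists.
DEFAULT_NEXT_HOPS = {"1.1.1.1": "3.3.3.3", "2.2.2.2": "4.4.4.4"}

def choose_next_hop(source, routed_next_hop=None):
    """Return the next hop for a packet, or None if it is discarded.

    routed_next_hop is the next hop from an explicit route, if one exists.
    """
    if routed_next_hop is not None:
        return routed_next_hop             # normal routing wins
    return DEFAULT_NEXT_HOPS.get(source)   # policy default, else discard

print(choose_next_hop("1.1.1.1"))             # 3.3.3.3
print(choose_next_hop("2.2.2.2"))             # 4.4.4.4
print(choose_next_hop("1.1.1.1", "9.9.9.9"))  # 9.9.9.9
print(choose_next_hop("5.5.5.5"))             # None
```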

QoS Policy Propagation via BGP (QPPB) allows you to classify packets based on access lists, BGP community lists, and BGP AS paths. The classification can then set either IP precedence (a global tagging scheme), or internal QoS group identifier (internal to the router). The BGP community can also contain both AS and IP precedence information -- see the second example below. After classification, other QoS features such as CAR and WRED can then be used to enforce business policy. Note that this allows you to set up a policy at one BGP speaking router, and propagate that to other routers via BGP. Hence the name. This means that at the service provider router connecting to a site, a policy can be set up so that inbound traffic elsewhere is classified into the right class of service (IP Precedence bits). This can then interact with Tag Switching, or MPLS. If you set the QoS group ID, it can then be used for rate-limiting or WFQ based on QoS group ID. This expands on the classes of service provided by the 8 IP precedence values. If you use IP precedence, it can now be set based on source or destination address.

community-list community-list-number

prefix

Configuring QPPB on an interface:

Configuring BGP to set QoS groups:

Configuring BGP to set TOS bits (precedence):
