Dump

From Network Security Wiki

Windows Script Backup ScreenOS

@echo off
REM ================================================================
REM === This script may report the following errors:
REM === FATAL ERROR: Network error: Connection timed out ==> check the IP address
REM === FATAL ERROR: Network error: Connection refused ==> check the SSH settings on the firewall
REM === WARNING - POTENTIAL SECURITY BREACH! ==> the firewall's SSH host key has changed
REM === Access denied ==> wrong password
REM ================================================================
REM === The number of "Access denied" messages equals the number of firewalls with a wrong password
REM ================================================================
REM ===Configurable Parameters
REM ================================================================
set username=aman
set CFGFILE=BackupList.txt
set DESTDIR=Backups\
REM ================================================================
REM ===Script code starts here
REM ================================================================
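REM TIMESTAMP becomes YYYY.MM.DD when %date% ends in DD/MM/YYYY (with MM/DD/YYYY, month and day are swapped)
SET TIMESTAMP=%date:~-4,4%.%date:~-7,2%.%date:~-10,2%
REM Create the dated backup folder once, before looping over the firewalls
IF NOT EXIST "%DESTDIR%%TIMESTAMP%" mkdir "%DESTDIR%%TIMESTAMP%"
REM Each line of CFGFILE is: name,IP address,password
for /F "tokens=1,2,3 delims=," %%A in (%CFGFILE%) do (
    plink -ssh -C -batch -pw %%C %username%@%%B get config > "%DESTDIR%%TIMESTAMP%\%%A.cfg"
)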
echo Backup completed

"BackupList.txt" file:

R1,192.168.1.3,cisco
R2,192.168.1.4,cisco2

Plink (the PuTTY command-line connection tool) must be downloaded and available on the PATH for this script to work.
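
The list parsing and command construction in the batch script can also be sketched in Python, which makes them easier to test. This is only an illustrative sketch: the function names `parse_backup_list` and `build_plink_command` are hypothetical, the plink options are simply the ones used in the batch file, and the code builds the command lines without running them.

```python
import csv
import io


def parse_backup_list(text):
    """Parse BackupList.txt content: one 'name,ip,password' entry per line."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        name, ip, password = (field.strip() for field in row[:3])
        entries.append((name, ip, password))
    return entries


def build_plink_command(username, ip, password):
    """Return the plink argument list used by the batch script."""
    return ["plink", "-ssh", "-C", "-batch", "-pw", password,
            f"{username}@{ip}", "get", "config"]


sample = "R1,192.168.1.3,cisco\nR2,192.168.1.4,cisco2\n"
for name, ip, password in parse_backup_list(sample):
    print(name, build_plink_command("aman", ip, password))
```

The argument list could be passed to `subprocess.run` with its output redirected to `<name>.cfg`; that step is left out here so the sketch has no side effects.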

HA Best Practices

Basic:

1.	Both firewalls should have the same hardware (model, modules, RAM, ports, etc.).
2.	Firmware should be exactly the same, i.e. major, minor and patch version.
3.	Licenses and features on both firewalls should be the same (Basic, Advanced, AV, DI, AS, Web filtering, etc.).
4.	A firewall with an expired license should not be clustered with a firewall that has no license, as the two may fall out of sync and show different free memory.
5.	It is recommended to configure the cluster with 2 dedicated HA links.
6.	The VSD group should be 0. If it is not 0, interfaces must be assigned to that VSD group on both firewalls.
7.	Console access is always recommended when configuring, implementing or troubleshooting NSRP.
8.	The firewalls should have different hostnames so that the devices can be told apart.

Preempt:

1.	Preempt should be enabled.
2.	The hold-down timer should be set to a high value (~120-180 seconds) to prevent NSRP failover flapping.
3.	Preempt need not be configured on the backup device.
4.	Preempt should not be configured in environments running dynamic routing protocols, because each failback forces the protocols to re-converge.
5.	The preferred backup should be given the higher priority value, because the lower priority value takes precedence.

Interface Monitoring:

1.	Only add critical interfaces to Monitoring to avoid unnecessary failovers/preempts.

Track-IP:

1.	Track-IP is needed for a successful failover when the primary Juniper firewall stops passing traffic but its monitored interfaces remain up, a failure that interface monitoring alone cannot detect.
2.	Determine one or more hosts that can reliably respond to ICMP/ARP traffic.

Master-Always-Exist:

1.	With NSRP monitoring enabled, both NSRP peers can become 'Inoperable'. Enabling the master-always-exist option ensures that the cluster remains available.
2.	Run the command "set nsrp vsd-group master-always-exist" only on the master; it syncs to the backup automatically.

Secondary-path:

1.	To avoid split brain, configure NSRP with 2 dedicated HA links.
2.	The secondary-path option allows NSRP to poll the peer via an alternate, non-dedicated interface. Its only purpose is to prevent a split-brain scenario, so this link carries only heartbeat messages, not NSRP sync data.

RTO Sync:

1.	Backup Session Timeout Acknowledge should be enabled.
2.	Route Synchronization should not be used unless a dynamic routing protocol is running.

HA probe:

1.	HA probe must be enabled if the HA links are connected through a layer 2 switch.
2.	It should NOT be used if they are directly connected.
3.	Duplex settings on the switch and firewall interfaces should match.

Authentication & Encryption password:

1.	Use NSRP Authentication & Encryption if the HA cables connect through a layer 2 switch.
2.	No need to use Authentication & Encryption if they are directly connected.

Misc:

1.	When adding the secondary firewall to a cluster, an interface-based default route such as "set interface <interface name> gateway <gateway address>" results in loss of communication, because the interface becomes inactive. Add a regular default route before proceeding.
2.	Duplicate MAC addresses are seen when two NSRP clusters with the same cluster ID and VSD group are attached to the same switch or are in the same broadcast domain. Changing the cluster ID or VSD group number resolves the issue.


KBs Referred:

http://kb.juniper.net/KB9311
http://kb.juniper.net/KB9309
http://kb.juniper.net/KB11432

SRX Import config

1. load (merge | replace | ...) terminal
(paste the configuration in curly-brace format)
Ctrl+D
commit


2. edit
set ...
set ... (enter the set commands one by one)
commit

Juniper SRX Firewalls

run = prefix used in configuration mode to execute operational-mode commands

//Show Routes
show route brief
show route best x.x.x.x
set routing-options static route 10.2.2.0/24 next-hop 10.1.1.254
//Forwarding Table
run show route forwarding-table destination x.x.x.x/24

//TraceOptions settings
root@fw1# show security flow | display set
set security flow traceoptions file matt_trace
set security flow traceoptions file files 3
set security flow traceoptions file size 100000
set security flow traceoptions flag basic-datapath
set security flow traceoptions packet-filter f0 source-prefix 10.0.0.1/32 destination-prefix 200.1.2.3/32
set security flow traceoptions packet-filter f1 source-prefix 200.1.2.3/32 destination-prefix 10.0.0.1/32
activate security flow traceoptions
commit
monitor start matt_trace
monitor list

!! Kill the capture
monitor stop <captureFileName>
clear log <captureFileName>            !! Clear the log file
delete security flow traceoptions
commit
file delete <captureFileName>

//Show Traceoptions
show security flow session source-prefix 10.124.80.42 destination-prefix 117.1.1.25
start shell

egrep 'matched filter|(ge|fe|reth)-.*->.*|session found|create session|dst_xlate|routed|search|denied|src_xlate|outgoing phy if' /var/log/matt_trace | sed -e 's/.*RT://g' | sed -e 's/tcp, flag 2 syn/--TCP SYN--/g' | sed -e 's/tcp, flag 12 syn ack/--TCP SYN\/ACK--/g' | sed -e 's/tcp, flag 10/--TCP ACK--/g' | sed -e 's/tcp, flag 4 rst/--TCP RST--/g' | sed -e 's/tcp, flag 14 rst/--TCP RST\/ACK--/g' | sed -e 's/tcp, flag 18/--TCP PUSH\/ACK--/g' | sed -e 's/tcp, flag 11 fin/--TCP FIN\/ACK--/g' | sed -e 's/tcp, flag 5/--TCP FIN\/RST--/g' | sed -e 's/icmp, (0\/0)/--ICMP Echo Reply--/g' | sed -e 's/icmp, (8\/0)/--ICMP Echo Request--/g' | sed -e 's/icmp, (3\/0)/--ICMP Destination Unreachable--/g' | sed -e 's/icmp, (11\/0)/--ICMP Time Exceeded--/g' | awk '/matched/ {print "\n\t\t\t=== PACKET START ==="}; {print};'

//Show Sessions
run show security flow session destination-prefix x.x.x.x

//Match Policy
run show security match-policies from-zone zonea to-zone zoneb source-ip x.x.x.x destination-ip x.x.x.x protocol tcp source-port 1024 destination-port xx

//Check for Block Group
show security policies from-zone untrust to-zone trust | display set | match deny

//Find Syntax for an Existing Command
show | display set | match xxxxxxxxx

//VPN Troubleshooting
show security ike security-associations [index <ID>] [detail]
show security ipsec security-associations [index <ID>] [detail]
show security ipsec statistics [index <ID>]

//VPN
//Set proxy IDs for a route-based tunnel
set security ipsec vpn vpn-name ike proxy-identity local 10.0.0.0/8 remote 192.168.1.0/24 service any

//Packet Capture
set security datapath-debug capture-file my-capture
set security datapath-debug capture-file format pcap
set security datapath-debug capture-file size 1m
set security datapath-debug capture-file files 5
set security datapath-debug maximum-capture-size 400
set security datapath-debug action-profile do-capture event np-ingress packet-dump
set security datapath-debug packet-filter my-filter action-profile do-capture
set security datapath-debug packet-filter my-filter source-prefix 1.2.3.4/32

//Super SRX Packet Capture Filter
egrep 'matched filter|(ge|fe|reth)-.*->.*|session found|Session \(id|session id|create|dst_nat|chose interface|dst_xlate|routed|search|denied|src_xlate|dip id|outgoing phy if|route to|DEST|post' /var/log/mchtrace | uniq | sed -e 's/.*RT://g' | awk '/matched/ {print "\n\t\t\t=== PACKET START ==="} ; {print} ;' | awk '/^$/ {print "\t\t\t=== PACKET END ==="}; {print};' ; echo | awk '/^$/ {print "\t\t\t=== PACKET END ==="}; {print};'

// Policy commands

show | display set (shows policy)
set system syslog
set security log
set interfaces ge-0/0/3 gigether-options auto-negotiation (redundant-parent)
set security policies from-zone xxx to-zone xxx policy policy_name match
set security zones security-zone untrust address-book address
set security nat source rule-set zone-to-zone rule rule-source-nat match source-address 10.0.0.0
set routing-instances
set applications

set security ike proposal
set security ike policy
set security ike gateway
set security ipsec proposal
set security ipsec policy
set security ipsec vpn

show | compare
commit check
commit comment "ticket#2222" and-quit

set security policies from-zone dmz to-zone trust policy 12 match source-address h_10.124.0.1 destination-address h_1.2.3.4 application tcp_22
set security policies from-zone dmz to-zone trust policy 12 then permit
set security policies from-zone dmz to-zone trust policy 12 then log session-init session-close

+     policy 12 {
+         match {
+             source-address h_10.124.0.1;
+             destination-address h_1.2.3.4;
+             application tcp_22;
+         }
+         then {
+             permit;
+             log {
+                 session-init;
+                 session-close;
+             }
+         }
+     }

 

Various:
show system uptime 	Uptime
show version 	Version of platform (host/model)
show chassis firmware 	Firmware loaded on FPCs
show system software detail 	
show chassis routing-engine 	CPU, Memory for Routing-Engine
show chassis fan 	Speed and status of fans
show chassis environment 	Temperature status of components
show chassis hardware detail 	Hardware inventory (backplane)
show system core-dumps 	Core-dumps
show system alarms 	System alarms
show chassis alarms 	Alarms for hardware and chassis
show system boot-messages 	Logs from boot sequence
show log chassisd 	Logs for SRX chassis (Cards)
show log messages 	Recent system messages
show configuration security log 	Syslog configuration
show system buffers 	Utilization of memory buffers
show system virtual-memory 	Virtual memory utilization
show system processes 	Processes running on system
show security idp memory 	IDP memory statistics
show security monitoring performance session 	Session counts on each FPC

MIP in a policy-based VPN

KB9924

This workaround is for configuring a Mapped IP (MIP) address in a policy-based VPN. MIPs are typically created on tunnel interfaces in a route-based VPN; the workaround applies when the customer requirement does not allow for a route-based VPN.

Customer requirements:

A site-to-site VPN tunnel between a Juniper firewall and a Cisco.
The Cisco Peer IP address and the Remote subnet must use the same Public IP address.
MIPs need to be configured for the servers behind the Juniper Firewall.

For these requirements, a route-based VPN on the Juniper firewall is not an option because a route is needed to the remote network pointing to the tunnel interface. If the peer IP and remote IP addresses are the same for both devices, the IKE negotiation cannot be established.

A policy-based VPN can be configured for this design, since only a default route is needed and then a policy can be used to determine the VPN.

On the Juniper firewall, a MIP needs to be configured for the servers on the private network, which need to be accessed via a VPN from the Cisco site. However, MIPs are not directly supported in policy-based VPN.

If the outgoing interface is in a zone other than Untrust (for example, zone is ISP), follow KB27122- [ScreenOS] How to configure a MIP in a policy based VPN when outgoing interface is in zone other than Untrust

Untrust-Tun is the tunnel-type zone, the carrier zone used for encryption and decryption:

set interface tunnel.1 zone Untrust-Tun

Fixed IP on the tunnel interface:

set interface tunnel.1 ip 4.4.4.10/24

The MIP will be used by the remote Cisco network to connect to the server on the Juniper firewall's local network:

set interface tunnel.1 mip 4.4.4.11 host 20.20.20.5 netmask 255.255.255.255

A route needs to be added to send the traffic to the tunnel interface:

set route 25.34.5.7/32 interface tunnel.1

Phase 1 configuration:

set ike gateway Netscreen-Cisco-IKE address 25.34.5.7 main outgoing-interface ethernet4 preshare test sec-level standard

Phase 2 configuration:

set vpn Netscreen-Cisco-VPN gateway Netscreen-Cisco-IKE sec-level standard

Bind Tunnel Zone (Juniper firewall will recognize the MIP configured on the tunnel interface):

set vpn Netscreen-Cisco-VPN bind zone Untrust-Tun

Then an appropriate access-list needs to be configured on the Cisco end to support the Proxy-IDs generated by the policies in the Juniper firewall.

set policy from untrust to trust 2.2.2.2/32 MIP (4.4.4.11) any tunnel vpn Netscreen-Cisco-VPN log
set policy from trust to untrust 20.20.20.5/32 2.2.2.2/32 any tunnel vpn Netscreen-Cisco-VPN log

Note:

The MIP will work in only one direction.  
If traffic needs to be initiated from the Netscreen Trust zone over the tunnel and that traffic must use NAT, 
then a DIP is required, and the DIP cannot use the same IP as the MIP.  
This is a limitation.  If a bi-directional MIP is required a route based VPN must be used.

Workaround if outgoing is other than Untrust Zone

If the outgoing interface is in a zone other than Untrust (for example, zone is ISP) proceed with following:

set zone "ISP"
set interface ethernet0/2 zone "ISP"

ISP is the zone for outgoing interface ethernet0/2:

set interface ethernet0/2 ip 1.1.1.1/24

ISP-Tun zone is the carrier zone for the tunnel for NAT-ing:

set zone "ISP-Tun" tunnel ISP

ISP-Tun is the tunnel-type zone, the carrier zone used for encryption and decryption:

set interface tunnel.1 zone ISP-Tun

Fixed IP on the tunnel interface:

set interface tunnel.1 ip 4.4.4.10/24

MIP will be used by the remote network to connect to server behind the ScreenOS firewall's local network:

set interface tunnel.1 mip 4.4.4.11 host 20.20.20.5 netmask 255.255.255.255

A route needs to be added to send the traffic to the tunnel interface; for the translation to take place:

set route 6.7.8.9/32 interface tunnel.1

Phase 1 configuration:

set ike gateway Netscreen-IKE address 2.2.2.2 main outgoing-interface ethernet0/2 preshare test sec-level standard

Phase 2 configuration:

set vpn Netscreen-VPN gateway Netscreen-IKE sec-level standard

Bind Tunnel Zone (ScreenOS firewall will identify the MIP configured on the tunnel interface):

set vpn Netscreen-VPN bind zone ISP-Tun

Then an appropriate access-list must be configured on the remote end to support the Proxy-IDs generated by the policies in the ScreenOS firewall.

set policy from ISP to trust 6.7.8.9/32 MIP (4.4.4.11) any tunnel vpn Netscreen-VPN log
set policy from trust to ISP 20.20.20.5/32 6.7.8.9/32 any tunnel vpn Netscreen-VPN log

get sa detail

CORPORATE-> get sa
total configured sa: 1
HEX ID    Gateway  Port Algorithm     SPI      Life:sec kb    Sta PID vsys
00000001< 2.2.2.2  500  esp:3des/sha1 c2e1f0e4 3296     unlim A/- -1  0
00000001> 2.2.2.2  500  esp:3des/sha1 74098e47 3296     unlim A/- -1  0

We can see that the remote peer is 2.2.2.2. The State shows A/-. The possible states are below:

I/I SA Inactive. VPN is currently not connected.
A/- SA is Active, VPN monitoring is not enabled
A/D SA is Active, VPN monitoring is enabled but failing thus DOWN
A/U SA is Active, VPN monitoring is enabled and UP
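
As a small illustration, the state codes can be looked up programmatically when many "get sa" rows need to be checked. This is a sketch only: the function name `describe_sa_line` is hypothetical, and it assumes the whitespace-separated column order shown in the sample output (HEX ID, Gateway, Port, Algorithm, SPI, Life:sec, kb, Sta, PID, vsys).

```python
SA_STATES = {
    "I/I": "SA inactive; VPN is currently not connected",
    "A/-": "SA active; VPN monitoring is not enabled",
    "A/D": "SA active; VPN monitoring is enabled but failing (DOWN)",
    "A/U": "SA active; VPN monitoring is enabled and UP",
}


def describe_sa_line(line):
    """Return (gateway, state, description) for one row of 'get sa' output."""
    fields = line.split()
    # Assumed column positions: index 1 is the gateway, index 7 is the Sta column.
    gateway, state = fields[1], fields[7]
    return gateway, state, SA_STATES.get(state, "unknown state")


row = "00000001< 2.2.2.2 500 esp:3des/sha1 c2e1f0e4 3296 unlim A/- -1 0"
print(describe_sa_line(row))
```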

Gateway IP address for Next Hop

Why is it necessary to specify 'Gateway IP address for Next Hop' during the configuration of static default route?

Scenario I
Next-hop gateway IP address is not specified in the static default route.
SSG-> set route 0.0.0.0/0 int eth0/1 

SSG-> get db st

route to 4.2.2.2
cached arp entry with MAC 000000000000 for 4.2.2.2
add arp entry with MAC 000000000000 for 4.2.2.2 to cache table
wait for arp rsp for 4.2.2.2
ifp2 ethernet0/1, out_ifp ethernet0/1, flag 10000e00, tunnel ffffffff, rc 0
outgoing wing prepared, not ready

SSG-> get route | i 4.2.2.2
* 16 0.0.0.0/0 eth0/1 0.0.0.0 S 20 1 Root

Because the next-hop IP address is not specified in the default route, the firewall is doing an ARP for 4.2.2.2.

When the firewall needs to forward a packet via the default route, it needs the MAC address of the default router in order to build the frame to forward the packet.

The reason for the failure is that the firewall is waiting for an ARP response from 4.2.2.2, as if it were on a connected segment. This is indicated by 'wait for arp rsp for 4.2.2.2'; the response never arrives.

It then drops the packet with the message 'outgoing wing prepared, not ready', which indicates that there is no ARP response.

Scenario II
Next-hop gateway IP address is specified in the static default route.
SSG-> set route 0.0.0.0/0 int eth0/1 gateway 1.1.1.2

SSG-> get db st

route to 1.1.1.2
cached arp entry with MAC 000000000000 for 1.1.1.2
add arp entry with MAC 002688e8c305 for 1.1.1.2 to cache table
arp entry found for 1.1.1.2
ifp2 ethernet0/1, out_ifp ethernet0/1, flag 10800e00, tunnel ffffffff, rc 1
outgoing wing prepared, ready

SSG-> get route | i 4.2.2.2
* 15 0.0.0.0/0 eth0/1 1.1.1.2 S 20 1 Root

In this scenario, the firewall found the MAC address for the next-hop gateway (ISP router with ip 1.1.1.2) in its ARP table.

It was then able to build the frame and forward the packet to the ISP router, which in turn routed the packet to its next hop, until the packet reached the destination IP 4.2.2.2.
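
The difference between the two scenarios comes down to which IP address the firewall tries to resolve with ARP. A minimal sketch of that decision follows; the function name `arp_target` is hypothetical, and the interface network 1.1.1.0/24 is an assumption (the debug output only shows the gateway 1.1.1.2 on ethernet0/1).

```python
import ipaddress


def arp_target(destination, interface_network, gateway=None):
    """Return the IP address that must be resolved via ARP to forward a packet."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(interface_network):
        return dest  # directly connected: ARP for the destination itself
    if gateway is not None:
        return ipaddress.ip_address(gateway)  # ARP for the next-hop router
    # No next hop configured: the destination is treated as if it were on-link,
    # so the firewall ARPs for an address that will never answer.
    return dest


print(arp_target("4.2.2.2", "1.1.1.0/24"))                     # Scenario I
print(arp_target("4.2.2.2", "1.1.1.0/24", gateway="1.1.1.2"))  # Scenario II
```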

SRX Stuck on old technology

The SRX uses stateful inspection, which relies on port and protocol for policy decisions. This technique is ineffective at controlling applications that use dynamic ports, use encryption, or tunnel across commonly allowed ports to bypass firewalls.


Stateful Inspection

This solution allows calls to come from any port on an inside machine, and will direct them to port 25 on the outside.

So why is it wrong?

Our defined restriction is based solely on the outside host's port number, which we have no way of controlling. An enemy can now access any internal machine and port by originating his call from port 25 on the outside machine.

What can be a better solution?

A better solution is to also check the TCP ACK flag. The ACK signifies that the packet is part of an ongoing conversation. Packets without the ACK are connection establishment messages, which we permit only from internal hosts.
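
The two filtering approaches can be compared with a small sketch. Everything here is illustrative: the `Packet` fields and the function names `naive_filter` and `ack_filter` are hypothetical and only model the rule described in this section (mail on port 25), not any real firewall's rule syntax.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    direction: str   # "out" (inside -> outside) or "in" (outside -> inside)
    src_port: int
    dst_port: int
    ack: bool        # True if the TCP ACK flag is set


def naive_filter(pkt):
    """Port-only rule: trusts the outside host's source port."""
    if pkt.direction == "out":
        return pkt.dst_port == 25
    return pkt.src_port == 25


def ack_filter(pkt):
    """Same rule, but inbound packets must also carry the ACK flag."""
    if pkt.direction == "out":
        return pkt.dst_port == 25
    return pkt.src_port == 25 and pkt.ack


# An attacker opens a new connection (no ACK) from source port 25 to an internal port.
attack_syn = Packet(direction="in", src_port=25, dst_port=23, ack=False)
# A legitimate reply from an outside mail server to an inside client.
reply = Packet(direction="in", src_port=25, dst_port=40000, ack=True)

print(naive_filter(attack_syn), ack_filter(attack_syn))  # True False
print(naive_filter(reply), ack_filter(reply))            # True True
```

The port-only rule admits the attacker's connection attempt, while the ACK check rejects it and still passes the legitimate reply.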


Sub interface number

The maximum permissible sub-interface number on a Juniper SSG140 firewall is 100; the firewall accepts only a number in the range 1-100. Sub-interface names on Juniper NetScreen firewalls look like eth0/1.50 or eth0/2.100. A name such as eth0/2.101 or eth0/2.200 is not accepted.

Window size smaller than MTU

If the window size is smaller than the MTU, packet retransmissions will occur. This is an application issue: the receive buffer is smaller than the packets being received.

File:Small-buffer.pcap

Certificates

A session symmetric key between two parties is used only once.

The symmetric (shared) key in the Diffie-Hellman method is $K = g^{xy} \bmod p$.

In public-key cryptography, everyone has access to everyone’s public key; public keys are available to the public.


Our example uses small numbers, but note that in a real situation, the numbers are very large. Assume that $g = 7$ and $p = 23$. The steps are as follows:

1. Alice chooses $x = 3$ and calculates $R_1 = 7^3 \bmod 23 = 21$.
2. Alice sends the number 21 to Bob.
3. Bob chooses $y = 6$ and calculates $R_2 = 7^6 \bmod 23 = 4$.
4. Bob sends the number 4 to Alice.
5. Alice calculates the symmetric key $K = 4^3 \bmod 23 = 18$. Bob calculates the symmetric key $K = 21^6 \bmod 23 = 18$.

The value of $K$ is the same for both Alice and Bob: $g^{xy} \bmod p = 7^{18} \bmod 23 = 18$.
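
The arithmetic of this example can be checked with Python's built-in three-argument `pow`, which performs modular exponentiation. This is a toy sketch with the small numbers from the example; the function name `diffie_hellman` is hypothetical, and real deployments use very large primes and randomly chosen secrets.

```python
def diffie_hellman(g, p, x, y):
    """Return (R1, R2, K as computed by Alice, K as computed by Bob)."""
    r1 = pow(g, x, p)        # Alice's public value, sent to Bob
    r2 = pow(g, y, p)        # Bob's public value, sent to Alice
    k_alice = pow(r2, x, p)  # Alice combines Bob's value with her secret x
    k_bob = pow(r1, y, p)    # Bob combines Alice's value with his secret y
    return r1, r2, k_alice, k_bob


print(diffie_hellman(g=7, p=23, x=3, y=6))  # (21, 4, 18, 18)
print(pow(7, 3 * 6, 23))                    # 18, i.e. g^(xy) mod p
```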


Public Announcement: The naive approach is to announce public keys publicly. Bob can put his public key on his website or announce it in a local or national newspaper. When Alice needs to send a confidential message to Bob, she can obtain Bob’s public key from his site or from the newspaper, or even send a message to ask for it. This approach, however, is not secure; it is subject to forgery. For example, Eve could make such a public announcement. Before Bob can react, damage could be done. Eve can fool Alice into sending her a message that is intended for Bob. Eve could also sign a document with a corresponding forged private key and make everyone believe it was signed by Bob. The approach is also vulnerable if Alice directly requests Bob’s public key. Eve can intercept Bob’s response and substitute her own forged public key for Bob’s public key.


A CSR (certificate signing request) contains the public key.

The CA signs it.

A certificate is proof of ownership of a public key.

The sender encrypts using the receiver's public key, and the receiver decrypts using its private key.

There are two types of certificate authorities (CAs), root CAs and intermediate CAs.

   Certificate 1 - Issued To: example.com; Issued By: Intermediate CA 1
   Certificate 2 - Issued To: Intermediate CA 1; Issued By: Intermediate CA 2
   Certificate 3 - Issued To: Intermediate CA 2; Issued By: Intermediate CA 3
   Certificate 4 - Issued To: Intermediate CA 3; Issued By: Root CA 

Root CA certificates, on the other hand, are "Issued To" and "Issued By" themselves.

For enhanced security purposes, most end user certificates today are issued by intermediate certificate authorities.

Installing an intermediate CA signed certificate on a web server or load balancer usually requires installing a bundle of certificates.

The CA will also provide a so-called intermediate CA file or chain certificate. It proves that your chosen CA is trusted by one of the root CAs. You will need the intermediate CA certificate as the 'chain' certificate in your clientssl profile.
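
The "Issued To" / "Issued By" relationship in the chain listed earlier can be sketched as a simple walk from the server certificate up to the self-signed root. This only models the names; the function `build_chain` and the `ISSUED_BY` table are hypothetical, and no signatures are verified.

```python
def build_chain(leaf, issued_by):
    """Follow 'Issued By' links from a leaf certificate up to a self-signed root."""
    chain = [leaf]
    current = leaf
    while True:
        issuer = issued_by.get(current)
        if issuer is None:
            raise ValueError(f"no issuer known for {current!r}: chain is incomplete")
        if issuer == current:
            return chain  # issued to and by itself: this is the root CA
        if issuer in chain:
            raise ValueError("loop in certificate chain")
        chain.append(issuer)
        current = issuer


ISSUED_BY = {
    "example.com": "Intermediate CA 1",
    "Intermediate CA 1": "Intermediate CA 2",
    "Intermediate CA 2": "Intermediate CA 3",
    "Intermediate CA 3": "Root CA",
    "Root CA": "Root CA",
}

print(" -> ".join(build_chain("example.com", ISSUED_BY)))
```

If an intermediate is missing from the table, the walk fails, which mirrors why a server normally has to present the whole bundle of intermediate certificates.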

A nonce is a number used once.


In an asymmetric key encryption scheme, anyone can encrypt messages using the public key, but only the holder of the paired private key can decrypt. Security depends on the secrecy of the private key.

In the Diffie–Hellman key exchange scheme, each party generates a public/private key pair and distributes the public key. After obtaining an authentic copy of each other's public keys, Alice and Bob can compute a shared secret offline. The shared secret can be used, for instance, as the key for a symmetric cipher.


  • Public-key encryption, in which a message is encrypted with a recipient's public key. The message cannot be decrypted by anyone who does not possess the matching private key, who is thus presumed to be the owner of that key and the person associated with the public key. This is used in an attempt to ensure confidentiality.
  • Digital signatures, in which a message is signed with the sender's private key and can be verified by anyone who has access to the sender's public key. This verification proves that the sender had access to the private key, and therefore is likely to be the person associated with the public key. This also ensures that the message has not been tampered with, as any manipulation of the message will result in changes to the encoded message digest, which otherwise remains unchanged between the sender and receiver.


TCP

Source: TCP/IP Protocol-Suite, B.Forouzan

  • TCP uses the services of IP, a connectionless protocol, but itself is connection-oriented.
  • TCP uses the services of IP to deliver individual segments to the receiver, but it controls the connection itself.
  • If a segment is lost or corrupted, it is retransmitted. IP is unaware of this retransmission.
  • If a segment arrives out of order, TCP holds it until the missing segments arrive; IP is unaware of this reordering.
  • The sequence number of a segment is the number of the first data byte it carries.
  • Together with the segment length, this tells the receiver which bytes each segment carries.

TCP Connection

  • TCP transmits data in full-duplex mode.
  • When two TCPs in two machines are connected, they are able to send segments to each other simultaneously.
  • In TCP, connection-oriented transmission requires three phases:
      • Connection Establishment
      • Data Transfer
      • Connection Termination

Connection Establishment

Three way handshake

  • Server program tells its TCP that it is ready to accept a connection.
  • This request is called a Passive Open.
  • The client program issues a request for an active open.
  • TCP can now start the three-way handshaking process
3way handshake.jpg
1st Packet
  • SYN segment is for synchronization of sequence numbers.
  • The client chooses a random number as the first sequence number called Initial Sequence Number(ISN) and sends it to the Server.
  • This segment does not contain an Acknowledgment Number.
  • It does not define the window size either; a window size definition makes sense only when a segment includes an Acknowledgment.
  • This can include some options - WSF, MSS, SACK_PERM
  • The SYN segment is a control segment and carries no data; however, it consumes one sequence number.
  • When the data transfer starts, the ISN is incremented by 1.
  • We can say that the SYN segment carries no real data, but we can think of it as containing one imaginary byte.


2nd Packet
  • The server sends a SYN + ACK segment with two flag bits set: SYN and ACK.
  • This segment has a dual purpose.
  • First, it is a SYN segment for communication in the other direction.
  • The server uses this segment to initialize a sequence number for numbering the bytes sent from the server to the client.
  • The server also acknowledges the receipt of the SYN segment from the client by setting the ACK flag and displaying the next sequence number it expects to receive from the client.
  • Because it contains an acknowledgment, it also needs to define the receive window size, rwnd, to be used by the client.


3rd Packet
  • The client sends the third segment which is just an ACK segment.
  • It acknowledges the receipt of the second segment with the ACK flag and Acknowledgment Number field.
  • The sequence number in this segment is the client's ISN + 1, the value the server acknowledged in the SYN+ACK segment; the ACK segment itself does not consume any sequence numbers.
  • The client must also define the server window size.
  • Third segment usually does not carry data and consumes no sequence numbers.


Note
  • A SYN cannot carry data, but it consumes one Sequence number.
  • A SYN+ACK cannot carry data, but consumes one Sequence number.
  • An ACK, if carrying no data, consumes no sequence number.
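
The sequence and acknowledgment numbers exchanged during the handshake can be sketched as follows. This follows standard TCP behaviour, in which a SYN consumes one sequence number, so the third segment carries the client's ISN + 1. The function name `three_way_handshake` and the ISN values 8000 and 15000 are arbitrary illustrations, and 32-bit wrap-around is ignored.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    # The server acknowledges the SYN: it expects client_isn + 1 next.
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn, "ack": client_isn + 1}
    # The client's SYN consumed one sequence number; the server's SYN did too.
    ack = {"flags": "ACK", "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=8000, server_isn=15000):
    print(segment)
```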


Simultaneous Open

  • A simultaneous open is the rare situation in which both processes issue an active open.
  • In this case, both TCPs transmit a SYN + ACK segment to each other.
  • Only one single connection is established between them.


SYN Flooding Attack

  • The TCP handshake is susceptible to a SYN flooding attack.
  • This happens when a malicious attacker sends a large number of SYN segments, typically with spoofed source IP addresses.
  • The server, assuming that the clients are issuing an active open, allocates the necessary resources and sets timers.
  • The TCP server then sends the SYN+ACK segments to the fake clients, which are lost.
  • While the server waits for the third packet, resources are allocated without being used.
  • If the number of SYN segments is large, the server eventually runs out of resources.
  • It may be unable to accept connection requests from valid clients.
  • The SYN flooding attack belongs to the group of denial-of-service attacks.
  • One strategy is to postpone resource allocation until the server can verify that the connection request is coming from a valid IP address, by using a Cookie.
  • SCTP uses this strategy.
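
The cookie idea can be sketched for TCP as follows: instead of storing state for each SYN, the server derives its initial sequence number from the connection details and a secret, and recomputes it when the final ACK arrives. This is a simplified illustration under stated assumptions: the names `SECRET`, `make_cookie` and `verify_ack` are hypothetical, and details that real implementations add, such as time-based expiry, are omitted.

```python
import hashlib
import hmac

SECRET = b"example-secret"  # hypothetical server-side secret


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn):
    """Derive a 32-bit server ISN from the connection details; nothing is stored."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}:{client_isn}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def verify_ack(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number):
    """Accept the final ACK only if it acknowledges cookie + 1."""
    expected = (make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn) + 1) % 2**32
    return ack_number == expected


cookie = make_cookie("10.0.0.1", 40000, "200.1.2.3", 80, client_isn=8000)
good_ack = (cookie + 1) % 2**32
print(verify_ack("10.0.0.1", 40000, "200.1.2.3", 80, 8000, good_ack))
```

Resources are allocated only after `verify_ack` succeeds, so SYN segments from spoofed addresses, which never complete the handshake, cost the server no memory.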


Data Transfer

  • After the connection is established, bidirectional data transfer can take place.
  • The client and server can send data and acknowledgments in both directions.
  • Data traveling in the same direction as an acknowledgment are carried on the same segment.
  • The acknowledgment is piggybacked with the data.


Connection Termination

  • Either of the two parties exchanging data (client or server) can close the connection, although it is usually initiated by the client.
  • Most implementations today allow two options for connection termination:
      • Three-way termination
      • Four-way termination with a half-close option


Three-Way Termination

Three-way termination.jpg
1st Packet
  • The client TCP, after receiving a close command from the client process, sends the FIN segment.
  • A FIN segment can include the last chunk of data sent by the client or it can be just a control segment.
  • If it is only a control segment, it consumes only one sequence number.
2nd Packet
  • The server TCP, after receiving the FIN, informs its process.
  • It then sends a FIN+ACK to confirm the receipt of the FIN from the client and to announce the closing of the connection in the other direction.
  • This segment can also contain the last chunk of data from the server.
  • If it does not carry data, it consumes only one sequence number.
3rd Packet
  • The client TCP sends an ACK segment to confirm the receipt of the FIN from the TCP server.
  • This segment contains the acknowledgment number, which is one plus the sequence number received in the FIN segment from the server.
  • This segment cannot carry data and consumes no sequence numbers.


Note
  • The FIN segment consumes one sequence number if it does not carry data.
  • The FIN + ACK segment consumes one sequence number if it does not carry data.


Half-Close

  • In TCP, one end can stop sending data while still receiving data. This is called a Half-Close.
  • Either the server or the client can issue a half-close request.
  • It can occur when the server needs all the data before processing can begin.
  • An example is sorting.
  • When the client sends data to the server to be sorted, the server needs to receive all the data before sorting can start.
  • This means the client, after sending all data, can close the connection in the client-to-server direction.
  • However, the server-to-client direction must remain open to return the sorted data.
  • The server, after receiving the data, still needs time for sorting; its outbound direction must remain open.
Half-close.jpg
  • The data transfer from the client to the server stops.
  • The client half-closes the connection by sending a FIN segment.
  • The server accepts the half-close by sending the ACK segment.
  • The server, however, can still send data.
  • When the server has sent all of the processed data, it sends a FIN segment, which is acknowledged by an ACK from the client.
  • After half closing the connection, data can travel from server to client and acknowledgments can travel from client to server.
  • The client cannot send any more data to the server.


Connection Reset

  • TCP at either end may:
      • Deny a connection request
      • Abort an existing connection
      • Terminate an idle connection
  • All of these are done with the RST flag.

State Transition


State        Description
CLOSED       No connection exists
LISTEN       Passive open received; waiting for SYN
SYN-SENT     SYN sent; waiting for ACK
SYN-RCVD     SYN+ACK sent; waiting for ACK
ESTABLISHED  Connection established; data transfer in progress
FIN-WAIT-1   First FIN sent; waiting for ACK
FIN-WAIT-2   ACK to first FIN received; waiting for second FIN
CLOSE-WAIT   First FIN received, ACK sent; waiting for application to close
TIME-WAIT    Second FIN received, ACK sent; waiting for 2MSL time-out
LAST-ACK     Second FIN sent; waiting for ACK
CLOSING      Both sides decided to close simultaneously


State Transition.jpg
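
The states in the table can be tied together with a small lookup table. The sketch below is a simplified model that covers only the common open and close paths; the event labels (such as `recv_SYN+ACK/send_ACK`) and the helper `run` are names invented for this illustration, and transitions caused by RST or time-outs other than 2MSL are left out.

```python
# (current state, event) -> next state. Event names are illustrative labels only.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open/send_SYN"): "SYN-SENT",
    ("LISTEN", "recv_SYN/send_SYN+ACK"): "SYN-RCVD",
    ("SYN-SENT", "recv_SYN+ACK/send_ACK"): "ESTABLISHED",
    ("SYN-RCVD", "recv_ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close/send_FIN"): "FIN-WAIT-1",
    ("ESTABLISHED", "recv_FIN/send_ACK"): "CLOSE-WAIT",
    ("FIN-WAIT-1", "recv_ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv_FIN/send_ACK"): "CLOSING",
    ("FIN-WAIT-2", "recv_FIN/send_ACK"): "TIME-WAIT",
    ("CLOSING", "recv_ACK"): "TIME-WAIT",
    ("CLOSE-WAIT", "close/send_FIN"): "LAST-ACK",
    ("LAST-ACK", "recv_ACK"): "CLOSED",
    ("TIME-WAIT", "2MSL_timeout"): "CLOSED",
}

def run(start, events):
    """Follow a list of events from a start state and return every state visited."""
    state = start
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError means an unmodelled transition
        path.append(state)
    return path

# A client that opens the connection and later closes it first.
client_path = run("CLOSED", [
    "active_open/send_SYN",
    "recv_SYN+ACK/send_ACK",
    "close/send_FIN",
    "recv_ACK",
    "recv_FIN/send_ACK",
    "2MSL_timeout",
])
print(" -> ".join(client_path))
```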

Maximum Segment Life[edit]

  • The TCP standard defines MSL as being a value of 120 seconds (2 minutes).
  • In modern networks TCP allows implementations to choose a lower value.
  • The common value for MSL is between 30 seconds and 1 minute.
  • The MSL is the maximum time a segment can exist in the Internet before it is dropped.
  • TCP segment is encapsulated in an IP datagram, which has a limited lifetime (TTL).
  • When the IP datagram is dropped, the encapsulated TCP segment is also dropped.


TIME-WAIT state and 2MSL timer[edit]

There are two reasons for the existence of the TIME-WAIT state and the 2MSL timer:

1st Reason
  • If the last ACK segment is lost, the server TCP, which sets a timer for the last FIN, assumes that its FIN is lost and resends it.
  • If the client goes to the CLOSED state and closes the connection before the 2MSL timer expires, it never receives this resent FIN segment, and consequently, the server never receives the final ACK.
  • The server cannot close the connection.
  • The 2MSL timer makes the client wait long enough for an ACK to be lost (one MSL) and a resent FIN to arrive (another MSL).
  • If a new FIN arrives during the TIME-WAIT state, the client sends a new ACK and restarts the 2MSL timer.
2nd Reason
  • A duplicate segment from one connection might appear in the next one.
  • Assume a client and a server have closed a connection.
  • After a short period of time, they open a connection with the same socket addresses (same source and destination IP addresses and same source and destination port numbers).
  • This new connection is called an incarnation of the old one.
  • A duplicated segment from the previous connection may arrive in this new connection and be interpreted as belonging to the new connection if there is not enough time between the two connections.
  • To prevent this problem, TCP requires that an incarnation cannot occur unless 2MSL amount of time has elapsed.
  • Some implementations, however, ignore this rule if the initial sequence number of the incarnation is greater than the last sequence number used in the previous connection.


TCP Windows[edit]

  • TCP uses two windows for each direction of data transfer:
  1. Send window
  2. Receive window
  • This means four windows for bidirectional communication.

Send Window[edit]

Send Window.jpg
  • The window shown here is of size 100 bytes (normally thousands of bytes).
  • The send window size is dictated by the receiver (flow control) and the congestion in the underlying network (congestion control).
  • The figure shows how a send window opens, closes, or shrinks.

Receive Window[edit]

Receive Window.jpg
  • TCP allows the receiving process to pull data at its own pace.
  • This means that part of the allocated buffer at the receiver may be occupied by bytes that have been received and acknowledged, but are waiting to be pulled by the receiving process.
  • The receive window size is therefore always smaller than or equal to the buffer size.
  • The receiver window size determines the number of bytes that the receive window can accept from the sender before being overwhelmed (flow control).
rwnd = buffer size − number of waiting bytes to be pulled

Flow Control[edit]

  • Flow control balances the rate a producer creates data with the rate a consumer can use the data.
  • TCP separates flow control from error control.
Flow Control.jpg
  • Data travels from the sending process to the sending TCP, then to the receiving TCP, and finally to the receiving process (paths 1, 2, and 3).
  • Flow control feedback travels from the receiving TCP to the sending TCP and from the sending TCP up to the sending process (paths 4 and 5).
  • Most implementations of TCP do not provide flow control feedback from the receiving process to the receiving TCP; they let the receiving process pull data from the receiving TCP whenever it is ready.
  • Thus receiving TCP controls the sending TCP; the sending TCP controls the sending process.
  • Flow control feedback from the Sending TCP to the Sending Process (path 5) is achieved through simple rejection of data by sending TCP when its window is full.
  • Windows are used to achieve flow control from the receiving TCP to the sending TCP, as discussed in the next section.


Opening and Closing Windows[edit]

  • To achieve flow control, TCP forces the sender and the receiver to adjust their window sizes.
  • The size of the buffer for both parties is fixed when the connection is established.
  • The receive window closes (moves its left wall to the right) when more bytes arrive from the sender;
  • It opens (moves its right wall to the right) when more bytes are pulled by the process.
  • Assume that it does not shrink (the right wall does not move to the left).
  • The opening, closing, and shrinking of the send window is controlled by the receiver.
  • The send window closes (moves its left wall to the right) when a new acknowledgement allows it to do so.
  • The send window opens (its right wall moves to the right) when the RWND advertised by the receiver allows it to do so.
Open Close Window.jpg

The diagram shows 8 segments:

1. The client sends the server a SYN to request a connection and announces its ISN = 100. When the SYN arrives, the server allocates a buffer of 800 bytes (an assumption) and sets its window to cover the whole buffer (rwnd = 800). The number of the next byte to arrive starts from 101.

2. This is an ACK + SYN segment. The segment uses ack no = 101 to show that it expects to receive bytes starting from 101. It also announces that the client can set a buffer size of 800 bytes.

3. The third segment is an ACK segment from client to server.

4. After the client has set its window with the size (800) dictated by the server, the process pushes 200 bytes of data. The TCP client numbers these bytes 101 to 300. It creates a segment and sends it to server. The segment has starting byte number as 101 and the segment carries 200 bytes. The window of client is then adjusted to show 200 bytes of data are sent but waiting for acknowledgment. When this segment is received at the server, the bytes are stored, and the receive window closes to show that the next byte expected is byte 301; the stored bytes occupy 200 bytes of buffer.

5. The fifth segment is the feedback from the server to the client. The server acknowledges bytes up to and including 300 (expecting to receive byte 301). The segment also carries the size of the receive window after decrease (600). The client, after receiving this segment, purges the acknowledged bytes from its window and closes its window to show that the next byte to send is byte 301. The window size decreases to 600 bytes. Although the allocated buffer can store 800 bytes, the window cannot open (moving its right wall to the right) because the receiver does not let it.

6. Sent by the client after its process pushes 300 more bytes. The segment defines seq no as 301 and contains 300 bytes. When this segment arrives at the server, the server stores them, but it has to reduce its window size. After its process has pulled 100 bytes of data, the window closes from the left for the amount of 300 bytes, but opens from the right for the amount of 100 bytes. The result is that the size is only reduced 200 bytes. The receiver window size is now 400 bytes.

7. The server acknowledges the receipt of data, and announces that its window size is 400. When this segment arrives at the client, the client has no choice but to reduce its window again and set the window size to the value of rwnd = 400. The send window closes from the left by 300 bytes, and opens from the right by 100 bytes.

8. This one is also from the server after its process has pulled another 200 bytes. Its window size increases. The new rwnd value is now 600. The segment informs the client that the server still expects byte 601, but the server window size has expanded to 600. After this segment arrives at the client, the client opens its window by 200 bytes without closing it. The result is that its window size increases to 600 bytes.
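
The server-side numbers in segments 4 to 8 follow directly from rwnd = buffer size − number of waiting bytes. The sketch below replays them; the class `ReceiveWindow` and its method names are hypothetical, and only the receiver's bookkeeping is modelled.

```python
class ReceiveWindow:
    """Receiver-side bookkeeping: rwnd = buffer size - bytes waiting to be pulled."""

    def __init__(self, buffer_size, first_byte):
        self.buffer_size = buffer_size
        self.next_expected = first_byte   # value placed in the ackNo field
        self.waiting = 0                  # received and acknowledged, not yet pulled

    @property
    def rwnd(self):
        return self.buffer_size - self.waiting

    def receive(self, nbytes):
        """Bytes arrive from the sender: the window closes from the left."""
        if nbytes > self.rwnd:
            raise ValueError("sender exceeded the advertised window")
        self.waiting += nbytes
        self.next_expected += nbytes

    def pull(self, nbytes):
        """The process pulls bytes: the window opens from the right."""
        self.waiting -= min(nbytes, self.waiting)


server = ReceiveWindow(buffer_size=800, first_byte=101)
server.receive(200)                          # segment 4: bytes 101 to 300
print(server.next_expected, server.rwnd)     # advertised in segment 5
server.receive(300)                          # segment 6: bytes 301 to 600
server.pull(100)                             # process pulls 100 bytes
print(server.next_expected, server.rwnd)     # advertised in segment 7
server.pull(200)                             # process pulls another 200 bytes
print(server.next_expected, server.rwnd)     # advertised in segment 8
```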

Shrinking of Windows
  • The receive window cannot shrink.
  • The send window can shrink if the receiver defines a value for rwnd that results in shrinking the window.


Window Shutdown[edit]

  • Shrinking the send window by moving its right wall to the left is discouraged.
  • There is one exception: the receiver can temporarily shut down the window by sending a RWND of 0.
  • This can happen if the receiver does not want to receive data from the sender for a while.
  • The sender does not actually shrink the size of the window, but stops sending data until a new advertisement arrives.
  • Even when the window is shut down by an order from the receiver, the sender can always send a segment with 1 byte of data.
  • This is called Probing and is used to prevent a deadlock.


Silly Window Syndrome [2][edit]
  • A serious problem can arise in the sliding window operation when either the sending application program creates data slowly or the receiving application program consumes data slowly, or both.
  • Any of these situations results in the sending of data in very small segments, which reduces the efficiency of the operation.
  • If TCP sends segments containing only 1 byte of data, it means that a 41-byte datagram (20 bytes TCP header and 20 bytes IP header) transfers only 1 byte of user data.
  • The ratio of bytes sent to bytes of user data is 41:1.
  • The inefficiency is even worse after accounting for the data link layer and physical layer overhead.


Syndrome due to Sender
  • The sending TCP may create a silly window syndrome if it is serving an application program that creates data slowly (e.g., 1 byte at a time).
  • The application program writes 1 byte at a time into the buffer of the sending TCP.
  • If the sending TCP does not have any specific instructions, it may create segments containing 1 byte of data.
  • The result is a lot of 41-byte datagrams traveling through an internet.
  • The solution is to prevent the sending TCP from sending the data byte by byte.
  • The sending TCP must be forced to wait and collect data to send in a larger block.
  • If it waits too long, it may delay the process.
  • If it does not wait long enough, it may end up sending small segments.


Solution - Nagle’s Algorithm[3]
  • The sending TCP sends the first piece of data it receives from the sending application program even if it is only 1 byte.
  • After sending the first segment, the sending TCP accumulates data in the output buffer and waits until either the receiving TCP sends an acknowledgment or until enough data has accumulated to fill a maximum-size segment.
  • The second step is repeated for the rest of the transmission.
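
The steps above can be sketched as a small sender model. This is an illustration under simplifying assumptions, not how any real stack is written: the class `NagleSender` is hypothetical, and a single `unacked` flag stands in for real per-byte acknowledgment tracking.

```python
class NagleSender:
    """Toy sender: a partial segment is sent only when nothing is awaiting an ACK."""

    def __init__(self, mss):
        self.mss = mss
        self.buffer = b""
        self.unacked = False   # True while a sent segment awaits its ACK
        self.sent = []         # segments handed to the network, kept for inspection

    def write(self, data):
        """The application writes data into the output buffer."""
        self.buffer += data
        self._try_send()

    def on_ack(self):
        """An acknowledgment arrives from the receiving TCP."""
        self.unacked = False
        self._try_send()

    def _try_send(self):
        # Send when a maximum-size segment is available, or when nothing is in flight.
        while self.buffer and (len(self.buffer) >= self.mss or not self.unacked):
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            self.sent.append(segment)
            self.unacked = True


sender = NagleSender(mss=10)
for byte in (b"a", b"b", b"c", b"d"):   # application writes 1 byte at a time
    sender.write(byte)
sender.on_ack()                         # ACK for the first segment arrives
sender.write(b"0123456789")             # a full MSS is sent without waiting
print(sender.sent)
```

The first byte goes out alone, the next three are accumulated and sent together when the ACK arrives, and a full-size block is sent at once.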


Syndrome Created by the Receiver
  • The syndrome may also occur if the receiving TCP is serving an application that consumes data slowly (e.g., 1 byte at a time).
  • Assume that the sender creates data in blocks of 1000 bytes, but the receiver consumes data 1 byte at a time.
  • Also assume that the input buffer of the receiving TCP is 4 kilobytes. The sender sends the first 4 kilobytes of data.
  • The receiver stores it in its buffer.
  • Now its buffer is full.
  • It advertises a window size of zero, which means the sender should stop sending data.
  • The receiving application reads the first byte of data from the input buffer of the receiving TCP.
  • Now there is 1 byte of space in the incoming buffer.
  • The receiving TCP announces a window size of 1 byte, which means that the sending TCP takes this advertisement as good news and sends a segment carrying only 1 byte of data.
  • The procedure will continue.
  • One byte of data is consumed and a segment carrying 1 byte of data is sent.
  • This is again an efficiency problem.


Two solutions are possible
Clark’s Solution
  • Announce a window size of zero until either
  1. There is enough space to accommodate a segment of maximum size
  2. At least half of the receive buffer is empty.
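
Clark's rule reduces to a single comparison. In this minimal sketch the function name `advertised_window` and its parameters are hypothetical, and the 1000-byte maximum segment size is an assumption chosen to match the 1000-byte blocks in the example above.

```python
def advertised_window(free_space, buffer_size, mss):
    """Return the window to announce under Clark's solution.

    Announce zero until the free space can hold a maximum-size segment
    or at least half of the receive buffer is empty.
    """
    if free_space >= mss or free_space >= buffer_size / 2:
        return free_space
    return 0


# A 4096-byte buffer with an assumed MSS of 1000 bytes.
for free in (1, 999, 1000, 2048):
    print(free, advertised_window(free, buffer_size=4096, mss=1000))
```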


Delayed Acknowledgment
  • The second solution is to delay sending the acknowledgment.
  • This means that when a segment arrives, it is not acknowledged immediately.
  • The receiver waits until there is a decent amount of space in its incoming buffer before acknowledging the arrived segments.
  • The delayed acknowledgment prevents the sending TCP from sliding its window.
  • After the sending TCP has sent the data in the window, it stops.
  • This removes the syndrome.
  • Delayed acknowledgment also has another advantage: it reduces traffic.
  • The receiver does not have to acknowledge each segment.
  • However, there also is a disadvantage in that the delayed acknowledgment may result in the sender unnecessarily retransmitting the unacknowledged segments.
  • TCP limits this risk by requiring that an acknowledgment not be delayed by more than 500 ms.

Error Control[edit]

  • TCP is a reliable transport layer protocol.
  • This means that an application program that delivers a stream of data to TCP relies on TCP to deliver the entire stream to the application program on the other end in order, without error, and without any part lost or duplicated.
  • TCP provides reliability using error control. Error control includes mechanisms for detecting and resending corrupted segments, resending lost segments, storing out-of-order segments until missing segments arrive, and detecting and discarding duplicated segments.
  • Error control in TCP is achieved through the use of three simple tools: checksum, acknowledgment, and time-out.


Checksum[edit]

  • Each segment includes a checksum field, which is used to check for a corrupted segment.
  • If a segment is corrupted, as detected by an invalid checksum, the segment is discarded by the destination TCP and is considered lost.
  • TCP uses a 16-bit checksum that is mandatory in every segment.
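
The 16-bit checksum is the one's complement of the one's-complement sum of 16-bit words. The sketch below computes it over an arbitrary byte string; the function name `internet_checksum` is hypothetical, and the pseudo-header that a real TCP checksum also covers is not constructed here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16) # fold carries back into the low 16 bits
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))

# The receiver recomputes over data plus checksum; a result of 0 means no error detected.
print(internet_checksum(payload + checksum.to_bytes(2, "big")))
```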


Acknowledgment[edit]

  • TCP uses acknowledgments to confirm the receipt of data segments.
  • Control segments that carry no data but consume a sequence number are also acknowledged.
  • ACK segments are never acknowledged.


Acknowledgment Type

There are two types of acknowledgment:


Cumulative Acknowledgment (ACK)
  • TCP was originally designed to acknowledge receipt of segments cumulatively.
  • The receiver advertises the next byte it expects to receive, ignoring all segments received and stored out of order.
  • Also called Positive Cumulative Acknowledgment or ACK.
  • "Positive" indicates that no feedback is provided for discarded, lost, or duplicate segments.
  • The 32-bit ACK field in the TCP header is used for cumulative acknowledgments.
  • Its value is valid only when the ACK flag bit is set to 1.


Selective Acknowledgment (SACK)
  • A SACK does not replace ACK, but reports additional information to the sender.
  • A SACK reports a block of data that is out of order.
  • It also reports a block of data that is duplicated, i.e., received more than once.
  • There is no provision in the TCP header for adding this type of information.
  • SACK is implemented as an option at the end of the TCP header.


Acknowledgment Generation

1. When end A sends a data segment to end B, it must include (piggyback) an acknowledgment that gives the next sequence number it expects to receive. This rule decreases the number of segments needed and therefore reduces traffic.

2. When the receiver has no data to send and it receives an in-order segment (with expected sequence number) and the previous segment has already been acknowledged, the receiver delays sending an ACK segment until another segment arrives or until a period of time (normally 500 ms) has passed. In other words, the receiver needs to delay sending an ACK segment if there is only one outstanding in-order segment. This rule reduces ACK segment traffic.

3. When a segment arrives with a sequence number that is expected by the receiver, and the previous in-order segment has not been acknowledged, the receiver immediately sends an ACK segment. In other words, there should not be more than two in-order unacknowledged segments at any time. This prevents the unnecessary retransmission of segments that may create congestion in the network.

4. When a segment arrives with an out-of-order sequence number that is higher than expected, the receiver immediately sends an ACK segment announcing the sequence number of the next expected segment. This leads to the fast retransmission of missing segments.

5. When a missing segment arrives, the receiver sends an ACK segment to announce the next sequence number expected. This informs the sender that segments reported missing have been received.

6. If a duplicate segment arrives, the receiver discards the segment, but immediately sends an acknowledgment indicating the next in-order segment expected. This solves some problems when an ACK segment itself is lost.

Retransmission[edit]

  • The heart of the error control mechanism is the retransmission of segments.
  • When a segment is sent, it is stored in a queue until it is acknowledged.
  • When the retransmission timer expires or when the sender receives three duplicate ACKs for the first segment in the queue, that segment is retransmitted.


Retransmission after RTO
  • The sending TCP maintains one retransmission time-out (RTO) for each connection.
  • When the timer matures, i.e. times out, TCP sends the segment in the front of the queue (the segment with the smallest sequence number) and restarts the timer.
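  • Note that this assumes Sf < Sn, i.e., some bytes have been sent but not yet acknowledged (Sf is the first outstanding byte, Sn is the next byte to send).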
  • This version of TCP is sometimes referred to as Tahoe.
  • We will see later that the value of RTO is dynamic in TCP and is updated based on the round-trip time (RTT) of segments.
  • RTT is the time needed for a segment to reach a destination and for an acknowledgment to be received.


Retransmission after Three Duplicate ACK Segments(Reno)
  • The previous rule about retransmission of a segment is sufficient if the value of RTO is not large.
  • To help throughput by allowing sender to retransmit sooner than waiting for a time out, most implementations today follow the three duplicate ACKs rule and retransmit the missing segment immediately.
  • This feature is called fast retransmission, and the version of TCP that uses this feature is referred to as Reno.
  • In this version, if three duplicate acknowledgments (i.e., an original ACK plus three exactly identical copies) arrive for a segment, the next segment is retransmitted without waiting for the time-out.
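
The three-duplicate-ACK rule amounts to counting repeats of the same acknowledgment number. A minimal sketch, with the class name `DupAckCounter` and the sample ackNo values chosen only for illustration:

```python
class DupAckCounter:
    """Signals a fast retransmission when the third duplicate ACK arrives."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.duplicates = 0

    def on_ack(self, ack_no):
        """Return True if this ACK should trigger a fast retransmission."""
        if ack_no == self.last_ack:
            self.duplicates += 1
            # Fire exactly once, on the third identical copy of the original ACK.
            return self.duplicates == self.threshold
        self.last_ack = ack_no
        self.duplicates = 0
        return False


counter = DupAckCounter()
# One original ACK for byte 301 followed by three identical copies.
print([counter.on_ack(a) for a in (301, 301, 301, 301)])
```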


Out-of-Order Segments
  • TCP implementations today do not discard out-of-order segments.
  • They store them temporarily and flag them as out-of-order segments until the missing segments arrive.
  • Out-of-order segments are never delivered to the process.
  • TCP guarantees that data are delivered to the process in order.
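
Storing out-of-order segments while delivering only in-order data can be sketched as follows. The class `Reassembler` is hypothetical and assumes that segments never partially overlap; the byte numbers are sample values for four 100-byte segments of which the third arrives last.

```python
class Reassembler:
    """Buffers early segments and releases data to the process strictly in order."""

    def __init__(self, first_byte):
        self.next_expected = first_byte
        self.stored = {}        # seqNo -> data for segments that arrived early
        self.delivered = b""    # bytes already handed to the process

    def on_segment(self, seq_no, data):
        """Process one segment and return the cumulative ackNo to send back."""
        if seq_no >= self.next_expected:
            self.stored.setdefault(seq_no, data)   # older data is a duplicate: ignore it
        # Deliver every stored segment that is now contiguous with delivered data.
        while self.next_expected in self.stored:
            chunk = self.stored.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


rx = Reassembler(first_byte=501)
block = b"x" * 100
print(rx.on_segment(501, block))   # in order
print(rx.on_segment(601, block))   # in order
print(rx.on_segment(801, block))   # out of order: stored, ackNo unchanged
print(len(rx.delivered))           # the gap blocks delivery of bytes 801 to 900
print(rx.on_segment(701, block))   # missing segment arrives: gap filled
print(len(rx.delivered))
```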


Lost Segment
Lost Segment.jpg
  • A lost segment is discarded somewhere in the network; a corrupted segment is discarded by the receiver itself.
  • Both are considered lost.
  • We are assuming that data transfer is unidirectional: one site is sending, the other receiving.
  • In our scenario, the sender sends segments 1 and 2, which are acknowledged immediately by an ACK (rule 3).
  • Segment 3, however, is lost.
  • The receiver receives segment 4, which is out of order.
  • The receiver stores the data in the segment in its buffer but leaves a gap to indicate that there is no continuity in the data.
  • The receiver immediately sends an acknowledgment to the sender displaying the next byte it expects (rule 4).
  • Note that the receiver stores bytes 801 to 900, but never delivers these bytes to the application until the gap is filled.
  • The sender TCP keeps one RTO timer for the whole period of connection.
  • When the third segment times out, the sending TCP resends segment 3, which arrives this time and is acknowledged properly (rule 5).


Fast Retransmission
Fast retransmission.jpg
  • Here RTO has a larger value.
  • Each time the receiver receives the fourth, fifth, and sixth segments, it triggers an acknowledgment (rule 4).
  • The sender receives four acknowledgments with the same value (three duplicates).
  • Although the timer has not matured, the rule for fast retransmission requires that segment 3, the segment that is expected by all of these duplicate acknowledgments, be resent immediately.
  • After resending this segment, the timer is restarted.


Delayed Segment
  • TCP uses the services of IP, which is a connectionless protocol.
  • Each IP datagram encapsulating a TCP segment may reach the final destination through a different route with a different delay.
  • Hence TCP segments may be delayed.
  • Delayed segments sometimes may time out.
  • If the delayed segment arrives after it has been resent, it is considered a duplicate segment and discarded.


Duplicate Segment
  • A duplicate segment can be created, for example, by a sending TCP when a segment is delayed and treated as lost by the receiver.
  • Handling the duplicated segment is a simple process for the destination TCP.
  • The destination TCP expects a continuous stream of bytes.
  • When a segment arrives that contains a sequence number equal to an already received and stored segment, it is discarded.
  • An ACK is sent with ackNo defining the expected segment.


Automatically Corrected Lost ACK
Lost acknowledgment.jpg
  • Automatic correction of a lost ACK is a key advantage of using cumulative acknowledgments.
  • The figure shows a lost acknowledgment sent by the receiver of data.
  • In the TCP acknowledgment mechanism, a lost acknowledgment may not even be noticed by the source TCP.
  • TCP uses a cumulative acknowledgment system.
  • We can say that the next acknowledgment automatically corrects the loss of the acknowledgment.
Lost acknowledgment corrected by resending a segment.jpg
  • If the next acknowledgment is delayed for a long time or there is no next acknowledgment (the lost acknowledgment is the last one sent), the correction is triggered by the RTO timer.
  • A duplicate segment is the result.
  • When the receiver receives a duplicate segment, it discards it, and resends the last ACK immediately to inform the sender that the segment or segments have been received.
  • Note that only one segment is retransmitted although two segments are not acknowledged.
  • When the sender receives the retransmitted ACK, it knows that both segments are safe and sound because acknowledgment is cumulative.


Deadlock Created by Lost Acknowledgment
  • There is one situation in which loss of an acknowledgment may result in system deadlock.
  • This is the case in which a receiver sends an acknowledgment with rwnd set to 0 and requests that the sender shut down its window temporarily.
  • After a while, the receiver wants to remove the restriction; however, if it has no data to send, it sends an ACK segment and removes the restriction with a nonzero value for rwnd.
  • A problem arises if this acknowledgment is lost.
  • The sender is waiting for an acknowledgment that announces the nonzero rwnd.
  • The receiver thinks that the sender has received this and is waiting for data.
  • This situation is called a deadlock; each end is waiting for a response from the other end and nothing is happening.
  • A retransmission timer is not set.
  • To prevent deadlock, a persistence timer was designed.


Congestion Control[edit]

  • Congestion control in TCP is based on both open-loop and closed-loop mechanisms.
  • TCP uses a congestion window and a congestion policy to avoid congestion, and to detect and alleviate it after it has occurred.


Congestion Window
  • It is not only the receiver that can dictate to the sender the size of the sender’s window.
  • The network can also dictate the size.
  • If the network cannot deliver the data as fast as it is created by the sender, it must tell the sender to slow down.
  • So Receiver and Network determine the size of the sender’s window.
  • The sender has two pieces of information: the Receiver-Advertised window size and the Congestion window size.
  • The actual size of the window is the minimum of these two:
Actual window size = Minimum (rwnd, cwnd)


Congestion Policy
  • TCP’s general policy for handling congestion is based on three phases:
  1. Slow Start
  2. Congestion Avoidance
  3. Congestion Detection
  • In the slow start phase, the sender starts with a slow rate of transmission, but increases the rate rapidly to reach a threshold.
  • When the threshold is reached, the rate of increase is reduced.
  • Finally, if congestion is detected, the sender goes back to the slow start or congestion avoidance phase, depending on how the congestion was detected.


Slow Start - Exponential Increase
Slow Start.png
  • The slow start algorithm is based on the idea that the size of the congestion window (cwnd) starts with 1 MSS.
  • The MSS is determined during connection establishment using an option of the same name.
  • The size of the window increases by one MSS each time one acknowledgment arrives.
  • The algorithm starts slowly, but grows exponentially.
  • Assume that rwnd is much larger than cwnd, so that the sender window size always equals cwnd.
  • Ignore delayed-ACK policy for now and assume that each segment is acknowledged individually.
  • The sender starts with cwnd = 1 MSS.
  • This means that the sender can send only one segment.
  • After the first ACK arrives, the size of the congestion window is increased by 1, which means that cwnd is now 2.
  • Now two more segments can be sent.
  • When two more ACKs arrive, the size of the window is increased by 1 MSS for each ACK, which means cwnd is now 4.
  • Now four more segments can be sent.
  • When four ACKs arrive, the size of the window increases by 4, which means that cwnd is now 8.
  • In the slow start algorithm, the size of the congestion window increases exponentially until it reaches a threshold.


Congestion Avoidance - Additive Increase
Congestion avoidance.png
  • In slow start algorithm, the size of the congestion window increases exponentially.
  • To avoid congestion before it happens, one must slow down this exponential growth.
  • TCP's Congestion avoidance feature increases the cwnd additively instead of exponentially.
  • When the size of the congestion window reaches the slow start threshold, the slow start phase stops and the additive phase begins.
  • Each time the whole “window” of segments is acknowledged, the size of the congestion window is increased by one.
  • A window is the number of segments transmitted during RTT.
  • The increase is based on RTT, not on the number of arrived ACKs.
  • Therefore the size of the congestion window increases additively until congestion is detected.
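
The two growth phases can be compared by tracking cwnd once per round-trip time. This is a simplified sketch: the function name `cwnd_per_rtt` is hypothetical, cwnd is counted in segments (MSS units), every segment is assumed to be acknowledged individually, and the threshold of 16 is an arbitrary sample value.

```python
def cwnd_per_rtt(ssthresh, rounds, cwnd=1):
    """Return the congestion window (in MSS) at the start of each round trip."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: one MSS per ACK doubles the window every RTT.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one MSS per fully acknowledged window.
            cwnd += 1
    return history


print(cwnd_per_rtt(ssthresh=16, rounds=8))
```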


Congestion Detection - Multiplicative Decrease
  • If congestion occurs, the congestion window size must be decreased.
  • The only way a sender can guess that congestion has occurred is the need to retransmit a segment.
  • This is a major assumption made by TCP.
  • Retransmission is needed to recover a missing packet, which is assumed to have been dropped by a router because it was overloaded or congested.
  • Retransmission can occur in one of two cases: when the RTO timer times out or when three duplicate ACKs are received.
  • In both cases, the size of the threshold is dropped to half (multiplicative decrease).

Most TCP implementations have two reactions:

1. If a time-out occurs, there is a stronger possibility of congestion; a segment has probably been dropped in the network and there is no news about the following sent segments.

In this case TCP reacts strongly:

a. It sets the value of the threshold to half of the current window size.
b. It reduces cwnd back to one segment.
c. It starts the slow start phase again.

2. If three duplicate ACKs are received, there is a weaker possibility of congestion; a segment may have been dropped, but some segments after that have arrived safely since three duplicate ACKs are received. This is called fast retransmission and fast recovery.

In this case, TCP has a weaker reaction as shown below:

a. It sets the value of the threshold to half of the current window size.
b. It sets cwnd to the value of the threshold (some implementations add three segment sizes to the threshold).
c. It starts the congestion avoidance phase.
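
The two reactions can be written side by side. In this minimal sketch the function names are hypothetical, cwnd is counted in segments, and the floor of 2 segments on the threshold is an added assumption (it follows the minimum recommended in RFC 5681) that the rules above do not state.

```python
def on_timeout(cwnd):
    """Strong reaction: halve the threshold, reset cwnd, restart slow start."""
    ssthresh = max(cwnd // 2, 2)   # floor of 2 segments is an assumption
    return ssthresh, 1, "slow start"


def on_three_dup_acks(cwnd):
    """Weaker reaction: halve the threshold and continue from it."""
    ssthresh = max(cwnd // 2, 2)   # floor of 2 segments is an assumption
    return ssthresh, ssthresh, "congestion avoidance"


# Each call returns (new threshold, new cwnd, next phase).
print(on_timeout(20))
print(on_three_dup_acks(20))
```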


TCP Timers[edit]

Most TCP implementations use at least four timers

  • Retransmission
  • Persistence
  • Keepalive
  • TIME-WAIT


Retransmission Timer

To retransmit lost segments, TCP employs one retransmission timer for the whole connection period that handles the retransmission time-out (RTO), the waiting time for an acknowledgment of a segment.

The following rules apply to the retransmission timer:
1. When TCP sends the segment in front of the sending queue, it starts the timer.
2. When the timer expires, TCP resends the first segment in front of the queue, and restarts the timer.
3. When one or more segments are cumulatively acknowledged, they are purged from the queue.
4. If the queue is empty, TCP stops the timer; otherwise, TCP restarts the timer.


Round-Trip Time (RTT)

To calculate the retransmission time-out (RTO), we first need to calculate the RTT.

  • Measured RTT - The measured round-trip time for a segment is the time required for the segment to reach the destination and be acknowledged, although the acknowledgment may include other segments. In TCP only one RTT measurement can be in progress at any time.
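  • Smoothed RTT - The measured RTT is likely to change for each round trip. The fluctuation is so high in today’s Internet that a single measurement alone cannot be used for retransmission time-out purposes. Most implementations therefore use a smoothed RTT, a weighted average of the previous smoothed value and the latest measurement.
  • RTT Deviation - Most implementations do not use the smoothed RTT alone; they also track the RTT deviation, a smoothed measure of how far the measured RTT strays from the smoothed RTT.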


Retransmission Time-out (RTO)
  • The value of RTO is based on the smoothed round-trip time and its deviation.
  • Take the running smoothed average value of Smoothed RTT, and add four times the running smoothed average value of RTT Deviation (normally a small value).
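
The calculation can be sketched with the weights recommended in RFC 6298 (1/8 for the smoothed RTT and 1/4 for the deviation). The class `RtoEstimator` is hypothetical, the sample RTT values are invented, and the minimum RTO and clock granularity terms from that RFC are omitted for brevity.

```python
class RtoEstimator:
    """RTO = smoothed RTT + 4 * RTT deviation, updated per valid measurement."""

    ALPHA = 1 / 8   # weight of a new measurement in the smoothed RTT
    BETA = 1 / 4    # weight of a new measurement in the deviation

    def __init__(self):
        self.srtt = None     # smoothed RTT
        self.rttvar = None   # RTT deviation

    def update(self, measured):
        """Fold in one measured RTT (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement: initialise both estimates from it.
            self.srtt = measured
            self.rttvar = measured / 2
        else:
            # The deviation is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - measured)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * measured
        return self.srtt + 4 * self.rttvar


est = RtoEstimator()
print(est.update(1.5))
print(est.update(2.5))
```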


Karn’s Algorithm
  • Do not consider the round-trip time of a retransmitted segment in the calculation of RTTs.
  • Do not update the value of RTTs until you send a segment and receive an acknowledgment without the need for retransmission.
  • TCP does not consider the RTT of a retransmitted segment in its calculation of a new RTO.


Exponential Backoff
  • Most TCP implementations use an exponential backoff strategy to calculate the value of RTO if a retransmission occurs.
  • The value of RTO is doubled for each retransmission.
  • So if the segment is retransmitted once, the value is two times the RTO.
  • If it is retransmitted twice, the value is four times the RTO.
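
Karn's algorithm and exponential backoff work together on each outstanding segment. The sketch below tracks a single segment; the class `SegmentTimer` and its method names are hypothetical.

```python
class SegmentTimer:
    """Timer state for one outstanding segment."""

    def __init__(self, rto):
        self.rto = rto
        self.retransmitted = False

    def on_timeout(self):
        """Exponential backoff: double the RTO on every retransmission."""
        self.rto *= 2
        self.retransmitted = True
        return self.rto

    def rtt_sample(self, measured_rtt):
        """Karn's algorithm: ignore the RTT of a segment that was retransmitted."""
        return None if self.retransmitted else measured_rtt


timer = SegmentTimer(rto=1.0)
print(timer.on_timeout())       # after the first retransmission
print(timer.on_timeout())       # after the second retransmission
print(timer.rtt_sample(0.3))    # ambiguous measurement, so it is not used
```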


Persistence Timer
  • To deal with a zero-window-size advertisement, TCP needs a persistence timer.
  • If the receiving TCP announces a window size of zero, the sending TCP stops transmitting segments until the receiving TCP sends an ACK segment announcing a nonzero window size.
  • This ACK segment can be lost.
  • Remember - ACK segments are neither acknowledged nor retransmitted in TCP.
  • Both TCPs might continue to wait for each other forever (a deadlock).
  • To correct this deadlock, TCP uses a persistence timer for each connection.
  • When the sending TCP receives an acknowledgment with a window size of zero, it starts a persistence timer.
  • When the persistence timer goes off, the sending TCP sends a special segment called a Probe.
  • This segment contains only 1 byte of new data.
  • It has a sequence number, but its sequence number is never acknowledged; it is even ignored in calculating the sequence number for the rest of the data.
  • The probe causes the receiving TCP to resend the acknowledgment.
  • The value of the persistence timer is set to the value of the retransmission time.
  • If a response is not received from the receiver, another probe segment is sent and the value of the persistence timer is doubled and reset.
  • The sender continues sending the probe segments and doubling and resetting the value of the persistence timer until the value reaches a threshold (generally 60s).
  • After that the sender sends one probe segment every 60 s until the window is reopened.
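
The probe schedule described above (start at the retransmission time, double after every unanswered probe, stop growing at the 60 s threshold) can be sketched as follows. The function name and the 1.5 s starting value are illustrative assumptions.

```python
def probe_intervals(initial_timeout, count, threshold=60.0):
    """Return the waiting time (seconds) before each of the first `count` probes."""
    intervals = []
    timeout = initial_timeout
    for _ in range(count):
        intervals.append(timeout)
        # Double the persistence timer, but never beyond the threshold.
        timeout = min(timeout * 2, threshold)
    return intervals


intervals = probe_intervals(1.5, 8)
```

For a starting value of 1.5 s the timer grows 1.5, 3, 6, 12, 24, 48 and then stays at 60 s.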


Keepalive Timer
  • A keepalive timer is used in some implementations to prevent a long idle connection between two TCPs.
  • Suppose a client opens a TCP connection to a server, transfers some data, and then becomes silent.
  • Perhaps the client has crashed. In this case, the connection would remain open forever.
  • To remedy this situation, most implementations equip a server with a keepalive timer.
  • Each time the server hears from a client, it resets this timer.
  • The time-out is usually 2 hours.
  • If the server does not hear from the client after 2 hours, it sends a probe segment.
  • If there is no response after 10 probes, each of which is 75s apart, it assumes that the client is down and terminates the connection.
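
As a quick arithmetic check of the numbers above, the following sketch computes how long a dead client can go unnoticed. It assumes the first probe is sent when the 2-hour timer expires and that the server gives up one 75 s interval after the tenth probe; the constant and function names are illustrative.

```python
IDLE_TIMEOUT_S = 2 * 60 * 60  # keepalive time-out (2 hours)
PROBE_COUNT = 10              # unanswered probes before giving up
PROBE_INTERVAL_S = 75         # spacing between probes


def seconds_until_drop(idle=IDLE_TIMEOUT_S, probes=PROBE_COUNT, interval=PROBE_INTERVAL_S):
    """Idle time plus one waiting interval per unanswered probe (assumed model)."""
    return idle + probes * interval


total = seconds_until_drop()
hours, remainder = divmod(total, 3600)
minutes, seconds = divmod(remainder, 60)
```

Under these assumptions the connection is terminated after 7,950 s, that is 2 h 12 min 30 s of silence.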


Options

The TCP header can have up to 40 bytes of optional information.

  • 1-byte options
  1. End of option list
  2. No operation
  • Multiple-byte options
  1. Maximum Segment Size
  2. Window Scale Factor
  3. Timestamp
  4. SACK-permitted
  5. SACK


End of Option
  • EOP is a 1-byte option used for padding at the end of the option section.
  • It can only be used as the last option. There are no more options in the header after EOP.
  • Only one occurrence of this option is allowed.
  • After this option, the receiver looks for the payload data.
  • Data from the application program starts at the beginning of the next 32-bit word.


No Operation
  • NOP option is also a 1-byte option used as a filler.
  • It normally comes before another option to help align it on a 32-bit (four-byte) word boundary.
  • NOP can be used more than once.


Maximum Segment Size (MSS)
  • MSS option defines the size of the biggest unit of data that can be received by the destination of the TCP segment.
  • It defines the maximum size of the data, not the maximum size of the segment.
  • The field is 16 bits long, the value can be 0 to 65,535 bytes.
  • Each party defines the MSS for the segments it will receive during the connection.
  • If a party does not define this, the default value is 536 bytes.
  • The value of MSS is determined during connection establishment and does not change during the connection.


Window Scale Factor
  • Window size field in the header defines the size of the sliding window.
  • This field is 16 bits long, which means that the window can range from 0 to 65,535 bytes.
  • It may not be sufficient if the data are traveling through a long channel with a wide bandwidth.
  • To increase the window size, a window scale factor is used.
  • The new window size is found by first raising 2 to the number specified in the window scale factor.
  • Then this result is multiplied by the value of the window size in the header.
New Window Size = Window Size in Header × 2^{Window Scale Factor}

For example, suppose the window scale factor is 3 and an end point receives an acknowledgment in which the window size is advertised as 32,768:

New Window Size = 32,768 × 2^3 = 262,144 bytes
  • Although the scale factor could be as large as 255, the largest value allowed by TCP/IP is 14.
  • Maximum window size is 2^{16} × 2^{14} = 2^{30}, which is less than the maximum value for the sequence number.
  • The size of the window cannot be greater than the maximum value of the sequence number.
  • The value of the window scale factor can also be determined only during connection establishment; it does not change during the connection.
  • During data transfer, the size of the window (specified in the header) may be changed, but it must be multiplied by the same window scale factor.
  • One end may set the value of the window scale factor to 0, which means although it supports this option, it does not want to use it for this connection.
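
The window scaling rule can be checked with a few lines of Python. This is a minimal sketch; the function name is illustrative and the range checks simply encode the 16-bit header field and the maximum scale factor of 14 stated above.

```python
MAX_SCALE_FACTOR = 14  # largest scale factor allowed


def effective_window(header_window, scale_factor):
    """Window size in bytes = header value * 2 ** scale factor."""
    if not 0 <= header_window <= 0xFFFF:
        raise ValueError("the window field is 16 bits long")
    if not 0 <= scale_factor <= MAX_SCALE_FACTOR:
        raise ValueError("scale factor must be between 0 and 14")
    return header_window << scale_factor


example = effective_window(32768, 3)    # the example from the text
largest = effective_window(0xFFFF, 14)  # largest window that can be advertised
```

The first call reproduces the 262,144-byte example; the second shows that even the largest advertised window stays below 2^{30}.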


Timestamp
  • This is a 10-byte option.
  • TS is announced in the SYN.
  • If SYN + ACK from the other end also has TS, it is allowed; otherwise it does not use it any more.
  • The timestamp option has two applications: RTT calculation and protection against wrapped sequence numbers (PAWS).
Measuring RTT
  • Timestamp can be used to measure the round-trip time (RTT).
  • TCP, when ready to send a segment, reads the value of the system clock and inserts this value, a 32-bit number, in the timestamp value field.
  • The receiver, when sending an acknowledgment for this segment or an accumulative acknowledgment that covers the bytes in this segment, copies the timestamp received in the timestamp echo reply.
  • The sender, upon receiving the acknowledgment, subtracts the value of the timestamp echo reply from the time shown by the clock to find RTT.
  • Note that there is no need for the sender’s and receiver’s clocks to be synchronized because all calculations are based on the sender clock.
  • Also note that the sender does not have to remember or store the time a segment left because this value is carried by the segment itself.
  • The receiver needs to keep track of two variables. The first, lastack, is the value of the last acknowledgment sent.
  • The second, tsrecent, is the value of the recent timestamp that has not yet echoed.
  • When the receiver receives a segment that contains the byte matching the value of lastack, it inserts the value of the timestamp field in the tsrecent variable.
  • When it sends an acknowledgment, it inserts the value of tsrecent in the echo reply field.
  • The sender simply inserts the value of the clock (for example, the number of seconds past midnight) in the timestamp field for the first and second segment.
  • When an acknowledgment comes (the third segment), the value of the clock is checked and the value of the echo reply field is subtracted from the current time.
  • RTT is 12 s in this scenario.
  • The receiver’s function is more involved.
  • It keeps track of the last acknowledgment sent (12000).
  • When the first segment arrives, it contains the bytes 12000 to 12099.
  • The first byte is the same as the value of lastack.
  • It then copies the timestamp value (4720) into the tsrecent variable.
  • The value of lastack is still 12000 (no new acknowledgment has been sent).
  • When the second segment arrives, since none of the byte numbers in this segment include the value of lastack, the value of the timestamp field is ignored.
  • When the receiver decides to send an accumulative acknowledgment with acknowledgment 12200, it changes the value of lastack to 12200 and inserts the value of tsrecent in the echo reply field.
  • The value of tsrecent will not change until it is replaced by a new segment that carries byte 12200 (next segment).
  • Note that as the example shows, the RTT calculated is the time difference between sending the first segment and receiving the third segment.
  • This is actually the meaning of RTT: the time difference between a packet sent and the acknowledgment received.
  • The third segment carries the acknowledgment for the first and second segments.
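
The receiver's bookkeeping with lastack and tsrecent can be sketched as below. The byte numbers 12000, 12099 and 12200 and the timestamp 4720 come from the example in the text; the second segment's timestamp (4725) and the sender's clock reading when the acknowledgment arrives (4732, chosen so that the RTT is 12) are assumptions, as are the class and method names.

```python
class TimestampReceiver:
    """Tracks lastack and tsrecent as described for the timestamp option."""

    def __init__(self, lastack):
        self.lastack = lastack  # last acknowledgment number sent
        self.tsrecent = None    # timestamp waiting to be echoed

    def on_segment(self, first_byte, last_byte, tsval):
        # Only a segment that contains byte number lastack updates tsrecent.
        if first_byte <= self.lastack <= last_byte:
            self.tsrecent = tsval

    def send_ack(self, ack_number):
        # The echo reply field carries tsrecent.
        self.lastack = ack_number
        return ack_number, self.tsrecent


receiver = TimestampReceiver(lastack=12000)
receiver.on_segment(12000, 12099, tsval=4720)  # contains byte 12000: recorded
receiver.on_segment(12100, 12199, tsval=4725)  # assumed timestamp: ignored
ack, echo = receiver.send_ack(12200)

sender_clock_on_ack = 4732                     # assumed clock value at the sender
rtt = sender_clock_on_ack - echo
```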
PAWS
  • The timestamp option has another application, protection against wrapped sequence numbers (PAWS).
  • The sequence number defined in the TCP protocol is only 32 bits long.
  • Although this is a large number, it could be wrapped around in a high-speed connection.
  • This implies that if a sequence number is n at one time, it could be n again during the lifetime of the same connection.
  • Now if the first segment is duplicated and arrives during the second round of the sequence numbers, the segment belonging to the past is wrongly taken as the segment belonging to the new round.
  • One solution to this problem is to increase the size of the sequence number, but this involves increasing the size of the window as well as the format of the segment and more.
  • The easiest solution is to include the timestamp in the identification of a segment.
  • In other words, the identity of a segment can be defined as the combination of timestamp and sequence number.
  • This means increasing the size of the identification.
  • Two segments 400:12,001 and 700:12,001 definitely belong to different incarnations.
  • The first was sent at time 400, the second at time 700.
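
A minimal sketch of the PAWS idea, using the two segments from the example above: once the segment stamped 700 has been accepted, a late duplicate with the same sequence number but timestamp 400 is rejected. The function name is illustrative, and the 32-bit modular comparison (so that the timestamp itself may wrap) is an assumption about how an implementation would compare timestamps.

```python
def paws_accept(segment_tsval, ts_recent):
    """Accept a segment only if its timestamp is not older than ts_recent."""
    # Timestamps are 32-bit values; compare them modulo 2**32.
    return ((segment_tsval - ts_recent) & 0xFFFFFFFF) < 2 ** 31


ts_recent = 700                                # newest timestamp accepted so far
old_duplicate = paws_accept(400, ts_recent)    # segment 400:12,001
current = paws_accept(700, ts_recent)          # segment 700:12,001
after_wrap = paws_accept(5, 0xFFFFFFF0)        # timestamp clock has wrapped
```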


SACK-Permitted and SACK Options
  • Acknowledgment field is designed as cumulative acknowledgment, which means it reports the receipt of the last consecutive byte.
  • It does not report bytes that have arrived out of order or segments that are duplicates.
  • This may have a negative effect on TCP’s performance.
  • If some packets are lost or dropped, the sender must wait until a time-out and then send all packets that have not been acknowledged.
  • The receiver may receive duplicate packets.
  • To improve performance, selective acknowledgment (SACK) was proposed.
  • Selective acknowledgment allows the sender to have a better idea of which segments are actually lost and which have arrived out of order.
  • The new proposal even includes a list for duplicate packets.
  • The sender can then send only those segments that are really lost.
  • The list of duplicate segments can help the sender find the segments which have been retransmitted by a short time-out.
  • The SACK-permitted option of two bytes is used only during connection establishment.
  • The host that sends the SYN segment adds this option to show that it can support the SACK option.
  • If the other end, in its SYN + ACK segment, also includes this option, then the two ends can use the SACK option during data transfer.
  • Note that the SACK-permitted option is not allowed during the data transfer phase.
  • The SACK option, of variable length, is used during data transfer only if both ends agree (if they have exchanged SACK-permitted options during connection establishment).
  • The option includes a list for blocks arriving out of order.
  • Each block occupies two 32-bit numbers that define the beginning and the end of the blocks.
  • Allowed size of an option in TCP is only 40 bytes.
  • This means that a SACK option cannot define more than 4 blocks.
  • The information for 5 blocks occupies (5 × 2) × 4 + 2 or 42 bytes, which is beyond the available size for the option section in a segment.
  • If the SACK option is used with other options, then the number of blocks may be reduced.
  • The first block of the SACK option can be used to report the duplicates.
  • This is used only if the implementation allows this feature.
  • The SACK option announces this duplicate data first and then the out-of-order block.
  • This time, however, the duplicated block is not yet acknowledged by ACK, but because it is part of the out-of-order block (4001:5000 is part of 4001:6000), it is understood by the sender that it defines the duplicate data.
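
The block-count arithmetic for the SACK option can be verified with a short sketch. The constants follow the text (40 bytes of option space, 2 bytes of option header, two 32-bit numbers per block); the 12 bytes used for a timestamp option padded with two NOPs is an assumption, and the function names are illustrative.

```python
MAX_OPTION_BYTES = 40   # option space available in the TCP header
SACK_HEADER_BYTES = 2   # kind and length bytes of the SACK option
BYTES_PER_BLOCK = 8     # two 32-bit numbers per block


def sack_option_length(blocks):
    """Total length in bytes of a SACK option carrying `blocks` blocks."""
    return SACK_HEADER_BYTES + blocks * BYTES_PER_BLOCK


def max_sack_blocks(other_option_bytes=0):
    """Largest number of blocks that fits next to the other options."""
    return (MAX_OPTION_BYTES - other_option_bytes - SACK_HEADER_BYTES) // BYTES_PER_BLOCK


four_blocks = sack_option_length(4)
five_blocks = sack_option_length(5)
alone = max_sack_blocks()
with_timestamp = max_sack_blocks(other_option_bytes=12)  # assumed: 10-byte timestamp + 2 NOPs
```

Four blocks need 34 bytes and fit; five blocks need 42 bytes and do not. With a timestamp option in the same header, only three blocks fit.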


OSPF BGP Interview Questions

  • How many peers can RR have?

Redistribution from OSPF to BGP

Routes redistributed into BGP take the administrative distance of BGP. To redistribute all OSPF routes (internal as well as external E1 and E2), use the command redistribute ospf <process-id> match internal external 1 external 2.

Routes redistributed from BGP into OSPF get a metric of 1. Routes redistributed from OSPF into BGP carry the IGP metric.

QoS - Each router maintains two kinds of queue: a hardware queue, which works on FIFO, and software queues (LLQ, CBWFQ, flow-based WFQ). A service policy applies only to the software queues.


Use the tx-ring-limit command to tune the size of the transmit ring to a non-default value (hardware queue is last stop before the packet is transmitted)

Note: An exception to these guidelines for LLQ is Frame Relay on the Cisco 7200 router and other non-Route/Switch Processor (RSP) platforms. The original implementation of LLQ over Frame Relay on these platforms did not allow the priority classes to exceed the configured rate during periods of non-congestion. Cisco IOS Software Release 12.2 removes this exception and ensures that non-conforming packets are only dropped if there is congestion. In addition, packets smaller than an FRF.12 fragmentation size are no longer sent through the fragmenting process, reducing CPU utilization. 

It's all based upon whether there is or is not congestion on the link.

 
The priority queue (LLQ) will always be served first, regardless of congestion.  It will be both guaranteed bandwidth AND policed if there is congestion.  If there is not congestion, you may get more throughput of your priority class traffic.

 
If the class is underutilized then the bandwidth may get used by other classes.  Generally speaking this is harder to quantify than you may think.  Because in normal classes, the "bandwidth" command is a minimum of what's guaranteed.  So you may get MORE in varying amounts just depending on what is in the queue at any point in time of congestion.

 

Policers determine whether each packet conforms to or exceeds (or, optionally, violates) the configured traffic policy and take the prescribed action. The action taken can include dropping or re-marking the packet. Conforming traffic is traffic that falls within the rate configured for the policer. Exceeding traffic is traffic that is above the policer rate but still within the burst parameters specified. Violating traffic is traffic that is above both the configured traffic rate and the burst parameters.



An improvement to the single-rate two-color marker/policer algorithm is based on RFC 2697, which details the logic of a single-rate three-color marker.

The single-rate three-color marker/policer uses an algorithm with two token buckets. Any unused tokens in the first bucket are placed in a second token bucket to be used as credits later for temporary bursts that might exceed the CIR. The allowance of tokens placed in this second bucket is called the excess burst (Be), and this number of tokens is placed in the bucket when Bc is full. When the Bc is not full, the second bucket contains the unused tokens. The Be is the maximum number of bits that can exceed the burst size.
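
The two-bucket behaviour can be sketched in Python. This is a minimal sketch in the spirit of RFC 2697 (color-blind mode), not Cisco's implementation: bucket sizes are counted in bytes rather than bits, both buckets start full, and the class name, method names and the mapping of green/yellow/red to conform/exceed/violate are illustrative assumptions.

```python
class SingleRateThreeColorMarker:
    """Two token buckets (Bc and Be) filled from a single rate (CIR)."""

    def __init__(self, cir, bc, be):
        self.cir = cir        # committed rate, bytes per second
        self.bc = bc          # size of the first bucket
        self.be = be          # size of the second (excess) bucket
        self.tokens_c = bc    # both buckets start full (assumption)
        self.tokens_e = be
        self.last_time = 0.0

    def _refill(self, now):
        new_tokens = (now - self.last_time) * self.cir
        self.last_time = now
        # Fill the first bucket; only the overflow goes to the second one.
        to_first = min(new_tokens, self.bc - self.tokens_c)
        self.tokens_c += to_first
        self.tokens_e = min(self.be, self.tokens_e + (new_tokens - to_first))

    def mark(self, packet_size, now):
        self._refill(now)
        if self.tokens_c >= packet_size:
            self.tokens_c -= packet_size
            return "conform"
        if self.tokens_e >= packet_size:
            self.tokens_e -= packet_size
            return "exceed"
        return "violate"


marker = SingleRateThreeColorMarker(cir=1000, bc=1500, be=1500)
burst = [marker.mark(1000, now=0.0) for _ in range(4)]  # back-to-back burst
later = marker.mark(1000, now=1.0)                      # one second later
```

In the burst, the first packet conforms, the second is covered by the excess bucket, and the rest violate. After one second at the CIR the first bucket has refilled and traffic conforms again.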

Queuing - FIFO, PQ, WFQ, CBWFQ

PQ - The high-priority queue is always serviced first, irrespective of traffic waiting in the other queues.

WFQ - Flow based. Each flow is identified by its source and destination addresses and its source and destination ports. WFQ always gives preference to low-volume flows and smaller packets.

CBWFQ - Traffic is classified and placed into classes, and each class is allocated some amount of bandwidth. Queues are serviced on the basis of the bandwidth allocated to each one.



Random Early Detection (RED) is a congestion avoidance mechanism that takes advantage of the congestion control mechanism of TCP. By randomly dropping packets prior to periods of high congestion, RED tells the packet source to decrease its transmission rate. WRED drops packets selectively based on IP precedence. Edge routers assign IP precedences to packets as they enter the network. (WRED is useful on any output interface where you expect to have congestion. However, WRED is usually used in the core routers of a network, rather than at the edge.) WRED uses these precedences to determine how it treats different types of traffic. 

When a packet arrives, the following events occur: 

1. The average queue size is calculated. 

2. If the average is less than the minimum queue threshold, the arriving packet is queued. 

3. If the average is between the minimum queue threshold for that type of traffic and the maximum threshold for the interface, the packet is either dropped or queued, depending on the packet drop probability for that type of traffic. 

4. If the average queue size is greater than the maximum threshold, the packet is dropped.
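
The four steps above can be sketched as follows. This is a simplified sketch under stated assumptions: the exponentially weighted average, the default weight exponent of 9 and the linear drop probability that rises to 1/mark_prob_denominator at the maximum threshold are modelled on common WRED descriptions and are not taken from this page; all names are illustrative, and the random source is injectable so the behaviour can be tested.

```python
import random


def update_average(old_average, current_depth, exp_weight=9):
    """Step 1: exponentially weighted average queue size (assumed formula)."""
    weight = 2 ** -exp_weight
    return old_average * (1 - weight) + current_depth * weight


def wred_decision(average, min_threshold, max_threshold,
                  mark_prob_denominator=10, rng=random.random):
    """Steps 2-4: queue or drop an arriving packet."""
    if average < min_threshold:
        return "queue"
    if average > max_threshold:
        return "drop"
    # Between the thresholds the drop probability grows linearly.
    fraction = (average - min_threshold) / (max_threshold - min_threshold)
    drop_probability = fraction / mark_prob_denominator
    return "drop" if rng() < drop_probability else "queue"


# Hypothetical per-precedence thresholds: higher precedence starts dropping later.
thresholds = {0: (20, 40), 5: (30, 40)}
low_prec = wred_decision(25, *thresholds[0], rng=lambda: 0.01)
high_prec = wred_decision(25, *thresholds[5], rng=lambda: 0.01)
```

At the same average queue size of 25, the precedence 0 packet is already subject to random drops while the precedence 5 packet is still queued.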

IPSEC

Two modes: transport mode and tunnel mode.

Transport mode - only the data (payload) of the packet is encrypted.
Tunnel mode - the entire original packet is encrypted, and the ESP header is placed between the new IP header and the original packet.

Tunnel mode layout ([...] marks the encrypted part):

New IP Header | ESP Header | [Original IP Header | Data]

In transport mode only the data is encrypted, and the original IP header is placed in front of the ESP header.

Transport mode layout ([...] marks the encrypted part):

Original IP Header | ESP Header | [Data]



Encryption algorithms - DES, 3DES, AES

Phase 1 - authentication of the IPsec peers and negotiation of an SA to provide a secure communication channel for phase 2.

Phase 2 - the IPsec SAs are negotiated, and data is transferred based on the SA parameters exchanged and the keys stored in the SA database.

Phase 1 - security policies are negotiated, a Diffie-Hellman exchange takes place (used to generate the shared secret keys), and the remote peer is authenticated.


Transform sets - consist of the encryption algorithm, authentication algorithm and key length proposed.
Diffie-Hellman - a public key exchange method that allows two peers to establish a shared secret key.
Secret pre-shared keys are manually entered to authenticate the remote peer.
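
How two peers arrive at the same shared secret with Diffie-Hellman can be shown with toy numbers. This is only an illustration: the prime 23, generator 5 and private values 6 and 15 are tiny textbook values chosen for readability, whereas real ISAKMP groups use primes of 1024 bits or more; the function names are illustrative.

```python
P = 23  # public prime (toy value)
G = 5   # public generator (toy value)


def public_value(private_key):
    """Value a peer sends over the wire: G ** private mod P."""
    return pow(G, private_key, P)


def shared_secret(peer_public_value, private_key):
    """Secret computed locally from the other side's public value."""
    return pow(peer_public_value, private_key, P)


peer_a_private, peer_b_private = 6, 15        # never transmitted
peer_a_public = public_value(peer_a_private)  # sent to peer B
peer_b_public = public_value(peer_b_private)  # sent to peer A

secret_at_a = shared_secret(peer_b_public, peer_a_private)
secret_at_b = shared_secret(peer_a_public, peer_b_private)
```

Both peers compute the same secret (2 with these numbers) although only the public values 8 and 19 crossed the network.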



An SA consists of the encryption algorithm, authentication algorithm, destination address, key length and lifetime of the tunnel.

Each SA has a lifetime based on one of two factors: the amount of data transferred or the time in seconds.

Configuration steps:
1. Define the ISAKMP policy.
2. Define the transform set, which includes encryption and data integrity.
3. Create an ACL for interesting traffic.
4. Create a crypto map that matches the previously defined parameters.
5. Apply the crypto map on the outgoing interface.

To use RSA keys instead of a pre-shared key, the ISAKMP identity needs to be defined:
crypto isakmp policy 1
 authentication rsa-encr
 group 2
 lifetime 240
crypto isakmp identity hostname

Protocol 50 - ESP traffic
Protocol 51 - AH traffic
UDP 500 - ISAKMP traffic


ISAKMP: Authenticates the peers, determines whether authentication is pre-shared or RSA encryption, and prepares the SA, which includes the group (the Diffie-Hellman group, which sets the key length in bits) and the lifetime of the tunnel.

The IPsec transform set determines the protocol (AH/ESP) and the data encryption standard (DES/3DES) for the data transported across the secure tunnel; esp-sha-hmac defines the hashing algorithm (HMAC-SHA) used to authenticate the data.

The mode (tunnel/transport) can be defined in the transform set only.



All traffic that goes through the ASA is inspected using the Adaptive Security Algorithm and either allowed through or dropped. A simple packet filter can check for the correct source address, destination address, and ports, but it does not check that the packet sequence or flags are correct. A filter also checks every packet against the filter, which can be a slow process. 

A stateful firewall like the ASA, however, takes into consideration the state of a packet: 
•Is this a new connection? 

If it is a new connection, the ASA has to check the packet against access lists and perform other tasks to determine if the packet is allowed or denied. To perform this check, the first packet of the session goes through the "session management path," and depending on the type of traffic, it might also pass through the "control plane path." 

The session management path is responsible for the following tasks: 

–Performing the access list checks 

–Performing route lookups 

–Allocating NAT translations (xlates) 

–Establishing sessions in the "fast path" 

Some packets that require Layer 7 inspection (the packet payload must be inspected or altered) are passed on to the control plane path. Layer 7 inspection engines are required for protocols that have two or more channels: a control channel, which uses a well-known port number, and a data channel, which uses different port numbers for each session. These protocols include FTP, H.323, and SNMP. 

•Is this an established connection?

If the connection is already established, the ASA does not need to re-check packets; most matching packets can go through the "fast" path in both directions. The fast path is responsible for the following tasks: 

–IP checksum verification 

–Session lookup 

–TCP sequence number check 

–NAT translations based on existing sessions 

–Layer 3 and Layer 4 header adjustments 

Data packets for protocols that require Layer 7 inspection can also go through the fast path. 
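
The split between the session management path and the fast path can be illustrated with a toy connection table. This is a conceptual sketch only and does not reflect how the ASA is implemented: the access list is reduced to a set of allowed destination ports, NAT and Layer 7 inspection are left out, and all names and addresses are illustrative.

```python
class StatefulFirewall:
    """First packet: access list check. Later packets: session lookup only."""

    def __init__(self, allowed_ports):
        self.allowed_ports = set(allowed_ports)  # stand-in for the access lists
        self.sessions = set()                    # established sessions

    @staticmethod
    def _session_key(src, sport, dst, dport, proto):
        # Sort the endpoints so both directions map to the same session.
        return (proto,) + tuple(sorted(((src, sport), (dst, dport))))

    def handle(self, src, sport, dst, dport, proto="tcp"):
        key = self._session_key(src, sport, dst, dport, proto)
        if key in self.sessions:
            return "fast path"
        # New connection: goes through the session management path.
        if dport in self.allowed_ports:
            self.sessions.add(key)
            return "session management path: allowed"
        return "session management path: denied"


firewall = StatefulFirewall(allowed_ports=[443])
first_packet = firewall.handle("10.0.0.5", 50000, "192.0.2.10", 443)
reply_packet = firewall.handle("192.0.2.10", 443, "10.0.0.5", 50000)
telnet_attempt = firewall.handle("10.0.0.5", 50001, "192.0.2.10", 23)
```

Only the first packet of the HTTPS session is checked against the access list; the reply is matched by session lookup and takes the fast path.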

BGP

BGP synchronization rule - If the AS is acting as a transit AS for other ASes, routes learned through iBGP are not advertised until all routers in the AS have learned those routes through the IGP.

If synchronization is turned on, a BGP router will not advertise a route learned from an iBGP peer to an eBGP peer unless that route is also learned through the IGP.

Split horizon rule - Routes learned from an iBGP neighbor are not advertised to other iBGP neighbors.

BGP path selection criteria - A route is excluded if its next hop is unreachable. After that, prefer: highest weight, highest local preference, locally originated route, shortest AS path length, lowest origin code (IGP < EGP < Incomplete), lowest MED, eBGP over iBGP, among iBGP paths the closest IGP neighbor, among eBGP paths the oldest route, lowest router ID.
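
The selection order above can be expressed as a sort key. This is a simplified sketch, not a complete best-path implementation: MED is compared across all paths (as with always-compare-med), and the IGP metric and route age tie-breakers are applied to every path rather than only to iBGP or eBGP paths. The Path fields and function names are illustrative assumptions.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = False
    igp_metric: int = 0
    age_seconds: int = 0
    router_id: str = "0.0.0.0"


def preference_key(path):
    """Smaller tuple = more preferred; one element per selection step."""
    router_id = tuple(int(octet) for octet in path.router_id.split("."))
    return (
        -path.weight,                 # highest weight
        -path.local_pref,             # highest local preference
        not path.locally_originated,  # locally originated first
        path.as_path_len,             # shortest AS path
        ORIGIN_RANK[path.origin],     # IGP < EGP < Incomplete
        path.med,                     # lowest MED
        not path.is_ebgp,             # eBGP over iBGP
        path.igp_metric,              # closest IGP neighbor
        -path.age_seconds,            # oldest route
        router_id,                    # lowest router ID
    )


def best_path(paths):
    candidates = [p for p in paths if p.next_hop_reachable]
    return min(candidates, key=preference_key) if candidates else None


a = Path("a", local_pref=200, as_path_len=5)
b = Path("b", local_pref=100, as_path_len=1)
c = Path("c", weight=10, next_hop_reachable=False)
best = best_path([a, b, c])
```

Path c has the highest weight but is excluded because its next hop is unreachable; path a then wins on local preference even though its AS path is longer.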

BGP message types - Keepalive, Notification, Open, Update.

Routes received from a route reflector client are reflected to other clients and to non-client neighbors. If we have two route reflectors, we should keep them in separate clusters to avoid loops. If there are multiple RRs with different cluster IDs, the optimal path is selected by preferring the shorter cluster list. Having multiple RRs in the same cluster creates partial connectivity during failure.

The first route reflector also sets an additional BGP attribute called Originator ID, which carries the BGP router ID of the originating client. Any router that receives a route containing its own router ID as the Originator ID ignores the route.

Confederations - Break an AS into smaller sub-ASes that exchange routing updates using intra-confederation eBGP sessions.
On an intra-confederation eBGP session, however, the iBGP parameters are still preserved (such as next hop, metric and local preference).

Commands, under the BGP process:
         - bgp confederation identifier x.x - the original AS.
         - bgp confederation peers x.x y.y - specifies the other sub-ASes within the confederation.

MED vs AS path prepend - MED does not go beyond the neighboring AS, while AS path prepending does.

bgp always-compare-med - compares the MED for paths from neighbors in different ASes.

bgp deterministic-med - enforces comparison of the MED for paths advertised by different peers in the same AS.

BGP conditional advertisement uses two terms, advertise-map and non-exist-map: the prefixes in the advertise-map are advertised only if the route defined in the non-exist-map is not in the BGP table.

BGP conditional route injection (inject-map and exist-map) - advertises the specific routes defined in the inject-map, derived from the summary route matched by the exist-map. It is the reverse of aggregation.

SoO (Site of Origin) - used to prevent routing loops. It identifies the site from which a route originated so that the route is not readvertised back to that site.

SoO is enabled on PE routers, which mark the customer prefixes.

BGP communities are used to tag routes and to apply routing policy on upstream routers. The community attribute consists of four octets. In order to send communities
we need to use the neighbor send-community command under the BGP process.
Well-known BGP communities are:
Internet: advertise these routes to all neighbors. 
Local-as: prevent sending routes outside the local AS within the confederation. 
No-Advertise: do not advertise this route to any peer, internal or external. 
No-Export: do not advertise this route to external BGP peers. 

The local-as command can be used during migration of an AS - the router generates its BGP OPEN message with the AS number defined in local-as.
neighbor x.x.x.x local-as 100 no-prepend replace-as dual-as (dual-as lets the remote peer use whichever AS number is configured at its side).

Peer groups - Peer groups are a way of defining templates/groups with settings for neighbor
relationships. The same policy that goes to one neighbor in the peer group must go to all; if one neighbor has a slightly different config, we do not use the peer group for that neighbor. The idea is to define a group with all required BGP settings and then add the neighbors to this group so they inherit the settings.
With a BGP peer group, one update is generated for the whole group instead of individual updates per neighbor, which optimises update processing. It also makes the configuration simpler.

BGP route reflector - Eliminates the need for an iBGP full mesh. Similar to the OSPF DR/BDR concept, peering is only needed with the RR.
When an RR gets an update from one of its clients, it sends it to the other RRs and to its other clients.
It modifies the split horizon rule. The BGP cluster ID is used for loop prevention.
It does not modify the next hop attribute.
Because the route reflector modifies the split horizon rule, routes learned through iBGP can be forwarded to other iBGP neighbors, but only by the route reflector.
If a client has iBGP sessions with multiple route reflectors, it receives a copy of every route from each of them. This can create routing loops. To avoid them, each route reflector and its clients form a cluster, identified by a cluster ID that is unique within the AS.
Whenever a route is reflected, the route reflector adds its cluster ID (by default its router ID) to the cluster list attribute. If the route is later reflected back to that route reflector, it recognizes its own cluster ID in the cluster list and does not forward the route.
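
The two loop-prevention attributes used by route reflectors, the cluster list and the Originator ID, can be sketched as follows. Routes are modelled as plain dictionaries and the function names and router IDs are illustrative assumptions; a real implementation handles many more attributes.

```python
def reflect(route, cluster_id, learned_from_router_id):
    """Return the route as a route reflector would pass it on, or None on a loop."""
    cluster_list = route.get("cluster_list", [])
    if cluster_id in cluster_list:
        return None  # our own cluster ID is already in the list: do not forward
    reflected = dict(route)
    # The Originator ID is set once, by the first route reflector.
    reflected.setdefault("originator_id", learned_from_router_id)
    reflected["cluster_list"] = [cluster_id] + cluster_list
    return reflected


def client_accepts(route, my_router_id):
    """A router ignores a route whose Originator ID is its own router ID."""
    return route.get("originator_id") != my_router_id


route = {"prefix": "10.1.0.0/16"}  # advertised by client 3.3.3.3
via_rr1 = reflect(route, "1.1.1.1", learned_from_router_id="3.3.3.3")
via_rr2 = reflect(via_rr1, "2.2.2.2", learned_from_router_id="1.1.1.1")
back_at_rr1 = reflect(via_rr2, "1.1.1.1", learned_from_router_id="2.2.2.2")
```

When the route comes back to the first reflector it finds 1.1.1.1 in the cluster list and drops it, and the originating client 3.3.3.3 would reject the route because of the Originator ID.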

The BGP Link Bandwidth feature is used to enable multipath load balancing for external links with unequal bandwidth capacity. This feature is enabled under an IPv4 or VPNv4 address family session by entering the bgp dmzlink-bw command. This feature supports iBGP and eBGP multipath load balancing, and eiBGP multipath load balancing in Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPNs). When this feature is enabled, routes learned from a directly connected external neighbor are propagated through the internal BGP (iBGP) network with the bandwidth of the source external link.

The link bandwidth extended community indicates the preference of an autonomous system exit link in terms of bandwidth. This extended community is applied to external links between directly connected eBGP peers by entering the neighbor dmzlink-bw command. The link bandwidth extended community attribute is propagated to iBGP peers when extended community exchange is enabled with the neighbor send-community command. 

It should be configured in conjunction with the maximum-paths command.

bgp dmzlink-bw
neighbor ip-address dmzlink-bw
neighbor ip-address send-community [both | extended | standard]

Aggregate with the as-set keyword - Normal aggregation with the summary-only keyword advertises only the summary prefix and suppresses all the specific routes, and the router performing the aggregation includes only its own AS when sending the update.
When aggregate-address is used with as-set, the update for the summary prefix includes all the ASes from the AS paths of the specific routes being aggregated. This prevents routing loops.

attribute-map - Can be used to reset the community inherited by the aggregating router to none. When a router sends a prefix with a community such as no-export attached to the router performing the aggregation, the aggregate inherits that community, which can cause problems when the aggregate prefix is propagated. To avoid this, we can set the community to none using the attribute-map keyword (aggregate-address x.x.x.x x.x.x.x as-set summary-only attribute-map MAP).

BGP backdoor link - Used to change the AD of an external route from 20 to 200 so that the IGP-learned route is preferred over eBGP.
The command is added on the router that learns the prefix from both routing protocols.

router bgp x.x.x.x

network x.x.x.x mask x.x.x.x backdoor

BGP Questions

Difference between eBGP and iBGP?
What is the TCP port number for BGP communication? 
Explain various states of BGP?
What is the reason for a BGP neighbor being stuck in the Active state?
Do we need to follow 3 way handshake process to establish BGP communication?
What are various path attributes?
What is difference between Local preference and MED attributes?
Explain the sequence of selecting the best route through the attributes?

OSPF

  • OSPF packet types - Hello, DBD, LSR, LSU, LSAck.
  • Each interface participating in OSPF sends hellos to 224.0.0.5.
  • For two routers to form a neighborship they must have the same area, the same hello and dead intervals, the same subnet mask, and matching authentication.
  • OSPF states - Down, Init, Two-Way (DR/BDR election takes place here), ExStart (master/slave negotiation), Exchange (DBDs contain an entry for each link or network with the following info: link type, advertising router, sequence number, cost of link). If a router does not have up-to-date information for an entry, it sends an LSR (Loading state); the neighbor replies with an LSU, and the requesting router adds the new entry to its LSDB. Once all the routers have identical LSDBs, they are in the Full state.
  • DROTHER routers send updates to the DR and BDR at 224.0.0.6.
  • For the broadcast network type, each OSPF-speaking router forms a full adjacency with the DR and BDR and stays in the two-way state with the DROTHER routers.
  • sh ip ospf database summary (prefix) gives information about type 3 inter-area routes learned via the ABR.
The type 3 LSA is called a summary LSA. This does not mean that network prefixes are summarised when propagated by the ABR; it means that topology information is summarised.
  • Each LSA in the LSDB contains a sequence number. Each LSA is reflooded every 30 minutes, and each time it is reflooded its sequence number is incremented by one.
  • Point-to-point - T1, E1. Neighbors are discovered automatically, hellos are sent to the multicast address 224.0.0.5, and there is no DR/BDR election as there are only two routers.
  • Multiaccess - DR/BDR election. If the DR fails, the BDR becomes the DR and a new BDR is elected.
  • If a new router is added with the highest priority, it does not preempt the existing DR and BDR. A new election starts only if the DR or BDR goes down.
  • DR/BDR - ip ospf priority 0 keeps a router as DROTHER (it never becomes DR or BDR).
  • Stub area - All the routers in the area must agree on the stub flag. It does not allow type 4 and type 5 LSAs, and the ABR generates a default route into the stub area to reach external destinations.
To configure a stub area - area x stub
  • Totally stubby area - Removes type 3, 4 and 5 LSAs, and the ABR generates an inter-area default route. The totally stubby option is configured on the ABR of the area.
To configure totally stubby - on the ABR, area x stub no-summary; the other routers need to be configured with the area x stub command.
  • NSSA - Designed to keep the stub area features while also allowing external routes. The ASBR generates type 7 LSAs in the NSSA and sets the P bit to 1; the ABR translates type 7 to type 5 and propagates it into the OSPF domain. All routers must agree on the NSSA flag. The ABR does not generate a default route automatically, so if other external ASes are connected to other areas, the NSSA will not have information about those external routes; in that case we need to generate the default route manually.
  • Totally NSSA (NSSA totally stubby area) - Removes type 3, 4 and 5 LSAs, generates type 7 LSAs, and the ABR generates a default route. Note that only the ABR needs the no-summary option; the other routers can still run plain NSSA for that area in the OSPF process.
  • Order of preference of OSPF routes - O, O IA, E1, E2, N1, N2.
  • When an ABR translates a type 7 LSA to type 5 and we look up the external network with sh ip ospf database external, there are two fields, Advertising Router and Forwarding Address. The advertising router is the address of the ABR doing the translation, and the forwarding address is the address of the ASBR.
  • If the forwarding address field is 0.0.0.0, traffic is forwarded to the router that originated the LSA.
  • If we have multiple ABRs in an NSSA, the ABR with the highest router ID generates the type 5 LSA. This does not mean all the traffic will go through the ABR with the highest router ID, because the forwarding address field tells routers which ASBR to use to reach the external destination.
  • To change the forwarding address on the ABR while translating from type 7 to type 5, we can use the command
area x nssa no-summary translate type7 suppress-fa
Note - if the forwarding address in the LSA is 0.0.0.0, the router advertising the LSA is announcing itself as the next hop toward the destination.
  • E1 and E2 routes - For E1 routes, the external cost is added to the cost of the links the packet traverses. If we have multiple ASBRs, we should mark external routes as type E1.
  • If we have multiple ASBRs, the default metric to reach the external network will be the same from both of them. In that case each OSPF-speaking router uses the forward metric to reach the ASBR to choose the best path. If the forward metric is also the same, the decision is based on the router ID of the ASBR.
This can be verified with sh ip ospf database external x.x.x.x.
  • E2 - External cost only; use it if we have a single ASBR.
Note - The ABR has information for all its connected areas, so when it generates type 3 LSAs the topology information is summarised and propagated from one area to another.
  • Loop prevnetion mechanism in OSPF-Its ABR only that accespts and process the type 3 LSA if it is from backbone area .
  • area X filter-list prefix <prefix-list-name> {in|out}. Good news here: this command applies after all summarization has been done and filters the routing information from being used for Type 3 LSA generation. It applies to all three types of prefixes: intra-area routes, inter-area routes, and summaries generated by the area X range command. All information is learned from the router's RIB. It is used to filter specific prefixes in Type 3 LSAs.
  • LSA Type 5 filtering - this LSA is originated by an ASBR (a router redistributing external routes) and flooded through the whole OSPF autonomous system. Important - you may filter the redistributed routes with the distribute-list out command configured under the protocol that is the source of redistribution, or simply by applying filtering in your redistribution.
  • The key thing to remember is that non-local route filtering for OSPF is only available at ABRs and ASBRs.
  • Distribute-list out on an ABR or ASBR filters Type 5 LSAs while they are propagated.
We can verify using sh ip ospf database external x.x.x.x
  • Distribute-list in - filters the route from the local routing table, but the LSA is still propagated to neighbor routers.
  • If we have an NSSA and want to filter Type 5 LSAs on the ABR, we can filter on the forwarding address using a distribute list on the ABR (the forwarding address is copied from the Type 7 LSA when the ABR regenerates the Type 5 LSA from it).
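The E1/E2 cost rules in the bullets above can be illustrated with a small Python sketch. The function names and sample costs are hypothetical and chosen only for illustration; the tie-break on forward metric follows the note above and is not a full model of OSPF route selection.

```python
def external_route_cost(route_type, external_cost, cost_to_asbr):
    """Cost a router compares for an external route.

    E1: external (redistributed) cost plus the internal cost to reach the ASBR.
    E2: external cost only; the internal cost is not part of the route metric.
    """
    if route_type == "E1":
        return external_cost + cost_to_asbr
    if route_type == "E2":
        return external_cost
    raise ValueError("route_type must be 'E1' or 'E2'")


def pick_e2_exit(candidates):
    """Pick an ASBR for an E2 route.

    candidates: list of (asbr_name, external_cost, forward_metric) tuples.
    Lowest external cost wins; the forward metric breaks ties.
    """
    return min(candidates, key=lambda c: (c[1], c[2]))[0]


if __name__ == "__main__":
    print(external_route_cost("E1", 20, 30))
    print(external_route_cost("E2", 20, 30))
    print(pick_e2_exit([("ASBR1", 20, 30), ("ASBR2", 20, 10)]))
```

With two ASBRs advertising the same E2 metric, the sketch picks the one with the lower forward metric, which is why E1 is the safer choice when the internal path cost should influence the exit point.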
OSPF Network Types 
  • Point-to-point - supports broadcast, e.g. T1/E1 links. There are only two routers, so there is no DR/BDR election. Hello/dead timers are 10/40.
  • Broadcast - e.g. Ethernet; has broadcast capability. There is a DR and BDR election. Hello/dead timers are 10/40.
  • Point-to-multipoint broadcast - has broadcast capability, no DR/BDR election, hello/dead timers are 30/120. In a hub-and-spoke topology the hub forms adjacencies with the spokes, but the spokes do not form adjacencies with each other because there is no direct layer 2 connection between them, so when the hub receives an update from a spoke it sets the next hop to itself while propagating the update.

  • Point-to-multipoint non-broadcast - no broadcast capability; hellos are sent as unicast and are not sent unless neighbors are defined manually.

As there is no broadcast capability, hellos are sent as unicast and there is no DR/BDR election. Hello/dead timers are 30/120. Special next-hop processing applies. Non-broadcast is the default network type on a multipoint Frame Relay interface, e.g. a main interface.

  • Non-broadcast network - the default network type for Frame Relay networks. There is no broadcast capability, hellos are sent as unicast, and neighbors need to be defined manually. Hello/dead timers are 30/120. There is a DR and BDR election.

NBMA (non-broadcast) - neighbors need to be defined manually and there is a DR/BDR election; the topology can be full mesh or partial mesh. Because there is a DR/BDR election, either all routers should be fully meshed, or the DR and BDR should be statically configured on routers that have full adjacencies to all other routers. For a non-broadcast network make sure the hub is chosen as DR, and define the neighbors manually so that OSPF updates are sent as unicast.

Note - on broadcast and non-broadcast networks, the DR does not change the next hop when it propagates LSAs to the DROTHER routers. On a broadcast segment that is fine, but on a non-broadcast Frame Relay network we need to manually define the layer 3 to layer 2 resolution to reach that neighbor. On point-to-point links such as HDLC there is only one device at the other end, so no layer 3 to layer 2 mapping is required.

  • In OSPF, loopbacks are advertised as stub hosts (/32) with network type loopback. If the loopback mask is /24 and we want to advertise it as /24 into the OSPF domain, we need to change the network type (e.g. to point-to-point).
  • By adjusting the hello/dead timers you can make non-compatible OSPF network types appear as neighbors via the “show ip ospf neighbor” but they won’t become “adjacent” with each other. OSPF network types that use a DR (broadcast and non-broadcast) can neighbor with each other and function properly. Likewise OSPF network types (point-to-point and point-to-multipoint) that do not use a DR can neighbor with each other and function properly. But if you mix DR types with non-DR types they will not function properly (i.e. not fully adjacent). You should see in the OSPF database “Adv Router is not-reachable” messages when you’ve mixed DR and non-DR types.
  • Here is what will work:
Broadcast to Broadcast
Non-Broadcast to Non-Broadcast
Point-to-Point to Point-to-Point
Point-to-Multipoint to Point-to-Multipoint
Broadcast to Non-Broadcast (adjust hello/dead timers)
Point-to-Point to Point-to-Multipoint (adjust hello/dead timers)
  • Command lines:
sh ip ospf interface brief
sh ip route ospf
sh ip ospf border-routers
sh ip ospf database summary x.x.x.x - Type 3
sh ip ospf database external x.x.x.x - Type 5
sh ip ospf database router x.x.x.x - Type 1
  • Summarisation can occur on an ABR and on an ASBR.
  • The ABR uses the area range command.
  • When an ABR/ASBR summarises, it generates a null route for the summary. If a specific prefix becomes unreachable and the ABR receives traffic for that prefix, it drops the traffic. If we want to avoid that and use the default route to forward the traffic instead, we can use the command (no discard-route internal / external) to remove the null route from the routing table.
  • The ASBR uses the summary-address x.x.x.x mask command.
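The network type compatibility rules listed earlier in this section (types that both use a DR, or both do not, can become fully adjacent once the timers match) can be sketched in Python. The table below uses the usual Cisco default hello/dead timers as an assumption, and the function name check_pair is hypothetical.

```python
# Assumed default hello/dead timers (seconds) and DR usage per OSPF network type.
NETWORK_TYPES = {
    "broadcast": {"uses_dr": True, "hello": 10, "dead": 40},
    "non-broadcast": {"uses_dr": True, "hello": 30, "dead": 120},
    "point-to-point": {"uses_dr": False, "hello": 10, "dead": 40},
    "point-to-multipoint": {"uses_dr": False, "hello": 30, "dead": 120},
}


def check_pair(type_a, type_b):
    """Return (works, needs_timer_tuning) for two OSPF network types.

    works: both sides agree on whether a DR is used.
    needs_timer_tuning: the pair works, but default hello/dead timers differ.
    """
    a, b = NETWORK_TYPES[type_a], NETWORK_TYPES[type_b]
    works = a["uses_dr"] == b["uses_dr"]
    timers_match = (a["hello"], a["dead"]) == (b["hello"], b["dead"])
    return works, works and not timers_match


if __name__ == "__main__":
    for pair in [("broadcast", "non-broadcast"),
                 ("point-to-point", "point-to-multipoint"),
                 ("broadcast", "point-to-point")]:
        print(pair, check_pair(*pair))
```

Note that broadcast and point-to-point share the same default timers, so they show up as neighbors without any tuning, yet the mismatch in DR usage still prevents them from working properly.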

OSPF Questions

What do you understand by backbone area?
What is the need for dividing the autonomous system into various areas?
What is the benefit of dividing the entire network into areas?
What difference does it make whether the network is divided into areas or not?
What is the purpose of a Stub area?
What is the purpose of an NSSA area?
How do Stub and NSSA areas work?
What are the criteria to form a neighborship?
Why does a master/slave need to be elected between two neighbor interfaces?
What is a virtual link?
Are virtual link updates multicast or unicast?
Explain the various states of OSPF.
What are the various LSA and message types?
What is the difference between E1 and E2 metrics?
Explain route redistribution.
How are the DR and BDR elected?

Virtual links

All areas in an Open Shortest Path First (OSPF) autonomous system must be physically connected to the backbone area (Area 0). In some cases, where this is not possible, you can use a virtual link to connect to the backbone through a non-backbone area. You can also use virtual links to connect two parts of a partitioned backbone through a non-backbone area. The area through which you configure the virtual link, known as a transit area, must have full routing information.

The transit area cannot be a stub area, because routers in the stub area do not have routes for external destinations. Because data is sent natively, if a packet destined for an external destination is sent into a stub area which is also a transit area, then the packet is not routed correctly.

We can also use a GRE tunnel between the non-backbone area and the backbone area and run area 0 over the tunnel interface, but there is GRE overhead. With a virtual link only OSPF packets are sent as tunneled packets; data traffic is sent natively, as in a normal area connected to the backbone area.

EIGRP

EIGRP runs on IP protocol 88, OSPF on IP protocol 89.

EIGRP is a hybrid protocol with some properties of distance vector and some of link state.

Distance vector because it only knows what its directly connected neighbors are advertising; link state because it forms adjacencies.

In order to form an adjacency, the EIGRP AS number must be the same between neighbors.

EIGRP multicast address - 224.0.0.10

EIGRP, like BGP, only advertises a route that it is going to install in the routing table.

EIGRP is a classless protocol but performs automatic summarisation by default, so we need to disable it (no auto-summary).

EIGRP applies split horizon. In the case of DMVPN we need to disable split horizon so that routes learned on the tunnel interface from one spoke are advertised to the other spokes through the same tunnel interface.

The passive-interface command works slightly differently in EIGRP: it stops sending multicast/unicast hellos to neighbors and so prevents adjacencies from forming.

Issuing a neighbor statement in EIGRP on a link means the router stops listening to the multicast address on that link, so the neighbor must also be specified manually on the other side to form the adjacency.

Timers in EIGRP do not need to match to form an adjacency.

EIGRP metric calculation uses bandwidth, delay, reliability and load; MTU is carried in updates but not used in the calculation.

When calculating the composite metric, the path values used are the minimum bandwidth, the total delay, the highest load and the lowest reliability.
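As a worked example of the composite metric, here is a minimal Python sketch of the classic (non-wide) EIGRP formula, assuming the default K values (K1 = K3 = 1, K2 = K4 = K5 = 0) so that load and reliability drop out. The function name and the sample paths are illustrative only.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_usec):
    """Classic EIGRP composite metric with default K values.

    metric = 256 * (10^7 / minimum bandwidth in kbps + total delay in tens of microseconds)
    Integer division is used for both scaled terms.
    """
    scaled_bandwidth = 10**7 // min_bandwidth_kbps
    scaled_delay = total_delay_usec // 10
    return 256 * (scaled_bandwidth + scaled_delay)


if __name__ == "__main__":
    # Path over a T1 serial link (1544 kbps, 20000 usec) and an Ethernet link (1000 usec).
    print(eigrp_classic_metric(1544, 20000 + 1000))
    # Single FastEthernet link (100000 kbps, 100 usec).
    print(eigrp_classic_metric(100000, 100))
```

Because only the minimum bandwidth on the path counts while delay is cumulative, changing delay on one interface shifts the metric in a predictable way.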

Feasible distance (FD) is the best metric along the path; it is the metric of the successor route.


EIGRP FD - the best metric along the path to the destination, including the metric to reach the neighbor.

Advertised distance (AD) - total metric along the path as advertised by the upstream router.

A router is a feasible successor if its AD < FD of the successor.

FD is used for loop avoidance. Split horizon rule - never advertise a route out of the interface on which it was learned.

Feasible successors are the only candidates for unequal-cost load balancing.


Load balancing across unequal-cost paths is done in EIGRP through the variance multiplier.
EIGRP is the only routing protocol that supports load balancing across unequal-cost paths, unlike RIP, OSPF and IS-IS.
If the metric through a feasible successor <= variance x FD, then the path is chosen for unequal-cost load balancing.
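The feasibility condition and the variance rule can be combined in a short Python sketch. The neighbor names and metrics are made up, and treating the boundary case (metric exactly equal to variance x FD) as included follows the note above and is an assumption.

```python
def unequal_cost_paths(fd, variance, neighbors):
    """Return the neighbors usable for load balancing, sorted by name.

    fd: feasible distance (metric of the successor route).
    neighbors: dict of name -> (advertised_distance, total_metric).
    A neighbor qualifies if it meets the feasibility condition (AD < FD)
    and its total metric is within variance x FD.
    """
    return sorted(
        name
        for name, (advertised_distance, total_metric) in neighbors.items()
        if advertised_distance < fd and total_metric <= variance * fd
    )


if __name__ == "__main__":
    neighbors = {
        "B": (10, 20),  # successor: its total metric is the FD
        "C": (10, 30),  # feasible successor, within variance 2
        "D": (25, 35),  # within variance 2 but fails AD < FD
    }
    print(unequal_cost_paths(20, 2, neighbors))
    print(unequal_cost_paths(20, 1, neighbors))
```

Neighbor D shows why variance alone is not enough: its metric is within the variance window, but it is not a feasible successor and is never used.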

EIGRP traffic engineering can be achieved easily by modifying the delay value instead of the bandwidth.


EIGRP commands (sh ip eigrp neighbors, sh ip eigrp neighbors detail, sh ip eigrp topology, sh ip route eigrp)

With equal-cost load balancing the traffic is distributed based on CEF. To turn off CEF on an interface use (no ip route-cache).


SIA - Stuck in Active: if a router has sent a query for a destination network and a neighbor takes too long to respond, because of a network flap or some other network condition, the route is considered to be in the SIA state.

We can tune the amount of time the router waits before putting the route in the SIA state with the timers active-time command.

To check which routers have not yet replied to queries, issue sh ip eigrp topology; a router denoted by r means a reply is still awaited from it.


EIGRP performs auto-summarization for a network when crossing a major network boundary.

 * Split horizon should only be disabled on a hub site in a hub-and-spoke network.
   no ip split-horizon eigrp x


The EIGRP router ID helps with loop prevention for external routes: if a router receives an external route whose originator is equal to its own router ID, it discards the route.


EIGRP provides faster convergence because it does not need to run the DUAL algorithm when there is a feasible successor for the path. If the router has no feasible successor, it sends a query to its neighbor routers, which propagate the query further to their neighbors. If the router does not receive a reply from a neighbor before the timer expires, it marks the route as Stuck in Active and resets the neighbor relationship with the neighbor that did not answer within the time period.
In OSPF, by contrast, if the primary path goes down an LSA has to be sent and the SPF algorithm is run again.
There are ways to bound the query domain; the following can be used alone or together.

1) Using summary routes - ip summary-address eigrp 'as' [network] [mask] [ad]
If RouterA sends a query message to RouterB and summarization is in use, RouterB will only have a summary route in its EIGRP topology table, not the exact prefix match of the query, and will therefore send a network unknown response back to RouterA. This stops the query process immediately at RouterB, only one hop away.

2) Using stub -
router eigrp 1
eigrp stub 'arguments'
The default arguments are connected and summary, which means the router will advertise connected and summary routes only.
A router informs its neighbor of its stub status while the neighbor adjacency is forming.

Stub routers tell their neighbors “do not send me any queries”. Since no queries will be sent, it is extremely effective. However, it is limited in where you can use it. It is only used in non-transit paths and star topologies.

3) Filtering the prefix

Please note that an EIGRP router propagates a query received from a neighbor only if it has an exact match for the route in its topology table. If it does not have the exact route in its topology table, it replies to its neighbor with route unknown and the query is not propagated further.

4) Different AS domains

Different EIGRP AS numbers. EIGRP processes run independently from each other, and queries from one system don’t leak into another. However, if redistribution is configured between two processes a behavior similar to query leaking is observed. 



Both IGRP and EIGRP use an Autonomous System (AS) number and only routers using the same AS number can exchange routing information using that protocol. When routing information is propagated between IGRP and EIGRP, redistribution has to be manually configured because IGRP and EIGRP use different AS numbers. However, redistribution occurs automatically when both IGRP and EIGRP use the same AS number

MPLS

Labels are locally significant between two directly attached devices. Once mpls ip is enabled, labels are advertised for connected interfaces and IGP-learned routes.

MPLS label header - 32 bits: bits 0-19 are the label value (20 bits), bits 20-22 are the experimental (EXP) bits used for QoS, bit 23 is the BoS (bottom of stack) bit that marks the bottom label in the stack, and bits 24-31 are the TTL value (8 bits).
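The 32-bit layout described above can be packed and unpacked with a few shifts and masks. This is a minimal Python sketch; the function names are illustrative and only a single label stack entry is handled.

```python
import struct


def build_label_entry(label, exp, bos, ttl):
    """Pack one 32-bit MPLS label stack entry in network byte order.

    Layout: 20-bit label | 3-bit EXP | 1-bit bottom-of-stack | 8-bit TTL.
    """
    word = (label << 12) | (exp << 9) | (bos << 8) | ttl
    return struct.pack("!I", word)


def parse_label_entry(raw):
    """Unpack 4 bytes into the label, EXP, BoS and TTL fields."""
    (word,) = struct.unpack("!I", raw)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bos": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


if __name__ == "__main__":
    entry = build_label_entry(label=16, exp=0, bos=1, ttl=255)
    print(entry.hex())
    print(parse_label_entry(entry))
```

Label 16 is used in the example because it is the first value outside the reserved range 0 to 15.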

The MPLS label is placed between the layer 2 and layer 3 headers and is known as the shim header.

FEC - a group or flow of packets that are forwarded along the same path with the same treatment.

Protocols used to distribute labels are LDP, TDP and RSVP; TDP is Cisco proprietary. Each LSR builds a LIB, which contains its local bindings and the remote bindings from all neighboring LSRs. Which remote binding is actually used depends on the best route in the IP routing table, and that information is populated in the LFIB.

LDP uses UDP port 646 on multicast address 224.0.0.2 for neighbor discovery and TCP port 646 for the neighbor adjacency.

Labels are advertised for connected interfaces and IGP-learned routes.

How does a router determine whether a packet is an IP packet or a labeled packet - the protocol field in the layer 2 frame tells the router whether to look in CEF for an IP packet or in the LFIB.

In order to see the LFIB - sh mpls forwarding-table

A single LFIB entry can be seen with - sh mpls forwarding-table prefix length

MPLS stack operations (push, pop, swap, untagged, aggregate - summarisation is performed on the router, so it removes the label and performs an IP lookup)

Labels 0 to 15 are reserved labels - label 0 is the explicit null label, label 1 is router alert, label 3 is the implicit null label, label 14 is the OAM alert label.

The implicit null label is used for penultimate hop popping.

The explicit null label is used to preserve the QoS information.

In order to change the MPLS label range - mpls label range 16 1000000


MPLS LDP uses UDP port 646 and LDP hello messages are sent to the multicast address 224.0.0.2.
In order to check whether labels are being received - sh mpls ldp discovery detail



Command lines for MPLS:

1. ip cef
2. mpls label protocol {tdp | ldp}
3. mpls ip

sh mpls ldp interface
sh mpls ldp neighbor
sh mpls forwarding-table (similar to sh ip route)




PHP - Penultimate Hop Popping: the device next to the last hop in the path removes the label, to optimise label lookup so that the end device does not need to perform two lookups when sending traffic to the end customer.

To accomplish this, the last-hop router advertises an implicit null label for all its connected and loopback interfaces, which tells the router next to the last hop to pop the label.

Note - for any destination which is one hop away, the MPLS forwarding table shows Pop Label.


P routers in the core do not need to know the full reachability of customer routing information, as they just switch packets based on labels.


For MPLS to work correctly we need to enable the BGP next-hop-self command so that eBGP updates are propagated to the iBGP peer with the loopback interface as next hop. If the BGP peering between the PEs is formed over physical interfaces instead of loopbacks, the peering will come up but traffic will be black-holed: PHP causes the third-last hop to perform the pop operation, and the traffic is forwarded to the next-to-last hop as an IP packet for a destination it has no route to.
The issue is that PHP gets processed one hop too soon.


An MPLS VPN basically consists of two components:
1) VRFs - separation of customer routing information using VRFs per interface
2) Exchange of routing information using MP-BGP.


VRFs without MPLS are called VRF lite. When using VRF lite the route distinguisher is only locally significant.

When we create VRFs, any packet that arrives on an interface in a VRF has its routing lookup done in that VRF.


VPNv4 route - RD + IPv4 prefix, which makes VPNv4 routes globally unique (the RD is 8 bytes).
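To see why prepending the RD makes overlapping customer prefixes unique, here is a small Python sketch. It builds a type 0 RD (2-byte type, 2-byte AS number, 4-byte assigned number) and pairs it with an IPv4 prefix; the function names, AS number and prefixes are illustrative, and the tuple is a simplified stand-in for the real VPNv4 NLRI encoding.

```python
import ipaddress
import struct


def make_rd(asn, assigned_number):
    """Type 0 route distinguisher: 2-byte type (0), 2-byte ASN, 4-byte number."""
    return struct.pack("!HHI", 0, asn, assigned_number)


def vpnv4_route(rd, ipv4_prefix):
    """Simplified VPNv4 route: the 8-byte RD paired with the IPv4 prefix."""
    return (rd, ipaddress.ip_network(ipv4_prefix))


if __name__ == "__main__":
    customer_a = vpnv4_route(make_rd(65000, 1), "10.0.0.0/24")
    customer_b = vpnv4_route(make_rd(65000, 2), "10.0.0.0/24")
    print(len(make_rd(65000, 1)))
    print(customer_a == customer_b)
```

Both customers use 10.0.0.0/24, but the resulting VPNv4 routes differ because the RDs differ.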

MPLS VPN label - PE routers exchange a label for each customer route via VPNv4.

Transport label - used to transport the packet across the core to the remote PE.

RT (route target) tells the PE which VRF a route belongs to; it is a BGP extended community attribute.


If we are running EIGRP in VRFs, we need to specify the autonomous system inside each VRF separately, otherwise the EIGRP adjacency will not form in the VRF.

Route target export - to advertise routes from the VRF into BGP.

Route target import - to import routes from BGP into the VRF.

Between the PE routers the peering is done globally, while customer routes are redistributed under address-family vpnv4.

Please note that while configuring VPNv4 we need to activate the VPNv4 capability with the remote peers.

Loop prevention mechanism for route targets - the router will not import any prefix into a VRF unless its route target is specified for import.

Packet structure - Layer 2 header | Transport label | VPN label | IP header | Layer 4 header | Payload

So when traffic from the remote PE reaches the PE on the other side, that PE just refers to the VPN label to see which exit interface or VRF the packet belongs to.


Steps for MPLS VPN once basic connectivity and MPLS are enabled on the interfaces in the MPLS network:

1. Create the VRF with route distinguisher + RT

2. Assign the VRF to interfaces

3. Run a VRF-aware routing process between PE and CE

4. Establish VPNv4 peers

5. Redistribute subnets from the VRF into BGP and vice versa.

SHAM Links

Sham links are basically virtual links created between PEs in the BGP/MPLS network to extend the OSPF domain over MPLS.

When we run OSPF between PE and CE and redistribute OSPF routes into BGP and vice versa, additional OSPF attributes are attached to the BGP VPNv4 routes.
On the other PE, when these routes are redistributed back from BGP into OSPF, the attributes determine where the routes are placed in the OSPF database, as Type 1, 2, 3, 4 or 5.

One additional attribute encoded from OSPF into BGP is the OSPF domain ID, which is derived from the local OSPF process ID. If the local OSPF process ID matches the domain ID in the VPNv4 prefix, the routes are injected into the OSPF database as Type 3 LSAs even though they are redistributed from BGP into OSPF.
If the domain IDs do not match, the routes are learned as Type 5 at the other VPN site.

So if we have a backdoor link between two sites, the backdoor link is always preferred over MPLS. To avoid this we create a sham link between the PEs, similar to a GRE tunnel, to extend the OSPF domain over MPLS, so that routes redistributed from BGP into OSPF appear as intra-area routes rather than inter-area routes.

How to create sham links:

1. Allocate an address on each PE that is reachable over MPLS

2. Under the OSPF process for that VRF, create the adjacency between the PEs

router ospf 1 vrf c
area 0 sham-link source-address destination-address

OSPF path selection criteria - if we have two routes learned as inter-area routes, one learned from an ABR via the backbone area and the other from an ABR via a non-backbone area, the route via the backbone area is always preferred.

The loop prevention mechanism for OSPF changes when it is used with a Layer 3 MPLS VPN.

When using OSPF between PE and CE, customer routes are sent as Type 3 LSAs with the DN (down) bit set, so if the same route is received by a PE on the other side, the PE knows not to redistribute the route back into BGP.

The capability vrf-lite command under the OSPF process makes the router ignore the DN bit; without it, Type 3 LSAs with the DN bit set are not installed in the routing table.

For Type 5 LSAs, either the DN bit or the route tag is used to prevent loops.

Commands for switching

Note - the layer 2 header contains the source MAC, destination MAC and EtherType; the EtherType field identifies the layer 3 protocol that follows, e.g. IPv4 or IPv6.

sh int fa0/1 switchport (trunk, access, administrative mode)

sh int trunk (ports which are trunking)


sh spanning-tree vlan 1 (to check whether traffic is forwarded in spanning tree)

If we have a layer 2 EtherChannel, the sh spanning-tree output should show the port-channel interface rather than the individual physical links; otherwise there is an issue.

On a switch we have root ports and designated ports; all traffic leaving the root port is forwarded towards the root bridge.

If two switches are in different VTP domains, then as long as the trunking between them is set correctly the broadcast domain is not affected.


Two ways to change the priority for the root bridge:

spanning-tree vlan 2 root primary

spanning-tree vlan 2 priority <value lower than 32768>


In spanning tree, one of the criteria for electing the root port on a non-root bridge is the path cost, which is local to the interface.


On the 3560 switch PVST+ is enabled by default.




Auto - Auto: results in access ports
Access mode - Dynamic desirable: access port
Trunk with nonegotiate - Auto: the auto side stays an access port because the switch on the left side is not sending DTP frames.
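The three DTP outcomes above can be reproduced with a simplified Python model. It is only a sketch of the commonly documented negotiation table, not of the DTP protocol itself; the mode names and function names are chosen for illustration.

```python
def operational_mode(local, remote):
    """Operational mode of the local port given both administrative modes."""
    if local in ("trunk", "trunk nonegotiate"):
        return "trunk"
    if local == "access":
        return "access"
    if local == "dynamic desirable":
        # Actively negotiates; any DTP-speaking trunk-capable peer will do.
        wants_trunk = remote in ("trunk", "dynamic desirable", "dynamic auto")
    elif local == "dynamic auto":
        # Passive; needs the peer to actively ask for a trunk via DTP.
        wants_trunk = remote in ("trunk", "dynamic desirable")
    else:
        raise ValueError(f"unknown mode: {local}")
    return "trunk" if wants_trunk else "access"


def link_result(side_a, side_b):
    """Return the operational modes of both ends of a link."""
    return operational_mode(side_a, side_b), operational_mode(side_b, side_a)


if __name__ == "__main__":
    print(link_result("dynamic auto", "dynamic auto"))
    print(link_result("access", "dynamic desirable"))
    print(link_result("trunk nonegotiate", "dynamic auto"))
```

The last case ends up with a trunk on one side and an access port on the other, which is the kind of mismatch that hard-coding both ends avoids.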


Best practice for trunking - mode trunk with nonegotiate. Trunk negotiation is done by DTP; when using DTP both ends should be in the same VTP domain.


When a frame traverses a trunk link it is marked by the trunking protocol, and at the receiving end the VLAN ID is removed before the frame is sent out an access link.

ISL and 802.1Q

ISL - encapsulates the entire frame, has no native VLAN (all traffic is encapsulated), leaves the original frame unmodified, and adds a 26-byte header and a 4-byte trailer. ISL VLAN range is 1-1024.

802.1Q - inserts a 4-byte tag, does not tag frames that belong to the native VLAN, and the tag includes a priority field that extends QoS support. The 12-bit VLAN ID gives 4096 values, of which 1-4094 are usable.


In order to maintain identical VLAN database information, VLAN information is propagated between switches in the same VTP domain. VTP information is advertised over trunk links only.

VTP is a layer 2 messaging protocol. There are three versions of VTP (1, 2, 3).

Limitation of VTP versions 1 and 2 - extended VLANs could only be used when the switch was configured in transparent mode, so VTP version 3 is used for them.


Server mode - create, delete, modify VLANs; send and forward advertisements; sync the VLAN database; store information in NVRAM

Transparent mode - create, delete, modify local VLANs; forward advertisements; no VLAN database sync; store information in NVRAM

Client mode - cannot create, delete or modify VLANs; forward advertisements; sync the VLAN database; do not store information in NVRAM.

Important - whenever a new switch is added, make sure its configuration revision is lower than that of the other switches in the VTP domain. If it is higher, it will overwrite the VLAN information of all servers and clients.
To protect against that, either add the switch in transparent mode or in a different domain.
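The revision-number danger described above can be shown with a simplified Python model of how a switch reacts to a VTP advertisement. The dictionary fields and function name are hypothetical, and passwords and VTP version differences are ignored.

```python
def apply_vtp_advertisement(switch, advert):
    """Return the switch state after it receives a VTP advertisement.

    A server or client in the same domain adopts the advertised VLAN database
    when the advertised configuration revision is higher than its own.
    Transparent switches and switches in other domains keep their database.
    """
    if switch["mode"] == "transparent":
        return switch
    if switch["domain"] != advert["domain"]:
        return switch
    if advert["revision"] <= switch["revision"]:
        return switch
    return {**switch, "revision": advert["revision"], "vlans": set(advert["vlans"])}


if __name__ == "__main__":
    production = {"mode": "server", "domain": "LAB", "revision": 5, "vlans": {1, 10, 20}}
    # A newly added switch with a higher revision and an almost empty VLAN database.
    new_switch_advert = {"domain": "LAB", "revision": 9, "vlans": {1}}
    print(apply_vtp_advertisement(production, new_switch_advert))
```

In the example the production server loses VLANs 10 and 20, which is why the revision of a new switch should be checked before it is connected.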


VTP configuration requires the VTP domain, password and VTP mode on each switch. Verify with sh vtp status or sh vtp counters.


VTP pruning - used to remove unnecessary flooding of broadcast traffic on the network.


STP - used to avoid unwanted loops in the environment.

STP creates one reference point in the network, called the root of the tree, and based on that reference point decides whether there are redundant paths in the network.


Layer 2 forwarding - by default CAM table entries age out after 300 sec.

We can also create a static MAC address table entry in the CAM - command (mac-address-table static mac-address vlan vlan-id interface type)


Bridges segment collision domains but do not segment broadcast domains.

Root bridge - selection is based on the bridge ID carried in BPDUs, which is a combination of priority and MAC address (lower is preferred for both).
On the root bridge all ports are DPs.
Then a root port is selected on each non-root bridge.

Root port selection is based on the following parameters: lowest root bridge ID, lowest path cost to the root bridge, lowest sender bridge ID, lowest port priority, lowest port ID.
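Both elections above are "lowest tuple wins" comparisons, which a short Python sketch makes concrete. The switch names, MAC addresses and costs are made up, and the root port comparison assumes all candidate ports already agree on the same root bridge.

```python
def mac_to_int(mac):
    """Convert a MAC such as 00:1a:2b:3c:4d:5e to an integer for comparison."""
    return int(mac.replace(":", "").replace(".", ""), 16)


def elect_root_bridge(bridges):
    """Lowest (priority, MAC) bridge ID becomes the root bridge."""
    return min(bridges, key=lambda b: (b["priority"], mac_to_int(b["mac"])))["name"]


def elect_root_port(ports):
    """Lowest (path cost, sender bridge ID, sender port ID, local port ID) wins."""
    return min(
        ports,
        key=lambda p: (
            p["root_path_cost"],
            p["sender_bridge_id"],
            p["sender_port_id"],
            p["local_port_id"],
        ),
    )["name"]


if __name__ == "__main__":
    bridges = [
        {"name": "SW1", "priority": 32768, "mac": "00:00:00:00:00:0a"},
        {"name": "SW2", "priority": 32768, "mac": "00:00:00:00:00:01"},
        {"name": "SW3", "priority": 4096, "mac": "00:00:00:00:00:ff"},
    ]
    print(elect_root_bridge(bridges))
    ports = [
        {"name": "Gi0/1", "root_path_cost": 19, "sender_bridge_id": (32768, 1),
         "sender_port_id": (128, 1), "local_port_id": (128, 1)},
        {"name": "Gi0/2", "root_path_cost": 4, "sender_bridge_id": (32768, 2),
         "sender_port_id": (128, 2), "local_port_id": (128, 2)},
    ]
    print(elect_root_port(ports))
```

SW3 wins the root election despite having the highest MAC address, because priority is compared first.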


For every LAN segment a DP is selected (selection is based on the same root ID criteria).

802.1D states - disabled, blocking (listens to incoming BPDUs), listening, learning, forwarding (transmits BPDUs)

Hello time - default is 2 seconds; the interval at which the root bridge sends successive configuration BPDUs. For a non-root bridge the TCN BPDU interval is also 2 sec.

Forward delay - the time a switch port spends in each of the listening and learning states; default is 15 seconds.

Maximum age - the time after which a stored BPDU is aged out; default is 20 seconds.


If any interface flaps (up/down), the switch sends TCN BPDUs until they reach the root bridge. The root bridge then sends configuration BPDUs with the TC flag set, and each switch
ages out its MAC table using the forward delay time instead of the default 300 sec (total time is 17 seconds).

The total time for a port to transition from blocking to forwarding state is 30 seconds (listening plus learning).
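The timer arithmetic can be written out in a few lines of Python. The function name is illustrative; the defaults are the 802.1D values quoted above, and the 50-second case (max age plus two forward delays) applies when the switch first has to wait for its stored BPDU to age out after an indirect failure.

```python
def stp_convergence_seconds(forward_delay=15, max_age=20, indirect_failure=False):
    """Time for a port to reach forwarding with classic 802.1D timers.

    Listening and learning each last one forward delay. After an indirect
    failure the switch also waits for max age before it starts listening.
    """
    listening_plus_learning = 2 * forward_delay
    wait_for_bpdu_to_age = max_age if indirect_failure else 0
    return wait_for_bpdu_to_age + listening_plus_learning


if __name__ == "__main__":
    print(stp_convergence_seconds())
    print(stp_convergence_seconds(indirect_failure=True))
```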


PortFast feature - when PortFast is enabled on a port, no TCN BPDU is sent when the port changes state, and the port transitions directly to the forwarding state. There is therefore a chance that a PortFast-enabled port could cause STP loops if a switch is accidentally connected to that port; to prevent this we use BPDU Guard along with PortFast.


We can manually select the root bridge - spanning-tree vlan vlan-id priority (bridge priority)

We can manually make one bridge the root bridge - spanning-tree vlan vlan-id root {primary | secondary} [diameter n]

We can also set the path cost - spanning-tree vlan vlan-id cost

Port ID is 16 bits - 8-bit port priority + 8-bit port number

spanning-tree vlan vlan-id port-priority


RSTP has rapid convergence time; its port states are discarding, learning, forwarding.

RSTP works on port roles instead of relying on BPDUs from the root bridge.

RSTP roles - root port, DP, alternate port (backup of the root port, when there are two uplinks), backup port (if the active link on a given segment fails and there is no other path to reach the root, the backup port becomes active).

In RSTP all full-duplex ports are point-to-point links. BPDUs are exchanged between switches in the form of proposals and agreements: once a given port is selected as DP, the other switch sends an agreement message.
RSTP converges quickly through this RSTP handshake.




HSRP - provides redundancy of the gateway. HSRP exchanges hello messages on 224.0.0.2.


VRRP - in VRRP we can use the real IP address of a router as the virtual address. It is an IETF standard. The router with the highest priority is the master and the others act as backups. VRRP messages are sent to multicast address 224.0.0.18. The default interval is 1 second and preemption is enabled by default.


GLBP - uses the concept of an AVG; one router acts as the primary AVG while the others act as backup. The AVG assigns virtual MACs to the AVFs, and it is the AVFs that forward packets based on the virtual MACs assigned by the AVG.

GLBP communicates through hello packets sent every 3 seconds to multicast address 224.0.0.102. GLBP supports up to 1024 virtual routers.



This table shows the support of MST in Catalyst switches and the minimum software required for that support.

Catalyst Platform: MST with RSTP (12.1 or higher)
Catalyst 2900 XL and 3500 XL: Not Available
Catalyst 2950 and 3550: Cisco IOS 12.1(9)EA1
Catalyst 3560: Cisco IOS 12.1(9)EA1
Catalyst 3750: Cisco IOS 12.1(14)EA1
Catalyst 2955: All Cisco IOS versions
Catalyst 2948G-L3 and 4908G-L3: Not Available
Catalyst 4000, 2948G, and 2980G (Catalyst OS (CatOS)): 7.1
Catalyst 4000 and 4500 (Cisco IOS): 12.1(12c)EW
Catalyst 5000 and 5500: Not Available
Catalyst 6000 and 6500 (CatOS): 7.1
Catalyst 6000 and 6500 (Cisco IOS): 12.1(11b)EX, 12.1(13)E, 12.2(14)SX
Catalyst 8500

Spanning tree

Spanning tree features that help reduce convergence time:

1. PortFast - used for access layer ports. Ports transition directly to the forwarding state without going through the listening and learning states.

2. UplinkFast - used when one of the uplinks goes down. The root port and alternate port form an uplink group; if the root port goes down, the alternate port transitions directly to the forwarding state without going through the listening and learning states.


3. BackboneFast - used in case of an indirect link failure. A switch with BackboneFast enabled receives inferior BPDUs from its designated switch, which is announcing itself as root bridge. On receiving the inferior BPDUs it expires the max age timer immediately and reconverges the topology. BackboneFast optimises the max-age timer and should be implemented globally.
The switch determines that its path to the root bridge has gone down, so it sends RLQ messages out all its ports. Once the root bridge receives the RLQ it sends a response back, and the port receiving the response can transition to the forwarding state.

PAgP

auto

Places a port into a passive negotiating state, in which the port responds to PAgP packets it receives but does not start PAgP packet negotiation. This setting minimizes the transmission of PAgP packets. This mode is not supported when the EtherChannel members are from different switches in the switch stack (cross-stack EtherChannel).

desirable

Places a port into an active negotiating state, in which the port starts negotiations with other ports by sending PAgP packets. This mode is not supported when the EtherChannel members are from different switches in the switch stack (cross-stack EtherChannel).


Spanning tree security features

Spanning Tree enhancements:


BPDU Guard - enable on edge ports connected to hosts. If a BPDU is received on such an interface, the interface is put into the shutdown (err-disabled) state.
BPDU Filter - enable on edge ports. When enabled, the port neither sends nor receives BPDUs; a received BPDU is dropped. The port goes through the normal STP states.
Root Guard - prevents a connected switch from becoming the root bridge. It is enabled on the designated ports of the root switch, so that if one of those ports receives a superior BPDU it is put into the root-inconsistent state.
Loop Guard - helps prevent loops when you use fibre links, since STP is not able to detect layer 1 issues. Enable it on alternate/backup ports: when Loop Guard detects that BPDUs are no longer being received on a non-designated port, the port is moved into a loop-inconsistent state instead of transitioning to the listening/learning/forwarding states. Ideally it can be enabled on all ports, but it should at least be enabled on non-designated ports.
Actually, Loop Guard is a method of protecting against unidirectional links. In order for spanning tree to function correctly, any link participating in STP has to be bidirectional. If a link becomes unidirectional, through a cable failure or interface fault, spanning tree could unblock a link, which would cause a loop.
UDLD (UniDirectional Link Detection) is a Cisco proprietary protocol that will detect this condition. Loopguard is what you would use if you didn't have Cisco switches at each end of the link in question.
Based on the various design considerations, you can choose either UDLD or the loop guard feature. In regards to STP, the most noticeable difference between the two features is the absence of protection in UDLD against STP failures caused by problems in software. As a result, the designated switch does not send BPDUs. However, this type of failure is (by an order of magnitude) more rare than failures caused by unidirectional links. In return, UDLD might be more flexible in the case of unidirectional links on EtherChannel. In this case, UDLD disables only failed links, and the channel should remain functional with the links that remain. In such a failure, the loop guard puts it into loop-inconsistent state in order to block the whole channel. 
Additionally, loop guard does not work on shared links or in situations where the link has been unidirectional since the link-up. In the last case, the port never receives BPDU and becomes designated. Because this behaviour could be normal, this particular case is not covered by loop guard. UDLD provides protection against such a scenario.
Loop Guard cannot detect miswiring problems, but UDLD can, because UDLD uses its own keepalive messages to detect Layer 1 problems.
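
The reactions of the three guard features can be summarised in a small model. This is only an illustrative sketch of the behaviour described in the notes above, not switch code; the function and state names are made up for the example.

```python
from enum import Enum


class PortState(Enum):
    FORWARDING = "forwarding"
    BLOCKING = "blocking"
    ERR_DISABLED = "err-disabled"
    ROOT_INCONSISTENT = "root-inconsistent"
    LOOP_INCONSISTENT = "loop-inconsistent"


def bpdu_guard(bpdu_received: bool) -> PortState:
    """Edge port with BPDU Guard: any BPDU disables the port."""
    return PortState.ERR_DISABLED if bpdu_received else PortState.FORWARDING


def root_guard(superior_bpdu_received: bool) -> PortState:
    """Designated port with Root Guard: a superior BPDU blocks the port."""
    if superior_bpdu_received:
        return PortState.ROOT_INCONSISTENT
    return PortState.FORWARDING


def loop_guard(bpdus_still_arriving: bool) -> PortState:
    """Non-designated (alternate) port with Loop Guard.

    While BPDUs keep arriving the port stays blocking; when they stop,
    the port goes loop-inconsistent instead of moving to forwarding.
    """
    if bpdus_still_arriving:
        return PortState.BLOCKING
    return PortState.LOOP_INCONSISTENT


if __name__ == "__main__":
    print(bpdu_guard(True).value)
    print(root_guard(True).value)
    print(loop_guard(False).value)
```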


DHCP snooping: Allows ports to be configured as trusted or untrusted. Trusted ports may source all DHCP messages; untrusted ports may source only DHCP requests. If a rogue DHCP server on an untrusted port tries to reply to a DHCP request, DHCP snooping drops the reply (the port can additionally be err-disabled if a configured DHCP rate limit is exceeded).
DHCP option 82: The port number on which the request was received is also added to the DHCP request.
Port security: Works only if the port is statically configured as an access or trunk port; it does not work on a port in dynamic mode. A MAC address can be bound to the port with the switchport port-security command. With the sticky option, whatever MAC address is learned on the interface is added to the secure address table and to the running config, without manual entry.
The second option is to create static entries in the CAM table manually.
Storm control: Limits the amount of unicast, multicast or broadcast traffic received on an interface. Similar to a policer in MQC.
Port-based ACL: Applies an access list to a Layer 2 port, but can only filter inbound traffic.
A MAC-based ACL can also be used, but it only filters non-IP traffic.
IP Source Guard is applied on Layer 2 ports; Dynamic ARP Inspection protects against ARP spoofing.
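
As a rough illustration of the DHCP snooping rule above (trusted ports may source any DHCP message, untrusted ports only client messages), the decision could be sketched as follows. The function name and the list of server message types are assumptions for the example, not switch internals.

```python
# DHCP message types that only a server should send.
SERVER_MESSAGES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}


def snooping_permits(port_trusted: bool, message_type: str) -> bool:
    """Return True if DHCP snooping would let the message through.

    Trusted ports may source every DHCP message; untrusted ports may
    source only client messages such as DHCPDISCOVER or DHCPREQUEST.
    """
    if port_trusted:
        return True
    return message_type.upper() not in SERVER_MESSAGES


if __name__ == "__main__":
    # A rogue server answering on an untrusted access port is dropped.
    print(snooping_permits(False, "DHCPOFFER"))
    # A client request on the same port is allowed.
    print(snooping_permits(False, "DHCPREQUEST"))
```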


VLAN

VLAN: Creates a broadcast domain. A private VLAN (PVLAN) allows the domain to be split into multiple isolated subdomains.


Private VLAN port types: promiscuous, community, isolated

Promiscuous: Carries traffic for all the PVLANs.


Community: Can only talk to ports in the same community VLAN and to its promiscuous port.

Isolated: Can only talk to the promiscuous port.

Primary VLAN: The primary VLAN carries traffic from the promiscuous ports to the host ports, both isolated and community, and to other promiscuous ports.
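
The reachability rules for the three port types can be checked with a short sketch. The Port class and can_talk function are hypothetical names used only to illustrate the rules listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    name: str
    kind: str  # "promiscuous", "community" or "isolated"
    community: Optional[int] = None  # secondary VLAN ID for community ports


def can_talk(a: Port, b: Port) -> bool:
    """Apply the private VLAN rules to a pair of ports."""
    # A promiscuous port can talk to every port in the primary VLAN.
    if a.kind == "promiscuous" or b.kind == "promiscuous":
        return True
    # Community ports talk only within the same community VLAN.
    if a.kind == "community" and b.kind == "community":
        return a.community == b.community
    # Isolated ports talk to nothing except promiscuous ports.
    return False


if __name__ == "__main__":
    gateway = Port("fa0/24", "promiscuous")
    host_a = Port("fa0/1", "community", 1012)
    host_b = Port("fa0/2", "isolated")
    print(can_talk(host_a, gateway), can_talk(host_a, host_b))
```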


On low-end switches, the switchport protected command acts similarly to an isolated VLAN: ports configured as protected do not talk to each other. Usually, ports configured as protected are also configured not to receive flooded unknown unicast (frames whose destination MAC address is not in the switch's MAC table) and multicast frames, for added security.

Configure:

vlan 1000
 private-vlan primary

vlan 1012
 private-vlan community

vlan 1013
 private-vlan isolated


vlan 1000
 private-vlan association 1012,1013


Configure ports:

1. int fa0/1
    ! community host port; each host port is a member of two VLANs (primary and secondary)
    switchport private-vlan host-association 1000 1012
    switchport mode private-vlan host


2. int fa0/2
    ! isolated host port
    switchport private-vlan host-association 1000 1013
    switchport mode private-vlan host

3. int vlan 1000
    ! maps the secondary VLANs to the primary VLAN interface, which acts as the promiscuous port
    private-vlan mapping 1012,1013


This example shows how to associate community VLANs 100 through 103 and isolated VLAN 109 with primary VLAN 5: 

switch# configure terminal 
switch(config)# vlan 5 
switch(config-vlan)# private-vlan association 100-103, 109 


This example shows how to configure the Ethernet port 1/12 as a host port for a private VLAN and associate it to primary VLAN 5 and secondary VLAN 101: 

switch# configure terminal 
switch(config)# interface ethernet 1/12 
switch(config-if)# switchport mode private-vlan host 
switch(config-if)# switchport private-vlan host-association 5 101


F5 Training

LTM: How BIG-IP processes traffic


Node: Represents the IP address of a server.
Pool member: A combination of IP address and port number. In other words, a pool member is an application server to which the BIG-IP directs traffic.
Pool: A group of pool members.

Virtual server: A combination of virtual IP address and port, also known as a listener. A virtual server is associated with a pool of members.

Load balancing methods

Static: Round Robin, Ratio
Dynamic: LFOPD (Least Connections, Fastest, Observed, Predictive, Dynamic Ratio)

Least Connections: Load balancing is based on the connection count of each member. If the connection counts are equal, round robin is used.

Fastest: Based on the number of Layer 7 requests pending on each member.

Observed: A ratio load balancing method in which the ratio is assigned by the BIG-IP. The BIG-IP checks the connection counts dynamically and assigns the ratios accordingly, favouring the members with the fewest connections.

Predictive: Similar to Observed, but assigns the ratios more aggressively, based on average connection counts.
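
To make the difference between the static methods and Least Connections concrete, here is a minimal sketch. It only models the selection logic described above and is not how the BIG-IP implements it; all names are invented for the example.

```python
from itertools import cycle, islice


def round_robin(members):
    """Static: hand out members in a fixed repeating order."""
    return cycle(members)


def ratio(weights):
    """Static: repeat each member according to its configured ratio."""
    expanded = [m for m, w in weights.items() for _ in range(w)]
    return cycle(expanded)


class LeastConnections:
    """Dynamic: pick the member with the fewest open connections.

    Ties are broken in round-robin order, as described in the notes.
    """

    def __init__(self, members):
        self.connections = {m: 0 for m in members}
        self._order = list(members)

    def pick(self):
        lowest = min(self.connections.values())
        for i, member in enumerate(self._order):
            if self.connections[member] == lowest:
                # Rotate so the next tie starts after this member.
                self._order = self._order[i + 1:] + self._order[:i + 1]
                self.connections[member] += 1
                return member

    def release(self, member):
        """Call when a connection to the member closes."""
        self.connections[member] -= 1


if __name__ == "__main__":
    print(list(islice(round_robin(["web1", "web2"]), 4)))
    print(list(islice(ratio({"web1": 2, "web2": 1}), 6)))
    lb = LeastConnections(["web1", "web2", "web3"])
    print([lb.pick() for _ in range(4)])
```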


Load balancing can be done by pool member or by node.


Priority activation: Helps to configure backup sets for existing pool members. The BIG-IP uses the higher-priority pool members first.
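
A rough sketch of priority activation follows. The min_active threshold (how many available members are wanted before a lower-priority group is brought in) is an assumption added for the example, as are the function and member names.

```python
def active_members(members, min_active=1):
    """Select the pool members that should receive traffic.

    members: list of (name, priority, available) tuples, where a higher
    priority number is preferred. Lower-priority groups are added only
    while fewer than min_active available members have been found.
    """
    selected = []
    priorities = sorted({priority for _, priority, _ in members}, reverse=True)
    for priority in priorities:
        selected += [
            name for name, p, available in members
            if p == priority and available
        ]
        if len(selected) >= min_active:
            break
    return selected


if __name__ == "__main__":
    pool = [("web1", 10, True), ("web2", 10, False), ("backup1", 5, True)]
    print(active_members(pool))
    print(active_members(pool, min_active=2))
```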

Fallback host: Used only for HTTP requests. If all the pool members are unavailable, the BIG-IP redirects the client request to the fallback host.

Monitors: Check the status of nodes and pool members. If a pool member responds too slowly or does not respond at all, the BIG-IP does not send requests to it.

Monitor types:

Address check: The BIG-IP sends an ICMP request and waits for a reply. If there is no reply, it considers the node down and does not send traffic to it.

Service check: Checks the TCP port on which the server is listening. If there is no response, the BIG-IP considers the member down.

Content check: Checks whether the server responds with the right content. For HTTP, for example, a GET / request is sent and the response is examined.

Interactive check: For example, a test of an FTP connection. Once the connection is open, the username and password are sent, then a request to get a file is sent; once the file is received, the connection is closed.

F5 recommends a timeout of 3n+1 seconds, where n is the monitor frequency (interval), when setting a monitor for HTTP.
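
As a quick worked example of the 3n+1 rule, the helper below (a hypothetical name) computes the timeout from the interval: three missed checks plus one second.

```python
def monitor_timeout(interval_seconds: int) -> int:
    """Timeout that allows three missed checks plus one second."""
    return 3 * interval_seconds + 1


if __name__ == "__main__":
    for interval in (5, 10, 30):
        print(interval, monitor_timeout(interval))
```

With an interval of 5 seconds this gives a timeout of 16 seconds.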

Customization of monitor 

Assign nodes to monitor 


Profiles: Define traffic behaviour for a virtual server.

Profiles contain settings for how traffic through virtual servers is processed. For certain applications, if the BIG-IP load balances every request independently, the client's session breaks because later requests reach a different server.
To avoid this, a persistence profile is used so that returning requests from the client are sent to the same server.

Persistence profile: Configured for a client or a group of clients so that the BIG-IP knows that a returning client's requests need to be sent to the same server. A persistence profile is configured using the source IP address or an HTTP cookie.

SSL termination


FTP profile 


All virtual servers have a Layer 4 profile: TCP, UDP or FastL4.


Profile types: service profiles, persistence profiles, protocol profiles, SSL profiles, authentication profiles, other profiles.

Persistence types

Source address persistence: Keeps track of the source IP address. The administrator can set a netmask on the persistence record so that all clients within the same mask are assigned to the same pool member.

Limitation: If the client addresses are being NATed, all clients behind the same NAT address are sent to the same pool member.
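
The effect of the netmask on a persistence record can be illustrated with Python's ipaddress module. This is only a sketch: persistence_key is a made-up helper and the dictionary stands in for the BIG-IP persistence table.

```python
import ipaddress


def persistence_key(client_ip: str, netmask: str = "255.255.255.255") -> str:
    """Mask the client address; clients with the same key share a record."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    records = {}  # persistence key -> pool member (illustrative table)
    records[persistence_key("10.1.1.5", "255.255.255.0")] = "web1:80"

    # A second client in the same /24 finds the existing record.
    key = persistence_key("10.1.1.200", "255.255.255.0")
    print(key, records.get(key))
```

The same mechanism explains the NAT limitation: every client hidden behind one translated address produces the same key.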


Cookie persistence: Works only with the HTTP protocol.

Three modes: insert, rewrite and passive.

Insert mode: The BIG-IP creates a special cookie in the HTTP response to the client.
Rewrite mode: The pool member creates a blank cookie and the BIG-IP rewrites it with the special cookie.
Passive mode: The pool member creates the special cookie and the BIG-IP lets it pass through.

SSL Profile

SSL stands for Secure Sockets Layer.

For a website that uses HTTPS we need to use an SSL profile, since the client traffic is being NATed and the web application uses the HTTPS protocol.
Using SSL termination, the BIG-IP can decrypt the traffic and assign it to a pool member.


The BIG-IP contains SSL encryption hardware, so all encryption and key exchange are done in hardware. It also provides centralized certificate management.

iRule

An iRule is a script that directs traffic through the BIG-IP, based on the Tcl command language. iRules give control of inbound and outbound traffic on the BIG-IP.

An iRule contains the following elements: iRule name, events, conditions, actions.

Multicasting

Ranges

Multicast address space: 224.0.0.0/4 (224.0.0.0 to 239.255.255.255)

Link-local addresses: 224.0.0.0/24

Source-specific multicast: 232.0.0.0/8

Administratively scoped: 239.0.0.0/8
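
The ranges above can be checked programmatically. The following sketch classifies an IPv4 address with the standard ipaddress module; the labels and the classify function are just for illustration.

```python
import ipaddress

MULTICAST = ipaddress.ip_network("224.0.0.0/4")

# More specific blocks inside 224.0.0.0/4.
RANGES = [
    ("link-local", ipaddress.ip_network("224.0.0.0/24")),
    ("source-specific multicast", ipaddress.ip_network("232.0.0.0/8")),
    ("administratively scoped", ipaddress.ip_network("239.0.0.0/8")),
]


def classify(address: str) -> str:
    """Return the multicast range an IPv4 address falls into."""
    ip = ipaddress.ip_address(address)
    if ip not in MULTICAST:
        return "not multicast"
    for label, network in RANGES:
        if ip in network:
            return label
    return "other multicast"


if __name__ == "__main__":
    for addr in ("224.0.0.13", "232.1.1.1", "239.1.1.1", "224.0.1.39"):
        print(addr, classify(addr))
```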


The multicast control plane works differently from unicast routing. It needs to know who the sender of the multicast traffic is and to which group it is sending, and also who the receivers are.

Multicast data plane: Performs the RPF check (was the traffic received on the correct interface?) and builds the multicast routing table.

Multicast is source-based routing.

IGMP: Hosts on a LAN signal the router that they want to join a multicast group.

Two kinds of request: (*,G): any source that is generating the multicast stream for that group. Supported by IGMPv1 and IGMPv2.
                      (S,G): join a particular source sending to the multicast group. IGMPv3 supports both (S,G) and (*,G).

IGMP is enabled when PIM (dense mode, sparse mode or sparse-dense mode) is enabled on the interface.

By default, IGMP version 2 is enabled.

The ip igmp join-group command can be used for testing on routers, to see whether multicast traffic for a particular group is received on the router.

The ip igmp static-group command can be used to manually request a particular multicast group instead of relying on IGMP messages for that group.

PIM: Used to signal routers to build the multicast tree. The tree can run from sender to receiver, or from sender to rendezvous point (RP) to receiver.

PIM can be version 1 or 2; the default is version 2. In version 2 the RP information is encoded in the PIM packet, and there is a field for the BSR.

Dense mode: Implicit join. Multicast traffic is sent across the entire network unless a router reports that it does not want a particular stream. Flood-and-prune behaviour.
Neighbor discovery uses multicast address 224.0.0.13; the same applies to sparse mode.

Note: If we have a (*,G) entry, we know about the receiver; if we have an (S,G) entry, we know about the sender as well.

There are two ways to generate multicast traffic: pinging a multicast address, or using IP SLA.
In PIM dense mode, the RPF neighbor information is used to send PIM messages (prune or graft) back towards the source. When the multicast source floods traffic for a particular multicast group, every multicast-enabled router installs (S,G) and (*,G) entries, even if it is not interested.

So in dense mode every router needs to install (*,G) and (S,G) entries, as we cannot have an (S,G) entry until we have a (*,G) entry. If the source is active, every router needs to maintain multicast state for the group.

A graft message for an (S,G) entry un-prunes multicast traffic that was previously pruned.

State refresh keeps a pruned link in its pruned state.

Sparse mode: Uses explicit join; traffic is not sent anywhere unless it is requested. The RP is used as the reference point. With source-specific multicast no RP is needed; for group-specific (*,G) joins an RP is needed. Sparse mode uses both shared trees and shortest-path (source-based) trees.
The RP needs to know the receivers and the senders. The DR on the source's LAN segment sends an (S,G) register message to the RP, and the RP in turn replies with a register-stop. Receivers on a LAN segment send an IGMP join, which is converted into a PIM (*,G) join towards the RP to form the RPT (shared tree). So from the receiver to the RP every device has a (*,G) entry, and from the source to the RP every device has an (S,G) entry. Once the RP knows about both sender and receiver, it sends an (S,G) join back towards the source, and the source's traffic starts flowing to the RP and then on to the receiver. It is then up to the last-hop router on the receiver side to optimise, by deciding whether to join the source directly using the SPT and bypass the RP.

Note: When we debug, only process-switched traffic is shown. To debug data-plane traffic we need to disable CEF/fast switching on the interface (no ip route-cache). If we change the unicast routing, it also changes the multicast routing. To change the multicast RPF path without changing the unicast routing, we can use the ip mroute command.


Source-based tree: The tree is built along the shortest path from receiver to sender.
Shared tree: The tree runs from the sender to the RP, and then from the RP to the receiver.

To check the RP configured on each transit router: sh ip pim rp mapping
The RP can be assigned statically (ip pim rp-address) or dynamically (Auto-RP and BSR).

Auto-RP uses two data-plane multicast addresses. 224.0.1.39 is used by routers willing to become the RP to advertise themselves to the mapping agents.
224.0.1.40 is used by the mapping agent, which chooses the RP, to advertise the RP information to the rest of the routers.

To stay on the shared tree rather than switch to the SPT: ip pim spt-threshold infinity


Sparse-dense mode: Any group for which an RP is assigned uses sparse mode; all other groups use dense mode.

The RPF check provides a loop-free path in the multicast data plane. When a multicast packet is received on an interface, the router looks up the source in the unicast routing table; if the interface towards the source matches the incoming interface, the RPF check passes, otherwise it fails.

Once the multicast routing table is populated, the router always prefers an (S,G) entry over a (*,G) entry. Each entry in the multicast routing table has an incoming interface and an OIL (outgoing interface list). If the RPF check passes, the multicast traffic is sent out of all interfaces in the OIL.
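
The RPF check and the (S,G)-over-(*,G) preference can be sketched as below. This is a simplified model: the routing tables are plain dictionaries, the interface names are invented, and the RPF check is always done against the source (on a real router the check for a (*,G) entry is done towards the RP).

```python
import ipaddress


def rpf_interface(unicast_routes, source):
    """Longest-prefix match for the source in the unicast table."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, interface in unicast_routes.items():
        network = ipaddress.ip_network(prefix)
        if src in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, interface)
    return best[1] if best else None


def rpf_check(unicast_routes, source, incoming_interface):
    """Pass only if the packet arrived on the interface towards the source."""
    return rpf_interface(unicast_routes, source) == incoming_interface


def forward(mroute, unicast_routes, source, group, incoming_interface):
    """Return the interfaces the packet is sent out of (empty if dropped)."""
    if not rpf_check(unicast_routes, source, incoming_interface):
        return []
    # Prefer the (S,G) entry, fall back to (*,G).
    for key in ((source, group), ("*", group)):
        if key in mroute:
            return [i for i in mroute[key] if i != incoming_interface]
    return []


if __name__ == "__main__":
    routes = {"10.1.0.0/16": "Gi0/0", "0.0.0.0/0": "Gi0/1"}
    mroute = {
        ("*", "239.1.1.1"): ["Gi0/2", "Gi0/3"],
        ("10.1.1.1", "239.1.1.1"): ["Gi0/2"],
    }
    print(forward(mroute, routes, "10.1.1.1", "239.1.1.1", "Gi0/0"))
    print(forward(mroute, routes, "10.1.1.1", "239.1.1.1", "Gi0/1"))
```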




On a multicast router, sh ip igmp groups shows which multicast groups are active on the Ethernet segment and which receivers have joined them.

To determine which router is the IGMP querier: sh ip igmp interface E0

We can manually tune the query interval and the query max response time:
Query interval: ip igmp query-interval 120 (default 60 sec)
Response time: ip igmp query-max-response-time 20 (default 10 sec)

IOS command to set which version of IGMP is used: ip igmp version {1 | 2 | 3}


Test commands for IGMP:

ip igmp join-group

ip igmp static-group

For sparse mode we need to assign an RP: ip pim rp-address x.x.x.x

To check whether there are any RP mappings: sh ip pim rp mapping

To check the multicast packet counters: sh ip mroute count

In sparse mode there is an SPT (shortest path tree) switchover.

The SPT threshold can be set in global config mode on the DR (the multicast router that receives the IGMP join requests): ip pim spt-threshold (value), where the value is the volume of the multicast feed in kbps.

If the RPF check is failing, we can still make the interface forward multicast by using a static mroute (ip mroute source mask next-hop-address).