The newest version of the ESP header consists of the following fields:
and is preceded by an IP header [6].
In transport mode an ESP packet is:
where Payload is a TCP header followed by data.
In our abstract packet format this is:
where data-list is the entire packet. Note that ESP uses both an authentication transform and an encryption/decryption transform, so two keys are associated with an ESP security association. When a packet is sent, we assume that an encryption function is first applied to the contents of the encrypted-data fields to create ciphertext, and that an authentication function, such as a keyed hash, is then computed over the fields in hash-data. The first version of the ESP header did not include an authentication mechanism [2].
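The send-side ordering described above (encrypt first, then authenticate, with one key per transform) can be sketched as follows. This is a minimal illustration under stated assumptions, not the transforms used by any ESP implementation: the cipher is a placeholder keystream built from SHA-256, the keyed hash is HMAC-SHA-256 truncated to 12 bytes, and the layout (a 4-byte SPI, a 4-byte sequence number, an 8-byte nonce) as well as the names `esp_send` and `keystream_xor` are hypothetical.

```python
import hashlib
import hmac
import os

TAG_LEN = 12    # assumed length of the truncated authentication value
NONCE_LEN = 8   # assumed per-packet nonce for the placeholder cipher


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder cipher: XOR data with a SHA-256-derived keystream.

    Stands in for the real encryption transform; applying it twice with the
    same key and nonce returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def esp_send(enc_key: bytes, auth_key: bytes, spi: int, seq: int, payload: bytes) -> bytes:
    """Build a packet by encrypting first, then authenticating."""
    header = spi.to_bytes(4, "big") + seq.to_bytes(4, "big")
    nonce = os.urandom(NONCE_LEN)
    # Step 1: the encryption key turns the payload into ciphertext.
    ciphertext = nonce + keystream_xor(enc_key, nonce, payload)
    # Step 2: the authentication key covers the header fields and the ciphertext.
    tag = hmac.new(auth_key, header + ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    return header + ciphertext + tag
```

Because the keyed hash is computed over the ciphertext rather than the plaintext, a receiver holding the authentication key can check a packet without touching the encryption key.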
For the earliest version of the ESP header, it was claimed that in addition to confidentiality, the ESP header also supplied authentication even though no authentication transform was used (see the discussion on authentication) [2]. This claim is misleading. The currently proposed ESP header provides neither authentication nor confidentiality unless the optional authentication feature of the header is used. If the integrity of the ciphertext and/or of the fields used to determine the correct security association and key is not guaranteed, then the ciphertext may be decrypted incorrectly, and this may go undetected at the IP layer. In the worst case, only a single block of the ciphertext has been modified, so decryption will continue and the resulting bits will be passed up the protocol stack. The use of the self-correcting Cipher Block Chaining mode increases this possibility, because a modified ciphertext block corrupts only the corresponding plaintext block and the one that follows it. In an attempt to make the latest version of the ESP header backward compatible, the use of a cryptographic hash over parts of an ESP packet remains optional. As mentioned in [6], if this option is not used, confidentiality is not guaranteed.
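To illustrate why a modified ciphertext block can go unnoticed when no authentication is used, the sketch below runs Cipher Block Chaining over a toy block cipher and flips one ciphertext bit. The cipher is a hypothetical four-round Feistel construction built from SHA-256, chosen only so that the example is self-contained; it is not an ESP transform, and all function names are assumptions.

```python
import hashlib

BLOCK = 16  # toy block size in bytes


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (illustrative only).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Plaintext length is assumed to be a multiple of BLOCK (no padding here).
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


if __name__ == "__main__":
    key, iv = b"k" * 16, bytes(16)
    plaintext = bytes(range(64))               # four blocks
    tampered = bytearray(cbc_encrypt(key, iv, plaintext))
    tampered[BLOCK] ^= 0x01                    # flip one bit in the second block
    recovered = cbc_decrypt(key, iv, bytes(tampered))
    for n in range(4):
        same = recovered[n * BLOCK:(n + 1) * BLOCK] == plaintext[n * BLOCK:(n + 1) * BLOCK]
        print(f"block {n}: {'intact' if same else 'altered'}")
```

Decryption raises no error: the second plaintext block is scrambled, the third differs in exactly the flipped bit, and the first and fourth are recovered unchanged, so the damaged data would simply be passed up the stack.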
Processing the ESP header by first encrypting and then authenticating allows for fast detection of improperly constructed or replayed packets, since the receiver only needs to compute the authentication function in order to reject them. This ordering reduces the effectiveness of denial-of-service attacks and also allows the hash and decryption transforms to be computed in parallel [6].
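The receive-side benefit of this ordering can be sketched as below: the keyed hash is checked first, and the decryption transform runs only for packets that pass. The function `esp_receive`, the 12-byte truncated HMAC-SHA-256 value, and the `decrypt` callback are assumptions made for illustration; sequence-number (replay window) checking is not modeled.

```python
import hashlib
import hmac
from typing import Callable, Optional

TAG_LEN = 12  # assumed length of the truncated authentication value


def esp_receive(
    auth_key: bytes,
    packet: bytes,
    decrypt: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Verify the authentication value before doing any decryption work."""
    if len(packet) < TAG_LEN:
        return None
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(auth_key, body, hashlib.sha256).digest()[:TAG_LEN]
    # Constant-time comparison; a mismatch drops the packet cheaply.
    if not hmac.compare_digest(tag, expected):
        return None
    # Only authenticated packets reach the (more expensive) decryption step.
    return decrypt(body)
```

A forged or corrupted packet therefore costs the receiver one hash computation and no decryption, which is the property the text credits with limiting denial-of-service attacks.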