The first 6 Identity Protection (Main Mode) messages negotiate the security parameters that protect the next 3 messages (Quick Mode), and whatever is negotiated in Phase 2 is used to protect production traffic (ESP or AH, normally ESP for site-to-site VPNs).
We call the first 6 messages Phase 1 and the last 3 messages Phase 2.
Both peers add their own SPI to uniquely identify each side's Security Association (SA):
In frame #1, the Initiator (.70) sends a set of Proposals containing sets of security parameters (Transforms) that the Responder (.71) can pick from if one matches its local policies:
Fair enough, in frame #2 the Responder (.71) picks one of the Transforms:
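To make the pick in frames #1 and #2 a bit more concrete, here is a minimal Python sketch of how a Responder could choose a Transform. The `Transform` class, the `pick_transform` function and the "first match in the Initiator's order wins" rule are my own assumptions for illustration, not something read out of the capture. The hash and DH group values in the example are hypothetical too; only AES-CBC 128 and pre-shared keys come from this capture.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transform:
    """One set of Phase 1 security parameters (hypothetical structure)."""
    encryption: str
    key_length: int
    hash_alg: str
    dh_group: int
    auth_method: str


def pick_transform(offered: Sequence[Transform],
                   local_policy: Sequence[Transform]) -> Optional[Transform]:
    """Return the first offered Transform that the local policy also allows."""
    acceptable = set(local_policy)
    for transform in offered:          # walk the Initiator's list in order
        if transform in acceptable:
            return transform
    return None                        # nothing matches: negotiation fails


# Hypothetical values: hash and DH group are NOT taken from the capture.
offered = [
    Transform("3DES-CBC", 0, "MD5", 2, "PSK"),
    Transform("AES-CBC", 128, "SHA", 2, "PSK"),
]
responder_policy = [Transform("AES-CBC", 128, "SHA", 2, "PSK")]

chosen = pick_transform(offered, responder_policy)
print(chosen)
```

If nothing matches, a real Responder would reject the negotiation instead of silently returning nothing.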
2.2 DH Key Exchange
Then, in the next 2 Identity Protection packets, both peers exchange Diffie-Hellman public key values and nonces (random numbers), which then allow both peers to agree on a shared secret key:
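If the DH part feels abstract, the toy Python sketch below shows the idea: each side keeps a private value, sends only the public value, and both end up with the same shared secret. The prime, the generator and the 16-byte nonce length are assumptions picked to keep the example short; real IKE uses the much larger standardised groups, so don't use these numbers for anything serious.

```python
import secrets

# Toy parameters for illustration only (NOT a real IKE DH group).
P = 2**127 - 1   # a known Mersenne prime, far too small for real use
G = 3


def dh_keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P-2]
    public = pow(G, private, P)
    return private, public


def dh_shared(private: int, peer_public: int) -> int:
    """Compute the shared secret from our private and the peer's public value."""
    return pow(peer_public, private, P)


# What each peer generates locally before sending frames #3 and #4.
x_i, g_xi = dh_keypair()          # Initiator
x_r, g_xr = dh_keypair()          # Responder
nonce_i = secrets.token_bytes(16) # assumed nonce length
nonce_r = secrets.token_bytes(16)

# Only g_xi, g_xr and the nonces cross the wire; the result is the same.
secret_i = dh_shared(x_i, g_xr)
secret_r = dh_shared(x_r, g_xi)
print(secret_i == secret_r)
```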
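Once the DH public key values and the nonces have been exchanged, both peers generate a seed key called SKEYID (with pre-shared key authentication, SKEYID is computed from the pre-shared key and the two nonces, and the DH shared secret is mixed into the keys derived from it).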
A further 3 session keys will be generated using this seed key for different purposes:
SKEYID_d (d for derivative): not used by Phase 1. It is used as the seed key for Phase 2 keys, i.e. the seed key for production traffic keys in plain English.
SKEYID_a (a for authentication): this key is used to protect message integrity in every subsequent packet as soon as both peers are authenticated (peers will authenticate each other in the next 2 packets). Yes, I know, we verify integrity by using a hash, but throwing a key into the hash makes it stronger, and that's called an HMAC.
SKEYID_e (e for encryption): you'll see that the next 2 packets are also encrypted. As the encryption algorithm selected for this phase was AES-CBC (128 bits), we use AES with this key to symmetrically encrypt further data.
The nonce is just there to protect against replay attacks by adding some randomness to key generation.
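The key derivation above can be sketched in a few lines of Python. This follows the formulas in RFC 2409 for pre-shared key authentication, where the cookies (CKY-I and CKY-R) are the two SPIs from the ISAKMP header. I'm assuming HMAC-SHA1 as the PRF here, and the function name and all input values are made up for illustration.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed pseudo-random function; HMAC-SHA1 is an assumption here."""
    return hmac.new(key, data, hashlib.sha1).digest()


def derive_phase1_keys(psk: bytes, ni: bytes, nr: bytes,
                       g_xy: bytes, cky_i: bytes, cky_r: bytes) -> dict:
    """Derive SKEYID and its three children (pre-shared key variant)."""
    skeyid = prf(psk, ni + nr)
    skeyid_d = prf(skeyid, g_xy + cky_i + cky_r + b"\x00")
    skeyid_a = prf(skeyid, skeyid_d + g_xy + cky_i + cky_r + b"\x01")
    skeyid_e = prf(skeyid, skeyid_a + g_xy + cky_i + cky_r + b"\x02")
    return {
        "SKEYID": skeyid,
        "SKEYID_d": skeyid_d,   # seed for Phase 2 keys
        "SKEYID_a": skeyid_a,   # integrity (HMAC) key
        "SKEYID_e": skeyid_e,   # encryption key material
    }


# Made-up inputs, just to show the call.
keys = derive_phase1_keys(
    psk=b"example-psk",
    ni=b"\x11" * 16,
    nr=b"\x22" * 16,
    g_xy=b"\x33" * 128,      # DH shared secret as bytes
    cky_i=b"\xaa" * 8,
    cky_r=b"\xbb" * 8,
)
for name, value in keys.items():
    print(name, value.hex())
```

Note how each key feeds into the next one, and how changing either nonce changes every key, which is the randomness mentioned above.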
The purpose of this exchange is to confirm each other's identity. If we agreed to do this using pre-shared keys, then verification consists of checking whether both sides have the same pre-shared key. If RSA certificates are used, then the peers exchange certificates and, assuming the CA that signed each side's certificate is trusted, verification completes successfully.
In our case, this is done via pre-shared keys:
In packet #5 the Initiator sends a hash generated using the pre-shared key as key material, so that only those who possess the pre-shared key can produce it:
The Responder performs the same calculation and confirms the hash is correct.
The Responder also sends a similar packet back to the Initiator in frame #6, but I skipped it for brevity.
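Here is a rough Python sketch of that hash check, using the HASH_I and HASH_R formulas from RFC 2409. With pre-shared keys, SKEYID already depends on the pre-shared key, so a peer with the wrong key cannot produce a matching hash. HMAC-SHA1 as the PRF, the function names and all byte values below are my assumptions for illustration.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed pseudo-random function; HMAC-SHA1 is an assumption here."""
    return hmac.new(key, data, hashlib.sha1).digest()


def hash_i(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, idii_b) -> bytes:
    """Hash sent by the Initiator in packet #5."""
    return prf(skeyid, g_xi + g_xr + cky_i + cky_r + sai_b + idii_b)


def hash_r(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, idir_b) -> bytes:
    """Hash sent by the Responder in packet #6 (note the swapped order)."""
    return prf(skeyid, g_xr + g_xi + cky_r + cky_i + sai_b + idir_b)


def verify(received: bytes, expected: bytes) -> bool:
    """Constant-time comparison of the received and locally computed hash."""
    return hmac.compare_digest(received, expected)


# Made-up values shared by both peers after the first 4 packets.
common = dict(g_xi=b"\x01" * 128, g_xr=b"\x02" * 128,
              cky_i=b"\xaa" * 8, cky_r=b"\xbb" * 8, sai_b=b"SA-body")
nonces = b"\x11" * 16 + b"\x22" * 16

skeyid_initiator = prf(b"same-psk", nonces)
skeyid_responder = prf(b"same-psk", nonces)
skeyid_wrong_psk = prf(b"other-psk", nonces)

sent = hash_i(skeyid_initiator, idii_b=b"initiator-id", **common)
ok = verify(sent, hash_i(skeyid_responder, idii_b=b"initiator-id", **common))
bad = verify(sent, hash_i(skeyid_wrong_psk, idii_b=b"initiator-id", **common))
print(ok, bad)
```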
Now we're ready for Phase 2.
3. Phase 2
The purpose of this phase is to establish the security parameters that will be used for production traffic (IPSec SA):
Now, the Initiator sends its proposals to negotiate the security parameters for production traffic, as mentioned (the proposal highlighted in yellow is just a sample, as the rest are collapsed; this is frame #7):
Note: the Identification payloads carry the source and destination addresses of the traffic to be tunnelled (the proxy identities), and if these don't match what is configured on both peers then the IPSec negotiation will not proceed.
Then, in frame #8 we see that the Responder picked one of the Proposals:
Frame #9 is just an ACK to the picked proposal, confirming that the Initiator accepted it:
I just highlighted the Hash here to reinforce the fact that since both peers were authenticated in Phase 1, all subsequent messages are authenticated and a new hash (HMAC) is generated for each packet.
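To tie the per-packet hashes back to the Phase 1 keys, the Python sketch below follows the Quick Mode formulas in RFC 2409 (without PFS): the three hashes are keyed with SKEYID_a, and the production traffic keys (KEYMAT) are stretched out of SKEYID_d. HMAC-SHA1 as the PRF, the function names, the 36-byte key length and all byte values are assumptions for illustration. `payloads_after_hash` stands for whatever follows the Hash payload in the message.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed pseudo-random function; HMAC-SHA1 is an assumption here."""
    return hmac.new(key, data, hashlib.sha1).digest()


def quick_mode_hash1(skeyid_a, message_id, payloads_after_hash) -> bytes:
    """Hash in frame #7 (Initiator's proposals)."""
    return prf(skeyid_a, message_id + payloads_after_hash)


def quick_mode_hash2(skeyid_a, message_id, ni_b, payloads_after_hash) -> bytes:
    """Hash in frame #8; includes the Initiator's nonce to prove liveness."""
    return prf(skeyid_a, message_id + ni_b + payloads_after_hash)


def quick_mode_hash3(skeyid_a, message_id, ni_b, nr_b) -> bytes:
    """Hash in frame #9 (the final ACK)."""
    return prf(skeyid_a, b"\x00" + message_id + ni_b + nr_b)


def keymat(skeyid_d, protocol, spi, ni_b, nr_b, length) -> bytes:
    """Stretch SKEYID_d into `length` bytes of IPSec SA key material."""
    seed = bytes([protocol]) + spi + ni_b + nr_b
    block = prf(skeyid_d, seed)                 # K1
    out = block
    while len(out) < length:
        block = prf(skeyid_d, block + seed)     # K2, K3, ...
        out += block
    return out[:length]


# Made-up values; protocol 3 is the ISAKMP DOI identifier for ESP.
skeyid_a, skeyid_d = b"\x0a" * 20, b"\x0d" * 20
message_id = b"\x00\x00\x12\x34"
ni_b, nr_b = b"\x11" * 16, b"\x22" * 16

h1 = quick_mode_hash1(skeyid_a, message_id, b"SA|Ni|IDci|IDcr")
h2 = quick_mode_hash2(skeyid_a, message_id, ni_b, b"SA|Nr|IDci|IDcr")
h3 = quick_mode_hash3(skeyid_a, message_id, ni_b, nr_b)
esp_keys = keymat(skeyid_d, 3, b"\xde\xad\xbe\xef", ni_b, nr_b, 36)
print(h1.hex(), h2.hex(), h3.hex(), esp_keys.hex(), sep="\n")
```

Each direction has its own SPI, so running `keymat` once per SPI gives different keys for inbound and outbound traffic.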