Creating new business opportunities with SRTP for VoIP

SRTP uses algorithms such as key derivation to encrypt/decrypt VoIP traffic and make it more secure. This is especially important given the sensitive data, such as credit card numbers, that is transmitted using IP telephony. Learn the basics of SRTP and why it might represent an underserved market for VARs.

IP telephony includes more than simple human conversations. Other sensitive data, including credit card information, is frequently exchanged using methods such as DTMF (dual tone multi frequency) and fax. VoIP is not inherently secure, and although VoIP security has received more attention as a result, true VoIP security requires the use of advanced cryptography.

Secure Real-time Transport Protocol (SRTP), a security standard published in 2004, uses cryptographic techniques such as key derivation to encrypt and decrypt VoIP traffic. Still, according to information security experts Peter Thermos and Ari Takanen, SRTP has not been widely deployed because of "negligence or lack of expertise to deploy SRTP in VoIP enterprise environments by corporations." Given this gap between client need and the lack of VARs with SRTP expertise, discover how SRTP can help you open up new areas of business.

The Secure Real-time Transport Protocol (SRTP) is a profile of the Real-time Transport Protocol (RTP, IETF RFC 3550) that provides confidentiality, integrity, and authentication for media streams. It is defined in IETF RFC 3711. Although there are several signaling protocols (for example, SIP, H.323, Skinny) and several key-exchange mechanisms (for example, MIKEY, SDescriptions, ZRTP), SRTP is considered one of the standard mechanisms for protecting real-time media (voice and video) in multimedia applications. In addition to protecting the RTP packets, it provides protection for the RTCP (Real-time Transport Control Protocol) messages. RTCP is used primarily to provide QoS feedback (for example, round-trip delay, jitter, bytes and packets sent) to the participating end points of a session. The RTCP messages are transmitted separately from the RTP messages, and separate ports are used for each of the protocols. Therefore, both RTP and RTCP need to be protected during a multimedia session. If RTCP is left unprotected, an attacker can manipulate the RTCP messages between participants and cause service disruption or perform traffic analysis.

The designers of SRTP focused on developing a protocol that can provide adequate protection for media streams but also maintain key properties to support wired and wireless networks in which bandwidth or underlying transport limitations may exist. Some of the highlighted properties are as follows:

  • The ability to incorporate new cryptographic transforms.
  • Maintain low bandwidth and computational cost.
  • Conservative in the size of implementation code. This is useful for devices with limited memory (for example, cell phones).
  • Underlying transport independence, including network and physical layers that may be used, and perhaps prone to reordering and packet loss.

These properties make the implementation of SRTP feasible even for mobile devices that have limited memory and processing capabilities. Similar design properties are found in MIKEY (Multimedia Internet KEYing). Therefore, the use of MIKEY for key exchange and SRTP for media protection is one combination of mechanisms to provide adequate security for Internet multimedia applications, including VoIP, video, and conferencing.

The application that implements SRTP has to convert RTP packets to SRTP packets before sending them across the network. The same process is used in reverse to decrypt SRTP packets and convert them to RTP packets. Figure 6.1 depicts this process.

FIGURE 6.1 SRTP encoding/decoding.

After the application captures the input from a device (for example, microphone or camera), it encodes the signal using the negotiated or default encoding standard (for example, G.711, G.729, H.261, H.264) and creates the payload of the RTP packet. Next, the RTP payload is encrypted using the negotiated encryption algorithm. The default encryption algorithm for SRTP is AES (Advanced Encryption Standard) in counter mode using a 128-bit key length. This mode, along with the null mode [5], is mandatory for implementations to be considered compliant with the IETF RFC (see RFC 3711 for additional requirements) and to interoperate with other implementations. SRTP also recommends the use of AES in f8 mode to encrypt UMTS (Universal Mobile Telecommunications System) data. This mode uses the same sizes for the session key and the salt as counter mode. Because the keystream for each packet depends on that packet's own index rather than on previously received packets, the use of AES in SRTP allows processing the packets even if they are received out of order, which is a desirable feature for real-time applications.
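The out-of-order property follows from how counter mode builds its initial counter block. The sketch below follows the IV formula given in RFC 3711 (session salt, SSRC, and packet index combined by XOR); the function name `aes_cm_iv` is an illustrative assumption, and the AES keystream generation itself is omitted because the Python standard library has no AES implementation.

```python
def aes_cm_iv(session_salt: bytes, ssrc: int, packet_index: int) -> bytes:
    """Return the 128-bit initial counter block for one SRTP packet.

    Per RFC 3711: IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16),
    where k_s is the 112-bit session salt and i is the 48-bit packet index.
    """
    if len(session_salt) != 14:
        raise ValueError("session salt must be 112 bits (14 bytes)")
    if not 0 <= ssrc < 2**32 or not 0 <= packet_index < 2**48:
        raise ValueError("SSRC must fit in 32 bits and the index in 48 bits")
    k_s = int.from_bytes(session_salt, "big")
    iv = (k_s << 16) ^ (ssrc << 64) ^ (packet_index << 16)
    # The low 16 bits are left at zero; they count keystream blocks in a packet.
    return iv.to_bytes(16, "big")


# Each packet gets its own counter block, so packet 7 can be decrypted
# before packet 6 arrives.
salt = bytes.fromhex("f0f1f2f3f4f5f6f7f8f9fafbfcfd")  # example value only
iv_6 = aes_cm_iv(salt, ssrc=0x1234ABCD, packet_index=6)
iv_7 = aes_cm_iv(salt, ssrc=0x1234ABCD, packet_index=7)
```

Nothing in the computation refers to an earlier packet, which is why a receiver needs only the session keys, the SSRC, and the packet's own index.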

In addition to providing data encryption, the SRTP standard supports message authentication and integrity of the RTP packet. The default message authentication algorithm is HMAC-SHA1 using a 160-bit key length. The message authentication code (MAC) is produced by computing a keyed hash of the entire RTP message, including the RTP headers and encrypted payload, and placing the resulting value in the Authentication tag header, as shown in Figure 6.2.
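A minimal sketch of the tag computation, using only Python's standard `hmac` and `hashlib` modules. The function names are illustrative assumptions. Appending the rollover counter (ROC) to the authenticated data and truncating the HMAC output to the tag length follow RFC 3711 rather than the text above.

```python
import hashlib
import hmac


def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes,
                  roc: int, tag_bits: int = 80) -> bytes:
    """HMAC-SHA1 over (RTP header + encrypted payload) || ROC, truncated."""
    message = authenticated_portion + roc.to_bytes(4, "big")
    digest = hmac.new(auth_key, message, hashlib.sha1).digest()
    return digest[: tag_bits // 8]  # leftmost tag_bits of the 160-bit output


def verify_auth_tag(auth_key: bytes, authenticated_portion: bytes,
                    roc: int, received_tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = srtp_auth_tag(auth_key, authenticated_portion, roc,
                             tag_bits=len(received_tag) * 8)
    return hmac.compare_digest(expected, received_tag)


# Example values only: a 160-bit key and a stand-in for header + ciphertext.
key = bytes(range(20))
packet = b"\x80\x00\x00\x01" + b"\x00" * 8 + b"encrypted-payload"
tag = srtp_auth_tag(key, packet, roc=0)
ok = verify_auth_tag(key, packet, 0, tag)
tampered = verify_auth_tag(key, packet[:-1] + b"X", 0, tag)
```

A receiver would typically run the verification before decrypting, so that modified or bogus packets are discarded early.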

FIGURE 6.2 Format of the SRTP packet.

You might note that the SRTP message resembles the format of an RTP message with the exception of two additional headers: the MKI and the Authentication tag. The MKI (Master Key Identifier) is used by the key management mechanism (for example, MIKEY), and its presence is optional in implementations according to the SRTP standard (RFC 3711). The MKI can be used for rekeying or to identify the master key from which the session keys were derived to be used by the application to decrypt or verify the authenticity of the associated SRTP payload. The key-exchange mechanism generates and manages the value of this field throughout the lifetime of the session. The use of the Authentication tag header is important and provides protection against message-replay attacks [6]. In VoIP deployments, it is recommended that message authentication be used at a minimum if encryption is not an option. Use of both is the optimal approach.

Note that the message headers are purposefully not encrypted (for example, sequence number, SSRC) to support header compression and interoperate with applications or intermediate network elements that might not be required to support SRTP but need to process the RTP headers (for example, billing). This limitation allows an attacker to perform traffic analysis by collecting information from the RTP headers and extensions, along with information from underlying transports (for example, IP, UDP). One area of interest is the future protocol extensions that will be developed for RTP and the sensitivity of the information that these extensions will carry.
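To illustrate what remains readable on the wire, the sketch below parses the 12-byte fixed RTP header defined in RFC 3550. The function name and the sample packet are illustrative assumptions. An intermediate element (or an eavesdropper) can extract these fields from an SRTP packet without holding any key.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header, which SRTP leaves unencrypted."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 0, then an opaque payload.
sample = struct.pack("!BBHII", 0x80, 0x00, 0x1234, 160, 0xDEADBEEF) + b"\x9f" * 20
header = parse_rtp_header(sample)
```

Even with the payload encrypted, the sequence number, timestamp, and SSRC reveal packet rate, stream identity, and session duration, which is the traffic-analysis exposure described above.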

Figure 6.3 shows an example of an application using SDescriptions (Security Descriptions) to transmit a cryptographic key for use with SRTP. The key is transmitted within the SDP portion of a SIP message. The SDP media attribute crypto defines the type of algorithm, the encryption mode, and the key length (AES_CM_128), along with the message authentication algorithm and the length of its authentication tag (SHA1_32).

FIGURE 6.3 Key negotiation using SDescriptions in SIP.

The "inline" method indicates that the actual keying material is captured in the key-info field of the header. The syntax of the header is defined as follows:

 a=crypto:<tag> <crypto-suite> <key-params> [<session-params>]

<crypto-suite> identifies the encryption and authentication algorithms (in this case, AES in counter mode using a 128-bit key length and SHA-1).

The next attribute is <key-params>, where

 key-params = <key-method> ":" <key-info>

In this case the <key-method> is inline

 <key-info> = UlrbLlfNTNw3blKHQVLGze6oHsyFdjGj3NheKoYx
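The following sketch shows how a receiver might split such an attribute into its parts. The key-info value is the one shown above. The tag value `1`, the full suite name `AES_CM_128_HMAC_SHA1_32`, and the function name are assumptions made for illustration, because the figure itself is not reproduced here. For AES_CM_128 suites, the base64 value decodes to a 16-byte master key followed by a 14-byte master salt, as specified in RFC 4568.

```python
import base64


def parse_crypto_attribute(line: str) -> dict:
    """Split an SDP 'a=crypto' line using the inline key method."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected <tag> <crypto-suite> <key-params>")
    tag, crypto_suite, key_params = fields[:3]
    key_method, _, key_info = key_params.partition(":")
    if key_method != "inline":
        raise ValueError("only the inline key method is handled here")
    # key-info may carry "|lifetime|MKI:length" after the key material.
    key_and_salt = base64.b64decode(key_info.split("|")[0])
    if len(key_and_salt) != 30:
        raise ValueError("expected 16-byte key plus 14-byte salt")
    return {
        "tag": int(tag),
        "crypto_suite": crypto_suite,
        "master_key": key_and_salt[:16],
        "master_salt": key_and_salt[16:],
        "session_params": fields[3:],
    }


# Tag and suite name are assumed; the key-info is taken from the text.
attribute = ("a=crypto:1 AES_CM_128_HMAC_SHA1_32 "
             "inline:UlrbLlfNTNw3blKHQVLGze6oHsyFdjGj3NheKoYx")
parsed = parse_crypto_attribute(attribute)
```

Anyone who can read the SDP body can perform the same decoding, which is why the signaling channel carrying this attribute needs its own protection.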

Another mechanism for exchanging cryptographic keys is through the use of MIKEY, as discussed in further detail in Chapter 7, "Key Management Mechanisms." Figure 6.4 shows a SIP INVITE that announces the use of MIKEY in the SDP portion of the message. The following message is a capture from communications that use the minisip implementation [7].

The attribute header key-mgmt in the SDP indicates that MIKEY should be used to establish the keys that protect media during this session.

If the signaling message (in this case, SIP) is transmitted in the clear, the encryption key can be intercepted and the contents of the media streams can be decrypted by an adversary. Therefore, it is necessary that signaling messages that carry encryption keys are also encrypted using protection mechanisms discussed in Chapter 5. In this case, the SIP signaling that exchanged the keying material was carried over UDP. UDP does not offer any protection, and thus the keying material is exposed to eavesdropping.

After the keys have been negotiated, the application encrypts the RTP payload and sends the SRTP packets to the remote end. Figure 6.5 shows an example of the SRTP packet.

FIGURE 6.4 Use of MIKEY in SIP for key negotiation.

FIGURE 6.5 Contents of an SRTP packet.

All headers in the RTP packet are sent in the clear; only the payload is encrypted. Because SRTP uses AES in counter mode by default, it provides protection against DoS attacks that aim to corrupt the encrypted media content. Typically, block cipher modes that rely on previous blocks to decrypt the next block (for example, cipher block chaining) can be attacked by corrupting the data of one block, thus crippling the ability to successfully reassemble and produce the original content. AES in counter mode does not suffer from this limitation because it can decrypt each block without requiring knowledge of previous blocks.

The use of authentication and integrity in SRTP messages is an important way to protect against attacks, including message replay and disruption of communications. For example, an attacker may modify the SRTP messages to corrupt the audio or video streams and thus cause service disruption. Another attack can be performed by sending bogus SRTP messages to a participant's device, thus forcing the device to attempt to decrypt the bogus messages. This attack degrades the legitimate session by diverting the device's resources to processing the bogus messages. These attacks might be less effective against applications that do not maintain session state than against stateful applications. Therefore, it is recommended that VoIP implementations use SRTP with HMAC-SHA1 and a 160-bit key length (producing an 80-bit authentication tag) for message authentication and integrity to protect against such attacks. In some scenarios (for example, wireless communications) where bandwidth limitations impose restrictions, the use of a short authentication tag (for example, 32-bit length) or even zero length (no authentication) is an option.
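The bandwidth cost of each tag length can be estimated with simple arithmetic. The sketch below assumes 20 ms packetization (50 packets per second), which is a common but not universal choice and is not stated in the text; the function name is illustrative.

```python
def tag_overhead_bps(tag_bits: int, packets_per_second: int = 50) -> int:
    """Extra bits per second, per direction, added by the authentication tag."""
    return tag_bits * packets_per_second


# Overhead for the three options discussed above, under the 50 pps assumption.
overhead = {bits: tag_overhead_bps(bits) for bits in (80, 32, 0)}
for bits, bps in overhead.items():
    print(f"{bits:>2}-bit tag: {bps} bit/s of added overhead")
```

Under this assumption the 80-bit tag costs 4,000 bit/s per direction and the 32-bit tag 1,600 bit/s, which is noticeable next to a low-bit-rate codec on a constrained wireless link.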

Table 6.1 lists the parameters and corresponding values associated with key management in SRTP.

Table 6.1 SRTP Key Management

Parameter                                Mandatory to Support        Default
SRTP/SRTCP cryptographic transforms      AES_CM, NULL                AES_CM, AES_F8 for UMTS
SRTP/SRTCP authentication transforms     HMAC_SHA1                   HMAC_SHA1
SRTP/SRTCP authentication parameters     80-bit authentication tag   80-bit authentication tag
Key derivation Pseudo Random Function    AES_CM                      AES_CM
Session encryption key length            128 bit                     128 bit
Session authentication key length        160 bit                     160 bit
Session salt value length                112 bit                     112 bit
Key derivation rate                      0                           0
SRTP packets max key-lifetime            2^48                        2^48
SRTCP packets max key-lifetime           2^31                        2^31
MKI indicator                            0                           0
MKI length                               0                           0

In addition, the following parameters are included in the crypto context for each session SSRC value: ROC (Roll Over Counter), SEQ (RTP sequence), SRTCP index, transport address, and port number.
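The ROC and SEQ values combine into a single packet index. A minimal sketch of the sender side follows, based on the definition in RFC 3711 (index = 2^16 * ROC + SEQ); the class name is an illustrative assumption, and receiver-side index estimation for reordered packets is omitted.

```python
class SenderPacketIndex:
    """Tracks the 48-bit SRTP packet index on the sending side."""

    def __init__(self, initial_seq: int = 0) -> None:
        self.roc = 0            # 32-bit rollover counter, not sent in the packet
        self.seq = initial_seq  # 16-bit RTP sequence number, sent in the header

    @property
    def index(self) -> int:
        return (self.roc << 16) | self.seq

    def advance(self) -> None:
        """Move to the next packet, bumping ROC when SEQ wraps to zero."""
        self.seq = (self.seq + 1) & 0xFFFF
        if self.seq == 0:
            self.roc = (self.roc + 1) & 0xFFFFFFFF


counter = SenderPacketIndex(initial_seq=65535)
before = counter.index   # 65535: last packet before the wrap
counter.advance()
after = counter.index    # 65536: SEQ is 0 again, ROC is now 1
```

Because the index is 48 bits wide, a single master key can cover at most 2^48 SRTP packets, which matches the maximum SRTP key lifetime defined in RFC 3711.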

Key Derivation

Although implementations may use a variety of key management mechanisms to manage keys, the SRTP standard requires that a native derivation algorithm be used to generate session keys. The use of the derivation algorithm is mandatory for the initial session keys.

FIGURE 6.6 Key derivation algorithm.

The ability to derive keys through SRTP instead of using an external mechanism avoids spending additional computing cycles on key establishment. Typically, each session participant maintains a set of cryptographic information for each SRTP stream, which is referred to as the cryptographic context. For each cryptographic context, there are at least one encryption key, one salt, and one authentication key each for SRTP and SRTCP. Therefore, the SRTP key derivation algorithm needs to request only one master key and one salt value, when required, to derive the necessary session keys. Figure 6.6 shows this process. The derivation algorithm can be used repeatedly to derive session keys. The frequency of session key generation is based on the value of the key_derivation_rate, which is predefined.
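The first half of the derivation can be sketched with plain integer arithmetic. The code below follows the RFC 3711 definition of the value fed to the pseudo-random function: a one-byte label is concatenated with the index divided by the key_derivation_rate, and the result is XORed with the master salt. The function name and the salt value are illustrative assumptions. The final step, expanding this value into key bytes with AES in counter mode under the master key, is omitted because the Python standard library has no AES implementation.

```python
# Labels defined in RFC 3711, one per session key.
LABELS = {
    "srtp_encryption": 0x00,
    "srtp_authentication": 0x01,
    "srtp_salt": 0x02,
    "srtcp_encryption": 0x03,
    "srtcp_authentication": 0x04,
    "srtcp_salt": 0x05,
}


def prf_input(master_salt: bytes, label: int, index: int,
              key_derivation_rate: int = 0) -> bytes:
    """Return x = (label || r) XOR master_salt as a 112-bit value."""
    if len(master_salt) != 14:
        raise ValueError("master salt must be 112 bits (14 bytes)")
    # r stays 0 when the rate is 0, so keys are derived once per master key.
    r = 0 if key_derivation_rate == 0 else index // key_derivation_rate
    key_id = (label << 48) | r
    x = key_id ^ int.from_bytes(master_salt, "big")
    return x.to_bytes(14, "big")


salt = bytes.fromhex("0ec675ad498afeebb6960b3aabe6")  # example value only
inputs = {name: prf_input(salt, label, index=0) for name, label in LABELS.items()}
```

Each label yields a different PRF input, so one master key and one master salt are enough to produce all six session values. With a nonzero key_derivation_rate, `r` changes every `key_derivation_rate` packets and a fresh set of keys results.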


This can be thought of as a key-refreshing mechanism that can be used to protect against cryptanalysis (which might otherwise be possible if a single session key is used for the entire session). For example, an attacker can collect large amounts of session data and attempt to perform cryptanalysis. If the same key is used for all of the data, when that key is discovered all data can be recovered. If multiple keys are used, however, successful cryptanalysis will recover only data associated with the respective key (not the entire session). Therefore, multiple session keys provide a form of forward and backward security: a compromised session key does not compromise the other session keys derived from the same master key.

Although frequent session key generation may be desirable and applicable for unicast sessions (for example, between small groups of two or four participants), it is not applicable for large multicast communications because each participant would have to maintain several hundred keys (which, in turn, depletes resources and impacts processing and performance). One way to manage multiple SRTP and SRTCP keys is to refresh only the SRTP session keys on a specific interval and use only one key for SRTCP (for example, SRTCP key_derivation_rate = 0). Note that rekeying is necessary in cases where participants may join or leave during a group session (for example, conference calls). The determination of when such rekeying needs to occur is typically left up to the implementation, as long as there is a mechanism to alert all the participants to the expiration of the current key and the issuance of a new one. For example, the application might automatically trigger rekeying each time a participant joins or leaves the discussion. Either way, rekeying can be a costly computation depending on the number of participants and the resources available on each participant's device.


5. The NULL mode can be used in cases where confidentiality is not desired.


6. J. Bilien, et al. Secure VoIP: Call Establishment and Media Protection. Royal Institute of Technology (KTH). Stockholm, Sweden, 2004.

7. Israel Abad Caballero. Secure Mobile VoIP. Master's thesis, Department of Microelectronics and Information Technology, Royal Institute of Technology, June 2003.

Reproduced from Chapter six of the book Securing VoIP Networks by Peter Thermos and Ari Takanen. Copyright 2008, Pearson Education, Inc. Reproduced by permission of Pearson Education, Inc., 800 East 96th Street, Indianapolis, IN 46240. Written permission from Pearson Education, Inc. is required for all other uses.
