What Is the SIP Protocol?

Published on April 15, 2020

The session initiation protocol (SIP), a popular internet telephony protocol, forms the foundation of many types of internet communication sessions, including voice calls, video calls, and messaging. It establishes sessions, manages signaling, and terminates the connection when the sessions end.

If you want to know what SIP is, you may also want to know:

  • How do voice and video calls travel across the internet?
  • How do messages travel over internet protocol (IP) networks?
  • How are mobile calls made over Long Term Evolution (LTE) or Voice over Long Term Evolution (VoLTE) networks?

To fully grasp how internet-based phone systems and network services such as SIP trunking work, you’ll need to understand SIP. We’ve created a two-part guide to answer all your SIP-related questions. In this part, we’ll focus on the protocol itself. In the second part, we’ll talk about SIP trunking—the primary use of SIP that most IT managers should know.

The “P” in SIP represents “protocol.” Let’s first understand what a protocol is.

What is an internet protocol?

A protocol is a set of rules that defines how two or more computing devices (laptops, smartphones, routers, network switches, etc.) communicate with each other.

To keep things simple, we’re going to focus on protocols that are involved in making and receiving voice and video calls over the internet. This is known as Voice over Internet Protocol (VoIP) technology. Systems that enable the transmission of voice and video calls over IP networks are known as VoIP or business phone systems.

It is important to remember that VoIP isn’t a protocol itself. Instead, it’s an umbrella term for all the technologies involved in transporting voice and video information over IP networks.

Communication between networked devices on the internet doesn’t just involve a single protocol. Multiple protocols work simultaneously by building on top of each other in layers, collectively known as a “protocol stack.” Different models explain how protocols layer on top of each other, but the Open Systems Interconnection (OSI) model, developed by the International Organization for Standardization (ISO), is the most commonly used.

Figure: The OSI model, showing where the protocols involved in VoIP technology sit

What is SIP?

SIP is an application layer protocol and the foundation of modern interactive communications over the internet (voice calls, video calls, etc.).

What is SIP used for?

SIP is a media-independent protocol—it’s not voice, it’s not video, it’s not data—it could be anything. While it’s mostly applied to VoIP, it’s not a VoIP protocol.

Gary Audin, tech writer, expert in VoIP and IP telephony

SIP simply initiates and terminates an IP communication session, which could be a voice call between two people or a video conference among a team. It sets up the session by exchanging messages, in the form of data packets, between two or more IP endpoints, each identified by a SIP address. Every SIP address is linked to a hardware SIP client (e.g., an IP desk phone) or a software client (e.g., a softphone).

The image below depicts the initiation details of a SIP session. INVITE is a SIP message used to request participation from another SIP client. The chunks of text resembling email addresses are the participants’ SIP addresses.

Figure: A SIP INVITE message
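To make the structure of an INVITE concrete, here is a minimal sketch in Python that builds one as plain text and reads it back. The addresses, Call-ID, tag, and branch values are hypothetical placeholders, and `build_invite` and `parse_start_line_and_headers` are illustrative helpers of our own, not part of any SIP library. A real client would also attach a message body and handle many more headers.

```python
def build_invite(caller: str, callee: str, call_id: str, via_host: str) -> str:
    """Return a minimal SIP INVITE request as text (illustrative only)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                         # request line
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK-example",  # placeholder branch
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1928301774",                 # placeholder tag
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",                                    # no body in this sketch
    ]
    # SIP is text-based: each line ends with CRLF and a blank line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_start_line_and_headers(message: str):
    """Split a SIP request into (method, request_uri, headers dict)."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    method, request_uri, _version = start_line.split(" ")
    headers = {}
    for line in header_lines:
        name, value = line.split(":", 1)  # split on the first colon only
        headers[name.strip()] = value.strip()
    return method, request_uri, headers


# Hypothetical participants, used purely for illustration.
invite = build_invite("alice@atlanta.example", "bob@biloxi.example",
                      "a84b4c76e66710", "pc33.atlanta.example")
method, uri, headers = parse_start_line_and_headers(invite)
print(method, uri)
print(headers["From"], "->", headers["To"])
```

The request line names the method (INVITE) and the SIP address being called, while the From and To headers carry the two participants’ SIP addresses.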

SIP tells you the presence of the other party, makes a connection and lets you do whatever you want over the connection, but it has no idea of what’s going over the connection.

Gary Audin, tech writer, expert in VoIP and IP telephony

SIP doesn’t encode, decode, or transport any of the media exchanged during these sessions. That’s why it can be used for video conferencing and instant messaging as well as making phone calls over the internet. We’ll leave the other uses of SIP aside for now and focus on how the protocol works during a voice call.

How does SIP work in a VoIP call?

SIP doesn’t work alone during VoIP calls. Several other protocols work along with it to ensure voice data reaches its destination. The session description protocol (SDP) is one such protocol.

While SIP communicates with IP endpoints to exchange signaling details, SDP describes the session so that participants have the details they need to join it. It carries three types of information: session description, time description, and media description. SDP doesn’t transport these details itself. Instead, session descriptions are included as a payload of SIP messages.
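The three kinds of SDP information can be seen in a small example. The sketch below splits a sample SDP body into its session, time, and media lines. The sample values (user name, host, IP address, port) are made up for illustration, and `split_sdp` is a hypothetical helper that assumes a single simple description rather than covering every case the SDP specification allows.

```python
# A sample SDP body of the kind carried inside a SIP INVITE (values are illustrative).
SDP_BODY = """\
v=0
o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.example
s=Example call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 18
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
"""


def split_sdp(body: str) -> dict:
    """Group SDP lines into session, time, and media descriptions."""
    sections = {"session": [], "time": [], "media": []}
    in_media = False
    for line in body.splitlines():
        if not line:
            continue
        if line.startswith("m="):
            in_media = True  # everything from the first m= line describes media
        if in_media:
            sections["media"].append(line)
        elif line.startswith(("t=", "r=")):
            sections["time"].append(line)  # timing and repeat lines
        else:
            sections["session"].append(line)
    return sections


sections = split_sdp(SDP_BODY)
for name, lines in sections.items():
    print(name, lines)
```

In the media line, `0` and `18` are the RTP payload type numbers for G.711 (mu-law) and G.729, which is how the two endpoints advertise the codecs they can use.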

Before being transported over the network, voice information is encoded using codecs that translate audio signals into binary data. Many codecs are used for this purpose, but the two most common are:

  • G.711 codec: Used for uncompressed digital voice at 64 kbit/s. Audio quality is generally better than with compressed codecs such as G.729, but it uses more bandwidth.
  • G.729 codec: Used for compressed voice at 8 kbit/s. It lowers the audio quality to reduce the amount of transmitted data and the resulting bandwidth consumption.
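The bandwidth trade-off between the two codecs can be worked out with simple arithmetic. The sketch below assumes the standard codec bitrates (64 kbit/s for G.711, 8 kbit/s for G.729), a 20 ms packet interval (a common default, but an assumption here), and 40 bytes of IPv4, UDP, and RTP headers per packet. It ignores link-layer overhead, so real figures on the wire will be somewhat higher.

```python
CODEC_BITRATE_BPS = {"G.711": 64_000, "G.729": 8_000}
PACKET_INTERVAL_MS = 20      # assumed packetization interval
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers, no options


def per_call_bandwidth(codec: str, interval_ms: int = PACKET_INTERVAL_MS):
    """Return (payload bytes per packet, one-way bits per second incl. headers)."""
    bitrate = CODEC_BITRATE_BPS[codec]
    payload_bytes = bitrate * interval_ms // 1000 // 8
    packets_per_second = 1000 // interval_ms
    total_bps = (payload_bytes + HEADER_BYTES) * 8 * packets_per_second
    return payload_bytes, total_bps


for codec in CODEC_BITRATE_BPS:
    payload, bps = per_call_bandwidth(codec)
    print(f"{codec}: {payload} payload bytes/packet, {bps / 1000:.0f} kbit/s one way")
```

Under these assumptions, a G.711 stream needs about 80 kbit/s in each direction and a G.729 stream about 24 kbit/s. Note that with G.729 the headers outweigh the voice payload itself.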

Encoded packets of audio data are carried by the real-time transport protocol (RTP), a specialized application layer protocol used for real-time streaming of audio and video data. RTP sessions are independent of SIP and run parallel to SIP sessions, unlike SDP, which is a payload of SIP.
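To show what RTP adds to each chunk of encoded audio, the following sketch packs and unpacks the fixed 12-byte RTP header using Python’s standard `struct` module. `pack_rtp` and `unpack_rtp` are illustrative helpers of our own. They assume a header with no padding, no extension, and no contributing-source list, and the SSRC value is an arbitrary placeholder.

```python
import struct

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type: int, seq: int, timestamp: int, ssrc: int,
             payload: bytes, marker: bool = False) -> bytes:
    """Prepend a minimal RTP header to an encoded media payload."""
    first = 2 << 6  # version 2; padding, extension, and CSRC count all zero
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet: bytes) -> dict:
    """Read back the fields of a minimal RTP packet (no CSRCs or extensions)."""
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


# 160 bytes stands in for 20 ms of G.711 audio; payload type 0 is G.711 mu-law.
packet = pack_rtp(payload_type=0, seq=1, timestamp=160,
                  ssrc=0x1234ABCD, payload=b"\x00" * 160)
fields = unpack_rtp(packet)
print(len(packet), fields["version"], fields["sequence"], fields["payload_type"])
```

The sequence number and timestamp are what let the receiving end detect missing packets and play audio back at the right pace.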

RTP works alongside the RTP control protocol (RTCP), which exchanges information related to service quality, including the number of data packets exchanged, number of packets lost, and round-trip lag time. Using RTCP details, the service quality of sessions can be monitored. RTCP information isn’t mixed with the RTP data stream. It is delivered in separate packets, traditionally on a neighboring port, that flow in parallel with the RTP streams.
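The kind of bookkeeping behind an RTCP report can be sketched with a few lines of Python. `loss_report` below is a hypothetical helper that derives packet-loss figures from the RTP sequence numbers a receiver has seen. It is a simplification: it ignores sequence-number wraparound, and real RTCP reports encode the lost fraction in a compact fixed-point form rather than as a floating-point number.

```python
def loss_report(received_seqs):
    """Summarize packet loss from the RTP sequence numbers that arrived."""
    unique = set(received_seqs)                # ignore duplicate packets
    expected = max(unique) - min(unique) + 1   # packets the sender must have sent
    lost = expected - len(unique)
    return {
        "expected": expected,
        "received": len(unique),
        "lost": lost,
        "fraction_lost": lost / expected,
    }


# Sequence numbers 102, 105, and 106 never arrived in this made-up example.
report = loss_report([100, 101, 103, 104, 107])
print(report)
```

A monitoring tool could raise an alert when the lost fraction stays above whatever threshold the administrator considers acceptable.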

The image below depicts the exchange of RTP and RTCP data packets in a VoIP session with three participants.

Figure: RTP and RTCP data flow in a VoIP session

RTP, RTCP, and SIP (with the SDP payload) data packets are transported to their destinations using transport layer protocols. The two most commonly used protocols are explained below.

  • Transmission control protocol (TCP): Transports data in an ordered sequence. The receiving end acknowledges the data it receives. If an acknowledgment isn’t received within a certain time, the sender assumes the data was lost and re-sends it. TCP is designed for accuracy and ensures data is delivered complete and in its original sequence.
  • User datagram protocol (UDP): Transports data without detecting out-of-sequence packets or retransmitting lost packets. Packets can arrive in the wrong order or never arrive at all. The main aim of UDP is to get the packets delivered to their destination as soon as possible.

Given its focus on real-time data transmission, UDP is more suitable than TCP for carrying the audio of VoIP calls. Although lost and out-of-sequence packets in UDP can cause slight audio quality issues, in many cases these aren’t detected by the human ear. By contrast, the delay caused by the reordering and retransmitting of TCP packets can result in poor audio quality or even dropped calls.
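Because UDP leaves ordering and loss to the application, the receiving end of a call has to cope with both. The sketch below illustrates one simple approach: put whatever arrived back in order using the sequence numbers and mark gaps as missing instead of waiting for a retransmission. `playout_order` is a hypothetical helper, and real phones use more elaborate jitter buffers and loss concealment.

```python
def playout_order(packets, first_seq: int, count: int):
    """Return payloads in sequence order; None marks a packet that never arrived."""
    by_seq = {seq: payload for seq, payload in packets}
    return [by_seq.get(seq) for seq in range(first_seq, first_seq + count)]


# (sequence number, payload) pairs in arrival order: 3 overtook 2, and 4 was lost.
arrived = [(1, "a"), (3, "c"), (2, "b"), (5, "e")]
ordered = playout_order(arrived, first_seq=1, count=5)
print(ordered)
```

The gap left by the lost packet is typically filled with silence or a repeat of the previous audio, which is why brief losses often go unnoticed by the listener.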

Figure: Framework of a VoIP call between two endpoints

At this point, you may be asking why SIP is so important if all it does is set up and tear down calls. Well, the telecommunication industry has standardized on SIP as the preferred protocol for VoIP communication, precisely because SIP isn’t itself involved in encoding and transmitting data. It simply establishes a session over the network.

Also, earlier protocols written to support VoIP became obsolete with time, and every time something required fixing, the protocol had to be rewritten, which was a challenge. SIP helps overcome this challenge. It’s designed as a standard protocol in which a separate standard defines the media you’re moving, so the protocol itself doesn’t have to be rewritten when the media changes.

Conclusion and next steps

This high-level overview of the protocols involved in a VoIP call should be sufficient for most IT managers. Only application developers at telecom companies need to understand the mechanics of each protocol and the relationships between them.

If you’re just deploying and administering a VoIP phone system, the details covered in this article are more than enough. However, for IT managers, it’s important to understand SIP trunking, a network service central to the functioning of most IP phone systems. We’ve explained SIP trunking in the second part of this article, which you can read here.

If you need help in choosing a specific VoIP system or SIP trunk provider, our advisors are here for you. Software Advice advisors provide free, fast, and personalized software recommendations, helping businesses of all sizes find software that meets their specific business needs. Schedule an appointment with an advisor here.
