
What is video encoding?

Video encoding refers to the process of reducing the size of video files and streams, as well as converting them into different formats for distribution. Video encoding is often used interchangeably with the term transcoding, but the two are not necessarily the same, as we will explain later on.
 

In this article about video encoding we’ll cover the following topics:

  • What’s the purpose of video encoding?
  • Is there a difference between video encoding and transcoding?
  • Does a video encoder for live streaming work differently?
  • How does video encoding work? (Common principles and applications)


What’s the purpose of video encoding?

The purpose of video encoding can be defined as follows:
 

“... the process of preparing the video for output, where the digital video is encoded to meet proper formats and specifications for recording and playback through the use of a video encoder software.” 
 
Video encoding is an essential part of getting videos ready to be delivered to viewers and giving them a high-quality playback experience. Essentially, encoding converts a video file from one format into another, better-compressed version, so that consumers can stream the video on different devices at the highest possible quality without having to worry about buffering.
 

Is there a difference between video encoding and transcoding?

In theory, yes; in practice, not really.


The definition of video encoding and transcoding varies greatly depending on who you ask. Some claim they are the same thing, synonyms to each other. Others, that encoding is simply a process within transcoding. The definition has become increasingly blurred throughout the years. 

The original definition of the two is as follows:

  • Encoding converts from “raw” video into an encoded format
  • Transcoding converts from one encoded format to another 

At other times they are distinguished by the type of file they start from:

  • Encoding - turns an uncompressed file into another file format
  • Transcoding - turns a compressed file into another file format

Does this matter in any significant way? Not really. If you keep treating them as synonyms, you will do fine, and you can be sure they will continue to be used interchangeably moving forward.
 



Does a video encoder for live streaming work differently?

No, the basic principles are pretty much the same. An encoder for live streaming, like one creating files, reduces the amount of data needed to contain or carry the video. However, the constraints on each process vary. For example, in streaming, latency - the time between a bit of video going into and coming out of the encoding process - can be incredibly important, whereas in a file-to-file process it is less significant.
 

Note that currently, the encoding solutions available through VidiNet, the Vidispine media services platform, are focused on file-to-file rather than live stream encoding.

How does video encoding work?

When you capture or process a video, the starting point is often referred to as “baseband”, which is sometimes referred to as an “uncompressed” video. There are however instances where some encoding or compression has already been applied at this stage. 
 

The video is “encoded” to baseband along a couple of different dimensions, including:

  • Resolution (SD, HD, 4K, etc.)
    A captured video is always quantised to certain vertical and horizontal dimensions.
     
  • Chroma (Color) and Luma (Brightness)
    In the natural world, there are infinite degrees of hue, saturation, and brightness, but on a computer you get to choose from, say, 256 levels each of Red, Green, and Blue. The “baseband” video is encoded into a colour space on similar principles, sometimes RGB but more often other combinations like YUV. This becomes an important factor in the amount of data: although at this point the video is not referred to as compressed, we can in many cases throw away (or compress) some of the colour information that our eyes aren’t sensitive to and reduce the video file size, as the sketch below illustrates.
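As a back-of-the-envelope illustration (not specific to any product), here is a minimal Python sketch of how much data a single 1920x1080 frame needs with full RGB samples versus YUV with 4:2:0 chroma subsampling, where each chroma plane is kept at only a quarter of the luma resolution:

```python
# Rough size of one uncompressed 1920x1080 frame at 8 bits per sample.
width, height = 1920, 1080

# RGB 4:4:4: three full-resolution planes.
rgb_bytes = width * height * 3

# YUV 4:2:0: full-resolution luma (Y) plus two quarter-resolution chroma planes (U, V).
yuv420_bytes = width * height + 2 * (width // 2) * (height // 2)

print(f"RGB 4:4:4 frame: {rgb_bytes / 1e6:.1f} MB")
print(f"YUV 4:2:0 frame: {yuv420_bytes / 1e6:.1f} MB")
print(f"saving from discarding chroma detail: {100 * (1 - yuv420_bytes / rgb_bytes):.0f}%")
```

Halving the data before any “real” compression has even started is why colour-space and subsampling choices matter so early in the chain.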

Common principles

Encoding refers to several different processes in the content chain. The principles of encoding are the same for all of them; the difference lies in how those principles are applied and which considerations matter for the process in question. To make a rather complex topic more tangible, we’ll therefore summarise the most important common principles regarding compression, codecs, and other information you need to understand the process of video encoding.
 

Types of compression

In the encode process, you always try to keep the video quality as high as possible using the least possible data. To reduce the amount of data, there are three types of compression:

  • Lossless
    Lossless means that the original data can be recovered completely after compression. In the non-video world, a ZIP file is the best-known example of lossless compression, and similar techniques are used in video encoding (a minimal sketch follows this list).
  • Visually lossless
    Visually lossless means the original data can’t be recovered, but the result is visually identical. This compression relies on models of how we perceive video and generally how the human eye works.
  • Lossy
    Lossy is compression where some of the “quality” of the image is lost. Depending on the application and level of compression, this may be impossible for the viewer to notice. Lossy compression is used in a vast majority of video workflows where, generally speaking, the further we move from the content origination towards the viewer, the more compression is used.
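To make the lossless idea concrete, here is a minimal sketch using Python’s standard zlib module (a general-purpose lossless compressor, used here purely as a stand-in, not a video codec) to show a bit-exact round trip; visually lossless and lossy codecs, by contrast, cannot reproduce the original bit for bit:

```python
import zlib

# Lossless round trip: compress, decompress, and get the original bytes back exactly.
original = bytes(range(256)) * 1000            # 256,000 bytes of repetitive sample data

compressed = zlib.compress(original, level=9)  # general-purpose lossless compression
restored = zlib.decompress(compressed)

print("original size:  ", len(original), "bytes")
print("compressed size:", len(compressed), "bytes")
print("bit-exact round trip:", restored == original)   # True: nothing was lost
```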


What are codecs?

A codec is a video compression technology used to encode/decode and compress/decompress video. Codecs allow us to tightly compress a bulky video for delivery and storage. They apply algorithms to the video and create a facsimile (a copy) which is shrunk down for storage and transmission and later decompressed for viewing.
 

A video is made up of several still images (or frames) played in sequence. Therefore, most codecs use a combination of intraframe and interframe compression.

  • Intraframe compression
    Intraframe compression is, at a high level, the same as the encoding used on digital photographs. There are details in moving images that the eye can’t perceive and will simply fill in, which can be exploited. Intraframe compression therefore compares differences between pixels in the horizontal and vertical planes of the image and, normally in blocks of pixels, stores the variance between them rather than the full data for each pixel – you can sometimes see those blocks when you get a data interruption in a video stream.
  • Interframe compression
    Interframe compression does the same thing but between frames, based on the principle that in many scenarios there isn’t a massive change in the image from frame to frame except at scene changes. Several frames are therefore grouped into a “group of pictures” (GOP), and, again, the variance between frames rather than the whole image data is stored. That number of frames is typically a “fixed GOP length” but can be a “variable GOP length” in some applications. The toy sketch after this list illustrates the idea.
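Purely as an illustration (a toy, not a real codec), the following Python sketch generates a few synthetic frames in which only a small block moves, then compares compressing every frame on its own against compressing the first frame plus per-frame differences – roughly the intraframe vs. interframe idea described above. zlib again stands in for a real compressor:

```python
import zlib
import numpy as np

# A "GOP" of 8 synthetic greyscale frames in which only a small block moves.
height, width, gop_len = 240, 320, 8
base = np.full((height, width), 128, dtype=np.uint8)
frames = []
for i in range(gop_len):
    frame = base.copy()
    frame[50:70, 10 + i * 5 : 30 + i * 5] = 255   # a small object drifting right
    frames.append(frame)

# "Intra-only": compress every frame independently.
intra_size = sum(len(zlib.compress(f.tobytes())) for f in frames)

# "Inter": compress the first frame fully, then only the differences
# (the variance the article mentions) between consecutive frames.
inter_size = len(zlib.compress(frames[0].tobytes()))
for prev, cur in zip(frames, frames[1:]):
    delta = (cur.astype(np.int16) - prev.astype(np.int16)).tobytes()
    inter_size += len(zlib.compress(delta))

print(f"intra-only total: {intra_size} bytes")
print(f"inter (first frame + deltas): {inter_size} bytes")
```

Because almost nothing changes between these frames, the deltas compress to far less than the frames themselves, which is exactly why GOP-based codecs are so efficient for distribution.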


Changing between codecs
 

When changing between codecs, or “transcoding”, you have to go back to baseband – i.e. each time, we decode and re-encode the video. Where lossy compression is used, the concept of “generational loss” becomes an issue. Generational loss refers to the loss of quality between subsequent copies or transcodes of data. Different codecs have different generational-loss performance, but as a general rule, to preserve quality we need to reduce the amount of transcoding applied to the video through the content chain. Because of generational loss, it’s important to consider carefully which encoding processes and codecs to use.


For example, going back to GOPs. In editing, if we only use intraframe encoded material as our source, we can “copy” frames we don’t change directly to the output, only decoding and re-encoding frames that we’ve manipulated. However, if we use interframe encoded material, we need to decode and re-encode entire GOPs and, given that we probably need to maintain constant GOP lengths, this means re-encoding all used frames, introducing a generation of loss. Therefore in editing, and especially where turnaround time (speed) is important, intraframe codecs are often preferred.
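As a rough, hedged stand-in for what repeated lossy transcodes do to video, this Python sketch (assuming Pillow and NumPy are installed) pushes a synthetic image through several JPEG encode/decode generations and prints how far it has drifted from the original after each pass:

```python
from io import BytesIO

import numpy as np
from PIL import Image  # pip install Pillow

# Synthetic test image; any photo would show the same effect.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
img = Image.fromarray(original)

for generation in range(1, 6):
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=75)   # one lossy encode...
    buf.seek(0)
    img = Image.open(buf).convert("RGB")       # ...and the matching decode

    drift = np.abs(np.asarray(img, dtype=np.int16) - original.astype(np.int16)).mean()
    print(f"generation {generation}: mean absolute pixel error {drift:.2f}")
```

Real video codecs differ in how quickly the loss accumulates, but the principle is the same: every decode/re-encode cycle is another chance to lose a little more of the original.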
 

Speed vs. Quality vs. Efficiency

Another important thing to note is that encoding tends to be a trade-off between speed, quality, and efficiency (level of compression). Often you’ll have to compromise: if you want something encoded quickly, you’ll need to choose between quality and efficiency – if you want both, you’ll need a lot of horsepower, which becomes expensive. If you want quality and efficiency, it’s going to take either a lot of time or a lot of horsepower.
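One everyday place this trade-off shows up is in encoder presets. The sketch below is an illustrative assumption, not a Vidispine workflow: it calls the open-source ffmpeg tool with the libx264 codec at a fixed quality target (CRF) and three different presets. Slower presets spend more CPU time to hit that quality in fewer bits; faster ones finish sooner but produce larger files. Here, input.mov stands in for your own source file.

```python
import subprocess

# Same quality target (CRF 20), three different speed presets.
# Requires ffmpeg built with libx264 on the PATH; "input.mov" is a placeholder.
for preset in ("ultrafast", "medium", "veryslow"):
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", "input.mov",
            "-c:v", "libx264", "-preset", preset, "-crf", "20",
            f"out_{preset}.mp4",
        ],
        check=True,
    )
# Compare the encode times and resulting file sizes to see the trade-off in action.
```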
 

Applications

  • Baseband ingest – in this workflow, you’re capturing baseband video streams coming from a studio, tape, or other line feeds and encoding them to files. This was historically (and still largely is) done on dedicated hardware called “video servers”. Nowadays, these streams are migrating from SDI (Serial Digital Interface) to IP, and that’s the direction they will continue to go.


    Quality is key within this area as this material is often used in production and/or archived downstream. Since the files are also used in production, one might want to choose intra-frame codecs. The files will be bigger as a result, but that’s a trade-off in terms of speed and quality in the workflow.
     

  • Outside of the studio (see above), most cameras now write captured video directly to a memory card. As such, the video is already encoded. However, there is still an ingest and/or encode process as the video enters the media supply chain. There may be no change in codec at this point, in which case the ingest process only relates to metadata. In some cases, however, there is a change in codec and/or video wrapper (the file format), so an “encode” process may happen. The codecs here will be the same as in the baseband ingest use case above.
     

  • Contribution encoding is a specific use case for news and live events where the video is “contributed” to a studio or other production from a remote location – e.g. a newsgathering vehicle or the judging/scoring in the Eurovision Song Contest. With live/studio events, synchronisation between locations is key, so latency needs to be kept to a minimum. As such, speed and efficiency outweigh quality – hence remote news reports often have weaker colour or slightly blocky images compared to the studio. Contribution encoding regularly uses dedicated hardware and often proprietary codecs, and it often requires a decoder at the studio/receiver end.

  • At some points in the workflow, it can be necessary to transcode video for interoperability reasons. Another point where transcoding happens is when preparing video for delivery to other organisations. The vast majority of transcoders are software-based and use the CPU only; a few use the GPU. Very few are hardware-based, and only a handful still in circulation use standard CPU and/or GPU in a proprietary chassis. Codecs could be anything here, depending on the next step in the workflow.

  • In “traditional” TV broadcast, scheduled programs are “played out” from a video server to baseband (a similar or the same device as for baseband ingest above) and passed to a distribution encoder. Distribution encoders encode the streams using high-efficiency codecs – maximising quality for the lowest bitrate, often introducing a delay in the video signal as a result – and also “transmux” (repackage into different delivery formats) the streams into groups, which are then packaged into streams for OTA (Over The Air) or satellite/cable distribution. Distribution encoders have used IP transport for a while, but the majority are still hardware-based – though that’s beginning to shift.

  • Often, downstream of the distribution encoders, there will be transcoders that perform a “live” transcode on the video and/or remux. This might be for local distribution and/or to facilitate things like local advert insertion etc. Again, primarily hardware-based.

  • OTT, or “Over The Top”, uses the web to distribute content for streaming services. When introduced, these were hardware encoders that sat alongside the traditional distribution encoders, and in many cases they still do. OTT encoding is interesting because the output actually includes several different versions encoded with different parameters. Like other distribution encoding, it is focused on quality and efficiency, but versions at different qualities and bitrates are included in the package so that viewers get the best quality stream without any buffering (see the sketch after this list). Of course, software encoding is possible.

  • Video-on-demand encoding is essentially just a transcode but with some specific parameters. There are a few hardware encoders still available but this is primarily done in software. The “variable” codec levels mentioned in OTT can be used here as well, but not in all cases – some systems still “buffer” or download video to make it available.
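To give a feel for what those “different versions at different qualities and bit rates” look like in practice, here is a minimal Python sketch of an adaptive-bitrate ladder and a naive rendition picker. The specific rungs and the pick_rendition helper are illustrative assumptions, not settings from VidiNet or any particular encoder:

```python
# Illustrative adaptive-bitrate (ABR) ladder: the same programme encoded several
# times at different resolutions and bitrates, ordered best-first.
ABR_LADDER = [
    {"name": "1080p", "width": 1920, "height": 1080, "video_kbps": 5000},
    {"name": "720p",  "width": 1280, "height": 720,  "video_kbps": 3000},
    {"name": "540p",  "width": 960,  "height": 540,  "video_kbps": 1800},
    {"name": "360p",  "width": 640,  "height": 360,  "video_kbps": 800},
]

def pick_rendition(measured_kbps: float) -> dict:
    """Pick the highest-quality rung that fits within the measured bandwidth."""
    for rung in ABR_LADDER:
        if rung["video_kbps"] <= measured_kbps:
            return rung
    return ABR_LADDER[-1]   # fall back to the lowest rung rather than stall

print(pick_rendition(2500.0)["name"])   # -> "540p": best stream that avoids buffering
```

Real players make this decision continuously during playback, which is why a well-built ladder lets viewers keep watching without buffering as their connection fluctuates.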

Get access to video encoding software in VidiNet today

Through our cloud-based media solution VidiNet, you get access to some of the best software encoders on the market, including:

  • Our own video encoding software VidiCoder
  • AWS Elemental MediaConvert
  • Bitmovin video encoder

Contact us directly if you have any further questions about what video encoding is, or need help in getting started with our video encoding software through VidiNet.

 

Contact us to get a free demo or let us help you create a customized trial based on your needs.


Stay up to date with the latest news & updates from Vidispine. Subscribe to our newsletter. 

You May Also Be Interested In

VidiNet

VidiNet is a cloud-based platform at the heart of the content ecosystem. The foundation for a broad range of applications and services, VidiNet provides a robust footing for the complete content chain.

Video Encoding Software

Learn more about Vidispine's cloud video encoding services and how they differ from video encoder hardware.

Cloud Video Transcoding

Learn what cloud video transcoding is and how our cloud transcoding software can simplify your media workflows.

Your Contacts for Video Encoding

John Proctor
Expert for Broadcast Solutions - North America
Peter Booth-Clibborn
Expert for Broadcast Solutions
Dirk Steinmeyer
Expert for Broadcast Solutions - Europe & MEA