DirectX Video Acceleration


DirectX Video Acceleration (DXVA) is a Microsoft API specification for the Microsoft Windows and Xbox 360 platforms that allows video decoding to be hardware-accelerated. The pipeline allows certain CPU-intensive operations, such as the inverse discrete cosine transform (iDCT), motion compensation and deinterlacing, to be offloaded to the GPU. DXVA 2.0 extends hardware acceleration to further operations, including video capture and video processing.
DXVA works in conjunction with the video rendering model used by the video card. DXVA 1.0, which was introduced as a standardized API with Windows 2000 and is available on Windows 98 or later, can use either the overlay rendering mode or the Video Mixing Renderer (VMR 7/9). DXVA 2.0, available only on Windows Vista and later, integrates with Media Foundation (MF) and uses the Enhanced Video Renderer (EVR) present in MF.

Overview

DXVA is used by software video decoders to define a codec-specific pipeline for hardware-accelerated decoding and rendering of the codec. The pipeline starts at the CPU, which parses the media stream and converts it to DXVA-compatible structures. DXVA specifies a set of operations that can be hardware-accelerated, together with device driver interfaces (DDIs) that the graphics driver can implement to accelerate those operations. If the codec needs any of the defined operations, it can use these interfaces to access their hardware-accelerated implementation. If the graphics driver does not implement one or more of the interfaces, it is up to the codec to provide a software fallback. The decoded video is handed over to the hardware video renderer, where further video post-processing might be applied before it is rendered to the device. The resulting pipeline is usable in a DirectShow-compatible application.
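The fallback rule described above can be illustrated with a small toy model. This is only a sketch: the operation names, the `build_pipeline` helper and the string-based "frames" are hypothetical and do not correspond to any real DXVA interface. It shows a decoder choosing the driver's implementation of a stage when one is offered and its own software routine otherwise.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[str], str]

# Hypothetical stage names, chosen for illustration only.
OPERATIONS = ["idct", "motion_compensation", "deinterlace"]


def build_pipeline(
    driver_ops: Dict[str, Stage], software_ops: Dict[str, Stage]
) -> List[Tuple[str, str, Stage]]:
    """Use the driver's version of each stage when it exists,
    otherwise fall back to the codec's own software routine."""
    pipeline = []
    for name in OPERATIONS:
        if name in driver_ops:
            pipeline.append((name, "hardware", driver_ops[name]))
        else:
            pipeline.append((name, "software", software_ops[name]))
    return pipeline


def run(pipeline: List[Tuple[str, str, Stage]], frame: str) -> str:
    """Pass a frame through every stage in order."""
    for _name, _where, stage in pipeline:
        frame = stage(frame)
    return frame


def make_stage(label: str) -> Stage:
    """Return a stand-in stage that just records that it ran."""
    return lambda frame: f"{frame}>{label}"


# The codec ships a software routine for every stage.
software = {name: make_stage(f"sw:{name}") for name in OPERATIONS}
# This toy driver accelerates everything except deinterlacing.
driver = {
    name: make_stage(f"hw:{name}") for name in ("idct", "motion_compensation")
}

pipeline = build_pipeline(driver, software)
placement = {name: where for name, where, _stage in pipeline}
result = run(pipeline, "frame0")
print(placement)
print(result)
```

In this model only the deinterlacing stage runs in software, because the toy driver does not provide it; the other two stages are routed to the driver.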
DXVA specifies the Motion Compensation DDI, which defines the interfaces for iDCT, Huffman coding, motion compensation, alpha blending, inverse quantization, color space conversion and frame-rate conversion operations, among others. It also includes three sub-specifications: the Deinterlacing DDI, the COPP (Certified Output Protection Protocol) DDI and the ProcAmp (processing amplifier) DDI. The Deinterlacing DDI specifies the callbacks for deinterlacing operations. The COPP DDI functions allow the pipeline to be secured for DRM-protected media by specifying encryption functions. The ProcAmp DDI is used to accelerate video post-processing. The ProcAmp driver module sits between the hardware video renderer and the display driver, and it provides functions for applying post-processing filters to the decompressed video.
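As an illustration of the kind of post-processing a ProcAmp stage performs, the following sketch applies brightness, contrast, hue and saturation adjustments to a single 8-bit YUV pixel. The formula is one commonly described formulation of ProcAmp controls; the function names, the parameter ranges and the assumption of studio-range video (black at Y = 16, neutral chroma at 128) are choices made here for illustration, and the arithmetic used by any particular driver may differ.

```python
import math


def clamp8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def procamp(y, u, v, brightness=0.0, contrast=1.0, hue_deg=0.0, saturation=1.0):
    """Adjust one YUV pixel.

    brightness is an offset added to luma, contrast scales luma and
    chroma, hue rotates the chroma vector, saturation scales chroma.
    """
    h = math.radians(hue_deg)
    cu, cv = u - 128, v - 128  # chroma relative to the neutral point
    y_out = (y - 16) * contrast + brightness + 16
    u_out = (cu * math.cos(h) + cv * math.sin(h)) * contrast * saturation + 128
    v_out = (cv * math.cos(h) - cu * math.sin(h)) * contrast * saturation + 128
    return clamp8(y_out), clamp8(u_out), clamp8(v_out)


# Default settings leave the pixel unchanged.
print(procamp(100, 90, 200))
# Zero saturation collapses chroma to neutral grey.
print(procamp(100, 90, 200, saturation=0.0))
# Higher contrast and brightness on a grey pixel change only luma.
print(procamp(100, 128, 128, brightness=20, contrast=1.5))
```

A hardware implementation would apply the same per-pixel arithmetic to every pixel of a frame in parallel, which is why it is a natural candidate for offloading.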
The functions exposed by DXVA DDIs are not accessible directly by a DirectShow client, but are supplied as callback functions to the video renderer. As such, the renderer plays a very important role in anchoring the pipeline.

DXVA on Windows Vista and later

DXVA 2.0 enhances the implementation of the video pipeline and adds a number of other DDIs, including a Capture DDI for video capture. The DDIs it shares with DXVA 1.0 are enhanced to support hardware acceleration of more operations. In addition, the DDI functions are directly available to callers and need not be mediated by the video renderer, so a program can create a pipeline that only decodes the media, or one that only post-processes and renders it. These features require Windows Display Driver Model (WDDM) drivers, which limits DXVA 2.0 to Windows Vista, Windows Server 2008 and later versions of Windows. On Windows XP and Windows 2000, programs can use DXVA 1.0. The Enhanced Video Renderer can be used as the DXVA 2.0 video renderer only on Windows Vista, Windows 7 and Windows 8.
DXVA 2.0 integrates with Media Foundation and allows DXVA pipelines to be exposed as Media Foundation Transforms (MFTs). Standalone decoder pipelines and post-processing pipelines can also be exposed as MFTs, which the Media Foundation topology loader can use to build a full media playback pipeline. DXVA 1.0 is emulated using DXVA 2.0. DXVA 2.0 does not include the COPP DDI; instead, it uses the Protected Video Path (PVP) for protected content. Windows 7 implements DXVA-HD if the driver complies with WDDM 1.1.

DXVA2 implementations: native and copy-back

DXVA2 implementations come in two variants: native and copy-back.
With the native implementation, the decoded video stays in GPU memory until it has been displayed. The video decoder must be connected to the video renderer with no intermediary processing filter. The video renderer must also support DXVA, which gives less freedom in the choice of renderers.
With the copy-back implementation, the decoded video is copied from GPU memory back to system memory. This implementation does not have the limitations mentioned above and acts similarly to a normal software decoder; however, video stuttering will occur if the GPU cannot copy its memory back to system memory fast enough to keep up with playback.
Native mode is advantageous unless there is a need for customized processing, because the additional copy-back operations increase the load on GPU memory.
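A rough calculation gives a sense of the transfer rate that copy-back mode has to sustain. The sketch below assumes decoded frames in NV12, an 8-bit 4:2:0 format commonly used for decoder output that occupies 1.5 bytes per pixel; the function names and the chosen resolutions are illustrative, and real transfers may be larger because of surface padding and alignment.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one NV12 frame: a full-resolution 8-bit luma plane plus a
    half-height interleaved chroma plane, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


def copy_back_rate(width: int, height: int, fps: int) -> int:
    """Bytes per second that must move from GPU memory to system memory."""
    return nv12_frame_bytes(width, height) * fps


# Illustrative formats only; padding and alignment are ignored.
formats = {
    "1080p60": (1920, 1080, 60),
    "2160p60": (3840, 2160, 60),
}

for label, (w, h, fps) in formats.items():
    rate = copy_back_rate(w, h, fps)
    print(f"{label}: {rate / 1e6:.1f} MB/s")
```

Under these assumptions, 1080p at 60 frames per second requires about 187 MB/s of read-back traffic and 2160p at the same frame rate about 746 MB/s, all of which native mode avoids by leaving the frames in GPU memory.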

Software