To build a DirectX desktop app, you can start with the Win32 Project template in the New Project dialog, download a Win32 game template, or download a sample from the DirectX11 samples or DirectX12 samples as a starting point.
Member list and Quick Info, as shown in the following screenshot, are just two examples of the IntelliSense features Visual Studio offers to make code writing easier and faster. Member list shows you a list of valid members from a type or namespace.
Quick Info displays the complete declaration for any identifier in your code. Refactoring, auto-complete, squiggles, reference highlighting, syntax colorization, and code snippets are some of the other productivity features that help with writing and editing code.
Navigating in large codebases and jumping between multiple code files can be a tiring task. In the example in the screenshot, Visual Studio brings in the definition of the CreateInputLayout method that lives in the d3d1.
The Visual Studio shader editor recognizes HLSL, FX, and other types of shader files, and provides syntax highlighting and braces auto-completion, making it easier to read and write shader code. Debugging shader code from a captured frame is another great way to pinpoint the source of rendering problems.
Simply set a breakpoint in your shader code and press F5 to debug it. You can inspect variables and expressions in Locals and Autos windows. Troubleshooting issues in the code can be time-consuming. Use the Visual Studio debugger to help find and fix issues faster. When the breakpoint is hit, you can watch the value of variables and complex expressions in the Autos and Watch windows as well as in the data tips on mouse hover, view the call stack in the Call Stack window, and step in and step out of the functions easily.
In the example in the screenshot below, the Autos window is showing us the data in the constant buffer and the value of each member of the device resource object instance, making stepping through DirectX code easy and efficient. But that is not all the Visual Studio debugger can do.
Rendering problems can be very tricky to troubleshoot. You can inspect each DirectX event, graphics object, pixel history, and the graphics pipeline to understand exactly what occurred during the frame. Learn more about Visual Studio Graphics Diagnostics. If you are looking for ways to increase the frame rate for your DirectX games, Visual Studio Frame Analysis can be very helpful. It analyzes captured frames to look for expensive draw calls and performs experiments on them to explore performance optimization opportunities for you.
The results are presented in a useful report, which you can save and inspect later or share with your team members. For more information on how to use this tool, see the blog post Visual Studio Graphics Frame Analysis in action! While the Frame Analysis tool can help pinpoint the expensive draw calls, understanding how your game performs on the CPU and the GPU in real time is essential as well.
Shipping high-quality games requires good testing. This automatically adds a test project to your solution. This lesson is written with no assumptions about your current skill level and assumes you have never written a graphics application before. The various components of the DirectX API provide low-level access to the hardware running on Windows-based operating systems [6].
The first version of DirectX was not released at the same time as Windows 95 but shortly after it in September [6]. DirectX 2.
Through the period of , the DirectX library went through several version changes to reach version 5. Subsequent major revisions were released on an annual basis until DirectX 9 which was released two years after DirectX 8 [6].
DirectX 8. Shader Model 1 [9] was the first shader model, introducing vertex and pixel shaders to the programmable pipeline. DirectX 9. Shader Model 3. Shader Model 4. The geometry shader allows the graphics programmer to create new geometric primitives from simpler primitives (for example, taking a single point as input to the geometry shader and producing a set of triangles). DirectX 11 was released in October and introduced Shader Model 5.
Shader Model 5. Tessellation shaders provide the ability to dynamically refine the level of detail of a model by computing the triangle primitives from, for example, the control points of a Bezier surface, although other tessellation techniques can also be implemented in the tessellation shader. Compute shaders allow the graphics programmer to create general-purpose programs that take advantage of the massive parallelism of the Graphics Processing Unit (GPU).
DirectX 12 and Direct3D Texture arrays were already possible prior to Shader Model 5. Using descriptor arrays allows textures of varying dimensions and storage formats to be accessed from a single shader variable. On April 11, , together with the Windows 10 Creators Update version , Shader Model 6. Shader Model 6. The wave-level intrinsic functions added in Shader Model 6. The API that is concerned with hardware-accelerated 3D graphics rendering is called Direct3D and is the subject of this article.
Direct2D is a hardware-accelerated, immediate-mode, 2D graphics API that provides high-performance and high-quality rendering for 2D geometry, bitmaps, and text. Direct3D is the primary subject of this article. DirectWrite supports high-quality text rendering, resolution-independent outline fonts, and full Unicode text and layouts.
XInput replaces DirectInput. The DirectX 12 graphics pipeline consists of several stages. The following diagram illustrates the various stages of the DirectX 12 graphics pipeline. The arrows indicate the flow of data from each stage of the graphics pipeline as well as from memory resources such as buffers, textures, and constant buffers that are available in high-speed GPU memory. DirectX 12 Graphics Pipeline [13]. The image illustrates the various stages of the DirectX 12 rendering pipeline.
The blue rectangular blocks represent the fixed-function stages, which cannot be modified programmatically. The green rounded-rectangular blocks represent the programmable stages of the graphics pipeline.
The first stage of the graphics pipeline is the Input-Assembler (IA) stage. The purpose of the input-assembler stage is to read primitive data from user-defined vertex and index buffers and assemble that data into geometric primitives (line lists, triangle strips, or primitives with adjacency data).
The Vertex Shader (VS) stage is responsible for transforming the vertex data from object-space into clip-space. The vertex shader can also be used for performing skeletal animation or computing per-vertex lighting. The vertex shader takes a single vertex as input and outputs the clip-space position of the vertex. The vertex shader is the only shader stage that is absolutely required in order to define a valid pipeline state object [15]. The Hull Shader (HS) stage is an optional shader stage and is responsible for determining how much an input control patch should be tessellated by the tessellation stage [14].
The Tessellator Stage is a fixed-function stage that subdivides a patch primitive into smaller primitives according to the tessellation factors specified by the hull shader stage [14]. The Domain Shader (DS) stage is an optional shader stage that computes the final vertex attributes based on the output control points from the hull shader and the interpolation coordinates from the tessellator stage [14].
The input to the domain shader is a single output point from the tessellator stage and the output is the computed attributes of the tessellated primitive. The Geometry Shader (GS) stage is an optional shader stage that takes a single geometric primitive (a single vertex for a point primitive, three vertices for a triangle primitive, and two vertices for a line primitive) as input and can either discard the primitive, transform the primitive into another primitive type (for example, a point to a quad), or generate additional primitives.
This data can be recirculated back to the rendering pipeline to be processed by another set of shaders. This is useful for spawning or terminating particles in a particle effect. The geometry shader can discard particles that should be terminated or generate new particles if particles should be spawned.
The Rasterizer (RS) stage is a fixed-function stage which will clip primitives to the view frustum and perform primitive culling if either front-face or back-face culling is enabled. The rasterizer stage will also interpolate the per-vertex attributes across the face of each primitive and pass the interpolated values to the pixel shader.
The Pixel Shader (PS) stage takes the interpolated per-vertex values from the rasterizer stage and produces one or more per-pixel color values. The pixel shader is invoked once for each pixel that is covered by a primitive [15]. The Output-Merger (OM) stage combines the various types of output data (pixel shader output values, depth values, and stencil information) together with the contents of the currently bound render targets to produce the final pipeline result.
One of the more difficult concepts for beginning DirectX 12 programmers to understand is synchronization. In earlier versions of DirectX and in OpenGL, there was no need to be concerned with GPU synchronization in order to get the GPU to render something; it was usually handled by the driver and required little to no intervention from the graphics programmer.
If GPU synchronization is not handled correctly, the programmer will receive errors from the DirectX debug layer that will be difficult to understand and debug. GPU synchronization is also very important to understand when performing resource management.
Resources cannot be freed if they are currently being referenced in a command list that is being executed on a command queue. It is only safe to release those resources after the command queue has finished executing any command list that is referencing those resources. Before going into too much detail about GPU synchronization, a few terms that may not be familiar are described. The Fence object is used to synchronize commands issued to the Command Queue. The fence stores a single value that indicates the last value that was used to signal the fence.
Although it is possible to use the same fence object with multiple command queues, doing so does not reliably ensure the proper synchronization of commands across command queues.
Therefore, it is advised to create at least one fence object for each command queue. Multiple command queues can wait on a fence to reach a specific value, but the fence should only be allowed to be signaled from a single command queue.
In addition to the fence object, the application must also track a fence value that is used to signal the fence. A Command List is used to issue copy, compute (dispatch), or draw commands. In DirectX 12, commands issued to the command list are not executed immediately like they are with the DirectX 11 immediate context. All command lists in DirectX 12 are deferred; that is, the commands in a command list are only run on the GPU after they have been executed on a command queue.
The Command Queue in DirectX 12 has a very simple interface. The Render method is responsible for rendering the scene. It does this by first populating the command list that contains all of the draw or compute commands that are needed to render the scene.
The resulting command list is then executed on the command queue using the ExecuteCommandList method. The call to the ExecuteCommandList method will not block the calling thread. It does not wait for the commands in the command list to be executed on the GPU before it returns to the caller. The Signal method will append a fence value to the end of the command queue.
In other words, the completed value for the fence object will be set to the specified fence value only after all of the commands that were executed on the command queue prior to the Signal have finished executing on the GPU. The call to Signal does not block the calling thread but instead just returns the value to wait for before any writable GPU resources that are referenced in the command lists can be reused.
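The Signal behavior described above can be illustrated without any graphics hardware. The following is a minimal, portable sketch rather than Direct3D 12 code: SimQueue and its methods (Execute, Signal, GpuStep, WaitForFenceValue) are hypothetical stand-ins that only model the ordering guarantee, namely that the completed fence value changes only after all work queued before the signal has been consumed.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Portable stand-in for a command queue and its fence. All names here are
// hypothetical and are not part of the Direct3D 12 API.
class SimQueue
{
public:
    // Queue one unit of GPU work (stands in for executing a command list).
    // Returns immediately, just like a non-blocking submit.
    void Execute() { m_Pending.push_back(0); }

    // Append a fence signal behind all previously queued work and return
    // the value the CPU must later wait for. Does not block.
    std::uint64_t Signal()
    {
        const std::uint64_t value = ++m_NextFenceValue;
        m_Pending.push_back(value);
        return value;
    }

    // Simulate the GPU consuming one queued item. Work items are stored as 0;
    // signal items carry the fence value to publish.
    void GpuStep()
    {
        if (m_Pending.empty()) return;
        const std::uint64_t item = m_Pending.front();
        m_Pending.pop_front();
        if (item != 0) m_CompletedValue = item;
    }

    std::uint64_t CompletedValue() const { return m_CompletedValue; }

    bool IsFenceComplete(std::uint64_t value) const { return m_CompletedValue >= value; }

    // Stand-in for a CPU-side wait: run the simulated GPU until the value is reached.
    void WaitForFenceValue(std::uint64_t value)
    {
        assert(value <= m_NextFenceValue && "waiting on a value that was never signaled");
        while (!IsFenceComplete(value)) GpuStep();
    }

private:
    std::deque<std::uint64_t> m_Pending;
    std::uint64_t m_NextFenceValue = 0;
    std::uint64_t m_CompletedValue = 0;
};
```

In this model, calling Execute and then Signal returns a value right away, but IsFenceComplete for that value only becomes true once the simulated GPU has processed both the work item and the signal that follows it.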
The Present method on line 23 will cause the rendered result to be presented to the screen. In this pseudo-code example, the Present method returns the index of the next back buffer within the swap chain to render to. For this reason, the back-buffer resource from the previous frame cannot be reused until the image has been presented to the screen.
To prevent resources from being overwritten before they are presented to the screen, the CPU thread needs to wait for the fence value of the previous frame to be reached. DirectX 12 defines three different command queue types: copy, compute, and direct. Although the DirectX 12 API defines these three different command queue types, it is not necessarily the case that the GPU in your computer actually has three physical work queues. The GPU may instead have one dedicated work queue for each of these types, or even multiple work queues of each type.
If you decide to create multiple queues in your own applications, you should allocate one fence object and track one fence value for each allocated command queue.
An example of performing GPU synchronization. In the image above several commands are issued on the main thread. In this example, the first frame is denoted Frame N. The command lists are executed on the command queue.
Immediately after executing the command lists, the queue is signaled with the value N. When the command queue reaches that point, the fence will be signaled with the specified value. Since there were no commands in the command queue in Frame N-1, execution continues without stalling the CPU thread.
In this case, the CPU has to wait until signal N is reached which indicates that the command queue is finished with those resources. This example demonstrates a typical double-buffered scenario. You might think that using triple-buffering for rendering will reduce the amount of time the CPU has to wait for the GPU to finish its work. Whenever the CPU is faster at issuing commands than the command queue is at processing those commands, the CPU will have to stall at some point in order to allow the command queue to catch up with the CPU.
It gets more complicated if you add an additional queue. In this case, you must be careful not to signal the second queue with a fence value that is larger than, but could be completed before, a fence value that was used on another queue using the same fence object. Doing so could result in the fence reaching the fence value from the other queue before the main queue has reached the earlier fence value.
Incorrect Synchronization with multiple queues. The moral of the story is to make sure that every command queue tracks its own fence object and fence value and only signals its own fence object. To be safe, the fence value for a queue should never be allowed to decrease.
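Because a fence value should only ever increase, it is reasonable to ask how long a 64-bit counter can last before it wraps around. The following sketch works through that arithmetic. The function name YearsUntilFenceOverflow is hypothetical, and the signal rate passed to it is an assumption chosen by the caller, not a measured figure.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Estimate how many years a 64-bit fence value lasts when it is incremented
// once per signal at a constant rate. Hypothetical helper for illustration.
double YearsUntilFenceOverflow(double signalsPerSecond)
{
    assert(signalsPerSecond > 0.0);
    const double maxValue =
        static_cast<double>(std::numeric_limits<std::uint64_t>::max());
    const double secondsPerYear = 365.0 * 24.0 * 60.0 * 60.0;
    return maxValue / signalsPerSecond / secondsPerYear;
}
```

With an assumed rate of 30,000 signals per second, the result is on the order of tens of millions of years, so overflow of a monotonically increasing 64-bit fence value is not a practical concern.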
If the command queue is signaled times per frame and your game is rendering at an average of FPS the queue is signaled 30, times per second , the game could run for about In order to follow along with this tutorial series, you should ensure that you have the following software installed on your computer.
In the following sections, we will create the DirectX 12 demo application. In this tutorial, the demo will only create a window and clear the screen. Rendering of geometry will be handled in a later tutorial.
The preamble of the source includes the header files that are required to create the demo. Any variables that are used for the demo are also declared in the preamble. Since this demo uses the Windows library functions, the ubiquitous Windows. In order to minimize the number of header files that are included in the Windows. The shellapi. This function will be used later to parse the command-line arguments passed to the application. The min and max macros defined in the Windows header files may conflict with the std::min and std::max functions defined in the algorithm STL header.
To avoid any compiler errors, the min and max macros should be undefined and only the std::min and std::max functions should be used. In the Windows. Since a function with the same name is defined in this source file, the CreateWindow macro is undefined on line The wrl.
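The min and max conflict described above can be reproduced on any platform. The sketch below defines function-like min and max macros itself to simulate what a platform header might do, then undefines them so that only std::min and std::max remain. ClampWidth is a hypothetical helper used purely for illustration. On Windows, defining NOMINMAX before including the Windows headers is another common way to avoid the macros altogether.

```cpp
#include <algorithm>
#include <cassert>

// Simulate the function-like macros that a platform header may define.
// (These two lines are only here so the sketch is self-contained.)
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Undefine the macros so that only std::min and std::max can be used.
#if defined(min)
#undef min
#endif

#if defined(max)
#undef max
#endif

// Clamp a requested width to [lowest, highest] using the STL functions.
// With the macros still defined, the std::min and std::max calls below
// would be mangled by the preprocessor and fail to compile.
int ClampWidth(int requested, int lowest, int highest)
{
    assert(lowest <= highest);
    return std::max(lowest, std::min(requested, highest));
}
```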
The Direct3D 12 header file is included on line DXGI 1. HDR rendering will be discussed in another article. The d3dcompile. It is recommended to compile HLSL shaders at compile time when the application is compiled into an executable but for demonstration purposes, it might be more convenient to allow runtime compilation of HLSL shaders. Shaders will be introduced in the next lesson. The DirectX Math library will be used in the later tutorials.
The D3D12 extension library d3dx The d3dx The algorithm header contains math related functions such as std::min and std::max. The cassert header contains the assert macro. The chrono header contains time related functions. The Helpers.
Currently, the contents of the Helpers. If the function returns a failure code, an exception is thrown. This is useful for debugging the application and simplifies error checking in the main application code. In the next section, the variables used by the application are defined. Tweak variables and variables that control the application initialization are defined first. This value must not be less than 2 when using the flip presentation model.
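The error-checking helper mentioned above (a function that throws when a call returns a failure code) might look roughly like the following. This is a portable sketch, not the original header: HRESULT and Failed are local stand-ins so that the code compiles without the Windows SDK, and Throws is a hypothetical helper that exists only to demonstrate the behavior. On Windows, HRESULT and the FAILED macro come from the Windows headers, and a result code is a failure when its value is negative.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Portable stand-ins for the Windows result type and failure check.
using HRESULT = std::int32_t;
constexpr bool Failed(HRESULT hr) { return hr < 0; }

// Throw if the result code indicates failure; otherwise do nothing.
inline void ThrowIfFailed(HRESULT hr)
{
    if (Failed(hr))
    {
        throw std::runtime_error("A call returned a failure HRESULT.");
    }
}

// Hypothetical demonstration helper: reports whether ThrowIfFailed throws.
inline bool Throws(HRESULT hr)
{
    try
    {
        ThrowIfFailed(hr);
    }
    catch (const std::runtime_error&)
    {
        return true;
    }
    return false;
}
```

Wrapping every API call that returns a result code in such a helper keeps the calling code linear, and a debugger set to break on thrown exceptions will stop at the first failing call.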
The swap chain and flip models are discussed in more detail later. The software rasterizer allows the graphics programmer to access the full set of advanced rendering features that may not be available in the hardware (for example, when running on older GPUs). The WARP device can also be used to verify the results of a rendering technique if the quality of the vendor-supplied display driver is in question. This variable is used to prevent certain window messages (such as the window resize message) from being handled until after the device and swap chain have been fully created.
When switching to a full-screen window state, the previous size of the window needs to be stored so that when switching back to windowed mode, the window dimensions can be restored correctly. The swap chain is responsible for presenting the rendered image to the window. The swap chain will be discussed in more detail later in the tutorial. The swap chain will be created with a number of back buffer resources. Although the back buffers of the swap chain are actually textures, all buffer and texture resources are referenced using the ID3D12Resource interface in DirectX 12. Generally, a single command list is needed to record GPU commands using a single thread.
Since this demo uses the main thread to record all GPU commands, only a single command list is defined. Unlike the command list, a command allocator cannot be reused unless all of the commands that have been recorded into the command allocator have finished executing on the GPU. The back buffer textures of the swap chain are described using a render target view (RTV). The render target view describes the location of the texture resource in GPU memory, the dimensions (width and height) of the texture, as well as the format of the texture.
The RTV is used to clear the back buffers of the render target. In a later tutorial, the RTV will be used to render geometry to the screen. A descriptor heap can be visualized as an array of descriptors (views).
A view simply describes a resource that resides in GPU memory. A view in DirectX 12 is also called a descriptor. Similar to a view, a descriptor describes a resource. Since the swap chain contains multiple back buffer textures, one descriptor is needed to describe each back buffer texture. The size of a descriptor in a descriptor heap is vendor specific (Intel, NVIDIA, and AMD may store descriptors differently).
In order to correctly offset the index into the descriptor heap, the size of a single element in the descriptor heap needs to be queried during initialization. Depending on the flip model of the swap chain, the index of the current back buffer in the swap chain may not be sequential. This will cap the framerate of the application to the refresh rate of the screen. The source code for the demo has been organized to minimize the number of functions that need to be forward declared.
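The descriptor offset calculation mentioned above comes down to simple pointer arithmetic. The sketch below uses a portable stand-in type, CpuDescriptorHandle, and a hypothetical OffsetHandle function. In a real application the increment size would be queried from the device (ID3D12Device::GetDescriptorHandleIncrementSize) and the starting handle would come from the descriptor heap. The values used here are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for a CPU descriptor handle: an address into a descriptor heap.
struct CpuDescriptorHandle
{
    std::size_t ptr;
};

// Return the handle of the descriptor at `index`, given the handle of the
// first descriptor in the heap and the vendor-specific size of one descriptor.
CpuDescriptorHandle OffsetHandle(CpuDescriptorHandle heapStart,
                                 std::uint32_t index,
                                 std::uint32_t incrementSize)
{
    assert(incrementSize > 0);
    return { heapStart.ptr + static_cast<std::size_t>(index) * incrementSize };
}
```

Because the increment size can differ between vendors, it should be treated as a value queried at run time, never as a hard-coded constant.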
The window message callback procedure is an exception and requires a forward declaration so that the callback function can be used to register the window class. The ParseCommandLineArguments function allows a few of the globally defined variables to be overridden by supplying command-line arguments when the application is executed. Additional command-line arguments (for example, to specify that the application should start in fullscreen mode) can be handled by extending this function.
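A portable sketch of such a command-line parser is shown below. It is not the original listing: the AppSettings structure, its default values, and the flag names (-w, --width, -h, --height, -warp, --warp) are assumptions made for illustration, and the arguments are passed in as a vector of wide strings rather than being obtained from the Windows shell API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Settings that the command line may override. Field names and defaults
// are assumptions made for this sketch.
struct AppSettings
{
    std::uint32_t clientWidth = 1280;
    std::uint32_t clientHeight = 720;
    bool useWarp = false;
};

// Parse arguments of the form: -w <width> -h <height> -warp
// Unknown arguments are ignored; a flag missing its value is skipped.
// std::stoul throws if the value is not a number.
AppSettings ParseCommandLineArguments(const std::vector<std::wstring>& args)
{
    AppSettings settings;
    for (std::size_t i = 0; i < args.size(); ++i)
    {
        const std::wstring& arg = args[i];
        const bool hasValue = (i + 1 < args.size());

        if ((arg == L"-w" || arg == L"--width") && hasValue)
        {
            settings.clientWidth = static_cast<std::uint32_t>(std::stoul(args[++i]));
        }
        else if ((arg == L"-h" || arg == L"--height") && hasValue)
        {
            settings.clientHeight = static_cast<std::uint32_t>(std::stoul(args[++i]));
        }
        else if (arg == L"-warp" || arg == L"--warp")
        {
            settings.useWarp = true;
        }
    }
    return settings;
}
```

Returning a settings structure instead of writing to global variables is a design choice of this sketch; it keeps the function easy to test in isolation.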
This macro computes the IID based on the type of interface pointer, which prevents coding errors in which the IID and interface pointer type do not match. Windows developers should always use this macro with any method that requires separate IID and interface pointer parameters. The graphics programmer should strive to eliminate any and all errors and warnings that are reported by the debug layer. Before creating an instance of an OS window, the window class corresponding to that window must be registered.
The window class will be automatically unregistered when the application terminates. The window will be created in the center of the primary display device. Care must be taken to prevent the window from being created off-screen. Creating a window larger than the viewable area of the display will cause parts of the window to be offscreen. If the title bar and the window frame are offscreen, then it will not be possible to resize the window to fit in the screen.
The GetSystemMetrics function retrieves specific system metric information. In order to calculate the required size of the window rectangle, based on the desired client-rectangle size, the AdjustWindowRect function is used.
On lines , the dimensions of the adjusted window rectangle are used to compute the width and height of the window that is to be created. The top-left corner point of the window is computed on lines so that the window appears in the center of the screen.
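The centering calculation described above can be sketched independently of the Windows API. In the code below, WindowPlacement and CenterWindow are hypothetical names. In the real application, the screen size would come from GetSystemMetrics and the outer window size from the rectangle adjusted by AdjustWindowRect. Clamping the top-left corner to zero keeps the title bar on-screen even when the window is larger than the display.

```cpp
#include <algorithm>
#include <cassert>

// Position and outer size of a window, in screen coordinates.
struct WindowPlacement
{
    int x;
    int y;
    int width;
    int height;
};

// Center a window of the given outer size on the screen. The top-left corner
// is clamped to (0, 0) so the window never starts off-screen.
WindowPlacement CenterWindow(int screenWidth, int screenHeight,
                             int windowWidth, int windowHeight)
{
    assert(windowWidth > 0 && windowHeight > 0);
    const int x = std::max(0, (screenWidth - windowWidth) / 2);
    const int y = std::max(0, (screenHeight - windowHeight) / 2);
    return { x, y, windowWidth, windowHeight };
}
```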
The CreateWindowExW function has the following signature [18] :. The window has been created but it has not yet been shown. The window is shown only after the DirectX 12 device and command queue have been created and initialized. The GetAdapter function is used to query for a compatible adapter.
Before querying for available adapters, a DXGI factory must be created. On line , the DXGI factory is created. For more information, see the reference documentation for QueryInterface. If there is more than one DirectX 12 compatible GPU adapter in the system (for example, the integrated Intel GPU), then the one with the largest amount of dedicated video memory is favored.
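The selection rule described above (prefer the compatible adapter with the most dedicated video memory) can be sketched without DXGI. AdapterInfo and PickAdapter below are hypothetical. In a real application the memory size and the software flag would be read from each adapter's description, and compatibility would be checked by attempting to create a device. Skipping software adapters is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified description of one enumerated adapter.
struct AdapterInfo
{
    bool isSoftware;                     // true for a software rasterizer
    bool supportsD3D12;                  // true if device creation would succeed
    std::uint64_t dedicatedVideoMemory;  // in bytes
};

// Return the index of the preferred adapter, or -1 if none qualifies.
// Among hardware adapters that support Direct3D 12, the one with the
// largest amount of dedicated video memory wins.
int PickAdapter(const std::vector<AdapterInfo>& adapters)
{
    int best = -1;
    std::uint64_t bestMemory = 0;
    for (std::size_t i = 0; i < adapters.size(); ++i)
    {
        const AdapterInfo& adapter = adapters[i];
        if (adapter.isSoftware || !adapter.supportsD3D12)
        {
            continue;
        }
        if (best < 0 || adapter.dedicatedVideoMemory > bestMemory)
        {
            best = static_cast<int>(i);
            bestMemory = adapter.dedicatedVideoMemory;
        }
    }
    return best;
}
```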
The DirectX 12 device is used to create resources such as textures and buffers, command lists, command queues, fences, heaps, etc. The DirectX 12 device is not directly used for issuing draw or dispatch commands. Destroying the DirectX 12 device will cause all of the resources allocated by the device to become invalid. If the device is destroyed before all of the resources that were created by the device, then the debug layer will issue warnings about those objects that are still being referenced.
In this case, the actual device is created and stored in the d3d12Device2 argument. The D3D12CreateDevice function has the following signature:. As was mentioned previously, the graphics programmer should try to fix any and all errors and warnings generated by the debug layer before releasing the DirectX 12 application to the general public.
In order to facilitate diagnosing errors and warnings generated by the debug layer, the DirectX 12 device provides access to the ID3D12InfoQueue interface. The ID3D12InfoQueue interface is used to enable breakpoints based on the severity of the message and to filter certain messages from being generated. The ID3D12InfoQueue::SetBreakOnSeverity method sets a message severity level to break on while the application is attached to a debugger when a message with that severity level passes through the storage filter.
While all DirectX 12 warnings and errors should be resolved before distributing the application, it may not be practical or feasible to address all of the possible warnings that can occur. In such a case, some warning messages can be ignored. A storage queue filter can be specified to ignore certain warning messages that are generated by the debug layer. Messages can be ignored by category, by severity, or by specific message ID. No messages are ignored based on their category but the code is left in on line for demonstration purposes.
The following warning messages are suppressed based on their message ID :. The CreateCommandQueue function is used to create the command queue for the application. Screen tearing occurs when a moving image is presented to the screen out-of-sync with the vertical refresh rate of the screen.
An example of screen tearing can be seen in the image below. WDDM 2. The primary purpose of the swap chain is to present the rendered image to the screen.
The swap chain stores no fewer than two buffers that are used to render the scene. The buffer that is currently being rendered to is called the back buffer and the buffer that is currently being presented is called the front buffer. In previous versions of DirectX, the DXGI presentation model used a bit-block transfer (bitblt) model to present the rendered image to the display. When using a bitblt presentation model, the Direct3D runtime copied the contents of the front buffer to a Desktop Window Manager (DWM) redirection surface.
Only after the contents of the front buffer were fully copied to the redirection surface was the image presented to the screen. Windows 8 and DXGI 1. Using the flip presentation model, the Direct3D runtime passes the front buffer surface directly to the DWM for presentation to the screen. The flip presentation model provides a performance improvement in both space and speed since the redirection surface is no longer required and the front buffer does not need to be copied before it is presented to the screen.
The swap chain stores pointers to a number of buffers in GPU memory. After a present, the pointers are updated to swap through the buffer chain. The image above provides a visual example of the DXGI flip model [20].
DirectX 12 does not support the bitblt presentation model and only supports the flip presentation model. There are two flip effects that can be used when creating the swap chain [21] :. The discard means that if the previously presented frame is still in the queue to be presented, then that frame will be discarded and the next frame will be put directly to the front of the presentation queue.
Using this presentation model may cause presentation lag when there are no more buffers to utilize as the next back buffer (the IDXGISwapChain1::Present1 method will likely block the calling thread until a buffer can be made available). This code is similar to the GetAdapter function shown earlier and is not described in detail here. This method has the following signature [23] :. Switching to a full screen state will be handled manually using a full-screen borderless window.
In the next sections, a descriptor heap is created and the views for each back buffer are recorded into the descriptors of the descriptor heap. A descriptor heap can be considered an array of resource views. Certain types of resource views (descriptors) can be created in the same heap. Descriptor heaps will be discussed in more detail in another lesson which deals with binding textures to the rendering pipeline. For now, a descriptor heap is created to store the render target views for the swap chain buffers.
The CreateDescriptorHeap function described above is used to create a descriptor heap of a specific type. A render target view (RTV) describes a resource that can be attached to a bind slot of the output merger stage (see Output Merger Stage).
The render target view describes the resource that receives the final color computed by the pixel shader stage. More complex usages of render targets will be discussed in the next lesson. For this lesson, the render target will only be cleared to a specific color.