An Unequal Trade

  • D2D and Threading

    D2D and COM

     

    D2D follows a calling pattern that should be familiar to anyone used to D3D. A single export, D2D1CreateFactory, creates a root interface (ID2D1Factory) from which all other objects in D2D are created. This pattern is similar to many COM APIs, with the exception that the first call is not a call to CoCreateInstance (or CoGetClassObject). All D2D interfaces inherit from IUnknown, both for the sake of familiarity for developers already used to COM and because it provides a simple interface for reference counting and a type-safe extensibility point in the QueryInterface method. Using interface-based design also allows for simple representations of polymorphism through interface inheritance. It also allows IntelliSense to aid the user of the API to a much larger extent than a flat API would. However, D2D does not use the COM runtime in any way, including BSTRs, standard allocators, VARIANTs, proxies, apartments or registration.
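
    To make the shape of that pattern concrete, here is a minimal, portable sketch of a reference-counted root interface handed out by a single creation function. It is a model only: IUnknownLike, IFactoryLike, CreateFactoryLike and the integer interface ids are hypothetical names invented for illustration, and none of this is the real COM or D2D declaration (real COM identifies interfaces by GUID and returns HRESULTs).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical interface ids; real COM uses 128-bit GUIDs instead.
enum InterfaceId : int { kIdUnknown = 0, kIdFactory = 1 };

// Hypothetical stand-in for IUnknown: reference counting plus a
// type-safe way to ask an object for another interface.
struct IUnknownLike {
    virtual std::uint32_t AddRef() = 0;
    virtual std::uint32_t Release() = 0;
    virtual bool QueryInterface(int id, void** out) = 0;
protected:
    virtual ~IUnknownLike() = default;  // destroyed only via Release()
};

// Hypothetical root interface from which other objects would be created.
struct IFactoryLike : IUnknownLike {
    virtual const char* Name() const = 0;
protected:
    ~IFactoryLike() = default;
};

class FactoryImpl final : public IFactoryLike {
public:
    std::uint32_t AddRef() override { return ++refs_; }

    std::uint32_t Release() override {
        const std::uint32_t remaining = --refs_;
        if (remaining == 0) delete this;  // last reference frees the object
        return remaining;
    }

    bool QueryInterface(int id, void** out) override {
        if (id == kIdFactory) {
            *out = static_cast<IFactoryLike*>(this);
        } else if (id == kIdUnknown) {
            *out = static_cast<IUnknownLike*>(this);
        } else {
            *out = nullptr;  // unknown interface: fail without touching refs
            return false;
        }
        AddRef();  // a successful query hands out a new reference
        return true;
    }

    const char* Name() const override { return "model factory"; }

private:
    std::atomic<std::uint32_t> refs_{1};  // the creator holds the first reference
};

// The single "export": no class registration or runtime lookup involved.
bool CreateFactoryLike(IFactoryLike** out) {
    *out = new FactoryImpl();
    return true;
}
```

    The caller balances every successful CreateFactoryLike or QueryInterface with one Release; the object frees itself when the count reaches zero.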

    The simplest explanation for why D2D does not use COM is that, on a point-by-point basis, COM doesn’t provide sufficient value for the complexity it incurs for the sort of low-level usage that D2D needs. (Other components certainly benefit from COM.) Fundamentally, almost all of D2D’s interfaces cannot be implemented by a component other than D2D. The properties exposed on something like an ID2D1Brush are far less important than the concrete and internal behaviors of the brush objects when they are passed back to D2D for rendering. Any attempt to formalize the interface between the brush and D2D so that any implementation might work would be extremely brittle, hard to evolve and would not perform well enough. To that extent, D2D arguably doesn’t use a component-based design at all.

    D2D and Threading

     

    D2D’s threading model should be familiar to anyone who has used any version of D3D up to and including 10. Without regard to the current COM apartment, if any, the caller can specify whether they want a single-threaded or multi-threaded instance of the root D2D factory interface.

    If they ask for a single-threaded instance, the factory returned to the caller, and all derived objects returned from it, can be used by only one thread at a time. There is no notion of thread affinity, but if the objects are called from multiple threads at the same time, D2D can enter unpredictable states and might crash.

    You can create as many single-threaded instances as you like and you can call them from any apartment you want, even an MTA. For example, an ISAPI extension might create a single-threaded instance of D2D to render and serve up a page, and there might be many such single-threaded instances within the same MTA. Each single-threaded instance has absolutely no serialization against any other single-threaded instance within D2D, so this mechanism provides a very large degree of scaling on the CPU.

    The caller can also create a multi-threaded instance of D2D. In this case the factory and all derived objects can be used from any thread, and each render target can be rendered to independently. D2D serializes calls to these objects, so a single multi-threaded D2D instance won’t scale as well on the CPU as many single-threaded instances. However, resources can be shared within the multi-threaded instance.
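
    The trade-off between the two factory types can be sketched with a small portable model. This is not D2D code: ModelFactory, FactoryType and the two Render... functions are hypothetical names, and the lock-per-call behavior is an assumption made to illustrate "serialized" versus "unserialized" (in the real API the choice is made by passing D2D1_FACTORY_TYPE_SINGLE_THREADED or D2D1_FACTORY_TYPE_MULTI_THREADED to D2D1CreateFactory).

```cpp
#include <mutex>
#include <thread>
#include <vector>

enum class FactoryType { SingleThreaded, MultiThreaded };

// Hypothetical model: a multi-threaded factory takes one lock on every
// call; a single-threaded factory takes none and trusts the caller.
class ModelFactory {
public:
    explicit ModelFactory(FactoryType type) : type_(type) {}

    // Stand-in for any call into the factory or an object derived from it.
    void DrawCall() {
        if (type_ == FactoryType::MultiThreaded) {
            std::lock_guard<std::mutex> guard(lock_);
            ++calls_;
        } else {
            ++calls_;  // no lock: caller guarantees one thread at a time
        }
    }

    long Calls() const { return calls_; }

private:
    FactoryType type_;
    std::mutex lock_;
    long calls_ = 0;
};

// Pattern 1: each worker owns a private single-threaded factory.
// Nothing is shared, so workers never wait on each other.
long RenderWithPrivateFactories(int workers, int callsPerWorker) {
    std::vector<long> totals(workers, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&totals, i, callsPerWorker] {
            ModelFactory factory(FactoryType::SingleThreaded);
            for (int c = 0; c < callsPerWorker; ++c) factory.DrawCall();
            totals[i] = factory.Calls();
        });
    }
    for (auto& t : threads) t.join();
    long sum = 0;
    for (long v : totals) sum += v;
    return sum;
}

// Pattern 2: all workers share one multi-threaded factory.
// Calls are safe from any thread but are serialized by the lock.
long RenderWithSharedFactory(int workers, int callsPerWorker) {
    ModelFactory shared(FactoryType::MultiThreaded);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&shared, callsPerWorker] {
            for (int c = 0; c < callsPerWorker; ++c) shared.DrawCall();
        });
    }
    for (auto& t : threads) t.join();
    return shared.Calls();
}
```

    Both patterns complete the same amount of work; the difference is that the first never contends on a lock, while the second allows every worker to see the same resources.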

    You might note that the qualifier “on the CPU” was used quite a lot throughout this discussion. GPUs generally take advantage of fine-grained parallelism more than CPUs do, so you need to think about scaling for GPU-based rendering a little differently. For example, multi-threaded calls from the CPU might still end up being serialized when they are sent to the GPU, but a whole bank of pixel and vertex shaders will run in parallel to perform the rendering.

    You should also note that there is one exception to the serialization of calls into multi-threaded objects in D2D: D2D geometries are immutable, and hence all geometry operations are both lock-free and completely reentrant.

    D2D and Domains

     

    Coupled to this notion of threading is the notion of D2D resource domains. Every object created from a root D2D factory can only be used with objects created from the same D2D factory instance. Forcing this simplification allows coarse-grained locking within D2D, which generally results in better performance than attempting to lock each D2D object independently.
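
    A minimal sketch of the domain idea, assuming each resource simply remembers which factory created it. DomainFactory, Resource and Fill are hypothetical names for illustration, not D2D types, and the single mutex stands in for whatever coarse-grained lock the real implementation uses.

```cpp
#include <mutex>

class DomainFactory;

// Every resource records the factory (its "domain") that created it.
struct Resource {
    const DomainFactory* owner;
};

class DomainFactory {
public:
    Resource CreateBrush() const { return Resource{this}; }
    Resource CreateTarget() const { return Resource{this}; }

    // A draw succeeds only when both resources belong to this factory.
    // One lock covers the whole domain, so no per-object locks are needed.
    bool Fill(const Resource& target, const Resource& brush) {
        if (target.owner != this || brush.owner != this) {
            return false;  // resources from another domain are rejected
        }
        std::lock_guard<std::mutex> guard(domainLock_);
        ++fills_;
        return true;
    }

    int Fills() const { return fills_; }

private:
    std::mutex domainLock_;  // coarse-grained: one lock per factory
    int fills_ = 0;
};
```

    Because a draw can only ever touch objects from one domain, taking that domain's single lock is enough to protect everything the call uses.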

    D2D does allow memory buffers to be shared as input bitmaps and render targets when using software rendering, and it also allows hardware surfaces to be shared (even across processes) when doing hardware rendering via “DXGI interop”. There is no implied locking on the hardware surfaces in this case.

    Next up….

     

    For my next post I will provide an example of a multi-threaded D2D factory being shared between two STAs (single-threaded apartments). The foreground apartment will do UI-based rendering and the background apartment will do bitmap decoding.

  • Why "An Unequal Trade"

    We live in a society obsessed with equality in various forms. We want equal pay for equal work. We have equal air time for political candidates, without regard to the accuracy of their statements. We want equal resources for every child.

     

    However, in practice, reality isn’t particularly equal. Taxes aren’t that negotiable. We have never broken the law of gravity, but it has broken many of our structures. And, no matter how hard we try, how computers are built, how hard drives spin and how cache lines and registers work can have profound impacts on how our code works and performs. Design too is unequal. Any architecture, just by being something, optimizes for some cases and not for others; small-looking features become difficult, or even impossible. What engineers do, even software engineers, is ultimately not just a mathematical exercise.

     

    I started this blog because I can finally speak about D2D, or Direct 2D, after a year and a bit of working on it. I had the role of working on how the D2D API would be exposed, and hence, to some extent, how it could be implemented. D2D embodies a set of principles and compromises. I would like to hear what you think about them.

     

    There is a final way in which this exchange is unequal. There are many more of you than there are of me. You understand your problems better than I do. I would like to hear about the ways in which D2D helps you, how it hinders you and how it could be improved. I am sure that I will learn more from you than you will from me.

     

    As well as talking about what differentiates D2D from other graphics systems and providing samples, I will veer into other territory related to software engineering.

     

    Might as well jump into the first differentiator:

     

    D2D is fundamentally hardware oriented

     

    D2D stands for Direct 2D and is part of the family of “Direct” APIs. It is layered on top of D3D 10, which, despite the 3D in its name, is really a Hardware Abstraction Layer for GPUs. D2D takes the capabilities of a video card and exposes a 2D API with a set of primitives similar to other 2D APIs, for example, GDI, GDI+, 2D WPF and Quartz 2D. D2D takes the “Direct” in its name quite seriously. This is illustrated in the following diagram:

     

     [Diagram: D2D Difference]

     

    Many hardware-accelerated 2D APIs start with a CPU-focused resource model and a set of rendering operations that work well on CPUs. Then, various operations in this API are accelerated. This requires a resource manager to map CPU resources down to resources on the GPU. In fact, due to limitations on the GPU, there might be a many-to-many association between the resources visible to the application and resources on the GPU. Typically, not all of the rendering operations can be accelerated. This can require communication backwards and forwards between the CPU and GPU in order to transition to CPU rendering (which is expensive), or it can sometimes force rendering to fall unpredictably back to the CPU entirely. In addition, some of the rendering operations that look simple might require temporary intermediate rendering steps that are not exposed in the API, which in turn requires more GPU resources.

     

    D2D takes a much more direct approach to exploiting the GPU. Geometry is kept on the CPU; most other resources map directly to resources on the GPU. Rendering calls are performed by producing vertex and coverage information from the geometry and then combining this with texturing information produced from the hardware resources. Any intermediate rendering is directly controlled by the application.

     

    D2D’s approach has a set of advantages:

     

    The application directly controls resource cost, including how much system memory and how much video memory is used.

     

    Performance is predictable and discoverable - any operation exposed in the API will be performed on the GPU. If this wasn’t possible, the operation simply wasn’t exposed in the API.

     

    There is no communication from the GPU back to the CPU.

     

    Rendering calls don’t stall on GPU resource creation. The application can ensure that whatever GPU resources it needs are created up front. (Or even in another thread).

     

    D2D’s approach has a set of disadvantages:

     

    The quality is what the hardware is capable of. If the hardware has lower precision, the rendered result is less precise. It should be noted that DX10-based hardware conforms to much stricter precision tolerances than DX9-based hardware. D2D’s software rendering stack doesn’t have the same complication, and the application is free to use the SW path if the GPU precision isn’t sufficient for its requirements.

     

    Limitations of the GPU aren’t virtualized. For example, if the hardware supports a maximum texture size, that is the maximum bitmap size. It should be noted that in systems which attempt to work around this, exceeding the maximum texture size can result in higher resource usage and combinations of rendering operations and parameters which unexpectedly must be performed on the CPU. Since D2D also has a SW rasterizer, we have, in a sense, simply ensured that this limitation is handled consciously by the application in a discoverable manner.

     

    If the GPU goes away, some of your resources go away too.
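
    As one way an application might handle the maximum texture size limitation consciously, here is a minimal sketch that splits an oversized image into tiles that each fit within the limit. Tile and SplitIntoTiles are hypothetical names, and tiling is my assumption about a reasonable application-side strategy, not something D2D does for you. The limit is passed in as a plain number; in a real application it could come from ID2D1RenderTarget::GetMaximumBitmapSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One rectangular piece of the source image, in pixels.
struct Tile {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t width;
    std::uint32_t height;
};

// Splits an imageWidth x imageHeight image into row-major tiles whose
// sides never exceed maxBitmapSize. Dimensions are assumed to be far
// below the 32-bit limit, so the loop counters cannot overflow.
std::vector<Tile> SplitIntoTiles(std::uint32_t imageWidth,
                                 std::uint32_t imageHeight,
                                 std::uint32_t maxBitmapSize) {
    assert(maxBitmapSize > 0);
    std::vector<Tile> tiles;
    for (std::uint32_t y = 0; y < imageHeight; y += maxBitmapSize) {
        for (std::uint32_t x = 0; x < imageWidth; x += maxBitmapSize) {
            // Edge tiles are clipped to whatever remains of the image.
            tiles.push_back({x, y,
                             std::min(maxBitmapSize, imageWidth - x),
                             std::min(maxBitmapSize, imageHeight - y)});
        }
    }
    return tiles;
}
```

    Each tile would then become its own bitmap, so the extra memory and draw calls are visible in the application's code rather than hidden behind the API.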

     

    Zemblanity

     

    In a somewhat typical example of this natural opposite of Serendipity, I will be going on vacation shortly. It might be presumptuous :), but some of my responses to you might be delayed as a result.

     

    For more on Zemblanity see: http://blogs.msdn.com/Tmulcahy/

     

    Other resources

     

    See the PDC session: http://channel9.msdn.com/pdc2008/PC18/

     

    Also, see Thomas Olsen's blog: http://blogs.technet.com/thomasolsen/

     

     


© 2008 Microsoft Corporation. All rights reserved.