1. Mission statement
To start with a precise description: what I'm actually working on is primarily point #2 of the rant thread, i.e. the rendering pipeline rewrite. To explain what the results of that are going to be, however, some context is in order. Performance in the client is -- both in the current client and in the new system (at least for OpenGL) -- determined by three fairly independent aspects:
- Actual client rendering: This is the vast bulk of the client rendering code, the basic task of which is to take the information that the client has from the server (about the map, objects on it, and such things) and turn it into OpenGL commands.
- OpenGL dispatch: Having produced OpenGL commands, said commands need to be dispatched to the video card driver (the OpenGL implementation), which in turn converts them to hardware commands in a format that the specific GPU in the system understands.
- GPU work: Finally, the commands produced above are consumed by the GPU, which does the work of actually producing visible output.
To put the rendering pipeline rewrite into context, then, it has three main focuses:
- Optimize the actual client rendering: It is my intention, hope and expectation that the rewrite will effectively remove the client rendering code itself as a bottleneck, moving the bottleneck to either of the other two aspects. As mentioned in the rant thread, the method for achieving this is to make rendering information more stateful and persistent, so that a lot of calculations can be reused from one frame to the next.
- Make the system more asynchronous and non-blocking: As mentioned in the rant thread, I find that a common and rather serious performance problem is that moving around in complex scenes causes a lot of stutter. The stutter comes from having to do a lot of fairly complex recalculations whenever the things being rendered change substantially (or sometimes even at all), and, perhaps more importantly, from needing to do so in a way that blocks rendering. The intention is to be able to effect changes to the rendering state asynchronously, so that rendering of one frame can proceed even while things are being added, removed or otherwise changed.
- Enable porting to different graphics interfaces: As a (rather big and important) part of the rewrite, I'm introducing an indirection layer in between the client rendering code and the graphics interface in order to isolate the client code from OpenGL, and enable porting it to other graphics interfaces, the obvious candidate being Vulkan. In theory, I'm fairly confident that the system I now have is general enough that I could also write backends for stuff like DirectX or Metal if I wanted to, but I don't expect to want to. More on this later.
It is also worth being clear about what the rewrite does not do:
- In and of itself, the rewrite only affects the actual client rendering, and not the OpenGL dispatch work, or the work done on the GPU, so if you are bottlenecked on either of those, the rewrite won't do anything to help. That being said, it does enable a few interesting things particularly in the area of OpenGL dispatch. There's generally more information available to do various interesting optimizations. More on this later. As for GPU optimizations, that is still pretty much as stated in the rant thread.
- It does not bring, really, any visible changes at all. If you're hoping for it to include stuff like scalable UIs or higher graphical fidelity or anything like that, then I will have to disappoint you. It's for performance only, with probably not a shadow of a pixel in visual difference. That's not to say that those things aren't worthwhile; they're just completely orthogonal, and there's no benefit to doing them as part of the rewrite.
- It is not the end of all optimizations. Most particularly, the rewrite itself does not address point #1 in the rant thread (instancing of animated meshes). Such things come afterward; more on this later.
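The ideas of persistent rendering state and non-blocking updates described above can be sketched roughly as follows. This is only a minimal illustration under my own assumptions, not the client's actual code: `RetainedDrawList`, `Slot`, and `prepare` are hypothetical names, and the "expensive preparation" is simulated with a hash code. The point is that preparation happens once per change, on whichever thread makes the change, while each frame just reads an immutable snapshot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical retained draw list: entries are prepared once and reused every frame. */
final class RetainedDrawList {
    /** A prepared entry; 'prepared' stands in for expensive precomputed render state. */
    static final class Slot {
        final String name;
        final int prepared;
        Slot(String name, int prepared) { this.name = name; this.prepared = prepared; }
    }

    // The renderer only ever sees complete, immutable snapshots of the list.
    private final AtomicReference<List<Slot>> current =
        new AtomicReference<>(Collections.<Slot>emptyList());
    private final AtomicInteger prepareCount = new AtomicInteger();

    /** Simulated expensive work, done once per added object (assumption for illustration). */
    private Slot prepare(String name) {
        prepareCount.incrementAndGet();
        return new Slot(name, name.hashCode());
    }

    /** May be called from any thread; never blocks a frame that is being rendered. */
    void add(String name) {
        Slot slot = prepare(name); // expensive part runs before the swap
        current.updateAndGet(old -> {
            List<Slot> next = new ArrayList<>(old);
            next.add(slot);
            return Collections.unmodifiableList(next);
        });
    }

    /** Removes all entries with the given name by publishing a new snapshot. */
    void remove(String name) {
        current.updateAndGet(old -> {
            List<Slot> next = new ArrayList<>(old);
            next.removeIf(s -> s.name.equals(name));
            return Collections.unmodifiableList(next);
        });
    }

    /** Returns the snapshot a frame would render from. */
    List<Slot> snapshot() { return current.get(); }

    /** "Renders" one frame: walks the current snapshot without preparing anything. */
    int renderFrame() {
        int drawn = 0;
        for (Slot s : current.get()) drawn++;
        return drawn;
    }

    int prepareCount() { return prepareCount.get(); }
}
```

A frame that grabbed its snapshot before a concurrent `add` or `remove` simply keeps drawing the old list; the change becomes visible on the next frame.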
2. Current progress
It can probably be said that the rewrite involves three major steps:
- 1. Formulating an abstract rendering architecture that the client can use for rendering, and for which backends such as OpenGL or Vulkan can be written;
- 2. Writing an implementation of said rendering system; and
- 3. Converting the actual client to use said rendering system.
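To give a feel for what the three steps above amount to, here is a minimal sketch of an indirection layer between client code and a graphics interface. All names (`RenderBackend`, `RecordingBackend`, `SceneClient`) are hypothetical and the interface is far smaller than a real one would be; the recording backend merely logs calls and stands in for an actual OpenGL or Vulkan implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Step 1: a hypothetical abstract interface the client renders against. */
interface RenderBackend {
    int createBuffer(float[] vertices);
    void draw(int buffer, String material);
    void dispose(int buffer);
}

/** Step 2: one implementation; this one only records calls instead of talking to a GPU. */
final class RecordingBackend implements RenderBackend {
    final List<String> log = new ArrayList<>();
    private int nextId = 1;

    public int createBuffer(float[] vertices) {
        int id = nextId++;
        log.add("create " + id + " (" + vertices.length + " floats)");
        return id;
    }

    public void draw(int buffer, String material) {
        log.add("draw " + buffer + " with " + material);
    }

    public void dispose(int buffer) {
        log.add("dispose " + buffer);
    }
}

/** Step 3: client code that knows only the interface, not the graphics API behind it. */
final class SceneClient {
    static void drawTriangle(RenderBackend backend) {
        int buf = backend.createBuffer(new float[] {0f, 0f, 1f, 0f, 0f, 1f});
        backend.draw(buf, "grass");
        backend.dispose(buf);
    }
}
```

Since `SceneClient` never names a concrete graphics API, swapping in a different backend would not require touching it.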
Again, most of the work is converting every little detail of rendering to the new system, so it's a bit hard to break the remaining work into perfectly well-defined chunks, but among the more well-defined blocks that I can make out that are yet to be converted are such things as: Converting sprite creation to be asynchronous and non-blocking, click-testing, shadows, MSAA resolve filters, and automatic instancing.
3. Future work
After the above is done, it is my intention to push the rewritten client to be used henceforth, but as mentioned above, that doesn't mean that there are no more optimizations to be done. The main things I see once the new client is in use are the following:
- Vertex-data atlasing: While looking at the OpenGL dispatch work, I have found that one of the more expensive things the client appears to do on OpenGL is switching VAOs. Much of that switching may be unnecessary, but it is hard to remove in the current rendering system. With the new system in place, I intend to group models that share the same vertex "format" (number and nature of bound vertex attributes, attribute stream specifications, &c., as seen by the GPU) into far fewer VAOs, which should decrease VAO switching by some orders of magnitude, which I suspect will help quite a bit on the OpenGL driver side of things.
- Instancing animated meshes: As stated in the rant thread, pretty much.
- Instancing of variable materials: Most material variations are in textures only, and in theory such textures can be stuffed into array textures and then instanced, reducing draw call overhead for them by a lot. Probably, this will allow a much greater range of things to have variable materials, like wooden crates, barrels, herbalist tables, individual wall segments, &c&c.
- Graphical detail slider: As mentioned above, I'm still not entirely sure exactly what to do about optimizing for the GPU case. While I see a few possible things to do, all of them are fairly theoretical, and I'm not sure how much they'd actually give. As such, it seems the main thing to do for GPU optimization would be to just reduce the graphical fidelity for various things. Technically, this is something I could have done for a long time, but I haven't really wanted to since I've wanted to do non-degrading optimizations first. The rendering system rewrite has been the main non-degrading optimization on my agenda, however, so with it complete, it's probably time to start looking at this.
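The grouping idea behind the vertex-data atlasing item above can be illustrated with a small sketch. It is only a model of the bookkeeping, under the assumption that each distinct vertex format maps to one VAO; the format is reduced to a plain string key, and `Model`, `countSwitches`, and `groupByFormat` are hypothetical names. No actual OpenGL calls are made.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VertexFormatGrouping {
    /** A model tagged with a key describing its vertex format (assumed representation). */
    static final class Model {
        final String name;
        final String format;
        Model(String name, String format) { this.name = name; this.format = format; }
    }

    /** Counts how often the bound format changes when drawing in the given order. */
    static int countSwitches(List<Model> order) {
        int switches = 0;
        String bound = null;
        for (Model m : order) {
            if (!m.format.equals(bound)) {
                switches++;
                bound = m.format;
            }
        }
        return switches;
    }

    /** Reorders models so that all models sharing a format are drawn back to back. */
    static List<Model> groupByFormat(List<Model> models) {
        Map<String, List<Model>> groups = new LinkedHashMap<>();
        for (Model m : models) {
            groups.computeIfAbsent(m.format, k -> new ArrayList<>()).add(m);
        }
        List<Model> out = new ArrayList<>();
        for (List<Model> group : groups.values()) {
            out.addAll(group);
        }
        return out;
    }
}
```

With models alternating between two formats, drawing in the original order switches on every model, while the grouped order switches once per format.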
4. So what about Vulkan?
For compatibility reasons, I'm still primarily targeting OpenGL 2.0, which is less than ideal for extracting optimal performance. Most of the overhead of OpenGL 2.0 lies in the area of OpenGL dispatch as defined above, and with the actual client rendering being removed as a bottleneck (hopefully), OpenGL dispatch may or may not take over that dishonorable spot, depending on the GPU used. As mentioned in the rant thread, one of the main allures of Vulkan is the ability to reuse constructed command-lists, which holds the possibility to pretty much remove dispatch overhead altogether, so that's clearly the way to go for the future.
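The record-once, replay-many idea behind reusable command lists can be sketched in plain Java. This is a conceptual toy only: it does not use the Vulkan API, `CommandList` is a hypothetical class, and commands are just `Runnable`s. The cost of building the list is paid once, and every later frame only resubmits it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reusable command list: built once, then submitted any number of times. */
final class CommandList {
    private final List<Runnable> commands = new ArrayList<>();
    private boolean sealed = false;

    /** Appends a command; only allowed while the list is still being recorded. */
    void record(Runnable command) {
        if (sealed) throw new IllegalStateException("command list is already sealed");
        commands.add(command);
    }

    /** Ends recording; after this the list is read-only and safe to reuse. */
    void seal() {
        sealed = true;
    }

    /** Replays the recorded commands without rebuilding anything. */
    void submit() {
        if (!sealed) throw new IllegalStateException("command list is still recording");
        for (Runnable command : commands) {
            command.run();
        }
    }
}
```

In this picture, a scene that does not change between frames calls `submit()` each frame and never touches `record` again.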
In the immediate term, however, JogAmp (the parent project of JOGL) does not yet support Vulkan, so that's a bit of a roadblock. It seems that LWJGL has some sort of Vulkan support, but last I looked at it, it seemed fairly preliminary and makeshift. Even if that changes and becomes more robust, I'm not too fond of LWJGL, particularly because it seems to require using its own windowing toolkits and stuff, whereas I want to go on using AWT for the windowing system abstraction. There may be things that can be done about that (primarily, I've considered using LWJGL to render the final image off-screen, and then just use Java 2D to draw that into an AWT window), but either way, all that seems to be at least slightly in the future still.
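The off-screen workaround mentioned in parentheses above could look something like the following on the Java 2D side. This sketch covers only the final step and makes several assumptions: the rendered frame is already available as packed ARGB `int` pixels, the rows arrive bottom-up (as they do when reading back from OpenGL), and `OffscreenBlit` with its methods is a hypothetical helper. How the pixels are actually obtained from LWJGL is left out entirely.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OffscreenBlit {
    /** Wraps bottom-up ARGB pixels in a top-down BufferedImage that Java 2D can draw. */
    static BufferedImage toImage(int[] argbBottomUp, int width, int height) {
        if (argbBottomUp.length != width * height) {
            throw new IllegalArgumentException("pixel count does not match dimensions");
        }
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < height; y++) {
            int sourceRow = height - 1 - y; // flip vertically while copying
            image.setRGB(0, y, width, 1, argbBottomUp, sourceRow * width, width);
        }
        return image;
    }

    /** Draws the frame at the origin of the given graphics context, e.g. an AWT component's. */
    static void present(Graphics2D g, BufferedImage frame) {
        g.drawImage(frame, 0, 0, null);
    }
}
```

In an AWT component, `present` would be called from the paint path with that component's `Graphics2D`. The extra copy per frame is the obvious cost of this approach.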
Since I'm not sure just how far in the future it is to get Vulkan support, I have been considering alternatives. OpenGL 4 does alleviate quite a few of the issues plaguing OpenGL 2, especially if one uses various more-or-less-common extensions to it. However, it's still not as good as Vulkan, and considerably more complex to implement, so unless Vulkan support really takes a long time to reach Java, I'm not too sure I want to waste time on that.
--
I know I did previously promise an essay, and if nothing else, I seem to have delivered on that particular promise. I hope it's interesting in some way.