Client performance update thread

by **loftar** » Thu Aug 02, 2018 12:58 am

I know I've kind of promised a status update for quite a while now, I just haven't really known what to write. Let's see if I can come up with something.

1. Mission statement
To start out with a precise description, what I'm actually working on is primarily point #2 of the rant thread, i.e. the rendering pipeline rewrite. In order to explain what the results of that are going to be, however, some context is in order: Performance in the client is -- both in the current client and in the new system (at least for OpenGL) -- determined by three fairly independent aspects of the client:

Actual client rendering: This is the vast bulk of the client rendering code, the basic task of which is to take the information that the client has from the server (about the map, objects on it, and such things) and turn it into OpenGL commands.
OpenGL dispatch: Having produced OpenGL commands, said commands need to be dispatched to the video card driver (the OpenGL implementation), which in turn converts them to hardware commands in a format that the specific GPU in the system understands.
GPU work: Finally, the commands produced above are consumed by the GPU, which does the work of actually producing visible output.

The actual client rendering and the OpenGL dispatch run in separate threads, and therefore can work in parallel on different CPUs, whereas the GPU work runs solely on the GPU itself. Therefore, whichever of these is the slowest is what determines the overall client performance. In practice, I find that on hardware that is actually reasonably good, the actual client rendering is by far most often the bottleneck. On hardware that is less good, the work done on the GPU tends to be the bottleneck.
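The "slowest stage wins" reasoning above can be put into a few lines of Java. This is only an illustrative sketch: the class name `PipelineBottleneck` and the per-stage timings are made up, and it assumes the three stages overlap perfectly, which a real client only approximates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PipelineBottleneck {
    /** With fully overlapped stages, the steady-state frame time is that of the slowest stage. */
    static double frameMillis(Map<String, Double> stageMillis) {
        double worst = 0.0;
        for (double ms : stageMillis.values()) {
            worst = Math.max(worst, ms);
        }
        return worst;
    }

    /** Name of the stage that currently limits the frame rate. */
    static String bottleneck(Map<String, Double> stageMillis) {
        String name = null;
        double worst = -1.0;
        for (Map.Entry<String, Double> e : stageMillis.entrySet()) {
            if (e.getValue() > worst) {
                worst = e.getValue();
                name = e.getKey();
            }
        }
        return name;
    }

    public static void main(String[] args) {
        // Hypothetical timings in milliseconds per frame, not measurements.
        Map<String, Double> stages = new LinkedHashMap<>();
        stages.put("client rendering", 12.0);
        stages.put("OpenGL dispatch", 5.0);
        stages.put("GPU work", 8.0);
        System.out.println(bottleneck(stages) + " is the bottleneck at "
                + frameMillis(stages) + " ms per frame");
    }
}
```

With numbers like these, speeding up the dispatch or the GPU work changes nothing; only shortening the client rendering stage moves the bottleneck to one of the other two.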

To put the rendering pipeline rewrite into context, then, it has three main focuses:

Optimize the actual client rendering: It is my intention, hope and expectation that the rewrite will effectively remove the client rendering code itself from being a bottleneck, transferring it to either of the other two. As mentioned in the rant thread, the method for achieving this is to make rendering information more stateful and persistent, being able to reuse a lot of calculations from one frame to the next.
Make the system more asynchronous and non-blocking: As mentioned in the rant thread, I find a common and rather serious performance problem is that moving around in complex scenes causes a lot of stutter, caused by having to do a lot of fairly complex recalculations whenever the things being rendered change substantially (or sometimes even at all), and perhaps more importantly needing to do so in a way that blocks rendering. The intention is to be able to effect changes to the rendering state asynchronously, so that rendering of one frame can proceed even while things are being added, removed or otherwise changed.
Enable porting to different graphics interfaces: As a (rather big and important) part of the rewrite, I'm introducing an indirection layer in between the client rendering code and the graphics interface in order to isolate the client code from OpenGL, and enable porting it to other graphics interfaces, the obvious candidate being Vulkan. In theory, I'm fairly confident that the system I now have is general enough that I could also write backends for stuff like DirectX or Metal if I wanted to, but I don't expect to want to. More on this later.
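One common way to get the non-blocking behaviour described in the second focus is to let other threads queue up scene changes and have the render thread apply them between frames. The sketch below is an assumption about how such a scheme could look, not the client's actual design; `AsyncScene`, `addLater` and `removeLater` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

final class AsyncScene {
    // Changes queued by loader threads; the queue is lock-free, so producers never block rendering.
    private final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
    // The draw list is only ever touched by the render thread.
    private final List<String> drawList = new ArrayList<>();

    /** Callable from any thread. Any expensive preparation should happen before this call. */
    void addLater(String obj) {
        pending.add(() -> drawList.add(obj));
    }

    /** Callable from any thread. */
    void removeLater(String obj) {
        pending.add(() -> drawList.remove(obj));
    }

    /** Render thread only: apply whatever changes have arrived, then "draw" the current state. */
    List<String> renderFrame() {
        Runnable change;
        while ((change = pending.poll()) != null) {
            change.run();
        }
        return new ArrayList<>(drawList);
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncScene scene = new AsyncScene();
        Thread loader = new Thread(() -> {
            scene.addLater("tree");
            scene.addLater("house");
        });
        loader.start();
        loader.join();
        System.out.println(scene.renderFrame());
    }
}
```

The point of the design is that the queued mutations are cheap, so a frame can always proceed with whatever state has been applied so far.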

However, a few things that it does not do include the following:

In and of itself, the rewrite only affects the actual client rendering, and not the OpenGL dispatch work, or the work done on the GPU, so if you are bottlenecked on either of those, the rewrite won't do anything to help. That being said, it does enable a few interesting things particularly in the area of OpenGL dispatch. There's generally more information available to do various interesting optimizations. More on this later. As for GPU optimizations, that is still pretty much as stated in the rant thread.
It does not bring, really, any visible changes at all. If you're hoping for it to include stuff like scalable UIs or higher graphical fidelity or anything like that, then I will have to disappoint you. It's for performance only, with probably not a shadow of a pixel in visual difference. That's not to say that either of those aren't worthwhile, it's just a completely orthogonal thing, and there's no benefit to doing it as part of the rewrite.
It is not the end of all optimizations. Most particularly, the rewrite itself does not address point #1 in the rant thread (instancing of animated meshes). Such things come afterward; more on this later.

2. Current progress
It can probably be said that the rewrite involves three major steps:

1. Formulating an abstract rendering architecture that the client can use for rendering, and for which backends such as OpenGL or Vulkan can be written;
2. Writing an implementation of said rendering system; and
3. Converting the actual client to use said rendering system.

While each step conceptually builds on the prior, I'm not doing them in strict order. However, the first part is mostly complete. There are some minor details left here and there that I'm filling out as I'm coming around to touching them for the first time, but the big picture is very complete. As for the second part, it actually consists of two sub-parts: On the one hand an "immediate-mode" renderer for things that are not performance-critical and don't reuse data from frame to frame (mainly the 2D UI), and on the other hand a "persistent-mode" renderer for things that do reuse persistent data from frame to frame, mainly the actual 3D rendering. I have had the immediate-mode renderer pretty much completed for quite a while now, and I have converted the 2D parts of the client to use it, which, as far as I can tell, works pretty much completely. What has taken time since then has mostly been the implementation of the persistent-mode renderer for OpenGL, but I have quite recently (within the last week or two) reached a point where that seems to be mostly complete and working for what it is, so I have now turned my attention to actually using this rendering system in the client. This involves a lot of little details, and thus far I've been working on peripheral stuff like converting material specifications, the preexisting system for asynchronous texture loading, and such things to the new rendering system.
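To make the difference between the two renderer kinds concrete, here is a toy contrast in Java. It is a sketch under stated assumptions: the class names, the string "commands" and the `builds` counters are invented for illustration and say nothing about the real interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RendererModes {
    /** Immediate mode: the caller resubmits everything each frame, and nothing is kept. */
    static final class Immediate {
        int builds = 0; // how many times a command had to be built

        String frame(String... things) {
            StringBuilder out = new StringBuilder();
            for (String t : things) {
                builds++;
                out.append("draw(").append(t).append(");");
            }
            return out.toString();
        }
    }

    /** Persistent mode: commands are built once when a thing is added, then reused every frame. */
    static final class Persistent {
        int builds = 0;
        private final Map<String, String> slots = new LinkedHashMap<>();

        void add(String thing) {
            builds++;
            slots.put(thing, "draw(" + thing + ");");
        }

        void remove(String thing) {
            slots.remove(thing);
        }

        String frame() {
            return String.join("", slots.values());
        }
    }

    public static void main(String[] args) {
        Immediate ui = new Immediate();
        Persistent world = new Persistent();
        world.add("tree");
        world.add("house");
        for (int i = 0; i < 60; i++) {
            ui.frame("button", "label");
            world.frame();
        }
        System.out.println("immediate builds: " + ui.builds
                + ", persistent builds: " + world.builds);
    }
}
```

The immediate path does work proportional to frames times things, which is fine for a small 2D UI; the persistent path does work only when the scene changes, which is what matters for the 3D world.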

Again, most of the work is converting every little detail of rendering to the new system, so it's a bit hard to break the remaining work into perfectly well-defined chunks, but among the more well-defined blocks that I can make out that are yet to be converted are such things as: Converting sprite creation to be asynchronous and non-blocking, click-testing, shadows, MSAA resolve filters, and automatic instancing.

3. Future work
After the above is done, it is my intention to push the rewritten client to be used henceforth, but as mentioned above, that doesn't mean that there are no more optimizations to be done. The main things I see once the new client is in use are the following:

Vertex-data atlasing: While looking at the OpenGL dispatch work, I have found that one of the more expensive things the client does on OpenGL appears to be switching VAOs. Much of that switching may be unnecessary, but it is hard to remove in the current rendering system. With the new system in place, I intend to group models that share the same vertex "format" (number and nature of bound vertex attributes, attribute stream specifications, &c., as seen by the GPU) into far fewer VAOs, which should decrease VAO switching by some orders of magnitude, and which I suspect will help quite a bit on the OpenGL driver side of things.
Instancing animated meshes: As stated in the rant thread, pretty much.
Instancing of variable materials: Most material variations are in textures only, and in theory such textures can be stuffed into array textures and then instanced, reducing draw call overhead for them by a lot. Probably, this will allow a much greater range of things to have variable materials, like wooden crates, barrels, herbalist tables, individual wall segments, &c&c.
Graphical detail slider: As mentioned above, I'm still not entirely sure exactly what to do about optimizing for the GPU case. While I see a few possible things to do, all of them are fairly theoretical, and I'm not sure how much they'd actually give. As such, it seems the main thing to do for GPU optimization would be to just reduce the graphical fidelity for various things. Technically, this is something I could have done for a long time, but I haven't really wanted to since I've wanted to do non-degrading optimizations first. The rendering system rewrite has been the main non-degrading optimization on my agenda, however, so with it complete, it's probably time to start looking at this.
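The vertex-data atlasing item in the list above boils down to grouping models by vertex format and giving each model an offset inside a buffer shared by its group. A minimal Java sketch of that bookkeeping follows; the format strings, the `Model` class and the `firstVertex` field are assumptions for illustration, and no actual OpenGL calls are made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VertexAtlas {
    static final class Model {
        final String name;
        final String format;   // hypothetical format key, e.g. "pos3f,nrm3f,uv2f"
        final int vertexCount;
        int firstVertex = -1;  // offset into the shared buffer of this model's format group

        Model(String name, String format, int vertexCount) {
            this.name = name;
            this.format = format;
            this.vertexCount = vertexCount;
        }
    }

    /** Groups models by format (one shared VAO per group) and assigns each a buffer offset. */
    static Map<String, List<Model>> pack(List<Model> models) {
        Map<String, List<Model>> groups = new LinkedHashMap<>();
        Map<String, Integer> used = new HashMap<>();
        for (Model m : models) {
            groups.computeIfAbsent(m.format, k -> new ArrayList<>()).add(m);
            int offset = used.getOrDefault(m.format, 0);
            m.firstVertex = offset;
            used.put(m.format, offset + m.vertexCount);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Model> models = new ArrayList<>();
        models.add(new Model("crate", "pos3f,nrm3f,uv2f", 24));
        models.add(new Model("tree", "pos3f,nrm3f,uv2f,bone4f", 900));
        models.add(new Model("barrel", "pos3f,nrm3f,uv2f", 96));
        Map<String, List<Model>> groups = pack(models);
        System.out.println(models.size() + " models share " + groups.size() + " VAO(s)");
    }
}
```

Drawing the models group by group then needs one VAO bind per format rather than one per model, with each draw call using its model's `firstVertex` as the starting offset.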

4. So what about Vulkan?
For compatibility reasons, I'm still primarily targeting OpenGL 2.0, which is less than ideal for extracting optimal performance. Most of the overhead of OpenGL 2.0 lies in the area of OpenGL dispatch as defined above, and with the actual client rendering being removed as a bottleneck (hopefully), OpenGL dispatch may or may not take over that dishonorable spot, depending on the GPU used. As mentioned in the rant thread, one of the main allures of Vulkan is the ability to reuse constructed command lists, which holds out the possibility of pretty much removing dispatch overhead altogether, so that's clearly the way to go for the future.

In the immediate term, however, JogAmp (the parent project of JOGL) does not yet support Vulkan, so that's a bit of a roadblock. It seems that LWJGL has some sort of Vulkan support, but last I looked at it, it seemed fairly preliminary and makeshift. Even if that changes and becomes more robust, I'm not too fond of LWJGL, particularly because it seems to require using its own windowing toolkits and stuff, whereas I want to go on using AWT for the windowing system abstraction. There may be things that can be done about that (primarily, I've considered using LWJGL to render the final image off-screen, and then just using Java 2D to draw that into an AWT window), but either way, all that seems to be at least slightly in the future still.
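The Java 2D half of that off-screen idea is small. The sketch below assumes the rendered frame has already been read back into an `int[]` of ARGB pixels (how that readback would be done with LWJGL is not shown and is left open here); `OffscreenBlit`, `toImage` and `blit` are hypothetical names, while the `BufferedImage` and `Graphics2D` calls are standard AWT.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OffscreenBlit {
    /** Wraps one frame of ARGB pixels (row-major, width * height entries) in a BufferedImage. */
    static BufferedImage toImage(int[] argb, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, width, height, argb, 0, width);
        return img;
    }

    /** Draws the frame at the origin; in a real client, g would come from an AWT component's paint(). */
    static void blit(BufferedImage frame, Graphics2D g) {
        g.drawImage(frame, 0, 0, null);
    }

    public static void main(String[] args) {
        // A 2x1 stand-in "frame": one opaque red pixel, one opaque blue pixel.
        int[] pixels = {0xFFFF0000, 0xFF0000FF};
        BufferedImage frame = toImage(pixels, 2, 1);

        // Stand-in for the window surface, so the example also runs headless.
        BufferedImage window = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = window.createGraphics();
        blit(frame, g);
        g.dispose();
        System.out.println(Integer.toHexString(window.getRGB(0, 0)));
    }
}
```

The obvious cost of this route is an extra copy of every frame from the GPU back to the CPU, which is presumably part of why it is described above as something for the future.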

Since I'm not sure just how far in the future it is to get Vulkan support, I have been considering alternatives. OpenGL 4 does alleviate quite a few of the issues plaguing OpenGL 2, especially if one uses various more-or-less-common extensions to it. However, it's still not as good as Vulkan, and considerably more complex to implement, so unless Vulkan support really takes a long time to reach Java, I'm not too sure I want to waste time on that.

--

I know I did previously promise an essay, and if nothing else, I seem to have delivered on that particular promise. I hope it's interesting in some way.

by **jordancoles** » Thu Aug 02, 2018 1:36 am

Thanks for writing this Loftar

Keep up the good work

by **Finigini** » Thu Aug 02, 2018 2:02 am

As someone who has an interest in programming and game development I'm glad you are sharing your insight. I'm also super excited for where this rewrite will take the game to in the future. Thanks for the hard work, I think I'll pick up a sub.

by **sMartins** » Thu Aug 02, 2018 2:41 am

What about the rendering distance? the same?

by **loftar** » Thu Aug 02, 2018 3:13 am

sMartins wrote:What about the rendering distance? the same?

loftar wrote:It does not bring, really, any visible changes at all. If you're hoping for it to include stuff like scalable UIs or higher graphical fidelity or anything like that, then I will have to disappoint you. It's for performance only, with probably not a shadow of a pixel in visual difference. That's not to say that either of those aren't worthwhile, it's just a completely orthogonal thing, and there's no benefit to doing it as part of the rewrite.

On that note, though, increasing the rendering distance is much more than just a client-local thing. Doing that in a meaningful way would require the server to be able to provide data to the client for things at greater distances, which in itself would require being able to provide data at different levels of detail. The latter is needed both for performance reasons (server-, client- and network-wise) and for gameplay reasons. No one wants everyone else to be able to see what they're building half a world away.

IIRC, Jorb even talked about this on the last stream, actually.

by **DDDsDD999** » Thu Aug 02, 2018 4:29 am

So, OpenGL dispatch rewrite and GPU optimization before we get noticeable benefits?

by **loftar** » Thu Aug 02, 2018 5:03 am

DDDsDD999 wrote:So, OpenGL dispatch rewrite and GPU optimization before we get noticeable benefits?

No idea how you managed to draw that conclusion tbqh fampai.

by **Onep** » Thu Aug 02, 2018 8:12 am

loftar wrote:No idea how you managed to draw that conclusion tbqh fampai.

Loftar has spent too much time researching dank me mes

by **Kuddie** » Thu Aug 02, 2018 10:51 am

Thank you for this status update

by **sMartins** » Thu Aug 02, 2018 12:50 pm

loftar wrote:
sMartins wrote:What about the rendering distance? the same?

loftar wrote:It does not bring, really, any visible changes at all. If you're hoping for it to include stuff like scalable UIs or higher graphical fidelity or anything like that, then I will have to disappoint you. It's for performance only, with probably not a shadow of a pixel in visual difference. That's not to say that either of those aren't worthwhile, it's just a completely orthogonal thing, and there's no benefit to doing it as part of the rewrite.

On that note, though, increasing the rendering distance is much more than just a client-local thing. Doing that in a meaningful way would require the server to be able to provide data to the client for things at greater distances, which in itself would require being able to provide data at different levels of detail. The latter is needed both for performance reasons (server-, client- and network-wise) and for gameplay reasons. No one wants everyone else to be able to see what they're building half a world away.

IIRC, Jorb even talked about this on the last stream, actually.

Yeah, yeah .... I was just wondering if the boost in performance would allow us to have full rendering at the actual max distance in the vanilla client.
Cause, you know, right now stuff pops out from the void at max distance, just saying. But maybe it's just a matter of setting the zoom-out limit to match the rendering distance exactly.

Btw, thx for sharing your work, it's interesting to know how much stuff is actually involved .... also, if I may ask, are you already familiar with Vulkan? I know some emulators (such as Cemu) that are switching to Vulkan right now .... and for them it's a completely new thing, so much so that they are keeping two versions, the old one on OpenGL and the new one on Vulkan, full of bugs, and they will keep both versions until the new one becomes as stable as the old one.
Do you think you'll do the same in the future? An old version on OpenGL and a new version on Vulkan, which will require refinements, etc., with time?
