In my mind, there are three main, largely independent areas where client performance should be improved, so I'll write about each in turn.
1. Animated meshes
There's been a lot of talk lately about extremely poor performance in larger fights. While I still haven't been able to reproduce anything close to the extreme levels that people have been describing (1-2 FPS in a fight with a few tens of people), it seems safe to say that the problem is primarily a matter of animating the player characters involved. Even though I can't reproduce those particular situations, I have noted myself that large numbers of livestock are a big part of slowdowns in complex village scenes, so it seems like an area of particular urgency anyway, and I figure that fixing one has a good chance of fixing the other as well.
There have been ideas floating around that animated meshes could be sped up by combining several of them into one big, shared vertex buffer, so that they can be submitted with fewer draw calls; but in my profiling, the draw calls seem to be a fairly small part of the problem with animated meshes anyway. The vast majority of CPU time is instead spent updating the actual vertices, which indicates to me that the vertex processing should be moved to the GPU instead, along with instanced rendering to reduce the draw-call overhead as well. I've long been confounded about how to instance animated meshes, however, since the animation data that needs to be passed for each instance is far too big to fit into the vertex data.
However, I recently found an article on the subject by nVidia, from quite a few years ago. The model used in the article is far too simplistic to be applied to Haven (no multiple or blended animations, for instance), but the technique of passing data in textures inspired me, and I believe I now have a fairly well thought-out idea of how animated meshes could be instanced by the Haven client: the client still combines the animations and calculates the bone matrices on the CPU, but passes that data in a texture to the vertex shader.
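To make the idea concrete, here is a minimal sketch of the CPU side of that scheme: the blended bone matrices for all instances are packed into one flat array, laid out so that it can be uploaded as an RGBA32F texture and fetched with texelFetch() in the vertex shader, indexed by instance and bone. All names here (BoneTexturePacker and so on) are illustrative, not actual Haven client code.

```java
/* Sketch: pack per-instance bone matrices into a buffer suitable for
 * uploading as an RGBA32F texture. Each 4x4 matrix occupies four
 * consecutive RGBA texels (one per matrix column), so a vertex shader
 * can reconstruct the matrix for (gl_InstanceID, boneId) with four
 * texelFetch() calls. Class and method names are hypothetical. */
public class BoneTexturePacker {
    public static final int TEXELS_PER_MATRIX = 4; // one RGBA texel per column

    /* matrices[instance][bone] is a column-major 4x4 matrix (16 floats),
     * as produced by the CPU-side animation blending. */
    public static float[] pack(float[][][] matrices) {
        int ninst = matrices.length;
        int nbones = matrices[0].length;
        float[] buf = new float[ninst * nbones * 16];
        int off = 0;
        for (float[][] inst : matrices) {
            for (float[] bone : inst) {
                System.arraycopy(bone, 0, buf, off, 16);
                off += 16;
            }
        }
        return buf;
    }

    /* First texel at which the shader should fetch the matrix for a
     * given (instance, bone) pair; it then reads four consecutive
     * texels, one per matrix column. */
    public static int texelOffset(int instance, int bone, int bonesPerInstance) {
        return (instance * bonesPerInstance + bone) * TEXELS_PER_MATRIX;
    }
}
```

The buffer would be re-uploaded once per frame for the whole batch, while the vertex data itself stays static on the GPU, which is where the savings come from.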
It should be said that this technique is only applicable to skeletal animations (like players or livestock), not to mesh animations (like beehives or hearth fires). I do have some embryo of an idea of how to instance the latter as well, but it is not nearly as well thought out yet. It should also be said that the technique requires the GPU to be able to do texture fetches in the vertex shader, a capability available in all semi-modern GPUs; but I have noticed some people still playing Haven on pretty ancient hardware that probably lacks said ability, such as GeForce 6xxx- or 7xxx-series cards (as distinct from 6xx or 7xx, mind you), so they may be left out in the cold after such an update.
2. General rendering pipeline
The basis of the current rendering system is that every visible object is iterated over and allowed to draw itself. I have, since a fairly good while back, been aware of the alternative: to instead have objects register with a rendering system at creation time, maintaining a tighter coupling so that some information can be saved and reused from frame to frame. Originally, such a system would only have produced rather minor improvements, so it didn't seem worth the extra complexity, or the fairly large rewrite it would have entailed. Over time, however, and especially lately, reasons to reconsider have been cropping up, and I'm now fairly convinced that such a rewrite should be done in fairly short order. Among the reasons are these:
- Ever since instanced rendering was implemented, I've noticed an undesirable effect of having to recalculate the entire instancing data buffer every time an object in an instanced batch changes. While the current implementation of instancing has made it very CPU-efficient to draw, for instance, very large fields of crops, a large recalculation has to be made every time any single crop object is added to or removed from the batch, which leads to poor performance when moving around in villages. If more persistent information were kept, it would hopefully allow only the smallest possible deltas to be effected on the instancing buffers instead, which would change the rendering scaling factor from O(number of objects on screen) to O(number of instanced batches on screen), which starts to seem worth getting worked up about.
- As a corollary, it would be nice if such a system could be formulated so that changes can be effected in parallel with the rendering loop itself, so that rendering wouldn't have to be blocked by objects popping in and out, leading to less stuttering.
- I've also been reading up about Vulkan lately, which has been quite an interesting and enlightening experience, and it seems obvious to me that Vulkan is the future over OpenGL. Rendering in Vulkan would be greatly helped by being able to preserve data between frames, as that would allow reusing pipeline-state objects instead of having to recalculate them, which could be worth quite a lot.
- Also, and almost most important of all, I learned only quite recently that one is able to reuse command buffers in Vulkan from frame to frame, which is quite a fascinating proposal indeed, since it may allow improving the rendering scaling factor even further, from O(number of instanced batches) to O(number of objects changed in a particular frame). If such a system could be implemented efficiently, it almost holds the promise of virtually eliminating CPU overhead in rendering.
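The delta idea in the first point above can be sketched quite simply: keep a stable slot assignment for each object in a batch, and on removal swap the last slot into the freed one, so that adding or removing a single object only touches one or two slots (which would then be re-uploaded with something like glBufferSubData) instead of rebuilding the whole buffer. This is an illustrative sketch, not the actual client code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/* Sketch of an instanced batch with O(1) CPU work per added or removed
 * object. Removal uses the swap-with-last trick so the buffer stays
 * densely packed; only the moved slot would need re-uploading. */
public class InstanceBatch {
    private final List<float[]> slots = new ArrayList<>();  // per-instance data
    private final List<Object> owners = new ArrayList<>();  // object in each slot
    private final Map<Object, Integer> index = new HashMap<>();

    public void add(Object obj, float[] data) {
        index.put(obj, slots.size());
        slots.add(data);
        owners.add(obj);
        // real renderer: upload just this new slot, not the whole buffer
    }

    public void remove(Object obj) {
        int slot = index.remove(obj);
        int last = slots.size() - 1;
        if (slot != last) {
            // move the last instance into the freed slot
            slots.set(slot, slots.get(last));
            Object moved = owners.get(last);
            owners.set(slot, moved);
            index.put(moved, slot);
            // real renderer: re-upload only this one slot
        }
        slots.remove(last);
        owners.remove(last);
    }

    public int size() { return slots.size(); }
    public int slotOf(Object obj) { return index.get(obj); }
}
```

With a scheme like this, walking through a village would cost a couple of small buffer updates per crop object entering or leaving view, rather than a full rebuild of every batch touched.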
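The O(number of changed objects) scaling in the last point amounts to dirty-tracking: objects register once, mark themselves dirty when they change, and the per-frame pass only visits the dirty set (to patch instance data, or re-record a command buffer in the Vulkan case) while everything unchanged is reused as-is. A minimal sketch, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/* Sketch of a registry where per-frame CPU cost scales with the number
 * of changed objects, not the total number of objects. Names are
 * illustrative, not from the actual client. */
public class RenderRegistry {
    public interface Renderable { void update(); }

    private final List<Renderable> all = new ArrayList<>();
    private final Set<Renderable> dirty = new LinkedHashSet<>();

    public void register(Renderable r) { all.add(r); dirty.add(r); }
    public void markDirty(Renderable r) { dirty.add(r); }

    /* Process only what changed; returns how many objects needed work
     * this frame. Unchanged objects cost nothing here. */
    public int frame() {
        int n = dirty.size();
        for (Renderable r : dirty)
            r.update();
        dirty.clear();
        return n;
    }

    public int total() { return all.size(); }
}
```

In a frame where nothing moves, frame() does no per-object work at all, which is what makes the "virtually no CPU overhead" promise at least plausible.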
3. GPU overhead
All the aforementioned topics only touch on CPU overhead in rendering, but GPU overhead is also a very important factor, especially for users of integrated GPUs. Unfortunately, reasoning about GPU usage is much harder, especially since I have been utterly incapable of getting any GPU profiler working. I've tried a fairly large number of GPU profilers, even going so far as to install Windows on a separate system to see if nVidia's Visual Studio-based tools would work, but all of them keep failing for one reason or another: Intel's tools only work with Direct3D and not with OpenGL (sigh); nVidia's previous profiling tools are no longer supported, and their licensing server is down, preventing them from starting (double sigh); their new Linux-based tools crash because of Java; and I couldn't even try the Visual Studio-based tools, because they didn't support the GPU I had available (who knows whether they would even have worked with a newer GPU).
I have some guesses as to optimizations that could potentially be meaningful to improve GPU rendering time, but all those guesses are too uncertain to seem worth trying before profiling (especially as many of them are fairly large changes). Thus, so far, work in this area is still... ongoing. I do feel it is a slightly lesser priority anyway, because the CPU overhead is more commonly the limiting factor, but if CPU usage is improved, GPU overhead may very soon start becoming the limiting factor instead.
It's true that there are some things that could be done to improve the GPU situation regardless, such as introducing settings to reduce graphical complexity, but I feel that, for most scenes, that just shouldn't be necessary, so non-degrading optimizations should come first. I'm not sure how to proceed here yet.
So there's that. Just so you know, I guess.