I really don't want to discourage you guys from analyzing client performance and suggesting changes to improve it, but I really cannot help but think that y'all are looking in the wrong places. For instance:
romovs wrote:Yes too many unnecessary object instantiations in critical routines.
Object creation in HotSpot is generally extremely cheap. I have at times even tested and benchmarked converting some very common object creations into pooled or mutable objects instead, and it has made no difference whatsoever. Which is as expected, really, as HotSpot implements object allocation as little more than the increment of a register. Young-gen GCs (or GCs overall, for that matter) also represent almost no part of the total CPU time expended by the client.
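To illustrate the kind of comparison described, here is a rough sketch (the `Coord` class and loop counts are hypothetical, and a crude timing loop like this is no substitute for a proper harness like JMH):

```java
// Rough illustration: allocating short-lived objects vs. reusing one mutable
// instance. On HotSpot, the allocating version is typically just as fast,
// since a young-gen allocation is little more than a pointer bump.
public class AllocBench {
    static final class Coord {
        double x, y;
        Coord(double x, double y) { this.x = x; this.y = y; }
    }

    static double allocating(int n) {
        double acc = 0;
        for (int i = 0; i < n; i++) {
            Coord c = new Coord(i, i + 1);   // fresh short-lived object each iteration
            acc += c.x + c.y;
        }
        return acc;
    }

    static double pooled(int n) {
        Coord c = new Coord(0, 0);           // one reused ("pooled") instance
        double acc = 0;
        for (int i = 0; i < n; i++) {
            c.x = i; c.y = i + 1;
            acc += c.x + c.y;
        }
        return acc;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        double a = allocating(n);
        long t1 = System.nanoTime();
        double b = pooled(n);
        long t2 = System.nanoTime();
        System.out.printf("alloc: %d ms, pooled: %d ms, same result: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```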
aghmed wrote:coord.add(c2).mul(c3).mul(c4)
Even with object allocation being cheap, I'm pretty sure the JIT will inline those calls, convert the non-escaping instances into stack allocations, and in the end even move those into register-only values.
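A hypothetical immutable `Coord` in the style of the quoted chain makes the point concrete. None of the intermediate objects escape the method, which is exactly the situation HotSpot's escape analysis handles (the class and values here are made up for illustration):

```java
// Immutable coordinate type mirroring the quoted coord.add(c2).mul(c3).mul(c4)
// style. After inlining, the temporaries never escape chained(), so the JIT
// can scalar-replace them: their fields live in registers, with no heap
// allocation in the compiled code.
public final class Coord {
    public final double x, y;
    public Coord(double x, double y) { this.x = x; this.y = y; }
    public Coord add(Coord o) { return new Coord(x + o.x, y + o.y); }
    public Coord mul(Coord o) { return new Coord(x * o.x, y * o.y); }

    public static double chained(Coord c1, Coord c2, Coord c3, Coord c4) {
        // The two intermediate Coords and the final one are all local to
        // this method; only two doubles actually leave it.
        Coord r = c1.add(c2).mul(c3).mul(c4);
        return r.x + r.y;
    }

    public static void main(String[] args) {
        Coord c1 = new Coord(1, 2), c2 = new Coord(3, 4);
        Coord c3 = new Coord(5, 6), c4 = new Coord(7, 8);
        System.out.println(chained(c1, c2, c3, c4));
    }
}
```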
aghmed wrote:Every single openGL request is converted to a wannabe delegate "Command" that is added to a pool of commands and computed.
For those who don't know, the BGL and its Commands are used to offload the actual calling of OpenGL driver routines to a second thread, so it is a part of parallelization. However, when I implemented it, at the stage where I had the BGL but not yet the secondary thread to do the dispatch, I ran it all in one thread, and even then it was still faster (by about 10%) than doing the OpenGL calls directly in place instead of allocating Command instances. My hypothesis on why this is so is that bunching the preparation code and the OpenGL calls together, each for themselves, simply improves cache locality.
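The overall shape of that design can be sketched as a producer/consumer queue. To be clear, `GLCommand`, `runDemo`, and the queue sizes below are invented names for illustration, not the actual BGL API; a counter stands in for real driver calls:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the command-queue idea: the "UI" thread records commands instead
// of issuing GL calls directly, and a dedicated dispatch thread drains the
// queue and runs them. A poison-pill command shuts the dispatcher down.
public class CommandQueueDemo {
    interface GLCommand { void run(); }
    static final GLCommand POISON = () -> { };

    static int runDemo(int n) {
        BlockingQueue<GLCommand> queue = new ArrayBlockingQueue<>(1024);
        AtomicInteger executed = new AtomicInteger();

        // Dispatch thread: the only thread that would talk to the GL driver.
        Thread dispatcher = new Thread(() -> {
            try {
                GLCommand cmd;
                while ((cmd = queue.take()) != POISON)
                    cmd.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.start();

        try {
            // "UI" thread: generates commands; a counter stands in for GL calls.
            for (int i = 0; i < n; i++)
                queue.put(executed::incrementAndGet);
            queue.put(POISON);
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println("executed " + runDemo(100) + " commands");
    }
}
```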
It is also useful to observe that the thread that actually does the OpenGL command dispatch is only very rarely the bottleneck anyway. At least nine times out of ten, the bottleneck is rather the UI thread, which generates the commands. Therefore, even if the BGL dispatch loop may be said to be slow in some way, optimizing it would make no difference -- it would just increase the amount of time that thread spends waiting for the next frame to dispatch.
When I tried this benchmark, I found not only that it was mostly GC-limited, but also that it didn't follow reasonable practice for JIT warmup. Changing it to use only 1,000,000 commands, and instead repeating the test 1,000 times, showed that the average time taken to allocate and then dispatch a single command was only on the order of 25 ns.
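The warmup methodology can be sketched like so (the workload, repetition counts, and method names are all hypothetical stand-ins; the point is only to discard the early runs, during which the interpreter and JIT compilation dominate, before averaging):

```java
// Sketch of repeat-and-discard-warmup benchmarking: run the workload many
// times, ignore the first few runs while the JIT is still compiling, and
// report per-operation time from the remaining runs only.
public class WarmupBench {
    static long work(int n) {            // stand-in for allocating + dispatching n commands
        long acc = 0;
        for (int i = 0; i < n; i++)
            acc += i ^ (acc << 1);
        return acc;
    }

    static double avgNanosPerOp(int reps, int n, int warmup) {
        double total = 0;
        int measured = 0;
        for (int r = 0; r < reps; r++) {
            long t0 = System.nanoTime();
            work(n);
            long t1 = System.nanoTime();
            if (r >= warmup) {           // only count runs after the hot path is compiled
                total += (t1 - t0);
                measured++;
            }
        }
        return total / (measured * (double) n);
    }

    public static void main(String[] args) {
        System.out.printf("%.1f ns/op%n", avgNanosPerOp(1000, 1_000_000, 100));
    }
}
```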
To offer my theory on why the client performs as it does and how to improve it, I'm pretty sure it is simply because the client treats all objects (graphical in-game objects, that is) as completely dynamic and treats them all with equal dignity. I'm fairly sure the main reason most modern games are so much faster is that their engines know which objects are static and can fast-path them appropriately, whereas the Haven client assumes that any object may change at any time and does complete setup of it from scratch every cycle. The reasonable way to fix this would be to give the client a notion of static objects and allow it to cache rendering information about them from frame to frame. Optimally, it could perhaps even save a stand-alone BGL list for each group of such objects and just resubmit it for dispatch every frame.
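A minimal sketch of that static-object fast path, under the assumption that per-frame setup is the expensive part (`SceneObject`, `cachedCommands`, and friends are invented names, not the client's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of caching rendering commands for static objects: build the command
// list once and resubmit it every frame, instead of regenerating it from
// scratch per cycle as a fully dynamic object would require.
public class StaticCacheDemo {
    interface GLCommand { void run(); }

    static int buildCount = 0;           // counts how often expensive setup runs

    static final class SceneObject {
        final boolean isStatic;
        List<GLCommand> cachedCommands;  // filled once for static objects

        SceneObject(boolean isStatic) { this.isStatic = isStatic; }

        List<GLCommand> commands() {
            if (isStatic && cachedCommands != null)
                return cachedCommands;   // fast path: resubmit the saved list
            List<GLCommand> cmds = build();
            if (isStatic)
                cachedCommands = cmds;
            return cmds;
        }

        private List<GLCommand> build() {
            buildCount++;                // the expensive per-frame setup happens here
            List<GLCommand> cmds = new ArrayList<>();
            cmds.add(() -> { /* bind buffers, set uniforms, draw... */ });
            return cmds;
        }
    }

    static int renderFrames(SceneObject obj, int frames) {
        buildCount = 0;
        for (int f = 0; f < frames; f++)
            for (GLCommand c : obj.commands())
                c.run();
        return buildCount;               // setups performed over all frames
    }

    public static void main(String[] args) {
        System.out.println("static rebuilds:  " + renderFrames(new SceneObject(true), 60));
        System.out.println("dynamic rebuilds: " + renderFrames(new SceneObject(false), 60));
    }
}
```

Over 60 frames, the static object pays for setup once while the dynamic one pays every frame, which is the whole argument for giving the client a notion of static objects.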