Had to start my blog immediately after seeing this video from RETRO GAME BLOKE, proving once more that the community can often find ways to do things that the guys who wrote the engine thought were impossible. In his video, you will see demonstrations of a third person body that animates as the player moves, and the ability to pick up and drop items. Neither of these features is native to the engine; they have been scripted with the built-in LUA commands and have left the lead programmer of the Reloaded project wondering how the 'smeg' it was done:
Not to mention the dynamic in-game scaling! I am officially gob-smacked, and it's great to know the performance issue is not slowing down the community when it comes to creative ideas!
As mentioned last week, my main thrust of performance work ended on Friday, and the two days this week before my meeting on Wednesday are about cleaning up and putting the wires back in the box ready for a demo to the internal folk. First news is that The Escape used to run at 35-40fps on my machine, and the same settings now run at 65-90fps. It was no one thing that fixed it, and you can re-read the blog posts from last week to get clues as to what was improved, but my personal milestone of playing the whole game and staying over 60 fps has been achieved (phew). I then decided to have one more day on performance before the clean-up, which I estimate should not take more than one day if I am focused. This meant I could tackle one or two more ideas I had, including a look at the cost of shader effect swaps, texture reference costs, the true cost of batching and the general behavior of the engine.
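For anyone who prefers to think in frame times rather than frame rates, the figures above can be converted with a line of arithmetic. This is only a sketch in Python using the numbers quoted in this post; the helper name frame_time_ms is made up for illustration and is not part of the engine.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second into milliseconds spent per frame."""
    return 1000.0 / fps

# Frame-rate ranges quoted in this post for The Escape (low, high).
before = (35, 40)  # before the performance work
after = (65, 90)   # after the performance work

for label, (low, high) in (("before", before), ("after", after)):
    # A higher fps means a shorter frame, so the range flips when converted.
    print(f"{label}: {frame_time_ms(high):.1f}-{frame_time_ms(low):.1f} ms per frame")

# The 60 fps milestone corresponds to a budget of roughly 16.7 ms per frame.
budget_ms = frame_time_ms(60)
print(f"60 fps budget: {budget_ms:.1f} ms per frame")
```

Seen this way, the slowest frames dropped from about 28.6 ms to about 15.4 ms, which is what puts them under the 60 fps budget.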
An Intel engineer put me onto a tool called GPUView, which I assume has been used for YEARS in the industry but is news to me (naturally). It shows the work the CPU and GPU are doing in broad strokes (actually it's shown in hideous detail, but it provides some nice charts that can illustrate the activity of the app quite quickly). It clearly showed that the app is indeed GPU bound (that is, the engine is spending a LOT of time waiting for the GPU to finish what has been sent to it), and also that the CPU spends a good deal of time sitting around waiting for the GPU to flush sufficiently that more render instructions can be sent. I am still not 100% proficient at reading all this new data, but it's great to have my fingers on the various pulses of the engine now.
The last big 'investigation' on my list (aside from object occlusion and render draw order, which did not get actioned) was to find out if the terrain system (which is notoriously slow thanks to a huge shader) could be sped up. The above scene in the version from this morning ran at 135fps with all features switched on. Not bad compared to V1.0085 speeds, but when I switched to a re-written LOWEST shader that uses only a single surface texture (no painting, no normals, no triplanar, no vegmap, no shadows), I jumped to 225fps. It was clear in my cut and paste analysis that each texture reference added to the terrain shader knocked another 20-30fps off my total. For a proper analysis I would need something like Frame Analyser or NSIGHT now, but it's revealing to know where a huge performance chunk could be found! The reason I switched to one texture is that I had an idea to implement something called Texture Splatting, which would render a local texture of my immediate area to a render target and then use that as my single texture in the shader. My theory is that rendering textured quads would be faster than the roughly seven million texture reads I am doing every cycle, which works out at 438 million texture reads per second at 60fps. Pity there is no time left to implement the idea, but it's very tempting to know that implementing it could give me between 40 and 80fps extra in some situations.
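The arithmetic behind those terrain numbers can be sketched in a few lines of Python. It uses only the figures quoted above (135fps, 225fps, 438 million reads per second at 60fps); the variable and function names are illustrative, and the result is a back-of-envelope estimate rather than a measurement from a profiler.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second into milliseconds spent per frame."""
    return 1000.0 / fps

# Figures quoted in the post.
READS_PER_SECOND = 438_000_000  # terrain texture reads per second at 60fps
TARGET_FPS = 60
FULL_SHADER_FPS = 135           # all terrain features switched on
LOWEST_SHADER_FPS = 225         # single surface texture only

# Reads per frame implied by the per-second figure.
reads_per_frame = READS_PER_SECOND / TARGET_FPS

# Frame time saved by dropping to the single-texture shader.
full_ms = frame_time_ms(FULL_SHADER_FPS)
lowest_ms = frame_time_ms(LOWEST_SHADER_FPS)
saved_ms = full_ms - lowest_ms

print(f"texture reads per frame: {reads_per_frame / 1e6:.1f} million")
print(f"full shader:   {full_ms:.2f} ms per frame")
print(f"lowest shader: {lowest_ms:.2f} ms per frame")
print(f"saved:         {saved_ms:.2f} ms per frame")
```

On these numbers the full terrain shader costs a little under 3 ms per frame more than the single-texture version, which is the time a cheaper single-texture approach would be trying to win back.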
My actual tasks for today include a final test of the batching system, correction of a crash that only happens after 2 hours of lightmap baking (a horrible one to reproduce) and restoring the dynamic shadows for the editor and higher shader modes. It's plenty to be getting on with, and is a needed step to ensure the engine stays in one piece. It's all too easy to break everything to get more speed, but it has to be tempered with ensuring the foundations are still intact, and that everything that worked before still works. Hopefully I can argue for more performance work at the meeting on Wednesday, but with my goal of 60 fps achieved in The Escape demo, and pre-baked lighting in place to improve the visuals of the scene, I rather suspect the vote will be to move to the 'third pillar', which we have dubbed 'functionality'. That is, stuff like more LUA commands, grenades, better explosions, a fully working entity properties panel, finalized widget graphics; the list, as you can imagine, goes on :)