Performance In The House
My plan for Monday was QUAD batches, so of course by the end of Monday I've just put the finishing touches to the new GPU Occlusion and Culling system for Reloaded. Now prepare yourself for a prototype screenshot, so those of a weak disposition, please avert your eyes now.
As you can see, the numbers in the top left tell an interesting story about where we are in engine terms. The first number to note is the O value, which tells us how many buildings I added to the 50,000 x 50,000 world. Each building has three LOD levels of 5000 polygons, 4000 polygons and 3000 polygons (roughly). The next important number is 310MB, which shows we only used 109MB to render all the buildings within visual distance, and these buffers were created in real time as part of the dynamic buffering system. As we move through the world, memory usage should not increase much beyond this if the spread of buildings stays consistent, because buffers are efficiently reused.
My favorite statistic is CALLS:4, which tells us that only FOUR draw calls were made to create the scene, and the POLYGONS value of 24796 shows the population within these four drawn buffers. A quick calculation and you will figure out that we have static geometry batching going on as well!
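For anyone who wants the quick calculation spelled out, here is a minimal sketch using only the figures quoted above. The variable names are mine, and the 5000 figure is the rough polygon count of the highest building LOD.

```python
# Figures read from the on-screen statistics quoted in the post.
draw_calls = 4
polygons = 24796
largest_lod_polygons = 5000  # highest-detail building LOD (roughly)

average_per_call = polygons / draw_calls
print(average_per_call)  # 6199.0

# An average above the largest single LOD means at least one draw call
# must contain geometry from more than one building, i.e. batching.
batched = average_per_call > largest_lod_polygons
print(batched)  # True
```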
The 1750 fps is a little deceptive given we don't have the rest of the engine in play, but consider that before the above techniques the same scene was rendering with over 40 draw calls, nearly half a million polygons and a frame rate of around 400-500 fps.
This worked better than I thought and really takes advantage of the speed increase you get when you draw the scene in front-to-back order using a non-color write mode, so that the query can report whether any pixels were drawn.
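To illustrate the idea, here is a rough CPU-side sketch of a front-to-back depth pass with a pixel-count query. It is not the engine's code: the real system runs on the GPU, and a real implementation would usually test a bounding volume rather than the object itself. The function name `occlusion_cull`, the rectangle format and the tiny 64 x 64 buffer are all assumptions made for the example.

```python
def occlusion_cull(rects, width=64, height=64):
    """Return the names of rectangles that pass the depth test for at least one pixel.

    Each rect is (name, x0, y0, x1, y1, depth); a smaller depth is nearer the camera.
    Only depth is written, never color, mimicking a non-color write pass.
    """
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    visible = []
    # Front-to-back order lets near objects fill the depth buffer first,
    # so objects behind them fail the depth test and report zero pixels.
    for name, x0, y0, x1, y1, depth in sorted(rects, key=lambda r: r[5]):
        pixels_passed = 0  # plays the role of the occlusion query result
        for y in range(max(y0, 0), min(y1, height)):
            row = depth_buffer[y]
            for x in range(max(x0, 0), min(x1, width)):
                if depth < row[x]:
                    row[x] = depth
                    pixels_passed += 1
        if pixels_passed > 0:
            visible.append(name)
    return visible


scene = [
    ("far_hidden", 10, 10, 30, 30, 50.0),   # fully behind the wall
    ("near_wall", 0, 0, 40, 40, 10.0),      # nearest object
    ("far_visible", 35, 35, 60, 60, 80.0),  # pokes out past the wall
]
result = occlusion_cull(scene)
print(result)  # ['near_wall', 'far_visible']
```

Anything that reports zero pixels can be skipped in the real color pass, which is where the drop in draw calls comes from.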
Amazingly after lots of careful thinking and slow coding, the engine worked first time out the gate. It was a shock to see the call rate drop to single digits and everything still rendered as though EVERYTHING was being rendered. Neat.
As tempting as it is to move this over to the main engine and test against a variety of entities, sizes, textures and so on, I really want to finish the render scene with QUADs. Right now the LOW LOD just disappears after a certain distance and is not replaced with anything, so they just ping out of existence. I had a mind to prototype using the Dark Occlusion module which provides this functionality out of the box, but on deeper reflection I would need it very closely integrated into the instance stamp occlusion system I have now and there would be some compliance issues in the middle of all that.
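The gap I want to close can be sketched as a simple distance lookup. The switch distances below are made up for illustration (I have not quoted real values anywhere), and `select_representation` is a hypothetical helper, not an engine function.

```python
# Hypothetical switch distances in world units; the post gives no real values.
LOD_RANGES = [(1000.0, "HIGH"), (2500.0, "MEDIUM"), (5000.0, "LOW")]


def select_representation(distance, quad_fallback=True):
    """Pick which version of an object to draw at the given camera distance."""
    for max_distance, lod in LOD_RANGES:
        if distance <= max_distance:
            return lod
    # Past the LOW range: without a quad the object simply pings out of existence.
    return "QUAD" if quad_fallback else None


print(select_representation(6000.0, quad_fallback=False))  # None
print(select_representation(6000.0))                       # QUAD
```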
I will have it as my fall-back, but for Tuesday I am hoping I can create a large static pool of quads and write a shader to have them face the camera with the right texture. Generating the texture will be another matter entirely, and it will be a dice throw as to whether I go for my previous idea, which was seven angles around the Y axis and one from above (for the shadow cameras), or opt for the more sophisticated technique used by Dark Occlusion, which renders each object perfectly to a communal quad texture and only refreshes the render when absolutely necessary. The result is a MUCH more believable transition in the far distance, and is perhaps the hallmark of a quality, future-proof engine. My concern is that if you place down 100,000 trees, that means rendering 100K entity views to a sequence of communal textures, and that is going to consume both video memory and real-time performance.
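To make the two options a little more concrete, here is a small sketch under stated assumptions. The first function picks one of seven pre-rendered side views (plus a top view) from the camera direction; the 60 degree pitch cut-off is a guess. The second estimates texture memory if every entity gets its own tile; the 64 x 64 RGBA tile size is purely illustrative, not a measured figure.

```python
import math


def impostor_view_index(camera_pos, object_pos, side_views=7, top_pitch_deg=60.0):
    """Return 0..side_views-1 for a side view, or side_views for the top-down view."""
    dx = camera_pos[0] - object_pos[0]
    dy = camera_pos[1] - object_pos[1]
    dz = camera_pos[2] - object_pos[2]
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    if pitch >= top_pitch_deg:
        return side_views  # camera is looking down from above
    yaw = math.degrees(math.atan2(dx, dz)) % 360.0  # angle around the Y axis
    step = 360.0 / side_views
    # Round to the nearest pre-rendered angle and wrap back to 0.
    return int((yaw + step / 2.0) // step) % side_views


def impostor_memory_mb(entity_count, tile_size=64, bytes_per_pixel=4):
    """Texture memory in MB if every entity owns one square tile (assumed sizes)."""
    return entity_count * tile_size * tile_size * bytes_per_pixel / (1024 * 1024)


print(impostor_view_index((0.0, 0.0, 10.0), (0.0, 0.0, 0.0)))   # 0
print(impostor_view_index((0.0, 100.0, 1.0), (0.0, 0.0, 0.0)))  # 7
print(impostor_memory_mb(100_000))                              # 1562.5
```

With those assumed tile sizes, 100,000 per-entity views would need roughly 1.5GB of texture memory, which is the sort of cost that makes the fixed set of angles look attractive.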
My dilemma is whether the extra time to test the more advanced technique is worth it, or whether it's more important to get everything in the main engine as soon as possible, and continue testing and improving from that base.