Tuesday 30 September 2014

CPU Profiling Totally Rocks

Having a great time profiling the FPSC Reloaded Engine with VTune at the moment. I started by switching off every component and running the profiler on a completely empty scene, no terrain, objects, sky, physics, anything. I then had a look at what might be hogging things.  Turns out quite a few things.


It seems the engine was monitoring ALL the objects, even the invisible ones, for things like animation potential, mesh vertex update potential and other large loops. Completely redundant of course, and it hogged my CPU cycles. By changing these loops to use shortlists, each one now only visits the objects of interest, and the bottleneck completely disappeared.
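For anyone wondering what a shortlist looks like in practice, here is a minimal sketch in Python rather than the engine's own code. The names `SceneObject`, `Scene` and `animating` are made up for illustration; the only idea taken from the post is that the list of interesting objects is maintained when an object's state changes, so the per-frame loop never walks the full object list.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity, so two similar objects stay distinct
class SceneObject:
    name: str
    visible: bool = False
    animated: bool = False


class Scene:
    """Keeps a shortlist of visible, animated objects next to the full list."""

    def __init__(self):
        self.objects = []    # every object in the scene
        self.animating = []  # shortlist: objects that need animation updates

    def add(self, obj):
        self.objects.append(obj)
        self._refresh(obj)

    def set_visible(self, obj, visible):
        obj.visible = visible
        self._refresh(obj)

    def _refresh(self, obj):
        # Maintain the shortlist when state changes, not once per frame.
        wanted = obj.visible and obj.animated
        listed = obj in self.animating
        if wanted and not listed:
            self.animating.append(obj)
        elif listed and not wanted:
            self.animating.remove(obj)

    def update_animation(self):
        # The per-frame loop touches the shortlist only.
        return [obj.name for obj in self.animating]
```

The trade-off is a little bookkeeping on every state change in exchange for a per-frame loop whose cost follows the number of interesting objects rather than the total.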

At first I also thought the huge amount of time spent in NVD3DUM.DLL was something I could optimize, but I think a good engine spends most of its time in here as that is where the CPU is constantly giving things to the GPU to process, which means more frames and faster games. My current guideline is to ensure that the engine always spends more time in this module than it does in the remaining modules, thus ensuring a fast throughput of polygons to the card and zero CPU stalling.

I still have an outstanding issue which is causing a DirectX Error crash due to skipping the texture sort each frame (a MASSIVE hog) but once I track down the specific object(s) responsible I can fix it properly and massage the texture sort system so I am not breaking something elsewhere.


As you can see, by ensuring the texture sort only happened when the overall number of objects in the engine changes (i.e. something got added or removed) I went from 143 to 208 by adding two extra lines of code and a new variable!
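The "sort only when the object count changes" idea can be sketched roughly as below. This is an illustrative Python sketch, not the engine code: `TextureSorter`, `last_count` and the dictionary-based objects are assumptions, and `sort_runs` exists only so the effect can be observed.

```python
class TextureSorter:
    """Re-sorts the draw list only when the object count has changed."""

    def __init__(self):
        self.sorted_objects = []
        self.last_count = -1  # the extra variable: count seen at the last sort
        self.sort_runs = 0    # for illustration only

    def prepare_frame(self, objects):
        # The cheap check replaces an unconditional sort every frame.
        if len(objects) != self.last_count:
            self.sorted_objects = sorted(objects, key=lambda o: o["texture"])
            self.last_count = len(objects)
            self.sort_runs += 1
        return self.sorted_objects
```

Note that a count check cannot see an object that changes texture, or an add and a remove in the same frame, so a scheme like this can end up holding a stale list in those cases.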

After I've solved the texture sort crash bug, I need to spend some time playing the game and using the editor and features of the engine to ensure I have not broken anything major. Best to fix those now when I know what code I changed than a week from now when I won't have a clue.

I won't tease you with my current frame rate gains as they are very subjective but I am happy to report that for every bottleneck I find, and eliminate, the bottom line FPS jumps up.  There is still the unavoidable issue that the engine drops back down to the 40 range when I try to draw a thousand shaded objects, but that is something I plan to tackle separately as it relates back to visuals and how the rendering order and quantity is handled.

Pretty happy with the performance work so far, and my hope is to bring you some solid news of the gains before the week is out.  Until then, watch this space and keep your fingers crossed.

Monday 29 September 2014

Performance Week

At long last I have finally broken free of visual thoughts and hidden thread bugs to spend a whole week on performance. When not trying to find ways NOT to get distracted with 'other things' I will be using Intel VTune to find all CPU bottlenecks and squash (or redirect) them as much as possible.  I have some major areas I think need work, but I will let the profiler be my guide this week (and my own task list of course).

Before I started this week of speed, I had one last visuals related conference call with one of our former top artists. Taking five minutes away from his cut and thrust industry artist lifestyle, he helped us come up with a plan on how we might balance the lighting system in our engine and create a more impressive final render.  I will only be starting this work mid-October but it's great to have another pair of eyes, attached to a brain and mouth that can talk in the language of code and the language of art. We can only benefit from his help!


I have arrived at the visuals settings that 'I' like, and when we do the light balancing we can come up with something both 'generic and cool'. There was even talk of disabling some of the sliders by default (released in SETUP.INI) so you can be protected from messing up the lighting balance. For example, no good comes from increasing ambience if we can provide the ambience via a skybox cube map (spherical, hemispheric or cube mapping) for the perfect natural lighting (i.e. when you change the sky to blue or red, the ambient contribution is affected by the colors in the sky and is very effective).  More on this when I start my experiments next month.

Experiments for the rest of this month include hunting down and eliminating lazy, bloated or overworked algorithms and teaching them the true meaning of speed.  My personal goal is to go from 35 fps which is what I am getting from my aged PC and Graphics card to well over 60 fps to achieve a smooth and exciting game play.  My hope is that I will find a single line of code which is the cause of all the slow down but I don't think I am going to be that lucky. It's more likely to come from some effective investigation, lateral problem solving and a few hard choices too.  Having been a victim of these slow patches, I am pretty excited to get stuck into the heart of this beast so going to grab a bite to eat now and then set forth on a journey of discovery to slay a few code sloths!

Friday 26 September 2014

Give Me Light!

Spent an hour and a half talking about the lighting question again, all very necessary stuff but it's slow progress making concrete decisions. You could not find a more subjective topic to talk about, and committing ideas to code requires a very clear understanding of what you want to do, versus what you do not want to happen. After the call I was thrust into a further three hours of experimentation and analysis to attempt to create a situation where the surfaces of objects can be overexposed without using ambient light to artificially brighten other areas of the scene, which had become undesirable. The solution was a new slider value called 'Surface Level' which, much like 'Ambient Level', controls the intensity of the multiplier within the shader, but this time for direct light.  I also added the remaining shader controls to the new static render effect to include shadow intensity and general ambient and surface colors.


While I was coding and testing this, I set my machine off pre-baking The Escape level with ambient occlusion but with a single threaded approach. I did some experimenting last night and discovered that if I completely eliminate the threading code, and run on the main processor thread, I can lightmap the whole scene without any corruption, freezing or crashing. A major clue, as I now know it has something to do with threads competing for use of the same data and getting it royally wrong.

Of course after 2 hours, when the pre-bake was finished, it turns out that by setting the ambient value to 0.0 instead of 0.6, the expensive occlusion effect was lost as inside buildings there is no light to subtract from. Ah well. Returned it to 0.5 and started the build process again.  The reason I dropped it was to allow the real shaders to fully control ambience, but as I want to have ambient occlusion mapping even where there are no light sources, I have decided to use 0.5 as the base-line and then deduct this value in the shader so I can effectively have negative light to apply the ambient occlusion effect again.
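To make the arithmetic concrete, here is one way the baseline trick could work, written as a tiny Python sketch. The formulas are an assumption about the approach, not the actual shader: the bake multiplies the baseline by an occlusion factor, and the shader subtracts the same baseline so that only a zero-or-negative occlusion offset is left to add to its own ambient term.

```python
BASELINE = 0.5  # ambient level baked into the lightmap (value from the post)


def bake_ambient(occlusion, baseline=BASELINE):
    """Baked ambient term: occlusion is 1.0 in the open, lower in creases."""
    return baseline * occlusion


def shade(baked, shader_ambient, baseline=BASELINE):
    """Deduct the baseline so only the occlusion offset (<= 0) remains."""
    return shader_ambient + (baked - baseline)
```

With a baseline of 0.0 the baked term is zero whatever the occlusion factor, which matches the lost effect described above: there is nothing for the occlusion to subtract from.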

For today, in the spirit of getting things done, I have decided not to continue hunting for the threading bug and instead get the static shaders finished off and move onto in-game performance and continued visual touches. These items are now more important than making the light mapping process faster, but it's still high on the list, just slightly demoted while I get the engine into a state which can allow me to make some decent screenshots everyone is happy with.

Currently agonizing over creating a new shader (which will be almost identical to the entity_basic.fx) to allow normals, specular, fog, etc. but with the addition of an extra UV data chunk and a texture re-shuffle.  Ideally it could all be in one shader but then I would have redundant resources in there on both sides (i.e. secondary UV not required for dynamic entities, occlusion texture not required for static entities). I also like the freedom of being able to tailor the static shader for speed given its static state.  Despite the terrible 'code duplication' it will create I think I will opt for a specific static shader and just cut and paste 94% of the code from entity_basic.fx.

If anyone knows of a good technique to have 'common shader code' which can then be included into the HLSL file, it would make the above concern moot and would significantly clean up my shaders and also reduce the chance of errors such as typos creeping in.

For now I will proceed to create final static shaders, juggle the code to allow the extra textures in there and tie in the HIGHEST to LOWEST settings so they can change the static shader too.  That will then set me up nicely to produce some nice shots this evening, and have the engine ready to do some serious performance profiling with Intel VTune (the core duties of my tasks next week). Would have been nice to share a final render of the combined effects of this work, and maybe I will post one this evening if I am not too zonked, but for now here is me diving back into the land of shaders and putting some of the wires back in the box.

Thursday 25 September 2014

Monster Crash Freeze

Every now and again, once every few years, you encounter a bug which dwarfs the daily bug, reassuring the typical plodding coder that he still has a lot to learn. The bug in this case is one which flickers the whole screen, and then after around 5 seconds freezes everything, apps, processes, mouse, task manager, everything. The only escape is a hard reset and complete reboot of Windows. Now this type of super crash was quite common before processors got to GHz speeds, when you only had one core and a hacky way to do threading.  So much so in fact that when this new monster bug occurred, it was like a blast from the past.  Not a happy journey into nostalgia though, as it is precisely in my way to creating nice looking scenes and fast games.

Actually stayed up until 2AM last night battling with it, to no avail. All my tricks to hunt it down resulted merely in the insight that it is definitely rogue memory writing that is to blame.  The corruption strikes in the object sort list, in the object data themselves and in the vertex buffer manager, so there is no specific target and the damage hits in a different place each execution. Even data based breakpoints failed as the area being monitored would be corrupt the first time but not the second.

Today I had the idea that I should amend the light mapper to use a single main thread and to create smaller light mapping batches with a smaller test level so I can concentrate the bug effect and have a chance of the program using the same address space on repeated runs.  I am now in the middle of this process and hopefully I can then use my data breakpoint trick to find the origin of the rogue who is randomly writing bytes into someone else's memory.  I've also been asked for some top notch light mapping screenshots so it's critical I get this fixed, new scenes designed and shots made before the newsletter comes out.  Wish me luck!

Wednesday 24 September 2014

I Was Up, Then I Was Down

All the way up to 4PM I did some good work, creating a new task slicer for the lightmapper which will conserve the system memory even for massive levels. It does this by only lightmapping a chunk at a time, and then freeing up the used memory for the next job. I also found and fixed a few major memory leaks in the lightmapper which helped hugely. Once I moved from my test scene to The Escape level, although it lightmapped fine as a single process, the sliced one now exhibits a very peculiar bug, which flashes the monitor (not the back-buffer of the app but the whole PC), and after ten seconds the whole machine freezes completely. Have to hard reset to get it back!  Not a fine ending to the day. I will go through line by line and find the culprit, but it will have to be Thursday now.

I also managed to do an updated trace of all the CPU allocations when running the map editor, and also tacked on the lightmap resource usage from this morning.



Ignore the title, it should have read "CPU System Memory" but you can see the worst offenders are terrain, and the 21% is the memory created during the light mapping process itself (along with some other big LM chunks).  I am sticking with the lightmap resources for now as part of the ambient occlusion mapping work and then I will probably explore the terrain usage after I've done the priority performance tasks on my list.

It's a nice feeling when you finish a day and you've made progress and the software runs better than the day before. On this occasion, progress definitely made, but with a bug that literally forces a PC reset, I don't have that take-home feel good factor.  I may return this evening for a few more hours and at least find and fix the 'freeze my PC' feature, as I don't want that to greet me come Thursday!!

Tuesday 23 September 2014

Insert Amusing Title Here

A good day of work today with more progress on the lighting system (see the second screenshot below).  Also continued having fun thinking up the new name for the product for our Steam launch.  You may have read the latest forum thread on this but the headline titles for this week have been Hyperion, Scorch, Titan, Breeze and Dark, with the addition of Viper during our call today. We've also been playing with dropping 'Game Creator' in favor of 'Maker', 'Kit' and 'Engine'. Give me coding any day, thinking up new names is HARD!  The hunt for the perfect name continues.


The above is what I started with this morning, with the obvious issue being the total shadow under the fallen fence gate. This was due to the light mapper not detecting semi-transparent textures in the process.

My task today was to get my shadows looking prettier, adding ambient occlusion back in, allowing semi-transparent textures to cast semi-transparent shadows, ensuring the whole Escape level can be processed, solving the edge artifacts and making it all blend together.  Apart from wasting four hours finding out why D3DXLoadSurfaceFromSurface was not working, everything else went smoothly. For DX coders out there, I must share the solution. It seems you cannot use the Surface function to copy a compressed video memory texture to an uncompressed system memory texture. Any combination of the above fails. What you have to do is create a system memory texture and directly load the compressed texture into it using D3DXLoadSurfaceFromFile. Hopefully this little pearl can save you four hours of your life one day!


As you can see, the new lighting system brings out the depth of the building features and adds subtle shadows where required. It only costs a few extra frames and replaces the very crude LOWEST shadows, but gives a much higher resolution shadow, and even here the resolution of the lightmaps has been limited to 512 pixels wide. This can be increased to 2048 or even 4096, that is, once I have solved the management of the memory used by the light mapper. You find though that most games keep the pre-baked lightmapping relatively low resolution and subtle, mainly to conserve the aforementioned memory. I have not yet batched the static geometry, which might win back these frames, and most likely gain some too.

Also bear in mind the above buildings and objects are not using any normals, specular or other per pixel refinement, just basic diffuse + lightmap. Hopefully when I add in specular and normals, the flattish surfaces will bump a little to create some more fidelity. Or they may prove too subtle and can be left out of all but the closest objects, we will see.

Next on my list is to squash the whole lightmapping process so it does not take up quite so much system memory.  It was originally written and tested against small objects, not whole levels and as a result the implementation proceeds to create allocations for the entire process at the start, AND creates more memory on the fly as it goes. Pretty hungry now, and it becomes positively ravenous when you increase the light mapping resolution quality.  My initial idea is to break the job up into 200MB or so of work, process that, then move onto the next 200MB using the previously freed memory. Might add a few seconds of set up to the whole thing, but allows light mapping to happen inside Test Game which is ideally where I want it.  Before that however, I shall spend some of Wednesday tracing through the 1338 static objects from The Escape level and investigate why they are collectively eating 500MB of system memory. It might be perfectly acceptable when you consider the addition of collision geometry for the ray caster, holding areas for the light accumulation buffers and the system memory copies of the transparent textures, but it's always worth checking out memory allocations of that magnitude.  It also means I am one step closer to starting my performance work, which I am very much looking forward to!
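The "200MB or so of work at a time" plan amounts to grouping jobs under a memory budget. A minimal Python sketch of that grouping follows, assuming each job comes with a size estimate. The function name `slice_jobs` and the (name, size) tuples are hypothetical; only the 200MB figure comes from the post.

```python
def slice_jobs(jobs, budget_mb=200):
    """Group (name, size_mb) jobs into batches that fit the budget.

    A job larger than the budget gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for name, size_mb in jobs:
        # Close the current batch if this job would push it over budget.
        if current and used + size_mb > budget_mb:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size_mb
    if current:
        batches.append(current)
    return batches
```

Each batch would then be processed and its memory released before the next one starts, so the peak usage stays near the budget instead of growing with the size of the level.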

Monday 22 September 2014

Good Day

A pretty good start to the week, having solved the 10 fps issue that struck last week. Turns out it was a batch of animating objects being forced into being static for the lightmapper but still trying to update the vertex buffer each cycle which made the DX pipeline crawl. Added code to convert such objects and extract the animation properties of these newly created static objects and the speed returned.

My main job for today was to create terrain shadows as part of the light mapping process, and as of 5PM I have achieved it. The shot above was an early shot of the prototype when I introduced the terrain geometry to the process and as you can see those shadow tears are a real eyesore. After much tweaking I was able to solve it, but for the sake of expediency I have switched off the ability of the terrain to cast its own shadows in the light map process. I am happy that they will do this eventually but my mission was to have the buildings cast shadows on the floor, and The Escape level now has floor shadows!

I still have a few things to do, such as re-activate the ambient occlusion mode and add transparent textures to the light mapping so that vegetation textures can let light through and exaggerate creases.  I also want to confirm I have solved the small artifacts that appeared on the first buildings I light mapped.


I also want to experiment with multiple colored static lights, especially for interior scenes as I think this is where the lightmapper will come into its own, and I look forward to bringing you some screen shots of inside the buildings when my experiments are complete.

Soon I will be straying into performance territory now, so I just want to make sure the visual is where I need it for the time being and then ramp up the work to ensure I can get The Escape running at well over 60 fps on my aging machine with mid-range graphics card.  I also have Intel VTune set up and ready to roll now so I should have the tools I need when the time comes.

Friday 19 September 2014

A Low FPS Day

Be warned, programmer and back-end screenshots to follow. If you have a sensitive disposition, stop reading now.


My Friday began with the above shot, showing almost all the art hidden except for terrain and the new light mapped object collections.  I am currently struggling with an issue which mysteriously drains almost all performance from the engine when these objects gang up. The same number of regular entities with full shaders do not do this, but the new objects are playing silly buggers.


As part of my journey of experiments, and to find out where the bottleneck (or as I like to call it, squashed straw) is, I removed the camera cycle render loop and was pleased to see the FPS come in at just under 2000 frames per second :) Obviously very fast when you are not rendering any 3D, but I have my HUDs, text and the engine running in the background so it gave me hope that we could find some good gains when I start the performance work next week.

My plan is still to finish the lightmapper and 'glass terrain' shadows but I wanted my test level to run at a decent speed, and 10 fps does not cut it. Alas I spent most of the day skipping functions to isolate the code responsible for the massive slow down but to no avail. At 4PM I decided to throw in the towel on the old school method of finding hot spots and instead broke out my new edition of the Intel VTune Amplifier XE 2015. Although this premiere tool is designed to find these kinds of issues immediately, I am not too familiar with the latest version so my Friday evening has been spent learning how it works and how I can get it to work for me.

Once I can get The Escape running fast enough to run around and view lightmaps, I can revert to the Visual Camp and finish off the floor terrain shadows, small artifacts in the larger entities and squash the light mapping memory usage down as far as it will go.  Also had a few ideas how I can batch, group and consolidate the lightmap objects to reduce texture files, how to entirely skip using light map textures for very small polygon surfaces and even did some research into a way to create many instances of an object with a single draw call using a clever Vertex Shader 3.0 system of using multiple streams, one for vertex data and one for instance data.  Can't wait to get light mapping done so I can enjoy the theme park of performance optimization!

Thursday 18 September 2014

Meeting Day

A full day of 'meet' today, covering all topics in and out of development, with some good short term goals set for the next three weeks. We also revisited the issue of lighting balance, an issue I am still at a loss to fully grasp.


As you can see above, this is a variant of the 30 ambient, 50 contrast setting from yesterday which is now adjusted to 40 ambient and 40 contrast to create more of a day-time lighting effect.

We have also tentatively planned to bring in an artist to mock-up a typical scene inside the 3D modeller, complete with desired lighting and a fly through so we can compare his 'correct' lighting system with my 'obviously incorrect' lighting system.  As I say, I am still coming to terms with exactly what is incorrect in the above shot aside from the obvious lack of ambient occlusion shading, more scenery, more action and perhaps a few atmospheric effects.  I have handed off the production of this fly-through so I do not get distracted with my immediate mission to finish the pre-bake process, save some precious memory usages and move swiftly onto performance.  Hopefully it will yield results from an additional perspective on the visual finish of the engine, and certainly create some new assets in the process.

Development work resumes Friday, which should hopefully see the light mapper producing a fast light mapped object set for The Escape demo, which is my current test level.  Once that is in and working, I will be switching over to what I am calling my 'glass terrain' trick which will lightmap a film of geometry stretched over the terrain world to catch the floor shadows of every object in the scene, and then render that super-fast. I can then remove the static rendering of the shadows and thus improve performance as I hand over this responsibility to pre-baked shadows.  Should not be too many distractions on my radar so should get some good coding done.

As an amusing aside, I just discovered a band called "The Ukulele Orchestra of Great Britain" which played a great rendition of The Good, The Bad & The Ugly and also Leaning On A Lamp Post.  Made me smile a lot :)

Also, in case you missed the news or lost touch, Alpha 6 of AGK V2 is now available to all AGK V2 pledgers, details here:

https://www.kickstarter.com/projects/tgc/app-game-kit-v2/posts

If you have not checked out AGK in a while, you have to check out this latest alpha, it's coming on in leaps and bounds!  If you don't know what AGK is, it's the easy to use programming tool behind the number one Driving Test mobile app in the UK. Check out the platforms supported: http://theorytestapp.co.uk/

I am personally looking forward to the new 3D and Shader command additions, and especially keen to see the performance achievable from the new generation of really fast Android devices!

Wednesday 17 September 2014

A Much Better Day

It's always nice when you fix a bug, but when you fix a big, ugly, hidden, stealthy bug, now that's a day to be happy in.  For those readers from yesterday, you will have guessed by now that I have indeed found the source of the heap corruption, and for the tech heads out there I am going to explain some more.

It all came about when a rather LARGE vertex declaration was requested from DirectX, and that request failed (i.e. position, normal, two sets of UVs, tangents, bi-normals and bone data). Turns out my static shader for the lightmapper had a left-over matrix palette constant set-up which forced the shader effect handler to add bone data to my meshes, which in turn created a non-viable custom FVF. When it exited the mesh change state early it did not recreate the mesh buffer to match the already re-sized vertex data, and the memory copy did all the rest. Boom!  I am now able to lightmap and load the models, and apply the shaders without the heap corruption and no more crashing. I still have the problem of understanding why some of the entities succeed at the mesh change and some fail, which will no doubt lead to some conversion code, and I also need to expand my prototype to light map 'all' The Escape level rather than a subset, but I am on the home stretch when it comes to getting this level lit properly so I can move onto finesse, video memory usage and finally performance.

Believe it or not but this was not the hog of the day!  For the first time in recorded TGC history, we had a 2 hour conference call covering the subject of lighting in the engine.  After much debate and many points of view (far more than the people in the call I might add), we arrived at settings which will be the new defaults for the next build.  In short, ambiance to 30, brightness to 0, contrast to 50 and we are also replacing Veg Specular with a Global Specular slider which will allow all specular effects to be regulated across entity, terrain, characters and any other official shader effects.  Also, to cater for end users who want to override a specific specular effect, I will be adding a new FPE field called specular which will allow the engine to select between the provided specular file, or to choose a pre-set none, low, medium and high specular on a per entity basis. It would also make a nice trick if you wanted to reduce the memory footprint of entities that have a consistent specular value (as the textures used will only be 1x1 in size and re-used).
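The memory saving from pre-set specular levels comes from every entity with the same pre-set sharing one tiny texture. Here is a rough Python sketch of that sharing, with everything in it assumed for illustration: the class `SpecularCache`, the dictionary stand-ins for textures, the file name in the usage and the numeric levels are all hypothetical, and only the none/low/medium/high names and the 1x1 size come from the post.

```python
# Hypothetical intensity values for each pre-set name.
PRESET_LEVELS = {"none": 0, "low": 64, "medium": 128, "high": 255}


class SpecularCache:
    """Hands out one shared 1x1 texture per pre-set, or a file reference."""

    def __init__(self):
        self._textures = {}

    def resolve(self, field):
        key = field.lower()
        if key in PRESET_LEVELS:
            # Create the 1x1 texture once, then re-use it for every entity.
            if key not in self._textures:
                self._textures[key] = {
                    "size": (1, 1),
                    "pixels": [PRESET_LEVELS[key]],
                }
            return self._textures[key]
        # Anything else is treated as the entity's own specular file.
        return {"file": field}
```

So a hundred entities marked 'medium' would all point at the same single-pixel texture instead of each carrying a full specular map.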


As you can see above, this was our final agreed lighting between direct sunlight on the nearest building to low lighting on the building behind which is not directly facing the sun. Simon also discovered that all our lighting is based on a near sun-set style sun position, as demonstrated with his before and after shots.



To this end, we are adjusting the sky spec files to lift the sun higher in the sky to create a nicer overall blend of lighting between terrain and scenery objects. Seems in a single day I have the potential to add lightmapping and an improved overall colour balance to the engine.

In other news, we also had a chance to play the new multiplayer prototype, and Ravey has pulled it off once again with actual characters animating, running, shooting and generally behaving like the skeleton of a real death-match game. It was great to see, and the icing on the cake was that thanks to the Steam API, connection was a breeze. No router configuring, no firewall advice required, just go to Steam, click play, join lobby, game starts, run for your life, magic!  Next on his list are things like jumping, fragging, re-spawning and host migration. Nothing pretty to show yet, just raw functionality, but progress is going well on this front and we think you will approve.

Alas it is only 2:28 PM and I have a few good hours ahead of me, and thanks to the protracted call this morning the 4 PM call has been cancelled so it's plain sailing to tea time.  Just leaving my massive level to pre-bake while I add the global specular constant to all the shaders in anticipation of connecting the slider bar.  Happy days...

Tuesday 16 September 2014

Slow Day For Lee

After the torrent of small victories made yesterday, Tuesday has been stuck in the proverbial mud.  It took me four hours to get to this point:


As you can see, I have narrowed it down to a single function call, but it is called across hundreds of different objects and only one of them may be corrupting the heap. The heap in question is the stack of memory which holds all the data for the engine, and while applying a shader to one of the light mapped imported objects, the vertex data copy operation is overwriting neighboring blocks of memory and causing a heap crash.

Finding it was a complete mare, but fixing it might be just as torturous. The good news is that I am tracking it down now and with a little luck I can solve it once and for all.

Fortunately this was not my only small win today, as I have also solved the issue of the light map image file saves crashing by moving the DirectX copy texture and save code from the threaded process to the main one. A rival solution was to activate the DirectX multi-thread mutex feature but that would have incurred a very small performance hit, a compromise I am no longer willing to make.
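The general shape of "workers compute, main thread saves" can be sketched with a queue. This is a Python illustration of the pattern only, not the DirectX code: `bake_worker`, `run_bake` and the fake pixel data are all assumptions, and the `save` callback stands in for the copy-texture-and-save step that has to stay on the main thread.

```python
import queue
import threading


def bake_worker(job_ids, results):
    # Worker thread: heavy computation only, no graphics API calls.
    for job_id in job_ids:
        pixels = [job_id] * 4  # stand-in for a baked lightmap
        results.put((job_id, pixels))


def run_bake(job_ids, save, workers=2):
    results = queue.Queue()
    threads = [
        threading.Thread(target=bake_worker, args=(job_ids[i::workers], results))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    # The calling (main) thread performs every save itself.
    for _ in job_ids:
        job_id, pixels = results.get()
        save(job_id, pixels)
    for t in threads:
        t.join()
```

Because only the calling thread ever touches the save step, the resource-owning API is never entered from two threads at once, and no lock is needed around it.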

It's about nine thirty PM now and still no joy in figuring out which object is messing up my heap so will do my usual back-ups and resume Wednesday with better eyes.  As un-glamorous as this type of code fixing is, it may solve several issues in one go as the unpredictable outcomes of silent heap corruption can be far and wide.  Thirty objects down, several hundred to go...

Monday 15 September 2014

My Busman's Holiday To IDF 2014

Regular blog readers will know full well that I was absent last week from my normal posting duties to chill out and attend the annual Intel Developer Forum in San Francisco.



Three floors of technology and innovation, interspersed with sessions covering everything from chips to robots. My own duties were incredibly light this year as I attended as a mere mortal with only two speaking engagements on the subject of RealSense (formerly known as Perceptual Computing). Also managed to snatch some time attending sessions on integrated graphics performance acceleration for my return to the universe of Reloaded.



While saying hello to a few friends in the 'Internet Of Things' lab, I happened across a project challenge to build a machine using the Galileo board. My creation was a pretty neat contraption which detected air motion, sampled the particles for ethanol and, if high levels were detected, sounded an alarm and incremented a sequence of LED lights by way of a detection alert. Although it was well received and in the running for a prize, it transpired that my close involvement as an Intel Innovator meant I was not eligible for the prize. That's politics for you!



My interest in Galileo led me naturally to take an interest in its older brother, Edison, which is a more powerful circuit board powered by an Atom processor. Powerful enough in fact to run the brain of a 22 jointed robot called Jimmy, capable of walking and talking, and built from a simple metal frame and 3D printed body parts. I have always had a passion for robotics, and were it not for the fact I am a better software developer than an electrician, I would be designing them even now.



Another take-away, and one mentioned in the keynote, was the wireless power system, which allows a laptop or other chargeable device to take power from a remote device located under your desk or table. Eliminating the last cable in the office was a great thing to see, and we should see peripherals by the end of the year using this tech, and by the end of next year have this integrated into our Ultrabooks! It's currently rated to 20 watts, so not quite powerful enough to run your desktop or huge monitor, but it will power mostly everything else and it's a great start to a glowing wires free future!


Of course the main reason for my attendance was to recharge the old batteries after several months' solid work on FPS Creator Reloaded.  They say a change is as good as a rest, and with liberal quantities of Guinness and stuff that looked like it, my brain was happily sedated while my mouth rabbited on for queen and country.

We can find out about technology and gadgets from the internet, but there is no substitute for getting together and talking about it face to face, and IDF is one of my favorite times to escape the office and do this.

In my capacity as the only Welsh Intel Black Belt, one of my busman's holiday highlights was a trip to the Planetarium, set out like a cinema under a huge domed screen projecting a journey through the universe. Complete with welcome drinks, a gorgeous meal, equally gorgeous people, white crocodiles, uncut diamonds and a great talk by Genevieve Bell on the evolution of robots (and some great movie quotes). It remains a privilege to be invited back as a Black Belt developer, and a pleasure to continue to contribute my thoughts and deeds back into the developer community in the years ahead.

Alas I did not get to enjoy the last evening at my favorite Steak House and Irish Pub as the aircraft to take me home dragged me away in the middle of the last day of IDF.



As it turned out, despite the home-time traffic of San Francisco and threatened TSA security lines, I was sitting at the departure gate restaurant within two hours of leaving the hotel and recovering from a rather naughty pizza. The British Airways plane you see performing its reverse taxi trick was the sister flight to mine, scheduled three hours later.  Rest assured I had plenty of time to get through a few more chapters of Terry Pratchett's Raising Steam.

As I type, my inbox is mighty, my whole office is a dumping ground for miscellaneous tasks, both foreign and domestic, and my brain is still getting to grips with where it left off in the FPSC Reloaded universe.  Normal blogging will resume on Tuesday, just as soon as I figure out why all my characters have suddenly disappeared and what remained to be coded for the new Ambient Occlusion lightmapper.  Very pleased to see the progress made on the Multiplayer and Construction Kit, and hopefully it won't be long before we can show you some shots or even videos of these new components to the game engine.  I am not planning any more holidays or trips until Christmas now, so expect plenty of uninterrupted development for the next few months :)

Friday 5 September 2014

Lightmapping Progress

Aside from some quick tweaks to the importers and zombies, most of the day has been given over to the work on the lightmapper which will provide the Ambient Occlusion textures required to make the Reloaded scenes look better and run faster. One of my early wins was the reduction of a structure called 'Lumel' from 12 bytes down to just one byte. The original lightmapper could handle multi-colored lightmaps for things like semi-transparent stained glass projections, but for here and right now we do not use them. Further, the data structure used a four byte float to store the accumulated light colour for each pixel, but as this float was only converted to an unsigned char at the end of the day, I simply replaced the float with a byte, did the float conversion each time the pixel was added to, and simply passed out the final capped byte when the time came to create the light map texture.  This saving took my per-lightmap consumption from 13MB to around 1.5MB based on a 1024x1024 texture plate. This overhead can be reduced further if I replace the Lumel class with a raw array of bytes but that would mean extra coding and doubtlessly introduce a nice bag of new bugs.
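To make the Lumel saving concrete, here is a minimal sketch of the idea rather than the actual lightmapper code. The names FloatLumel, ByteLumel and plateBytes are made up for illustration, and the three-float layout of the old structure is an assumption based on the 12 byte figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed shape of the old structure: one 4-byte float per colour channel.
struct FloatLumel {
    float r, g, b;
};

// Hypothetical one-byte replacement: a single capped light value per pixel.
struct ByteLumel {
    std::uint8_t light = 0;

    // Add a contribution in the 0.0 to 1.0 range, converting through float
    // on every call and capping the stored byte at 255.
    void add(float contribution) {
        float scaled = std::max(0.0f, std::min(contribution, 1.0f)) * 255.0f;
        float total = static_cast<float>(light) + scaled;
        light = static_cast<std::uint8_t>(std::min(total, 255.0f));
    }
};

static_assert(sizeof(FloatLumel) == 12, "three 4-byte floats assumed");
static_assert(sizeof(ByteLumel) == 1, "one byte per lumel");

// Bytes needed for one lightmap plate of the given lumel type.
template <typename LumelType>
std::size_t plateBytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return width * height * sizeof(LumelType);
}
```

For a 1024x1024 plate, plateBytes<FloatLumel> gives 12,582,912 bytes against 1,048,576 for plateBytes<ByteLumel>. The figures quoted above are a little higher, presumably because of other per-plate overhead. The trade-off in this sketch is precision: any single contribution smaller than 1/255 is truncated away because the fraction is dropped on every add.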

Happy with my saving on the system memory front, I turned my attention to activating the multi-core feature of the present lightmapper. I tried a while back but for some reason it would not play ball so I stuck with the working single threaded approach. Now that I am doing test bakes of large levels such as 'The Escape', I cannot afford to sit around waiting 30 minutes for a crash, so I need to speed up this process to get more done.

Of course I had to turn my attention BACK to the LUMEL optimization when I found out that the same data structure was being used to store position and normal vectors in the former float members. This prompted the creation of a new LUMEL LITE data structure to separate the texture pixel work from the hijacked vector code.  By this time it was 3:43PM and daylight was running out, but it was good to see the lightmapper perform in an identical way except for the drastically lower memory use and, for some reason, slightly better performance.


After another 30 minutes, I was able to confirm that multi-core does indeed work a charm, as can be seen with this processor view showing 100% concurrency!
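For anyone curious what spreading a bake over all cores can look like, here is a rough sketch and not the lightmapper's own code. It assumes the work can be split into bands of rows that never overlap, so no locking is needed. The names bakePlate and ShadeFn are hypothetical, and shadePixel stands in for the real per-lumel lighting calculation, which must be safe to call from several threads at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for the real lighting calculation of one lightmap pixel.
using ShadeFn = std::function<std::uint8_t(int x, int y)>;

// Bake a width x height plate, giving each worker thread its own band of rows.
std::vector<std::uint8_t> bakePlate(int width, int height, const ShadeFn& shadePixel,
                                    unsigned requestedWorkers = std::thread::hardware_concurrency())
{
    std::vector<std::uint8_t> plate(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // hardware_concurrency() may return 0, so always use at least one worker
    int workerCount = std::max(1, std::min(static_cast<int>(requestedWorkers), height));
    int rowsPerWorker = (height + workerCount - 1) / workerCount;

    std::vector<std::thread> workers;
    for (int firstRow = 0; firstRow < height; firstRow += rowsPerWorker) {
        int lastRow = std::min(height, firstRow + rowsPerWorker);
        // Each thread writes only rows firstRow..lastRow-1, so bands never collide
        workers.emplace_back([&plate, &shadePixel, width, firstRow, lastRow] {
            for (int y = firstRow; y < lastRow; ++y)
                for (int x = 0; x < width; ++x)
                    plate[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
        });
    }
    for (std::thread& worker : workers) worker.join();
    return plate;
}
```

Equal bands are the simplest split. If some rows are much more expensive than others, handing out rows from a shared counter would keep the cores busier, at the cost of a little more code.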


As baking The Escape level takes ages, only to be rewarded with a nice mystery crash, this will probably be my closing entry until I return a week from now.  It's rather fitting that my departure to attend IDF launches with my engine using all eight cores on my PC, something Intel like to see.

I won't be demonstrating much at the IDF show this year, just my ability to drink beer and talk nonsense, for which I am overqualified.  Until my blog returns on the 15th September, you can tune into my twitter feed at @leebambertgc which I occasionally post on when I have a spare five minutes alone with my mobile.  Have a great weekend!!

Thursday 4 September 2014

How To Get GPU Video Memory 'In Use' In DirectX 9

It took two half days of research and experimentation, and finally a link from a Reloaded community member, to come up with the solution. You can use the code (below) in any DirectX 9 application to get the 'currently used' bytes of your graphics card, ideal for monitoring your resources, debugging and even taking pre-emptive action when GPU video memory starts to get scarce. When I added this feature to the log report, and ran a simple 'gun and zombie' test, I saw this section of entries:

12748764 : gun 18:modern\colt1911\gunspec.txt                   S:0MB   V:0MB (123)     
12748771 : gun 19:modern\Magnum357\gunspec.txt                  S:0MB   V:0MB (123)     
12748778 : gun 20:modern\RPG\gunspec.txt                        S:0MB   V:0MB (123)     
12748785 : gun 21:modern\Shotgun\gunspec.txt                    S:0MB   V:0MB (123)     
12748792 : gun 22:modern\SniperM700\gunspec.txt                 S:0MB   V:0MB (123)     
12748799 : gun 23:modern\Uzi\gunspec.txt                        S:0MB   V:0MB (123)     
12748806 : total guns=23                                        S:0MB   V:0MB (123)     
12751435 : Load player config                                   S:1MB   V:109MB (232)   
12751453 : LOADING ENTITIES DATA                                S:0MB   V:64MB (296)    
12751481 : Loaded 1:_markers\player start.fpe                   S:0MB   V:-4MB (292)    
12752030 : Loaded 2:\Characters\zombies\Zombie Crawler.fpe      S:3MB   V:0MB (292)     
12752046 : LOADING WAYPOINTS DATA                               S:0MB   V:4MB (296)     

12752065 : LOADING TERRAIN DATA                                 S:0MB   V:0MB (296)     

As you can see, thanks to being able to link video memory usage with stages in the engine resource process, I notice there is something going on in 'Load Player Config' which is taking 109MB of video memory, and it will be interesting to discover what that might be. I write this blog in the afternoon so it could be I have saved mucho memory by the time you are reading this. Just wanted to get this documented and out into the world to emphasise the usefulness of monitoring video memory on the fly!
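Log lines like the ones above can be produced with very little code. The sketch below is an illustration under assumptions, not the engine's logger: it reads S and V as the change in system and video memory since the previous entry and the bracketed number as the video total, which is how the sample log appears to behave. MemoryStageLog and its two reader callbacks are made-up names, and the timestamp column is left out.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <utility>

// Formats one log entry per engine stage, showing how much system (S) and
// video (V) memory changed since the previous entry, plus the video total.
class MemoryStageLog {
public:
    MemoryStageLog(std::function<int()> readSystemMB, std::function<int()> readVideoMB)
        : readSystemMB_(std::move(readSystemMB)),
          readVideoMB_(std::move(readVideoMB)),
          lastSystemMB_(readSystemMB_()),
          lastVideoMB_(readVideoMB_()) {}

    std::string entry(const std::string& stage) {
        int systemMB = readSystemMB_();
        int videoMB = readVideoMB_();
        char line[256];
        std::snprintf(line, sizeof(line), "%-55s S:%dMB   V:%dMB (%d)",
                      stage.c_str(), systemMB - lastSystemMB_,
                      videoMB - lastVideoMB_, videoMB);
        lastSystemMB_ = systemMB;   // the next entry reports the change from here
        lastVideoMB_ = videoMB;
        return line;
    }

private:
    std::function<int()> readSystemMB_;
    std::function<int()> readVideoMB_;
    int lastSystemMB_;
    int lastVideoMB_;
};
```

In the engine, the video reader would be the DMEMAvailable function listed at the end of this post. Passing the readers in as callbacks keeps the logger testable with fake numbers.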

Also managed to crunch two bugs, one zombie related and one light map related, with more tweaks, twists and turns to follow. As yesterday's blog was image deprived, I have created a quick level with some of the new modern day assets that have been added to the library.


As an aside, this scene without terrain runs on my machine at about 190 fps, a substantial step up from when I first installed my GeForce 9600 GT card :)

REQUEST: As the community has been such a sterling help getting to the bottom of the video memory read issue, I wanted to put another 'home work' task out there. I am looking for a good DirectX 9 shader technique for very fast but realistic water that does NOT rely on reflection or refraction. It is often used to render completely opaque water, but the ripples and light reflections make it look the bomb!  If you can send me shots, links to code, etc., that would certainly help get that ball rolling.

Before I share the code, just wanted to provide an update that the next thing on my list for Friday is what is called wrap-up, which means preparing internal builds, finishing off and cleaning code, backing up and generally cleaning my desk. I fly out to San Francisco on Monday for a week, so I will need a tidy office and work-plate on my return from drinking all that Guinness.  For my immediate future, I will dive back into GPU video memory analysis and find out precisely who is spending all my VMEM budget!


DIRECTX 9 CODE TO READ VIDEO MEMORY 'IN USE':

// Needs <windows.h>, <d3d9.h> and the D3DKMT_QUERYSTATISTICS declarations
// (found in the Windows Driver Kit header d3dkmthk.h).
// DARKSDK is assumed to be the engine's own export macro.
DARKSDK int DMEMAvailable(void)
{
    HANDLE ProcessHandle = GetCurrentProcess();
    LONGLONG dedicatedBytesUsed = 0;
    LONGLONG sharedBytesUsed = 0;
    LONGLONG committedBytesUsed = 0;
    PFND3DKMT_QUERYSTATISTICS queryD3DKMTStatistics = NULL;

    // D3DKMTQueryStatistics is looked up at run time from gdi32.dll
    HMODULE gdi32Handle = LoadLibrary(TEXT("gdi32.dll"));
    if (gdi32Handle)
        queryD3DKMTStatistics = (PFND3DKMT_QUERYSTATISTICS)GetProcAddress(gdi32Handle, "D3DKMTQueryStatistics");

    if (queryD3DKMTStatistics)
    {
        D3DKMT_QUERYSTATISTICS queryStatistics;
        IDirect3D9Ex* pDX = NULL;
        Direct3DCreate9Ex ( D3D_SDK_VERSION, &pDX );
        if ( pDX )
        {
            // per-process statistics for adapter 0
            memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
            queryStatistics.Type = D3DKMT_QUERYSTATISTICS_PROCESS;
            pDX->GetAdapterLUID(0,&queryStatistics.AdapterLuid);
            queryStatistics.hProcess = ProcessHandle;
            if (queryD3DKMTStatistics(&queryStatistics)==0)
            {
                committedBytesUsed = queryStatistics.QueryResult.ProcessInformation.SystemMemory.BytesAllocated;
            }

            // ask the adapter how many memory segments it has
            memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
            queryStatistics.Type = D3DKMT_QUERYSTATISTICS_ADAPTER;
            pDX->GetAdapterLUID(0,&queryStatistics.AdapterLuid);
            if (queryD3DKMTStatistics(&queryStatistics)==0)
            {
                ULONG i;
                ULONG segmentCount = queryStatistics.QueryResult.AdapterInformation.NbSegments;
                for (i = 0; i < segmentCount; i++)
                {
                    // find out whether this segment is an aperture (shared) segment
                    memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
                    queryStatistics.Type = D3DKMT_QUERYSTATISTICS_SEGMENT;
                    pDX->GetAdapterLUID(0,&queryStatistics.AdapterLuid);
                    queryStatistics.QuerySegment.SegmentId = i;
                    if (queryD3DKMTStatistics(&queryStatistics)==0)
                    {
                        // Windows 7 (Windows 8 and above is aperture = queryStatistics.QueryResult.SegmentInformation.Aperture;)
                        bool aperture = queryStatistics.QueryResult.SegmentInformationV1.Aperture;

                        // bytes this process has committed in the segment
                        memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
                        queryStatistics.Type = D3DKMT_QUERYSTATISTICS_PROCESS_SEGMENT;
                        pDX->GetAdapterLUID(0,&queryStatistics.AdapterLuid);
                        queryStatistics.hProcess = ProcessHandle;
                        queryStatistics.QueryProcessSegment.SegmentId = i;
                        if (queryD3DKMTStatistics(&queryStatistics)==0)
                        {
                            if (aperture)
                                sharedBytesUsed += queryStatistics.QueryResult.ProcessSegmentInformation.BytesCommitted;
                            else
                                dedicatedBytesUsed += queryStatistics.QueryResult.ProcessSegmentInformation.BytesCommitted;
                        }
                    }
                }
            }

            // free DX9Ex when done
            pDX->Release();
        }
    }

    // free GDI DLL (only if it was actually loaded)
    if (gdi32Handle)
        FreeLibrary(gdi32Handle);

    // Pass dedicated memory used (in MB) back to DBP
    int Memory = (int)(dedicatedBytesUsed / 1024 / 1024);
    return Memory;
}



Wednesday 3 September 2014

Delve Into Video Memory

I must have spent at least four hours hunting for a simple way to read the current used video memory from my system. I must have tried everything at least once: using the legacy DirectX API, creating an OpenGL context to grab it that way, tapping into the WINAPI, using DxDiag, using DXGI and everything in between. You would think this relatively common requirement would be a simple DirectX command, but no, it's wrapped in mystery and riddled with dead-ends. After completely failing in this, I decided on a different approach and ran a GPU-Z window alongside the loading process and watched for video memory spikes that way. Can't waste days writing a robust video memory monitor when I have game engines to write.  If anyone has some 'working' code in C++ that can detect the 'remaining' (not total) amount of video memory on NVIDIA and AMD machines, please do post.

On proceeding the old fashioned way, I did uncover one crucial fact, which led to a 400MB+ saving on one of my standalone levels. A cheeky claim, as it turns out that when you encrypt media, the engine does not remember it was loaded, and subsequently loads it in again for any other entity that needs the reference. This would include entities of the same type, and more crucially light map textures. My run this morning crashed out at 1GB of video memory, while my new test loaded, ran and leveled off at 805MB of video memory (673MB when I divide all textures by 32), which included a large uncompressed Church texture which gobbled 128MB all by itself :)  This brief journey puts me in a position where I can load in a lightmapped standalone level now, which is great!
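The usual cure for this kind of duplicate loading is a cache keyed on the media's original path, so a second request for the same file hands back the texture already in memory. The sketch below shows the general technique under my own assumptions rather than the fix that went into the engine. Texture, TextureCache and the Loader callback are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a loaded GPU texture.
struct Texture {
    std::string sourcePath;
    std::size_t bytes;
};

// Remembers every texture by the path it was requested with, so encrypted
// and plain media alike are only loaded once however many entities share them.
class TextureCache {
public:
    using Loader = std::function<std::shared_ptr<Texture>(const std::string&)>;

    explicit TextureCache(Loader loader) : loader_(std::move(loader)) {}

    std::shared_ptr<Texture> get(const std::string& path) {
        auto found = cache_.find(path);
        if (found != cache_.end())
            return found->second;        // already loaded: share it
        std::shared_ptr<Texture> texture = loader_(path);  // decrypt and load once
        cache_.emplace(path, texture);
        return texture;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::string, std::shared_ptr<Texture>> cache_;
};
```

The important detail is that the key is the path the entity asked for, not the name of any temporary decrypted file, otherwise each decrypt would look like new media and be loaded again.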

There are still many targets for video memory savings here, and each site needs close study to get the best use of available budget. For now I will finish off by making some new entity asset conversions and then resume Thursday with more winning video saving antics.  Sorry for the lack of images, some days are nothing but words and deeds ;)

Tuesday 2 September 2014

Zombies and Light

A two phase day today, with the AM involving a tweak to allow my F9 Entity Placement Mode to work (for my light mapping tests), and then a fix in the animation engine to allow the new Zombie assets we are using to work, and then in the PM run up to 4PM returning to Ambient Occlusion mapping so that my lightmap files can be saved, loaded and used to create standalone games.


As my original one hour Zombie side-task turned into three hours, I decided to commemorate the work with a small video, and a possible hint on our plan to improve the Zombie Pack for Reloaded.


As I increase the number and variety of entities being tested under the light mapper it's great to see old entities gain new life with the extra lighting fidelity. In the Church model you can now see shadows being cast inside the bell tower, the parapet details and even the subtle bas-relief, bringing out those geometry touches that were previously hidden.

The next chunk of work after light map file saving is GPU Memory exploration, specifically to find out what is being used, the proportions and any waste happening. I can then budget in my requirement for decent light mapping texture space and at the same time tackle the issue of Reloaded crashing out due to lack of video memory.  I consider 1GB of video plenty for what we are doing right now, so there should be no excuses why we are getting anywhere near that number. Will be a fun adventure down the rabbit hole that one!

Monday 1 September 2014

AO and Performance

Hit the ground running with more Ambient Occlusion work (as though I never left) with further progress so that light mapping can now be triggered from inside the Test Game and the calculations retained across multiple test game sessions.  I am still keeping the integrated prototype close for quick lightmapping tests, but it's great to finally be in the main engine.


As you can see, here is a typical scene comprising four large walls and some interior structures. I then press F1 in Test Game, wait a little while, and the static entities are replaced with my new static light map objects.


What you see is a combination of ambient occlusion and a single directional light mapping using a pre-bake of the static entities.  The surfaces still don't have normals and specular, but you can see the basic light vs dark are covered and the depth really comes through.

The light mapper also makes other assumptions, such as deleting LOD1 and LOD2 meshes from the process to reduce light map texture usage overall, betting on the fact the reduced shader requirement will allow the higher geometry rendering to remain.

In other small tests, I also wanted to double check the engine was capable of reaching 700 fps (as requested by Simon) on my crappy gfx card, and sure enough I was able to hit this target:


You might notice I have switched off a few things, but then who wants to run a game at 700 fps :)  Given this theoretical performance availability, the trick now is to ensure the re-introduced elements do not drain more than their 'fair share' of system resources so we can achieve high frame rates moving forward.  Terrain is a huge drain at present, and with the additional requirement to apply the AO shadows to it, there might be a way of solving both issues with some kind of run-time replacement for the current dynamic and versatile terrain system (a sort of pre-bake for terrain).

My next tranche of work involves being able to save and load the extra light map object data, transfer those files when making standalone and the ultimate test, pre-baking a large level such as The Escape demo.  I think this will lead me naturally onto memory reduction and tasks in that area, so I want to make sure all the other light mapping basics are covered before I get to that point.

Not bad for Monday, let's hope the rest of the week is as productive!  For those interested, I will be making my transatlantic journey to San Francisco next week for the annual Intel Developer Forum 2014.  If you would like to meet up and share a beer or two, I will be tweeting my locations each day and generally enjoying a brief break before the final charge up to Christmas.