Thursday 20 June 2013

Thursday Blue Crash No 2

Darn Hardware!

A successful day, and then disaster.  My plan was to copy a working proto into the engine and then slowly migrate it over; it worked just fine, and I was able to see shadows quite quickly in the main engine. It turns out that if you use the PRINT or CENTRE TEXT commands, it royally messes up the pixel shader and who knows what else.

I then managed to move all the proto code out and have my regular Instance Stamp geometry textured and lit by the new shader, complete with shadows. The only remaining artefact is that the first one or two cascades are using corrupt depth buffers for some reason, but there is a pattern in the corruption which will give me a clue as to what is putting it there and why.

Stop The Press

All this wonderful developer stuff ground to a halt when I experienced my second Blue Screen Of Despair. After three attempts at a reboot I finally used Safe Mode to inspect my most important files and, of course, Sod's Law struck again: the crash had completely wiped out the main shadow mapping source file, not the header this time but the main file with all the clever stuff in.


To make matters worse, the PC had time to trigger the automated backup system which promptly copied the corrupt (empty) key file to the remote network backup, erasing my reserve copy.  My local copy was way out of date (four days old) and I was pretty much up the creek.
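The failure above (an empty file overwriting the only good reserve copy) is the sort of thing a backup step can guard against. Below is a minimal Python sketch, not the backup system described in this post: the function name `safe_backup`, the numbered-version naming scheme and the `keep` parameter are all assumptions made for illustration. It refuses to back up a zero-byte file and writes each backup as a new numbered version instead of overwriting the previous one.

```python
import shutil
from pathlib import Path
from typing import Optional


def safe_backup(source: Path, backup_dir: Path, keep: int = 5) -> Optional[Path]:
    """Copy `source` into `backup_dir` as a new numbered version.

    Hypothetical helper for illustration. Returns the path of the new
    backup, or None if the copy was refused. Assumes keep >= 1.
    """
    # Refuse a missing or zero-byte file: straight after a crash, an empty
    # source file is more likely to be corruption than a deliberate edit.
    if not source.is_file() or source.stat().st_size == 0:
        return None

    backup_dir.mkdir(parents=True, exist_ok=True)

    # Existing versions are named "<file name>.0001", "<file name>.0002", ...
    prefix = source.name + "."
    versions = sorted(
        (p for p in backup_dir.iterdir()
         if p.name.startswith(prefix) and p.name[len(prefix):].isdigit()),
        key=lambda p: int(p.name[len(prefix):]),
    )
    next_number = int(versions[-1].name[len(prefix):]) + 1 if versions else 1

    # Write a brand new file, so an earlier good copy is never overwritten.
    target = backup_dir / f"{prefix}{next_number:04d}"
    shutil.copy2(source, target)

    # Prune only the oldest versions beyond the newest `keep` copies.
    all_versions = versions + [target]
    excess = max(len(all_versions) - keep, 0)
    for old in all_versions[:excess]:
        old.unlink()
    return target
```

With something like this in place, a wiped source file makes the backup step return None and leave the existing copies alone, rather than faithfully mirroring the damage.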

After over two decades of being paranoid about backups, I had a backup plan (pun intended): my Git repository, which I usually refresh each evening as a matter of course (and to get used to this popular rival to SVN). Lo and behold, a two-day-old copy of the missing file was intact and sitting there waiting to bring me happiness. Granted, the repository had also managed to take the corrupt file somehow (maybe a panic sync after the horse had bolted), but both Git and SVN have the rather cool feature of recording every version of the file you commit, so I was once again saved.
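For anyone who has not had to do this kind of rescue, here is a rough Python sketch of the idea, assuming Git is installed and on the PATH. It is not the exact procedure used here; the helper names `run_git` and `last_good_version` are made up for illustration. It walks the file's history from newest to oldest using the standard `git log` and `git show` commands and returns the first version that is not empty, skipping a corrupt empty commit like the one described above.

```python
import subprocess
from typing import Optional, Tuple


def run_git(repo: str, *args: str) -> bytes:
    """Run a git command inside `repo` and return its raw standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True
    )
    return result.stdout


def last_good_version(repo: str, path: str) -> Optional[Tuple[str, bytes]]:
    """Return (commit hash, file contents) for the newest non-empty version.

    `path` is relative to the repository root. Returns None if no commit
    holds a non-empty copy of the file.
    """
    # "git log --format=%H -- <path>" lists the commits touching the file,
    # newest first.
    hashes = run_git(repo, "log", "--format=%H", "--", path).decode().split()
    for commit in hashes:
        # "git show <commit>:<path>" prints the file as stored in that commit.
        shown = subprocess.run(
            ["git", "-C", repo, "show", f"{commit}:{path}"],
            capture_output=True,
        )
        # Skip commits where the file is absent, empty or whitespace only.
        if shown.returncode == 0 and shown.stdout.strip():
            return commit, shown.stdout
    return None
```

The returned bytes can then be written back over the wiped file; the same information is available by hand with `git log` followed by `git show` on the chosen commit.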

Emergency Protocol

What adventures we are having, and no mistake!  I instantly made the decision to move all my Reloaded development files and work to my old machine, which I strategically kept to one side and used daily for things like writing blogs, answering emails and accessing older projects I did not want on the new 'super' machine.  It only has 400MB left on drive C, so it looks like my evening will be spent freeing some drive space, moving over some files, ensuring everything compiles again and of course setting up a triple redundant backup scheme for it all.

At the same time (though probably next week now) I need to investigate why my new monster machine is unstable.  The clues are that both BSOD crashes seemed to be memory related and that on the first event it had to repair sectors of Drive C.  This narrows the possible culprits to a dodgy SSD, dodgy memory or an unstable over-clock.  As the machine was running fine for over 6 months, I am inclined to suspect the SSD as the villain of the piece. Unfortunately I am in a self-imposed 'no more hardware' mode, so a new SSD is out of the question as they are a few hundred quid for a decent one.  I will probably steal a secondary SSD and reformat the whole machine (which means at least one or two days of reinstalling all the darn software too).  Before any of that, I am going to do all the software-only stuff first, like running a full virus scan, a deep disc scan and any free stress tests I can find to put the PC through. The perfect solution is that I find the exact cause, swap out the dodgy part and carry on, though just like a stuttering car, once you are left stranded on the motorway you find it very hard to get the confidence back.

Back On Track From Friday

The priority must be, and is, the Reloaded development schedule, so I will finish my restoration tonight and be ready to continue Reloaded development on Friday afternoon.  These bumps in the road are inevitable when you are a developer, and it is how you respond to them that separates the experienced guys who do it for a living from the guys who kick and scream at the world for being so cruel.  The kicking and screaming is absolutely essential however, as it's the only way to learn the harsher lessons development throws at you.

Art-vine

I have heard a whisper that some artwork is being massaged into life this week to complement our new parallax & shadow shader. I have a few objectives for a demo I am producing for a Monday meeting, so if you are short on time I highly recommend checking out the Blog on Monday/Tuesday for some visual candy.

Signing Off

I'm sure certain parties will conclude that because I built the PC in the first place, the BSOD and wasted half-day are entirely my fault. Sure enough, I make a point of buying dodgy hardware from time to time to keep me on my toes and provide exciting material for my blog ;)  In fact, I'm already planning my next massive system failure and mass deletion of a few weeks' work!

7 comments:

  1. Sounds like you have a hard drive failure about to ruin your day/s

  2. Wow, a BSOD. I don't think I've seen one of those since XP days. Oh wait, I take that back, a couple years ago I had an external sound card (of all things) cause a BSOD. It's so unnerving when it happens.

  3. It gives me an opportunity to play Mr Holmes :) As it reported both IDEPORT0 and IDEPORT1 in the Event Viewer, I am ruling out a failure specific to the SSD, and I've run two memory tests which passed. My Marvell SATA controller driver was from 2011, so I've updated to the latest from the motherboard manufacturer's website and am now hunting for some disc stress test software to throw at it overnight. I'm still moving my dev kit back to 'old faithful', but I have set it up so I can work on both machines and use a common SVN for all files. Necessity is indeed the mother of invention (or in this case protection)!

  4. I recently went through my own BSOD ordeal, where it'd always happen at the most inopportune time. The problem pointer in mine was Memory_Management, which I successfully traced down to a single bad stick of RAM. The PC worked fine for a year but then that one stick went bad.

    All of the parts you bought for your new PC are likely still under manufacturer warranty. If so, you should really track the fault down and send in an RMA for them to replace the part; at most you'd have to pay to ship yours back. My RAM was under lifetime warranty and they replaced it right away.

    Unlike you, I am still terrible at keeping backups and generally have none at all. You'd figure I would have learned after a hard drive crash in '03 caused me to permanently lose essentially every piece of data I ever created. I'm glad you were able to get your file(s) back!

  5. Reloaded, derailed!

    It must be the impending meeting on Monday causing it all!

  6. Well, between moving the dev kit over, backing up two machines and sorting out a new super SVN to keep the Reloaded source safer, I've accidentally burned through the borders of Thursday into Friday territory. Grr. I am going to leave PC-A on a stress test (as it backs up too - the ultimate test) and finish off the SVN upload manually (an SSL bug in OpenSSL means I have to upload in 30MB chunks, double grr). Then I can go. I'm sure the Universe is winding me up!

  7. Haha Rick, you always sound like you're smiling but hissing through your teeth, "Come on, Lee, we've got a lot of expectant backers waiting for you to finish...!!" ;)
