Weird issue resulting in a corrupt application

August 23, 2012 – 6:07 am

I had a very bad start to this week. Last Friday, I made some changes to our CRM project. Most of them were modifications to a variable resolver: I removed redundant code by moving parts of it into methods within the resolver. I also added two new methods to an existing bean. None of this is rocket science. I then changed home.xsp to use the new methods from the bean.

When I opened the application in the browser, everything looked fine. I refreshed the page and … BOOOOOOOMMMMMMMMM.

The last thing I could see on the server console was:

err_protected> ERR_PROTECTED ERROR: D:\Programme\IBM\Lotus\Domino\data [nsfsem2.c/ln 1684]
err_protected> ERR_PROTECTED ERROR: D:\Programme\IBM\Lotus\Domino\data [nsfsem2.c/ln 1684]

Then the server crashed. I restarted the server and tried again, with the same result: the application could be opened, but the server crashed on the next action. To be sure, I removed my modifications, but the server still crashed. I tried everything: compact, fixup, replacing the design with the design of a working copy, and even creating a new copy of the application. Nothing helped; the error persisted.

The strange thing is that I could (and still can) open the application in question in Designer, make changes, and add, update, and delete data documents. The crash only occurs when working in the browser.

I then replaced the corrupt file with the working copy, and everything worked as expected. We made some other modifications to the design, and the application stayed stable. Before driving home, I made a copy and put it on my Domino server at home. As expected, everything was fine.

I then implemented the same modifications that had caused the file to become corrupt. And yes, I was able to reproduce the issue.

Phil Riand asked me to send him any crash reports and the code that caused the error. I sent him an email yesterday morning, and during the night Phil replied. Here is what he wrote:

Hey Ulrich,

…. The issue you mention is definitively down in the NSF layer, in the semaphore management code. So I forwarded that to the appropriate team internally, and hope to get feedback soon. The issue might not even happen because of your database, and probably not because of your changes.

This is both good news and bad news. Good news, because something really is wrong and it is reproducible. Bad news, because if the cause lies neither in the database nor in my changes, it can happen again at any time. So I hope that IBM will find the cause and fix it.
