OBriens tower
Musings on software development, Linux and business

Feeling better after a good stack dump

by Robert Fuller

Earlier this week we encountered performance problems on one of the production systems we developed and help support. After we had migrated several hundred clients onto the high-throughput, Java-based system, users began to notice a strange slowness.

Log files are great. By studying the production logs we determined that the output side of the application was no longer keeping up with the input. We had already developed some performance enhancements for a future release, so we backported some of these into a patch, tested it, and deployed it to the production system. The system was fast again.

Or so we thought. In fact it was much faster for two days, until the strange slowness suddenly reappeared. The logs showed that things had slowed down, but gave no indication why. I generated a stack dump (Java on Linux) using kill -3, which makes the JVM print a dump of all its threads to standard output. I then created a three-column spreadsheet and, skipping idle threads belonging to the web application container, added one row for each application thread. The columns are:

  1. Thread name
  2. Kind of thread (input, output, etc.)
  3. What the thread is doing
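The tally above was done by hand from a kill -3 dump. As a rough sketch only (assuming Java 8 or later; the class name BlockedThreadSummary is hypothetical), the standard java.lang.management API can produce a similar one-row-per-thread listing from inside the JVM and count how many threads are BLOCKED on each monitor. It cannot fill in the "kind of thread" column, which still needs knowledge of the application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: an in-process approximation of the manual stack dump tally.
public class BlockedThreadSummary {

    // Counts BLOCKED threads, grouped by the name of the monitor they are waiting for.
    public static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // (false, false): we only need thread state and lock name, not held monitors/synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip any thread that has already exited
            }
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One row per thread: name, state and the lock it is waiting on (if any).
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            System.out.printf("%-40s %-14s %s%n",
                    info.getThreadName(),
                    info.getThreadState(),
                    info.getLockName() == null ? "" : info.getLockName());
        }
        // Summary: a large count against a single lock points at a contention hot spot.
        blockedByLock().forEach((lock, n) ->
                System.out.println(n + " thread(s) blocked on " + lock));
    }
}
```

A high count against one lock is the same signal that emerged from the spreadsheet: many threads queuing for a single monitor.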

This took a little time, as the application has more than 100 threads, but as I worked through them a pattern began to emerge… many threads were waiting to lock a statically synchronized method of a date parsing utility component. I investigated and found that the method had been statically synchronized because it relies on java.text.SimpleDateFormat, a class which is not synchronized (so a single instance must not be used by several threads at once) and is relatively expensive to create.
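A minimal, hypothetical reconstruction of the kind of utility described above shows why it forces every caller to take turns. The class name DateParseUtil and the date pattern are assumptions for illustration, not the actual production code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical reconstruction of the contended pattern: one shared formatter
// guarded by a static synchronized method.
public final class DateParseUtil {
    // SimpleDateFormat holds mutable internal state, so this shared instance
    // must never be used by two threads at the same time.
    private static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    private DateParseUtil() {}

    // "static synchronized" locks on DateParseUtil.class, so every caller in every
    // thread queues for one monitor. Under load, this is what appears in a stack
    // dump as many threads BLOCKED waiting for the same lock.
    public static synchronized Date parse(String text) throws ParseException {
        return FORMAT.parse(text);
    }
}
```

The method is correct, but with one class-level monitor shared by all threads, throughput is limited to one parse at a time.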

Studying what others have written about this problem, the development team is now reworking the implementation to use ThreadLocal instances of the SimpleDateFormat rather than statically shared instances. The stack dump was very useful in helping to find the blockage. I hope the fix resolves the problem!
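A minimal sketch of the ThreadLocal approach follows. It assumes Java 8 or later (for ThreadLocal.withInitial), and the class name ThreadLocalDateParser and the pattern are hypothetical; the team's actual rework may differ.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch: one SimpleDateFormat per thread instead of one shared, locked instance.
public final class ThreadLocalDateParser {
    // Each thread lazily creates its own formatter on first use and then reuses it,
    // so no two threads ever touch the same instance and no lock is needed.
    private static final ThreadLocal<DateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    private ThreadLocalDateParser() {}

    // Not synchronized: callers in different threads can parse concurrently.
    public static Date parse(String text) throws ParseException {
        return FORMAT.get().parse(text);
    }
}
```

The formatter is constructed once per thread rather than once per call, which keeps the creation cost low while removing the shared lock. One thing to keep in mind in a web container is that pooled threads hold on to their ThreadLocal values for as long as each thread lives.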
