Round 4:Late Night Bugs

From Planetarion Wiki

Late night bugs

Nothing beats strolling home at the break of dawn after a very long day at work, knowing you've accomplished a great deal. Well, this was one of those days. Spinner and I monitored the system throughout the night, while Oreo played games in the back of the office.

I configured the webservers to spread the load across our machines better than before. I moved all the images to a separate server, which freed up at least some CPU power on every webserver. It seemed that thttpd couldn't keep up the pace when it was hit really hard. Or maybe it was no longer a good idea to run an image server on each machine in addition to the webservers. Either way, the system seems to perform better with one single server serving all the images.

Throughout the night I noticed that the database server becomes seriously overloaded once every hour. Our pattern for determining which planets participate in combat and which don't used to take around 20-30 seconds. Now that we've exceeded 150,000 planets, the same pattern suddenly takes 8-15 minutes! Amazing. So between the resource tick and the combat tick, the system is very slow, if not completely blocked. I will start working on a new pattern for this situation.

But the great event of the night was THE problem: the now infamous freeze-up during the night. And yes, dead on time, at 04:04 we experienced it again. And of course the problem turned out to be trivial to fix :-) It was related to the new galaxy-news functionality: around this time of night we started purging the old galaxy news entries. When you find problems like this, it's like 'Doh! Why didn't I think of that?!'. Anyway, I'm very glad this problem was found, dealt with and fixed.

Uh-oh, yes, yesterday we set a new all-time high in the number of displayed pages. We exceeded the magical 10 million page views per day. Outstanding. But then again, our webservers do have problems keeping up the pace. Also, Linux doesn't perform very well when a machine is running in excess of 2,000 processes, even though the CPU isn't maxed out.

I guess we must start experimenting with new webservers. Maybe test out the Apache 2 versions, which use a mix of process and thread pools rather than just processes waiting for execution. One thing is for sure: we definitely need another webserver or two. So keep clicking those banners!

There's a catch-22 situation between the ticker and the number of pages displayed: the more pages we display, the slower the ticker runs. Last night a single tick actually took 54 minutes during the worst period. However, I'll resolve the problem described above with the pattern used for finding planets-in-combat, and that will probably reduce the tick time by 10 minutes.

Vish