ColdFusion 8 Monitoring Heisenberg Errors

I ran into my first inexplicable crash that I eventually traced back to the ColdFusion Server Monitor. First off, this isn’t a problem or bug with the Server Monitor; it is to be expected. The Server Monitor adds overhead to requests, and if you have an intense process, it’s going to generate a lot of monitoring data. It’s possible to reach its limit.

I just wanted to let people know what a crash caused by the monitoring service looks like, because it doesn’t give you a message that “You have left the monitoring service on in production!”

I had a long-running, complicated process crashing on my local workstation. It did work on our communal development server, so it wasn’t just the process itself. I thought maybe it was that my laptop wasn’t a server-class machine, but the virtual machine that we were testing on wasn’t tremendously more powerful.

The browser session would error out with a message that said:

500

Java heap space

java.lang.OutOfMemoryError: Java heap space

After digging in the JRun logs for a while, I found this:

javax.servlet.ServletException: ROOT CAUSE:

java.lang.OutOfMemoryError: Java heap space

at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:70)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:284)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

java.lang.OutOfMemoryError: GC overhead limit exceeded

Of course, I didn’t bother actually reading this error until just now, when I copied and pasted it. It clearly indicates that the problem is in the Monitoring Servlet Filter. In any case, after much trial and error, I turned off memory tracking and then turned off profiling. Once I turned off profiling, the error went away.
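
One caveat when reading a trace like this: a filter that catches an error, wraps it, and rethrows it will show up as the top frame of the wrapper even when the error started somewhere else. The sketch below illustrates that general Java behavior. It is not ColdFusion's code. `PerimeterSketch`, `guard`, `WrappedException`, and `simulatedApplicationCode` are hypothetical names, and the `OutOfMemoryError` is thrown by hand rather than by actually exhausting the heap.

```java
public class PerimeterSketch {

    /** Hypothetical stand-in for a container-level wrapper exception. */
    public static class WrappedException extends Exception {
        public WrappedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the work and rethrows any failure wrapped in a new exception.
     * The wrapper is constructed here, so this method becomes the top frame
     * of the wrapper's stack trace, wherever the failure really began.
     */
    public static void guard(Runnable work) throws WrappedException {
        try {
            work.run();
        } catch (Throwable t) {
            throw new WrappedException("ROOT CAUSE: " + t, t);
        }
    }

    /** Follows getCause() links down to the innermost throwable. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Simulated application failure; no memory is actually exhausted. */
    public static void simulatedApplicationCode() {
        throw new OutOfMemoryError("Java heap space (simulated)");
    }

    public static void main(String[] args) {
        try {
            guard(PerimeterSketch::simulatedApplicationCode);
        } catch (WrappedException e) {
            // Where the wrapper was built (the "perimeter").
            System.out.println("wrapper created in:   "
                    + e.getStackTrace()[0].getMethodName());
            // Where the underlying error was actually thrown.
            Throwable root = rootCause(e);
            System.out.println("root cause type:      " + root.getClass().getName());
            System.out.println("root cause thrown in: "
                    + root.getStackTrace()[0].getMethodName());
        }
    }
}
```

In the sketch, the wrapper's trace points at `guard`, while the cause's own trace points at the code that failed. When a log entry is headed by a wrapping filter, the nested cause entries are the place to look for where the error originated.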

10 thoughts on “ColdFusion 8 Monitoring Heisenberg Errors”

  1. The fact that the OutOfMemoryError is thrown from the MonitoringServletFilter does not mean that monitoring is the root cause. The MonitoringServletFilter is the “perimeter” of the monitoring system – when exceptions are thrown from within CF, they’re caught there, logged by monitoring, and rethrown. I would suggest you trace down the logs some more, and you’ll probably find an entry beneath the one for this exception indicating the root cause exception. And do keep in mind that OutOfMemoryErrors occur when, well, the JVM is out of memory – is there any possibility that your application is creating objects, and not throwing them away, eating all the JVM memory? Also, as we’ve noted before, do not run production systems with Memory Tracking on – that can quickly bring a server to its knees. If neither of these is a potential root cause, do drop me a mail with more details, and we’ll look into it ASAP.

  2. Actually, Ashwin, creating many, many objects and holding them over the course of one request was EXACTLY what I was doing. But with profiling and memory monitoring turned off, I was giving myself more rope?

    In any case, my goal here wasn’t to snipe at CF monitoring. It was to point out what it looks like if you’re doing something crazy that pushes monitoring to the point where it breaks.

  3. Yep, definitely plenty of rope there! 😉 Going by our testing, profiling is safe to use in production, but as I noted, memory tracking could kill a server, especially if it’s creating too many objects. Try your test with memory tracking turned off, and let us know what happens. I didn’t at all mean to suggest that you were sniping at CF monitoring – just providing the background so you know why the stacktrace for the error looks the way it does.

  4. How exactly do you turn off memory tracking and profiling? I have this exact problem on a clustered pair of 2x servers with 8GB of RAM each 😦

  5. In the CF administrator:
    Go to Server Monitoring
    Launch Server Monitor
    Up at the top there should be 3 options that say Stop Monitoring, Stop Profiling, Stop Memory Tracking.
    Turn them off.

    However, these are turned off by default, so if you have never turned them on, this error could be caused by something else, per Ashwin’s comment earlier in this thread.

  6. Thanks for posting this. I have now found warnings to this effect buried in the user documentation, but it seems to me incumbent on Adobe to post this warning in big red letters on the Server Monitor screen so it’s clear to everyone that it should not be kept running on a production server. At CFUnited in June, there were a lot of Adobe people generating excitement about the Server Monitor, but no mention of its dangers. Of what use, exactly, is the Server Monitor if it can’t run on production? This is a blow to my confidence in Adobe products.

  7. Well, there are a lot of things you can do in the CF administrator to really screw up the server. None of them get the same treatment. I think that Adobe acts responsibly here in that they don’t install CF with the monitoring running.

    I do think mention of these dangers should be included in future documentation and guides to setting up ColdFusion, but in reality the load burden of monitoring only comes up on heavily trafficked sites or, as in the case I discuss above, on very complex sites.

  8. We had a similar problem with Fusebox applications on our servers. We finally did several thread dumps and determined that there were locking issues. Turning off all server monitoring functions cleared the problems instantly.

  9. If you have an object that has a lot of objects created in its variables scope, you may want to try this.

    After you are done with that object, clean up so it can be garbage collected, for example:

    structDelete( variables, "objOrder" );

    This, combined with turning off the monitoring as listed above, solved our problem.
