Tuesday, August 30, 2011

XBee Based Datacenter Temperature Monitoring Network

Temperature monitoring and control is crucial for any datacenter. I work for a hosting company that has several thousand servers running, producing enormous amounts of heat. To battle the BTU output of these servers you must have hundreds of tons of cooling running all the time to keep everything under control. Monitoring the datacenter's temperature is important in more ways than you might think. In the event of a cooling failure, servers will begin to overheat extremely quickly, which in turn causes them to raise their internal fans to full speed to combat the heat. For one server this may not be so bad, but when 1000+ servers do it your total power consumption increases drastically. Circuits then draw more current, which heats them up and further raises the overall temperature in the facility. Server CPUs will then begin to be throttled by the BIOS to cool them down, which can hurt website performance. Datacenters can actually reach temperatures over 100 degrees Fahrenheit, high enough that circuit breakers begin to trip and large 150 kVA+ UPS battery backup systems can fail from the heat and the increased load they are seeing. Thermal expansion can also come into play, causing wire lugs to come loose on UPS inputs and transformers and creating further points of failure.

All of this is why reliable temperature monitoring is so crucial. Being able to monitor temperatures throughout a facility can keep you ahead of any potential failure. Commercial temperature monitoring systems are available, but they are extremely expensive and limited in where they can be placed. If you have a 10,000+ square foot facility, monitoring the temperature everywhere could be extremely expensive and difficult to implement. For all these reasons I wanted to design a system that was very inexpensive and easy to implement anywhere you needed it. Specifically, I wanted to monitor not only ambient air in both hot and cold aisles, but also temperatures inside circuit panels and PDUs. An increase in temperature within the circuit panel or PDU itself could indicate a potential failure well ahead of time, allowing it to be addressed before it causes a dangerous situation.

My solution was to build a simple temperature monitoring network suited to a datacenter environment. When beginning the design process I came up with the following requirements that it needed to meet:

1. Inexpensive. Commercial temperature monitoring systems are very expensive. I wanted my sensors to be inexpensive enough that you could place them in locations where cost would normally rule out a temperature sensor.
2. Wireless. Most existing temperature systems are wired, which is silly in my opinion given how inexpensive XBee based wireless modules are. Going wireless avoids having to run additional wires to each sensor and increases placement flexibility.
3. Easily expandable. I wanted to make sure that I could add sensors to the system at any time without hitting any limit on the number of sensors, within reason.
4. Reliable. Previous systems I used were dumb in that the sensors would sometimes have errors collecting data. This would in turn wake me up at 4am telling me the datacenter was at 150+ degrees Fahrenheit, then instantly back to 70 degrees. This was just annoying.

This is the design of the first three prototype sensors I made:




Each sensor is based on a PIC 18F25K20 microcontroller driven by a 16 MHz oscillator. An XBee module provides the wireless connectivity while a TI TMP100 I2C based temperature sensor monitors the current temperature. The TMP100 is one of my favorite chips: it is inexpensive, has 12-bit resolution, and has a wide temperature range. Now I know I could have easily made these much smaller by using all SMD based components. The PIC I chose is way overkill for the job anyway, but I had a large quantity of these in my parts bin left over from previous designs, so I used them as I hate having parts sit around unused.
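To show what that 12-bit resolution means in practice, here is a minimal sketch of turning the two bytes of the TMP100 temperature register into degrees. It is written in Python for readability rather than as PIC firmware, the function names are my own for illustration, and it assumes the sensor has been configured for 12-bit mode (in lower resolution modes the unused low bits simply read as zero).

```python
def tmp100_raw_to_celsius(msb: int, lsb: int) -> float:
    """Convert the two TMP100 temperature register bytes to degrees Celsius.

    The reading is a two's complement value, left-justified in 16 bits,
    with 0.0625 degrees C per count at 12-bit resolution.
    """
    raw = ((msb & 0xFF) << 8) | (lsb & 0xFF)
    if raw & 0x8000:          # sign bit set: negative temperature
        raw -= 0x10000
    counts = raw >> 4         # drop the four unused low bits
    return counts * 0.0625


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert Celsius to Fahrenheit for reporting."""
    return temp_c * 9.0 / 5.0 + 32.0


if __name__ == "__main__":
    temp_c = tmp100_raw_to_celsius(0x19, 0x00)
    print(temp_c, celsius_to_fahrenheit(temp_c))
```

Each count is 1/16 of a degree Celsius, which is far finer than anything needed to spot a cooling failure but handy for watching slow trends inside a panel or PDU.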

The software on the PIC simply reads the temperature from the TMP100 every 60 seconds and sends it along with a unique device ID to the XBee. The data it sends is also encrypted to prevent anyone from injecting rogue temperature data into my monitoring network. If someone really wanted to, they could probably crack the encryption over time. I guess if they just really want to wake me up at night I'll have to implement even stronger encryption. ;)
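As a rough illustration of what a device ID plus temperature message can look like, here is a sketch of a small fixed-size packet. The layout is purely hypothetical (2-byte device ID, 2-byte raw temperature, 1-byte additive checksum) and is not the actual format my sensors use, and the encryption step is left out entirely. It is in Python for readability; the names `build_packet` and `parse_packet` are made up for this example.

```python
import struct

# Hypothetical layout, big-endian:
#   uint16 device ID, int16 raw temperature register value, uint8 checksum
PACKET_FORMAT = ">HhB"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 5 bytes


def checksum(payload: bytes) -> int:
    """Simple additive checksum: sum of all payload bytes, modulo 256."""
    return sum(payload) & 0xFF


def build_packet(device_id: int, raw_temp: int) -> bytes:
    """Pack a device ID and raw temperature, then append the checksum."""
    payload = struct.pack(">Hh", device_id, raw_temp)
    return payload + bytes([checksum(payload)])


def parse_packet(packet: bytes) -> tuple:
    """Return (device_id, raw_temp), or raise ValueError on a bad packet."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("unexpected packet length")
    device_id, raw_temp, received = struct.unpack(PACKET_FORMAT, packet)
    if checksum(packet[:-1]) != received:
        raise ValueError("checksum mismatch")
    return device_id, raw_temp


if __name__ == "__main__":
    pkt = build_packet(7, 0x1900)
    print(pkt.hex(), parse_packet(pkt))
```

A checksum like this only catches accidental corruption. It does nothing against someone deliberately injecting data, which is the job of the encryption.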

The receiving end is PHP based. An XBee / MAX232 interface receives the data and sends it to /dev/ttyS0 on the receiving Linux based server. From there I decrypt and parse the received data and check it for errors. Each reading is then compared against predefined temperature limits, and the appropriate contacts are emailed if a threshold is reached. Readings are also stored in a database, and a daily log of the temperatures throughout the day is sent out.
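One way to keep a single bad reading from triggering a 4am email is to require several readings in a row over the limit before alerting. The sketch below illustrates that idea only; it is in Python rather than the PHP the receiver actually uses, and the class name, the 85 degree limit, and the three-reading rule are all assumptions picked for the example.

```python
from collections import defaultdict


class ThresholdMonitor:
    """Alert only after several consecutive over-limit readings per sensor."""

    def __init__(self, limit_f=85.0, required_consecutive=3,
                 plausible_range_f=(-67.0, 257.0)):
        self.limit_f = limit_f
        self.required_consecutive = required_consecutive
        # Default range matches the TMP100's rated -55 to 125 degrees C span.
        self.plausible_range_f = plausible_range_f
        self._streak = defaultdict(int)  # device ID -> over-limit run length

    def update(self, device_id, temp_f) -> bool:
        """Record one reading; return True when an alert should be sent."""
        low, high = self.plausible_range_f
        if not (low <= temp_f <= high):
            return False  # impossible value: discard, leave the streak alone
        if temp_f >= self.limit_f:
            self._streak[device_id] += 1
        else:
            self._streak[device_id] = 0
        # True exactly once, on the reading that completes the streak.
        return self._streak[device_id] == self.required_consecutive


if __name__ == "__main__":
    monitor = ThresholdMonitor()
    for reading in [70, 150, 70, 90, 91, 92, 93]:
        print(reading, monitor.update(1, reading))
```

With one reading per sensor every 60 seconds, a three-reading rule delays a real alert by a couple of minutes, which is the trade-off for ignoring one-off glitches.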

The system is flexible, as it allows me to add sensors to the monitoring network at any time. I'm using 2.4 GHz XBees, but they are in sockets, allowing me to replace them with the 900 MHz version in the event that a facility has a lot of walls blocking the 2.4 GHz signal.

So far the temperature sensors have been working very well. Future plans include a modular temperature sensor board and a better (prettier) web interface to show the current facility conditions.