During my recent Cacti (0.8.7g) deployment on Debian 6.0.5, I encountered a particularly frustrating issue where the poller would hang indefinitely while waiting for non-existent child processes. Here's the technical deep dive into what I discovered.
The primary symptom was graphs not updating, with the poller consistently exceeding its maximum runtime. The key error message was:
POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
Manual execution revealed the poller would:
- Process data normally for about 1 minute
- Begin looping "Waiting on 1 of 1 pollers"
- Eventually time out after 298 seconds
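To reproduce the hang on demand, run the poller by hand rather than waiting for cron. The path and user below assume a stock Debian cacti package install; adjust them to your environment.

```shell
# Run one poller cycle in the foreground so the "Waiting on N of N
# pollers" messages appear on the terminal as they happen.
# (Raise the log verbosity under Settings -> Poller Logging Level
# if you want per-item detail in cacti.log as well.)
sudo -u www-data php /usr/share/cacti/site/poller.php --force
```

Watching a forced run is also the quickest way to confirm later that a fix actually works.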
The root cause emerged when examining the database interaction pattern:
// Original problematic query
$finished_processes = db_fetch_cell("SELECT count(*) FROM cacti.poller_time
    WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'");
Debug output showed the query would initially work, then suddenly return NULL:
Finished: 0 - Started: 1
Waiting on 1 of 1 pollers.
Finished: 1 - Started: 1
Finished: - Started: 1 // NULL value appearing
Direct MySQLi queries confirmed the data was available, suggesting an ADOdb issue:
$mysqli = new mysqli("localhost", "cacti", "cacti", "cacti");
$result = $mysqli->query("SELECT COUNT(*) FROM poller_time
    WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00';");
$row = $result->fetch_assoc();
// Returns valid count while Cacti's query fails
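You can run the same cross-check from the mysql command-line client while the poller is looping; the credentials below are examples, so substitute the values from your include/config.php.

```shell
# Query poller_time directly while the poller is stuck in its wait loop.
# A non-zero count here, combined with Cacti's query returning NULL,
# points at the client connection rather than the data.
mysql -u cacti -p cacti -e \
  "SELECT COUNT(*) FROM poller_time
   WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00';"
```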
Adding debug to ADOdb's _close() method revealed premature connection termination:
function _close() {
    @mysql_close($this->_connectionID);
    echo "!!!! CLOSED !!!!\n";
    debug_print_backtrace();
    $this->_connectionID = false;
}
The backtrace showed the connection was being closed during poller execution:
#0 ADODB_mysql->_close()
#1 ADOConnection->Close()
#2 db_close() called at [/usr/share/cacti/site/poller.php:455]
The fix involved modifying the connection handling in Cacti's database.php:
// Original problematic implementation
function db_close() {
    global $database_sessions;

    foreach ($database_sessions as $id => $conn) {
        $conn->Close();
    }
}
// Modified version with connection persistence
function db_close() {
    global $database_sessions;
    static $connection_cycles = 0;

    // Only close connections every 5 calls. Note the static counter
    // lives only as long as the current PHP process, so the poller
    // still releases everything when it exits.
    if (++$connection_cycles >= 5) {
        foreach ($database_sessions as $id => $conn) {
            $conn->Close();
        }
        $connection_cycles = 0;
    }
}
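With the debug echo still in place in ADOdb's _close(), it is easy to verify the change actually reduces connection churn. This one-liner assumes the Debian package paths used above.

```shell
# Count how many times the connection is torn down during one forced
# poller run. Before the db_close() change this fired on every cycle;
# afterwards the count should drop substantially.
sudo -u www-data php /usr/share/cacti/site/poller.php --force 2>&1 \
  | grep -c '!!!! CLOSED !!!!'
```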
Additionally, we modified the poller timeout handling:
// In poller.php, around line 368
$timeout    = 60; // seconds to wait for child processes
$start_time = microtime(true);

while (true) {
    // $started is set earlier in poller.php when the children are spawned
    $finished = db_fetch_cell("SELECT...");
    if ($finished >= $started) break;

    if ((microtime(true) - $start_time) > $timeout) {
        cacti_log("WARNING: Poller child process timeout");
        break;
    }
    sleep(1);
}
To recap the failure mode: when the Cacti poller gets stuck in a perpetual "Waiting on 1 of 1 pollers" loop for children that have already exited, the root cause often lies in unexpected database connection termination. Here's what's happening under the hood:
// Typical symptom in cacti.log
POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
Waiting on 1 of 1 pollers. // This keeps repeating
The smoking gun appears when examining the ADODB MySQL driver behavior. The connection drops mid-execution, yet Cacti continues polling as if nothing happened:
// Debug output showing the disconnect
!!!! CLOSED !!!!
#0 ADODB_mysql->_close() called at [/usr/share/php/adodb/adodb.inc.php:2141]
#1 ADOConnection->Close() called at [/usr/share/cacti/site/lib/database.php:68]
#2 db_close() called at [/usr/share/cacti/site/poller.php:455]
Implement these modifications to lib/database.php to add connection verification:
function db_fetch_cell($sql, $colname = '') {
    static $connection_retries = 0;

    // Verify connection before query
    if (!db_connection_valid()) {
        if ($connection_retries++ < 3) {
            db_reconnect();
        } else {
            cacti_log("DB_CONNECTION: Failed after $connection_retries attempts");
            return NULL;
        }
    } else {
        $connection_retries = 0; // connection healthy again, reset the retry budget
    }

    // Original fetch logic
    $result = db_fetch_row($sql);
    if (empty($colname)) {
        return current($result);
    }

    return $result[$colname];
}
function db_connection_valid() {
    global $database_sessions;

    if (!isset($database_sessions['default'])) {
        return false;
    }

    try {
        // Execute() returns a recordset on success and false on failure,
        // so compare explicitly rather than returning the object itself
        return ($database_sessions['default']->Execute("SELECT 1") !== false);
    } catch (Exception $e) {
        return false;
    }
}
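When db_connection_valid() returns false, it helps to rule out the server before blaming the client connection. The credentials are again examples from this setup.

```shell
# Independent server-side sanity check: if mysqld answers here while
# db_connection_valid() still fails, the problem is the poller's own
# (closed) connection handle, not the database server.
mysqladmin -u cacti -p ping
# Prints "mysqld is alive" when the server answers
```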
Often the database disconnect coincides with RRDTool operations. Add this validation before critical operations:
// In poller.php, before rrdtool updates
if (!db_connection_valid()) {
    cacti_log("POLLER: Database connection lost before RRD updates");
    db_reconnect();

    // Re-validate poller_time table state
    $finished = db_fetch_cell("SELECT COUNT(*) FROM poller_time
        WHERE poller_id=0 AND end_time>'0000-00-00 00:00:00'");

    if ($finished === NULL) {
        exit("FATAL: Cannot re-establish database connection");
    }
}
Modify your include/config.php to enable more robust connection handling:
$database_type     = "mysql";
$database_default  = "cacti";
$database_hostname = "localhost";
$database_username = "cactiuser";
$database_password = "yourpassword";
$database_port     = "3306";
$database_retries  = 5;     // Custom addition - only honored by the patched code above
$database_ssl      = false; // Custom addition - stock Cacti 0.8.7 ignores this setting
After implementing these changes, monitor your cacti.log for these healthy patterns:
POLLER: Poller[0] DB_RECONNECT: Re-established connection (attempt 1)
SYSTEM STATS: Time:1.8732 Method:cmd.php Processes:1 Threads:N/A Hosts:2
RRDsProcessed:6 // Should match your data source (RRD file) count, not the device count
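A quick way to check for these patterns without tailing the whole log; the path assumes the Debian package default, so verify it against Settings → Paths in your install.

```shell
# Show recent poller cycles: healthy runs log SYSTEM STATS with a short
# Time: value, and no new "Maximum runtime" lines should appear.
tail -n 200 /var/log/cacti/cacti.log | grep -E 'SYSTEM STATS|Maximum runtime'
```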
The poller should now complete within normal execution timeframes without getting stuck in wait loops for non-existent processes.