otmonitor uses 100% cpu after a while
Moderator: hvxl
otmonitor uses 100% cpu after a while
Hi,
I'm running the latest prebuilt otmonitor armhf binary from the otgw site (4.2.2) on a Raspberry Pi, with the latest gateway firmware (4.2.3) on a gateway prebuilt by kiwi.
It's hooked up to a Honeywell Chronotherm Round and a Remeha Avanta boiler.
The whole setup works beautifully, except that the otmonitor binary's CPU usage increases by about 10% or so every day until it is constantly using 100% of the Pi's CPU (and about 50% of memory) and no longer responds to commands. If I manually stop the binary and run it again, it works fine until the problem recurs a few days later.
It's running headless with only the web interface enabled (X and the like aren't even installed, running on minibian) and does not write logs to disk. I can provide the full configs if needed.
I noticed that the full command log history from the moment of starting is available in the web gui; to me it smells like the log ends up getting too big for the pi to handle, but I haven't looked at the code. Could this be the case?
If not, what might be causing this problem?
Independent of this issue, is there a log truncate or round-robin option in otmonitor that I've missed (for the live in-memory log)? It does not seem very useful to push the entire log to web clients.
Thanks
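To confirm whether memory really grows steadily between restarts, the process's resident set size can be sampled over time. The following is a minimal Python sketch, not part of otmonitor; it assumes a Linux `/proc` filesystem, and the function names (`parse_vmrss_kb`, `read_rss_kb`, `watch`) and the 10-minute interval are made up for illustration.

```python
import re
import time
from pathlib import Path
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the resident set size of a running process (Linux only)."""
    try:
        return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())
    except OSError:
        return None  # process has exited or /proc is unavailable


def watch(pid: int, interval_s: float = 600.0) -> None:
    """Print a timestamped RSS sample until the process disappears."""
    while (rss := read_rss_kb(pid)) is not None:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} pid={pid} rss={rss} kB", flush=True)
        time.sleep(interval_s)
```

Calling `watch(pid)` with the PID of the running otmonitor process and redirecting the output to a file gives a simple growth curve. A steadily rising `rss` value would point at the in-memory log rather than at the Pi itself.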
Re: otmonitor uses 100% cpu after a while
Hi,
I may have the same problem:
http://www.domoticaforum.eu/viewtopic.p ... 462#p74731
For now I reboot the Pi every night (crontab)
Yoja
Re: otmonitor uses 100% cpu after a while
The program is supposed to only keep 1000 lines of history. Do you get more than that?
But even if that feature doesn't work as intended, it would only account for the increased memory use. It should not have much impact on the cpu use.
Rebooting the Pi just to restart a program is a bit like killing a fly with a sledgehammer.
Schelte
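For readers unfamiliar with the idea of a capped history: the sketch below shows one common way to keep only the most recent 1000 lines in memory. It is written in Python purely for illustration. otmonitor itself is written in Tcl and its actual implementation is not shown in this thread, so the class name `MessageHistory` and its methods are hypothetical.

```python
from collections import deque

HISTORY_LIMIT = 1000  # the limit mentioned in the post above


class MessageHistory:
    """Keep only the most recent `limit` log lines in memory."""

    def __init__(self, limit: int = HISTORY_LIMIT) -> None:
        # A deque with maxlen silently drops the oldest entry
        # whenever a new one is appended to a full buffer.
        self._lines = deque(maxlen=limit)

    def append(self, line: str) -> None:
        self._lines.append(line)

    def snapshot(self) -> list:
        """Return the retained lines, oldest first."""
        return list(self._lines)

    def __len__(self) -> int:
        return len(self._lines)
```

With a cap like this, memory use stays flat no matter how long the program runs, and a web client only ever receives at most `HISTORY_LIMIT` lines.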
Re: otmonitor uses 100% cpu after a while
Hi Schelte,
I get way, way more log entries.
Not sure how many exactly, but it's definitely way more than 1000. After getting the reply notification for this topic I checked otmonitor again and it had died with:
# ./otmonitor-ahf --daemon -f etc/otmonitor.conf
unable to alloc 90443790 bytes
Aborted
I had also previously nabbed a strace of an unresponsive otmonitor and it was looping the following syscall sequence:
gettimeofday({1433240040, 792521}, NULL) = 0
futex(0x64dfd8, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_REALTIME, {1433240040, 797202716}) = 0
futex(0x87a9cc, FUTEX_WAIT_PRIVATE, 1295, {0, 95318284}) = -1 ETIMEDOUT (Connection timed out)
write(4, "\0", 1) = 1
futex(0x64dfd8, FUTEX_WAKE_PRIVATE, 1) = 1
write(4, "\0", 1) = 1
Not sure if that is useful, or if it isn't, what else may be. Haven't had time to look at the source yet (or grok Tcl for that matter).
Here's the otmonitor.conf for completeness:
web {
enable true
port 8080
nopass true
}
connection {
device /dev/ttyUSB0
type serial
enable true
}
Also, the Pi's CPU is pretty dismal; I would not be surprised if something like this could bring it to its knees.
Any more hints?
Thanks
Re: otmonitor uses 100% cpu after a while
You're right. A bug was introduced in Tcl that interferes with the code that was supposed to limit the number of saved messages. When the correct fix is known, I'll build a new version that should not exhibit the ever-increasing memory consumption. Then let's also see what impact that has on CPU use.
Schelte
Re: otmonitor uses 100% cpu after a while
I have posted otmonitor version 4.2.3 on my web site. In this version I avoid the Tcl bug by using a slightly different method to limit the number of saved messages. Please try it and let me know if that improves things.
Schelte
Re: otmonitor uses 100% cpu after a while
Schelte,
I get errors when starting otmonitor on the Raspberry Pi.
Is it related to the changes made for domoticaforum.eu/viewtopic.php?f=75& ... 383#p75380 ?
The command and the resulting error:
./otmonitor-ahf --daemon --system -w 8080
Not connected
while executing
"dbus method / com.tclcode.debug.Eval {apply {{info str} {uplevel #0 $str}}}"
(file "/home/pi/otmonitor-ahf/dbus.tcl" line 153)
invoked from within
"source /home/pi/otmonitor-ahf/dbus.tcl"
("uplevel" body line 1)
invoked from within
"uplevel #0 [list source [file join /home/pi/otmonitor-ahf $file]]"
(procedure "include" line 2)
invoked from within
"include dbus.tcl"
(file "/home/pi/otmonitor-ahf/otmonitor.tcl" line 1863)
invoked from within
"source /home/pi/otmonitor-ahf/otmonitor.tcl"
("uplevel" body line 1)
invoked from within
"uplevel #0 [list source [file join /home/pi/otmonitor-ahf $file]]"
(procedure "include" line 2)
invoked from within
"include otmonitor.tcl"
(file "/home/pi/otmonitor-ahf/main.tcl" line 13)
Re: otmonitor uses 100% cpu after a while
Hi,
I get this error after running otmonitor for a couple of minutes, using telnet minute to monitor:
invalid command name "nu"
while executing
"$arg $val"
(procedure "otdecode" line 19)
invoked from within
"otdecode data $type $id"
(procedure "otmessage" line 6)
invoked from within
"otmessage [clock microseconds] $line [expr {$type & 7}] $id $data"
(procedure "process" line 12)
invoked from within
"process [append data $line]"
(procedure "receive" line 6)
invoked from within
"receive"
Re: otmonitor uses 100% cpu after a while
Been running 4.2.3 since Saturday afternoon.
Log size limiting works fine now. Mem use is stable at around 32m virtual / 10m resident. CPU use hovers between 2 and 5% on the pi. Not using dbus so I didn't hit that snag.
Seems to be working fine!
Thanks for patching, Schelte!
Re: otmonitor uses 100% cpu after a while
That's good. Hopefully it stays that way. Thanks for reporting back.
Schelte