OT monitor sometimes looses contact with gateway

This Forum is about the Opentherm gateway (OTGW) from Schelte

Moderator: hvxl

yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

OT monitor sometimes looses contact with gateway

Post by yjb »

Hi,

I'm running otmonitor on a Linux system and connect over the network to the gateway.:

Code: Select all

tcp        0      0 192.168.0.176:47857     192.168.0.62:1001       ESTABLISHED 14738/tclkit
Out of habit I reboot the gateway, that is to say the network piece, every week. Not always, let's say 50% of the times, otmonitor is no longer able to recover from this temporary disconnect and produces "flatliners until the I restart otmonitor (no additional restart required of the gateway).
While otmonitor is not able to talk to the gateway, I do not see any sign of a network connection to port 1001, but I do see port 1001 listening on the gateway side of things.

Is is possible to look into this, and make the network connection more "persistent", or at least create a heartbeat so that otmonitor recognizes loss of connection?

This is part of the log during the reboot:

Code: Select all

02:59:58.614090  B40192B00  Read-Ack    Boiler water temperature: 43.00
02:59:59.518995  T101813D7  Write-Data  Room temperature: 19.84
02:59:59.627808  BD01813D7  Write-Ack   Room temperature: 19.84
03:00:00.506432  T00000000  Read-Data   Status: 00000000 00000000
03:00:00.615300  BC0000000  Read-Ack    Status: 00000000 00000000
03:00:01.515140  T10010600  Write-Data  Control setpoint: 6.00
03:00:01.623179  BD0010600  Write-Ack   Control setpoint: 6.00
03:00:39.295159  T801B0D00

BE01B0D00

A401B0D4D

T00000000

BC0000000

T10010600

BD0010600

T00110000

BC0110000

T80190000

B40192B00

T900E6400

B500E6400

T00000000

BC0000000

T10010600

BD0010600

T00110000

BC0110000

T80190000

B40192B00

T001D0000

R801A0000

BE0
SERead-Data Outside temperature: 13.00
03:00:39.295688  SE
03:00:39.296157  SE
03:00:39.296396  SE
03:00:39.296866  SE
03:00:39.299839  SE
03:00:39.311881  SE
03:00:39.325953  SE
03:00:39.339870  SE
03:00:39.353848  SE
03:00:39.366909  SE
03:00:39.379923  SE
03:00:39.394012  SE
03:00:39.406891  SE
03:00:39.421918  SE
03:00:39.434824  SE
03:00:39.448880  SE
03:00:39.461877  SE
03:00:39.475890  SE
03:00:39.488911  SE
03:00:39.502844  SE
03:00:39.515909  T101813D7  Write-Data  Room temperature: 19.84
03:00:39.519823  SE
03:00:39.529820  SE
03:00:39.536871  SE
03:00:39.623891  BD01813D7  Write-Ack   Room temperature: 19.84
03:00:40.501814  T00000000  Read-Data   Status: 00000000 00000000
03:00:40.610098  BC0000000  Read-Ack    Status: 00000000 00000000
03:00:41.510845  T10010600  Write-Data  Control setpoint: 6.00
03:00:41.617436  BD0010600  Write-Ack   Control setpoint: 6.00
03:00:42.501968  T00110000  Read-Data   Relative modulation level: 0.00
03:00:42.609850  BC0110000  Read-Ack    Relative modulation level: 0.00
03:00:43.510977  T80190000  Read-Data   Boiler water temperature: 0.00
03:00:43.617895  B40192B00  Read-Ack    Boiler water temperature: 43.00
03:20:01.999789  Command (via websocket from 192.168.0.176:57693): OT=13.3
03:30:01.643183  Command (via websocket from 192.168.0.176:57743): OT=13.3
YJB> Restarted otmonitor
08:35:15.862615  T00090000
08:35:15.892026  R801A0000  Read-Data   DHW temperature: 0.00
08:35:15.892874  BE01A0000  Data-Inv    DHW temperature: 0.00
08:35:15.893279  Command: PR=A
08:35:15.893583  AC0090000  Read-Ack    Remote override room setpoint: 0.00
And this is from a previous reboot sequence:

Code: Select all

02:59:57.318827  B40192400  Read-Ack    Boiler water temperature: 36.00
02:59:58.201800  T900E6400  Write-Data  Maximum relative modulation level: 100.00
02:59:58.310865  B500E6400  Write-Ack   Maximum relative modulation level: 100.00
02:59:59.210805  T00000000  Read-Data   Status: 00000000 00000000
02:59:59.319812  BC0000000  Read-Ack    Status: 00000000 00000000
03:00:00.202306  T10010600  Write-Data  Control setpoint: 6.00
03:00:00.310369  BD0010600  Write-Ack   Control setpoint: 6.00
03:00:01.211212  T00110000  Read-Data   Relative modulation level: 0.00
03:00:01.318117  BC0110000  Read-Ack    Relative modulation level: 0.00
03:00:39.054764  T00000000  Read-Data   Status: 00000000 00000000
03:00:39.055525  BC0000000  Read-Ack    Status: 00000000 00000000
03:00:39.055987  T10010600  Write-Data  Control setpoint: 6.00
03:00:39.056489  BD0010600  Write-Ack   Control setpoint: 6.00
If you want me to collect more information, please let me know (might take a couple of weeks as it doesn't happen every week).

Thanks for looking into this,
Ysbrand
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes loses contact with gateway

Post by hvxl »

Which version of otmonitor are you using? Are you running it with the GUI or as a daemon? Assuming GUI mode: What does the icon on the right-hand side of the status bar look like when the connection is no longer working? You're claiming it doesn't reconnect when you click on the icon? Does it report any error?
Schelte
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

Hi Schelte,

I'm running 4.2.1 in Daemon mode, so there isn't really a button to press.

Is there anything I can checkout?

Thanks,
Ysbrand
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes loses contact with gateway

Post by hvxl »

I've added some dbus commands to otmonitor 2.4.3 that allow you to disconnect and/or reconnect otmonitor to the gateway. On a raspberry pi you can use the following commands:
  • dbus-send --print-reply --dest=com.tclcode.otmonitor / com.tclcode.otmonitor.Disconnect
  • dbus-send --print-reply --dest=com.tclcode.otmonitor / com.tclcode.otmonitor.Reconnect
  • dbus-send --print-reply --dest=com.tclcode.otmonitor / com.tclcode.otmonitor.Connect :192.168.0.62:1001
Hopefully it's obvious what each command does. You can also check whether otmonitor thinks the connection is up with the following command:
  • dbus-send --print-reply --dest=com.tclcode.otmonitor / org.freedesktop.DBus.Properties.Get :com.tclcode.otmonitor :Connected
The connection to the gateway is handled by an endless loop. I suspect your reboots somehow caused an uncaught exception that broke the loop. If that is true, reconnecting won't be possible. When you try you will get:
  • Error org.freedesktop.DBus.Error.Failed: invalid command name "connect"
I also added some additional logging to hopefully figure out what has happened. If you run into the situation again, please first try to reconnect and then collect the logs in one of two ways:
  1. dbus-send --print-reply --dest=com.tclcode.otmonitor / com.tclcode.debug.Log
  2. Point your browser to http://192.168.0.176:8080/debuglog.txt (assuming you have the web server running on the default port)
Based on my theory I examined the code again to see if I could find where it might go wrong. I couldn't find anything obvious, but I added some protection around a command that I could not entirely guarantee could never cause trouble. If that was indeed the culprit, you should not see the problem anymore. I did add logging when the protection kicks in. So if you have rebooted your gateway's network piece several times without problems, you can check the logs as described above. If they are empty, the problem simply hasn't occurred. Otherwise they may confirm that it was indeed the suspected command.

Please check if you can approach your otmonitor via the session dbus. If not, you may want to have it connect to the system dbus. See http://192.168.0.176:8080/dbus.html for more information.
Schelte
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

Thanks Schelte for looking into this.

I've downloaded the code, but running it is giving some problems:

Code: Select all

version conflict for package "dbus": have 2.0, need 2.1
    while executing
"package require dbus 2.1"
I will need to figure out which dbus components require an update on my system

Code: Select all

yast2-dbus-client-2.22.0-4.1.1.x86_64
dbus-1-x11-1.6.8-2.22.1.x86_64
dbus-1-1.6.8-2.22.1.x86_64
libdbusmenu-qt2-0.9.2-4.1.1.x86_64
libdbus-1-3-32bit-1.6.8-2.22.1.x86_64
dbus-1-glib-0.100.1-1.1.1.x86_64
dbus-1-python-1.1.1-2.1.1.x86_64
libdbus-1-3-1.6.8-2.22.1.x86_64
Ysbrand
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes loses contact with gateway

Post by hvxl »

The dbus 2.1 package is included in otmonitor. If you're running from sources, you need to use the latest tclkit from http://otgw.tclcode.com/otmonsrc.html
Schelte
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

Thanks, I didn't realize that there was a requirement to update tclkit as well (I'm running from source in order to be able to increase the logging time).

In the meantime I've upgrade to 4.2.3.2 based on the errors I was seeing in another forum thread.

The connect / reconnect seems to work:

Code: Select all

# dbus-send --system --print-reply --dest=com.tclcode.otmonitor / com.tclcode.otmonitor.Disconnect
method return sender=:1.9934 -> dest=:1.10557 reply_serial=2
# dbus-send --system --print-reply --dest=com.tclcode.otmonitor / com.tclcode.otmonitor.Reconnect
method return sender=:1.9934 -> dest=:1.10528 reply_serial=2
.

I can't get the status and log reporting tto work, I assume that it's related to my lack of knowledge on the dbus stuff:

Code: Select all

dbus-send --system --print-reply --dest=com.tclcode.otmonitor / org.freedesktop.DBus.Properties.Get :com.tclcode.otmonitor :Connected
Error org.freedesktop.DBus.Error.AccessDenied: Rejected send message, 1 matched rules; type="method_call", sender=":1.10574" (uid=0 pid=31510 comm="dbus-send --system --print-reply --dest=com.tclcod") interface="org.freedesktop.DBus.Properties" member="Get" error name="(unset)" requested_reply="0" destination="com.tclcode.otmonitor" (uid=0 pid=27271 comm="./tclkit-x64 otmonitor.vfs.4.2.3.2/main.tcl --syst")

dbus-send --system --print-reply --dest=com.tclcode.otmonitor / com.tclcode.debug.Log
Error org.freedesktop.DBus.Error.AccessDenied: Rejected send message, 1 matched rules; type="method_call", sender=":1.10583" (uid=0 pid=31560 comm="dbus-send --system --print-reply --dest=com.tclcod") interface="com.tclcode.debug" member="Log" error name="(unset)" requested_reply="0" destination="com.tclcode.otmonitor" (uid=0 pid=27271 comm="./tclkit-x64 otmonitor.vfs.4.2.3.2/main.tcl --syst")
Anyways, I'll try to explore things further, and when I hit the issue again, I will report back (II will increase the reboot frequency of the gateway to every day, just to make the failure more likely to happen).

Thanks again,
Ysbrand
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes loses contact with gateway

Post by hvxl »

/etc/dbus-1/system.d/com.tclcode.otmonitor.conf is a bit too restrictive. You can simplify it to:

Code: Select all

<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <policy user="pi">
    <allow own="com.tclcode.otmonitor"/>
  </policy>
  <policy context="default">
    <allow send_destination="com.tclcode.otmonitor"/>
  </policy>
</busconfig>
After that change it should work. There's no need to restart anything. The change will be picked up instantly.
Schelte
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

Yes! That works:

Code: Select all

 dbus-send --system --print-reply --dest=com.tclcode.otmonitor / org.freedesktop.DBus.Properties.Get :com.tclcode.otmonitor :Connected
method return sender=:1.7 -> dest=:1.1692 reply_serial=2
   boolean true

dbus-send --system --print-reply --dest=com.tclcode.otmonitor / com.tclcode.debug.Log
method return sender=:1.7 -> dest=:1.1711 reply_serial=2
   string ""
So, I guess I'm now ready to collect more information (in case things are still falling apart).
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

Probably off-topic, but I discovered something new in the web interface (4.2.3 and up):
Capture.JPG
Capture.JPG (10.04 KiB) Viewed 15776 times
Selecting it gives:
Capture1.JPG
Capture1.JPG (19.83 KiB) Viewed 15776 times
Much appreciated, much clearer than the previous "hoover" legend.
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes looses contact with gateway

Post by hvxl »

Yeah, the Hoover sucks.

Image
Schelte
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

Just an update:

I've switched to daily reboots on the gateway (at least the network server that is attached to it), but I haven't seen any otmonitor outages yet. Will continue to monitor (and report back).
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes looses contact with gateway

Post by hvxl »

Anything in the logs?
Schelte
yjb
Member
Member
Posts: 211
Joined: Fri Apr 17, 2009 1:15 pm
Location: Venhuizen, Netherlands

Re: OT monitor sometimes looses contact with gateway

Post by yjb »

I can't find anything:

Code: Select all

 dbus-send --system --print-reply --dest=com.tclcode.otmonitor / com.tclcode.debug.Log
method return sender=:1.19102 -> dest=:1.87548 reply_serial=2
   string ""
And http://192.168.0.176:8080/debuglog.txt gives an empty screen
hvxl
Senior Member
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OT monitor sometimes looses contact with gateway

Post by hvxl »

Then it seems you haven't hit the problem yet. Because if my change caught it, it would be reported in the log. If it didn't, you would have noticed.
Schelte
Post Reply

Return to “Opentherm Gateway Forum”