[SOLVED]Erratic behaviour/lock-ups/com errors Calenta/iSense

This Forum is about the Opentherm gateway (OTGW) from Schelte

Moderator: hvxl

Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Post by Mars »

I'm running the OTGW USB/Wifi version with firmware 4.01 in combination with a Remeha Calenta and iSense (v28).

I'm having erratic behaviour, lock-ups and communication errors (F203) in both monitor and gateway mode.

1.:
In gateway mode, EVERY message is flagged as an Error 03 (something with bit received when not expected), but everything seems to work as expected, as depicted in the next figure:
[Figure: otmonitor screenshot, not preserved in this copy]

2:
Switching to Gateway mode stops these errors, but gives me at startup a bunch of Error 01's, which do disappear after a few seconds.

3:
I did run the diagnostics software, which reported no errors. I see timings around the 500 µs and 1 ms marks for both boiler and thermostat.
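
Those two values are what a healthy OpenTherm line should show: the protocol sends Manchester-encoded bits at a nominal 1000 bit/s, so successive transitions are either half a bit or a full bit apart. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and the frame value is just an example Read-Data Status request.

```python
from itertools import groupby

HALF_BIT_US = 500  # nominal: 1000 bit/s Manchester, so half a bit lasts 500 µs


def frame_bits(frame: int) -> list[int]:
    """Start bit (1), 32 data bits MSB first, stop bit (1)."""
    data = [(frame >> shift) & 1 for shift in range(31, -1, -1)]
    return [1] + data + [1]


def half_bit_levels(bits: list[int]) -> list[int]:
    """Manchester-encode: every bit has a transition in the middle.

    A 1 is sent as active-then-idle, a 0 as idle-then-active.
    """
    levels = []
    for bit in bits:
        levels.extend((1, 0) if bit else (0, 1))
    return levels


def pulse_widths_us(frame: int) -> list[int]:
    """Time between successive line transitions inside one frame."""
    levels = half_bit_levels(frame_bits(frame))
    return [len(list(run)) * HALF_BIT_US for _, run in groupby(levels)]


widths = pulse_widths_us(0x80000100)  # example Read-Data Status request
print(sorted(set(widths)))  # [500, 1000]
```

Real hardware will measure values somewhat off these nominal figures, so "around" 500 µs and 1 ms is the expected result.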

4:
I do have the following problems:
  • After otmonitor has run for hours or days without problems, stopping it leads to an OTGW lock-up a few hours later. The iSense reports no errors, but nothing works anymore: the boiler status is no longer shown on the iSense, and setpoints entered on the iSense either do nothing or take effect only briefly (for example, the flame status is ON for 30-60 seconds and then the boiler stops heating again).
    I have to power cycle the boiler to have a working boiler/otgw/thermostat combination again. Just power cycling the otgw does NOT help.
  • As this behaviour continued, I disconnected the USB cable and WIFI router, and after that everything seemed ok for a few weeks.
  • Then the problems started again: the iSense displayed very short F203 messages, boiler status was not displayed, and setpoints didn't work. After a few power cycles, and otgw restarts it worked again for a few hours, before the lock-up was back again.
As these lock-ups now pop up about every day, I removed the otgw and everything is running fine again, so it seems the otgw is the cause of these problems.

Does anyone have any clue what the problem might be? I haven't tried different firmware yet.
Last edited by Mars on Sun Jan 04, 2015 4:21 pm, edited 1 time in total.
hvxl
Senior Member
Posts: 1966
Joined: Sat Jun 05, 2010 11:59 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by hvxl »

I get the impression you are not running your gateway with the default setting for IT, i.e.: PR=T will return a 0 in the first position. The first thing to try is to set it back to default with the IT=1 command.
Schelte
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

I'm not aware of any change in these settings, but I will check this setting and connect the OTGW again to test if this was causing this erratic behaviour.
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

New results :D

After some hours of testing I finally got it working by setting the reference voltage to 1.667 instead of the default 1.250!

While testing I had the following results:
  • Setting IT=1 resulted in an Error 03 for every message
  • Setting IT=0 resulted in many Error 01 messages, but stopped the Error 03 messages
  • Sometimes after an hour, sometimes after minutes, the Error 03 came back, with the same OpenTherm message repeated over and over in the log (this seems to be the lock-up!)
  • Restarting the otgw sometimes solved things, but not for long
  • Setting the reference voltage higher immediately stopped the error messages.
I hope everything keeps working as it does now 8)
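
For reference, the two voltages mentioned here fit a simple pattern. The sketch below assumes that the VR level runs from 0 to 9, that the default 1.250 V corresponds to level 3, and that the reference is derived from a 5 V supply in steps of 1/24. Only the values 1.250 V (default) and 1.667 V (VR=5) come from this thread; the formula and the function name are assumptions and should be checked against the firmware documentation.

```python
def vr_level_to_volts(level: int, vdd: float = 5.0) -> float:
    """Reference voltage for a VR=<level> setting (assumed formula, see above)."""
    if not 0 <= level <= 9:
        raise ValueError("VR level must be between 0 and 9")
    return vdd * (level + 3) / 24


for level in range(10):
    marker = " (default, assumed)" if level == 3 else ""
    print(f"VR={level}: {vr_level_to_volts(level):.3f} V{marker}")
```

Under these assumptions VR=5 works out to 1.667 V, matching the value that stopped the errors.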

BTW: What about the MQTT function of otmonitor? It doesn't seem to publish anything, just subscribing to some action topic?
mike7
Member
Posts: 102
Joined: Mon Dec 02, 2013 8:45 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by mike7 »

Mars wrote: BTW: What about the MQTT function of otmonitor? It doesn't seem to publish anything, just subscribing to some action topic?
Check the MQTT section in the options. What is your OS? By default otmonitor publishes only signals to MQTT.
hvxl
Senior Member
Posts: 1966
Joined: Sat Jun 05, 2010 11:59 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by hvxl »

Mars wrote:I finally got it working by setting the reference voltage to 1.667 instead of the default 1.250!
I didn't expect adjusting the reference voltage would help since you mentioned that "everything seems to work as expected" in monitor mode (or at least I think that's what you meant). Monitor mode also depends on a correct reference voltage. Also the diagnostics software uses the default reference voltage. But if setting it to 1.667 helps, great!


In general your problem description contradicted itself in a few places:
  • You had "erratic behaviour, lock-ups and communication errors (F203) in both monitor and gateway mode", yet "everything seems to work as expected".
  • "EVERY message is flagged as an Error 03" but still the picture shows information, so some good messages must have come through.
  • In #1 you report about gateway mode, then in #2 you switch to the same mode.
  • There has never been a firmware version called 4.01.
So, in future, please be accurate in the observations you report. For some of the contradictions mentioned above I can probably guess what you meant. But with every guess I have to make the risk of misunderstanding the problem increases. That makes it very hard to provide useful assistance.
Schelte
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

hvxl wrote:
Mars wrote:I finally got it working by setting the reference voltage to 1.667 instead of the default 1.250!
I didn't expect adjusting the reference voltage would help since you mentioned that "everything seems to work as expected" in monitor mode (or at least I think that's what you meant). Monitor mode also depends on a correct reference voltage. Also the diagnostics software uses the default reference voltage. But if setting it to 1.667 helps, great!
Yup, and it is currently still working :D
So, in future, please be accurate in the observations you report. For some of the contradictions mentioned above I can probably guess what you meant. But with every guess I have to make the risk of misunderstanding the problem increases. That makes it very hard to provide useful assistance.
Reading back my problem description, I can't find much wrong in that point, so I've added some logging to your questions to make it 100% clear :)
In general your problem description contradicted itself in a few places:
  • You had "erratic behaviour, lock-ups and communication errors (F203) in both monitor and gateway mode", yet "everything seems to work as expected".
  • "EVERY message is flagged as an Error 03" but still the picture shows information, so some good messages must have come through.
This is applicable to the first point I made: I get Error 03 messages, BUT the webpage shows all the data, ie it seems to work as expected. This is no contradiction, just a collection of facts/observations:

14:12:23.633347  T80000100  Read-Data   Status: 00000001 00000000
14:12:23.716097  Error 03
14:12:24.637418  T10013A00  Write-Data  Control setpoint: 58.00
14:12:24.715257  Error 03
14:12:25.643446  T00110000  Read-Data   Relative modulation level: 0.00
14:12:25.712929  Error 03
14:12:26.637053  T80190000  Read-Data   Boiler water temperature: 0.00
14:12:26.714985  Error 03
14:12:27.641546  T900E6400  Write-Data  Maximum relative modulation level: 100.00
14:12:27.714324  Error 03
14:12:28.639013  T80000100  Read-Data   Status: 00000001 00000000
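
The hex strings in these log lines can be unpacked by hand. As a rough sketch based on the OpenTherm frame layout (one parity bit, a 3-bit message type, 4 spare bits, an 8-bit data ID and a 16-bit value), the snippet below splits a message into its fields. The function names are made up for illustration and are not part of otmonitor.

```python
MSG_TYPES = {
    0: "Read-Data", 1: "Write-Data", 2: "Invalid-Data", 3: "Reserved",
    4: "Read-Ack", 5: "Write-Ack", 6: "Data-Invalid", 7: "Unknown-DataId",
}


def decode(message: str) -> dict:
    """Split a logged message such as 'T10013A00' into its fields."""
    source, frame = message[0], int(message[1:], 16)
    return {
        "source": source,  # T = from thermostat, B = from boiler
        "parity_ok": bin(frame).count("1") % 2 == 0,  # even parity over 32 bits
        "type": MSG_TYPES[(frame >> 28) & 0x7],
        "data_id": (frame >> 16) & 0xFF,
        "raw": frame & 0xFFFF,
    }


def f8_8(raw: int) -> float:
    """Interpret a 16-bit value as signed fixed point with 8 fractional bits."""
    if raw & 0x8000:
        raw -= 0x10000
    return raw / 256


msg = decode("T10013A00")
print(msg["type"], msg["data_id"], f8_8(msg["raw"]))  # Write-Data 1 58.0
```

This agrees with the "Control setpoint: 58.00" text that otmonitor prints next to the same message.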
  • In #1 you report about gateway mode, then in #2 you switch to the same mode.
My bad :shock:

But to show you what I meant by switching to monitor mode (GW=0), the Error 03 disappears right after the GW=0 command:

12:10:27.762270  T00030000  Read-Data   Slave configuration: 00000000 0
12:10:27.885424  Error 03
12:10:28.759031  T00030000  Read-Data   Slave configuration: 00000000 0
12:10:28.894127  Error 03
12:10:29.008063  Command: GW=0
12:10:29.040273  GW: 0
12:10:29.773877  T00030000  Read-Data   Slave configuration: 00000000 0
12:10:29.878264  B4003410B  Read-Ack    Slave configuration: 01000001 11
12:10:30.782434  T807F0000  Read-Data   Slave product version: 0 0
12:10:30.873126  BC07F090B  Read-Ack    Slave product version: 9 11
12:10:31.797690  T00050000  Read-Data   Application-specific flags: 00000000 0
12:10:31.879147  BC00500FF  Read-Ack    Application-specific flags: 00000000 255
The so-called lock-ups (thermostat not showing the current boiler status, setpoints not working) show up as repeated messages in the log. Sometimes the gateway resets (it shows the startup message OpenTherm Gateway 4.x.x):

12:08:24.808773  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:25.816180  Error 03
12:08:26.802593  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:27.820917  Error 03
12:08:28.824883  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:28.959391  Error 03
12:08:29.828069  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:30.845461  Error 03
12:08:31.843901  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:31.946864  Error 03
12:08:32.852571  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:32.957204  Error 03
Changing IT=1 to IT=0 stops the Error 03 messages and starts the Error 01 messages:

12:08:37.896964  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:38.048978  Error 03
12:08:38.745838  Command: IT=0
12:08:38.768990  IT: 0
12:08:38.897208  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:38.919318  Error 01
12:08:39.903810  T00030000  Read-Data   Slave configuration: 00000000 0
12:08:39.928611  Error 01
And finally, after setting VR=5, the errors stop immediately:

12:13:54.685936  T00030000  Read-Data   Slave configuration: 00000000 0
12:13:54.711882  Error 01
12:13:55.694479  T00030000  Read-Data   Slave configuration: 00000000 0
12:13:55.727515  Error 01
12:13:56.045849  Command: VR=5
12:13:56.069487  VR: 5
12:13:56.712142  T00030000  Read-Data   Slave configuration: 00000000 0
12:13:56.875668  B4003410B  Read-Ack    Slave configuration: 01000001 11
12:13:57.714916  T807F0000  Read-Data   Slave product version: 0 0
12:13:57.877513  BC07F090B  Read-Ack    Slave product version: 9 11
12:13:58.726074  T00050000  Read-Data   Application-specific flags: 00000000 0
12:13:58.874886  BC00500FF  Read-Ack    Application-specific flags: 00000000 255
  • There has never been a firmware version called 4.01.
I forgot one dot, it is 4.0.1..., an obvious mistake:

13:45:41.606794  OpenTherm Gateway 4.0.1
13:45:41.659179  Thermostat disconnected
13:45:45.737429  R00000000  Read-Data   Status: 00000000 00000000
I will provide some logging in the future, if available!

Now I will continue finding the cause of the disconnects between the TP Link and OTMonitor...
hvxl
Senior Member
Posts: 1966
Joined: Sat Jun 05, 2010 11:59 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by hvxl »

Logs do take much of the guessing out of the equation. Your log shows Error 03 every other message, not every message. The messages from the thermostat are received correctly, only the messages from the boiler can not be understood. It is then even more surprising that the VR command fixed the issue. The reference voltage should hardly have any influence on the boiler side, unless there is something wrong with R4, R2, or D9. Test #5 of the diagnostics firmware should have revealed this problem.
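
The pattern can be checked mechanically. The sketch below is a hypothetical helper, not part of otmonitor, and it assumes the log layout shown in this thread (timestamp, then either a message or "Error NN"). It pairs each thermostat request with whatever followed it: a boiler answer or an error.

```python
import re

# Timestamp, then either a T/B/R/A message with 8 hex digits or "Error NN".
LINE = re.compile(r"^\S+\s+(?:(?P<msg>[TBRA][0-9A-F]{8})\b|Error (?P<err>\d\d))")

SAMPLE = """\
12:13:54.685936  T00030000  Read-Data   Slave configuration: 00000000 0
12:13:54.711882  Error 01
12:13:55.694479  T00030000  Read-Data   Slave configuration: 00000000 0
12:13:55.727515  Error 01
12:13:56.045849  Command: VR=5
12:13:56.069487  VR: 5
12:13:56.712142  T00030000  Read-Data   Slave configuration: 00000000 0
12:13:56.875668  B4003410B  Read-Ack    Slave configuration: 01000001 11
"""


def summarise(log: str) -> dict:
    """Count thermostat requests by what followed them in the log."""
    counts = {"requests": 0, "answered": 0, "errors": 0}
    pending = False  # True while a thermostat request awaits its outcome
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # commands, settings and other chatter
        if m["msg"] and m["msg"][0] == "T":
            counts["requests"] += 1
            pending = True
        elif m["msg"] and m["msg"][0] == "B" and pending:
            counts["answered"] += 1
            pending = False
        elif m["err"] and pending:
            counts["errors"] += 1
            pending = False
    return counts


print(summarise(SAMPLE))  # {'requests': 3, 'answered': 1, 'errors': 2}
```

Every request is logged, but only the one sent after VR=5 gets a readable boiler answer, which matches the observation that the errors replace the boiler messages rather than the thermostat messages.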

I can't explain why the errors stop when you switch to monitor mode. The same code is used to decode the messages. The only long shot is that in gateway mode the LEDs negatively affect the supply voltage leading to problems. In monitor not all LEDs work, which may make the supply voltage more stable. Do you have LEDs connected? Are you using current-limiting resistors with high enough values? Come to think of it, diagnostics test #5 also doesn't drive the LEDs, so maybe that's why it didn't show a problem there.

The firmware version was indeed one where I could "probably guess what you meant". I estimated the chance that you forgot a dot higher than you accidentally hitting a 0.
Schelte
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

hvxl wrote:Logs do take much of the guessing out of the equation. Your log shows Error 03 every other message, not every message. The messages from the thermostat are received correctly, only the messages from the boiler can not be understood. It is then even more surprising that the VR command fixed the issue. The reference voltage should hardly have any influence on the boiler side, unless there is something wrong with R4, R2, or D9. Test #5 of the diagnostics firmware should have revealed this problem.
Weird then, but I will check R2, R4 and D9. I didn't notice any strange values while running test #5 btw but didn't write them down.
I can't explain why the errors stop when you switch to monitor mode. The same code is used to decode the messages. The only long shot is that in gateway mode the LEDs negatively affect the supply voltage leading to problems. In monitor not all LEDs work, which may make the supply voltage more stable. Do you have LEDs connected? Are you using current-limiting resistors with high enough values? Come to think of it, diagnostics test #5 also doesn't drive the LEDs, so maybe that's why it didn't show a problem there.
I haven't connected any LED, so that can't be the problem.

I also tried several kinds of cable between the gateway and the boiler (long, short, cat5, phone cable, electric wire), but all gave the same results.

I have no scope to check the signals coming in from the boiler.

This morning everything was still running fine. The only thing that I saw was a high CPU load on my server running otmonitor (around 25% instead of sub 1%), resulting in a 10W higher power consumption. I suspect that either otmonitor and/or the otmonitor webserver are responsible for the high CPU load, as this is the only change on my server. I will check that later this week and report back here.
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

This morning everything was still running fine. The only thing that I saw was a high CPU load on my server running otmonitor (around 25% instead of sub 1%), resulting in a 10W higher power consumption. I suspect that either otmonitor and/or the otmonitor webserver are responsible for the high CPU load, as this is the only change on my server. I will check that later this week and report back here.
The otmonitor was partly responsible for the high CPU load: the other one was the Windows Indexing service. When I disabled the logfile for otmonitor, the high CPU load for the indexing service was gone, so that one is solved.

As there was a Windows Update, I had to reboot my server. After that the CPU load for otmonitor was down to 0.2-0.9% (from 10-15%). Memory consumption also went down, from 80MB to 8MB.
mike7 wrote:
Mars wrote: BTW: What about the MQTT function of otmonitor? It doesn't seem to publish anything, just subscribing to some action topic?
Check the MQTT section in the options. What is your OS? By default otmonitor publishes only signals to MQTT.
My OS is Windows 8.1 Pro. I did check the MQTT panel in otmonitor, but don't see anything published by otmonitor to mosquitto. The only thing I see is some subscribe.
Running mosquitto from the commandline shows the following:

Using default config
Opening ipv6 listen socket on port 1883
Opening ipv4 listen socket on port 1883
New connection from ::1 on port 1883
New client connected from ::1 as otmon (c2, k120).
Sending CONNACK to otmon (0)
Received SUBSCRIBE from otmon
	actions/otmonitor/+ (QoS 2)
otmon 2 actions/otmonitor/+
Sending SUBACK to otmon
I still have to check the values of R2, R4 and D9 :)
mike7
Member
Posts: 102
Joined: Mon Dec 02, 2013 8:45 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by mike7 »

Mars wrote: My OS is Windows 8.1 Pro. I did check the MQTT panel in otmonitor, but don't see anything published by otmonitor to mosquitto. The only thing I see is some subscribe.
Running mosquitto from the commandline shows the following:
What is the output of "mosquitto_sub -t \# -v"?
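
A note on why that command is a useful test: the filter # matches every ordinary topic, whereas the subscription otmonitor itself makes (actions/otmonitor/+) only matches one level below actions/otmonitor. The backslash merely keeps a Unix shell from treating # as the start of a comment. A small sketch of the MQTT wildcard rules follows; the function is illustrative, and the topic names other than the actions/otmonitor/+ filter are hypothetical, not topics otmonitor is known to use.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    # Wildcards at the first level never match topics starting with '$'.
    if topic.startswith("$") and sub_levels[0] in ("#", "+"):
        return False
    for i, level in enumerate(sub_levels):
        if level == "#":  # multi-level wildcard: matches everything below
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


# Hypothetical topic names, for illustration only.
print(topic_matches("#", "events/otmonitor/flame"))
print(topic_matches("actions/otmonitor/+", "actions/otmonitor/setpoint"))
print(topic_matches("actions/otmonitor/+", "events/otmonitor/flame"))
```

So a subscriber on # would show anything otmonitor published, whatever topic it used.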
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

mike7 wrote:What is the output of "mosquitto_sub -t \# -v"?
Nothing for otmonitor. I only see my test messages sent using mqttinspector and my owntracks messages, so mosquitto is running fine.

I did enable MQTT once more in otmonitor and restarted otmonitor, but still nothing...
mike7
Member
Posts: 102
Joined: Mon Dec 02, 2013 8:45 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by mike7 »

Mars wrote:

New connection from ::1 on port 1883
New client connected from ::1 as otmon (c2, k120).
I did enable MQTT once more in otmonitor and restarted otmonitor, but still nothing...
Try to disable IPv6 in mosquitto, or change the address to 127.0.0.1.
Mars
Starting Member
Posts: 23
Joined: Sun Mar 03, 2013 5:02 pm

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by Mars »

mike7 wrote:Try to disable IPv6 in mosquitto, or change the address to 127.0.0.1.
I have no idea how to disable ipv6 in mosquitto. I'm using a secure connection for outside (owntracks) access and local port access for any client (no username/pw).

I did enter 127.0.0.1 in otmonitor, but still nothing.
Sending data through mosquitto_pub simply works, so I have no idea why this doesn't work. The otgw website also doesn't describe MQTT, or at least I can't find it.

Edit 18-12-2014:
- otgw and otmonitor still running without any error.
- mqtt still not working. I did try an older 4.1b2 version, but got same results.
- as the connect & subscribe to mosquitto work, it seems a publish-only problem somewhere in otmonitor, but I can't find what...
hvxl
Senior Member
Posts: 1966
Joined: Sat Jun 05, 2010 11:59 am

Re: Erratic behaviour/lock-ups/comm errors Calenta/iSense

Post by hvxl »

Let's add some debugging information:
  • Please download the attached file and unzip it.
  • Copy the complete contents to the clipboard.
  • Run otmonitor and make sure MQTT is enabled.
  • Press F12.
  • Paste the data from the clipboard into the console window.
  • Disable and re-enable MQTT.
  • Wait for some new data to appear in the console window.
  • Select everything in the console window and copy it (Ctrl-A, Ctrl-C).
  • Paste the data into a file and provide that to me.
(Buggy debug code attachment deleted. See later post for corrected version.)
Schelte