Opened 2 years ago

Closed 11 months ago

Last modified 11 months ago

#436 closed defect (fixed)

Error Module in PDHCollector.cpp:215 Failed to query performance counters

Reported by: jamdev12 Owned by: mickem
Priority: 1 Milestone: 0.4.0
Component: CheckSystem Version: 0.4.0-nightly
Severity: Bugs Keywords: checksystem, PDHCollector, 215, Failed, Error, Module
Cc: sbhobbs@…, msn@…, jbroome@…, mathew.yanovsky@…

Description

Hi all,

The issue here is that when checksystem.dll is activated in the modules section of the NSC.ini file the error below presents itself.

2011-03-10 11:14:58: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: A counter with a negative denominator value was detected. (800007D6)
2011-03-10 11:15:04: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: A counter with a negative denominator value was detected. (800007D6)
2011-03-10 11:15:06: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: A counter with a negative denominator value was detected. (800007D6)

When the checksystem.dll module is commented out the errors in the log disappear, but you can no longer query CPU, MEM or Service states because those checks are relying on the checksystem.dll module to be active.

I have seen that this issue is present in the RC 0.3.8 version and the issue is also present in the Nightly build for 0.3.9.

The system that I'm running is Windows 2008 R2 x64. This happens on all of the systems running this OS. I don't know if this occurs on any other OS, because all I run is Windows 2008 R2 x64.

In this case the logs grow very large within a few days.

Change History (21)

comment:1 Changed 2 years ago by BrantleyHobbs

  • Cc sbhobbs@… added

I'm experiencing this problem as well. I also am running Windows 2008 R2 x64. I have tried all versions from nightly all the way back to .2.7, and they all exhibit this behavior.

This is a total show-stopper for me.

comment:2 Changed 2 years ago by biba@…

Hi, same issue on my servers, all affected servers 2008R2 X64. Please patch this problem.

Thanks Biba

comment:3 Changed 2 years ago by sdouce

  • Milestone 0.3.9 deleted

Same on 2003 R2 x64 SP2.
I have tested 3.7 and 3.8 with no success.

I can't get CLIENTVERSION, USEDDISKSPACE or anything else.

comment:4 Changed 2 years ago by mickem

Interesting. Is this consistent on all servers?
Usually this error is "intermittent on one server" or some such.
Does anyone have a VM with w2k8r2 x64 I could use to debug this issue?

Michael Medin

comment:5 Changed 20 months ago by dion

  • Cc msn@… added

It looks like this holds at least for the R2 Windows X64 versions. I see this error logged on 2008R2 X64, but not on a 2008 X64 server, both running 3.9.

comment:6 Changed 18 months ago by jbroome

  • Cc jbroome@… added

Seeing the same thing with .3.9 on server 2003 32bit. Running in a RHEV VM if that matters.

comment:7 Changed 15 months ago by mickem

  • Milestone set to 0.4.1
  • Version set to 0.3.9

comment:8 Changed 15 months ago by steffenpoulsen

When this happens, the counter is usually broken elsewhere as well, i.e. in perfmon.

This procedure has been known to remedy the situation here:

1 - Stop "Windows Management Instrumentation", accept that "IP Helper" will be stopped as well.

2 - Delete %systemroot%\System32\wbem\Repository

3 - Restart OS.

comment:9 Changed 15 months ago by mickem

  • Resolution set to worksforme
  • Status changed from new to closed

Then I will close this ticket for now. If someone can fill in some more details, please reopen...

comment:10 Changed 15 months ago by mickem

Really nice to share this information BTW, as the "broken counters" are very very hard to diagnose (as they never happen on my machines)...

comment:11 Changed 14 months ago by amac

  • Resolution worksforme deleted
  • Status changed from closed to reopened

I've been having the same issue. Log is below

error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: \Processor(_total)\% Processor Time: PdhGetFormattedCounterValue failed: A counter with a negative denominator value was detected. (800007D6)

Unfortunately, carrying out the following steps hasn't fixed the problem.

1 - Stop "Windows Management Instrumentation", accept that "IP Helper" will be stopped as well.

2 - Delete %systemroot%\System32\wbem\Repository

3 - Restart OS.

Any suggestions will be much appreciated.

comment:12 Changed 12 months ago by motbka

  • Cc mathew.yanovsky@… added
  • Version changed from 0.3.9 to 0.4.0-nightly

Same problem here: Windows 2008 R2 x64 Foundation (Dell PowerEdge R210 II). Performing the steps above hasn't improved the situation.
Tried several versions: 0.3.7, 0.3.9, currently running NSClient++ 0,4,0,172 2012-05-08.
I'm getting this message pretty much every second, so the log grows quite large in less than a day:

2012-06-08 13:01:37: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: \238(_total)\6: PdhGetFormattedCounterValue failed {format: 1024}: -2147481642: A counter with a negative denominator value was detected.

Would love this to be fixed, until then I'll have to set up task scheduler to restart the service daily and delete the log.

Thanks!
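
A note on the two log formats seen in this ticket: the 800007D6 in the older messages and the -2147481642 in the newer ones appear to be the same 32-bit status value, printed as hexadecimal in one case and as a signed decimal in the other. A minimal Python sketch of that conversion follows; the helper name to_signed32 is made up for illustration and is not part of NSClient++.

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is the value minus 2**32.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# Hex code as printed by the older log lines in this ticket.
code_hex = 0x800007D6
print(to_signed32(code_hex))  # -2147481642
```

So a search for either form of the code should turn up reports of the same error.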

comment:13 Changed 12 months ago by mickem

Try 174 where I applied another attempt at a fix...

comment:14 Changed 11 months ago by motbka

Installed 0.4.0.174. The error is still there, but it got a little better: it now appears less often, every 4 to 30 seconds, and it only starts after the first Nagios check. In previous versions it popped up right after NSCP started.

Below is a small extract from nsclient.log:

2012-06-12 17:56:52: l:..\..\..\trunk\service\NSClient++.cpp:385: NSClient++ 0,4,0,174 2012-05-19 x64 booting...

2012-06-12 17:56:54: l:..\..\..\trunk\service\NSClient++.cpp:385: NSClient++ 0,4,0,174 2012-05-19 x64 booting...

2012-06-12 17:58:00: l:..\..\..\..\trunk\modules\CheckExternalScripts\CheckExternalScripts.cpp:229: Arguments: Backup Exec Remote Agent for Windows Systems

2012-06-12 17:58:00: l:..\..\..\..\trunk\modules\CheckExternalScripts\CheckExternalScripts.cpp:229: Arguments: MinWarn=10% MinCrit=5% CheckAll FilterType=FIXED

2012-06-12 17:58:05: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: Failed to poll counter: : -2147481642: A counter with a negative denominator value was detected.

2012-06-12 17:58:16: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: Failed to poll counter: : -2147481642: A counter with a negative denominator value was detected.

2012-06-12 17:58:20: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: Failed to poll counter: : -2147481642: A counter with a negative denominator value was detected.

2012-06-12 17:58:55: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: Failed to poll counter: : -2147481642: A counter with a negative denominator value was detected.

2012-06-12 17:59:15: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: Failed to poll counter: : -2147481642: A counter with a negative denominator value was detected.

2012-06-12 17:59:23: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: Failed to poll counter: : -2147481642: A counter with a negative denominator value was detected.
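
To put a number on how often the error appears in an extract like the one above, the gaps between consecutive error lines can be computed from the timestamps. This is only a rough Python sketch: it assumes every line starts with a 19-character "YYYY-MM-DD HH:MM:SS" timestamp, as in this log, and the function name error_gaps is hypothetical.

```python
from datetime import datetime


def error_gaps(lines):
    """Return the seconds between consecutive 'negative denominator' errors."""
    stamps = [
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if "negative denominator" in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Timestamps copied from the extract above; the message text is shortened.
sample = [
    "2012-06-12 17:58:05: e: A counter with a negative denominator value was detected.",
    "2012-06-12 17:58:16: e: A counter with a negative denominator value was detected.",
    "2012-06-12 17:58:20: e: A counter with a negative denominator value was detected.",
    "2012-06-12 17:58:55: e: A counter with a negative denominator value was detected.",
    "2012-06-12 17:59:15: e: A counter with a negative denominator value was detected.",
    "2012-06-12 17:59:23: e: A counter with a negative denominator value was detected.",
]
print(error_gaps(sample))  # [11, 4, 35, 20, 8]
```

For these six lines the gaps range from 4 to 35 seconds.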

comment:15 Changed 11 months ago by mickem

  • Milestone changed from 0.4.1 to 0.4.0
  • Resolution set to fixed
  • Status changed from reopened to closed

I googled some more, and it seems a lot of people have added this error to their ignore list, so I have done likewise.
So henceforth (next build, 176?) this problem will "magically" go away.

comment:16 Changed 11 months ago by steffenpoulsen

Thank you for the fix! :-)

In the meantime we have gotten another suggestion from Microsoft, which in the end worked around the problem for us (our problem being mainly the rapidly expanding logfiles that NSClient creates when the \Processor(_total)\% Processor Time counter fails).

This fix also means that perfmon and other tools reading the counter will produce nice graphs again.

---

1.    Locate and then click the following key in the registry: 

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Power

2.    On the Edit menu, point to New, and then click DWORD (32-bit) Value. 

3.    Type SkipTickOverride, and then press ENTER. 

4.    On the Edit menu, click Modify. 

5.    Type 0, and then click OK. 

6.    Reboot the computer. 

---

Btw, the root cause of this problem or symptom is related to the Hyper-V hypervisor sleep / power handling mechanisms as I understand it.

comment:17 Changed 11 months ago by mickem

Then I shall update the fix so that it logs the error only once, with a note that it is henceforth disabled or some such. That way people can see that the error exists (and try to solve it with the various workarounds) but won't get massive log data.

Thanks!!!

Michael Medin
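
The "log only once" idea described above can be sketched roughly as follows. This is not the NSClient++ implementation (which is C++ and is not shown in this ticket); the class name LogOnce and the wording of the message are assumptions made purely for illustration.

```python
import logging


class LogOnce:
    """Log each distinct error code the first time it is seen, then stay quiet."""

    def __init__(self, logger):
        self._logger = logger
        self._seen = set()

    def report(self, code, message):
        """Return True if the message was logged, False if it was suppressed."""
        if code in self._seen:
            return False
        self._seen.add(code)
        self._logger.error(
            "%s (0x%08X); further occurrences will not be logged", message, code
        )
        return True


# Three failed polls with the same code produce a single log entry.
log = LogOnce(logging.getLogger("pdh-sketch"))
for _ in range(3):
    log.report(0x800007D6, "A counter with a negative denominator value was detected")
```

Keying the suppression on the error code means a different, new error would still be logged the first time it occurs.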

comment:18 Changed 11 months ago by steffenpoulsen

That sounds great to me, I would like that - and thanks again! :-)

Btw: I forgot to mention, that the above registry key can be added from an administrator command prompt like this:

REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Power" /v SkipTickOverride /t REG_DWORD /d 0 /f

comment:19 Changed 11 months ago by motbka

Thanks a lot for the registry setting workaround, works like a charm! No more performance counter error flood in the logs.

comment:20 Changed 11 months ago by justinbmann

Had the same issue on windows 7 x64 and 2008 R2 x64. Tried the 0.4.0.176 build and the errors stopped. Good work.

Can anyone provide me with information on what impact the registry change has on the operating system?

comment:21 Changed 11 months ago by mickem

Note that 176 only stops the logging; it doesn't fix the problem. The next build will log again (but only once).
The remaining problem is that we are missing polled values.

For instance, checkcpu polls CPU values every second and gives you an average. With this "bug" you will miss a few seconds here and there (whenever you would have gotten this message). The same goes for check memory: if you were to poll, you would not get the latest memory consumption value but one from a few seconds ago.

So the impact of this problem is negligible for most people...
(Apart from the logs becoming flooded)
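
As a rough illustration of why a few missed polls matter little for an averaged check, here is a small Python sketch. It assumes failed polls are simply skipped when averaging, which may not be exactly what CheckSystem does; the function name average_load and the sample values are made up.

```python
def average_load(samples):
    """Average the successful samples; None marks a poll that failed."""
    good = [s for s in samples if s is not None]
    if not good:
        return None  # nothing usable in the window
    return sum(good) / len(good)


# Ten one-second polls of CPU load in percent (made-up values), two of them failed.
window = [10, 12, None, 11, 50, None, 13, 12, 11, 11]
print(average_load(window))  # 16.25
```

The average is still computed from the eight polls that succeeded; only a spike that happened to fall exactly on a failed poll would go unnoticed.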

Exactly what the registry change fix does I don't know.

Michael Medin
