Debugging R-peak Detection: How QvosAgent Investigated a NeuroKit2 Anomaly

Published: 2026-05-08 | Tags: ECG, Signal Processing, NeuroKit2, Debugging, Open Source AI


The Mystery

While analyzing real ECG data from the MIT-BIH Arrhythmia Database, a user noticed something puzzling in the visualization generated by QvosAgent: several red dots marking R-peaks did not align with the actual peaks of the ECG waveform. Some dots appeared to sit in the valleys between waves rather than at the tops.

Was this a calculation error? A bug in the algorithm? Or something more subtle?

This is exactly the kind of question that showcases what autonomous AI agents can do — not just run code, but investigate, research, compare, and resolve complex technical issues.

The Investigation

Step 1: Reproduce and Analyze

QvosAgent first reproduced the analysis on a 10-second segment of MIT-BIH Record 100 (normal sinus rhythm), sampled at 360 Hz. The original code used NeuroKit2's ecg_process() function:

import neurokit2 as nk
df, info = nk.ecg_process(ecg_signal, sampling_rate=fs)
peaks = info['ECG_R_Peaks']  # sample indices of the detected R-peaks

Comparing the detected peaks against the official MIT-BIH annotations revealed the problem:

| Detected Peak | Official Annotation | Deviation |
|---|---|---|
| @2092 | @2044 (Atrial premature) | 133 ms |
| @2375 | @2402 (Normal) | 75 ms |
| @2686 | @2706 (Normal) | 56 ms |
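
The deviation column follows from the sample indices and the 360 Hz sampling rate. A minimal sketch of that arithmetic is shown here; the helper name deviation_ms is hypothetical and not part of NeuroKit2:

```python
FS = 360  # Hz, the MIT-BIH sampling rate quoted above


def deviation_ms(detected_sample, annotated_sample, fs=FS):
    """Absolute timing error between two sample indices, in milliseconds."""
    return abs(detected_sample - annotated_sample) / fs * 1000.0


# (detected, annotated) sample indices taken from the table above
pairs = [(2092, 2044), (2375, 2402), (2686, 2706)]
deviations = [round(deviation_ms(d, a), 1) for d, a in pairs]
print(deviations)  # [133.3, 75.0, 55.6]
```

Rounded to whole milliseconds, these values match the 133 ms, 75 ms, and 56 ms reported in the table.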

The detected points were sitting in the Q-wave/S-wave valleys (negative values) instead of the R-wave peaks (positive values) — a difference of over 800mV in signal amplitude.
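
A quick sanity check can flag this kind of misplacement automatically: a detected index that really sits on an R-peak should be (nearly) the largest sample in its neighbourhood. The sketch below is one possible check, not the code QvosAgent ran. The function names, the 50 ms window, and the toy signal are all assumptions made for illustration.

```python
def snap_to_local_max(signal, peak_idx, fs, window_ms=50):
    """Index of the largest sample within +/- window_ms of peak_idx."""
    half = int(round(window_ms / 1000 * fs))
    lo = max(0, peak_idx - half)
    hi = min(len(signal), peak_idx + half + 1)
    segment = signal[lo:hi]
    return lo + max(range(len(segment)), key=segment.__getitem__)


def is_on_peak(signal, peak_idx, fs, window_ms=50, tol_samples=1):
    """True if peak_idx lies within tol_samples of the local maximum."""
    local_max = snap_to_local_max(signal, peak_idx, fs, window_ms)
    return abs(local_max - peak_idx) <= tol_samples


# Toy signal (made up): flat baseline, one spike at 100, a dip at 90
toy = [0.0] * 200
toy[100] = 1.0
toy[90] = -0.3

on_spike = is_on_peak(toy, 100, fs=360)   # index on the spike
in_valley = is_on_peak(toy, 90, fs=360)   # index in the dip
print(on_spike, in_valley)
```

This only reports whether an index looks misplaced. With offsets as large as those in the table, the true R-peak can fall outside a 50 ms window, so the check is a diagnostic rather than a correction.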

Step 2: Community Research

QvosAgent searched the NeuroKit2 GitHub repository and found multiple related issues.

The community consensus was clear: NeuroKit2's default gradient-based detection has known limitations with abnormal waveforms.

Step 3: Method Comparison

QvosAgent tested three different peak detection methods:

# Method 1: Direct ecg_findpeaks with neurokit method
peaks_nk = nk.ecg_findpeaks(clean_signal, sampling_rate=fs, method='neurokit')

# Method 2: Nabian 2018 (community recommended)
peaks_nabian = nk.ecg_findpeaks(clean_signal, sampling_rate=fs, method='nabian2018')

# Method 3: Pan-Tompkins (classic algorithm)
peaks_pt = nk.ecg_findpeaks(clean_signal, sampling_rate=fs, method='pantompkins1985')

The Surprising Result

| Method | Detected | Matched | Mean Deviation | Max Deviation |
|---|---|---|---|---|
| NeuroKit (ecg_findpeaks) | 12 | 12/12 | 0.5 ms | 2.8 ms |
| Nabian2018 | 11 | 11/11 | 3.0 ms | 5.6 ms |
| Pan-Tompkins | 12 | 12/12 | 14.4 ms | 44.4 ms |
| Old (ecg_process) | 12 | 9/12 | 22.5 ms | 133.3 ms |

The root cause was not the algorithm itself, but the API usage pattern. When ecg_process() internally calls peak detection, it uses a different implementation path than directly calling ecg_findpeaks(). The direct call produced near-perfect results with a mean deviation of only 0.5ms.
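
For readers who want to reproduce a comparison like this, the summary columns could be computed roughly as follows. This is a sketch under stated assumptions. The post does not say how "Matched" was defined, so the 50 ms tolerance here is a guess. The function names and the toy sample indices are invented for illustration.

```python
from bisect import bisect_left


def nearest(sorted_refs, x):
    """Closest value to x in a non-empty, ascending list."""
    i = bisect_left(sorted_refs, x)
    candidates = sorted_refs[max(0, i - 1): i + 1]
    return min(candidates, key=lambda r: abs(r - x))


def summarize(detected, annotations, fs, tol_ms=50.0):
    """Compare detected peak indices with reference annotation indices."""
    refs = sorted(annotations)
    # Timing error of every detected peak to its nearest annotation
    devs = [abs(d - nearest(refs, d)) / fs * 1000.0 for d in detected]
    return {
        "detected": len(devs),
        "matched": sum(dev <= tol_ms for dev in devs),  # assumed tolerance
        "mean_ms": sum(devs) / len(devs),
        "max_ms": max(devs),
    }


# Toy indices (made up), at the 360 Hz rate used in this post
summary = summarize(detected=[101, 400, 680],
                    annotations=[100, 400, 700], fs=360)
print(summary)
```

Each method's output from ecg_findpeaks() can be passed in as detected, with the official annotation indices as annotations, to fill one row of the table.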

[Figure: Method Comparison]

Key Takeaways

  1. Separate signal cleaning from peak detection — Use ecg_process() for cleaning, then ecg_findpeaks() with an explicit method for detection
  2. Community knowledge matters — GitHub issues revealed this was a known limitation with documented workarounds
  3. Multiple methods should be compared — Testing three algorithms revealed the best approach
  4. Autonomous investigation works — QvosAgent independently reproduced the issue, researched community discussions, compared alternatives, and identified the root cause

Best Practice Code

import neurokit2 as nk

# ecg_signal: 1-D array of raw ECG samples; fs: sampling rate in Hz (360 for MIT-BIH)

# Step 1: Clean the signal
df, info = nk.ecg_process(ecg_signal, sampling_rate=fs)
clean_signal = df['ECG_Clean'].values

# Step 2: Detect peaks with an explicit method
peaks = nk.ecg_findpeaks(clean_signal, sampling_rate=fs, method='neurokit')
r_peaks = peaks['ECG_R_Peaks']

# Alternative: Nabian 2018 method
peaks_alt = nk.ecg_findpeaks(clean_signal, sampling_rate=fs, method='nabian2018')
r_peaks_alt = peaks_alt['ECG_R_Peaks']

This analysis was performed entirely by QvosAgent, an open-source local AI agent that autonomously investigates and resolves technical challenges. The full code and data are available for reproducibility.