Skip to content

cross_correlation

CrossCorrelation

Compute the time-domain cross-correlation function (CCF) between two light curves or GP models.

This class supports three primary use cases:

  1. Regularly sampled LightCurve objects (mode="regular")
    Computes the CCF directly using array-based shifting of flux values. This mode requires both light curves to be sampled on the same time grid.

  2. Irregularly sampled LightCurve objects (mode="lin_interp")
    Uses the interpolated cross-correlation function (ICCF; Gaskell & Peterson 1987), which linearly interpolates each light curve onto the other's time grid to evaluate the correlation across lags. This supports fully independent time arrays.

  3. GaussianProcess models
    If both inputs are trained GP models with sampled realizations (via .sample()), the CCF is computed for each realization pair and averaged. Outputs in this case include the mean and standard deviation of the peak lag, centroid lag, and maximum correlation (rmax). This mode currently supports only mode="regular".

Optionally, Monte Carlo resampling can be used to estimate uncertainties on lag measurements. This uses: - Random Subset Selection (RSS) with replacement - Flux Randomization (FR) via Gaussian noise and is only available when using mode="lin_interp".

Parameters:

Name Type Description Default
lc_or_model1 LightCurve or GaussianProcess

First input light curve or trained GP model.

required
lc_or_model2 LightCurve or GaussianProcess

Second input light curve or trained GP model.

required
mode (regular, lin_interp)

CCF computation mode. Use "regular" for array-based shifting, or "lin_interp" for ICCF-based interpolation.

"regular"
monte_carlo bool

Whether to estimate lag uncertainties using Monte Carlo resampling (RSS + FR). Only supported with lin_interp mode.

False
n_trials int

Number of Monte Carlo trials (default: 1000).

1000
min_lag float or auto

Minimum lag to evaluate. If "auto", set to -duration / 2.

'auto'
max_lag float or auto

Maximum lag to evaluate. If "auto", set to +duration / 2.

'auto'
dt float or auto

Lag grid spacing. If "auto", uses half the native time resolution in "regular" mode, and ⅓ the mean sampling interval in "lin_interp" mode.

'auto'
centroid_threshold float

Threshold (as a fraction of peak correlation) for defining the centroid lag region.

0.8
rmax_threshold float

Trials with a maximum correlation (rmax) below this threshold are discarded when using Monte Carlo.

0.0

Attributes:

Name Type Description
lags ndarray

Array of lag values evaluated.

ccf ndarray or None

Cross-correlation coefficients. Not set when both inputs are GP models.

peak_lag float or tuple

Peak lag of the CCF. If using GPs, returns (mean, std) across realizations.

centroid_lag float or tuple

Centroid lag of the high-correlation region. If using GPs, returns (mean, std).

rmax float or tuple

Maximum correlation value. If using GPs, returns (mean, std).

peak_lags_mc ndarray or None

Peak lags from Monte Carlo trials, if enabled.

centroid_lags_mc ndarray or None

Centroid lags from Monte Carlo trials.

peak_lag_ci tuple or None

68% confidence interval (16th–84th percentile) on peak lag from MC trials.

centroid_lag_ci tuple or None

68% confidence interval on centroid lag from MC trials.

Source code in stela_toolkit/cross_correlation.py
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
class CrossCorrelation:
    """
    Compute the time-domain cross-correlation function (CCF) between two light curves or GP models.

    This class supports three primary use cases:

    1. **Regularly sampled `LightCurve` objects (`mode="regular"`)**  
       Computes the CCF directly using array-based shifting of flux values.
       This mode requires both light curves to be sampled on the same time grid.

    2. **Irregularly sampled `LightCurve` objects (`mode="lin_interp"`)**  
       Uses the interpolated cross-correlation function (ICCF; Gaskell & Peterson 1987),
       which linearly interpolates each light curve onto the other's time grid
       to evaluate the correlation across lags. This supports fully independent time arrays.

    3. **`GaussianProcess` models**  
       If both inputs are trained GP models with sampled realizations (via `.sample()`),
       the CCF is computed for each realization pair and averaged.
       Outputs in this case include the mean and standard deviation of the peak lag,
       centroid lag, and maximum correlation (`rmax`).
       This mode currently supports only `mode="regular"`.

    Optionally, Monte Carlo resampling can be used to estimate uncertainties on lag measurements.
    This uses:
        - Random Subset Selection (RSS) with replacement
        - Flux Randomization (FR) via Gaussian noise
    and is only available when using `mode="lin_interp"`.

    Parameters
    ----------
    lc_or_model1 : LightCurve or GaussianProcess
        First input light curve or trained GP model.

    lc_or_model2 : LightCurve or GaussianProcess
        Second input light curve or trained GP model.

    mode : {"regular", "lin_interp"}, optional
        CCF computation mode. Use "regular" for array-based shifting, or "lin_interp" for ICCF-based interpolation.

    monte_carlo : bool, optional
        Whether to estimate lag uncertainties using Monte Carlo resampling (RSS + FR). Only supported with lin_interp mode.

    n_trials : int, optional
        Number of Monte Carlo trials (default: 1000).

    min_lag : float or "auto", optional
        Minimum lag to evaluate. If "auto", set to `-duration / 2`.

    max_lag : float or "auto", optional
        Maximum lag to evaluate. If "auto", set to `+duration / 2`.

    dt : float or "auto", optional
        Lag grid spacing. If "auto", uses half the native time resolution in "regular" mode,
        and 1/3 the mean sampling interval in "lin_interp" mode.

    centroid_threshold : float, optional
        Threshold (as a fraction of peak correlation) for defining the centroid lag region.

    rmax_threshold : float, optional
        Trials with a maximum correlation (rmax) below this threshold are discarded when using Monte Carlo.

    Attributes
    ----------
    lags : ndarray
        Array of lag values evaluated.

    ccf : ndarray or None
        Cross-correlation coefficients. Not set when both inputs are GP models.

    peak_lag : float or tuple
        Peak lag of the CCF. If using GPs, returns (mean, std) across realizations.

    centroid_lag : float or tuple
        Centroid lag of the high-correlation region. If using GPs, returns (mean, std).

    rmax : float or tuple
        Maximum correlation value. If using GPs, returns (mean, std).

    peak_lags_mc : ndarray or None
        Peak lags from Monte Carlo trials, if enabled.

    centroid_lags_mc : ndarray or None
        Centroid lags from Monte Carlo trials.

    peak_lag_ci : tuple or None
        68% confidence interval (16th–84th percentile) on peak lag from MC trials.

    centroid_lag_ci : tuple or None
        68% confidence interval on centroid lag from MC trials.
    """


    def __init__(self,
                 lc_or_model1,
                 lc_or_model2,
                 mode="regular",
                 monte_carlo=False,
                 n_trials=1000,
                 min_lag="auto",
                 max_lag="auto",
                 dt="auto",
                 centroid_threshold=0.8,
                 rmax_threshold=0.0):

        req_reg_samp = True if mode=="regular" else False
        data1 = _CheckInputs._check_lightcurve_or_model(lc_or_model1, req_reg_samp=req_reg_samp)
        data2 = _CheckInputs._check_lightcurve_or_model(lc_or_model2, req_reg_samp=req_reg_samp)

        self.is_model1 = data1['type'] == 'model'
        self.is_model2 = data2['type'] == 'model'
        self.mode = mode
        self.monte_carlo = monte_carlo

        if self.is_model1:
            if not hasattr(lc_or_model1, 'samples'):
                raise ValueError("Model 1 must have generated samples via GP.sample().")
            self.times = lc_or_model1.pred_times
            self.rates1 = lc_or_model1.samples
        else:
            self.times, self.rates1, self.errors1 = data1['data']

        if self.is_model2:
            if not hasattr(lc_or_model2, 'samples'):
                raise ValueError("Model 2 must have generated samples via GP.sample().")
            self.times = lc_or_model2.pred_times
            self.rates2 = lc_or_model2.samples
        else:
            self.times, self.rates2, self.errors2 = data2['data']

        self.n_trials = n_trials
        self.centroid_threshold = centroid_threshold
        self.rmax_threshold = rmax_threshold

        duration = self.times[-1] - self.times[0]
        self.min_lag = -duration / 2 if min_lag=="auto" else min_lag
        self.max_lag = duration / 2 if max_lag=="auto" else max_lag

        if mode == "regular":
            times1 = data1['data'][0]
            times2 = data2['data'][0]

            # require the same time grid for regular
            if not np.allclose(times1, times2, rtol=1e-10, atol=1e-12):
                raise ValueError(
                    "In 'regular' mode, both light curves must have the same time array.\n"
                    "Use 'lin_interp' mode for irregular or mismatched time sampling."
                )

            self.dt = np.diff(self.times)[0] / 2 if dt=="auto" else dt
            self.lags = np.arange(self.min_lag, self.max_lag + self.dt, self.dt)

            if self.is_model1 and self.is_model2:
                ccfs, self.peak_lags, self.centroid_lags, self.rmaxs = [], [], [], []

                # Compute ccf and lags for each pair of realizations
                for i in range(self.rates1.shape[0]):
                    ccf = self.compute_ccf(self.rates1[i], self.rates2[i])
                    peak_lag, centroid_lag = self.find_peak_and_centroid(self.lags, ccf)
                    rmax = np.max(ccf)

                    ccfs.append(ccf)
                    self.peak_lags.append(peak_lag)
                    self.centroid_lags.append(centroid_lag)
                    self.rmaxs.append(rmax)

                # Consider use of CIs
                self.ccf = np.mean(ccfs, axis=0)
                self.peak_lag = (np.mean(self.peak_lags), np.std(self.peak_lags))
                self.centroid_lag = (np.mean(self.centroid_lags), np.std(self.centroid_lags))
                self.rmax = (np.mean(self.rmaxs), np.std(self.rmaxs))

        elif mode == "lin_interp":
            if self.is_model1 or self.is_model2:
                raise NotImplementedError("GP models are not yet supported with lin_interp mode.")

            self.times1 = data1['data'][0]
            self.times2 = data2['data'][0]

            self.dt = np.mean([np.mean(np.diff(self.times1)), np.mean(np.diff(self.times2))]) / 3 if dt == "auto" else dt
            self.lags = np.arange(self.min_lag, self.max_lag + self.dt, self.dt)

            self.ccf = self.compute_ccf_interp(self.times1, self.rates1, self.times2, self.rates2)

            self.rmax = np.max(self.ccf)
            self.peak_lag, self.centroid_lag = self.find_peak_and_centroid(self.lags, self.ccf)

        else:
            raise AttributeError(f"Invalid mode detected: {mode}. Valid modes are 'regular' and 'lin_interp'.")

        if monte_carlo:
            if np.all(self.errors1 == 0) or np.all(self.errors2 == 0):
                print("Skipping Monte Carlo: zero errors for all points in one or both light curves.")
            else:
                self.peak_lags_mc, self.centroid_lags_mc = self.run_monte_carlo()
                self.compute_confidence_intervals()

    def compute_ccf(self, rates1, rates2):
        """
        Compute the cross-correlation function (CCF) via direct shifting.

        Parameters
        ----------
        rates1 : ndarray
            First time series.

        rates2 : ndarray
            Second time series.

        Returns
        -------
        lags : ndarray
            Lag values.

        ccf : ndarray
            Pearson correlation coefficients at each lag.
        """

        ccf = []

        for lag in self.lags:
            shift = int(round(lag / self.dt))

            if shift < 0:
                x = rates1[:shift]
                y = rates2[-shift:]
            elif shift > 0:
                x = rates1[shift:]
                y = rates2[:-shift]
            else:
                x = rates1
                y = rates2

            if len(x) < 2:
                ccf.append(0.0)
            else:
                r = np.corrcoef(x, y)[0, 1]
                ccf.append(r)

        return np.array(ccf)


    def compute_ccf_interp(self, times1, rates1, times2, rates2):
        """
        Compute the cross-correlation function using linear interpolation on both light curves.

        Parameters
        ----------
        times1 : ndarray
            Time values for the first light curve.
        rates1 : ndarray
            Flux values for the first light curve.
        times2 : ndarray
            Time values for the second light curve.
        rates2 : ndarray
            Flux values for the second light curve.

        Returns
        -------
        ccf : ndarray
            Cross-correlation values at each lag.
        """
        interp1 = interp1d(times1, rates1, bounds_error=False, fill_value=0.0)
        interp2 = interp1d(times2, rates2, bounds_error=False, fill_value=0.0)
        ccf = []

        for lag in self.lags:
            t_shift1 = times1 + lag
            t_shift2 = times2 - lag

            mask1 = (t_shift1 >= times2[0]) & (t_shift1 <= times2[-1])
            mask2 = (t_shift2 >= times1[0]) & (t_shift2 <= times1[-1])

            if np.sum(mask1) < 2 or np.sum(mask2) < 2:
                ccf.append(0.0)
                continue

            r1 = np.corrcoef(rates1[mask1], interp2(t_shift1[mask1]))[0, 1]
            r2 = np.corrcoef(rates2[mask2], interp1(t_shift2[mask2]))[0, 1]
            ccf.append((r1 + r2) / 2)

        return np.array(ccf)


    def find_peak_and_centroid(self, lags, ccf):
        """
        Compute the peak and centroid lag of a cross-correlation function.

        The peak lag corresponds to the lag with the maximum correlation value.
        The centroid lag is computed using a weighted average of lag values
        in a contiguous region around the peak where the correlation exceeds
        a fraction of the peak value.

        Parameters
        ----------
        lags : ndarray
            Array of lag values (assumed sorted).

        ccf : ndarray
            Cross-correlation values at each lag.

        Returns
        -------
        peak_lag : float
            Lag corresponding to the maximum correlation.

        centroid_lag : float or np.nan
            Correlation-weighted centroid lag near the peak.
            Returns NaN if a valid centroid region cannot be identified.
        """
        if len(lags) != len(ccf) or len(ccf) == 0:
            raise ValueError("lags and ccf must be the same nonzero length")

        # Locate the peak correlation and corresponding lag
        peak_idx = np.nanargmax(ccf)
        peak_lag = lags[peak_idx]
        peak_val = ccf[peak_idx]

        # Define a local region around the peak above a fractional threshold
        threshold = self.centroid_threshold
        cutoff = threshold * peak_val

        # Expand to left of peak
        i_left = peak_idx
        while i_left > 0 and ccf[i_left - 1] >= cutoff:
            i_left -= 1

        # Expand to right of peak
        i_right = peak_idx
        while i_right < len(ccf) - 1 and ccf[i_right + 1] >= cutoff:
            i_right += 1

        # Compute centroid if region is valid
        if i_right >= i_left:
            lags_subset = lags[i_left:i_right + 1]
            ccf_subset = ccf[i_left:i_right + 1]
            weight_sum = np.sum(ccf_subset)
            if weight_sum > 0:
                centroid_lag = np.sum(lags_subset * ccf_subset) / weight_sum
            else:
                centroid_lag = np.nan
        else:
            centroid_lag = np.nan

        return peak_lag, centroid_lag


    def run_monte_carlo(self):
        """
        Run Monte Carlo simulations using RSS (only if mode='lin_interp') + FR to estimate the lag uncertainties.

        Each trial performs:

        - Random Subset Selection (RSS): Resample with replacement from each light curve
            - RSS can only be used when using linear interpolation, for which the time arrays do not need to be the same.
        - Flux Randomization (FR): Add Gaussian noise based on measurement errors
        - Discard trials with low correlation (if rmax_threshold is set)

        Returns
        -------
        peak_lags : ndarray
            Peak lag values from successful trials.

        centroid_lags : ndarray
            Centroid lag values from successful trials.
        """
        peak_lags = []
        centroid_lags = []

        n_points = len(self.times)

        for _ in range(self.n_trials):
            # Random subset selection
            if self.mode == 'lin_interp':
                idx1 = np.random.randint(0, n_points, n_points)
                unique1, counts1 = np.unique(idx1, return_counts=True)
                times1 = self.times[unique1]
                rates1 = self.rates1[unique1]
                errors1 = self.errors1[unique1] / np.sqrt(counts1) # reweighting errors based on frequency similar to bootstrap

                idx2 = np.random.randint(0, n_points, n_points)
                unique2, counts2 = np.unique(idx2, return_counts=True)
                times2 = self.times[unique2]
                rates2 = self.rates2[unique2]
                errors2 = self.errors2[unique2] / np.sqrt(counts2)

                sort1 = np.argsort(times1)
                sort2 = np.argsort(times2)
                times1, rates1, errors1 = times1[sort1], rates1[sort1], errors1[sort1]
                times2, rates2, errors2 = times2[sort2], rates2[sort2], errors2[sort2]
            else:
                rates1, errors1 = self.rates1, self.errors1
                rates2, errors2 = self.rates2, self.errors2

            # Flux randomization
            perturbed1 = np.random.normal(rates1, errors1)
            perturbed2 = np.random.normal(rates2, errors2)

            # Compute ccf
            if self.mode == "lin_interp":
                ccf = self.compute_ccf_interp(times1, perturbed1, times2, perturbed2)
            else:
                ccf = self.compute_ccf(perturbed1, perturbed2)

            # Rmax threshold filter
            peak, centroid = self.find_peak_and_centroid(self.lags, ccf)

            if np.max(ccf) < self.rmax_threshold or np.isnan(peak) or np.isnan(centroid):
                continue

            peak_lags.append(peak)
            centroid_lags.append(centroid)

        return np.array(peak_lags), np.array(centroid_lags)

    def compute_confidence_intervals(self, lower_percentile=16, upper_percentile=84):
        """
        Compute percentile-based confidence intervals for Monte Carlo lag distributions.

        Parameters
        ----------
        lower_percentile : float
            Lower percentile bound (default is 16).

        upper_percentile : float
            Upper percentile bound (default is 84).
        """

        if self.peak_lags_mc is None or self.centroid_lags_mc is None:
            print("No Monte Carlo results available to compute confidence intervals.")
            return

        self.peak_lag_ci = (
            np.percentile(self.peak_lags_mc, lower_percentile),
            np.percentile(self.peak_lags_mc, upper_percentile),
        )
        self.centroid_lag_ci = (
            np.percentile(self.centroid_lags_mc, lower_percentile),
            np.percentile(self.centroid_lags_mc, upper_percentile),
        )

    def plot(self, show_mc=True):
        """
        Plot the cross-correlation function and optional lag distributions.

        Parameters
        ----------
        show_mc : bool
            Whether to show lag distributions from GP samples or Monte Carlo trials.
        """
        # Plot the CCF
        fig, ax = plt.subplots(figsize=(8, 5))
        ax.plot(self.lags, self.ccf, label="CCF", color='black')

        if self.is_model1 and self.is_model2:
            peak_lag = self.peak_lag[0]
            peak_std = self.peak_lag[1]
            centroid_lag = self.centroid_lag[0]
            centroid_std = self.centroid_lag[1]
            label_suffix = " (GP samples)"
        else:
            peak_lag = self.peak_lag
            centroid_lag = self.centroid_lag
            peak_std = centroid_std = None
            label_suffix = ""

        ax.axvline(peak_lag, color='orange', linestyle='--',
                label=f"Peak lag = {peak_lag:.2f}")
        ax.axvline(centroid_lag, color='blue', linestyle=':',
                label=f"Centroid lag = {centroid_lag:.2f}")

        # Overlay lag distributions on same plot
        if show_mc:
            # Need to modify this to use the confidence intervals from earlier.
            # No else statement to ensure code execution for default show_mc even for no mc
            peak_data, centroid_data = None, None
            if self.monte_carlo:
                peak_data = self.peak_lags_mc
                centroid_data = self.centroid_lags_mc
                label_suffix = " (MC)"
                peak_std = np.std(peak_data)
                centroid_std = np.std(centroid_data)

            elif self.is_model1 and self.is_model2:
                peak_data = self.peak_lags
                centroid_data = self.centroid_lags

            if peak_data is not None:
                ax.hist(peak_data, bins=30, density=True, color='orange', alpha=0.3,
                        label=f"Peak lag dist{label_suffix}, σ={peak_std:.2f}", zorder=1)
            if centroid_data is not None:
                ax.hist(centroid_data, bins=30, density=True, color='blue', alpha=0.3,
                        label=f"Centroid lag dist{label_suffix}, σ={centroid_std:.2f}", zorder=1)

        ax.set_xlabel("Time Lag")
        ax.set_ylabel("Correlation Coefficient / Density")
        ax.grid(True)
        ax.legend()
        plt.tight_layout()
        plt.show()

compute_ccf(rates1, rates2)

Compute the cross-correlation function (CCF) via direct shifting.

Parameters:

Name Type Description Default
rates1 ndarray

First time series.

required
rates2 ndarray

Second time series.

required

Returns:

Name Type Description
lags ndarray

Lag values.

ccf ndarray

Pearson correlation coefficients at each lag.

Source code in stela_toolkit/cross_correlation.py
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
def compute_ccf(self, rates1, rates2):
    """
    Compute the cross-correlation function (CCF) via direct shifting.

    Parameters
    ----------
    rates1 : ndarray
        First time series.

    rates2 : ndarray
        Second time series.

    Returns
    -------
    lags : ndarray
        Lag values.

    ccf : ndarray
        Pearson correlation coefficients at each lag.
    """

    ccf = []

    for lag in self.lags:
        shift = int(round(lag / self.dt))

        if shift < 0:
            x = rates1[:shift]
            y = rates2[-shift:]
        elif shift > 0:
            x = rates1[shift:]
            y = rates2[:-shift]
        else:
            x = rates1
            y = rates2

        if len(x) < 2:
            ccf.append(0.0)
        else:
            r = np.corrcoef(x, y)[0, 1]
            ccf.append(r)

    return np.array(ccf)

compute_ccf_interp(times1, rates1, times2, rates2)

Compute the cross-correlation function using linear interpolation on both light curves.

Parameters:

Name Type Description Default
times1 ndarray

Time values for the first light curve.

required
rates1 ndarray

Flux values for the first light curve.

required
times2 ndarray

Time values for the second light curve.

required
rates2 ndarray

Flux values for the second light curve.

required

Returns:

Name Type Description
ccf ndarray

Cross-correlation values at each lag.

Source code in stela_toolkit/cross_correlation.py
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
def compute_ccf_interp(self, times1, rates1, times2, rates2):
    """
    Compute the cross-correlation function using linear interpolation on both light curves.

    Parameters
    ----------
    times1 : ndarray
        Time values for the first light curve.
    rates1 : ndarray
        Flux values for the first light curve.
    times2 : ndarray
        Time values for the second light curve.
    rates2 : ndarray
        Flux values for the second light curve.

    Returns
    -------
    ccf : ndarray
        Cross-correlation values at each lag.
    """
    interp1 = interp1d(times1, rates1, bounds_error=False, fill_value=0.0)
    interp2 = interp1d(times2, rates2, bounds_error=False, fill_value=0.0)
    ccf = []

    for lag in self.lags:
        t_shift1 = times1 + lag
        t_shift2 = times2 - lag

        mask1 = (t_shift1 >= times2[0]) & (t_shift1 <= times2[-1])
        mask2 = (t_shift2 >= times1[0]) & (t_shift2 <= times1[-1])

        if np.sum(mask1) < 2 or np.sum(mask2) < 2:
            ccf.append(0.0)
            continue

        r1 = np.corrcoef(rates1[mask1], interp2(t_shift1[mask1]))[0, 1]
        r2 = np.corrcoef(rates2[mask2], interp1(t_shift2[mask2]))[0, 1]
        ccf.append((r1 + r2) / 2)

    return np.array(ccf)

compute_confidence_intervals(lower_percentile=16, upper_percentile=84)

Compute percentile-based confidence intervals for Monte Carlo lag distributions.

Parameters:

Name Type Description Default
lower_percentile float

Lower percentile bound (default is 16).

16
upper_percentile float

Upper percentile bound (default is 84).

84
Source code in stela_toolkit/cross_correlation.py
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
def compute_confidence_intervals(self, lower_percentile=16, upper_percentile=84):
    """
    Compute percentile-based confidence intervals for Monte Carlo lag distributions.

    Parameters
    ----------
    lower_percentile : float
        Lower percentile bound (default is 16).

    upper_percentile : float
        Upper percentile bound (default is 84).
    """

    if self.peak_lags_mc is None or self.centroid_lags_mc is None:
        print("No Monte Carlo results available to compute confidence intervals.")
        return

    self.peak_lag_ci = (
        np.percentile(self.peak_lags_mc, lower_percentile),
        np.percentile(self.peak_lags_mc, upper_percentile),
    )
    self.centroid_lag_ci = (
        np.percentile(self.centroid_lags_mc, lower_percentile),
        np.percentile(self.centroid_lags_mc, upper_percentile),
    )

find_peak_and_centroid(lags, ccf)

Compute the peak and centroid lag of a cross-correlation function.

The peak lag corresponds to the lag with the maximum correlation value. The centroid lag is computed using a weighted average of lag values in a contiguous region around the peak where the correlation exceeds a fraction of the peak value.

Parameters:

Name Type Description Default
lags ndarray

Array of lag values (assumed sorted).

required
ccf ndarray

Cross-correlation values at each lag.

required

Returns:

Name Type Description
peak_lag float

Lag corresponding to the maximum correlation.

centroid_lag float or nan

Correlation-weighted centroid lag near the peak. Returns NaN if a valid centroid region cannot be identified.

Source code in stela_toolkit/cross_correlation.py
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
def find_peak_and_centroid(self, lags, ccf):
    """
    Compute the peak and centroid lag of a cross-correlation function.

    The peak lag corresponds to the lag with the maximum correlation value.
    The centroid lag is computed using a weighted average of lag values
    in a contiguous region around the peak where the correlation exceeds
    a fraction of the peak value.

    Parameters
    ----------
    lags : ndarray
        Array of lag values (assumed sorted).

    ccf : ndarray
        Cross-correlation values at each lag.

    Returns
    -------
    peak_lag : float
        Lag corresponding to the maximum correlation.

    centroid_lag : float or np.nan
        Correlation-weighted centroid lag near the peak.
        Returns NaN if a valid centroid region cannot be identified.
    """
    if len(lags) != len(ccf) or len(ccf) == 0:
        raise ValueError("lags and ccf must be the same nonzero length")

    # Locate the peak correlation and corresponding lag
    peak_idx = np.nanargmax(ccf)
    peak_lag = lags[peak_idx]
    peak_val = ccf[peak_idx]

    # Define a local region around the peak above a fractional threshold
    threshold = self.centroid_threshold
    cutoff = threshold * peak_val

    # Expand to left of peak
    i_left = peak_idx
    while i_left > 0 and ccf[i_left - 1] >= cutoff:
        i_left -= 1

    # Expand to right of peak
    i_right = peak_idx
    while i_right < len(ccf) - 1 and ccf[i_right + 1] >= cutoff:
        i_right += 1

    # Compute centroid if region is valid
    if i_right >= i_left:
        lags_subset = lags[i_left:i_right + 1]
        ccf_subset = ccf[i_left:i_right + 1]
        weight_sum = np.sum(ccf_subset)
        if weight_sum > 0:
            centroid_lag = np.sum(lags_subset * ccf_subset) / weight_sum
        else:
            centroid_lag = np.nan
    else:
        centroid_lag = np.nan

    return peak_lag, centroid_lag

plot(show_mc=True)

Plot the cross-correlation function and optional lag distributions.

Parameters:

Name Type Description Default
show_mc bool

Whether to show lag distributions from GP samples or Monte Carlo trials.

True
Source code in stela_toolkit/cross_correlation.py
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
def plot(self, show_mc=True):
    """
    Plot the cross-correlation function and optional lag distributions.

    Parameters
    ----------
    show_mc : bool
        Whether to show lag distributions from GP samples or Monte Carlo trials.
    """
    # Plot the CCF
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(self.lags, self.ccf, label="CCF", color='black')

    if self.is_model1 and self.is_model2:
        peak_lag = self.peak_lag[0]
        peak_std = self.peak_lag[1]
        centroid_lag = self.centroid_lag[0]
        centroid_std = self.centroid_lag[1]
        label_suffix = " (GP samples)"
    else:
        peak_lag = self.peak_lag
        centroid_lag = self.centroid_lag
        peak_std = centroid_std = None
        label_suffix = ""

    ax.axvline(peak_lag, color='orange', linestyle='--',
            label=f"Peak lag = {peak_lag:.2f}")
    ax.axvline(centroid_lag, color='blue', linestyle=':',
            label=f"Centroid lag = {centroid_lag:.2f}")

    # Overlay lag distributions on same plot
    if show_mc:
        # Need to modify this to use the confidence intervals from earlier.
        # No else statement to ensure code execution for default show_mc even for no mc
        peak_data, centroid_data = None, None
        if self.monte_carlo:
            peak_data = self.peak_lags_mc
            centroid_data = self.centroid_lags_mc
            label_suffix = " (MC)"
            peak_std = np.std(peak_data)
            centroid_std = np.std(centroid_data)

        elif self.is_model1 and self.is_model2:
            peak_data = self.peak_lags
            centroid_data = self.centroid_lags

        if peak_data is not None:
            ax.hist(peak_data, bins=30, density=True, color='orange', alpha=0.3,
                    label=f"Peak lag dist{label_suffix}, σ={peak_std:.2f}", zorder=1)
        if centroid_data is not None:
            ax.hist(centroid_data, bins=30, density=True, color='blue', alpha=0.3,
                    label=f"Centroid lag dist{label_suffix}, σ={centroid_std:.2f}", zorder=1)

    ax.set_xlabel("Time Lag")
    ax.set_ylabel("Correlation Coefficient / Density")
    ax.grid(True)
    ax.legend()
    plt.tight_layout()
    plt.show()

run_monte_carlo()

Run Monte Carlo simulations using RSS (only if mode='lin_interp') + FR to estimate the lag uncertainties.

Each trial performs:

  • Random Subset Selection (RSS): Resample with replacement from each light curve
    • RSS can only be used when using linear interpolation, for which the time arrays do not need to be the same.
  • Flux Randomization (FR): Add Gaussian noise based on measurement errors
  • Discard trials with low correlation (if rmax_threshold is set)

Returns:

Name Type Description
peak_lags ndarray

Peak lag values from successful trials.

centroid_lags ndarray

Centroid lag values from successful trials.

Source code in stela_toolkit/cross_correlation.py
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
def run_monte_carlo(self):
    """
    Run Monte Carlo simulations using RSS (only if mode='lin_interp') + FR to estimate the lag uncertainties.

    Each trial performs:

    - Random Subset Selection (RSS): Resample with replacement from each light curve
        - RSS can only be used when using linear interpolation, for which the time arrays do not need to be the same.
    - Flux Randomization (FR): Add Gaussian noise based on measurement errors
    - Discard trials with low correlation (if rmax_threshold is set)

    Returns
    -------
    peak_lags : ndarray
        Peak lag values from successful trials.

    centroid_lags : ndarray
        Centroid lag values from successful trials.
    """
    peak_lags = []
    centroid_lags = []

    n_points = len(self.times)

    for _ in range(self.n_trials):
        # Random subset selection
        if self.mode == 'lin_interp':
            idx1 = np.random.randint(0, n_points, n_points)
            unique1, counts1 = np.unique(idx1, return_counts=True)
            times1 = self.times[unique1]
            rates1 = self.rates1[unique1]
            errors1 = self.errors1[unique1] / np.sqrt(counts1) # reweighting errors based on frequency similar to bootstrap

            idx2 = np.random.randint(0, n_points, n_points)
            unique2, counts2 = np.unique(idx2, return_counts=True)
            times2 = self.times[unique2]
            rates2 = self.rates2[unique2]
            errors2 = self.errors2[unique2] / np.sqrt(counts2)

            sort1 = np.argsort(times1)
            sort2 = np.argsort(times2)
            times1, rates1, errors1 = times1[sort1], rates1[sort1], errors1[sort1]
            times2, rates2, errors2 = times2[sort2], rates2[sort2], errors2[sort2]
        else:
            rates1, errors1 = self.rates1, self.errors1
            rates2, errors2 = self.rates2, self.errors2

        # Flux randomization
        perturbed1 = np.random.normal(rates1, errors1)
        perturbed2 = np.random.normal(rates2, errors2)

        # Compute ccf
        if self.mode == "lin_interp":
            ccf = self.compute_ccf_interp(times1, perturbed1, times2, perturbed2)
        else:
            ccf = self.compute_ccf(perturbed1, perturbed2)

        # Rmax threshold filter
        peak, centroid = self.find_peak_and_centroid(self.lags, ccf)

        if np.max(ccf) < self.rmax_threshold or np.isnan(peak) or np.isnan(centroid):
            continue

        peak_lags.append(peak)
        centroid_lags.append(centroid)

    return np.array(peak_lags), np.array(centroid_lags)