Nearest Neighbor Metrics (:code:`nn_hit_rate`, :code:`nn_miss_rate`, :code:`nn_isolation`, :code:`nn_noise_overlap`)
====================================================================================================================

Calculation
-----------

There are several implementations of nearest neighbor metrics which can be used to evaluate unit quality.

When calling the :code:`compute_quality_metrics()` function, the following options are available to calculate NN metrics:

- The :code:`nearest_neighbor` option will return :code:`nn_hit_rate` and :code:`nn_miss_rate` (based on [Siegle]_ inspired by [Chung]_).
- The :code:`nn_isolation` option will return the nearest neighbor isolation metric (adapted from [Chung]_).
- The :code:`nn_noise_overlap` option will return the nearest neighbor noise overlap metric (adapted from [Chung]_).

All options involve non-parametric calculations in PCA space.

:code:`nearest_neighbor`
------------------------

The membership function :math:`\rho` is defined such that for any spike :math:`g_i` in some cluster :math:`G`, :math:`\rho(g_i) = G`.
Additionally, the nearest neighbor function :math:`n_i(x)` is defined such that its output is the :math:`i`-th closest spike to :math:`x`, so that :math:`n_1(x), \dots, n_k(x)` are the :math:`k` spikes which are closest to :math:`x`.

For a unit associated with cluster :math:`C`, a subset of its spikes is randomly drawn to form the cluster :math:`A`.
A subset of spikes which are not in :math:`C` is drawn to form the cluster :math:`B`.
Note that :math:`|A| = |B|`.
The NN-hit rate for :math:`C` is then:

.. math::
    NN_{\textrm{hit}}(C) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in A  : \rho(n_i(x)) = A \} |}{ | A | }


Similarly, the NN-miss rate for :math:`C` is:

.. math::
    NN_{\textrm{miss}}(C) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in B : \rho(n_i(x)) = A \} |}{ | B | }

NN-hit rate gives an estimate of contamination (an uncontaminated unit should have a high NN-hit rate).
NN-miss rate gives an estimate of completeness.
A more complete unit should have a low NN-miss rate.
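
As a rough illustration of the two formulas above, the following pure-Python sketch counts, for each spike, how many of its :math:`k` nearest neighbors belong to :math:`A`.
It is not the SpikeInterface implementation: the function names are hypothetical, spikes are plain coordinate tuples standing in for PCA projections, the subsampling into :math:`A` and :math:`B` is assumed to have been done already, and neighbors are found by brute force.

```python
import math


def k_nearest_labels(points, labels, idx, k):
    """Labels of the k points closest to points[idx], excluding the point itself."""
    ranked = sorted(
        (math.dist(points[idx], p), j)
        for j, p in enumerate(points)
        if j != idx
    )
    return [labels[j] for _, j in ranked[:k]]


def nn_hit_and_miss_rate(spikes_a, spikes_b, k):
    """Return (hit rate, miss rate) for equally sized subsamples A and B.

    Hypothetical helper: spikes_a are spikes from the unit of interest,
    spikes_b are spikes from all other units.
    """
    points = list(spikes_a) + list(spikes_b)
    labels = ["A"] * len(spikes_a) + ["B"] * len(spikes_b)
    hits = 0    # neighbors in A of spikes in A
    misses = 0  # neighbors in A of spikes in B
    for idx, label in enumerate(labels):
        n_in_a = k_nearest_labels(points, labels, idx, k).count("A")
        if label == "A":
            hits += n_in_a
        else:
            misses += n_in_a
    return hits / (k * len(spikes_a)), misses / (k * len(spikes_b))


# Two well-separated toy clusters: every neighbor shares its spike's cluster.
separated_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
separated_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
hit_rate, miss_rate = nn_hit_and_miss_rate(separated_a, separated_b, k=3)
```

For the separated toy clusters the hit rate is 1 and the miss rate is 0, which matches the interpretation above: overlap between a unit and its neighbors lowers the hit rate and raises the miss rate.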

:code:`nn_isolation`
--------------------

The overall logic of this approach is to choose a cluster for which the isolation is to be computed, and compute the pairwise isolation score between the chosen cluster and every other cluster.
The isolation score is then the minimum of the pairwise scores (the worst case).

Let :math:`A` and :math:`B` be two clusters from sorting.
We set :math:`|A| = |B|` by subsampling as appropriate to match the size of the smaller cluster (or the :code:`max_spikes_for_nn` parameter value, if it is set).
We also restrict the waveforms to channels with significant signal.

The pairwise isolation between clusters :math:`A` and :math:`B` is then:

.. math::

    NN_{\textrm{isolation}}(A, B) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in A \cup B  : \rho(n_i(x)) = \rho(x) \} |}{ | A \cup B | }


Note that :code:`nn_isolation` is affected by the size of the clusters, so setting the :code:`max_spikes_for_nn` parameter may aid downstream comparison of scores.
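
The procedure in this section can be sketched in pure Python as follows.
This is an illustration under simplifying assumptions rather than the library's code: all function names are hypothetical, :code:`max_spikes` is a stand-in for :code:`max_spikes_for_nn`, points are plain tuples instead of PCA projections restricted to channels with significant signal, and the neighbor search is brute force.

```python
import math
import random


def subsample_to_equal_size(a, b, rng, max_spikes=None):
    """Randomly subsample both clusters to the size of the smaller one."""
    n = min(len(a), len(b))
    if max_spikes is not None:
        n = min(n, max_spikes)
    return rng.sample(a, n), rng.sample(b, n)


def pairwise_isolation(cluster_a, cluster_b, k):
    """Fraction of k-nearest neighbors that share the cluster of their spike."""
    points = list(cluster_a) + list(cluster_b)
    labels = [0] * len(cluster_a) + [1] * len(cluster_b)
    same = 0
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        same += sum(labels[j] == labels[i] for _, j in ranked[:k])
    return same / (k * len(points))


def nn_isolation(clusters, target, k, rng, max_spikes=None):
    """Worst-case (minimum) pairwise isolation of `target` against all others."""
    scores = []
    for unit_id, other in clusters.items():
        if unit_id == target:
            continue
        a, b = subsample_to_equal_size(clusters[target], other, rng, max_spikes)
        scores.append(pairwise_isolation(a, b, k))
    return min(scores)


clusters = {
    "u1": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "u2": [(10, 10), (10, 11), (11, 10), (11, 11)],  # far from u1
    "u3": [(0.4, 0.4), (0.6, 0.7), (5, 5), (5, 6)],  # partly overlaps u1
}
rng = random.Random(0)
far_pair = pairwise_isolation(clusters["u1"], clusters["u2"], k=2)
isolation_u1 = nn_isolation(clusters, "u1", k=2, rng=rng)
```

In this toy example the pair ``u1``/``u2`` is perfectly isolated, but the overlap with ``u3`` pulls the isolation of ``u1`` below 1, because the final score is the minimum over all pairs.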

:code:`nn_noise_overlap`
------------------------

A noise cluster is generated by randomly sampling voltage snippets from the recording.
Following a procedure similar to that of the :code:`nn_isolation` method, the isolation is computed between the cluster of interest and the generated noise cluster.
Noise overlap is then :math:`1 - NN_{\textrm{isolation}}`.

This metric gives an indication of the contamination present in the unit cluster.
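
A minimal sketch of this idea, assuming a single-channel recording held in a Python list, is given below.
The function names and the toy data are hypothetical, the snippets are compared directly rather than in PCA space, and any further processing applied by the actual implementation is omitted.

```python
import math
import random


def sample_noise_snippets(trace, n_snippets, snippet_len, rng):
    """Cut n_snippets windows of snippet_len samples at random positions."""
    starts = [
        rng.randrange(len(trace) - snippet_len + 1) for _ in range(n_snippets)
    ]
    return [tuple(trace[s:s + snippet_len]) for s in starts]


def noise_overlap(unit_snippets, noise_snippets, k):
    """1 - isolation between a unit and an equally sized noise cluster."""
    points = list(unit_snippets) + list(noise_snippets)
    labels = [0] * len(unit_snippets) + [1] * len(noise_snippets)
    same = 0
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        same += sum(labels[j] == labels[i] for _, j in ranked[:k])
    return 1.0 - same / (k * len(points))


rng = random.Random(0)
trace = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # toy noise-only recording
noise = sample_noise_snippets(trace, n_snippets=20, snippet_len=5, rng=rng)
# Toy "unit": a large-amplitude waveform with small jitter, far from the noise.
template = (0, -50, 80, -30, 0)
unit = [tuple(v + rng.gauss(0.0, 0.1) for v in template) for _ in range(20)]
overlap = noise_overlap(unit, noise, k=3)
```

Because the toy unit has a much larger amplitude than the noise, none of its neighbors are noise snippets and the overlap is 0; a unit whose waveforms resemble noise would score closer to 1.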


References
----------

.. autofunction:: spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_metrics

.. autofunction:: spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_isolation

.. autofunction:: spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_noise_overlap


Literature
----------

Introduced by [Chung]_ and adapted by [Siegle]_ and Kyu Hyun Lee.