Analyze Neuropixels datasets
============================
This example shows how to perform Neuropixels-specific analysis,
including custom pre- and post-processing.
.. code:: ipython3
%matplotlib inline
.. code:: ipython3
import spikeinterface.full as si
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
.. code:: ipython3
base_folder = Path('/mnt/data/sam/DataSpikeSorting/howto_si/neuropixel_example/')
spikeglx_folder = base_folder / 'Rec_1_10_11_2021_g0'
Read the data
-------------
The ``SpikeGLX`` folder can contain several “streams” (AP, LF and NIDQ).
We need to specify which one to read:
.. code:: ipython3
stream_names, stream_ids = si.get_neo_streams('spikeglx', spikeglx_folder)
stream_names
.. parsed-literal::
['imec0.ap', 'nidq', 'imec0.lf']
.. code:: ipython3
# we do not load the sync channel, so the probe is automatically loaded
raw_rec = si.read_spikeglx(spikeglx_folder, stream_name='imec0.ap', load_sync_channel=False)
raw_rec
.. parsed-literal::
SpikeGLXRecordingExtractor: 384 channels - 30.0kHz - 1 segments - 34,145,070 samples
1,138.15s (18.97 minutes) - int16 dtype - 24.42 GiB
.. code:: ipython3
# we automatically have the probe loaded!
raw_rec.get_probe().to_dataframe()
.. parsed-literal::

            x       y contact_shapes  width shank_ids contact_ids
    0    16.0     0.0         square   12.0                    e0
    1    48.0     0.0         square   12.0                    e1
    2     0.0    20.0         square   12.0                    e2
    3    32.0    20.0         square   12.0                    e3
    4    16.0    40.0         square   12.0                    e4
    ..    ...     ...            ...    ...       ...         ...
    379  32.0  3780.0         square   12.0                  e379
    380  16.0  3800.0         square   12.0                  e380
    381  48.0  3800.0         square   12.0                  e381
    382   0.0  3820.0         square   12.0                  e382
    383  32.0  3820.0         square   12.0                  e383

    [384 rows × 6 columns]
.. code:: ipython3
fig, ax = plt.subplots(figsize=(15, 10))
si.plot_probe_map(raw_rec, ax=ax, with_channel_ids=True)
ax.set_ylim(-100, 100)
.. parsed-literal::
(-100.0, 100.0)
.. image:: analyze_neuropixels_files/analyze_neuropixels_8_1.png
Preprocess the recording
------------------------
Let’s do something similar to the IBL destriping chain (see
:ref:`ibl_destripe`) to preprocess the data, but:

- instead of interpolating bad channels, we remove them
- instead of ``highpass_spatial_filter()``, we use ``common_reference()``
.. code:: ipython3
rec1 = si.highpass_filter(raw_rec, freq_min=400.)
bad_channel_ids, channel_labels = si.detect_bad_channels(rec1)
rec2 = rec1.remove_channels(bad_channel_ids)
print('bad_channel_ids', bad_channel_ids)
rec3 = si.phase_shift(rec2)
rec4 = si.common_reference(rec3, operator="median", reference="global")
rec = rec4
rec
.. parsed-literal::
bad_channel_ids ['imec0.ap#AP191']
.. parsed-literal::
CommonReferenceRecording: 383 channels - 30.0kHz - 1 segments - 34,145,070 samples
1,138.15s (18.97 minutes) - int16 dtype - 24.36 GiB
Visualize the preprocessing steps
---------------------------------
Interactive exploration of the preprocessing steps can be done with the
ipywidgets interactive plotter:
.. code:: python
%matplotlib widget
si.plot_traces({'filter':rec1, 'cmr': rec4}, backend='ipywidgets')
Note that using the ipywidgets backend makes it possible to explore
different preprocessing chains without saving the entire file to disk.
Everything is lazy, so you can change the previous cell (parameters,
step order, …) and visualize the result immediately.
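Because everything is lazy, an alternative chain can be assembled and
inspected on the fly. A minimal sketch (the bandpass variant is purely
illustrative and not part of the pipeline used below):

.. code:: python

    # illustrative alternative chain: nothing is written to disk
    rec_alt = si.bandpass_filter(raw_rec, freq_min=300., freq_max=6000.)
    rec_alt = si.phase_shift(rec_alt)
    rec_alt = si.common_reference(rec_alt, operator="median", reference="global")

    # compare it interactively against the chain defined above
    si.plot_traces({'cmr': rec4, 'alternative': rec_alt}, backend='ipywidgets')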
.. code:: ipython3
# here we use static plot using matplotlib backend
fig, axs = plt.subplots(ncols=3, figsize=(20, 10))
si.plot_traces(rec1, backend='matplotlib', clim=(-50, 50), ax=axs[0])
si.plot_traces(rec4, backend='matplotlib', clim=(-50, 50), ax=axs[1])
si.plot_traces(rec, backend='matplotlib', clim=(-50, 50), ax=axs[2])
for i, label in enumerate(('filter', 'cmr', 'final')):
axs[i].set_title(label)
.. image:: analyze_neuropixels_files/analyze_neuropixels_13_0.png
.. code:: ipython3
# plot some channels
fig, ax = plt.subplots(figsize=(20, 10))
some_chans = rec.channel_ids[[100, 150, 200, ]]
si.plot_traces({'filter':rec1, 'cmr': rec4}, backend='matplotlib', mode='line', ax=ax, channel_ids=some_chans)
.. image:: analyze_neuropixels_files/analyze_neuropixels_14_1.png
Should we save the preprocessed data to a binary file?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Depending on the machine, the I/O speed, and the number of times we will
need to “use” the preprocessed recording, we can decide whether it is
convenient to save the preprocessed recording to a file.
Saving is not necessarily a good choice, as it consumes a lot of disk
space and sometimes the writing to disk can be slower than recomputing
the preprocessing chain on-the-fly.
Here, we decide to save it because Kilosort requires a binary file as
input, so the preprocessed recording will need to be saved at some
point.
Depending on the complexity of the preprocessing chain, this operation
can take a while. However, we can make use of the powerful
parallelization mechanism of SpikeInterface.
.. code:: ipython3
job_kwargs = dict(n_jobs=40, chunk_duration='1s', progress_bar=True)
rec = rec.save(folder=base_folder / 'preprocess', format='binary', **job_kwargs)
.. code:: ipython3
# our recording now points to the new binary folder
rec
.. parsed-literal::
BinaryFolderRecording: 383 channels - 30.0kHz - 1 segments - 34,145,070 samples
1,138.15s (18.97 minutes) - int16 dtype - 24.36 GiB
Check spiking activity and drift before spike sorting
-----------------------------------------------------
A good practice before running a spike sorter is to check the “peak
activity” and the presence of drift.
SpikeInterface has several tools to:
- estimate the noise levels
- detect peaks (prior to sorting)
- estimate positions of peaks
Check noise level
~~~~~~~~~~~~~~~~~
Noise levels can be estimated on the scaled traces or on the raw
(``int16``) traces.
.. code:: ipython3
# we can estimate the noise on the scaled traces (microV) or on the raw traces (which are int16 in our case)
noise_levels_microV = si.get_noise_levels(rec, return_scaled=True)
noise_levels_int16 = si.get_noise_levels(rec, return_scaled=False)
.. code:: ipython3
fig, ax = plt.subplots()
_ = ax.hist(noise_levels_microV, bins=np.arange(5, 30, 2.5))
ax.set_xlabel('noise [microV]')
.. parsed-literal::
Text(0.5, 0, 'noise [microV]')
.. image:: analyze_neuropixels_files/analyze_neuropixels_21_1.png
Detect and localize peaks
~~~~~~~~~~~~~~~~~~~~~~~~~
SpikeInterface includes built-in algorithms to detect peaks and also to
localize their position.
This is part of the **sortingcomponents** module and needs to be
imported explicitly.
The two functions (detect + localize):

- can be run in parallel
- are very fast when the preprocessed recording is already saved (and a
  bit slower otherwise)
- implement several methods

Let’s use here the ``locally_exclusive`` method for detection and
``center_of_mass`` for peak localization:
.. code:: ipython3
from spikeinterface.sortingcomponents.peak_detection import detect_peaks
job_kwargs = dict(n_jobs=40, chunk_duration='1s', progress_bar=True)
peaks = detect_peaks(rec, method='locally_exclusive', noise_levels=noise_levels_int16,
detect_threshold=5, radius_um=50., **job_kwargs)
peaks
.. parsed-literal::
detect peaks: 0%| | 0/1139 [00:00, ?it/s]
.. parsed-literal::
array([( 21, 224, -45., 0), ( 36, 84, -34., 0),
( 40, 103, -30., 0), ..., (34144653, 5, -30., 0),
(34144662, 128, -30., 0), (34144867, 344, -30., 0)],
dtype=[('sample_ind', '<i8'), ('channel_ind', '<i8'), ('amplitude', '<f8'), ('segment_ind', '<i8')])
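The peak locations used in the plots below can be estimated with
``localize_peaks``. A minimal sketch, assuming the ``center_of_mass``
method mentioned above (the drift check simply scatters the estimated
peak depth over time):

.. code:: python

    from spikeinterface.sortingcomponents.peak_localization import localize_peaks

    peak_locations = localize_peaks(rec, peaks, method='center_of_mass', **job_kwargs)

    # check for drift: estimated peak depth (y) as a function of time
    fs = rec.sampling_frequency
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.scatter(peaks['sample_ind'] / fs, peak_locations['y'], color='k', marker='.', alpha=0.002)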
.. image:: analyze_neuropixels_files/analyze_neuropixels_26_1.png
.. code:: ipython3
# we can also use the peak location estimates to get an insight into cluster separation before sorting
fig, ax = plt.subplots(figsize=(15, 10))
si.plot_probe_map(rec, ax=ax, with_channel_ids=True)
ax.set_ylim(-100, 150)
ax.scatter(peak_locations['x'], peak_locations['y'], color='purple', alpha=0.002)
.. image:: analyze_neuropixels_files/analyze_neuropixels_27_1.png
Run a spike sorter
------------------
Even though running spike sorting is probably the most critical part of
the pipeline, in SpikeInterface this is dead simple: one function.
**Important notes**:
- most sorters are wrapped from external tools (kilosort, kilosort2.5,
  spykingcircus, mountainsort4, …) that often have additional
  requirements (e.g., MATLAB, CUDA)
- some sorters are developed internally (spykingcircus2)
- external sorters can be run inside a container (Docker, Singularity)
  WITHOUT pre-installation

Please carefully read the ``spikeinterface.sorters`` documentation for
more information.
In this example:
- we will run kilosort2.5
- we apply no drift correction (because we don’t have drift)
- we use the docker image because we don’t want to pay for MATLAB :)
.. code:: ipython3
# check default params for kilosort2.5
si.get_default_sorter_params('kilosort2_5')
.. parsed-literal::
{'detect_threshold': 6,
'projection_threshold': [10, 4],
'preclust_threshold': 8,
'car': True,
'minFR': 0.1,
'minfr_goodchannels': 0.1,
'nblocks': 5,
'sig': 20,
'freq_min': 150,
'sigmaMask': 30,
'nPCs': 3,
'ntbuff': 64,
'nfilt_factor': 4,
'NT': None,
'do_correction': True,
'wave_length': 61,
'keep_good_only': False,
'n_jobs': 40,
'chunk_duration': '1s',
'progress_bar': True}
.. code:: ipython3
# run kilosort2.5 without drift correction
params_kilosort2_5 = {'do_correction': False}
sorting = si.run_sorter('kilosort2_5', rec, output_folder=base_folder / 'kilosort2.5_output',
docker_image=True, verbose=True, **params_kilosort2_5)
.. code:: ipython3
# the results can be read back in a future session
sorting = si.read_sorter_folder(base_folder / 'kilosort2.5_output')
.. code:: ipython3
# here we have 31 units in our recording
sorting
.. parsed-literal::
KiloSortSortingExtractor: 31 units - 1 segments - 30.0kHz
Post processing
---------------
All the postprocessing steps are based on the **SortingAnalyzer** object.

This object combines a ``sorting`` and a ``recording`` object. It also
helps to run computations, aka “extensions”, to get insight into the
quality of the units.

The first extensions we will run are:

- select some spikes per unit
- extract waveforms
- compute templates
- compute noise levels

Note that we use the ``sparse=True`` option. This option is important
because the waveforms will be extracted only for a few channels around
the main channel of each unit. This saves tons of disk space and speeds
up the waveform extraction and further processing.

Note that our object is not persistent to disk because we use
``format="memory"``; we could instead use ``format="binary_folder"`` or
``format="zarr"``.
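For instance, a persistent analyzer could be created directly on disk; a
minimal sketch (the folder name is illustrative):

.. code:: python

    # sketch: create the analyzer on disk instead of in memory (illustrative folder name)
    analyzer_disk = si.create_sorting_analyzer(sorting, rec, sparse=True,
                                               format="binary_folder",
                                               folder=base_folder / "analyzer_on_disk")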
.. code:: ipython3
analyzer = si.create_sorting_analyzer(sorting, rec, sparse=True, format="memory")
analyzer
.. parsed-literal::
estimate_sparsity: 0%| | 0/1139 [00:00, ?it/s]
.. parsed-literal::
SortingAnalyzer: 383 channels - 31 units - 1 segments - memory - sparse - has recording
Loaded 0 extensions:
.. code:: ipython3
analyzer.compute("random_spikes", method="uniform", max_spikes_per_unit=500)
analyzer.compute("waveforms", ms_before=1.5,ms_after=2., **job_kwargs)
analyzer.compute("templates", operators=["average", "median", "std"])
analyzer.compute("noise_levels")
analyzer
.. parsed-literal::
SortingAnalyzer: 383 channels - 31 units - 1 segments - memory - sparse - has recording
Loaded 4 extensions: random_spikes, waveforms, templates, noise_levels
Many additional computations rely on the ``SortingAnalyzer``. Some
computations are slower than others, but can be performed in parallel
using the ``**job_kwargs`` mechanism.
.. code:: ipython3
analyzer.compute("correlograms")
analyzer.compute("unit_locations")
analyzer.compute("spike_amplitudes", **job_kwargs)
analyzer.compute("template_similarity")
analyzer
.. parsed-literal::
spike_amplitudes: 0%| | 0/1139 [00:00, ?it/s]
.. parsed-literal::
SortingAnalyzer: 383 channels - 31 units - 1 segments - memory - sparse - has recording
Loaded 8 extensions: random_spikes, waveforms, templates, noise_levels, correlograms, unit_locations, spike_amplitudes, template_similarity
Our ``SortingAnalyzer`` can be saved to disk using ``save_as()``, which
makes a copy of the analyzer and all computed extensions.
.. code:: ipython3
analyzer_saved = analyzer.save_as(folder=base_folder / "analyzer", format="binary_folder")
analyzer_saved
.. parsed-literal::
SortingAnalyzer: 383 channels - 31 units - 1 segments - binary_folder - sparse - has recording
Loaded 8 extensions: random_spikes, waveforms, templates, noise_levels, correlograms, unit_locations, spike_amplitudes, template_similarity
Quality metrics
---------------
We have a single function ``compute_quality_metrics(SortingAnalyzer)``
that returns a ``pandas.DataFrame`` with the desired metrics.

Note that this function is also an extension, so it can be saved as
well. It is equivalent to:
``metrics = analyzer.compute("quality_metrics").get_data()``

Please visit the `metrics documentation
<https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics.html>`__
for more information and a list of all supported metrics.

Some metrics are based on PCA (like
``'isolation_distance', 'l_ratio', 'd_prime'``) and require the
principal components to be computed first. This can be achieved with:
``analyzer.compute("principal_components")``
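A sketch of this PCA-based route (assumed parameters, not run in this
example):

.. code:: python

    # sketch: PCA-based metrics need the "principal_components" extension first
    analyzer.compute("principal_components", n_components=5, mode="by_channel_local", **job_kwargs)
    pca_metrics = si.compute_quality_metrics(analyzer, metric_names=['isolation_distance', 'l_ratio', 'd_prime'])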
.. code:: ipython3
metric_names=['firing_rate', 'presence_ratio', 'snr', 'isi_violation', 'amplitude_cutoff']
# metrics = analyzer.compute("quality_metrics").get_data()
# equivalent to
metrics = si.compute_quality_metrics(analyzer, metric_names=metric_names)
metrics
.. parsed-literal::
/home/samuel.garcia/Documents/SpikeInterface/spikeinterface/src/spikeinterface/qualitymetrics/misc_metrics.py:846: UserWarning: Some units have too few spikes : amplitude_cutoff is set to NaN
warnings.warn(f"Some units have too few spikes : amplitude_cutoff is set to NaN")
.. parsed-literal::

        amplitude_cutoff  firing_rate  isi_violations_ratio  isi_violations_count  presence_ratio       snr
    0           0.011528     0.798668              4.591436                  10.0        1.000000  1.430458
    1           0.000062     9.886262              5.333802                1780.0        1.000000  1.938214
    2           0.002567     2.849373              3.859813                 107.0        1.000000  1.586939
    3           0.000099     5.404408              3.519589                 351.0        1.000000  2.073651
    4           0.001487     4.772678              3.947255                 307.0        1.000000  1.595303
    5           0.001190     1.802055              6.403293                  71.0        1.000000  2.411436
    6           0.003508     0.531567             94.320694                  91.0        0.888889  3.377035
    7           0.000119     5.400015              0.612662                  61.0        1.000000  4.631496
    8           0.000265    10.563680              0.073487                  28.0        1.000000  8.178637
    9           0.000968     8.181734              0.730646                 167.0        1.000000  3.900670
    10          0.000259    16.839682              0.298477                 289.0        1.000000  5.044798
    11               NaN     0.007029              0.000000                   0.0        0.388889  4.032886
    12          0.000264    10.184115              0.720070                 255.0        1.000000  4.767068
    13               NaN     0.005272              0.000000                   0.0        0.222222  4.627749
    14          0.000371    10.047929              0.771631                 266.0        1.000000  5.185702
    15               NaN     0.107192              0.000000                   0.0        0.888889  4.248180
    16          0.000452     0.535081              8.183362                   8.0        0.944444  2.309993
    17          0.000196     4.650550              6.391673                 472.0        1.000000  2.064208
    18               NaN     0.077319            293.942411                   6.0        0.722222  6.619197
    19          0.000053     7.088728              5.146421                 883.0        1.000000  2.057868
    20          0.000071     9.821244              5.322676                1753.0        1.000000  1.688922
    21               NaN     0.046567            405.178005                   3.0        0.666667  5.899876
    22               NaN     0.094891             65.051727                   2.0        0.722222  6.476350
    23          0.002927     1.849501             13.699103                 160.0        1.000000  2.282473
    24          0.003143     1.420733              4.352889                  30.0        1.000000  1.573989
    25          0.002457     0.675661             56.455510                  88.0        0.944444  4.107643
    26          0.003152     0.642273              2.129918                   3.0        1.000000  1.902601
    27          0.000229     1.012173              6.860924                  24.0        0.888889  1.854307
    28          0.002856     0.804818             38.433003                  85.0        0.888889  3.755829
    29          0.002854     1.012173              1.143487                   4.0        1.000000  1.345607
    30          0.005439     0.649302             63.910953                  92.0        0.888889  4.168347
Curation using metrics
----------------------
A very common curation approach is to threshold these metrics to select
*good* units:
.. code:: ipython3
amplitude_cutoff_thresh = 0.1
isi_violations_ratio_thresh = 1
presence_ratio_thresh = 0.9
our_query = f"(amplitude_cutoff < {amplitude_cutoff_thresh}) & (isi_violations_ratio < {isi_violations_ratio_thresh}) & (presence_ratio > {presence_ratio_thresh})"
print(our_query)
.. parsed-literal::
(amplitude_cutoff < 0.1) & (isi_violations_ratio < 1) & (presence_ratio > 0.9)
.. code:: ipython3
keep_units = metrics.query(our_query)
keep_unit_ids = keep_units.index.values
keep_unit_ids
.. parsed-literal::
array([ 7, 8, 9, 10, 12, 14])
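If needed, the kept unit ids can also be used to build a curated
``Sorting`` object directly; a minimal sketch:

.. code:: python

    # sketch: a Sorting object restricted to the units that pass the query
    curated_sorting = sorting.select_units(keep_unit_ids)

In the next section we instead select these units directly on the
analyzer.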
Export final results to disk folder and visualize with sortingview
-------------------------------------------------------------------
In order to export the final results, we make a copy of the analyzer
with only the selected units (so we avoid computing the extensions
again).
.. code:: ipython3
analyzer_clean = analyzer.select_units(keep_unit_ids, folder=base_folder / 'analyzer_clean', format='binary_folder')
.. code:: ipython3
analyzer_clean
.. parsed-literal::
SortingAnalyzer: 383 channels - 6 units - 1 segments - binary_folder - sparse - has recording
Loaded 9 extensions: random_spikes, waveforms, templates, noise_levels, correlograms, unit_locations, spike_amplitudes, template_similarity, quality_metrics
Then we export figures to a report folder
.. code:: ipython3
# export spike sorting report to a folder
si.export_report(analyzer_clean, base_folder / 'report', format='png')
.. code:: ipython3
analyzer_clean = si.load_sorting_analyzer(base_folder / 'analyzer_clean')
analyzer_clean
.. parsed-literal::
SortingAnalyzer: 383 channels - 6 units - 1 segments - binary_folder - sparse - has recording
Loaded 9 extensions: random_spikes, waveforms, templates, noise_levels, template_similarity, spike_amplitudes, correlograms, unit_locations, quality_metrics
And push the results to the sortingview web-based viewer:
.. code:: python
si.plot_sorting_summary(analyzer_clean, backend='sortingview')