Open Computational Tools for Single-Molecule Spectroscopy (Open CompSMFS)

Open CompSMFS

The ability to probe molecular features one molecule at a time has enhanced our understanding of how bio-macromolecules function. In particular, it has allowed us to identify and characterize conformational states of proteins, RNA, and protein-protein & protein-nucleic acid complexes, even when multiple conformational states coexist or some of them are of low abundance.

Single-molecule fluorescence spectroscopy (SMFS) encompasses a family of techniques in which fluorescence signals from single molecules are detected and analyzed to study their structure & structural dynamics. There are many realizations of SMFS experiments: probing fluorescence from immobilized single molecules using TIRF microscopy and a fast camera, probing fluorescence from freely diffusing single molecules using confocal microscopy and fast single-photon detectors, and hybrids of the two. Different realizations suit different applications, depending on which features are studied and how rarely they occur. However, the diversity of SMFS is dictated not only by the different applications and their corresponding experimental realizations, but also by how the experimental data are analyzed.

As an example, single-molecule FRET (smFRET) allows one to sort different conformational states of a studied macromolecule according to the efficiency of excitation energy transfer between a donor dye & an acceptor dye labeling two specific positions in the macromolecule. In the last decade, an increasing number of research groups have started to quantify accurate inter-dye distances for each conformational state. With enough measured pairs of labeling positions, this inter-dye distance information has been used as restraints in building structural models that characterize these conformational states. Different groups analyze their experimental smFRET data in many different ways for: 1) identifying FRET efficiency-based sub-populations, 2) identifying which FRET-based sub-population represents a single conformational state and which is a time average of more than one rapidly interconverting conformational state, 3) translating the mean FRET efficiency of a given conformational state into distance information & 4) properly using the distance information to learn about the underlying structural ensemble of the conformational state. Because so many research groups analyze their data in so many different ways, a wwPDB task force is now dedicated to standardizing smFRET analysis so that its results can be accepted as spatial information on wwPDB structures. There is also no uniformity between groups in the file formats and data types of their raw data.
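For step 3, the simplest possible translation inverts the Förster relation E = 1/(1 + (r/R0)^6). It assumes a single static donor-acceptor distance r and a fully corrected, accurate FRET efficiency E. The minimal sketch below uses an illustrative Förster radius R0 = 6.0 nm, not the value of any particular dye pair. Because E is a nonlinear function of r, the mean E of a distance distribution is not E at the mean distance. This is one reason why step 4 needs more careful treatment than this one-liner.

```python
def fret_efficiency(r, r0):
    """FRET efficiency for a fixed donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Förster relation: r = R0 * (1/E - 1)**(1/6), valid for 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("E must be strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


R0 = 6.0  # assumed Förster radius in nm (illustrative value only)
print(distance_from_efficiency(0.5, R0))  # at E = 0.5 the distance equals R0
print(distance_from_efficiency(0.8, R0))  # higher E -> shorter distance
```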

In the spirit of open science, open code & open data, the Weiss lab (Prof. Shimon Weiss, Dr. Antonino Ingargiola, Dr. Xavier Michalet, Dr. Ted Laurence, Dr. Eitan Lerner and others) has initiated the "Open Computational Tools for Single Molecule Spectroscopy" project (see our reports on ResearchGate). This is a web platform that helps spread the word about the effort to change the way we exchange data, code, file formats, etc. On this platform we also focus on our own contributions, which are summarized below:

FRETbursts

We have provided the community with FRETbursts, an open-source Python software package for the analysis of confocal-based SMFS measurements of freely diffusing molecules. It provides up-to-date algorithms for background estimation, burst search & filtering, plotting functions and other analysis tools. The code is freely available to anyone who would like to use it or even add their own features. FRETbursts can be run in Jupyter notebooks, which provide a web interface and can easily be saved and used as documentation of the data analysis.
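To give a feel for what a burst search does, here is a minimal plain-Python sketch of the classic sliding-window approach. A window of m consecutive photons counts as "in burst" when its photon rate exceeds a threshold, overlapping above-threshold windows are merged into one burst, and bursts with fewer than min_size photons are discarded. This is not FRETbursts' implementation or API. In particular, here the threshold is passed directly as a rate (photons/s), whereas in practice it is usually set relative to the estimated background rate.

```python
import math


def sliding_window_burst_search(timestamps, m, min_rate, min_size):
    """Return bursts as (first_index, last_index) pairs, both inclusive.

    timestamps: sorted photon arrival times in seconds
    m: number of consecutive photons in the sliding window
    min_rate: minimum photon rate (photons/s) for a window to be "in burst"
    min_size: minimum number of photons for a burst to be kept
    """
    bursts = []
    start, stop = None, -1
    for i in range(len(timestamps) - m + 1):
        duration = timestamps[i + m - 1] - timestamps[i]
        # rate estimated from the m - 1 inter-photon intervals spanned by the window
        rate = math.inf if duration == 0 else (m - 1) / duration
        if rate < min_rate:
            continue
        if start is None or i > stop:
            # window shares no photons with the current burst: close it, open a new one
            if start is not None and stop - start + 1 >= min_size:
                bursts.append((start, stop))
            start = i
        stop = i + m - 1  # windows overlapping the current burst extend it
    if start is not None and stop - start + 1 >= min_size:
        bursts.append((start, stop))
    return bursts


# hypothetical data: sparse background photons around a dense 20-photon cluster
background = [0.0, 1.0, 2.0, 3.0]
cluster = [4.0 + k * 1e-4 for k in range(20)]
tail = [5.0, 6.0, 7.0]
print(sliding_window_burst_search(background + cluster + tail, m=5, min_rate=1000.0, min_size=10))
# → [(4, 23)]
```

The indices returned point into the timestamp array, so per-burst quantities (size, duration, photon counts per channel) can be computed from the photons between them.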

The concepts of FRETbursts are summarized in our PLoS One software paper and the code & the software documentation are also readily available.


An example of a 2D FRET-Stoichiometry (E-S) map. Each dot in the plot is a single-molecule burst with a given value of FRET efficiency (E) and Donor-Acceptor dye stoichiometry (S).

The plot shows the power of single-molecule experiments to sort single molecules, and their sub-populations, on the basis of different properties (here E & S values).

The plot combines a scatter plot (showing the values of each single-molecule burst) and a contour-like density plot (without the smoothing that a contour plot usually introduces).
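Each dot in such an E-S map comes from the photon counts of one burst. The sketch below computes the uncorrected values (proximity ratio E and stoichiometry S). It assumes alternating laser excitation, so that photons are split into donor-excitation/donor-emission (n_dd), donor-excitation/acceptor-emission (n_da) and acceptor-excitation/acceptor-emission (n_aa) counts. No background, leakage, direct-excitation or gamma corrections are applied, and the counts are hypothetical. With these definitions, donor-only bursts land near S = 1 and acceptor-only bursts near S = 0, which is what lets the map separate doubly labeled molecules from singly labeled ones.

```python
import math


def burst_e_s(n_dd, n_da, n_aa):
    """Uncorrected FRET efficiency (proximity ratio) and stoichiometry of one burst.

    n_dd: donor-channel photons during donor excitation
    n_da: acceptor-channel photons during donor excitation
    n_aa: acceptor-channel photons during acceptor excitation
    """
    n_dex = n_dd + n_da  # all photons detected under donor excitation
    e = n_da / n_dex if n_dex > 0 else math.nan
    total = n_dex + n_aa
    s = n_dex / total if total > 0 else math.nan
    return e, s


# hypothetical per-burst photon counts: (n_dd, n_da, n_aa)
bursts = [(30, 10, 40), (8, 32, 38), (45, 1, 0)]
for counts in bursts:
    print(counts, burst_e_s(*counts))
```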

MC-DEPI

Monte-Carlo Diffusion-Enhanced Photon Inference (MC-DEPI) is a new methodology developed in the Weiss laboratory (main contributors: Prof. Shimon Weiss, Dr. Antonino Ingargiola & Dr. Eitan Lerner). It allows both simulating and analyzing smFRET experiments on the basis of an underlying model that includes: 1) the number of conformational states, 2) the donor-acceptor distance distributions and 3) the relative diffusion coefficient between the donor & the acceptor dyes.

If you are planning an smFRET experiment and would like not only to identify conformational sub-populations but also to characterize the donor-acceptor distance information of each conformational state, MC-DEPI is for you! What is new in this technique is that it decouples the effect of donor-acceptor distance dynamics on FRET from the FRET photophysics. By doing so, one obtains higher accuracy in the resolved donor-acceptor distance distributions. This decoupling can also enable the analysis of many other complex types of experiments in which FRET is affected both by donor-acceptor distance dynamics and by complex photophysics. One example is the accurate analysis of three distances, and how they depend on each other, in a 3-color single-molecule FRET experiment.
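A toy Monte Carlo in plain Python helps show why distance dynamics and FRET photophysics need to be treated together. It is not MC-DEPI and does not use its code. The model for the donor-acceptor distance is an assumption: an Ornstein-Uhlenbeck process with a Gaussian equilibrium distribution (mean r_mean, width r_sigma) that relaxes with the relative diffusion coefficient d_rel. The fixed time step and all parameter values are illustrative too. For each donor excitation, the distance keeps diffusing while the donor is excited, and the donor de-excites with total rate 1/tau_d + k_FRET(r), where k_FRET(r) = (1/tau_d)(R0/r)^6.

```python
import math
import random


def simulate_deexcitation(n_excitations, r_mean, r_sigma, d_rel, r0, tau_d, dt, seed=0):
    """Toy model: D-A distance diffusing while the donor is in its excited state.

    Distances in nm, times in ns, d_rel in nm^2/ns (all values illustrative).
    Returns per-excitation lists: distance at excitation, distance at
    de-excitation, and whether the de-excitation occurred via FRET.
    """
    rng = random.Random(seed)
    k_d = 1.0 / tau_d                    # donor decay rate in the absence of FRET
    tau_c = r_sigma ** 2 / d_rel         # relaxation time of the OU distance process
    decay = math.exp(-dt / tau_c)
    noise_sd = r_sigma * math.sqrt(1.0 - decay ** 2)
    r_exc, r_deexc, is_fret = [], [], []
    for _ in range(n_excitations):
        r = rng.gauss(r_mean, r_sigma)   # distance drawn from equilibrium at excitation
        r_exc.append(r)
        while True:
            k_fret = k_d * (r0 / max(r, 0.1)) ** 6  # clamp avoids the r -> 0 singularity
            k_tot = k_d + k_fret
            if rng.random() < 1.0 - math.exp(-k_tot * dt):
                # donor de-excited during this step: via FRET with probability k_fret / k_tot
                r_deexc.append(r)
                is_fret.append(rng.random() < k_fret / k_tot)
                break
            # exact Ornstein-Uhlenbeck update of the distance over one time step
            r = r_mean + (r - r_mean) * decay + noise_sd * rng.gauss(0.0, 1.0)
    return r_exc, r_deexc, is_fret


def mean(values):
    return sum(values) / len(values)


r_exc, r_deexc, is_fret = simulate_deexcitation(
    n_excitations=2000, r_mean=5.0, r_sigma=1.0, d_rel=0.1, r0=6.0, tau_d=4.0, dt=0.02)
r_fret = [r for r, f in zip(r_deexc, is_fret) if f]
r_donor = [r for r, f in zip(r_deexc, is_fret) if not f]
print("apparent E:", mean(is_fret))
print("mean r at excitation:", mean(r_exc))
print("mean r at FRET / donor-emission events:", mean(r_fret), mean(r_donor))
```

Distances recorded at FRET events are shorter on average than those at donor-emission events, so the photons do not sample the equilibrium distance distribution uniformly. When d_rel is made very small, the distance at de-excitation simply equals the distance at excitation (the static limit).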

The code and sample Python notebooks of MC-DEPI are available here. The methodology is explained in our new paper in J. Phys. Chem. B.

MC-DEPI - retrieving the distance distribution between 17 base-pair separated dyes

The experimental smFRET results (grey) for a donor dye and an acceptor dye labeling DNA bases separated by 17 bp, shown as a burst-wise FRET histogram (bottom-left) and as donor & acceptor fluorescence decays (top- & bottom-right, respectively). In blue: the best-fit MC-DEPI model, used to retrieve the donor-acceptor distance distribution in equilibrium (top-right) and after donor de-excitation (center-right).

While smFRET provides information on the distance between the donor & acceptor dyes at the moment the donor was de-excited, MC-DEPI helps recover the donor-acceptor distance distribution in equilibrium!

Phconvert

As mentioned above, there are currently many different file formats for the raw data of SMFS measurements (LabVIEW file formats, time-tagged time-resolved (TTTR) file formats from Becker & Hickl, PicoQuant, etc.). This stems partly from the many different experimental setups, which rely on many different data acquisition platforms. In the spirit of open science, the raw data of experiments should be provided. However, sharing data between groups is hard when each group uses different files & file formats. For that purpose, Dr. Antonino Ingargiola, Dr. Ted Laurence, Dr. Robert Boutelle, Prof. Shimon Weiss & Dr. Xavier Michalet adapted the successful HDF5 file format, widely used and well validated by the particle physics community, to diffusion-based SMFS raw data, creating the Photon HDF5 file format.

The concepts of phconvert (a Python library that converts raw data from several of these formats into Photon HDF5) and of the Photon HDF5 file format are summarized in Antonino Ingargiola's paper in Biophys. J., and the code & the software documentation are also readily available.
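Converting a vendor TTTR file essentially means decoding binary records into photon timestamps and detector numbers. The sketch below does this for a hypothetical 32-bit record layout: the low 28 bits are a macrotime counter, the high 4 bits are a channel number, and channel 15 marks a counter overflow. Real vendor formats differ and should be read with phconvert. The output dict mirrors a few Photon HDF5 field names (photon_data/timestamps, photon_data/detectors, photon_data/timestamps_specs/timestamps_unit). A valid Photon HDF5 file requires additional mandatory fields, so this is not a file writer.

```python
TIME_BITS = 28          # hypothetical: low 28 bits hold the macrotime counter
OVERFLOW_CHANNEL = 0xF  # hypothetical: channel code marking a counter overflow


def decode_records(records, clock_period_s):
    """Decode 32-bit records into a Photon-HDF5-like nested dict (sketch only)."""
    time_mask = (1 << TIME_BITS) - 1
    overflow_offset = 0
    timestamps, detectors = [], []
    for rec in records:
        channel = (rec >> TIME_BITS) & 0xF
        counter = rec & time_mask
        if channel == OVERFLOW_CHANNEL:
            # counter wrapped around: extend the time axis for all later photons
            overflow_offset += 1 << TIME_BITS
            continue
        timestamps.append(overflow_offset + counter)  # integer clock ticks
        detectors.append(channel)
    return {
        "photon_data": {
            "timestamps": timestamps,
            "detectors": detectors,
            "timestamps_specs": {"timestamps_unit": clock_period_s},
        }
    }


records = [100, (1 << 28) | 200, 0xF << 28, 50]  # hypothetical raw records
data = decode_records(records, clock_period_s=12.5e-9)
print(data["photon_data"]["timestamps"])  # → [100, 200, 268435506]
print(data["photon_data"]["detectors"])   # → [0, 1, 0]
```

Keeping timestamps as integer clock ticks plus a separate unit, as in Photon HDF5, avoids floating-point rounding in long acquisitions.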

Pycorrelate

Fluorescence correlation spectroscopy (FCS) is a technique that allows measuring many features of fluorescently labeled molecules: molecular size & shape (from the diffusion coefficient & hydrodynamic radius), type of molecular transport (3D diffusion, 2D diffusion, subdiffusion, active transport), binding/unbinding dynamics, conformational & orientational dynamics, as well as photophysical phenomena (a nice review of FCS can be found here). In many ways, it is a method that complements dynamic light scattering (DLS) experiments. In FCS, the above-mentioned dynamic information is extracted from fluorescence correlation curves. There are many software packages for analyzing FCS data and retrieving dynamic information. However, some are only available commercially, and others are either not publicly available or hard to use.
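Size enters FCS through the Stokes-Einstein relation D = k_B T / (6 π η R_h), and D sets the diffusion time through a Gaussian focus of 1/e² radius w0, τ_D = w0² / (4D). The sketch below assumes a spherical particle in water at 20 °C (η ≈ 1.002 mPa·s). The 1 nm radius and the 0.25 µm focus radius are illustrative values.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def diffusion_coefficient(r_h, temperature=293.15, viscosity=1.002e-3):
    """Stokes-Einstein: D = k_B T / (6 pi eta R_h). SI units in, m^2/s out."""
    return K_B * temperature / (6.0 * math.pi * viscosity * r_h)


def hydrodynamic_radius(d, temperature=293.15, viscosity=1.002e-3):
    """Inverse Stokes-Einstein relation: R_h (m) from a measured D (m^2/s)."""
    return K_B * temperature / (6.0 * math.pi * viscosity * d)


def diffusion_time(d, w0):
    """Lateral diffusion time through a Gaussian focus of 1/e^2 radius w0 (m)."""
    return w0 ** 2 / (4.0 * d)


d = diffusion_coefficient(1e-9)  # assumed 1 nm hydrodynamic radius
print(d * 1e12, "um^2/s")
print(diffusion_time(d, 0.25e-6) * 1e6, "us")  # assumed w0 = 0.25 um
```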

Pycorrelate, developed by Dr. Antonino Ingargiola (in the Weiss lab), provides a library of functions to calculate fluorescence correlation functions, as well as to analyze them. One can use pycorrelate in Jupyter notebooks to streamline FCS analysis and to allow analytical results to be easily documented and deposited in public repositories.
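A naive version of the core calculation bins integer photon timestamps into count traces and computes the normalized correlation G(k) = ⟨a(i) b(i+k)⟩ / (⟨a⟩⟨b⟩) − 1 for linearly spaced lags k. Passing the same trace twice gives the autocorrelation. This plain-Python sketch is not pycorrelate's API or algorithm, which is designed to work efficiently on photon timestamps. The bin width, lag range and data are hypothetical.

```python
def bin_counts(timestamps, bin_width, n_bins):
    """Histogram integer timestamps (clock ticks) into n_bins bins of bin_width ticks."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = t // bin_width
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def cross_correlation(counts_a, counts_b, max_lag):
    """G(k) = <a(i) b(i+k)> / (<a> <b>) - 1 for lags k = 0..max_lag (in bins).

    Means are taken over the overlapping part of the two traces at each lag;
    both traces must have the same length and non-zero counts in that range.
    """
    n = len(counts_a)
    g = []
    for lag in range(max_lag + 1):
        m = n - lag
        prod = sum(counts_a[i] * counts_b[i + lag] for i in range(m)) / m
        mean_a = sum(counts_a[:m]) / m
        mean_b = sum(counts_b[lag:]) / m
        g.append(prod / (mean_a * mean_b) - 1.0)
    return g


print(bin_counts([0, 3, 5, 12, 19, 25], bin_width=10, n_bins=2))  # → [3, 2]
periodic = [1, 0, 0, 0] * 25  # hypothetical trace with a period of 4 bins
print(cross_correlation(periodic, periodic, max_lag=4))  # → [3.0, -1.0, -1.0, -1.0, 3.0]
```

Real FCS curves span many decades of lag time, which is why log-spaced lags computed directly from timestamps are preferred over this linear, binned approach.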

FRET CCF example

The cross-correlation between donor & acceptor photons in a FRET experiment (blue) and the best-fit curve (orange) for a model assuming that the donor-acceptor cross-correlation dynamics arise solely from the translational diffusion of the donor-acceptor labeled molecule through the non-uniformly illuminated, focused laser beam. The fitting residuals are shown in black.

Both the calculation of the fluorescence cross-correlation curve and the fitting were performed with pycorrelate.
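The exact model behind this fit is not given here. A common choice for pure translational diffusion assumes a 3D Gaussian detection volume: G(τ) = (1/N) (1 + τ/τ_D)^-1 (1 + τ/(κ² τ_D))^-1/2, where N is the mean number of molecules in the volume and κ is the axial-to-lateral aspect ratio. The plain-Python sketch below fits it with κ held fixed (an assumption). It scans τ_D on a grid and, since G is linear in 1/N for fixed τ_D, solves for 1/N in closed form at each grid point. In practice a general nonlinear least-squares routine would normally be used instead of a grid.

```python
import math


def diffusion_model(tau, n_mol, tau_d, kappa):
    """3D Gaussian-volume diffusion model for G(tau)."""
    return (1.0 / n_mol) / (1.0 + tau / tau_d) / math.sqrt(1.0 + tau / (kappa ** 2 * tau_d))


def fit_diffusion(taus, g, kappa, tau_d_grid):
    """Grid search over tau_d with the closed-form least-squares amplitude 1/N."""
    best = None
    for tau_d in tau_d_grid:
        shape = [diffusion_model(t, 1.0, tau_d, kappa) for t in taus]
        amp = sum(s * y for s, y in zip(shape, g)) / sum(s * s for s in shape)
        if amp <= 0:
            continue  # non-physical amplitude for this tau_d
        sse = sum((y - amp * s) ** 2 for s, y in zip(shape, g))
        if best is None or sse < best[0]:
            best = (sse, tau_d, 1.0 / amp)
    if best is None:
        raise ValueError("no positive-amplitude fit found")
    _, tau_d, n_mol = best
    return n_mol, tau_d


# synthetic, noise-free curve: N = 5 molecules, tau_D = 100 us, kappa = 5 (all assumed)
taus = [10 ** (-6 + k / 10) for k in range(51)]  # 1 us to 100 ms
g = [diffusion_model(t, 5.0, 1e-4, 5.0) for t in taus]
grid = [10 ** (-6 + k / 50) for k in range(201)]
print(fit_diffusion(taus, g, kappa=5.0, tau_d_grid=grid))
```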

 

PyBroMo

In confocal-based single-molecule fluorescence spectroscopy (SMFS) measurements, such as single-molecule FRET, molecules that randomly traverse a femtoliter excitation-detection volume produce bursts of photons. However, many experimental parameters lead to different types of photon bursts (e.g. concentration, viscosity, temperature, molecular size, weight, rigidity/flexibility, fluorescence brightness, photostability, experimental configuration, excitation type, wavelength, intensity, etc.). Bursts differ in size (total number of photons per burst), in width/duration (the time the molecule spends in the excitation-detection volume) and in instantaneous photon rate, all of which are further modulated by the burst search & selection algorithm used. It is therefore often hard to predict, ahead of time, how these parameters will shape the outcome of an experiment. Additionally, the link between these experimental observables and the underlying ground truth is hard to establish (e.g. the assignment of photons to specific molecules in the ensemble, or the molecules' positions in space when emitting a photon). Simulations that mimic confocal-based SMFS experiments can address these concerns.

For that purpose, Dr. Antonino Ingargiola (in the Weiss lab) developed Python Brownian Motion simulations, or PyBroMo for short. The first layer of PyBroMo produces diffusion trajectories of simulated particles diffusing in a rectangular box with given dimensions. The second layer of the SMFS simulations adds photons emitted from the diffusing particles, depending on their instantaneous positions relative to an effective detection volume (EDV). The EDV is calculated as the square of an excitation point-spread function (PSF), and the molecular brightness of a particle is taken as its photon rate when it is at the center of the EDV. The outcome of the second layer is a file of simulated photons emitted by different simulated particles at specific positions inside the rectangular box. On top of that, background photons are added.
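The two layers described above can be sketched in a few lines of plain Python. This is not PyBroMo's code or API. The periodic boundary conditions, the Gaussian PSF (whose square, the EDV, is again Gaussian), the at-most-one-photon-per-step emission rule and all parameter values are assumptions chosen for illustration. Each photon keeps the identity and position of the emitting particle, and -1 marks background photons.

```python
import math
import random


def simulate_photons(n_particles, box, diff_coeff, dt, n_steps, brightness,
                     sigma_xy, sigma_z, bg_rate, seed=1):
    """Brownian motion in a periodic box + photon emission through a Gaussian EDV.

    box: (Lx, Ly, Lz) in um, centered on the EDV; diff_coeff in um^2/s; dt in s.
    brightness: photon rate (photons/s) of a particle at the EDV center.
    sigma_xy, sigma_z: standard deviations (um) of the Gaussian excitation PSF.
    Returns a time-sorted list of (time_s, particle_id, position); id -1 = background.
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(2.0 * diff_coeff * dt)  # per-axis displacement std per step
    pos = [[rng.uniform(-l / 2, l / 2) for l in box] for _ in range(n_particles)]
    photons = []
    for step in range(n_steps):
        t = step * dt
        for pid, p in enumerate(pos):
            for ax in range(3):
                # layer 1: Brownian step, then wrap around (periodic boundaries)
                p[ax] += rng.gauss(0.0, step_sd)
                p[ax] = (p[ax] + box[ax] / 2) % box[ax] - box[ax] / 2
            # layer 2: EDV = PSF**2 with a Gaussian PSF, equal to 1 at the center
            edv = math.exp(-(p[0] ** 2 + p[1] ** 2) / sigma_xy ** 2 - p[2] ** 2 / sigma_z ** 2)
            # at most one photon per step (small-dt approximation)
            if rng.random() < 1.0 - math.exp(-brightness * edv * dt):
                photons.append((t, pid, tuple(p)))
    # uniform background: Poisson process with exponential inter-arrival times
    t, t_max = rng.expovariate(bg_rate), n_steps * dt
    while t < t_max:
        photons.append((t, -1, None))
        t += rng.expovariate(bg_rate)
    photons.sort(key=lambda ph: ph[0])
    return photons


photons = simulate_photons(n_particles=5, box=(4.0, 4.0, 6.0), diff_coeff=30.0, dt=1e-6,
                           n_steps=20000, brightness=2e5, sigma_xy=0.2, sigma_z=0.6,
                           bg_rate=2000.0)
print(len(photons), "photons,", sum(1 for ph in photons if ph[1] == -1), "from background")
```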

PSFmodels
Two different models of a point-spread function (PSF) used in PyBroMo: a numerically calculated model that mimics a realistic PSF, and a Gaussian approximation. Both PSF models are equivalent in terms of their energy.

Testing different simulation conditions is then just a matter of analyzing the photon file (a Photon HDF5 file produced by the second layer of PyBroMo) using FRETbursts.

We have recently contributed to updating PyBroMo by adding several new features that make the identity and position of the emitting molecule available for each photon when analyzing the photon file. This way, we were able to perform a systematic analysis of the extent of burst impurity (a burst whose photons arise from more than one particle) in confocal-based SMFS experiments. We also examined how impurity can be controlled by a proper choice of burst analysis parameter values (all found in our new paper). These recent additions can be found in PyBroMo version 0.8.1.
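Once every photon carries its emitter identity, burst impurity can be quantified directly. The sketch below assumes a definition that is not spelled out here: the impurity of a burst is the fraction of its particle photons not emitted by the particle contributing the most photons, and background photons (id -1) are ignored. It then computes three summary quantities: the fraction of bursts with impurity above 0, the mean impurity, and the fraction of impure photons relative to all burst photons. The burst data are hypothetical.

```python
from collections import Counter


def burst_impurity(particle_ids):
    """Return (impurity, n_impure_photons) for one burst; id -1 (background) is ignored."""
    ids = [pid for pid in particle_ids if pid != -1]
    if not ids:
        return 0.0, 0
    n_dominant = Counter(ids).most_common(1)[0][1]  # photons from the main emitter
    n_impure = len(ids) - n_dominant
    return n_impure / len(ids), n_impure


def impurity_summary(bursts):
    """Fraction of impure bursts, mean impurity, fraction of impure photons."""
    results = [burst_impurity(b) for b in bursts]
    impurities = [imp for imp, _ in results]
    frac_impure_bursts = sum(1 for imp in impurities if imp > 0) / len(bursts)
    mean_impurity = sum(impurities) / len(bursts)
    total_photons = sum(len(b) for b in bursts)  # all burst photons, background included
    frac_impure_photons = sum(n for _, n in results) / total_photons
    return frac_impure_bursts, mean_impurity, frac_impure_photons


# hypothetical bursts, each a list of per-photon particle ids
bursts = [[0, 0, 0, 0], [1, 1, 1, 2], [3, -1, 3, 4, 4, 3]]
print(impurity_summary(bursts))
```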

Burst impurity illustration
An example of an impure burst, showing how its impurity depends on the molecular positions and how impurity decreases as the instantaneous photon rate threshold, a burst analysis parameter, increases. Points are molecular positions when photons were emitted: molecule 5 (from red to yellow, in time) and molecule 2 (from blue to green, in time). Starting positions are shown with X's. Contours of the numerical PSF model are overlaid in grey.
The extent of burst impurity
The occurrence and level of burst impurity as a function of burst search criteria, for different molecular concentrations. The relative occurrence of impure bursts (left) was calculated as the fraction of bursts with an impurity level larger than 0, i.e. the fraction of non-single-molecule (impure) bursts (error ranges: 95% confidence intervals). The level of impurity (right) was calculated either as the mean of all burst impurity levels (black; error ranges: standard error) or as the fraction of impure photons from all bursts relative to all burst photons (red; no error ranges, as the calculation was performed over all photons).
The effect of burst impurity on accuracy in smFRET
The effect of burst impurity on the accuracy of the retrieved mean FRET efficiency when performing single-molecule Förster resonance energy transfer (smFRET) experiments with two FRET efficiency subpopulations - increasing the value of the photon rate threshold, F, improves the accuracy of the retrieved mean FRET efficiency. From top to bottom, each panel shows the resulting FRET histogram (blue), the best fit sum-of-two-Gaussians (red), the best-fit mean FRET efficiencies (orange and cyan vertical lines; dimmer lines show the error ranges), and the simulation ground-truth mean FRET efficiency values (dashed red and green vertical lines).

 

Next on our list is to further develop PyBroMo to treat not only freely diffusing particles but also particles undergoing directional drift, to mimic SMFS experiments using flow in microfluidic devices.