BioSAS: Advanced Applications US-SOMO component at SAS2018

Coordinates: October 7, 2018, Peninsula A, Grand Traverse Resort & Spa, Traverse City, Michigan

Software and sample data installation

At your earliest convenience, please install the latest version of the software before the course.
Download and extract the Aldolase sample data available here

Section Outline

The purpose of this section of the BioSAS Advanced Workshop is to guide attendees through the SAS data analysis capabilities of the more general program UltraScan SOlution MOdeller (US-SOMO) [references]. In particular, we will deal extensively with the HPLC-SAXS module and its ability to produce intensity vs. momentum transfer, I(q) vs. q, data for the individual species in overlapping, non-baseline-resolved peaks from a size-exclusion chromatography run directly coupled to concentration and SAXS detectors. This will be done mainly by Gaussian decomposition, after conversion of the I(q) vs. q data into a series of I(t) vs. t “chromatograms” as a function of elution time (or frame number) t, one for each q value. Prior to the Gaussian decomposition, an initial Singular Value Decomposition (SVD) evaluation of the number of components present in the SAXS data files is performed. Procedures for comparing the separate components’ SAXS data with those derived from atomic-resolution structures, using distance distribution functions P(r) vs. r, will then be illustrated. If time allows, baseline correction procedures will also be demonstrated.
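As a rough illustration of the two data operations described above, the following Python sketch transposes a stack of I(q) vs. q frames into I(t) vs. t chromatograms and then counts the significant singular values. It is not US-SOMO code: the function names, the synthetic two-species dataset, and the 1% relative threshold are all assumptions made for this example. In practice the number of components is judged by inspecting the singular values rather than by applying a fixed cutoff.

```python
import numpy as np


def frames_to_chromatograms(frames):
    """Turn frames (n_frames, n_q), one I(q) curve per row,
    into chromatograms (n_q, n_frames), one I(t) curve per q value."""
    return np.asarray(frames, dtype=float).T


def count_significant_components(matrix, rel_threshold=0.01):
    """Count singular values above rel_threshold times the largest one.
    The threshold is an arbitrary choice for this sketch."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    n_significant = int(np.sum(singular_values > rel_threshold * singular_values[0]))
    return n_significant, singular_values


# Synthetic data (hypothetical): two overlapping species, small added noise.
rng = np.random.default_rng(0)
q = np.linspace(0.01, 0.3, 60)   # momentum transfer values
t = np.arange(120)               # frame numbers

# Guinier-like scattering profiles with radii of gyration of 30 and 45
profiles = np.array([
    np.exp(-(q * 30.0) ** 2 / 3.0),
    np.exp(-(q * 45.0) ** 2 / 3.0),
])                               # shape (2, n_q)

# Overlapping Gaussian elution peaks
elution = np.array([
    np.exp(-0.5 * ((t - 55.0) / 6.0) ** 2),
    0.6 * np.exp(-0.5 * ((t - 68.0) / 6.0) ** 2),
])                               # shape (2, n_frames)

frames = elution.T @ profiles    # shape (n_frames, n_q)
frames += rng.normal(scale=1e-4, size=frames.shape)

chromatograms = frames_to_chromatograms(frames)
n_components, singular_values = count_significant_components(chromatograms)

print("chromatogram matrix shape:", chromatograms.shape)
print("largest singular values:", singular_values[:4])
print("components above threshold:", n_components)
```

With this synthetic input, two singular values stand clearly above the noise floor, matching the two species used to build the data.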

Steps in the hands-on tutorial:

Loading of a PDB-formatted structure file into the US-SOMO main module for the subsequent comparison with SAXS-derived data, with automatic computation of the partial specific volume and of the radius of gyration.
Launching the SAS module and giving a general overview of it, without any further operation at this point.
Launching of the HPLC-SAXS module from the main SAS module.
Loading both SAXS and concentration files in the HPLC-SAXS module. Visualization tools available.
Converting the I(q) vs. q SAXS frames into I(t) vs. t chromatograms as a function of time/frame number for each q value. Visual check of the data to establish whether baseline correction tools should be tested/applied. At this stage, no baseline correction tools will be demonstrated, but a pre-corrected dataset will be loaded into the HPLC-SAXS module.
Rescaling and alignment of the concentration chromatogram against a single chosen I(t) vs. t dataset. Setting the rescaled/aligned concentration chromatogram as the reference concentration chromatogram.
Using the “Trial make I(q)” mode to examine the radius of gyration Rg as a function of frame number, revealing non-separated species.
SVD analyses of both uncorrected and pre-baseline-corrected datasets. Determination of the minimum number of Gaussians needed to decompose the data.
Gaussian decomposition first step. Choice of the Gaussian type (“normal” vs. skewed Gaussians). Initialization of the procedure by selecting a single I(t) vs. t chromatogram and positioning the chosen number of Gaussians. Setting the decomposition limits.
Gaussian decomposition second step. Using the fitter to achieve a good decomposition of this single I(t) vs. t chromatogram, as judged by residuals and chi squared value. Effect of the choice of Gaussian type on this initialization procedure. Accepting the best Gaussian set.
Gaussian decomposition third step. Selecting a subset of I(t) vs. t chromatograms for global fitting, using the advanced data selector. Propagating the initial set of Gaussians (positions, widths, and skewness, if present, are kept constant for each Gaussian along the chosen I(t) vs. t subset, while amplitudes are adjusted). Global fitting, where all parameters will be optimized. Note that the positions, widths, and skewness of the Gaussians will be the same within each peak along the I(t) vs. t chromatograms. In addition, by default the skewness will be the same for all Gaussians in all peaks, but this constraint can be released.
Gaussian decomposition fourth step. Judging whether the Gaussian decomposition obtained on the I(t) vs. t subset is satisfactory using, in addition to the global chi squared and the global residuals plot, the “global fit by q” plot. This plot shows, as a function of q, the results of a P-value pairwise analysis comparing each original I(t) vs. t chromatogram with the one reconstructed from the sum of the Gaussian curves, as well as the individual chi squared of each fit. Acceptance or rejection of the global decomposition.
Gaussian decomposition fifth step. If the fit of the subset was judged good enough and accepted, selection of all I(t) vs. t chromatograms and propagation of all accepted Gaussians to the entire dataset (again, positions, widths, and skewness, if present, are kept constant for each Gaussian peak along the entire I(t) vs. t dataset, while amplitudes are adjusted). Again, evaluation by global residuals, global chi squared, and the global fit by q plot.
Further checks by using the “Trial make I(q)” mode, verifying if the Rg values are now sufficiently constant under each decomposed peak.
Gaussian decomposition sixth step. Decomposition of the reference concentration chromatogram (rescaled/aligned) associated with the SAXS data. Initialization with a single set of the final Gaussians, followed by a fit allowing minimal changes in the position/width of the individual Gaussians (to partially compensate for the different distances between peaks if the detector is placed in series and before the SAXS cell).
Gaussian decomposition seventh step. Back-generation of the individual SAXS frames for each Gaussian-fit peak. Understanding/selecting the “Make I(q)” menu options, including the option for band-broadening correction of the concentration detector, and the automatic selection of the frames at the top of each peak for concentration normalization and averaging.
Examination of the resulting normalized/averaged top-of-peak frames. Selecting an averaged dataset and pushing it to the main SAS module. Guinier analysis.
Comparison with data generated from the pre-loaded PDB file. First, generation of a P(r) vs. r curve from the PDB structure (optional mapping of residues contributing to various portions of the P(r) vs. r curve). Then, using the IFT (indirect Fourier transform) with Bayesian inference, generation of a P(r) vs. r curve from the averaged dataset derived from the Gaussian decomposition in the HPLC-SAXS module. Comparison of the two curves. Optionally, if time allows, comparison using the NNLS (non-negative least squares) module of the HPLC-SAXS-derived curve with a series of I(q) vs. q curves previously calculated from several PDB model structures using the WAXSiS server.
If time permits, back to the HPLC-SAXS module to demonstrate the baseline correction features.
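The propagation steps in the list above (positions and widths held fixed, amplitudes adjusted for each chromatogram) amount to a linear least-squares problem, which the following Python sketch illustrates for plain (non-skewed) Gaussians. It is only an illustration under stated assumptions, not the US-SOMO fitter: the function names and the synthetic data are hypothetical, skewness is ignored, and ordinary least squares does not constrain the amplitudes to be non-negative.

```python
import numpy as np


def gaussian_basis(t, centers, widths):
    """Return a (n_frames, n_gaussians) matrix of unit-amplitude Gaussians."""
    t = np.asarray(t, dtype=float)[:, None]
    centers = np.asarray(centers, dtype=float)[None, :]
    widths = np.asarray(widths, dtype=float)[None, :]
    return np.exp(-0.5 * ((t - centers) / widths) ** 2)


def propagate_amplitudes(t, chromatograms, centers, widths):
    """Fit only the amplitudes for every chromatogram (rows of
    `chromatograms`, shape (n_q, n_frames)), keeping the Gaussian
    positions and widths fixed.

    Returns the amplitudes (n_q, n_gaussians) and the reconstructed
    chromatograms (n_q, n_frames)."""
    basis = gaussian_basis(t, centers, widths)
    data = np.asarray(chromatograms, dtype=float).T      # (n_frames, n_q)
    # One least-squares solve handles all q values at once.
    amplitudes, *_ = np.linalg.lstsq(basis, data, rcond=None)
    reconstructed = basis @ amplitudes
    return amplitudes.T, reconstructed.T


# Synthetic example (hypothetical values): two peaks, three q values.
t = np.arange(120)
centers = [55.0, 68.0]
widths = [6.0, 6.0]
true_amplitudes = np.array([
    [1.0, 0.6],
    [0.8, 0.4],
    [0.5, 0.2],
])                                                       # (n_q, n_gaussians)
chromatograms = true_amplitudes @ gaussian_basis(t, centers, widths).T

amplitudes, reconstructed = propagate_amplitudes(t, chromatograms, centers, widths)
residuals = chromatograms - reconstructed

print("fitted amplitudes per q value:")
print(amplitudes)
print("largest absolute residual:", np.abs(residuals).max())
```

Because the positions and widths are shared by all chromatograms, only the amplitudes vary with q. Reading one column of `amplitudes` across the q values gives an I(q) vs. q curve for that peak, which is the idea behind back-generating per-peak SAXS frames.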

If you have any questions, please feel free to contact us directly.

Emre Brookes, Ph.D.
Associate Professor for Research
Department of Chemistry & Biochemistry
The University of Montana, Missoula

Mattia Rocco, Ph.D.
Retired, Proteomica e Spettrometria di Massa
IRCCS Ospedale Policlinico, San Martino
Genova, Italy