Scoring localization

This section breaks down the scoring script from the first API page to give an overview of the individual scoring components. Scoring is performed on a PSM-by-PSM basis and is accessed through the PyAscore class. The format that I find most helpful in scripts is to read the spectra into a dictionary of dictionaries and the PSMs into a list, so that you can loop through the PSMs and look up each spectrum as you need it.

import numpy as np
from pyascore import IdentificationParser, SpectraParser, PyAscore

psm_file = "psms.pep.xml"
id_parser = IdentificationParser(psm_file, "pepXML")
psm_objects = id_parser.to_list()

spectra_file = "spectra.mzML"
spectra_parser = SpectraParser(spectra_file, "mzML")
spectra_objects = spectra_parser.to_dict()
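
The lookup pattern described above can be sketched with plain dictionaries. The records below are hand-written stand-ins, not parser output: only the keys `scan`, `mz_values`, and `intensity_values` are taken from the scoring loop on this page, and the helper name `spectrum_for` is hypothetical.

```python
# Stand-in data shaped like the parser output used on this page (assumed layout).
spectra_objects = {
    1001: {"mz_values": [175.119, 262.151], "intensity_values": [1200.0, 800.0]},
    1002: {"mz_values": [147.113, 304.162], "intensity_values": [950.0, 400.0]},
}
psm_objects = [
    {"scan": 1002, "peptide": "ASK"},
    {"scan": 1001, "peptide": "SR"},
    {"scan": 1003, "peptide": "TK"},  # no matching spectrum on purpose
]

def spectrum_for(psm, spectra):
    """Return the spectrum for a PSM's scan, or None if it is missing."""
    return spectra.get(psm["scan"])

matched = []
for psm in psm_objects:
    spectrum = spectrum_for(psm, spectra_objects)
    if spectrum is None:
        continue  # skip PSMs whose scan is absent from the spectra
    matched.append((psm["peptide"], len(spectrum["mz_values"])))
```

Keeping the spectra keyed by scan makes each lookup a single dictionary access, so the order of the PSM list does not need to match the order of the spectra file.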

The PyAscore class scores the modification of interest and has parameters that let you tailor scoring to your instrument settings. Note that scoring uses exact calculations, which gain much of their speed from caching previous calculations. It is therefore best to initialize PyAscore objects once, before looping through the PSMs, rather than once per PSM.

mod_mass = 79.966331
ascore = PyAscore(mod_group="STY",
                  mod_mass=mod_mass,
                  mz_error=.05,
                  fragment_types="by")
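
To see why reusing one scorer matters, here is a generic illustration of memoization. It does not reproduce pyAscore's internals: the `ToyScorer` class and its `log_binomial` method are hypothetical stand-ins for any object that caches exact calculations between calls.

```python
import math

class ToyScorer:
    """Hypothetical stand-in: caches an expensive exact calculation per instance."""

    def __init__(self):
        self._cache = {}
        self.computed = 0  # counts how often the expensive path actually runs

    def log_binomial(self, n, k):
        key = (n, k)
        if key not in self._cache:
            self.computed += 1
            self._cache[key] = math.log(math.comb(n, k))
        return self._cache[key]

requests = [(20, 3), (20, 5), (20, 3), (20, 5), (20, 3)]

# One scorer reused across all requests: repeated inputs hit the cache.
shared = ToyScorer()
for n, k in requests:
    shared.log_binomial(n, k)

# A fresh scorer per request: every call recomputes from scratch.
fresh_total = 0
for n, k in requests:
    scorer = ToyScorer()
    scorer.log_binomial(n, k)
    fresh_total += scorer.computed
```

In this toy case the shared object performs two computations where the per-request objects perform five, which is the effect the advice above is aiming for.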

The PSM reader does not know which modification is of interest to the user, so during looping the modifications must be partitioned into the variable modification being localized and the remaining static mods. Then the fragment m/z values, intensities, unmodified peptide sequence, and modification information for a PSM can be passed to the score function. The maximum fragment charge state is also chosen at this point. For maximum speed, score only +1 peaks; for maximum accuracy, we recommend going up to the precursor charge - 1.

pyascore_results = []
for psm in psm_objects:
    mod_select = np.isclose(psm["mod_masses"], mod_mass)
    nmods = np.sum(mod_select)

    if nmods >= 1:
        spectrum = spectra_objects[psm["scan"]]

        # Partition modifications
        aux_mod_pos = psm["mod_positions"][~mod_select].astype(np.uint32)
        aux_mod_masses = psm["mod_masses"][~mod_select].astype(np.float32)

        # Score PSMs
        ascore.score(mz_arr = spectrum["mz_values"],
                     int_arr = spectrum["intensity_values"],
                     peptide = psm["peptide"],
                     n_of_mod = nmods,
                     max_fragment_charge = max(psm["charge_state"] - 1, 1),
                     aux_mod_pos = aux_mod_pos,
                     aux_mod_mass = aux_mod_masses)

        # Store scores
        pyascore_results.append({"scan" : psm["scan"],
                                 "localized_peptide" : ascore.best_sequence,
                                 "pepscore" : ascore.best_score,
                                 "ascores" : ";".join([str(s) for s in ascore.ascores])})

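Once the loop finishes, the result dictionaries can be written out and read back with the standard library. This is a sketch that assumes records shaped like `pyascore_results` above. The values are made up for illustration, and the bracketed sequences only imitate the "modifications in brackets" style described in the class reference, not the library's exact output.

```python
import csv
import io

# Made-up records with the same keys as pyascore_results (values are illustrative).
results = [
    {"scan": 1001, "localized_peptide": "AS[80]PTK", "pepscore": 42.5, "ascores": "18.2"},
    {"scan": 1002, "localized_peptide": "S[80]AT[80]K", "pepscore": 37.0, "ascores": "25.1;9.4"},
]

# Write a tab-separated table; io.StringIO stands in for an open file.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(results[0]), delimiter="\t")
writer.writeheader()
writer.writerows(results)

# Read it back and split the ";"-joined Ascores into floats, one per site.
buffer.seek(0)
parsed = []
for row in csv.DictReader(buffer, delimiter="\t"):
    site_scores = [float(s) for s in row["ascores"].split(";")]
    parsed.append((int(row["scan"]), site_scores))
```

Joining the per-site scores with ";" keeps one row per PSM, at the cost of a small parsing step when the table is read again.
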
Class Reference

class pyascore.PyAscore

The PyAscore object scores the localization of post-translational modifications (PTMs).

Objects are designed to take in a spectrum, the associated peptide sequence, a set of fixed-position modifications, and a variable number of unlocalized modifications, and to determine how much evidence exists for placing PTMs on individual amino acids. The algorithm is a modified version of Beausoleil et al. [PMID: 16964243] that can efficiently handle peptides of any size and arbitrary PTM masses. Each scored PSM will generate the most likely PTM positions and scores, as well as alternative sites for each PTM that have equal evidence to one another, at a level less than or equal to the maximum. These alternative sites are not required to be adjacent (i.e. not separated by another modifiable residue).

Note: Attributes are only meaningful after consumption of the first peptide.

Parameters:

bin_size (float): Size in MZ of each bin
n_top (int): Number of top peaks to retain in each bin (must be >= 0)
mod_group (str): A string which lists the possible modified residues for the unlocalized modification. For example, with phosphorylation, you may want “STY”.
mod_mass (float): The mass of the unlocalized modification in Daltons. For example, phosphorylation is 79.966331.
mz_error (float): The error in Daltons to match theoretical peaks to consumed spectral peaks. The option to use PPM will likely be included in the future. (Defaults to 0.5)
fragment_types (str): The theoretical fragment types to score.

Attributes:

best_sequence (str): Peptide sequence with modifications included in brackets for the best scoring localization.
best_score (float): The best PepScore among all possible localization permutations.
pep_scores (list of dict): Python dict representations of internal PepScore objects. Each object contains the sequence of the underlying peptide and all information necessary to calculate the ambiguity scores. The list is sorted by decreasing weighted_score, which is also known as the PepScore.
ascores (ndarray of float32): Ascores for each individual non-static site in the peptide.
alt_sites (list of ndarray of uint32): Alternative positions for each individual non-static site in the peptide.

Methods

`add_neutral_loss`	Add a neutral loss ion to any fragment containing specified amino acids
`calculate_ambiguity`	Calculate ambiguity between 2 competing localizations
`score`	Consume spectra and associated peptide information and score PTM localization

add_neutral_loss()

Add a neutral loss ion to any fragment containing specified amino acids

For every fragment ion containing one of the amino acids defined in group, the algorithm will search for a secondary neutral loss peak to also score. Neutral losses can be specified for any amino acid using its one-letter code, e.g. STY, and can be placed on a modified amino acid (variable or otherwise) by using a lowercase letter, e.g. sty.

Parameters:

group (str): Amino acids that should allow the neutral loss, specified with their single-letter code, e.g. STY for Ser, Thr, and Tyr.
mass (float): Absolute mass of the neutral loss. Negative values will cause the algorithm to look for higher mass peaks.
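
As a rough picture of what a neutral loss means for peak matching, the sketch below computes where the secondary peak would sit for a fragment of known m/z and charge. It is an illustration under stated assumptions, not the library's implementation: the function name `neutral_loss_mz` is hypothetical, and the only inputs are the standard relation that losing mass m shifts a fragment of charge z by m/z, plus the monoisotopic mass of phosphoric acid.

```python
H3PO4_MASS = 97.976896  # monoisotopic mass of phosphoric acid, in Da

def neutral_loss_mz(fragment_mz, charge, loss_mass):
    """m/z of a fragment after losing `loss_mass` Da at the given charge.

    A positive loss_mass lowers the m/z; a negative one raises it.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return fragment_mz - loss_mass / charge

singly = neutral_loss_mz(500.0, 1, H3PO4_MASS)
doubly = neutral_loss_mz(500.0, 2, H3PO4_MASS)
gain = neutral_loss_mz(500.0, 1, -18.010565)  # negative "loss" lands higher
```

Going by the parameter list above, a phosphate loss from modified residues would be registered on a scorer with something like `ascore.add_neutral_loss("sty", 97.976896)`, though the exact call should be checked against the installed version.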

calculate_ambiguity()

Calculate ambiguity between 2 competing localizations

Inputs to this function should come directly from the pep_scores attribute. For this score to be positive, the score dict with the higher weighted_score should come first. When the weighted_score of both is equal, this will return 0.

Parameters:

ref_score (dict): PepScore object for one localization.
other_score (dict): PepScore object for competing localization.

Returns:

float: Relative evidence for localization in first argument vs the second argument
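
The argument-order convention can be sketched without the library. The dictionaries below are made-up stand-ins for entries of `pep_scores`: only the `weighted_score` key comes from the attribute description, while the `sequence` key and the helper name `top_two` are assumptions made for this example.

```python
# Made-up entries; real ones come from the pep_scores attribute and carry more fields.
pep_scores = [
    {"sequence": "candidate_b", "weighted_score": 31.0},
    {"sequence": "candidate_a", "weighted_score": 45.5},
    {"sequence": "candidate_c", "weighted_score": 12.3},
]

def top_two(scores):
    """Return (ref_score, other_score) with the higher weighted_score first."""
    if len(scores) < 2:
        raise ValueError("need at least two localizations to compare")
    ranked = sorted(scores, key=lambda s: s["weighted_score"], reverse=True)
    return ranked[0], ranked[1]

ref_score, other_score = top_two(pep_scores)
```

The pair would then be passed as `ascore.calculate_ambiguity(ref_score, other_score)`. Since pep_scores is documented as already sorted by decreasing weighted_score, the sort here is only a safeguard.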

score()

Consume spectra and associated peptide information and score PTM localization

Parameters:

mz_arr (ndarray of float64): Array of MZ values for each peak in a spectrum.
int_arr (ndarray of float64): Array of intensity values for each peak in a spectrum.
peptide (str): The peptide string without any modifications or N-terminal markings.
n_of_mod (int > 0): Number of unlocalized modifications on the sequence.
max_fragment_charge (int > 0): Maximum fragment charge to be used for score calculations.
aux_mod_pos (ndarray of uint32): Positions of fixed modifications. Modification positions should start at 1, with 0 reserved for N-terminal modifications, which seems to be the encoding preferred in the field.
aux_mod_mass (ndarray of float32): Masses of individual fixed-position modifications.
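
The sketch below shows one way to prepare these arguments with the documented dtypes. The PSM record is made up and simply reuses the keys from the scoring loop earlier on this page. Treating position 0 as an N-terminal acetylation follows the encoding described for aux_mod_pos, and clamping the charge to at least 1 is a choice made here to satisfy the `int > 0` requirement.

```python
import numpy as np

MOD_MASS = 79.966331  # phosphorylation, as used earlier on this page

# Made-up PSM record using the keys from the scoring loop.
psm = {
    "peptide": "MSTK",
    "charge_state": 1,
    "mod_positions": np.array([0, 1, 2]),
    "mod_masses": np.array([42.010565, 15.994915, 79.966331]),
}

# Boolean mask: True where the modification is the one being localized.
mod_select = np.isclose(psm["mod_masses"], MOD_MASS)
n_of_mod = int(np.sum(mod_select))

# Everything else is treated as a fixed-position modification.
aux_mod_pos = psm["mod_positions"][~mod_select].astype(np.uint32)
aux_mod_mass = psm["mod_masses"][~mod_select].astype(np.float32)

# Keep the fragment charge at 1 or above, even for singly charged precursors.
max_fragment_charge = max(psm["charge_state"] - 1, 1)
```

Here the N-terminal acetylation (position 0) and the methionine oxidation (position 1) end up in the auxiliary arrays, while the single phosphorylation is counted in n_of_mod and left for the scorer to place.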