Scoring localization
Here we break down the scoring script from the first API page to give an overview of the individual scoring components. The main scoring functionality operates on a PSM-by-PSM basis and is accessed through the PyAscore class. A format that works well in scripts is to read the spectra into a dictionary of dictionaries and the PSMs into a list, so that you can loop through the PSMs and look up each spectrum as you need it.
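import numpy as np
# Assumed import path: the parsers are taken to be exported at the
# package top level, like pyascore.PyAscore
from pyascore import IdentificationParser, SpectraParser, PyAscore

# Read the PSMs into a list
psm_file = "psms.pep.xml"
id_parser = IdentificationParser(psm_file, "pepXML")
psm_objects = id_parser.to_list()

# Read the spectra into a dictionary of dictionaries
spectra_file = "spectra.mzML"
spectra_parser = SpectraParser(spectra_file, "mzML")
spectra_objects = spectra_parser.to_dict()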
The PyAscore class scores the modification of interest and has parameters that let you tailor scoring to your instrument. Note that scoring uses exact calculations that gain much of their speed from caching previous calculations. It is therefore best to initialize the PyAscore object once, before looping through the PSMs, rather than once per PSM.
mod_mass = 79.966331
ascore = PyAscore(mod_group="STY",
                  mod_mass=mod_mass,
                  mz_error=.05,
                  fragment_types="by")
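To illustrate why a single long-lived scorer helps, here is a minimal sketch of a cache that fills up as it is reused. `CachedScorer` and its binomial tail calculation are hypothetical stand-ins chosen for illustration; they are not PyAscore's internals.

```python
import math


class CachedScorer:
    """Hypothetical stand-in (not PyAscore): memoizes an exact calculation."""

    def __init__(self, p):
        self.p = p
        self._cache = {}
        self.n_computed = 0  # counts how often the expensive path runs

    def tail_prob(self, n, k):
        """Exact probability of at least k successes in n trials."""
        key = (n, k)
        if key not in self._cache:
            self.n_computed += 1
            self._cache[key] = sum(
                math.comb(n, i) * self.p ** i * (1 - self.p) ** (n - i)
                for i in range(k, n + 1)
            )
        return self._cache[key]


# Created once, outside the loop, so the cache survives between iterations
scorer = CachedScorer(p=0.05)
for n, k in [(20, 3), (20, 3), (18, 2), (20, 3)]:
    scorer.tail_prob(n, k)

print(scorer.n_computed)
```

Creating the scorer inside the loop would start from an empty cache on every iteration and repeat the expensive calculation each time.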
The PSM reader does not know which modification is of interest to the user, so during looping the modifications must be partitioned into the variable modification being localized and the remaining static modifications. The fragments, intensities, unmodified peptide sequence, and modification information for a PSM can then be passed to the score function. The maximum fragment charge state is also chosen at this point. For maximum speed, score only +1 peaks; for maximum accuracy, we recommend going up to the precursor charge minus 1.
pyascore_results = []
for psm in psm_objects:
    # Flag the modifications whose mass matches the one being localized
    mod_select = np.isclose(psm["mod_masses"], mod_mass)
    nmods = np.sum(mod_select)
    if nmods >= 1:
        spectrum = spectra_objects[psm["scan"]]

        # Partition modifications
        aux_mod_pos = psm["mod_positions"][~mod_select].astype(np.uint32)
        aux_mod_masses = psm["mod_masses"][~mod_select].astype(np.float32)

        # Score PSMs
        ascore.score(mz_arr = spectrum["mz_values"],
                     int_arr = spectrum["intensity_values"],
                     peptide = psm["peptide"],
                     n_of_mod = nmods,
                     # Must stay > 0, even for singly charged precursors
                     max_fragment_charge = max(1, psm["charge_state"] - 1),
                     aux_mod_pos = aux_mod_pos,
                     aux_mod_mass = aux_mod_masses)

        # Store scores
        pyascore_results.append({"scan" : psm["scan"],
                                 "localized_peptide" : ascore.best_sequence,
                                 "pepscore" : ascore.best_score,
                                 "ascores" : ";".join([str(s) for s in ascore.ascores])})
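The partitioning step can be tried in isolation on a made-up record. The sketch below assumes a PSM carrying one carbamidomethyl modification and one phosphorylation; the positions and the `psm` dictionary are invented for illustration, and only the key names follow the loop above.

```python
import numpy as np

mod_mass = 79.966331  # phosphorylation, as in the script above

# Hypothetical PSM record with two modifications (illustrative values only)
psm = {
    "mod_positions": np.array([3, 5]),
    "mod_masses": np.array([57.021464, 79.966331]),
}

# True where the modification mass matches the one being localized
mod_select = np.isclose(psm["mod_masses"], mod_mass)
nmods = int(np.sum(mod_select))

# Everything else is treated as a fixed (auxiliary) modification
aux_mod_pos = psm["mod_positions"][~mod_select].astype(np.uint32)
aux_mod_masses = psm["mod_masses"][~mod_select].astype(np.float32)

print(nmods, aux_mod_pos, aux_mod_masses)
```

With its default tolerances, `np.isclose` accepts masses that differ by a little under 0.001 Da at this mass, which is usually enough to absorb rounding in the identification file.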
Class Reference
- class pyascore.PyAscore
The PyAscore object scores the localization of post translational modifications (PTMs).
Objects are designed to take in a spectrum, the associated peptide sequence, a set of fixed-position modifications, and a variable number of unlocalized modifications, and to determine how much evidence exists for placing PTMs on individual amino acids. The algorithm is a modified version of Beausoleil et al. [PMID: 16964243] that can efficiently handle peptides of any size and arbitrary PTM masses. Each scored PSM generates the most likely PTM positions and their scores, as well as alternative sites for each PTM whose evidence is less than or equal to that of the maximum. These alternative sites are not required to be adjacent, i.e. they may be separated by another modifiable residue.
- Note:
Attributes are only meaningful after consumption of the first peptide.
- Parameters:
- bin_size : float
Size in m/z of each bin.
- n_top : int
Number of top peaks to retain in each bin (must be >= 0).
- mod_group : str
A string which lists the possible modified residues for the unlocalized modification. For example, with phosphorylation, you may want “STY”.
- mod_mass : float
The mass of the unlocalized modification in Daltons. For example, phosphorylation is 79.966331.
- mz_error : float
The error in Daltons allowed when matching theoretical peaks to consumed spectral peaks. The option to use PPM will likely be included in the future. (Defaults to 0.5)
- fragment_types : str
The theoretical fragment types to score.
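The roles of bin_size and n_top can be pictured with a small NumPy sketch that keeps the most intense peaks in each m/z window. The function `keep_top_peaks` is hypothetical, and it assumes bins that start at m/z 0; it is not the library's own filtering code.

```python
import numpy as np


def keep_top_peaks(mz_arr, int_arr, bin_size, n_top):
    """Keep the n_top most intense peaks in each m/z bin of width bin_size."""
    mz_arr = np.asarray(mz_arr, dtype=np.float64)
    int_arr = np.asarray(int_arr, dtype=np.float64)

    # Assumption: bins are anchored at m/z 0
    bins = np.floor(mz_arr / bin_size).astype(np.int64)

    keep = np.zeros(mz_arr.size, dtype=bool)
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        # Order this bin's peaks by decreasing intensity, keep the first n_top
        top = idx[np.argsort(int_arr[idx])[::-1][:n_top]]
        keep[top] = True
    return mz_arr[keep], int_arr[keep]


mz = np.array([110.0, 150.0, 190.0, 210.0, 250.0])
intensity = np.array([5.0, 20.0, 10.0, 1.0, 30.0])
kept_mz, kept_int = keep_top_peaks(mz, intensity, bin_size=100.0, n_top=2)
print(kept_mz, kept_int)
```

In this toy spectrum the weakest of the three peaks between m/z 100 and 200 is dropped, while both peaks between m/z 200 and 300 survive.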
- Attributes:
- best_sequence : str
Peptide sequence with modifications included in brackets for the best scoring localization.
- best_score : float
The best PepScore among all possible localization permutations.
- pep_scores : list of dict
Python dict representations of internal PepScore objects. Each object contains the sequence of the underlying peptide and all information necessary to calculate the ambiguity scores. The list is sorted by decreasing weighted_score, which is also known as the PepScore.
- ascores : ndarray of float32
Ascores for each individual non-static site in the peptide.
- alt_sites : list of ndarray of uint32
Alternative positions for each individual non-static site in the peptide.
Methods
- add_neutral_loss(): Add a neutral loss ion to any fragment containing specified amino acids
- calculate_ambiguity(): Calculate ambiguity between 2 competing localizations
- score(): Consume spectra and associated peptide information and score PTM localization
- add_neutral_loss()
Add a neutral loss ion to any fragment containing specified amino acids
For every fragment ion containing one of the amino acids defined in group, the algorithm will search for a secondary neutral loss peak to also score. Neutral losses can be specified for any amino acid using its one-letter code, e.g. STY, and can be placed on a modified amino acid (variable or otherwise) by using a lowercase letter, e.g. sty.
- Parameters:
- group : str
Amino acids that should allow the neutral loss, specified with their single-letter codes, e.g. STY for Ser, Thr, and Tyr.
- mass : float
Absolute mass of the neutral loss. Negative values will cause the algorithm to look for higher mass peaks.
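As a rough illustration of the sign convention, the sketch below computes where a neutral loss peak would be expected relative to its parent fragment. The helper `neutral_loss_mz` is hypothetical and only shows the arithmetic; 97.976896 Da is the commonly quoted monoisotopic mass of phosphoric acid, used here as an example value.

```python
def neutral_loss_mz(fragment_mz, loss_mass, charge=1):
    """Expected m/z of a fragment after losing a neutral of mass loss_mass.

    Hypothetical helper for illustration only: the loss is neutral, so the
    m/z shift is the loss mass divided by the fragment charge.
    """
    return fragment_mz - loss_mass / charge


H3PO4 = 97.976896  # example neutral loss mass in Daltons

# A positive mass points to a lower m/z peak
print(neutral_loss_mz(500.0, H3PO4, charge=1))
print(neutral_loss_mz(500.0, H3PO4, charge=2))

# A negative mass points to a higher m/z peak instead
print(neutral_loss_mz(500.0, -18.0, charge=1))
```

Following the parameter names documented above, registering such a loss on modified Ser, Thr, and Tyr would presumably look like `ascore.add_neutral_loss(group="sty", mass=97.976896)`.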
- calculate_ambiguity()
Calculate ambiguity between 2 competing localizations
Inputs to this function should come directly from the pep_scores attribute. For this score to be positive, the score dict with the highest weighted_score should come first. When the weighted_score of both is equal, this will return 0.
- Parameters:
- ref_score : dict
PepScore object for one localization.
- other_score : dict
PepScore object for competing localization.
- Returns:
- float
Relative evidence for localization in first argument vs the second argument
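A small wrapper can make the argument order explicit. This is a sketch under two assumptions: that each dict in pep_scores exposes its PepScore under the key "weighted_score", and that at least one localization has been scored. The function name is hypothetical.

```python
def ambiguity_between_top_two(ascore):
    """Compare the two best localizations held by a scored PyAscore-like object.

    Assumes each entry of ascore.pep_scores is a dict with a
    "weighted_score" key. Returns None when fewer than two exist.
    """
    ranked = sorted(ascore.pep_scores,
                    key=lambda score: score["weighted_score"],
                    reverse=True)
    if len(ranked) < 2:
        return None
    # Highest weighted_score first, so the result should not be negative
    return ascore.calculate_ambiguity(ranked[0], ranked[1])
```

Because pep_scores is already documented as sorted by decreasing weighted_score, the explicit sort is only a safeguard.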
- score()
Consume spectra and associated peptide information and score PTM localization
- Parameters:
- mz_arr : ndarray of float64
Array of m/z values for each peak in a spectrum.
- int_arr : ndarray of float64
Array of intensity values for each peak in a spectrum.
- peptide : str
The peptide string without any modifications or N-terminal markings.
- n_of_mod : int > 0
Number of unlocalized modifications on the sequence.
- max_fragment_charge : int > 0
Maximum fragment charge to be used for score calculations.
- aux_mod_pos : ndarray of uint32
Positions of fixed modifications. Positions should start at 1 for residues, with 0 reserved for N-terminal modifications, which appears to be the encoding preferred in the field.
- aux_mod_mass : ndarray of float32
Masses of individual fixed-position modifications.
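The dtype and range requirements listed above can be gathered in one place before calling score(). The helper `prepare_score_inputs` below is hypothetical; it only coerces inputs to the documented types and applies the "precursor charge minus 1, but at least 1" rule. The example masses (N-terminal acetylation and carbamidomethylation) are illustrative.

```python
import numpy as np


def prepare_score_inputs(mz_values, intensity_values, charge_state,
                         aux_positions, aux_masses):
    """Hypothetical helper: coerce inputs to the dtypes documented for score()."""
    aux_positions = np.asarray(aux_positions)
    if np.any(aux_positions < 0):
        raise ValueError("modification positions must be >= 0 (0 = N-terminus)")
    return {
        "mz_arr": np.asarray(mz_values, dtype=np.float64),
        "int_arr": np.asarray(intensity_values, dtype=np.float64),
        # Up to precursor charge - 1, but never below 1
        "max_fragment_charge": max(1, int(charge_state) - 1),
        "aux_mod_pos": aux_positions.astype(np.uint32),
        "aux_mod_mass": np.asarray(aux_masses, dtype=np.float32),
    }


# Position 0 marks an N-terminal modification, position 4 the fourth residue
kwargs = prepare_score_inputs(
    mz_values=[100.5, 200.25],
    intensity_values=[10, 20],
    charge_state=1,
    aux_positions=[0, 4],
    aux_masses=[42.010565, 57.021464],
)
print(kwargs)
```

The resulting dictionary could then be unpacked alongside the remaining arguments, for example `ascore.score(peptide=..., n_of_mod=..., **kwargs)`.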