Working with modified peptides
When trying to score the localization of a modification on a peptide’s sequence, it is often necessary to enumerate all possible localizations of the modification and calculate a score. pyAscore has an internal class for doing this enumeration, and we make that interface available to users to allow convenient iteration over the theoretical fragments of modified peptides.
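The enumeration itself can be sketched in plain Python. This is only an illustration of the idea, not pyAscore's internal class: the helper name `enumerate_signatures` is hypothetical, and each localization is written as a 0/1 list with one entry per modifiable residue.

```python
from itertools import combinations

def enumerate_signatures(peptide, mod_group, n_of_mod):
    """Yield every placement of n_of_mod modifications as a 0/1 list.

    Hypothetical helper for illustration; not part of pyAscore.
    """
    # Indices of residues that are allowed to carry the modification.
    sites = [i for i, aa in enumerate(peptide) if aa in mod_group]
    # Choose which of the modifiable sites are occupied.
    for chosen in combinations(range(len(sites)), n_of_mod):
        yield [1 if k in chosen else 0 for k in range(len(sites))]

signatures = list(enumerate_signatures("ASMTK", "STY", 1))
print(signatures)  # [[1, 0], [0, 1]]
```

For ASMTK with one modification on S, T, or Y, only the S and the T qualify, so there are two localizations to score.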
Iteration over all fragments
Basic iteration requires a peptide sequence (e.g. ‘ASMTK’), the mass of a modification of interest (e.g. 79.9663), the amino acids that the modification can fall on (e.g. STY), and the number of modifications of interest. All of this can be set up for a given peptide with the following lines.
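from pyascore import PyModifiedPeptide

pep = PyModifiedPeptide("STY", 79.9663)  # modifiable residues, modification mass
pep.consume_peptide("ASMTK", 1)  # peptide sequence, number of modifications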
The PyModifiedPeptide object is the source of the fragment graphs, which allow iteration over the theoretical fragments of each permutation of modified amino acids. In this case, there are 2 permutations, AS[79.9663]MTK and ASMT[79.9663]K. To iterate over the b fragments of charge 1, generate the b+ fragment graph and iterate using the iter_permutations and iter_fragments methods.
b_graph = pep.get_fragment_graph("b", 1)
for perm in b_graph.iter_permutations():
    print(perm.get_signature())
    for mz, label in perm.iter_fragments():
        print(mz, "<-", label)
# Output:
[1, 0]
72.0449 <- b1+
239.043 <- b2+
370.084 <- b3+
471.131 <- b4+
[0, 1]
72.0449 <- b1+
159.077 <- b2+
290.117 <- b3+
471.131 <- b4+
Notice that the full peptide mass is currently not included. This is because this “fragment” can’t be used for localization.
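The b+ values above can be checked by hand with a short sketch. The residue masses are standard monoisotopic values; the charge carrier mass of 1.007825 is an assumption chosen because it reproduces the printed output, and the helper `b_ions` is hypothetical rather than part of pyAscore.

```python
# Monoisotopic residue masses in Daltons (standard values).
RESIDUE_MASS = {"A": 71.03711, "S": 87.03203, "M": 131.04049,
                "T": 101.04768, "K": 128.09496}
# Assumption: charge carrier mass chosen to reproduce the output above.
CHARGE_MASS = 1.007825

def b_ions(peptide, mod_positions, mod_mass, charge=1):
    """Return (mz, label) pairs for b1 .. b(n-1); hypothetical helper."""
    fragments = []
    running = 0.0
    for i, aa in enumerate(peptide[:-1]):  # skip the full-length "fragment"
        running += RESIDUE_MASS[aa]
        if i in mod_positions:  # 0-based index of a modified residue
            running += mod_mass
        mz = (running + charge * CHARGE_MASS) / charge
        fragments.append((mz, f"b{i + 1}{'+' * charge}"))
    return fragments

# Modification on the S at 0-based index 1, i.e. signature [1, 0].
for mz, label in b_ions("ASMTK", {1}, 79.9663):
    print(f"{mz:.3f} <- {label}")
```

Passing `{3}` instead of `{1}` places the modification on the T and gives the ladder for signature [0, 1]. The two ladders differ only at b2+ and b3+, which is why those fragments carry the localization information.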
When recording scores, we tend to build a dictionary that uses the signature as a key. This lets us accumulate additive scores, such as the counts pyAscore uses, while iterating over each graph independently. One of the main reasons for doing this is that b type and y type graphs do not visit permutations in the same order; in the example below, the y type order is the reverse of the b type order.
print("b type iteration:")
b_graph = pep.get_fragment_graph("b", 1)
for perm in b_graph.iter_permutations():
    print(perm.get_signature())
print("y type iteration:")
y_graph = pep.get_fragment_graph("y", 1)
for perm in y_graph.iter_permutations():
    print(perm.get_signature())
# Output
b type iteration:
[1, 0]
[0, 1]
y type iteration:
[0, 1]
[1, 0]
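A minimal sketch of the signature-keyed dictionary follows. To keep it runnable without pyAscore installed, it uses a hypothetical `FakeGraph` stand-in that exposes only the documented iter_permutations, get_signature, and iter_fragments methods. The b values come from the output above, the y values are rough hand calculations, and the observed peaks are made up for illustration.

```python
from collections import defaultdict

class FakeGraph:
    """Hypothetical stand-in with the three documented methods used below."""

    def __init__(self, table):
        # table: [(signature, [(mz, label), ...]), ...] in iteration order
        self._table = table
        self._current = None

    def iter_permutations(self):
        for entry in self._table:
            self._current = entry
            yield self  # like the documented method, yield the graph itself

    def get_signature(self):
        return list(self._current[0])

    def iter_fragments(self):
        yield from self._current[1]


def count_matches(graph, observed_mz, tol, scores):
    """Add the number of matched fragments to scores[signature]."""
    for perm in graph.iter_permutations():
        key = tuple(int(x) for x in perm.get_signature())  # hashable key
        for mz, _label in perm.iter_fragments():
            if any(abs(mz - obs) <= tol for obs in observed_mz):
                scores[key] += 1
    return scores


# b values are from the example output; y values are rough hand calculations.
b_graph = FakeGraph([
    ((1, 0), [(72.0449, "b1+"), (239.043, "b2+"), (370.084, "b3+"), (471.131, "b4+")]),
    ((0, 1), [(72.0449, "b1+"), (159.077, "b2+"), (290.117, "b3+"), (471.131, "b4+")]),
])
y_graph = FakeGraph([  # note the different signature order
    ((0, 1), [(147.113, "y1+"), (328.127, "y2+"), (459.168, "y3+"), (546.200, "y4+")]),
    ((1, 0), [(147.113, "y1+"), (248.161, "y2+"), (379.202, "y3+"), (546.200, "y4+")]),
])

observed = [239.04, 370.08, 328.13]  # made-up peaks for illustration
scores = defaultdict(int)
for graph in (b_graph, y_graph):
    count_matches(graph, observed, tol=0.05, scores=scores)
print(dict(scores))  # {(1, 0): 2, (0, 1): 1}
```

Because the counts are keyed by signature, it does not matter that the two graphs visit the permutations in different orders. With a real PyFragmentGraph, the same `count_matches` function should apply unchanged, assuming get_signature returns an array of 0/1 values as documented.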
- class pyascore.PyModifiedPeptide
The PyModifiedPeptide object provides functionality for modified residues of peptides.
Objects can take in a sequence, a set of fixed position modifications, and a variable number of unlocalized modifications which can fall on any residue of a specified type. The design allows peaks from spectra to be matched to theoretical peaks from any possible localization. Individual realizations of modified peptides are encoded via a “signature”, which is simply a binary vector with one entry for each modifiable residue, where a 1 signifies that the residue is modified. The peaks of all possible modification states can be traversed by creating a PyFragmentGraph object, and if two signatures are provided, one can retrieve only the site determining peaks.
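To make the signature encoding concrete, here is a small sketch that maps a signature back onto a sequence. The function `render_peptide` is a hypothetical helper, not a pyAscore call, and the bracket format is chosen to mirror the permutations written out in the tutorial above.

```python
def render_peptide(peptide, mod_group, mod_mass, signature):
    """Write the peptide with bracketed masses on modified residues."""
    out = []
    sig = iter(signature)
    for aa in peptide:
        out.append(aa)
        # Each modifiable residue consumes one signature entry, in order.
        if aa in mod_group and next(sig) == 1:
            out.append(f"[{mod_mass}]")
    return "".join(out)

print(render_peptide("ASMTK", "STY", 79.9663, [1, 0]))  # AS[79.9663]MTK
print(render_peptide("ASMTK", "STY", 79.9663, [0, 1]))  # ASMT[79.9663]K
```

The signature has two entries for ASMTK because only the S and the T belong to the modifiable group; the other residues never consume an entry.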
- Parameters:
- mod_group : str
A string which lists the possible modified residues for the unlocalized modification. For example, with phosphorylation, you may want “STY”.
- mod_mass : float
The mass of the unlocalized modification in Daltons. For example, phosphorylation is 79.966331.
- mz_error : float
The tolerance in Daltons used when matching theoretical peaks to consumed spectral peaks. The option to use PPM will likely be included in the future. (Defaults to 0.5)
Methods
consume_peptide(): Consumes a single peptide sequence and creates its internal representation
get_fragment_graph(): Builds a PyFragmentGraph object which references the current PyModifiedPeptide object
get_peptide(): Returns the modified sequence (residues plus mod mass) of the consumed peptide
get_site_determining_ions(): Determine the non-overlapping theoretical fragments of two peptides.
add_neutral_loss
- consume_peptide()
Consumes a single peptide sequence and creates its internal representation
- Parameters:
- peptide : str
The peptide string without any modifications or N-terminal markings
- n_of_mod : int > 0
Number of unlocalized modifications on the sequence
- max_fragment_charge : int > 0
Fragments will be considered from charge 1 to max_fragment_charge
- aux_mod_pos : ndarray of uint32
Positions of fixed modifications. Residue positions should start at 1, with 0 reserved for N-terminal modifications, which appears to be the preferred encoding in the field.
- aux_mod_mass : ndarray of float32
Masses of individual fixed position modifications.
- get_fragment_graph()
Builds a PyFragmentGraph object which references the current PyModifiedPeptide object
- Parameters:
- fragment_type : char
The type of fragment graph to create, e.g. ‘b’.
- charge_state : int > 0
The charge state of all fragments.
- Returns:
- PyFragmentGraph
The fragment graph of specified type and charge state.
- get_peptide()
Returns the modified sequence (residues plus mod mass) of the consumed peptide
- Parameters:
- signature : ndarray of 0,1 values
Encodes the modification state that each modifiable amino acid should have. Defaults to no modifications.
- Returns:
- str
A peptide sequence with bracketed modification masses, e.g. PEPT[80]IDEK.
- get_site_determining_ions()
Determine the non-overlapping theoretical fragments of two peptides.
- Parameters:
- sig1 : ndarray of 0,1 values
Encodes the modification state that each modifiable amino acid should have in the first peptide.
- sig2 : ndarray of 0,1 values
Encodes the modification state that each modifiable amino acid should have in the second peptide.
- fragment_type : char
The type of fragment graph to create, e.g. ‘b’.
- max_charge : int > 0
Site determining ions are produced for all charge states from 1 to max_charge inclusive.
- Returns:
- tuple of ndarray
A tuple of length 2 with entries that contain all theoretical fragments for a modified peptide not found in the other modified peptide.
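The idea behind site determining ions can be sketched as a pair of set differences over two fragment ladders. This is an illustration under stated assumptions, not pyAscore's implementation: `site_determining` is a hypothetical helper, and the tolerance used to decide that two fragments coincide is an arbitrary choice.

```python
def site_determining(frags1, frags2, tol=1e-6):
    """Return fragments unique to each list, as a tuple of two lists."""
    def missing_from(a, b):
        # Keep the values of a that have no counterpart in b within tol.
        return [x for x in a if not any(abs(x - y) <= tol for y in b)]
    return missing_from(frags1, frags2), missing_from(frags2, frags1)

# b+ m/z values from the ASMTK tutorial output
sig_10 = [72.0449, 239.043, 370.084, 471.131]
sig_01 = [72.0449, 159.077, 290.117, 471.131]

only_10, only_01 = site_determining(sig_10, sig_01)
print(only_10)  # [239.043, 370.084]
print(only_01)  # [159.077, 290.117]
```

Only b2+ and b3+ differ between the two localizations, so they are the only b+ fragments that can distinguish a modification on the S from one on the T.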
- class pyascore.PyFragmentGraph
The PyFragmentGraph object allows traversal of the modification tree of a PyModifiedPeptide object.
Every possible modified residue creates a branch point determined by whether it is modified or not, and the PyFragmentGraph object allows efficient depth first traversal of the resulting tree. By specifying whether the graph should be made over the b or y ions (more ion types to come), this object will produce the appropriate theoretical m/z for each fragment. These m/z values can be matched against the internal cache of the PyModifiedPeptide object to determine if any consumed peaks match the theoretical peak. For efficiency’s sake, y and b ion graphs iterate through signatures differently, and may not necessarily be reverse iterators of each other.
Object creation through the PyModifiedPeptide.get_fragment_graph method is suggested but not required.
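The tree described above can be pictured with a small recursive generator. It is a sketch of depth first traversal over the modified or unmodified choice at each site, with hypothetical names, and it is not a claim about how pyAscore implements its traversal or in which order it visits branches.

```python
def walk(n_sites, n_of_mod, prefix=()):
    """Depth-first walk over the modified / unmodified choice at each site."""
    remaining_sites = n_sites - len(prefix)
    remaining_mods = n_of_mod - sum(prefix)
    if remaining_mods < 0 or remaining_mods > remaining_sites:
        return  # prune branches that cannot place exactly n_of_mod mods
    if remaining_sites == 0:
        yield list(prefix)
        return
    yield from walk(n_sites, n_of_mod, prefix + (1,))  # modified branch
    yield from walk(n_sites, n_of_mod, prefix + (0,))  # unmodified branch

print(list(walk(2, 1)))  # [[1, 0], [0, 1]]
```

Signatures that share a prefix also share every fragment up to the first site where they differ, which is one reason a tree traversal can avoid recomputing fragments for each signature.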
- Parameters:
- peptide : PyModifiedPeptide
A PyModifiedPeptide instance which has consumed at least one peptide
- fragment_type : char
The type of fragment graph to create, e.g. ‘b’.
- charge_state : int > 0
The charge state of all fragments
- Attributes:
- fragment_type : char
The current type of fragment returned by the graph
- charge_state : int
The current charge state for fragments returned from the graph
Methods
get_fragment_mz(): Return the size of the current fragment in m/z.
get_fragment_seq(): Return sequence of current fragment without modifications.
get_fragment_size(): Return the size of the current fragment in number of amino acids.
get_signature(): Return current signature.
incr_fragment(): Increment to next fragment for current signature.
incr_signature(): Get next signature at position of last modification switch.
is_fragment_end(): Check if iterator has reached the last fragment, i.e. the end of the peptide.
is_signature_end(): Check if iterator is at last signature.
iter_fragments(): Iterate through remaining fragments of current signature.
iter_permutations(): Iterate through remaining signatures and return fragment graph ready for iteration.
reset_fragment(): Resets iterator to the first position of the current signature.
reset_iterator(): Resets iterator to the first position of the first signature.
set_signature(): Change signature to user specified value and reset to the first fragment.
- get_fragment_mz()
Return the size of the current fragment in m/z.
- Returns:
- float
- get_fragment_seq()
Return sequence of current fragment without modifications.
- Returns:
- str
- get_fragment_size()
Return the size of the current fragment in number of amino acids.
- Returns:
- int
- get_signature()
Return current signature.
- Returns:
- ndarray of uint64
Array with one position per modifiable amino acid and a 1 if modified and 0 if not.
- incr_fragment()
Increment to next fragment for current signature.
- incr_signature()
Get next signature at position of last modification switch.
- is_fragment_end()
Check if iterator has reached the last fragment, i.e. the end of the peptide.
- is_signature_end()
Check if iterator is at last signature.
- Returns:
- bool
Is this the last signature?
- iter_fragments()
Iterate through remaining fragments of current signature.
- Yields:
- (float, string)
pair of fragment mz and fragment label
- iter_permutations()
Iterate through remaining signatures and return fragment graph ready for iteration.
If mode == ‘all’, reset fragments to the first position before returning graph.
- Yields:
- PyFragmentGraph
reference to current graph
- reset_fragment()
Resets iterator to the first position of the current signature.
- reset_iterator()
Resets iterator to the first position of the first signature.
- set_signature()
Change signature to user specified value and reset to the first fragment.
- Parameters:
- new_signature : ndarray of uint32
Encodes the modification state that each modifiable amino acid should have.