Working with modified peptides

When trying to score the localization of a modification on a peptide’s sequence, it is often necessary to enumerate all possible localizations of the modification and calculate a score. pyAscore has an internal class for doing this enumeration, and we make that interface available to users to allow convenient iteration over the theoretical fragments of modified peptides.

Iteration over all fragments

Basic iteration requires a peptide sequence (e.g. ‘ASMTK’), the mass of the modification of interest (e.g. 79.9663), the amino acids that the modification can fall on (e.g. STY), and the number of modifications of interest. All of this can be set up for a given peptide with the following lines.

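from pyascore import PyModifiedPeptide

pep = PyModifiedPeptide("STY", 79.9663)  # modifiable residues, modification mass
pep.consume_peptide("ASMTK", 1)          # peptide sequence, number of modifications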

The PyModifiedPeptide object is the source of the fragment graphs, which allow iteration over the theoretical fragments of every permutation of modified amino acids. In this case, there are 2 permutations, AS[79.9663]MTK and ASMT[79.9663]K. To iterate over the b fragments of charge 1, generate the b+ fragment graph and iterate using the iter_permutations and iter_fragments methods.

b_graph = pep.get_fragment_graph("b", 1)  # b ions, charge state 1
for perm in b_graph.iter_permutations():
    print(perm.get_signature())
    # perm is a reference to the graph, positioned at the current signature
    for mz, label in perm.iter_fragments():
        print(mz, "<-", label)
# Output:
[1, 0]
72.0449 <- b1+
239.043 <- b2+
370.084 <- b3+
471.131 <- b4+
[0, 1]
72.0449 <- b1+
159.077 <- b2+
290.117 <- b3+
471.131 <- b4+

Notice that the full-length peptide (b5 in this example) is currently not included. This “fragment” contains every candidate site, so its mass is the same for all permutations and it can’t be used for localization.
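To make the numbers in the output above concrete, here is a minimal pure-Python sketch of how singly charged b ion m/z values can be computed by hand for a given signature. It does not use pyAscore: `b_ions` is a hypothetical helper, the residue masses are standard monoisotopic values, and the proton mass used for the charge carrier is an assumption, so results may differ from pyAscore's output in the later decimal places.

```python
# Standard monoisotopic residue masses (Da) for the residues in ASMTK.
RESIDUE_MASS = {"A": 71.03711, "S": 87.03203, "M": 131.04049,
                "T": 101.04768, "K": 128.09496}
PROTON = 1.007276  # assumed charge-carrier mass; pyAscore may use another constant


def b_ions(seq, mod_group, mod_mass, signature, charge=1):
    """Yield (m/z, label) for b1..b(n-1). Hypothetical helper, not pyAscore API."""
    sites = iter(signature)  # one entry per modifiable residue, left to right
    total = 0.0
    for i, aa in enumerate(seq[:-1], start=1):  # skip the full-length "fragment"
        total += RESIDUE_MASS[aa]
        # Consume one signature entry per modifiable residue; add mass if it is 1.
        if aa in mod_group and next(sites):
            total += mod_mass
        yield (total + charge * PROTON) / charge, f"b{i}{'+' * charge}"


for sig in ([1, 0], [0, 1]):
    print(sig)
    for mz, label in b_ions("ASMTK", "STY", 79.9663, sig):
        print(round(mz, 3), "<-", label)
```

The two signatures differ only at b2 and b3, which is why those fragments carry the localization information in this example.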

When recording scores, we tend to build a dictionary that uses the signature as a key. This allows us to track additive scores, such as the counts pyAscore uses, and to iterate over the graphs independently of each other. One of the main reasons for doing so is that b type and y type graphs visit permutations in different orders; in the example below, the orders are exactly opposite.

print("b type iteration:")
b_graph = pep.get_fragment_graph("b", 1)
for perm in b_graph.iter_permutations():
    print(perm.get_signature())

print("y type iteration:")
y_graph = pep.get_fragment_graph("y", 1)
for perm in y_graph.iter_permutations():
    print(perm.get_signature())
# Output:
b type iteration:
[1, 0]
[0, 1]
y type iteration:
[0, 1]
[1, 0]
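The signature-keyed dictionary described above might be sketched as follows. This is an illustration only: the matched-peak counts are hypothetical numbers, and the two "passes" are plain lists standing in for a b graph and a y graph that yield signatures in different orders. Signatures are converted to tuples because lists and NumPy arrays cannot be used as dictionary keys.

```python
from collections import defaultdict

# Hypothetical matched-peak counts, listed in the order each graph yields
# its signatures (the b and y graphs visit them in different orders).
b_pass = [([1, 0], 1), ([0, 1], 3)]
y_pass = [([0, 1], 2), ([1, 0], 0)]

scores = defaultdict(int)
for graph_pass in (b_pass, y_pass):
    for signature, n_matched in graph_pass:
        # Key on a tuple so counts from both passes land in the same slot.
        scores[tuple(signature)] += n_matched

# The signature with the most matched peaks across both ion series.
best = max(scores, key=scores.get)
print(dict(scores))
print(best)
```

With pyAscore itself, the key would presumably be built from `tuple(perm.get_signature())` inside each `iter_permutations` loop.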
class pyascore.PyModifiedPeptide

The PyModifiedPeptide object represents a peptide along with its fixed and unlocalized modifications.

Objects can take in a sequence, a set of fixed-position modifications, and a variable number of unlocalized modifications which can fall on any residue of a specified type. The design allows peaks from spectra to be matched to theoretical peaks from any possible localization. Individual realizations of a modified peptide are encoded via a “signature”, which is a binary vector with one entry for each modifiable residue, where a 1 signifies that the residue is modified. The peaks of all possible modification states can be traversed by creating a PyFragmentGraph object, and if two signatures are provided, one can retrieve only the site-determining peaks.

Parameters:
mod_group : str

A string which lists the possible modified residues for the unlocalized modification. For example, with phosphorylation, you may want “STY”.

mod_mass : float

The mass of the unlocalized modification in Daltons. For example, phosphorylation is 79.966331.

mz_error : float

The tolerance in Daltons used to match theoretical peaks to consumed spectral peaks. The option to use PPM will likely be included in the future. (Defaults to 0.5)
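As a rough illustration of what an absolute tolerance such as mz_error means, the sketch below checks whether any observed peak falls within a fixed Dalton window of a theoretical m/z. It is not pyAscore's internal matching code: `has_match` and `ppm_to_da` are hypothetical helpers, and the observed peaks are made-up values. The PPM conversion is included only to show what a relative tolerance would correspond to at a given m/z.

```python
from bisect import bisect_left


def has_match(sorted_peaks, theoretical_mz, mz_error=0.5):
    """Return True if any observed peak lies within +/- mz_error Da."""
    # First peak that is not below the lower edge of the window.
    i = bisect_left(sorted_peaks, theoretical_mz - mz_error)
    return i < len(sorted_peaks) and sorted_peaks[i] <= theoretical_mz + mz_error


def ppm_to_da(mz, ppm):
    """Convert a relative PPM tolerance to an absolute window at a given m/z."""
    return mz * ppm / 1e6


peaks = sorted([159.08, 290.12, 471.13])  # hypothetical observed m/z values
print(has_match(peaks, 159.077))  # a peak lies within 0.5 Da
print(has_match(peaks, 239.043))  # no peak within 0.5 Da
print(ppm_to_da(500.0, 10))       # 10 PPM expressed in Da at m/z 500
```

A fixed Dalton window is equally wide at every m/z, whereas a PPM window grows with m/z.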

Methods

consume_peptide

Consumes a single peptide sequence and creates its internal representation

get_fragment_graph

Builds a PyFragmentGraph object which references the current PyModifiedPeptide object

get_peptide

Returns the modified sequence (residues plus mod mass) of the consumed peptide

get_site_determining_ions

Determine the non-overlapping theoretical fragments of two peptides.

add_neutral_loss

consume_peptide()

Consumes a single peptide sequence and creates its internal representation

Parameters:
peptide : str

The peptide string without any modifications or N-terminal markings

n_of_mod : int > 0

Number of unlocalized modifications on the sequence

max_fragment_charge : int > 0

Fragments will be considered from charge 1 to max_fragment_charge

aux_mod_pos : ndarray of uint32

Positions of fixed modifications. Modification positions should generally start at 1, with 0 reserved for N-terminal modifications, as this seems to be the encoding preferred in the field.

aux_mod_mass : ndarray of float32

Masses of individual fixed-position modifications.
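The position convention for aux_mod_pos can be illustrated with a small pure-Python sketch. It assumes plain lists rather than the NumPy arrays that consume_peptide expects, and `apply_fixed_mods` is a hypothetical helper, not part of pyAscore. The example masses are the standard monoisotopic shifts for acetylation and oxidation.

```python
def apply_fixed_mods(seq, aux_mod_pos, aux_mod_mass):
    """Return (n_term_shift, per_residue_shifts) under the 0 = N-terminus convention."""
    n_term = 0.0
    shifts = [0.0] * len(seq)
    for pos, mass in zip(aux_mod_pos, aux_mod_mass):
        if pos == 0:
            n_term += mass            # position 0 is the N-terminus
        elif 1 <= pos <= len(seq):
            shifts[pos - 1] += mass   # positions 1..len(seq) index residues
        else:
            raise ValueError(f"position {pos} is outside the peptide")
    return n_term, shifts


# N-terminal acetylation (position 0) and oxidation of M (position 3) on ASMTK.
n_term, shifts = apply_fixed_mods("ASMTK", [0, 3], [42.010565, 15.994915])
print(n_term)
print(shifts)
```

With pyAscore, the same positions and masses would be passed as arrays of uint32 and float32, as listed in the parameters above.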

get_fragment_graph()

Builds a PyFragmentGraph object which references the current PyModifiedPeptide object

Parameters:
fragment_type : char

The type of fragment graph to create, e.g. ‘b’.

charge_state : int > 0

The charge state of all fragments.

Returns:
PyFragmentGraph

The fragment graph of specified type and charge state.

get_peptide()

Returns the modified sequence (residues plus mod mass) of the consumed peptide

Parameters:
signature : ndarray of 0,1 values

Encodes the modification state that each modifiable amino acid should have. Defaults to no modifications.

Returns:
str

A peptide sequence with bracketed modification masses, e.g. PEPT[80]IDEK.
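The mapping from a signature to a bracketed sequence can be sketched in a few lines of pure Python. `render_peptide` is a hypothetical helper written for illustration, not pyAscore's implementation, and rounding the mass to a whole number is an assumption made to mirror the PEPT[80]IDEK example.

```python
def render_peptide(seq, mod_group, mod_mass, signature=None):
    """Return seq with [mass] after each residue flagged in the signature."""
    # Indices of modifiable residues, in sequence order.
    sites = [i for i, aa in enumerate(seq) if aa in mod_group]
    if signature is None:
        signature = [0] * len(sites)  # default: no modifications
    if len(signature) != len(sites):
        raise ValueError("signature needs one entry per modifiable residue")
    modified = {i for i, flag in zip(sites, signature) if flag}
    return "".join(
        f"{aa}[{mod_mass:.0f}]" if i in modified else aa
        for i, aa in enumerate(seq)
    )


print(render_peptide("PEPTIDEK", "STY", 79.9663, [1]))
print(render_peptide("ASMTK", "STY", 79.9663, [0, 1]))
print(render_peptide("ASMTK", "STY", 79.9663))
```

The signature has one entry per modifiable residue rather than one per residue, so PEPTIDEK with "STY" takes a signature of length 1.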

get_site_determining_ions()

Determine the non-overlapping theoretical fragments of two peptides.

Parameters:
sig1 : ndarray of 0,1 values

Encodes the modification state that each modifiable amino acid should have in the first peptide.

sig2 : ndarray of 0,1 values

Encodes the modification state that each modifiable amino acid should have in the second peptide.

fragment_type : char

The type of fragment graph to create, e.g. ‘b’.

max_charge : int > 0

Site determining ions are produced for all charge states from 1 to max_charge inclusive.

Returns:
tuple of ndarray

A tuple of length 2 with entries that contain all theoretical fragments for a modified peptide not found in the other modified peptide.
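The idea of site-determining ions can be illustrated without pyAscore by comparing two fragment lists directly. In this sketch, `site_determining` is a hypothetical helper and the comparison tolerance is an arbitrary choice. The m/z values are the b+ fragments of AS[79.9663]MTK and ASMT[79.9663]K from the example output earlier in this document.

```python
def site_determining(frags1, frags2, tol=1e-6):
    """Return the fragments unique to each list, comparing m/z within tol."""
    only1 = [a for a in frags1 if all(abs(a - b) > tol for b in frags2)]
    only2 = [b for b in frags2 if all(abs(b - a) > tol for a in frags1)]
    return only1, only2


sig_10 = [72.0449, 239.043, 370.084, 471.131]  # b+ ions for signature [1, 0]
sig_01 = [72.0449, 159.077, 290.117, 471.131]  # b+ ions for signature [0, 1]

only_10, only_01 = site_determining(sig_10, sig_01)
print(only_10)
print(only_01)
```

Only b2 and b3 survive the comparison, since b1 precedes both candidate sites and b4 contains both.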

class pyascore.PyFragmentGraph

The PyFragmentGraph object allows traversal of the modification tree of a PyModifiedPeptide object.

Every possible modified residue creates a branch point determined by whether it is modified or not, and the PyFragmentGraph object allows efficient depth-first traversal of the resulting tree. By specifying whether the graph should be made over the b or y ions (more ion types to come), this object will produce the appropriate theoretical m/z for each fragment. These m/z values can be matched against the internal cache of the PyModifiedPeptide object to determine if any consumed peaks match the theoretical peak. For efficiency’s sake, y and b ion graphs iterate through signatures differently, and may not necessarily be reverse iterators of each other.

Object creation through the PyModifiedPeptide.get_fragment_graph method is suggested but not required.
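The modification tree described above can be pictured with a generic depth-first enumeration. The sketch below is not pyAscore's internal implementation: `iter_signatures` is a hypothetical generator that branches on "modified" before "unmodified" at each site and prunes branches that cannot place the remaining modifications.

```python
def iter_signatures(n_sites, n_mods):
    """Depth-first walk of the modified/unmodified tree, yielding signatures."""
    signature = [0] * n_sites

    def walk(site, remaining):
        if site == n_sites:
            if remaining == 0:
                yield list(signature)  # copy, since signature is reused
            return
        # Prune: too few sites left to place the remaining modifications.
        if remaining > n_sites - site:
            return
        if remaining > 0:
            signature[site] = 1  # branch 1: this site is modified
            yield from walk(site + 1, remaining - 1)
            signature[site] = 0
        yield from walk(site + 1, remaining)  # branch 2: this site is unmodified

    yield from walk(0, n_mods)


print(list(iter_signatures(2, 1)))
print(list(iter_signatures(3, 2)))
```

Because neighboring signatures in a depth-first walk share a prefix of the tree, fragments that lie before the first differing site do not change between them.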

Parameters:
peptide : PyModifiedPeptide

A PyModifiedPeptide instance which has consumed at least one peptide

fragment_type : char

The type of fragment graph to create, e.g. ‘b’.

charge_state : int > 0

The charge state of all fragments

Attributes:
fragment_type : char

The current type of fragment returned by the graph

charge_state : int

The current charge state for fragments returned from the graph

Methods

get_fragment_mz

Return the m/z of the current fragment.

get_fragment_seq

Return sequence of current fragment without modifications.

get_fragment_size

Return the size of the current fragment in number of amino acids.

get_signature

Return current signature.

incr_fragment

Increment to next fragment for current signature.

incr_signature

Get next signature at position of last modification switch.

is_fragment_end

Check if iterator has reached the last fragment, i.e. the end of the peptide.

is_signature_end

Check if iterator is at last signature.

iter_fragments

Iterate through remaining fragments of current signature.

iter_permutations

Iterate through remaining signatures and return fragment graph ready for iteration.

reset_fragment

Resets iterator to the first position of the current signature.

reset_iterator

Resets iterator to the first position of the first signature.

set_signature

Change signature to user specified value and reset to the first fragment.

get_fragment_mz()

Return the m/z of the current fragment.

Returns:
float

get_fragment_seq()

Return sequence of current fragment without modifications.

Returns:
str

get_fragment_size()

Return the size of the current fragment in number of amino acids.

Returns:
int

get_signature()

Return current signature.

Returns:
ndarray of uint64

Array with one position per modifiable amino acid and a 1 if modified and 0 if not.

incr_fragment()

Increment to next fragment for current signature.

incr_signature()

Get next signature at position of last modification switch.

is_fragment_end()

Check if iterator has reached the last fragment, i.e. the end of the peptide.

is_signature_end()

Check if iterator is at last signature.

Returns:
bool

Is this the last signature?

iter_fragments()

Iterate through remaining fragments of current signature.

Yields:
(float, string)

Pair of fragment m/z and fragment label

iter_permutations()

Iterate through remaining signatures and return fragment graph ready for iteration.

If mode == ‘all’, reset fragments to the first position before returning graph.

Yields:
PyFragmentGraph

reference to current graph

reset_fragment()

Resets iterator to the first position of the current signature.

reset_iterator()

Resets iterator to the first position of the first signature.

set_signature()

Change signature to user specified value and reset to the first fragment.

Parameters:
new_signature : ndarray of uint32

Encodes the modification state that each modifiable amino acid should have.