Working with modified peptides

When scoring the localization of a modification on a peptide’s sequence, it is often necessary to enumerate all possible localizations of the modification and calculate a score for each. pyAscore has an internal class for doing this enumeration, and we make that interface available to users to allow convenient iteration over the theoretical fragments of modified peptides.

Iteration over all fragments

Basic iteration requires a peptide sequence (e.g. ‘ASMTK’), the mass of a modification of interest (e.g. 79.9663), the amino acids that the modification can fall on (e.g. STY), and the number of modifications of interest. All of this can be initialized for a given peptide with the following lines.

from pyascore import PyModifiedPeptide

pep = PyModifiedPeptide("STY", 79.9663)
pep.consume_peptide("ASMTK", 1)

The PyModifiedPeptide object is the source of the fragment graphs, which allow iteration over the theoretical fragments of each permutation of modified amino acids. In this case, there are 2 permutations, AS[79.9663]MTK and ASMT[79.9663]K. To iterate over the b fragments of charge 1, generate the b+ fragment graph and iterate using the iter_permutations and iter_fragments methods.

b_graph = pep.get_fragment_graph("b", 1)
for perm in b_graph.iter_permutations():
    print(perm.get_signature())
    for mz, label in perm.iter_fragments():
        print(mz, "<-", label)

# Output:
[1, 0]
72.0449 <- b1+
239.043 <- b2+
370.084 <- b3+
471.131 <- b4+
[0, 1]
72.0449 <- b1+
159.077 <- b2+
290.117 <- b3+
471.131 <- b4+

Notice that the full peptide mass is currently not included. This is because this “fragment” can’t be used for localization.
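The arithmetic behind b fragment m/z values can be sketched without pyAscore: sum the residue masses up to the cleavage point, add any modification mass carried by those residues, add the mass of the charge carrier, and divide by the charge. The residue masses below are standard monoisotopic values. `CHARGE_MASS` and the helper `b_ion_mz` are assumptions made for illustration and are not pyAscore internals.

```python
# Monoisotopic residue masses (Da) for the residues in ASMTK.
RESIDUE_MASS = {
    "A": 71.03711,
    "S": 87.03203,
    "M": 131.04049,
    "T": 101.04768,
    "K": 128.09496,
}
# Assumption: mass added per charge. pyAscore's exact constant may differ.
CHARGE_MASS = 1.007825


def b_ion_mz(sequence, mods, charge=1):
    """Return m/z of b1..b(n-1); mods maps 0-based position -> mass."""
    mzs = []
    running = 0.0
    # The full-length "fragment" is skipped, as in the output above.
    for pos, residue in enumerate(sequence[:-1]):
        running += RESIDUE_MASS[residue] + mods.get(pos, 0.0)
        mzs.append((running + charge * CHARGE_MASS) / charge)
    return mzs


# Modification on S (position 1) versus T (position 3).
on_s = b_ion_mz("ASMTK", {1: 79.9663})
on_t = b_ion_mz("ASMTK", {3: 79.9663})
for i, (a, b) in enumerate(zip(on_s, on_t), start=1):
    marker = "differs" if abs(a - b) > 1e-6 else "same"
    print(f"b{i}+ {a:.3f} {b:.3f} {marker}")
```

In this sketch only b2+ and b3+ differ between the two localizations, so those are the fragments that can distinguish them.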

When recording scores, we tend to build a dictionary that uses the signature as a key. This allows us to track additive scores, such as the counts pyAscore uses, and to iterate over graphs independently of each other. One of the main reasons for doing this is that b type graphs visit permutations in a different order from y type graphs (the reverse order in the example below).

print("b type iteration:")
b_graph = pep.get_fragment_graph("b", 1)
for perm in b_graph.iter_permutations():
    print(perm.get_signature())

print("y type iteration:")
y_graph = pep.get_fragment_graph("y", 1)
for perm in y_graph.iter_permutations():
    print(perm.get_signature())

# Output:
b type iteration:
[1, 0]
[0, 1]
y type iteration:
[0, 1]
[1, 0]
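A minimal sketch of the dictionary approach described above, written without pyAscore so it runs on its own. The lists stand in for signatures in the two orders shown, and the match counts are placeholders invented for illustration, not real results.

```python
from collections import defaultdict

# Stand-ins for the signatures yielded by the b and y graphs, in the
# orders shown above. The match counts are placeholders for illustration.
b_results = [([1, 0], 2), ([0, 1], 1)]
y_results = [([0, 1], 1), ([1, 0], 3)]

scores = defaultdict(int)
for signature, n_matched in b_results + y_results:
    # Lists and ndarrays are not hashable, so convert to a tuple key.
    scores[tuple(signature)] += n_matched

best = max(scores, key=scores.get)
print(dict(scores))
print("best:", best)
```

Because the key is the signature itself, the order in which each graph visits permutations does not matter. With real graphs, something like `tuple(perm.get_signature())` should serve as the key, though that usage is an assumption here.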

class pyascore.PyModifiedPeptide

The PyModifiedPeptide object provides functionality for modified residues of peptides.

Objects can take in a sequence, a set of fixed-position modifications, and a variable number of unlocalized modifications which can fall on any residue of a specified type. The design allows peaks from spectra to be matched to theoretical peaks from any possible localization. Individual realizations of modified peptides are encoded via a “signature”, which is a binary vector with one entry per modifiable residue, where a 1 signifies that the residue is modified. The peaks of all possible modification states can be traversed by creating a PyFragmentGraph object, and if two signatures are provided, one can retrieve only the site-determining peaks.
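To make the signature encoding concrete, the sketch below enumerates every signature for a sequence in plain Python. The helper `enumerate_signatures` is hypothetical, and the order it produces is arbitrary and need not match the order used by pyAscore.

```python
from itertools import combinations


def enumerate_signatures(sequence, mod_group, n_of_mod):
    """Yield every binary signature with exactly n_of_mod ones."""
    # Positions in the sequence that could carry the modification.
    sites = [i for i, aa in enumerate(sequence) if aa in mod_group]
    for chosen in combinations(range(len(sites)), n_of_mod):
        yield [1 if i in chosen else 0 for i in range(len(sites))]


signatures = list(enumerate_signatures("ASMTK", "STY", 1))
print(signatures)
```

For ASMTK with one modification on S, T, or Y, the signature has two entries (one for S, one for T), so there are two signatures.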

Parameters:

mod_group (str): A string which lists the possible modified residues for the unlocalized modification. For example, with phosphorylation, you may want “STY”.
mod_mass (float): The mass of the unlocalized modification in daltons. For example, phosphorylation is 79.966331.
mz_error (float): The error in daltons to match theoretical peaks to consumed spectral peaks. The option to use PPM will likely be included in the future. (Defaults to 0.5)
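The role of an absolute tolerance such as mz_error can be sketched as follows. This is an illustration of matching within a window in daltons, not pyAscore's internal matching code. The function `find_match` and the peak list are hypothetical.

```python
from bisect import bisect_left


def find_match(theoretical_mz, observed_mz, mz_error=0.5):
    """Return the closest observed peak within mz_error, or None."""
    # observed_mz must be sorted ascending for the binary search.
    idx = bisect_left(observed_mz, theoretical_mz)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = observed_mz[max(idx - 1, 0): idx + 1]
    best = min(candidates, key=lambda mz: abs(mz - theoretical_mz), default=None)
    if best is not None and abs(best - theoretical_mz) <= mz_error:
        return best
    return None


peaks = [159.1, 239.3, 370.9]  # hypothetical observed peaks, sorted
print(find_match(239.043, peaks))  # within 0.5 Da of 239.3
print(find_match(370.084, peaks))  # nearest peak is too far away
```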

Methods

`consume_peptide`	Consumes a single peptide sequence and creates its internal representation
`get_fragment_graph`	Builds a PyFragmentGraph object which references the current PyModifiedPeptide object
`get_peptide`	Returns the modified sequence (residues plus mod mass) of the consumed peptide
`get_site_determining_ions`	Determine the non-overlapping theoretical fragments of two peptides.

add_neutral_loss()

consume_peptide()

Consumes a single peptide sequence and creates its internal representation

Parameters:

peptide (str): The peptide string without any modifications or N-terminal markings
n_of_mod (int > 0): Number of unlocalized modifications on the sequence
max_fragment_charge (int > 0): Fragments will be considered from charge 1 to max_fragment_charge
aux_mod_pos (ndarray of uint32): Positions of fixed modifications. Positions should start at 1, with 0 reserved for N-terminal modifications, which appears to be the encoding preferred in the field.
aux_mod_mass (ndarray of float32): Masses of individual fixed-position modifications.

get_fragment_graph()

Builds a PyFragmentGraph object which references the current PyModifiedPeptide object

Parameters:

fragment_type (char): The type of fragment graph to create, e.g. ‘b’.
charge_state (int > 0): The charge state of all fragments.

Returns:

PyFragmentGraph: The fragment graph of specified type and charge state.

get_peptide()

Returns the modified sequence (residues plus mod mass) of the consumed peptide

Parameters:

signature (ndarray of 0,1 values): Encodes the modification state that each modifiable amino acid should have. Defaults to no modifications.

Returns:

str: A peptide sequence with bracketed modification masses, e.g. PEPT[80]IDEK.
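How a signature maps onto a bracketed sequence can be sketched in plain Python. The function `render_peptide` is hypothetical, and rounding the mass to an integer simply mirrors the example above; the precision pyAscore actually prints is not specified here.

```python
def render_peptide(sequence, mod_group, mod_mass, signature=None):
    """Build a bracketed string such as AS[80]MTK from a signature."""
    sites = [i for i, aa in enumerate(sequence) if aa in mod_group]
    if signature is None:
        signature = [0] * len(sites)  # default: no modifications
    if len(signature) != len(sites):
        raise ValueError("signature needs one entry per modifiable residue")
    # Sequence positions whose signature entry is 1.
    modified = {site for site, flag in zip(sites, signature) if flag}
    parts = []
    for i, aa in enumerate(sequence):
        parts.append(aa)
        if i in modified:
            parts.append(f"[{mod_mass:.0f}]")
    return "".join(parts)


print(render_peptide("ASMTK", "STY", 79.9663, [0, 1]))
```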

get_site_determining_ions()

Determine the non-overlapping theoretical fragments of two peptides.

Parameters:

sig1 (ndarray of 0,1 values): Encodes the modification state that each modifiable amino acid should have in the first peptide.
sig2 (ndarray of 0,1 values): Encodes the modification state that each modifiable amino acid should have in the second peptide.
fragment_type (char): The type of fragment graph to create, e.g. ‘b’.
max_charge (int > 0): Site determining ions are produced for all charge states from 1 to max_charge inclusive.

Returns:

tuple of ndarray: A tuple of length 2. Each entry contains the theoretical fragments of one modified peptide that are not found in the other modified peptide.
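The idea of site-determining ions can be sketched as a pair of set differences over two fragment lists. The function `site_determining` is hypothetical and compares m/z values after rounding, which is an assumption made for this illustration. The input values are singly charged b fragment m/z values for the two localizations of ASMTK, rounded to three decimals.

```python
def site_determining(frags_1, frags_2, decimals=3):
    """Return (only_in_1, only_in_2), comparing m/z after rounding."""
    keys_1 = {round(mz, decimals) for mz in frags_1}
    keys_2 = {round(mz, decimals) for mz in frags_2}
    only_1 = [mz for mz in frags_1 if round(mz, decimals) not in keys_2]
    only_2 = [mz for mz in frags_2 if round(mz, decimals) not in keys_1]
    return only_1, only_2


# b+ fragments for AS[79.9663]MTK and ASMT[79.9663]K.
mod_on_s = [72.045, 239.043, 370.084, 471.131]
mod_on_t = [72.045, 159.077, 290.117, 471.131]
only_s, only_t = site_determining(mod_on_s, mod_on_t)
print(only_s)
print(only_t)
```

Fragments shared by both localizations (b1+ and b4+ here) drop out, leaving only the peaks that can tell the two sites apart.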

class pyascore.PyFragmentGraph

The PyFragmentGraph object allows traversal of the modification tree of a PyModifiedPeptide object.

Every modifiable residue creates a branch point determined by whether it is modified or not, and the PyFragmentGraph object allows efficient depth-first traversal of the resulting tree. By specifying whether the graph should be made over the b or y ions (more ion types to come), this object will produce the appropriate theoretical m/z for each fragment. These m/z values can be matched against the internal cache of the PyModifiedPeptide object to determine whether any consumed peaks match the theoretical peak. For efficiency’s sake, y and b ion graphs iterate through signatures differently, and may not necessarily be reverse iterators of each other.

Object creation through the PyModifiedPeptide.get_fragment_graph method is suggested but not required.
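The branch-per-residue tree described above can be illustrated with a small depth-first walk in plain Python. This is a conceptual sketch only: the function `dfs_signatures` is hypothetical, and the actual traversal order and data structures inside PyFragmentGraph may differ.

```python
def dfs_signatures(n_sites, n_of_mod):
    """Depth-first walk of the modified / unmodified decision tree.

    Yields each complete signature with exactly n_of_mod ones.
    """
    stack = [[]]  # each entry is a partial signature (a tree node)
    while stack:
        partial = stack.pop()
        used = sum(partial)
        remaining = n_sites - len(partial)
        if used > n_of_mod or used + remaining < n_of_mod:
            continue  # prune: this branch can no longer reach n_of_mod
        if remaining == 0:
            yield partial
            continue
        # Push the unmodified branch first so the modified one is explored first.
        stack.append(partial + [0])
        stack.append(partial + [1])


print(list(dfs_signatures(2, 1)))
```

Pruning branches that can no longer reach the requested number of modifications keeps the walk from visiting signatures that would be discarded anyway.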

Parameters:

peptide (PyModifiedPeptide): A PyModifiedPeptide instance which has consumed at least one peptide
fragment_type (char): The type of fragment graph to create, e.g. ‘b’.
charge_state (int > 0): The charge state of all fragments

Attributes:

fragment_type (char): The current type of fragment returned by the graph
charge_state (int): The current charge state for fragments returned from the graph

Methods

`get_fragment_mz`	Return the m/z of the current fragment.
`get_fragment_seq`	Return sequence of current fragment without modifications.
`get_fragment_size`	Return the size of the current fragment in number of amino acids.
`get_signature`	Return current signature.
`incr_fragment`	Increment to next fragment for current signature.
`incr_signature`	Get next signature at position of last modification switch.
`is_fragment_end`	Check if iterator has reached the last fragment, i.e. the end of the peptide.
`is_signature_end`	Check if iterator is at last signature.
`iter_fragments`	Iterate through remaining fragments of current signature.
`iter_permutations`	Iterate through remaining signatures and return fragment graph ready for iteration.
`reset_fragment`	Resets iterator to the first position of the current signature.
`reset_iterator`	Resets iterator to the first position of the first signature.
`set_signature`	Change signature to user specified value and reset to the first fragment.

get_fragment_mz()

Return the m/z of the current fragment.

Returns:

float

get_fragment_seq()

Return sequence of current fragment without modifications.

Returns:

str

get_fragment_size()

Return the size of the current fragment in number of amino acids.

Returns:

int

get_signature()

Return current signature.

Returns:

ndarray of uint64: Array with one position per modifiable amino acid and a 1 if modified and 0 if not.

incr_fragment()

Increment to next fragment for current signature.

incr_signature()

Get next signature at position of last modification switch.

is_fragment_end()

Check if iterator has reached the last fragment, i.e. the end of the peptide.

is_signature_end()

Check if iterator is at last signature.

Returns:

bool: Is this the last signature?

iter_fragments()

Iterate through remaining fragments of current signature.

Yields:

(float, str): pair of fragment m/z and fragment label

iter_permutations()

Iterate through remaining signatures and return fragment graph ready for iteration.

If mode == ‘all’, reset fragments to the first position before returning graph.

Yields:

PyFragmentGraph: reference to current graph

reset_fragment()

Resets iterator to the first position of the current signature.

reset_iterator()

Resets iterator to the first position of the first signature.

set_signature()

Change signature to user specified value and reset to the first fragment.

Parameters:

new_signature (ndarray of uint32): Encodes the modification state that each modifiable amino acid should have.