Command Line Interface

Common workflows

Running with standard file formats

The minimum information needed to run pyAscore from the command line is a file containing spectra and a file containing PSMs from a database search engine such as Comet. By default, pyAscore accepts spectra in mzML format and PSMs in pepXML format. The modification of interest is specified by giving the modifiable amino acids as one-letter codes and the mass of the modification, e.g. S (serine), T (threonine), and Y (tyrosine) with 79.9663 Da for phosphorylation. This is also the default modification.

$ pyascore --residues STY \
>          --mod_mass 79.9663 \
>          spectra_file.mzML \
>          psm_file.pep.xml \
>          output_file.tsv
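The default mass of 79.966331 Da is the monoisotopic mass of HPO3, the group that phosphorylation adds to Ser, Thr, or Tyr. The short sketch below recomputes it from standard monoisotopic atomic masses, which is one way to derive a --mod_mass value for other modifications. The helper formula_mass is illustrative and not part of pyAscore.

```python
# Monoisotopic atomic masses (Da) of the elements used below.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "O": 15.99491461956,
    "P": 30.97376163,
}


def formula_mass(counts):
    """Sum monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


# Phosphorylation adds HPO3 to the side chain of S, T, or Y.
phospho = formula_mass({"H": 1, "P": 1, "O": 3})
print(f"{phospho:.6f}")  # prints 79.966331
```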

Other file formats are also supported. For spectra, users can supply mzXML files, and for PSMs, users can supply mzIdentML, percolatorTXT, and mokapotTXT files. The format of each input file must then be specified with the --spec_file_type and --ident_file_type arguments, respectively.

$ pyascore --residues STY \
>          --mod_mass 79.9663 \
>          --spec_file_type mzXML \
>          --ident_file_type mzIdentML \
>          spectra_file.mzXML \
>          psm_file.mzid \
>          output_file.tsv
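When pyAscore is driven from a script, it can help to set both type flags explicitly on every call. The sketch below is a hypothetical Python helper (suffix_lookup and build_command are not part of pyAscore) that infers the flag values from common file suffixes and requires an explicit value for percolatorTXT and mokapotTXT, whose plain-text files cannot be told apart by extension.

```python
from pathlib import Path

# Hypothetical mapping from file suffixes to values accepted by
# --spec_file_type and --ident_file_type.
SPEC_TYPES = {".mzml": "mzML", ".mzxml": "mzXML"}
IDENT_TYPES = {".pep.xml": "pepXML", ".pepxml": "pepXML", ".mzid": "mzIdentML"}


def suffix_lookup(path, table):
    """Return the type for the longest matching suffix, or raise ValueError."""
    name = Path(path).name.lower()
    for suffix in sorted(table, key=len, reverse=True):
        if name.endswith(suffix):
            return table[suffix]
    raise ValueError(f"cannot infer file type from {path!r}; pass it explicitly")


def build_command(spec_file, ident_file, out_file, ident_type=None):
    """Assemble a pyascore argument list with explicit file type flags."""
    return [
        "pyascore",
        "--residues", "STY",
        "--mod_mass", "79.9663",
        "--spec_file_type", suffix_lookup(spec_file, SPEC_TYPES),
        "--ident_file_type", ident_type or suffix_lookup(ident_file, IDENT_TYPES),
        spec_file, ident_file, out_file,
    ]


cmd = build_command("spectra_file.mzXML", "psm_file.mzid", "output_file.tsv")
print(" ".join(cmd))
```

The returned list could be passed to subprocess.run(cmd, check=True) from the standard library.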

Inputting custom PSM data

A user may have used a search engine that doesn't output a standard format but still want to connect it to pyAscore. Alternatively, they may want to manipulate the data before handing it off to pyAscore and prefer a simpler format than XML for connecting their pipeline. Both situations are handled by the percolatorTXT input format, since standard Percolator output can be reduced to a minimal tab-delimited layout:

scan  rt     sequence                      charge
2082  304.8  S[79.9663]NNSNSNSGGK          2
2624  402.8  RARES[79.9663]DNEDAK          3
2625  402.9  SS[79.9663]NGNESNGAK          2
2655  405.4  n[42.010565]S[79.9663]DAGRK   2
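As a sketch, assuming these four columns are all the percolatorTXT reader needs, the table could be written from Python with the standard csv module (pandas' DataFrame.to_csv(path, sep="\t", index=False) would work equally well). The file name psm_file.tsv matches the command that follows.

```python
import csv

# The four columns shown in the table above. Modifications are written inline
# as residue[mass]; the leading n[...] is the usual notation for a peptide
# N-terminal modification (42.010565 Da is acetylation).
FIELDS = ["scan", "rt", "sequence", "charge"]
psms = [
    {"scan": 2082, "rt": 304.8, "sequence": "S[79.9663]NNSNSNSGGK", "charge": 2},
    {"scan": 2624, "rt": 402.8, "sequence": "RARES[79.9663]DNEDAK", "charge": 3},
    {"scan": 2625, "rt": 402.9, "sequence": "SS[79.9663]NGNESNGAK", "charge": 2},
    {"scan": 2655, "rt": 405.4, "sequence": "n[42.010565]S[79.9663]DAGRK", "charge": 2},
]

# Write a tab-delimited file with a header row, Unix line endings,
# and no index column.
with open("psm_file.tsv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(psms)
```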

Once a user has written their PSMs to a tab-delimited file in this layout, the data can be fed to pyAscore with the following command.

$ pyascore --residues STY \
>          --mod_mass 79.9663 \
>          --ident_file_type percolatorTXT \
>          spectra_file.mzML \
>          psm_file.tsv \
>          output_file.tsv

Tailoring to instrument parameters

pyAscore works by looking for fragment ions in the supplied spectra that match a theoretical fragment pattern. Users should set the tolerance of this search to match the instrument's resolution using the --mz_error option (default: 0.5). For high-resolution data we have been using a tolerance of 0.05 Da, and for low-resolution data a tolerance of 0.5 Da.

$ pyascore --residues STY \
>          --mod_mass 79.9663 \
>          --mz_error 0.05 \
>          spectra_file.mzML \
>          psm_file.pep.xml \
>          output_file.tsv
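Because --mz_error is a fixed tolerance in m/z, while instrument accuracy is often quoted in parts per million, it can help to convert between the two when choosing a value. The sketch below is illustrative only; in particular, within_tolerance assumes a symmetric, inclusive window, which this documentation does not spell out.

```python
def ppm_to_da(ppm, mz):
    """Absolute tolerance (in m/z units) equivalent to a ppm tolerance at mz."""
    return mz * ppm * 1e-6


def within_tolerance(observed_mz, theoretical_mz, mz_error):
    """Symmetric, inclusive absolute window such as --mz_error describes."""
    return abs(observed_mz - theoretical_mz) <= mz_error


# A 10 ppm instrument tolerance corresponds to a window that grows with m/z.
for mz in (200.0, 1000.0, 2000.0):
    print(f"{mz:7.1f} m/z: {ppm_to_da(10, mz):.3f}")
```

At 10 ppm the equivalent window is 0.01 at m/z 1000 and 0.02 at m/z 2000, so a 0.05 Da setting is wider than that across this range.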

By default, pyAscore scores b and y ion fragment peaks, which are the most abundant peaks in HCD and CID fragmentation data. If a user wants to analyze ETD fragmentation data, it is recommended to score c and z+H ions instead. Fragment types are set with the --fragment_types option, where an uppercase Z distinguishes the z+H ion from the z ion (lowercase z).

$ pyascore --residues STY \
>          --mod_mass 79.9663 \
>          --mz_error 0.05 \
>          --fragment_types cZ \
>          spectra_file.mzML \
>          psm_file.pep.xml \
>          output_file.tsv
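For reference, the default b/y fragment ladder can be computed from monoisotopic residue masses as sketched below. This is the textbook calculation rather than pyAscore's internal code; by_ions and its mods argument (0-based positions mapped to added masses) are hypothetical. c and z-type ions are related to b and y by fixed mass offsets and are left out to keep the example short.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(sequence, mods=None, charge=1):
    """Return (b, y) m/z lists, where b[k] is b_(k+1) and y[k] is y_(k+1).

    mods maps 0-based residue positions to added masses, e.g. {2: 79.966331}.
    """
    mods = mods or {}
    masses = [RESIDUE[aa] + mods.get(i, 0.0) for i, aa in enumerate(sequence)]
    b, y = [], []
    for i in range(1, len(masses)):
        b_neutral = sum(masses[:i])           # N-terminal fragment
        y_neutral = sum(masses[i:]) + WATER   # C-terminal fragment keeps H2O
        b.append((b_neutral + charge * PROTON) / charge)
        y.append((y_neutral + charge * PROTON) / charge)
    # y was built from the longest fragment down; reverse so y[0] is y1.
    y.reverse()
    return b, y


b, y = by_ions("ASK")
```

For instance, by_ions("ARSVSPPK", {2: 79.966331, 4: 79.966331}) would give the ladder for ARS[80]VS[80]PPPK from the output example.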

Specifying neutral losses

Neutral loss peaks can optionally be included in the scoring procedure, and they can differ between modified and unmodified residues. A user supplies a comma-separated list of amino acid groups, using uppercase letters for unmodified residues and lowercase letters for modified ones, together with a comma-separated list of neutral loss masses, one per group. For example, to use H3PO4 neutral loss ions (a loss of 97.976896 Da) on modified Ser, Thr, and Tyr residues, a user could run the following command.

$ pyascore --residues STY \
>          --mod_mass 79.9663 \
>          --mz_error 0.05 \
>          --neutral_loss_groups sty \
>          --neutral_loss_masses 97.976896 \
>          spectra_file.mzML \
>          psm_file.pep.xml \
>          output_file.tsv
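The pairing between the two options can be illustrated with a small parser. parse_neutral_losses is hypothetical, not a pyAscore function, and pyAscore's own parsing may differ. It checks that there is one mass per group and keeps the case of each residue letter, so modified (lowercase) and unmodified (uppercase) residues stay distinct. The first line also confirms that 97.976896 Da is the monoisotopic mass of H3PO4.

```python
# Monoisotopic mass of H3PO4 from atomic masses (Da): 3 H + 1 P + 4 O.
H3PO4 = 3 * 1.00782503207 + 30.97376163 + 4 * 15.99491461956  # ~97.976895


def parse_neutral_losses(groups, masses):
    """Map each residue letter to its neutral-loss masses.

    Lowercase letters stand for modified residues and uppercase letters for
    unmodified ones. Positive masses are losses; negative masses are gains.
    """
    group_list = [g.strip() for g in groups.split(",") if g.strip()]
    mass_list = [float(m) for m in masses.split(",") if m.strip()]
    if len(group_list) != len(mass_list):
        raise ValueError("expected exactly one mass per neutral loss group")
    losses = {}
    for group, mass in zip(group_list, mass_list):
        for residue in group:
            losses.setdefault(residue, []).append(mass)
    return losses


losses = parse_neutral_losses("sty,ST", "97.976896,18.010565")
print(losses["s"], losses["S"])  # [97.976896] [18.010565]
```

The second group here (water loss on unmodified Ser and Thr) only illustrates the comma-separated syntax.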

Although rarely needed, a mass gain on any residue can be specified by passing negative masses to --neutral_loss_masses.

Output description

pyAscore outputs a single tab-separated (.tsv) file with one entry for every PSM containing the modification of interest. Example output is given below.

Scan LocalizedSequence                                PepScore           Ascores                       AltSites
7546 ARS[80]VS[80]PPPK                                inf                inf;inf                       ;
7547 S[80]AS[80]SC[57]PNLLVPETWPHQVSASHAGRSKQP        6.8605122566223145 0.0;0.0                       4;4
7548 VGSLM[16]TSSSGTSLRTSST[80]                       16.59139633178711  0.0                           11,12,15,16,17
7549 NDSLSSLDFDDDDVDLS[80]REK                         2.4440932273864746 8.094615                      3,5,6
7552 ASAS[80]PSTSSTSSRPK                              92.12223815917969  0.0                           2
7553 RLNHS[80]PPQSSSR                                 31.98526954650879  53.332687                     9,10,11
7555 M[16]HSGEKPY[80]EC[57]S[80]EC[57]GKIFS[80]M[16]K 8.363080978393555  0.0;0.0;15.29163              3;3;3
7557 RHS[80]HS[80]HS[80]PMSTR                         66.47030639648438  44.625744;48.631317;39.542572 10,11;10,11;10,11

Description of columns:
  • Scan: The scan number from the supplied spectra file. It is usually taken from the scan header, so care should be taken that it matches expectations.

  • LocalizedSequence: The modified peptide with the PTM of interest placed in the best positions according to the PepScore. All masses in the output are rounded to the nearest whole number.

  • PepScore: The total amount of evidence that the listed sequence is correct. It is based on the number of theoretical ions that match the ranked peaks in the supplied spectrum. A value of inf means that there is no ambiguity in the localized sequence.

  • Ascores: A semicolon-separated list of scores giving the relative amount of evidence for the localization of each modification of interest versus the next best localization. It is based on the number of matching theoretical site-determining ion peaks in the supplied spectrum. A value of inf means that there is no ambiguity in the site placement. There is one entry in the list per modification of interest on the peptide.

  • AltSites: A semicolon-separated list of comma-separated positions giving the next best locations for each modification. There is one list of alternative sites per modification of interest on the peptide.
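A minimal sketch for loading this file in Python, assuming the header names shown in the example and tab separators, splits the Ascores and AltSites columns into one entry per modification. The function names are hypothetical; Python's float() already understands the inf values in the output.

```python
import csv
import io
import math


def parse_scores(field):
    """'44.6;48.6' -> [44.6, 48.6]; 'inf' becomes math.inf via float()."""
    return [float(value) for value in field.split(";")]


def parse_alt_sites(field):
    """'10,11;10,11' -> [[10, 11], [10, 11]]; empty groups become []."""
    return [[int(p) for p in group.split(",") if p] for group in field.split(";")]


def read_ascore_output(handle):
    """Yield one dict per PSM from a pyAscore output table."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "scan": int(row["Scan"]),
            "sequence": row["LocalizedSequence"],
            "pep_score": float(row["PepScore"]),
            "ascores": parse_scores(row["Ascores"]),
            "alt_sites": parse_alt_sites(row["AltSites"]),
        }


def min_ascore(record):
    """Smallest Ascore on the peptide; math.inf if it has none."""
    return min(record["ascores"], default=math.inf)


# Two rows from the example above, re-typed with tab separators.
sample = (
    "Scan\tLocalizedSequence\tPepScore\tAscores\tAltSites\n"
    "7546\tARS[80]VS[80]PPPK\tinf\tinf;inf\t;\n"
    "7557\tRHS[80]HS[80]HS[80]PMSTR\t66.47030639648438\t"
    "44.625744;48.631317;39.542572\t10,11;10,11;10,11\n"
)
records = list(read_ascore_output(io.StringIO(sample)))
```

With a real file, open(path, newline="") can replace io.StringIO, and min_ascore(record) can be compared against whatever threshold suits the analysis.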

All options

The pyAscore module provides PTM localization analysis using a custom implementation of the Ascore algorithm. It employs pyteomics for efficient reading of spectra in mzML format and identifications in pepXML format. All scoring is implemented in custom C++ code, which is exposed to Python via Cython wrappers. Any PTM that can be defined by a canonical amino acid and a mass shift can be analyzed.

usage: pyascore [-h] [--match_save] [--residues RESIDUES]
                [--mod_mass MOD_MASS] [--mz_error MZ_ERROR]
                [--mod_correction_tol MOD_CORRECTION_TOL]
                [--zero_based ZERO_BASED]
                [--neutral_loss_groups NEUTRAL_LOSS_GROUPS]
                [--neutral_loss_masses NEUTRAL_LOSS_MASSES]
                [--static_mod_groups STATIC_MOD_GROUPS]
                [--static_mod_masses STATIC_MOD_MASSES]
                [--fragment_types FRAGMENT_TYPES]
                [--max_fragment_charge MAX_FRAGMENT_CHARGE]
                [--hit_depth HIT_DEPTH] [--parameter_file PARAMETER_FILE]
                [--spec_file_type SPEC_FILE_TYPE]
                [--ident_file_type IDENT_FILE_TYPE]
                spec_file ident_file out_file

Positional Arguments

spec_file

MS Spectra file.

ident_file

Results of database search.

out_file

Destination for Ascores.

Named Arguments

--match_save

Default: False

--residues

Residues which can be modified.

Default: “STY”

--mod_mass

Modification mass to match to identifications. Search engines often round this mass, so the value supplied here is taken as the most accurate mass.

Default: 79.966331

--mz_error

Tolerance in m/z for deciding whether a spectral peak matches a theoretical peak.

Default: 0.5

--mod_correction_tol

m/z tolerance for deciding whether a reported modification matches an internal or user-specified modification. A wide tolerance can help overcome rounding. If more precision is needed, set this parameter accordingly and make sure your search engine reports modification masses with enough precision.

Default: 1.0
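To illustrate why a wide tolerance helps with rounded masses, the hypothetical function below snaps a reported mass, such as the 80 that a search engine might write for phosphorylation, to the closest known modification mass within a tolerance. It sketches the idea only and is not pyAscore's internal matching logic.

```python
def correct_mod_mass(reported, known_masses, tol=1.0):
    """Snap a reported modification mass to the closest known mass.

    Returns None when no known mass lies within tol. This sketch ignores
    which residue carries the modification.
    """
    candidates = [m for m in known_masses if abs(reported - m) <= tol]
    if not candidates:
        return None
    return min(candidates, key=lambda m: abs(reported - m))


# Defaults of --mod_mass and --static_mod_masses.
known = [79.966331, 57.021464]
print(correct_mod_mass(80, known))  # 79.966331
print(correct_mod_mass(57, known))  # 57.021464
print(correct_mod_mass(16, known))  # None
```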

--zero_based

Modification positions are assumed to be 1-based by default.

Default: False

--neutral_loss_groups

Comma separated clusters of amino acids which are expected to have a neutral loss. To specify that the modified versions of the amino acids should have the neutral loss, use lower case letters. Example: ‘st’ vs ‘ST’.

Default: “”

--neutral_loss_masses

Comma separated neutral loss masses for each of the neutral_loss_groups. Should have one mass per group. Positive masses indicate a loss, e.g. ‘18.0153’ for water loss, while negative masses can be used to indicate a gain.

Default: “”

--static_mod_groups

Comma separated clusters of amino acids which will be read in with a constant modification.

Default: “C”

--static_mod_masses

Comma separated masses for each of the static_mod_groups.

Default: “57.021464”

--fragment_types

Fragment ion types to score. Supported: bcyzZ. The special character Z indicates a z+H fragment.

Default: “by”

--max_fragment_charge

Max fragment charge to use for calculating theoretical peaks. Internally, the max fragment charge will not be allowed to be greater than the PSM charge - 1. However, if a more stringent limit needs to be set, this argument can be used.

Default: 5
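The interaction between --max_fragment_charge and the PSM charge, and how a multiply charged fragment's m/z follows from its neutral mass, can be sketched as below. fragment_charges and fragment_mz are hypothetical helpers, and keeping at least charge 1 for singly charged PSMs is an assumption, since the behavior in that case is not described here.

```python
PROTON = 1.007276  # Da


def fragment_charges(psm_charge, max_fragment_charge=5):
    """Fragment charges to consider, capped at psm_charge - 1 and the user limit.

    Assumption: charge 1 is always kept, so singly charged PSMs still
    produce fragments.
    """
    top = max(1, min(max_fragment_charge, psm_charge - 1))
    return list(range(1, top + 1))


def fragment_mz(neutral_mass, charge):
    """m/z of a fragment with the given neutral mass and charge."""
    return (neutral_mass + charge * PROTON) / charge


# y1 of a C-terminal lysine (K residue + H2O) in a 3+ PSM.
y1_neutral = 128.09496 + 18.010565
for z in fragment_charges(3):
    print(z, round(fragment_mz(y1_neutral, z), 4))
```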

--hit_depth

Number of PSMs to take from each scan. Set to a negative value to always analyze all PSMs.

Default: 1
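Read literally, --hit_depth keeps the first N PSMs reported for each scan. The hypothetical select_hits function below sketches that rule on (scan, rank, peptide) tuples, assuming rank 1 is the search engine's best hit for the scan; it is not pyAscore code, and the peptides are placeholders.

```python
from itertools import groupby


def select_hits(psms, hit_depth=1):
    """Keep the best hit_depth PSMs per scan; a negative value keeps all.

    psms is an iterable of (scan, rank, peptide) tuples, where rank 1 is the
    search engine's top hit for that scan.
    """
    ordered = sorted(psms, key=lambda psm: (psm[0], psm[1]))
    selected = []
    for _, group in groupby(ordered, key=lambda psm: psm[0]):
        hits = list(group)
        selected.extend(hits if hit_depth < 0 else hits[:hit_depth])
    return selected


toy = [(2, 2, "PEPTIDEB"), (1, 1, "PEPTIDEA"), (2, 1, "PEPTIDEC"), (1, 2, "PEPTIDED")]
print(select_hits(toy))          # top-ranked PSM of scans 1 and 2
print(len(select_hits(toy, -1)))
```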

--parameter_file

A file containing parameters. Example: ‘residues = STY’.

Default: “”
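Following the 'residues = STY' example, a parameter file could be generated from Python as sketched below. The helper is hypothetical, and the assumption that every key is a long option name without its leading dashes is extrapolated from that single example.

```python
def write_parameter_file(path, params):
    """Write options as 'key = value' lines (key format is an assumption)."""
    with open(path, "w") as handle:
        for key, value in params.items():
            handle.write(f"{key} = {value}\n")


write_parameter_file(
    "pyascore_params.txt",
    {"residues": "STY", "mod_mass": 79.966331, "mz_error": 0.05},
)
```

The file would then be passed with --parameter_file pyascore_params.txt alongside the three positional arguments.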

--spec_file_type

The type of file supplied for spectra. One of mzML or mzXML.

Default: “mzML”

--ident_file_type

The type of file supplied for identifications. One of pepXML, mzIdentML, percolatorTXT, or mokapotTXT.

Default: “pepXML”