Data handling

Autoprot Preprocessing Functions.

@author: Wignand, Julian, Johannes

@documentation: Julian

autoprot.preprocessing.data_handling.download_from_ftp(url, save_dir, login_name='anonymous', login_pw='')

Download a file from FTP.

Parameters:
  • url (str) – FTP link to the file to download.

  • save_dir (str) – Directory in which the downloaded file is saved.

  • login_name (str) – Login name for the FTP server. The default is ‘anonymous’, which works for the PRIDE FTP server.

  • login_pw (str) – Password for the FTP server. The default is an empty string.

Returns:

Path to the downloaded file.

Return type:

str

Examples

Download all files listed in a dictionary that maps file names to FTP links, and collect the paths of the downloaded files in a list.

>>> downloadedFiles = []
>>> for url in ftpDict.values():
...     downloadedFiles.append(pp.download_from_ftp(url, r'C:\Users\jbender\Documents\python_playground'))
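A minimal sketch of how a single FTP download like this can be done with Python's standard-library ftplib, assuming anonymous login as in the default arguments above. The helpers `split_ftp_url` and `download_ftp_file` are hypothetical and not part of autoprot; the actual implementation of download_from_ftp may differ. Only the URL parsing step is exercised offline here, using a made-up example URL.

```python
import ftplib
import os
import posixpath
from urllib.parse import urlparse


def split_ftp_url(url):
    """Hypothetical helper: split an ftp:// URL into (host, remote dir, file name)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"Not an FTP URL: {url}")
    remote_dir, filename = posixpath.split(parsed.path)
    return parsed.netloc, remote_dir, filename


def download_ftp_file(url, save_dir, login_name="anonymous", login_pw=""):
    """Hypothetical helper: fetch one file over FTP and return its local path."""
    host, remote_dir, filename = split_ftp_url(url)
    os.makedirs(save_dir, exist_ok=True)
    local_path = os.path.join(save_dir, filename)
    with ftplib.FTP(host) as ftp:
        ftp.login(user=login_name, passwd=login_pw)
        if remote_dir:
            ftp.cwd(remote_dir)
        # Stream the remote file in binary mode straight to disk.
        with open(local_path, "wb") as fh:
            ftp.retrbinary(f"RETR {filename}", fh.write)
    return local_path


# Offline check of the URL parsing step (made-up URL):
print(split_ftp_url("ftp://ftp.example.org/pub/data/proteinGroups.txt"))
# ('ftp.example.org', '/pub/data', 'proteinGroups.txt')
```

Splitting the URL first keeps the network part small: connect to the host, change into the remote directory, then retrieve the file by name.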

autoprot.preprocessing.data_handling.fetch_from_pride(accession, term, ignore_caps=True)

Get download links for files belonging to a PRIDE identifier.

Parameters:
  • accession (str) – PRIDE identifier.

  • term (str) – Substring of the file names to retrieve from the project, for example ‘proteingroups’.

  • ignore_caps (bool, optional) – Whether to ignore capitalisation during matching of terms. The default is True.

Returns:

file_locs – Dict mapping filenames to FTP download links.

Return type:

dict

Examples

Generate a dict mapping file names to FTP download links. Note that only files whose names contain the string ‘proteingroups’ are retrieved.

>>> ftpDict = pp.fetch_from_pride("PXD031829", 'proteingroups')
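The term and ignore_caps parameters describe a substring match on file names. The sketch below illustrates only that matching step on a dummy listing; `filter_file_links` is a hypothetical helper, and how fetch_from_pride actually queries PRIDE for the file list is not shown or assumed here.

```python
def filter_file_links(files, term, ignore_caps=True):
    """Hypothetical helper: keep (filename, link) pairs whose name contains term.

    files: iterable of (filename, ftp_link) tuples (dummy input for illustration).
    """
    if ignore_caps:
        term = term.lower()
    file_locs = {}
    for name, link in files:
        # Compare case-insensitively only when ignore_caps is set.
        candidate = name.lower() if ignore_caps else name
        if term in candidate:
            file_locs[name] = link
    return file_locs


# Dummy listing, not real PRIDE data.
files = [
    ("proteinGroups.txt", "ftp://ftp.example.org/pub/proteinGroups.txt"),
    ("peptides.txt", "ftp://ftp.example.org/pub/peptides.txt"),
]
print(filter_file_links(files, "proteingroups"))
# {'proteinGroups.txt': 'ftp://ftp.example.org/pub/proteinGroups.txt'}
print(filter_file_links(files, "proteingroups", ignore_caps=False))
# {}
```

With the case-sensitive setting, ‘proteingroups’ no longer matches ‘proteinGroups.txt’, which is why ignoring capitalisation is a convenient default.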

autoprot.preprocessing.data_handling.read_csv(file, sep='\t', low_memory=False, **kwargs)

Wrapper around pd.read_csv with modified default arguments: a tab as separator (sep='\t') and low_memory=False.

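Parameters:
  • file – File to read, as accepted by pd.read_csv.

  • sep (str, optional) – Column separator. The default is ‘\t’ (tab).

  • low_memory (bool, optional) – Passed on to pd.read_csv. The default is False.

  • **kwargs – Further keyword arguments passed on to pd.read_csv.

Returns: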

The parsed dataframe.

Return type:

pd.DataFrame
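A minimal sketch of what the modified defaults mean in practice, assuming read_csv forwards its arguments to pd.read_csv as its description suggests. `read_tsv` is a hypothetical stand-in with the same signature, and the tab-separated text is dummy data in place of a real file.

```python
import io

import pandas as pd


def read_tsv(file, sep="\t", low_memory=False, **kwargs):
    """Hypothetical stand-in with the same defaults as read_csv above."""
    return pd.read_csv(file, sep=sep, low_memory=low_memory, **kwargs)


# Dummy tab-separated content standing in for a file on disk.
text = "Protein IDs\tIntensity\nP12345\t1.5e9\nQ67890\t3.2e8\n"
df = read_tsv(io.StringIO(text))
print(df.shape)  # (2, 2)
print(list(df.columns))  # ['Protein IDs', 'Intensity']

# Extra keyword arguments are passed straight through to pd.read_csv.
subset = read_tsv(io.StringIO(text), usecols=["Protein IDs"])
print(subset.shape)  # (2, 1)
```

Because sep defaults to a tab, tab-separated tables can be read without repeating the separator on every call.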

autoprot.preprocessing.data_handling.to_csv(df, file, sep='\t', index=False, **kwargs)

Write a dataframe to a CSV file, by default tab-separated and without the row index.

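Parameters:
  • df (pd.DataFrame) – Dataframe to write.

  • file (str) – Path of the output file.

  • sep (str, optional) – Column separator. The default is ‘\t’ (tab).

  • index (bool, optional) – Whether to write the row index. The default is False.

  • **kwargs – Further keyword arguments for writing the file.

Return type: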

None.
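A minimal round-trip sketch, assuming to_csv hands its arguments to pandas' DataFrame.to_csv. `write_tsv` is a hypothetical stand-in with the same defaults, and an in-memory buffer replaces a real output file.

```python
import io

import pandas as pd


def write_tsv(df, file, sep="\t", index=False, **kwargs):
    """Hypothetical stand-in with the same defaults as to_csv above."""
    df.to_csv(file, sep=sep, index=index, **kwargs)


# Dummy data for illustration.
df = pd.DataFrame({"Protein IDs": ["P12345", "Q67890"], "Intensity": [1.5e9, 3.2e8]})
buf = io.StringIO()
write_tsv(df, buf)
# The header has no leading index column because index=False.
print(buf.getvalue().splitlines()[0])  # Protein IDs<TAB>Intensity

# Reading the text back with the same separator recovers the frame.
round_trip = pd.read_csv(io.StringIO(buf.getvalue()), sep="\t")
print(round_trip.equals(df))  # True
```

Keeping index=False as the default means a table written this way can be read back with the matching read_csv defaults without an extra unnamed index column.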