Data handling

Autoprot Preprocessing Functions.

@author: Wignand, Julian, Johannes

@documentation: Julian

autoprot.preprocessing.data_handling.download_from_ftp(url, save_dir, login_name='anonymous', login_pw='')

Download a file from FTP.

Parameters:

url (str) – FTP link to the file to download.
save_dir (str) – Local directory in which to save the downloaded file.
login_name (str) – Login name for the FTP server. Default is ‘anonymous’, which works for the PRIDE FTP server.
login_pw (str) – Password for access to the FTP server. Default is an empty string (‘’).

Returns:

Path to the downloaded file.

Return type:

str

Examples

Download all files from a dictionary mapping file names to FTP links (for example, the output of fetch_from_pride) and collect the paths to the downloaded files in a list.

>>> import autoprot.preprocessing as pp
>>> ftpDict = pp.fetch_from_pride("PXD031829", 'proteingroups')
>>> downloadedFiles = []
>>> for url in ftpDict.values():
...     downloadedFiles.append(pp.download_from_ftp(url, r'C:\Users\jbender\Documents\python_playground'))
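A minimal sketch of what an FTP download of this kind can look like, using only the Python standard library (`ftplib`, `urllib.parse`, `pathlib`). It is not the autoprot implementation: the helper names `split_ftp_url` and `sketch_download_from_ftp` are hypothetical, and the sketch assumes the URL points directly at a single file.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory, file name).

    Hypothetical helper for illustration; not part of autoprot.
    """
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    remote_dir, _, filename = parsed.path.rpartition("/")
    # A file at the server root has an empty directory part.
    return parsed.hostname, remote_dir or "/", filename


def sketch_download_from_ftp(url, save_dir, login_name="anonymous", login_pw=""):
    """Download a single file and return its local path as a str.

    Hypothetical re-implementation under stated assumptions; the real
    autoprot function may differ.
    """
    host, remote_dir, filename = split_ftp_url(url)
    target = Path(save_dir) / filename
    with FTP(host) as ftp:
        ftp.login(user=login_name, passwd=login_pw)
        ftp.cwd(remote_dir)
        # Stream the remote file in binary mode straight into the local file.
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {filename}", fh.write)
    return str(target)
```

For a placeholder URL, `split_ftp_url("ftp://ftp.example.org/pub/data/proteinGroups.txt")` returns `("ftp.example.org", "/pub/data", "proteinGroups.txt")`; the download function then changes into that directory and retrieves the file under the same name inside `save_dir`.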

autoprot.preprocessing.data_handling.fetch_from_pride(accession, term, ignore_caps=True)

Get download links for files belonging to a PRIDE identifier.

Parameters:

accession (str) – PRIDE identifier.
term (str) – Part of the file name to match against the project's files, for example ‘proteingroups’.
ignore_caps (bool, optional) – Whether to ignore capitalisation during matching of terms. The default is True.

Returns:

file_locs – Dict mapping filenames to FTP download links.

Return type:

dict

Examples

Generate a dict mapping file names to FTP download links. Note that only files whose names contain the string ‘proteingroups’ are retrieved.

>>> ftpDict = pp.fetch_from_pride("PXD031829", 'proteingroups')
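The `term`/`ignore_caps` matching described above can be sketched as a plain substring filter. The sketch leaves out the query against the PRIDE archive entirely and assumes the file names and FTP links have already been collected into a dict; `filter_file_links` and the example entries are hypothetical.

```python
def filter_file_links(file_links, term, ignore_caps=True):
    """Return the subset of file_links whose file name contains term.

    file_links maps file names to FTP links (assumed input format).
    """
    needle = term.lower() if ignore_caps else term
    matches = {}
    for name, link in file_links.items():
        # Compare lower-cased names only when capitalisation is ignored.
        haystack = name.lower() if ignore_caps else name
        if needle in haystack:
            matches[name] = link
    return matches


# Hypothetical listing of a project's files.
links = {
    "proteinGroups.txt": "ftp://ftp.example.org/pub/proteinGroups.txt",
    "evidence.txt": "ftp://ftp.example.org/pub/evidence.txt",
}
print(filter_file_links(links, "proteingroups"))
print(filter_file_links(links, "proteingroups", ignore_caps=False))
```

With the default `ignore_caps=True`, the lower-case term matches `proteinGroups.txt`; with `ignore_caps=False` it no longer does, so the second call returns an empty dict.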

autoprot.preprocessing.data_handling.read_csv(file, sep='\t', low_memory=False, **kwargs)

Wrapper around pd.read_csv with modified default arguments.

Parameters:

file (str) – Path to input file.
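sep (str, optional) – Column separator. The default is ‘\t’ (tab).
low_memory (bool, optional) – Whether pandas should process the file internally in chunks, which lowers memory use while parsing but can lead to mixed type inference within a column. The default is False.
**kwargs – Further keyword arguments passed on to pd.read_csv; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html.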

Returns:

The parsed dataframe.

Return type:

pd.DataFrame
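Because this function only changes default arguments, its effect can be reproduced directly with pandas. The sketch below is an assumption-based equivalent, not the autoprot source: `read_tsv` and the column names are made up for illustration, and an in-memory buffer stands in for a file path.

```python
import io

import pandas as pd


def read_tsv(file, sep="\t", low_memory=False, **kwargs):
    """Call pd.read_csv with tab separation and low_memory=False by default."""
    return pd.read_csv(file, sep=sep, low_memory=low_memory, **kwargs)


# In-memory stand-in for a tab-separated file on disk.
text = "Protein IDs\tIntensity\nP12345\t1000\nQ67890\t2500\n"
df = read_tsv(io.StringIO(text))
print(df.shape)

# Any other pd.read_csv argument is forwarded unchanged, e.g. usecols.
subset = read_tsv(io.StringIO(text), usecols=["Protein IDs"])
print(subset.shape)
```

Since the separator defaults to a tab, tab-separated tables parse without passing `sep` explicitly, and column names containing spaces stay intact.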

autoprot.preprocessing.data_handling.to_csv(df, file, sep='\t', index=False, **kwargs)

Write a dataframe to a CSV file using pd.DataFrame.to_csv with modified default arguments.

Parameters:

df (pd.DataFrame) – Dataframe to write.
file (str) – Path to output file.
sep (str, optional) – Column separator. The default is ‘\t’ (tab).
index (bool, optional) – Whether to add the dataframe index to the output. The default is False.
**kwargs – Further keyword arguments passed on to pd.DataFrame.to_csv; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html.

Return type:

None.
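The same pattern applies on the writing side. The sketch below writes a small dataframe with the documented defaults (tab separator, no index column) and reads it back to check the round trip; `write_tsv` is a hypothetical name, not the autoprot source, and an in-memory buffer again replaces the output path.

```python
import io

import pandas as pd


def write_tsv(df, file, sep="\t", index=False, **kwargs):
    """Call DataFrame.to_csv with tab separation and index=False by default."""
    df.to_csv(file, sep=sep, index=index, **kwargs)


df = pd.DataFrame({"Protein IDs": ["P12345", "Q67890"], "Intensity": [1000, 2500]})
buffer = io.StringIO()
write_tsv(df, buffer)
text = buffer.getvalue()

# Because index=False, the header row contains only the data columns.
print(text.splitlines()[0])

# Round trip: reading the tab-separated text back yields the same dataframe.
restored = pd.read_csv(io.StringIO(text), sep="\t")
print(restored.equals(df))
```

Leaving out the index by default means a file written this way can be read back without an extra unnamed index column appearing.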
