Data handling
Autoprot Preprocessing Functions.
@author: Wignand, Julian, Johannes
@documentation: Julian
- autoprot.preprocessing.data_handling.download_from_ftp(url, save_dir, login_name='anonymous', login_pw='')
Download a file from FTP.
- Parameters:
url (str) – FTP link to the file to download.
save_dir (str) – Directory in which the downloaded file is saved.
login_name (str) – Login name for the FTP server. The default is ‘anonymous’, which works for the PRIDE FTP server.
login_pw (str) – Password for access to the FTP server. The default is ‘’ (empty string).
- Returns:
Path to the downloaded file.
- Return type:
str
Examples
Download all files from a dictionary holding file names and ftp links and save the paths to the downloaded files in a list.
>>> downloadedFiles = []
>>> for file in ftpDict.keys():
...     downloadedFiles.append(pp.download_from_ftp(ftpDict[file], r'C:\Users\jbender\Documents\python_playground'))
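The following is a minimal sketch of how such a download could be put together with the standard library. It is not autoprot's implementation. The names `split_ftp_url` and `download_sketch` are hypothetical, and the URL in the usage note is a placeholder. The sketch assumes the link has the form `ftp://host/dir/file`.

```python
import os
import posixpath
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url, save_dir):
    """Split an FTP link into host, remote directory, file name and local path."""
    parsed = urlparse(url)
    # Remote paths always use forward slashes, so posixpath is used here
    remote_dir, file_name = posixpath.split(parsed.path)
    local_path = os.path.join(save_dir, file_name)
    return parsed.hostname, remote_dir, file_name, local_path


def download_sketch(url, save_dir, login_name="anonymous", login_pw=""):
    """Hypothetical stand-in for download_from_ftp; needs network access."""
    host, remote_dir, file_name, local_path = split_ftp_url(url, save_dir)
    with FTP(host) as ftp:
        ftp.login(user=login_name, passwd=login_pw)
        ftp.cwd(remote_dir)
        # Stream the remote file in binary mode into the local file
        with open(local_path, "wb") as handle:
            ftp.retrbinary(f"RETR {file_name}", handle.write)
    return local_path
```

Calling `split_ftp_url("ftp://ftp.example.org/some/dir/file.txt", "downloads")` shows which host, remote directory and local path a download would use, without opening a connection.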
- autoprot.preprocessing.data_handling.fetch_from_pride(accession, term, ignore_caps=True)
Get download links for files belonging to a PRIDE identifier.
- Parameters:
accession (str) – PRIDE identifier.
term (str) – Part of the filename belonging to the project. For example ‘proteingroups’
ignore_caps (bool, optional) – Whether to ignore capitalisation during matching of terms. The default is True.
- Returns:
file_locs – Dict mapping filenames to FTP download links.
- Return type:
dict
Examples
Generate a dict mapping file names to FTP download links. Note that only files containing the string proteingroups are retrieved.
>>> ftpDict = pp.fetch_from_pride("PXD031829", 'proteingroups')
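The effect of `term` and `ignore_caps` can be illustrated without querying PRIDE. This is a sketch under assumptions: `filter_files` is a hypothetical helper, the matching rule (plain substring test) is inferred from the parameter descriptions, and the file names and links are made-up placeholders.

```python
def filter_files(file_links, term, ignore_caps=True):
    """Keep the entries whose file name contains term."""
    if ignore_caps:
        term = term.lower()
    matches = {}
    for name, link in file_links.items():
        # Compare in lower case only when capitalisation should be ignored
        candidate = name.lower() if ignore_caps else name
        if term in candidate:
            matches[name] = link
    return matches


# Placeholder entries, not real PRIDE files
example_links = {
    "proteinGroups.txt": "ftp://ftp.example.org/project/proteinGroups.txt",
    "evidence.txt": "ftp://ftp.example.org/project/evidence.txt",
}

matched = filter_files(example_links, "proteingroups")
matched_exact = filter_files(example_links, "proteingroups", ignore_caps=False)
```

With `ignore_caps=True` the term `proteingroups` matches `proteinGroups.txt`. With `ignore_caps=False` it matches nothing, because the capital G differs.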
- autoprot.preprocessing.data_handling.read_csv(file, sep='\t', low_memory=False, **kwargs)
Wrapper around pd.read_csv with modified default arguments (tab separator, low_memory=False).
- Parameters:
file (str) – Path to input file.
sep (str, optional) – Column separator. The default is ‘\t’ (tab).
low_memory (bool, optional) – Whether to reduce memory consumption by inferring dtypes from chunks. The default is False.
**kwargs – see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html.
- Returns:
The parsed dataframe.
- Return type:
pd.DataFrame
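Going by the signature, a call with the default arguments should behave like the plain pandas call sketched below. This is an assumption, since the wrapper's source is not shown here. The table content is made up, and an in-memory buffer stands in for a file path.

```python
import io

import pandas as pd

# Made-up tab-separated table
raw = "Gene\tIntensity\nA\t1.5\nB\t2.0\n"

# Same defaults as in the documented signature: tab separator, low_memory=False
df = pd.read_csv(io.StringIO(raw), sep="\t", low_memory=False)
```

Further keyword arguments accepted by `pd.read_csv`, such as `usecols` or `nrows`, would be passed through `**kwargs`.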
- autoprot.preprocessing.data_handling.to_csv(df, file, sep='\t', index=False, **kwargs)
Write to CSV file.
- Parameters:
df (pd.DataFrame) – Dataframe to write.
file (str) – Path to output file.
sep (str, optional) – Column separator. The default is ‘\t’ (tab).
index (bool, optional) – Whether to add the dataframe index to the output. The default is False.
**kwargs – see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html.
- Return type:
None
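A round trip through a temporary file shows what the documented defaults produce. The sketch uses plain pandas calls with the same arguments as the signature above, on the assumption that the wrapper forwards them unchanged. The data and the file name are made up.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Gene": ["A", "B"], "Intensity": [1.5, 2.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table.tsv")
    # Tab-separated output without the index column
    df.to_csv(path, sep="\t", index=False)
    with open(path) as handle:
        header = handle.readline().rstrip("\n")
    restored = pd.read_csv(path, sep="\t", low_memory=False)
```

Because `index=False`, the header line contains only the two column names, and reading the file back yields the same columns and values.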