ml.process
Process ab-initio molecular dynamics simulations.
Module Contents
Functions
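check_file_exists(filename) | Check if a file with the given name exists. If it does not exist, the function ends the current Python session with an informative error message.
combine_sp_xyz() | Combines single point xyz's for all replicates.
combine_qm_replicates() | Combine the all_charges.xls files for replicates into a master charge file.
combine_replicates(all_charges, all_coors) | Collects charges or coordinates into an xls and xyz file across replicates.
combine_qm_charges(first_job, last_job, step) | Combines the charge_mull.xls files generated by TeraChem single points.
xyz2pdb_traj() | Converts an xyz trajectory file into a pdb trajectory file.
pairwise_distances_csv(pdb_traj_path, output_file, replicate_info, remove) | Calculate pairwise distances between residue centers of mass and save the result to a CSV file.
pairwise_charge_features(structure, charge_data) | Generate pairwise charge features for a given metalloenzyme structure.
get_residue_identifiers(template, by_atom) | Gets the residue identifiers such as Ala1 or Cys24.
summed_residue_charge(charge_data, template) | Sums the charges for all atoms by residue.
final_charge_dataset(charge_file, template, mutations) | Create final charge data set.
calculate_esp(component_atoms, scheme) | Calculate the electrostatic potential (ESP) of a molecular component.
collect_esp_components(first_job, last_job, step) | Loops over replicates and single points and collects metal-centered ESPs.
add_esp_charges(charges_df, esp_scheme, geometry_name) | Add the "upper" and "lower" columns from a CSV file to a DataFrame by removing and re-adding the "replicates" column.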
- ml.process.check_file_exists(filename)
Check if a file with the given name exists. If it does not exist, the function ends the current Python session with an informative error message.
- Parameters:
filename (str) – The name of the file to check for existence.
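The behaviour described above can be sketched as follows. This is a minimal illustration, not the module's source: the name check_file_exists_sketch and the wording of the error message are assumptions.

```python
import os
import sys


def check_file_exists_sketch(filename: str) -> None:
    """Exit the Python session with a message if `filename` is missing."""
    if not os.path.isfile(filename):
        # Passing a string to sys.exit prints it to stderr and exits with
        # a non-zero status, which ends the current session.
        sys.exit(f"Error: required file '{filename}' was not found.")
```

Calling the sketch with the path of an existing file returns None; calling it with a missing path raises SystemExit carrying the message.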
- ml.process.combine_sp_xyz()
Combines single point xyz’s for all replicates.
Each QM single point has its own geometry file. This function combines all of those xyz files into one file. Using these files is preferable to using the other geometry files because it ensures that the geometries are identical.
- Returns:
replicate_info – List of tuples with replicate number and frame count for the replicates.
- Return type:
List[tuple]
- ml.process.combine_qm_replicates() → None
Combine the all_charges.xls files for replicates into a master charge file.
The combined file contains a single header with atom numbers as columns. Each row represents a new charge instance. The first column indicates which replicate the charge came from.
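The layout of the combined file can be illustrated with pandas. This is only a sketch of the described structure, assuming small in-memory tables in place of the real all_charges.xls files; the variable names and the "replicate" column label are hypothetical.

```python
import pandas as pd

# Hypothetical per-replicate charge tables: columns are atom numbers,
# rows are charge instances.
replicate_charges = {
    1: pd.DataFrame({"1": [0.10, 0.12], "2": [-0.30, -0.28]}),
    2: pd.DataFrame({"1": [0.09, 0.11], "2": [-0.31, -0.29]}),
}

frames = []
for replicate, charges in replicate_charges.items():
    labeled = charges.copy()
    # Put the replicate label in the first column, as the text describes.
    labeled.insert(0, "replicate", replicate)
    frames.append(labeled)

# Concatenating keeps a single header and stacks the rows.
master = pd.concat(frames, ignore_index=True)
print(master)
```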
- ml.process.combine_replicates(all_charges: str = 'all_charges.xls', all_coors: str = 'all_coors.xyz') → None
Collects charges or coordinates into an xls and xyz file across replicates.
- Parameters:
all_charges (str) – The name of the file containing all charges in xls format.
all_coors (str) – The name of the file containing the coordinates in xyz format.
Notes
Run from the directory that contains the replicates. Run combine_restarts first if each replicate was run across multiple runs. Generalized to combine any number of replicates.
See also
qa.process.combine_restarts: Combines restarts and should be run first.
- ml.process.combine_qm_charges(first_job: int, last_job: int, step: int) → None
Combines the charge_mull.xls files generated by TeraChem single points.
After running periodic single points on the ab-initio MD data, we need to process the charge data so that it matches the SQM data. This code gets the charges from each single point and combines them. Results are stored in a tabular form.
- Parameters:
first_job (int) – The name of the first directory and first job e.g., 0
last_job (int) – The name of the last directory and last job e.g., 39901
step (int) – The step size between each single point.
- ml.process.xyz2pdb_traj() → None
Converts an xyz trajectory file into a pdb trajectory file.
Note
Make sure to manually check the PDB that is read in. Assumes no header lines. Assumes that the only TER flag is at the end.
- ml.process.pairwise_distances_csv(pdb_traj_path, output_file, replicate_info, remove=[])
Calculate pairwise distances between residue centers of mass and save the result to a CSV file.
- Parameters:
pdb_traj_path (str) – The file path of the PDB trajectory file.
output_file (str) – The name of the output CSV file.
remove (list of ints) – A list of integers giving the zero-indexed residues to drop.
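The core calculation can be sketched with NumPy. This is an illustration under stated assumptions, not the module's implementation: it works on arrays already in memory rather than a PDB trajectory, and the helper names center_of_mass and pairwise_com_distances are hypothetical.

```python
import numpy as np


def center_of_mass(coords: np.ndarray, masses: np.ndarray) -> np.ndarray:
    """Mass-weighted mean position of one residue's atoms."""
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def pairwise_com_distances(centers: np.ndarray, remove=()) -> np.ndarray:
    """Distances for each unique residue pair, after dropping `remove`."""
    dropped = set(remove)
    keep = [i for i in range(len(centers)) if i not in dropped]
    kept = centers[keep]
    # Broadcasting gives an (n, n, 3) array of displacement vectors.
    diff = kept[:, None, :] - kept[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep only the upper triangle (i < j) so each pair appears once.
    i, j = np.triu_indices(len(kept), k=1)
    return dist[i, j]
```

The sketch uses an empty tuple as the default for remove, which avoids sharing one mutable default list between calls.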
- ml.process.pairwise_charge_features(structure, charge_data)
Generate pairwise charge features for a given metalloenzyme structure.
This function reads charge data from a CSV file, computes pairwise charge features using the specified operation (addition or multiplication), and saves the results as a new CSV file. The input CSV file should contain columns with charges for each amino acid in the metalloenzyme, with each row representing a frame from an ab-initio molecular dynamics simulation. The last column should be named “replicate” and contain information about the replicate each row belongs to.
- Parameters:
structure (str) – The name of the metalloenzyme structure, used to find the input CSV file (named “{structure}_charges.csv”) and to name the output CSV file (named “{structure}_charges_pairwise_{operation}.csv”).
- Raises:
ValueError – If the user inputs an invalid operation for pairwise charge features. Valid operations are “add” and “multiply”.
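A minimal sketch of the pairwise step follows. It assumes the charge table is already loaded as a DataFrame and that feature columns are named by joining the two residue names with a hyphen; the function name, the operation argument, and the column naming are illustrative assumptions, not the module's API.

```python
from itertools import combinations

import pandas as pd


def pairwise_features_sketch(charges: pd.DataFrame, operation: str) -> pd.DataFrame:
    """Build one feature column per residue pair by adding or multiplying."""
    if operation not in ("add", "multiply"):
        raise ValueError("operation must be 'add' or 'multiply'")
    # Treat every column except "replicate" as a residue charge column.
    residues = [c for c in charges.columns if c != "replicate"]
    features = {}
    for a, b in combinations(residues, 2):
        if operation == "add":
            features[f"{a}-{b}"] = charges[a] + charges[b]
        else:
            features[f"{a}-{b}"] = charges[a] * charges[b]
    result = pd.DataFrame(features)
    # Keep the replicate labels as the last column.
    result["replicate"] = charges["replicate"]
    return result
```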
- ml.process.get_residue_identifiers(template, by_atom=True) → List[str]
Gets the residue identifiers such as Ala1 or Cys24.
Returns the residue identifier for every atom if by_atom = True, or only the unique amino acids if by_atom = False.
- Parameters:
template (str) – The name of the template pdb for the protein of interest.
by_atom (bool) – Whether to return an identifier for every atom (True) or only for the unique amino acids (False).
- Returns:
residues_indentifier – A list of the residue identifiers
- Return type:
List[str]
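The two return modes can be sketched as follows. This illustration assumes standard fixed-width PDB ATOM/HETATM records passed in as a list of lines, and assumes identifiers are built as a capitalized residue name plus residue number; the function name is hypothetical and the real function takes a template file name instead.

```python
from typing import List


def residue_identifiers_sketch(pdb_lines: List[str], by_atom: bool = True) -> List[str]:
    """Return identifiers such as Ala1, per atom or per unique residue."""
    identifiers = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-width PDB columns: residue name in 18-20, number in 23-26.
        name = line[17:20].strip().capitalize()
        number = line[22:26].strip()
        identifiers.append(f"{name}{number}")
    if by_atom:
        return identifiers
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(identifiers))
```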
- ml.process.summed_residue_charge(charge_data: pandas.DataFrame, template: str)
Sums the charges for all atoms by residue.
Reduces inaccuracies introduced by the limitations of Mulliken charges.
- Parameters:
charge_data (pd.DataFrame) – A DataFrame containing the charge data.
template (str) – The name of the template pdb for the protein of interest.
- Returns:
sum_by_residues – The charge data summed by residue and stored as a pd.DataFrame.
- Return type:
pd.DataFrame
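Summing per-atom charges into per-residue charges can be sketched like this. The example assumes a small in-memory table and a list that gives the residue identifier of each atom column in order; both are hypothetical stand-ins for the real charge data and template.

```python
import pandas as pd

# Hypothetical per-atom charges: one column per atom, one row per frame.
charge_data = pd.DataFrame(
    [[0.1, 0.2, -0.3], [0.0, 0.3, -0.2]], columns=["1", "2", "3"]
)
# Residue identifier for each atom column, in column order.
atom_residues = ["Ala1", "Ala1", "Cys2"]

# For each unique residue, select its atom columns with a boolean mask
# and sum across them, giving one charge per residue per frame.
sum_by_residues = pd.DataFrame(
    {
        res: charge_data.loc[:, [r == res for r in atom_residues]].sum(axis=1)
        for res in dict.fromkeys(atom_residues)
    }
)
print(sum_by_residues)
```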
- ml.process.final_charge_dataset(charge_file: str, template: str, mutations: List[int]) → pandas.DataFrame
Create final charge data set.
The output from the combined charges is an .xls file with atoms as columns. We will combine the atoms by residues and average the charges.
- Returns:
charges_df – The original charge data as a pandas dataframe.
- Return type:
pd.DataFrame
- ml.process.calculate_esp(component_atoms, scheme)
Calculate the electrostatic potential (ESP) of a molecular component.
Takes the output from a Multiwfn charge calculation and calculates the ESP. Run it from the folder that contains all replicates. It will generate a single csv file with all the charges for your residue, with one component/column specified in the input residue dictionary.
- Parameters:
component_atoms (List[int]) – A list of the atoms in a given component
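For orientation, the ESP at a point due to a set of point charges is the Coulomb sum of each charge divided by its distance from that point. The sketch below shows that sum in atomic units; it is an assumption about the kind of calculation involved, not the module's code, and the function name, inputs, and units are hypothetical.

```python
import numpy as np


def point_charge_esp(charges, coords, center) -> float:
    """Coulomb potential at `center` from point charges (atomic units)."""
    charges = np.asarray(charges, dtype=float)
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    # Distance from each charge to the point where the ESP is evaluated.
    distances = np.linalg.norm(coords - center, axis=1)
    return float(np.sum(charges / distances))
```

In a metal-centered setting, center would be the metal position and charges and coords would cover only the atoms listed in component_atoms.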
- ml.process.collect_esp_components(first_job: int, last_job: int, step: int) → None
Loops over replicates and single points and collects metal-centered ESPs.
The main purpose is to navigate the file structure and collect the data. The ESP itself is computed in the calculate_esp() function.
- Parameters:
first_job (int) – The name of the first directory and first job e.g., 0
last_job (int) – The name of the last directory and last job e.g., 39900
step (int) – The step size between each single point.
See also
qa.process.calculate_esp
- ml.process.add_esp_charges(charges_df, esp_scheme, geometry_name)
Add the “upper” and “lower” columns from a CSV file to a DataFrame by removing and re-adding the “replicates” column.
- Parameters:
charges_df (pandas.DataFrame) – The DataFrame to which the “upper” and “lower” columns will be added.
esp_scheme (str) – The name of the esp file.
geometry_name (str) – The name of the mimochrome or directory where we are working.
- Returns:
The modified DataFrame with the “upper” and “lower” columns added and the “replicates” column re-added at the end.
- Return type:
pandas.DataFrame
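The remove-and-re-add pattern can be sketched with pandas. The tables here are hypothetical in-memory stand-ins; in the described workflow the ESP columns would be read from a CSV file.

```python
import pandas as pd

charges_df = pd.DataFrame({"Ala1": [0.1, 0.2], "replicates": [1, 1]})
# Hypothetical ESP table with an extra column that is not carried over.
esp_df = pd.DataFrame({"upper": [0.5, 0.6], "lower": [-0.4, -0.3], "other": [0, 0]})

# Remove the replicate labels, append the two ESP columns, then restore
# the labels so "replicates" stays the last column.
replicates = charges_df.pop("replicates")
combined = pd.concat([charges_df, esp_df[["upper", "lower"]]], axis=1)
combined["replicates"] = replicates
print(combined)
```

Note that DataFrame.pop modifies charges_df in place, so a copy would be needed if the original table must be preserved.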