
QHTCP - Hartman Lab User’s Guide

Overview and Introduction to Directory Structure

There should be at least 4 subdirectories to organize Q-HTCP data and analysis. The parent directory is simply called ‘Q-HTCP’; the 4 required subdirectories and an optional fifth are described below (Fig. 1):

  1. ‘ExpJobs’ - This directory contains raw image data and image analysis results for the entire collection of Q-HTCP experiments. We recommend that each subdirectory within ‘ExpJobs’ represent a single Q-HTCP experiment and be named using the following convention (AB yyyy_mmdd_PerturbationsOfInterest): experimenter initials (‘AB ’), date (‘yyyy_mmdd_’), and brief description (‘drugs_medias’). Each subdirectory contains the Raw Image Folders for that experiment (a series of N folders with successive integer labels 1 to N, each folder containing the time series of images for a single cell array). It also contains a user-supplied subfolder, which must be named ‘MasterPlateFiles’ and must contain two Excel files, one named ‘DrugMedia_*experimentdescription*’ and the other named ‘MasterPlate_*experimentdescription*’. The prefix of each file name, including the underscore (‘DrugMedia_’ or ‘MasterPlate_’), is required. The italicized part is an optional description. Generally the ‘DrugMedia_’ file merits a description. If the standard MasterPlate_Template file is being used, there is no need to customize the name. On the other hand, if the template is modified, it is recommended to rename it and describe it accordingly; a useful convention is to use the same name for the MP files as given to the experiment (i.e., the parent ExpJobs subdirectory described above) after the underscores. The ‘MasterPlate_’ file contains the associated cell array information (culture IDs for all of the cell arrays in the experiment), while the ‘DrugMedia_’ file contains information about the media that the cell array is printed to. Together they encapsulate and define the experimental design. The QHTCPImageFolders (the Raw Image Folders described above) and the ‘MasterPlateFiles’ folder are the inputs for image analysis with EASY software.
As further described below, EASY will automatically generate a ‘Results’ directory (within the ExpJobs/‘ExperimentJob’ folder) with a name that consists of a system-generated timestamp and an optional short description provided by the user (Fig. 2). The ‘Results’ directory is created and entered using the “File >> New Experiment” dropdown in EASY. Multiple ‘Results’ directories may be created (and uniquely named) within an ‘ExperimentJob’ folder.
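The layout rules above (image folders numbered 1 to N, plus a ‘MasterPlateFiles’ folder holding one ‘MasterPlate_’ and one ‘DrugMedia_’ file) can be checked before starting EASY. The following is a minimal Python sketch and is not part of the EASY software; the function name `check_experiment_folder` is hypothetical, and it assumes exactly one file per required prefix.

```python
from pathlib import Path


def check_experiment_folder(exp_dir):
    """Return a list of problems found in one ExpJobs experiment folder."""
    exp_dir = Path(exp_dir)
    problems = []

    # Raw image folders are expected to be named with successive integers 1..N.
    numbered = sorted(int(p.name) for p in exp_dir.iterdir()
                      if p.is_dir() and p.name.isdecimal())
    if not numbered:
        problems.append("no numbered image folders found")
    elif numbered != list(range(1, len(numbered) + 1)):
        problems.append(f"image folders are not numbered 1..N: {numbered}")

    # 'MasterPlateFiles' is assumed to hold exactly one file per required prefix.
    mp_dir = exp_dir / "MasterPlateFiles"
    if not mp_dir.is_dir():
        problems.append("missing 'MasterPlateFiles' folder")
    else:
        names = [p.name for p in mp_dir.iterdir() if p.is_file()]
        for prefix in ("MasterPlate_", "DrugMedia_"):
            matches = [n for n in names if n.startswith(prefix)]
            if len(matches) != 1:
                problems.append(
                    f"expected exactly one '{prefix}' file, found {len(matches)}")
    return problems
```

An empty list means the folder matches the layout described here; any returned message points to what should be fixed before running the image analysis.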

  2. ‘EASY’ - This directory contains the GUI-enabled MATLAB software that performs image analysis and growth curve fitting. EASY analyzes Q-HTCP image data within an ‘ExperimentJob’ folder (described above; each cell array has its own folder containing its entire time series of images). EASY analysis produces image quantification data and growth curve fitting results for each cell array; these results are subsequently assembled into a single file and labeled, using information contained in the ‘MasterPlate_’ and ‘DrugMedia_’ files in the ‘MasterPlateFiles’ subdirectory. The final files (named ‘!!ResultsStd_.txt’ or ‘!!ResultsELr_.txt’) are produced in a subdirectory that EASY creates within the ‘ExperimentJob’ folder, named ‘/ResultsTimeStampDesc/PrintResults’ (Fig. 2). The /EASY directory is simply where the latest EASY version resides (additional versions in development or legacy versions may also be stored there). Note: The raw data inputs and result outputs for EASY are kept in the ‘ExpJobs’ directory. EASY also outputs a ‘.mat’ file that is stored in the ‘matResults’ folder and is named with the timestamp and user-provided name that are appended to the ‘Results’ folder name when ‘New Experiment’ is executed from the ‘File’ dropdown menu in the EASY console.

  3. ‘EZview’ - This directory contains the GUI-enabled MATLAB software to conveniently and efficiently mine the raw cell array image data for a Q-HTCP experiment. It takes the results ‘.mat’ file (created by EASY software) as an input and permits the user to navigate through the raw image data and growth curve results for the experiment. The /EZview directory provides a place for storing the latest EZview version (as well as other EZview versions). EZview provides a GUI for examining the EASY results as provided in the …/matResults/… .mat file.

  4. ‘StudiesQHTCP’ - A software composite (MATLAB, Java, R, Python, Perl, Shell) that takes growth curve results (created by EASY software) as input and successively generates interaction Z-score results, which are used for graphing gene interactions, clustering, Gene Ontology analysis, and other ways of interpreting and visualizing the experimental quality and outcomes. {The /StudiesQHTCP folder contains the ordered command line scripts that call sets of other scripts to perform data selection and adaptation from the extracted text results spreadsheet found in the /ExpJobs/experiment name/Results…/PrintResults/ folder, in particular the ‘user customize interactionCode4experiment.R’ file. It also contains a multitude of R-generated plots based on the selected data and any adaptation. All clustering and Gene Ontology analyses are derived from the ‘ZScores_Interaction.csv’ file found in the /ZScores subdirectory.}

  5. ‘Master Plates’ - This optional folder is a convenient place to store copies of the ‘MasterPlate_’ and ‘DrugMedia_’ file templates, along with previously used files that may have been modified and could be reused or further modified to enable future analyses. These two file types are required in the ‘MasterPlateFiles’ folder, which catalogs experimental information specific to individual Jobs in the ExpJobs folder, as described further below.

ExpJobs

  1. The ExpJobs folder contains subdirectories, one named for each experiment. Inside each experiment directory, the folders containing the time series for each cell array that is part of the experiment are named numerically: ‘1’, ‘2’, …. There should be one folder for each cell array (these are generated at the image collection stage). The images are provided for this Q-HTCP study, which consists of two experiments. In addition to the image folders, each experiment will contain a subdirectory named ‘MasterPlateFiles’, which must contain two files: one is the ‘MasterPlate_’ file and the other is the ‘DrugMedia_.csv’ file. Custom names can be appended after the underscore if desired.

EASY

  1. Architecture of the /EASY Subdirectory:

/EASY

/figs

/PTmats

datatipp.m

DgenNoGrowthResults200809.m

DMPexcel2mat\_2023winLinix.m

EASYconsole.fig

EASYconsole.m

NCdisplayGui.m

NCfitImCFparforFailGbl2.m

NCscurImCF\_3parfor.m

NCsingleDisplay.m

NIcircle.m

NImParamRadiusGui.m

NIscanIntensBGpar4GblFnc.m

p4loop8c.m

par4Gbl\_Main8c.m

par4GblFnc8c.m
  1. To analyze a new Q-HTCP experiment:
  2. Open the EASY Software.

    1. Open ‘EstartConsole.m’ with MATLAB
    2. Click the Run icon (play button)
    3. When prompted, click “Change Folder” (do not select “Add to Path”).
    4. In the pop-up display, select from the ‘File’ dropdown: ‘New Experiment’. From the pop-up, choose where to save the new file. Navigate to the relevant job in the ExpJobs folder, name the file accordingly, and click ‘save’. The newly created .mat file in the newly created Results folder will automatically be loaded. The code automatically adds ‘Results’ and the current date to the front of the file name (e.g., ‘A1.mat’ will become ‘Results2023-07-19A1’).
      1. If the experiment has already been created, it can be reloaded by clicking ‘Load Experiment’ instead of ‘New Experiment’ and selecting the relevant results.
    5. Next, in the pop-up display, click on the ‘Run’ dropdown menu and select ‘Image CurveFit ComboAnalysis’.
      1. In the updated pop-up, choose/highlight all desired image folders for analysis (this is generally all of the folders, since only the ones that need analysis should be there) and then click on ‘continue’. As the program is running, updates will periodically appear in the Command Window; there will be an initial pause at “Before call to NIscanIntens…..”.
      2. When the curve fitting is finished, the EASY console will pop back up. Check for the completed analysis results in the newly created ‘PrintResults’ folder, inside the ‘Results’ folder. Other folders (‘CFfigs’, ‘figs’, ‘Fotos’) are created for later optional use and will be empty. **NOTE:** The image analysis is completed independently of labeling the data (strains, media type, etc.). Labeling happens next with the ‘GenReports’ function.
    6. Next, click on the ‘GenReports’ dropdown and select ‘DrugMediaMP Generate .mat’
      1. **NOTE:** The ‘MasterPlate’ and ‘DrugMedia’ files have very specific formats and should be completed from a template. Additionally, the MasterPlate file must be exact (it must contain all and only the strains that were actually tested). For example, if only part of a library is tested, the complete library file must be modified to remove irrelevant strains.
      2. You will be prompted to first select the ‘MasterPlate’ file. You will need to navigate away from the working directory to get to it. It is fine for the ‘MasterPlate_’ file to be .xlsx (or .xls), and if you don’t see it in the popup window, then change the file type from ‘.xls’ to “all files” and then select it. Once it is selected, a report of the number of master plates in the file will pop up; when the report appears, assuming it is correct, click on ‘OK’.
      3. You will then be prompted to select the ‘DrugMedia’ file from the relevant job folder. You will automatically return to the correct prior directory location. Choose it and click ‘OK’. You may see a warning about column headers being modified, but that’s ok.
        1. This will create an additional file in the ‘MasterPlateFiles’ folder named ‘MPDMmat.mat’.
    7. Finally, click on the ‘GenReports’ dropdown and select ‘Results_Generate’.

      1. You will first see ‘!!ResultsElr_.txt’ generated in the ‘PrintResults’ folder. Refreshing will reveal an increasing file size until you see the ‘!!ResultsStd_.txt’ being generated. When finished, the ‘!!ResultsStd_.txt’ file will be about the same size as the ‘!!ResultsElr_.txt’ file, and it should be used in the following StudiesQHTCP analysis.

      2. ‘NoGrowth_.txt’ and ‘GrowthOnly_.txt’ files will also be generated in the ‘PrintResults’ folder.
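Once ‘Results_Generate’ has finished, it can be useful to confirm that every ‘Results…’ folder of a job contains the expected output files. The sketch below is a hedged Python illustration and not part of EASY; the function name `find_print_results` is hypothetical, files are matched only by the prefixes named in this guide, and the match ignores case because the guide spells the ELR prefix in more than one way.

```python
from pathlib import Path

# Output file prefixes named in this guide (matched case-insensitively).
EXPECTED_PREFIXES = ("!!ResultsStd_", "!!ResultsElr_", "NoGrowth_", "GrowthOnly_")


def find_print_results(exp_dir):
    """Map each 'Results*' folder name to the expected outputs it is missing."""
    report = {}
    for results_dir in sorted(Path(exp_dir).glob("Results*")):
        print_dir = results_dir / "PrintResults"
        if not print_dir.is_dir():
            continue  # not an EASY results folder
        names = [p.name.lower() for p in print_dir.iterdir() if p.is_file()]
        report[results_dir.name] = [
            prefix for prefix in EXPECTED_PREFIXES
            if not any(n.startswith(prefix.lower()) for n in names)
        ]
    return report
```

A folder that maps to an empty list has all four outputs; otherwise the list names the prefixes that were not found.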

System for Multi-QHTCP-Experiment Gene Interaction Profiling Analysis

  1. Introductory Remarks

“StudiesQHTCP” is a program that incorporates several command line scripts and provides a directory structure for input and output files.

The analysis system involves Sean Santos’ R code for calculating genetic interaction values and z-scores; clustering of gene interaction z-scores using Recursive Expectation-Maximization clustering (REMc), which relies on WEKA and a Java implementation; and GO Term Finder (GTF) analyses of the REMc clusters, which use Python. Jingyu Guo worked on the REMc and GTF code, and Remy Cron incorporated it into a Java ‘.jar’ file to make it possible for multiple users to run it from a shared folder. The executable ‘.jar’ files and all associated Python, Perl, and R scripts are executed via a single master shell script, REMcMaster3.sh. [See section IV.7]

  1. System Requirements (software/packages necessary to run StudiesQHTCP)
    1. Software - These can all be downloaded from the respective online platforms for each operating system
      1. R
      2. Perl
      3. Java
      4. MATLAB
    2. Packages - These packages must be installed in a specific order to ensure proper installation.

For MacOS: It is recommended that MacOS users install Homebrew for easy installation of the following packages. The commands to install Homebrew, followed by the commands to install the necessary packages, are listed below.

    export HOMEBREW_BREW_GIT_REMOTE=https://github.com/Homebrew/brew
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    sudo cpan File::Map
    sudo cpan ExtUtils::PkgConfig
    sudo cpan GD
    brew install graphviz
    brew install gd
    sudo cpan GO::TermFinder
    brew install pdftk-java
    brew install pandoc

**For Linux:** The package manager commands used below are for Debian-based distributions. If using Fedora or CentOS, you may need to use ‘dnf’ or ‘yum’ in place of ‘apt-get’.

    sudo cpan File::Map
    sudo cpan ExtUtils::PkgConfig
    sudo cpan GD
    sudo apt-get install graphviz
    sudo apt-get install libgd-dev
    sudo cpan GO::TermFinder
    sudo apt-get install pdftk-java
    sudo apt-get install pandoc

For R:

    install.packages("BiocManager")
    BiocManager::install("org.Sc.sgd.db")
    install.packages("ontologyIndex", dependencies = TRUE)
    install.packages("ggrepel", dependencies = TRUE)
    install.packages("tidyverse", dependencies = TRUE)
    install.packages("sos", dependencies = TRUE)
    install.packages("openxlsx", dependencies = TRUE)
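After installation, a quick check that the command-line tools are reachable can save time before a long analysis run. The following Python sketch is only an illustration; the executable names are assumptions (for example, ‘dot’ is the program that ships with Graphviz, and ‘Rscript’ ships with R), and MATLAB is left out because it is often launched from a GUI rather than from the PATH.

```python
import shutil

# Executable names are assumptions about how each package appears on the PATH.
REQUIRED_TOOLS = ("Rscript", "perl", "java", "dot", "pdftk", "pandoc")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when everything is found. It does not verify the Perl or R packages themselves, only that the programs can be located.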

  1. Proper Architecture of Beginning Subdirectories

/StudiesQHTCP

StudiesDataArchive.txt

/ExpStudy (user named)

/A_QHTCP Study Design and Notes

/Code

22_0602_Remy_DAmPsList.txt

All_SGD_GOTerms_for_QHTCPtk.csv

All_SGD_GOTerms.csv

/devStuff

InteractTemplateB4fixes.R

InteractTemplateB4Prompt4SDinput.R

gene_association.sgd

gene_ontology_edit.obo

go_terms.tab

GTAtemplate.R

ORFs_w_DAmP_list.txt

PairwiseLK.R

Parameters.csv

/ScriptTemplates {preserves starting templates of code modified by user}

/BU_Legacy

InteractTemplate.R

Concatenate_GTF_results.py

Concatenate_GTF_resultsB4REMcMaster2.py

GTAtemplate.R

InteractionTemplate230119.R

JoinInteractExps.R

JoinInteractExps3dev.R

PairwiseK_lbl.r

PairwiseL_lbl.R

PairwiseLK.R

Remy_yor_dF_correlation_study.R

TSHeatmaps5dev2.R

SGD_features.tab

SGD_features.tab.txt

/Sscripts

18_0205_heatmaps_zscores_2SD_color_ARem_Z_lm.R

22_0603_Remy_Exlcude_DAmPs.R

cmd_Doxo_SumZScore_Z_lm_Interaction_d...alidationedit.R

cmd_ScoreAllGOTerms_From_Z_lm_V2.R

Compare_GTF_Averages_BetweenScreens_lm_Kvals_v2.R

Compare_GTF_Averages_BetweenScreens_lm_Lvals_v2.R

Compare_GTF_Averages_BetweenScreens_lm_v2.R

GO_list_All_ChildTerms_lmZscore_max100child_Heatmaps_3terms_V2.R

GO_list_All_ChildTerms_lmZscore_max100child_Heatmaps_4terms_aging.R

GO_list_All_ChildTerms_lmZscore_max100child_Heatmaps_4terms_V2.R

GO_list_All_ChildTerms_lmZscore_max100child_Heatmaps_5terms_V2.R

GO_list_All_ChildTerms_lmZscore_max100child_Heatmaps_V2.R

ScoreAllGOTerms_From_Z_lm_V2.R

StudyInfo.csv

TSHeatmaps5dev2.R

/Documentation

**\*\*\*ADD IN SEAN’S MANUAL\*\*\***

Jingyu_REMc_Instruction for clustering and...2013Mar.docx

/LegacyDocs

QHTCP Analysis SystemRev2.docx

QHTCP Analysis SystemRev2a.docx

QHTCP Analysis SystemRev2b.docx

QHTCP Analysis SystemRev2b0.docx

QHTCP Analysis SystemRev2c.docx

/Exp1

/backups

InteractTemplateB4Prompt4SDinput.R

ExpFrontend.m

Z_InteractionTemplate.R

Notes Exp1

/ZScores

/Exp2

/backups

InteractTemplateB4Prompt4SDinput.R

ExpFrontend.m

Z_InteractionTemplate.R

Notes Exp2

/ZScores

/Exp3

/backups

InteractTemplateB4Prompt4SDinput.R

ExpFrontend.m

Z_InteractionTemplate.R

Notes Exp3

/ZScores

/Exp4

/backups

InteractTemplateB4Prompt4SDinput.R

ExpFrontend.m

Z_InteractionTemplate.R

Notes Exp4

/ZScores

/GTAresults

/Exp1

/Exp2

/Exp3

/Exp4

/REMc

AddShiftVals2.R

DconJG2.py

GeneByGOAttributeMatrix_nofiltering-2009Dec07.tab

/GTF

analyze_v2.pl

concatenate_GTF_Results.py

gene_association.sgd

gene_ontology_edi.obd

GOontologyPar.sh

SeanEmailPython2

SGD_features.tab

SGD_features.tab.txt

Terms2tsv_v4.pl

/Component

analyze_v2.pl

concatenate_GTF_Results.py

gene_association.sgd

gene_ontology_edi.obd

ORF_List_DAmPs_Only.txt

ORF_List_Without_DAmPs.txt

ORFs_w_DAmP_list.txt

SGD_features.tab

SGD_features.tab.txt

terms2tsv_v4.pl

/Function

analyze_v2.pl

concatenate_GTF_Results.py

gene_association.sgd

gene_ontology_edi.obd

ORF_List_DAmPs_Only.txt

ORF_List_Without_DAmPs.txt

ORFs_w_DAmP_list.txt

SGD_features.tab

SGD_features.tab.txt

terms2tsv_v4.pl

/Process

analyze_v2.pl

concatenate_GTF_Results.py

gene_association.sgd

gene_ontology_edi.obd

ORF_List_DAmPs_Only.txt

ORF_List_Without_DAmPs.txt

ORFs_w_DAmP_list.txt

SGD_features.tab

SGD_features.tab.txt

terms2tsv_v4.pl

jingyuJava_1_7_extractLib.jar

JoinInteractExps3dev.R

mComponent.sh

mFunction.sh

mProcess.sh

Notes/ REMc, GTF_Ontologies and Associated_Heatmaps

ORF_List_DAmPs_Only.txt

ORF_List_Without_DAmPs.txt

ORFs_w_DAmP_list.txt

/REMcHeatmaps

/REMcHeatmapsWithHomology

17_0503_DAmPs_Only.txt

/Homology

REMcHeatmaps_Z_lm_wDAmPs_andHomology_221212.R

Yeast_Human_Homology_Mapping_biomaRt_18_0902.csv

REMcJar2.sh

REMcJar2old.sh

REMcMaster2.sh

REMcMaster3.sh

/TermSpecificHeatmaps

{Note: The TSHeatmaps… .R script contains a **Table** section near the start, where there is a default set of tables. If the user wishes to use different tables (i.e., a modified version of the All_SGD_GOTerms_for_... .csv included in the /Code section), that section should be modified and the TSH… .R script renamed to reflect the user modification. Users should always write notes related to code modifications and study goals and strategies.}

/test-DevStuff

Int4DoxGem.R

InteractionTemplate230119cutdown4compareSSV6.R

REMcMaster2Bad.sh

As stated earlier, the user can add folders to back up temporary results, study-related notes, or other related work. However, it is advised to set up and use separate STUDIES when evaluating differing data sets whether that is from experiment results files or from differing data selections in the first interaction … .R script stage. This reduces confusion at the time of the study and especially for those reviewing study analysis in the future.

  1. How-To Procedure: Execute a Multi-experiment Study

To begin, consider the goals of the study and design a strategy of experiments to include in the study. Consider the quality of the experiment runs using EZview to see if there are systematic problems that are readily detectable. In some cases, one may wish to design a ‘pilot’ study for discovery purposes. There is no problem doing that: just take a template study, copy it, and rename it as XYZpilotStudy, etc. However, careful examination of the experimental results using EZview will likely save time in the long run. One may be able to run the interaction Z scores relatively quickly; the main challenge there is the user creation of customized interaction… .R code, which I have tried to simplify by locating the user edits near the top.

Preliminary Task

  1. Copy the Template directory structure and rename it for your study.
    1. This directory contains the structure and code for analyzing a multi-experiment study. It contains the code templates and other reference files called by the scripts.

The user specifies the arrangement of the data (in ‘StudyInfo.csv’) by assigning it to /Exp1, /Exp2, /Exp3, or /Exp4. This is particularly relevant for clustering, as results will be ordered left to right according to experiment number.

A utility (ExpFrontend.m) was made for recording into a spreadsheet (‘StudiesDataArchive.txt’) the date and files used (i.e., directory paths to the !!Results files used as input for Z-interaction script) for each multi-experiment study.

Experiment Specific Interaction Zscores generation

2. In your study directory, open the /Code folder and edit the ‘StudyInfo.csv’ file.

  1. Enter the desired experiment names. Order the names in the way you want them to appear in the REMc heatmaps, and make sure to run the front end programs (below) in the correct order (e.g., run the front end in the ‘Exp1’ folder to call the !!Results file for the experiment you named as Exp1 in the StudyInfo.csv file).
    1. The GTA and pairwise, TSHeatmaps, JoinInteractions and GTF Heatmap scripts use this table to label results and heatmaps in a meaningful way for the user and others. The BackgroundSD and ZscoreJoinSD fields will be filled automatically according to user specifications, at a later step in the QHTCP study process.

3. Open MATLAB and, in the application, navigate to each specific /Exp folder, then call and execute ExpFrontend.m by clicking the play icon. Use the “Open file” function from within MATLAB; do not double-click on the file from the directory. When prompted, navigate to the ExpJobs folder and the PrintResults folder within the correct job folder. Repeat this for every Exp# folder used in the study. The Exp# folder must correspond to the StudyInfo.csv entries created above.

Note: Before doing this, it’s a good idea to compare the Ref and Non-ref CPP average and median values. If they are not approximately equal, it may be helpful to standardize the Ref values to the measures of central tendency of the Non-refs, because the Ref CPPs are used for the z-scores, which should be centered around zero.

  1. This script will copy the chosen !!ResultsStd file (located in /PrintResults in the relevant job folder in /ExpJobs) to the Exp# directory, as can be seen in the “Current Folder” column in MATLAB, and it updates the ‘StudiesDataArchive.txt’ file that resides in the /StudiesQHTCP folder. ‘StudiesDataArchive.txt’ is a log of file paths used for different studies, including timestamps. Rename the !!Results file before running the front end; we normally use the ‘STD’ file (not the ‘ELR’ file).

Do this to document the names, dates and paths of all the studies and experiment data used in each study. Note, one should only have a single ‘!!Results…’ file for each /Exp_ to prevent ambiguity and confusion. If you decide to use a new or different ‘!!Results…’ sheet from what was used in a previous “QHTCP Study”, remove the one not being used. NOTE: if you copy a ‘!!Results…’ file in by hand, it will not be recorded in the ‘StudiesDataArchive.txt’ file and so will not be documented for future reference. If you use the ExpFrontend.m utility it will append the new source for the raw !!Results… to the ‘StudiesDataArchive.txt’ file.
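The rule that each /Exp_ folder should hold a single ‘!!Results…’ file can be verified with a short script before running the interaction analysis. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, Exp folders are assumed to be named ‘Exp’ followed by a number, and empty Exp folders are treated as unused rather than as errors.

```python
from pathlib import Path


def results_files_by_exp(study_dir):
    """Map each Exp# folder name to the '!!Results' files it contains."""
    found = {}
    for exp_dir in sorted(Path(study_dir).glob("Exp[0-9]*")):
        if exp_dir.is_dir():
            found[exp_dir.name] = sorted(
                p.name for p in exp_dir.iterdir()
                if p.is_file() and p.name.startswith("!!Results"))
    return found


def ambiguous_exps(study_dir):
    """Return the Exp# folders that hold more than one '!!Results' file."""
    return [name for name, files in results_files_by_exp(study_dir).items()
            if len(files) > 1]
```

Any folder returned by `ambiguous_exps` should have the unused ‘!!Results…’ file removed, as described above.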

As stated above, it is advantageous to think about the comparisons one wishes to make, so as to order the experiments in a rational way as it relates to the presentation of plots. That is, decide which results sheet and which user-modified ‘interaction … .R’ script is used in each of /Exp1, /Exp2, /Exp3, and /Exp4, as explained in the following section.

4. In each /Exp# folder, rename the Z_InteractionTemplate.R script according to the experiment focus.

  1. Example: Interaction, Experimenter Initials, Experiment Focus --> ‘int_RM_2PE.R’

5. Open each renamed interaction script and edit it, beginning at ‘++BEGIN USER DATA SELECTION++’.

  1. This is designed so that the data of interest for each experiment is appropriately selected from the !!Results…txt file.
  2. The user can edit, step through, and test the R script without running through the whole routine by observing the resultant data table created in RStudio.
    1. The Z_InteractionTemplate.R script has a collection of code lines that have been used for prior analyses (generally to select data from various !!Results…txt files), which may be commented out (if not relevant), reused as needed, and/or modified for a new study. These include lines associated with the removal of ‘DAmPs’, specific concentrations, and items described in the ‘Specifics’ and ‘Media’ fields, i.e., information specific to a particular experiment design. There are also code lines that replace gene names that Excel converts to date format (‘OCT1/YKL134C’ and ‘MAY24/YPR153W’) with only the ORF name, and lines that remove data rows with ‘Blank’ listed; these lines of code are convenient to reuse. Hopefully, these code lines can be used, commented out, or adapted to aid the user in modifying this section to the specific data requirements of the study. As new user data filter code is developed (and vetted) for each ‘Study’, those lines can be added to the InteractionTemplate230119.R code in the /StudyTemplate folders to aid in future studies.
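The two reusable filters mentioned above (using the ORF name for genes that Excel turns into dates, and removing ‘Blank’ rows) can be pictured with a small example. The actual filters live in the R template; the Python sketch below only illustrates the logic, and the column names ‘ORF’ and ‘Gene’ as well as the function name `clean_rows` are assumptions.

```python
# ORFs whose gene names a spreadsheet may turn into dates (from the text above).
DATE_LIKE_ORFS = {"YKL134C", "YPR153W"}


def clean_rows(rows, orf_col="ORF", gene_col="Gene"):
    """Drop 'Blank' rows and use the ORF name where the gene name is date-prone."""
    cleaned = []
    for row in rows:
        if "Blank" in (row.get(orf_col), row.get(gene_col)):
            continue  # remove data rows with 'Blank' listed
        row = dict(row)  # copy so the caller's data is not modified
        if row.get(orf_col) in DATE_LIKE_ORFS:
            row[gene_col] = row[orf_col]  # keep only the ORF name
        cleaned.append(row)
    return cleaned
```

Each row is a dictionary, such as those produced by `csv.DictReader` with a tab delimiter; rows for other ORFs pass through unchanged.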

6. Open a terminal, navigate to each /Exp# folder, and execute the (customized) ‘Z_InteractionTemplate_…’ script by using the command line below:

Rscript RenamedInteractionTemplate.R \!\!Results… .txt

**Need to change wording to choose SD of Delta_Background to exclude data from analysis.**
[1] "Be sure to enter Background noise filter standard deviation i.e., 3 or 5 per Sean"
Enter a Standard Deviation value to noise filter >>

[1] Enter Standard deviation value for removing data for cultures due to high background (e.g., contaminated cultures). Generally set this very high (e.g., ‘20’) on the first run in order NOT to remove data. Review QC data and inspect raw image data to decide if it is desirable to remove data, and then rerun analysis.
Enter a Background SD threshold for EXCLUDING culture data from further analysis:

  1. The script will ask the user to input a ‘Background Standard Deviation Value’. This Background value removes data where there is high pixel intensity in the background regions of a spot culture (i.e., suspected contamination). 5 is a minimum recommended value, because lower values result in more data being removed, which is often undesirable if contamination occurs late, after the carrying capacity of the yeast culture is reached. Choosing the value is most often “trial and error”: a ‘Frequency_Delta_Background.pdf’ report in the /Exp_/ZScores/QC/ folder can be used to evaluate whether the chosen value was suitable (and if not, the analysis can simply be rerun with a better choice). In general, err on the high side, with a BSD of 10 or 12. One can also use EZview to examine the raw images and individual cultures potentially included/excluded as a consequence of the selected value. Background values are reported in the results sheet and so could also be analyzed there.

    (For new terminal users, directory navigation tips are described below)

  2. To navigate to the directory, one can use the directory GUI (in X2Go, use the GUI to navigate to the desired operating directory and then, from the ‘File’ menu, choose ‘Open in Terminal’).

  3. Alternatively, navigate there through the terminal window: ‘pwd’ prints the current working directory, and ‘ls’ lists the subfolders in the current directory. ‘cd’ followed by the name of a subdirectory will move down into it. ‘cd ..’ changes to the parent directory.

  4. The tab key can be used to autofill unique characters after typing the initial letters of a folder or file you wish to call.
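Returning to the Background SD value requested by the interaction script in step 6: the effect of choosing a low or high multiplier can be pictured with a small example. This guide does not show the exact rule the R script applies, so the Python sketch below assumes a cutoff of mean plus multiplier times the standard deviation of the Delta_Background values; the function name `flag_high_background` is hypothetical.

```python
from statistics import mean, stdev


def flag_high_background(delta_background, sd_multiplier):
    """Return indices of cultures whose value exceeds mean + multiplier * SD.

    Assumption: this cutoff rule is illustrative and may differ from the R code.
    """
    cutoff = mean(delta_background) + sd_multiplier * stdev(delta_background)
    return [i for i, value in enumerate(delta_background) if value > cutoff]
```

With nineteen cultures at a background of 10 and one at 100, a multiplier of 3 flags the high culture, while a multiplier of 20 flags nothing. This matches the advice to start with a very high value so that no data are removed on the first run.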

The template structure above assists the user with organization and management of Q-HTCP files and provides a uniform directory structure to streamline reference across different users and experiments.

Since we are systematically comparing perturbations, most Q-HTCP studies will consist of either 2 or 4 experiment subfolders.

The Zscores files are used for subsequent analyses, including REMc, GTA and Term Specific Heatmaps. These further analyses are described below and can be completed in any order and/or concurrently from separate terminals.

**Annotate files produced and comment out code that produces files that are obsolete or clutter.**

REMc

7. Navigate to the /REMc directory and run the following shell script:

[jwrodger@hartmanlab REMc]$ sh REMcMaster3.sh

  1. This single shell script executes a series of shell script commands that were previously executed individually. To execute it, open a terminal in the …/REMc folder and type the command above.
  2. The command line will ask the user to enter a standard deviation multiplier (factor) that will filter the ZScore data accordingly for use with REMc. That value will also be stored in the StudyInfo.csv file, where the user entered descriptive labels at the start of this entire QHTCP study. Those labels are used throughout the process on all the graphics that are produced.

  3. The REMcMaster3.sh script will execute the entire process in roughly thirty minutes to possibly an hour. REMcMaster3.sh script tasks are as follows:

    1. Join the interaction Zscores into a table appropriate for the REMc jar file executable.
      1. Execute REMcJar2.sh, which calls the Java executable with the appropriate arguments.
      2. Add shift columns back to the REMcRdy_lm_only.csv-finalTable.csv file to produce the "REMcWithShift.csv" file, which is used by the REMc Heatmaps task.
      3. Execute REMcHeatmaps_zscores.R, contingent upon the "REMcWithShift.csv" file being created.
      4. Execute REMcHeatmaps_Z_lm_wDAmPs_andHomology_221212.R, which is located in …/REMc/REMcHeatmapsWithHomology/. This is a copy of the R script used by Denver, renamed and stored with all the essential files it needs.
      5. Execute the GTF process, contingent upon "REMcRdy_lm_only.csv-finalTable.csv" being produced by "REMcJar.sh" (Step1). This process involves several tasks as follows:
        1. Execute DconJG2.py to produce the /Process/REMcRdy_lm_only directory and files. This is first created in the ../Process folder and …
        2. Then copied to the ../Function and ../Component folders.
        3. Next, contingent upon the ../REMcRdy_lm_only folder being made, the REMcMaster3.sh script executes the mProcess.sh, mFunction.sh and mComponent.sh tasks within the associated ontology directories. These ontology scripts call the Perl utilities and arguments from the respective ontology folders. These ontology scripts also execute the Concatenate_GTF_results.py script to produce the /Process/ProcessResults.txt, /Function/FunctionResults.txt and /Component/ComponentResults.txt output files.
    2. The heatmap PDF files are concatenated with a Linux utility:
      pdftk REMcHeatmaps/*.pdf cat output <output path-filename.pdf>
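Several of the tasks above run only if an earlier task produced its output file. The Python sketch below illustrates that kind of check and is not taken from REMcMaster3.sh; the function name `runnable_tasks` is hypothetical, only the two contingencies stated explicitly above are encoded, and the assumption that the files sit directly in the working directory is mine.

```python
from pathlib import Path

# Task -> file that must already exist, as stated in the task list above.
PREREQUISITES = {
    "REMcHeatmaps_zscores.R": "REMcWithShift.csv",
    "DconJG2.py": "REMcRdy_lm_only.csv-finalTable.csv",
}


def runnable_tasks(work_dir, prerequisites=PREREQUISITES):
    """Return the tasks whose prerequisite file exists in work_dir."""
    work_dir = Path(work_dir)
    return [task for task, needed in prerequisites.items()
            if (work_dir / needed).is_file()]
```

If a run stops early, a check like this shows which downstream tasks could not have started, which narrows down where to look for the failure.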

**Annotate files produced and comment out code that produces files that are obsolete or clutter.**

GTA related work

8. Navigate to the Code directory and open a terminal to run the following Rscript to produce the GTA results for each Exp#:

[jwrodger@hartmanlab Code]$ Rscript GTAtemplate.R

  1. It will create the /GTAresults/Exp# directories for the number of experiments for which you have produced ZScores_Interaction.csv files and populate them with output files. The script ‘knows’ where to find the interaction files and where to put the results.

9. Still in the /Code directory, run the following Rscript, entering the two Exp# folders to compare as input arguments:

[jwrodger@hartmanlab Code]$ Rscript PairwiseLK.R Exp1 Exp2

  1. This script will perform both L and K comparisons for the specified experiment folders. Note that this could just as easily have been Exp3 and Exp4, or even Exp1 and Exp3. The script ‘knows’ how to label the results, as it has the StudyInfo.csv table to assign your labeling convention to the results. The code uses the naming convention PairwiseCompare_Exp’#’-Exp’#’ to standardize and keep simple the structural naming (where ‘X’ is either K or L and ‘Y’ is the number of the experiment GTA results to be found in ../GTAresults/Exp_). The GTA analysis is now complete. {FYI: There are also individual scripts that do just the ‘L’ or ‘K’ pairwise studies in the ../Code folder.}

Term Specific Heatmaps Production

10. Navigate to the /Code directory and run the following Rscript to produce the Term Specific Heatmaps:

[jwrodger@hartmanlab Code]$ Rscript TSHeatmaps5dev2.R

  1. The Term Specific Heatmaps are produced directly from the ../ExpStudy/Exp_/ZScores/ZScores_Interaction.csv file generated by the user-modified interaction… .R script. The heatmap labeling follows the names the user wrote into the StudyInfo.csv spreadsheet.
  2. Verify that the All_SGD_GOTerms_for_QHTCPtk.csv found in ../Code is what you wish to use. If you wish to use a custom modified version instead, create it, modify the TSHeatmaps template script (TSHeatmaps5dev2.R) accordingly, and save it as a ‘TSH_study specific name’.

**Naming of ‘StudiesQHTCP/Study/’ output files.** The resulting files produced in StudiesQHTCP folders have standard file names, which will initially be the same across all studies. However, when the analysis is complete, it may be desirable to move some of the results files outside of their native directories, and it is therefore useful to give them unique and recognizable names. Descriptive names can be added to all files by running two scripts from a terminal after navigating to the corresponding code directory:

i. “sh RenameZscores_GTAresults.sh” will add names provided in ‘StudyInfo.csv’ to files in the ‘Zscores’ subdirectory of the respective ‘Exp’ folder and to files in the ‘GTAresults’ folder.

ii. “sh RenameREMcHtmaps_GTFfiles.sh” will append the label given by the user when prompted to files in the ‘REMc’ and ‘TermSpecificHeatmaps’ folders.
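
The effect of appending a label to a results file name can be sketched as follows. This is only an illustration, not the code of the two rename scripts: the function name `labeled_name`, the example label, and the choice to place the label before the file extension are all assumptions.

```python
from pathlib import PurePosixPath


def labeled_name(path, label):
    """Return `path` with `_<label>` appended to the file name, before the extension.

    Hypothetical helper: the real rename scripts may place the label differently.
    """
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}_{label}{p.suffix}"))


if __name__ == "__main__":
    # Hypothetical study label "BMH" applied to a standard results file name.
    print(labeled_name("Exp1/ZScores/ZScores_Interaction.csv", "BMH"))
```

Computing the new name separately from the rename itself makes it easy to preview the result before any file is moved.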

https://weka.sourceforge.io/doc.dev/weka/clusterers/RandomizableDensityBasedClusterer.html

See #setSeed-int- at the above link; it is relevant to why REMc results are always the same (presumably because seed selection is non-random).

Questions to address / notes to incorporate here or elsewhere:

We need full documentation for all of the current workflow. There are different documents that need to be integrated. This will need to be updated as we make improvements to the system.

In EASY:
The MasterPlate_ file must have ydl227c in the orf column, or else Z_interaction.R will fail because it can’t calculate shift values.
Make sure there are no special characters, e.g., (), “, ‘, ?, etc.; dash and underscore are ok as delimiters.
The DrugMedia_ file must have a letter character to be read as ‘text’.
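
The two checks above can be sketched in a few lines before the files are handed to EASY. This is a minimal sketch under stated assumptions: the allowed character set (letters, digits, dash, underscore), the case-insensitive ORF comparison, and the function names are illustrative and are not part of EASY or Z_interaction.R.

```python
import re

# Assumption: only letters, digits, dash, and underscore are safe in a label.
ALLOWED = re.compile(r"[A-Za-z0-9_-]")
REFERENCE_ORF = "YDL227C"  # the reference ORF named in the note above


def forbidden_characters(label):
    """Return, in order, the characters of `label` outside the allowed set."""
    return [ch for ch in label if not ALLOWED.fullmatch(ch)]


def has_reference_orf(orfs):
    """True if the reference ORF occurs in the ORF column (case-insensitive, an assumption)."""
    return any(orf.strip().upper() == REFERENCE_ORF for orf in orfs)


if __name__ == "__main__":
    # Hypothetical label and ORF column values.
    print(forbidden_characters("2PE_0-5(mM)?"))
    print(has_reference_orf(["YAL001C", "ydl227c"]))
```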

MasterPlate_ file and DrugMedia_ are .xlsx or .xls, but !!Results_ is .txt.

In Z_interactions.R, is a zero concentration/perturbation required (should we use zero for the low conc, even if it’s not zero), e.g., in order to do the shift correctly?

Need to enable all file types (not only .xls) as the default for GenerateResults (to select MP and DM files as .xlsx).

Explore differences between the ELR and STD files - 24_0414; John R modified Z script to format ELR file for Z_interactions.R analysis.

To keep time stamps when transferring with FileZilla, go to the transfer drop down and turn it on, see https://filezillapro.com/docs/v3/advanced/preserve-timestamps/

Could we change the ‘MasterPlateFiles’ folder label in EASY to ‘MasterPlate_DrugMedia’ (since there should be only one MP and there is also a DM file required)?

I was also thinking of adding a ‘MasterPlateFilesOnly’ folder to the QHTCP directory template where one could house different MPFiles (e.g., with and without damps, with and without Refs on all MPs, etc; other custom MPFiles, updated versions, etc)

Currently updated files are in ‘23_1011_NewUpdatedMasterPlate_Files’ on Mac (yeast strains/23_0914…/)

EASY should report cell array positions (plate_row_column) to facilitate analyzing plate artifacts. Col 3 of the MP file is called ‘LibraryLocation’ and is reported after ‘Specifics’ in the !!Results file.

Can EASY/StudiesQ-HTCP be updated at any time by rerunning with updated MP file (new information for gene, desc, etc)- or maybe better to always start with a new template?

Need to be aware of file formatting to avoid dates (e.g., with gene names like MAY24, OCT1, etc, and with plate locations 1E1, 1E2, etc)- this has been less of a problem.

In StudiesQHTCP folders, remember to annotate Exp1, Exp2, in the StudyInfo.csv file.

Where are gene names called from for labeling REMc heatmaps, TSHeatmaps, Z-interaction graphs, etc.? Is this file in the QHTCP ‘code’ folder, or is it in the results file (and thus ultimately the MP file)?

Is it ok for a MasterPlate_ file to have multiple sheets (e.g., readme tab- is only the first tab read in)?
What are the rules for pulling information from the MasterPlateFile to the !!Results_ (e.g., is it the column or the Header Name, etc that is searched? Particular cells in the DrugMedia file?).

Modifier and Conc are from the DM sheet and refer to the agar media arrays. OrfRep is from the MasterPlate_ file. ‘Specifics’ (last column) is experiment specific and accommodates designs involving differences across the multi-well liquid arrays. ‘StrainBkGrd’ (now ‘LibraryLocation’) is in the 3rd column and is reported after ‘Specifics’, as the last column of the ‘!!Results..’ file.

Do we have / could we make an indicator- work in progress or idle/complete with MP/DM and after gen-report. Now, we can check for the MPDMmat.mat file, or we can look in PrintResults, but would be nice to know without looking there.

File>>Load Experiment wasn’t working (no popup to redirect). Check this again.

In EZview:

What do the File, Parameters and Tools dropdown menu items do?
What is the ‘Hide’ button for?
What is the ‘composite’ overlay good for?
What is the file that is used for the ‘Info’ function above the Gene Directory.
How to wand over labels - how does that work in Matlab?

In StudiesQHTCP:

For front end, be more specific about where to navigate to find results file.

ELR type file errors out - needs to be produced in a compatible format.

**change wording to “choose SD for Delta_Background to exclude spot culture growth curve data from interaction analysis”.

GTF:
Limit to smaller terms.
Enable sort by term size.

There needs to be an annotated set of MasterPlate File templates. These could be numbered and annotated chronologically, and each experiment could specify which instance of the MP file template is used. When possible and if necessary, the folders of plate images should be reordered rather than reordering the MP file. Each Exp should have an ExpDesc spreadsheet in it indicating the Exp Design (summarizing what is expected in the !!Results file), based on the ‘MasterPlate_’ and ‘DrugMedia_’ files.

Need to add Ref to Blank positions in the new library construction.

In EZview:

John R made a version that runs the original Guide, the ‘exported’, or the ‘migrated’ Forms of the program. A variety of versions didn’t work very well. The original program, with some improvements, seems to work the best. We should try to optimize it.

{AppDesigner
/mnt/data/EZview/EZview2023/EZviewDev23_0921POSadaptedOnM4800_wlapp
Use EZvStartup which calls EZviewGui_7.mlapp

GUIDE
/mnt/data/EZview/EZview2023/EZviewDev23_0919POScleanup4Pub_MigrationWorkingFileExport_wlapp
Use the standard EZviewGui.m to start execution}

Update from John R:
“There are four EZstartup ----.m files.
EZvStartup.m -Guide version
EZvStartup_Export.m -Exported file migration version
EZvStartup_mlappLaptop.m -M4800 sized Laptop version
EZvStartup_mlappServer.m -Server sized

You can try them out at your convenience.
I have obviously not tried them out on a Mac Laptop.
Extra files etc. still there. It's a hack and chop job but it seems to work.
Location:
/mnt/data/EZview/EZview2023/EZviewDev23_1004POSadaptedOnM4800_wlapp”

Suggestions to improve EZview appearance:
Fix heatmap dimensions to match image dimensions. Is it possible to enable the user to adjust the heatmap dimensions (e.g. by dragging edges or corners to resize its window)?
Have an option to use a fixed heatmap scale across an experiment.
Check chronological experiments.
What does “SpotView” button do?
Can we add scrollbar to RFtab popup window so that it can be resized without losing view of table?

For GTF in StudiesQ-HTCP, we need to make sure that the correct ORFpool (e.g., with or without damps) is being used. Can that be a selection step in the code, or an additional step to include it in the code (try, etc.)?

The Library Locations for E rows in the MasterPlateFiles are being converted to exponentials in the !!Results files- needs to be text. One idea is to convert to text and/or use a delimiter. If they have a delimiter, perhaps prefix of ‘mp’ (converting to text) is not needed? e.g., ‘1_E1’ instead of ‘mp1E1’?
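
The delimiter idea can be sketched as below. A bare location such as 1E1 is also valid scientific notation (Python's `float("1E1")` is 10.0), which is why it is misread as a number; inserting an underscore between the plate number and the row letter removes the ambiguity. The function name and the assumed input pattern (plate number, one row letter, column number) are illustrative assumptions.

```python
import re

# Assumed pattern: plate number, a single row letter, column number, e.g. "1E1" or "8B24".
LOCATION = re.compile(r"(\d+)([A-Za-z])(\d+)")


def delimit_location(location):
    """Turn '1E1' into '1_E1'; return anything that does not match unchanged."""
    match = LOCATION.fullmatch(location.strip())
    if match is None:
        return location  # already delimited, or not a plate location
    plate, row, column = match.groups()
    return f"{plate}_{row.upper()}{column}"


if __name__ == "__main__":
    for raw in ["1E1", "8B24", "1_E1"]:
        print(raw, "->", delimit_location(raw))
```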

For TermSpecificHeatMaps, which list of GOterms did we use (see Sean’s manual, p. 14; ‘1.3 Term Specific Heatmaps’)? Maybe we need a shorter, or more dedicated list.

Compare our GTF to that of YeastMine to check for ‘correctness’ of updated files?

In the new StudiesQHTCP template, edit the StudyInfo.csv file on the server in Libre, but leave it as a single column (or choose to open it with the comma delimiter) and edit between the commas. Do not convert it in Excel (text to columns >> resave), since this breaks the .csv format; the code will no longer run and gives a data frame error after the STDEV for background.

**Consider updating Z_InteractionTemplate.R in the Studies_QHTCP template folder if modifications are made for a particular study that could be useful for additional future studies. The idea would be to comment out the study-specific modifications and overwrite the existing program.

StudiesQHTCP:
RF z-interaction plots don’t include RF2; RF1 only?
We may want to set different z-score cutoffs, based on the shape of the rank plot.
We want to calculate mean and median CPPs for Ref and Non-Ref cultures. We may want to adjust the REF data so that the median CPP values for Ref cultures are the same as, or close to, those of the Non-Ref cultures.
We should regress through the origin for the z-score interaction fitting.
Define NG(no growth), DB(?) and SM(?) on InteractionPlots.
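
For the regression-through-the-origin item above, the least-squares slope with the intercept fixed at zero is sum(x*y) / sum(x*x). The sketch below shows only that formula; which variables play the roles of x and y in the z-score interaction fitting is not specified here, so the inputs are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope of y = b*x with the intercept fixed at zero."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("at least one x value must be non-zero")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical values; y is exactly twice x here, so the slope is 2.0.
    print(slope_through_origin([1, 2, 3], [2, 4, 6]))
```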

In MPfile_templates, replace all YKL227C with YDL227C (120 instances); may only be in the file with Refs added to MPs.

Update the gene-by-GO matrix in the REMc (from Dec07 2009) folder of StudiesQHTCP.

The ‘FrontEnd’ popup message, when it is displayed, should say “Select the !!Results File (in ‘ExpJobs’ folder)” to avoid confusion and remind the user about the QHTCP structure.

In REMcMaster3.sh, change prompt to ask for Z-score value (not standard deviation) for filtering analysis.

Can REMc cluster Aniyia’s data (extract names in place of gene names)? Does it fail because it can’t do GTF, even though it should be able to simply cluster the Int_Z-scores and label heatmaps (i.e., do this without doing GTF)?

Also, provide more detailed message when prompted to enter background level in Z_IntR.

Output Z_scoreInt file in same order as InteractionPlots.

Check all folders in template for updated files; e.g., not just the ‘code’ folder, but also Exp1/2/3/4, REMc and GTF, etc..

For EASY, need notification for successful completion after selecting drug media file with prompt ‘labeling complete, you may now generate report’.

REMc error (BMHonly run):

REMc- include LibraryLocation info in FinalTable.
This would come from looking up ‘OrfReplicate’ column in ‘MP File’, not ORF name column.
Useful to have demarcator between MP and WellPos, e.g., mp8_B24 so that clusters can be analyzed for MP artifacts (more concise preferred format is 8_B24).

End of GTA in terminal:

Full GTA result BMHonly:

The one pdf that is made can’t be opened.

When we leave out the DAmPs, we should probably still do the 2nd REF plate.

I got the same result by running REMcMaster3.sh whether I used a Z-score cutoff of 2 or 1 (2PEonly Experiment). Note: I reran the script in the same folder.

We need a new strategy to check for plate artifacts: calculate REF averages and medians and compare them to non-REF averages and medians. If corrections are needed, correct by the median, since medians are less affected by outliers/tails of distributions.
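
The comparison described above can be sketched with Python's standard `statistics` module. The values are made up, and treating the correction as an additive shift between medians is an assumption; a multiplicative correction may fit the data better.

```python
from statistics import mean, median


def summarize(values):
    """Mean and median of one group of cultures."""
    return {"mean": mean(values), "median": median(values)}


def median_offset(ref_values, nonref_values):
    """Amount to add to REF values so that the two medians coincide (assumed additive)."""
    return median(nonref_values) - median(ref_values)


if __name__ == "__main__":
    ref = [10, 11, 12, 30]         # hypothetical REF values with one outlier
    nonref = [12, 13, 14, 15, 16]  # hypothetical non-REF values
    print(summarize(ref))
    print(summarize(nonref))
    print(median_offset(ref, nonref))
```

In this made-up example the single outlier pulls the REF mean well above the REF median, which is the reason given above for correcting by the median.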

Heatmaps change to incorrect ones when the print heatmap function is used.

Don’t we need a program to remove the template files and keep only the results after StudiesQHTCP?

What is this step? Do we need the openxlsx package, and what failed as a result of not having it?

Appendix

Notes on the standard EASY coding structure (in Matlab):

  • /figs: a folder with two pintool map (‘PT’) files in it
  • /PTmats: a folder with several .mat parameter files
  • Datatipp.m: Matlab function to display small text boxes w/ info of a particular data point when the cursor is hovered over it
  • DgenNoGrowthResults200809.m: script responsible for generating the !!Results files(Std & Elr)
  • DMPexcel2mat_2023winLinix.m: script that prompts users to select the MasterPlate and DrugMedia files
  • EASYconsole.fig: This is the figure for the EASYconsole GUIDE. In order to edit the EASYconsole using the GUIDE functionality, enter ‘ guide(“EASYconsole.fig”) ‘ into the command window. Future versions of MATLAB may no longer allow GUIDE editing and APPDESIGNER must be used.
  • EASYconsole.m: main EASY program script; large center for GUI functionality; created using GUIDE
  • EstartConsole.m: starter script for EASY; calls EASYconsole.m
  • NCdisplayGui.m: called when the user accesses the ‘CurveFit Display’ functionality on the console; responsible for the working GUI [part 1]
  • NCsingleDisplay.m: assists in producing a functional GUI when accessing the ‘CurveFit Display’ functionality on the console [part 2]
  • NCfitImCFparforFailGbl2.m: calls NCscurImCF_3parfor.m; performs curve fitting and data analysis as a set of cultures given their time points and intensity values >> produces ‘FitResultsComplete.txt’ in PrintResults folder [must work in conjunction with ImParamGui.m]
  • NCscurImCF_3parfor.m: part of the computational process when using the ‘Image CurveFit ComboAnalysis’ functionality; performs growth curve fitting using a logistic growth model; performs the curve fit both with and without the early-late-r improved code; collects and consolidates data into structures that are returned back up through the calling functions
  • NIcircle.m: generates circular boundaries for image processing (are squares used instead?)
  • NImParamRadiusGui.m: necessary for the ‘Image CurveFit ComboAnalysis’ computations.
  • NIscanIntensBGpar4: major part of the computations for ‘Image CurveFit ComboAnalysis’
  • par4Gbl_Main8c.m: preallocation of data structures that are passed into descending function calls; calls the parallel processing parfor loop which loops through all the plate imaging folders
  • p4loop8c.m: calls par4GblFnc8c.m and passes to it preallocated structures and other variables; passes the analysis data back to the calling script par4Gbl_main8c.m in renamed structures based on the preallocated structures
  • par4GBLFnc8c.m: calls NIscanIntensBG4par4.m; compiles returned data; calls NCfitImCFparforFailGbl2.m, passing data obtained with NIscanIntensBG4par4.m to the curve fit function

For PinTool Functionality, the following scripts must be in the EASY folder:
- NdirectPTGui.m
- NIPTdirectParmsGui.m
- NIPTsearchParmsGui.m
- NImapPT.m
- NImapPTcentA.m
- NImapPTcentroidSrc.m
- NImapPTcentroidSrcCirc.m

Other code Notes:
>The Toolbar for the EASYconsole is found in line 61 of EASYconsole.m >> ‘figure’ = on; ‘none’ = off
>PlateMapPintool button functionality: line 345 in EASYconsole.m