From f2fcb34b5e16a12eb90da36aaf7537a225ceb32a Mon Sep 17 00:00:00 2001 From: Bryan Roessler Date: Thu, 25 Jul 2024 00:24:25 -0400 Subject: [PATCH] EASY work --- workflow/script-run-workflow | 147 ++++++++++++++++++-------- workflow/templates/easy/EASYconsole.m | 55 +++++----- 2 files changed, 129 insertions(+), 73 deletions(-) diff --git a/workflow/script-run-workflow b/workflow/script-run-workflow index c372fdd5..1ce7278d 100755 --- a/workflow/script-run-workflow +++ b/workflow/script-run-workflow @@ -9,12 +9,45 @@ # @brief One script to rule them all (see: xkcd #927) # @description A flexible yet opinionated analysis framework for the Hartman Lab # There should be at least 4 subdirectories to organize Q-HTCP data and analysis. The parent directory is simply called 'Q-HTCP' and the 4 are subdirectories described below (Fig. 1): -# * **ExpJobs** - This directory contains raw image data and image analysis results for the entire collection of Q-HTCP experiments. We recommend each subdirectory within 'ExpJobs" should represent a single Q-HTCP experiment and be named using the following convention (AB yyyy_mmdd_PerturbatationsOfInterest): experimenter initials ('AB '), date ('yyyy_mmdd_'), and brief description ('drugs_medias'). Each subdirectory contains the Raw Image Folders for that experiment (a series of N folders with successive integer labels 1 to N, each folder containing the time series of images for a single cell array). It also contains a user-supplied subfolder, which must be named ''MasterPlateFiles" and must contain two excel files, one named 'DrugMedia_experimentdescription' and the other named 'MasterPlate_experimentdescription'. The bolded part of the file name including the underscore is required. The italicized part is optional description. Generally the 'DrugMedia_' file merits description. If the standard MasterPlate_Template file is being used, it's not needed to customize then name. 
On the other hand if the template is modified, it is recommended to rename it and describe accordingly - a useful convention is to use the same name for the MP files as given to the experiment (i.e, the parent ExpJobs subdirectory described above) after the underscores. The 'MasterPlate_' file contain associated cell array information (culture IDs for all of the cell arrays in the experiment) while the 'DrugMedia_' file contains information about the media that the cell array is printed to. Together they encapsulate and define the experimental design. The QHTCPImageFolders and 'MasterPlateFiles' folder are the inputs for image analysis with EASY software. As further described below, EASY will automatically generate a 'Results' directory (within the ExpJobs/'ExperimentJob' folder) with a name that consists of a system-generated timestamp and an optional short description provided by the user (Fig.2). The 'Results' directory is created and entered, using the "File >> New Experiment" dropdown in EASY. Multiple 'Results' files may be created (and uniquely named) within an 'ExperimentJob' folder. -# * **EASY** - This directory contains the GUI-enabled MATLAB software to accomplish image analysis and growth curve fitting. EASY analyzes Q-HTCP image data within an 'ExperimentJob'' folder (described above; each cell array has its own folder containing its entire time series of images). EASY analysis produces image quantification data and growth curve fitting results for each cell array; these results are subsequently assembled into a single file and labeled, using information contained in the 'MasterPlate_' and 'DrugMedia_' files in the 'MasterPlateFiles' subdirectory. The final files (named '!!ResultsStd_.txt' or '!!ResultsELr_.txt') are produced in a subdirectory that EASY creates within the 'ExperimentJob' folder, named '/ResultsTimeStampDesc/PrintResults' (Fig. 2). 
The /EASY directory is simply where the latest EASY version resides (additional versions in development or legacy versions may also be stored there). Note: The raw data inputs and result outputs for EASY are kept in the 'ExpJobs' directory. EASY also outputs a '.mat' file that is stored in the 'matResults' folder and is named with the TimeStamp and user-provided name appended to the 'Results' folder name when 'New Experiment' is executed from the 'File' Dropdown menu in the EASY console. -# * **EZview** - This directory contains the GUI-enabled MATLAB software to conveniently and efficiently mine the raw cell array image data for a Q-HTCP experiment. It takes the Results.m file (created by EASY software) as an input and permits the user to navigate through the raw image data and growth curve results for the experiment. The /EZview provides a place for storing the the latest EZview version (as well as other EZview versions). EZview provides a GUI for examining the EASY results as provided in the …/matResults/… .mat file. -# * **StudiesQHTCP** - A software composite (MATLAB, JAVA, R, Python, Perl, Shell) that takes growth curve results (created by EASY software) as an input and successively generates interaction Z-score results, which are used for graphing gene interactions, Clustering, Gene Ontology analysis, and other ways of interpreting and visualizing the experimental quality and outcomes. {The /StudiesQHTCP folder contains the ordered command line scripts that call sets of other scripts to perform data selection and adaptation from the extracted text results spreadsheet found in the /ExpJobs/experiment name/Results…/PrintResults/ folder. In particular the 'user customize interactionCode4experiment.R' file. It also contains a multitude of R generated plots based on the selected data and possible adaptation. 
All clustering and Gene ontology analysis are derived from the 'ZScores_Interaction.csv' file found in the/ZScores subdirectory.} -# * **Master Plates** - This optional folder is a convenient place to store copies of the 'MasterPlate_' and a 'DrugMedia_' file templates, along with previously used files that may have been modified and could be reused or further modified to enable future analyses. These two file types are required in the 'MasterPlateFiles' folder, which catalogs experimental information specific to individual Jobs in the ExpJobs folder, as described further below. -# +# * **ExpJobs** +# * This directory contains raw image data and image analysis results for the entire collection of Q-HTCP experiments. +# * We recommend that each subdirectory within 'ExpJobs' represent a single Q-HTCP experiment and be named using the following convention (AB yyyy_mmdd_PerturbationsOfInterest): experimenter initials ('AB '), date ('yyyy_mmdd_'), and brief description ('drugs_medias'). +# * Each subdirectory contains the Raw Image Folders for that experiment (a series of N folders with successive integer labels 1 to N, each folder containing the time series of images for a single cell array). It also contains a user-supplied subfolder, which must be named 'MasterPlateFiles' and must contain two Excel files, one named 'DrugMedia_experimentdescription' and the other named 'MasterPlate_experimentdescription'. The bolded part of the file name including the underscore is required. The italicized part is an optional description. Generally the 'DrugMedia_' file merits description. +# * If the standard MasterPlate_Template file is being used, there is no need to customize the name. If the template is modified, however, it is recommended to rename it and describe it accordingly; a useful convention is to use the same name for the MP files as given to the experiment (i.e., the parent ExpJobs subdirectory described above) after the underscores. 
+# * The 'MasterPlate_' file contains associated cell array information (culture IDs for all of the cell arrays in the experiment) while the 'DrugMedia_' file contains information about the media that the cell array is printed to. +# * Together they encapsulate and define the experimental design. +# * The QHTCPImageFolders and 'MasterPlateFiles' folders are the inputs for image analysis with EASY software. +# * As further described below, EASY will automatically generate a 'Results' directory (within the ExpJobs/'ExperimentJob' folder) with a name that consists of a system-generated timestamp and an optional short description provided by the user (Fig. 2). The 'Results' directory is created and entered using the "File >> New Experiment" dropdown in EASY. Multiple 'Results' files may be created (and uniquely named) within an 'ExperimentJob' folder. +# * **EASY** +# * This directory contains the GUI-enabled MATLAB software to accomplish image analysis and growth curve fitting. +# * EASY analyzes Q-HTCP image data within an 'ExperimentJob' folder (described above; each cell array has its own folder containing its entire time series of images). +# * EASY analysis produces image quantification data and growth curve fitting results for each cell array; these results are subsequently assembled into a single file and labeled, using information contained in the 'MasterPlate_' and 'DrugMedia_' files in the 'MasterPlateFiles' subdirectory. +# * The final files (named '!!ResultsStd_.txt' or '!!ResultsELr_.txt') are produced in a subdirectory that EASY creates within the 'ExperimentJob' folder, named '/ResultsTimeStampDesc/PrintResults' (Fig. 2). +# * The /EASY directory is simply where the latest EASY version resides (additional versions in development or legacy versions may also be stored there). +# * The raw data inputs and result outputs for EASY are kept in the 'ExpJobs' directory. 
+# * EASY also outputs a '.mat' file that is stored in the 'matResults' folder and is named with the TimeStamp and user-provided name appended to the 'Results' folder name when 'New Experiment' is executed from the 'File' Dropdown menu in the EASY console. +# * **EZview** +# * This directory contains the GUI-enabled MATLAB software to conveniently and efficiently mine the raw cell array image data for a Q-HTCP experiment. +# * It takes the Results.m file (created by EASY software) as an input and permits the user to navigate through the raw image data and growth curve results for the experiment. +# * The /EZview directory provides a place for storing the latest EZview version (as well as other EZview versions). +# * EZview provides a GUI for examining the EASY results as provided in the …/matResults/… .mat file. +# * **StudiesQHTCP** +# * This directory contains a software composite (MATLAB, JAVA, R, Python, Perl, Shell) that takes growth curve results (created by EASY software) as an input and successively generates interaction Z-score results, which are used for graphing gene interactions, clustering, Gene Ontology analysis, and other ways of interpreting and visualizing the experimental quality and outcomes. {The /StudiesQHTCP folder contains the ordered command line scripts that call sets of other scripts to perform data selection and adaptation from the extracted text results spreadsheet found in the /ExpJobs/experiment name/Results…/PrintResults/ folder. In particular the 'user customize interactionCode4experiment.R' file. It also contains a multitude of R generated plots based on the selected data and possible adaptation. 
All clustering and Gene ontology analysis are derived from the 'ZScores_Interaction.csv' file found in the/ZScores subdirectory.} +# * **Master Plates** +# * This optional folder is a convenient place to store copies of the 'MasterPlate_' and a 'DrugMedia_' file templates, along with previously used files that may have been modified and could be reused or further modified to enable future analyses. +# * These two file types are required in the 'MasterPlateFiles' folder, which catalogs experimental information specific to individual Jobs in the ExpJobs folder, as described further below. +# +# NOTES: +# * For the time being I have tried to balance the recognizability of your current workflow with better practices that allow this program to function. +# +# TODO: +# * Scripts should be made modular enough that they can be stored in the same dir +# * Don't cd in scripts +# * Pass variables +# * Variable scoping is horrible right now +# * I wrote this sequentially and tried to keep track the best I could +# * Local vars have a higher likelihood of being lower case, global vars are UPPER +# # @option -p | --project= Include one or more projects in the analysis # @option -i | --include= Include one or more modules in the analysis (default: all modules) # @option -x | --exclude= Exclude one or more modules in the analysis @@ -165,7 +198,6 @@ module() { declare -gA "$1" } # @arg ALL_SUBMODULES array A list of all available modules (that have been passed to module()) -# @internal submodule() { debug "Adding $1 submodule" ALL_SUBMODULES+=("$1") @@ -209,30 +241,32 @@ debug() { (( DEBUG )) && echo "Debug: $*"; } module install_dependencies # @section Install dependencies # @description Installs dependencies for the workflow -# Software - These can all be downloaded from the respective online platforms for each operating system -# R -# RStudio (Why?) 
-# Perl -# Java -# MATLAB # -# For MacOS: It is recommended that MacOS users download Homebrew for easy installation of the following packages. -# The command prompt to download Homebrew followed by the prompts to download the necessary packages are listed below. -# export HOMEBREW_BREW_GIT_REMOTE=https://github.com/Homebrew/brew -# /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" -# cpan File::Map ExtUtils::PkgConfig GD GO::TermFinder -# brew install graphiz gd pdftk-java pandoc shdoc nano rsync # -# For Linux: -# cpan File::Map ExtUtils::PkgConfig GD GO::TermFinder -# apt-get install graphviz libgd-dev pdftk-java pandoc shdoc nano rsync +# +# Dependencies +# * R +# * Perl +# * Java +# * MATLAB +# +# For MacOS +# * export HOMEBREW_BREW_GIT_REMOTE=https://github.com/Homebrew/brew +# * /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" +# * cpan File::Map ExtUtils::PkgConfig GD GO::TermFinder +# * brew install graphviz gd pdftk-java pandoc shdoc nano rsync +# +# For Linux +# * cpan File::Map ExtUtils::PkgConfig GD GO::TermFinder +# * apt-get install graphviz libgd-dev pdftk-java pandoc shdoc nano rsync # or -# dnf install graphviz pandoc pdftk-java gd-devel shdoc nano rsync # -# For R: -# install.packages(“BiocManager”) -# BiocManager::install(“org.Sc.sgd.db”) -# install.packages(c('ontologyIndex', 'ggrepel', 'tidyverse', 'sos', 'openxlsx'), dep=TRUE) +# * dnf install graphviz pandoc pdftk-java gd-devel shdoc nano rsync # +# For R +# * install.packages("BiocManager") +# * BiocManager::install("org.Sc.sgd.db") +# * install.packages(c('ontologyIndex', 'ggrepel', 'tidyverse', 'sos', 'openxlsx'), dep=TRUE) +# @noargs install_dependencies() { debug "Running: ${FUNCNAME[0]}" "$@" @@ -297,22 +331,22 @@ install_dependencies() { module init_project # @section Initialize a new project in the scans directory # @description This function creates and initializes project directories -# We 
make the general assumption that a project will have most modules run on it and handle # This module is responsible for the following tasks: # * Initializing a project directory in the scans directory # * Initializing a QHTCP project directory in the qhtcp directory # - -# TODO Copy over source image directories from robot - are these alse named by the ExpJobs name? -# TODO Suggest renaming ExpJobs to something like "scans" or "images" -# TODO Best practices: +# TODO +# * Copy over source image directories from robot - are these also named by the ExpJobs name? +# * Suggest renaming ExpJobs to something like "scans" or "images" +# * MasterPlate_ file **should not** be an xlsx file because the format is not portable +# +# NOTES # * Copy over the images from the robot and then DO NOT TOUCH that directory except to copy from it # * Write-protect (read-only) if we need to # * Copy data from scans/images directory to the project working dir and then begin analysis # * You may think...but doesn't that 2x data? # * No, btrfs subvolume uses reflinks, only data that is altered will be duplicated # * Most of the data are static images that are not written to, so the data is deduplicated -# TODO Where are MasterPlate and DrugMedia blank sheets? 
init_project() { debug "Running: ${FUNCNAME[0]}" @@ -328,9 +362,36 @@ init_project() { DRUG_MEDIA_FILE="$SCANS_DIR/MasterPlateFiles/DrugMedia_$PROJECT.xls" MASTER_PLATE_FILE="$SCANS_DIR/MasterPlateFiles/MasterPlate_$PROJECT.xls" - for f in $DRUG_MEDIA_FILE $MASTER_PLATE_FILE; do - touch "$f" + # Write skeleton files in csv + # If we have to convert to xlsx later, so be it + cat <<-EOF > "$DRUG_MEDIA_FILE" + + + + EOF + + cat <<-EOF > "$MASTER_PLATE_FILE" + + + + EOF + + # TODO here we'll copy scan from robot but for now let's pause and wait for transfer + read -r -p "Hit to continue: " + + # Refactor some of the EASY fs code into here + # Make EASY directories + results_dir="Results$DATE-$PROJECT_SUFFIX" + mkdir "$results_dir" + dirs=('PrintResults' 'CFfigs' 'Fotos' 'Fotos/BkUp' 'matResults') + for d in "${dirs[@]}"; do + mkdir "$results_dir/$d" done + + # Copy templates + rsync -a "$EASY_TEMPLATE_DIR"/{figs,Ptmats} "$results_dir" + + } @@ -340,7 +401,7 @@ module easy # TODO Don't create output in the scans folder, put it in an output directory # TODO The !!Results output files need standardized naming # TODO Don't perform directory operations in EASY -# * The QHTCPImageFolders and 'MasterPlateFiles' folder are the inputs for image analysis with EASY software. +# * The scans/images and 'MasterPlateFiles' folder are the inputs for image analysis with EASY software. # * EASY will automatically generate a 'Results' directory (within the ExpJobs/'ExperimentJob' folder) w/ timestamp and an optional short description provided by the user (Fig.2). # * The 'Results' directory is created and entered, using the "File >> New Experiment" dropdown in EASY. # * Multiple 'Results' files may be created (and uniquely named) within an 'ExperimentJob' folder. @@ -476,7 +537,8 @@ easy() { pushd "$EASY_TEMPLATE_DIR" || return 1 # Launch graphical matlab if the user wants - ! ((YES)) && ask "Start EASY in MATLAB? This requires a GUI." && matlab -nosplash -r "$script" + ! 
((YES)) && ask "Start EASY in MATLAB? This requires a GUI." && + matlab -nosplash -sd ~/downloads -r "$script" popd || return 1 # Use the function return code see if we succeeded @@ -677,7 +739,7 @@ qhtcp() { "$STUDY_DIR"\ "$STUDY_INFO_FILE"\ "/ZScores/" \ - "../Code/SGD_features.tab" \ + "$CODE_DIR/SGD_features.tab" \ 5 popd || return 1 done @@ -752,7 +814,7 @@ module gta # @set all_sgd_terms_csv string The all_SGD_GOTerms_for_QHTCPtk.csv file # @set sgd_terms_tfile string The go_terms.tab file # @set sgd_features_file string The gene_association.sgd file -# @set gene_ontology_file string The gene_ontology_edit.obo file +# @set gene_ontology_file string The gene_ontology_edit.obo file # @set zscores_file string The ZScores_interaction.csv file gta() { debug "Running: ${FUNCNAME[0]}" @@ -979,7 +1041,7 @@ mat_exp_frontend() { The BackgroundSD and ZscoreJoinSD fields will be filled automatically according to user specifications, at a later step in the QHTCP study process. - Open MATLAB and in the application navigate to each specific /Exp folder, + Open MATLAB and in the application navigate to each specific /Exp folder, call and execute ExpFrontend.m by clicking the play icon. Use the "Open file" function from within Matlab. Do not double-click on the file from the directory. 
@@ -1000,9 +1062,7 @@ submodule r_interactions # @description Run the R interactions analysis (Z_InteractionTemplate.R) # TODO # * don't want to rename Z_InteractionTemplate.R because that will break logic, just edit in place instead -# NOTE -# * I modified the input logic of Z_InteractionTemplate script so that it can be treated like a native module -# * Note how little code is required to call it in r_interactions() +# NOTES # @arg $1 string The current working directory r_interactions() { debug "Running: ${FUNCNAME[0]}" @@ -1283,6 +1343,7 @@ get_easy_results() { submodule documentation # @section Documentation # @description Generates markdown documentation from this script using shdoc +# # TODO # * We can include images in the markdown file but not natively with shdoc # * Need to add a post processor diff --git a/workflow/templates/easy/EASYconsole.m b/workflow/templates/easy/EASYconsole.m index 1f7cd71a..2cbca9ed 100644 --- a/workflow/templates/easy/EASYconsole.m +++ b/workflow/templates/easy/EASYconsole.m @@ -11,7 +11,7 @@ function varargout = EASYconsole(varargin) global wCodeDir %global ImParMat - wCodeDir=pwd + wCodeDir=pwd; % changing directory to wCodeDir returnStartDir @@ -133,17 +133,11 @@ function NewExpDat_Callback(~, ~, ~) % Set paths newExpfilePref= strrep(newExpfile,'.mat',''); - resDirName=strcat('Results',datestr(now,29),newExpfilePref); resDir=fullfile(newExppath,resDirName); matDir=fullfile(newExppath,resDirName,'matResults'); ExpOutmat=fullfile(matDir,strcat(datestr(now,29),newExpfile)); ExpPath=fullfile(newExppath); - % create the the matResults dir - if ~exist(matDir) - mkdir (matDir); - end - %***Added for 'parfor global' to preallocate 'scan' structure 20-0123***** nlist=dir(fullfile(ExpPath,'*')); nnn=0; @@ -160,23 +154,24 @@ function NewExpDat_Callback(~, ~, ~) scan(scanMax)= struct(); %changed for parfor global 20_0118 save(ExpOutmat,'scan') - % create supporting dirs - dirs = {'PrintResults', 'CFfigs', 'Fotos', 'Fotos/BkUp'}; - for i 
= 1:length(dirs) - d = dirs{i}; - if ~exist(fullfile(ExpPath, resDirName, d), 'dir') - mkdir(fullfile(ExpPath, resDirName, d)); - end - end + % BCR rewrote these but moved the functionality into the main workflow script + % % create supporting dirs + % dirs = {'PrintResults', 'CFfigs', 'Fotos', 'Fotos/BkUp'}; + % for i = 1:length(dirs) + % d = dirs{i}; + % if ~exist(fullfile(ExpPath, resDirName, d), 'dir') + % mkdir(fullfile(ExpPath, resDirName, d)); + % end + % end - % templateDirs are stored in the easy template directory - templates = {'figs', 'PTmats'} - for i = 1:length(templates) - d = dirs{i}; - if ~exist(fullfile(ExpPath, resDirName, d), 'dir') - copyfile((fullfile(wCodeDir,d)), (fullfile(ExpPath,resDirName,d))); - end - end + % % templateDirs are stored in the easy template directory + % templates = {'figs', 'PTmats'} + % for i = 1:length(templates) + % d = dirs{i}; + % if ~exist(fullfile(ExpPath, resDirName, d), 'dir') + % copyfile((fullfile(wCodeDir,d)), (fullfile(ExpPath,resDirName,d))); + % end + % end clear sbdg % reduce possible retention of a previous job sdbg sbdg= cell(1,scanMax); @@ -243,13 +238,13 @@ function LoadDatFile_Callback(~, ~, ~) mkdir(fullfile(openExppath,'\BkUp')); % create supporting dirs - dirs = {'PrintResults', 'figs', 'CFfigs', 'PTmats', 'Fotos'} - for i = 1:length(dirs) - d = dirs{i} - if ~exist(fullfile(resDir, d), 'dir') - mkdir(fullfile(resDir, d)); - end - end + % dirs = {'PrintResults', 'figs', 'CFfigs', 'PTmats', 'Fotos'} + % for i = 1:length(dirs) + % d = dirs{i} + % if ~exist(fullfile(resDir, d), 'dir') + % mkdir(fullfile(resDir, d)); + % end + % end catch returnStartDir
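
Note on the directory setup moved into init_project(): the patch comments out the MATLAB `~exist(...)` guards and creates the results directories with a plain `mkdir`, which fails if the project is initialized a second time. A minimal sketch of a repeat-safe variant follows. The helper name `make_results_skeleton`, the temporary demo directory, and the `Results-demo` name are hypothetical and not part of the patch; only the subdirectory names are taken from the diff.

```shell
# Hypothetical helper (not part of the patch): create the EASY results skeleton.
# mkdir -p creates missing parents and succeeds if the directory already exists,
# so the function can be called repeatedly on the same results directory.
make_results_skeleton() {
  for d in PrintResults CFfigs Fotos Fotos/BkUp matResults; do
    mkdir -p -- "$1/$d" || return 1
  done
}

demo_root=$(mktemp -d)   # throwaway location, used only for this demonstration
make_results_skeleton "$demo_root/Results-demo"
make_results_skeleton "$demo_root/Results-demo"   # second call changes nothing
```

Returning non-zero on the first failed `mkdir` lets the caller stop before the template copy step instead of continuing with a partial directory tree.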