Rollup

2024-08-13 15:27:53 -04:00
parent 724b292dab
commit f190967383
7 changed files with 367 additions and 298 deletions
--- a/workflow/README.md
+++ b/workflow/README.md
@@ -34,6 +34,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.
 * [py_gtf_concat](#pygtfconcat)
 * [r_compile_gtf](#rcompilegtf)
 * [get_studies](#getstudies)
+* [choose_easy_results](#chooseeasyresults)

 ## Notes

@@ -94,7 +95,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.

 ### parse_input

-`--project`, `--module`, `--nomodule`, and `--submodule` can be passed multiple times or with a comma-separated string
+`--project`, `--module`, `--nomodule`, and `--wrapper` can be passed multiple times or with a comma-separated string

 #### Options

@@ -106,9 +107,9 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.

  One or more modules to run (default: all), can be passed multiple times or with a comma-separated string

-* **-s\<value\>** | **--submodule=\<value\>**
+* **-w\<value\>** | **--wrapper=\<value\>**

-  Requires two arguments: the name of the submodule and its arguments, can be passed multiple times
+  Requires two arguments: the name of the wrapper and its arguments, can be passed multiple times

 * **-n\<value\>** | **--nomodule=\<value\>**

@@ -134,7 +135,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.

 * **PROJECTS** (array): List of projects to cycle through
 * **MODULES** (array): List of modules to run on each project
-* **SUBMODULES** (array): List of submodules and their arguments to run on each project
+* **WRAPPERS** (array): List of wrappers and their arguments to run on each project
 * **EXCLUDE_MODULES** (array): List of modules not to run on each project
 * **DEBUG** (int): Turn debugging on
 * **YES** (int): Turn assume yes on
@@ -147,10 +148,10 @@ Use a module to:

 * Build a new type of analysis from scratch
 * Generate project directories
-* Group multiple submodules (and modules) into a larger task
-* Dictate the ordering of multiple submodules
-* Competently handle pushd and popd for their submodules if they do not reside in the SCANS/PROJECT_DIR
-* Call their submodules with the appropriate arguments
+* Group multiple wrappers (and modules) into a larger task
+* Dictate the ordering of multiple wrappers
+* Competently handle pushd and popd for their wrappers if they do not reside in the SCANS/PROJECT_DIR
+* Call their wrappers with the appropriate arguments

 ### install_dependencies

@@ -204,6 +205,7 @@ TODO
 * MasterPlate_ file **should not be an xlsx file**, no portability
 * We can keep the existing xlsx code for old style fallback
 * But moving forward should switch to csv or something open
+* Do we need to sync a QHTCP template?

 NOTES

@@ -218,6 +220,15 @@ NOTES

 Run the EASY matlab program

+INPUT FILES
+* MasterPlate_.xls
+* DrugMedia_.xls
+
+OUTPUT FILES
+* !!ResultsStd_.txt
+* !!ResultsELr_.txt
+
+
 TODO

 * Don't create output in the scans folder, put it in an output directory
@@ -612,11 +623,9 @@ TODO
 * **$5** (string): All_SGD_GOTerms_for_QHTCPtk.csv
 * **$6** (string): zscores_interaction.csv

-## Submodules
+## Wrappers

-Submodules are shell wrappers for workflow components in external languages
-
-Submodules:
+Wrappers:

 * Allow scripts to be called by the main workflow script using input and output arguments as a translation mechanism.
 * Only run by default if called by a module.
@@ -665,7 +674,7 @@ Output

 *

-This submodule:
+This wrapper:

 * Will perform both L and K comparisons for the specified experiment folders. 
 * The code uses the naming convention of PairwiseCompare_Exp’#’-Exp’#’ to standardize and keep simple the structural naming (where ‘X’ is either K or L and ‘Y’ is the number of the experiment GTA results to be found in ../GTAresult/Exp_).
@@ -697,7 +706,7 @@ Output

 *

-This submodule:
+This wrapper:

 * The Term Specific Heatmaps are produced directly from the ../ExpStudy/Exp_/ZScores/ZScores_Interaction.csv file generated by the user modified interaction… .R  script. 
 * The heatmap labeling is per the names the user wrote into the StudyInfo.txt spreadsheet.
@@ -716,11 +725,18 @@ This submodule:

 ### r_interactions

-Run the R interactions analysis (Z_InteractionTemplate.R)
+Run the R interactions analysis (deprecates Z_InteractionTemplate.R)
+
+SCRIPT: interactions.R

 TODO

-* Don't want to rename Z_InteractionTemplate.R because that will break logic, just edit in place instead
+* Parallelization (need to consult with Sean)
+* Needs more loops to reduce verbosity, but don't want to limit flexibility
+* Replace 1:length() with seq_along()
+* Re-enable disabled linter checks
+* Reduce cyclomatic complexity of some of the for loops
+* There needs to be one point of truth for the SD factor

 NOTES

@@ -729,10 +745,11 @@ NOTES
 #### Arguments

 * **$1** (string): The input directory
-* **$2** (string): The study info file
-* **$3** (string): The zscores directory
+* **$2** (string): The zscores directory
+* **$3** (string): The study info file
 * **$4** (string): SGD_features.tab
 * **$5** (integer): delta SD background value (default: 5)
+* **$6** (integer): experiment number

 ### r_join_interactions

@@ -826,7 +843,7 @@ Output

 ### pl_gtf_analyze

-Perl analyze submodule
+Perl analyze wrapper
 This seems weird to me because we're just overwriting the same data for all set2 members
 https://metacpan.org/dist/GO-TermFinder/view/examples/analyze.pl
 Is there a reason you need a custom version and not the original from cpan?
@@ -840,7 +857,7 @@ Is there a reason you need a custom version and not the original from cpan?

 ### pl_gtf_terms2tsv

-Perl terms2tsv submodule
+Perl terms2tsv wrapper
 Probably should be translated to shell/python

 #### Arguments
@@ -849,7 +866,7 @@ Probably should be translated to shell/python

 ### py_gtf_concat

-Python concat submodule for GTF
+Python concat wrapper for GTF
 Concat the process ontology outputs from the /REMcReady_lm_only folder
 Probably should be translated to bash

@@ -872,14 +889,14 @@ Parse study names from StudyInfo.csv files

 TODO

-* This whole submodule should eventually be either
+* This whole wrapper should eventually be either
 * Removed
 * Expanded into a file that stores all project/study settings (database)
 * I had to add a new line to the end of StudyInfo.csv, may break things?

 #### Arguments

-* **$1** (string): File to read
+* **$1** (string): Study info file

 #### Variables set

@@ -890,3 +907,21 @@ TODO
 * **0**: If one or more studies found
 * **1**: If no studies found

+### choose_easy_results
+
+Chooses an EASY scans directory if the information is undefined
+TODO Standardize EASY output, it's hard to understand
+TODO eventually we could run this on multiple results dirs simultaneously with some refactoring
+
+#### Arguments
+
+* **$1** (string): directory containing EASY results dirs
+
+#### Variables set
+
+* **EASY_RESULTS_DIR** (string): The working EASY output directory
+
+#### Exit codes
+
+* **0**: If an EASY results dir was successfully chosen
+
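As a reading aid for the `choose_easy_results` entry added above, here is a minimal sketch of how such a function could behave. It is not the code from this commit. The function name `choose_easy_results_sketch`, the rule of picking the most recently modified subdirectory, and the non-zero return when nothing is found are all assumptions for illustration. Only the `$1` argument, the `EASY_RESULTS_DIR` variable, and exit code 0 on success come from the README text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the implementation from this commit.
# Assumption: "choosing" means taking the newest subdirectory of $1.
choose_easy_results_sketch() {
  local base_dir dir newest
  base_dir=$1
  newest=""

  # If a usable directory is already defined, keep it (README: only
  # choose "if the information is undefined").
  if [ -n "${EASY_RESULTS_DIR:-}" ] && [ -d "$EASY_RESULTS_DIR" ]; then
    return 0
  fi

  for dir in "$base_dir"/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    dir=${dir%/}                     # drop the trailing slash
    if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
      newest=$dir
    fi
  done

  # Assumption: signal failure with 1 when no results dir exists.
  [ -n "$newest" ] || return 1

  EASY_RESULTS_DIR=$newest
  return 0
}
```

A caller could then run `choose_easy_results_sketch "$scans_dir" && echo "$EASY_RESULTS_DIR"`, where `scans_dir` is a placeholder for wherever the EASY results directories live. An interactive prompt would be an equally plausible selection rule; the newest-directory rule is used here only because it keeps the sketch non-interactive.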