Rollup before parallelization

2024-08-14 23:20:29 -04:00
parent 1ba1f14537
commit 6992d5eec0
8 changed files with 2517 additions and 2434 deletions
--- a/workflow/README.md
+++ b/workflow/README.md
@@ -33,7 +33,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.
 * [pl_gtf_terms2tsv](#plgtfterms2tsv)
 * [py_gtf_concat](#pygtfconcat)
 * [r_compile_gtf](#rcompilegtf)
-* [get_studies](#getstudies)
+* [study_info](#studyinfo)
 * [choose_easy_results](#chooseeasyresults)

 ## Notes
@@ -183,7 +183,7 @@ If you wish to install them manually, you can use the following information to d

 #### Perl

-* `cpan File::Map ExtUtils::PkgConfig GD GO::TermFinder`
+* `cpan -I -i File::Map ExtUtils::PkgConfig GD GO::TermFinder`

 #### R

@@ -199,7 +199,7 @@ This module:

 * Initializes a project directory in the scans directory

-TODO 
+:bulb: **TODO** 

 * Copy over source image directories from robot
 * MasterPlate_ file **should not be an xlsx file**, no portability
@@ -207,7 +207,7 @@ TODO
 * But moving forward should switch to csv or something open
 * Do we need to sync a QHTCP template?

-NOTES
+:memo: **NOTES**

 * Copy over the images from the robot and then DO NOT TOUCH that directory except to copy from it
 * Write-protect (read-only) if we need to
@@ -522,12 +522,11 @@ TODO WIP
 System for Multi-QHTCP-Experiment Gene Interaction Profiling Analysis

 * Functional rewrite of REMcMaster3.sh, RemcMaster2.sh, REMcJar2.sh, ExpFrontend.m, mProcess.sh, mFunction.sh, mComponent.sh
-* Added a newline character to the end of StudyInfo.csv so it is a valid text file
+* Added a newline character to the end of the study info file so it is a valid text file

 TODO 

 * Suggest renaming StudiesQHTCP to something like qhtcp, qhtcp_output, or output
-* Store StudyInfo somewhere better
 * Move (hide) the study template somewhere else
 * StudiesArchive should be smarter:
 * Create a database with as much information as possible
@@ -592,7 +591,7 @@ TODO

 #### Arguments

-* **$1** (string): studyInfo file
+* **$1** (string): study info file

 ### gtf

@@ -640,14 +639,14 @@ TODO
 * Is GTAtemplate.R actually a template?
 * Do we need to allow user customization?

-Files
+INPUT

 * [gene_association.sgd](https://downloads.yeastgenome.org/curation/chromosomal_feature/gene_association.sgd)
 * go_terms.tab

-Output
+OUTPUT

-*
+* Average_GOTerms_All.csv

 #### Arguments

@@ -663,11 +662,13 @@ PairwiseLK.R R script

 TODO

-* Should move directory creation from PairwiseLK.R to gta module
+* Move directory creation from PairwiseLK.R to gta module
+* Needs better output filenames and directory organization
+* Needs more for loops to reduce verbosity

-Files
+INPUT

-* 
+* Average_GOTerms_All.csv
 * 

 Output
@@ -684,7 +685,7 @@ This wrapper:

 * **$1** (string): First Exp# name
 * **$2** (string): Second Exp# name
-* **$3** (string): StudyInfo.csv file
+* **$3** (string): study info file
 * **$4** (string): output directory

 ### r_gta_heatmaps
@@ -693,9 +694,10 @@ TSHeatmaps5dev2.R R script

 TODO

-* Script could use rename
-* Script should be refactored to automatically allow more studies
-* Script should be refactored with more looping to reduce verbosity
+* Rename
+* Refactor to automatically allow more studies
+* Refactor with more looping to reduce verbosity
+* Reduce cyclomatic complexity of some of the for loops

 Files

@@ -709,13 +711,13 @@ Output
 This wrapper:

 * The Term Specific Heatmaps are produced directly from the ../ExpStudy/Exp_/ZScores/ZScores_Interaction.csv file generated by the user modified interaction… .R  script. 
-* The heatmap labeling is per the names the user wrote into the StudyInfo.txt spreadsheet.
+* The heatmap labeling is per the names the user wrote into the study info file
 * Verify that the All_SGD_GOTerms_for_QHTCPtk.csv found in ../Code is what you wish to use or if you wish to use a custom modified version.  
 * If you wish to use a custom modified version, create it and modify the TSHeatmaps template script (TSHeatmaps5dev2.R) and save it as a ‘TSH_study specific name’.

 #### Arguments

-* **$1** (string): StudyInfo.csv file
+* **$1** (string): study info file
 * **$2** (string): gene_ontology_edit.obo file
 * **$3** (string): go_terms.tab file
 * **$4** (string): All_SGD_GOTerms_for_QHTCPtk.csv
@@ -737,6 +739,14 @@ TODO
 * Re-enable disabled linter checks
 * Reduce cyclomatic complexity of some of the for loops
 * There needs to be one point of truth for the SD factor
+* Replace most paste() calls with sprintf()
+
+INPUT
+
+* easy/results_std.txt
+
+
+

 NOTES

@@ -744,18 +754,26 @@ NOTES

 #### Arguments

-* **$1** (string): The input directory
+* **$1** (string): The input results_std.txt
 * **$2** (string): The zscores directory
 * **$3** (string): The study info file
 * **$4** (string): SGD_features.tab
-* **$5** (integer): delta SD background value (default: 5)
-* **$6** (integer): experiment number
+* **$5** (integer): experiment number
+* **$6** (integer): delta SD background value (default: 3)

 ### r_join_interactions

 JoinInteractExps3dev.R creates REMcRdy_lm_only.csv and Shift_only.csv

-Output
+TODO
+
+* Needs more loops to reduce verbosity
+
+INPUT
+
+* 
+
+OUTPUT

 * REMcRdy_lm_only.csv
 * Shift_only.csv
@@ -765,7 +783,7 @@ Output

 * **$1** (string): The output directory
 * **$2** (string): The sd value
-* **$3** (string): The studyInfo file
+* **$3** (string): The study info file

 ### java_extract

@@ -785,10 +803,10 @@ NOTE

 #### Arguments

-* **$1** (string): GeneByGOAttributeMatrix_nofiltering-2009Dec07.tab
+* **$1** (string): The output directory
 * **$2** (string): ORF_List_Without_DAmPs.txt
 * **$3** (string): REMcRdy_lm_only.csv
-* **$4** (string): The output directory
+* **$4** (string): GeneByGOAttributeMatrix_nofiltering-2009Dec07.tab
 * **$5** (string): The output file

 #### Exit codes
@@ -805,13 +823,25 @@ and output "REMcWithShift.csv" for use with the REMc heat maps

 * **$1** (string): REMcRdy_lm_only.csv-finalTable.csv
 * **$2** (string): Shift_only.csv
-* **$3** (string): StudyInfo.csv file
-* **$4** (string): The sd value
+* **$3** (string): study info file
+* **$4** (string): sd value

 ### r_create_heat_maps

 Execute createHeatMaps.R

+INPUT
+
+* REMcWithShift.csv
+
+OUTPUT
+
+* compiledREMcHeatmaps.pdf
+
+TODO
+
+* Needs more looping for brevity
+
 #### Arguments

 * **$1** (string): The final shift table (REMcWithShift.csv)
@@ -832,7 +862,9 @@ Execute createHeatMapsAll.R

 Perform python dcon portion of GTF

-Output
+SCRIPT: [DconJG2.py](apps/python/DconJG2.py)
+
+OUTPUT

 * 1-0-0-finaltable.csv

@@ -844,9 +876,13 @@ Output
 ### pl_gtf_analyze

 Perl analyze wrapper
-This seems weird to me because we're just overwriting the same data for all set2 members
-https://metacpan.org/dist/GO-TermFinder/view/examples/analyze.pl
-Is there a reason you need a custom version and not the original from cpan?
+
+SCRIPT: [analyze_v2.pl](https://metacpan.org/dist/GO-TermFinder/view/examples/analyze.pl)
+
+TODO
+
+* Are we just overwriting the same data for all set2 members?
+* Why the custom version?

 #### Arguments

@@ -858,7 +894,10 @@ Is there a reason you need a custom version and not the original from cpan?
 ### pl_gtf_terms2tsv

 Perl terms2tsv wrapper
-Probably should be translated to shell/python
+
+TODO
+
+* Probably should be translated to shell/python

 #### Arguments

@@ -868,7 +907,10 @@ Probably should be translated to shell/python

 Python concat wrapper for GTF
 Concat the process ontology outputs from the /REMcReady_lm_only folder
-Probably should be translated to bash
+
+TODO
+
+* Probably should be translated to bash

 #### Arguments

@@ -883,24 +925,18 @@ Compile GTF in R

 * **$1** (string): gtf output directory

-### get_studies
+### study_info

-Parse study names from StudyInfo.csv files
+Creates, modifies, and parses the study info file

 TODO

-* This whole wrapper should eventually be either
-* Removed
-* Expanded into a file that stores all project/study settings (database)
-* I had to add a newline to the end of StudyInfo.csv, may break things?
-
-#### Arguments
-
-* **$1** (string): Study info file
+* Needs refactoring
+* Ended up combining a few functions into one

 #### Variables set

-* **STUDIES_NUMS** (array): Contains Exp numbers
+* **STUDIES_NUMS** (array): contains Exp numbers

 #### Exit codes