2024-08-13 15:27:53 -04:00
parent 724b292dab
commit f190967383
7 changed files with 367 additions and 298 deletions

@@ -34,6 +34,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.
* [py_gtf_concat](#pygtfconcat)
* [r_compile_gtf](#rcompilegtf)
* [get_studies](#getstudies)
* [choose_easy_results](#chooseeasyresults)
## Notes
@@ -94,7 +95,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.
### parse_input
`--project`, `--module`, `--nomodule`, and `--submodule` can be passed multiple times or with a comma-separated string
`--project`, `--module`, `--nomodule`, and `--wrapper` can be passed multiple times or with a comma-separated string
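As a rough illustration of accepting an option either repeatedly or as a comma-separated string, the sketch below collects values into the `PROJECTS` and `MODULES` arrays. The helper names `append_csv` and `parse_args_sketch` and the sample values (`projA`, `mod1`, ...) are hypothetical, and this is not the project's actual `parse_input` implementation.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the real parse_input.

# Append each comma-separated value in $2 to the array named by $1.
append_csv() {
  local -n _target="$1"            # nameref, requires bash >= 4.3
  local -a _values=()
  IFS=',' read -ra _values <<< "$2"
  _target+=("${_values[@]}")
}

PROJECTS=()
MODULES=()

# Handle only the long --name=value forms, for brevity.
parse_args_sketch() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --project=*) append_csv PROJECTS "${arg#*=}" ;;
      --module=*)  append_csv MODULES  "${arg#*=}" ;;
    esac
  done
}

# --project is passed twice, once with a comma-separated string.
parse_args_sketch --project=projA --project=projB,projC --module=mod1,mod2
printf '%s\n' "${PROJECTS[@]}"     # prints projA, projB, projC on separate lines
```

Both calling styles end up in the same array, so later code can loop over `PROJECTS` without caring how the values were supplied.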
#### Options
@@ -106,9 +107,9 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.
One or more modules to run (default: all), can be passed multiple times or with a comma-separated string
* **-s\<value\>** | **--submodule=\<value\>**
* **-w\<value\>** | **--wrapper=\<value\>**
Requires two arguments: the name of the submodule and its arguments, can be passed multiple times
Requires two arguments: the name of the wrapper and its arguments, can be passed multiple times
* **-n\<value\>** | **--nomodule=\<value\>**
@@ -134,7 +135,7 @@ Insert a general description of Q-HTCP and the Q-HTCP process here.
* **PROJECTS** (array): List of projects to cycle through
* **MODULES** (array): List of modules to run on each project
* **SUBMODULES** (array): List of submodules and their arguments to run on each project
* **WRAPPERS** (array): List of wrappers and their arguments to run on each project
* **EXCLUDE_MODULES** (array): List of modules not to run on each project
* **DEBUG** (int): Turn debugging on
* **YES** (int): Turn assume yes on
@@ -147,10 +148,10 @@ Use a module to:
* Build a new type of analysis from scratch
* Generate project directories
* Group multiple submodules (and modules) into a larger task
* Dictate the ordering of multiple submodules
* Competently handle pushd and popd for their submodules if they do not reside in the SCANS/PROJECT_DIR
* Call their submodules with the appropriate arguments
* Group multiple wrappers (and modules) into a larger task
* Dictate the ordering of multiple wrappers
* Competently handle pushd and popd for their wrappers if they do not reside in the SCANS/PROJECT_DIR
* Call their wrappers with the appropriate arguments
### install_dependencies
@@ -204,6 +205,7 @@ TODO
* MasterPlate_ file **should not be an xlsx file**, no portability
* We can keep the existing xlsx code for old style fallback
* But moving forward should switch to csv or something open
* Do we need to sync a QHTCP template?
NOTES
@@ -218,6 +220,15 @@ NOTES
Run the EASY matlab program
INPUT FILES
* MasterPlate_.xls
* DrugMedia_.xls
OUTPUT FILES
* !!ResultsStd_.txt
* !!ResultsELr_.txt
TODO
* Don't create output in the scans folder, put it in an output directory
@@ -612,11 +623,9 @@ TODO
* **$5** (string): All_SGD_GOTerms_for_QHTCPtk.csv
* **$6** (string): zscores_interaction.csv
## Submodules
## Wrappers
Submodules are shell wrappers for workflow components in external languages
Submodules:
Wrappers:
* Allow scripts to be called by the main workflow script using input and output arguments as a translation mechanism.
* Only run by default if called by a module.
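
A minimal sketch of what such a wrapper might look like is shown below. The function name `wrap_example`, the script name `example.R`, and the use of `DEBUG` as a dry-run switch are assumptions made for illustration; they are not taken from the workflow script.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical wrapper that translates shell arguments
# into a call to an external-language script.
wrap_example() {
  local input_dir="$1"
  local output_file="$2"
  local script="${3:-example.R}"   # hypothetical script name

  # Build the command as an array so paths with spaces survive intact.
  local -a cmd=(Rscript "$script" "$input_dir" "$output_file")

  # Assumption: when DEBUG is non-zero, print the command instead of running it.
  if (( DEBUG )); then
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}
```

A module would then call it with the appropriate arguments, for example `wrap_example "$input_dir" "$output_file"`, and could group several such calls in a fixed order.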
@@ -665,7 +674,7 @@ Output
*
This submodule:
This wrapper:
* Will perform both L and K comparisons for the specified experiment folders.
* The code uses the naming convention PairwiseCompare_Exp#-Exp# to keep the structural naming standard and simple (where X is either K or L and Y is the number of the experiment GTA results to be found in ../GTAresult/Exp_).
@@ -697,7 +706,7 @@ Output
*
This submodule:
This wrapper:
* The Term Specific Heatmaps are produced directly from the ../ExpStudy/Exp_/ZScores/ZScores_Interaction.csv file generated by the user modified interaction… .R script.
* The heatmap labeling is per the names the user wrote into the StudyInfo.txt spreadsheet.
@@ -716,11 +725,18 @@ This submodule:
### r_interactions
Run the R interactions analysis (Z_InteractionTemplate.R)
Run the R interactions analysis (deprecates Z_InteractionTemplate.R)
SCRIPT: interactions.R
TODO
* Don't want to rename Z_InteractionTemplate.R because that will break logic, just edit in place instead
* Parallelization (need to consult with Sean)
* Needs more loops to reduce verbosity, but don't want to limit flexibility
* Replace 1:length() with seq_along()
* Re-enable disabled linter checks
* Reduce cyclomatic complexity of some of the for loops
* There needs to be one point of truth for the SD factor
NOTES
@@ -729,10 +745,11 @@ NOTES
#### Arguments
* **$1** (string): The input directory
* **$2** (string): The study info file
* **$3** (string): The zscores directory
* **$2** (string): The zscores directory
* **$3** (string): The study info file
* **$4** (string): SGD_features.tab
* **$5** (integer): delta SD background value (default: 5)
* **$6** (integer): experiment number
### r_join_interactions
@@ -826,7 +843,7 @@ Output
### pl_gtf_analyze
Perl analyze submodule
Perl analyze wrapper
This seems weird to me because we're just overwriting the same data for all set2 members
https://metacpan.org/dist/GO-TermFinder/view/examples/analyze.pl
Is there a reason you need a custom version and not the original from cpan?
@@ -840,7 +857,7 @@ Is there a reason you need a custom version and not the original from cpan?
### pl_gtf_terms2tsv
Perl terms2tsv submodule
Perl terms2tsv wrapper
Probably should be translated to shell/python
#### Arguments
@@ -849,7 +866,7 @@ Probably should be translated to shell/python
### py_gtf_concat
Python concat submodule for GTF
Python concat wrapper for GTF
Concat the process ontology outputs from the /REMcReady_lm_only folder
Probably should be translated to bash
@@ -872,14 +889,14 @@ Parse study names from StudyInfo.csv files
TODO
* This whole submodule should eventually be either
* This whole wrapper should eventually be either
* Removed
* Expanded into a file that stores all project/study settings (database)
* I had to add a new line to the end of StudyInfo.csv, may break things?
#### Arguments
* **$1** (string): File to read
* **$1** (string): Study info file
#### Variables set
@@ -890,3 +907,21 @@ TODO
* **0**: If one or more studies found
* **1**: If no studies found
### choose_easy_results
Chooses an EASY scans directory if the information is undefined
TODO Standardize EASY output, it's hard to understand
TODO eventually we could run this on multiple results dirs simultaneously with some refactoring
#### Arguments
* **$1** (string): directory containing EASY results dirs
#### Variables set
* **EASY_RESULTS_DIR** (string): The working EASY output directory
#### Exit codes
* **0**: If an EASY results dir was successfully chosen
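
The selection logic is not described here, so the following is only a sketch under stated assumptions: it keeps `EASY_RESULTS_DIR` if it is already defined, otherwise picks the last subdirectory in sorted glob order, and returns a non-zero status when none exists. The function name `choose_easy_results_sketch`, the selection rule, and the failure code are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical stand-in for choose_easy_results.
# $1: directory containing EASY results dirs
# Sets EASY_RESULTS_DIR if it is not already defined.
choose_easy_results_sketch() {
  local base="$1"
  local dir
  local -a dirs=()

  # Keep a previously defined value untouched.
  if [[ -n "${EASY_RESULTS_DIR:-}" ]]; then
    return 0
  fi

  # Collect subdirectories; glob expansion is returned in sorted order.
  for dir in "$base"/*/; do
    [[ -d "$dir" ]] && dirs+=("${dir%/}")
  done

  # Assumption: signal failure with a non-zero status when nothing is found.
  (( ${#dirs[@]} > 0 )) || return 1

  # Assumption: choose the last entry in sorted order.
  EASY_RESULTS_DIR="${dirs[${#dirs[@]}-1]}"
}
```

A caller could use it as `choose_easy_results_sketch "$results_parent" && echo "$EASY_RESULTS_DIR"`, where `results_parent` is a placeholder for the directory passed as `$1`.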