表型-微生物-代谢物组学关联分析

### 
#Pedersen, H.K., Forslund, S.K., Gudmundsdottir, V. et al. A computational #framework to integrate high-throughput ‘-omics’ datasets for the identification #of potential mechanistic links. Nat Protoc 13, 2781–2800 (2018). #https://doi.org/10.1038/s41596-018-0064-z
### Procedure for three-pronged host-microbiome analysis pipeline
###
### This R script essentially reproduces results from the analysis of MetaHIT
### data in Pedersen, H. K. et al. 'Human gut microbes impact host serum
### metabolome and insulin sensitivity'. Nature 535, 376–381 (2016)., and is
### intended to allow adaptations for analogous work on other datasets.
###
### The sections in this script corresponds to procedural sections in the parent
### protocol: 'A computational framework for conducting a three-pronged
### association study to integrate high-throughput ‘-omics’ datasets for the
### identification of potential mechanistic links'.
### It was adapted from initial scripts by Helle Krogh Pedersen (HKP), Valborg
### Gudmundsdottir (VG) and Henrik Bjørn Nielsen (BN) by HKP, Sofia K. Forslund
### (SKF), VG and Anders Petersen (AP). 
### Last version 26/07/2018.
###

###
### Step 1 - Set working directory
###
### The code will look for input files and deposit output files in
### subdirectories relative to a main directory. On your computer, create a
### directory for the analysis, to remain generic (and without assumption on
### user operating system) we here call it top/. Then create subdirectories to
### obtain the following directory hierarchy:
###
###   top/ 
###     r-code/   // containing this script and any others
###     data/     // containing input files (i.e. all files (unzipped) from the 'example_input.zip'-file).
###     results/  // location of output files
###
### Copy all R-code from the git repository to your top/r-code/ subdirectory (i.e.
### all .R files). Then open the main R-script called protocol_main.R in e.g.
### RStudio. 
### Finally, modify the following command to set the working directory to
### your top/ directory: setwd(“~/top”)
###

#setwd ("~/top") ### Change the path to your main working directory. 

###
### Step 2 - Ensure availability of software packages and satisfaction of
### dependencies
###
### The code in this section simply makes all script-wise dependencies and
### settings available in the present workspace. The analysis was tested using R
### 3.3.3. If packages are available under another version, it should run, but
### specifics of the implementation of each package may change results slightly.
### The provided r-code completes in ~1 hour on a MacBook Pro (2.9GHz quad- core
### 7th-generation Intel Core i7 processor, 16GB 2133MHz LPDDR3 memory) and can
### be paused at any point.
###
### Ensure the following packages (including their indirect dependencies other
### packages needed for their compilation and operation, some of which should be
### installed via Bioconductor) are installed and loaded correctly correctly (in
### every case available via the built-in R and Bioconductor package managers).
###

library (xlsx) ### Saving to spreadsheet
library (data.table) ### Fast read of large files into R
library (WGCNA) ### -	Clustering software. Previously reported work done using v1.34
library (flashClust) ### Clustering software
library (ppcor) ### Partial Spearman correlations, for confounder analysis. Previously reported work done using v1.0
library (gplots) ### Plotting
library (cowplot) ### Plotting; to arrange several plots on the same page
library (ggplot2) ### Plotting
library (plyr) ### Data transformations

### If you have problems installing some packages (e.g. WGCNA), it is likely 
### because you are missing one or more of the Bioconductor packages. 
### First install the dependencies with the code below, then reattempt the 
### installation of WGCNA
# source("https://bioconductor.org/biocLite.R")
# biocLite("impute")
# biocLite("GO.db")
# install.packages("WGCNA)

### The script was testing using the following version of packages:
# R version 3.3.3 (2017-03-06)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# Running under: macOS Sierra 10.12.6
# 
# locale:
# [1] da_DK.UTF-8/da_DK.UTF-8/da_DK.UTF-8/C/da_DK.UTF-8/da_DK.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] plyr_1.8.4            cowplot_0.9.1         ggplot2_2.2.1         gplots_3.0.1          ppcor_1.1            
# [6] MASS_7.3-47           flashClust_1.01-2     WGCNA_1.61            fastcluster_1.1.24    dynamicTreeCut_1.63-1
# [11] data.table_1.10.4-3  xlsx_0.5.7            xlsxjars_0.6.1        rJava_0.9-9          
# 
# loaded via a namespace (and not attached):
# [1] B iobase_2.34.0        bit64_0.9-7           splines_3.3.3         foreach_1.4.3         gtools_3.5.0         
# [6]  Formula_1.2-2         stats4_3.3.3          latticeExtra_0.6-28   blob_1.1.0            fit.models_0.5-14    
# [11] yaml_2.1.14           robustbase_0.92-8     impute_1.48.0         RSQLite_2.0           backports_1.1.1      
# [16] lattice_0.20-35       digest_0.6.12         RColorBrewer_1.1-2    checkmate_1.8.5       colorspace_1.3-2     
# [21] htmltools_0.3.6       preprocessCore_1.36.0 Matrix_1.2-12         pcaPP_1.9-72          pkgconfig_2.0.1      
# [26] GO.db_3.4.0           mvtnorm_1.0-6         scales_0.5.0          gdata_2.18.0          htmlTable_1.11.2     
# [31] tibble_1.3.4          IRanges_2.8.2         nnet_7.3-12           BiocGenerics_0.20.0   lazyeval_0.2.1       
# [36] survival_2.41-3       magrittr_1.5          memoise_1.1.0         doParallel_1.0.11     foreign_0.8-69       
# [41] tools_3.3.3           matrixStats_0.52.2    stringr_1.2.0         S4Vectors_0.12.2      munsell_0.4.3        
# [46] cluster_2.0.6         AnnotationDbi_1.36.2  caTools_1.17.1        rlang_0.1.4           grid_3.3.3           
# [51] iterators_1.0.8       rstudioapi_0.7        htmlwidgets_0.9       robust_0.4-18         labeling_0.3         
# [56] bitops_1.0-6          base64enc_0.1-3       gtable_0.2.0          codetools_0.2-15      DBI_0.7              
# [61] reshape2_1.4.2        rrcov_1.4-3           gridExtra_2.3         knitr_1.17            bit_1.1-12           
# [66] Hmisc_4.1-1           KernSmooth_2.23-15    stringi_1.1.6         parallel_3.3.3        Rcpp_0.12.13         
# [71] rpart_4.1-11          acepack_1.4.1         DEoptimR_1.0-8       

###
### Step 3 - Import input files
### 
### The following input data is assumed to exist within the top/data subdirectory:
###
### Data-files: 
### -	phenotypes.tab: 
###   File with clinical phenotypes (columns) per
###   individual (rows). Used to test for associations with, or for confounder
###   analysis. Individuals are labeled ‘idv’ followed by a number, e.g. ‘idv001’.
###
### -	metabolomic.tab / lipidomic.tab: 
###   Input data matrix for abundance of 325
###   polar metabolites or 876 molecular lipids per individual. Note that no
###   additional normalization is done in this script, so data is assumed to be
###   comparable in these regards. Such different data types are eventually merged
###   into a single set of metabolite cluster abundances. Individual
###   metabolite/lipids are named M or L (for specifying a polar metabolite or
###   molecular lipid, respectively), followed by a number and lastly the
###   annotation or ‘unknown’ in case of unannotated metabolites/lipids, e.g.
###   ‘M_20_Valine’.
###
### -	MGS_abundance.tab: 
###   File with the abundance (e.g. median gene abundance) of
###   MGSs (columns) per individual (rows). These are assumed to have been
###   rarefied to comparable depth or otherwise normalized. For historical
###   reasons, the MGSs are labeled ‘T2DCAG’ followed by a number, e.g.
###   ‘T2DCAG00001’.
###
### -	KO_abundance.tab: 
###   File with the abundance of each KO (columns) per
###   individual (rows). The data is assumed to be rarefied to comparable depth or
###   otherwise normalized. For this, the software tool rtk58 can be used.
###
### - gene_abundance_sub.tab: 
###   File with the abundance of each catalog gene
###   (subset-version) in each individual assumed to be rarefied to comparable
###   depth or otherwise normalized.
### 
### Annotation-files:
### -	cluster_mapping_file.tab: 
###   Input file with annotation for metabolite
###   clusters, as available from curation of data in the specific dataset. The
###   WGCNA clustering algorithm names the generated clusters with color codes.
###   This mapping file simply facilitate renaming to more meaningful cluster
###   descriptions. Here, the serum polar metabolite and serum molecular lipid
###   clusters are labelled M01–M35 and L01–L39, respectively, and collectively
###   termed metabolite clusters.
###
### -	MGS_taxonomy.tab: 
###   File with taxonomic annotation of the MGSs (rows) used
###   in the analysis. Each row contains the following information:
###   species_taxonomy, species_pct, genus_taxonomy, genus_pct, family_taxonomy,
###   family_pct, order_taxonomy, order_pct, phylum_taxonomy, phylum_pct, where
###   x_pct is the percentage of the MGS genes that can be annotated (by sequence
###   similarity) to the taxonomy of the MGS. If no taxonomy can be assigned to
###   the MGS, the value will be NA.
###
### -	KEGG_modules.tab: 
###   File containing definition of KEGG gene functional
###   modules (rows) specifying which KO gene groups constitute each KEGG module.
###   The first two columns contain KEGG module entry (number) and name, the third
###   column list all KOs separated by semicolon. Any other functional annotation
###   used analogously could be swapped in instead of the name. This file can be
###   obtained by downloading KEGG modules from
###   www.genome.jp/kegg-bin/get_htext?ko00002.keg (www.genome.jp/kegg/ --> KEGG
###   MODULE --> KEGG modules); download htext and then running the provided
###   script ‘parse_kegg.pl’ (after changing input filename).
###
### -	KO_to_MGS.tab: 
###   File listing for each KO (rows) the MGSs (space-separated)
###   it is a member of. This is based on the KO annotation of the gene catalog
###   (gene_to_KO.tab) and the information, what gene from the gene catalog is
###   within each MGS as given in the list MGS_to_gene.tab.
###
### -	gene_to_KO.tab: 
###   File containing the KO annotation (if any) per gene (rows)
###   in the gene catalog (constituting 7,328,469 genes). Genes are labeled
###   ‘RefCat620’ followed by a number (1…7328469), e.g. ‘RefCat620.1’. Used to
###   create KO_to_MGS.tab. Note, for the purpose of this protocol and to
###   considerably reduce size of input data, only the subset of catalogue genes
###   with KO annotation (n = 2,205,769) are provided in files with gene abundance
###   or annotation (gene_to_KO.tab, MGS_to_gene.tab and gene_abundance_sub.tab)
###   as only those genes are used in the driver-species analysis.
###
### -	MGS_to_gene.tab: 
###   List of genes binned into a given MGS (rows). Used to
###   create KO_to_MGS.tab. Rather than gene names the file contains the index
###   value of the gene. i.e. position of the gene in the gene catalogue
###   (subset-version, thus 1…2205769).
###
### All demonstration files are tab-delimited text files, but other formats
### would equally work after modifying the respective file-import commands in
### the R-script. 
###
### For all demonstration data, pseudonymised sample names were re-randomised to
### generate anonymised data.
###

options (stringsAsFactors = FALSE)

### - phenotypes.tab

phenotypes = read.table (file = "data/phenotypes.tab", row.names = 1, header = T, sep = "\t")
ctrl = rownames (subset (phenotypes, Diabetes == "nonDiabetic")) ### set of control samples for analyses restricted to non-diabetic individuals.

### - metabolomic.tab

metabolomic = read.table (file = "data/metabolomic.tab", row.names = 1, header = T, sep = "\t")

### - lipidomic.tab

lipidomic = read.table (file = "data/lipidomic.tab", row.names = 1, header = T, sep = "\t")

### - cluster_mapping_file.tab

cluster_mapping_file = read.table (file = "data/cluster_mapping_file.tab", header = T, sep = "\t", row.names = 1)
cluster_mapping_file$label =  sapply (rownames (cluster_mapping_file),  function (x) paste (cluster_mapping_file [x, "New_Name"], cluster_mapping_file [x, "Description"], sep = ": "))

### - MGS_abundance.tab

mgs_abundance = read.table (file = "data/MGS_abundance.tab", sep = "\t", row.names = 1, header = T)

### - MGS_taxonomy.tab

mgs_taxonomy  = read.table (file = "data/MGS_taxonomy.tab", sep = "\t", row.names = 1, header = T)

### - KEGG_modules.tab

tmp = read.table ("data/KEGG_modules.tab", sep = "\t")
koann = strsplit (tmp[,3], split = ";")
names (koann) = tmp[,1]

module_mapping = tmp[,2] ### description of Kegg modules
names (module_mapping) = tmp[,1] ; rm (tmp)

### remove KEGG references in square brackets for more clean names for plotting
module_mapping_clean = sapply(module_mapping, function(x) strsplit(x, " \\[")[[1]][1])

### - KO_abundance.tab

ko_abundance = read.table (file = "data/KO_abundance.tab", sep = "\t", row.names = 1, header = T)

### - KO_to_MGS.tab

tmp = read.table ("data/KO_to_MGS.tab", sep = "\t", strip.white = T)
KO2MGS = strsplit (tmp[,2], split = " ")
names (KO2MGS) = tmp[,1] ; rm (tmp)

### - gene_to_KO.tab

gene2KO = read.table ("data/gene_to_KO.tab", sep = "\t", row.names = 1, header = T)
KO2gene = tapply (1:nrow (gene2KO), gene2KO[,1], c) ; rm (gene2KO)

### - MGS_to_gene.tab

tmp = read.table ("data/MGS_to_gene.tab", sep = "\t", strip.white = T)
MGS2gene = strsplit (tmp[,2], split = " ")
names (MGS2gene) = tmp[,1] ; rm (tmp)

### - gene_abundance_sub.tab

gene_abundance_sub = data.frame( fread ("data/gene_abundance_sub.tab", sep = "\t", header = T), row.names = 1)

### 
### Step 4 - Preprocessing/cleanup of loaded data for sparsity and domain
### limitation
###
### The following commands restrict the input data to account for method
### limitations.
###

### MGS sparsity filter step - exclude MGSs that occur in <3 of the control individuals
test = apply (mgs_abundance [intersect (ctrl, rownames (mgs_abundance)),], 2, function(x) length (x [x != 0])) 
incl = names (test [test >= 3])
length (incl)
MGSs = incl
mgs_abundance = mgs_abundance [,incl]
mgs_taxonomy = mgs_taxonomy [incl,]
rm (test, incl)
### MGS sparsity filter step done

### Filtering out KEGG modules that are eukaryotic only
euk = unique (c ("M00352", "M00355", "M00354", "M00285", "M00295", "M00341", 
                 "M00177", "M00160", "M00359", "M00391", "M00182", "M00340", 
                 "M00359", "M00182", "M00160", "M00177", "M00391", "M00180",  ### "eukaryotes" in name
                 "M00351", "M00352", "M00355", "M00354", "M00353", "M00427")) ### spliceosome or nuclear functions
incl = setdiff (names (koann), euk)
koann = koann [incl] ; rm (incl)
### Eukaryote filtering done

### KO sparsity filter step - exclude KOs that occur in <3 of the control individuals
test = apply (ko_abundance [intersect (ctrl, rownames (ko_abundance)),], 2, function (x) length (x [x != 0]))  
incl = names (test [test >= 3])
length (incl)
ko_abundance = ko_abundance [, incl]
rm (test, incl)
### KO sparsity filter step done

### Determine control sample set and ensure there are no missing values for
### phenotypes tested, as algorithms used cannot handle this well.

ctrl.no.na = ctrl [! is.na (phenotypes [ctrl, "Homa.IR"])]

###
### Step 5 - Identify clusters of polar metabolites
###
### A central part of the protocol is dimensionality reduction of the high-
### dimensional input data. Here the space of metabolite measurements is reduced
### by identifying clusters of co-abundant metabolite peaks, analogously to the
### identification of MGSs as co-abundant gene groups. This is done using
### algorithms developed for gene expression analysis, specifically WGCNA.
###
### Importantly, effective WGCNA performance depends on parameter settings for
### cluster reconstruction which will vary to some extent between datasets.
### Establishing these for a particular dataset requires some inspection of
### resulting clustering behavior under parameter exploration. The full
### procedure for this lies beyond the scope of the present Protocols, but the
### reader is referred to the WGCNA main documentation.
###
### Thus, for optimal performance, the steps below should be performed for
### different values and the resulting curves inspected, hereafter parameters
### should be specified as appropriate. The examples below corresponds to
### optimal choices for the dataset and analysis used in Pedersen et al., 2016.
###
### In this step, WGCNA clustering is performed separately on lipidomic and
### metabolomic measurements to detect cluster of densely connected
### metabolites/lipids. The metabolite/lipid profiles constituting a given
### cluster are summarized by the 1st principal component of the
### metabolite/lipid abundance matrix (‘Module Eigen-metabolite/lipid’); i.e.
### basically a weighted average abundance profile. Prior to this, optimal
### parameters for WGCNA should be established for the dataset being analyzed.
###

### Settings for WGCNA generally
cor_method          = "spearman" ### for association with clinical parameters
corFun_tmp          = "bicor"
cluster_method      = "average"
corOptions_list     = list (use = 'pairwise.complete.obs') 
corOptions_str      = "use = 'pairwise.complete.obs'"
BH_pval_asso_cutoff = 0.05
NetworkType         = "signed" ### Signed-network (as the PC1 and median profile does not make sense as a summary measure of a cluster with anticorrelated metabolites.)

### Specify data and parameters 
ID_common_all_samples_meta = intersect (rownames (phenotypes), rownames (metabolomic)) ### only include individuals with information for both domains.
dat = as.data.frame (metabolomic [ID_common_all_samples_meta,])
dat_tmp = log2 (dat) ### work in logarithmic space

### Settings for WGCNA on polar metabolite measurements
### Once these are established the steps below can be run
RsquareCut_val      = 0.89 ### usually ranges 0.80-0.95 but requires inspecting curves
mergingThresh       = 0.20 ### Maximum dissimilarity of module eigengenes (i.e. 1-correlation) for merging modules.
minModuleSize       = 3 ### minimum number of metabolites constituting a cluster
SoftPower           = 13 ### beta-value, main parameter to optimize

### Calculate weighted adjacency matrix
A = adjacency (dat_tmp, power = SoftPower, type = NetworkType, corFnc = corFun_tmp, corOptions = corOptions_str)
colnames (A) = rownames (A) = colnames (dat_tmp)
### Define dissimilarity based on topological overlap
dissTOM = TOMdist (A, TOMType = NetworkType)
colnames (dissTOM) = rownames (dissTOM) = colnames (dat_tmp)
### Hierarchical clustering
metaTree = flashClust (as.dist (dissTOM), method = cluster_method)
### Define modules by cutting branches
moduleLabels1 = cutreeDynamic (dendro = metaTree, distM = dissTOM, method = "hybrid", deepSplit = 4, pamRespectsDendro = T, minClusterSize = minModuleSize)
moduleLabels1 = labels2colors (moduleLabels1)
### Automatically merge highly correlated modules
merge = mergeCloseModules (dat_tmp, moduleLabels1, corFnc = corFun_tmp, corOptions = corOptions_list, cutHeight = mergingThresh)
### Determine resulting merged module colors
moduleLabels2 = merge$colors
### Establish eigengenes of the newly merged modules, used for cluster overall abundances
MEs = merge$newMEs
### Choose final module assignments
moduleColorsMeta = moduleLabels2
names (moduleColorsMeta) = colnames (dat_tmp)
MEsMeta = orderMEs (MEs)
rownames (MEsMeta) = rownames (dat_tmp)

### Determine relevant descriptive statistics of established clusters
### kIN: within-module connectivity, determined by summing connectivity with all
###      other metabolites in the given cluster.
### kME: bicor-correlation between the metabolite profile and module eigenvector; 
### both measures of intramodular hub-metabolite status.
kIN <-      vector (length = ncol (dat_tmp)); names (kIN) = colnames (dat_tmp)
kME <-      vector (length = ncol (dat_tmp)); names (kME) = colnames (dat_tmp)
modules <-  vector (length = ncol (dat_tmp)); names (modules) = colnames (dat_tmp)

for (module in names (table (moduleColorsMeta))) {   

	all.metabolites = names (dat_tmp)
	inModule = (moduleColorsMeta == module)
	module.metabolites = names (moduleColorsMeta [inModule])
  modules [module.metabolites] = module 
	kIN [module.metabolites] = sapply (module.metabolites, function (x) sum (A [x, module.metabolites]) - 1)
  datKME = signedKME (dat_tmp, MEsMeta, corFnc = corFun_tmp, corOptions = corOptions_str)
	rownames (datKME) = colnames (dat_tmp)
	kME [module.metabolites] = datKME [module.metabolites, paste ("kME", module, sep = "")]   

}
output = data.frame ("module" = modules, "kME" = kME, "kIN" = kIN, "cluster_name" = sapply (modules, function (m) cluster_mapping_file [paste0 ("M_ME", m), "New_Name"]))

###
### Step 6 - Link individual polar metabolites to phenotype of interest
###
### The dimensionality reduction approach (detailed in Stage 3 (Step 8-12))
### hinges on identifying features (e.g. metabolites and lipids clusters, etc.)
### that are associated with a host phenotype of interest. In the example work
### we describe here, this was insulin resistance as assessed by the HOMA-IR
### measurement.
###
### In this step, for reference, the individual polar metabolites are linked
### with HOMA-IR, both directly and under adjustment for a potential confounder
### variable. In this case, such de-confounding was done for body mass index
### (BMI). In case of a binary phenotype variable, one can, for example,
### substitute the Spearman correlation test with a Mann-Whitney U (MWU) test.
###

tmpMat = array (NA, c (ncol (metabolomic), 2, 2))
dimnames (tmpMat) [[1]] = colnames (metabolomic)
dimnames (tmpMat) [[2]] = c ("HOMA.ir", "HOMA.ir_BMI.adj.partial") 
dimnames (tmpMat) [[3]] = c ("estimate", "p.value")

### Associating individual polar metabolites with HOMA-IR without de-confounding
### for BMI
tmpMat [, "HOMA.ir", c ("estimate", "p.value")] =
	t (apply (metabolomic [ctrl.no.na,], MARGIN = 2, FUN = function (x) 
	  unlist (cor.test (phenotypes [ctrl.no.na, "Homa.IR"], x,
		method = cor_method, use = "pairwise.complete.obs")[c ("estimate", "p.value")])))       

### Associating individual polar metabolites with HOMA-IR while de-confounding
### for BMI
tmpMat [, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")] =
	t (apply (metabolomic [ctrl.no.na,], MARGIN = 2, FUN = function (x) 
	  unlist (pcor.test (x = x, y = phenotypes [ctrl.no.na, "Homa.IR"], z = phenotypes [ctrl.no.na, "BMI.kg.m2"],
		method = cor_method) [c ("estimate", "p.value")])))       

### Sort by cluster_name and then decreasing values of kIN
output2 = cbind (tmpMat[, "HOMA.ir", c ("estimate", "p.value")], "p.adjust" = p.adjust (tmpMat [, "HOMA.ir", "p.value"], method = "BH"), 
                 tmpMat[, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")], "p.adjust" = p.adjust (tmpMat [, "HOMA.ir_BMI.adj.partial", "p.value"], method = "BH"))
colnames (output2) = paste (rep (c ("HOMA.ir", "HOMA.ir_BMI.adj.partial"), each = 3), colnames (output2), sep = "_")
output3 = cbind (output, output2)
output3 = output3 [with (output3, order (cluster_name, -kIN)), ]

### Write to file
write.table (output3, file = "results/individual_metabolites.txt", sep = "\t", col.names = NA, quote = F, row.names = T)
rm (output, output2, output3, tmpMat, dat, dat_tmp)

###
### Step 7a - Identify clusters of molecular lipid
###
### Analogous to Step 5. First identify optimal parameters for WGCNA, then
### establish clusters.
###

### Specify data and parameters
dat = as.data.frame (lipidomic [rownames (phenotypes),])
dat_tmp = log2 (dat)

### Settings for WGCNA on molecular lipids measurements
### Once these are established the steps below can be run
RsquareCut_val = 0.90
mergingThresh = 0.25
minModuleSize = 5
SoftPower = 14

### Calculate weighted adjacency matrix
A = adjacency (dat_tmp, power = SoftPower, type = NetworkType, corFnc = corFun_tmp, corOptions = corOptions_str)
### Define dissimilarity based on topological overlap
dissTOM = TOMdist (A, TOMType = NetworkType)
### Hierarchical clustering
lipidTree = flashClust (as.dist (dissTOM), method = cluster_method)
### Define modules by cutting branches
moduleLabels1 = cutreeDynamic (dendro = lipidTree, distM = dissTOM, method = "hybrid", deepSplit = 4, pamRespectsDendro = T, minClusterSize = minModuleSize)
moduleLabels1 = labels2colors (moduleLabels1)
### Automatically merge highly correlated modules
merge = mergeCloseModules (dat_tmp, moduleLabels1, corFnc = corFun_tmp, corOptions = corOptions_list, cutHeight = mergingThresh)
### Determine resulting merged module colors
moduleLabels2 = merge$colors
### Establish eigengenes of the newly merged modules, used for cluster overall
### abundances
MEs = merge$newMEs
### Choose module assignments
moduleColorsLipid = moduleLabels2
names (moduleColorsLipid) = colnames (dat_tmp)
MEsLipid = orderMEs (MEs)
rownames (MEsLipid) = rownames (dat_tmp)

### Determine relevant descriptive statistics of established clusters
kIN <-      vector (length = ncol (dat_tmp)); names (kIN) = colnames (dat_tmp)
kME <-      vector (length = ncol (dat_tmp)); names (kME) = colnames (dat_tmp)
modules <-  vector (length = ncol (dat_tmp)); names (modules) = colnames (dat_tmp)
 
for (module in names (table (moduleColorsLipid))) {   

	all.lipids = names (dat_tmp)
	inModule = (moduleColorsLipid == module)
	module.lipids = names (moduleColorsLipid [inModule])
  modules [module.lipids] = module 
	kIN [module.lipids] = sapply (module.lipids, function (x) sum (A [x, module.lipids]) -1)
	datKME = signedKME (dat_tmp, MEsLipid, corFnc = corFun_tmp, corOptions = corOptions_str)
	rownames (datKME) = colnames (dat_tmp)
	kME [module.lipids] = datKME [module.lipids, paste ("kME", module, sep = "")]   

}
output = data.frame ("module" = modules, "kME" = kME, "kIN" = kIN, "cluster_name" = sapply (modules, function (l) cluster_mapping_file [paste0 ("L_ME", l), "New_Name"]))

###
### Step 7b - Link individual molecular lipids to phenotype of interest
###
### Analogous to Step 6.
### The resulting metabolite and lipid clusters are thereafter merged into a
### combined dataset, collectively termed 'metabolite clusters', for downstream
### analyses.
###

tmpMat = array (NA, c (ncol (lipidomic), 2,  2))
dimnames (tmpMat) [[1]] = colnames (lipidomic)
dimnames (tmpMat) [[2]] = c ("HOMA.ir", "HOMA.ir_BMI.adj.partial") 
dimnames (tmpMat) [[3]] = c ("estimate", "p.value")

### Associating individual molecular lipids with HOMA-IR without de-confounding
### for BMI
tmpMat [, "HOMA.ir", c ("estimate", "p.value")] =
	t (apply (lipidomic [ctrl.no.na,], MARGIN = 2, FUN = function (x) 
	  unlist (cor.test (phenotypes [ctrl.no.na, "Homa.IR"], x,
		method = cor_method, use = "pairwise.complete.obs") [c ("estimate", "p.value")])))       

### Associating individual molecular lipids with HOMA-IR while de-confounding
### for BMI
tmpMat[, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")] =
	t (apply (lipidomic [ctrl.no.na,], MARGIN = 2, FUN = function (x) 
	  unlist (pcor.test (x = x, y = phenotypes [ctrl.no.na, "Homa.IR"], z = phenotypes [ctrl.no.na, "BMI.kg.m2"],
		method = cor_method) [c ("estimate", "p.value")])))       

### Sort by cluster_name and then decreasing values of kIN
output2 = cbind (tmpMat[, "HOMA.ir", c ("estimate", "p.value")], "p.adjust" = p.adjust (tmpMat [, "HOMA.ir", "p.value"], method = "BH"), 
                 tmpMat[, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")], "p.adjust" = p.adjust (tmpMat [, "HOMA.ir_BMI.adj.partial", "p.value"], method = "BH"))
colnames (output2) = paste (rep (c ("HOMA.ir", "HOMA.ir_BMI.adj.partial"), each = 3), colnames (output2), sep = "_")
output3 = cbind (output, output2)
output3 = output3 [with (output3, order (cluster_name, -kIN)), ]

### Write to file
write.table (output3, file = "results/individual_lipids.txt", sep = "\t", col.names = NA, quote = F, row.names = T)
rm (output, output2, output3, tmpMat, dat, dat_tmp)

### Create a joint data frame of metabolite/lipid cluster eigengene
### equivalents/effective abundances ('MEsMetLip')
MEsMetLip = cbind (MEsMeta [rownames (MEsMeta),], MEsLipid [rownames (MEsMeta),])
### Rename module names (columnames) from 'colors' to numbers
colnames (MEsMetLip) = cluster_mapping_file [c (paste0 ("M_", colnames (MEsMeta)), paste0 ("L_", colnames (MEsLipid))), "New_Name"]
### Exclude the two bin-clusters (i.e. "M_remaining" and "L_remaining")
MEsMetLip = subset (MEsMetLip, select = c (-M_remaining, -L_remaining))
### Save MEsMetLip to file, order by module number
write.table (MEsMetLip [ , order (colnames (MEsMetLip))], file = "results/MEs_metabolite_clusters.txt", sep = "\t", row.names = T, col.names = NA, quote = F)

###
### Step 8 - Link metabolite clusters to phenotype of interest
###
### Analogous to Step 6 (and 7b). 
### This is a core analysis step generating associations between the
### integrated/clustered –omics data and a clinically interesting phenotype. In
### Pedersen et al., 2016, this was insulin resistance (HOMA-IR measurement),
### but any phenotype is possible, as is checking against other –omics spaces or
### overall –omics measurements such as gut diversity or enterotype. This
### analysis can further be conducted controlling for confounders such as BMI in
### the analysis we previously reported, by performing tests with partial
### correlations or extended to binary phenotype variables by substituting tests
### of Spearman correlation with e.g. MWU tests.
###

cor_HOMA.IR <- list () ### Data structure for storing results of correlation tests under different setups

tmpMat = array (NA, c (ncol (MEsMetLip), 2, 2))
dimnames (tmpMat) [[1]] = names (MEsMetLip)
dimnames (tmpMat) [[2]] = c ("HOMA.ir", "HOMA.ir_BMI.adj.partial") 
dimnames (tmpMat) [[3]] = c ("estimate", "p.value")

### Associating metabolite clusters with HOMA-IR without de-confounding for BMI    
tmpMat [, "HOMA.ir", c ("estimate", "p.value")] =
	t (apply (MEsMetLip [ctrl.no.na,], MARGIN = 2, FUN = function (x) 
	  unlist (cor.test (phenotypes [ctrl.no.na, "Homa.IR"], x,
		method = cor_method, use = "pairwise.complete.obs") [c ("estimate", "p.value")])))       

### Associating metabolite clusters with HOMA-IR while de-confounding for BMI    
tmpMat [, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")] =
	t (apply (MEsMetLip [ctrl.no.na,], MARGIN = 2, FUN = function (x) 
	  unlist (pcor.test (x = x, y = phenotypes [ctrl.no.na, "Homa.IR"], z = phenotypes [ctrl.no.na, "BMI.kg.m2"],
		method = cor_method) [c ("estimate", "p.value")])))       

cor_HOMA.IR [["metlip"]] <- tmpMat
rm (tmpMat)

###
### Step 9 - Link MGS metagenomic entities to phenotype of interest
###
### Analogous to Step 6 but for metagenomic taxonomic data.
###

tmpMat = array (NA, c (ncol (mgs_abundance), 2, 2))
dimnames (tmpMat) [[1]] = colnames (mgs_abundance)
dimnames (tmpMat) [[2]] = c ("HOMA.ir", "HOMA.ir_BMI.adj.partial")
dimnames (tmpMat) [[3]] = c ("estimate", "p.value")

### MGS sparsity filter step - exclude MGSs that occur in <3 of the control
### individuals that are actually used in the analysis. Repeat the step since we
### have excluded control individuals with missing information for HOMA-IR
ctrl.no.na.4MGS = intersect (ctrl.no.na, rownames (mgs_abundance)) # length=275
test = apply (mgs_abundance [intersect (ctrl.no.na.4MGS, rownames (mgs_abundance)),], 2, function (x) length (x [x != 0]))
mgs.subset = names (test[test>=3])
rm (test)
### Sparsity filter done

### Associating MGSs with HOMA-IR without de-confounding for BMI    
tmpMat [mgs.subset, "HOMA.ir", c ("estimate", "p.value")] =
	t (apply (mgs_abundance [ctrl.no.na.4MGS, mgs.subset], MARGIN = 2, FUN = function (x) 
	  unlist (cor.test (phenotypes [ctrl.no.na.4MGS, "Homa.IR"], x,
		method = cor_method, use = "pairwise.complete.obs") [c ("estimate", "p.value")])))       

### Associating MGSs with HOMA-IR while de-confounding for BMI    
tmpMat [mgs.subset, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")] =
	t (apply (mgs_abundance [ctrl.no.na.4MGS, mgs.subset], MARGIN = 2, FUN = function (x) 
	  unlist (pcor.test (x = x, y = phenotypes [ctrl.no.na.4MGS, "Homa.IR"], z = phenotypes [ctrl.no.na.4MGS, "BMI.kg.m2"],
		method = cor_method) [c ("estimate", "p.value")])))       

cor_HOMA.IR [["MGSs"]] <- tmpMat
rm (tmpMat)

###
### Step 10 - Link KEGG functions to phenotype of interest
###
### Analogous in goal to Step 6 but for metagenomics functional data. 
### Here we use KEGG modules, but any other groupings of genes into functional
### modules could similarly be used (see examples in Table 1 in the accompanying
### Protocol). However, it is more complex in that each KEGG module is
### constituted by multiple KOs. Thus, to generate results on the level of
### modules, a test is made if correlations between the phenotype and the
### abundances of KOs in the module is significantly higher or lower (MWU test)
### for the module member KOs than for all other KOs, thus also considering
### module completeness beyond the single gene level. In case of a binary
### phenotype variable, the KOs can instead (of Spearman correlation
### coefficients) be ranked based on Wald statistics for testing differentially
### abundant KOs with a negative binomial test with the DESeq2 R package using
### non-rarefied gene counts.
###

tmpMat = array (NA, c (length (koann), 3, 2))
dimnames (tmpMat) [[1]] = names (koann)
dimnames (tmpMat) [[2]] = c ("HOMA.ir", "HOMA.ir_BMI.adj.partial", "HOMA.ir_RichnessGenes7M.adj.partial")
dimnames (tmpMat) [[3]] = c ("estimate", "p.value")

### KOs sparsity filter step - exclude KOs that occur in <3 of the control
### individuals that are actually used in the analysis. Repeat the step since we
### have excluded control individuals with missing information for HOMA-IR
ctrl.no.na.4KOs = intersect (ctrl.no.na, rownames (ko_abundance)) # length=275
test = apply (ko_abundance [intersect (ctrl.no.na.4KOs, rownames (ko_abundance)),], 2, function (x) length (x [x != 0]))
ko.subset = names (test [test >= 3])
rm (test)
### Sparsity filtering done

### Associating KOs with HOMA-IR without de-confounding for BMI    
### First, obtain the correlation coefficients for Spearman correlation between
### KOs and HOMA-IR
KO_cor = apply (ko_abundance [ctrl.no.na.4KOs, ko.subset], MARGIN = 2, FUN = function (x) 
	cor (phenotypes [ctrl.no.na.4KOs, "Homa.IR"], x,
	method = cor_method, use = "pairwise.complete.obs"))      
KO_cor_HOMA = KO_cor
### Then, test for difference in correlation coefficients between KOs in the
### KEGG module and all other KOs.
for (k in names (koann)) {
	incat =    na.omit (KO_cor [   names (KO_cor) %in% koann [[k]] ]) ### select all correlations between HOMA-IR and KOs in the KEGG module
	notincat = na.omit (KO_cor [! (names (KO_cor) %in% koann [[k]])]) ### select all correlations between HOMA-IR and KOs NOT in the KEGG module
	if (length (incat) > 0 & length (notincat) > 0) {
		x = wilcox.test (incat, notincat)
		tmpMat [k, "HOMA.ir", "p.value"] = x$p.value
		tmpMat [k, "HOMA.ir", "estimate"] = (median (incat, na.rm = T) - median (notincat, na.rm = T))
	}
}
rm (KO_cor)

### Associating KOs with HOMA-IR while de-confounding for BMI    
### First, obtain the correlation coefficients for partial Spearman correlation
### between KOs and HOMA-IR
KO_cor = apply (ko_abundance [ctrl.no.na.4KOs, ko.subset], MARGIN = 2, FUN = function (x) 
	pcor.test (x = x, y = phenotypes [ctrl.no.na.4KOs, "Homa.IR"], z = phenotypes [ctrl.no.na.4KOs, "BMI.kg.m2"],
	method = cor_method) [1, "estimate"])      
KO_cor_HOMAadjBMI = KO_cor
### Then, test for difference in correlation coefficients between KOs in the 
### KEGG module and all other KOs.
for (k in names (koann)) {
	incat =    na.omit (KO_cor [   names (KO_cor) %in% koann [[k]] ]) ### select all correlations between HOMA-IR.BMI.adj and KOs in the KEGG module
	notincat = na.omit (KO_cor [! (names (KO_cor) %in% koann [[k]])]) ### select all correlations between HOMA-IR.BMI.adj and KOs NOT in the KEGG module
	if (length (incat) > 0 & length (notincat) > 0) {
		x = wilcox.test (incat, notincat)
		tmpMat [k, "HOMA.ir_BMI.adj.partial", "p.value"]  = x$p.value
		tmpMat [k, "HOMA.ir_BMI.adj.partial", "estimate"] = (median (incat, na.rm = T) - median (notincat, na.rm = T))
	}
}
rm (KO_cor)

### Associating KOs with HOMA-IR while de-confounding for gut gene richness    
### First, obtain the correlation coefficients for partial Spearman correlation
### between KOs and HOMA.IR
KO_cor = apply (ko_abundance [ctrl.no.na.4KOs, ko.subset], MARGIN = 2, FUN = function (x) 
	pcor.test (x = x, y = phenotypes [ctrl.no.na.4KOs, "Homa.IR"], z = phenotypes [ctrl.no.na.4KOs, "richnessGenes7M"],
	method = cor_method) [1, "estimate"])      
KO_cor_HOMAadjRichnessGenes7M = KO_cor
### Then, test for difference in correlation coefficients between KOs in the 
### KEGG module and all other KOs.
for (k in names (koann)) {
	incat =    na.omit (KO_cor [   names (KO_cor) %in% koann [[k]] ]) ### select all correlations between HOMA-IR.RichnessGenes.adj and KOs in the KEGG module
	notincat = na.omit (KO_cor [! (names (KO_cor) %in% koann [[k]])]) ### select all correlations between HOMA-IR.RichnessGenes.adj and KOs NOT in the KEGG module
	if (length (incat) > 0 & length (notincat) > 0) {
		x = wilcox.test (incat, notincat)
		tmpMat [k, "HOMA.ir_RichnessGenes7M.adj.partial", "p.value"]  = x$p.value
		tmpMat [k, "HOMA.ir_RichnessGenes7M.adj.partial", "estimate"] = (median (incat, na.rm = T) - median (notincat, na.rm = T))
	}
}
rm (KO_cor)

cor_HOMA.IR [["keggmodules"]] <- tmpMat
rm (tmpMat, ko.subset, ctrl.no.na.4KOs)

###
### Step 11 - Save phenotype associations
###
### Save the (BMI corrected) HOMA-IR association of metabolite clusters, MGSs
### and KEGG modules calculated in Step 8-10.
###

tmp.excel.file = "results/HOMA.IR_associations.xlsx"
write.xlsx (paste ("Associations with HOMA-IR and HOMA-IR adjusted for BMI"), 
            sheetName = "info", file = tmp.excel.file, row.names = F, col.names = F)

for (tmp_name in names (cor_HOMA.IR)) {
  tmpMat = cor_HOMA.IR [[paste (tmp_name)]]
  out = cbind (tmpMat [, "HOMA.ir", c("estimate", "p.value")], "p.adjust" = p.adjust (tmpMat [, "HOMA.ir","p.value"], method = "BH"), 
               tmpMat [, "HOMA.ir_BMI.adj.partial", c ("estimate", "p.value")], "p.adjust" = p.adjust (tmpMat [, "HOMA.ir_BMI.adj.partial", "p.value"], method = "BH"))
  colnames (out) = paste (rep (c ("HOMA.IR", "HOMA.IR_BMI.adj.partial"), each = 3), colnames (out), sep = "_")
  if (tmp_name == "metlip") { rownames (out) = cluster_mapping_file [match( rownames (out), cluster_mapping_file$New_Name), "label"] }
  write.xlsx (x = out, file = tmp.excel.file, append = T, sheetName = paste (tmp_name, sep = "_"), row.names = T, col.names = T)            
}

###
### Step 12 - Select features with significant differences
###
### Here, combine and integrate those functional, taxonomic and metabolomics
### features which reliably correspond to the host phenotype of interest, then
### later determine their inter-correlations.
###

final.fdr.cutoffs = 0.1 ### FDR thresholds, change to your likings.

vartotest_union <- list ()
vartotest_intersect <- list ()
for (tmp_name in names (cor_HOMA.IR)) {
	tmpMat = cor_HOMA.IR [[paste (tmp_name)]]
	vartotest_union [[paste ("fdr", final.fdr.cutoffs, sep = "_")]] [[tmp_name]] =
	unique (c (names (which (p.adjust (tmpMat [ , "HOMA.ir", "p.value"], method = "BH") < final.fdr.cutoffs)), 
		   names (which (p.adjust (tmpMat [ , "HOMA.ir_BMI.adj.partial", "p.value"], method = "BH") < final.fdr.cutoffs))))
	vartotest_intersect [[paste("fdr", final.fdr.cutoffs, sep = "_")]] [[tmp_name]] =
	intersect (names (which (p.adjust (tmpMat[ , "HOMA.ir", "p.value"], method = "BH") < final.fdr.cutoffs)),
		   names (which (p.adjust (tmpMat[ , "HOMA.ir_BMI.adj.partial", "p.value"], method = "BH") < final.fdr.cutoffs)))
}

###
### Step 13 - Correlate metabolite clusters to functional metagenomic potentials
### (KOs)
###
### The set of metabolite clusters associated with the phenotype of interest
### should next be tested for association with the set of functional metagenome
### features likewise so associated (identified in Step 12). Here once again the
### complex nature of KEGG modules (consisting of multiple KOs) must be taken
### into account.
###
### In this step, the correlations between each metabolite/lipid cluster and
### each KO are determined.
###

ctrl.4KOs = intersect (ctrl, rownames (ko_abundance))

KO_MetLip_cor = matrix (NA, nrow = ncol (ko_abundance), ncol = ncol (MEsMetLip))
rownames (KO_MetLip_cor) = colnames (ko_abundance)
colnames (KO_MetLip_cor) = colnames (MEsMetLip)

for (m in colnames (MEsMetLip)) {
		KO_MetLip_cor [ , m] = apply (ko_abundance [ctrl.4KOs, ], MARGIN = 2, FUN = function (x) 
		  cor (x, MEsMetLip [ctrl.4KOs, m],
		       method = cor_method, use = "pairwise.complete.obs"))
}

###
### Step 14 - Associate metabolite clusters to functional metagenomic potentials
### (KEGG modules)
###
### Using the KO level data generated in Step 13, to calculate module-level
### associations between a KEGG module and a metabolite cluster.
###

### Select variables to include in the heatmap
### note that for the metabolite clusters (metlip) we take the intersect, to 
### reduce the number of clusters in the heatmap, but for the KEGG modules 
### (keggmodules) we use the union.
metlip2test = vartotest_intersect [[paste("fdr", final.fdr.cutoffs, sep = "_")]][["metlip"]] 
keggmodules2test = vartotest_union [[paste("fdr", final.fdr.cutoffs, sep = "_")]][["keggmodules"]] 

kegg_metlip_est = matrix (data = NA, nrow = length (keggmodules2test), ncol = length (metlip2test))
rownames (kegg_metlip_est) = keggmodules2test
colnames (kegg_metlip_est) = metlip2test
kegg_metlip_p = kegg_metlip_est

for (m in metlip2test) {
	for (k in keggmodules2test) {
		incat =    na.omit (KO_MetLip_cor [   rownames (KO_MetLip_cor) %in% koann [[k]],  m]) ### select all correlations between metlip group and KOs in the KEGG module
		notincat = na.omit (KO_MetLip_cor [! (rownames (KO_MetLip_cor) %in% koann [[k]]), m]) ### select all correlations between metlip group and KOs NOT in the KEGG module
		if (length (incat) > 0 & length (notincat) > 0) {
			x = wilcox.test (incat, notincat)
			kegg_metlip_p [k, m] = x$p.value
			kegg_metlip_est [k, m] = median (incat, na.rm = T) - median (notincat, na.rm = T)
		}
	}
}
rm (KO_MetLip_cor)

###
### Step 15 - Plot metabolome-microbiome functional analysis results
###
### Create visual representation of the generated results.
###

### load modified heatmap function from the gplots R-package, which allows for
### both multi-column row-sidebar and showing text within cells (here
### significance). Note this function (heatmap.3) is different from the
### 'heatmap3' R package and the heatmap.3 function in GMD package The latest
### version can be obtained from https://gist.github.com/amcdavid/5439787 We
### have tested the script with the version loaded below.
source("r-code/heatmap_3.R")

### BH adjust p-values for associations between KEGG modules and metabolite
### modules
kegg_metlip_p_adj = p.adjust (kegg_metlip_p, method = "BH")
dim (kegg_metlip_p_adj) = dim (kegg_metlip_p)
rownames (kegg_metlip_p_adj) = rownames (kegg_metlip_p)
colnames (kegg_metlip_p_adj) = colnames (kegg_metlip_p)

### Select only significant associations to show in heatmap (to make it visually
### comprehensible) I.e. exclude KEGG modules and metabolite modules with not at
### least one significant association. In this example, all rows/columns are
### kept.
tmp = kegg_metlip_p_adj < final.fdr.cutoffs
issig = rowSums (tmp, na.rm = T)
kegg = na.omit (names (issig [issig > 0]))
length (kegg)
issig = colSums (tmp, na.rm=T)
metlip = na.omit (names (issig [issig > 0]))
length (metlip)

### create matrix for heatmap
plotmat = kegg_metlip_est [kegg, metlip]

### create matrix with significance stars
plotmat_p = kegg_metlip_p_adj [kegg, metlip] ### p-values to make stars for heatmap
stars = matrix ("", ncol = ncol (plotmat_p), nrow = nrow (plotmat_p))
rownames (stars) = rownames (plotmat_p)
colnames (stars) = colnames (plotmat_p)
for (z in 1:ncol (stars)) {
  for (j in 1:nrow (stars)) {
    if (plotmat_p [j, z] < 0.1) {
      stars [j, z] = "+"
    }
    if (plotmat_p [j, z] < 0.01) {
      stars [j, z] = "*"
    }
    if (plotmat_p [j, z] < 0.001) {
      stars [j, z] = "**"
    }
  }
}
rm (plotmat_p)

### make sidebar with phenotype associations
pheno = c ("HOMA.ir", "HOMA.ir_BMI.adj.partial")
phenobar = matrix (NA, ncol = length (pheno), nrow = length (kegg))
colnames (phenobar) = pheno
rownames (phenobar) = kegg

for (j in pheno) {
  tmp = p.adjust (cor_HOMA.IR$keggmodules [ , j, "p.value"], method="BH") ### adjust column wise on phenotypes
  for (k in kegg) {
    if (tmp [k] >= 0.1) {
      phenobar [k, j] = "grey"
    } else {
      if (cor_HOMA.IR$keggmodules [k , j, "estimate"] > 0) {
        phenobar [k, j] = "darkblue"
      } else if (cor_HOMA.IR$keggmodules [k , j, "estimate"] < 0){
        phenobar[k, j] = "darkred"
      } else {
        phenobar[k, j] = "white"
      }
    }
  }
}

### Rename column names of phenobar
colnames (phenobar) = c("HOMA-IR", "HOMA-IR.BMI.adj")

### Note, normally one would cluster the rows and columns in the heatmap as
### shown below but for the paper (Pedersen et al, 2016) we needed to group the
### KEGG modules by biological similarity to make a higher-level annotation for
### the figure and thus made a manual arrangement.

### cluster rows of matrix 
d = dist (plotmat)
dend = as.dendrogram (hclust (d, method = "average"))
rm (d)

### cluster columns of matrix 
d2 = dist (t (plotmat))
dend2 = as.dendrogram (hclust (d2, method = "average"))
rm (d2)

### make heatmap with all the selected KEGG modules
pdf ("results/heatmap_KEGG_vs_metabolite_clusters.pdf", height = 10, width = 10)
heatmap.3 (plotmat,
          Rowv = dend, 
          Colv = dend2, 
          dendrogram = "column",
          col = rev (bluered (100)),
          symbreaks = T,
          key = T,
          symkey = T,
          keysize = 1,
          KeyValueName = "SCC.bg.adj",
          lhei = c(0.6, 4),
          cellnote = stars, 
          notecol = "black",
          notecex = 1.1,
          trace = "none",
          labRow = module_mapping_clean [rownames (plotmat)],
          labCol = cluster_mapping_file [match( colnames (plotmat), cluster_mapping_file$New_Name), "label"],
          RowSideColors = t (phenobar [rownames (plotmat), ]), 
          side.height.fraction = 0.35,    
          NumColSideColors = dim (phenobar) [2], 
          margins = c (23, 33),
          cexRow = 1,
          cexCol = 1
)
dev.off () 

###
### Step 16 (optional) - Export metabolome-microbiome associations for network
### analysis
###
### Further exploration of such high dimensional association data can be
### performed using software for network analysis, e.g. Cytoscape
### (www.cytoscape.org) or igraph (www.igraph.org). Here we provide code for
### exporting an edge-file with pairwise association scores and FDR-values
### between metabolite clusters and KEGG modules as well as a corresponding node
### attribute file; both in .txt-format. Several tutorials for importing,
### visualizing and analyzing networks in Cytoscape can be found here:
### https://github.com/cytoscape/cytoscape-tutorials/wiki.
###

### Note, for simplicity, the node and edge files are made for a network
### representing the heatmap plotted in Step 15 (i.e. KEGG modules in
### "keggmodules2test" and metabolite clusters in "metlip2test")

### Make edge file with the following four columns: 
### - KEGG_module, 
### - Metabolite_cluster, 
### - association estimate (SCC.bg.adj), 
### - FDR adjusted p-values
tmp.edge.file.p_adj = melt (kegg_metlip_p_adj [keggmodules2test, metlip2test])
tmp.edge.file.est   = melt (kegg_metlip_est [keggmodules2test, metlip2test])
colnames (tmp.edge.file.p_adj) = c ("KEGG_module", "Metabolite_cluster", "p.adjust")
colnames (tmp.edge.file.est)   = c ("KEGG_module", "Metabolite_cluster", "SCC.bg.adj")
edge.file = cbind (tmp.edge.file.est, "p.adjust" = tmp.edge.file.p_adj$p.adjust)
rm (tmp.edge.file.p_adj, tmp.edge.file.est)

### Save the edge-file as a tab-seperated txt.file
write.table (edge.file, file = "results/edge_file.txt", sep = "\t", row.names = F, col.names = T, quote = F)

### Make node annotation file with the following three columns: 
### - Node_name (corresponds to names in the edge-file), 
### - Description (where existing), 
### - Examples (example metabolites, only for metabolite clusters)
### Get metabolite annotation
node.file = cluster_mapping_file [cluster_mapping_file$New_Name %in% metlip2test, c ("New_Name", "Description", "Examples")]
colnames (node.file) = c ("Node_name", "Description", "Examples")
### Add KEGG module annotation
node.file = rbind (node.file, data.frame ("Node_name" = keggmodules2test, "Description" = module_mapping_clean [keggmodules2test], "Examples" = rep ("", time = length (keggmodules2test))))

### Save the node-file as a tab-seperated txt.file
write.table (node.file, file = "results/node_file.txt", sep = "\t", row.names = F, col.names = T, quote = F)

###
### Step 17 - Leave-one-MGS-out analysis
###
### This part of the approach allows testing which bacterial taxa (sensu
### metagenomics species (MGS) as defined from the metagenomics datasets
### themselves) are driving the functional effects seen. I.e. enabling
### assessment of the extent to which different taxa explain a functional
### potential association to a phenotype of interest, such as insulin resistance
### in the case of Pedersen et al., 2016. It is done by testing for each
### functional feature (here: KEGG modules) to what extent leaving out each MGS
### and the genes it contains causes a change in the association between those
### modules and the target phenotype. Thus, the first step is setting which
### modules and taxa will be tested, then performing this test for each
### combination, after which results are plotted and reported.
###

### Specify the MGSs to be tested in the leave-on-MGS-out analysis. 
### Here it is set to all 788 MGSs.
MGS.list = MGSs
### Specify the KEGG modules to test in the leave-one-MGS-out analysis, and get
### the KOs for those modules.
### Here it is set to all modules defined as significant associated with HOMA-IR
### and/or HOMA-IR.bmi.adjusted (i.e. 'keggmodules2test').
KOsets = koann [keggmodules2test] 
### Manually add the bcaa_biosynthesis 'mega-module' (relevant only for Pedersen
### et al., 2016)
KOsets [["bcaa_biosyn"]] = unique (c (koann [["M00019"]], koann [["M00570"]], koann [["M00535"]], koann [["M00432"]]))

### The following code can take long time.
### for testing purposes, especially on older systems, we recommend subsetting 
### to a few KEGG modules and maybe also MGSs.
# KOsets = list ("M00060"= KOsets[["M00060"]],  "bcaa_biosyn"=KOsets[["bcaa_biosyn"]])
# MGS.list = c ("T2DCAG00004", "T2DCAG00005", "T2DCAG00385", " T2DCAG00011")

### Functions: 

### Calculate abundance sum of ALL genes annotated to a KO
KOprofile_FUN <- function (KO) {
	KOgenes_i <- unique (unlist (KO2gene [KO])) ### get all genes that are annotated to the KO
	return (colSums (gene_abundance_sub [KOgenes_i,]))
}

### Calculate abundance sum of genes annotated to a KO while excluding the genes
### that are part of 'LeftOutMGS'
LOOKOprofile <- function (KO, LeftOutMGS) {
	if ( any (LeftOutMGS == unlist (KO2MGS [KO])) ){	KOgenes_i <- setdiff (unique (unlist (KO2gene [KO])), MGS2gene [[LeftOutMGS]])} 
	else { KOgenes_i <- unique (unlist (KO2gene [KO])) }
	return (colSums (gene_abundance_sub [KOgenes_i,]))
}

### The main function calculating the effect of leaving-one-MGS-out on the 
### association between a KEGG module and a phenotype of interest
LeaveOneMGSOut_FUN <- function (KOsets = KOsets, MGS.list = MGS.list, sample.list = sample.list, y, z = NULL) {
	### number of KOs kinds per MGS
	KO_types_per_MGS <- lapply (KOsets, function (KOset) {
		mgses <- sapply (KOset, function (KO) { intersect (unique (unlist (KO2MGS [KO])), MGS.list)})
		table (unlist (mgses))})
  
	if  (is.null (z) == T) { 
    
		print ("No 'z' is provided, calculating spearman correlation")
    
		### Code for spearman correlation, i.e. NOT adjusting for a confounding factor
		y = y [sample.list,]
    
		### correlating the mean KO signal (calculated with 'KOprofile_FUN') to 
		### HOMA-IR, and then taking the median SCC for the module
		print ("calculating spearman correlation, using all genes")
		SCCallMGS<-sapply (KOsets, function (KOset) { 
      			KOprofiles<-t (sapply (KOset, KOprofile_FUN))
			return (median (apply (KOprofiles [rowSums (KOprofiles) > 0, sample.list], 1, function (x) { 
				cor.test (x = x, y = y, method = "spearman", use = "pairwise.complete.obs", exact = FALSE)$estimate }))) 
		})
    
		### correlating the mean KO signal while excluding contribution from 'MGS' 
		### (calculated with 'LOOKOprofile') to HOMA-IR, and then taking the median 
		### SCC for the module.
		### Repeating for all MGS in 'MGS.list'
		print ("calculating spearman correlation, leaving-one-MGS-out")
		SCComitingMGS <- lapply (KOsets, function (KOset) {
			sapply (intersect (unique (unlist (KO2MGS[KOset])),MGS.list), function (MGS){ 
				KOprofiles <- (sapply (KOset, function (KO) { LOOKOprofile (KO, MGS) }))
				return (median (cor (x = KOprofiles [sample.list, colSums (KOprofiles) > 0], y = y, method = "spearman", use = "pairwise.complete.obs"))) }) 
		})
    
		### Summarizing outout
		DeltaSCCperMGS <- lapply (names (SCComitingMGS), function (N) {
			SCC = SCCallMGS [[N]]
			SCC.bgadj =  (median (na.omit (KO_cor_HOMA [names (KO_cor_HOMA) %in% KOsets [[N]]]), na.rm = T) 
				          - median (na.omit (KO_cor_HOMA[! (names (KO_cor_HOMA) %in% KOsets [[N]])]), na.rm = T)) 
			# summary(KO_cor_HOMA); adjust % SCC effect for background distribution (i.e. the fact that median(SCC for HOMA vs modules) is negative and not = 0)
			data.frame (SCC = SCC, SCC.bgadj = SCC.bgadj, SCC_omiting_MGS = SCComitingMGS [[N]], DeltaMGS_SCC = SCC - SCComitingMGS [[N]], 
				pctSCCeffect = 100 * (SCC - SCComitingMGS [[N]]) / SCC, pctSCCeffect.bgadj = 100 * (SCC - SCComitingMGS [[N]]) / SCC.bgadj,
				Distinct_KOs_in_MGS = as.vector (KO_types_per_MGS [[N]][names (SCComitingMGS [[N]])]), row.names = names (SCComitingMGS [[N]])
			) [rev (order (pctSCCeffect = 100 * (SCC - SCComitingMGS[[N]]) / SCC)),] 
		})

		names (DeltaSCCperMGS) <- names (KOsets)
		return (DeltaSCCperMGS)

	} 

	else {
    
		print ("'z' is provided, calculating partial spearman correlation, adjusting for 'z'")

		### Code for partial spearman correlation, i.e. adjusting for a confounding factor
		y = y [sample.list,]
		z = z [sample.list,]

		### correlating the mean KO signal (calculated with 'KOprofile_FUN') to 
		### HOMA-IR, and then taking the median SCC for the module
		print ("calculating partial spearman correlation, using all genes")
		partialSCCallMGS <- sapply (KOsets, function (KOset) {	
			KOprofiles <- t (sapply (KOset, KOprofile_FUN))
			return (median (apply (KOprofiles [rowSums (KOprofiles) > 0,], 1, function (x) { pcor.test (x = x [sample.list], y = y, z = z, method = "spearman")$estimate })))
		})

		### correlating the mean KO signal while excluding contribution from 'MGS' 
		### (calculated with 'LOOKOprofile') to HOMA-IR, and then taking the median 
		### SCC for the module.
		### Repeating for all MGS in 'MGS.list'
		print ("calculating partial spearman correlation, leaving-one-MGS-out")
		partialSCComitingMGS <- lapply (KOsets, function (KOset) {
			sapply (intersect (unique (unlist (KO2MGS [KOset])), MGS.list), function (MGS) { # looping over all MGGs in 'MGS.list'
				KOprofiles <- t (sapply (KOset, function (KO) {LOOKOprofile (KO, MGS) }))
				return (median (apply (KOprofiles [rowSums (KOprofiles) > 0,], 1, function (x) { pcor.test (x = x [sample.list], y = y, z = z, method="spearman")$estimate })))
			})
		})
    
		### Summarizing outout
		partialDeltaSCCperMGS<-lapply (names (partialSCComitingMGS), function (N){
			SCC = partialSCCallMGS[[N]]
			SCC.bgadj =  (median (na.omit (KO_cor_HOMAadjBMI[names (KO_cor_HOMAadjBMI) %in% KOsets[[N]]]), na.rm=T) 
				          - median (na.omit (KO_cor_HOMAadjBMI[! (names (KO_cor_HOMAadjBMI) %in% KOsets[[N]])]), na.rm=T)) 
			### summary(KO_cor_HOMA); adjust % SCC effect for background distribution (i.e. the fact that median(SCC for HOMA vs modules) is negative and not = 0) 
			data.frame (SCC =SCC, SCC.bgadj = SCC.bgadj, SCC_omiting_MGS = partialSCComitingMGS [[N]], DeltaMGS_SCC = partialSCCallMGS [[N]] - partialSCComitingMGS [[N]], 
				pctSCCeffect = 100 * (SCC - partialSCComitingMGS [[N]]) / SCC, pctSCCeffect.bgadj = 100 * (SCC - partialSCComitingMGS [[N]]) / SCC.bgadj,
				Distinct_KOs_in_MGS = as.vector (KO_types_per_MGS [[N]][names (partialSCComitingMGS [[N]])]), 
				row.names = names (partialSCComitingMGS [[N]])
			) [rev (order (pctSCCeffect = 100 * (SCC - partialSCComitingMGS [[N]]) / SCC)),] 
		})

		names (partialDeltaSCCperMGS) <- names (KOsets)
		return (partialDeltaSCCperMGS)

	}  

}
### End functions

### Next, run these functions to compute the delta SCC values for the test in
### question - change here if partial correlation to account for a confounder is
### desired.

### Performing the Leave-one-MGS-out for HOMA-IR
DeltaSCCperMGS <- LeaveOneMGSOut_FUN (KOsets = KOsets, MGS.list = MGS.list,
	sample.list = ctrl.no.na.4MGS, ### the subset of individuals with complete information for phenotype and metagenomic data.
	y = phenotypes [,"Homa.IR", drop = F]) 
save (DeltaSCCperMGS, file = "results/delta_SCC_per_MGS.RData")

### Performing the Leave-one-MGS-out for HOMA-IR.bmi.adjusted
# partialDeltaSCCperMGS = LeaveOneMGSOut_FUN (KOsets = KOsets, MGS.list = MGS.list,
#	sample.list = ctrl.no.na.4MGS, y = phenotypes [,"Homa.IR", drop = F], z = phenotypes [,"BMI.kg.m2", drop = F])
# save (partialDeltaSCCperMGS, file = "results/partial_delta_SCC_per_MGS.RData")

###
### Step 18 - Extracting top driver species from leave-one-MGS-out analysis for
### each microbiome functional module
###
### Here, having computed the contribution per taxon to each module, we extract
### top five drivers for interpretation.
###

topX = 5
output = as.data.frame (matrix (NA, nrow = length (DeltaSCCperMGS), ncol= (2 + 4 * topX)))
colnames (output) = c ("Module name","Number of genes in KEGG_module",
			"top1", "top1_Number of module genes in MGS", "top1_DeltaMGS_SCC",  "top1_pctSCCeffect_bg.adj",
			"top2", "top2_Number of module genes in MGS", "top2_DeltaMGS_SCC",  "top2_pctSCCeffect_bg.adj",
			"top3", "top3_Number of module genes in MGS", "top3_DeltaMGS_SCC",  "top3_pctSCCeffect_bg.adj",
			"top4", "top4_Number of module genes in MGS", "top4_DeltaMGS_SCC",  "top4_pctSCCeffect_bg.adj",
			"top5", "top5_Number of module genes in MGS", "top5_DeltaMGS_SCC",  "top5_pctSCCeffect_bg.adj")
rownames (output) = names (DeltaSCCperMGS)

output [,"Number of genes in KEGG_module"] = sapply (rownames (output), function (i) length (KOsets [[i]]))
output [,"Module name"] = module_mapping [rownames (output)]

for (i in names (DeltaSCCperMGS)) {

	if (length (rownames (DeltaSCCperMGS[[i]])) <= topX ) {

		tmp.MGSs.top = rownames (DeltaSCCperMGS [[i]]) [1:topX]
		tmp.MGSs.top [DeltaSCCperMGS [[i]][tmp.MGSs.top, "pctSCCeffect"] <= 0] <- NA ### set top.MGSs to NA if removing them are NOT INCREASING the effect
		tmp.MGSs = c (tmp.MGSs.top)

	} 

	else {

		tmp.MGSs.top = rownames (DeltaSCCperMGS [[i]]) [1:topX]
		tmp.MGSs.top [DeltaSCCperMGS [[i]][tmp.MGSs.top, "pctSCCeffect"] <= 0] <- NA ### set top.MGSs to NA if removing them are NOT INCREASING the effect
		tmp.MGSs = c (tmp.MGSs.top)

	}

	## Add taxonomically annotation (for those MGSs that have one)
	tmp.MGSs.names = sapply (tmp.MGSs, function (i) ifelse (i %in% rownames (mgs_taxonomy),
		paste (" ", i, " : ", mgs_taxonomy [i, "genus"], " sp ", sep = ""), # true
		paste (" ", i, " ", sep = "") )) # false
	tmp.KOsInMGS      = DeltaSCCperMGS [[i]][ tmp.MGSs, "Distinct_KOs_in_MGS"]
	tmp.delta         = DeltaSCCperMGS [[i]][ tmp.MGSs, "DeltaMGS_SCC"]
	tmp.pct_bg.adj    = DeltaSCCperMGS [[i]][ tmp.MGSs, "pctSCCeffect.bgadj"]
	output [i, 3:ncol (output)] = c (rbind (tmp.MGSs.names, tmp.KOsInMGS, tmp.delta, tmp.pct_bg.adj))

	rm (tmp.delta, tmp.pct_bg.adj, tmp.MGSs.names, tmp.KOsInMGS, tmp.MGSs, tmp.MGSs.top)

}

### write 'output' to tab-delimited text file
write.table (x = output, file = "results/top_driver_species.txt", row.names = T, col.names = NA, quote = F, sep = "\t")         

###
### Step 19 - Plotting leave-one-MGS-out results
###
### Having computed these results, we plot the top driver taxa for the
### phenotype-associated gene functions (Figure 4).
### These include sub-plots for:
### - Distribution/density plots of correlations for KOs in a KEGG module vs 
###   all other KOs
### - Distribution of correlations when leave-one-MGS-out. 
### - Distribution of correlations when leave-one-MGS-out. bg.adj. SCC 
###   (i.e. what was shown in Figure 3c+d in Pedersen et al., 2016)
### Median SCC for KO within a module (red) and all other remaining KOs (green)
### are indicated in the first two plots.
###

### Specify pdf-file for output.
pdf (file = "results/density_plot_SCC_HOMA.IR.pdf", width = 6 * 1, height = 3 * 3)

for (m in names (KOsets)) {
  
  ### Distribution of correlations for KOs in modules vs all other KOs
  incat =     na.omit (KO_cor_HOMA [names (KO_cor_HOMA) %in% KOsets [[m]] ]) ### select all correlations between HOMA-IR and KOs in the KEGG module
  incat2 =    data.frame ("KO_cor_HOMA" = incat, "cat" = "KOs in module")
  notincat =  na.omit (KO_cor_HOMA [! (names (KO_cor_HOMA) %in% KOsets [[m]])]) ### select all correlations between HOMA-IR and KOs NOT in the KEGG module   
  notincat2 = data.frame ("KO_cor_HOMA" = notincat, "cat" = "KOs not in module")
  tmp.df = rbind (incat2, notincat2)
  cdf <- ddply (tmp.df, "cat", summarise, rating.median = median (KO_cor_HOMA)) ### find median for each group
  g1 <- ggplot (tmp.df, aes (x = KO_cor_HOMA, fill = cat)) + 
    geom_density (alpha = 0.4) +
    geom_vline (data = cdf, aes (xintercept = rating.median, colour = cat), linetype = "dashed", size = 0.8) +
    ggtitle (paste0 ("KEGG module: ", m, 
                     "\n", strsplit (module_mapping [m], split = "\\[|,")[[1]][1], 
                     "\n", length (KOsets [[m]]), " KOs in module vs all remaining ", (length (KO_cor_HOMA) - length (KOsets[[m]])), " KOs", "\n")) +
    xlab ("SCC for KOs and HOMA-IR") +
    theme_classic () + theme (panel.grid.major = element_blank (), panel.grid.minor = element_blank ())
  
  ### Plot Distribution of correlations when leave-one-MGS-out.   
  if (nrow (DeltaSCCperMGS [[m]]) > 1) {
    g2 <- ggplot (DeltaSCCperMGS [[m]], aes (x = SCC_omiting_MGS)) + 
      geom_density (alpha = 1, fill = "grey", aes (y = ..scaled..)) + 
      geom_segment (aes (y = -0.1, yend = -0.02, x = SCC_omiting_MGS, xend = SCC_omiting_MGS)) +
      geom_vline (data = cdf, aes (xintercept = rating.median, colour = cat), linetype = "dashed", size = 0.8) +
      ggtitle (paste0 ("KEGG module: ", m, 
                       "\n", strsplit (module_mapping [m], split = "\\[|,")[[1]][1], 
                       "\nSCC for leave-1-MGS-out",
                       "\nNumber of MGSs = ", nrow (DeltaSCCperMGS [[m]]))) +
      xlab ("SCC for KOs and HOMA-IR") +
      theme_classic () + theme (panel.grid.major = element_blank (), panel.grid.minor = element_blank ())
    
  } else {
    
    ### Special circumstance when there is only one MGS (then one cannot make a 
    ### density plot, instead a black line is plotted at the value when that 
    ### one MGS is left out).
    
    g2 <- ggplot () +
      scale_x_continuous (limits = range (c (DeltaSCCperMGS [[m]][, "SCC_omiting_MGS"], cdf$rating.median))) +
      scale_y_continuous (name = "", limits = c (0, 1)) +
      geom_vline (data = cdf, aes (xintercept = rating.median, colour = cat), linetype = "dashed", size = 0.8) +
      geom_vline (data = DeltaSCCperMGS [[m]], aes (xintercept = SCC_omiting_MGS), linetype = "longdash", color = "black",size = 1) + 
      ggtitle (paste0 ("KEGG module: ", m, 
                       "\n", strsplit (module_mapping [m], split = "\\[|,") [[1]][1], 
                       "\nSCC for leave-1-MGS-out",
                       "\nNumber of MGSs = ", nrow (DeltaSCCperMGS [[m]]))) +
      xlab ("SCC for KOs and HOMA-IR") +
      theme_classic () + theme (panel.grid.major = element_blank (), panel.grid.minor = element_blank ()) 
    
  }
  
  ### Plot distribution of correlations when leave-one-MGS-out. Bg.adjust SCC 
  ### (i.e. what we show in Figure 3c+d)
  tmp.DeltaSCCperMGS = DeltaSCCperMGS [[m]]
  ### Calculate delta-SCC in relation to the background adjusted SCC. 
  ### i.e the 'plottedvalue.s.m' shown as the second last equation in the 
  ### methods section in Pedersen et al., 2016.
  tmp.DeltaSCCperMGS$DeltaMGS_SCC.bgadj = tmp.DeltaSCCperMGS$SCC.bgadj - tmp.DeltaSCCperMGS$DeltaMGS_SCC
  ### specify the range of the x-axis to include 0, i.e. [min:0] for negative 
  ### correlations and [0:max] for positive correlations
  x.range = c (min (0, tmp.DeltaSCCperMGS$DeltaMGS_SCC.bgadj), max (0, tmp.DeltaSCCperMGS$DeltaMGS_SCC.bgadj))
  
  if (nrow (tmp.DeltaSCCperMGS) > 1 ) {
    
    g3 <- ggplot (tmp.DeltaSCCperMGS, aes (x = DeltaMGS_SCC.bgadj)) + 
      geom_density (alpha = 1, fill = "grey", aes (y = ..scaled..)) + 
      geom_segment (aes (y = -0.1, yend = -0.02, x = DeltaMGS_SCC.bgadj, xend = DeltaMGS_SCC.bgadj)) +
      ggtitle (paste0 ("KEGG module: ", m, 
                       "\n", strsplit (module_mapping [m], split = "\\[|,")[[1]][1], 
                       "\nbg.adj.SCC for leave-1-MGS-out",
                       "\nNumber of MGSs = ", nrow (tmp.DeltaSCCperMGS))) +
      xlab ("bg.adj.SCC for KOs and HOMA-IR") + xlim (x.range) +
      theme_classic () + theme (panel.grid.major = element_blank (), panel.grid.minor = element_blank ())
    
  } else { 
    
    ### Special circumstance when there is only one MGS (then one cannot make a 
    ### density plot)
    
    g3 <- ggplot () +
      ggtitle (paste0 (m, 
                       "\n", strsplit (module_mapping [m], split="\\[|,")[[1]][1], 
                       "\nbg.adj.SCC for leave-1-MGS-out",
                       "\nNumber of MGSs = ", nrow (tmp.DeltaSCCperMGS), " - consequently not showing a density/rug plot")) 
    
  }
  
  g = plot_grid (g1, g2, g3, ncol = 1, nrow = 3, align = "v", labels = c ("a", "b", "c"), axis = "rl")
  # g = plot_grid (g1, g3, ncol = 1, nrow = 2, align = "v", labels = c ("a", "b", "c"), axis = "rl") ### use this if deleting g2
  plot (g)
  
}

dev.off ()  

### END

#print (sessionInfo ())

# R version 3.3.3 (2017-03-06)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# Running under: macOS Sierra 10.12.6
# 
# locale:
# [1] da_DK.UTF-8/da_DK.UTF-8/da_DK.UTF-8/C/da_DK.UTF-8/da_DK.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] plyr_1.8.4            cowplot_0.9.1         ggplot2_2.2.1         gplots_3.0.1          ppcor_1.1            
# [6] MASS_7.3-47           flashClust_1.01-2     WGCNA_1.61            fastcluster_1.1.24    dynamicTreeCut_1.63-1
# [11] data.table_1.10.4-3  xlsx_0.5.7            xlsxjars_0.6.1        rJava_0.9-9          
# 
# loaded via a namespace (and not attached):
# [1]  Biobase_2.34.0        bit64_0.9-7           splines_3.3.3         foreach_1.4.3         gtools_3.5.0         
# [6]  Formula_1.2-2         stats4_3.3.3          latticeExtra_0.6-28   blob_1.1.0            fit.models_0.5-14    
# [11] yaml_2.1.14           robustbase_0.92-8     impute_1.48.0         RSQLite_2.0           backports_1.1.1      
# [16] lattice_0.20-35       digest_0.6.12         RColorBrewer_1.1-2    checkmate_1.8.5       colorspace_1.3-2     
# [21] htmltools_0.3.6       preprocessCore_1.36.0 Matrix_1.2-12         pcaPP_1.9-72          pkgconfig_2.0.1      
# [26] GO.db_3.4.0           mvtnorm_1.0-6         scales_0.5.0          gdata_2.18.0          htmlTable_1.11.2     
# [31] tibble_1.3.4          IRanges_2.8.2         nnet_7.3-12           BiocGenerics_0.20.0   lazyeval_0.2.1       
# [36] survival_2.41-3       magrittr_1.5          memoise_1.1.0         doParallel_1.0.11     foreign_0.8-69       
# [41] tools_3.3.3           matrixStats_0.52.2    stringr_1.2.0         S4Vectors_0.12.2      munsell_0.4.3        
# [46] cluster_2.0.6         AnnotationDbi_1.36.2  caTools_1.17.1        rlang_0.1.4           grid_3.3.3           
# [51] iterators_1.0.8       rstudioapi_0.7        htmlwidgets_0.9       robust_0.4-18         labeling_0.3         
# [56] bitops_1.0-6          base64enc_0.1-3       gtable_0.2.0          codetools_0.2-15      DBI_0.7              
# [61] reshape2_1.4.2        rrcov_1.4-3           gridExtra_2.3         knitr_1.17            bit_1.1-12           
# [66] Hmisc_4.1-1           KernSmooth_2.23-15    stringi_1.1.6         parallel_3.3.3        Rcpp_0.12.13         
# [71] rpart_4.1-11          acepack_1.4.1         DEoptimR_1.0-8
发送评论 编辑评论

发送评论编辑评论