Keywords: Rectified factor networks; Biclustering
Detecting cancer-related genes and their interactions is a crucial task in cancer research. For this purpose, we propose an efficient method to detect coding genes, microRNAs (miRNAs), and their interactions related to a particular cancer or cancer subtype using their POM 1 data from the same set of samples. First, biclusters specific to a particular type of cancer are detected based on rectified factor networks and ranked according to their associations with general cancers. Second, coding genes and miRNAs in each bicluster are prioritized by considering their differential expression and differential correlation values, protein–protein interaction data, and potential cancer markers. Finally, a rank fusion process combines the multiple ranking results into a final comprehensive rank. We applied the method to breast cancer datasets. The results show that it outperforms other methods in detecting breast cancer-related coding genes and miRNAs. Furthermore, the method is computationally efficient: it can handle tens of thousands of genes/miRNAs and hundreds of patients within hours on a desktop computer. This work may aid researchers in studying the genetic architecture of complex diseases and in improving the accuracy of diagnosis.
Detecting cancer-related genes, including coding genes and microRNAs (miRNAs), as well as their interactions, is a crucial task in cancer research that could help researchers focus further efforts on the most promising biomarkers. For this purpose, various computational methods have been developed, which can be classified into three categories: single-gene, network-based, and gene-module methods. Single-gene methods rely on gene expression profile analysis [2,3]. For example, Endeavour assesses the BLAST scores of candidates against known cancer genes and prioritizes candidates that are homologous to seed biomarker genes. Makhijani et al. and Torrente et al. identified cancer-related genes by compiling a large number of gene expression datasets and choosing common differentially expressed genes (DEGs) as cancer-related genes. Similar methods have also been used to identify cancer-related miRNAs. For example, researchers have explored the involvement of miRNAs in cancers through computational analyses of their expression data with statistical tests such as Student’s t-test, the Wilcoxon signed-rank test, and ANOVA [7,8].
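As a rough illustration of the single-gene approach, the sketch below ranks genes by a two-sample t-statistic computed between tumor and normal expression values. It uses Welch's unequal-variance variant rather than the classical Student's t-test, and it is not the procedure of any cited study; the function names and the toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(tumor, normal):
    """Welch's t-statistic for two independent samples.

    Assumes each group has at least two values and that the two
    sample variances are not both zero.
    """
    standard_error = sqrt(
        variance(tumor) / len(tumor) + variance(normal) / len(normal)
    )
    return (mean(tumor) - mean(normal)) / standard_error


def rank_by_differential_expression(expression):
    """Rank genes by |t|, largest first.

    `expression` maps a gene name to a (tumor values, normal values) pair.
    """
    scores = {gene: welch_t(t, n) for gene, (t, n) in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Toy expression values (hypothetical, not real measurements).
toy = {
    "gene_a": ([8.1, 7.9, 8.4, 8.0], [5.0, 5.2, 4.9, 5.1]),  # strongly up
    "gene_b": ([6.0, 6.3, 5.8, 6.1], [6.1, 5.9, 6.2, 6.0]),  # unchanged
    "gene_c": ([3.0, 3.4, 2.9, 3.1], [4.0, 4.2, 3.9, 4.1]),  # down
}

ranking = rank_by_differential_expression(toy)
print(ranking)
```

In practice such statistics would be converted to p-values and corrected for multiple testing before genes are called differentially expressed; that step is omitted to keep the sketch short.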
E-mail address: [email protected] (D. Xu).
Network-based methods rely on the guilt-by-association paradigm, i.e., they infer the functions of poorly characterized genes from their associations with other well-described genes. The association can take the form of gene co-expression or gene-gene interaction between candidate genes and known cancer genes [9–11]. In the past few years, several network-based methods for analyzing cancer-related coding genes or cancer-related miRNAs have been proposed. For instance, NetICS is a graph diffusion-based method for prioritizing cancer genes by integrating diverse molecular data types on a directed functional interaction network. NetICS prioritizes genes by their proximity to upstream aberration events and to downstream differentially expressed genes. GenePANDA assesses whether a gene is a likely candidate disease gene based on its relative distance to known disease genes in a functional association network. SMiR-NBI prioritizes miRNAs as potential biomarkers on the basis of a heterogeneous network connecting drugs, miRNAs, and genes. KATZ uses functional similarity scores to denote associations based on the distances between miRNA and disease nodes. Recently, module-based methods have been widely used, as many cancers are believed to be caused by the dysfunctional regulation of a set of functionally related genes rather than of a single gene. For example, FGMD applies a hierarchical clustering algorithm to gene and isoform expression data to identify functional gene modules, and ranks them by the ratio of known cancer genes in each module. MGOGP uses predefined gene modules and