Unsupervised bicluster analysis is a hot topic in bioinformatics and has become an invaluable tool for extracting knowledge from high-dimensional -omics data. Biclustering simultaneously organizes a data matrix into subsets of rows and columns in which the entities of each row subset are similar to each other on the column subset and vice versa. This simultaneous grouping of rows (e.g. genes, bioassays, or chemical fingerprints) and columns (e.g. conditions or compounds) makes it possible to identify new subgroups within the conditions. This matters, for example, in drug design, where researchers want to reveal how compounds affect gene expression (compounds may have similar effects only on a subgroup of genes), and in identifying chemical substructures shared by bioactive compounds. Standard clustering methods, which group rows or columns using all of the other dimension at once, are not suited to these kinds of problems. We therefore present a new biclustering approach, called FABIA, which goes well beyond the usual clustering concept. FABIA is a multiplicative latent variable model that extracts linear dependencies between column and row subsets by enforcing sparseness on both the hidden factors and the loading matrix.
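To make the multiplicative model concrete, here is a minimal sketch of its generative direction only, assuming the form X = Λ Z + Υ with a sparse loading matrix Λ (rows × p), sparse hidden factors Z (p × columns), and additive Gaussian noise Υ. Each bicluster i is the outer product of loading column i and factor row i, so its rows and columns are exactly where those two vectors are nonzero, and every row inside it is a scaled copy of the same factor (the linear dependency). The helpers `sparse_vector`, `simulate_multiplicative_model`, and `bicluster` are hypothetical illustrations, not the API of any FABIA software; the actual method has to infer Λ and Z from X, which this sketch does not attempt.

```python
import numpy as np

def sparse_vector(size, support, rng):
    """Zero everywhere except on `support`, where magnitudes lie in [1, 3] with random signs."""
    v = np.zeros(size)
    k = len(support)
    v[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 3.0, size=k)
    return v

def simulate_multiplicative_model(n_rows, n_cols, supports, noise=0.1, seed=0):
    """Hypothetical helper: draw X = Lambda @ Z + Upsilon.

    `supports` is a list of (row_indices, col_indices) pairs, one per bicluster.
    Lambda (n_rows x p) and Z (p x n_cols) are sparse by construction.
    """
    rng = np.random.default_rng(seed)
    p = len(supports)
    Lambda = np.zeros((n_rows, p))
    Z = np.zeros((p, n_cols))
    for i, (rows, cols) in enumerate(supports):
        Lambda[:, i] = sparse_vector(n_rows, rows, rng)  # loading vector of bicluster i
        Z[i, :] = sparse_vector(n_cols, cols, rng)       # hidden factor of bicluster i
    Upsilon = rng.normal(0.0, noise, size=(n_rows, n_cols))  # additive Gaussian noise
    return Lambda @ Z + Upsilon, Lambda, Z

def bicluster(Lambda, Z, i, threshold=1e-8):
    """Hypothetical helper: rows/columns of bicluster i are the nonzero
    entries of loading column i and factor row i."""
    rows = np.flatnonzero(np.abs(Lambda[:, i]) > threshold)
    cols = np.flatnonzero(np.abs(Z[i, :]) > threshold)
    return rows, cols

# Usage: two planted biclusters in a 100 x 50 matrix.
supports = [(np.arange(0, 10), np.arange(0, 5)), (np.arange(30, 40), np.arange(20, 30))]
X, Lambda, Z = simulate_multiplicative_model(100, 50, supports)
rows, cols = bicluster(Lambda, Z, 1)
print(rows.min(), rows.max(), cols.min(), cols.max())  # 30 39 20 29
```

Because both the loading column and the factor row are sparse, their outer product is nonzero only on one block of X; sparseness in just one of the two would give a full row band or column band rather than a bicluster.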
FABIA is a mathematically well-founded analysis technique that allows high-dimensional data to be explored in an unsupervised manner, thereby shedding new light on the dark matter of many biological problems. During the poster session, we will present:
a) the FABIA model for extracting biclusters and their ranking according to information content;
b) results from a high-throughput compound screen;
c) biclustering ChEMBL's bioactive small molecules (16 million chemical fingerprints × 1 million compounds).
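Item a) ranks biclusters by information content. As a rough illustration only, the sketch below scores each bicluster by the mutual information between the data and its factor under a Gaussian simplification: assuming a scalar factor z_i ~ N(0, σ_i²) and noise Υ ~ N(0, Ψ) with diagonal Ψ, the matrix determinant lemma gives I(x; z_i) = ½ ln(1 + σ_i² λ_iᵀ Ψ⁻¹ λ_i). The Gaussian prior, the variances σ_i², and the helper names `information_content` and `rank_biclusters` are assumptions made for this example; FABIA's sparse factors are not Gaussian, and the criterion in the actual FABIA software may be defined differently.

```python
import numpy as np

def information_content(lam, psi_diag, factor_var=1.0):
    """Mutual information (nats) between x = lam * z + eps and a scalar factor z.

    Assumes z ~ N(0, factor_var) and eps ~ N(0, diag(psi_diag)). By the matrix
    determinant lemma, I(x; z) = 0.5 * ln(1 + factor_var * lam^T Psi^{-1} lam).
    """
    lam = np.asarray(lam, dtype=float)
    psi_diag = np.asarray(psi_diag, dtype=float)
    snr = factor_var * np.sum(lam**2 / psi_diag)  # signal-to-noise of this bicluster
    return 0.5 * np.log1p(snr)

def rank_biclusters(Lambda, psi_diag, factor_vars=None):
    """Hypothetical helper: return bicluster indices sorted by decreasing
    information content, together with the per-bicluster scores."""
    p = Lambda.shape[1]
    if factor_vars is None:
        factor_vars = np.ones(p)
    scores = np.array(
        [information_content(Lambda[:, i], psi_diag, factor_vars[i]) for i in range(p)]
    )
    order = np.argsort(-scores)  # highest information content first
    return order, scores

# Usage: three loading columns with unit noise variance.
Lambda = np.array([[1.0, 0.0, 2.0],
                   [1.0, 0.0, 2.0],
                   [0.0, 0.5, 0.0]])
order, scores = rank_biclusters(Lambda, np.ones(3))
print(order.tolist())  # [2, 0, 1]
```

Under this simplification, a bicluster carries more information when its loadings are large relative to the noise on the rows it covers, which is why the strong, two-row loading column is ranked first.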