References
[1] P. D’haeseleer, “How does gene expression clustering work?” Nat. Biotechnol., vol. 23, pp. 1499–501, 2005.
[2] N. D. Heintzman, G. C. Hon, R. D. Hawkins, P. Kheradpour, A. Stark, L. F. Harp, Z. Ye, L. K. Lee, R. K. Stuart, and C. W. Ching, “Histone modifications at human enhancers reflect global cell-type-specific gene expression,” Nature, vol. 459, no. 7243, pp. 108–112, 2009.
[3] R. K. Chodavarapu, S. Feng, Y. V. Bernatavichute, P.-Y. Chen, H. Stroud, Y. Yu, J. A. Hetzel, F. Kuo, J. Kim, S. J. Cokus, D. Casero, M. Bernal, P. Huijser, A. T. Clark, U. Kramer, S. S. Merchant, X. Zhang, S. E. Jacobsen, and M. Pellegrini, “Relationship between nucleosome positioning and DNA methylation,” Nature, vol. 466, pp. 388–392, 2010.
[4] X. Wang, G. O. Bryant, M. Floer, D. Spagna, and M. Ptashne, “An effect of DNA sequence on nucleosome occupancy and removal,” Nat. Struct. Mol. Biol., vol. 18, pp. 507–509, 2011.
[5] A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah, and T. Herawan, “Big data clustering: A review,” in Computational Science and Its Applications – ICCSA 2014, Lecture Notes in Computer Science, vol. 8583, pp. 707–720, 2014.
[6] M. L. Sogin, H. G. Morrison, J. A. Huber, D. Mark Welch, S. M. Huse, P. R. Neal, J. M. Arrieta, and G. J. Herndl, “Microbial diversity in the deep sea and the underexplored ‘rare biosphere’,” Proc. Nat. Acad. Sci. USA, vol. 103, no. 32, pp. 12115–12120, 2006.
[8] S. M. Huse, D. M. Welch, H. G. Morrison, and M. L. Sogin, “Ironing out the wrinkles in the rare biosphere through improved OTU clustering,” Environmental Microbiol., vol. 12, no. 7, pp. 1889–1898, 2010.
[9] J. G. Caporaso, J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman, E. K. Costello, N. Fierer, A. G. Pena, J. K. Goodrich, J. I. Gordon, G. A. Huttley, S. T. Kelley, D. Knights, J. E. Koenig, R. E. Ley, C. A. Lozupone, D. McDonald, B. D. Muegge, M. Pirrung, J. Reeder, J. R. Sevinsky, P. J. Turnbaugh, W. A. Walters, J. Widmann, T. Yatsunenko, J. Zaneveld, and R. Knight, “QIIME allows analysis of high-throughput community sequencing data,” Nature Methods, vol. 7, no. 5, pp. 335–336, May 2010.
[10] R. C. Edgar, “Search and clustering orders of magnitude faster than BLAST,” Bioinformatics, vol. 26, no. 19, pp. 2460–2461, 2010.
[11] R. C. Edgar, “UPARSE: highly accurate OTU sequences from microbial amplicon reads,” Nat. Methods, vol. 10, no. 10, pp. 996–998, Oct. 2013.
[12] P. D. Schloss, S. L. Westcott, T. Ryabin, J. R. Hall, M. Hartmann, E. B. Hollister, R. A. Lesniewski, B. B. Oakley, D. H. Parks, C. J. Robinson, J. W. Sahl, B. Stres, G. G. Thallinger, D. J. Van Horn, and C. F. Weber, “Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities,” Appl. Environ. Microbiol., vol. 75, no. 23, pp. 7537–7541, 2009.
[13] Y. Sun, Y. Cai, L. Liu, F. Yu, M. L. Farrell, W. McKendree, and W. Farmerie, “ESPRIT: Estimating species richness using large collections of 16S rRNA pyrosequences”, Nucleic Acids Res., vol. 37, no. 10, p. e76, 2009.
[14] Y. Cai and Y. Sun, “ESPRIT-Tree: Hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time,” Nucleic Acids Res., vol. 39, no. 14, p. e95, 2011.
[15] R. C. Edgar, “MUSCLE: Multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Res., vol. 32, no. 5, pp. 1792–1797, 2004.
[16] Y. Sun, Y. Cai, S. M. Huse, et al., “A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis,” Briefings in Bioinformatics, vol. 13, no. 1, pp. 107–121, 2011.
[17] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: A new data clustering algorithm and its applications”, Data Mining Knowl. Discovery, vol. 1, no. 2, pp. 141–182, 1997.
[18] W. Li and A. Godzik, “Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences,” Bioinformatics, vol. 22, no. 13, pp. 1658–1659, 2006.
[19] P. D. Schloss and J. Handelsman, “Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness,” Appl. Environ. Microbiol., vol. 71, pp. 1501–1506, 2005. http://dx.doi.org/10.1128/AEM.71.3.1501
[20] D. Albanese, P. Fontana, C. De Filippo, D. Cavalieri, and C. Donati, “Micca: a complete and accurate software for taxonomic profiling of metagenomic data,” Sci. Rep., vol. 5, p. 9743, 2015. http://dx.doi.org/10.1038/srep09743
[21] F. Mahé, T. Rognes, C. Quince, C. de Vargas, and M. Dunthorn, “Swarm: robust and fast clustering method for amplicon-based studies,” PeerJ, vol. 2, p. e593, 2014. http://dx.doi.org/10.7717/peerj.593
[22] F. Mahé, T. Rognes, C. Quince, C. de Vargas, and M. Dunthorn, “Swarm v2: highly-scalable and high-resolution amplicon clustering,” PeerJ, vol. 3, p. e1420, 2015. http://dx.doi.org/10.7717/peerj.1420
[23] E. Kopylova, L. Noé, and H. Touzet, “SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data,” Bioinformatics, vol. 28, pp. 3211–3217, 2012. http://dx.doi.org/10.1093/bioinformatics/bts611
[24] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Sci., vol. 1, pp. 409–417, 1992. http://dx.doi.org/10.1002/pro.5560010313
[25] R. C. Edgar, B. J. Haas, J. C. Clemente, C. Quince, and R. Knight, “UCHIME improves sensitivity and speed of chimera detection,” Bioinformatics, vol. 27, pp. 2194–2200, 2011.
[26] P. Legendre and L. Legendre, Numerical Ecology, 2nd ed., Developments in Environmental Modelling, vol. 20. Amsterdam, The Netherlands: Elsevier Science, 1998.