Abstract: With the increasing number of available genome sequences, novel tools are needed for comprehensive analysis and visualization of species-specific sequence characteristics across a wide variety of genomes. An unsupervised neural-network algorithm, Kohonen's self-organizing map (SOM), is an effective tool for clustering and visualizing a large amount of complex data on a single map. We modified the conventional SOM for genome informatics, making the learning process and the resulting map independent of the order of data input. We used the modified SOM to characterize di- to pentanucleotide frequencies in a total of approximately 10 Gb of sequence derived from both prokaryotic and eukaryotic genomes for which complete sequences are known. The SOMs classified the 10- and 100-kb sequences of these genomes mainly according to species on a single map and revealed sequence characteristics of individual genomes. In most of the sequence fragments, the unsupervised algorithm recognized the species-specific characteristics (key combinations of oligonucleotide frequencies) that are signature features of each genome. In other words, SOMs could systematically extract profound genomic information from the oligonucleotide frequencies in each genome. Because species-specific separation on a SOM was very clear, the SOM could be established as a novel strategy for phylogenetic classification of sequence fragments obtained from mixtures of uncultured microorganisms in environmental or clinical samples.
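The input to such a map is one oligonucleotide-frequency vector per sequence fragment. The following Python sketch illustrates how those vectors might be computed; it is not the authors' pipeline. The function names `kmer_frequencies` and `fragment_vectors`, the use of non-overlapping fragments, the normalization to relative frequencies, and the skipping of words that contain ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers, in lexicographic order.

    Assumption: words containing characters other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    # Return all zeros when no valid word was found, to avoid dividing by zero.
    return [counts[w] / total if total else 0.0 for w in kmers]


def fragment_vectors(seq, fragment_len, k):
    """Cut seq into non-overlapping fragments and compute one vector each.

    Assumption: a trailing piece shorter than fragment_len is dropped.
    """
    return [
        kmer_frequencies(seq[start:start + fragment_len], k)
        for start in range(0, len(seq) - fragment_len + 1, fragment_len)
    ]


if __name__ == "__main__":
    # Toy sequence; real input would be 10- or 100-kb genome fragments.
    toy = "ACGTACGTAAGGCCTT" * 4
    vectors = fragment_vectors(toy, fragment_len=16, k=2)
    print(len(vectors), len(vectors[0]))
```

With `k` set from 2 to 5, each fragment yields a vector of 16 to 1024 dimensions, and these vectors are the kind of data a SOM would cluster.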