JST200014 : Vol. 1 (2000) , No. 0 p.233

Analysis of DNA coding regions

Fumihiko Takeuchi¹⁾²⁾, Kenji Yamamoto¹⁾ and Hiroshi Yoshikura¹⁾

1) Research Institute, International Medical Center of Japan
2) Dept. of Information Science, Univ. of Tokyo

The most fundamental information of biological systems is encoded in their DNA or RNA sequences. Although these sequences contain the basic program for development, function, sex, and so on, the four nucleotides t (or u), c, a, g in them appear to be arranged fairly randomly. To what extent are they random? This is the question we consider in this paper. Understanding this randomness is basic to a system-level comprehension of biological systems. We approach the problem by analyzing two kinds of amino acid proportions in the coding regions of DNA sequences: the real and the theoretical proportions. The real proportion in a coding region is the usual proportion observed after translation. The theoretical proportion, introduced by King & Jukes, is the expected proportion calculated from the proportions of nucleotides in the coding region. If the nucleotides t, c, a, g in coding regions were arranged uniformly at random, the two figures should match. However, this is not the case. We analyze the tendencies that these proportions show. We begin by verifying the results of King & Jukes, and then proceed to a much more extensive analysis, such as a classification of amino acids according to the distribution of these proportions.
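To make the two proportions concrete, the following is a minimal sketch of how they might be computed for a single coding region. It is an illustration under stated assumptions, not the procedure used in the paper: it assumes the standard genetic code, pools nucleotide frequencies over all three codon positions, and excludes stop codons before renormalizing. The function names `real_proportions` and `theoretical_proportions` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order (assumption:
# the coding region uses the standard code; '*' marks a stop codon).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def normalize(seq):
    """Lowercase the sequence and treat RNA 'u' as DNA 't'."""
    return seq.lower().replace("u", "t")


def real_proportions(seq):
    """Amino acid proportions observed after translating the coding region."""
    seq = normalize(seq)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    counts.pop("*", None)  # assumption: stop codons are not counted
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def theoretical_proportions(seq):
    """Expected amino acid proportions if nucleotides were placed
    independently at random with the region's own base frequencies."""
    seq = normalize(seq)
    base_counts = Counter(seq)
    freq = {b: base_counts[b] / len(seq) for b in BASES}
    expected = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            # Probability of a codon is the product of its base frequencies.
            expected[aa] += freq[codon[0]] * freq[codon[1]] * freq[codon[2]]
    total = sum(expected.values())  # renormalize over the sense codons
    return {aa: p / total for aa, p in expected.items()}
```

Calling both functions on the same coding region and comparing the resulting dictionaries amino acid by amino acid gives the kind of real versus theoretical comparison described above. Under this sketch, an amino acid's theoretical proportion depends only on the base composition and on how many codons encode it.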