Error-correcting codes and genetics

  • Gérard Battail, Ecole Nationale Supérieure des Télécommunications
Keywords: Biological evolution, error-correcting codes, genome conservation, genomic channel capacity, information theory, nested codes, soft codes


The conservation of genetic information through the ages cannot be explained unless one assumes the existence of genomic error-correcting codes, which is our main hypothesis. Shielding by phenotypic membranes does not protect genomes against radiation or against their own quantum indeterminism. Without correction, the accumulated errors would make the genomic memory ephemeral at the geological time scale. Only means intrinsic to the genome itself, taking the form of error-correcting codes, can ensure the permanence of the genome. According to information theory, such codes can achieve reliable communication over unreliable channels, paradoxical as it may seem, provided some conditions are met. The experience of communication engineers attests to their high efficiency. As a subsidiary hypothesis, we further assume that they take the form of 'nested codes', i.e., that several codes are combined into a layered structure which results in unequal protection: the older and more fundamental parts of the genomic information are better protected than the more recent ones. Looking for how nature implements error-correcting codes, we are led to assume that they rely on the many physical, steric, chemical and linguistic constraints to which the DNA molecule and the proteins for which it codes are subjected. Taking these constraints into account makes it possible to regenerate the genome provided the number of accumulated errors remains less than the correcting ability of the code, i.e., provided regeneration occurs after a short enough time. Based on these hypotheses, fundamental results of information theory explain basic features of the living world, especially that life proceeds by successive generations, the discreteness of species and their hierarchical taxonomy, as well as the trend of evolution towards complexity. Other consequences are that evolution proceeds by jumps and that the genomic message originates in random regeneration errors.
That basic results of information theory and error-correcting codes explain biological facts left unexplained by today's biology confirms the necessity of our hypotheses. The direct experimental identification of genomic error-correcting codes and of the means of regeneration is still lacking, however; it would obviously require the active collaboration of practicing geneticists.
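
The two hypotheses can be illustrated with a toy model. The Python sketch below is only an assumption-laden stand-in: it uses a simple repetition code with majority voting, not the constraint-based codes hypothesized above, and every name in it (`encode`, `decode`, `regenerate`, `nested_encode`, `nested_decode`, `mutate`) is hypothetical. It shows that regeneration restores the message as long as the errors in each block stay within the correcting ability of the code, and that nesting two codes protects the older "core" part better than the more recent part.

```python
from collections import Counter


def encode(symbols, n=3):
    """Toy repetition code (an assumption): repeat every symbol n times."""
    return "".join(s * n for s in symbols)


def decode(word, n=3):
    """Majority vote over each consecutive block of n symbols."""
    blocks = (word[i:i + n] for i in range(0, len(word), n))
    return "".join(Counter(b).most_common(1)[0][0] for b in blocks)


def regenerate(word, n=3):
    """Decode, then re-encode: rewrites the word as an error-free codeword."""
    return encode(decode(word, n), n)


def nested_encode(core, recent, n=3):
    """Inner code covers the core only; the outer code covers everything."""
    return encode(encode(core, n) + recent, n)


def nested_decode(word, core_len, n=3):
    """Undo the outer code, then undo the inner code on the core part."""
    outer = decode(word, n)
    inner, recent = outer[:core_len * n], outer[core_len * n:]
    return decode(inner, n), recent


def mutate(word, pos, symbol):
    """Replace the symbol at position pos (models a single substitution)."""
    return word[:pos] + symbol + word[pos + 1:]


# One error per block is within the correcting ability: regeneration succeeds.
codeword = encode("GATTACA")
few_errors = mutate(mutate(codeword, 0, "C"), 4, "T")
restored = regenerate(few_errors)

# Two errors in the same block exceed it: regeneration fixes a wrong symbol.
many_errors = mutate(mutate(codeword, 0, "C"), 1, "C")
drifted = decode(many_errors)

# Nested code: the same double error is harmless in the core part,
# but corrupts the recent part, which has only one layer of protection.
nested = nested_encode("GA", "TC")
core_hit = mutate(mutate(nested, 0, "C"), 1, "C")
recent_hit = mutate(mutate(nested, 18, "A"), 19, "A")
```

In this sketch `restored` equals `codeword`, whereas `drifted` differs from the original message in its first symbol; likewise `nested_decode(core_hit, core_len=2)` still recovers the core `"GA"`, while `nested_decode(recent_hit, core_len=2)` returns an altered recent part. The first contrast mirrors the condition that regeneration must occur before too many errors accumulate, and the second mirrors the unequal protection attributed to nested codes.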