Gene sequencing - Wikipedia

Most living things store their genetic information within their cells in the form of DNA, RNA or a similar molecule.

Humans store their information principally as DNA in the nucleus of the cell. Humans have 22 pairs of autosomal (non-sex) chromosomes, plus a pair of sex chromosomes: typically two X chromosomes in women, or an X and a Y chromosome in men. Other arrangements of chromosomes occur, but these are generally associated with disease states. The chromosomes contain long molecules of DNA, and this DNA holds most of the genetic information in the cell. DNA has a double helix structure, and each of its two strands is a chain of nucleotides carrying one of four bases: adenine (A), cytosine (C), guanine (G) and thymine (T). A always pairs with T and C with G, and each such pairing is known as a base pair. RNA uses uracil (U) in place of thymine, and modified bases occur in some organisms, but their purpose is much the same.
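The pairing rule described above (A with T, C with G) means that one strand of DNA fully determines the other. A minimal Python sketch of that rule follows; the function names `complement` and `reverse_complement` are illustrative choices, not part of any particular library, and the sketch assumes the input contains only the four letters A, C, G and T.

```python
# Pairing rules stated above: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return, position by position, the base that pairs with each base."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction.

    The two strands of the double helix run in opposite directions,
    so the partner strand is the complement read backwards.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Any character other than A, C, G or T raises a `KeyError` in this sketch; real sequence data often contains ambiguity codes such as N, which would need separate handling.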

The base pairs running along the DNA form a three-letter code. This code holds the information required to make RNA, which in turn controls cell function, development, gene expression and protein production. A section of DNA whose base pair sequence produces RNA for a particular purpose is a gene.
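As an illustration of the three-letter code, the sketch below copies a short stretch of DNA into RNA (replacing T with U) and reads the result three letters at a time. The names `transcribe`, `codons` and `translate` are hypothetical, the example sequence is made up, and `CODON_TABLE` deliberately lists only five of the 64 codons of the standard genetic code, so this shows the idea rather than serving as a usable translator.

```python
# A small excerpt of the standard genetic code (5 of the 64 codons).
CODON_TABLE = {
    "AUG": "Met",   # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "Stop",
}


def transcribe(coding_dna: str) -> str:
    """Copy the coding strand of DNA into RNA: same letters, U instead of T."""
    return coding_dna.upper().replace("T", "U")


def codons(rna: str) -> list[str]:
    """Split RNA into consecutive three-letter words, dropping any remainder."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def translate(rna: str) -> list[str]:
    """Map each codon to an amino acid until a stop codon is reached."""
    protein = []
    for codon in codons(rna):
        amino_acid = CODON_TABLE.get(codon, "???")  # "???" = not in this excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    rna = transcribe("ATGTTTGGCTAA")
    print(rna)             # AUGUUUGGCUAA
    print(codons(rna))     # ['AUG', 'UUU', 'GGC', 'UAA']
    print(translate(rna))  # ['Met', 'Phe', 'Gly']
```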

Genes are vital to understanding the many diseases and conditions that they cause or contribute to, for example cystic fibrosis, dwarfism, mental illness, birth defects and more minor problems such as colour blindness.

It is therefore of great interest to know what genes we have, how many there are and where they are.

As noted above, many lifeforms use DNA and RNA. They too have genes, and studying these genes supports advances in other fields such as evolutionary biology, crop science, veterinary science and pharmacology.

Modern techniques enable genes to be sequenced, i.e. the order of their base pairs to be read and their position and size determined. Recent projects such as the Human Genome Project have led to a widely publicised understanding of approximately which genes humans have and where they are. Other, less well-known studies have elucidated the genome structure of other organisms, such as fruit flies (widely used in other genetic research) and nematode worms.

At a basic level, genes can be sequenced in a number of ways. The most basic is to cut the DNA of a chromosome into fragments with restriction enzymes and then attach the fragments to markers. Segments of DNA can be attached to marker molecules because of the specificity of the base pairs for each other. The fragments can be copied if necessary by PCR (the polymerase chain reaction) and the base pairs in each fragment determined. Computers are then used to piece the fragment sequences back together. This approach tells you what the code is but does not decipher it, i.e. it does not tell you what the code produces or what the purpose of the product is.
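The final step, in which a computer joins the fragment sequences, can be sketched as a greedy overlap merge: repeatedly find the two fragments whose ends overlap the most and join them. This is a toy illustration under strong assumptions (error-free reads, all from the same strand, no repeated regions) and is not how production assemblers work. The names `overlap` and `greedy_assemble` and the example fragments are invented for this sketch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge fragments pairwise, always taking the largest overlap first.

    Returns the remaining sequences; a single item means every fragment
    was joined into one contiguous sequence.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps by at least min_len
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
    print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The `min_len` threshold is there because very short overlaps occur by chance; choosing the largest overlap first is a simple heuristic and can give wrong answers when the underlying sequence contains repeats.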

Some companies have been allowed to patent genes, thereby restricting research on them. One example is the patenting of the breast cancer genes BRCA1 and BRCA2. This is a highly contentious point in research, and many academic groups have sought to publish gene sequences so that they cannot be patented.

Determining what any given sequence does is much harder. Information can come from patients who have defects in a gene (for example, a large proportion of people with spontaneous dwarfism have a mutation in a gene which produces a growth factor receptor) or from animals such as transgenic mice, gene knockout mice or fruit flies. The search is further complicated by the fact that only sections of a gene (the exons) are retained in the finished RNA. The rest seems to have a regulatory function, but is not well understood.