How to Memorize the Genetic Code
Overview
Nearly all life on Earth uses the same genetic code, with only a few minor variations. The four nucleotides adenine, thymine, guanine, and cytosine act like the "letters" of DNA and are denoted A, T, G, and C. In RNA, thymine is replaced with uracil, abbreviated U. Every gene is made up of codons: groups of three nucleotides that each stand for a single "letter" of a protein, namely one of the 20 commonly occurring amino acids, which are joined head to tail to make the protein. They're called amino acids because each has an acid end and an amino (alkaline) end; they differ only in the side chains that branch out from the middle. Since there are 64 possible codons but only 20 amino acids (a few organisms have 21), there are more than enough codons to go around, and most amino acids have two or four codons. These mnemonics mostly use the RNA nucleotides A, U, G, and C.
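To make the counting concrete, here is a minimal Python sketch that lists all 64 codons and tallies how many codons each amino acid gets. The 64-letter string is the standard genetic code written in U, C, A, G order, with `*` standing for a stop codon; the variable names are just illustrative choices.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # RNA letters, in the order used by most codon charts
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last letter fastest: UUU, UUC, UUA, UUG, UCU, ...
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# How many codons does each amino acid get? (Stop codons are left out.)
codons_per_amino_acid = Counter(aa for aa in codon_table.values() if aa != "*")
# How many amino acids share each codon count?
degeneracy = Counter(codons_per_amino_acid.values())

print(len(codons))                 # 64
print(len(codons_per_amino_acid))  # 20
print(sorted(degeneracy.items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```

The last line reads as: 2 amino acids have one codon, 9 have two, 1 has three, 5 have four, and 3 have six, so 14 of the 20 have either two or four.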
Table from Wikipedia. CC-BY-SA 3.0.
The genetic code can be represented as a grid divided into blocks, where all codons within a given block start with the same two nucleotides. Each block contains four codons. Two of the nucleotides, U (or T) and C, are physically smaller molecules (the pyrimidines), while the other two, A and G, are larger (the purines). Some blocks are divided into two half blocks, where one half has the codons that end in the smaller-molecule letters and the other has the codons that end in the larger-molecule letters. All you need to remember is that whenever there's a half block, one half ends in U and C while the other ends in A and G, along with which amino acids occupy half blocks and which occupy full blocks.
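The block structure described above can be checked with a short Python sketch. It groups the standard code by the first two letters and labels each block as "full" (all four codons agree), "two halves" (the U/C pair agrees and the A/G pair agrees), or "irregular" (neither). The function name `classify_block` and the three labels are my own, not standard terminology.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_block(prefix):
    """Describe the four codons that start with a two-letter prefix."""
    small = {codon_table[prefix + last] for last in "UC"}  # smaller last letter
    large = {codon_table[prefix + last] for last in "AG"}  # larger last letter
    if len(small | large) == 1:
        return "full"
    if len(small) == 1 and len(large) == 1:
        return "two halves"
    return "irregular"

blocks = {a + b: classify_block(a + b) for a, b in product(BASES, repeat=2)}
full = sorted(p for p, kind in blocks.items() if kind == "full")
halves = sorted(p for p, kind in blocks.items() if kind == "two halves")
irregular = sorted(p for p, kind in blocks.items() if kind == "irregular")

print(full)       # ['AC', 'CC', 'CG', 'CU', 'GC', 'GG', 'GU', 'UC']
print(halves)     # ['AA', 'AG', 'CA', 'GA', 'UA', 'UU']
print(irregular)  # ['AU', 'UG']
```

The two irregular blocks are the ones that need special-case mnemonics: AU (three isoleucines plus the AUG start codon) and UG (cysteine, a stop codon, and tryptophan).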
Amino Acids
It's a good idea to be a little bit familiar with the 20 common amino acids before proceeding. You don't have to memorize everything about them yet, but it helps to know that each one has a designated letter just like the nucleotides do. The mnemonics are included in the table for completeness, but don't memorize them yet; we'll cover them in detail afterwards.
Letter | Name | Side Chain | Mnemonic | Letter | Name | Side Chain | Mnemonic |
---|---|---|---|---|---|---|---|
A | Alanine | -CH3 | Gassy. | L | Leucine | -CH2CH(CH3)CH3 | Luuse; see you, Lucy! |
R | Arginine | -CH2CH2CH2NHC(NH2)=NH2+ | See her go, Argo; argent twice. | K | Lysine | -CH2CH2CH2CH2NH3+ | Triple-A, 'k? |
N | Asparagine | -CH2C(=O)NH2 | AA battery's N terminal. | M | Methionine | -CH2CH2SCH3 | Augment. |
D | Aspartic acid | -CH2C(=O)O- | Smaller acid, like peaches. | F | Phenylalanine | -CH2-C6H5 (benzene ring) | All of you are free. |
C | Cysteine | -CH2SH | UnderGround sulfur or ugcky. | P | Proline | -CH2CH2CH2- (back to the amino end) | Coil contorting. |
E | Glutamic acid | -CH2CH2C(=O)O- | Larger acid, like peaches. | S | Serine | -CH2OH | You see how serious; serious metals. |
Q | Glutamine | -CH2CH2C(=O)NH2 | Caaged bird eating gluten. | T | Threonine | -CH(CH3)OH | 309 volts of AC. |
G | Glycine | -H | G in, G out. | W | Tryptophan | -CH2-C8H6N (indole rings) | Ugly UGG. |
H | Histidine | -CH2-C3H3N2 (imidazole ring) | Catch the cat: hiss. | Y | Tyrosine | -CH2-C6H4-OH | Wow, two functional groups! |
I | Isoleucine | -CH(CH3)CH2CH3 | Hey you, Isolucy! | V | Valine | -CH(CH3)CH3 | Guv. |
The Mnemonics
For convenience's sake, we will mostly go in the sequence U, C, A, G counting by the second letter and then the first letter.
You are all free: UUUF. The UUU half block is phenylalanine (F).
The other UU half block is luuse leucine.
Speaking of leucine: Imagine you're visiting your friend Lucy and it's time to go so you say "see you, Lucy." CU full block is leucine.
Say "hey you" to Lucy's sister Isolucy. AU full block is almost all isoleucine, with one exception:
AUG like "augment" is the start codon. It tells the ribosome to begin making protein if it hasn't started already. It also codes for methionine, so every newly made protein starts with M (the cell often trims it off afterwards).
GU full block: There is a joke where someone says "give us a copper, guv." V is valine, so give us a valine, GUV. GU full block is valine.
You see how serious it is. UC full block is serine.
Because of its relative rigidity, proline puts kinks in helices. Coil Contorting proline is CC full block. You can also remember it as CCCP, just know it's a full block.
How about 309 volts of alternating current? AC full block is threonine.
GC kinda sounds like "gassy", and natural gas is mostly methane, and alanine's side chain is a methyl group. GC full block is alanine.
UAU looks like "wow", as in wow, tyrosine has two functional groups! (A benzene ring and a hydroxyl.) UAU half block is tyrosine.
The other UA half block consists of Useless Agent Alpha and Useless Agent Gamma. They're so useless, they can't code for anything! They're stop codons, that is, they signal the end of the gene and tell the ribosome to stop making protein.
Using DNA letters for this one: when I tried to catch (CAC) the CAT, it hissed (histidine) at me.
CAA and CAG are the "caaged" half block. There's a parrot in the cage eating crackers, which are full of gluten, and the caaged half block codes for glutamine.
For the other amide, think of an AA battery, maybe one with a copper top since it is the C/U half block. But we're looking at the negative, or N, terminal. N is asparagine, and so are AAC and AAU.
Triple-A is lysine, 'k? K is the letter for lysine. This is the other AA half block.
GA forms two half blocks. Georgia peaches, like all fruits, contain acids. The smaller-molecule half block is the smaller-molecule aspartic acid, while the larger-molecule half block is larger glutamic acid.
Ugly UGG is the bulkiest amino acid (tryptophan).
I can stop this protein: UGA-chaka. (It's a stop codon.) Or, alternatively: stop that annoying and culturally insensitive chant!
As for the smaller-molecule UG half block, think Under Ground, which is where sulfur is mined from, and the smallest amino acid to have sulfur is cysteine. You can also think of it as an ugcky thiol, since cysteine is the only thiol amino acid, but be careful not to confuse ugcky with the UC full block, because that's seriously serine.
See her go, the great ship Argo, of ancient Greek legend. CG full block is arginine.
Silver (Ag) and copper (Cu) can be worth serious money. AGC and AGU are serine.
AGA and AGG are like Ag twice. In French silver is argent; AGA and AGG are arginine again.
GG full block is glycine. If your RNA strand is all GGGGGGG then the protein will be all GGGGGG; no other letter does this.
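As a sanity check, the mnemonics above can be written down as data and compared against the standard code. This is only a sketch of one way to encode them in Python: the names `FULL_BLOCKS`, `HALF_BLOCKS`, `EXCEPTIONS`, and `build_table` are made up for illustration, and `*` stands for a stop codon.

```python
from itertools import product

# Full blocks: all four codons with this two-letter prefix give the same amino acid.
FULL_BLOCKS = {"CU": "L", "GU": "V", "UC": "S", "CC": "P",
               "AC": "T", "GC": "A", "CG": "R", "GG": "G"}
# Half blocks: (codons ending in U/C, codons ending in A/G).
HALF_BLOCKS = {"UU": ("F", "L"), "UA": ("Y", "*"), "CA": ("H", "Q"),
               "AA": ("N", "K"), "GA": ("D", "E"), "AG": ("S", "R")}
# The two blocks that don't split evenly: AU (augment) and UG (ugly UGG, UGA stop).
EXCEPTIONS = {"AUU": "I", "AUC": "I", "AUA": "I", "AUG": "M",
              "UGU": "C", "UGC": "C", "UGA": "*", "UGG": "W"}

def build_table():
    """Expand the block rules into a codon -> amino acid lookup table."""
    table = dict(EXCEPTIONS)
    for prefix, amino_acid in FULL_BLOCKS.items():
        for last in "UCAG":
            table[prefix + last] = amino_acid
    for prefix, (small, large) in HALF_BLOCKS.items():
        for last in "UC":
            table[prefix + last] = small
        for last in "AG":
            table[prefix + last] = large
    return table

# Standard genetic code in UCAG order, used here only as the reference to check against.
REFERENCE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
reference_table = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), REFERENCE)}

table = build_table()
print(len(table))                # 64
print(table == reference_table)  # True
```

Eight full blocks, six evenly split blocks, and eight exception codons cover all 64 codons, which is a fair summary of how much there really is to memorize.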
And now with these mnemonics, you can read any gene sequence of almost any organism and translate it to its protein sequence. No one actually has to do this, since we have computers that can do the translation for us with a lookup table, although at the end of this page I will demonstrate translating a small gene by memory. What we can do is look at a mutation and see what effect it has. Here are the first several codons of the human HBB gene, which codes for beta-globin. Globin complexes with the heme molecule to make hemoglobin, the protein that carries oxygen in the blood. Observe the difference between the normal and the variant sequences:
Normal: AUG GUG CAU CUG ACU CCU GAG GAG AAG UCU GCC ...
Variant: AUG GUG CAU CUG ACU CCU GUG GAG AAG UCU GCC ...
It's hard to see, but the difference is in the seventh codon: GUG instead of GAG. We know that the normal sequence has GAG, and we recall the mnemonic that GA peaches contain acids, and since the last letter G is a larger nucleotide, we get the larger glutamic acid. For GUG we remember "give us a valine, Guv", therefore this variant has valine instead of glutamic acid.
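The same comparison can be done mechanically. The Python sketch below translates both HBB snippets with a lookup table of the standard code and reports every position where the codons differ; the names `translate` and `differences` are just illustrative.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(codons):
    """Translate a list of codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons)

normal = "AUG GUG CAU CUG ACU CCU GAG GAG AAG UCU GCC".split()
variant = "AUG GUG CAU CUG ACU CCU GUG GAG AAG UCU GCC".split()

# Collect (codon number, normal codon, variant codon) wherever the two differ.
differences = [(number, n, v)
               for number, (n, v) in enumerate(zip(normal, variant), start=1)
               if n != v]

print(translate(normal))   # MVHLTPEEKSA
print(translate(variant))  # MVHLTPVEKSA
for number, n, v in differences:
    print(number, n, CODON_TABLE[n], "->", v, CODON_TABLE[v])  # 7 GAG E -> GUG V
```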
You might have guessed by now what effect this has. While glutamic acid is very hydrophilic, i.e. it likes water and wants to face out away from the center of the protein, valine's side chain is very hydrophobic and prefers to surround itself with other hydrophobic side chains. In the case of HBB, this mutation causes the protein molecules to link up into a chain (they polymerize), rendering them insoluble and changing the shape of the red blood cell into what looks like a sickle. This is the mutation that causes sickle cell disease.
Other variants exist on this codon that also cause sickle cell disease, namely changing GAG (glutamic acid) to GGG (remember all G in, all G out, so glycine) or GCG (remember GC sounds like gassy, so methane, methyl group, so alanine). Glycine and alanine also have hydrophobic side chains, so they function similarly to valine in this case.
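To see every amino acid that a single-letter change to GAG can produce, here is a small Python sketch that tries each substitution at each position. It only shows the amino acid change, not any clinical effect; the function name `single_base_variants` is my own.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def single_base_variants(codon):
    """Map each codon reachable by one substitution to its amino acid."""
    variants = {}
    for position, original in enumerate(codon):
        for base in BASES:
            if base != original:
                mutated = codon[:position] + base + codon[position + 1:]
                variants[mutated] = CODON_TABLE[mutated]
    return variants

variants = single_base_variants("GAG")
for mutated, amino_acid in variants.items():
    print(mutated, amino_acid)
```

For GAG this gives nine codons. The three middle-letter changes are GUG (V), GCG (A), and GGG (G), matching the mnemonics above, while changing the first letter to U gives UAG, a stop codon.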
Now for the manual gene translation. Here's the sequence of the human OXT gene, the precursor to oxytocin, in mRNA format:
AUGGCCGGCCCCAGCCUCGCUUGCUGUCUGCUCGGCCUCCUGGCGCUGACCUCCGCCUGCUACAUCCAGAACUGCCCCCUGGGAGGCAAGAGGGCCGCG CCGGACCUCGACGUGCGCAAGUGCCUCCCCUGCGGCCCCGGGGGCAAAGGCCGCUGCUUCGGGCCCAAUAUCUGCUGCGCGGAAGAGCUGGGCUGCUUC GUGGGCACCGCCGAAGCGCUGCGCUGCCAGGAGGAGAACUACCUGCCGUCGCCCUGCCAGUCCGGCCAGAAGGCGUGCGGGAGCGGGGGCCGCUGCGCG GUCUUGGGCCUCUGCUGCAGCCCGGACGGCUGCCACGCCGACCCUGCCUGCGACGCGGAAGCCACCUUCUCCCAGCGCUGA
It looks like a mouthful, but it's actually one of the smaller human genes. Going through it codon by codon, we get:
Codon | Mnemonic | Result |
---|---|---|
AUG | Augment | Start: M methionine |
GCC | Gassy | A alanine |
GGC | G in, G out | G glycine |
CCC | coil contortion | P proline |
AGC | Ag+Cu serious | S serine |
CUC | See you, Lucy! | L leucine |
GCU | Gassy | A alanine |
UGC | UnderGround sulfur. | C cysteine |
UGU | UnderGround sulfur. | C cysteine |
CUG | See you, Lucy! | L leucine |
CUC | See you, Lucy! | L leucine |
GGC | G in, G out | G glycine |
CUC | See you, Lucy! | L leucine |
CUG | See you, Lucy! | L leucine |
GCG | Gassy | A alanine |
CUG | See you, Lucy! | L leucine |
ACC | 309 volts | T threonine |
UCC | You see it's serious | S serine |
GCC | Gassy | A alanine |
UGC | UnderGround sulfur. | C cysteine |
UAC | Same half block as wow. | Y tyrosine |
AUC | Hey you, Isolucy. | I isoleucine |
CAG | Caged bird eating gluten. | Q glutamine |
AAC | AA battery, Cu top, N terminal. | N asparagine |
UGC | UnderGround sulfur. | C cysteine |
CCC | Coil contort | P proline |
CUG | See you, Lucy! | L leucine |
GGA | G in, G out. | G glycine |
GGC | G in, G out. | G glycine |
AAG | Same half block as triple-A | K lysine |
AGG | Double argent. | R arginine |
GCC | Gassy | A alanine |
GCG | Gassy | A alanine |
CCG | Coil contort. | P proline |
GAC | Georgia peaches, acid, smaller mol. | D aspartic acid |
CUC | See you. | L leucine |
GAC | GA peaches, smaller acid. | D aspartic acid |
GUG | Give us a valine, Guv! | V valine |
CGC | See her go, Argo. | R arginine |
AAG | Same as triple-A. | K lysine |
UGC | UnderGround. | C cysteine |
CUC | See you. | L leucine |
CCC | Coil contorting. | P proline |
UGC | UnderGround. | C cysteine |
GGC | G in, G out. | G glycine |
CCC | Coil contorting. | P proline |
GGG | G in, G out. | G glycine |
GGC | G in, G out. | G glycine |
AAA | Triple-A. | K lysine |
GGC | G in, G out. | G glycine |
CGC | See her go. | R arginine |
UGC | UnderGround. | C cysteine |
UUC | All of you (half block). | F phenylalanine |
GGG | G => G. | G glycine |
CCC | Contort. | P proline |
AAU | AA batt, Cu top. | N asparagine |
AUC | Hey you! | I isoleucine |
UGC | UnderGround. | C cysteine |
UGC | UnderGround. | C cysteine |
GCG | Gassy. | A alanine |
GAA | GA peaches, larger acid. | E glutamic acid |
GAG | GA peaches, larger acid. | E glutamic acid |
CUG | See you! | L leucine |
GGC | G => G | G glycine |
UGC | UnderGround. | C cysteine |
UUC | All of U. | F phenylalanine |
GUG | Guv. | V valine |
GGC | G => G | G glycine |
ACC | 309V AC. | T threonine |
GCC | Gassy. | A alanine |
GAA | GA larger acid. | E glutamic acid |
GCG | Gassy. | A alanine |
CUG | See you! | L leucine |
CGC | See her go. | R arginine |
UGC | UnderGround. | C cysteine |
CAG | Caged bird eating gluten. | Q glutamine |
GAG | GA larger acid. | E glutamic acid |
GAG | GA larger acid. | E glutamic acid |
AAC | AA battery. | N asparagine |
UAC | Wow! | Y tyrosine |
CUG | See you! | L leucine |
CCG | Contort. | P proline |
UCG | You see how serious. | S serine |
CCC | Contort. | P proline |
UGC | UnderGround. | C cysteine |
CAG | Caged. | Q glutamine |
UCC | You see. | S serine |
GGC | G => G. | G glycine |
CAG | Caged. | Q glutamine |
AAG | Triple-A half block. | K lysine |
GCG | Gassy. | A alanine |
UGC | UnderGround. | C cysteine |
GGG | G => G. | G glycine |
AGC | Ag+Cu serious. | S serine |
GGG | G => G. | G glycine |
GGC | G => G. | G glycine |
CGC | See her go. | R arginine |
UGC | UnderGround. | C cysteine |
GCG | Gassy. | A alanine |
GUC | Guv. | V valine |
UUG | Luuse. | L leucine |
GGC | G => G. | G glycine |
CUC | See you! | L leucine |
UGC | UnderGround. | C cysteine |
UGC | UnderGround. | C cysteine |
AGC | Ag+Cu. | S serine |
CCG | Contort. | P proline |
GAC | GA smaller acid. | D aspartic acid |
GGC | G => G. | G glycine |
UGC | UnderGround. | C cysteine |
CAC | Tried to catch the cat. | H histidine |
GCC | Gassy. | A alanine |
GAC | GA smaller acid. | D aspartic acid |
CCU | Contort. | P proline |
GCC | Gassy. | A alanine |
UGC | UnderGround. | C cysteine |
GAC | GA smaller acid. | D aspartic acid |
GCG | Gassy. | A alanine |
GAA | GA larger acid. | E glutamic acid |
GCC | Gassy. | A alanine |
ACC | 309V AC. | T threonine |
UUC | All of U half block. | F phenylalanine |
UCC | You see. | S serine |
CAG | Caged bird. | Q glutamine |
CGC | See her go. | R arginine |
UGA | Stop that chant. | Stop codon. |
Therefore, this gene encodes a protein with the following sequence:
MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLPCGPGGKGRCFGPNICCAEELGCFVGTAEALRCQEENYLPSPCQSGQKACGSGGRCAV LGLCCSPDGCHADPACDAEATFSQR
...which we can see is correct because it is identical to the translation about ¾ of the way down the NCBI page.
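For comparison, this is roughly what the computer version of the lookup looks like. The Python sketch below reads an mRNA string three letters at a time and stops at the first stop codon. To keep it short, it uses only two excerpts copied from the OXT sequence above (the first 99 nucleotides and the last 15) rather than the whole gene, and the `translate` function is an illustrative helper, not any particular library's API.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    mrna = mrna.replace(" ", "").replace("\n", "")  # ignore line-wrap whitespace
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# First 99 nucleotides of the OXT mRNA shown above (33 codons).
oxt_start = ("AUGGCCGGCCCCAGCCUCGCUUGCUGUCUGCUCGGCCUCCUGGCGCUGACC"
             "UCCGCCUGCUACAUCCAGAACUGCCCCCUGGGAGGCAAGAGGGCCGCG")
# Last 15 nucleotides, ending in the UGA stop codon.
oxt_end = "UUCUCCCAGCGCUGA"

print(translate(oxt_start))  # MAGPSLACCLLGLLALTSACYIQNCPLGGKRAA
print(translate(oxt_end))    # FSQR
```

Both outputs match the corresponding ends of the protein sequence worked out by hand. The same function should handle the full sequence if you paste it in, spaces and all.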