Introduction

This project discusses one fundamental concept behind the innovative CRISPR-Cas9 gene editing tool: the bacterial CRISPR-Cas adaptive immunity system. As a biology major, I find that visualizing biological mechanisms helps me fully understand the material, and the CRISPR-Cas system is one such mechanism that I feel would benefit from a simulation. Therefore, I developed an MVC-based simulation using Java and the Java Swing library.

I chose the MVC design because I believe the adaptive immunity mechanism can be written as a kind of computer algorithm. For those of you who, like me, are interested in bioinformatics, this project is a good way to start seeing the intricate connections between biology and computer science.

Lastly, the code behind the simulation is deliberately simplified, for two reasons. First, since the goal is to teach students this biological concept, the focus should be on how clearly the simulation presents it rather than on how the simulation is built. Second, scientists do not yet fully understand how the CRISPR-Cas system works mechanistically, so I could not write an algorithm that completely mimics the biological process even if I wanted to. The end goal is an efficient learning experience about bacterial CRISPR-Cas adaptive immunity, and the code can always be improved as I learn more about the interdisciplinary field of computer science and biology.

Note: I will use some technical terms to fully explain the biology. It is okay if you don’t understand all of them; I will try to explain in as much detail as possible without scaring you away. I have also written an index at the end of this post, so feel free to take a look at it!

What is CRISPR-Cas Adaptive Immunity?

All these fancy-sounding words may seem daunting, but I assure you it is not a difficult concept to learn. Essentially, bacterial CRISPR-Cas adaptive immunity behaves much like our own immune system: it detects a harmful virus, removes it, and remembers the infection. Just as our immune system produces antibodies to detect future viruses, a bacterium stores a section of the viral DNA and uses complementary base pairing to detect future viruses. Complementary base pairing is how each of the four nucleotides A, T, C, and G binds to its complementary nucleotide: A binds to T, C binds to G, and vice versa.
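To make the pairing rule concrete, here is a minimal Java sketch. It is not taken from the simulation's source code; the class name BasePairing and its methods are hypothetical names chosen for illustration. It also treats a strand as a plain string and ignores strand direction, which is a simplification.

```java
import java.util.Map;

final class BasePairing {
    // DNA pairing rule: A pairs with T, C pairs with G (and vice versa).
    private static final Map<Character, Character> PAIRS =
            Map.of('A', 'T', 'T', 'A', 'C', 'G', 'G', 'C');

    /** Returns the base-by-base complement of a DNA strand. */
    static String complement(String strand) {
        StringBuilder out = new StringBuilder(strand.length());
        for (char base : strand.toCharArray()) {
            Character partner = PAIRS.get(base);
            if (partner == null) {
                throw new IllegalArgumentException("Not a DNA base: " + base);
            }
            out.append(partner);
        }
        return out.toString();
    }

    /** True when every position of a pairs with the same position of b. */
    static boolean pairsWith(String a, String b) {
        return a.length() == b.length() && complement(a).equals(b);
    }

    public static void main(String[] args) {
        System.out.println(complement("ATCG")); // prints TAGC
    }
}
```

The lookup table keeps the rule in one place, so an unexpected character fails loudly instead of silently producing a wrong strand.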

CRISPR-Cas adaptive immunity occurs in three stages: adaptation, CRISPR-RNA biogenesis, and interference. The next few sections will be dedicated to explaining these three stages.

In adaptation, the viral DNA enters the bacterium and is intercepted by the Cas1-Cas2 complex. The complex then extracts only a portion of the viral DNA and inserts it into the CRISPR array, the main storage for DNA from viruses that have infected the bacterium. This CRISPR array is the bacterium’s “memory” of these past viruses, and it consists of repeats and spacers. In the simulation, you’ll see a red and green strand entering the gray bacterium; this strand represents the entire viral DNA. The orange polygon is the Cas1-Cas2 complex, and the green segment is the viral DNA spacer it will extract and insert into the CRISPR array.
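The adaptation step can be sketched as a small data structure. This is an illustrative sketch, not the simulation's actual code: the class CrisprArray, the placeholder REPEAT sequence, and the start/length parameters are all assumptions made for the example. In particular, the caller picks which portion of the viral DNA becomes the spacer, and new spacers are placed at the front of the array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CrisprArray {
    // Hypothetical placeholder; real repeat sequences differ between species.
    static final String REPEAT = "GTTTTAGA";

    // Assumption: the newest spacer sits at the front of the array.
    private final Deque<String> spacers = new ArrayDeque<>();

    /** Cuts a portion out of the viral DNA and stores it as a new spacer. */
    String acquireSpacer(String viralDna, int start, int length) {
        if (start < 0 || length <= 0 || start + length > viralDna.length()) {
            throw new IllegalArgumentException("Spacer lies outside the viral DNA");
        }
        String spacer = viralDna.substring(start, start + length);
        spacers.addFirst(spacer);
        return spacer;
    }

    /** True if this spacer is already part of the array's "memory". */
    boolean remembers(String spacer) {
        return spacers.contains(spacer);
    }

    /** Lays the array out as repeat, spacer, repeat, spacer, ..., repeat. */
    String sequence() {
        StringBuilder sb = new StringBuilder(REPEAT);
        for (String spacer : spacers) {
            sb.append(spacer).append(REPEAT);
        }
        return sb.toString();
    }
}
```

For example, calling acquireSpacer("AAACCCGGGTTT", 3, 6) on an empty array stores the spacer CCCGGG, and sequence() then returns that spacer with one repeat on each side.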

The next stage, as you might have guessed, is the generation of CRISPR-RNAs, or crRNAs for short. At this stage, the bacterium transcribes the CRISPR array and produces crRNA-Cas RNP (ribonucleoprotein) complexes. CRISPR-associated proteins, or Cas proteins, provide various functions in the CRISPR-Cas system, one of which is attaching to RNA copies of the spacers to form those complexes. These complexes act as surveillance inside the bacterium, checking whether any viral DNA matches the crRNA through complementary base pairing. The simulation shows this stage as the CRISPR array separating into pieces, but that is not biologically correct. In reality, an enzyme (RNA polymerase) “skims” over the CRISPR array and makes RNA copies of the spacers, which are the green and red pieces shown in the simulation.
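Here is a rough Java sketch of this stage, under some loud assumptions: the class CrRnaBiogenesis and its methods are hypothetical, each stored spacer is treated as the template that gets copied, and the Cas proteins are left out entirely. Unlike the animation, the spacer list is only read, never taken apart.

```java
import java.util.ArrayList;
import java.util.List;

final class CrRnaBiogenesis {
    /** Copies a DNA template into complementary RNA (A->U, T->A, C->G, G->C). */
    static String transcribe(String dnaTemplate) {
        StringBuilder rna = new StringBuilder(dnaTemplate.length());
        for (char base : dnaTemplate.toCharArray()) {
            switch (base) {
                case 'A': rna.append('U'); break; // RNA uses uracil instead of thymine
                case 'T': rna.append('A'); break;
                case 'C': rna.append('G'); break;
                case 'G': rna.append('C'); break;
                default:
                    throw new IllegalArgumentException("Not a DNA base: " + base);
            }
        }
        return rna.toString();
    }

    /** Makes one crRNA per stored spacer; the input list is left unchanged. */
    static List<String> makeCrRnas(List<String> spacers) {
        List<String> crRnas = new ArrayList<>();
        for (String spacer : spacers) {
            crRnas.add(transcribe(spacer));
        }
        return crRnas;
    }

    public static void main(String[] args) {
        System.out.println(transcribe("ATCG")); // prints UAGC
    }
}
```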

Lastly, interference is when a crRNA-Cas complex finds matching viral DNA, again through complementary base pairing. Once it does, the viral DNA is degraded and the bacterium is safe from the virus attack! It is as simple as it sounds.
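The surveillance step can be sketched as a sliding-window search. As before, this is a simplified illustration rather than the simulation's real code: the class Interference and its method names are hypothetical, a match requires perfect pairing at every position, and "degrading" the viral DNA is reduced to returning true.

```java
import java.util.List;

final class Interference {
    /** RNA-to-DNA pairing rule: U-A, A-T, G-C, C-G. */
    static boolean pairs(char rnaBase, char dnaBase) {
        return (rnaBase == 'U' && dnaBase == 'A')
            || (rnaBase == 'A' && dnaBase == 'T')
            || (rnaBase == 'G' && dnaBase == 'C')
            || (rnaBase == 'C' && dnaBase == 'G');
    }

    /** Returns the index of the first DNA window that pairs fully with the crRNA, or -1. */
    static int findTarget(String crRna, String dna) {
        if (crRna.isEmpty()) {
            return -1;
        }
        for (int start = 0; start + crRna.length() <= dna.length(); start++) {
            boolean allPaired = true;
            for (int i = 0; i < crRna.length(); i++) {
                if (!pairs(crRna.charAt(i), dna.charAt(start + i))) {
                    allPaired = false;
                    break;
                }
            }
            if (allPaired) {
                return start;
            }
        }
        return -1;
    }

    /** True if any crRNA recognizes the incoming DNA, meaning it would be degraded. */
    static boolean isDegraded(List<String> crRnas, String incomingDna) {
        for (String crRna : crRnas) {
            if (findTarget(crRna, incomingDna) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

With the crRNA GGGCCC, findTarget locates a match at index 3 of AAACCCGGGTTT, so a virus carrying that sequence would be recognized, while unrelated DNA such as AAAATTTT passes through unmatched.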

This diagram does a much more detailed job of outlining the entire process, so take a look at it if you are feeling ambitious!

Figure taken from: Amitai, G., Sorek, R. CRISPR–Cas adaptation: insights into the mechanism of action. Nat Rev Microbiol 14, 67–76 (2016). https://doi.org/10.1038/nrmicro.2015.14

Real Life Applications

As I mentioned earlier, the CRISPR-Cas system has found crucial use in the world of gene editing. How it works is quite complicated, so I will not explain it here. Still, it is astonishing that something as simple as a bacterium’s immune system can be used for gene editing. It leaves a lot of room to imagine what we can do as our gene editing techniques become more advanced.

There has been previous work in agriculture with genetically modified crops, but it relied on techniques that were inefficient and costly. With CRISPR-based gene editing, scientists are starting to look at therapeutic applications such as cancer treatments and disease prevention. Many of these share one thing in common: using gene editing to knock out, or remove, the gene(s) that cause the cancer or disease in the first place. CRISPR-based approaches also fare much better than other gene editing techniques in terms of cost and efficiency.

Conclusion

A lot of our biological understanding of humans and other organisms stems from research on simpler organisms such as bacteria. Bacterial CRISPR-Cas adaptive immunity is a prime example of scientists using a seemingly simple concept to advance research in a field as advanced as gene editing. For those who are more interested in the biological side of this topic, the next concept to learn would be the CRISPR-Cas9 gene editing technique, the same one scientists are using for therapeutic gene editing. As for learning more about bioinformatics, take a look at Rosalind, a platform similar to LeetCode that offers a ton of bioinformatics problems instead.

The source code for the simulation can be found on my GitHub, alongside a .jar file for running it. I hope you enjoyed learning about this as much as I did!

Index

Here is a list of terms and concepts that I think may be confusing or need further explanation. Hopefully this is helpful!

CRISPR = Clustered Regularly Interspaced Short Palindromic Repeats

Spacers = The segments of viral DNA that are saved in the CRISPR array

Repeats = The palindromic repeats that are characteristic of CRISPR-Cas

Transcription = The first step of gene expression, in which a segment of DNA is copied into an RNA strand through complementary base pairing.

Complementary base pairing = DNA consists of only four nucleotides: adenine, thymine, cytosine, and guanine (A, T, C, and G for short). Two famous scientists, James Watson and Francis Crick, correctly theorized that A binds only with T and C binds only with G. This pattern of binding is called complementary base pairing.

Author | Phillip Wei is currently a sophomore at Northeastern University’s Khoury College of Computer Sciences. He is pursuing a B.S. degree in Computer Science and Biology and is interested in exploring more interdisciplinary opportunities that combine the two fields. Phillip also enjoys learning about new technologies in PC building and performing and listening to all sorts of music.