Topic Introduction

Guide RNAs: A Glimpse at the Sequences that Drive CRISPR–Cas Systems

  Rodolphe Barrangou (1, 2)
  (1) North Carolina State University, Raleigh, North Carolina 27695

    Abstract

    CRISPR–Cas systems provide adaptive immunity in bacteria and archaea. Although CRISPR–Cas systems fall into two main classes defined by gene content, interfering RNA biogenesis, and effector proteins, it is Type II systems that have recently been exploited on a broad scale to develop next-generation genetic engineering and genome-editing tools. Conveniently, Type II systems are streamlined: they rely on a single protein, Cas9, and a guide RNA composed of a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA) to achieve effective, programmable nucleic acid targeting and cleavage. Most commercially available Cas9-based genome-editing tools currently use the CRISPR–Cas system from Streptococcus pyogenes (SpyCas9), although many orthogonal Type II systems are available for diverse and multiplexable genome engineering applications. Here, we discuss the biological significance of Type II CRISPR–Cas elements, including the tracrRNA, crRNA, Cas9, and protospacer-adjacent motif (PAM), and examine the native function of these elements to understand how they can be engineered, enhanced, and optimized for genome-editing applications. We also discuss the basis for orthogonal Cas9 and guide RNA systems that would allow researchers to use multiple Cas9-based systems concurrently for different purposes. Understanding the native function of endogenous Type II CRISPR–Cas systems can lead to the development of new Cas9 tools that expand the genetic manipulation toolbox.

    WHAT ARE CRISPR–Cas SYSTEMS?

    All organisms need to protect themselves and develop immunity against foreign invaders. Since the mid-2000s, researchers have studied how CRISPR–Cas adaptive immune systems protect bacteria and archaea against potentially damaging foreign nucleic acids (Mojica et al. 2005; Makarova et al. 2006; Barrangou et al. 2007; Tyson and Banfield 2008). Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (cas) genes function, respectively, as the immunization record and the immunity machinery that provide resistance against phages, plasmids, and other potentially harmful nucleic acids. Generally, CRISPR–Cas systems protect the cell in three steps: (1) acquisition of a nucleic acid sequence from an invader, (2) expression and biogenesis of small interfering RNAs, and (3) interference, in which foreign nucleic acid of similar or homologous sequence is cleaved when it re-enters the cell (Barrangou et al. 2007; Makarova et al. 2011, 2015; Koonin and Makarova 2013; van der Oost et al. 2014).

    All CRISPR–Cas systems have repeat-spacer arrays, in which conserved repeats of 24–47 nucleotides flank variable spacer sequences derived from the foreign DNA of invaders (Koonin and Makarova 2013; van der Oost et al. 2014). CRISPR–Cas systems also use the conserved Cas1 and Cas2 proteins during the acquisition stage to detect foreign nucleic acids and store their sequences as short DNA spacers between two repeats (Arslan et al. 2014; Nuñez et al. 2014; Heler et al. 2015). The sequence stored in the repeat-spacer array is referred to as the spacer (Jansen et al. 2002), whereas the homologous sequence on the foreign DNA is referred to as the protospacer (Deveau et al. 2008). New spacers are always added at the leader end of the array (Barrangou et al. 2007). Because the leader sequence contains promoter elements that drive transcription of the array, this polarized acquisition ensures that new immunization events are actively transcribed and maintained for protection against the most recent invaders (Andersson and Banfield 2008; Tyson and Banfield 2008; Wei et al. 2015). After acquisition, the entire repeat-spacer array is transcribed into a single RNA transcript that contains all of the repeats and spacers encoded in the locus (Brouns et al. 2008). To become functional, this transcript, called the precursor CRISPR RNA (pre-crRNA), must be cleaved into smaller RNA molecules, called CRISPR RNAs (crRNAs), that each contain a partial CRISPR repeat and a partial spacer (Brouns et al. 2008; Deltcheva et al. 2011; Karvelis et al. 2013). Once generated, the crRNAs guide the Cas effector proteins to their complementary protospacers; the Cas proteins then target, cleave, and degrade the invading complementary nucleic acid (Brouns et al. 2008; Garneau et al. 2010; Gasiunas et al. 2012).
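    The polarized acquisition described above can be pictured with a toy data structure. The Python sketch below is illustrative only: the class name RepeatSpacerArray and its methods are hypothetical, the leader sequence itself is not modeled, the repeat and spacers are placeholder strings, and the Cas1–Cas2 machinery that actually performs integration is not represented.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatSpacerArray:
    """Toy (hypothetical) model of a repeat-spacer array; spacers are stored leader-proximal first."""

    repeat: str
    spacers: list[str] = field(default_factory=list)

    def acquire(self, protospacer: str) -> None:
        # Polarized acquisition: each new spacer is inserted at the leader end,
        # so the most recently acquired spacer is always first in the list.
        self.spacers.insert(0, protospacer)

    def sequence(self) -> str:
        # Leader-proximal repeat, then each spacer followed by a new copy of the repeat.
        return self.repeat + "".join(spacer + self.repeat for spacer in self.spacers)


array = RepeatSpacerArray(repeat="RRRR")  # placeholder repeat, not a real sequence
array.acquire("aaaa")                     # older invader
array.acquire("bbbb")                     # most recent invader
print(array.spacers)     # ['bbbb', 'aaaa']
print(array.sequence())  # RRRRbbbbRRRRaaaaRRRR
```

    Reading the array from the leader end therefore lists immunization events from newest to oldest.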
In CRISPR immunity, Cas proteins distinguish target DNA from spacer-containing, nontarget self DNA by the presence of the protospacer-adjacent motif (PAM) on the target (Marraffini and Sontheimer 2010). The PAM is a short nucleotide sequence flanking the protospacer on the foreign DNA (Deveau et al. 2008; Horvath et al. 2008; Mojica et al. 2009). A spacer stored in the repeat-spacer array is not adjacent to a PAM, which prevents the CRISPR–Cas system from self-targeting and cleaving the host chromosome (Deveau et al. 2008; Marraffini and Sontheimer 2010; Heler et al. 2015).
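    The self/non-self logic can be made concrete with a short Python sketch. It assumes the NGG PAM recognized by SpyCas9, located immediately 3′ of the protospacer; other Cas9 orthologs recognize different PAMs. The function name pam_flanked_hits, the 20-nt spacer, and the flanking sequences are hypothetical, and the repeat is an assumed S. pyogenes repeat used only as surrounding context.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Assumption: SpyCas9-style 3' PAM "NGG" (N = any base).
SPY_PAM = re.compile(r"[ACGT]GG")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pam_flanked_hits(spacer: str, dna: str, pam: re.Pattern = SPY_PAM, pam_len: int = 3) -> list[tuple[str, int]]:
    """Return (strand, start) for every exact protospacer match immediately followed by a PAM.

    `start` is a 0-based coordinate on that strand's own 5'->3' sequence.
    """
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        start = seq.find(spacer)
        while start != -1:
            end = start + len(spacer)
            if pam.fullmatch(seq[end:end + pam_len]):
                hits.append((strand, start))
            start = seq.find(spacer, start + 1)
    return hits


# Hypothetical 20-nt spacer and an assumed S. pyogenes repeat.
SPACER = "GACGCATAAAGATGAGACGC"
REPEAT = "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC"

self_locus = REPEAT + SPACER + REPEAT  # spacer stored in the host array: followed by "GTT"
phage = "TTACG" + SPACER + "TGGATC"    # protospacer on an invader: followed by "TGG"

print(pam_flanked_hits(SPACER, self_locus))  # []
print(pam_flanked_hits(SPACER, phage))       # [('+', 5)]
```

    The spacer sequence is identical in both cases; only the presence of an adjacent PAM marks the phage copy as a target.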

    TWO CLASSES, FIVE TYPES, 16 SUBTYPES

    There are two classes of CRISPR–Cas systems, which can be further divided into five types; these classes and types are distinguished by CRISPR repeat length and sequence, cas gene content and locus architecture, crRNA biogenesis and composition, and effector protein type and activity (Makarova et al. 2011, 2015). All CRISPR–Cas systems contain the universal cas1 and cas2 genes, whereas individual types are defined by the presence of a signature gene: cas3, cas9, cas10, and cpf1 for Types I, II, III, and V, respectively. Class I CRISPR–Cas systems, namely Type I and Type III systems, have been found in both bacteria and archaea and share similarities in the acquisition, expression, and interference stages of CRISPR immunity. Both types use a large multiprotein complex, called Cascade (CRISPR-associated complex for antiviral defense) in Type I systems and the Cmr/Csm complex in Type III systems, to target and cleave foreign DNA, and occasionally RNA, when guided by a crRNA (Brouns et al. 2008; Hale et al. 2009). In addition, their CRISPR repeats are highly palindromic and form hairpin loops that allow a Cas ribonuclease, Cas6, to cleave the pre-crRNA into individual interfering crRNAs (Kunin et al. 2007; van der Oost et al. 2014).
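    The signature-gene rule above can be expressed as a simple lookup. This Python sketch is a toy: the function classify_locus is hypothetical, it takes a set of annotated gene names as input, and it omits Type IV because no signature gene is given for it here. Real classification also weighs locus architecture and sequence similarity, which this sketch ignores.

```python
# Signature genes as listed in the text above (Type IV omitted: no signature gene is named for it).
SIGNATURE_GENES = {
    "cas3": ("Type I", "Class I"),
    "cas9": ("Type II", "Class II"),
    "cas10": ("Type III", "Class I"),
    "cpf1": ("Type V", "Class II"),
}


def classify_locus(genes: set[str]) -> list[tuple[str, str]]:
    """Return (type, class) for each signature gene found in a locus's annotated gene set."""
    present = {gene.lower() for gene in genes} & set(SIGNATURE_GENES)
    return [SIGNATURE_GENES[gene] for gene in sorted(present)]


print(classify_locus({"cas9", "cas1", "cas2", "csn2"}))  # [('Type II', 'Class II')]
```

    The universal cas1 and cas2 genes are deliberately ignored, since they are shared by all types and cannot discriminate between them.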

    Class II systems comprise Type II and Type V, which use a single signature protein to both bind and cleave foreign DNA (Makarova et al. 2015). Type V systems are characterized by the Cpf1 signature protein, which uses a crRNA molecule containing a partial repeat and a full spacer to recognize, bind, and cleave foreign sequences (Zetsche et al. 2015). In contrast, Type II systems use the signature Cas9 protein to bind and cleave target DNA but also require a second RNA molecule to complex with the crRNA, as well as RNase III activity to generate individual crRNAs. This second RNA molecule, called the trans-activating CRISPR RNA (tracrRNA, pronounced “tracer-RNA”), is important for both crRNA biogenesis and Cas9-guided interference against foreign DNA (Deltcheva et al. 2011; Sapranauskas et al. 2011; Gasiunas et al. 2012; Chylinski et al. 2013, 2014; Karvelis et al. 2013). The tracrRNA contains a partially complementary antirepeat at its 5′ end that allows it to base pair with the repeat portion of the crRNA; this complementary region forms three structural modules within the guide RNA that are important for Cas9 function: the lower stem, the bulge, and the upper stem (Briner et al. 2014). The 3′ end of the tracrRNA comprises several hairpin structures, namely the nexus and the terminal hairpins, that are key to its binding interactions with Cas9. The first hairpin in the tracrRNA, called the nexus, can take several structural forms but often has a conserved nucleotide sequence at the base of its stem. Beginning with the U in the G-U wobble at the base of the lower stem, the motif UnAnnC (where n is any nucleotide) is found in the majority of IIA nexus hairpins in tracrRNAs (Briner et al. 2014). The terminal hairpins vary in number, size, and structure, but typically include a GC-rich, Rho-independent transcriptional terminator hairpin followed by a string of U’s at the 3′ end.
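    Because the nexus motif is a short degenerate pattern, a regular expression is a natural way to scan candidate tracrRNAs for it. The Python sketch below assumes the tracrRNA-derived portion of a commonly used S. pyogenes sgRNA scaffold as test input; that sequence should be verified against a primary source, and the function name nexus_motif_hits is hypothetical.

```python
import re

# "UnAnnC", where n is any nucleotide; the lookahead also reports overlapping matches.
NEXUS_MOTIF = re.compile(r"(?=(U[ACGU]A[ACGU]{2}C))")


def nexus_motif_hits(tracr: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched hexamer) for each UnAnnC occurrence in a tracrRNA sequence."""
    rna = tracr.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start(), m.group(1)) for m in NEXUS_MOTIF.finditer(rna)]


# Assumed tracrRNA-derived portion of a commonly used S. pyogenes sgRNA scaffold.
SPY_TRACR_FRAGMENT = "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
print(nexus_motif_hits(SPY_TRACR_FRAGMENT))  # [(13, 'UAAGGC')]
```

    Assuming the usual repeat–antirepeat pairing, the single hit starts at the last U of the antirepeat, the U that wobble-pairs with the first G of the repeat at the base of the lower stem. A motif match alone does not show that a nexus hairpin actually forms; secondary-structure prediction would still be needed.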
The hairpins and nexus are key factors in determining Cas9 orthogonality and cross-compatibility (Esvelt et al. 2013; Briner et al. 2014; Fonfara et al. 2014).

    TYPE II CRISPR–Cas SYSTEMS

    Although five system types have been described to date, they are not equally distributed. Overall, CRISPR–Cas systems have been detected in 47% of all bacterial and archaeal genomes. Type I systems are by far the most common, constituting ~60% of all CRISPR loci in bacterial and archaeal genomes. Type III systems are the second most common type, occurring more frequently in archaea (34% of all archaeal CRISPR loci) than in bacteria (25%) (Makarova et al. 2015). Type IV and V systems are the rarest types, constituting <2% of all CRISPR–Cas systems. Notwithstanding the vast diversity and high occurrence of CRISPR–Cas systems overall, Type II systems are found only in bacteria and are estimated to occur in <5% of all bacterial genomes (Makarova et al. 2011, 2015; Chylinski et al. 2014). Despite their rarity, Type II systems are diverse enough to be divided into three subtypes (IIA, IIB, and IIC) that differ in Cas9 size, repeat size, array orientation, and guide RNA composition. The IIA subtype is the best characterized and includes model systems such as Streptococcus pyogenes (Spy), Streptococcus thermophilus (Sth), and Staphylococcus aureus (Anders et al. 2014; Chylinski et al. 2014; Nishimasu et al. 2014; Ran et al. 2015). From these organisms, we have learned that IIA Cas9s fall into two distinct size groups: long Cas9s of approximately 1300 amino acids, exemplified by the Sth-CRISPR3 locus and Spy, and short Cas9s of approximately 1100 amino acids, exemplified by the Sth-CRISPR1 locus and S. aureus (Horvath et al. 2008). IIA repeats are always 36 nucleotides long, and array orientation is often easy to determine because the arrays contain degenerate repeats, which differ from the consensus sequence, at the end opposite the leader.
Because the leader end of the repeat-spacer array is actively maintained through transcription and the addition of new spacers, it is hypothesized that spacer excision and repeat recombination events may lead to mutations and single-nucleotide polymorphisms toward the ancestral end. The tracrRNAs are located either between the cas9 and cas1 genes or upstream of the cas9 gene (Chylinski et al. 2013; Fonfara et al. 2014). Occasional reports have found antirepeats, thought to be potential tracrRNAs, between the csn2 gene and the start of the repeat-spacer array and downstream of the repeat-spacer array (Chylinski et al. 2013, 2014; Briner et al. 2014).
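    The degenerate-repeat heuristic for IIA arrays amounts to comparing each terminal repeat with the consensus. In this Python sketch the function names mismatches and leader_end are hypothetical, the consensus is an assumed S. pyogenes repeat, and the degenerate repeat is made up for illustration. As noted for IIB and IIC systems, this heuristic does not apply there.

```python
def mismatches(repeat: str, consensus: str) -> int:
    """Count positional differences from the consensus (length differences count as mismatches)."""
    return sum(a != b for a, b in zip(repeat, consensus)) + abs(len(repeat) - len(consensus))


def leader_end(repeats: list[str], consensus: str) -> str:
    """Guess the leader end of a IIA array from its degenerate terminal repeat.

    `repeats` are listed in the order they appear in the input sequence. The more
    degenerate terminal repeat is taken to mark the ancestral end, so the leader is
    assumed to lie at the opposite end.
    """
    first = mismatches(repeats[0], consensus)
    last = mismatches(repeats[-1], consensus)
    if last > first:
        return "start"  # leader precedes the first repeat
    if first > last:
        return "end"    # leader follows the last repeat: the input is in the opposite orientation
    return "unresolved"


CONSENSUS = "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC"  # assumed S. pyogenes repeat
degenerate = CONSENSUS[:-3] + "GGG"                # hypothetical degenerate terminal repeat
print(leader_end([CONSENSUS, CONSENSUS, degenerate], CONSENSUS))  # start
```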

    Type IIB and IIC systems are less well characterized than IIA systems and show no clear conservation of repeat orientation or size. Although some of their repeats are 36 nucleotides long, like those of IIA systems, some IIB repeats are as long as 47 nucleotides. In addition, repeat orientation in these systems cannot be determined by looking for the degenerate end of the repeat-spacer array and usually must be determined by RNA sequencing of crRNAs. Cas9 size also varies greatly in these subtypes: the Neisseria meningitidis (Nme) Cas9 (IIC) is 1082 amino acids long, whereas the Legionella pneumophila (IIB) Cas9 is 1372 amino acids long (Esvelt et al. 2013; Hou et al. 2013; Chylinski et al. 2014). The tracrRNAs of IIB and IIC systems have also not been clearly characterized, although they often resemble the canonical IIA tracrRNA without the bulge module between the lower and upper stems. Interestingly, tracrRNAs in these systems still often contain a nexus hairpin, like IIA systems, carrying the conserved sequence motif seen in IIA tracrRNAs. Terminal hairpins are also present in IIB and IIC systems.

    The tracrRNA is critical for Type II CRISPR–Cas function and performance, both in native bacterial immunity and when the system is exploited to generate genome-editing tools. During the expression stage of native CRISPR–Cas immunity, the pre-crRNA containing all of the repeat and spacer sequences is transcribed as one molecule; the complementary antirepeat portion of the tracrRNA base pairs with the repeat portions of the pre-crRNA with the aid of Cas9 (Gasiunas et al. 2012). Once this double-stranded RNA (dsRNA) forms, a native RNase III, encoded by the rnc gene, cleaves it within the repeat segments, releasing individual repeat-spacer units (Deltcheva et al. 2011; Gasiunas et al. 2012). A second, as yet unidentified RNase then trims the 5′ end of the spacer portion of each crRNA so that only 19–22 nucleotides of the spacer remain. In minimal repeat-spacer arrays containing a single spacer flanked by repeats, no RNase III processing is necessary, and the crRNA:tracrRNA duplex retains both flanking repeats during the interference stage (Karvelis et al. 2013).
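    The two processing steps can be approximated with string operations. This Python sketch is a simplification under stated assumptions: splitting on the repeat stands in for RNase III cleavage within each repeat, keeping the 3′-most 20 spacer nucleotides stands in for 5′ trimming, and retaining the first 22 repeat nucleotides is an assumed value, not a measured cleavage site. The repeat is an assumed S. pyogenes repeat, the spacers are hypothetical, and mature_crrnas is a hypothetical name. Cas9 and the tracrRNA, which the real process requires, are not modeled.

```python
# Assumed S. pyogenes CRISPR repeat (36 nt, DNA alphabet); verify against a primary source.
SPY_REPEAT = "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC"


def mature_crrnas(array: str, repeat: str = SPY_REPEAT, guide_len: int = 20, repeat_keep: int = 22) -> list[str]:
    """Toy model of Type II crRNA maturation.

    `array` is the pre-crRNA sequence written in the DNA alphabet. Splitting on the repeat
    mimics RNase III cuts within each repeat; keeping only the 3'-most `guide_len` nt of
    each spacer mimics 5' trimming. Each mature crRNA is that spacer fragment plus the first
    `repeat_keep` nt of the downstream repeat, returned in the RNA alphabet.
    """
    units = array.split(repeat)
    spacers = units[1:-1]  # drop anything before the first repeat and after the last repeat
    return [(spacer[-guide_len:] + repeat[:repeat_keep]).replace("T", "U") for spacer in spacers]


S1 = "TTTTTTTTTT" + "GACGCATAAAGATGAGACGC"  # hypothetical 30-nt spacers
S2 = "CCCCCCCCCC" + "TTGGCACGATCGTAGCTAGC"
pre_crrna = SPY_REPEAT + S1 + SPY_REPEAT + S2 + SPY_REPEAT

crrnas = mature_crrnas(pre_crrna)
print([len(c) for c in crrnas])  # [42, 42]
```

    With these assumed lengths each mature crRNA is 42 nt; changing guide_len within the 19–22 nt range given above changes the result accordingly.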

    THE tracrRNA INTERACTS CLOSELY WITH Cas9

    Co-evolution of core CRISPR elements has led to divergent, orthogonal systems that are not cross-compatible (Horvath et al. 2009). Distinct evolutionary events have kept the elements within a CRISPR locus, including Cas9, Cas1, the CRISPR repeat, and the tracrRNA, compatible with and adapted to one another, while giving them defined characteristics that make them incompatible with the elements of other loci. This close co-evolution of core elements is supported by the fact that each unique Cas9 interacts with its tracrRNA:crRNA through specific binding of protein residues to guide RNA functional modules.

    The guide RNAs (crRNA:tracrRNA duplexes) of Type IIA systems always contain five canonical functional modules in addition to the spacer module: the upper stem, bulge, lower stem, nexus, and hairpins (Fig. 1A; Briner et al. 2014). The first three modules, the upper stem, bulge, and lower stem, are formed by base pairing between the CRISPR repeat and the complementary antirepeat portion of the tracrRNA. The upper stem varies in sequence and length but serves as the site of dsRNA cleavage by RNase III during crRNA biogenesis. The bulge varies in sequence and size but is always kinked in the same direction. The size of the lower stem varies by system, but it contains between four and eight base pairs, often ending in a G-U wobble at its base. In the structure of SpyCas9 bound to guide RNA and target DNA, the phosphate backbone of the upper and lower stems, formed by repeat–antirepeat pairing, interacts with Cas9 in a sequence-independent manner, indicating that the structure of the crRNA:tracrRNA repeat region matters more for Cas9 recognition than its sequence (Fig. 1B; Anders et al. 2014; Nishimasu et al. 2014). The arginine bridge helix of Cas9 binds the base of the lower stem and the nexus, the first hairpin in the tracrRNA, and this binding restructures Cas9 into an active conformation (Jinek et al. 2012; Anders et al. 2014; Nishimasu et al. 2014). Two key nucleotides in the nexus interact directly with amino acid residues in SpyCas9 and drive the conformational change that allows the protein to target and cleave foreign DNA: a uracil that protrudes from the nexus hairpin interacts with an asparagine residue in the recognition lobe of Cas9, and an adenosine binds an arginine residue. Together, these interactions allow Cas9 to access the double-stranded DNA at the PAM site and initiate Cas9-based targeting and cleavage (Anders et al. 2014; Nishimasu et al. 2014; Sternberg et al. 2014; Heler et al. 2015; Ran et al. 2015).
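    Whether a repeat and an antirepeat can form the lower stem described above is a question of antiparallel base pairing. The Python sketch below classifies each pair from the base of the stem upward; the function name stem_pairs is hypothetical, and the two 6-nt strands are assumed S. pyogenes lower-stem sequences used only as an example.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairs(repeat_side: str, antirepeat_side: str) -> list[str]:
    """Classify each base pair of an antiparallel RNA stem, listed from the base of the stem upward.

    Both arguments are written 5'->3'; repeat_side[0] pairs with antirepeat_side[-1].
    """
    if len(repeat_side) != len(antirepeat_side):
        raise ValueError("stem strands must have equal length")
    pairs = []
    for a, b in zip(repeat_side, reversed(antirepeat_side)):
        if (a, b) in WATSON_CRICK:
            pairs.append("WC")
        elif (a, b) in WOBBLE:
            pairs.append("GU")
        else:
            pairs.append("mismatch")
    return pairs


# Assumed S. pyogenes lower-stem sequences (crRNA repeat side, tracrRNA antirepeat side).
lower_stem = stem_pairs("GUUUUA", "UAAAAU")
print(lower_stem)  # ['GU', 'WC', 'WC', 'WC', 'WC', 'WC']
print(4 <= len(lower_stem) <= 8 and lower_stem[0] == "GU")  # True
```

    The same check applies to the upper stem, although it ignores the bulge and any non-canonical interactions that a real structure model would capture.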

    Figure 1.

      Modules of native crRNA:tracrRNA duplexes and engineered single guide RNAs interact with protein domains of Cas9. The six functional modules formed in the crRNA:tracrRNA duplex that allow it to bind Cas9 (light gray) are shown in A; these include the spacer (black), lower stem (dark gray), bulge (green), upper stem (yellow), nexus (blue), and terminal hairpins (red). The protein domains within Cas9 that bind and interact with the guide RNA are shown in B, including the arginine bridge helix (green), PAM interacting motif (blue), HNH nuclease domain (red), and RuvC nuclease domain (yellow). Additionally, the recognition (REC) lobe and nuclease (NUC) lobe of Cas9 are labeled. The full native dual crRNA (blue) and tracrRNA (red) duplex is shown for S. pyogenes, S. aureus, and N. meningitidis in C, E, and G, respectively. The single guide RNA is shown for the same organisms in complex with target DNA (black) flanked by the PAM (green), for S. pyogenes (Jinek et al. 2012), S. aureus (Ran et al. 2015), and N. meningitidis (Hou et al. 2013) in D, F, and H, respectively. The sgRNA combines the crRNA and tracrRNA by a 4-nt tetraloop (dark gray).

      BIOTECHNOLOGY APPLICATIONS

      Arguably, what allowed Cas9-based technology to become the primary genetic engineering tool was the technical advance of artificially fusing the tracrRNA and crRNA through a nucleotide tetraloop (Jinek et al. 2012; Mali et al. 2013). The fused crRNA:tracrRNA is referred to as a single guide RNA (sgRNA); it reduced the natural four-component system (Cas9:RNase III:crRNA:tracrRNA) to a synthetic two-component system (Cas9:sgRNA), greatly increasing the ease of use and portability of the system. To exploit the DNA-targeting power of Cas9, one simply needs to understand the various components of tracrRNAs and design corresponding guide RNAs (Fig. 1C–H).
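      The sgRNA fusion is, at the sequence level, a concatenation: guide, truncated repeat, tetraloop, and tracrRNA portion. The Python sketch below assumes a commonly used S. pyogenes scaffold split into those pieces and a 20-nt guide; verify the scaffold against a primary source before use. The names build_sgrna, REPEAT_FRAGMENT, TETRALOOP, and TRACR_FRAGMENT are hypothetical.

```python
# Assumed pieces of a commonly used S. pyogenes sgRNA scaffold (RNA alphabet); verify before use.
REPEAT_FRAGMENT = "GUUUUAGAGCUA"  # 5' portion of the crRNA repeat
TETRALOOP = "GAAA"                # synthetic loop joining the crRNA and tracrRNA halves
TRACR_FRAGMENT = "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"


def build_sgrna(guide: str, guide_len: int = 20) -> str:
    """Fuse a spacer-derived guide to the repeat fragment, tetraloop, and tracrRNA fragment (5'->3')."""
    guide = guide.upper().replace("T", "U")
    if len(guide) != guide_len or not set(guide) <= set("ACGU"):
        raise ValueError(f"guide must be {guide_len} nt of A/C/G/U (or T)")
    return guide + REPEAT_FRAGMENT + TETRALOOP + TRACR_FRAGMENT


sgrna = build_sgrna("GACGCATAAAGATGAGACGC")  # hypothetical guide
print(len(sgrna))  # 96
```

      Expression constructs typically add promoter and terminator sequences around this RNA; those are not modeled here.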

      Understanding the key modules within the tracrRNA has allowed researchers to use multiple Cas9 tools concurrently in a single cell, rather than relying on the same protein, guide, and PAM for different applications. This multiplexing is made possible by using CRISPR–Cas9 systems whose tracrRNAs have variable nexus and hairpin regions (Esvelt et al. 2013; Briner et al. 2014; Fonfara et al. 2014). These two modules were found to be the key determinants of orthogonality and to establish the boundaries of compatibility between systems.
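      One practical consequence of using orthogonal Cas9s is that each ortholog brings its own PAM, so a single sequence offers different sets of targetable sites. This Python sketch scans one strand for the PAMs commonly reported for the three orthologs in Fig. 1 (SpyCas9 NGG, SaCas9 NNGRRT, NmeCas9 NNNNGATT); those PAM strings are assumptions to verify, and pam_sites and sites_by_ortholog are hypothetical names. Positions are PAM starts, and the protospacer would lie immediately 5′ of each site.

```python
import re

# IUPAC codes needed for the PAMs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}

# PAMs commonly reported for the three orthologs in Fig. 1 (assumed here; verify before use).
ORTHOLOG_PAMS = {"SpyCas9": "NGG", "SaCas9": "NNGRRT", "NmeCas9": "NNNNGATT"}


def pam_sites(seq: str, pam: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) PAM match on the given strand."""
    pattern = re.compile("(?=" + "".join(IUPAC[base] for base in pam.upper()) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def sites_by_ortholog(seq: str) -> dict[str, list[int]]:
    """Map each ortholog name to its PAM start positions in `seq` (forward strand only)."""
    return {name: pam_sites(seq, pam) for name, pam in ORTHOLOG_PAMS.items()}


print(sites_by_ortholog("ACGGAATAC"))  # {'SpyCas9': [1], 'SaCas9': [1], 'NmeCas9': []}
```

      A full design tool would also scan the reverse complement and check each candidate protospacer for off-target matches.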

      To date, the Cas9-based genome-editing toolbox can perform techniques ranging from sequence-specific introduction of double-stranded breaks to gene regulation and fluorescent imaging of chromosomal loci (Doudna and Charpentier 2014; Sternberg and Doudna 2015). High-throughput studies made possible by Cas9 have already begun to revolutionize the rate and depth of genome-wide surveys that help researchers better understand the DNA code (Cong et al. 2013; Mali et al. 2013). Although researchers are rapidly improving the specificity, range of applications, and portability of Cas9-based tools, this technology is still young and presents an exciting opportunity for deeper characterization and understanding. Despite the considerable progress already made in understanding and using Cas9-based genome-editing tools, we have barely scratched the surface of this technology's potential. In our accompanying protocol, we present a strategy to identify elements of bacterial immune systems that will allow us to develop next-generation Cas9-based genome-editing tools (see Protocol: Prediction and Validation of Native and Engineered Cas9 Guide Sequences [Briner et al. 2016]). Through the discovery, validation, and development of new and unique Cas9 proteins and their corresponding tracrRNAs, crRNAs, and PAMs, CRISPR–Cas9 genome-editing technology has the potential to become as flexible as restriction enzymes and arguably as commonplace as PCR in molecular genetics laboratories.

      Footnotes

      • 2 Correspondence: rbarran@ncsu.edu

      REFERENCES
