(19)
(11)EP 2 877 488 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
22.07.2020 Bulletin 2020/30

(21)Application number: 13776595.4

(22)Date of filing:  24.07.2013
(51)International Patent Classification (IPC): 
C07K 14/435(2006.01)
(86)International application number:
PCT/US2013/051783
(87)International publication number:
WO 2014/018601 (30.01.2014 Gazette  2014/05)

(54)

NEW MODULAR BASE-SPECIFIC NUCLEIC ACID BINDING DOMAINS FROM BURKHOLDERIA RHIZOXINICA PROTEINS

NEUE MODULARE BASENSPEZIFISCHE NUKLEINSÄUREBINDENDE DOMÄNEN AUS BURKHOLDERIA-RHIZOXINICA-PROTEINEN

NOUVEAUX DOMAINES DE LIAISON D'ACIDE NUCLÉIQUE SPÉCIFIQUES À LA BASE MODULAIRES À PARTIR DE PROTÉINES BURKHOLDERIA RHIZOXINICA


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 24.07.2012 US 201261675160 P
01.02.2013 US 201361759744 P

(43)Date of publication of application:
03.06.2015 Bulletin 2015/23

(73)Proprietor: Cellectis SA
75013 Paris (FR)

(72)Inventors:
  • BERTONATI, Claudia
    75003 Paris (FR)
  • DUCHATEAU, Philippe
    91210 Draveil (FR)
  • JUILLERAT, Alexandre
    75014 Paris (FR)
  • SILVA, George
    94420 Le Plessis-Trevise (FR)
  • VALTON, Julien
    94220 Charenton-Le-Pont (FR)

(74)Representative: Santarelli 
49, avenue des Champs-Elysées
75008 Paris (FR)


(56)References cited:
WO-A1-2011/146121
WO-A2-2013/152220
  
  • DATABASE UniProt [Online] 8 February 2011 (2011-02-08), "SubName: Full=Plasmid pBRH01, complete sequence;", XP002723337, retrieved from EBI accession no. UNIPROT:E5AV36 Database accession no. E5AV36 & LACKNER GERALD ET AL: "Complete Genome Sequence of Burkholderia rhizoxinica, an Endosymbiont of Rhizopus microsporus", JOURNAL OF BACTERIOLOGY, vol. 193, no. 3, February 2011 (2011-02), pages 783-784,
  • DATABASE UniProt [Online] 8 February 2011 (2011-02-08), "SubName: Full=Plasmid pBRH02, complete sequence;", XP002723338, retrieved from EBI accession no. UNIPROT:E5AW45 Database accession no. E5AW45
  • DATABASE UniProt [Online] 8 February 2011 (2011-02-08), "SubName: Full=Plasmid pBRH02, complete sequence;", XP002723339, retrieved from EBI accession no. UNIPROT:E5AW43 Database accession no. E5AW43
  • M. M. MAHFOUZ ET AL: "De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 108, no. 6, 8 February 2011 (2011-02-08), pages 2623-2628, XP055007615, ISSN: 0027-8424, DOI: 10.1073/pnas.1019533108
  • HEIDI SCHOLZE ET AL: "TAL effectors are remote controls for gene activation", CURRENT OPINION IN MICROBIOLOGY, vol. 14, no. 1, 5 January 2011 (2011-01-05), pages 47-53, XP028359313, ISSN: 1369-5274, DOI: 10.1016/J.MIB.2010.12.001 [retrieved on 2010-12-15]
 
Remarks:
The file contains technical information submitted after the application was filed and not included in this specification
 
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description

Field of the invention



[0001] The present invention concerns the field of genetic engineering and the reprogramming of cell functions using protein fusions involving new modular specific nucleic acid binding domains.

[0002] These modular nucleic acid binding domains result from the rearrangement of genomic sequences coming from Burkholderia rhizoxinica, a bacterial endosymbiont of the fungus Rhizopus microsporus.

[0003] Fusion proteins of these new engineered binding domains with catalytic domains of different nucleic acid processing enzymes, in particular catalytic domains having endonuclease activity, permit the processing of genomes at desired targeted loci.

Background of the invention



[0004] Significant progress has been made over the last years in the way genomes can be investigated and modified in living cells. The main challenge in this matter is to transfect the living cells with enzyme molecules that are able to process targeted genetic sequences in a sequence specific manner, without inducing toxicity. This goal has been reached using enzymes derived from natural proteins, for instance by creating variants of homing endonucleases, also called meganucleases (Stoddard, Monnat et al. 2007; Arnould, Delenda et al. 2011), but also by creating fusion proteins, such as for instance the fusion of TALE DNA binding domain with a catalytic domain (Christian, Cermak et al. 2010; Li, Huang et al. 2011).

[0005] Transcription Activator Like Effectors (TALE) have been widely used for several applications in the field of genome engineering. The sequence specificity of this family of proteins, which are used in the infection process by plant pathogens of the Xanthomonas genus, is driven by an array of repeats of 33 to 35 amino acids that differ essentially at positions 12 and 13 (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009). The recent high-resolution structures of TAL effectors bound to DNA showed that each base of one strand of the DNA target is contacted by a single repeat (Deng, Yan et al. 2012; Mak, Bradley et al. 2012), with the specificity resulting from the two polymorphic amino acids of the repeat, the so-called RVDs (repeat variable dipeptides). The modularity of these DNA binding domains has been confirmed by assembling repeats to design TALE-derived proteins with new sequence specificities.

[0006] TALE proteins have so far been described as containing: (i) an N-terminal domain including a translocation signal, (ii) a central DNA-binding domain, and (iii) a C-terminal domain including a nuclear localization signal (NLS) and an acidic activation domain (AD). A representative member of this family is AvrBs3 from Xanthomonas vesicatoria (SWISSPROT P14727), which has a 1164 amino acid sequence comprising an N-terminal domain of 288 amino acids (positions 1 to 288), a central domain of 593 amino acids (positions 289 to 881), and a C-terminal domain of 283 amino acids (positions 882 to 1164) comprising an NLS and an AD (transcription activation domain). The DNA-binding domain, which determines the target specificity of each TALE, consists of a variable number (generally 12 to 27) of tandem, nearly identical, 33-35 amino acid repeats, followed by a single truncated repeat. For example, the AvrBs3 DNA-binding domain (SEQ ID NO. 1) comprises 17 repeats of 34 amino acids and a truncated repeat of 15 amino acids. The "repeat-variable di-residue" (RVD), which represents the variable residues in the repeat, determines the specificity of interaction with the nucleotide base of the DNA target, in a code-like fashion with some degeneracy. The four most common RVDs are HD with respect to C, NI with respect to A, NG with respect to T and NN with respect to G (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009; Bogdanove and Voytas 2011; WO 2011/072246).
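By way of illustration only, the AvrBs3 figures quoted in this paragraph can be checked for internal consistency with the short sketch below. The variable names are hypothetical and are not part of the disclosure; only the numbers are taken from the text.

```python
# Domain lengths of AvrBs3 as quoted in the paragraph above.
N_TERMINAL = 288   # positions 1 to 288
CENTRAL = 593      # positions 289 to 881
C_TERMINAL = 283   # positions 882 to 1164

# Repeat array: 17 full repeats of 34 residues plus one truncated repeat of 15.
FULL_REPEATS, REPEAT_LENGTH, TRUNCATED_LENGTH = 17, 34, 15

total_length = N_TERMINAL + CENTRAL + C_TERMINAL
repeat_array_length = FULL_REPEATS * REPEAT_LENGTH + TRUNCATED_LENGTH

print(total_length)          # 1164
print(repeat_array_length)   # 593
```

Under these figures the three domains add up to the full 1164 residues, and the repeat array accounts for the whole 593-residue central domain.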

[0007] This straightforward sequence relationship between RVDs and nucleotide bases allows the production of custom TAL effectors that bind DNA sequences of interest by assembling an array of repeats that corresponds to the intended target site. Such engineered TALE proteins have improved gene-editing technology (Baker 2012). A variety of rapid construction methods for custom TALE fusion proteins have recently been developed, based on the protein scaffold of AvrBs3-like proteins, by adding catalytic protein domains to the C-terminus (US 2011/0145940; Cermak, Doyle et al. 2010; Weber, Gruetzner et al. 2011; Zhang, Cong et al. 2011; Doyle, Booher et al. 2012). TAL effectors have, for instance, been fused to a nuclease catalytic head to form specific nucleases (TALE-Nucleases), thereby creating new tools, especially for genome engineering applications, that have proven efficient in cell-based assays in yeast, mammalian cells and plants (Cermak, Doyle et al. 2010; Christian, Cermak et al. 2010; Geissler, Scholze et al. 2011; Huang, Xiao et al. 2011; Li, Huang et al. 2011; Mahfouz, Li et al. 2011; Miller, Tan et al. 2011; Morbitzer, Elsaesser et al. 2011; Mussolino, Morbitzer et al. 2011; Sander, Cade et al. 2011; Tesson, Usal et al. 2011; Weber, Gruetzner et al. 2011; Zhang, Cong et al. 2011; Li, Piatek et al. 2012; Mahfouz, Li et al. 2012).
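The repeat-per-base assembly principle described above can be sketched as follows. This is a minimal illustration that uses only the four most common RVDs named in the preceding paragraph; the function name rvds_for_target is hypothetical, and the sketch deliberately ignores the degeneracy of the code and any flanking-sequence requirements.

```python
# Four most common RVDs and the base each one preferentially recognises
# (HD -> C, NI -> A, NG -> T, NN -> G), keyed here by target base.
RVD_FOR_BASE = {"C": "HD", "A": "NI", "T": "NG", "G": "NN"}

def rvds_for_target(target: str) -> list[str]:
    """Return one RVD per base of the target, in the order of the target."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]

print(rvds_for_target("TCGA"))  # ['NG', 'HD', 'NN', 'NI']
```

Each element of the returned list stands for one repeat of the array, so the number of repeats equals the length of the target.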

[0008] Meanwhile, the Transcription Activator Like Effectors so far described in the literature (AvrXa7, Hax, PthXo1,..) are highly similar to the protein AvrBs3 and all originate from Xanthomonas or from the closely related bacterial genus Ralstonia.

[0009] One of the drawbacks of the Transcription Activator Like Effectors from Xanthomonas lies in the fact that they mostly consist of highly repetitive motifs, nearly identical to each other. The high identity of these repeats is prone to cause genetic recombination or instability when the repeats are assembled to form engineered nucleic acid binding domains.

[0010] A first level of difficulty occurs at the polynucleotide level when cloning the repeat sequences, because restriction sites and PCR primers are basically the same for each repeat. Under these conditions, it becomes difficult to perform routine lab procedures to check that the repeats have been cloned properly, in the correct number and in the right order. This is nevertheless essential to achieve proper expression of a DNA binding protein that is expected to show specificity for a desired nucleic acid sequence.

[0011] A second level of difficulty occurs when the polynucleotide sequences are included in vectors for heterologous expression, in particular when using viral vectors. As recently reported by Holkers et al. (2012), it appears that DNA tandem repeat motifs from the TALE scaffold are generally incompatible with lentiviral vector systems due to internal sequence recombination. This particularly limits the current use of TALE proteins in primary cells, which are generally not permissive towards classical gene transfer technologies.

[0012] Lower efficiencies of TALE derived proteins have also been reported in certain cell types, for instance in mice, or in relation to epigenetic modifications, so that alternative or complementary solutions to improve TALE derived proteins are still actively sought.

[0013] Unexpectedly, the present inventors have identified putative proteins from the bacterial endosymbiont Burkholderia rhizoxinica, and others from a marine organism, displaying highly polymorphic modules having specific DNA binding activity while having a very different sequence (less than 40 % identity) in comparison with TALE repeats. These proteins also have completely different N- and C-terminal domains. The modules found in these proteins have higher sequence variability than TALE repeats and can nevertheless be assembled to engineer new base-per-base specific binding domains (MBBBD) to target nucleic acid sequences in genomes. These modules confer better sequence stability when they are assembled and expressed in living cells as nucleic acid binding domains.

Summary of the invention



[0014] The present invention concerns new modular base-per-base specific nucleic acid binding domains (MBBBD) derived from a newly identified protein from the bacterial endosymbiont Burkholderia rhizoxinica, namely E5AV36_BURRH. Other newly identified proteins from the bacterial endosymbiont Burkholderia rhizoxinica are described herein, namely the E5AW43_BURRH, E5AW45_BURRH and E5AW46_BURRH proteins, as well as other similar proteins identified from a marine organism metagenomic database, referred to as JCVI_A, JCVI_B and ECR81667.

[0015] These proteins comprise modules of about 31 to 33 amino acids that, when assembled together, form modular base-per-base binding domains (MBBBD). A parallel may be made with the repeat domains of TALE proteins from Xanthomonas. However, the modules in these binding domains display less than 40 % sequence identity with common TALE repeats and much more sequence variability. In addition, most modules from these proteins display amino acid variability only in position 13, and not in position 12, whereas variability is observed in both positions 12 and 13 in the variable di-residues (RVDs) of TALE proteins. As a result, in the engineered MBBBDs according to the invention, base specificity may rely only on position 13 of the modules, by merely following a one base/one amino acid code. These proteins also display different N- and C-terminal domains, which are much shorter than in TALE proteins.

[0016] The different domains from said proteins (modules, N and C terminals) are useful to engineer new proteins or scaffolds having binding properties to specific nucleic acid sequences. Assembling the different modules into new MBBBDs allows targeting almost any nucleic acid sequence in a genome. The MBBBDs can thereby be fused to different catalytic domains, especially nucleases and transcriptional activators, to process DNA at the locus of a target nucleic acid sequence. The invention pertains to new rare-cutting endonucleases derived from these polypeptides, with improved specificity or cleavage activity towards a specific locus. Chimeric proteins resulting from the assembly of the different domains from said new modular proteins with functional domains of TALE-like proteins are also disclosed herein.

[0017] The inventors have conceived different fusion or hybrid proteins deriving from the above polypeptides and polynucleotides and methods to use same.

[0018] The invention pertains to E5AV36_BURRH modules assembled to form modular base-per-base binding domains (MBBBD). By modular base-per-base binding domains is meant a succession of polypeptide modules assembled in order to respectively target a nucleic acid base in a given nucleic acid target sequence.

[0019] Such MBBBD can be fused to catalytic domains in order to process DNA at a locus defined by a nucleic acid target sequence, especially to a transcription activator, such as VP16 or VP64, or to some repression factors such as, for example, the KRAB (Krüppel-associated box) domain.

[0020] The MBBBD of the invention are fused to a nuclease catalytic head, especially catalytic domains from Fok-I, to form specific endonucleases, which allow dimerization of Fok-I. The MBBBDs have several advantages over TALE-repeats. In particular, the fact that the modules can display non-repeated sequences provides the MBBBDs with improved modularity. MBBBD are likely to be processed more easily using PCR, cloning methods and viral delivery methods because the polynucleotide sequences encoding the modules are not identical to each other. As a further advantage, MBBBDs allow fusions with further nuclease domains such as I-TevI, making them active in monomeric form as well.

[0021] The resulting fusion proteins therefore form a new class of engineered endonucleases useful for gene targeting and genome editing.

[0022] Hybrid TALE-like proteins can be also created by combining polypeptide domains (modules, N or C terminals) from the above E5AV36 protein with those of currently existing, natural or engineered TALEs of AvrBs3-like proteins. Such new chimeric TALE-like proteins can be assembled using the methods already well-established in the art for engineering TALE domains, in particular by sub-cloning the sequences encoding modules or repeats in polynucleotide vectors, for instance, by using Golden Gate cloning method. Preferably, the protein domains from E5AV36 (module domain, N-terminal domain, C-terminal domain) will be used in combination with the complementary domains of classical TAL effectors.

[0023] Fusions of catalytic domains to the BURRH polypeptides according to the invention may be N-terminal or C-terminal fusions, with any appropriate linkers or truncations.

[0024] E5AV36_BURRH modules can also be used as templates to build new artificial repeats for TALE-like proteins. Such artificial repeat arrays can be created by introducing mutations into their sequences or by introducing new RVDs into repeats or modules. Key positions in the N/C-terminal domains of the protein can be partially or totally degenerated to modulate DNA affinity as well as interactions with other cofactors. An extensive screening may also be carried out to identify new modules and new RVD-like structures across the diversity of genomes.

Disclosure of the technology


1/ BURRH Polypeptides displaying modular base-specific binding domains



[0025] Upon an extensive search for proteins that may display DNA binding properties throughout a selection of genomes, the present inventors have unexpectedly identified 4 proteins from the microorganism Burkholderia rhizoxinica displaying a modular structure. These modules share a low identity with TALE proteins and have completely different N and C-terminals. Interestingly, the modules of these proteins display more variability than AvrBs3-like repeats and their amino acids in position 12 and 13 significantly differ from those at play in Xanthomonas.

[0026] Burkholderia rhizoxinica is an intracellular symbiont of the phytopathogenic zygomycete Rhizopus microsporus, the causative agent of rice seedling blight. The endosymbiont produces the antimitotic macrolide rhizoxin for its host. It is vertically transmitted within vegetative spores and is essential for spore formation of the fungus. Its 3.75 Mb genome, which consists of a chromosome and two strain-specific plasmids, was recently sequenced (Lackner, Moebius et al. 2011). Unlike TALE proteins, the DNA binding proteins derived from Burkholderia rhizoxinica do not display a transactivator domain, and very little is known about the biology of this microorganism.

[0027] In a general aspect, the present disclosure relates to the discovery and identification of new modular proteins obtainable from the different domains of these four proteins:
  • E5AV36_BURRH (SEQ ID NO.2);
  • E5AW43_BURRH (SEQ ID NO.3);
  • E5AW45_BURRH (SEQ ID NO.4), and
  • E5AW46_BURRH (SEQ ID NO.5).


[0028] The modular arrays of the E5AV36_BURRH, E5AW43_BURRH and E5AW45_BURRH proteins are flanked by short C- and N-terminal domains, which do not appear to contain either an acidic domain or an NLS.

[0029] The alignment of the protein sequences E5AV36, E5AW43, E5AW45 and E5AW46 (SEQ ID NO. 2 to SEQ ID NO. 5) from BURRH and of AvrBs3 (SEQ ID NO.1) is presented in Figure 1.

[0030] E5AV36_BURRH appears to contain 20 modules and shorter N- and C-termini.

[0031] E5AW45_BURRH has 27 modules and N- and C-termini very similar to those of E5AV36_BURRH.

[0032] E5AW43_BURRH and E5AW46_BURRH are much shorter polypeptides. E5AW43_BURRH has only 6 modules, whereas E5AW46_BURRH does not appear to have any. However, the N- and C-termini of E5AW43_BURRH and E5AW46_BURRH are very similar to those of E5AV36_BURRH.

[0033] The E5AV36_BURRH, E5AW43_BURRH and E5AW45_BURRH proteins are currently annotated in the COG database [http://www.ncbi.nlm.nih.gov/COG] as being "AraC-type DNA-binding domain-containing proteins". Thus, in one aspect, the disclosure relates to the use of these proteins, and more generally of AraC-type DNA binding domains, and more especially modules thereof, for engineering fusion proteins having modular base-per-base sequence-specific binding domains.

[0034] The alignments of the modules and of the N- and C-terminal sequences of the above BURRH proteins are presented in Tables 23 and 24 as follows:
• Aligned N-ter sequences (Table 23):
1) AvrBs3 N-ter (SEQ ID NO.6) 287 AA
2) E5AV36_BURRH N-ter (SEQ ID NO.7) 82 AA
3) E5AW45_BURRH N-ter (SEQ ID NO.9) 83 AA
4) E5AW43_BURRH N-ter (SEQ ID NO.8) 83 AA

• Aligned C-ter sequences (Table 24):
1) E5AW43_BURRH C-ter (SEQ ID NO.65) 30 AA
2) E5AW45_BURRH C-ter (SEQ ID NO.66) 30 AA
3) E5AV36_BURRH C-ter (SEQ ID NO.64) 30 AA
4) AvrBs3 C-ter (SEQ ID NO.111) 231 AA


[0035] The alignments have been made using standard alignment software using a segment to segment approach (Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218).
The different module sequences are listed in Table 27 and aligned in Figure 2.

[0036] In contrast with what has already been published for classical TAL effector repeats, these modules show a higher degree of polymorphism. Nevertheless, as shown in the logotype and occurrence matrix in Figures 3 and 4, it is interesting to observe that a stretch of 15 amino acids, from position 5 to 19, represented below, is highly conserved among the different modules:
DIVKIAGX1X2GGAQAL,

where X1 in position 12 is mostly represented by N, but can also be represented in some instances by K, and

where X2 in position 13 varies between different amino acids, more particularly : G, I, N, S, D, T, A, K and R.
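As a purely illustrative aid, the conserved stretch defined above can be expressed as a regular expression that returns the residues at positions 12 and 13 of a module. The names MOTIF, variable_residues and example_module are hypothetical, and the example module is an invented 33-residue sequence, not one of the sequences of Table 27. Modules with a gap at position 13, or with a divergent conserved stretch, would require a more permissive pattern.

```python
import re

# Conserved stretch DIVKIAG-X1-X2-GGAQAL (positions 5 to 19 of a module).
MOTIF = re.compile(r"DIVKIAG(?P<x1>[A-Z])(?P<x2>[A-Z])GGAQAL")

def variable_residues(module: str):
    """Return (X1, X2) if the conserved stretch is found, otherwise None."""
    match = MOTIF.search(module)
    if match is None:
        return None
    return match.group("x1"), match.group("x2")

# Hypothetical module written for illustration only (not from Table 27).
example_module = "FSQADIVKIAGNIGGAQALQAVLDLEPTLRERG"
print(variable_residues(example_module))  # ('N', 'I')
```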



[0037] The amino acids X1 and X2 found in positions 12 and 13 of these modules are more particularly: NI, ND, NG, NA, **, NT, NS, NR, NK, KG and N* (where * means that a deletion appears in the alignment made of the different module sequences as shown in Table 27). Position 12 is mainly represented by N, whereas position 13 is more variable, which suggests that the specificity with respect to nucleobases could rely more particularly on position 13. In such an event, NT, **, KG, and NR appear to be additional di-residues not occurring in Xanthomonas TALE proteins.

[0038] Interestingly, the data presented in the present application, in particular with respect to E5AV36_BURRH target specificity (see Figure 8), suggest that the nucleotide base specificity could even be determined only by X2 (position 13) of each module, position 12 (X1) being preferably N, thereby defining a one amino acid/one base recognition code. This would form the first code ever linking one amino acid to one base for specific recognition. This code appears to be primarily based on the following correspondences (AA: amino acid preferably in position 13 of the module):
Primary code
AA      Nucleotide base
I       A
G       T
D       C
N       G
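The primary code above lends itself to a simple lookup. The sketch below is illustrative only: it assumes that position 12 is N for every module, as the preceding paragraph states is preferred, and the function names design_modules and predicted_target are hypothetical.

```python
# Primary code: residue in position 13 of the module -> nucleotide base.
PRIMARY_CODE = {"I": "A", "G": "T", "D": "C", "N": "G"}
RESIDUE_FOR_BASE = {base: aa for aa, base in PRIMARY_CODE.items()}

def design_modules(target: str, position_12: str = "N") -> list[str]:
    """Return the X1X2 di-residue of each module, one module per target base."""
    return [position_12 + RESIDUE_FOR_BASE[base] for base in target.upper()]

def predicted_target(position_13_residues: str) -> str:
    """Read a series of position-13 residues back into their preferred bases."""
    return "".join(PRIMARY_CODE[aa] for aa in position_13_residues.upper())

print(design_modules("GATC"))    # ['NN', 'NI', 'NG', 'ND']
print(predicted_target("NIGD"))  # GATC
```

Because the primary code is one-to-one, designing an array and reading it back return the original target.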


[0039] Possible alternative recognition also appears between the following amino acids and nucleotide bases as follows:
Secondary code
AA      Nucleotide base
S, T    A
R       T
T, *    C
R       G
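Since some residues of the secondary code are associated with more than one base, reading a module array may yield several candidate bases per position. The following sketch, with the hypothetical helper candidate_bases, combines the primary and secondary codes as listed in this description; it is an illustration and not a validated predictor.

```python
# Primary code: position-13 residue -> base.
PRIMARY = {"I": {"A"}, "G": {"T"}, "D": {"C"}, "N": {"G"}}
# Secondary code: base -> alternative residues ("*" is a gap in the alignment).
SECONDARY = {"A": {"S", "T"}, "T": {"R"}, "C": {"T", "*"}, "G": {"R"}}

def candidate_bases(residue: str) -> set[str]:
    """Return every base associated with a position-13 residue in either code."""
    bases = set(PRIMARY.get(residue, set()))
    for base, residues in SECONDARY.items():
        if residue in residues:
            bases.add(base)
    return bases

print(sorted(candidate_bases("R")))  # ['G', 'T']
print(sorted(candidate_bases("T")))  # ['A', 'C']
print(sorted(candidate_bases("I")))  # ['A']
```

Residues that occur at position 13 but are assigned in neither code, such as K, return an empty set in this sketch.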


[0040] The symbol "*" (star) means a gap i.e. that there is no position aligned with position 13 using clustal alignment of the different modules.

[0041] It is also interesting to observe that most modules start with F and generally with FS, and end with G, generally RG.

[0042] Some modules from the above proteins also comprise less than 33 amino acids.

[0043] When considering amino acids that are present in more than 50 % of the modules, the following consensus sequence can be drawn:
F S - - D I V K I A G N - G G A Q A L - A V L - - - P T L - - RG
where the symbol "-" means a standard amino acid which is more variable.
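A consensus of this kind can be derived column by column from aligned modules. The sketch below is a minimal illustration of the "more than 50 %" rule; the function name consensus is hypothetical and the four input fragments are invented toy data, not the module sequences of Table 27.

```python
from collections import Counter

def consensus(aligned_modules: list[str], threshold: float = 0.5) -> str:
    """Keep a residue where it occurs in more than `threshold` of the modules."""
    length = len(aligned_modules[0])
    if any(len(module) != length for module in aligned_modules):
        raise ValueError("modules must be aligned to the same length")
    result = []
    for column in zip(*aligned_modules):
        residue, count = Counter(column).most_common(1)[0]
        # "*" marks an alignment gap and is never reported as a consensus residue.
        keep = residue != "*" and count / len(aligned_modules) > threshold
        result.append(residue if keep else "-")
    return "".join(result)

# Toy aligned fragments, invented for illustration only.
toy_modules = ["FSADIV", "FSQDIV", "FTRDIV", "LSKDIV"]
print(consensus(toy_modules))  # FS-DIV
```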

[0044] The above consensus sequence is fully distinct from that of AvrBs3 repeats.

[0045] The matrix in Table 28 details the percentages of identity found between each of the different modules of the BURRH proteins and the following representative AvrBs3 repeat sequence:
AvrBs3 LTPEQVVAIASXXGGGKQALETVQRLLPVLCQAHG (SEQ ID NO.10)

[0046] The percentages of sequence identity for the different modules with respect to the above AvrBs3 repeat are indicated in bold in this matrix. The identity ranges from 23 % (E5AV36_2) to 47 % (E5AW45_24 and E5AW45_27).
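For orientation, a percentage of identity between two pre-aligned sequences may be computed as sketched below. The convention used here (identical residues divided by the number of columns in which neither sequence has a gap) is an assumption; the values of Table 28 may have been obtained with a different alignment or denominator. The function name percent_identity is hypothetical. The example compares positions 5 to 11 of the conserved BURRH stretch (DIVKIAG) with positions 5 to 11 of the AvrBs3 repeat given above (QVVAIAS).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "*") -> float:
    """Percentage of identical residues over columns where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)

# Positions 5 to 11 of the BURRH conserved stretch versus the AvrBs3 repeat.
print(round(percent_identity("DIVKIAG", "QVVAIAS"), 1))  # 42.9
```

In this fragment three of the seven compared positions are identical (V, I and A).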

[0047] Polynucleotide sequences encoding the BURRH proteins E5AV36, E5AW43, E5AW45 and E5AW46 are also described herein. They are respectively referred to as SEQ ID NO.113 (E5AV36), SEQ ID NO.114 (E5AW43), SEQ ID NO.112 (E5AW45) and SEQ ID NO.115 (E5AW46).

2/ Metagenomic polypeptides with similarity to the BURRH polypeptides



[0048] Further searches in genome databases were performed to identify further proteins having sequence similarity with the above BURRH proteins.

[0049] This search identified the following polynucleotide sequences, of so far unreported function, encoded by genomic DNA isolated from a marine organism sample.

[0050] The exact organism from which these metagenomic DNA sequences have been extracted has not yet been established. The DNA sequences might comprise some uncertainties due to the sequencing method. Thus, as a preliminary step, the researchers have reconstructed the original polynucleotide sequences (SEQ ID NO. 67 to SEQ ID NO. 70) to obtain the following full length protein sequences:
  • JCVI_A (SEQ ID NO.72) (Table 29);
  • JCVI_B (SEQ ID NO.73) (Table 30); and
  • ECR81667 (SEQ ID NO.71) (Table 31).


[0051] Initially, the primary polypeptide sequences were derived from polynucleotide sequences from different open reading frames that had to be assembled: JCVI_ORF_1096675837214 (SEQ ID NO.116), JCVI_ORF_1096688227496 (SEQ ID NO.117), JCVI_ORF_1096688227494 (SEQ ID NO.118), JCVI_ORF_1096675837216 (SEQ ID NO.119) and JCVI_ORF_1096688327480 (SEQ ID NO.120), data extracted from http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?lvl=0&id=408172.

[0052] The N-terminal and C-terminal of these proteins have been aligned with those from the BURRH proteins:
• Aligned N-ter sequences of the following sequences are presented in Table 32:
1) JCVI_B N-ter (SEQ ID NO.75) 76 AA
2) JCVI_A N-ter (SEQ ID NO.74) 66 AA
3) AvrBs3 N-ter (SEQ ID NO.6) 287 AA
4) E5AV36_BURRH N-ter (SEQ ID NO.7) 82 AA
5) E5AW45_BURRH N-ter (SEQ ID NO.9) 83 AA
6) E5AW43_BURRH N-ter (SEQ ID NO.8) 83 AA

• Aligned C-ter sequences of the following sequences are presented in Table 33:
1) E5AW43_BURRH C-ter (SEQ ID NO.65) 30 AA
2) E5AW45_BURRH C-ter (SEQ ID NO.66) 30 AA
3) E5AV36_BURRH C-ter (SEQ ID NO.64) 30 AA
4) AvrBs3 C-ter (SEQ ID NO.111) 231 AA
5) JCVI_A C-ter (SEQ ID NO.110) 24 AA
6) ECR81667 C-ter (SEQ ID NO.109) 24 AA


[0053] The above alignments show significant variability between the C- and N-terminal domains from the BURRH proteins and those from the metagenomic proteins, which are also much shorter than those from AvrBs3.

[0054] The module polypeptides of 33 amino acids from the three metagenomic proteins have been aligned using Clustal multiple alignment (Figure 5). These modules also display a higher degree of polymorphism than is found among Xanthomonas TALEs. It is nevertheless interesting to observe, from the logotype and occurrence matrix of Figures 6 and 7, that a stretch of 10 amino acids, from position 5 to 14, is highly conserved:
D I V S I A S X'1 X'2 G,

where X'1 in position 12 is mostly represented by H, but can also be represented by N or R, and

where X'2 in position 13 varies between different amino acids, more particularly: D, G, N, I, H, K, S and T.



[0055] It is also noteworthy that most modules start with L or F and generally finish with G or E.

[0056] Amino acids X'1 and X'2 found in positions 12 and 13 of these modules are more particularly: HI, HD, HG, HS, HA, HH, HN, NN, NT and RN. The di-residues HH, HS, NT, HK and RN do not appear to occur in Xanthomonas TALE proteins. Position 12 mostly displays H, whereas position 13 is more variable, which suggests that the specificity with respect to the different nucleobases could also rely more particularly on position 13.

[0057] When considering amino acids that are present in more than 50 % of the modules, the following consensus sequence can be drawn:
L - P - D I V S I A S H - G - - K - I T - L L - KW - - L - - LG,
where the symbol "-" means a standard amino acid which is more variable.

[0058] The above consensus sequence is fully distinct from that of AvrBs3 repeats.

[0059] It has the following common characteristics with the previous BURRH consensus:
- - - - D I V - I A - - - G - - - - - - - - L - - - - - L - - - G
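The positions shared by the two consensus sequences can be recovered by a position-by-position comparison, as in the following illustrative sketch. The two strings are the 33-position BURRH and metagenomic consensus sequences given in this description, written without spaces; the function name shared_positions is hypothetical.

```python
# Consensus sequences from this description, written without spaces.
BURRH_CONSENSUS = "FS--DIVKIAGN-GGAQAL-AVL---PTL--RG"
METAGENOMIC_CONSENSUS = "L-P-DIVSIASH-G--K-IT-LL-KW--L--LG"

def shared_positions(a: str, b: str) -> str:
    """Keep a residue only where both consensus sequences agree on it."""
    if len(a) != len(b):
        raise ValueError("consensus sequences must have the same length")
    return "".join(x if x == y and x != "-" else "-" for x, y in zip(a, b))

common = shared_positions(BURRH_CONSENSUS, METAGENOMIC_CONSENSUS)
print(common)  # ----DIV-IA---G--------L-----L---G
```

The shared residues are D, I and V at positions 5 to 7, I and A at positions 9 and 10, G at position 14, L at positions 23 and 29, and G at position 33.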

Brief description of the figures:



[0060] 

Figure 1: Clustal W alignment of the protein sequences E5AV36, E5AW43, E5AW45 and E5AW46 from BURRH and of AvrBs3 (SEQ ID NO.1) using a segment to segment approach. The arrow indicates the start of the module sequences.

Figure 2: Alignment of the different modules of E5AV36, E5AW43 and E5AW45.

Figure 3: Logotype representation of the amino acids occurrence at positions 1 to 33 in the modules of BURRH proteins E5AV36, E5AW43 and E5AW45.

Figure 4: Matrix showing the number of times each amino acid is represented at positions 1 to 33 in the BURRH modules.

Figure 5: Alignment of the modules of JCVI_A (Table 29), JCVI_B (Table 30) and ECR81667 (Table 31).

Figure 6: Logotype representation of the amino acids occurrence at positions 1 to 33 in the modules of JCVI_A and JCVI_B.

Figure 7: Matrix showing the number of times each amino acid is represented at positions 1 to 33 in the JCVI_A (Table 29), JCVI_B (Table 30) and ECR81667 (Table 31) modules.

Figure 8: Affinity of the BurrH_36 derived nuclease onto different putative targets A, B and C that have been recognized and cleaved in SSA assay. This shows the base per base apparent affinity of the different modules with respect to the nucleotide bases present on these targets. Experiments are detailed in Example 1.

Figure 9: Activity of BurrH_36 derived nuclease on pseudo-palindromic RAGT2.3 and RAGT2.4 target sequences listed in Table 9 in our mammalian SSA assay.

Figure 10: Examples of targeted mutagenesis (indels) at the desired locus using BurrH_36 derived nuclease (see example 7).

Figure 11: Alignment of wild type genomic sequence and most predominant mutants (deletions are highlighted by dashes) induced by the BurrH nuclease (18 modules) at the CAPNS1 locus.

Figure 12: Alignment of wild type genomic sequence and most predominant mutants (deletions are highlighted by dashes) induced by the BurrH nuclease (20 modules) at the CAPNS1 locus.

Figure 13: Targeted Gene Insertion (TGI) frequency determined at the CAPNS1 locus in the presence or absence (empty plasmid) of the nuclease.

Figure 14: Activity of BurrH_36 derived nuclease on AVR15 target sequences in CHO SSA assay.

Figure 15: Graphical representation of possible architectures. (a) two BurrH_based monomers facing each other on both DNA strands with FokI catalytic domain fused at the C-terminus (C/C). (b) two BurrH_based monomers facing each other on both DNA strands with FokI catalytic domain fused at the N-terminus (N/N). (c) two BurrH_based monomers following each other on one DNA strand with FokI catalytic domain fused at the C-terminus for the first and N-terminus for the second (C/N).

Figure 16: Insertion or mutation of amino acid residues in the N-terminal domain of BurrH_36 to enhance BurrH_36 nuclease activity. A. Alignment of the wild type N-terminal domain of BurrH_36 (SEQ ID NO. 7) and mutated N-terminal domain of BurrH_36 (pCLS21512 to pCLS21520; SEQ ID NO. 399 to SEQ ID NO. 407). B. Alignment of wild type N-terminal domain of BurrH_36 and N-terminal domain pCLS21521 (SEQ ID NO. 408) in which the first 26 amino acids of the N-terminal domain of BurrH_36 (SEQ ID NO.2) have been replaced by the first 74 amino acids from the D152 N-terminal domain of AvrBs3 (SEQ ID NO. 366) and comprising seven point mutations.

Figure 17: NHEJ mutagenesis frequency on the xylosyltransferase gene in Nicotiana benthamiana by Bur-based TALEN. The transformation with YFP alone serves as the negative control for 454 deep sequencing.

Figure 18: Activity of the TevM01::b36 construct in mammalian cells (CHO-K1) on a chromosomal target measured as a reduction in GFP fluorescence.

Figure 19: Activity of the TevM01::cT11 construct in mammalian cells (CHO-K1) on a chromosomal target measured as a reduction in GFP fluorescence.

Figure 20: Activation of BFP transcription by engineered dBurrh_36 WT and dBurrh_36 HBB in 293H cells. 293H cells were transfected in 10 cm plate format (1.2 × 10⁶ cells/well) with 3 µg of reporter plasmid, 0, 500 or 1000 ng of dBurrh_36 WT (A) or dBurrh_36 HBB (B) plasmids, using Lipofectamine as a transfection agent. 2 days post transfection, living 293H cells displaying red fluorescence signal were first selected by an appropriate gating analysis and GFP/BFP median signals emitted by these cells were then determined using a MACS Quant flow cytometer. The BFP signals, obtained when 3 µg of target was transfected in the absence or in the presence of increasing amounts of its specific Effector, are displayed (black bars). A non-specific target was transfected in the absence or in the presence of increasing amounts of each Effector and the results are displayed as negative controls (grey bars). Experimental data regarding dBurrh_36 Effector and dBurrh_36 Effector are a result of 3 and 1 independent experiments respectively.


Brief description of the Tables:



[0061] 

Table 1: List of all pseudo-palindromic sequences targets (two identical recognition sequences are placed facing each other on both DNA strands - lowercase letters represent spacers) used in yeast SSA assay.

Table 2: Activity of BurrH_36 derived nuclease on pseudo-palindromic sequences targets (two identical recognition sequences are placed facing each other on both DNA strands) in yeast SSA assay.

Table 3: List of all pseudo-palindromic (two identical recognition sequences are placed facing each other on both DNA strands) sequences targets, with various nucleotides in position 0, -1 and -2 used in yeast SSA assay.

Table 4: Activity of BurrH_36 derived nuclease on pseudo-palindromic sequences targets listed in Table 3 in yeast SSA assay.

Table 5: Sequences of the module domains of BurrH_36 based constructs containing 18 DNA binding modules (Example 3).

Table 6: List of all pseudo-palindromic (two identical recognition sequences are placed facing each other on both DNA strands) sequences targets, with various spacer length (ranging from 5 to 40 bp) used in yeast SSA assay.

Table 7: Activity of BurrH_36 derived nuclease on pseudo-palindromic sequences targets listed in Table 6 in yeast SSA assay.

Table 8: Sequences of the module domains of BurrH_36 based constructs containing 16 DNA binding modules (Example 4).

Table 9: List of the 2 pseudo-palindromic (two recognition sequences are placed facing each other on both DNA strands) sequences targets, used in yeast and mammalian SSA assay.

Table 10: Activity of BurrH_36 derived nuclease on pseudo-palindromic sequences targets listed in Table 9 in yeast SSA assay.

Table 11: Sequences of the 16 module domains of pCLS18477 construct derived from the alignment of the first 5 modules of E5AV36.

Table 12: Sequences of the 16 module domains of pCLS18478 construct derived from the alignment of all the E5AV36 modules.

Table 13: Sequences of the 16 module domains of pCLS18479 construct derived from the alignment of all the E5AV36 modules (Example 5).

Table 14: Activity of BurrH_36 derived nuclease on one of the pseudo-palindromic sequences targets listed in Table 9 in yeast SSA assay.

Table 15: Activity of BurrH_36 derived nuclease on AVR15 sequences targets in yeast SSA assay at 37°C. +++ indicates a high activity.

Table 16: List of all pseudo-palindromic (two identical recognition sequences are placed facing each other in the 5'/5' (or N/N) orientation on both DNA strands) sequences targets, with various spacer sizes used in our yeast SSA assay previously described (International PCT Application WO 2004/067736 and Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

Table 17: List of all targets having a single RAGT2.4 DNA target sequence preceding a single AvrBs3 target sequence (on the same DNA strand), with various spacer sizes used in our yeast SSA assay previously described (International PCT Application WO 2004/067736 and Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

Table 18: Activity of BurrH_36 derived nuclease on one of the pseudo-palindromic sequences targets listed in Tables 16 and 17 in yeast SSA assay at 37°C. - indicates no detectable activity, + indicates a low activity, ++ a medium activity and +++ a high activity.

Table 19: Activity of BurrH_36 derived nuclease on RAGT2.3 and RAGT2.4 sequences targets in yeast SSA assay at 37°C. - indicates no detectable activity, + indicates a low activity, and +++ a high activity.

Table 20: Activity of BurrH_36 derived chimera nuclease on RAGT2.3 and RAGT2.4 sequences targets in yeast SSA assay at 37°C. - indicates no detectable activity, and +++ a high activity.

Table 21: Activity of BurrH_36 derived nuclease containing mutations in the N-terminal domain on Avr15 sequence target in yeast SSA assay at 37°C. - indicates no detectable activity, and +++ a high activity.

Table 22: Activity of monomeric MBBBD nuclease in yeast (37°C). Activity of TevD02::b36-AvrBs3 and TevM01::b36-AvrBs3 on a DNA target containing the natural I-TevI cleavage site (CAAGC), wherein the terminal G base of the I-TevI cleavage site is spaced 10 bp away from the residue preceding the single AvrBs3 recognition site (SEQ ID NO. 425).

Table 23: Alignment of the N-terminal sequences of E5AV36, E5AW45 and E5AW46 BURRH proteins with N-terminal sequence of AvrBs3 (DIALIGN format).

Table 24: Alignment of the C-terminal sequences of E5AV36, E5AW45 and E5AW46 BURRH proteins with C-terminal sequence of AvrBs3 (DIALIGN format).

Table 25: Sequence identity matrix showing percentages of identity between the N-terminal amino acids sequences of E5AV36, E5AW45 and E5AW46 and AvrBs3.

Table 26: Sequence identity matrix showing percentages of identity between the C-terminal amino acids sequences of E5AV36, E5AW45 and E5AW46 and AvrBs3.

Table 27: Amino acid sequences of the modules of E5AV36, E5AW43 and E5AW45.

Table 28: Matrix comparing the identity of the amino acid sequences of the different modules from E5AV36, E5AW45, E5AW43 and AvrBs3.

Table 29: Amino acid sequences of the putative protein JCVI_A (SEQ ID NO.72) resulting from the fusion of ECG96325 (SEQ ID NO.68) and ECG96326 (SEQ ID NO. 69).

Table 30: Amino acid sequences of the putative protein JCVI_B (SEQ ID NO.73), resulting from the fusion of EBN19408 (SEQ ID NO.70) and EBN19409 (SEQ ID NO.67)

Table 31: Amino acid sequences of the putative protein JCVI_ORF_1096688327480 (ECR81667) (SEQ ID NO.71).

Table 32: Alignment of the N-terminal sequences of JCVIA and JCVIB with those of E5AV36, E5AW45, E5AW43 and AvrBS3 (DIALIGN format).

Table 33: Alignment of the C-terminal sequences of JCVIA and JCVIB with those of E5AV36, E5AW45, E5AW43 and AvrBS3 (DIALIGN format).

Table 34: List of peptide linkers that can be used in MBBBD proteins.


Detailed description


General method for identifying genomic members as a source of module domains



[0062] A primary embodiment of the disclosure is a method to identify putative genomic sequences that may encode modules having specificity to nucleic acid bases. In the present situation, the identification of module sequences according to the invention has come across the following difficulties:
  • Lack of identity with any known repeat sequences, especially with Xanthomonas TALEs;
  • Degeneration of the genetic code to pass from polypeptide to polynucleotides;
  • Different codon usage depending on the genomes of the different organisms;
  • Higher sequence variability between the module sequences; and
  • High number of genomic sequences in database to process.


[0063] In order to overcome these difficulties, the disclosure provides an approach based on the occurrence of repeated structures in putative proteins, without taking into account the known amino acid sequences of Xanthomonas TALEs. The method is based, as a first screening, on the identification of amino acid sequences containing module motifs of variable length (between 20 and 50 aa) using a large variety of computational techniques. Then the candidate sequences are submitted to secondary structure predictions. All the candidates whose module motifs display a high content of alpha helices joined by small loops (whose primary sequence is highly polymorphic) are kept. Finally the entire sequences of the candidates (not only their module motifs) are modelled on the available 3D structures. This step allows the identification of the correct number of domains constituting the entire candidate sequences as well as a first functional identification of the key residues regulating the activity of the new putative DNA binding proteins.
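
By way of non-limiting illustration, the first screening step (searching for module motifs of 20 to 50 amino acids) can be sketched as a naive comparison of a protein sequence with a shifted copy of itself. The sketch below is not the procedure used by the inventors, whose computational techniques are not detailed here; the function name, the identity threshold and the toy sequence are hypothetical. Because modules may share less than 50 % identity, such an exact-match scan would likely miss divergent modules, which is consistent with secondary structure prediction and 3D modelling being applied afterwards.

```python
def tandem_repeat_periods(seq, min_len=20, max_len=50, min_identity=0.5):
    """Return (period, identity) pairs, best first.

    For each candidate module length ("period"), the sequence is compared
    with itself shifted by that length; a high fraction of identical
    residues suggests tandemly repeated modules of that length.
    The 0.5 threshold is an arbitrary, hypothetical default.
    """
    hits = []
    for period in range(min_len, max_len + 1):
        overlap = len(seq) - period
        if overlap < period:  # require at least two full copies
            continue
        matches = sum(1 for i in range(overlap) if seq[i] == seq[i + period])
        identity = matches / overlap
        if identity >= min_identity:
            hits.append((period, identity))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical 33-residue unit (not a sequence from this specification),
# repeated four times to mimic a perfectly conserved modular domain.
toy_unit = "MSTQELVKIAGNDGGAQALRAVLDHWPSLCEYF"
toy_protein = toy_unit * 4

print(tandem_repeat_periods(toy_protein)[0])  # (33, 1.0)
```

Candidates retained by such a scan would still need the secondary structure and 3D modelling steps described above before any functional conclusion is drawn.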

[0064] As a first result, said method has permitted the identification of proteins referred to as being related to the AraC protein family. Interestingly, some proteins of the AraC family have been described as containing DNA-binding domains having the ability of establishing DNA-base contacts (Bustos and Schleif 1993). However, to the inventors' knowledge, modular sequences have not yet been reported in connection with AraC DNA binding domains.

[0065] Thus, one aspect of the present disclosure concerns the use of polypeptide sequences referred to in databases as belonging to the AraC protein family as a source of new modules for engineering base per base specific DNA binding domains. In particular, the use of DNA binding domains from proteins referred to as AraC proteins in genomic databases, especially those domains having nucleic acid base specificity, to form fusion proteins for recognition of specific nucleic acid target sequences, is described herein. As a result, DNA recognition protein domains may be assembled in order to pair off with a specific nucleic acid base sequence and be fused to catalytic domains to form a new generation of binding proteins.

New polypeptides derived from metagenomic JCVI_A, JCVI_B and ECR81667 proteins and from the BURRH proteins E5AV36, E5AW43, E5AW45 and E5AW46, and their use to engineer base per base binding domains (MBBBD)



[0066] As a further disclosure are the polypeptides derived from the BURRH proteins E5AV36, E5AW43, E5AW45 and E5AW46 and from the metagenomic JCVI_A, JCVI_B and ECR81667 proteins. These polypeptides may consist of the whole proteins or of their different domains as previously described especially the different modules, N and C-terminal domains of these proteins.

[0067] Because some variability may arise from the genomic data from which these polypeptides derive, and also to take into account the possibility to substitute some of the amino acids present in these polypeptides without significant loss of activity (functional variants) and also because the modules have a significant variability (some share less than 50 % identity), the disclosure encompasses polypeptide variants of the above polypeptides that share at least 70%, preferably at least 80 %, more preferably at least 90 % and even more preferably at least 95 % identity with the sequences provided in this patent application.

[0068] The present disclosure is thus drawn to polypeptides comprising a polypeptide sequence that has at least 60%, preferably 70%, more preferably at least 80%, again more preferably at least 90 %, 95 %, 97 % or 99 % sequence identity with any of the above disclosed polypeptide sequences encoding modules, N or C-terminals. The invention relates to the use of any polypeptide of sequence SEQ ID NO.11 to 30 as a new or alternative module. Also described herein is the use of any of the polypeptides of sequence SEQ ID NO.7 to 9 or SEQ ID NO. 74 to 76 as new or alternative N-terminal domain, and/or any of said polypeptides of sequence SEQ ID NO.64 to 66 or SEQ ID NO. 109 to 111 as a new or alternative C-terminal, in particular for introduction into existing AvrBs3-like TALE proteins (chimeric proteins).

[0069] The disclosure also relates to a polypeptide module or modular binding domain of an engineered protein that comprises a module sequence from a protein of the AraC family, especially a module sequence of 30 to 40 amino acids, preferably from 30 to 33 amino acids.

[0070] The polypeptide modules according to the invention are particularly useful to engineer "artificial" nucleic acid binding domains. By "artificial" is meant that they are assembled or modified to bind a desired nucleic acid sequence, said desired target sequence being different from that initially recognized by BURRH protein E5AV36 in the wild.

[0071] The assembly is generally made by selecting the modules in respect of the affinity of each module to a given nucleic acid base, preferably on a base per base basis. The selection can be made in particular by reference to said one amino-acid/one base code recognition established by the inventors. Said one amino-acid/one base code recognition is based on the following correspondences (AA: amino acid preferably in position 13 of the module):
Primary code
AA      Nucleotide base
I       A
G       T
D       C
N       G


[0072] Possible alternative recognition may be implemented using the following correspondences:
Secondary code
AA      Nucleotide base
S, T    A
R       T
T, *    C
R       G


[0073] The symbol "*" (star) means a gap i.e. that there is no position aligned with the amino acid in position 13 using clustal alignment of the different modules.

[0074] This straightforward code according to the present invention may also be used to modify the specificity of the polypeptide modules by directly introducing mutations in any of the module polypeptides described previously, especially in position 13.
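
As a non-limiting illustration of how the above correspondences may be applied, the following sketch encodes the primary and secondary codes as lookup tables. It only restates the two tables; it does not model binding affinity or module context, and the function and variable names are hypothetical. The symbol "*" stands for the gap described in paragraph [0073].

```python
# Primary code: nucleotide base -> amino acid at position 13 of the module.
PRIMARY_CODE = {"A": "I", "T": "G", "C": "D", "G": "N"}

# Reverse view combining the primary and secondary codes:
# amino acid at position 13 -> base(s) it may recognize.
# "T" and "R" each appear twice in the secondary code, hence two bases.
RESIDUE_TO_BASES = {
    "I": {"A"}, "G": {"T"}, "D": {"C"}, "N": {"G"},   # primary code
    "S": {"A"}, "T": {"A", "C"}, "R": {"T", "G"},      # secondary code
    "*": {"C"},                                        # gap at position 13
}


def position13_residues(target_dna):
    """List the position-13 residue for each module, per the primary code."""
    target = target_dna.upper()
    unknown = set(target) - set(PRIMARY_CODE)
    if unknown:
        raise ValueError(f"unsupported base(s): {sorted(unknown)}")
    return [PRIMARY_CODE[base] for base in target]


def predicted_bases(residues):
    """For a series of position-13 residues, list the candidate bases."""
    return [RESIDUE_TO_BASES[residue] for residue in residues]


print(position13_residues("ATCG"))  # ['I', 'G', 'D', 'N']
```

Under these assumptions, one module is selected per base of the desired target, so that a target of n bases corresponds to n modules.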

[0075] The polynucleotide encoding the artificial nucleic acid binding domains of the invention can be assembled by cloning the polynucleotide sequences encoding the different polypeptides by the methods known in the art or by using a solid phase and Type IIS restriction enzymes as described in WO2013/017950 with respect to repeats from TAL binding domains, or even by automated polynucleotide synthesis. The produced polynucleotides can then be cloned into various expression or replication vectors to be transfected into living cells.

[0076] In one embodiment of the disclosure, modules of 32, 31 or fewer amino acids, such as those having identity to SEQ ID NO. 30, 38, 41, 50 and 63 can be used in such artificial nucleic acid binding domains. All the polypeptide modules or mutations according to the present invention can also be introduced into, or assembled with, TAL repeats, to form chimeric MBBBDs (see chimeric proteins).

[0077] "Identity" refers to sequence identity between two nucleic acid molecules or polypeptides. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are identical at that position. A degree of similarity or identity between nucleic acid or amino acid sequences is a function of the number of identical or matching residues at positions shared by the sequences. Various alignment algorithms and/or programs may be used to calculate the identity between two sequences, including FASTA, or BLAST which are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings. BLASTP may also be used to identify an amino acid sequence having at least 80%, 85%, 87.5%, 90%, 92.5%, 95%, 97.5%, 98%, 99% sequence similarity to a reference amino acid sequence using a similarity matrix such as BLOSUM45, BLOSUM62 or BLOSUM80. Unless otherwise indicated a similarity score will be based on use of BLOSUM62. When BLASTP is used, the percent similarity is based on the BLASTP positives score and the percent sequence identity is based on the BLASTP identities score. BLASTP "Identities" shows the number and fraction of total residues in the high scoring sequence pairs which are identical; and BLASTP "Positives" shows the number and fraction of residues for which the alignment scores have positive values and which are similar to each other. Amino acid sequences having these degrees of identity or similarity or any intermediate degree of identity or similarity to the amino acid sequences disclosed herein are contemplated and encompassed by this disclosure. The same applies with respect to polynucleotide sequences using BLASTN.
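
For illustration only, the position-by-position comparison described above can be sketched as follows for two sequences that have already been aligned (gaps written as "-"). This is a simplified calculation under the assumption that only positions occupied in both sequences are counted; alignment programs differ in the denominator they use, so the values need not match BLASTP output. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions shared by both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    shared = [(a, b) for a, b in zip(aligned_a, aligned_b)
              if a != gap and b != gap]
    if not shared:
        return 0.0
    identical = sum(1 for a, b in shared if a == b)
    return 100.0 * identical / len(shared)


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair reaches a given threshold, e.g. 70, 80, 90 or 95 %."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned pair: 5 shared positions, 4 of them identical.
print(percent_identity("ACDE-G", "ACDQFG"))  # 80.0
```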

[0078] By "TALE-like polypeptide" is intended any polypeptide or protein comprising a binding domain formed by at least two repeats, preferably at least 5, more preferably at least 10, even more preferably at least 14 repeats from a TALE protein having more than 80 % identity with AvrBs3 from Xanthomonas, each of said repeats having specificity for a nucleic acid base. In general the repeats do not overlap and form a succession of repeats comprising RVDs. This succession and order of the RVDs, the so-called "RVD sequence", may be modified by assembling repeats together to form engineered TALE-like binding domains, thereby allowing targeting any desired sequence in-vivo or in-vitro. According to the invention, modules as disclosed herein may replace some of the AvrBs3-like repeats in such proteins to form new TALE-like chimeric polypeptides.

[0079] Some modules from the polypeptides according to the disclosure comprise variable residues in position 12 and 13, in particular NT, **, KG, NR, RN, HS, HH and/or HK which may be independently introduced in any existing TALE repeats or in any TALE-like polypeptide as described herein, to improve or modulate their specificity with respect to their cognate nucleic acid bases.

Fusion proteins



[0080] The polypeptides according to the disclosure previously described may be fused with any other polypeptides to form single chain, monomer or multimer proteins.

[0081] In particular, the above polypeptides can be fused with catalytic domains in order to activate or inactivate transcription or translation activity or process genetic material, within or adjacent to the nucleic acid sequence targeted by the MBBBD. Said catalytic domain can have cleavage activity, either a cleavase activity or a nickase activity, more broadly a nuclease activity but also a polymerase activity, a kinase activity, a phosphatase activity, a methylase activity, a topoisomerase activity, an integrase activity, a transposase activity, a ligase, a helicase or recombinase activity as non-limiting examples. According to the invention, the polypeptides are fused to a catalytic domain which has an endonuclease activity in a monomeric or dimeric form.

[0082] Suitable domains for achieving activation include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol. 71, 5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. Cell. Biol. 10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik, J. Virol. 72:5610-5618 (1998) and Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et al., Cancer Gene Ther. 5:3-28 (1998)); or artificial chimeric functional domains such as VP64 (Beerli et al., (1998) Proc. Natl. Acad. Sci. USA 95:14623-33), and degron (Molinari et al., (1999) EMBO J. 18, 6439-6447). Additional exemplary activation domains include Oct 1, Oct-2A, Sp1, AP-2, and CTF1 (Seipel et al., EMBO J. 11, 4961-4968 (1992)) as well as p300, CBP, PCAF, SRC1, PvALF, AtHD2A and ERF-2. See, for example, Robyr et al. (2000) Mol. Endocrinol. 14:329-347; Collingwood et al. (1999) J. Mol. Endocrinol. 23:255-275; Leo et al. (2000) Gene 245:1-11; Manteuffel-Cymborowska (1999) Acta Biochim. Pol. 46:77-89; McKenna et al. (1999) J. Steroid Biochem. Mol. Biol. 69:3-12; Malik et al. (2000) Trends Biochem. Sci. 25:277-283; and Lemon et al. (1999) Curr. Opin. Genet. Dev. 9:499-504. Additional exemplary activation domains include, but are not limited to, OsGAI, HALF-1, C1, AP1, ARF-5, -6, -7, and -8, CPRF1, CPRF4, MYC-RP/GP, and TRAB1. See, for example, Ogawa et al. (2000) Gene 245:21-29; Okanami et al. (1996) Genes Cells 1:87-99; Goff et al. (1991) Genes Dev. 5:298-309; Cho et al. (1999) Plant Mol. Biol. 40:419-429; Ulmason et al. (1999) Proc. Natl. Acad. Sci. USA 96:5844-5849; Sprenger-Haussels et al. (2000) Plant J. 22:1-8; Gong et al. (1999) Plant Mol. Biol. 41:33-44; and Hobo et al. (1999) Proc. Natl. Acad. Sci. USA 96:15,348-15,353.

[0083] Exemplary repression domains include, but are not limited to, KRAB A/B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, MBD2, MBD3, members of the DNMT family (e.g., DNMT1, DNMT3A, DNMT3B), Rb, and MeCP2. See, for example, Bird et al. (1999) Cell 99:451-454; Tyler et al. (1999) Cell 99:443-446; Knoepfler et al. (1999) Cell 99:447-450; and Robertson et al. (2000) Nature Genet. 25:338-342. Additional exemplary repression domains include, but are not limited to, ROM2 and AtHD2A. See, for example, Chem et al. (1996) Plant Cell 8:305-321; and Wu et al. (2000) Plant J. 22:19-27.

[0084] The above polypeptides may also be fused with reporter or selection markers such as GFP and GUS as non limiting examples.
  • By "catalytic domain" is intended the protein domain or module of an enzyme containing the active site of said enzyme; by active site is intended the part of said enzyme at which catalysis of the substrate occurs. Enzymes, but also their catalytic domains, are classified and named according to the reaction they catalyze. The Enzyme Commission number (EC number) is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze (http://www.chem.qmul.ac.uk/iubmb/enzyme/).


[0085] Said catalytic domain has preferably an enzymatic activity selected from the group consisting of nuclease activity, polymerase activity, kinase activity, phosphatase activity, methylase activity, topoisomerase activity, integrase activity, transposase activity or ligase activity. In another preferred embodiment, the catalytic domain fused to the MBBBD polypeptides of the present disclosure can be a transcription activator or repressor (i.e. a transcription regulator), or a protein that interacts with or modifies other proteins such as histones. Non-limiting examples of nucleic acid processing activities of said fusion MBBBD polypeptides of the present disclosure include, for example, creating or modifying epigenetic regulatory elements, making site-specific insertions, deletions, or repairs in DNA, controlling gene expression, and modifying chromatin structure.

[0086] Catalytic domains that may be fused to the MBBBD polypeptides can be selected, for instance, from the group consisting of proteins MmeI, Colicin-E7 (CEA7_ECOLX), EndA, Endo I (END1_ECOLI), Human Endo G (NUCG_HUMAN), Bovine Endo G (NUCG_BOVIN), R.HinP1I, I-BasI, I-BmoI, I-HmuI, I-Tev-I, I-TevII, I-TevIII, I-TwoI, R.MspI, R.MvaI, NucA, NucM, Vvn, Vvn_CLS, Staphylococcal nuclease (NUC_STAAU), Staphylococcal nuclease (NUC_STAHY), Micrococcal nuclease (NUC_SHIFL), Endonuclease yncB, Endodeoxyribonuclease I (ENRN_BPT7), Metnase, Nb.BsrDI, BsrDI A, Nt.BspD6I (R.BspD6I large subunit), ss.BspD6I (R.BspD6I small subunit), R.PleI, MlyI, AlwI, Mva1269I, BsrI, BsmI, Nb.BtsCI, Nt.BtsCI, R1.BtsI, R2.BtsI, BbvCI subunit 1, BbvCI subunit 2, Bpu10I alpha subunit, Bpu10I beta subunit, BmrI, BfiI, I-CreI, hExoI (EXO1_HUMAN), Yeast ExoI (EXO1_YEAST), E.coli ExoI, Human TREX2, Mouse TREX1, Human TREX1, Bovine TREX1, Rat TREX1, Human DNA2, Yeast DNA2 (DNA2_YEAST), VP16, RBBP8 and Type IIS nucleases like Fok-I and functional variants thereof.

[0087] By "functional variants" is intended a catalytically active variant of a protein; such a variant can have additional properties compared to its parent protein. Amino acid sequence variants of the peptide can be prepared by mutations in the DNA which encodes the peptide. Such variants comprise, for example, deletions from, or insertions or substitutions of residues within the amino acid sequence. Any combination of deletion, insertion or substitutions may also be made to arrive at the final construct, provided that the final construct possesses the desired activity.

[0088] The catalytic domain is preferably a nuclease domain and more preferably a domain having nuclease activity, like for instance I-Tev-I, Col E7, NucA and Fok-I.

[0089] In a particular embodiment, said polypeptides that specifically target a nucleic acid sequence of interest may be fused to any catalytic domains that require dimerization for activity. As a non-limiting example, said polypeptide may be fused to the type IIS FokI endonuclease domain or functional variant thereof which functions independently of the DNA binding domain and induces nucleic acid double-stranded cleavage as a dimer (Li, Wu et al. 1992; Kim, Cha et al. 1996). Amino acid sequence variants of FokI can be prepared by mutations in the DNA, which encodes the catalytic domain. Such variants include, for example, deletions from, or insertions or substitutions of, residues within the amino acid sequence. Any combination of deletion, insertion, and substitution may also be made to arrive at the final construct, provided that the final construct possesses the desired activity. Said nuclease domain of FokI variant according to the present invention comprises a fragment of a protein sequence having at least 80%, more preferably 90%, again more preferably 95 % amino acid sequence identity with the protein sequence of FokI (SEQ ID NO.123).

[0090] The targeted nucleic acid sequences of interest are preferably selected with respect to each other, such that the binding of the two fusion polypeptides to their respective target sites places each monomer of the endonuclease in a spatial orientation that allows the formation of a functional cleavage domain by dimerizing. In some embodiments, the spacer of the targeted nucleic acid sequences can be selected or varied to modulate MBBBD nuclease specificity and activity. Thus, in certain embodiments, the near edges of the target sites are separated by 5 to 50 nucleotides, preferably by 10-30 nucleotides or 25-40 nucleotides.
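
As a non-limiting illustration of the spacer constraint, the sketch below searches a DNA sequence for a pair of target sites whose near edges are separated by 5 to 50 nucleotides. It assumes, hypothetically, that the second site is read on the opposite strand, so that it appears as its reverse complement on the strand being scanned; the orientation actually required depends on the construct. All names and sequences are illustrative and are not taken from the Tables.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_all(seq, pattern):
    """Start indices of every (possibly overlapping) occurrence of pattern."""
    starts = []
    pos = seq.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(pattern, pos + 1)
    return starts


def paired_sites(seq, left_site, right_site, min_spacer=5, max_spacer=50):
    """Return (left_start, right_start, spacer) for each admissible pair.

    The spacer is the number of nucleotides between the near edges of the
    two sites, i.e. between the end of the left site and the start of the
    right site as it appears on the scanned strand.
    """
    right_on_top = reverse_complement(right_site)
    hits = []
    for left_start in find_all(seq, left_site):
        left_end = left_start + len(left_site)
        for right_start in find_all(seq, right_on_top):
            spacer = right_start - left_end
            if min_spacer <= spacer <= max_spacer:
                hits.append((left_start, right_start, spacer))
    return hits


# Toy pseudo-palindromic target: the same hypothetical site on both strands,
# separated by a 15-nucleotide spacer.
site = "ATCGGA"
target = site + "T" * 15 + reverse_complement(site)
print(paired_sites(target, site, site))  # [(0, 21, 15)]
```

Narrowing min_spacer and max_spacer to 10 and 30, or to 25 and 40, reproduces the preferred ranges stated above.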

[0091] In another particular embodiment, said fusion protein is a monomeric MBBBD-nuclease. A monomeric MBBBD-nuclease is a MBBBD that does not require dimerization for specific recognition and cleavage, such as the fusions of engineered MBBBD modules with the catalytic domain of I-TevI.

[0092] The I-TevI catalytic domain corresponds to the protein domain or module of an enzyme containing the active site of said enzyme; by active site is intended the part of said enzyme at which catalysis of the substrate occurs. In the scope of the present invention, the I-TevI catalytic domain can provide nuclease activity.

[0093] By "nuclease catalytic domain" is intended the protein domain comprising the active site of an endonuclease enzyme. Such nuclease catalytic domain may generate a cleavage in a nucleic acid target sequence that corresponds to either Double Strand Break (DSB) (cleavase activity) in a nucleic acid target or a single strand break in a nucleic acid target sequence (nickase activity).

[0094] Said catalytic domain can be I-TevI or a variant thereof. In a preferred embodiment, said catalytic domain is a variant of the catalytic domain of I-TevI designed from the N-terminal region of I-TevI. Said catalytic domain comprises a part of the protein sequence SEQ ID NO. 413. In a preferred embodiment, said I-TevI catalytic domain corresponds to the amino acid sequence of SEQ ID NO. 416 or SEQ ID NO: 417. Alternatively, amino acid sequence variants of the catalytic domain of I-TevI can be prepared by mutations in the DNA, which encodes the catalytic domain. Such variants include, for example, deletions from, or insertions or substitutions of, residues within the amino acid sequence. Any combination of deletion, insertion, and substitution may also be made to arrive at the final construct, provided that the final construct possesses the desired activity.

[0095] In a particular embodiment, said catalytic domain of I-TevI according to the present invention comprises a fragment of a protein sequence having at least 80%, more preferably 90%, again more preferably 95 % amino acid sequence identity with the protein sequence SEQ ID NO. 413. In a preferred embodiment, said catalytic domain of I-TevI comprises a protein sequence having at least 80%, more preferably 90%, again more preferably 95% amino acid sequence identity with the protein sequence SEQ ID NO. 416 or SEQ ID NO. 417.

[0096] The TevI-fused MBBBD nuclease interacts with two regions in the target nucleic acid sequence: the recognition site and the cleavage site. Optimal distances in the target nucleic acid sequence for the relative positioning of the binding and cleavage modules in the TevI-fused MBBBD polypeptide have been determined. Thus, the present invention relates to a MBBBD polypeptide capable of targeting a nucleic acid sequence that comprises a recognition site spaced away from said I-TevI cleavage site by an optimal distance to increase DNA processing activity.

[0097] Increased DNA processing activity refers to an increase in the detected level of MBBBD nuclease processing activity against a target nucleic acid sequence. In the present invention, nucleic acid processing activity refers to a cleavage, either a cleavase activity or a nickase activity. By optimal distance is intended the distance between said recognition site and the I-TevI cleavage site allowing an increase in DNA processing activity of the TevI chimeric endonuclease. A distance is considered optimal when it provides at least a 5% increase in efficiency of DNA processing activity, more preferably 10%, again more preferably 15%, again more preferably 20%, again more preferably 25%, again more preferably 50%, again more preferably greater than 50%.

[0098] In a particular embodiment, the DNA binding recognition site is also chosen based upon its optimal spacer between the residue preceding the first nucleic acid base of the DNA binding recognition site and the terminal G base of the I-TevI cleavage site. In a preferred embodiment, the optimal spacer distance is between 1 and 50 bp, more preferably between 4 and 12 bp, again more preferably is 4, 5, 6, 7, 8, 9, 10, 11 or 12 bp.

[0099] In certain embodiments, the nuclease is a meganuclease (homing endonuclease) or variant thereof. Naturally-occurring meganucleases recognize 15-40 base-pair cleavage sites and are commonly grouped into four families: the LAGLIDADG family, the GIY-YIG family, the His-Cys box family and the HNH family. Exemplary homing endonucleases include I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pfu I, PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I or I-MsoI, PI-PspI, I-SceIV, I-PanI, I-OnuI, I-PpoI, I-TevI, I-TevII and I-TevIII. In a preferred embodiment, the homing endonuclease according to the invention is a LAGLIDADG endonuclease such as I-SceI, I-CreI, I-CeuI, I-OnuI, I-MsoI, and I-DmoI. In a most preferred embodiment, said LAGLIDADG endonuclease is I-CreI. Wild-type I-CreI is a homodimeric homing endonuclease that is capable of cleaving a 22 to 24 bp double-stranded target sequence.

[0100] In the present application, homing endonuclease variants such as I-CreI may be homodimers (meganuclease comprising two identical monomers) or heterodimers (meganuclease comprising two non-identical monomers). It is understood that the scope of the present invention also encompasses the homing endonuclease variants per se, including heterodimers (WO2006097854), obligate heterodimers (WO2008093249) and single chain meganucleases (WO03078619 and WO2009095793) as non limiting examples, able to cleave one of the sequence targets in the cell genome. The invention also encompasses hybrid variant per se composed of two monomers from different origins (WO03078619).

[0101] The invention encompasses both wild-type and variant endonucleases. In a preferred embodiment, the endonuclease according to the invention is a "variant" endonuclease, i.e. an endonuclease that does not exist in nature and that is obtained by genetic engineering or by random mutagenesis. The variant endonuclease according to the invention can for example be obtained by substitution of at least one residue in the amino acid sequence of a wild-type endonuclease with a different amino acid. Said substitution(s) can for example be introduced by site-directed mutagenesis and/or by random mutagenesis. In the frame of the present invention, such variant endonucleases remain functional, i.e. they retain the capacity of recognizing and specifically cleaving a target sequence. The variant endonuclease according to the invention cleaves a target sequence that is different from the target sequence of the corresponding wild-type endonuclease. Methods for obtaining such variant endonucleases with novel specificities are well-known in the art.

[0102] Said catalytic domain might be at the N-terminal part or C-terminal part of said MBBBD. In a particular embodiment, said catalytic domain is fused to the MBBBD by a peptide linker. The peptide linker acts as a communication device between the MBBBD polypeptide and the catalytic domain so that they act in concert for nucleic acid cleavage. Said peptide linkers can be of various sizes, preferably from 2 to 50 amino acids, more preferably from 3 to 10 amino acids and can be selected from the group consisting of NFS1, NFS2, CFS1, RM2, BQY, QGPSG, LGPDGRKA, 1a8h_1, 1dnpA_1, 1d8cA_2, 1ckqA_3, 1sbp_1, 1ev7A_1, 1alo_3, 1amf_1, 1adjA_3, 1fcdC_1, 1al3_2, 1g3p_1, 1acc_3, 1ahjB_1, 1acc_1, 1af7_1, 1heiA_1, 1bia_2, 1igtB_1, 1nfkA_1, 1au7A_1, 1bpoB_1, 1b0pA_2, 1c05A_2, 1gcb_1, 1bt3A_1, 1b3oB_2, 16vpA_6, 1dhx_1, 1b8aA_1 and 1qu6A_1 and peptide linkers listed in Table 34 (SEQ ID NO.451 to SEQ ID NO.535).

[0103] In a more preferred embodiment, the peptide linker that can link said catalytic domain to the MBBBD polypeptide according to the method of the present invention can be selected from the group consisting of GRSGSDP (SEQ ID NO: 489), QGPSG (SEQ ID NO: 487), IA (SEQ ID NO.90) or SG (SEQ ID NO: 491). Peptide linkers between the MBBBD polypeptide and the catalytic domain can be constructed to be either flexible or positionally constrained to allow for the most efficient targeted nucleic acid processing.

[0104] Example 1 below shows that the above polypeptides have the ability to dimerize when fused to the catalytic domain of the nuclease Fok-I. A fusion of BurrH_36 with Fok-I has been achieved to form a sequence specific nuclease being able to cut a putative artificial nucleic acid target. Interestingly, this fusion experiment revealed that, contrary to TALE-Nucleases, there was no requirement for T in the target DNA sequence for the first module to bind said nucleic acid target. It is unclear at the moment whether it is due to the N-terminus (SEQ ID NO.7) or to the first module (SEQ ID NO.11) of the BurrH protein. However, these polypeptides provide a significant advantage over the TALE-Nuclease of the prior art in this regard.

[0105] Accordingly, the invention also provides modular polypeptides or N-terminal sequences to alleviate the requirement of a T in sequences to be targeted by a TALE or TALE-like binding domain. Such module or N-terminal domain according to the invention may thus be introduced in TALE or TALE-like repeat binding domains to overcome the requisite T nucleotide at position -1 in nucleic acid target sequences.

[0106] Truncations, spacers and linkers may be added by one skilled in the art to the polypeptides according to the invention to optimize their binding activity or the catalytic activity conferred by their catalytic domains. The catalytic domain that is capable of processing genetic material within or adjacent to the nucleic acid target sequence of interest can be fused to the N- or C-terminus part of said binding domains of the invention. In a preferred embodiment, two catalytic domains having complementary or distinct activities are fused to both N-terminus and C-terminus parts of said binding domains.

Chimeric proteins



[0107] According to a further aspect of the disclosure, the polypeptides and fusion proteins previously described can be used to create chimeric proteins, which incorporate sequences from AvrBs3-like proteins, in particular repeats, N-terminal or C-terminal sequences thereof.

[0108] Accordingly, the disclosure provides engineered TALE-like proteins with a binding domain comprising a mix of the modules according to the invention and of AvrBs3-like repeats. By providing a larger choice of modules of various affinities with the nucleic acid bases, it is intended to increase the modularity and the various possibilities of assembly within MBBBDs to create customized nucleic acid binding domains.

[0109] Meanwhile, new scaffolds can be derived from AvrBs3-like proteins comprising a module, N or C terminals, or any functional part of the polypeptides from E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 previously described. More generally, the chimeric protein of the present invention can be derived from any naturally occurring TAL effectors, such as those described by (Moscou and Bogdanove 2009) and in WO 2011072246, that comprise repeats of 33 to 35 amino acids, wherein two critical amino acids located at positions 12 and 13 (RVD) mediate specific nucleic acid base recognition. In such chimeric proteins, the following RVDs can be used: HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. More preferably, RVDs associated with recognition of the nucleotides C, T, A, G/A and G respectively are selected from the group consisting of NN or NK for recognizing G, HD for recognizing C, NG for recognizing T and NI for recognizing A, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. In another embodiment, RVDs associated with recognition of the nucleotide C are selected from the group consisting of N* and RVDs associated with recognition of the nucleotide T are selected from the group consisting of N* and H*, where * denotes a gap in the repeat sequence that corresponds to a lack of amino acid residue at the second position of the RVD. In another embodiment, critical amino acids 12 and 13 can be mutated towards other amino acid residues in order to modulate their specificity towards nucleotides A, T, C and G and in particular to enhance this specificity. By other amino acid residues is intended any of the twenty natural amino acid residues or unnatural amino acid derivatives. All these RVDs can be used in addition to those with respect to the present invention, especially: NT, **, KG, NR, RN, HS, HH and/or HK.
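The RVD assignments listed in paragraph [0109] can be read as a simple lookup table. The following Python sketch is offered only as an illustration of that reading: the names RVD_TO_BASES and rvd_array_matches are hypothetical and do not appear in the specification, the table covers only the AvrBs3-like RVDs enumerated above (not the recognition code of the MBBBD modules), and binding is treated as a strict yes/no match per position, which ignores affinity differences between repeats.

```python
# Illustrative lookup of the RVD-to-base assignments listed in paragraph [0109].
# Names are hypothetical; the table is transcribed from the text above.
RVD_TO_BASES = {
    "HD": "C", "NG": "T", "NI": "A", "NN": "GA", "NS": "ACGT",
    "HG": "T", "IG": "T", "NK": "G", "HA": "C", "ND": "C",
    "HI": "C", "HN": "G", "NA": "G", "SN": "GA", "YG": "T",
    "TL": "A", "VT": "AG", "SW": "A",
    # "*" denotes a missing residue at the second RVD position.
    "N*": "CT", "H*": "T",
}


def rvd_array_matches(rvds, target):
    """Return True if each RVD in `rvds` is listed as recognizing the
    base at the same position of `target` (one repeat per base)."""
    if len(rvds) != len(target):
        return False
    return all(
        base in RVD_TO_BASES[rvd]
        for rvd, base in zip(rvds, target.upper())
    )


if __name__ == "__main__":
    array = ["HD", "NG", "NI", "NN"]
    print(rvd_array_matches(array, "CTAG"))
    print(rvd_array_matches(array, "CTAC"))
```

An unknown RVD raises a KeyError in this sketch, which seems preferable to silently treating it as a non-match.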

[0110] As non-limiting examples, chimeric MBBBD protein may be created by combining module domains from E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 proteins with repeat domain of AvrBs3-like proteins, by combining module domains from E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 proteins with the N- and C-terminal domains of AvrBs3-like proteins, by combining N and C-terminal domains of E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 proteins with repeat domain of AvrBs3-like proteins, by combining the N-terminal domain of AvrBs3-like proteins with module domain and C-terminal from E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667, by combining part of C-terminal domain of E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 with part of C-terminal domain of AvrBs3-like protein or other protein sequences such as a nuclear export signal sequence (see example 9, SEQ ID NO: 259 to SEQ ID NO. 261 and SEQ ID NO: 271 to 274), by combining part of N-terminal domain of E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 with part of N-terminal domain of AvrBs3-like protein, or by combining part of DNA binding modules of E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 with part of repeat domain of AvrBs3-like protein. More generally, the protein domains from the E5AV36, E5AW43, E5AW45, E5AW46, JCVI_A, JCVI_B and ECR81667 proteins (module domain, N-terminal domain, C-terminal domain) may be used in combination with the complementary domains of classical TAL effectors. A most preferred chimeric protein comprises modules from E5AV36 with an N-terminal from AvrBs3 (see example 12, SEQ ID NO. 370 and SEQ ID NO. 372).

Polynucleotides



[0111] The disclosure also concerns the polynucleotides, in particular DNA or RNA, encoding the polypeptides and proteins previously described. These polynucleotides may be included in vectors, more particularly plasmids or viruses, in view of being expressed in prokaryotic or eukaryotic cells. The polynucleotides of SEQ ID NO.112 to 120 correspond to the sequences that have been identified according to the invention in the genomic databases. Polynucleotides according to the disclosure encompass polynucleotides having at least 80 %, preferably at least 90 %, more preferably at least 95 % and even more preferably at least 99 % identity with the above polynucleotide sequences.

[0112] The terms "vector" or "vectors" refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. A "vector" in the present invention includes, but is not limited to, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic nucleic acids. Preferred vectors are those capable of autonomous replication (episomal vector) and/or expression of nucleic acids to which they are linked (expression vectors). Large numbers of suitable vectors are known to those of skill in the art and commercially available. Viral vectors include retrovirus, adenovirus, parvovirus (e. g. adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e. g., influenza virus), rhabdovirus (e. g., rabies and vesicular stomatitis virus), paramyxovirus (e. g. measles and Sendai), positive strand RNA viruses such as picornavirus and alphavirus, and double-stranded DNA viruses including adenovirus, herpesvirus (e. g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e. g., vaccinia, fowlpox and canarypox). Other viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).

[0113] Preferred vectors are viral vectors, more particularly lentiviral vectors. "Viral vector" refers to a nucleic acid construct which carries, and within certain embodiments, is capable of directing the expression of a nucleic acid molecule of interest. The lentiviral vector can include at least one transcriptional promoter/enhancer or locus defining element(s), or other elements which control gene expression by other means such as alternate splicing, nuclear RNA export, post-transcriptional modification of messenger, or post-translational modification of protein. Such vector constructs can also include a packaging signal, long terminal repeats (LTRs) or portion thereof, and positive and negative strand primer binding sites appropriate to the retrovirus used (if these are not already present in the retroviral vector). Optionally, the recombinant lentiviral vector may also include a signal which directs polyadenylation, selectable markers such as Neo, TK, hygromycin, phleomycin, histidinol, or DHFR, as well as one or more restriction sites and a translation termination sequence. By way of example, such vectors typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second strand DNA synthesis, and a 3' LTR or a portion thereof. More preferably, the present invention relates to a viral vector, preferably a lentiviral vector, which comprises a polynucleotide encoding MBBBD or MBBBD-fusion protein as described above. Any of these vectors can comprise one or more polynucleotides encoding MBBBD or MBBBD-fusion proteins. As a non-limiting example, one vector can comprise two sequences encoding two MBBBD monomers which recognize different adjacent nucleic acid target sequences, the two protein domains functioning as subdomains that need to interact in order to process the genetic sequence. One vector can also comprise two sequences encoding two monomeric MBBBDs which recognize and process two different nucleic acid target sequences.

[0114] "Viral particle" as utilized within the present disclosure refers to a virus which carries at least one gene of interest. The virus may also contain a selectable marker. For instance, HIV type 1 (HIV-1) based vector particles may be generated by co-expressing the virion packaging elements and the vector genome in a so-called producer cell, e.g. 293T human embryonic kidney cells. These cells may be transiently transfected with a number of plasmids. Typically from three to four plasmids are employed, but the number may be greater depending upon the degree to which the lentiviral components are broken up into separate units. Generally, one plasmid encodes the core and enzymatic components of the virion, derived from HIV-1. This plasmid is termed the packaging plasmid. Another plasmid encodes the envelope protein(s), most commonly the G protein of vesicular stomatitis virus (VSV G) because of its high stability and broad tropism. This plasmid may be termed the envelope expression plasmid. Yet another plasmid encodes the genome to be transferred to the target cell, that is, the vector itself, and is called the transfer vector. Recombinant viruses with titers of several millions of transducing units per milliliter (TU/ml) can be generated by this technique and variants thereof. After ultracentrifugation, concentrated stocks of approximately 10^9 TU/ml can be obtained. The lentivirus is capable of reverse transcribing its genetic material into DNA and incorporating this genetic material into a host cell's DNA upon infection. Lentiviral vector particles may have a lentiviral envelope, a non-lentiviral envelope (e.g., an ampho or VSV-G envelope), or a chimeric envelope. The present disclosure relates to a viral, preferably a lentiviral, particle which comprises polynucleotides encoding MBBBD or MBBBD-fusion protein as described above.

Methods for processing the genetic material of a cell



[0115] The present disclosure relates to an in vitro method of processing a nucleic acid target sequence of a cell, comprising: (a) providing a cell containing a target nucleic acid sequence; and (b) introducing into the cell a nucleic acid binding polypeptide such that said polypeptide processes the nucleic acid target sequence. Said nucleic acid binding polypeptide can be designed to recognize any suitable nucleic acid target sequence.

[0116] The term "processing" as used herein means that the sequence is considered modified simply by the binding of the polypeptide. Any nucleic acid target sequence can be processed by the present methods. For example, the nucleic acid target sequence can be chromosomal, mitochondrial or chloroplast sequences.

[0117] In another aspect, a method of processing the genetic material of a cell within or adjacent to a nucleic acid target sequence is provided by introducing into the cell fusion MBBBD polypeptides. The catalytic domain of the fusion protein of the present invention can be a transcription activator or repressor (i.e. a transcription regulator), or a protein that interacts with or modifies other proteins implicated in nucleic acid processing. Non-limiting examples of nucleic acid processing activities of said fusion polypeptides of the present invention include, for example, creating or modifying epigenetic regulatory elements, making site-specific insertions, deletions, or repairs in DNA, controlling gene expression, and modifying chromatin structure. Said nucleic acid processing activity can refer to a cleavage activity, either a cleavase activity or a nickase activity, more broadly a nuclease activity but also a polymerase activity, a kinase activity, a phosphatase activity, a methylase activity, a topoisomerase activity, an integrase activity, a transposase activity, a ligase, a helicase or recombinase activity as non-limiting examples.

[0118] By cell or cells is intended any prokaryotic or eukaryotic living cells, cell lines derived from these organisms for in vitro cultures, primary cells from animal or plant origin.

[0119] By "primary cell" or "primary cells" are intended cells taken directly from living tissue (i.e. biopsy material) and established for growth in vitro, that have undergone very few population doublings and are therefore more representative of the main functional components and characteristics of the tissues from which they are derived, in comparison to continuous tumorigenic or artificially immortalized cell lines. These cells thus represent a more valuable model of the in vivo state they refer to.

[0120] In the frame of the present disclosure, "eukaryotic cells" refer to a yeast, fungal, plant or animal cell or a cell line derived from the organisms listed below and established for in vitro culture. More preferably, the fungus is of the genus Aspergillus, Penicillium, Acremonium, Trichoderma, Chrysosporium, Mortierella, Kluyveromyces or Pichia. More preferably the plant is of the genus Arabidopsis, Nicotiana, Solanum, Lactuca, Brassica, Glycine, Oryza, Asparagus, Pisum, Medicago, Zea, Hordeum, Secale, Triticum, Capsicum, Cucumis, Cucurbita, Citrullus, Citrus, or Sorghum.

[0121] More preferably the animal cell is of the genus Homo, Rattus, Mus, Sus, Bos, Danio, Canis, Felis, Equus, Salmo, Oncorhynchus, Gallus, Meleagris, Drosophila, or Caenorhabditis.

[0122] In the present disclosure, the cell can be a plant cell, a mammalian cell, a fish cell, an insect cell or cell lines derived from these organisms for in vitro cultures or primary cells which have been taken directly from living tissue and established for in vitro culture. As non-limiting examples, cells can be protoplasts obtained from the plant organisms listed above. As non-limiting examples, cell lines can be selected from the group consisting of CHO-K1 cells; HEK293 cells; Caco2 cells; U2-OS cells; NIH 3T3 cells; NSO cells; SP2 cells; CHO-S cells; DG44 cells; K-562 cells; U-937 cells; MRC5 cells; IMR90 cells; Jurkat cells; HepG2 cells; HeLa cells; HT-1080 cells; HCT-116 cells; Hu-h7 cells; Huvec cells; Molt 4 cells.

[0123] All these cell lines can be modified by the method of the present invention to provide cell line models to produce, express, quantify, detect, study a gene or a protein of interest; these models can also be used to screen biologically active molecules of interest in research and production and various fields such as chemical, biofuels, therapeutics and agronomy as non-limiting examples. Adoptive immunotherapy using genetically engineered T cells is a promising approach for the treatment of malignancies and infectious diseases. Most current approaches rely on gene transfer by random integration of an appropriate T Cell Receptor (TCR) or Chimeric Antigen Receptor (CAR). Targeted approach using rare-cutting endonucleases is an efficient and safe alternative method to transfer genes into T cells and generate genetically engineered T cells.

Methods of genetic engineering / gene editing / mutagenesis



[0124] The present invention also relates to in vitro methods for use of said polypeptides, polynucleotides and proteins previously described for various applications ranging from targeted nucleic acid cleavage to targeted gene regulation. In genome engineering experiments, the efficiency of nuclease fusion protein or chimeric protein as referred to in the present patent application, e.g. their ability to induce a desired event (homologous gene targeting, targeted mutagenesis, sequence removal or excision) at a locus, depends on several parameters, including the specific activity of the nuclease, probably the accessibility of the target, and the efficacy and outcome of the repair pathway(s) resulting in the desired event (homologous repair for gene targeting, NHEJ pathways for targeted mutagenesis). The present invention more particularly relates to an in vitro method for modifying the genetic material of a cell within or adjacent to a nucleic acid target sequence. The double strand breaks caused by endonucleases are commonly repaired through non-homologous end joining (NHEJ). NHEJ comprises at least two different processes. Mechanisms involve rejoining of what remains of the two DNA ends through direct re-ligation (Critchlow and Jackson 1998) or via the so-called microhomology-mediated end joining (Ma, Kim et al. 2003). Repair via non-homologous end joining (NHEJ) often results in small insertions or deletions and can be used for the creation of specific gene knockouts. The present invention relates to an in vitro method for modifying the genetic material of a cell within or adjacent to a nucleic acid target sequence by using a nuclease MBBBD fusion protein according to the invention that allows nucleic acid cleavage that will lead to the loss of genetic information, such that any NHEJ pathway will produce targeted mutagenesis. In a preferred embodiment, the present invention relates to an in vitro method for modifying the genetic material of a cell within or adjacent to a nucleic acid target sequence by generating at least one nucleic acid cleavage and a loss of genetic information around said target nucleic acid sequence, thus preventing any scarless re-ligation by NHEJ. Said modification may be a deletion of the genetic material, insertion of nucleotides in the genetic material or a combination of both deletion and insertion of nucleotides.

[0125] By "homologous" is intended a sequence with enough identity to another one to lead to homologous recombination between sequences, more particularly having at least 95 % identity, preferably 97 % identity and more preferably 99 %.
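The identity thresholds of paragraph [0125] can be illustrated with a short Python sketch. It is not a method taken from the specification: the function names percent_identity and homology_tier are hypothetical, the two sequences are assumed to be already aligned to equal length (with "-" as the gap character), and gap positions are counted as mismatches, which is only one of several conventions for computing identity.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over two pre-aligned, equal-length sequences.

    Assumption: '-' marks a gap, and gapped columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def homology_tier(identity):
    """Map a percent identity onto the tiers named in paragraph [0125]."""
    if identity >= 99:
        return "more preferred (at least 99 %)"
    if identity >= 97:
        return "preferred (at least 97 %)"
    if identity >= 95:
        return "homologous (at least 95 %)"
    return "below the 95 % threshold"


if __name__ == "__main__":
    reference = "ACGT" * 25              # 100 nt
    candidate = "ACGT" * 24 + "ACGA"     # one mismatch
    pid = percent_identity(reference, candidate)
    print(pid, homology_tier(pid))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how the stated percentages translate into a decision once an alignment is available.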

[0126] The present disclosure also relates to a method for modifying target nucleic acid sequence further comprising the step of expressing an additional catalytic domain into a host cell. In a more preferred embodiment, the present invention relates to a method to increase mutagenesis wherein said additional catalytic domain is a DNA end-processing enzyme. Non-limiting examples of DNA end-processing enzymes include 5'-3' exonucleases, 3'-5' exonucleases, 5'-3' alkaline exonucleases, 5' flap endonucleases, helicases, phosphatases, hydrolases and template-independent DNA polymerases. Non-limiting examples of such catalytic domain comprise a protein domain or catalytically active derivative of the protein domain selected from the group consisting of hExoI (EXO1_HUMAN), Yeast ExoI (EXO1_YEAST), E. coli ExoI, Human TREX2, Mouse TREX1, Human TREX1, Bovine TREX1, Rat TREX1, TdT (terminal deoxynucleotidyl transferase), Human DNA2 and Yeast DNA2 (DNA2_YEAST). In a preferred embodiment, said additional catalytic domain has a 3'-5'-exonuclease activity, and in a more preferred embodiment, said additional catalytic domain has TREX exonuclease activity, more preferably TREX2 activity (WO2012058458). In another preferred embodiment, said catalytic domain is encoded by a single chain TREX polypeptide (WO2013009525). Said additional catalytic domain may be fused to a nuclease fusion protein or chimeric protein according to the disclosure, optionally by a peptide linker.

[0127] Endonucleolytic breaks are known to stimulate the rate of homologous recombination. Therefore, in another preferred embodiment, the present disclosure relates to a method for inducing homologous gene targeting in the target nucleic acid sequence further comprising providing to the cell an exogenous nucleic acid comprising at least a sequence homologous to a portion of the target nucleic acid sequence, such that homologous recombination occurs between the target nucleic acid sequence and the exogenous nucleic acid.

[0128] Said exogenous nucleic acid usually comprises a sequence homologous to at least a portion of the target nucleic acid sequence, such that homologous recombination occurs between the target nucleic acid sequence and the exogenous nucleic acid. In particular embodiments, said exogenous nucleic acid comprises first and second portions which are homologous to region 5' and 3' of the target nucleic acid, respectively. Said exogenous nucleic acid in these embodiments also comprises a third portion positioned between the first and the second portion which comprises no homology with the regions 5' and 3' of the target nucleic acid sequence. Following cleavage of the target nucleic acid sequence, a homologous recombination event is stimulated between the genome containing the target nucleic acid sequence and the exogenous nucleic acid. Preferably, homologous sequences of at least 50 bp, preferably more than 100 bp and more preferably more than 200 bp are used within said donor matrix. Therefore, the exogenous nucleic acid is preferably from 200 bp to 6000 bp, more preferably from 1000 bp to 2000 bp. Indeed, shared nucleic acid homologies are located in regions flanking upstream and downstream the site of the break and the nucleic acid sequence to be introduced should be located between the two arms.
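The layout of the exogenous nucleic acid described in paragraph [0128] (two homology arms flanking a non-homologous third portion) and the stated length ranges can be sketched as follows. This is an illustrative reading only: the class DonorMatrix, the function check_donor and their field names are hypothetical, and the sketch checks only the minimum figures given in the text (arms of at least 50 bp, total length from 200 bp to 6000 bp), not sequence homology itself.

```python
from dataclasses import dataclass


@dataclass
class DonorMatrix:
    """Hypothetical container for an exogenous nucleic acid (donor)."""
    left_arm: str   # first portion, homologous to the region 5' of the target
    insert: str     # third portion, no homology with the flanking regions
    right_arm: str  # second portion, homologous to the region 3' of the target

    @property
    def total_length(self):
        return len(self.left_arm) + len(self.insert) + len(self.right_arm)

    @property
    def sequence(self):
        # The sequence to be introduced sits between the two arms.
        return self.left_arm + self.insert + self.right_arm


def check_donor(donor, min_arm=50, min_total=200, max_total=6000):
    """Return a list of problems; an empty list means the stated
    minimum ranges of paragraph [0128] are met."""
    problems = []
    for name, arm in (("left", donor.left_arm), ("right", donor.right_arm)):
        if len(arm) < min_arm:
            problems.append(f"{name} arm shorter than {min_arm} bp")
    if not min_total <= donor.total_length <= max_total:
        problems.append(
            f"total length outside {min_total}-{max_total} bp"
        )
    return problems


if __name__ == "__main__":
    donor = DonorMatrix("A" * 500, "G" * 300, "C" * 500)
    print(donor.total_length, check_donor(donor))
```

Passing min_arm=200, min_total=1000 and max_total=2000 would apply the "more preferably" figures of the same paragraph instead of the minimum ones.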

[0129] Said exogenous nucleic acid can comprise a positive selection marker between the two homology arms and optionally a negative selection marker upstream of the first homology arm or downstream of the second homology arm. The marker(s) allow(s) the selection of the cells having inserted the sequence of interest by homologous recombination at the target site. Depending on the location of the targeted genome sequence wherein the break event has occurred, such exogenous nucleic acid can be used to knock-out a gene, e.g. when the exogenous nucleic acid is located within the open reading frame of said gene, or to introduce new sequences or genes of interest. Sequence insertions by using such exogenous nucleic acid can be used to modify a targeted existing gene, by correction or replacement of said gene (allele swap as a non-limiting example), or to up- or down-regulate the expression of the targeted gene (promoter swap as a non-limiting example).

[0130] The methods of the disclosure involve introducing a polynucleotide encoding MBBBD polypeptide into a cell. Methods for introducing a polynucleotide construct into bacteria, plants, fungi and animals are known in the art and include, as non-limiting examples, stable transformation methods wherein the polynucleotide construct is integrated into the genome of the cell, transient transformation methods wherein the polynucleotide construct is not integrated into the genome of the cell, and virus mediated methods. Said polynucleotides encoding MBBBD polypeptide may be introduced into a cell by, for example, recombinant viral vectors (e.g. retroviruses, adenoviruses), liposomes and the like. Transient transformation methods include, for example, microinjection, electroporation and particle bombardment. The MBBBD polypeptide may be synthesized in situ in the cell as a result of the introduction of the polynucleotide encoding the polypeptide into the cell. Alternatively, the MBBBD polypeptide could be produced outside the cell and then introduced thereto.

[0131] In a preferred aspect of the disclosure, the method for targeting genetic material of a cell comprises providing a cell comprising a nucleic acid target and introducing the polynucleotide encoding MBBBD or MBBBD fusion protein as described above into the cell via a viral particle, and expressing said polynucleotide within the cell. In particular, the viral particle comprises the polynucleotide and said polynucleotide is introduced into the cell by contacting said cell with the viral particle under conditions that permit infection.

[0132] Engineered MBBBD polypeptides can be produced by rearranging modules, thus allowing the generation of a modular domain with novel target nucleic acid specificity. Each of the different MBBBD modules can be engineered following the recognition code according to the present invention. The present invention relates to a method to produce MBBBD polypeptides capable of binding to any desired nucleic acid target sequence by assembling the different engineered MBBBD modules in the appropriate order.

Method for generating a non-human animal/ a plant



[0133] Non-human animals may be generated by introducing MBBBD polypeptide into a cell or a non-human embryo. In particular, the present disclosure relates to a method for generating an animal, comprising providing a eukaryotic cell comprising a nucleic acid target sequence into which it is desired to introduce a genetic modification; generating a cleavage within or adjacent to the nucleic acid target sequence by introducing a MBBBD polypeptide according to the present invention; and generating an animal from the cell or progeny thereof, in which cleavage has occurred. Typically, the embryo is a fertilized one cell stage embryo. Polynucleotides encoding said MBBBD polypeptides may be introduced into the cell by any of the methods known in the art including micro injection into the nucleus or cytoplasm of the embryo. In a particular aspect, the method for generating a non-human animal further comprises introducing an exogenous nucleic acid as desired. Said exogenous nucleic acid comprises a sequence homologous to at least a portion of the nucleic acid target sequence, such that homologous recombination occurs between said exogenous nucleic acid and the nucleic acid target sequence in the cell or progeny thereof. The exogenous nucleic acid can include for example a nucleic acid sequence that disrupts a gene after homologous recombination, a nucleic acid sequence that replaces a gene after homologous recombination, a nucleic acid sequence that introduces a mutation into a gene after homologous recombination or a nucleic acid sequence that introduces a regulatory site after homologous recombination. The embryos are then cultured to develop an animal. In one aspect of the disclosure, a non-human animal in which at least a nucleic acid target sequence of interest has been engineered is provided. For example, an engineered gene may become inactivated such that it is not transcribed or properly translated, or an alternate form of the gene is expressed. The animal may be homozygous or heterozygous for the engineered gene.

[0134] The present disclosure also relates to a method for generating a plant comprising providing a plant cell comprising a nucleic acid target sequence into which it is desired to introduce a genetic modification; generating a cleavage within or adjacent to the nucleic acid target sequence by introducing a MBBBD polypeptide according to the present invention; and generating a plant from the cell or progeny thereof, in which cleavage has occurred. Progeny includes descendants of a particular plant or plant line. In a particular embodiment, the method for generating a plant further comprises introducing an exogenous nucleic acid as desired. Said exogenous nucleic acid comprises a sequence homologous to at least a portion of the nucleic acid target sequence, such that homologous recombination occurs between said exogenous nucleic acid and the nucleic acid target sequence in the cell or progeny thereof. Plant cells produced using these methods can be grown to generate plants having in their genome a modified nucleic acid target sequence. Seeds from such plants can be used to generate plants having a phenotype such as, for example, an altered growth characteristic, altered appearance, or altered compositions with respect to unmodified plants.

[0135] The polypeptides of the invention are useful to engineer genomes and to reprogram cells, especially iPS cells and ES cells.

Therapeutic applications



[0136] From the above, the polypeptides according to the invention can be used as a medicament, especially for modulating, activating or inhibiting gene transcription, at the promoter level or through their catalytic domains.

[0137] Fusion proteins composed of a binding domain according to the invention and of a catalytic domain with nuclease activity can be used for the treatment of a genetic disease, either to correct a mutation at a specific locus or to inactivate a gene the expression of which is deleterious. Such proteins can also be used to genetically modify iPS or primary cells, for instance T-cells, with a view to injecting such cells into a patient for treating a disease or infection. Such cell therapy schemes are more particularly developed for treating cancer, viral infections such as those caused by CMV or HIV, or autoimmune diseases.

[0138] Having generally described this invention, a further understanding can be obtained by reference to certain specific examples, which are provided herein for purposes of illustration.

SEQUENCE LISTING



[0139] 

<110> CELLECTIS

<120> NEW MODULAR SPECIFIC NUCLEIC ACID BINDING DOMAINS FROM BURKHOLDERIA RHIZOXINICA PROTEINS

<130> 418769WO

<160> 535

<170> PatentIn version 3.5

<210> 1
<211> 757
<212> PRT
<213> Xanthomonas

<220>
<223> AvrBs3 CLS

<400> 1







<210> 2
<211> 771
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> E5AV36_BURRH

<400> 2







<210> 3
<211> 310
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> E5AW43_BURRH

<400> 3





<210> 4
<211> 997
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> E5AW45_BURRH

<400> 4









<210> 5
<211> 63
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> E5AW46_BURRH

<400> 5

<210> 6
<211> 287
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> N-ter AvrBs3 P14727

<400> 6



<210> 7
<211> 82
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> N-ter E5AV36_BURRH

<400> 7

<210> 8
<211> 83
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> N-ter E5AW43_BURRH

<400> 8



<210> 9
<211> 83
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> N-ter E5AW45_BURRH

<400> 9

<210> 10
<211> 34
<212> PRT
<213> Xanthomonas

<220>
<223> Repeat AvrBs3 consensus

<400> 10

<210> 11
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_1

<400> 11

<210> 12
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_2

<400> 12

<210> 13
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_3

<400> 13

<210> 14
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_4

<400> 14

<210> 15
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_5

<400> 15

<210> 16
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_6

<400> 16



<210> 17
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_7

<400> 17

<210> 18
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_8

<400> 18

<210> 19
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_9

<400> 19



<210> 20
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_10

<400> 20

<210> 21
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_11

<400> 21

<210> 22
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_12

<400> 22

<210> 23
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_13

<400> 23

<210> 24
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_14

<400> 24

<210> 25
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_15

<400> 25

<210> 26
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_16

<400> 26

<210> 27
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_17

<400> 27

<210> 28
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_18

<400> 28

<210> 29
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_19

<400> 29

<210> 30
<211> 32
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AV36_20

<400> 30

<210> 31
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW43_1

<400> 31

<210> 32
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW43_2

<400> 32

<210> 33
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW43_3

<400> 33

<210> 34
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW43_4

<400> 34

<210> 35
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW43_5

<400> 35

<210> 36
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW43_6

<400> 36



<210> 37
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_1

<400> 37

<210> 38
<211> 31
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_2

<400> 38

<210> 39
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_3

<400> 39



<210> 40
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_4

<400> 40

<210> 41
<211> 31
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_5

<400> 41

<210> 42
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_6

<400> 42

<210> 43
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_7

<400> 43

<210> 44
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_8

<400> 44

<210> 45
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_9

<400> 45



<210> 46
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_10

<400> 46

<210> 47
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_11

<400> 47

<210> 48
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_12

<400> 48



<210> 49
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_13

<400> 49

<210> 50
<211> 31
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_14

<400> 50

<210> 51
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_15

<400> 51



<210> 52
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_16

<400> 52

<210> 53
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_17

<400> 53

<210> 54
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_18

<400> 54



<210> 55
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_19

<400> 55

<210> 56
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_20

<400> 56

<210> 57
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_21

<400> 57

<210> 58
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_22

<400> 58

<210> 59
<211> 32
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_23

<400> 59

<210> 60
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_24

<400> 60

<210> 61
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_25

<400> 61

<210> 62
<211> 33
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_26

<400> 62

<210> 63
<211> 32
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> Module E5AW45_27

<400> 63

<210> 64
<211> 30
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> C-ter E5AV36_BURRH

<400> 64

<210> 65
<211> 30
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> C-ter E5AW43_BURRH

<400> 65

<210> 66
<211> 30
<212> PRT
<213> burkholderia rhizoxinica

<220>
<223> C-ter E5AW45_BURRH

<400> 66

<210> 67
<211> 328
<212> PRT
<213> Unknown

<220>
<223> Full length EBN19409

<400> 67



<210> 68
<211> 247
<212> PRT
<213> Unknown

<220>
<223> Full length ECG96325

<400> 68



<210> 69
<211> 296
<212> PRT
<213> Unknown

<220>
<223> Full length ECG96326

<400> 69



<210> 70
<211> 224
<212> PRT
<213> Unknown

<220>
<223> Full length EBN19408

<400> 70



<210> 71
<211> 143
<212> PRT
<213> Unknown

<220>
<223> Full length ECR81667

<400> 71



<210> 72
<211> 595
<212> PRT
<213> Unknown

<220>
<223> Full length JCVI_A

<400> 72





<210> 73
<211> 552
<212> PRT
<213> Unknown

<220>
<223> Full length JCVI_B

<400> 73





<210> 74
<211> 66
<212> PRT
<213> Unknown

<220>
<223> N-ter JCVI_A

<400> 74

<210> 75
<211> 76
<212> PRT
<213> Unknown

<220>
<223> N-ter JCVI_B

<400> 75

<210> 76
<211> 20
<212> PRT
<213> Unknown

<220>
<223> N-ter ECR81667

<400> 76

<210> 77
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 1 JCVI_A

<400> 77

<210> 78
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 2 JCVI_A

<400> 78

<210> 79
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 3 JCVI_A

<400> 79

<210> 80
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 4 JCVI_A

<400> 80

<210> 81
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 5 JCVI_A

<400> 81

<210> 82
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 6 JCVI_A

<400> 82

<210> 83
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 7 JCVI_A

<400> 83

<210> 84
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 8 JCVI_A

<400> 84

<210> 85
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 9 JCVI_A

<400> 85

<210> 86
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 10 JCVI_A

<400> 86

<210> 87
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 11 JCVI_A

<400> 87



<210> 88
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 12 JCVI_A

<400> 88

<210> 89
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 13 JCVI_A

<400> 89

<210> 90
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 14 JCVI_A

<400> 90



<210> 91
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 1 JCVI_B

<400> 91

<210> 92
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 2 JCVI_B

<400> 92

<210> 93
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 3 JCVI_B

<400> 93



<210> 94
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 4 JCVI_B

<400> 94

<210> 95
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 5 JCVI_B

<400> 95

<210> 96
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 6 JCVI_B

<400> 96

<210> 97
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 7 JCVI_B

<400> 97

<210> 98
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 8 JCVI_B

<400> 98

<210> 99
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 9 JCVI_B

<400> 99

<210> 100
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 10 JCVI_B

<400> 100

<210> 101
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 11 JCVI_B

<400> 101

<210> 102
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 12 JCVI_B

<400> 102

<210> 103
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 13 JCVI_B

<400> 103

<210> 104
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 14 JCVI_B

<400> 104

<210> 105
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module 15 JCVI_B

<400> 105

<210> 106
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module ECR81667_1

<400> 106

<210> 107
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module ECR81667_2

<400> 107



<210> 108
<211> 33
<212> PRT
<213> Unknown

<220>
<223> Module ECR81667_3

<400> 108

<210> 109
<211> 24
<212> PRT
<213> Unknown

<220>
<223> C-ter ECR81667

<400> 109

<210> 110
<211> 24
<212> PRT
<213> Unknown

<220>
<223> C-ter JCVI_A JCVI_B

<400> 110

<210> 111
<211> 278
<212> PRT
<213> Xanthomonas

<220>
<223> C-ter AvrBs3 P14727

<400> 111



<210> 112
<211> 2994
<212> DNA
<213> burkholderia rhizoxinica

<220>
<223> E5AW45_BURRH Gene ID: 10430071

<400> 112



<210> 113
<211> 2316
<212> DNA
<213> burkholderia rhizoxinica

<220>
<223> E5AV36_BURRH Gene ID: 9979518

<400> 113



<210> 114
<211> 936
<212> DNA
<213> burkholderia rhizoxinica

<220>
<223> E5AW43_BURRH Gene ID: 10430017

<400> 114

<210> 115
<211> 192
<212> DNA
<213> burkholderia rhizoxinica

<220>
<223> E5AW46_BURRH Gene ID: 10430015

<400> 115

<210> 116
<211> 675
<212> DNA
<213> Unknown

<220>
<223> EBN19408.1 marine metagenome hypothetical protein : Location:1..675

<400> 116



<210> 117
<211> 988
<212> DNA
<213> Unknown

<220>
<223> EBN19409.1 marine metagenome partial hypothetical protein :
Location:1..988

<400> 117

<210> 118
<211> 891
<212> DNA
<213> Unknown

<220>
<223> ECG96326.1 marine metagenome partial hypothetical protein :
Location:1..891

<400> 118



<210> 119
<211> 740
<212> DNA
<213> Unknown

<220>
<223> ECG96325.1 marine metagenome partial hypothetical protein :
Location:1..741

<400> 119

<210> 120
<211> 433
<212> DNA
<213> Unknown

<220>
<223> ECR81667.1 marine metagenome partial hypothetical protein :
Location:1..433

<400> 120

<210> 121
<211> 498
<212> DNA
<213> artificial sequence

<220>
<223> pCLS17028

<400> 121

<210> 122
<211> 1998
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_RVD_array1

<400> 122

<210> 123
<211> 616
<212> DNA
<213> artificial sequence

<220>
<223> FokI

<400> 123

<210> 124
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS17419

<400> 124

<210> 125
<211> 3021
<212> DNA
<213> artificial sequence

<220>
<223> pCLS17421

<400> 125



<210> 126
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v01

<400> 126
taagagaagc aaagacgtta ctagcatgaa ggtaccgtaa cgtctttgct tctctta   57
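
The entries of this listing follow the WIPO ST.25 numeric identifiers (<210> SEQ ID NO, <211> length, <212> molecule type, <213> organism, <223> other information, <400> sequence). As a minimal sketch, and not part of the listing itself, the following Python snippet parses one entry and checks that the declared length agrees with the residues given. The function name `parse_entry` is hypothetical; the entry text is copied from SEQ ID NO: 126 above.

```python
import re

FIELD = re.compile(r"^<(\d{3})>\s*(.*)$")


def parse_entry(text: str) -> dict:
    """Parse one ST.25-style entry into {numeric identifier: value}.

    The residues following <400> are stored under "400" with the grouping
    spaces and the running position count removed.
    """
    fields = {}
    residues = []
    in_sequence = False
    for line in text.strip().splitlines():
        match = FIELD.match(line.strip())
        if match:
            code, value = match.groups()
            in_sequence = code == "400"
            if not in_sequence:
                # Keep the first value when an identifier repeats (e.g. <223>).
                fields.setdefault(code, value)
        elif in_sequence:
            residues.append(re.sub(r"[\s\d]", "", line))
    fields["400"] = "".join(residues)
    return fields


ENTRY_126 = """
<210> 126
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v01

<400> 126
taagagaagc aaagacgtta ctagcatgaa ggtaccgtaa cgtctttgct tctctta   57
"""

entry = parse_entry(ENTRY_126)
length_ok = len(entry["400"]) == int(entry["211"])
```

A check of this kind only applies to entries whose residues are reproduced in text form; entries whose sequence is not shown would yield an empty string under "400".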

<210> 127
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v02

<400> 127
aaagagaagc aaagacgtta ctagcatgaa ggtaccgtaa cgtctttgct tctcttt   57

<210> 128
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v03

<400> 128
caagagaagc aaagacgtta ctagcatgaa ggtaccgtaa cgtctttgct tctcttg   57

<210> 129
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v04

<400> 129
gaagagaagc aaagacgtta ctagcatgaa ggtaccgtaa cgtctttgct tctcttc   57

<210> 130
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v05

<400> 130
taagcgaagc aactacgtta ctagcatgaa ggtaccgtaa cgtagttgct tcgctta   57

<210> 131
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v06

<400> 131
aaagcgaagc aactacgtta ctagcatgaa ggtaccgtaa cgtagttgct tcgcttt   57

<210> 132
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v07

<400> 132
caagcgaagc aactacgtta ctagcatgaa ggtaccgtaa cgtagttgct tcgcttg   57

<210> 133
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v08

<400> 133
gaagcgaagc aactacgtta ctagcatgaa ggtaccgtaa cgtagttgct tcgcttc   57

<210> 134
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v09

<400> 134
taagagaagc aaatacgtta ctagcatgaa ggtaccgtaa cgtatttgct tctctta   57

<210> 135
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v10

<400> 135
aaagagaagc aaatacgtta ctagcatgaa ggtaccgtaa cgtatttgct tctcttt   57

<210> 136
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v11

<400> 136
caagagaagc aaatacgtta ctagcatgaa ggtaccgtaa cgtatttgct tctcttg   57

<210> 137
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> BURRH_v12

<400> 137
gaagagaagc aaatacgtta ctagcatgaa ggtaccgtaa cgtatttgct tctcttc   57

<210> 138
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> CTRL target

<400> 138
tttatataaa cctaaccctc ttagcatgaa ggtaccagag ggttaggttt atataca   57
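
The BURRH_v01 to BURRH_v12 targets above read the same on both strands at their ends, that is, each begins with a stretch whose reverse complement closes the sequence, with a central spacer in between. The sketch below, given for illustration only, measures the longest such inverted arm. The helper names are hypothetical, and the arm length it reports may exceed the actual recognition site if the bases flanking the spacer happen to be complementary.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_arm_length(seq: str) -> int:
    """Length of the longest prefix whose reverse complement is the suffix."""
    rc = reverse_complement(seq)
    k = 0
    while k < len(seq) // 2 and seq[k] == rc[k]:
        k += 1
    return k


# SEQ ID NO: 126 (BURRH_v01) and SEQ ID NO: 130 (BURRH_v05), spaces removed.
burrh_v01 = "taagagaagcaaagacgttactagcatgaaggtaccgtaacgtctttgcttctctta"
burrh_v05 = "taagcgaagcaactacgttactagcatgaaggtaccgtaacgtagttgcttcgctta"

arm_v01 = inverted_arm_length(burrh_v01)
spacer_v01 = burrh_v01[arm_v01:len(burrh_v01) - arm_v01]
arm_v05 = inverted_arm_length(burrh_v05)
```

For BURRH_v01 this gives inverted arms of 21 nucleotides around a 15-nucleotide central segment; whether the outermost or innermost base of each arm belongs to the recognition site is not decided by this calculation.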

<210> 139
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v13

<400> 139

<210> 140
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v14

<400> 140

<210> 141
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v15

<400> 141

<210> 142
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v16

<400> 142

<210> 143
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v17

<400> 143

<210> 144
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v18

<400> 144

<210> 145
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v19

<400> 145

<210> 146
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v20

<400> 146

<210> 147
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v21

<400> 147

<210> 148
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v22

<400> 148

<210> 149
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v23

<400> 149

<210> 150
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v24

<400> 150

<210> 151
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v25

<400> 151

<210> 152
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v26

<400> 152

<210> 153
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v27

<400> 153

<210> 154
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v28

<400> 154

<210> 155
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v29

<400> 155

<210> 156
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v30

<400> 156

<210> 157
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v33

<400> 157





<210> 159
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36_v35

<400> 159

<210> 160
<211> 1803
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18120

<400> 160

<210> 161
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18473

<400> 161



<210> 162
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36 1

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 162

<210> 163
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36 2

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 163

<210> 164
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36 3

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 164



<210> 165
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   4

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 165

<210> 166
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   5

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 166

<210> 167
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   6

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 167

<210> 168
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   7

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 168

<210> 169
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   8

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 169

<210> 170
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   9

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 170

<210> 171
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   10

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 171



<210> 172
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   11

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 172

<210> 173
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   12

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 173

<210> 174
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   13

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 174

<210> 175
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   14

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 175

<210> 176
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   15

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 176

<210> 177
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   16

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 177

<210> 178
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   17

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 178



<210> 179
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   18

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 179

<210> 180
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   19

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 180

<210> 181
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> Module BurrH_36   20

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 181

<210> 182
<211> 43
<212> DNA
<213> artificial sequence

<220>
<223> Avr05

<400> 182
tatataaacc taaccctcta ggtaagaggg ttaggtttat ata   43

<210> 183
<211> 44
<212> DNA
<213> artificial sequence

<220>
<223> Avr06

<400> 183
tatataaacc taaccctcta aggtaagagg gttaggttta tata   44

<210> 184
<211> 45
<212> DNA
<213> artificial sequence

<220>
<223> Avr07

<400> 184
tatataaacc taaccctcta aggtacagag ggttaggttt atata   45

<210> 185
<211> 46
<212> DNA
<213> artificial sequence

<220>
<223> Avr08

<400> 185
tatataaacc taaccctctg aaggtacaga gggttaggtt tatata   46

<210> 186
<211> 47
<212> DNA
<213> artificial sequence

<220>
<223> Avr09

<400> 186
tatataaacc taaccctctg aaggtaccag agggttaggt ttatata   47

<210> 187
<211> 48
<212> DNA
<213> artificial sequence

<220>
<223> Avr10

<400> 187
tatataaacc taaccctctt gaaggtacca gagggttagg tttatata   48

<210> 188
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> Avr11

<400> 188
tatataaacc taaccctctt gaaggtacct agagggttag gtttatata   49

<210> 189
<211> 50
<212> DNA
<213> artificial sequence

<220>
<223> Avr12

<400> 189
tatataaacc taaccctcta tgaaggtacc tagagggtta ggtttatata   50

<210> 190
<211> 51
<212> DNA
<213> artificial sequence

<220>
<223> Avr13

<400> 190
tatataaacc taaccctcta tgaaggtacc ttagagggtt aggtttatat a   51

<210> 191
<211> 52
<212> DNA
<213> artificial sequence

<220>
<223> Avr14

<400> 191
tatataaacc taaccctctc atgaaggtac cttagagggt taggtttata ta   52

<210> 192
<211> 53
<212> DNA
<213> artificial sequence

<220>
<223> Avr15

<400> 192
tatataaacc taaccctctt agcatgaagg taccagaggg ttaggtttat ata   53

<210> 193
<211> 54
<212> DNA
<213> artificial sequence

<220>
<223> Avr16

<400> 193
tatataaacc taaccctctg catgaaggta ccttgagagg gttaggttta tata   54

<210> 194
<211> 55
<212> DNA
<213> artificial sequence

<220>
<223> Avr17

<400> 194
tatataaacc taaccctctg catgaaggta ccttgtagag ggttaggttt atata   55

<210> 195
<211> 56
<212> DNA
<213> artificial sequence

<220>
<223> Avr18

<400> 195
tatataaacc taaccctcta gcatgaaggt accttgtaga gggttaggtt tatata   56

<210> 196
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> Avr19

<400> 196
tatataaacc taaccctcta gcatgaaggt accttgtcag agggttaggt ttatata   57

<210> 197
<211> 58
<212> DNA
<213> artificial sequence

<220>
<223> Avr20

<400> 197
tatataaacc taaccctctt agcatgaagg taccttgtca gagggttagg tttatata   58

<210> 198
<211> 59
<212> DNA
<213> artificial sequence

<220>
<223> Avr21

<400> 198
tatataaacc taaccctctt agcatgaagg taccttgtcg agagggttag gtttatata   59

<210> 199
<211> 60
<212> DNA
<213> artificial sequence

<220>
<223> Avr22

<400> 199
tatataaacc taaccctctt agcatgaagg taccttgtcg tagagggtta ggtttatata   60

<210> 200
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> Avr23

<400> 200

<210> 201
<211> 62
<212> DNA
<213> artificial sequence

<220>
<223> Avr24

<400> 201

<210> 202
<211> 63
<212> DNA
<213> artificial sequence

<220>
<223> Avr25

<400> 202

<210> 203
<211> 64
<212> DNA
<213> artificial sequence

<220>
<223> Avr26

<400> 203

<210> 204
<211> 65
<212> DNA
<213> artificial sequence

<220>
<223> Avr27

<400> 204

<210> 205
<211> 66
<212> DNA
<213> artificial sequence

<220>
<223> Avr28

<400> 205

<210> 206
<211> 67
<212> DNA
<213> artificial sequence

<220>
<223> Avr29

<400> 206

<210> 207
<211> 68
<212> DNA
<213> artificial sequence

<220>
<223> Avr30

<400> 207

<210> 208
<211> 69
<212> DNA
<213> artificial sequence

<220>
<223> Avr31

<400> 208

<210> 209
<211> 70
<212> DNA
<213> artificial sequence

<220>
<223> Avr32

<400> 209

<210> 210
<211> 71
<212> DNA
<213> artificial sequence

<220>
<223> Avr33

<400> 210

<210> 211
<211> 72
<212> DNA
<213> artificial sequence

<220>
<223> Avr34

<400> 211

<210> 212
<211> 73
<212> DNA
<213> artificial sequence

<220>
<223> Avr35

<400> 212

<210> 213
<211> 74
<212> DNA
<213> artificial sequence

<220>
<223> Avr36

<400> 213

<210> 214
<211> 75
<212> DNA
<213> artificial sequence

<220>
<223> Avr37

<400> 214

<210> 215
<211> 76
<212> DNA
<213> artificial sequence

<220>
<223> Avr38

<400> 215

<210> 216
<211> 77
<212> DNA
<213> artificial sequence

<220>
<223> Avr39

<400> 216

<210> 217
<211> 78
<212> DNA
<213> artificial sequence

<220>
<223> Avr40

<400> 217

<210> 218
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18123

<400> 218

<210> 219
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18127

<400> 219



<210> 220
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18476

<400> 220



<210> 221
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18480

<400> 221



<210> 222
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> RAGT2.4

<400> 222
tgtttatggt tacttatatg tgtgtaacag gtataagtaa ccataaaca   49

<210> 223
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> RAGT2.3

<400> 223
tatatttaag cacttatatg tgtgtaacag gtataagtgc ttaaatata   49

<210> 224
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18124

<400> 224



<210> 225
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18125

<400> 225

<210> 226
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18126

<400> 226



<210> 227
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18477

<400> 227



<210> 228
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18478

<400> 228



<210> 229
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18479

<400> 229



<210> 230
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> consensus first_5

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 230

<210> 231
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> consensus_all

<220>
<221> misc_feature
<222> (12)..(13)
<223> Xaa can be any amino acid

<400> 231

<210> 232
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18645

<400> 232

<210> 233
<211> 1104
<212> DNA
<213> artificial sequence

<220>
<223> pCLS18646

<400> 233

<210> 234
<211> 92
<212> DNA
<213> artificial sequence

<220>
<223> NLS-Stag

<400> 234

<210> 235
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19041

<400> 235



<210> 236
<211> 2646
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19042

<400> 236

<210> 237
<211> 60
<212> DNA
<213> artificial sequence

<220>
<223> burrH_36 target endogenous loci

<400> 237
aaccccattg tccgggaacc cagagctcac agccacgatc ttagacccga gcccacagag   60

<210> 238
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19087

<400> 238

<210> 239
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20851

<400> 239



<210> 240
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19638

<400> 240



<210> 241
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19679

<400> 241



<210> 242
<211> 63
<212> DNA
<213> artificial sequence

<220>
<223> Primer for

<400> 242

<210> 243
<211> 52
<212> DNA
<213> artificial sequence

<220>
<223> Primer rev

<400> 243
cctatcccct gtgtgccttg gcagtctcag gtgagatcca gagcccagcc tg   52

<210> 244
<211> 51
<212> DNA
<213> artificial sequence

<220>
<223> CAPT locus 18

<400> 244
gtccgggaac ccagagctca cagccacgat cttagacccg agcccacaga g   51

<210> 245
<211> 1803
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20311

<400> 245



<210> 246
<211> 1803
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20312

<400> 246



<210> 247
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21603

<400> 247



<210> 248
<211> 2844
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21607

<400> 248



<210> 249
<211> 55
<212> DNA
<213> artificial sequence

<220>
<223> CAPT locus 20

<400> 249
ttgtccggga acccagagct cacagccacg atcttagacc cgagcccaca gagcc   55

<210> 250
<211> 2001
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20313

<400> 250

<210> 251
<211> 2001
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20314

<400> 251

<210> 252
<211> 3024
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21604

<400> 252



<210> 253
<211> 3042
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21608

<400> 253



<210> 254
<211> 1284
<212> DNA
<213> artificial sequence

<220>
<223> pCLS9893

<400> 254



<210> 255
<211> 22
<212> DNA
<213> artificial sequence

<220>
<223> KI-1F

<400> 255
aattgcggcc gcggtccggc gc   22

<210> 256
<211> 22
<212> DNA
<213> artificial sequence

<220>
<223> KI1-R

<400> 256
aaaaaggccg gtagcccata cc   22

<210> 257
<211> 31
<212> DNA
<213> artificial sequence

<220>
<223> KI2-F

<400> 257
gccgccgccg cccttcaaga acgagttaac c   31

<210> 258
<211> 22
<212> DNA
<213> artificial sequence

<220>
<223> KI2-R

<400> 258
ttaaggcgcg ccggaccgcg gc   22

<210> 259
<211> 37
<212> PRT
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 1

<400> 259

<210> 260
<211> 55
<212> PRT
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 2

<400> 260

<210> 261
<211> 52
<212> PRT
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 3

<400> 261

<210> 262
<211> 163
<212> DNA
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 1

<400> 262

<210> 263
<211> 217
<212> DNA
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 2

<400> 263

<210> 264
<211> 208
<212> DNA
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 3

<400> 264



<210> 265
<211> 1110
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19785

<400> 265

<210> 266
<211> 1164
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19787

<400> 266



<210> 267
<211> 1155
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19788

<400> 267

<210> 268
<211> 2850
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19815

<400> 268



<210> 269
<211> 2904
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19816

<400> 269



<210> 270
<211> 2895
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19817

<400> 270



<210> 271
<211> 39
<212> PRT
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 4

<400> 271

<210> 272
<211> 45
<212> PRT
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 5

<400> 272



<210> 273
<211> 169
<212> DNA
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 4

<400> 273

<210> 274
<211> 187
<212> DNA
<213> artificial sequence

<220>
<223> Hybrid C-terminal domain 5

<400> 274

<210> 275
<211> 1116
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22405

<400> 275



<210> 276
<211> 1134
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22406

<400> 276

<210> 277
<211> 1134
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22420

<400> 277



<210> 278
<211> 1152
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22421

<400> 278

<210> 279
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> pCLS19088

<400> 279



<210> 280
<211> 2658
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23511

<400> 280



<210> 281
<211> 2676
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23513

<400> 281



<210> 282
<211> 2676
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23531

<400> 282

<210> 283
<211> 2694
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23533

<400> 283

<210> 284
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> CAPT1.1

<400> 284
tccgggaacc cagagctcac agccacgatc ttagacccga gcccacaga   49
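
Each record in this listing follows the numeric-identifier layout of WIPO Standard ST.25: <210> is the SEQ ID number, <211> the declared length, <212> the molecule type, <213> the organism, <223> free-text information and <400> the sequence itself. As a minimal sketch, the record above can be parsed and its declared length checked against the residues actually printed. The function name parse_record is hypothetical, and the residue filter assumes a lowercase DNA record.

```python
import re

# Record copied from SEQ ID NO: 284 above.
RECORD = """
<210> 284
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> CAPT1.1

<400> 284
tccgggaacc cagagctcac agccacgatc ttagacccga gcccacaga   49
"""

def parse_record(text):
    """Return (fields, sequence) for one ST.25-style DNA record."""
    fields = {}
    residues = []
    in_sequence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.match(r"<(\d{3})>\s*(.*)", line)
        if match:
            fields[match.group(1)] = match.group(2)
            # Lines after the <400> identifier hold the residues.
            in_sequence = match.group(1) == "400"
        elif in_sequence:
            # Drop grouping spaces and the trailing position counter.
            residues.append(re.sub(r"[^a-z]", "", line))
    return fields, "".join(residues)

fields, sequence = parse_record(RECORD)
print(fields["223"], len(sequence), int(fields["211"]))
```

Records whose <400> block is empty in this text would yield an empty sequence, so a length mismatch there reflects missing residues in the text rather than an error in the record.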

<210> 285
<211> 1214
<212> DNA
<213> artificial sequence

<220>
<223> Nterfok scaffold

<400> 285

<210> 286
<211> 1170
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21170

<400> 286



<210> 287
<211> 2910
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21226

<400> 287



<210> 288
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20474

<400> 288

<210> 289
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23060

<400> 289



<210> 290
<211> 19
<212> DNA
<213> artificial sequence

<220>
<223> single Avr

<400> 290
tatataaacc taaccctct   19

<210> 291
<211> 43
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr05b

<400> 291
agagggttag gtttatataa ggtatatata aacctaaccc tct   43

<210> 292
<211> 44
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr06b

<400> 292
agagggttag gtttatataa aggtatatat aaacctaacc ctct   44
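
The listing does not state how the NfusAvr targets are built. Reading the sequences, each one appears to consist of the reverse complement of the "single Avr" site (SEQ ID NO: 290), a spacer, and the site itself, with the number in the name matching the spacer length. The sketch below checks that reading for the two targets above; the helper names are hypothetical.

```python
# Sequences copied from SEQ ID NO: 290 to 292; helper names are hypothetical.
SINGLE_AVR = "tatataaacctaaccctct"
NFUS_AVR = {
    "NfusAvr05b": "agagggttag gtttatataa ggtatatata aacctaaccc tct",
    "NfusAvr06b": "agagggttag gtttatataa aggtatatat aaacctaacc ctct",
}

COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def spacer_of(target, site=SINGLE_AVR):
    """Return the bases between the inverted site and the site itself."""
    seq = target.replace(" ", "")
    left = reverse_complement(site)
    if not (seq.startswith(left) and seq.endswith(site)):
        raise ValueError("target is not flanked by inverted copies of the site")
    return seq[len(left):len(seq) - len(site)]

for name, target in NFUS_AVR.items():
    spacer = spacer_of(target)
    print(name, spacer, len(spacer))
```

For these two entries the extracted spacers are 5 and 6 bases long, which agrees with the declared lengths of 43 and 44 (19 + spacer + 19).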

<210> 293
<211> 45
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr07b

<400> 293
agagggttag gtttatataa aggtactata taaacctaac cctct   45

<210> 294
<211> 46
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr08b

<400> 294
agagggttag gtttatatag aaggtactat ataaacctaa ccctct   46

<210> 295
<211> 47
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr09b

<400> 295
agagggttag gtttatatag aaggtaccta tataaaccta accctct   47

<210> 296
<211> 48
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr10b

<400> 296
agagggttag gtttatatat gaaggtacct atataaacct aaccctct   48

<210> 297
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr11b

<400> 297
agagggttag gtttatatat gaaggtacct tatataaacc taaccctct   49

<210> 298
<211> 50
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr12b

<400> 298
agagggttag gtttatataa tgaaggtacc ttatataaac ctaaccctct   50

<210> 299
<211> 51
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr13b

<400> 299
agagggttag gtttatataa tgaaggtacc tttatataaa cctaaccctc t   51

<210> 300
<211> 52
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr14b

<400> 300
agagggttag gtttatatac atgaaggtac ctttatataa acctaaccct ct   52

<210> 301
<211> 53
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr15b

<400> 301
agagggttag gtttatatac atgaaggtac cttgtatata aacctaaccc tct   53

<210> 302
<211> 54
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr16b

<400> 302
agagggttag gtttatatag catgaaggta ccttgtatat aaacctaacc ctct   54

<210> 303
<211> 55
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr17b

<400> 303
agagggttag gtttatatag catgaaggta ccttgttata taaacctaac cctct   55

<210> 304
<211> 56
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr18b

<400> 304
agagggttag gtttatataa gcatgaaggt accttgttat ataaacctaa ccctct   56

<210> 305
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr19b

<400> 305
agagggttag gtttatataa gcatgaaggt accttgtcta tataaaccta accctct   57

<210> 306
<211> 58
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr20b

<400> 306
agagggttag gtttatatat agcatgaagg taccttgtct atataaacct aaccctct   58

<210> 307
<211> 59
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr21b

<400> 307
agagggttag gtttatatat agcatgaagg taccttgtcg tatataaacc taaccctct   59

<210> 308
<211> 60
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr22b

<400> 308
agagggttag gtttatatat agcatgaagg taccttgtcg ttatataaac ctaaccctct   60

<210> 309
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr23b

<400> 309

<210> 310
<211> 62
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr24b

<400> 310

<210> 311
<211> 63
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr25b

<400> 311

<210> 312
<211> 64
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr26b

<400> 312

<210> 313
<211> 65
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr27b

<400> 313

<210> 314
<211> 66
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr28b

<400> 314

<210> 315
<211> 67
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr29b

<400> 315

<210> 316
<211> 68
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr30b

<400> 316

<210> 317
<211> 69
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr31b

<400> 317

<210> 318
<211> 70
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr32b

<400> 318

<210> 319
<211> 71
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr33b

<400> 319

<210> 320
<211> 72
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr34b

<400> 320

<210> 321
<211> 73
<212> DNA
<213> artificial sequence

<220>
<223> NfusAvr35b

<400> 321

<210> 322
<211> 17
<212> DNA
<213> artificial sequence

<220>
<223> single RAGT2.4

<400> 322
tgtttatggt tacttat   17

<210> 323
<211> 41
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr05

<400> 323
tgtttatggt tacttatagg tatatataaa cctaaccctc t   41

<210> 324
<211> 42
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr06

<400> 324
tgtttatggt tacttataag gtatatataa acctaaccct ct   42

<210> 325
<211> 43
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr07

<400> 325
tgtttatggt tacttataag gtactatata aacctaaccc tct   43

<210> 326
<211> 44
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr08

<400> 326
tgtttatggt tacttatgaa ggtactatat aaacctaacc ctct   44

<210> 327
<211> 45
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr09

<400> 327
tgtttatggt tacttatgaa ggtacctata taaacctaac cctct   45

<210> 328
<211> 46
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr10

<400> 328
tgtttatggt tacttattga aggtacctat ataaacctaa ccctct   46

<210> 329
<211> 47
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr11

<400> 329
tgtttatggt tacttattga aggtacctta tataaaccta accctct   47

<210> 330
<211> 48
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr12

<400> 330
tgtttatggt tacttatatg aaggtacctt atataaacct aaccctct   48

<210> 331
<211> 49
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr13

<400> 331
tgtttatggt tacttatatg aaggtacctt tatataaacc taaccctct   49

<210> 332
<211> 50
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr14

<400> 332
tgtttatggt tacttatcat gaaggtacct ttatataaac ctaaccctct   50

<210> 333
<211> 51
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr15

<400> 333
tgtttatggt tacttattag catgaaggta cctatataaa cctaaccctc t   51

<210> 334
<211> 52
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr16

<400> 334
tgtttatggt tacttatgca tgaaggtacc ttgtatataa acctaaccct ct   52

<210> 335
<211> 53
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr17

<400> 335
tgtttatggt tacttatgca tgaaggtacc ttgttatata aacctaaccc tct   53

<210> 336
<211> 54
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr18

<400> 336
tgtttatggt tacttatagc atgaaggtac cttgttatat aaacctaacc ctct   54

<210> 337
<211> 55
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr19

<400> 337
tgtttatggt tacttatagc atgaaggtac cttgtctata taaacctaac cctct   55

<210> 338
<211> 56
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr20

<400> 338
tgtttatggt tacttattag catgaaggta ccttgtctat ataaacctaa ccctct   56

<210> 339
<211> 57
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr21

<400> 339
tgtttatggt tacttattag catgaaggta ccttgtcgta tataaaccta accctct   57

<210> 340
<211> 58
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr22

<400> 340
tgtttatggt tacttattag catgaaggta ccttgtcgtt atataaacct aaccctct   58

<210> 341
<211> 59
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr23

<400> 341
tgtttatggt tacttatcta gcatgaaggt accttgtcgt tatataaacc taaccctct   59

<210> 342
<211> 60
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr24

<400> 342
tgtttatggt tacttatcta gcatgaaggt accttgtcgt ttatataaac ctaaccctct   60

<210> 343
<211> 61
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr25

<400> 343

<210> 344
<211> 62
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr26

<400> 344

<210> 345
<211> 63
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr27

<400> 345

<210> 346
<211> 64
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr28

<400> 346

<210> 347
<211> 65
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr29

<400> 347

<210> 348
<211> 66
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr30

<400> 348

<210> 349
<211> 67
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr31

<400> 349

<210> 350
<211> 68
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr32

<400> 350

<210> 351
<211> 69
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr33

<400> 351

<210> 352
<211> 70
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr34

<400> 352

<210> 353
<211> 71
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr35

<400> 353

<210> 354
<211> 72
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr36

<400> 354

<210> 355
<211> 73
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr37

<400> 355

<210> 356
<211> 74
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr38

<400> 356

<210> 357
<211> 75
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr39

<400> 357

<210> 358
<211> 76
<212> DNA
<213> artificial sequence

<220>
<223> C_N_RAGAvr40

<400> 358

<210> 359
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> DNA binding domain

<400> 359

<210> 360
<211> 17
<212> DNA
<213> artificial sequence

<220>
<223> RAGT2.3 sequence target

<400> 360
tatatttaag cacttat   17

<210> 361
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21549

<400> 361

<210> 362
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> DNA binding domain

<400> 362

<210> 363
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21558

<400> 363



<210> 364
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> DNA binding domain

<400> 364

<210> 365
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21559

<400> 365

<210> 366
<211> 135
<212> PRT
<213> artificial sequence

<220>
<223> AvrBs3 D152

<400> 366

<210> 367
<211> 2236
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20720

<400> 367



<210> 368
<211> 2218
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20721

<400> 368



<210> 369
<211> 2790
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22251

<400> 369



<210> 370
<211> 928
<212> PRT
<213> artificial sequence

<220>
<223> pCLS22251

<400> 370









<210> 371
<211> 2808
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22247

<400> 371



<210> 372
<211> 934
<212> PRT
<213> artificial sequence

<220>
<223> pCLS22247

<400> 372







<210> 373
<211> 558
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20716

<400> 373

<210> 374
<211> 576
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20717

<400> 374

<210> 375
<211> 1266
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22244

<400> 375



<210> 376
<211> 1248
<212> DNA
<213> artificial sequence

<220>
<223> pCLS22245

<400> 376

<210> 377
<211> 2790
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23592

<400> 377



<210> 378
<211> 2808
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23591

<400> 378



<210> 379
<211> 396
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20653

<400> 379

<210> 380
<211> 399
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20654

<400> 380

<210> 381
<211> 495
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20655

<400> 381

<210> 382
<211> 396
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20656

<400> 382

<210> 383
<211> 396
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20657

<400> 383

<210> 384
<211> 396
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20658

<400> 384

<210> 385
<211> 399
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20659

<400> 385

<210> 386
<211> 396
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20660

<400> 386



<210> 387
<211> 399
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20661

<400> 387

<210> 388
<211> 549
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20662

<400> 388

<210> 389
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21492

<400> 389







<210> 391
<211> 1185
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21494

<400> 391



<210> 392
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21495

<400> 392

<210> 393
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21496

<400> 393



<210> 394
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21497

<400> 394

<210> 395
<211> 1089
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21498

<400> 395



<210> 396
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21499

<400> 396

<210> 397
<211> 1089
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21500

<400> 397



<210> 398
<211> 1239
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21501

<400> 398

<210> 399
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21512

<400> 399



<210> 400
<211> 2829
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21513

<400> 400



<210> 401
<211> 2925
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21514

<400> 401



<210> 402
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21515

<400> 402



<210> 403
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21516

<400> 403



<210> 404
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21517

<400> 404



<210> 405
<211> 2829
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21518

<400> 405



<210> 406
<211> 2826
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21519

<400> 406



<210> 407
<211> 2829
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21520

<400> 407



<210> 408
<211> 2979
<212> DNA
<213> artificial sequence

<220>
<223> pCLS21521

<400> 408



<210> 409
<211> 2628
<212> DNA
<213> artificial sequence

<220>
<223> Xyl_BurrH_L (from pCLS21030)

<400> 409



<210> 410
<211> 2646
<212> DNA
<213> artificial sequence

<220>
<223> Xyl_BurrH_R (from pCLS21113)

<400> 410



<210> 411
<211> 4722
<212> DNA
<213> artificial sequence

<220>
<223> YFP (from pCLS13857)

<400> 411



<210> 412
<211> 55
<212> PRT
<213> artificial sequence

<220>
<223> N-ter BurrH_36 (Delta26)

<400> 412

<210> 413
<211> 246
<212> PRT
<213> artificial sequence

<220>
<223> I-TevI catalytic domain

<400> 413



<210> 414
<211> 498
<212> DNA
<213> artificial sequence

<220>
<223> BurrH_36 core scaffold

<400> 414

<210> 415
<211> 3032
<212> DNA
<213> artificial sequence

<220>
<223> pCLS7865

<400> 415



<210> 416
<211> 184
<212> PRT
<213> artificial sequence

<220>
<223> TevD02

<400> 416



<210> 417
<211> 138
<212> PRT
<213> artificial sequence

<220>
<223> TevM01

<400> 417



<210> 418
<211> 354
<212> PRT
<213> artificial sequence

<220>
<223> TevCreD02

<400> 418



<210> 419
<211> 9363
<212> DNA
<213> artificial sequence

<220>
<223> pCLS6615

<400> 419







<210> 420
<211> 1782
<212> DNA
<213> artificial sequence

<220>
<223> DBA-Burrh_36-AvrBs3

<400> 420



<210> 421
<211> 897
<212> PRT
<213> artificial sequence

<220>
<223> TevD02-BurrH chimeric endonuclease

<400> 421







<210> 422
<211> 851
<212> PRT
<213> artificial sequence

<220>
<223> TevM01-BurrH chimeric endonuclease

<400> 422







<210> 423
<211> 8334
<212> DNA
<213> artificial sequence

<220>
<223> pCLS0542

<400> 423





<210> 424
<211> 6
<212> DNA
<213> artificial sequence

<220>
<223> Natural TevI cleavage site

<400> 424
caacgc   6

<210> 425
<211> 18
<212> DNA
<213> artificial sequence

<220>
<223> AvrBs3 recognition site

<400> 425
atataaacct aaccctct   18

<210> 426
<211> 1605
<212> DNA
<213> artificial sequence

<220>
<223> RVD_bhEGFP_T03g06

<400> 426

<210> 427
<211> 16
<212> DNA
<213> artificial sequence

<220>
<223> RVD_bhEGFP_T03g06 target

<400> 427
gaagttcatc tgcacc   16

<210> 428
<211> 5673
<212> DNA
<213> artificial sequence

<220>
<223> pCLS1853

<400> 428



<210> 429
<211> 785
<212> PRT
<213> artificial sequence

<220>
<223> TevM01_b36EGfpT3g6

<400> 429







<210> 430
<211> 5428
<212> DNA
<213> artificial sequence

<220>
<223> pCLS0003

<400> 430



<210> 431
<211> 6627
<212> DNA
<213> artificial sequence

<220>
<223> pCLS8982

<400> 431



<210> 432
<211> 3165
<212> DNA
<213> artificial sequence

<220>
<223> pCLS2198

<400> 432



<210> 433
<211> 147
<212> PRT
<213> artificial sequence

<220>
<223> st2

<400> 433

<210> 434
<211> 3509
<212> DNA
<213> artificial sequence

<220>
<223> pCLS9008

<400> 434



<210> 435
<211> 610
<212> DNA
<213> artificial sequence

<220>
<223> RVD_ctEGFP_T03g12-L1

<400> 435



<210> 436
<211> 16
<212> DNA
<213> artificial sequence

<220>
<223> RVD_ctEGFP_T03g12-L1 target

<400> 436
gaccctgaag ttcatc   16

<210> 437
<211> 6134
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20650

<400> 437





<210> 438
<211> 820
<212> PRT
<213> artificial sequence

<220>
<223> TevI::cT11EGfpT3g12

<400> 438







<210> 439
<211> 1086
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23330

<400> 439



<210> 440
<211> 1491
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23453

<400> 440

<210> 441
<211> 3426
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23638

<400> 441



<210> 442
<211> 3033
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23636

<400> 442

<210> 443
<211> 21
<212> DNA
<213> artificial sequence

<220>
<223> dBurrH_36 WT Effector specific target

<400> 443
taagagaagc aaagacgtta c   21

<210> 444
<211> 17
<212> DNA
<213> artificial sequence

<220>
<223> dBurrH_36 HBB Effector specific target

<400> 444
tgcaccatgg tgtctgt   17

<210> 445
<211> 15
<212> DNA
<213> artificial sequence

<220>
<223> dBurrH/dTAL Effector non-specific target

<400> 445
tcccgagtcc ccaat   15

<210> 446
<211> 831
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23601

<400> 446

<210> 447
<211> 824
<212> DNA
<213> artificial sequence

<220>
<223> pCLS20585

<400> 447

<210> 448
<211> 827
<212> DNA
<213> artificial sequence

<220>
<223> pCLS23598

<400> 448

<210> 449
<211> 702
<212> DNA
<213> artificial sequence

<220>
<223> BFP ORF

<400> 449



<210> 450
<211> 681
<212> DNA
<213> artificial sequence

<220>
<223> DsRED ORF

<400> 450

<210> 451
<211> 3
<212> PRT
<213> artificial sequence

<220>
<223> 1a8h_1

<400> 451

<210> 452
<211> 4
<212> PRT
<213> artificial sequence

<220>
<223> 1dnpA_1

<400> 452

<210> 453
<211> 4
<212> PRT
<213> artificial sequence

<220>
<223> 1d8cA_2

<400> 453

<210> 454
<211> 4
<212> PRT
<213> artificial sequence

<220>
<223> 1ckqA_3

<400> 454

<210> 455
<211> 4
<212> PRT
<213> artificial sequence

<220>
<223> 1sbp_1

<400> 455

<210> 456
<211> 5
<212> PRT
<213> artificial sequence

<220>
<223> 1ev7A_1

<400> 456

<210> 457
<211> 5
<212> PRT
<213> artificial sequence

<220>
<223> 1alo_3

<400> 457

<210> 458
<211> 5
<212> PRT
<213> artificial sequence

<220>
<223> 1amf_1

<400> 458

<210> 459
<211> 6
<212> PRT
<213> artificial sequence

<220>
<223> 1adjA_3

<400> 459

<210> 460
<211> 6
<212> PRT
<213> artificial sequence

<220>
<223> 1fcdC_1

<400> 460

<210> 461
<211> 6
<212> PRT
<213> artificial sequence

<220>
<223> 1al3_2

<400> 461

<210> 462
<211> 7
<212> PRT
<213> artificial sequence

<220>
<223> 1g3p_1

<400> 462

<210> 463
<211> 7
<212> PRT
<213> artificial sequence

<220>
<223> 1acc_3

<400> 463

<210> 464
<211> 8
<212> PRT
<213> artificial sequence

<220>
<223> 1ahjB_1

<400> 464

<210> 465
<211> 8
<212> PRT
<213> artificial sequence

<220>
<223> 1acc_1

<400> 465

<210> 466
<211> 8
<212> PRT
<213> artificial sequence

<220>
<223> 1af7_1

<400> 466

<210> 467
<211> 9
<212> PRT
<213> artificial sequence

<220>
<223> 1heiA_1

<400> 467

<210> 468
<211> 9
<212> PRT
<213> artificial sequence

<220>
<223> 1bia_2

<400> 468

<210> 469
<211> 9
<212> PRT
<213> artificial sequence

<220>
<223> 1igtB_1

<400> 469

<210> 470
<211> 10
<212> PRT
<213> artificial sequence

<220>
<223> 1nfkA_1

<400> 470

<210> 471
<211> 10
<212> PRT
<213> artificial sequence

<220>
<223> 1au7A_1

<400> 471

<210> 472
<211> 11
<212> PRT
<213> artificial sequence

<220>
<223> 1bpoB_1

<400> 472

<210> 473
<211> 11
<212> PRT
<213> artificial sequence

<220>
<223> 1b0pA_2

<400> 473

<210> 474
<211> 14
<212> PRT
<213> artificial sequence

<220>
<223> 1c05A_2

<400> 474

<210> 475
<211> 14
<212> PRT
<213> artificial sequence

<220>
<223> 1gcb_1

<400> 475

<210> 476
<211> 14
<212> PRT
<213> artificial sequence

<220>
<223> 1bt3A_1

<400> 476

<210> 477
<211> 15
<212> PRT
<213> artificial sequence

<220>
<223> 1b3oB_2

<400> 477

<210> 478
<211> 21
<212> PRT
<213> artificial sequence

<220>
<223> 16vpA_6

<400> 478

<210> 479
<211> 21
<212> PRT
<213> artificial sequence

<220>
<223> 1dhx_1

<400> 479

<210> 480
<211> 26
<212> PRT
<213> artificial sequence

<220>
<223> 1b8aA_1

<400> 480

<210> 481
<211> 28
<212> PRT
<213> artificial sequence

<220>
<223> 1qu6A_1

<400> 481

<210> 482
<211> 20
<212> PRT
<213> artificial sequence

<220>
<223> NFS1

<400> 482

<210> 483
<211> 23
<212> PRT
<213> artificial sequence

<220>
<223> NFS2

<400> 483

<210> 484
<211> 10
<212> PRT
<213> artificial sequence

<220>
<223> CFS1

<400> 484

<210> 485
<211> 32
<212> PRT
<213> artificial sequence

<220>
<223> RM2

<400> 485

<210> 486
<211> 27
<212> PRT
<213> artificial sequence

<220>
<223> BQY

<400> 486

<210> 487
<211> 5
<212> PRT
<213> artificial sequence

<220>
<223> QGPSG

<400> 487

<210> 488
<211> 8
<212> PRT
<213> artificial sequence

<220>
<223> LGPDGRKA

<400> 488

<210> 489
<211> 7
<212> PRT
<213> artificial sequence

<220>
<223> GRSGSDP

<400> 489

<210> 490
<211> 2
<212> PRT
<213> artificial sequence

<220>
<223> IA

<400> 490
Ile Ala
1

<210> 491
<211> 2
<212> PRT
<213> artificial sequence

<220>
<223> SG

<400> 491

<210> 492
<211> 15
<212> PRT
<213> artificial sequence

<220>
<223> TAL1

<400> 492

<210> 493
<211> 20
<212> PRT
<213> artificial sequence

<220>
<223> TAL2

<400> 493

<210> 494
<211> 22
<212> PRT
<213> artificial sequence

<220>
<223> TAL3

<400> 494

<210> 495
<211> 17
<212> PRT
<213> artificial sequence

<220>
<223> TAL4

<400> 495

<210> 496
<211> 26
<212> PRT
<213> artificial sequence

<220>
<223> TAL5

<400> 496

<210> 497
<211> 38
<212> PRT
<213> artificial sequence

<220>
<223> TAL6

<400> 497

<210> 498
<211> 21
<212> PRT
<213> artificial sequence

<220>
<223> TAL7

<400> 498

<210> 499
<211> 21
<212> PRT
<213> artificial sequence

<220>
<223> TAL8

<400> 499

<210> 500
<211> 21
<212> PRT
<213> artificial sequence

<220>
<223> TAL9

<400> 500

<210> 501
<211> 22
<212> PRT
<213> artificial sequence

<220>
<223> TAL10

<400> 501

<210> 502
<211> 23
<212> PRT
<213> artificial sequence

<220>
<223> TAL11

<400> 502

<210> 503
<211> 23
<212> PRT
<213> artificial sequence

<220>
<223> TAL12

<400> 503

<210> 504
<211> 26
<212> PRT
<213> artificial sequence

<220>
<223> TAL13

<400> 504

<210> 505
<211> 16
<212> PRT
<213> artificial sequence

<220>
<223> TAL14

<400> 505

<210> 506
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> TAL15

<400> 506

<210> 507
<211> 17
<212> PRT
<213> artificial sequence

<220>
<223> TAL16

<400> 507

<210> 508
<211> 19
<212> PRT
<213> artificial sequence

<220>
<223> TAL17

<400> 508



<210> 509
<211> 26
<212> PRT
<213> artificial sequence

<220>
<223> TAL18

<400> 509

<210> 510
<211> 16
<212> PRT
<213> artificial sequence

<220>
<223> TAL19

<400> 510

<210> 511
<211> 16
<212> PRT
<213> artificial sequence

<220>
<223> TAL20

<400> 511

<210> 512
<211> 18
<212> PRT
<213> artificial sequence

<220>
<223> TAL21

<400> 512

<210> 513
<211> 27
<212> PRT
<213> artificial sequence

<220>
<223> TAL22

<400> 513

<210> 514
<211> 18
<212> PRT
<213> artificial sequence

<220>
<223> TAL23

<400> 514

<210> 515
<211> 16
<212> PRT
<213> artificial sequence

<220>
<223> TAL24

<400> 515

<210> 516
<211> 20
<212> PRT
<213> artificial sequence

<220>
<223> TAL25

<400> 516

<210> 517
<211> 17
<212> PRT
<213> artificial sequence

<220>
<223> TAL26

<400> 517

<210> 518
<211> 19
<212> PRT
<213> artificial sequence

<220>
<223> TAL27

<400> 518

<210> 519
<211> 33
<212> PRT
<213> artificial sequence

<220>
<223> TAL28

<400> 519

<210> 520
<211> 18
<212> PRT
<213> artificial sequence

<220>
<223> TAL29

<400> 520

<210> 521
<211> 20
<212> PRT
<213> artificial sequence

<220>
<223> TAL30

<400> 521

<210> 522
<211> 40
<212> PRT
<213> artificial sequence

<220>
<223> TAL31

<400> 522

<210> 523
<211> 31
<212> PRT
<213> artificial sequence

<220>
<223> TAL32

<400> 523

<210> 524
<211> 31
<212> PRT
<213> artificial sequence

<220>
<223> TAL33

<400> 524

<210> 525
<211> 26
<212> PRT
<213> artificial sequence

<220>
<223> TAL34

<400> 525



<210> 526
<211> 31
<212> PRT
<213> artificial sequence

<220>
<223> TAL35

<400> 526

<210> 527
<211> 31
<212> PRT
<213> artificial sequence

<220>
<223> TAL36

<400> 527

<210> 528
<211> 26
<212> PRT
<213> artificial sequence

<220>
<223> TAL37

<400> 528

<210> 529
<211> 37
<212> PRT
<213> artificial sequence

<220>
<223> Linker A

<400> 529

<210> 530
<211> 37
<212> PRT
<213> artificial sequence

<220>
<223> Linker B

<400> 530

<210> 531
<211> 37
<212> PRT
<213> artificial sequence

<220>
<223> Linker C

<400> 531

<210> 532
<211> 44
<212> PRT
<213> artificial sequence

<220>
<223> Linker D

<400> 532

<210> 533
<211> 40
<212> PRT
<213> artificial sequence

<220>
<223> Linker E

<400> 533

<210> 534
<211> 38
<212> PRT
<213> artificial sequence

<220>
<223> Linker F

<400> 534

<210> 535
<211> 40
<212> PRT
<213> artificial sequence

<220>
<223> Linker G

<400> 535




Claims

1. A polypeptide that comprises a succession of modules from 30 to 35 amino acids, said succession of modules displaying a base per base specificity towards a nucleic acid target sequence, wherein

(i) at least one of said modules has at least 70% sequence identity with one of the module polypeptide sequences from the protein E5AV36 of SEQ ID No: 2 from Burkholderia rhizoxinica and has at least 90% sequence identity with a polypeptide sequence selected from the group consisting of: SEQ ID NO: 162 to 181,

(ii) said at least one module comprises in position 13 a unique variable amino acid residue that determines the specificity of each module towards a nucleotide base as follows:

AA          Nucleotide base
I, S, T     A
G, R        T
D, T, *     C
N, R        G
wherein "*" means no amino acid, and

(iii) said succession of modules is fused to a catalytic domain having an endonuclease activity under monomeric or dimeric form.
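
By way of illustration only, the position-13 table of claim 1 can be written as a lookup in both directions: from a target base to the residues listed for it, and from a residue to the bases it is listed against. This is a minimal sketch; the names POSITION13_CODE, residues_for_target and bases_for_residue are hypothetical, and the code reproduces the table without adding any specificity data.

```python
# Position-13 residue table from claim 1 ("*" means no amino acid).
POSITION13_CODE = {
    "A": ("I", "S", "T"),
    "T": ("G", "R"),
    "C": ("D", "T", "*"),
    "G": ("N", "R"),
}

def residues_for_target(target):
    """For each base of the target, list the residues the table allows."""
    return [POSITION13_CODE[base] for base in target.upper()]

def bases_for_residue(residue):
    """List the bases against which a residue appears in the table."""
    return sorted(base for base, residues in POSITION13_CODE.items()
                  if residue in residues)

# One module per base of a short, made-up target.
for base, residues in zip("TGCA", residues_for_target("tgca")):
    print(base, "/".join(residues))

# T and R each appear in two rows of the table.
print(bases_for_residue("T"), bases_for_residue("R"))
```

The reverse lookup makes visible that T is listed for both A and C and R for both T and G, so those two residues do not map back to a single base.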


 
2. The polypeptide of claim 1, in which the amino acid sequence of said at least one module is selected from the group consisting of SEQ ID Nos: 11 to 30.
 
3. The polypeptide according to claim 1 or claim 2, wherein said target nucleic acid sequence does not comprise a thymidine nucleotide at position -1.
 
4. The polypeptide according to any one of claims 1 to 3, wherein said polypeptide further comprises a module that has at least 80% amino acid identity with an AvrBs3 repeat of SEQ ID NO: 10.
 
5. A polypeptide according to any one of claims 1 to 4, wherein said catalytic domain comes from the endonuclease FokI or a homing endonuclease.
 
6. A polypeptide according to any one of claims 1 to 4, wherein said catalytic domain is from I-TevI.
 
7. Use of a polypeptide according to any one of claims 1 to 6, for in vitro processing of the genetic material of a cell within or adjacent to said nucleic acid target sequence.
 
8. A polypeptide according to any one of claims 1 to 6, for use as a medicament.
 
9. An in vitro method for targeting the genetic material of a cell, comprising:

(a) providing a cell, preferably a mammalian cell or a plant cell comprising a nucleic acid target sequence; and

(b) introducing into the cell a polynucleotide according to any one of claims 1 to 8, and optionally an exogenous nucleic acid comprising a sequence homologous to at least a portion of the target nucleic acid sequence, such that homologous recombination occurs between said exogenous nucleic acid and the target nucleic acid sequence;

wherein the polypeptide encoded by said polynucleotide processes the genetic material of the cell within or adjacent to said target nucleic acid sequence.
 


Ansprüche

1. Polypeptid, welches eine Abfolge von Modulen von 30 bis 35 Aminosäuren umfasst, wobei die Abfolge von Modulen eine Base-für-Base-Spezifität gegenüber einer Nukleinsäure-Zielsequenz zeigt, wobei

(i) mindestens eines der Module mindestens 70% Sequenzidentität mit einer der Modul-Polypeptidsequenzen aus dem Protein E5AV36 der SEQ ID No: 2 aus Burkholderia rhizoxinica aufweist und mindestens 90% Sequenzidentität mit einer Polypeptidsequenz aufweist, die aus der Gruppe ausgewählt ist, welche aus SEQ ID NO: 162 bis 181 besteht,

(ii) das mindestens eine Modul in Position 13 einen einzelnen variablen Aminosäurerest umfasst, der die Spezifität jedes Moduls gegenüber einer Nukleotidbase wie folgt bestimmt:

AS          Nukleotidbase
I, S, T     A
G, R        T
D, T, *     C
N, R        G
wobei "*" keine Aminosäure bedeutet, und

(iii) die Abfolge von Modulen mit einer katalytischen Domäne fusioniert ist, welche in monomerer oder dimerer Form eine Endonuklease-Aktivität aufweist.


 
2. Polypeptid nach Anspruch 1, in welchem die Aminosäuresequenz des mindestens einen Moduls aus der Gruppe ausgewählt ist, die aus den SEQ ID Nos: 11 bis 30 besteht.
 
3. Polypeptid nach Anspruch 1 oder 2, wobei die Ziel-Nukleinsäuresequenz kein Thymidin-Nukleotid an Position -1 umfasst.
 
4. Polypeptid nach einem der Ansprüche 1 bis 3, wobei das Polypeptid zudem ein Modul umfasst, das mindestens 80% Aminosäureidentität mit einer AvrBs3-Wiederholungseinheit der SEQ ID NO. 10 aufweist.
 
5. Polypeptid nach einem der Ansprüche 1 bis 4, wobei die katalytische Domäne aus der Endonuklease FokI oder einer Homing-Endonuklease kommt.
 
6. Polypeptid nach einem der Ansprüche 1 bis 4, wobei die katalytische Domäne aus I-TevI stammt.
 
7. Verwendung eines Polypeptids nach einem der Ansprüche 1 bis 6 zur in-vitro-Bearbeitung des genetischen Materials einer Zelle innerhalb oder benachbart zu der Nukleinsäure-Zielsequenz.
 
8. Polypeptid nach einem der Ansprüche 1 bis 6 zur Verwendung als Medikament.
 
9. In-vitro-Verfahren zum Targeting des genetischen Materials einer Zelle, welches umfasst:

(a) Bereitstellen einer Zelle, bevorzugt einer Säugetierzelle oder einer Pflanzenzelle, welche die Nukleinsäure-Zielsequenz umfasst; und

(b) in die Zelle Einführen eines Polynukleotids nach einem der Ansprüche 1 bis 8 und optional einer exogenen Nukleinsäure, die eine zu mindestens einem Teil der Ziel-Nukleinsäuresequenz homologe Sequenz umfasst, so dass homologe Rekombination zwischen der exogenen Nukleinsäure und der Ziel-Nukleinsäuresequenz stattfindet;

wobei das Polypeptid, für welches das Polynukleotid kodiert, das genetische Material der Zelle innerhalb oder benachbart zu der Ziel-Nukleinsäuresequenz bearbeitet.
 


Revendications

1. Polypeptide qui comprend une succession de modules de 30 à 35 acides aminés, ladite succession de modules présentant une spécificité base par base envers une séquence cible d'acide nucléique, dans lequel

(i) au moins un desdits modules présente au moins 70 % d'identité de séquence avec une des séquences des modules polypeptidiques de la protéine E5AV36 de SEQ ID NO: 2 de Burkholderia rhizoxinica et présente au moins 90 % d'identité de séquence avec une séquence polypeptidique sélectionnée dans le groupe consistant en : SEQ ID NO: 162 à 181,

(ii) ledit au moins un des modules comprend, à la position 13, un résidu d'acide aminé variable unique qui détermine la spécificité de chaque module envers une base nucléotidique de la manière suivante :

AA          Base nucléotidique
I, S, T     A
G, R        T
D, T, *     C
N, R        G
où « * » signifie une absence d'acide aminé, et

(iii) ladite succession de modules est fusionnée à un domaine catalytique possédant une activité endonucléase sous une forme monomérique ou dimérique.


 
2. Polypeptide selon la revendication 1, dans lequel la séquence d'acides aminés dudit au moins un module est sélectionnée dans le groupe consistant en SEQ ID NO: 11 à 30.
 
3. Polypeptide selon la revendication 1 ou la revendication 2, dans lequel ladite séquence d'acide nucléique cible ne comprend pas de nucléotide thymidine à la position -1.
 
4. Polypeptide selon l'une quelconque des revendications 1 à 3, où ledit polypeptide comprend en outre un module qui présente au moins 80 % d'identité d'acides aminés avec une répétition AvrBs3 de SEQ ID NO: 10.
 
5. Polypeptide selon l'une quelconque des revendications 1 à 4, dans lequel ledit domaine catalytique provient de l'endonucléase FokI ou d'une endonucléase de ciblage (« homing »).
 
6. Polypeptide selon l'une quelconque des revendications 1 à 4, dans lequel ledit domaine catalytique provient de I-TevI.
 
7. Utilisation d'un polypeptide selon l'une quelconque des revendications 1 à 6, pour la transformation in vitro du matériel génétique d'une cellule au sein de ou en position adjacente à ladite séquence cible d'acide nucléique.
 
8. Polypeptide selon l'une quelconque des revendications 1 à 6, destiné à être utilisé en tant que médicament.
 
9. Procédé in vitro pour cibler le matériel génétique d'une cellule, comprenant :

(a) la mise à disposition d'une cellule, de préférence une cellule de mammifère ou une cellule végétale comprenant une séquence cible d'acide nucléique ; et

(b) l'introduction dans la cellule d'un polynucléotide selon l'une quelconque des revendications 1 à 8, et facultativement d'un acide nucléique exogène comprenant une séquence homologue à au moins une partie de la séquence cible d'acide nucléique, de sorte qu'une recombinaison homologue se produise entre ledit acide nucléique exogène et la séquence cible d'acide nucléique ;

dans lequel le polypeptide codé par ledit polynucléotide transforme le matériel génétique de la cellule au sein de ou en position adjacente à ladite séquence cible d'acide nucléique.
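
For illustration only, the position-13 table in claim 1 can be read as a lookup from residue to candidate nucleotide bases. The sketch below is not part of the claims: the names `CLAIM_TABLE`, `RESIDUE_TO_BASES` and `candidate_targets` are hypothetical, and the only data used are the four table rows as printed. Because T and R each appear in two rows, the sketch returns every target sequence consistent with the table rather than a single one.

```python
from itertools import product

# Rows of the claim 1 table: nucleotide base -> position-13 residues.
# "*" denotes the absence of an amino acid at that position.
CLAIM_TABLE = {
    "A": ["I", "S", "T"],
    "T": ["G", "R"],
    "C": ["D", "T", "*"],
    "G": ["N", "R"],
}

# Invert the table: residue -> set of bases it is listed against.
RESIDUE_TO_BASES = {}
for base, residues in CLAIM_TABLE.items():
    for residue in residues:
        RESIDUE_TO_BASES.setdefault(residue, set()).add(base)


def candidate_targets(position13_residues):
    """Return all target sequences consistent with the table, one base per module."""
    options = []
    for residue in position13_residues:
        if residue not in RESIDUE_TO_BASES:
            raise ValueError(f"residue {residue!r} is not listed in the table")
        options.append(sorted(RESIDUE_TO_BASES[residue]))
    return ["".join(combo) for combo in product(*options)]
```

Calling `candidate_targets("IGDN")` yields one sequence, whereas an input containing T or R yields several, reflecting the two rows in which each of those residues is listed.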
 




Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description




Non-patent literature cited in the description