X-ray crystallography is a viable technique for studying the structures of biomolecules such as nucleic acids. The accuracy of these structures depends on a complex set of factors. The first is the purity of the sample used in the process: except for tRNA and ribosomal RNA, obtaining a purified nucleic acid sample in sufficient quantity is one of the contributing factors. The second is the preparation of diffracting crystals of the highest possible resolution. The third is the collection, analysis, and post-processing of the data. Both the quality of the measured data and the choice of mathematical treatment are essential for deriving accurate, convincing crystallographic structures that are biologically relevant. The final factor is the biological relevance of the derived structural models. The Protein Data Bank also offers various validation tools with compelling metrics.
RNA–protein interactions are fundamental to core cellular processes. Recent technological advances have allowed genome-wide sequencing of multiple mRNA targets for RNA-binding proteins. Together with mRNA–protein binding data and variations in the specificity of “canonical” RNA-binding domains, these results suggest that many mRNA-binding proteins recognize protein-specific sets of mRNA sites. These sets could be considered an “mRNA recognition code” that defines new interactions and functions of mRNA-binding proteins. Many RNA-binding proteins specifically or semi-specifically recognize short RNA sequences. Therefore, biochemical and crystallographic studies of RNA-binding proteins would greatly benefit from the technological capability to synthesize short RNAs. For a number of important RNA-binding proteins, interactions with RNA depend on the specific readout of the phosphorylation state of the RNA termini. Certain proteins utilize triphosphorylated RNAs as reaction substrates.
The proposed approach for the synthesis of short oligoribonucleotides is based on the in vitro transcription with bacteriophage T7 DNA-dependent RNA polymerase. In vitro transcription with phage polymerases can produce both short and long RNAs and is currently the method of choice for making long (>100 nucleotides, nts) RNAs. Chemical synthesis is often used for preparing non-phosphorylated short and medium size (2–30 nts) RNAs. The chemical synthesis is relatively inexpensive for short nonphosphorylated RNAs.
The method takes into account two characteristics of T7 RNA polymerase. First, the polymerase is able to transcribe in vitro from both supercoiled and linear DNA templates and is highly effective with double- and single-stranded synthetic templates containing a double-stranded T7 promoter. Therefore, a DNA template can be easily prepared by annealing two chemically synthesized DNA oligonucleotides, which reconstitutes the T7 promoter followed by the sequence coding for the target RNA. Second, T7 RNA polymerase can prematurely terminate transcription in the presence of an incomplete set of NTPs.
The method was developed for biochemical and crystallographic characterization of Escherichia coli RppH, the enzyme responsible for initiating 5′-end-dependent mRNA decay by cleaving off a pyrophosphate moiety from the 5′ end of natural triphosphorylated mRNAs. The technique was optimized for the synthesis of 2–5 nucleotide long RNAs with natural or modified triphosphorylated, as well as diphosphorylated, 5′ ends. The transcription reactions described here are typically carried out at the semi-preparative scale, which can yield sufficient amounts of RNA for successful applications in both biochemical and crystallographic studies of proteins and other macromolecules.
Crystallographic Data and Model Quality
In the last decades, crystallography has been highly successful in delivering structural information about proteins, DNA, and RNA, the substrates of life on earth. The resolution of the method is good enough to discern the three-dimensional structure of these macromolecules at the atomic level, which is essential to understand their diverse properties, functions, and interactions. However, although it is easy to calculate the diffraction pattern for a given structure, the reverse task of deriving a molecular structure from just a single set of unique diffracted intensities is difficult, as the mathematical operation governing the former direction cannot be inverted in a unique way. To be solved experimentally, this “inverse problem,” or more specifically “phase problem,” requires more than just a single set of unique diffraction data. High quality of the data is a requirement for the experimental solution of the problem, but also for the refinement of the macromolecular structure, as discussed in Subheading.
The correct biological interpretation requires the best possible model of the macromolecule. To obtain the best model, every step of the structure determination procedure has to be performed in a close-to-optimal way. This means that the purification of the macromolecule, its crystallization, crystal handling, measurement of diffraction data, processing of the resulting datasets, and downstream steps such as structure solution, refinement, and validation each constitute scientific tasks that deserve specific attention, and have been undergoing continuous enhancements throughout the history of macromolecular crystallography. Two kinds of numerical data are the result of a crystallographic experiment and are usually deposited as such in the Protein Data Bank: the diffraction intensities as a reduced representation of the diffraction experiment, and the atomic coordinates resulting from the visual inspection and interpretation of electron density maps, and subsequent refinement. The third kind of numerical data, the raw data (frames) obtained in the diffraction experiment, have so far not usually been deposited in long-term archives, mainly due to (disk) space concerns. This is unfortunate, since archiving raw data would enable reprocessing of incorrectly processed data as well as taking advantage of future improvements in methodology, such as extracting the diffuse scattering information. The discussion focuses on data that correspond to a single atomic model. This rules out all the complications that arise from merging non-isomorphous datasets, where each individual dataset corresponds to a different model; in this situation, a merged dataset would represent something like an average model, which violates the physicochemical requirements and may not be biologically meaningful.
This chapter first presents the principles and concepts that need to be understood in the context of the rather broadly used term “data quality”; similar presentations may be found for example in refs. [1–3]. Second, the application of these principles to data processing with the XDS program package [4, 5], which the author is most familiar with, is explained. Third, data and atomic model are related in a graphical way, which allows some important and non-trivial conclusions to be drawn about how the former influence the latter.
Errors and Crystallographic Indicators
The goal of a crystallographic experiment is to obtain accurate intensities (I) for as many Bragg reflections as possible. Two kinds of error, random and systematic, exist in any experiment. A major difference between them is that the relative error arising from the random component decreases with increasing intensity, whereas the relative error in intensity from the systematic component is (at least on average) constant, often in the range between 1 and 10 %. A more specific description of systematic error would thus be “fractional error,” but this name is not in common use. Nonlinear errors also exist, but play a minor role. A well-designed crystallographic experiment has to strike an appropriate compromise between the two kinds of error. For example, a reduction of random error (see below) can be obtained by longer or stronger exposure of the crystal, but this will inevitably increase the systematic error from radiation damage to the crystal. Ideally, the sum of both errors should be minimal, and programs such as “BEST” exist that suggest a compromise, in the form of a proposed “strategy” for the experiment. Fortunately, the gradient of the sum of both errors is close to zero at and near the optimal strategy, which means that small deviations from the optimal strategy do not substantially decrease data quality.
The discussion of errors has to take the distinction between precision and accuracy into account. The term “precision” refers to the reproducibility of an experiment, and to the internal consistency or relative deviation of the values obtained. For example, if the number e = 2.718… should be determined in an experiment, and two measurements would yield the values 3.217 and 3.219, then these measurements are considered precise because they agree well with each other: their relative deviation is small. However, they are not close to the true value: the error (or inaccuracy) in their measurement amounts to about 0.5.
The term “accuracy,” on the other hand, refers to the deviation of measured values from the true values. In this example, if two measurements would yield the values 2.6 and 2.8, then the results from this experiment are more accurate than that from the previously mentioned experiment, although they are not as precise. Optimizing an experiment for precision alone, therefore, does not ensure accuracy; rather, equating accuracy with precision also requires the absence of any kind of error that has not been taken into account in the precision estimate. To estimate accuracy, we thus need to quantify both the precision of the data, and the undetected error (which usually requires some knowledge about the true value obtained by other means). If both can be quantified, we can estimate the accuracy as the absolute or relative error of a measurement.
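The distinction can be made concrete with the two pairs of measurements from the example. The following is a minimal sketch, assuming that the sample standard deviation is used as the measure of precision and the deviation of the mean from the true value as the measure of accuracy; the function name is hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev

TRUE_VALUE = math.e  # 2.718..., the quantity "measured" in the example


def precision_and_accuracy(measurements, true_value):
    """Return (spread, error) for a set of repeated measurements.

    spread: sample standard deviation (small spread = high precision)
    error:  absolute deviation of the mean from the true value
            (small error = high accuracy)
    """
    spread = stdev(measurements)
    error = abs(mean(measurements) - true_value)
    return spread, error


# First experiment: values agree well with each other but are far from e.
precise_spread, precise_error = precision_and_accuracy([3.217, 3.219], TRUE_VALUE)

# Second experiment: values scatter more but lie close to e.
accurate_spread, accurate_error = precision_and_accuracy([2.6, 2.8], TRUE_VALUE)

print(f"experiment 1: spread {precise_spread:.4f}, error {precise_error:.4f}")
print(f"experiment 2: spread {accurate_spread:.4f}, error {accurate_error:.4f}")
```

The first experiment has the smaller spread and the larger error, the second the opposite, which is the situation described in the text. In a real experiment the true value is not available, so the error term has to be estimated by other means.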
The crystallographic experiment measures the number of photons contributing to each detector pixel. These photons arise from Bragg reflections, but also from background scatter. The number of photons in each pixel is subject to random fluctuations. These are due to the quantum nature of photons; there exists a certain probability of emission of a given photon by the crystal into a given pixel in a unit of time, and each photon’s emission into that pixel is independent of that of other photons. As a result, photon counts are governed by Poisson (counting) statistics, which mathematically means that the variance of the photon number is equal to the photon number itself. Furthermore, a CCD detector may contribute a random component (“read-out noise”) to the total photon count (pixel detectors are almost noise-free), which is also due to quantum fluctuations in the detector hardware and may be considered as additional background.
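The consequence of Poisson statistics for the relative error can be illustrated numerically. This is a minimal sketch, not part of any data processing program; it assumes that the background level under a reflection is known exactly, so that the variance of the net signal equals the total count (signal plus background). The function names are hypothetical.

```python
import math


def poisson_sigma(counts):
    """Standard deviation of a Poisson-distributed count (variance = count)."""
    return math.sqrt(counts)


def relative_error(counts):
    """Relative counting error sigma/N = 1/sqrt(N); it shrinks as N grows."""
    return poisson_sigma(counts) / counts


def net_signal_to_noise(signal, background):
    """Signal-to-noise ratio of a reflection sitting on a background.

    Assumption: the mean background is known exactly, so the only noise is
    the Poisson fluctuation of the total count (signal + background).
    """
    return signal / math.sqrt(signal + background)


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} photons: relative error {relative_error(n):.4f}")

print("S/N, 100 signal photons on 300 background photons:",
      net_signal_to_noise(100, 300))
```

A hundredfold increase in the photon count reduces the relative counting error only tenfold, and background photons lower the signal-to-noise ratio even though they carry no signal.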
Systematic errors can arise from absorption differences due to crystal shape and mounting, shutter synchronization problems, imperfect detector calibration and inhomogeneity of detector sensitivity, shadowed parts of the detector, a nonlinear or overloaded detector, vibrations (for example due to the cryo stream), fluctuations of the primary X-ray beam, imperfect or inaccurate assumptions about geometric parameters and computational models applied in the data processing step, and from other problems that may be significant for a given experiment.
Systematic error may appear to be random if its cause is unknown or cannot be fully described or modeled, but contrary to a random counting error, the change of a reflection’s intensity is usually proportional to the intensity itself, hence the term “fractional error.” However, many kinds of systematic errors in a crystallographic experiment at least partially cancel out if multiple measurements are averaged. Examples are beam instability, shutter problems, and most aspects of detector non-ideality, except those that result in a nonlinear response (e.g., overload). If all or most observations of a unique reflection are systematically affected in the same or a similar way, their systematic errors are not independent, and averaging may not necessarily decrease the systematic difference between true and estimated intensity (the accuracy). Known or well-understood effects may often be modeled by analytical or empirical formulas. If a model for the specific error type is available and appropriate, the systematic difference is accounted for, and any remaining difference between intensities may become a useful signal. In this way, a systematic effect may become a part of an extended description of the experiment, and no longer contributes to the experimental error.
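The effect of averaging on the different error components can be sketched with a simple error model. The model below is an assumption made for illustration only: counting error and independent fractional error are treated as uncorrelated between observations and added in quadrature, whereas a fractional error shared by all observations is not reduced by averaging. The function and parameter names are hypothetical.

```python
import math


def error_of_mean(intensity, n_obs, frac_independent, frac_shared=0.0):
    """Estimated error of the mean of n_obs observations of one reflection.

    intensity:        mean photon count per observation
    frac_independent: fractional error that differs between observations
    frac_shared:      fractional error common to all observations
    """
    counting_var = intensity                               # Poisson: variance = count
    independent_var = (frac_independent * intensity) ** 2  # averages down with n_obs
    shared_var = (frac_shared * intensity) ** 2            # unaffected by averaging
    return math.sqrt((counting_var + independent_var) / n_obs + shared_var)


def crossover_intensity(frac):
    """Photon count at which the fractional error equals the counting error.

    sqrt(I) = frac * I  gives  I = 1 / frac**2.
    """
    return 1.0 / frac ** 2


single = error_of_mean(10_000, 1, 0.03)
averaged = error_of_mean(10_000, 4, 0.03)
with_shared = error_of_mean(10_000, 1000, 0.03, frac_shared=0.02)

print(f"1 observation:  {single:.1f}")
print(f"4 observations: {averaged:.1f}")
print(f"1000 observations with 2 % shared error: {with_shared:.1f}")
print(f"crossover for 3 % fractional error: {crossover_intensity(0.03):.0f} photons")
```

Under these assumptions, four independent observations halve the error, but a shared 2 % error sets a floor of 200 counts for an intensity of 10,000 no matter how many observations are averaged.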
An example of this is absorption by the crystal and its environment (loop, mother liquor); if it can be properly modeled, its influence is compensated. However, in low-symmetry space groups, all symmetry-related reflections may systematically be weakened or strengthened in the same way. Since only those systematic errors that lead to systematic differences can be corrected, no information about the proper absorption correction is available in this case. Therefore, at least one additional dataset should be measured in a different orientation of the crystal. The systematic absorption difference between the two resulting datasets may then be detected and corrected in software. It should be noted that even if absorption is not corrected in the data processing stage, it can be approximately compensated by an overall anisotropic displacement parameter in the refinement stage. This parameter then should not be interpreted as its name suggests, but rather as a compensation factor for an experimental property.
Importantly, for strong reflections (low resolution), systematic error is usually higher than random error; the converse is true for weak reflections (high resolution), where the signal-to-noise ratio is usually dominated by the random error term. However, radiation damage, the most devastating kind of systematic error, is an exception to this rule. Radiation damage, which changes (and ultimately destroys) the structure of the macromolecule during the measurement, induces a systematic error that is not mitigated by averaging of multiple observations, because it results in intensity measurements that do not scatter around a true value, but rather, with increasing dose, deviate further and further from the true value—the intensity at the beginning of the experiment. The detrimental influence of radiation damage has to be avoided to a degree that depends on the kind of experiment, and its desired goal. In recent years, there has been some progress in describing the relation between dose and its footprint on the macromolecule. Furthermore, the influence of radiation damage may be partially compensated by zero-dose extrapolation, a computational technique. However, it should be noted that the relative change of intensities by radiation damage is biggest at high resolution, where the signal may be so weak (i.e., the individual measurements so imprecise) that zero-dose extrapolation becomes inaccurate.
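The idea behind zero-dose extrapolation can be sketched as a fit of the observed intensity against accumulated dose, evaluated at dose zero. This is a minimal sketch, assuming a linear change of intensity with dose and an ordinary least-squares fit; it is not the implementation of any particular program, and the dose values and intensities below are invented purely for illustration.

```python
def zero_dose_extrapolate(doses, intensities):
    """Fit I(dose) = I0 + slope * dose by least squares; return (I0, slope).

    Assumption: intensity changes linearly with dose over the measured range.
    """
    n = len(doses)
    mean_dose = sum(doses) / n
    mean_int = sum(intensities) / n
    sxx = sum((d - mean_dose) ** 2 for d in doses)
    sxy = sum((d - mean_dose) * (i - mean_int) for d, i in zip(doses, intensities))
    slope = sxy / sxx
    intercept = mean_int - slope * mean_dose
    return intercept, slope


# Hypothetical observations of one unique reflection at increasing dose
# (arbitrary dose units); the intensity decays as damage accumulates.
doses = [1.0, 2.0, 3.0, 4.0]
intensities = [980.0, 960.0, 940.0, 920.0]

i0, slope = zero_dose_extrapolate(doses, intensities)
plain_average = sum(intensities) / len(intensities)

print(f"extrapolated zero-dose intensity: {i0:.1f}")
print(f"plain average of observations:   {plain_average:.1f}")
```

In this noise-free example the plain average underestimates the intensity at the start of the experiment, whereas the extrapolation recovers it. With weak, noisy high-resolution data the fitted slope becomes poorly determined, which is the limitation noted above.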
- Chemically synthesized, deprotected, and desalted DNA oligonucleotides at 100 pmol/μL concentration: “TopA,” a top strand of the T7 Class II ϕ2.5 promoter, 5′-TAATACGACTCACTATT; “TopB,” a top strand of the Class III promoter, 5′-TAATACGACTCACTATA; “Bottom” (BotA and BotB types) oligonucleotides containing a complementary or bottom strand of the T7 promoter with an RNA coding sequence depicted by a stretch of Ns, 3′-ATTATGCTGAGTGATAANNNNN for the Class II ϕ2.5 promoter (BotA oligonucleotides), or 3′-ATTATGCTGAGTGATATNNNNN for the Class III promoter (BotB oligonucleotides).
- Diethyl pyrocarbonate (DEPC)-treated water.
- TE buffer: 10 mM Tris–HCl, pH 8.0, 1 mM ethylenediaminetetraacetic acid (EDTA).
- 0.2 mL PCR tubes.
- Thermal cycler.
In Vitro Transcription
- Transcription mixture components: 1 M Tris–HCl, pH 8.0, 1 M dithiothreitol (DTT), 250 mM spermidine–HCl, 100 mM solutions of each ribonucleotide triphosphate (ATP, GTP, CTP, and UTP), 1 M MgCl2, and T7 RNA polymerase (6 mg/mL).
- 0.5 M EDTA.
- Water bath.
- Bench top centrifuge for 1.5 mL Eppendorf and 50 mL tubes.
- 5 mL HiTrap Q HP column.
- Mono Q 5/50 (1 mL) column.
- Purifier chromatography system.
- Column cleaning solutions: 0.5 M NaOH and 2 M NaCl.
- Buffer A: 20 mM Tris–HCl, pH 8.0.
- Buffer B: 20 mM Tris–HCl, pH 8.0, 1 M NaCl.
- 100 % ethanol.
- 80 % (v/v) ethanol.
- 5 M NaCl.
- 0.22 μm syringe filters.
- 10 mL syringes.
DNA templates for T7 RNA polymerase are designed to contain a double-stranded 17 base pair T7 promoter with a downstream single-stranded extension encoding the target oligoribonucleotide. The coding region typically contains a 2–5 nt sequence complementary to the target RNA, followed by a “stalling” nucleotide that is absent from the coding region. Omitting from the transcription mixture the NTP complementary to the stalling nucleotide causes T7 RNA polymerase to stall and abort transcription at this site.
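The design rule can be sketched in a few lines of code. This is an illustrative sketch only: the function name is hypothetical, the choice of which unused NTP to withhold (the first one in the order A, C, G, U) is an arbitrary assumption, and the resulting sequence is not claimed to be identical to any oligonucleotide used in the protocol. The bottom strand is written 3′→5′, as in the materials list.

```python
PROMOTER_TOP = {
    "classII_phi2.5": "TAATACGACTCACTATT",  # TopA
    "classIII": "TAATACGACTCACTATA",        # TopB
}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_TEMPLATE = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_bottom_oligo(promoter_top, target_rna):
    """Return (bottom strand written 3'->5', NTP to omit, NTPs to add).

    The bottom strand is the promoter complement, followed by the template
    for the target RNA, followed by one stalling nucleotide whose
    complementary NTP is left out of the transcription mixture.
    """
    ntps_needed = set(target_rna)
    unused = [n for n in "ACGU" if n not in ntps_needed]
    if not unused:
        raise ValueError("target uses all four NTPs; no stalling nucleotide possible")
    omitted_ntp = unused[0]  # assumption: any unused NTP would do
    stalling_base = RNA_TO_TEMPLATE[omitted_ntp]
    promoter_bottom = "".join(DNA_COMPLEMENT[b] for b in promoter_top)
    coding = "".join(RNA_TO_TEMPLATE[b] for b in target_rna)
    return promoter_bottom + coding + stalling_base, omitted_ntp, sorted(ntps_needed)


bottom, omit, add = design_bottom_oligo(PROMOTER_TOP["classII_phi2.5"], "AG")
print("bottom strand 3'->5':", bottom)
print("omit from mixture:   ", omit + "TP")
print("add to mixture:      ", [n + "TP" for n in add])
```

For the dinucleotide pppApG, the sketch appends the template bases T and C to the promoter complement, followed by a stalling G, and indicates that only ATP and GTP are added to the reaction.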
RNA Nucleotide Preparation
DNA Template Preparation
- Mix TopA and one of the BotA oligonucleotides, diluted in water to a final concentration of 10 μM, in a 100 μL volume. For example, to synthesize the dinucleotide pppApG, mix the TopA and BotA1 DNA oligonucleotides.
- Anneal oligonucleotides to form a DNA duplex in 0.2 mL PCR tube in a thermal cycler by heating at 98 °C for 2 min, cooling to 37 °C at 1 °C/s, and incubating samples at 37 °C for 10 min.
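The volumes for the annealing mixture follow from the stock concentration given in the materials list (100 pmol/μL, i.e., 100 μM). This is a minimal sketch, assuming that the 10 μM final concentration applies to each of the two oligonucleotides; the function name is hypothetical.

```python
def annealing_mix(stock_uM=100.0, final_uM=10.0, total_uL=100.0, n_oligos=2):
    """Return (volume of each oligonucleotide stock, volume of water) in uL.

    Uses C1 * V1 = C2 * V2 for each oligonucleotide.
    Assumption: every oligonucleotide is brought to the same final concentration.
    """
    oligo_uL = final_uM * total_uL / stock_uM
    water_uL = total_uL - n_oligos * oligo_uL
    return oligo_uL, water_uL


oligo_uL, water_uL = annealing_mix()
print(f"each oligonucleotide: {oligo_uL:.1f} uL, water: {water_uL:.1f} uL")
```

Under this assumption, 10 μL of each stock and 80 μL of water give 100 μL of 10 μM duplex, which is enough template for a 1 mL transcription reaction at 1 μM.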
In Vitro Transcription
- A typical transcription reaction is carried out in a 1 mL volume of a mixture that contains 100 mM Tris–HCl, pH 8.0, 40 mM DTT, 2 mM spermidine, 15–16 mM total NTPs, 1 μM DNA template, 20 mM MgCl2, and 50 μg/mL of T7 RNA polymerase. The mixture should be preheated at 37 °C prior to the addition of the DNA template and polymerase.
- Transcription of each DNA template should be routinely optimized for template and polymerase.
- To obtain RNA with a modification in the 5′-triphosphate moiety, transcription should be carried out with the corresponding modified precursor (NMPcPP, NMPPcP, etc). Make sure that T7 RNA polymerase can initiate transcription from a nucleotide analog.
- Incubate reactions at 37 °C for 2–4 h.
- Reactions should get cloudy from the precipitation of magnesium pyrophosphate.
- After incubation, add 100 μL of 0.5 M EDTA and dissolve the precipitate by vigorous vortexing.
- At this point, transcription mixtures can be frozen and stored at −20 °C prior to purification.
- Thaw reactions, dilute them with 5 mL Buffer A, and filter through 0.22 μm syringe filter.
- Separate the desired oligoribonucleotide from unincorporated NTPs and other reaction products using anion exchange chromatography on 5 mL HiTrap Q HP column. Load an RNA sample on the column pre-equilibrated with 25 mL Buffer A. Wash away unbound material with 20 mL Buffer A. Increase concentration of Buffer B to 10 % and continue washing the column with another 20 mL of buffer to remove weakly bound nucleotides and reaction products. Elute an RNA oligonucleotide by a 100-mL linear 10–30 % gradient of Buffer B. Collect 2 mL fractions. Remove DNA template from the column with a step elution using 15 mL of 100 % Buffer B. Keep flow rate at 1 mL/min during all chromatographic steps. Monitor elution at 254 nm. Before purification of another RNA oligonucleotide, clean the column with 0.5 M NaOH and 2 M NaCl.
- The target RNA oligonucleotide is usually eluted as the largest peak during the gradient step. Combine fractions containing the RNA in 50 mL tube. Add 5 M NaCl to the final concentration of 0.3 M. Add 3 volumes of ethanol and mix the solution well. Incubate at −20 °C overnight.
- Collect precipitated RNA by centrifugation at ≥12,000 × g for 30 min at +4 °C. Discard the supernatant. Wash pellet with 80 % ethanol, dry under vacuum, and dissolve in 100 μL of DEPC-treated water. Estimate concentration of the oligonucleotide spectrophotometrically by measuring absorbance at 260 nm. Typical yields for dinucleotides and trinucleotides are 1–5 μmol per 1 mL of transcription mixture, accounting for incorporation of up to 60 % of NTPs added to the reaction.
- To check the purity of the oligoribonucleotide, perform analytical anion-exchange chromatography on Mono Q column. Dilute an RNA oligonucleotide (5–25 nmol) in 500 μL of Buffer A. Load the sample onto the Mono Q column at 1 mL/min. Elute RNA with 20 mL linear 0–50 % gradient of Buffer B. Typically, purified RNA yields a single peak.
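The pipetting volumes for the transcription reaction described above, and the fraction of NTPs incorporated into product, can be worked out from the stock and final concentrations given in the materials and methods. This is a minimal sketch under stated assumptions: the DNA template stock is the 10 μM annealed duplex, the total NTP concentration is 16 mM, and it is split equally between the NTPs used. The names in the code are hypothetical.

```python
# (stock, final) concentrations in matching units, taken from the protocol.
COMPONENTS = {
    "Tris-HCl pH 8.0 (mM)": (1000.0, 100.0),
    "DTT (mM)": (1000.0, 40.0),
    "spermidine (mM)": (250.0, 2.0),
    "MgCl2 (mM)": (1000.0, 20.0),
    "DNA template (uM)": (10.0, 1.0),             # assumption: 10 uM annealed duplex
    "T7 RNA polymerase (ug/mL)": (6000.0, 50.0),  # 6 mg/mL stock
}
NTP_STOCK_MM = 100.0


def reaction_volumes(total_uL=1000.0, ntps=("ATP", "GTP"), total_ntp_mM=16.0):
    """Return {component: volume in uL}, with water making up the total volume."""
    volumes = {name: final * total_uL / stock
               for name, (stock, final) in COMPONENTS.items()}
    per_ntp_mM = total_ntp_mM / len(ntps)  # assumption: equal amounts of each NTP
    for ntp in ntps:
        volumes[ntp] = per_ntp_mM * total_uL / NTP_STOCK_MM
    volumes["water"] = total_uL - sum(volumes.values())
    return volumes


def ntp_incorporation(yield_umol, rna_length, total_ntp_mM=16.0, total_uL=1000.0):
    """Fraction of the added NTPs that ends up in the purified RNA."""
    total_ntp_umol = total_ntp_mM * total_uL / 1000.0  # mM * mL = umol
    return yield_umol * rna_length / total_ntp_umol


volumes = reaction_volumes()
for name, vol in volumes.items():
    print(f"{name:<28}{vol:8.1f} uL")

print(f"incorporation for 5 umol of dinucleotide: {ntp_incorporation(5, 2):.1%}")
```

With these assumptions, a yield of 5 μmol of dinucleotide from a 1 mL reaction corresponds to incorporation of roughly 60 % of the added NTPs, consistent with the yields stated in the protocol.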