Chapter 3: Indexing, Integration and Scaling the Data


Step 15: Output Plots and Error Scale Factor



When the initial round of scaling is finished, examine the output plots and adjust the error scale factor to bring the overall χ² down to approximately 1. Delete the reject file by clicking the button in the lower Controls panel, and then click scale sets again (Figure 48). Repeat these steps until the overall χ² is acceptable, then proceed to subsequent rounds of scaling by activating Use Rejections on Next Run and scaling again. Repeat the cycle of scaling and rejecting until the number of new rejections has declined to just a few. At this point the scaling is complete.


Figure 48. The scaling Controls panel

After integration, some frames may appear as strong outliers in the integration information plots, e.g. with an unacceptably high χ² well above the average value. You can exclude those frames from scaling by clicking the exclude frames button. A table with the list of frames will appear (Figure 49). Mark the frames that should be excluded and close the table.


Figure 49. Frames selected to be excluded from scaling are marked red 

What is the Error Scale Factor and which way should I adjust it?

The error scale factor is a single multiplicative factor applied to the input σI. It should be adjusted so that the normal χ² (goodness of fit) value printed in the final table of the output comes close to 1. The default error scale factor is 1.3. The factor applies to the data read after the keyword, so you can apply a different error scale factor to subsequent batches by repeating this input with different values. The default value should be adjusted upwards if the overall χ² is greater than 1.

What are reasonable values for the error scale factor and what if I need really high ones?

Reasonable values are between 1 and 2; the default is 1.3. If you need to use values much greater than this, say 5, there is likely a problem.
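Since χ² scales roughly as the inverse square of the applied error scale factor, a first guess at the corrected factor can be computed from the current overall χ². A rule-of-thumb sketch (not part of Scalepack itself; the function name is invented for illustration):

```python
import math

def suggest_error_scale(current_esf: float, overall_chi2: float) -> float:
    """Rule of thumb: chi^2 varies roughly as 1/esf^2, so multiplying the
    current error scale factor by sqrt(chi^2) should bring chi^2 near 1."""
    return current_esf * math.sqrt(overall_chi2)

# e.g. starting from the default of 1.3 with an overall chi^2 of 1.7:
new_esf = suggest_error_scale(1.3, 1.7)   # about 1.7
```

This is only a starting estimate; the χ² should always be re-examined after another round of scaling.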

What should I be looking for in the charts that show up after scaling?

The first chart to examine is the one that shows the ratio of the intensity to the error of the intensity, i.e. I/σ, as a function of resolution (Figure 50). From this you can find the 2σ cutoff and adjust your resolution limits accordingly.

The second chart describes the completeness of the data at different resolution limits. The blue line represents all reflections; the purple, reflections with I/σ greater than 3; the red, reflections with I/σ greater than 5; the orange, reflections with I/σ greater than 10; and the yellow, reflections with I/σ greater than 20 (Figure 51).

The next chart is scale and B versus frame (Figure 52). 

Figure 50. The output plot: I/Sigma vs. Resolution

Figure 51. The output plot: Completeness vs. Resolution

Figure 52. The output plot: Scale and B vs. Frame 

How many reflections should I be rejecting?

Ideally you would not reject any reflections. However, in practice even excellent data sets have a few outliers that need to be rejected. Generally if fewer than 0.5% of the reflections are rejected this is considered normal. Rejections significantly greater than 1% usually indicate some problem with the indexing, integration, space group assignment, or the data itself (e.g. spindle or shutter problems).
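The thresholds above can be captured in a small helper; a sketch (the function and its labels are invented for illustration, not part of HKL-2000):

```python
def rejection_health(n_rejected: int, n_total: int) -> str:
    """Classify a rejection rate using the rule-of-thumb thresholds above:
    below 0.5% is normal; significantly more than 1% suggests a problem with
    indexing, integration, space group assignment, or the data themselves."""
    frac = n_rejected / n_total
    if frac < 0.005:
        return "normal"
    if frac <= 0.01:
        return "borderline"
    return "investigate"
```

For example, 400 rejections out of 100,000 reflections (0.4%) would be classified as normal.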

What’s the difference between a reflection being marked for rejection and actually rejected?

The table that appears at the top of the page after each round of scaling has two categories: Reflections Marked for Rejection and Reflections Rejected (Figure 53). The Reflections Marked for Rejection are those that have been written to the reject file. The Reflections Rejected are those that have been omitted from the data in the latest round of scaling. Rejection of reflections is an iterative process that should converge after one or two rounds of rejection and refinement. For example, on the first round of scaling, 100 reflections may be flagged for rejection and written to the reject file. If Use Rejections on Next Run is checked, the reflections in the reject file will be removed from the set of reflection data on the next round of scaling. Having done this, other reflections may now be flagged as outliers and written to the end of the reject file; perhaps there are 20 more this time. Repeating this procedure should lead to convergence, where no more reflections are being rejected. Then you are done.

Figure 53. Info table after first round of scaling
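The bookkeeping described above can be sketched as follows (illustrative only; in HKL-2000 the reject file is managed by the GUI):

```python
def reject_file_history(marks_per_round):
    """Each round of scaling appends the newly marked reflections to the
    reject file; with Use Rejections on Next Run enabled, every reflection
    already in the file is omitted (rejected) on the following round.
    Returns (reject-file size, newly marked) after each round."""
    total = 0
    history = []
    for newly_marked in marks_per_round:
        total += newly_marked
        history.append((total, newly_marked))
    return history

# the example from the text: 100 marked on round one, 20 more on round two,
# none on round three -- the procedure has converged
history = reject_file_history([100, 20, 0])
```

Convergence corresponds to the last round marking (close to) zero new reflections.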

How do I treat data that contains an anomalous signal?

Detecting an anomalous signal is very easy. In the first step the data are scaled and postrefined normally, with one exception: the anomalous flag is set. This tells the program to output a .sca file in which the I+ and I- reflections are kept separate. In this case Scalepack treats the I+ data and the I- data as two separate measurements within a data set, and the statistics that result from merging the two reflect the differences between the I+ and I- reflections. Notice that in HKL-2000 there is no need to run a series of small utility programs to separate the I+ and I- data, reformat them, and so on. Obviously, for a centric reflection there is no I-, so the merging statistics will reflect only the acentric reflections. You can tell what percentage of your data is being used to calculate the merging statistics by examining the redundancy table near the end of the log file. Under the column for redundancy > 2 you will find what percentage of the data is being compared. Since you only have I+ and I-, you will never have a redundancy of more than 2.

The presence of an anomalous signal is detected by examining the χ² values in Figure 54. Assuming that the error scale factors were reasonable and there is no useful anomalous signal in your data, the curves showing the resolution dependence of χ² should be flat and approximately 1 for scaling with both merged and unmerged Friedel pairs. On the other hand, if χ² > 1 and you see a clear resolution dependence of χ² for scaling with merged Friedel pairs, this is a strong indication of the presence of an anomalous signal. The resolution dependence allows you to choose a resolution cutoff for the purposes of calculating an anomalous difference Patterson map. This whole analysis assumes that the error model is reasonable and gives a χ² close to 1 when no anomalous signal is present.


Figure 54. Anomalous signal detection
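The χ² test described above can be sketched programmatically; a hypothetical example (the shell values and the 1.5 threshold are invented for illustration and are not Scalepack defaults):

```python
def anomalous_shells(d_spacings, chi2_merged, threshold=1.5):
    """Given per-shell chi^2 from scaling with Friedel pairs MERGED (and an
    error model good enough that chi^2 ~ 1 without anomalous signal), return
    the shells whose chi^2 exceeds the threshold.  These are the shells that
    could contribute to an anomalous difference Patterson map."""
    return [d for d, chi2 in zip(d_spacings, chi2_merged) if chi2 > threshold]

# hypothetical shell statistics: the anomalous differences show up at low
# resolution and fade toward the high-resolution limit
signal = anomalous_shells([8.0, 6.0, 4.0, 3.0, 2.5],
                          [2.4, 2.1, 1.8, 1.2, 1.05])
```

Here the signal persists to about 4 Å, which would be a sensible cutoff for the Patterson calculation.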

*   You can only use scale anomalous when you have enough redundancy to treat the F+ and F- completely independently. 

Space Group Identification: How do I do this?

Scalepack can be used to determine the space group of your crystal. What follows is a description of how you would continue from the lattice type given by Denzo to determine your space group. This whole analysis, of course, applies only to enantiomorphic compounds, like proteins. It does not necessarily apply to small molecules, which may crystallize in centrosymmetric space groups. If you expect a centrosymmetric space group, you may use any space group that is a subgroup of the Laue class to which your crystal belongs. You also need enough data for this analysis to work, so that the systematic absences are visible.

To determine your space group, follow these steps:

1.  Determine the lattice type in Denzo.

2.  Scale by the primary space group in Scalepack. The primary space groups are the first space groups in each Bravais lattice type in the table that follows this discussion. In the absence of lattice pseudosymmetries (e.g. monoclinic with β ≈ 90°) the primary space group will not incorrectly relate symmetry-related reflections. Note the χ² statistics. Now try a higher-symmetry space group (the next one down the list) and repeat the scaling, keeping everything else the same. If the χ² is about the same, this choice is acceptable and you can continue down the list. If the χ² is much worse, then you have gone too far, and the previous choice was your space group. The exception is primitive hexagonal, where you should try P61 after P3121 and P3112 have failed.

3.  Examine the bottom of the log file or the simulated reciprocal lattice picture for the systematic absences. If this is the correct space group, all of these reflections should be absent and their intensities very small. Compare this list with the reflection conditions listed for each of the candidate space groups. The set of absences seen in your data that corresponds to the absences characteristic of one of the listed space groups identifies your space group or pair of space groups. Note that you cannot do any better than this (i.e. determine the handedness of screw axes) without phase information.

4.  If it turns out that your space group is orthorhombic and contains one or two screw axes, you may need to reindex to align the screw axes with the standard definitions. If you have one screw axis, your space group is P2221, with the screw axis along c. If you have two screw axes, your space group is P21212, with the screw axes along a and b. If the Denzo indexing does not match these conventions, reindex using the reindex button.

5.  So far, this procedure indexes according to the conventions of the International Tables. If you prefer to use a private convention, you may have to work out your own transformations. One such transformation has been provided for the space groups P2 and P21.
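The absence check in step 3 can be sketched numerically; a minimal, hypothetical example (the function name, intensity threshold, and data are all illustrative, not part of HKL-2000):

```python
def screw_axis_ok(l_indices, intensities, period, strong=50.0):
    """Check axial 00l reflections against the screw-axis condition
    l = period * n (e.g. period 2 for a 2-fold or 4_2 screw, period 4 for
    4_1/4_3).  Returns (consistent?, list of violating l).  'strong' is an
    illustrative intensity threshold for calling a reflection present."""
    violations = [l for l, inten in zip(l_indices, intensities)
                  if l % period != 0 and inten > strong]
    return len(violations) == 0, violations

# hypothetical axial intensities: only l = 4n reflections are strong,
# consistent with a 4_1 (or 4_3) screw axis along c
ok, bad = screw_axis_ok([1, 2, 3, 4, 5, 6, 7, 8],
                        [3, 6, 2, 900, 4, 5, 1, 750], period=4)
```

As the text notes, such a check can narrow the choice to an enantiomorphic pair (e.g. P41 vs. P43) but cannot resolve the handedness of the screw axis.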

Bravais Lattice           Primary assigned Space Groups   Reflection Conditions along screw axes

Primitive Cubic           195  P23
                          198  P213                       h00: h = 2n
                          207  P432
                          208  P4232                      h00: h = 2n
                          212  P4332 *                    h00: h = 4n
                          213  P4132 *                    h00: h = 4n

I Centered Cubic          197  I23 **
                          199  I213 **
                          211  I432
                          214  I4132                      h00: h = 4n

F Centered Cubic          196  F23
                          209  F432
                          210  F4132                      h00: h = 4n

Primitive Rhombohedral    146  R3
                          155  R32

Primitive Hexagonal       143  P3
                          144  P31 *                      00l: l = 3n
                          145  P32 *                      00l: l = 3n
                          149  P312
                          151  P3112 *                    00l: l = 3n
                          153  P3212 *                    00l: l = 3n
                          150  P321
                          152  P3121 *                    00l: l = 3n
                          154  P3221 *                    00l: l = 3n
                          168  P6
                          169  P61 *                      00l: l = 6n
                          170  P65 *                      00l: l = 6n
                          171  P62 *                      00l: l = 3n
                          172  P64 *                      00l: l = 3n
                          173  P63                        00l: l = 2n
                          177  P622
                          178  P6122 *                    00l: l = 6n
                          179  P6522 *                    00l: l = 6n
                          180  P6222 *                    00l: l = 3n
                          181  P6422 *                    00l: l = 3n
                          182  P6322                      00l: l = 2n

Primitive Tetragonal       75  P4
                           76  P41 *                      00l: l = 4n
                           77  P42                        00l: l = 2n
                           78  P43 *                      00l: l = 4n
                           89  P422
                           90  P4212                      h00: h = 2n
                           91  P4122 *                    00l: l = 4n
                           95  P4322 *                    00l: l = 4n
                           93  P4222                      00l: l = 2n
                           94  P42212                     00l: l = 2n; h00: h = 2n
                           92  P41212 *                   00l: l = 4n; h00: h = 2n
                           96  P43212 *                   00l: l = 4n; h00: h = 2n

I Centered Tetragonal      79  I4
                           80  I41                        00l: l = 4n
                           97  I422
                           98  I4122                      00l: l = 4n

Primitive Orthorhombic     16  P222
                           17  P2221                      00l: l = 2n
                           18  P21212                     h00: h = 2n; 0k0: k = 2n
                           19  P212121                    h00: h = 2n; 0k0: k = 2n; 00l: l = 2n

C Centered Orthorhombic    20  C2221                      00l: l = 2n
                           21  C222

I Centered Orthorhombic    23  I222 **
                           24  I212121 **

F Centered Orthorhombic    22  F222

Primitive Monoclinic        3  P2
                            4  P21                        0k0: k = 2n

C Centered Monoclinic       5  C2

Primitive Triclinic         1  P1


 *   Note that for the pairs of similar candidate space groups marked with the * (or **) symbol, scaling and merging of diffraction intensities cannot resolve which member of the pair your crystal form belongs to.

How to start scaling from .x files without re-integrating diffraction images?

This is easy. In the main HKL-2000 window, set the Output Data Dir to the directory where your .x files are located. It is not necessary to set the Raw Data Dir. Next, click the Scale Sets Only button, followed by load data sets. You will see a list of .x files in the small dialog box. Select the set you want to scale, followed by OK. If you want to scale additional sets of .x files, find these too and add them to the list. Make sure the scale button is selected for each set you want to scale together. For example, if you want to scale two sets of .x files together (say, from two different crystals, from high- and low-resolution passes of data collection, or from a native and a derivative), make sure that scale is clicked for each one. If you don't want them scaled together, make sure that scale is not clicked. You don't have to worry about the Image Display or the Experiment Geometry, since this was done when you first generated the .x files.

Now go to the Scaling page. You should see the set or sets of .x files listed in the Pending Sets list. These are the frames that will all be scaled together. Delete the reject file by clicking on the delete reject file button. Then scale away!


Reindexing involves reassigning indices from one unit cell axis to another. It becomes important when comparing two or more data sets that were collected and processed independently. This is because Denzo, when confronted with a choice of more than one possible indexing convention, makes a random choice. This is not a problem, except that if it makes a different choice for a second data set, the two will not be comparable without reindexing. Non-equivalent alternatives cannot be distinguished without scaling the data, which is why this is not done in Denzo. You can tell that you need to reindex a data set if the χ² values upon merging the two sets are very high (e.g. 50). This makes sense when you consider that scaling two or more data sets involves comparing reflections with the same hkl index; if the two indexing schemes are equivalent but not identical, chaos will result.

No reindexing, no new autoindexing, and nothing except changing the sign of y scale in Denzo can change the sign of the anomalous signal. At the moment reindexing does not work in HKL-2000. To reindex you’ll have to follow the scenarios listed in the HKL manual.

Statistics and Scalepack.

The quality of X-ray data is initially assessed by statistics. In small molecule crystallography there is almost always a large excess of strong data, so this allows the crystallographer to discard a substantial amount of suspect data and still accurately determine a structure. Compared to small molecules, however, proteins diffract poorly. Moreover, important phase information comes from weak differences and we must be sure these differences do not arise from the noise caused by the limitations of the X-ray exposure and detection apparatus. As a result, we cannot simply throw away or statistically down-weight marginal data without first making a sophisticated judgment about which data is good and which is bad.

To accurately describe the structure of a protein molecule, we often need higher resolution data than the crystal provides. That is life. One of the main judgments the crystallographer makes in assessing the quality of the data is thus the resolution to which the crystal diffracts. In making this judgment, we wish to use statistical criteria that are the most discriminatory and the least subjective. In practice, there are two ways of assessing the high-resolution limit of diffraction. The first is the ratio of the intensity to the error of the intensity, i.e. I/σ. The second, which is traditional but inferior, is the agreement between symmetry-related reflections, i.e. Rmerge.

From a statistical point of view, I/σ is the superior criterion, for two reasons. First, it defines a resolution "limit", since by definition I/σ is the signal-to-noise ratio of your measurements. In contrast, Rmerge is an unweighted statistic that does not take the measurement errors into account.

Second, the σ assigned to each intensity derives its validity from the χ², which represents the weighted sum of the squared differences between the observed intensities I and their average <I>, divided by the square of the error model, the whole corrected by a factor accounting for the correlation between I and <I>. Since this depends on an explicit declaration of the expected measurement errors, the user of the program takes part in the Bayesian reasoning process behind the error estimation.

The essence of Bayesian reasoning in Scalepack is that you bring the χ² (or, technically speaking, the goodness-of-fit, which is related to the total χ² by a constant) close to 1.0 by manipulating the parameters of the error model. Rmerge, on the other hand, is an unweighted statistic which is independent of the error model. It is sensitive to both intentional and unintentional manipulation of the data used to calculate it, and may not correlate with the quality of the data. An example of this is seen when collecting more and more data from the same crystal. As the redundancy goes up, the final averaged data quality definitely improves, yet the Rmerge also goes up. As a result, Rmerge is only really useful when comparing data which have been accumulated and treated the same way. This will be discussed again later.

In short, I/σ is the preferred way of assessing the quality of diffraction data because it derives its validity from the χ² (likelihood) analysis. Unless all of the explicit and implicit assumptions made in the calculation of an Rmerge are known, that criterion is less meaningful. This is particularly true when searching for a single "number" that can be used by others to critically evaluate your work.
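For reference, the two criteria can be written down directly. A minimal sketch using the standard textbook definitions (illustrative only; Scalepack's own implementation differs in detail, e.g. its χ² includes a correction for the correlation between I and <I>):

```python
def r_merge(groups):
    """Unweighted Rmerge = sum |I_i - <I>| / sum I_i over groups of
    symmetry-equivalent measurements; the per-measurement sigmas never
    enter the calculation."""
    num = den = 0.0
    for obs in groups:
        if len(obs) < 2:          # single measurements cannot contribute
            continue
        mean = sum(obs) / len(obs)
        num += sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den

def mean_i_over_sigma(intensities, sigmas):
    """<I/sigma>, the signal-to-noise criterion, which does use the sigmas."""
    return sum(i / s for i, s in zip(intensities, sigmas)) / len(intensities)
```

The contrast is visible in the code: dropping weak measurements or reducing redundancy changes `r_merge` directly, while `mean_i_over_sigma` depends on the declared errors.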

There are two modes of analysis of data using χ². The first mode keeps the χ² (or, more precisely, the goodness-of-fit) constant and compares error models. Basically, this means that you adjust your estimates of the errors associated with the measurements until the deviations among observations agree with the expectations based on the error model.

The second mode keeps the error model constant and compares χ² values. This mode is computationally much faster and is used in refinement procedures. Of the two modes, the first is more informative, because it forces you to consider changes in the error model. Which mode you use generally depends on what you are comparing. When assessing the general quality of your detector, the first mode is used. When comparing a derivative to a native, the second mode is used, due to an incomplete error model which does not take into account important factors like non-isomorphism. Thus, the χ² of scaling between a native and a derivative provides a measure of non-isomorphism, assuming the detector error is accurately modeled for both samples.

Rmerge was historically used as an estimate of the non-isomorphism of data collected on film using several different crystals, and for this purpose it still has some validity because we do not account for non-isomorphism in our error model. It is not so important now that complete X-ray data sets are collected from single, frozen crystals.

One of the drawbacks of using Rmerge as a measure of the quality of a data set is that it can be manipulated both intentionally and unintentionally. Unintentional factors which artificially lower Rmerge generally reduce the redundancy of the data or eliminate weaker observations. In crystallography, the greater the redundancy of the data, the worse (higher) the Rmerge, because the correlation between I and <I>, which reduces Rmerge, decreases with redundancy. For two measurements with the same σ, the correlation is 50%, so Rmerge is underestimated by a factor of √2 compared to the case of no correlation. Known unintentional factors which lower Rmerge include the following:

  1. Data collected so that lower resolution shells, where the data are strong, have a higher redundancy than the higher resolution shells, where the data are generally weaker. This can happen when data are collected on detectors with 2θ ≠ 0, or when data from the corners of rectangular or square image plates are included. There is nothing wrong with using these data; they will just artificially lower the Rmerge.

  2. Inclusion of single measurements in the calculation of Rmerge in one widely used program, which is why a table using this erroneous calculation used to be presented in the Scalepack output. Although the bug in the widely used program was unintentional, it nonetheless reduced the Rmerge and this may have accounted for its longevity. A second, more subtle bug that reduced Rmerge prompted the introduction of the keyword background correction. Fortunately, both bugs have now been fixed, but the point is that errors of this type can persist.

  3. Omission of negative or weak reflections from the calculation of Rmerge. This is often undocumented behavior of crystallographic data scaling/merging software. Examples include:

a) elimination from the Rmerge calculation of reflections that have negative intensities.

b) conversion of I < 0 to I = 0 before the calculation of Rmerge, with inclusion of these reflections in the data set (statistics of this type are included in the Scalepack output for comparison; this is the first Rmerge table in the log file, not the final one).

c) omission of reflections with <I> < 0 from the calculation of Rmerge, while including these reflections in the output data set.

d) default σ cutoffs set to unreasonable values, like 1; this is in fact the default of software commonly used to process image plate data.

  4. Use of unreasonable default/recommended rejection criteria in other programs. These eliminate individual I's which should contribute to Rmerge and yet are still statistically sound measurements.

  5. Use of the eigenvalue filter to determine the overall B factor of a data set collected on a non-frozen, decaying crystal. In this case, the eigenvalue filter will calculate an overall B factor which is appropriate for the middle of the data set, yet apply this to all data. As a result, the high resolution data will be down-weighted compared to data processed with the first, least decayed frame as the reference. The high resolution data are generally weaker than the low resolution data, and as a result are more likely to result in a higher Rmerge. By down-weighting the high resolution data, the Rmerge is artificially lowered. Any program which does not allow the option of setting the reference frame will have this problem. Of course, there is no problem with non-decaying crystals.
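The underestimation of Rmerge caused by the correlation between I and <I> at redundancy two can be verified numerically. A quick Monte Carlo sketch (illustrative only, not part of Scalepack):

```python
import random

def deviation_ratio(n_pairs=200_000, seed=1):
    """Monte Carlo check of the redundancy-two case: compare the mean
    absolute deviation from the PAIR mean (what Rmerge sees) with the mean
    absolute deviation from the true value.  With two measurements,
    I1 - <I> = (I1 - I2)/2, so the ratio tends to 1/sqrt(2) ~ 0.707,
    i.e. Rmerge is underestimated by a factor of sqrt(2)."""
    rng = random.Random(seed)
    dev_pair = dev_true = 0.0
    for _ in range(n_pairs):
        i1, i2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pair_mean = (i1 + i2) / 2.0
        dev_pair += abs(i1 - pair_mean) + abs(i2 - pair_mean)
        dev_true += abs(i1) + abs(i2)
    return dev_pair / dev_true
```

As the redundancy grows, the sample mean approaches the true mean and the underestimation disappears, which is why higher redundancy honestly raises Rmerge.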

There are also intentional ways of lowering your Rmerge. Like those ways listed above, they generally result from the statistically invalid elimination of weak reflections, reduction of the redundancy of the data, or de-emphasis of weak data. The difference between these methods and those listed above is that they are generally under the control of the user.

  1. Use of an unreasonable sigma cutoff (e.g. 0). The rejection of weak data will always improve Rmerge. There is a further discussion of sigma cutoff in the Scalepack Keywords Descriptions section.

  2. Use of a resolution limit cutoff. Again, the omission of weak data will improve Rmerge. A reasonable resolution cutoff is the zone where I/σ < 2.

  3. Combining into a single zone for the purposes of calculations those resolution shells where Rmerge is rapidly changing. In this case, the shell will be dominated by the strong data at the low resolution end of the zone and give the impression that the high resolution limit of the zone has better statistics than it really does. For example, if you combined all your data into a single zone, the Rmerge in the final shell would be pretty good (=Rmerge overall), when in fact it was substantially worse. It is more sensible to divide your zones into equal volumes and have enough of them so that you can accurately monitor the decay with resolution.

  4. Omitting partially recorded reflections. This has the effect of a) reducing the redundancy, and b) eliminating poorer reflections. Partially recorded reflections will always have a higher σ associated with them because they have a higher total background, due to the inclusion of background from more than one frame in the reflection.

  5. Scaling I+ and I- reflections separately in the absence of a legitimate anomalous signal (scale anomalous). This has the effect of reducing the redundancy.

  6. Ignoring overloaded reflections using the ignore overloads option in Scalepack. The intensity of overloaded, or saturated, reflections cannot be measured directly, because some of the pixels are saturated. Profile fitting measures these reflections only indirectly, by fitting a calculated profile to the spot using the information contained in the wings or tail of the spot. Ignoring the inaccuracies inherent in this method by ignoring overloads may have a dramatic effect on Rmerge. Note that in the case of molecular replacement the option include overloads should be used.

 *  ignore overloads is often a useful tool, however. For example, when calculating anomalous differences you do not want to use overloaded reflections because you are looking for very, very small differences and want to use only the most accurate data. Another time you might ignore overloads is when you collect multipass data. In this case, a crystal is exposed twice, once for a short time, the other for a longer time. The longer exposure is to sufficiently darken the high resolution reflections, but will result in saturated low resolution reflections. Since the low resolution reflections can be obtained from the short exposures, the overloaded ones can be ignored in the long exposures.

See: Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., Numerical Recipes in C: The Art of Scientific Computing, Second Edition, Cambridge University Press, 1992.

The Resolution of Diffraction Data.

A statistically sensible cutoff for the resolution of a diffraction data set is the shell where the average I/σ is 2 after correct integration and scaling of the data. The resolution of the data is a criterion distinct from its completeness. The best data will be 100% complete. Completeness may suffer due to anisotropic diffraction, overlapped reflections, or geometrical constraints (whether data from the corners of the detector were used, beam stop or cooling nozzle shadows, etc.). To properly estimate the resolution of the data you have to take into account I/σ, Rsym, and data completeness.
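The I/σ-based cutoff described above can be sketched as a small helper (illustrative; in practice the shell boundaries and statistics would come from the Scalepack log file):

```python
def resolution_cutoff(shells, min_i_over_sigma=2.0):
    """shells: (d_min, <I/sigma>) pairs ordered from low to high resolution.
    Returns the d_min of the last shell still satisfying the I/sigma
    criterion, or None if even the lowest-resolution shell fails."""
    cutoff = None
    for d_min, i_over_sigma in shells:
        if i_over_sigma >= min_i_over_sigma:
            cutoff = d_min
        else:
            break
    return cutoff

# hypothetical shell table: <I/sigma> drops below 2 beyond 2.5 Angstroms
limit = resolution_cutoff([(4.0, 30.0), (3.0, 12.0), (2.5, 4.0),
                           (2.2, 1.6), (2.0, 0.9)])
```

This only automates the I/σ part of the judgment; completeness and Rsym should still be inspected as the text says.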

Understanding the Scalepack log file scale.log.

The Scalepack log file, scale.log, can be examined to see if the data scaling went well. It is long, since it contains the list of all .x files read in and a record of every cycle of scaling. The log file is divided into several sections. In order these are:

1.      the list of .x files read in, and the list of reflections from each of these files that will be rejected

2.      the output file name

3.      the total number of raw reflections used before combining partials

4.      the initial scale and B factors for each frame, goniostat parameters, and the space group

5.      the initial scaling procedure.

The log file should be examined after each iteration. In particular, the errors, the χ², and both R factors should be checked.




