On the Q statistic with constant weights for standardized mean difference

Cochran's Q statistic is routinely used for testing heterogeneity in meta-analysis. Its expected value is also used in several popular estimators of the between-study variance, $\tau^2$. Those applications generally have not considered the implications of its use of estimated variances in the inverse-variance weights. Importantly, those weights make approximating the distribution of Q (more explicitly, $Q_{IV}$) rather complicated. As an alternative, we investigate a new Q statistic, $Q_F$, whose constant weights use only the studies' effective sample sizes. For the standardized mean difference as the measure of effect, we study, by simulation, approximations to distributions of $Q_{IV}$ and $Q_F$, as the basis for tests of heterogeneity and for new point and interval estimators of $\tau^2$. These include new DerSimonian–Kacker-type moment estimators based on the first moment of $Q_F$, and novel median-unbiased estimators. The results show that: an approximation based on an algorithm of Farebrother follows both the null and the alternative distributions of $Q_F$ reasonably well, whereas the usual chi-squared approximation for the null distribution of $Q_{IV}$ and the Biggerstaff–Jackson approximation to its alternative distribution are poor; in estimating $\tau^2$, our moment estimator based on $Q_F$ is almost unbiased, the Mandel–Paule estimator has some negative bias in some situations, and the DerSimonian–Laird and restricted maximum likelihood estimators have considerable negative bias; and all 95% interval estimators have coverage that is too high when $\tau^2 = 0$, but otherwise the Q-profile interval performs very well.


Introduction
When the individual studies assembled for a meta-analysis report means for their treatment and control arms, but those data are on different scales or come from different instruments, the customary measure of effect is the standardized mean difference (SMD). The SMD is considered to be the most appropriate effect-size index in psychological research (Sánchez-Meca & Marín-Martínez, 2010), and was also found to be more generalizable than the mean difference (Takeshima et al., 2014). In studying estimation of the overall effect in random-effects meta-analyses of SMD, we found that SSW, a weighted mean whose weights involve only the studies' arm-level sample sizes, performed well, avoiding shortcomings associated with estimators that use inverse-variance weights based on estimated variances (Bakbergenuly, Hoaglin, & Kulinskaya, 2020).
The present paper takes a natural further step by investigating a version of Cochran's Q statistic (Cochran, 1954) for assessment of heterogeneity that uses those constant weights. This work also draws on favourable results for Q with sample-size-based weights when the measure of effect is the mean difference (MD), which is less common but more tractable (Kulinskaya, Hoaglin, Bakbergenuly, & Newman, 2021). From this version of the Q statistic we also derive new point and interval estimators of the between-study variance, $\tau^2$.
Simulation of the actual distribution of Q enables us to study the accuracy of approximations for the null distribution ($\tau^2 = 0$), the empirical level when $\tau^2 = 0$ and when $\tau^2 > 0$, the bias of point estimators of $\tau^2$, and the coverage of confidence intervals for $\tau^2$. For comparison we include the usual version of Q (based on inverse-variance weights) and familiar point and interval estimators of $\tau^2$.
Section 2 briefly reviews study-level estimation of SMD. Section 3 reviews the random-effects model (REM) and describes the Q statistic. Section 4 introduces new point and interval estimators for $\tau^2$. Section 5 discusses approximations to the distribution of Q. Section 6 describes the simulation design and summarizes the results. Section 7 provides an example of meta-analysis using SMD. Section 8 offers a summary and discussion. An Appendix gives the derivation of conditional and unconditional moments of Hedges's estimator of study-level SMD.

Study-level estimation of standardized mean difference
Consider a meta-analysis of $K$ comparative studies, each consisting of two arms, treatment (T) and control (C), with sample sizes $n_{iT}$ and $n_{iC}$. The total sample size in study $i$ is $n_i = n_{iT} + n_{iC}$, and the ratio of the control sample size to the total is $f_i = n_{iC}/n_i$. The subject-level data in each arm are assumed to be normally distributed with means $\mu_{iT}$ and $\mu_{iC}$ and equal variances $\sigma_i^2$. The sample means are $\bar{x}_{ij}$, and the sample variances are $s_{ij}^2$, for $i = 1, \ldots, K$ and $j = C, T$.
The SMD effect measure is
$$\delta_i = \frac{\mu_{iT} - \mu_{iC}}{\sigma_i}.$$
The unbiased estimator of $\delta_i$ is Hedges's $g$, given by
$$g_i = J(m_i)\,\frac{\bar{x}_{iT} - \bar{x}_{iC}}{s_i}, \qquad J(m) = \frac{\Gamma(m/2)}{\sqrt{m/2}\;\Gamma((m-1)/2)},$$
where the standard deviation, $\sigma_i$, is estimated by the square root of the pooled sample variance $s_i^2$, $m_i = n_{iT} + n_{iC} - 2$, and the factor $J(m_i)$ corrects for bias. For the variance of $g_i$ we use the unbiased estimator
$$\hat{v}_i^2 = \frac{n_{iT} + n_{iC}}{n_{iT}\, n_{iC}} + \left(1 - \frac{m_i - 2}{m_i\, J(m_i)^2}\right) g_i^2,$$
derived by Hedges (1983). The literature contains several other estimators of the variance of $g_i$ and its biased counterpart, $d_i$. Lin and Aloe (2021) provide a comprehensive assessment. Define $\tilde{n}_i = n_{iC} n_{iT}/n_i$, the effective sample size in study $i$. The sample SMD $g_i$ has a scaled non-central $t$ distribution with non-centrality parameter $\gamma_i = \tilde{n}_i^{1/2}\delta_i$ (Hedges & Olkin, 1985):
$$\sqrt{\tilde{n}_i}\,\frac{g_i}{J(m_i)} \sim t_{m_i}(\gamma_i). \tag{3}$$

3. Random-effects model and the Q statistic
We consider a generic REM. For study $i$,
$$\hat{\theta}_i \sim G(\theta_i, v_i^2),$$
where the effect-measure-specific distribution $G$ has mean $\theta_i$ and variance $v_i^2$, and $\theta_i \sim N(\theta, \tau^2)$. Thus, the $\hat{\theta}_i$ are unbiased estimates of the true conditional effects $\theta_i$, and the $v_i^2 = \mathrm{Var}(\hat{\theta}_i \mid \theta_i)$ are the true conditional variances. Cochran's Q statistic is a weighted sum of the squared deviations of the estimated effects $\hat{\theta}_i$ from their weighted mean $\bar{\theta}_w = \sum w_i \hat{\theta}_i / \sum w_i$:
$$Q = \sum w_i (\hat{\theta}_i - \bar{\theta}_w)^2. \tag{4}$$
In Cochran (1954), $w_i$ is the reciprocal of the estimated variance of $\hat{\theta}_i$. We denote this traditional version of Q with inverse-variance weights by $Q_{IV}$. In meta-analysis those $w_i$ come from the fixed-effect model. In what follows, we examine the version of Q, discussed by DerSimonian and Kacker (2007) and further studied by Kulinskaya et al. (2021), in which the $w_i$ are arbitrary positive constants. We denote this Q statistic with fixed weights by $Q_F$.
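As a rough illustration of the quantities defined so far, the sketch below computes Hedges's $g$, its variance estimate, the effective sample size, and then Q with either inverse-variance or effective-sample-size weights. The function names and the summary data are hypothetical and chosen only for illustration; this is not the authors' R code.

```python
import math

def hedges_j(m):
    """Bias-correction factor J(m) = Gamma(m/2) / (sqrt(m/2) * Gamma((m-1)/2))."""
    return math.exp(math.lgamma(m / 2) - math.lgamma((m - 1) / 2)) / math.sqrt(m / 2)

def hedges_g(mean_t, mean_c, var_t, var_c, n_t, n_c):
    """Return (g, v2, n_eff): Hedges's g, its unbiased variance estimate,
    and the effective sample size n_C * n_T / (n_C + n_T)."""
    m = n_t + n_c - 2
    s2 = ((n_t - 1) * var_t + (n_c - 1) * var_c) / m   # pooled sample variance
    j = hedges_j(m)
    g = j * (mean_t - mean_c) / math.sqrt(s2)
    n_eff = n_c * n_t / (n_c + n_t)
    v2 = 1 / n_eff + (1 - (m - 2) / (m * j * j)) * g * g
    return g, v2, n_eff

def q_statistic(effects, weights):
    """Weighted sum of squared deviations from the weighted mean."""
    total = sum(weights)
    mean_w = sum(w * e for w, e in zip(weights, effects)) / total
    return sum(w * (e - mean_w) ** 2 for w, e in zip(weights, effects))

# Hypothetical summary data: (mean_T, mean_C, var_T, var_C, n_T, n_C) per study.
studies = [
    (10.2, 9.1, 4.0, 3.6, 12, 12),
    (11.0, 9.0, 5.2, 4.8, 20, 18),
    (9.8, 9.5, 3.9, 4.1, 15, 15),
    (10.9, 9.2, 4.4, 5.0, 30, 10),
]
rows = [hedges_g(*s) for s in studies]
g = [r[0] for r in rows]
q_iv = q_statistic(g, [1 / r[1] for r in rows])   # inverse-variance weights
q_f = q_statistic(g, [r[2] for r in rows])        # effective-sample-size weights
print(round(q_iv, 3), round(q_f, 3))
```

Note that the weights for `q_f` depend only on the arm sizes, whereas those for `q_iv` depend on the observed `g` through the variance estimate.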
Define $W = \sum w_i$, $q_i = w_i/W$ and $\Theta_i = \hat{\theta}_i - \theta$. In this notation, and expanding $\bar{\theta}_w$, equation (4) can be written as
$$Q = W\Big[\sum q_i (1 - q_i)\,\Theta_i^2 - \sum_{i \neq j} q_i q_j\, \Theta_i \Theta_j\Big].$$
We distinguish between the conditional distribution of Q (given the $\theta_i$) and the unconditional distribution, and the respective moments of $\Theta_i$. For instance, the conditional second moment of $\Theta_i$ is $M^c_{2i} = v_i^2$, and the unconditional second moment is
$$M_{2i} = E(\Theta_i^2) = E(v_i^2) + \tau^2.$$
Under the above REM, it is straightforward to obtain the first moment of $Q_F$ as
$$E(Q_F) = W \sum q_i (1 - q_i)\,\big(E(v_i^2) + \tau^2\big). \tag{5}$$
This expression is similar to equation (4) in DerSimonian and Kacker (2007); they use the conditional variance $v_i^2$ instead of its unconditional mean $E(v_i^2)$.
Kulinskaya et al. (2021) also provide expressions for the second and third moments of $Q_F$, but these moments require higher moments of $\Theta$, up to the sixth moment. For Hedges's $g$ the expressions for these higher central moments are rather complicated; we provide them in the Appendix.

4. Point and interval estimators of $\tau^2$

4.1. Point estimators
Rearranging the terms in equation (5) gives the moment-based estimator of $\tau^2$:
$$\hat{\tau}^2_M = \frac{Q/W - \sum q_i (1 - q_i)\,\hat{E}(v_i^2)}{\sum q_i (1 - q_i)}. \tag{7}$$
DerSimonian and Kacker (2007) obtain a similar result; they use the conditional estimate, $\hat{v}_i^2$, instead of the unconditional estimate, $\hat{E}(v_i^2)$. For MD the two estimators are the same, because then $\hat{E}(v_i^2) = \hat{v}_i^2$. For SMD we study both estimators with effective-sample-size weights $w_i = \tilde{n}_i$. With the conditional estimated variances in equation (7), we denote the estimator by SSC; with the unconditional estimated variances, it is SSU.
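The moment-based estimator can be sketched in a few lines. The version below plugs in whatever variance estimates it is given, so with conditional estimates and effective-sample-size weights it corresponds to SSC. The function name is hypothetical, and the raw (possibly negative) value is returned; truncating a negative result at zero is common practice but is an assumption here, not something this section states.

```python
def tau2_moment(effects, weights, variances):
    """Moment estimator of tau^2 from Q with constant weights:
    (Q / W - sum q_i (1 - q_i) v_i^2) / sum q_i (1 - q_i)."""
    total = sum(weights)
    q = [w / total for w in weights]
    mean_w = sum(qi * e for qi, e in zip(q, effects))
    q_stat = sum(w * (e - mean_w) ** 2 for w, e in zip(weights, effects))
    denom = sum(qi * (1 - qi) for qi in q)
    numer = q_stat / total - sum(qi * (1 - qi) * v for qi, v in zip(q, variances))
    return numer / denom   # may be negative; not truncated in this sketch

# With equal weights the estimator reduces to the sample variance of the
# effects minus the average within-study variance.
print(tau2_moment([0.0, 1.0], [1.0, 1.0], [0.1, 0.1]))
```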
The estimator $\hat{\tau}^2_M$ arose from setting the observed value of Q equal to its expected value and solving for $\tau^2$. Instead of the expected value, one could use the median of the distribution of Q given $\tau^2$. If the true (or approximate) cumulative distribution function is $F(\cdot \mid \tau^2)$, a point estimator of $\tau^2$ can be found as
$$\hat{\tau}^2_{med} = \{\tau^2 : F(Q \mid \tau^2) = 0.5\}.$$
In the Farebrother approximation to the distribution of Q (Section 5), one can use either the conditional estimated variances or the unconditional estimated variances. We denote the resulting estimators by SMC and SMU, respectively.
For comparison our simulations (Section 6) include four estimators that use inverse-variance weights: DerSimonian and Laird (1986) (DL), REML, Mandel and Paule (1970) (MP), and an estimator (KDB) based on the work of Kulinskaya, Dollinger, and Bjørkestøl (2011a) and discussed by Bakbergenuly et al. (2020). KDB uses an improved non-null first moment of Q and has better performance than most other estimators. In their review of methods for estimating the between-study variance, Veroniki et al. (2016) explain that DL is (by default) the most widely used, and they conclude that both REML and MP are better alternatives.

Interval estimators
Straightforward use of $F(\cdot \mid \tau^2)$ also yields a $100(1 - \alpha)\%$ confidence interval for $\tau^2$:
$$\{\tau^2 : \alpha/2 \le F(Q \mid \tau^2) \le 1 - \alpha/2\}.$$
We use both the conditional estimated variances and the unconditional estimated variances in the Farebrother approximation to $F$; we refer to the resulting profile estimators as FPC and FPU. Jackson (2013) introduced a similar approach using conditional variances.
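Inverting $F(\cdot \mid \tau^2)$ for a median-based point estimate or a profile interval can be sketched with plain bisection. The example below does not implement Farebrother's algorithm. Instead it assumes a deliberately simple toy case (all $K$ studies share the same weight $w$ and the same variance $v^2 + \tau^2$, with normal effects), in which $Q/(w(v^2 + \tau^2))$ is exactly chi-squared with $K - 1$ degrees of freedom. All names and numbers are hypothetical.

```python
import math

def chi2_cdf_even(x, df):
    """CDF of chi-squared with even df: 1 - exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 0.0
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

def toy_cdf(q_obs, tau2, k=5, w=10.0, v2=0.1):
    """F(q | tau2) in the toy case: equal weights w and equal variances v2 + tau2."""
    return chi2_cdf_even(q_obs / (w * (v2 + tau2)), k - 1)

def solve_tau2(q_obs, target, cdf, upper=1000.0, iterations=200):
    """Bisection for cdf(q_obs, tau2) = target, assuming cdf decreases in tau2.
    Returns 0 when tau2 = 0 already gives cdf <= target."""
    if cdf(q_obs, 0.0) <= target:
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cdf(q_obs, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def median_estimate(q_obs, cdf):
    """Median-based point estimate: F(Q | tau2) = 0.5."""
    return solve_tau2(q_obs, 0.5, cdf)

def profile_interval(q_obs, cdf, alpha=0.05):
    """Limits where F(Q | tau2) equals 1 - alpha/2 (lower) and alpha/2 (upper)."""
    return solve_tau2(q_obs, 1 - alpha / 2, cdf), solve_tau2(q_obs, alpha / 2, cdf)

q_obs = 12.0
print(median_estimate(q_obs, toy_cdf), profile_interval(q_obs, toy_cdf))
```

In a real analysis `toy_cdf` would be replaced by an approximation to the distribution of $Q_F$ with unequal variances, such as the Farebrother approximation discussed in Section 5.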

Approximations to the distribution of Q
For meta-analysis of MD, Kulinskaya et al. (2021) considered the distribution of $Q_F$, a quadratic form in normal variables, which has the form $Q = \Theta^T A \Theta$ for a symmetric matrix $A$ of rank $K - 1$. Because the vector $\Theta$ has a multivariate normal distribution, $N(\mu, \Sigma)$, the distribution of Q can be obtained by the algorithm of Farebrother (1984) (after determining the eigenvalues of $A\Sigma$ and some other inputs). If the variances in $\Sigma$ are the true variances, Farebrother's algorithm evaluates the exact distribution of Q. In practice (as in our simulations), it is necessary to plug in estimated variances. Encouragingly, the resulting approximation is quite accurate for MD. Kulinskaya et al. (2021) also considered a two-moment approximation and a three-moment approximation. The three-moment approximation regularly encountered numerical problems, so we do not include it here.
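The quadratic-form representation can be checked numerically. The sketch below builds $A = W(\mathrm{diag}(q) - q q^T)$, which follows from expanding the weighted mean in Q, and confirms that $\Theta^T A \Theta$ reproduces the directly computed Q. It also evaluates $\mathrm{trace}(A\Sigma)$ for a diagonal $\Sigma$, which is the mean of a quadratic form in zero-mean variables. The helper names and inputs are hypothetical, and no eigenvalue computation or Farebrother algorithm is attempted here.

```python
def q_matrix(weights):
    """A = W * (diag(q) - q q^T) with q_i = w_i / W, so that Q = Theta^T A Theta."""
    total = sum(weights)
    q = [w / total for w in weights]
    k = len(weights)
    return [[total * ((q[i] if i == j else 0.0) - q[i] * q[j]) for j in range(k)]
            for i in range(k)]

def quadratic_form(a, theta):
    """Theta^T A Theta for a square matrix given as a list of rows."""
    k = len(theta)
    return sum(theta[i] * a[i][j] * theta[j] for i in range(k) for j in range(k))

def trace_a_sigma(a, variances):
    """trace(A Sigma) for diagonal Sigma; the mean of Q when Theta has mean zero."""
    return sum(a[i][i] * variances[i] for i in range(len(variances)))

weights = [3.0, 5.0, 8.0]
theta = [0.2, -0.1, 0.4]
a = q_matrix(weights)
mean_w = sum(w * t for w, t in zip(weights, theta)) / sum(weights)
direct = sum(w * (t - mean_w) ** 2 for w, t in zip(weights, theta))
print(quadratic_form(a, theta), direct)          # the two values agree
print(trace_a_sigma(a, [0.10, 0.20, 0.30]))      # W * sum q_i (1 - q_i) sigma_i^2
```

Each row of $A$ sums to zero, which is why its rank is $K - 1$ rather than $K$.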
For SMD, Q F is a quadratic form in t variates.The Farebrother algorithm may provide a satisfactory approximation, especially for larger sample sizes.To apply it, we again plug in estimated variances.We investigate the quality of that approximation, which we denote by F SW, and the two-moment approximation (M2 SW), which is based on the gamma distribution.
The null distribution of $Q_{IV}$ is usually approximated by the chi-squared distribution with $K - 1$ degrees of freedom. For MD and SMD, however, this approximation is not accurate for small sample sizes (Viechtbauer, 2007a). For SMD, Kulinskaya, Dollinger, and Bjørkestøl (2011a) provided an improved approximation to the null distribution of $Q_{IV}$ based on a chi-squared distribution with degrees of freedom equal to the estimate of the corrected first moment; we denote this approximation by KDB. Biggerstaff and Jackson (2008) used the Farebrother approximation to the distribution of a quadratic form in normal variables as the 'exact' distribution of $Q_{IV}$. We denote this approximation by BJ. Jackson, Turner, Rhodes, and Viechtbauer (2014) extended this approach to a Q with arbitrary weights in a meta-regression setting. When $\tau^2 = 0$, the BJ approximation to the distribution of $Q_{IV}$ is the $\chi^2_{K-1}$ distribution. For comparison, our simulations include these three approximations.

Simulation design
Our simulation design follows that described in Bakbergenuly et al. (2020). Briefly, we varied five parameters: the overall true SMD ($\delta$), the between-studies variance ($\tau^2$), the number of studies ($K$), the studies' total sample size ($n$ and $\bar{n}$) and the proportion of observations in the control arm ($f$). Table 1 lists the values of each parameter.
The numbers of studies (K = 5, 10, 30) reflect the sizes of many meta-analyses and have yielded valuable insights in previous work.Rubio-Aparicio et al. (2018) reported numbers of studies ranging from 7 (their minimum for inclusion) to 70, with 28 of the 41 between 10 and 24.
In practice, many studies' total sample sizes fall in the ranges covered by our choices ($n = 20, 40, 100, 250$ when all studies have the same $n$, supplemented by 30, 50, 60, 70; and $\bar{n} = 30, 60, 100, 160$ when sample sizes vary among studies). The choices of $\bar{n}$ follow a suggestion of Sánchez-Meca and Marín-Martínez (2000), who constructed the studies' sample sizes to have skewness 1.464, which they regarded as typical in behavioural and health sciences. The meta-analyses studied by Rubio-Aparicio et al. had median study-level sample sizes from 16 to 87.5; within those meta-analyses, the sample sizes varied substantially.
Many studies allocate subjects equally to the two groups ($f = 1/2$), and rough equality holds more widely (as in the studies analysed by Rubio-Aparicio et al.). Unequal allocations, either planned or observed, are not uncommon. To investigate potential impacts of such situations, we also used $f = 3/4$, a substantial departure from equality.
We generated the true effect sizes $\delta_i$ from a normal distribution:
$$\delta_i \sim N(\delta, \tau^2).$$
We generated the values of Hedges's estimator $g_i$ directly from the appropriately scaled non-central $t$ distribution, given by equation (3). We used a total of 10,000 repetitions for each combination of parameters.
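The data-generating step can be sketched as follows. A non-central $t$ variate is built from a shifted standard normal divided by the square root of an independent scaled chi-squared variate, and then rescaled as in equation (3). This is a minimal Python illustration with hypothetical names and a fixed seed, not the authors' R simulation code.

```python
import math
import random

def hedges_j(m):
    """Bias-correction factor J(m) = Gamma(m/2) / (sqrt(m/2) * Gamma((m-1)/2))."""
    return math.exp(math.lgamma(m / 2) - math.lgamma((m - 1) / 2)) / math.sqrt(m / 2)

def simulate_g(delta, tau2, n_t, n_c, rng):
    """Draw one study: delta_i ~ N(delta, tau2), then g_i from the scaled
    non-central t distribution with m = n_T + n_C - 2 degrees of freedom."""
    m = n_t + n_c - 2
    n_eff = n_c * n_t / (n_c + n_t)
    delta_i = rng.gauss(delta, math.sqrt(tau2))
    gamma = math.sqrt(n_eff) * delta_i        # non-centrality parameter
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(m / 2, 2.0)       # chi-squared with m df
    t = (z + gamma) / math.sqrt(chi2 / m)     # non-central t_m(gamma)
    return hedges_j(m) * t / math.sqrt(n_eff)

rng = random.Random(2024)
draws = [simulate_g(0.5, 0.0, 20, 20, rng) for _ in range(20000)]
print(sum(draws) / len(draws))   # close to delta = 0.5, since g is unbiased
```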
R statistical software (R Core Team, 2016) was used for simulations. The user-friendly R programs implementing our methods and analysing the example in Section 7 are available at https://osf.io/3gytv.

Simulation results
In tests based on either version of Q, heterogeneity corresponds to large values of Q. Thus, we focused on the upper tail of the distribution. For each configuration of parameters and for each generated value of Q, we used each approximation to calculate the probability of a larger Q: $\tilde{p} = 1 - \hat{F}(Q)$ ($\hat{F}$ denotes the distribution function of the approximation). We recorded empirical p-values $\hat{p} = \#(\tilde{p} < p)/10{,}000$ at $p = .001, .0025, .005, .01, .025, .05, .1, .25, .5$ and, for completeness, the complementary values $.75, \ldots, .999$. Thus, $\hat{p}$ estimates the upper-tail probability $P(\hat{F}(Q) > 1 - p)$.
The values of $\tau^2$ included both null ($\tau^2 = 0$) and non-null ($\tau^2 > 0$) values (Table 1). The approximations to the non-null distribution of Q were based on the value of $\tau^2$ used in the simulation. These data provide the basis for probability-probability (P-P) plots (vs. the true null distribution) for two approximations to the distribution of Q with effective-sample-size weights (F SW and M2 SW) and two approximations to the distribution of Q with IV weights (chi-squared/BJ and KDB (for $\tau^2 = 0$ only)) and for estimating their null levels and their non-null empirical tail areas. We also estimate the bias of eight point estimators of $\tau^2$ (DL, REML, MP, KDB, SSC, SSU, SMC and SMU) and the coverage of five interval estimators of $\tau^2$ (QP, PL, KDB, FPC and FPU).
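The empirical p-values described above amount to counting how many simulated upper-tail probabilities fall below each nominal level. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def empirical_levels(p_tilde, nominal):
    """For each nominal level p, the fraction of simulated upper-tail
    probabilities p_tilde = 1 - F_hat(Q) that are smaller than p."""
    n = len(p_tilde)
    return {p: sum(1 for x in p_tilde if x < p) / n for p in nominal}

# Four made-up upper-tail probabilities from four simulated values of Q.
print(empirical_levels([0.01, 0.20, 0.04, 0.60], [0.05, 0.5]))
```

If an approximation were exact, the simulated upper-tail probabilities would be uniformly distributed and each empirical level would be close to its nominal level.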

6.3. P-P plots for $\tau^2 = 0$
To compare an approximation for a distribution function of Q against the theoretical distribution function, we use P-P plots (Wilk & Gnanadesikan, 1968). Evaluating two distribution functions, $F_1$ and $F_2$, to obtain upper-tail probabilities at $x$ yields $p_1 = 1 - F_1(x)$ and $p_2 = 1 - F_2(x)$. In the usual plot of $p_2$ versus $p_1$, equality of the two distributions corresponds to the line $p_2 = p_1$. To make departures from that reference pattern more visible, we flatten the plot by subtracting the line; that is, we plot $p_2 - p_1$ versus $p_1$.
When $\delta = 0$, P-P plots show that the KDB and F SW approximations perform reasonably well for small sample sizes, but the $\chi^2_{K-1}$ and M2 SW approximations do not (Figure 1). This difference is especially pronounced for larger $K$ values. It appears that KDB performs better overall for small $K$, and F SW for large $K$. All four approximations perform reasonably well for $n \ge 100$. When $\delta$ increases, the performance of KDB deteriorates somewhat in the upper half, though it may be somewhat better than F SW in the lower half. Both the $\chi^2_{K-1}$ and M2 SW approximations deteriorate further. Results are similar for equal and unequal sample sizes.

Empirical levels when $\tau^2 = 0$
To better visualize the quality of the approximations as the basis for a test for heterogeneity at the 0.05 level, we plot their empirical levels under the null $\tau^2 = 0$ against sample size. Figure 2 presents typical results at the 0.05 level. Figure 3 depicts the quality of the approximations at the 0.95 level.
For small sample sizes, the error rate of the test based on F SW somewhat exceeds the nominal 5% (up to 6%), and the error rate of the test based on KDB is somewhat low (between 3% and 4%). Both the $\chi^2_{K-1}$ and M2 SW approximations result in tests with error rates that are noticeably too low (in that order). For all approximations, departures from the nominal level increase for larger $K$ and larger $\delta$, especially when sample sizes are unequal. The $\chi^2_{K-1}$ approximation has empirical levels of about 2.5% (vs. the nominal 5%) when $K = 30$. For unequal sample sizes or unbalanced samples, the results are similar. The chi-squared approximation provides reasonable results by $n = 100$.
The picture is similar in the lower tail. All approximations except M2 SW, which produces extremely high empirical levels when $n$ is small, work well for $K = 5$. However, increasing values of $K$ and, to a lesser degree, of $\delta$, result in decreasing empirical levels for $\chi^2_{K-1}$ (down to 92%) and hence larger error rates, and, to a lesser degree, for F SW (to 93.5%) when $K = 30$. KDB exhibits the best performance in the lower tail.

6.5. Empirical levels when $\tau^2 > 0$
To understand how the approximations behave as $\tau^2$ increases, we plot the empirical p-values ($\hat{p}$) versus $\tau^2$ for the F SW, M2 SW and BJ approximations for the nominal level 0.05 (Figure 4). F SW provides robust though somewhat high (for larger $K$) levels at all values of $\tau^2$ and $\delta$. This is also true for unequal sample sizes and unbalanced studies. M2 SW results in lower error rates; its level decreases further for larger $\delta$ but does not depend on $\tau^2$. The BJ approximation has even lower error rates, and it deteriorates further as $\tau^2$ increases.

Bias in estimation of $\tau^2$
Here we compare eight point estimators of $\tau^2$: three well-known estimators (DL, REML, MP), the less-well-known KDB, and four new estimators (SSC, SSU, SMC, SMU). Figure 5 depicts the biases of the eight estimators for small sample sizes.
SSC is the best estimator overall; it is almost unbiased under all studied conditions even for very small and unbalanced sample sizes (Figure 5). DL is clearly the worst; it has considerable negative bias, which increases in $\tau^2$. REML is the second worst; its bias is similar to DL but less pronounced. MP is the best of the established estimators; this agrees with the recommendation of Veroniki et al. (2016). It is also negatively biased for larger $K$ and $\tau^2$ values, but not by much. The bias of SSU is similar to that of MP for $K = 5$, and it is smaller than that of MP for larger values of $K$ and $\delta$, though it is larger than that of SSC. This makes sense, as the unconditional variance is calculated from averages over $K$ values, so it is estimated more precisely for larger $K$. Estimators SMC and SMU are positively biased; their bias increases in $\tau^2$ but decreases in $K$. SMU is less biased than SMC. By design, these two estimators are expected to be median-unbiased; we discuss them further in Section 8. Finally, KDB is somewhat positively biased, and its bias increases in $\tau^2$. We recommended MP and KDB in our previous work (Bakbergenuly et al., 2020), and our current results agree with the previous ones.
To summarize, SSC provides very precise estimation of $\tau^2$ and should be exclusively used in practice.

Coverage in interval estimation of $\tau^2$
Here we compare the coverage of five interval estimators of $\tau^2$ (QP, PL, KDB, FPC, FPU) at the 95% nominal level of confidence. Figure 6 depicts the coverage of the five estimators for small sample sizes.
All interval estimators have coverage that is too high for $\tau^2 < 0.5$. For larger values of $\tau^2$, QP and KDB typically have somewhat excessive coverage (KDB more so than QP), and PL, FPC and FPU typically have somewhat deficient coverage. Coverage of FPC is very slightly above that of FPU. The coverage of these two new estimators is close to nominal when $K = 5$ and $\tau^2 \ge 0.5$, but it decreases to about 94% for $K = 30$. Coverage of PL may be erratic, especially for small $K$, even for large sample sizes, and we do not recommend its use.
Overall, QP performs quite impressively, and we recommend its use in practice. This finding is counter-intuitive, as we saw previously that the $\chi^2_{K-1}$ approximation does not hold levels well. However, it works for confidence levels. The explanation is that the confidence intervals provided by QP are not symmetrical for small $n$, especially for large values of $K$ and $\delta$.
To explain this point, consider a QP confidence interval at the 90% level. In the discussion of empirical levels in Section 6.4 and Figures 2 and 3 for levels 0.05 and 0.95, we noted that when $K = 30$, the $\chi^2_{K-1}$ approximation has empirical level about 2.5% at the nominal 5% level and empirical level about 92% at the nominal 95% level. The QP 90% confidence interval consists of $\{\tau^2 : q_{0.95} \ge Q_{w(\tau^2)} \ge q_{0.05}\}$, where the $q_\alpha$ are critical values of $\chi^2_{K-1}$ and the weights are $w_i(\tau^2) = (\hat{v}_i^2 + \tau^2)^{-1}$. This interval would have about 8% probability below the lower limit and about 2.5% above the upper limit. The results for levels .025 and .975 (not shown) also show non-symmetric patterns similar to those in Figures 2 and 3. QP intervals at the 95% level can have about 4% in the left tail, and 1% in the right tail.
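The arithmetic behind this explanation can be written out explicitly. Using only the approximate tail probabilities quoted above, the two one-sided errors are unequal, but their sum stays close to the nominal non-coverage:

```latex
\begin{align*}
\text{nominal 90\%:}\quad & 1 - (0.08 + 0.025) = 0.895 \approx 0.90,\\
\text{nominal 95\%:}\quad & 1 - (0.04 + 0.01) = 0.95 .
\end{align*}
```

So an excess in one tail is roughly offset by a deficit in the other, which is consistent with QP holding its overall confidence level even though the $\chi^2_{K-1}$ approximation misses the individual tail levels.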

Example
We use data, previously considered by Sánchez-Meca and Marín-Martínez (2010) and subsequently by Bakbergenuly et al. (2020), on the efficacy of psychological treatments for obsessive-compulsive disorder. These data consist of 24 trials with mostly small sample sizes, ranging from 12 to 121 patients. The effect measure is SMD, and positive values correspond to lower levels of obsessions and compulsions in the treatment group. The data appear in table 4 of Bakbergenuly et al. (2020), and our Figure 7 shows a forest plot. Heterogeneity in these data is rather high. The value of $Q_{IV}$ is 53.45, resulting in a p-value of .00032 for the $\chi^2_{K-1}$ approximation and a p-value of .00010 for the KDB approximation. The value 83.44 for $Q_F$ results in a p-value of .00003 for F SW and in a much higher p-value, .0139, for the two-moment approximation, M2 SW. Table 4 of Bakbergenuly et al. (2020) includes the year of the study (one in 1980 and the rest from 1993 to 2006) and whether the study design was experimental or quasi-experimental (four studies). The values of $g_i$, however, are not systematically related to either of these variables. Further examination of potential sources of heterogeneity would consider details of the studies' designs and variation among the means of the control arms.
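The chi-squared p-value quoted for $Q_{IV}$ can be checked with a short calculation. With $K = 24$ trials the reference distribution has 23 degrees of freedom, and for odd degrees of freedom the upper-tail probability has a closed form in terms of the normal tail and a finite series. The function name below is hypothetical; the sketch uses only the Python standard library.

```python
import math

def chi2_sf_odd(x, df):
    """Upper-tail probability of a chi-squared variate with odd df, using
    2 * (1 - Phi(sqrt(x))) + 2 * phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1)),
    with r running from 1 to (df - 1) / 2."""
    if df <= 0 or df % 2 != 1:
        raise ValueError("this closed form needs a positive odd df")
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))              # 2 * (1 - Phi(sqrt(x)))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)    # normal density at sqrt(x)
    term = root                                        # series term for r = 1
    series = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)                        # move to the term for r + 1
    return tail + 2 * phi * series

print(chi2_sf_odd(53.45, 23))   # should be close to the reported .00032
```

The other three p-values in the example rely on the KDB, Farebrother and two-moment approximations, which are not reproduced here.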
Table 5 of Bakbergenuly et al. (2020) shows results for several point and interval estimators of $\tau^2$ that we also consider here, and for the corresponding estimators of $\delta$, with $\hat{\delta}$ values from 1.07 to 1.12. Our Table 2 includes all estimators of $\tau^2$ in our simulation study. For comparison, our simulations include data patterns with $K \ge 20$, $\delta = 1$ and $\tau^2 \le 0.5$. Point estimates of $\tau^2$ are lowest for REML and DL, in agreement with our simulations (Figure 5). All our new estimators give rather similar values. SSC, at 0.2698, is somewhat higher than SSU, at 0.2504; and SMC, at 0.2879, is somewhat higher than SMU. KDB is highest, also in agreement with our simulations. MP is unexpectedly high, but this may be due to the comparatively low value of $\tau^2$, as all estimators have positive bias at $\tau^2 = 0$.
The lower limits of the QP, FPC and FPU confidence intervals are rather similar, whereas PL is lowest, at 0, and KDB is highest, at 0.2167. The lengths of the confidence intervals are also rather similar, except for QP, which is widest at 1.001. This may be due to the shift of the upper limit of the QP interval, further into the right tail, as discussed in Section 6.7.

Discussion
The Q statistic serves as the basis for two main steps in random-effects meta-analysis: testing for the presence of heterogeneity and estimating the between-study variance. In its customary form, with inverse-variance weights based on estimated fixed-effect variances, $Q_{IV}$ contributes to a variety of shortcomings. Encouraged by the favourable performance, with the mean difference as the measure of effect, of a version, $Q_F$, whose weights are based only on the studies' arm-based sample sizes, we studied key features of its performance when the measure of effect is the SMD. Aspects of performance included accuracy of approximations for the distribution of $Q_F$ (or $Q_{IV}$), empirical levels when $\tau^2 = 0$ and when $\tau^2 > 0$, bias of point estimators of $\tau^2$, and coverage of interval estimators for $\tau^2$. On most of these aspects, $Q_F$ and related estimators performed better than $Q_{IV}$ and their other counterparts.
The P-P plots show that the Farebrother approximation (F SW) usually comes close to the actual null distribution of $Q_F$, much closer than the two-moment approximation. This result should not be surprising, because F SW makes more detailed use of the study-level variances than M2. It is encouraging, because F SW assumes that the variables in the quadratic form have normal distributions, instead of the actual $t$ distributions. For $Q_{IV}$, the KDB approximation is consistently better than $\chi^2_{K-1}$, especially as $K$ increases when $n$ is small. Having the correct first moment of the null distribution can make a big

The first moment of $t_{n-2}(\gamma)$, denoted by $\mu_1$, is
$$\mu_1 = \gamma \sqrt{\frac{n-2}{2}}\; \frac{\Gamma((n-3)/2)}{\Gamma((n-2)/2)}.$$
The second moment is
$$\mu_2 = \frac{n-2}{n-4}\,(1 + \gamma^2).$$
The (conditional) central moments of Hedges's $\hat{g}$ are

Table 1. Values of parameters in the simulations for Q with constant weights and SMD as the measure of effect
$\bar{n}$ (average size of individual study): total of the two arms

Figure 1. Flattened P-P plots of upper-tail probabilities for the Farebrother and M2 approximations to the null distribution of Q with sample-size-based weights, and for the chi-squared and KDB approximations to the null distribution of Q with IV-based weights. First three rows: equal sample sizes, $n = 20$, $f = 0.5$, $\delta = 0, 0.5, 1$. Fourth row: $n = 40$, $\delta = 1$, $f = 0.5$.

Figure 2. Empirical levels of approximations to the null distribution of Q with sample-size-based or IV weights at nominal 0.05 level against sample size $n$. In all plots, $\tau^2 = 0$ and $f = 0.5$. Top two rows: equal sample sizes, $\delta = 0$ and $\delta = 1$. Bottom two rows: unequal sample sizes, $\delta = 0$ and $\delta = 1$.

Figure 3. Empirical levels of approximations to the null distribution of Q with sample-size-based or IV weights at nominal 0.95 level against sample size $n$. In all plots, $\tau^2 = 0$ and $f = 0.5$. Top two rows: equal sample sizes, $\delta = 0$ and $\delta = 1$. Bottom two rows: unequal sample sizes, $\delta = 0$ and $\delta = 1$.

$\mu_s$ is the $s$th moment $E[t^s_{n-2}(\gamma)]$ given by equation (A.2). Substituting the result from equation (A.2) and expressions for $\gamma$ and $\mu_1$, the conditional central moments of Hedges's $g$ are

Now we apply these results to study $i$ by restoring the subscript $i$ on variables pertaining to study $i$ and substituting the conditional moments into equation (A.1):

where $E_m$ is the $m$th central moment of the $N(0, \tau^2)$ distribution. Define $E_0 = 1$. All odd central moments are zero, and the even moments are $E_m = \tau^m (m - 1)!!$.
Figure 7. Forest plot for meta-analysis of the data from Sánchez-Meca and Marín-Martínez (2010) on the efficacy of psychological treatments for obsessive-compulsive disorder. The estimate of the overall effect obtained by using REML is included for illustration.

Table 2. Point and confidence-interval estimates for the heterogeneity parameter $\tau^2$ in the example of efficacy of psychological treatments for obsessive-compulsive disorder. Note: L and U denote the lower and upper limits of the 95% confidence interval.