Understanding Treatment Effect Scores in Behavioral Therapy Research

Current prevalence estimates released by the Centers for Disease Control and Prevention (CDC) suggest that one in 68 children in the United States may have Autism Spectrum Disorder (ASD) (CDC MMWR, 2014). The core impairments that characterize autism – social and communicative functioning and the presence of restricted, repetitive and stereotyped interests and behaviors – are reportedly experienced to varying degrees into adulthood (Roth, Gillis, & DiGennaro Reed, 2013). Support systems have been slow to adapt to the needs of transition-aged youth with ASD, and many adults on the spectrum have been described as socially isolated, economically unproductive, and financially disadvantaged (Howlin, 2008; Geller & Greenberg, 2010).

While a wide variety of potential treatments is available for supporting individuals with ASD, a long history of failed treatments and fads has been reported (Food and Drug Administration, Consumer Health Information, 2014). In its report, the FDA warned that a number of companies may face legal action should they continue to promote false or misleading claims about products and therapies purported to treat or cure autism. Chelation therapy, hyperbaric oxygen therapy, miracle mineral solution, detoxifying clay baths, and coconut kefir and other probiotic products were listed in particular.

A combination of genetic makeup and lived experience contributes to a unique personal profile of strengths and deficits for those on the autism spectrum, and a focus on individuality is essential when identifying treatment options. Applied Behavior Analysis (ABA) based treatments are endorsed by the U.S. Surgeon General and the New York State Department of Health. While many children have the benefit of accessing ABA-based early intervention programs, others who face a lifetime of autism-related challenges may grapple with little to no funding for support services. For some, geographic isolation may limit access to support services, while for others socio-economic factors may leave families in under-funded communities with similarly limited access.

Accordingly, the ability of parents, teachers and clinicians alike to access and interpret scientifically robust information on evidence-based treatments is essential. Researchers working in a behavioral therapy paradigm often utilize Single-Case Design (SCD) methodology, as these designs make it possible to draw scientifically valid conclusions (Baer, Wolf, & Risley, 1968). SCD research is of particular importance to the autism community because these research designs are highly suitable for accommodating the unique characteristics of individuals on the spectrum. Treatment packages can be developed for older students and adults in addition to younger children.

The What Works Clearinghouse (WWC) was formed under the Education Sciences Reform Act (2002) to address concerns about Evidence-Based Practice (EBP) and empirically-supported treatment, enabling federal and state governments to invest in educational, clinical and social practices that are scientifically valid (Horner, Swaminathan, Sugai, & Smolkowski, 2012). The WWC Procedures and Standards Handbook describes quality assessment procedures for both group design and SCD research (Kratochwill et al., 2013). Methodology for determining the strength of treatment effects in group design research is well established; for SCD research, however, ongoing debate surrounds the most appropriate approach to determining treatment effects.

The current WWC SCD pilot guidelines recommend that a treatment may be considered evidence-based if a set of studies has met the minimum 5-3-20 rule:

  • At least five SCD studies document experimental control;
  • The five studies were drawn from at least three different research teams/locations, and;
  • The five studies document effects for at least 20 different participants.
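In code form, the 5-3-20 rule reduces to a simple check. The sketch below is illustrative only; the function name is hypothetical, and the counts themselves come from a quality review of each study against the WWC design standards.

```python
def meets_5_3_20(n_studies: int, n_teams: int, n_participants: int) -> bool:
    """WWC pilot 5-3-20 rule (illustrative sketch): at least five SCD
    studies documenting experimental control, drawn from at least three
    research teams/locations, covering at least 20 participants in total."""
    return n_studies >= 5 and n_teams >= 3 and n_participants >= 20

print(meets_5_3_20(5, 3, 20))  # True
print(meets_5_3_20(6, 2, 25))  # False: only two research teams
```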

Currently, the WWC panel has cautioned against calculating a treatment effect score for SCD research until greater consensus on a best method is reached. In the interim, visual analysis has been suggested as the preferred method of evaluating treatment effects. However, a treatment effect score was previously specified as a requirement for meta-analysis publication by the APA Taskforce on Statistical Inference (1999). That earlier statistical taskforce also emphasized the importance of understanding how a given statistical measure is calculated and how to interpret it. Despite this debate, the ASD community of stakeholders requires information describing evidence-based best practice now. It is critically important to identify potential treatments that may in fact cause harm, and in the best interests of all parties to avoid selecting treatments that may be ineffective.

The Percentage of Nonoverlapping Data (PND) (Scruggs, Mastropieri, & Casto, 1987) effect size calculation has been identified as the most frequently adopted method of calculating a treatment effect score in SCD research, with a recent review of published meta-analyses reporting that it was applied in 47 of the 84 (55%) effect sizes reported (Maggin, O’Keeffe, & Johnson, 2011). PND has been criticized in the literature on the grounds that it relies on a single extreme data point in baseline, lacks sensitivity as calculated scores approach 100%, and does not permit confidence intervals. In recent years, alternate calculation methods have been developed to address these concerns.
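For a behavior targeted for increase, PND is the percentage of treatment-phase data points that exceed the single highest baseline point (for a decrease target, the comparison reverses). A minimal sketch with hypothetical data, which also illustrates the criticized reliance on one extreme baseline value:

```python
def pnd(baseline, treatment, increase=True):
    """Percentage of Nonoverlapping Data (Scruggs, Mastropieri, & Casto,
    1987). For a behavior targeted for increase, the percentage of
    treatment-phase points exceeding the highest baseline point; for a
    decrease target, the percentage falling below the lowest baseline point."""
    if increase:
        extreme = max(baseline)
        nonoverlap = sum(1 for x in treatment if x > extreme)
    else:
        extreme = min(baseline)
        nonoverlap = sum(1 for x in treatment if x < extreme)
    return 100.0 * nonoverlap / len(treatment)

# One unusually high baseline point (9) caps the score at 80%,
# even though the treatment phase is clearly elevated overall.
print(pnd([2, 3, 9, 2], [8, 10, 11, 12, 10]))  # 80.0
```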

Carr and colleagues (2014) investigated the suitability of SCD data for treatment effect calculations using examples drawn from peer-reviewed published behavior therapy research specifically for individuals diagnosed on the spectrum. Self-management was selected to represent an established treatment, and exercise to represent an emerging treatment, as described in the National Standards Report (2009). The aim of their research was twofold: first to determine if the data may be suitable for a more complex regression-based treatment effect calculation; and second to compare three calculation methods that can be performed by hand. The logic behind this approach was that calculations that do not require extensive training or additional software applications to perform may mean that teachers or clinicians in underfunded communities, or remote locations, could access and interpret treatment reports with greater ease.

One significant finding from their study was that relatively short data series are being collected by researchers, with recent studies reporting fewer data points than older studies. Behavioral challenges were described for many participants, and the authors argued that collecting a greater number of data points presents researchers with a significant ethical dilemma. Accordingly, the authors reported that in future, SCD data for participants with ASD may not be well suited to complex treatment effect calculations that require larger volumes of data points.

Percentage of All Nonoverlapping Data (PAND) (Parker, Hagan-Burke, & Vannest, 2007) and Nonoverlap of All Pairs (NAP) (Parker & Vannest, 2009) were subsequently compared to PND. Although PAND has received favorable feedback in the broader educational psychology literature and has recently appeared in several published systematic reviews conducted with students with disabilities, Carr and colleagues reported that PAND was suitable for only 23 of the 38 studies. Mean treatment effect scores for the self-management intervention data were PND 78.8%, PAND 92.7% and NAP 93.2%. Interpreting the derived scores is currently not straightforward. Using the scales developed by the original authors of each method, Carr and colleagues reported that PND described self-management interventions as an effective treatment (the second highest category), and NAP as a strong treatment (the highest category). PAND was omitted from their comparison, as no interpretation scale was proposed by its original authors.

Also of concern, the tentative interpretation scale developed for NAP uses bandings that are inconsistent with those of the widely employed PND metric. Carr and colleagues reported that until these issues are further researched, interpretation of newer calculation methods should be treated cautiously. In particular, they found that newer methods may report a greater strength of treatment effect than PND. Their report stressed that this may be misleading, as readers may perceive studies as more effective when a newer treatment effect score is adopted, and argued that this in turn may contribute to false expectations on the part of treatment providers or families.


For further information please contact Monica E. Carr, Doctoral Researcher, Monash University, Australia, at mebar4@student.monash.edu.


Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1(1), 91–7. doi:10.1901/jaba.1968.1-91

Carr, M. E., Anderson, A., Moore, D. W., & Evans, W. H. (2014). How should we determine treatment effectiveness with single-case design research for participants with autism spectrum disorder? Review Journal of Autism and Developmental Disorders. doi:10.1007/s40489-014-0030-9

Carr, M. E. (2014). A sensitivity analysis of three nonparametric treatment effect scores for single-case research for participants with autism. Review Journal of Autism and Developmental Disorders. doi:10.1007/s40489-014-0037-2

CDC MMWR. (2014). Prevalence of autism spectrum disorder among children aged 8 years – Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2010. Morbidity and Mortality Weekly Report: Surveillance Summaries, 63(2), 1–21. Retrieved August 1, 2014, from http://www.ncbi.nlm.nih.gov/pubmed/24670961

FDA. (2014). Beware of false or misleading claims for treating autism. Consumer Health Information, 1–2. Retrieved December 1, 2014, from http://www.fda.gov/downloads/forconsumers/consumerupdates/ucm394800.pdf

Geller, L., & Greenberg, M. (2010). Managing the Transition Process From High School to College and Beyond: Challenges for Individuals, Families, and Society. Social Work in Mental Health, 8(1), 92–116. doi:10.1080/15332980902932466

Horner, R. H., Swaminathan, H., Sugai, G., & Smolkowski, K. (2012). Considerations for the Systematic Analysis and Use of Single-Case Research. Education and Treatment of Children, 35(2), 269–290. doi:10.1353/etc.2012.0011

Howlin, P. (2008). Redressing the balance in autism research. Nature Clinical Practice Neurology, 4(8), 407. doi:10.1038/ncpneuro0860

Kratochwill, T. R., Hitchcock, J. H., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2013). What Works Clearinghouse Procedures and Standards Handbook (Version 3.0).

Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55(1), 3–9.

Maggin, D. M., O’Keeffe, B. V., & Johnson, A. H. (2011). A quantitative synthesis of methodology in the meta-analysis of single-subject research for students with disabilities: 1985–2009. Exceptionality, 19(2), 109–135. doi:10.1080/09362835.2011.565725

National Standards Report. (2009). National Autism Center. National Autism Center, Randolph MA.

Parker, R. I., Hagan-Burke, S., & Vannest, K. (2007). Percentage of all non-overlapping data (PAND): An alternative to PND. The Journal of Special Education, 40, 194–204. doi:10.1177/00224669070400040101

Parker, R. I., & Vannest, K. (2009). An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy, 40, 357–367. doi:10.1016/j.beth.2008.10.006

Roth, M. E., Gillis, J. M., & DiGennaro Reed, F. D. (2013). A Meta-Analysis of Behavioral Interventions for Adolescents and Adults with Autism Spectrum Disorders. Journal of Behavioral Education, 23(2), 258–286. doi:10.1007/s10864-013-9189-x

Scruggs, T. E., Mastropieri, M. A., & Casto, G. (1987). The quantitative synthesis of single-subject research: Methodology and validation. Remedial and Special Education, 8(2), 24–33. doi:10.1177/074193258700800206
