This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.

Psychoanalytic Psychotherapy
Vol. 24, No. 1, March 2010, 22–43

The changing shape of clinical practice: Driven by science

Freud Memorial Professor of Psychoanalysis, University College London, and Chief Executive, The Anna Freud Centre, London

Few would dispute that psychoanalytic practice has changed considerably over the last 50 years. The differences are at least three-fold: (a) a bimodal distribution of treatment lengths (some very long and some very short); (b) patients with personality pathology in addition to axis I symptoms; and (c) technical changes with a relational flavour, displaying influences from developmental theories including attachment theory. Are any of these changes associated with scientific research and, if so, how? A second question concerns the research base of psychoanalysis. If we take the discoveries of the last 25 years seriously, what kind of therapy might we recommend to our patients? Would it look anything like standard practice? I will attempt to pose some serious questions about the status of clinical practice, asking why it appears to be protected from scientific advances yet readily responsive to pragmatic considerations such as third party payments.
To make the argument easier to absorb I will try to tell it as a number of short heroic tales, each with its own beginning, middle and end, complete with the moral lessons to be learned.
ISSN 0266-8734 print/ISSN 1474-9734 online
© 2010 The Association for Psychoanalytic Psychotherapy in the NHS
DOI: 10.1080/02668731003590139
Downloaded by [DAVID LOPEZ] at 13:08 05 September 2013

The evidence base of psychotherapy in general

I will begin with the story of the evidence base of psychotherapy in general. This story began over half a century ago when an evil genius, a certain Dr Hans Eysenck (1952), disturbed the tranquility in the Kingdom of Psychotherapy established by a benign powerful ruler (Sigmund Freud, the Wise). This evil genius made the claim that psychotherapy worked no better than the facility nature gave all of us to recover from psychological disturbance by a process of spontaneous healing. Up to that time, all the Templar Knights of psychotherapy, who were both fearsome warriors and devout monks, could righteously believe that the people they cured recovered because of their magic spells and carefully measured potions, which often took decades of apprenticeship to learn to brew with confidence. This evil genius claimed that Sleeping Beauty woke up simply because she had slept long enough, and that her awakening had nothing to do with the magical time she spent on the couch.
Fortunately, in raising this monster, he disturbed a nest of psychologists, who saw an opportunity to create a new hive for their worker bees (also called graduate students) and they called this psychotherapy research. The researcher bees and their bee-keepers went on to sample hundreds of thousands of therapy sessions and collected together a number of papers with magical powers, called an ‘evidence base’, which showed that the magic of psychotherapy was indeed magical.
In the ensuing half a century the bee-keepers and the researcher bees (the bee-keepers actually looked more like Sancho Panza, and the Templar Knights a little like Don Quixote) demonstrated that the average effect size of psychotherapy was 0.8 across probably over 1000 studies (Wampold, 2001).
Effect size (ES) is researcher bee language for the likelihood that a person treated with psychotherapy magic would be better off than a person in the control group if both were chosen at random (Cohen, 1962). Using ESs the bees can collect together the nectar (the results) of hundreds of flowers (studies) and calculate the average ES for a treatment. Tradition has it that an effect size of less than 0.3 is too close to the 50% mark (i.e. too close to chance) to call it a significant effect.
From 0.3 (where 58% of those treated are better off than untreated controls) to 0.6 (where two-thirds are better off) the effect is considered to be weak. From 0.6 to 0.9 (where three-quarters of those treated are better than untreated controls) effect sizes are thought of as medium and beyond 0.9 we are in the land of strong effect sizes. You should note that effect sizes are sometimes used to describe the change between the beginning and the end of a trial (pre-post effect size) and the difference between a treated and an untreated group at termination (between group effect size). Pre-post effect sizes combine the effect of treatment with the monster that Eysenck raised – spontaneous remission. One final note about the language of these researcher bees – in another of the dialects spoken in this evidence-based magical beehive, differences are expressed in terms of the number of individuals who need to be treated before we come upon a person who unequivocally would not have got better on their own without the treatment (Laupacis, Sackett, & Roberts, 1988).
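The percentages attached to each effect size can be reproduced with a short calculation. The sketch below is an illustration only: it assumes the effect size is a standardized mean difference (Cohen's d) and that outcomes in both groups are normally distributed with equal variance, in which case the chance that a randomly chosen treated person is better off than a randomly chosen control is Φ(d/√2). The function names are hypothetical and do not come from the original text.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probability_of_superiority(d: float) -> float:
    """Chance that a random treated person outscores a random control.

    Assumes normal outcomes with equal variance in both groups, so the
    difference between two random draws has standard deviation sqrt(2).
    """
    return normal_cdf(d / math.sqrt(2.0))


if __name__ == "__main__":
    # Effect sizes mentioned in the text.
    for d in (0.3, 0.6, 0.8, 0.9):
        print(f"d = {d:.1f} -> {probability_of_superiority(d):.1%} better off")
```

Under these assumptions the values for 0.3, 0.6 and 0.9 come out at roughly 58%, 66% and 74%, in line with the 58%, two-thirds and three-quarters quoted in the text.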
So, 0.8 as a psychotherapy versus no therapy effect size is big – in a moderate sort of way (Wampold, Imel, & Minami, 2007). It means that nearly three-quarters of patients who have psychotherapy are better off than those left to recover by themselves. The Number Needed to Treat (NNT) is 3. As a marker, for those of you taking aspirin as a prophylaxis for heart attacks, the NNT before you see one unequivocal case who benefited is 129. The effect size for Psychotherapy Magic is superior to almost all interventions in cardiology, geriatric medicine, asthma, flu vaccines, and cataract surgery. Psychotherapy is mostly as effective as psychoactive medication and there is evidence that additional benefit accrues from combining the magics in many contexts (Cuijpers, van Straten, van Oppen, & Andersson, 2008; Maina, Rosso, & Bogetto, 2009).
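The text does not say how the NNT of 3 was derived from the effect size of 0.8. One commonly used conversion, usually attributed to Kraemer and Kupfer, takes the probability of superiority P = Φ(d/√2) and sets NNT = 1/(2P − 1). The sketch below applies that conversion as an assumption, again supposing normally distributed outcomes with equal variance; the function names are hypothetical.

```python
import math


def probability_of_superiority(d: float) -> float:
    """Phi(d / sqrt(2)), written with erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))


def nnt_from_effect_size(d: float) -> float:
    """Approximate NNT from Cohen's d via NNT = 1 / (2 * P - 1).

    This conversion is an assumption, not a method stated in the article.
    It is undefined for d = 0, where treatment and control do not differ.
    """
    p = probability_of_superiority(d)
    return 1.0 / (2.0 * p - 1.0)


if __name__ == "__main__":
    raw = nnt_from_effect_size(0.8)
    # NNT is conventionally rounded up to the next whole patient.
    print(f"raw NNT = {raw:.2f}, rounded up = {math.ceil(raw)}")
```

With d = 0.8 the raw value is a little over 2.3, which rounds up to the 3 quoted in the text.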
So this story appears to have a happy ending, but like most of the stories that you will hear from me, and unlike real fairy stories, the ending is also curious and complex. Effect sizes are inherently ambiguous. A magical treatment could attain a moderate effect size by producing a very large effect for a small subset of patients or achieving a moderate but incomplete reduction in symptoms for many.
To clarify this, the researcher bees have collected data on the percentage recovered or the percentage improved, both judged by some predefined criterion. Two of the busiest bee-keepers of psychotherapy research (Westen & Bradley, 2005) have provided us with an impressive meta-analysis of six disorders: Major Depressive Disorder (MDD); panic disorder; Generalized Anxiety Disorder (GAD); Obsessive Compulsive Disorder (OCD); bulimia nervosa; and Post-traumatic Stress Disorder (PTSD), based on percentage of improvers. They tell a sobering tale. While across disorders, treatment versus control effect sizes tend to be moderate or large, on average only roughly half of patients who complete treatments in these trials improve substantially. The figures are even more sobering when we compute the percentage of those who improved on the basis of all those who entered treatment. On average, only 33% of those who entered treatment for depression can expect to improve significantly based on these figures. A further surprise awaits us when we look at the percentage who remain improved at follow-up. Less than one-third of those who complete treatment for bulimia remain improved. However, the figures for anxiety conditions (panic and GAD) tend to be better.
Not surprisingly, improvement rates relate to severity and treatment duration (Kopta, Lueger, Saunders, & Howard, 1999). On average, acute distress improves in three-quarters of cases within 25 sessions. However, chronic disorders, defined in various ways, appear to require longer term treatment. There is a less than 60% improvement rate after 25 sessions. The situation is even worse for those who receive what we might refer to as a complex disorder diagnosis, that is, three or more diagnoses or diagnosis of a personality disorder. Here improvement rates are less than 40% after 25 sessions. There is even some indication, good magic turning bad, of inadvertent harm being done to patients with some personality disorders if they are offered time-limited treatment, so that they end up worse off than when they started (Tyrer et al., 2004). The most recent German work from the bees in Horst Kaechele’s hive, a champion bee-keeper of psychotherapy research, carefully following a group of patients with private health care insurance, reported that about one and a half years of treatment was required before the average patient achieved an acceptable level of improvement (Puschner, Kraft, Kachele, & Kordy, 2007).
Incidentally, this work found no evidence for the much cited exponential rate of improvement originally demonstrated by Ken Howard (Howard, Kopta, Krause, & Orlinsky, 1986). The relationship of improvement and time is best illustrated by a straight line.
So what are the morals of the tale of psychotherapy in general? Certainly there is good evidence for the efficacy of psychological therapies. Over 1000 studies demonstrate that in relation to major mental health conditions they can achieve significant symptom reductions and in some cases, particularly with anxiety disorders, freedom from symptoms. The magic spell and potion, when appropriately administered, can improve social adjustment and work relationships. However, there are many different potions and some commonly used ones have been less well researched than others. There is more evidence for the magic formula brewed by the knights of cognitive behaviour therapy than the potion generated in the cauldron of psychodynamic psychotherapy. Of course, in practice most of those who brew these powerful mixtures combine several different recipes, and currently there is very little evidence for eclectic brews like these. By contrast, researcher bees have been busy in relation to some little-used concoctions, such as interpersonal psychotherapy. There are also many diagnoses for which very little data has been collected, including eating disorders, bipolar disorder, and certain personality disorders such as avoidant PD.
The story of long-term psychodynamic psychotherapy (LTPP)

The next story is less epic and more recent than the previous one. As with many modern stories, it is fraught with controversies and possibly insoluble problems.
The bee-keepers and researcher bees struggle to collect information about longer-term treatments to bring back to our collective knowledge base. They found it hard to solve the problem of randomization when this involved getting agreement to go without the preferred treatment for 18 months or more. Finding an appropriate control group was itself a challenge, as was the blinding of assessors, and the creating of manuals to guide work over a lengthy time period and ensure that what took place was what was described (treatment integrity). Yet for psychoanalytic clinicians this is the Holy Grail. It is long-term treatment that we have been trained to do and that we wish to make claims for.
It is nothing short of miraculous that enough data have been collected for two prestigious systematic reviews to have been published, one by Falk Leichsenring in the Journal of the American Medical Association (Leichsenring & Rabung, 2008) and another by Saskia de Maat and her colleagues in the Harvard Review of Psychiatry (de Maat, de Jonghe, Schoevers, & Dekker, 2009). The latter review collected together 27 studies where the impact of long-term therapy on symptom reduction was measured, and/or information on personality changes was collected. The unmet challenge of the control groups meant that effect sizes were pre-post, not between-group. Nevertheless, the studies covered the treatment of over 5000 patients. The effect sizes of outcome measures combined were between 0.8 and 1 and tended, if anything, to slightly increase on follow-up and were somewhat bigger for psychoanalysis than psychotherapy. The percentage success rate on symptoms was around 70% based on clinicians’ opinion and between 60 and 70% for patient self-report, when success was defined as at least moderate improvement.
The Leichsenring meta-analysis was very ambitious and identified 23 studies.
The studies concerned difficult problems, but pre-post effect sizes were consistently large. For chronic problems, the effect size is 0.87–2.45, for complex depression and anxiety it is 0.97–1.94, multiple problems 0.94–1.84, and personality disorder 0.82–1.65. Controversially, the authors contrasted these effects with those normally obtained for similar client groups in short-term therapy and found a significant superiority for long-term treatment. The size of effects varied according to type of measure, with the largest effect sizes consistently obtained on target problems, and social and personality functioning coming some way behind. However, the effects were consistently positive, with the confidence interval around the effect sizes comfortably above the line of insignificance, the dreaded zero line.
As you might imagine, not all in the Land of Psychotherapy Research were happy on hearing this news (Beck & Bhar, 2009; Glass, 2008; Kriston, Holzel, & Harter, 2009; Roepke & Renneberg, 2009; Thombs, Bassel, & Jewett, 2009) and the ensuing correspondence had the consistent theme of debunking positive evidence for long-term treatment. The details should perhaps be consigned to the history books, but the points raised address both the nature of the original studies (the review was based on studies with small samples with a likely bias towards the positive, treating a wide range of disorders with poorly specified control groups, and uncontrolled for contact and the structure of treatment) and the methodology of the review (conflating within-group and between-group effect sizes, selective inclusion and exclusion of studies, etc.).
Our bards put up a spirited defence over several pages of the journal (Leichsenring & Rabung, 2009), but the substance of the attacks was hard to deny. Many of the studies reviewed were in effect uncontrolled and heterogeneous and it is hard to feel confident that the original critique by the Evil Genius, Hans Eysenck, was adequately addressed by the original collection of reports. So the bards returned to their crypts and wrote a new ballad which is not yet published, and therefore I am only at liberty to share the refrain. They identified ten controlled studies of long-term psychoanalytic psychotherapy versus other types of treatment (Bachar, Latzer, Kreitler, & Berry, 1999; Bateman & Fonagy, 1999; Bateman & Fonagy, in press; Clarkin, Levy, Lenzenweger, & Kernberg, 2007; Dare, Eisler, Russell, Treasure, & Dodge, 2001; Gregory et al., 2008; Huber, Denscherz, Gastner, Henrich, & Klug, submitted; Korner, Gerull, Meares, & Stevenson, 2006; Svartberg, Stiles, & Seltzer, 2004) where these treatments were used in the treatment of complex disorders, chiefly personality disorders (7), eating disorders (2) and depression (1). The comparisons are with CBT (Cognitive Behaviour Therapy), DBT (Dialectical Behaviour Therapy), CAT (Cognitive Analytic Therapy), SCM (Structured Clinical Management), and TAU (Treatment as Usual). The treatments lasted on average 70 weeks offering 120 sessions. The comparison treatments lasted about the same time, although offered fewer sessions. The findings were similar to the previous analysis. The average between-group effect size was 0.67, somewhat larger for target problems, 0.88, than for general psychiatric symptoms, 0.54. Medium effect size differentiated the comparison groups from LTPP in terms of personality and social function.
Why are these findings of enormous importance? This is the first set of strong signals which suggests that long-term psychodynamic psychotherapy is superior to less intensive treatments when directed towards complex mental disorders. No doubt when these findings are published there will be a chorus of complaints concerning the original studies and the methodology of the review; but should this surprise us? We live in a competitive world. What is good for psychodynamic therapy is often felt to disadvantage other orientations. Collectively, we have expressed dissatisfaction on many occasions about similar issues in relation to trials of CBT.
Nevertheless, before concluding, let us once again draw the morals from our tale. First, while it is reassuring and helpful that LTPP knights struggle for longer and ultimately more effectively than the knights with other crests on their shields, we do not know how well the other knights would do if their rules of combat permitted them to battle as intensely and at as close a range as our knights did.
Second, the knights indeed showed their valour in these most ardent of trials rescuing suicidal self-harming damsels in distress (both Borderline Personality Disorder (BPD) and eating disorder samples are 80% female). How common are such challenges in our consulting rooms where we take the same amount of time, and sometimes longer, to rescue individuals of both genders, who perhaps at least superficially are in less distress? Is the superiority of our knights still evident, and if so in what respect? The study of private insurance cases I mentioned earlier (Puschner et al., 2007) would suggest caution. When psychoanalytic psychotherapy and psychodynamic psychotherapy were tracked over two years in 480 patients no significant differences were seen in the rate or extent of decline between these two groups. So, what is the moral? We need more trials with a broader set of outcome measures.
Before finishing my story about LTPP, let me make three further observations; first on the naturalistic follow-up of patients treated by members of this organization in long-term therapy or psychoanalysis (Beutel, Rasting, Stuhr, Ruger, & Leuzinger-Bohleber, 2004; Leuzinger-Bohleber, Stuhr, Ruger, & Beutel, 2003). It was an interesting and carefully constructed study, not so much for the findings but because the Templar Knights who offered the treatment worked alongside the researcher bees and used qualitative as well as quantitative methods to show how the patients experienced their treatment. The vast majority of patients valued their experiences and the impact of treatment could be measured in terms of healthcare costs. The most ambitious study of psychoanalytic psychotherapy comes from Helsinki (Knekt et al., 2008).
It contrasted Solution Focused Therapy and psychodynamic psychotherapy with long-term psychodynamic psychotherapy for patients with mixed depression and anxiety problems. These bee-keepers and research bees worked extremely hard and pursued their patients relentlessly over three years. This was just as well because significant benefit from long-term treatment was not found at 18 months, nor even at 24 months but only at 36 months. In fact our knights, who had been trained to do battle for years rather than months, took their time to achieve success and were frankly struggling to keep up with the progress of their short-term colleagues over the first year of the trial, but of course it could be said that the patients who selected themselves for long-term treatment were the tougher customers. The third study is probably the best study of LTPP carried out so far.
The Munich study of Dorothea Huber and Gunter Klug (Huber et al., submitted) is remarkable for being the only one to use a specifically psychoanalytically-oriented outcome measure, Wallerstein’s Scales of Psychological Functioning (Klug & Huber, 2009). The instrument is based on psychoanalytic experts’ definitions and includes 17 dimensions, each divided into two sub-dimensions, (a) exaggerated, and (b) inhibited functioning. So impulse regulation may be pathological because of over-indulgence or over-inhibition. You could finally see the benefits of psychoanalysis on this measure, but even here, convincingly only after a 1-year wait.
So, it seems to me that in looking at intensive long-term treatment we are probably measuring the wrong things and not for long enough.
The current story of psychotherapy for Borderline Personality Disorder (BPD): An example

Before ending this part of my story and while I still have your attention, I would like to tell you one of my own tales. Indulge me please, not only because it is a story where I played the part of a bee-keeper but because it illustrates a number of the issues of evidence that we as knights of psychodynamic psychotherapy are having to confront.
Now, there has been many a trial with patients known as borderline and many a potion has been tried and shown to be effective in randomized controlled trials.
These include Dialectical Behaviour Therapy (DBT) (Linehan et al., 2006); Transference-Focused Psychotherapy (TFP) (Clarkin et al., 2007); Schema-Focused Therapy (SFT) (Giesen-Bloo et al., 2006); Cognitive Behavioural Therapy (CBT) (Cottraux et al., 2009); Cognitive Analytic Therapy (CAT) (Chanen et al., 2008); Dynamic Deconstructive Psychotherapy (DDP) (Gregory et al., 2008); Systems Training for Emotional Predictability and Problem Solving (STEPPS) (STP) (Blum et al., 2008); and Mentalization-Based Treatment (MBT) (Bateman & Fonagy, 2008). It should be clear to you that as long as the acronym for the magic formula has three letters in it, it will be shown to be superior, mostly against treatment as usual or a less than adequate comparison. All these treatments are specialist interventions requiring extensive training and continuous supervision.
So, when Antony Bateman and I started out on our quest (Bateman & Fonagy, in press), we made solemn oaths that we were not going to return unless we found a comparison that was more meaningful to practitioners and third party payers than the comparisons we found in the journals kept by the High Priests of evidence-based practice. Our solemn oath committed us to find superiority compared to a structured treatment organized in a coherent treatment programme with equivalent supervision, when both treatments were delivered by knights trained to the same level but without family crests committing them to one or other side. We also swore to collect a clinically representative sample of damsels and gentlemen with confirmed diagnosis of BPD and at high risk of suicide.
We swore not to stop until we reached adequate statistical power to detect even relatively small differences.
Our quest concerned the trial of Mentalization-based Treatment (Bateman & Fonagy, 2006), a potion we ourselves concocted containing special ingredients to help patients to understand their mental states more clearly, both simple and complex, and to assist them to regulate their emotions, focus their attention (effortful control) and conceptualize their own actions and the actions of others in terms of thoughts, feelings, wishes, beliefs and desires. MBT is not concocted to make people ‘see in the dark’ (in other words it is not aimed at unconscious insight) but rather to help them to make better psychological sense of preconscious aspects of their interpersonal (especially their attachment) relationships, including their relationship with their therapist.
The trial was open to women or men who made a suicide attempt or a life-threatening act of self-harm within the last six months and who carried a confirmed label of Borderline Personality Disorder (BPD). Our commitment was to treat all comers to ensure that our potion worked for all, except those with organic brain disorder, psychotic disorder or opiate dependence. We assessed patients at admission, six months, 12 months and 18 months. The programme lasted 18 months and included an individual and a group therapy session once per week.
Now, in order to ensure that it was our potion that was responsible for the healing and not our special knights with particular family crests, we co-opted 11 apprentice knights who were randomly assigned to be trained to deliver outpatient MBT or outpatient structured clinical management (SCM). Structured clinical management is what a good psychiatric department should offer these patients in a coherent and ordered manner. All therapists had a minimum of two years’ experience of treating patients in general psychiatric services following completion of their general psychiatric training and at least one year’s experience of treating patients with BPD. Their average psychiatric experience was well over six years. Amongst our knights were seven nurses (four MBT and three SCM), three trainee psychiatrists (two MBT and one SCM) and one accredited counsellor (SCM). We trained both the MBT and SCM therapists, and both trainings were carefully designed to be of roughly equal duration, three days basic training and two days advanced training for MBT and comparable basic training on personality disorders and generic supportive psychotherapeutic techniques for SCM. Supervision was offered on a weekly basis for the knights in both arms of the trial.
SCM included many of the ingredients of MBT, including support and structure, challenging of self-destructive acts, and crisis management, but in addition it included advocacy, social support work, problem-solving and regular medication reviews. Knights trained in the MBT methods were taught how to enhance basic mentalizing skills, how to offer mentalizing interpretations, how to mentalize the transference, how to deal with crises, emotional storms, self-harming behaviour and suicide attempts by addressing these experiences as failures of mentalizing triggered by a variety of interpersonal experiences.
The medication review for this group followed the same protocol as the SCM group, but the therapist knights were required to attempt to explore the psychological reasons behind the requests by the patients or others involved with the patients for changes in medication, in the same way as they would for all the patient’s other actions.
A total of 168 patients were screened for eligibility and 134 were randomized.
Of the patients excluded, one-third did not attend interviews and a further one-third declined to participate. The rest either did not meet inclusion criteria, met exclusion criteria, or were uncontactable. Some 52 of 71 participants allocated to MBT completed treatment and 47 out of 63 allocated to SCM did so. There was minimal missing data and all allocated patients were included in the analysis.
The primary outcome was the proportion of each group without severe parasuicidal behaviour as indicated by suicide attempts, life-threatening self-harm or hospital admission in the previous six months. The randomization was successful and the groups did not differ in key characteristics. This was a fairly complex group of patients with about 3 Axis I diagnoses and 2.5 Axis II diagnoses in addition to BPD. By 18 months 57% of the SCM group but only 27% of the MBT group had shown severe parasuicidal behaviour, which corresponds to a relative risk reduction of 0.46 and an NNT of 3. Breaking this down into suicide attempts, life-threatening self-harm and hospital admission separately, we find significant benefits for suicide (NNT of 4), hospitalization (NNT of 6) and a somewhat less convincing effect for self-harm (NNT of 5 but CI of 2–32). At the end of the 18 months over 70% in the MBT group but only 45% in the SCM group were no longer taking psychoactive medication.
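The link between the two event rates and the number needed to treat can be checked with simple arithmetic. The sketch below is illustrative only: it starts from the rounded percentages quoted in the text (57% and 27%) rather than the trial's patient-level counts, so it will not necessarily match the published statistics exactly, and the function names are hypothetical.

```python
def absolute_risk_reduction(control_rate: float, treated_rate: float) -> float:
    """Difference in event rates between the comparison and treated groups."""
    return control_rate - treated_rate


def number_needed_to_treat(control_rate: float, treated_rate: float) -> float:
    """NNT is the reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(control_rate, treated_rate)
    if arr <= 0:
        raise ValueError("NNT is only defined when the treated rate is lower")
    return 1.0 / arr


if __name__ == "__main__":
    scm_rate = 0.57  # severe parasuicidal behaviour by 18 months, SCM (rounded)
    mbt_rate = 0.27  # severe parasuicidal behaviour by 18 months, MBT (rounded)
    arr = absolute_risk_reduction(scm_rate, mbt_rate)
    nnt = number_needed_to_treat(scm_rate, mbt_rate)
    print(f"absolute risk reduction = {arr:.2f}, NNT = {nnt:.1f}")
```

On these rounded inputs the absolute risk reduction is 0.30 and the NNT is about 3.3, close to the NNT of 3 reported in the text.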
I don’t have time to recount all the results but self-report measures of symptom severity, depression, social adjustment and interpersonal problems as well as blind ratings of general adaptive functioning (GAF) scores suggested that MBT benefited patients substantially more than SCM.
However, an important moral of this tale for us was less how well the knights wearing MBT colours did and more how well both groups did on almost all the measures. While our treatment may have worked better, both structured, integrated and focused treatment protocols achieved better results than we might expect from spontaneous remissions of symptoms from follow-along studies.
So, while the trial supported MBT, it also supported structured treatment approaches in general. The general utility of focusing on the patient’s mind is supported and the trial strongly contraindicated the adoption of non-focused generic approaches and the premature exclusive adaptation of any one model.
We shall return to this theme later on.
The story of the demise of the randomized controlled trial

I think it is now time to recount the stories of the sect of zealots who have tried to rule the kingdom of psychotherapy over the last couple of decades (Sackett, Richardson, Rosenberg, & Haynes, 2000; Strauss, Richardson, Glasziou, & Haynes, 2005). They wear many crests on their shields, but all swear allegiance to evidence-based medicine as ‘the conscientious explicit and judicious use of current best evidence in making decisions about the care of individual patients’ (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996).
Their slogans are motivational, persuasive and essentialist rather than reportive, stipulative and operational.
The initiative is basically synonymous with attempts experimentally to establish a causal relationship between treatment and outcome. Hence their most highly valued strategy is the randomized controlled trial. Now, a strange thing happened last year. A high priest of evidence-based medicine, none other than Sir Michael Rawlins, the Director of the National Institute of Health and Clinical Excellence (NICE), gave a Harveian Oration at the Royal College of Physicians about the over-valuation of randomized controlled trials in evidence-based medicine (Rawlins, 2008). He pointed out that frequently such trials are inappropriate. There can be bioethical and legal problems in randomizing people to ineffective or harmful treatments. Some conditions are so rare and some treatment effects so massive that few would consider Randomized Controlled Trials (RCTs) to be sensible. For example, parachutes are very widely used despite the fact that they have not been subjected to RCTs. He also pointed out that the null hypothesis of RCTs was there often for show rather than genuine conviction. It was certainly inappropriate where previous studies had already shown an effect and statistically and conceptually difficult if the aim of the study was to show no difference between treatment arms. Rawlins also identified some challenges in relation to applying theories of probability in RCTs.
Many studies use multiple variables to measure outcome, and this has massive potential to create confusion. With Anna Higgitt we have made a study of reviews of the value of Selective Serotonin Reuptake Inhibitors (SSRIs) for paediatric depression (Fonagy & Higgitt, 2009). There are only about 15 RCTs but nearly 100 reviews have been published since 2005. These review the same investigations but come to dramatically different conclusions by focusing on different aspects of the results reported in the original investigations.
The conclusions vary anywhere between ‘ban all SSRIs’ to ‘use SSRIs as first line of treatment’ with various gradations in between. (‘Ban some SSRIs’, ‘use only one in first line treatment’, ‘use as second line treatment after psychological therapy has failed’, etc.)

Sir Michael went on to revisit the issue most frequently alluded to in considering the appropriateness of RCTs in the Kingdom of Psychotherapy: how generalizable are the results of RCTs? Certainly, the settings in which psychotherapy RCTs take place are quite different from the real clinical situation (La Greca, Silverman, & Lochman, 2009; Weiss, Guidi, & Fava, 2009). Most are undertaken in selected populations for a finite time, rather than a heterogeneous population with many life problems and co-morbidities, which might exclude them or they would exclude themselves. Certainly patients in trials and ‘real life’ differ in age, gender, severity, risk factors, co-morbidities, ethnicity, and socio-economic status. As if that was not bad enough, the treatment given in psychotherapy RCTs rarely fits clinical reality in terms of dose (frequency of therapy), timing of administration, duration of therapy, inter-current treatments, the skills and commitment of the practitioners. There are major differences in the setting, the way the treaters are reimbursed, and their professional priorities – publication versus clinical care. There is a real question about whether the assessment of benefit obtained from a trial can be applied to ordinary clinical settings.
Sir Michael also drew attention to the assessment of harm. RCTs are appalling at testing for the possibility of harm (Lilienfeld, 2007; Roback, 2000).
Jefferys, Leakey, Lewis, Payne, & Rawlins (1998) noted that, surveying drugs introduced between 1972 and 1994, they could find 22 that were withdrawn for safety reasons but only one withdrawn for lack of efficacy. This is hardly surprising, since RCTs are rarely powered to scrutinize adverse events. The controversy over SSRIs and paediatric suicide is a good example. None of the placebo-controlled trials are large enough to show a significant difference in rates of spontaneous adverse events (SAEs). When taken together, they showed such events to be twice as frequent amongst children and adolescents taking active medication rather than placebo, but then those advocating for the use of these drugs argue that the study designs and samples are too heterogeneous to permit that kind of integration.
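To give a feel for why individual trials struggle here, the following is a minimal sketch of a textbook sample-size calculation for comparing two proportions (normal approximation). The event rates of 2% and 4% are hypothetical and chosen only to represent a doubling of a rare event; they are not taken from any of the trials discussed, and the function name is an illustrative assumption.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_active, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference between
    two event rates (two-sided test, normal approximation, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_active * (1 - p_active)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_active - p_control) ** 2)


# Hypothetical rates: an adverse event seen in 2% on placebo and 4% on drug.
required = n_per_arm(0.02, 0.04)
print(f"Patients needed per arm: {required}")
```

Under these assumptions the answer is well over a thousand patients per arm, which is far more than the 334 young people randomized across all arms of the adolescent depression trial discussed later in this paper.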
So perhaps, as Rawlins concluded, the knights of RCTs have been inappropriately elevated above those conducting observational studies.
The hierarchy is illusory. Both kinds of study have advantages and disadvantages.
In making decisions about cost-effective treatments against a background of limited resources, we need to appraise all the evidence and exercise judgement.
RCTs are enormously costly: 153 pharmaceutical RCTs had median costs of £3,200,000, with an interquartile range of £2 m to £6.25 m.
RCTs are not only resource-intensive in terms of money – they also take up considerable time and energy. Perhaps it is for this reason that their reporting is so often troubled by bias, aiming to emphasize the differences and be relatively silent about null findings. Take just one recent example on therapy for treatment-resistant depression in adolescents (Brent et al., 2008). Reading the abstract of this study we learn that when young people have not responded to at least one SSRI, 40% are likely to improve if offered a medication switch, but they will improve even more if this switch is accompanied by CBT. The response rate increase is 15%. It makes no difference if they are switched to another SSRI or venlafaxine, but adding a psychological treatment clearly improves response rate. I don’t want to bother you with the detail of this very complicated study other than to say that a disturbingly large number of young people withdrew from each arm of the treatment (out of 334 randomized, 102 – almost one-third – withdrew from the treatment, a significant number (41) because of adverse events). This raises real concerns about the acceptability of any of the treatment arms. Looking at responders, it seems clear that adding CBT improved response rates, although this seems less clear when we look at only those young people who completed the treatment trial (i.e. had the full benefit of CBT).
The story here is about something else, far more serious in relation to Evidence-based Medicine (EBM). It is about three figures which I drew on the basis of numbers reported in very large, hard to access tables in the journal. These concerned children’s self-report of depression, suicidal ideation and the global assessment of independent observers. I challenge you to see any difference whatsoever between the severity of depression, suicidal ideation or even the probably not so blind assessors’ rating of global functioning. The conclusion reported in the Journal of the American Medical Association (JAMA) is that the combination of CBT and a switch to another antidepressant results in ‘a higher rate of clinical response than did medication switch alone’. In the body of the paper, which is all about the value of CBT for this group, there is a telling sentence: ‘There were no differential treatment effects on scalar measures of depression, suicidal ideation, and functioning, nor were there treatment effects on suicide attempts, or self-harm-related measures’ (Brent et al., 2008, p. 909).
I could bring many other similar examples of biased reporting of results. They are more common in the pharmacological literature, where commercial interests are great, but they are by no means unique to it. Nor would I wish to suggest that psychodynamic researchers are any less hampered by self-serving biases than those from other orientations. Fifteen years ago one of our champion researchers, Lester Luborsky (Luborsky et al., 1999), published an entertaining paper that showed you could magically divine the conclusion of a paper on psychotherapeutic outcome just by knowing the theoretical orientation of its first author. My point here is that notwithstanding the rhetoric of evidence-based medicine, conscientious, explicit and judicious it frequently is not.
As psychoanalysts we understand how the mind has ways of bending reality to that which maximizes pleasure and minimizes psychic pain. Why should those energetically in pursuit of making conscientious, explicit and judicious use of current evidence be any different from the rest of us?
I don’t have time to tell you the heroic tale of those who forced drug companies to reveal trial data indicating the risks of SSRIs, venlafaxine, mirtazapine, etc. Let it suffice to say that when five years ago we published data including unpublished studies along with published ones on the efficacy of SSRIs for paediatric populations we found a disappointing reduction in effectiveness (Whittington et al., 2004). It took another four years before someone replicated our design for adult trial data (Turner, Matthews, Linardatos, Tell, & Rosenthal, 2008). Some 37 of 38 studies viewed by the FDA as having positive results were published. Out of 34 studies viewed by the FDA as having negative or questionable results, 22 were not published and 11 were published in a way that implied a positive outcome. Conscientious? Explicit? Judicious?
So, let’s turn over a metaphoric page. The moral has to be: ‘read carefully, don’t believe everything you do read and look for things that could and should be there’.
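The publication counts quoted above from the adult antidepressant trial data can be turned into a rough picture of how different the published literature looks from the full set of trials. The sketch below uses the numbers exactly as given in this paper; the one assumption, flagged in the code, is that the negative or questionable studies not otherwise accounted for were published as negative.

```python
# Counts as quoted in the text above.
fda_positive = 38
fda_positive_published = 37
fda_negative = 34
negative_unpublished = 22
negative_published_as_positive = 11

# Assumption: whatever remains of the negative studies was published as negative.
negative_published_as_negative = (
    fda_negative - negative_unpublished - negative_published_as_positive
)

published_total = (
    fda_positive_published
    + negative_published_as_positive
    + negative_published_as_negative
)
apparently_positive = fda_positive_published + negative_published_as_positive

share_positive_fda = fda_positive / (fda_positive + fda_negative)
share_positive_literature = apparently_positive / published_total

print(f"Positive according to the FDA: {share_positive_fda:.0%}")
print(f"Positive according to the journals: {share_positive_literature:.0%}")
```

On these figures roughly half of the trials were positive in the regulator's eyes, while a reader of the journals alone would find nearly all of them positive.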
A new intellectual framework for psychoanalytic psychotherapy research
This brings me to the last of my tales. This is a story of looking back towards the future – the title of a brilliant paper by my colleagues Patrick Luyten and Sidney Blatt (2007).
There are a number of episodes to this story and the first concerns the question
It turns out that our brave knights and their faithful researcher companions were competing in trials where not only the rules of engagement but also the criteria for winning had been defined by the Barons of Big Pharma. The outcome measures in the field of psychotherapy are self-report measures designed to be reactive to changes which neurochemical interventions tend to bring, by and large the blunting of awareness of distress caused by symptoms. First, in many instances psychodynamic psychotherapy tries to increase awareness of distress rather than reduce it. Second, it is a devastating indictment of the entire system that there has been almost no client participation in defining outcome measures and the entire scheme is an edifice to evidence-based practice as prescribed by professionals (Dolan, Lee, King, & Metcalfe, 2009). From a professional’s standpoint, as from that of the ordinary member of the public, physical role limitation, physical function and pain have high priority, while those suffering disorders rate dignity and general wellbeing (mood, global assessment of life, having a partner, job, lots of social contact) as more important. Wellbeing should feature at least alongside, if not in place of, lists of symptoms in outcome studies (Pressman & Cohen, 2005).
Measures for the most part are arbitrary, measuring subtle psychological processes on arbitrary scales. Measures are arbitrary, but we reify them, we treat them and think of them as if they were not (Kazdin, 2006). A review of 2000 RCTs in schizophrenia identified 640 scales, many of which were devised for the particular RCT and had no supporting data for validity or reliability. Unvalidated scales were more likely to show significant treatment effects than established scales. Arbitrary or not, our measures should be neutral in relation to the nature of treatment they intend to evaluate, otherwise we might find treatments targeting the scales of measurement rather than the disease process, which of course would be a travesty. For a number of years we have had non-reactive functional brain imaging measures of outcome available. Since Eisenberger, Lieberman, & Williams (2003) demonstrated that social exclusion could activate the very same brain areas (anterior cingulate cortex, right ventral pre-frontal cortex) as the experience of physical pain, there have been literally hundreds of demonstrations of fMRI yielding accurate, sensitive information related to subjective states.
There have been 27 neuro-imaging studies of psychotherapy using a number of imaging modalities, a range of diagnoses and therapeutic approaches (Carrig, Kolden, & Strauman, 2009).
These studies have their limitations, and almost no studies provide data on changes while treatment is going on. However, with a little ingenuity from the bee-keepers and researcher bees, functional tasks could be designed which are specific not only to the disease condition but also to the hypothesized mechanism of action of the mode of therapy. Exploring the interplay of biological and psychological processes has the potential to enhance our understanding of the mode of action of psychotherapy. Multiple lines of evidence are likely to be needed to identify the mechanisms critical to particular types of intervention (Kazdin, 2008). The aim of this would not be to make psychological accounts redundant by providing a biological explanation, but rather to be more specific about what makes therapy work and to identify instances when it works well and for the long term. Biomarkers of change offer a way to dig deeper than information on symptom distress. Furthermore, basic biological research may suggest antecedents and primary pathologies that could prove to be targets for psychotherapeutic intervention. For example, neurobiological researchers have identified impulsivity as a key target for preventative intervention for problems of addiction (Lejuez et al., 2007).
This brings us to one of the key questions facing our valiant knights as they look back into the future. The system of feudal patronage without primogeniture which dominates our psychotherapeutic kingdom has led to fragmentation; once vast fiefdoms have been reduced to literally hundreds of pocket-sized plots of land. There are hundreds of therapies, many of them even possessing ‘title-deeds’ (evidence) supporting their potential to produce change, but are there really as many mechanisms for therapeutic change as there are modalities and orientations within modalities? Understanding how psychotherapy leads to change could group these together.
Are we as good at bringing about change as we think? Have all these knightly trials of therapy improved our effectiveness as therapists? Let me show you just one example. The average effect size obtained from 27 trials of cognitive behaviour therapy in youth depression in the decade leading up to 1990 was around 0.8. In the last decade of the twentieth century the average was reduced to 0.5, while for the trials in the last 10 years the average effect size was around 0.3. It should be obvious that rather than increasing in effect, the size of therapeutic benefit has been decreasing with time (Weisz, McCarty, & Valeri, 2006). Why? The obvious answer is that the Templar Knights of therapy have given the wrong instructions to their researcher bees. On their instructions they have been using increasingly stringent criteria to define a therapy that works (for example, increasingly realistic tests of effectiveness). This has led to smaller observed effects because the treatment offered was only minimally modified.
Improved research methodology will not yield more efficacious treatments.
By contrast, if we learn more about how therapeutic change comes about, we might be able to identify alternative and superior strategies that are more efficient in triggering a change process. In particular, as we look back to think about how the clinical effectiveness observed under research conditions may be effectively translated to ordinary daily practice, knowing about the mechanisms of change may help us selectively to guard features that should not be diluted while being more relaxed about others that contribute to research rigour but not therapeutic efficacy.
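To make the falling effect sizes quoted above for cognitive behaviour therapy in youth depression more tangible, here is a minimal sketch that converts each into a 'probability of superiority': the chance that a randomly chosen treated patient does better than a randomly chosen control. It assumes the effect sizes are standardized mean differences and that outcomes in both groups are normally distributed with equal variance; neither assumption is stated in the source, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def probability_of_superiority(d):
    """Chance that a random treated patient outscores a random control,
    assuming normal outcomes with equal variance in both groups."""
    return NormalDist().cdf(d / sqrt(2))


# Average effect sizes as quoted in the text above.
effect_sizes = {
    "decade leading up to 1990": 0.8,
    "last decade of the twentieth century": 0.5,
    "last 10 years": 0.3,
}

for period, d in effect_sizes.items():
    p = probability_of_superiority(d)
    print(f"{period}: effect size {d}, probability of superiority {p:.2f}")
```

Under these assumptions the advantage of a treated patient over a control shrinks from a little over seven chances in ten to a little under six in ten, where five in ten would mean no effect at all.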
Understanding mechanisms better will also help us to pinpoint moderators of treatment effectiveness. As clinicians we know that even the best approach doesn’t work for everyone. However, other than a few rules of thumb or truths passed down from our teachers, at the start of a treatment we have few ideas about what the indicators might be that suggest whether it is likely to work.
Inexperienced therapists may often be as or more effective with certain chronic patients (Brown, Lambert, Jones, & Minami, 2005) because they do not have the experience that would tell them that their effort is likely to be futile.
Finally, the effectiveness research of the past 50 years, particularly the randomized controlled trials, has shown us that psychotherapy is causal in bringing about change. However, demonstrating causation is but an illusion of explanation and a pernicious illusion at that. It gives rise to superstitious behaviour like that of Skinner’s pigeons, for whom a randomly delivered pellet reinforced whatever activity they happened to be engaged in; rather than understanding how our treatments work, we merely repeat exactly the behaviours that led to the positive observed outcomes. In the UK we are involved in a nationwide trial of multi-systemic therapy (Henggeler, Clingempeel, Brondino, & Pickrel, 2002) where enormous financial and human effort is being expended to replicate the intervention exactly as it was carried out in the USA at the Medical University of South Carolina. The care we take in the replication represents a futile and wasteful effort which is necessary only because of our ignorance of mechanisms of therapeutic action.
However, not only experimental studies but also observational studies are misleading in terms of identifying the effective components of treatments. There is the legend of the therapeutic alliance, still frequently taught in (k)night schools.
It is often claimed that the therapeutic alliance is a mediator and mechanism of therapeutic change since the stronger the alliance, the greater the change observed (Klein et al., 2003). Correlational studies also show that alliance at the beginning of treatment predicts improvement in symptoms at the end (Cloitre, Chase Stovall-McClough, Miranda, & Chemtob, 2004). Do we need any more evidence to prove that it is the good relationship with the therapist that cures? More recent research that contrasted the outcome of patients with a number of therapists found that indeed differences between the effectiveness of therapists could be predicted by the strength of alliance they were likely to form with their patients (Baldwin, Wampold, & Imel, 2007) but differences in outcome between patients with the same therapist were unrelated to therapeutic alliance.
If therapeutic alliance were the mechanism of change, then I would expect to do better with patients with whom I form a good alliance than those with whom my alliance is relatively poor. This turns out spectacularly not to be the case. So the ability to form an alliance does mark out our more talented therapists but what it is that they do more or less of that makes them more or less effective still remains a mystery.
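The distinction drawn above between therapists and patients can be sketched by splitting each alliance score into a therapist average and a patient's deviation from that average, and correlating each part with outcome separately. The numbers below are invented purely for illustration and are deliberately stylized to reproduce the pattern described; they are not data from Baldwin, Wampold, & Imel (2007), and this is not presented as their analysis.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented toy data: (alliance score, outcome score) for each patient.
caseloads = {
    "therapist_A": [(2, 5), (3, 4), (4, 5)],
    "therapist_B": [(4, 7), (5, 6), (6, 7)],
    "therapist_C": [(6, 9), (7, 8), (8, 9)],
}

therapist_alliance, therapist_outcome = [], []
patient_alliance_dev, patient_outcome_dev = [], []

for patients in caseloads.values():
    alliance_mean = mean([a for a, _ in patients])
    outcome_mean = mean([o for _, o in patients])
    therapist_alliance.append(alliance_mean)
    therapist_outcome.append(outcome_mean)
    for alliance, outcome in patients:
        # How far this patient sits from their own therapist's average.
        patient_alliance_dev.append(alliance - alliance_mean)
        patient_outcome_dev.append(outcome - outcome_mean)

between_r = pearson(therapist_alliance, therapist_outcome)
within_r = pearson(patient_alliance_dev, patient_outcome_dev)

print(f"Between therapists: r = {between_r:.2f}")
print(f"Within therapists:  r = {within_r:.2f}")
```

In this toy example the therapists who form stronger alliances on average have better average outcomes, yet within any one caseload a patient's alliance tells us nothing about how that patient fares.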
Understanding why therapy works will increasingly require an understanding of the moderators of therapeutic effectiveness. Rapidly advancing biological research is providing increasingly persuasive evidence that there may be genetic limitations on how well therapy can work. Freud (1937), in his last major paper, ‘Analysis terminable and interminable’, seemed to be acutely aware of the limitations of his technique, although he might understandably have misperceived some of the processes involved. Six years ago, in a widely quoted study, Avshalom Caspi and Terri Moffitt showed that the association between the number of stressful life events an individual experienced between 21 and 26 years and the probability of depression, suicidal ideation and suicide attempts was moderated by the 5HTT genotype (Caspi et al., 2003). Only those who had two of the short alleles of this genotype were likely to respond to four life events with increased suicidal ideation. The association between life events and suicidal ideation of those with two long alleles was completely absent.
This area of research has become a minor cottage industry, although many geneticists are appropriately sceptical about it (Risch et al., 2009). The most challenging finding, to me as an attachment theorist, is the report from Kochanska’s laboratory which demonstrated that maternal sensitivity predicted infant security of attachment as it is supposed to only in infants with the short allele of the 5HTT genotype (Barry, Kochanska, & Philibert, 2008). Infants with the long allele were equally likely to be secure regardless of maternal sensitivity.
Along similar lines but with older children, Kaufman reported an elegant study showing that the depressogenic effects of maltreatment could be mitigated by social support in individuals with the short allele of the 5HTT gene (Kaufman et al., 2004).
The hypothesis which cries out to be tested is that individuals whose depression is more likely to be associated with life events, as their serotonin transporter gene marks them out to be environmentally sensitive, are also more likely to benefit from an environmental intervention such as psychotherapy.
There is evidence indicative of the appropriateness of this way of thinking from those working on an attachment-based intervention to promote positive parenting and sensitive discipline. In this study, where parents of toddlers with externalizing behavioural problems were given video-feedback and other attachment-theory guided interventions, the children benefited if they had the seven-repeat allele on the DRD4 gene (Bakermans-Kranenburg, Van IJzendoorn, Pijlman, Mesman, & Juffer, 2008).
The moral here is not that psychotherapy should not be offered to people without this or that allele, but rather that the mechanism by which therapy achieves its effect may be quite different for these constitutionally distinguishable groups of individuals. If we choose to ignore the reality of these differences in our clinical work, future generations are likely to judge this decision as unethical and unjustifiable and potentially as an indication of a self-serving attitude.
The moral of this long tale is that science is good for practice. Research is there not simply to defend the boundaries of our existing domains, but to help us to deliver the forms of care which are best for our patients. To do this we have to understand better which causal mechanisms play a role in achieving patient benefit and also what circumstances can interfere with a treatment working.
However, the flow of information should not be one way. Science, particularly neuroscience, will give us better ideas about how we can help our patients in more differentiated ways as it evolves.
Practice is also excellent for science. Practice has to tell researchers where knowledge is most needed and to ensure that science is firmly grounded in everyday clinical care. Best evidence is only meaningful if used in proper argumentation. Argumentation is only meaningful if based on the best evidence in its building blocks. Jules Henri Poincaré wrote: ‘Science is built up with facts as a house is with stone, but a collection of facts is no more a science than a heap of stones is a house’ (Poincaré, Science and Hypothesis, 1905).
Thus ends my story of research in the land of psychotherapy, but I have been telling this story for long enough to know that arguments are persuasive only when they reach those parts of the brain where emotional significance is stored.
So here is an observational study with personal significance for all in this room.
