The subgroup and the European Medicines Agency
The disclosed outcome is overall non-significance (data not disclosed), but significance (p=0.015) was seen in a subgroup of patients with a “mid-range” EDV; the range has not been disclosed numerically. Selecting subgroups retrospectively is subject to bias, but Celyad states it used a statistical methodology. EDV was measured at baseline as it is used to calculate ejection fraction, one of the endpoints. In this case, the subgroup p value indicates a clear separation between the treated and placebo sets. There are statistical methods to adjust the p value threshold in these situations, as the primary endpoint threshold of p=0.05 is too lenient for subgroup analysis; no adjusted threshold has been discussed, but management indicates that the result is robust enough for an EMA submission. The EMA has provided guidelines for subgroup analysis. These suggest that the scientific credibility of the finding, which can be assessed qualitatively, is important in any regulatory decision.
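To illustrate the kind of threshold adjustment referred to above, here is a minimal sketch using the Bonferroni correction, one common and conservative method. Celyad has not disclosed how many subgroups were examined or which method it used, so the subgroup counts below are purely hypothetical and the function names are ours.

```python
# Bonferroni correction: divide the overall significance level by the
# number of comparisons. The subgroup counts used here are hypothetical.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value threshold that keeps the overall error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value is below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    reported_p = 0.015  # subgroup p value disclosed by Celyad
    for n in (1, 3, 5, 10):  # hypothetical numbers of subgroups tested
        print(n, round(bonferroni_threshold(0.05, n), 4), passes(reported_p, 0.05, n))
```

Under this sketch, p=0.015 would still pass if three subgroups had been tested (threshold about 0.0167) but not if five had been tested (threshold 0.01).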
The anticipated EMA process will probably run during 2017, with marketing possible from 2018. Review takes 210 days, excluding clock stops for questions; Celyad has already submitted preliminary background data. If a positive opinion is issued, the European Commission will issue an approval some months later, after pack inserts and other details have been approved.
Without more information, the regulatory outcome is very hard to assess. The EMA can be much more pragmatic than the FDA. If a treatment is novel, safe (as this appears to be) and meets an unmet medical need (these patients were on the best current therapy but not improving), then the EMA may recommend approval, possibly on a conditional basis involving a patient registry (database) and further studies. Celyad management noted that the data was “intensive and powerful”. As the regulatory outcome is uncertain, no EU sales are forecast until CHART-2 data, perhaps in 2021.
Each European market is different, but cost effectiveness is a major criterion in buying decisions. Agencies such as the UK’s NICE will take at least a year to do a rigorous economic assessment; other countries have similar processes. Even if a treatment is judged “cost effective”, there is no guarantee that the cash budget will be available. Germany is probably the most important market. France typically requires extensive pricing discussions. The UK is very cash constrained, and may become more so in future.
End diastolic volume – a beginner’s guide
The CHART-1 study did not have EDV as a selection criterion or as a pre-specified endpoint. Edison notes that this is the sort of finding that trials, which are scientific experiments, can reveal, especially when it is the first large-scale, properly conducted study of its type to report. This is cutting-edge science and clinical development. The EDV finding needs scientific confirmation by other studies, but is regarded by Celyad's scientific and clinical advisers as a good indicator.
EDV is the volume of blood in the left ventricle in the instant before it contracts. Echocardiography is the usual measurement method and was used by Celyad. EDV is needed to calculate the ejection fraction: end systolic volume (ESV), the volume of the fully contracted left ventricle, is subtracted from EDV and the result is divided by EDV. Hence EDV was measured. Normal levels seem to be about 100mL for women and around 150mL for men. However, measurement techniques vary, so this is not a standardised methodology (Rigolli 2016). There is also an EDV index, relating EDV to body size.
On contraction of the heart, blood is pushed into the aorta and round the arterial system. Some blood remains in the ventricle. The volume of blood ejected depends on the pressure in the aorta – if high, less blood is pushed out – and the strength of the left ventricle muscle. To be eligible for CHART-1, patients had to expel 35% or less of the EDV; this percentage is the ejection fraction (EF). A healthy person will eject 50-70%, with 60-70% being normal. We do not know the ejection fraction range in CHART-1 or how EDV related to EF.
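The ejection fraction arithmetic can be sketched as follows. The volumes are illustrative round numbers chosen for this example, not CHART-1 data, and the function name is ours.

```python
# Ejection fraction (EF) from end diastolic volume (EDV) and end systolic
# volume (ESV). The volumes below are illustrative, not trial data.

def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """EF as a percentage: stroke volume (EDV - ESV) divided by EDV."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("need EDV > 0 and 0 <= ESV <= EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


healthy = ejection_fraction(150, 60)    # 60.0, within the normal range
failing = ejection_fraction(300, 210)   # 30.0, below the 35% CHART-1 limit
print(healthy, failing)
```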
However, as always, it is more complicated. The Frank-Starling effect shows that in a healthy person, the volume ejected per beat (stroke volume) increases as the EDV rises until a limit where cardiac output (volume) plateaus. This appears to be related to the elasticity of the muscle fibres; if stretched more, they contract better. Patients with lower blood volumes, for example those undergoing major surgery or after trauma, pump less blood. Volume and pressure are the key to positive clinical outcomes.
In heart failure, the Frank-Starling effect is weakened or lost. The left ventricle becomes distorted and distended. The muscle is damaged in places, or dead where a heart attack (infarction) has cut off the blood supply, so it is weaker overall. Hence, EDV is much higher, perhaps doubled, relative to healthy people. A high EDV and weaker muscle lead to a low ejection fraction. Clinically, however, the key issue is the volume of blood ejected. If EF falls but EDV rises, the two may cancel out, although pressure is an issue.
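The cancelling-out effect can be shown with simple arithmetic: the volume ejected per beat (stroke volume) is EDV multiplied by EF. The figures below are illustrative round numbers in line with the ranges quoted in this note, not CHART-1 data.

```python
# Stroke volume = EDV x EF. A doubled EDV with a halved EF ejects the same
# volume per beat. All values are illustrative.

def stroke_volume(edv_ml: float, ef_percent: float) -> float:
    """Volume of blood ejected per beat, in mL."""
    return edv_ml * ef_percent / 100.0


healthy_sv = stroke_volume(150, 60)   # 90.0 mL
failing_sv = stroke_volume(300, 30)   # 90.0 mL: EF halved, EDV doubled
print(healthy_sv, failing_sv)
```

This ignores pressure and heart rate, both of which also affect how much blood reaches the body.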
The reason why Celyad found an apparent effect in “mid-range” EDV patients is not certain. Patients with very weak cardiac muscles may have high EDV but have little viable tissue and be too damaged for the C-Cure cells to have an effect. In “healthier” patients, it may be that there is sufficient functional muscle so the cells make little difference particularly as the Frank-Starling effect shows that the cardiac system is adaptable. More analysis may give some indication.
In the clinical literature, Kramer (2010) reviewed 25 cardiac intervention trials covering 69,766 patients and a further 80 studies on cardiac remodelling with 19,921 patients. The analysis focused on mortality, which is the primary endpoint but not expected by Celyad to be a key indicator in CHART-1. The analysis by Kramer et al is in Exhibit 2. There is a clear correlation with mortality, but at best the correlation coefficient is about 0.5, so these variables are not the only factors. They are also interlinked rather than independent. Hence, as CHART-1 has a clear EDV signal, one would expect to see similar signals in EF and ESV in the same grouping. If these are not seen, then the EMA may be highly sceptical.
Exhibit 2: Correlation of EF, EDV and ESV with mortality
Factor | Correlation (r) | Significance (p) | Comment |
LVEF | -0.51 | 0.001 | The strongest correlation with mortality was for ejection fraction (EF), with a magnitude of about 0.5. The ejection fraction is the difference between the diastolic volume before the heart contracts and the systolic volume at the end of the stroke, expressed as a fraction of the diastolic volume. The correlation is negative as the lower the EF, the higher the risk. |
EDV | 0.44 | 0.002 | The correlation with mortality is below 0.5; it is clearly present, but there are many other factors. The correlation is positive as the higher the EDV, the greater the risk. |
ESV | 0.48 | 0.002 | The correlation with mortality is nearly 0.5; it is clearly present, but there are many other factors. The correlation is positive as the higher the ESV, the greater the risk. |
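One way to read the correlations in Exhibit 2 is through r squared, the share of the variation in one variable that is linearly associated with the other. This is a standard statistical interpretation rather than a figure reported by Kramer (2010); only the r values come from the exhibit.

```python
# r squared for the Exhibit 2 correlations with mortality.
correlations = {"LVEF": -0.51, "EDV": 0.44, "ESV": 0.48}

# Squaring removes the sign, so only the strength of the link remains.
r_squared = {name: round(r * r, 3) for name, r in correlations.items()}

for name, value in r_squared.items():
    print(f"{name}: r^2 = {value}")
```

On this reading, each measure accounts for roughly a fifth to a quarter of the variation in mortality, which is consistent with the comment that many other factors are involved.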
In a device trial, Goldenberg (2011) found that patients in the top quartile of EDV had the best response. This contrasts with CHART-1, where patients with high EDV (threshold undefined) had a weak response. Goldenberg also noted a very similar ESV response pattern.
CHART-1 endpoints – crucial but the data is not yet disclosed
A paper on the CHART-1 design was published in 2015 (Bartunek et al, 2015). The study uses a Finkelstein–Schoenfeld hierarchical composite endpoint. The endpoints are listed, in order of testing, in Exhibit 3. This process appears complex but allows multiple endpoints to be included.
For example, if Patient 1 is compared to Patient 2 and they both survived, had no cardiac events and rated equal on MLHFQ, all those comparisons score zero. If Patient 1 walks more than 40m further on 6MWT but Patient 2 walks less than 40m further (compared to their baselines), Patient 1 scores 1. No other endpoint is then scored by Patient 1 against Patient 2. When Patient 2 is compared against Patient 1, they will score -1.
Next Patient 1 is compared with Patient 3. For example, if Patient 3 died, Patient 1 scores 1 (as a survivor).
Patient 1 is then compared with Patient 4, Patient 5 and so on. The scores for each patient are then summed to give a single value. For example, when Patient 3 is compared to all other patients, if there were 19 other deaths, all those comparisons score zero. As all the other 251 patients survived, Patient 3 scores -251 in total. The 271 patient totals are put into order and a rank is assigned to each. The patients are then separated into placebo and treated groups, so each group has a distribution of rankings. A modification of the generalized Wilcoxon test at a two-sided 5% significance level is used to assess statistical significance. This tests whether the distribution is the same between the groups.
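The pairwise scoring described above can be sketched in a few lines. This is a simplified illustration, not the trial's actual analysis: the four patients are hypothetical, the cut-offs are those listed in the endpoint exhibit, two deaths are scored as a tie as in the example in the text, and missing data are ignored. The test statistic uses the usual permutation variance for a Finkelstein–Schoenfeld score sum with a normal approximation; the exact "modification of the generalized Wilcoxon test" used in CHART-1 may differ.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Patient:
    treated: bool
    survived: bool = True
    whf_events: int = 0           # worsening heart failure events
    mlhfq_change: float = 0.0     # change from baseline; negative is better
    walk_change_m: float = 0.0    # 6MWT change from baseline, metres
    lvesv_change_ml: float = 0.0  # change from baseline; negative is better
    lvef_change_pct: float = 0.0  # absolute percentage points


def ordinal_outcomes(p: Patient) -> tuple:
    """Non-fatal endpoints in testing order, coded so higher is better."""
    return (
        -min(p.whf_events, 2),                 # 0, 1, or 2+ events
        1 if p.mlhfq_change <= -10 else 0,     # 10+ point decrease
        1 if p.walk_change_m >= 40 else 0,     # 40m or more further
        1 if p.lvesv_change_ml <= -15 else 0,  # 15mL decrease
        1 if p.lvef_change_pct >= 4 else 0,    # 4% absolute improvement
    )


def pair_score(a: Patient, b: Patient) -> int:
    """+1 if a beats b, -1 if b beats a, 0 if tied on every endpoint."""
    if a.survived != b.survived:
        return 1 if a.survived else -1
    if not a.survived:
        return 0  # both died: scored as a tie, following the text's example
    for x, y in zip(ordinal_outcomes(a), ordinal_outcomes(b)):
        if x != y:
            return 1 if x > y else -1  # first endpoint that differs decides
    return 0


def total_scores(patients: list) -> list:
    """Sum each patient's pairwise scores against every other patient."""
    return [
        sum(pair_score(p, q) for j, q in enumerate(patients) if j != i)
        for i, p in enumerate(patients)
    ]


def z_statistic(patients: list) -> float:
    """Treated-group score sum divided by its permutation standard deviation."""
    scores = total_scores(patients)
    n = len(patients)
    n_treated = sum(p.treated for p in patients)
    t = sum(s for s, p in zip(scores, patients) if p.treated)
    variance = n_treated * (n - n_treated) / (n * (n - 1)) * sum(s * s for s in scores)
    return t / sqrt(variance) if variance > 0 else 0.0


# Four hypothetical patients, two per group.
patients = [
    Patient(treated=True, walk_change_m=50),
    Patient(treated=True, walk_change_m=20),
    Patient(treated=False, survived=False),
    Patient(treated=False, whf_events=1, walk_change_m=60),
]
print(total_scores(patients))  # [3, 1, -3, -1]
```

The scores always sum to zero across all patients, because every +1 awarded in a comparison is matched by a -1 for the other patient.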
The authors of Bartunek et al, 2015 note that the Minnesota Living with Heart Failure Questionnaire (MLHFQ) and six-minute walk test (6MWT) “may be influenced by knowledge of the treatment variable”. These tests are likely to be crucial to understanding the trial result. They are rated above quantitative cardiac measures as they are more clinically relevant. Exhibit 3 shows the design.
To take an example of scoring for a single patient on the mortality endpoint: a patient scores one if they survive but the comparison patient dies. If both survive, they are equal and score zero (the likely outcome on this measure). If this patient died and the comparison patient survived, the patient scores minus one. This comparison is done with every other patient. Where a pair is tied, the second endpoint is then tested, then the third and so on. Some patients will not have full data sets, for example, because they died before the designed 39-week endpoint or did not complete some tests, but the scores they gained can still be included in the data. This allows hard endpoints like mortality, where few events are expected (so most comparisons score zero), to be combined with endpoints that are expected to have a more variable outcome, like quality-of-life scores and the six-minute walk test.
Exhibit 3: CHART-1 hierarchical Phase III endpoints in order
Parameter/change | Cut-off to score | Expected outcome (treated vs placebo) | Comments |
Mortality | 39-week survival post treatment | 7.5% vs 10% | Death provides a clear endpoint with an overall 8.75% death rate, so 21 deaths are expected: nine in the treated group vs 12 in the placebo group. Because of the small difference (three people), the figures could be very variable and are unlikely to be statistically significant. |
Worsening heart failure (WHF) events | 0, 1 or 2 or more events scored | Events expected: none, 83.5% vs 78%; one, 11% vs 16%; two plus, 5.5% vs 6% | Patients need to be in NYHA Class II or higher to enter CHART-1. Although 46 events were expected (20 treated, 26 placebo), most patients are not expected to show any worsening, so will score zero and progress to the next test. The number of patients with two or more events is effectively equal, as six or seven are expected in each group, but small number effects could influence the data. |
Minnesota Living with Heart Failure Questionnaire (MLHFQ) | A 10 or more point decrease (improvement in condition) scores one | Total score change of -14 vs -5 points expected, with a common standard deviation of 20 points | MLHFQ has 21 items ranging from physical symptoms to daily activities, with a high score (up to five points per question) indicating a poor condition and a zero showing no effect. The maximum score is 105 for a severely ill, badly affected patient. A difference of five or less is not considered clinically meaningful. A reduction in score indicates an improvement in the patient’s condition. Scoring systems can be subjective and variable, but MLHFQ has been widely tested. Note the high standard deviation, which is much higher than the expected average difference of nine. To enter CHART-1, patients needed a score of 30 or more. |
Six-minute walk test (6MWT) distance | A difference of 40m or more scores one | Expected improvement of 45m vs 10m, with a standard deviation of 120m | The test is done on a 100m long oval track in a level corridor. It is regarded as a robust clinical endpoint by the FDA. Patients are reminded of the time during the walk and may stop, start and sit down as they wish. To enrol in CHART-1, patients needed to walk between 100m and 400m before treatment. However, this is highly variable, as seen by the high standard deviation, which is three times the scoring level. Average distances can be flattered by one or two good performances, so median distance is a more relevant measure. |
Left ventricular end systolic volume (LVESV) | 15mL decrease scores one | −10mL vs 5mL, with a standard deviation of 20mL | End systolic volume is the volume of blood left in the ventricle (heart chamber) once the heart has contracted, so a decrease shows improved pumping efficiency and possibly heart muscle remodelling; the heart can become distended in heart failure. In a healthy person, this is about 60mL but can be three times this in heart failure. The expected average difference is 15mL. Note that the standard deviation on this parameter is expected to be high. This is partly because measurement using diagnostic imaging has inherent variability, in addition to the variability in patient responses. |
Left ventricular ejection fraction (LVEF) | 4% absolute improvement | 6% vs 1%, with a standard deviation of 5% | LVEF is the percentage of blood expelled by the left ventricle on contraction. Normal values are 50–70%. Patients with under 40% EF are in heart failure. To enter the study, patients had to have an LVEF of 35% or less, so are very ill. |
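The comments above on high standard deviations can be made concrete by dividing each expected treated-versus-placebo difference by its standard deviation. This standardised difference is our own illustrative calculation using the design assumptions in the exhibit; it is not a figure from Bartunek et al, 2015 or from Celyad.

```python
# Expected change (treated, placebo) and standard deviation for each
# continuous endpoint, taken from the design assumptions in the exhibit.
endpoints = {
    "MLHFQ (points)": (-14, -5, 20),
    "6MWT (m)": (45, 10, 120),
    "LVESV (mL)": (-10, 5, 20),
    "LVEF (%)": (6, 1, 5),
}


def standardised_difference(treated: float, placebo: float, sd: float) -> float:
    """Absolute expected difference between groups, in standard deviations."""
    return abs(treated - placebo) / sd


effect_sizes = {
    name: round(standardised_difference(t, p, sd), 2)
    for name, (t, p, sd) in endpoints.items()
}
for name, size in effect_sizes.items():
    print(f"{name}: {size}")
```

On these assumptions, the six-minute walk test has the smallest expected difference relative to its variability (about 0.29 standard deviations) and LVEF the largest (1.0).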