Background
In recent years, cardiovascular magnetic resonance imaging (CMR) has emerged as a broadly applied imaging modality in cardiac diagnostics [1]. Due to its high accuracy and reproducibility, CMR is considered the gold standard for the evaluation of left ventricular (LV) function [2]. CMR is the recommended method to assess cardiac function and hemodynamics, especially when transthoracic echocardiography is limited [3]. In addition to mere functional assessment, non-invasive tissue differentiation represents CMR's unique feature [4]. Clinical decision-making is often based on quantification; for example, the placement of an implantable cardiac defibrillator depends on quantified LV function, and valve replacement on quantitative flow assessment [5–7]. Therefore, accurate and reliable quantification is essential for correct diagnosis and adequate treatment. Technical aspects such as field strength, vendor platform and imaging protocol influence CMR results [8–12]. The Society for Cardiovascular Magnetic Resonance (SCMR) has published not only standardized protocols for image acquisition and interpretation, but also guidelines for reporting, which propose to report scanner type, sequences used and study quality [13–15]. Interestingly, reporting the analysis software used is not suggested [15]. CMR image analysis is performed with dedicated commercial and non-commercial software solutions, which often differ within and between sites. Quantitative analysis is mostly based on manual contouring or manual correction of semi-automatically segmented regions of interest (ROI) in CMR images. For LV volumetry and flow quantification, the contour relies on the definition of a whole pixel or a subpixel, depending on the software. For parametric mapping, not all software providers offer a specific tool. A recent study reported that software used for myocardial perfusion analysis is not interchangeable and that reliable results were only achieved within the same software [16]. In contrast, statistically significant differences found in the analysis of T2* mapping between two software packages were considered to have no effect on clinical decision-making [17]. Other groups found a strong correlation and no significant differences between software packages for LV assessment [18, 19]. Software comparison for flow measurement has only been performed in a small number of patients [20]. The impact of software-dependent approaches to contour modification on results is unknown, and the underlying mathematical calculation and extrapolation remain reserved to the vendors.
The aim of the present study was to investigate the equivalence of three commercially available software packages used at our site for the assessment of LV function, 2D flow and T1 and T2 parametric mapping. We hypothesized that mean differences between software packages are smaller than the intraobserver variability and that, hence, the software packages can be considered equivalent.
Discussion
Quantification is a basic requirement for cardiovascular decision-making, and several parameters in CMR depend on reliable and robust values. To the best of our knowledge, this is the first study comparing three CMR analysis software packages for the quantification of LV function, 2D flow and T1 and T2 parametric mapping. The main findings were: (i) all three software packages were equivalent for LV assessment (EF, EDV, ESV and mass), (ii) all three software packages were equivalent for SV, but only two for Vmax, (iii) equivalence was given for all software packages in the quantification of T2-time, but only for two in the quantification of T1-time.
It is well known that different post-processing software packages are used worldwide in clinical routine and research. They differ, for example, regarding pixel definition settings, contour detection and other algorithms. Each pixel of a cardiac image displayed by the post-processing software provides information about its size and a specific value, such as maximum velocity in the case of flow measurement or T1-time in the case of T1-mapping. For quantitative image analysis, contours intersect pixels. Depending on the software, different pixel inclusion methods can be used for the calculations, e.g. including a pixel partly or entirely. In a clinical setting it is crucial to know whether these potential differences could impact the results. Previous studies compared software using correlations, intraclass correlation coefficients and tests for significant differences. We applied an equivalence testing approach, using the intraobserver variability to define equivalence margins, to identify deviations between software packages. In the present study there is no impact of scan-procedure-related technical influences [21], as we analyzed the same data sets with all three software packages. The discussion of the results is based on the findings for the particular software versions we used. All vendors were open to discussion and adaptation.
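The equivalence criterion described above can be sketched as follows. This is a minimal illustration, not the study's exact statistical procedure; in particular, the margin of twice the intraobserver SD and all function names are assumptions for the example.

```python
from statistics import mean

def equivalent(values_a, values_b, intraobserver_sd, k=2.0):
    """Declare two software packages equivalent when the mean paired
    difference lies within margins derived from intraobserver variability.

    The margin k * intraobserver_sd is an illustrative assumption,
    not the exact criterion applied in the study.
    """
    diffs = [a - b for a, b in zip(values_a, values_b)]
    margin = k * intraobserver_sd
    return -margin <= mean(diffs) <= margin
```

For example, paired EF values from two packages would be compared as `equivalent(ef_a, ef_b, intraobserver_sd)`; the key point is that the acceptable deviation is anchored to the reader's own repeatability rather than to an arbitrary threshold.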
For LV assessment, all three software packages showed a high correlation and equivalence for LV EF, EDV, ESV and mass. Our results are supported by previous studies using different software. Messali et al. revealed a high correlation of LV function and volume without significant differences between ViewForum (Philips) and Argus in 46 patients [19]. Kara et al. demonstrated a high correlation between LV tutorials (Cardiovascular Imaging Solutions) and Argus in 40 patients with known or suspected coronary artery disease. Additionally, they compared CMR software with other modalities such as CT and 2D echocardiography, but only for EDV could they show a stronger correlation between a CMR tool and CT than between the two CMR software packages. Another group compared the image analysis of 15 healthy subjects between one scanner providing MASS and one scanner providing Argus and did not find significant differences within one observer [41]. Nevertheless, CMR image segmentation is reader-dependent, and LV quantification differs even between expert readers, which emphasizes the need for standardization [42]. In our study, we assumed that a certain range between software packages could be declared equivalent; however, this range depends on the reader's precision. Still, our intraobserver bias was comparable to former results even though we excluded papillary muscles from LV mass [36, 43]. In the present study, each software package calculated volumes as a function of area and slice thickness. As there was no gap between the SAX slices, interpolation was not necessary. EF and mass were derived from the cardiac volumes. We conclude that the different pixel definitions of the present software packages did not substantially influence the results of LV volumetry. The applied software packages are interchangeable for LV assessment in this cohort of patients.
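The slice-summation volumetry described above, with gapless short-axis slices, reduces to a simple calculation; the following sketch uses illustrative function names not taken from any of the packages.

```python
def lv_volume_ml(slice_areas_mm2, slice_thickness_mm):
    """Summation of discs: with no inter-slice gap, volume is the sum of
    the segmented slice areas times the slice thickness (mm^3 -> ml)."""
    return sum(slice_areas_mm2) * slice_thickness_mm / 1000.0

def ejection_fraction_pct(edv_ml, esv_ml):
    """EF is derived from the end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```

Because the volume is a sum over many pixels per slice, a differing treatment of single boundary pixels changes each area only marginally, which is consistent with the observed equivalence of the volumetric results.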
Hemodynamics can be assessed by PC-CMR to evaluate shunt fraction, valve regurgitation or stenosis [3]. We used automatic contour propagation with manual correction in all three software packages for the comparison of the flow data sets of 30 patients. Boye et al. applied a software flow analysis procedure in 6 patients with aortic insufficiency and showed similar results for aortic regurgitant fraction based on backward/forward SV in four software packages, three of which were the same as in our study [20]. Consistently, the present study showed equivalence for SV between all three software packages. However, even in phantom measurements without manual contour correction, they revealed differences in contour propagation algorithms, as they found different velocities among the software packages. In our study, intraobserver analysis of Vmax showed high reliability within each software package. However, despite accurately corrected anatomical borders, we identified software B as measuring nonequivalent Vmax values compared with the other software packages, even when the peak-velocity measurement region was in the same phase and visually at a similar location within the vessel. This finding is attributed to different voxel averaging methods, depending on the software. In software B, the default for flow measurement was an average including 4 adjacent voxels, in contrast to the other software packages, which preset a single voxel. Voxel averaging techniques reduce the spatial resolution of the measurement and significantly underestimate peak velocity compared with the single-voxel technique, with a mean difference of 7%, but do not influence the flow volume [44]. We found nearly congruent Vmax values between software A and C, whereas these software packages showed the highest bias in SV. This can be explained by the fact that Vmax is measured by only one or a few voxels, while SV is calculated as the sum of the velocities of the voxels within the ROI multiplied by the area at each temporal phase [45]. We cannot exclude small differences in ROI sizes among the software packages despite manual border correction. However, differences in ROI size should then have substantially affected the SV, which was not the case in this study. Interestingly, for two software vendors the velocity-measuring pixel partly exceeded the anatomically delineated border of the aorta, possibly inducing an incorrect velocity value for this phase. Therefore, attentive care must be taken to control for outliers and to avoid misalignment. Other authors also analyzed the impact of different modalities in the assessment of different anatomical structures [46–49]. In our opinion, the validation of different software packages is warranted at least within an imaging modality and needs further attention.
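The contrast between a robust SV and an averaging-sensitive Vmax can be made concrete in a small sketch. The 2×2 neighbourhood below is an assumed form of software B's 4-voxel default, and all names and indices are illustrative.

```python
def stroke_volume_ml(phase_velocities, pixel_area_mm2, dt_s):
    """Flow volume: per temporal phase, sum the ROI pixel velocities (cm/s)
    times the pixel area, then integrate over the cardiac cycle."""
    total_ml = 0.0
    for velocities in phase_velocities:          # one velocity list per phase
        flow_ml_per_s = sum(velocities) * pixel_area_mm2 / 100.0  # mm^2 -> cm^2
        total_ml += flow_ml_per_s * dt_s
    return total_ml

def vmax_single_voxel(velocities):
    """Peak velocity from a single voxel (the preset of software A and C)."""
    return max(velocities)

def vmax_block_average(grid, i, j):
    """Average over a 2x2 block of adjacent voxels, illustrating a
    4-voxel averaging default such as software B's (assumed layout)."""
    block = [grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1]]
    return sum(block) / 4.0
```

Averaging over neighbours pulls the reported peak toward the lower surrounding velocities, while the integrated flow volume, a sum over the whole ROI, is essentially unchanged.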
CMR enables tissue characterization using parametric mapping techniques. Myocardial T2-mapping can detect edema in acute myocardial infarction or inflammation [37]. Native T1-mapping reflects pathological changes in both the myocardium and the interstitium [35, 50]. It allows further differentiation of cardiac diseases in LV hypertrophy and in systemic diseases such as amyloidosis [51, 52]. For T2* analysis, statistically significant but clinically negligible differences were found between the Functool protocol (GE) and the T2* module of Qmass [17]. In line with this finding, our results indicated that the present software packages are not equivalent in quantifying T1-times. Differences may arise from different contour drawing procedures and pixel inclusion approaches that potentially influence precision. This may explain the significantly smaller ROI area in software A than in software B for both T1- and T2-quantification. Qmass and cvi42 provided a tool for endo- and epicardial border delineation; Argus has no such specific tool yet. However, within each software package, the delineated area was consistent between the two measurements. Another explanation for the discrepancies might be the different ranges of the values for T1- and T2-time. This is supported by the fact that the ratio of our maximum intraobserver SD to the recently published segment-based normal values of our group was much smaller for T1- than for T2-time, accounting for narrower equivalence limits for T1-time (the maximum intraobserver SD of ±24.4 ms corresponds to ±2.5% of the published normal value of 980.7 ms for T1-time, whereas the maximum intraobserver SD of ±3.2 ms corresponds to ±6.1% of the published normal value of 52.3 ms for T2-time) [4]. Within one software package, the SD of the intraobserver analysis for T2-time was comparable to other studies using Qmass and Osirix [37, 53]. The intraobserver SD of T1-time is in good agreement with other publications investigating ViewForum and cvi42 [4, 10, 11, 54]. However, the range of published intraobserver values is considerably wide. Depending on the CMR sequence, a correction factor can be introduced if T1-times have to be calculated using the software [55]. Therefore, the impact of software on T1-time quantification should be evaluated in further studies, including other diseases such as amyloidosis and hypertrophic cardiomyopathy, and at different sites, with an approach to correct for some of the variation, as described for LV assessment [42].
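The relation of intraobserver SD to normal values quoted above is a simple percentage; the following one-line sketch (with an illustrative function name) reproduces the stated figures for T1 and T2.

```python
def relative_sd_pct(intraobserver_sd_ms, normal_value_ms):
    """Intraobserver SD expressed as a percentage of the published normal
    value, as used to compare the T1 and T2 equivalence limits."""
    return 100.0 * intraobserver_sd_ms / normal_value_ms
```

With ±24.4 ms against 980.7 ms this yields about 2.5% for T1-time, whereas ±3.2 ms against 52.3 ms yields about 6.1% for T2-time, i.e. the equivalence margins are relatively much tighter for T1.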
Limitations
Currently, there is a lack of an internationally accepted gold standard for software validation, such as phantoms for the different cardiac structures and functions. Therefore, we used the intraobserver variability of an experienced reader as the reference for equivalence testing. We investigated only a limited number of software packages, being aware that there are many others on the market. Further, our findings are specific to the particular software versions used, knowing that software packages evolve continuously. We did not analyze different cardiovascular diseases, but among the selected patients 52% suffered from cardiac alterations. The potential influence of multiple observers and of other pathologies on the comparability of results from different software systems was not considered in this study but should be the subject of further analyses.