Spectacularly unreliable MRI “results”

People mostly assume that MRI is a reliable technology, but if you send the same patient to get ten different MRIs, interpreted by ten different radiologists from different facilities, apparently you get ten markedly different explanations for her symptoms.
A 63-year-old volunteer with sciatica allowed herself to be scanned again and again and again for science. The radiologists, who did not know they were being tested, cooked up forty-nine distinct “findings.” Sixteen appeared in only one report; not one was found in all ten reports, and only one was found in nine of the ten. On average, each radiologist made about a dozen errors, seeing one or two things that weren’t there and missing about eleven things that were. That’s a lot of errors and not a lot of reliability. The authors clearly believe that some MRI providers are better than others, and that’s probably true, but we also need to ask the question: is any MRI reliable?
PURPOSE:
This study is designed to test the authors’ hypothesis that radiologists’ reports from multiple imaging centers performing a lumbar MRI examination on the same patient over a short period of time will have (1) marked variability in interpretive findings and (2) a broad range of interpretive errors.
STUDY DESIGN: 
This is a prospective observational study comparing the interpretive findings reported for one patient scanned at 10 different MRI centers over a period of 3 weeks to each other and to reference MRI examinations performed immediately preceding and following the 10 MRI examinations.
PATIENT SAMPLE: 
The sample is a 63-year-old woman with a history of low back pain and right L5 radicular symptoms.
OUTCOME MEASURES:
Variability was quantified using percent agreement rates and Fleiss kappa statistic. Interpretive errors were quantified using true-positive counts, false-positive counts, false-negative counts, true-positive rate (sensitivity), and false-negative rate (miss rate).
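For readers unfamiliar with the Fleiss kappa statistic named above, here is a minimal Python sketch of the standard formula. The ratings table is made up for illustration (three hypothetical findings, ten readers, two categories: reported or not reported) and is not the study's data; the function name is my own.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who put subject i into category j. Every row must sum to
    the same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement among rater pairs, per subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table: columns are [reported, not reported], 10 readers each.
example = [
    [9, 1],   # finding A: nine of ten readers reported it
    [1, 9],   # finding B: only one reader reported it
    [5, 5],   # finding C: readers split evenly
]
kappa = fleiss_kappa(example)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why a value near 0.2 reads as poor.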

METHODS:
Interpretive findings from 10 study MRI examinations were tabulated and compared for variability and errors. Two of the authors, both subspecialist spine radiologists from different institutions, independently reviewed the reference examinations and then came to a final diagnosis by consensus. An interpretive error was recorded when a finding in a study examination’s report was absent from the reference examinations (false positive), or when a finding in the reference examinations was absent from the report (false negative).
RESULTS:
Across all 10 study examinations, there were 49 distinct findings reported related to the presence of a distinct pathology at a specific motion segment. Zero interpretive findings were reported in all 10 study examinations and only one finding was reported in nine out of 10 study examinations.
Of the interpretive findings, 32.7% appeared only once across all 10 of the study examinations’ reports.
A global Fleiss kappa statistic, computed across all reported findings, was 0.20±0.06, indicating poor overall agreement on interpretive findings. The average interpretive error count in the study examinations was 12.5±3.2 (both false-positives and false-negatives). The average false-negative count per examination was 10.9±2.9 out of 25 and the average false-positive count was 1.6±0.9, which correspond to an average true-positive rate (sensitivity) of 56.4%±11.7 and miss rate of 43.6%±11.7.
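The averages quoted above fit together arithmetically, and a short Python check makes the relationships explicit. This is only a sketch using the rounded means from the abstract; the variable names are mine, and it assumes the 25 reference findings are the denominator for both rates, as the "10.9±2.9 out of 25" wording suggests.

```python
REFERENCE_FINDINGS = 25      # findings in the consensus reference reading
DISTINCT_FINDINGS = 49       # distinct findings across all ten reports
UNIQUE_FINDINGS = 16         # findings that appeared in only one report

mean_false_negatives = 10.9  # reference findings missed, per report
mean_false_positives = 1.6   # reported findings absent from the reference

# True positives are the reference findings that were not missed.
mean_true_positives = REFERENCE_FINDINGS - mean_false_negatives

sensitivity = mean_true_positives / REFERENCE_FINDINGS   # true-positive rate
miss_rate = mean_false_negatives / REFERENCE_FINDINGS    # false-negative rate
mean_errors = mean_false_negatives + mean_false_positives
unique_share = UNIQUE_FINDINGS / DISTINCT_FINDINGS

print(f"sensitivity:  {sensitivity:.1%}")
print(f"miss rate:    {miss_rate:.1%}")
print(f"mean errors:  {mean_errors:.1f}")
print(f"unique share: {unique_share:.1%}")
```

Because the denominator is the same 25 findings for every report, sensitivity and miss rate always sum to 100%, so the two percentages in the abstract are one measurement stated two ways.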
CONCLUSIONS: 
This study found marked variability in the reported interpretive findings and a high prevalence of interpretive errors in radiologists’ reports of an MRI examination of the lumbar spine performed on the same patient at 10 different MRI centers over a short time period. As a result, the authors conclude that where a patient obtains his or her MRI examination and which radiologist interprets the examination may have a direct impact on radiological diagnosis, subsequent choice of treatment, and clinical outcome.