Previously, CMS applied a Tukey outlier deletion method when calculating Medicare Advantage (MA) and Medicare Part D Prescription Drug Plan (PDP) star ratings. A Final Rule implemented in 2022, however, removed the use of Tukey outlier deletion from quality measures. Based on 2020 historical data, 17% of MA plans would have lower star ratings, compared with only 1% that would have higher star ratings, after removing the Tukey outlier deletion. This raises the question: what is a Tukey outlier?
Tukey Outlier Definition
Tukey outliers are data points that lie outside the following range:
- [Q1 – k(IQR), Q3 + k(IQR)]
Here Q1 and Q3 are the first and third quartiles of the data, respectively, and IQR is the interquartile range (i.e., the difference between the third and first quartiles). The term k is a multiplier that sets how sensitive you want to be to outliers. John Tukey proposed that k = 1.5 indicates an “outlier” and k = 3 indicates data that is “far out”.
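To make the definition concrete, here is a minimal Python sketch of the range calculation. The function names (`tukey_fences`, `tukey_outliers`) and the sample data are made up for illustration, and the "inclusive" quartile convention is an assumption: different software computes quartiles slightly differently, so the fences can shift a little from tool to tool.

```python
from statistics import quantiles


def tukey_fences(data, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # quantiles(n=4) returns the three quartile cut points.
    # method="inclusive" is an assumed convention, not the only valid one.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(data, k=1.5):
    """Return the points that fall outside the Tukey range."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample: mostly small values plus two large ones.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 50]

print(tukey_fences(sample))         # fences for k = 1.5
print(tukey_outliers(sample))       # points flagged as "outliers"
print(tukey_outliers(sample, k=3))  # points flagged as "far out"
```

In this made-up sample, the wider k = 3 range flags fewer points than k = 1.5, which is the sensitivity trade-off the multiplier controls.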
How likely are you to identify an outlier with the Tukey method?
The answer to this question depends on (i) how wide your Tukey range is (i.e., the value of k) and (ii) the shape of your distribution. Andrey Akinshin created simulations to answer this question for Normal, Gumbel and exponential distributions. The results show that non-normal distributions, particularly the exponential, are more likely to have an outlier flagged by the Tukey method.
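For readers who want to experiment with this idea, the following is a rough Monte Carlo sketch in Python. It is not Akinshin's code or setup: the sample size (100), the number of trials (2000), the seed, the standard-parameter distributions, and the "inclusive" quartile convention are all assumptions chosen for illustration. It estimates how often a random sample contains at least one point outside the Tukey range.

```python
import math
import random
from statistics import quantiles


def has_tukey_outlier(data, k=1.5):
    """True if any point lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # assumed convention
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return any(x < lower or x > upper for x in data)


def gumbel(rng):
    """Standard Gumbel draw via inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


# Assumed standard-parameter versions of the three distributions.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "gumbel": gumbel,
    "exponential": lambda rng: rng.expovariate(1.0),
}


def outlier_rate(name, n=100, trials=2000, k=1.5, seed=0):
    """Share of simulated samples of size n with at least one Tukey outlier."""
    rng = random.Random(seed)
    draw = SAMPLERS[name]
    hits = sum(
        has_tukey_outlier([draw(rng) for _ in range(n)], k)
        for _ in range(trials)
    )
    return hits / trials


for name in SAMPLERS:
    print(name, outlier_rate(name))
```

Changing `k`, `n`, or the distribution lets you see how both the width of the range and the shape of the data drive the chance of flagging at least one point.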
As with all outliers, identification is important, but what to do with them depends on context. If they are data errors or purely anomalous situations, one may want to delete them. Alternatively, if they are simply extreme values that occur from time to time, one should leave them in the data and try to understand whether a data-generating process that differs from the typical one could produce these values. Either way, the Tukey method is a useful, simple approach for identifying outliers, but it does not tell you what to do with them once they are identified.