This post gives just few quick notes on the methodological aspects of the paper.
1. They select data with a weak climatic temperature trend.
2. They select data with a large cooling bias due to improvements in radiation protection of thermometers.
3. They developed a new homogenization method using an outdated design and did not test it.
Weak climatic trendChristy and McNider wrote: "This is important because the tropospheric layer represents a region where responses to forcing (i.e., enhanced greenhouse concentrations) should be most easily detected relative to the natural background."
The trend in the troposphere should a few percent stronger than at the surface; mainly in the tropics. However, it is mainly interesting that they see a strong trend as a reason to prefer tropospheric temperatures, because when it comes to the surface they select the period and temperature with the smallest temperature trend: the daily maximum temperatures in summer.
The trend in winter due to global warming should be 1.5 times the trend in summer and the trend in the night time minimum temperatures is stronger than the trend in the day time maximum temperatures, as discussed here. Thus Christy and McNider select the data with the smallest trend for the surface. Using their reasoning for the tropospheric temperatures they should prefer night time winter temperatures.
(And their claim on the tropospheric temperatures is not right because whether a trend can be detected does not only depend on the signal, but also on the noise. The weather noise due to El Nino is much stronger in the troposphere and the instrumental uncertainties are also much larger. Thus the signal to noise ratio is smaller for the tropospheric temperatures, even if the signal were as long as the surface observations.
Furthermore, I am somewhat amused that there are still people interested in the question whether global warming can be detected.)
[UPDATE. Tamino shows that within the USA, Alabama happens to be the region with the least warming. The more so for the maximum temperature. The more so for the summer temperature.]
Cooling biasThen they used data with a very large cooling bias due to improvements in the protection of the thermometer for (solar and infra-red) radiation. Early thermometers were not protected as well against solar radiation and typically record too high temperatures. Early thermometers also recorded too cool minimum temperatures; the thermometer should not see the cold sky, otherwise it radiates out to it and cools. The warming bias in the maximum temperature is larger than the cooling bias in the minimum temperature, thus the mean temperature still has some bias, but less than the maximum temperature.
Due to this reduction in the radiation error summer temperatures have a stronger cooling bias than winter temperatures.
The warming effect of early measurements on the annual means is probably about 0.2 to 0.3°C. In the maximum temperature is will be a lot higher and in the summer temperature it will again be a lot higher.
That is why most climatologists use the annual means. Homogenization can improve climate data, but it cannot remove all biases. Thus it is good to start with data that has least bias. Much better than starting with a highly biased dataset like Christy and McNider did.
Statistical homogenization removes biases by comparing a candidate station to its neighbour. The stations need to be close enough together so that the regional climate can be assumed to be similar in both stations. The difference between two stations is then weather noise and inhomogeneities (non-climatic changes due to changes in the way temperature was measured).
If you want to be able to see the inhomogeneities you thus need to have well correlated neighbors that have as little weather noise as possible. By using only the maximum temperature, rather than the mean temperature, you increase the weather noise. But using the monthly means in summer, rather than the annual means or at the very least the summer means, you increase the weather noise. By going back in time more than a century you increase the noise because we had less stations to compare with at the time.
They keyed part of the the data themselves mainly for the period before 1900 from the paper records. It sounds as if they performed no quality control of these values (to detect measurement errors). This will also increase the noise.
With such a low signal to noise ratio (inhomogeneities that are small relative to the weather noise in the difference time series), the estimated date of the breaks they still found will have a large uncertainty. It is thus a pity that they purposefully did not use information from station histories (metadata) to get the date of the breaks right.
Homogenization methodThey developed their own homogenization method and only tested it on a noise signal with one break in the middle. Real series have multiple breaks; in the USA typically every 15 years. Furthermore also the reference series has breaks.
The method uses the detection equation from the Standard Normal Homogeneity Test (SNHT), but then starts using different significance levels. Furthermore for some reason it does not use the hierarchical splitting of SNHT to deal with multiple breaks, but it detects on a window, in which it is assumed there is only one break. However, if you select the window too long it will contain more than one break and if you select the window too short the method will have no detection power. You would thus theoretically expect the use of a window for detection to perform very badly and this is also what we found in a numerical validation study.
I see no real excuse not to use better homogenization methods (ACMANT, PRODIGE, HOMER, MASH, Craddock). These are build to take into account that also the reference station has breaks and that a series will have multiple breaks; no need for ad-hoc windows.
If you design your own homogenization method, it is good scientific practice to test it first, to study whether it does what you hope it does. There is, for example, the validation dataset of the COST Action HOME. Using that immediately allows you to compare your skill to the other methods. Given the outdated design principles, I am not hopeful the Christy and McNider homogenization method would score above average.
ConclusionsThese are my first impressions on the homogenization method used. Unfortunately I do not have the time at the moment to comment on the non-methodological parts of the paper.
If there are no knowledgeable reviewers available in the USA, it would be nice if the AMS would ask European researchers, rather than some old professor who in the 1960s once removed an inhomogeneity from his dataset. Homogenization is a specialization, it is not trivial to make data better and it really would not hurt if the AMS would ask for expertise from Europe when American experts are busy.
Hitler is gone. The EGU general assembly has a session on homogenization, the AGU does not. The EMS has a session on homogenization, the AMS does not. EUMETNET organizes data management workshops, a large part of which is about homogenization; I do not know of an American equivalent. And we naturally have the Budapest seminars on homogenization and quality control. Not Budapest, Georgia, nor Budapest, Missouri, but Budapest, Hungary, Europe.
Related readingTamino: Cooling America. Alabama compared to the rest of contiguous USA.
HotWhopper discusses further aspects of this paper and some differences between the paper and the press release. Why nights can warm faster than days - Christy & McNider vs Davy 2016
Early global warming
Statistical homogenisation for dummies