See also an update till 2012 and a supplement of 2015
Weather at TRAO since 2000
Table of contents
Introduction
Daily high and low values
Distributions of measured values
Summary statistics and record values
Comparison with Koniczynka station data
Corrected and reduced data (downloadable)
Recommended actions to take
TRAO (Torun Radio Astronomy Observatory, or Department of Radio Astronomy of the TCfA) meteo data have been collected since 2000 from the WST7000 weather station of IRDAM SA. The station is mounted atop a pole rising 4.2 m above the roof of an adjacent building.
±5° for the wind direction,
±1 °C for the air temperature,
±3 % (RH at 20 % to 90 %) to ±4 % (elsewhere) for the relative humidity, and
±1 hPa (at 23°C) to ±3 hPa (at -40°C to 60°C) for the atmospheric pressure.
DoY Test V D T P RH Td
2.000093 0 0xff 4.19 103.42 -0.68 1018.31 100.00 -0.68 0.056
2.000208 0 0xff 3.88 97.41 -0.62 1018.34 100.00 -0.62 0.055
The last number in these records is meant for an optional external sensor (which is absent in our installation).
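A minimal sketch of how such a record might be read in Python follows. It rests on assumptions drawn only from the sample above, not from a documented format: that every record has ten whitespace-separated fields, that the 'Test' header covers the two fields after the time stamp, and that V, D, T, P, RH and Td follow in that order before the external-sensor value. The names `MeteoRecord` and `parse_record` are illustrative.

```python
from typing import NamedTuple, Tuple


class MeteoRecord(NamedTuple):
    doy: float              # time stamp from the 'DoY' column
    test: Tuple[str, str]   # two fields assumed to belong to 'Test'
    v: float                # 'V' column
    d: float                # 'D' column
    t: float                # 'T' column
    p: float                # 'P' column
    rh: float               # 'RH' column
    td: float               # 'Td' column
    ext: float              # value for the optional external sensor


def parse_record(line: str) -> MeteoRecord:
    """Split one daily-file record into named fields (assumed layout)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    doy = float(fields[0])
    test = (fields[1], fields[2])
    v, d, t, p, rh, td, ext = (float(x) for x in fields[3:])
    return MeteoRecord(doy, test, v, d, t, p, rh, td, ext)


record = parse_record(
    "2.000093 0 0xff 4.19 103.42 -0.68 1018.31 100.00 -0.68 0.056"
)
```

Rejecting lines with an unexpected field count is a deliberate choice here, since truncated or corrupted records are better caught at read time than averaged in silently.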
A word of warning for a potential user: some daily files have as their first record the one that belongs to the end of the previous day file (but is not present there).
Measurement results for the current day are displayed online on the TRAO website.
In this document we present results of some simple statistical analyses of
the entire data set, since the time of installation of the meteo station
in mid 2000 until March 10, 2008.
These results were obtained after removing evidently erratic records.
A record has been considered erratic if any of its meteo data did not fit
into specified range. The ranges were set somewhat arbitrarily (but with care)
The daily high and low values of the four main quantities are presented below. A tabular form of the plotted data is also available. Each diagram consists of individual lines drawn between the minimum and maximum values of the given quantity, found separately for each day.
Distributions of measured values
The plots that follow have these widths of bins (horizontal resolutions):
Each of the above record values has been checked to confirm that it represents a gradually reached extremum and is not a fake value due to a corrupted reading. This was done by visual inspection of neighbouring measurements in the respective daily file.
In view of the suspect quality of some of our measurements, we have compared them with data obtained at a nearby (about 8.5 km away) professional meteo station located in the village of Koniczynka. The station is supervised by Dr. Marek Kejna of the Department of Climatology, Institute of Geography, who kindly made his data available to us (a big thank you to him, and also to Zsuzsa Vizi for help with data format conversion).
The Koniczynka data taken for comparison were 1-hourly averages in the period from 1 January 2003 to 31 December 2007, Central European Time. There were 39110 (out of 43824 possible) hours in which at least one of the five quantities was measured at both stations. These were plotted against each other as shown in the figures that follow. Overlapping points are plotted horizontally offset. The red lines represent the ideal case of perfect correlation of the respective measurements at both stations, whereas the olive lines correspond to a least squares fit (see the table further down this page).
The following table contains the numerical results of linear fits. The best-fit lines are of the form P = a + b*K, where P is a Piwnice quantity and K is the corresponding quantity at the Koniczynka station. Besides the regression line parameters a and b (with their estimation errors), the table also contains the correlation coefficient R, the standard deviation about the fitted line SD, and the number of data points used for the fit N.
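As an illustration of the quantities listed above, the following Python sketch computes a, b, their errors, R, SD and N for an ordinary least squares fit of P against K. It is a textbook formulation and not necessarily the procedure used for the table; in particular, taking SD as the residual standard deviation with N - 2 degrees of freedom is an assumption, and the function name `linear_fit` is illustrative.

```python
import math


def linear_fit(k, p):
    """Ordinary least squares fit p = a + b*k with basic diagnostics."""
    n = len(k)
    if n != len(p) or n < 3:
        raise ValueError("need two equally long series with at least 3 points")
    k_mean = sum(k) / n
    p_mean = sum(p) / n
    skk = sum((x - k_mean) ** 2 for x in k)
    spp = sum((y - p_mean) ** 2 for y in p)
    skp = sum((x - k_mean) * (y - p_mean) for x, y in zip(k, p))

    b = skp / skk                      # slope
    a = p_mean - b * k_mean            # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(k, p))
    sd = math.sqrt(ss_res / (n - 2))   # assumed definition of SD
    err_b = sd / math.sqrt(skk)
    err_a = sd * math.sqrt(1.0 / n + k_mean ** 2 / skk)
    r = skp / math.sqrt(skk * spp)     # correlation coefficient
    return {"a": a, "b": b, "err_a": err_a, "err_b": err_b,
            "R": r, "SD": sd, "N": n}


# Toy numbers for demonstration only, not measured data.
fit = linear_fit([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
```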
One notes a very good correlation of the air temperature and atmospheric pressure, and a high dispersion of the wind quantities. The Piwnice pressure is on average lower than that in Koniczynka by only 0.6 hPa at 960 hPa and by 0.2 hPa at 1040 hPa, while the temperature is higher by 0.9 to 0.6°C at -25 to +35°C, respectively.
In case of the wind direction, the decorrelation may partly be explained by inappropriate averaging of data originally read from the station. During this initial data reduction the angle mean is calculated the same way as the means of the other quantities. A proper algorithm should rely on calculating the mean sine and the mean cosine of the angles being averaged, Di (i = 1, 2, ... N), and then taking the arctan2 function of the two means or just sums:
$$\bar{D} = \operatorname{arctan2}\Bigl(\sum_i \sin D_i,\ \sum_i \cos D_i\Bigr), \qquad (1)$$
where the summations are carried over all N measurements. That is how our wind direction data were further averaged to obtain the 1-hour means for this particular plot. Judging from the depth and width of the gap near the direction of 0° in the angular distribution (see the figure on the left, which is a circularly rearranged and expanded display of a part of the distribution of wind directions presented earlier), it is possible to estimate the number of measurements swept away from there. This number amounts to about 1 % of all measurements. That many data were affected by the simple (inappropriate) averaging of angles near 0 and 360°. The data that originally belonged to the depression at 0° must have been spread over the entire 360° range, with a maximum at 180°.
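The angular averaging of Eq. (1) can be sketched in Python as follows. The function name `circular_mean_deg`, the mapping of the result into the range 0° to 360°, and the guard against a vanishing resultant vector are choices made for this illustration.

```python
import math


def circular_mean_deg(directions):
    """Mean direction in degrees from the summed sines and cosines."""
    s = sum(math.sin(math.radians(d)) for d in directions)
    c = sum(math.cos(math.radians(d)) for d in directions)
    if math.hypot(s, c) < 1e-12:
        # Opposite directions cancel out; no meaningful mean exists.
        raise ValueError("mean direction is undefined")
    return math.degrees(math.atan2(s, c)) % 360.0


samples = [350.0, 10.0]
naive = sum(samples) / len(samples)     # arithmetic mean of the angles
proper = circular_mean_deg(samples)     # angular mean
```

For two readings of 350° and 10°, the arithmetic mean is 180°, that is, due south, whereas the angular mean points north (numerically it may come out as a value indistinguishable from either 0° or 360°). This is the mechanism by which simple averaging moves data away from 0° and piles them up near 180°.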
The analysis of raw data presented above has indicated the presence of corrupt recordings. To improve the usability of this database, an attempt has been made to clean it of the more obvious erratic records and to tag some errors, essentially only with respect to the atmospheric pressure and temperature measurements. The basic search for errors relied on comparing the deviation of each measurement (a 10-second average, as usually stored in the archives) from the mean value of the temperature or pressure with the standard deviation calculated for various time intervals. The intervals ranged from 4 minutes to 1 hour. If a value deviated from the mean by a few standard deviations, it was further compared to neighbouring measurements and automatically tagged as erratic only if there were 'normal' neighbours on both sides. There were cases in which two consecutive measurements were erratic, and these were treated individually. In this way we have detected a few hundred errors, most of them in the pressure measurements. Another cleaning step was based on a search for exactly the same numerical values repeated in a number (10 or more) of consecutive measurements. This made it possible to remove many cases in which apparently all the sensors simultaneously 'froze' for a few minutes. Such records were not tagged but were erased from the daily files altogether.
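The two searches described above might look roughly as follows in Python. This is only a sketch under several assumptions that the text does not fix: the window is centred on the tested value and excludes it from the mean and standard deviation, the threshold of 4 standard deviations stands in for "a few", and the default half-window of 12 samples corresponds to about 4 minutes of 10-second data. The function names and parameters are hypothetical.

```python
import statistics


def flag_isolated_outliers(values, half_window=12, n_sigma=4.0):
    """Indices of values that deviate strongly while both neighbours look normal."""
    values = list(values)
    flagged = []
    for i in range(1, len(values) - 1):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        others = values[lo:i] + values[i + 1:hi]   # window without the tested value
        if len(others) < 3:
            continue
        mean = statistics.fmean(others)
        limit = n_sigma * statistics.pstdev(others)

        def is_normal(x):
            return abs(x - mean) <= limit

        if (not is_normal(values[i])
                and is_normal(values[i - 1])
                and is_normal(values[i + 1])):
            flagged.append(i)
    return flagged


def find_frozen_runs(records, min_run=10):
    """(start, end) index pairs of runs of identical records, end exclusive."""
    runs = []
    start = 0
    for i in range(1, len(records) + 1):
        if i == len(records) or records[i] != records[start]:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs
```

Here `records` is meant to hold the measured values only, without time stamps, so that a 'frozen' station shows up as a run of equal tuples. Pairs of consecutive erratic values are deliberately not flagged by the first function, in line with their individual treatment described above.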
Finally, a search for incorrect time tags was performed. There are many cases (some 30 000) in which two neighbouring records have the same time stamp (whereas they are expected to differ by 10 s). About 2700 cases were discovered in which the time of the next record was earlier than that of the current one, and in 8 files there are backward jumps in time exceeding 10 minutes and reaching 1 hour (in one case there is a 2-hour jump!). Unfortunately, only one of the latter cases could be corrected by shifting a portion of earlier data in time. It seems that the backward jumps are due to a fast computer (internal system) clock.
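A check of this kind can be sketched in Python as below, assuming the time stamps of one daily file have already been converted to seconds. The function name `classify_time_steps` and the dictionary keys are illustrative; the 600 s threshold mirrors the 10-minute limit mentioned above.

```python
def classify_time_steps(times_s, big_jump_s=600):
    """Count duplicated and backward time stamps in a sequence of seconds."""
    counts = {"duplicate": 0, "backward": 0, "backward_big": 0}
    for prev, cur in zip(times_s, times_s[1:]):
        step = cur - prev
        if step == 0:
            counts["duplicate"] += 1
        elif step < 0:
            counts["backward"] += 1
            if -step > big_jump_s:
                counts["backward_big"] += 1   # also counted in 'backward'
    return counts


# Made-up time stamps for demonstration only.
example = classify_time_steps([0, 10, 20, 20, 30, 25, 35, 4000, 3000])
```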
The corrected database, encompassing measurements till 1 June 2008 (inclusive), now consists of 22797080 healthy records (lines) and 347 records tagged as having an erratic pressure or temperature measurement. The summary statistics do not differ by more than 0.1 from those already presented in this report and calculated prior to the correction, with the sole exception of the median relative humidity, which is now equal to 99.1 %.
This database has been reduced to 1-hour averages and is available for download in the form of one zipped file. This big file (uncompressed, it is about 4 MB in size) contains an ASCII table, a header plus 65639 data rows, which begins thus:
Each data line of this table corresponds to a time interval of 1 hour between integer UTC hours. The time indicated is that of the center of the interval. For example, the first line begins with 2000.57508, which signifies the year 2000 and the days passed since 0 hours UTC on 1 January: 0.57508*366 = 210.47928, i.e. the 211th day of the year and the UTC interval beginning with the hour equal to the integer part of 0.47928*24 = 11.5, i.e. 11:00 UTC. Note carefully that in the above decoding of time the factor 366 applies to leap years only (2000, 2004 and 2008); for other years it is 365. The next nine columns contain the mean weather quantities (columns headed 'Mean') with standard deviations ('SD'). The wind direction mean ('Dir') is not accompanied by an SD because of nonstandard averaging (which was angular averaging according to Eq. (1)). The rightmost column shows the number of daily file records used for calculation of the means.
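The decoding of the time column described above can be written as a short Python sketch. The function name `decode_time` is illustrative, and the standard-library `calendar.isleap` is used to choose between 366 and 365 days.

```python
import calendar
import math


def decode_time(t):
    """Return (year, day of year, starting UTC hour) for a fractional-year stamp."""
    year = int(math.floor(t))
    days_in_year = 366 if calendar.isleap(year) else 365
    days = (t - year) * days_in_year        # days since 0 h UTC on 1 January
    day_of_year = int(days) + 1             # 1 January is day 1
    hour = int((days - int(days)) * 24)     # beginning of the 1-hour interval
    return year, day_of_year, hour


print(decode_time(2000.57508))   # (2000, 211, 11)
```

Because each stamp marks the center of its interval, the fractional part lies well away from the hour boundaries, so the truncation to an integer hour is not sensitive to small rounding errors in the stored value.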
Despite a considerable amount of cleaning and corrections, the statistical properties of our database did not change much, so the results presented for the original data remain valid. This also applies to the one-hour data (see this analysis and figures) and to the correlations with the Koniczynka data, which are now only slightly better. For example, the atmospheric pressure, the most affected quantity, is now linearly related to the Koniczynka data through this equation:
$$P_{\mathrm{Piwnice}} = (-6.43728 \pm 0.36735) + (1.00604 \pm 0.00037)\,P_{\mathrm{Koniczynka}},$$
and this fit has a correlation coefficient of 0.99746 and a standard deviation of 0.66848. In practice, this line is indistinguishable from the earlier one, as seen in the following figure, wherein the new fit is represented by the thin light line drawn in the middle of the other. Comparing this diagram with that for the uncorrected data, one easily notes the disappearance of many outliers.
The angular averaging was implemented on 5 June 2008.