Mesonet IT and Data Management
Managing, reviewing, processing, and distributing data


Introduction to Mesonet Data Management

The design of your mesonet, of course, takes into account the reason why you are building a mesonet (your objective) and the deliverables you intend to provide to your various stakeholders. To meet your objective and provide your stakeholders with high-quality, accurate mesonet deliverables (including data products), you will need to devise a system for the management, review, processing, and distribution of your data.

technician in server room

Managing Measurement Data

Your mesonet may consist of hundreds of automated weather stations that are collecting data that need to be housed—somewhere and somehow. Typically, the “owner” of the raw data files (and processed data) is a central data processing facility hosted by the mesonet operator. For example, for a mesonet operated by a university, the data files may reside on servers that are physically located on the university campus.

Depending on the scope of the mesonet, file servers and offsite data hosting for backup may be used. Most mesonets are state sponsored, and their stakeholders require direct access to data products. However, certain data products are created for specific applications and are not intended for a larger general audience. Such a product may be geographically dependent or intended for a specific entity (for example, emergency managers or a public health department).

Mesonet server
(Photo courtesy of Dr. Jerald Brotzge, Program Manager, New York State Mesonet)

Managing metadata

The metadata for a mesonet are the data that describe and give historical information about the measured data that have been collected by the network of automated weather stations. The metadata may include the following types of information:

  • Site information, including a documentation of changes that have occurred—both in writing and via digital photography—and any necessary actions taken (such as cutting back interfering weeds)
  • Equipment information, such as sensors, serial numbers, calibration coefficients, calibration drift over time, important dates (such as installation, calibration, and replacement), locations, etc.
  • Equipment maintenance information, including what was done (such as sensor replacement) and why
  • Explanations regarding data that are missing or known to be inaccurate

Metadata are important because they give context for measurements and instill confidence in the veracity of the measurements and data products. You could deploy the highest quality hardware, but unless you know how those measurements are obtained and managed, it would be difficult to trust the resultant data. Unfortunately, all too often in environmental research, data are used without prior knowledge of the source and how they were obtained. This practice runs counter to the scientific method.

In a dynamic system such as a mesonet, metadata are subject to change during the life of the automated weather station. Therefore, maintaining the station history and tracking the changes are important for measuring climate variables. If a change in the climate signature at a station is recorded, you can use your site metadata—documenting the site, instrumentation changes, and program changes—to help determine whether the change is real or an artifact of intervention.

Methods to record metadata include a standard paper form and mobile platforms with custom software.

The once-standard field notebooks are being routinely replaced by weatherproof handheld computers to document this information. At the Hubbard Brook Experimental Forest in New Hampshire, technicians track known events on handheld devices with electronic forms that have pull-down menus to ensure uniformity. When the technician returns from the field, the digital notes are downloaded and automatically synchronized with the sensor data using the date and time stamp.
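
When digital field notes are synchronized with sensor data by date and time stamp, the matching logic can be as simple as a nearest-timestamp lookup. The following is a minimal Python sketch of that idea—the function name and the five-minute tolerance are illustrative, not a description of Hubbard Brook's actual software:

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def match_notes_to_records(note_times, record_times, tolerance=timedelta(minutes=5)):
    """Pair each field note with the nearest sensor record timestamp.

    Returns a dict mapping note time -> matched record time, or None when no
    record falls within the tolerance window. record_times must be sorted.
    """
    matches = {}
    for t in note_times:
        i = bisect_left(record_times, t)
        # Candidates: the record just before and just after the note time.
        candidates = [record_times[j] for j in (i - 1, i) if 0 <= j < len(record_times)]
        best = min(candidates, key=lambda r: abs(r - t), default=None)
        matches[t] = best if best is not None and abs(best - t) <= tolerance else None
    return matches

# Sensor records logged every 5 minutes; two notes taken in the field.
records = [datetime(2020, 6, 1, 12, m) for m in range(0, 60, 5)]
notes = [datetime(2020, 6, 1, 12, 13), datetime(2020, 6, 1, 13, 30)]
paired = match_notes_to_records(notes, records)
```

A note taken at 12:13 pairs with the 12:15 record; a note taken long after the last record finds no match and stays unpaired for manual review.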

Standards for documenting metadata are described below:

In recent years, various metadata standards have been developed for environmental data and can be applied to sensors that produce streaming data. SensorML (Sensor Model Language), EML (Ecological Metadata Language), and WaterML (Water Markup Language) are all common metadata standards that use Extensible Markup Language (XML). XML is a flexible and widely used standard for encoding information in a format that is both human and machine readable, which facilitates its use in Internet applications.
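
As a simplified illustration of XML-encoded metadata, the following Python sketch parses a hypothetical sensor record. Real SensorML, EML, or WaterML documents use formal namespaces and far richer schemas; the element and attribute names here are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata record for one sensor.
doc = """
<sensor id="temp-01">
  <model>HMP45C</model>
  <serialNumber>S1234</serialNumber>
  <calibration date="2021-03-15" coefficientA="0.98" coefficientB="0.12"/>
</sensor>
"""

root = ET.fromstring(doc)
model = root.findtext("model")            # sensor model name
serial = root.findtext("serialNumber")    # serial number for equipment tracking
cal = root.find("calibration")
cal_date = cal.get("date")                # most recent calibration date
```

Because the record is both human readable and machine parseable, the same file can serve a technician reading it in the field and an automated process linking calibration dates to the measurement archive.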


Security Issues

The credibility of your measurement data and metadata is paramount to the accuracy of the mesonet products you provide. Consequently, it is essential to protect your data from being hacked or otherwise treated maliciously.

Starting with the data logger at the automated weather station, there are measures you can take to protect your data. For example, you can encode the data that are stored and transmitted over communication links to a central location. Encoding produces smaller data packets, which speeds up transmission and reduces power consumption, and the encoded format cannot be viewed without the proper decoding software. After the data reach their destination—a central location, such as a university—required IT security protocols are used to store the data onsite.
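
The encode-then-transmit idea can be sketched as follows. This is an illustrative example using compression plus a checksum, not any logger vendor's actual packet format; true transport security would also require encryption (e.g., TLS) on the link:

```python
import json
import zlib

def encode_packet(observation):
    """Encode an observation dict into a compact binary packet."""
    payload = json.dumps(observation, separators=(",", ":")).encode("utf-8")
    body = zlib.compress(payload)                     # shrink the payload
    checksum = zlib.crc32(body).to_bytes(4, "big")    # detect corruption in transit
    return checksum + body

def decode_packet(packet):
    """Verify and decode a packet produced by encode_packet."""
    checksum, body = packet[:4], packet[4:]
    if int.from_bytes(checksum, "big") != zlib.crc32(body):
        raise ValueError("packet corrupted in transit")
    return json.loads(zlib.decompress(body).decode("utf-8"))

obs = {"station": "NRMN", "time": "2020-06-01T12:00Z", "tair_c": 28.4}
packet = encode_packet(obs)
```

Without the matching decoder, the packet is opaque bytes; with it, the observation round-trips intact.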

computer room
(Photo courtesy of Dr. Christopher A. Fiebrich, Executive Director, Oklahoma Mesonet)

To ensure the continuity of your mesonet data operations, you can purchase external web services to store your data at additional offsite locations. (This is known as data storage redundancy.) The additional external storage could back up and protect source code, databases, and synchronization algorithms—in addition to other sources such as satellite data, river data, and lightning data. Putting all the software code pieces back together (even though they are backed up), however, could be an extensive exercise. An easier alternative may be to use a mirrored data site that would take care of the internal data communication paths (such as IP addresses and modem configurations). A mirror site is a website or a set of files on a computer server that has been copied to another computer server so that the same website or data files are available from multiple locations. A mirror site has its own URL but is otherwise identical to the principal site.


IT Maintenance

Don't underestimate the resources required to maintain the data side of a mesonet. It is a considerable effort to maintain all the communication links and equipment used—from the sensors and stations to the data servers, software, databases, computer models, data products, etc. System upgrades are typically deployed when two criteria are met: the current technology has become obsolete, and funding is available.

Operations Control Room, NYS Mesonet
(Photo courtesy of Dr. Jerald Brotzge, Program Manager, New York State Mesonet)

Data Quality Review

While conducting regular maintenance of your mesonet’s stations is critical, it is equally important to have procedures in place to monitor your mesonet data and raise flags for abnormal measurement values. Any mechanical or electrical instrument in the field will eventually fail, and sensors often degrade over time. Failures seldom occur during or immediately before a scheduled maintenance visit, so automated procedures that monitor your measurements can trigger a quicker resolution of those issues.

Mesonet operators must be willing to spend considerable time reviewing the data from their stations to ensure that the data products they deliver to stakeholders are accurate and of high quality. Operators have an obligation to ensure that “bad data”—especially without supporting metadata—do not cause problems for stakeholders and the potentially life-affecting decisions they need to make based on those data.

Operators have an obligation to flag potentially bad data during the quality assurance process and disregard suspect data. Operators can compare the data of one station to that of a nearby station to see if there is a discrepancy, investigate why there might be differences, and annotate the measurements using metadata.

Campbell et al. (2013) outline a core list of best practices for the quality control of streaming environmental sensor data:

  • Automate QA/QC procedures.
  • Maintain an appropriate level of human inspection.
  • Replicate sensors.
  • Schedule maintenance and repairs to minimize data loss.
  • Have ready access to replacement parts.
  • Record the date and time of known events that may affect measurements.
  • Implement an automated alert system to warn about potential sensor network issues.
  • Retain the original unmanipulated data.
  • Ensure that the data are collected sequentially.
  • Perform range checks on numerical data.
  • Perform domain checks on categorical data.
  • Perform slope and persistence checks on continuous data.
  • Compare the data with data from related sensors.
  • Correct the data or fill gaps, if that is prudent.
  • Use flags to convey information about the data.
  • Estimate uncertainty in the value, if that is feasible.
  • Provide complete metadata.
  • Document all QA/QC procedures that were applied.
  • Document all data processing (e.g., correction for sensor drift).
  • Retain all versions of the input data, workflows, QC programs, and models used (data provenance).

The two levels at which you should review the quality of the measurement data from your automated weather stations are the following:

  1. Use automated computer programs to monitor your data and flag potential problems.
  2. Have a staff member conduct manual checks of flagged data as they occur, as well as daily, weekly, monthly, and yearly routine data reviews.

Automated quality assurance review

Your measurement data can be quality checked in many ways to ensure the proper operation of your mesonet stations. The following are some example methods you can program your software to use:

  1. Automatically check for questionable measurement values that are outside of an established “normal” range or even the published specifications of individual instruments.
  2. Automatically check measurement values against what the sensor reported previously. For example, a large pressure spike over a five-minute period may indicate a sensor issue.
  3. Automatically check measurement values at one station against values at other nearby stations to give spatial context. For example, if a few stations are reporting similar unusual phenomena, there might not be a measurement problem.
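
The three checks above can be sketched as simple filter functions. All thresholds shown are illustrative; an operational system would tune them per variable, season, and station:

```python
def range_check(value, lo, hi):
    """Flag values outside an established plausible range."""
    return "ok" if lo <= value <= hi else "range_fail"

def step_check(value, previous, max_step):
    """Flag implausibly large jumps between consecutive observations."""
    return "ok" if abs(value - previous) <= max_step else "step_fail"

def spatial_check(value, neighbor_values, max_departure):
    """Flag values far from the mean of nearby stations. If several
    neighbors also look unusual, the phenomenon may be real."""
    mean = sum(neighbor_values) / len(neighbor_values)
    return "ok" if abs(value - mean) <= max_departure else "spatial_fail"

# Five-minute station pressure in hPa (thresholds are illustrative).
flags = [
    range_check(1085.0, 850.0, 1080.0),                     # above plausible range
    step_check(1013.2, 1008.0, 3.0),                        # 5.2 hPa jump in 5 minutes
    spatial_check(1012.8, [1013.0, 1012.5, 1013.4], 5.0),   # agrees with neighbors
]
```

Each observation can carry the resulting flag through the processing chain so that downstream products and human reviewers see exactly which test it failed.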

Climate tests, step tests, temporal tests, spatial tests, and sensor-specific tests are additional data filters used to identify suspect data.

Networks may choose to measure, store, and output raw data, with minimal quality review taking place at the measurement site. Known sensor characteristics, however, can be handled with a minimal amount of real-time processing in the data logger’s program code. (For example, an HMP45C temperature and relative humidity probe reporting relative humidity values from 100 to 107% is known to be reading 100%.)
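
Such a logger-side rule might look like the following sketch. The 100–107% band is the characteristic cited above; the function name and the decision to leave higher readings untouched are illustrative:

```python
def clamp_rh(rh_percent):
    """Apply a known sensor characteristic at the logger: an HMP45C reading
    between 100 and 107% relative humidity is treated as saturation (100%).
    Readings above 107% are left untouched so they still fail a range check."""
    if 100.0 < rh_percent <= 107.0:
        return 100.0
    return rh_percent

# Raw five-minute relative humidity readings from the probe.
cleaned = [clamp_rh(v) for v in (96.2, 103.5, 112.0)]
```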

Human quality assurance review

A trained staff member should review the raw mesonet data several times per day. This does not mean reviewing all raw observations—which could be on the order of tens of thousands of data points—but rather the processed data that were flagged as failing the automated quality review. From these data, the staff member could then take the following actions:

  1. Decide if the measurement failure is real and there is a problem with a sensor or other piece of equipment. Stations located near each other can be compared to determine if flagged data is erroneous. The staff member can use the Barnes Objective Analysis method to make this determination.
  2. Manually verify flagged data for additional context. For example, if a big storm or local fire has caused extreme environmental phenomena in a localized environment, your data may be accurate, even though it was automatically flagged.
  3. Choose whether to issue a trouble ticket that would trigger a field technician to visit the station.
  4. Determine when the sensor first exhibited the questionable condition.
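
As an illustration of the neighbor-comparison step, the following sketch applies a single-pass, Barnes-style Gaussian distance weighting to estimate the expected value at a flagged station. A full Barnes Objective Analysis adds successive correction passes; the coordinates, smoothing parameter, and 3 °C departure threshold here are illustrative:

```python
import math

def barnes_estimate(target_xy, neighbors, kappa=1000.0):
    """Estimate the expected value at a station from nearby stations using
    Gaussian distance weighting: w = exp(-r^2 / kappa).

    neighbors: list of ((x, y), value) pairs, coordinates in km;
    kappa: smoothing parameter in km^2.
    """
    weights, weighted_sum = 0.0, 0.0
    for (x, y), value in neighbors:
        r2 = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
        w = math.exp(-r2 / kappa)   # weight falls off with distance
        weights += w
        weighted_sum += w * value
    return weighted_sum / weights

# Air temperature (deg C) at three nearby stations; flagged station at (0, 0).
neighbors = [((10.0, 0.0), 25.1), ((0.0, 20.0), 24.8), ((15.0, 15.0), 25.3)]
expected = barnes_estimate((0.0, 0.0), neighbors)

flagged_value = 31.6
suspect = abs(flagged_value - expected) > 3.0   # large departure from neighbors
```

A flagged reading several degrees from the distance-weighted neighborhood estimate strengthens the case for a sensor problem rather than a real local effect.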

The following is a suggested list of tasks for a technician to conduct on a site visit:

  • Record basic information, including:
    • Site name
    • Date
    • Arrival time
    • Departure time
    • Climate season during visit (spring, summer, fall, or winter)
  • Conduct tasks upon arrival.
    • Check for physical damage, including:
      • Instrument tower
      • Grounding wire
      • Environmental enclosure
      • Solar panel
      • Instrument wires
      • Data communication wires (coaxial cable or Ethernet)
      • Sensors
    • Take photos of the site in all directions.
  • Maintain the bare soil plot (for four-inch soil temperature measurements).
    • Remove vegetation, such as grass or weeds.
    • Apply chemicals to prevent future weed growth.
  • Conduct general equipment maintenance.
    • Check the cleanliness of equipment, including:
      • Solar radiation sensors
      • Tipping-bucket rain gage
      • Any debris on sensors that would affect two-meter wind speed measurements
      • Air temperature and relative humidity shield
      • Solar panel
    • Level any misaligned equipment, including:
      • Solar radiation sensor plate
      • Rain gage
      • Wind sensors
    • Conduct a rain gage drip test.
  • Review active trouble tickets for a site (for example, a temperature sensor reading abnormally high as compared to other stations).

The Oklahoma Mesonet offers this perspective on site visits:

On-site intercomparisons are not used to correct data. Instead, they provide performance checkpoints during the sensor’s operational life cycle. As a result of these visits, technicians may detect sensor problems that are difficult to ascertain by other components of the QA system. On the other hand, a site visit may reveal that a perceived QA problem is attributable to a real local effect. During the time that a technician is present at a site, a special data flag is activated in the site datalogger. The flag is reported with the regular observations and is archived for future reference. Should questions arise concerning any aspect of data quality, it is possible to check the archives of QA flags to see if the presence of a technician could be a proximate cause of the suspicious data.

During scheduled physical site visits, a technician may perform maintenance tasks ranging from validating instrument performance to maintaining vegetation surrounding the station. The presence of a technician could influence measurements made by the station during the service period. For example, a technician may be validating a rain gage by pouring a known amount of water into the funnel, which then causes false tips in the stored data set. A possible solution to this interference would be to install a door switch on the environmental enclosure housing the data logger. Upon arrival, the technician could open the door, which would signal the data logger to flag all measured data until the door is closed again. The data would still be recorded, but it would be noted that the data are not valid due to a known site visit.
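
The door-switch flagging scheme can be sketched as follows. The record structure, timestamps, and flag names are hypothetical, and real logger firmware would implement this in its own programming language:

```python
def flag_service_period(records, door_events):
    """Mark observations recorded while the enclosure door was open.

    records: list of (timestamp, value) pairs; door_events: list of
    (open_time, close_time) intervals. Data are kept, but flagged so a known
    technician-on-site effect (e.g., false rain-gage tips during a drip test)
    can be excluded later.
    """
    flagged = []
    for t, value in records:
        on_site = any(open_t <= t <= close_t for open_t, close_t in door_events)
        flagged.append((t, value, "technician_on_site" if on_site else "ok"))
    return flagged

# Rain-gage tips (inches) at times 100-115; door open from t=104 to t=112.
records = [(100, 0.0), (105, 0.2), (110, 0.2), (115, 0.0)]
door_events = [(104, 112)]
out = flag_service_period(records, door_events)
```

The two apparent tips during the service window are retained in the archive but marked invalid, exactly as the door-switch approach intends.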


Data Processing

Aside from the maintenance of the physical structures of mesonet stations, proper IT and data management procedures constitute the most important aspect of a mesonet’s continual operation. A mesonet produces thousands of data points daily, and quality computing infrastructure and database management expertise are vital to process all the measurement values and keep the mesonet online. In addition to managing their measurement data, many mesonets use their database and software to manage operations involving metadata, maintenance routines, calibration information, manual quality review flags, and inventory management.

Most current mesonets opt to store all their raw measurement data and then post-process the data for calculations and quality checks using automated programs and manual processing. Currently, no commercially available master software package exists to handle these processes for mesonets. Consequently, mesonet operators have had to build databases and software to handle all their data and convert them to the formats needed by their stakeholders. To code their database management systems, mesonet operators have used tools such as C++ and MySQL.

There are certain data file formats that work well for stakeholders with TV meteorology computer systems but are not suitable for other stakeholders. When a television station upgrades its server software, changes may be necessary when providing these stakeholders with their data products. Sometimes multiple data formats may need to be provided. Your system administrator may prefer one data format, but in reality, you may need to provide these stakeholders with four or five data formats. This situation is further complicated when you flag a suspicious variable that needs to be updated in multiple data products.
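
Producing several stakeholder formats from a single processed record might look like the following sketch. The field names and the three output formats are illustrative, not any stakeholder's actual ingest schema:

```python
import csv
import io
import json

def to_formats(observation):
    """Render one processed observation in several stakeholder formats."""
    as_json = json.dumps(observation)

    # CSV with a header row, as a spreadsheet-based stakeholder might want.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(observation))
    writer.writeheader()
    writer.writerow(observation)
    as_csv = buf.getvalue()

    # A flat key=value line, as some legacy ingest systems expect.
    as_kv = " ".join(f"{k}={v}" for k, v in observation.items())
    return {"json": as_json, "csv": as_csv, "kv": as_kv}

obs = {"station": "NRMN", "tair_c": 28.4, "qc_flag": "ok"}
formats = to_formats(obs)
```

Note that the quality flag travels with the observation in every format, so updating a suspicious variable means regenerating each product, just as the text describes.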


Data Distribution

Your processed data will need to be distributed to your stakeholders in various formats, depending on the agreed-upon data products that your mesonet is providing.

Data distribution can be achieved through either of two methods:

  • Stakeholders can “self service” and access the data products made available to them.
  • The data products can be “pushed out” to the stakeholders.

Popular data distribution methods include FTP (File Transfer Protocol), HTTP (Hypertext Transfer Protocol), and LDM (Local Data Manager), each of which uses a dedicated IP (Internet Protocol) port.

  • HTTP is typically much easier to support than FTP or LDM.
  • HTTP can be tracked through IP addresses and disabled quickly if unauthorized access occurs.
  • With an FTP login (username and password), you can make your data distribution more secure.
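
A minimal self-service HTTP endpoint can be sketched with Python's standard library. This is only a demonstration of the "self service" model; a production service would add authentication, logging, and the IP-based access control described above:

```python
import http.server
import pathlib
import tempfile
import threading
import urllib.request

# A directory of processed data products to publish (here, a temporary
# directory holding one illustrative CSV product).
products = pathlib.Path(tempfile.mkdtemp())
(products / "latest.csv").write_text("station,tair_c\nNRMN,28.4\n")

class ProductHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from the products directory only."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=str(products), **kwargs)

# Bind to an arbitrary free port on localhost and serve in the background.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), ProductHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# A stakeholder "self services" the latest product over HTTP.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/latest.csv") as resp:
    body = resp.read().decode()
```

Because each request arrives from a known IP address, access is easy to track and, if necessary, disable quickly.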

RSS (Rich Site Summary or Really Simple Syndication) feeds are another method of data distribution, but these feeds are more difficult to set up than the other distribution methods.