The Blue Sky Challenge
Let’s keep the sky blue and the earth green!!
Monitoring air quality in cities across the globe is becoming a necessity, as air pollution can lead to several respiratory and cardiovascular diseases. According to WHO reports, more than 7 million deaths worldwide have been attributed to diseases related to air pollution. Increasing urbanisation and industrialisation are further degrading air quality, which is already alarming in many cities around the world. In practice, installing ground stations for local air quality measurement everywhere is not feasible; a mobile application would instead enable people to make informed decisions.
To this end, the hackathon seeks solutions under two sub-themes: (a) analysing satellite imagery data to estimate the pollutants in a given area, and (b) discovering innovative solutions for smart air quality monitoring systems that integrate sensor technology with machine learning algorithms. The challenges with satellite images include the availability of only temporal snapshots rather than continuous data, image sizes too large to be downloaded over mobile networks to consumer devices, and the limited computational capacity of mobile devices for processing such data. Similarly, sensor data in many cases need to be screened for failures, anomalies, and errors. This hackathon therefore targets solutions based on a hybrid approach. The event also provides a platform for technology and innovation enthusiasts from different parts of the world to demonstrate and showcase their skills for the betterment of society.
The participants are expected to develop and submit solutions for the proposed sub-themes. The submissions will then be evaluated by experts.
Each registering team needs to submit a write-up of at most 1,000 words on their approach to solving the sub-theme problems. The final decision on winners and all other matters rests with the panel of judges.
Goal: Pollution estimation with improved accuracy using a combination of hyper-spectral satellite imagery data and maps.
Background:
Air pollution is one of the major public health concerns, leading to a number of respiratory and cardiovascular diseases and an estimated 7 million deaths worldwide [WHO]. Tracking the quality of the air we breathe using a mobile device would help us stay informed about the surrounding air quality and make informed decisions. However, accessible data from local air quality measurement ground stations are not available everywhere. There have been recent advances in map-based land-use regression (LUR) models (Steininger et al.); however, such models may suffer from spatial and temporal inaccuracy due to artefacts, non-man-made sources of pollution, wind speed, pressure, precipitation and temperature. Satellite imagery data could give a better estimate of the pollutants in a given area. The challenges with satellite images include the availability of only temporal snapshots rather than continuous data, image sizes too large to be downloaded over mobile networks to consumer devices, and the limited computational capacity of mobile devices for processing such data. This sub-theme aims to address these challenges through a hybrid approach.
Problem Statement:
The schematic diagram below illustrates the challenge. A code template with APIs is provided to assist you with the implementation. The solution requires two levels of data or image processing. The first stage processes the hyper-spectral satellite images/data to extract hyper-parameters or another suitable compressed representation/feature set. This compressed information is then passed as input to a second, regression machine learning (ML) model. The regression model should combine the satellite-derived features with map data from OpenStreetMap to produce a regression estimate of the pollutant (nitrogen dioxide, NO2) in the region at specific time instances. The solution should be submitted as a Python (.py) or notebook (.ipynb) file.
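The actual interfaces are defined in the code template linked under "Code template" below. Purely as an illustration, the following sketch shows one possible shape of the two-stage pipeline, using PCA as a stand-in for the compression stage and a gradient-boosted regressor as a stand-in for the regression stage; the function choices, array shapes and synthetic data are all hypothetical.

```python
# Hypothetical sketch of the two-stage pipeline (compression + regression).
# The real APIs and data loaders are defined in the provided code template;
# the synthetic arrays below only stand in for the actual inputs.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stage 1: compress hyper-spectral pixels (here: 200 spectral bands per sample)
# into a small feature vector that is cheap to transmit to a mobile device.
hyperspectral = rng.random((500, 200))        # 500 samples x 200 bands (synthetic)
compressor = PCA(n_components=8)              # stand-in for the compression model
compact_features = compressor.fit_transform(hyperspectral)   # 500 x 8

# Stage 2: regress NO2 from the compressed features plus map-derived features
# (e.g. road density, land use) taken from OpenStreetMap.
map_features = rng.random((500, 4))           # synthetic map-derived features
X = np.hstack([compact_features, map_features])
y = rng.random(500) * 40.0                    # synthetic NO2 targets

regressor = GradientBoostingRegressor()
regressor.fit(X, y)
print(regressor.predict(X[:5]))
```

In a real submission, only the compact feature set would be transmitted and processed downstream, which is what keeps the data size and on-device computation (and hence the scoring penalties below) small.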
Dataset: https://s3-ap-southeast-1.amazonaws.com/he-public-data/BlueSkyAbove0287eea4.ipynb
Click here to read more on the dataset - https://s3-ap-southeast-1.amazonaws.com/he-public-data/BlueSkyChallenge39c7d73.pdf
Judging/scoring Criteria:
Example Scores:
Team A:
Score from regression: 43000
Score from the hyper-parameter data size (2048 kB): -2048 points
Score from time of execution (2.242 seconds): -2242 points
Comments/Presentation: 4200 points
Total: 42910
Team B:
Score from regression: 46000
Score from the hyper-parameter data size (10084 kB): -10084 points
Score from time of execution (4.648 seconds): -4648 points
Comments/Presentation: 2200 points
Total: 33468
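The two examples above suggest that the total is the regression score minus one point per kB of hyper-parameter data, minus one point per millisecond of execution time, plus the comments/presentation points. A small sketch of that assumed formula:

```python
def total_score(regression, datasize_kb, exec_time_s, presentation):
    # Assumed scoring formula inferred from the worked examples above:
    # one penalty point per kB of data and per millisecond of runtime.
    return round(regression - datasize_kb - exec_time_s * 1000 + presentation)

print(total_score(43000, 2048, 2.242, 4200))   # Team A -> 42910
print(total_score(46000, 10084, 4.648, 2200))  # Team B -> 33468
```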
Teams Information: Please check the Rules section
Benchmarking compute cloud information:
Amazon EC2
t2.micro, 1 GiB of memory, 1 vCPU, EBS only, 64-bit platform
Code template:
https://github.com/williamnavaraj/BlueSkyChallenge.git
Potential FAQs:
1) Should the code run on a mobile device?
No. While the problem statement is aimed at a mobile application/web app, the algorithms/code developed for this challenge will be tested on a consumer-grade computer and benchmarked on an equivalent elastic cloud computing unit.
2) What dataset will be used for testing?
A dataset similar to the one provided in the challenge (from the same region, with similar satellite data) will be used for testing.
3) Will the ML training time/resources be taken into consideration for the grading?
No. Training can be carried out on any computing system. However, the resulting trained models should not exceed 4 GB for the compression/hyper-parameter extraction stage and 1 GB for the regression model.
4) What if we get negative total scores?
Given the penalty points, negative total scores are possible and acceptable. The overall goal is to maximise accuracy within the limited time and computing resources. Solutions will be ranked from the highest total score downwards.
5) Can we publish this work?
Yes. You are free to publish the work.
6) Who owns the IP?
You own the IP. By submitting a solution, you grant IEEE a license to use your algorithm for potential app development to tackle and create awareness about air pollution.
References:
https://www.who.int/health-topics/air-pollution#tab=tab_1
Schmitz, O., Beelen, R., Strak, M. et al. High resolution annual average air pollution concentration maps for the Netherlands. Sci Data 6, 190035 (2019). https://doi.org/10.1038/sdata.2019.35
Michael Steininger, Konstantin Kobs, Albin Zehe, Florian Lautenschlager, Martin Becker, and Andreas Hotho. 2020. MapLUR: Exploring a New Paradigm for Estimating Air Pollution Using Deep Learning on Map Images. ACM Trans. Spatial Algorithms Syst. 6, 3, Article 19 (May 2020), 24 pages. DOI: https://doi.org/10.1145/3380973
Goal: Forecasting Sensor Measurements in a Smart Air Quality Monitoring System
Background:
Air quality has a significant impact on the overall well-being of humans and society across the globe. The rewards of good air quality are numerous, including substantial health, environmental, and economic benefits. However, as a result of increasing urbanisation and industrialisation, air quality in major cities around the world is becoming a source of concern. Several nations have made efforts to implement smart city initiatives, in which sensors play a vital role in informing both governing authorities and the general public about real-time air quality levels via mobile or web-based apps. Traditional sensor monitoring can be made smarter through the adoption of state-of-the-art machine learning algorithms, improving the current capabilities of air quality monitoring. In this context, sub-theme 2 of the hackathon seeks to discover innovative solutions for developing smart air quality monitoring systems by integrating sensor technology with machine learning algorithms.
Problem Statement:
A number of factors in the air can affect its quality. Air quality monitoring systems therefore use multiple sensors, available as a whole suite, to monitor various parameters. Temperature and carbon monoxide play a vital role in air quality. The following problem may be addressed through this hackathon to make such systems smarter:
Temporal forecasting of temperature and carbon monoxide (CO) sensor data one day ahead: this can assist the general public and government officials in anticipating trends early, so that timely decisions and preventative actions can be taken.
Advanced machine learning algorithms combined with sensor data have the potential to be a leap forward in addressing the problem listed above. Therefore, the primary emphasis of sub-theme 2 is on the development of a machine learning algorithm to solve the defined problem. To evaluate the developed algorithm, participants can use the dataset from an air quality chemical multisensor device deployed in the field in an Italian city.
Dataset: https://www.kaggle.com/fedesoriano/air-quality-data-set?select=AirQuality.csv
Dataset Acknowledgements: Saverio De Vito (saverio.devito '@' enea.it), ENEA - National Agency for New Technologies, Energy and Sustainable Economic Development.
Citation Request: S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005.
(https://www.sciencedirect.com/science/article/pii/S0925400507007691)
Judging/scoring Criteria:
In total, the dataset contains 15 attributes; in this hackathon, only 4 of them are used: the date and time stamps together with the temperature and CO sensor readings.
Initial Training Data Period: 7 days from 11/03/2004 00.00.00 to 17/03/2004 23.00.00
Testing Data Period: 7 days from 18/03/2004 00.00.00 to 24/03/2004 23.00.00
Each day of the training and testing periods has 24 hourly data points, from 00.00.00 to 23.00.00.
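As a starting point, the following hedged sketch shows how the CSV from the Kaggle link could be loaded and sliced into the training and testing windows above. The column names ('Date', 'Time', 'T', 'CO(GT)'), the separator and the decimal convention are assumptions based on the original UCI export of this dataset and may need adjusting to the file you download; -200 entries are treated as missing, as noted in the FAQs below.

```python
# Hedged sketch: load AirQuality.csv and slice the training/testing windows.
# Column names, separator and decimal mark are assumptions; adjust them if the
# Kaggle version differs from the original UCI export.
import numpy as np
import pandas as pd

df = pd.read_csv("AirQuality.csv", sep=";", decimal=",")
df = df.dropna(subset=["Date", "Time"])   # drop trailing empty rows, if any
df = df.replace(-200, np.nan)             # -200 marks anomalous/missing readings

# Build an hourly timestamp from 'Date' (DD/MM/YYYY) and 'Time' (HH.MM.SS).
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                 format="%d/%m/%Y %H.%M.%S")
df = df.set_index("timestamp")

# Restrict to the temperature and CO attributes (assumed column names).
data = df[["T", "CO(GT)"]]

train = data.loc["2004-03-11 00:00:00":"2004-03-17 23:00:00"]  # 7 x 24 hourly points
test = data.loc["2004-03-18 00:00:00":"2004-03-24 23:00:00"]   # 7 x 24 hourly points
print(train.shape, test.shape)
```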
Procedure:
a) Initially, train your machine learning model using the 7 days of data from 11/03/2004 00.00.00 to 17/03/2004 23.00.00.
b) Perform temporal forecasting (one-day-ahead forecasting) for the 8th day using these 7 days of data. Compare the forecast values with the real sensor data and evaluate the performance using the Mean Absolute Percentage Error (MAPE) metric, taking the real sensor data for the 8th day as the true values.
c) Perform the temporal forecasting for the 9th day after updating the training database with the 8th day's sensor measurements. Compute the forecasting performance metric for the 9th day.
d) Perform the temporal forecasting for the 10th day after updating the training database with the 9th day's sensor measurements. Compute the forecasting performance metric for the 10th day.
e) Perform the temporal forecasting for the 11th day after updating the training database with the 10th day's sensor measurements. Compute the forecasting performance metric for the 11th day.
f) Perform the temporal forecasting for the 12th day after updating the training database with the 11th day's sensor measurements. Compute the forecasting performance metric for the 12th day.
g) Perform the temporal forecasting for the 13th day after updating the training database with the 12th day's sensor measurements. Compute the forecasting performance metric for the 13th day.
h) Perform the temporal forecasting for the 14th day after updating the training database with the 13th day's sensor measurements. Compute the forecasting performance metric for the 14th day.
JUDGING CRITERION: Compute the average MAPE over the testing period (8th day to 14th day), as well as the MAPE for each individual day.
For the evaluation of different teams, the judges will compute ranking orders among the teams for each judging criterion. Note: participants need to use the temperature data independently to evaluate their algorithm. A minimal sketch of the rolling forecasting and MAPE computation is given below.
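The sketch below makes steps a) to h) concrete for a single hourly series (temperature or CO, prepared for instance as in the loading sketch above). The naive "repeat the previous day" forecaster is only a placeholder for each team's own model; the MAPE definition and the day boundaries follow the procedure above.

```python
# Minimal sketch of the rolling one-day-ahead evaluation (steps a to h).
# 'series' is one hourly signal (e.g. temperature); the naive seasonal
# forecaster is a placeholder for the team's own ML model.
import numpy as np
import pandas as pd


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def rolling_daily_mape(series, n_train_days=7, n_test_days=7, hours=24):
    """Forecast day d, score it, then append day d to the history; repeat."""
    values = np.asarray(series, float)
    daily_scores = []
    for d in range(n_test_days):
        history = values[: (n_train_days + d) * hours]       # all data up to the previous day
        truth = values[(n_train_days + d) * hours:(n_train_days + d + 1) * hours]
        forecast = history[-hours:]                           # placeholder: repeat the last day
        daily_scores.append(mape(truth, forecast))
    return daily_scores, float(np.mean(daily_scores))


# Synthetic 14-day hourly example standing in for the real temperature/CO series.
rng = np.random.default_rng(1)
t = np.arange(14 * 24)
synthetic = 20 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)
per_day, average = rolling_daily_mape(pd.Series(synthetic))
print(per_day, average)
```

A team's actual solution would retrain or update its model inside the loop each time a new day of measurements is added to the history, as described in steps c) to h).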
Potential FAQs:
1) Should the code run on a mobile device?
No. The algorithms/code developed for this challenge will be tested on a consumer-grade computer and benchmarked on an equivalent elastic cloud computing unit.
2) What dataset will be used for evaluation?
The judges will be using the same dataset as provided.
3) Will the ML training time/resources be taken into consideration for the grading?
No.
4) Can we publish this work?
Yes. You are free to publish the work.
5) Who owns the IP?
You own the IP. By submitting a solution, you grant IEEE a license to use your algorithm for potential app development to tackle and create awareness about air quality.
6) What does the value -200 in the dataset mean?
-200 is an anomaly (missing-value) marker. Such entries can be removed from the training/testing dataset.
References:
Similar Work: K. Thiyagarajan, S. Kodagoda, L. Van Nguyen and R. Ranasinghe, "Sensor Failure Detection and Faulty Data Accommodation Approach for Instrumented Wastewater Infrastructures," in IEEE Access, vol. 6, pp. 56562-56574, 2018, doi: 10.1109/ACCESS.2018.2872506.
One winning team per sub-theme will receive prizes worth USD 1000.
One runner-up team per sub-theme will receive prizes worth USD 500.