Hadoop Paradigm for Satellite Environmental Big Data Processing
2020, International Journal of Agricultural and Environmental Information Systems
https://doi.org/10.4018/IJAEIS.2020010102…
Abstract
The significant growth of industrial, transport and agricultural activities has led not only to air quality and climate change issues, but also to an increase in potential natural disasters. The emission of harmful gases, measured in particular as the Vertical Column Density (VCD) of CO, SO2 and NOx, is one of the major factors causing these environmental problems. Our research aims to contribute to a solution to this hazardous phenomenon by using Remote Sensing (RS) techniques to monitor air quality, which may help decision makers. However, RS data are not easy to manage because of their huge size, high complexity, variety and velocity. This manuscript therefore explains the different aspects of the satellite data used. Furthermore, it shows that RS data can be regarded as big data. Accordingly, we have adopted the Hadoop big data architecture and explain how to process RS environmental data efficiently.
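As an illustration of the MapReduce model that Hadoop is built on, the sketch below averages VCD readings per gas and per one-degree grid cell in plain Python. It is not the authors' implementation: the record layout, the function names, the one-degree grid and the sample values are all assumptions made for the example, and a real job would run the map and reduce steps on Hadoop rather than in a single process.

```python
import math
from collections import defaultdict

def map_record(record):
    """Map step: emit a ((gas, lat_cell, lon_cell), vcd) pair for one reading."""
    lat, lon, gas, vcd = record  # hypothetical record layout
    return (gas, math.floor(lat), math.floor(lon)), vcd

def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_mean(values):
    """Reduce step: collapse the values of one key into their mean."""
    return sum(values) / len(values)

def mean_vcd_per_cell(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    groups = shuffle(map_record(r) for r in records)
    return {key: reduce_mean(vals) for key, vals in groups.items()}

# Made-up sample readings: (latitude, longitude, gas, VCD in arbitrary units).
SAMPLE = [
    (33.6, -7.6, "NOx", 4.0),
    (33.2, -7.1, "NOx", 6.0),
    (34.0, -6.8, "SO2", 1.5),
]

if __name__ == "__main__":
    for key, mean in sorted(mean_vcd_per_cell(SAMPLE).items()):
        print(key, mean)
```

Because each reading is mapped independently and each key is reduced independently, both steps can be spread across many nodes, which is the property that makes the model attractive for large RS archives.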
Related papers
Searching for information on the web today can be compared to dragging a net across the surface of the earth. While a great deal may be caught in the net, there is still a huge amount of information that lies deep and is therefore missed. The reason is simple: most of the Web's information is buried in dynamically produced sites, where data are hidden behind query interfaces, and standard search engines never find it. A direct query, however, is a laborious "one at a time" way to find information. Several factors make this problem particularly challenging. The Web is changing at a constant pace: new sources are added, and old sources are removed or modified. Remote sensing instruments on satellites and aircraft generate massive amounts of real-time data. Technology trends for Big Data embrace open source software, commodity servers, and massively parallel distributed processing platforms. Analytics is at the core of extracting value from Big Data to produce consumable insights for business and government. This paper presents an architecture for Big Data analytics and explores Big Data technologies including SQL databases, the Hadoop Distributed File System and MapReduce. The proposed architecture can store incoming raw data so that offline analysis can be run on large stored dumps when required. Finally, a detailed analysis of remotely sensed earth observatory Big Data for land or sea level is presented using Hadoop. The proposed architecture is capable of dividing, load balancing, and parallel processing of only the useful data. It thus enables efficient analysis of real-time remote sensing Big Data from earth observatory systems.
Satellite sensors generate enormous amounts of data, and storing and processing Remote Sensing Data is a challenging task because of its variety and volume. This paper studies a real-time Big Data analytical architecture for remote sensing satellite applications. To handle Remote Sensing Data, the proposed architecture comprises three main units: the Data Pre-Processing Unit (DPREU), the Data Analysis Unit (DAU) and the Data Post-Processing Unit (DPOSTU). First, DPREU acquires the required data from satellite sensors using filtration, balanced distributed storage and parallel processing in a Hadoop environment. Second, DAU identifies hidden patterns in the data stored in the distributed file system using Map functions followed by Reduce functions in the MapReduce paradigm. Finally, DPOSTU is the upper-layer unit of the proposed architecture, responsible for compiling and storing the results and for generating decisions based on the results received from DAU.
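The three units described above can be sketched as a small single-process pipeline. This is only an illustration under stated assumptions: the function names, the dictionary record format, the round-robin block split, the per-region maximum used as the "pattern", and the alert threshold are all hypothetical choices, not details taken from the paper.

```python
from functools import reduce

def dpreu(raw, n_blocks):
    """Pre-processing: filter out invalid readings, then split the rest
    round-robin into n_blocks blocks of near-equal size."""
    valid = [r for r in raw if r["value"] is not None and r["value"] >= 0]
    return [valid[i::n_blocks] for i in range(n_blocks)]

def dau_map(block):
    """Analysis, Map: compute the maximum reading per region within one block."""
    partial = {}
    for r in block:
        region, value = r["region"], r["value"]
        partial[region] = max(partial.get(region, value), value)
    return partial

def dau_reduce(a, b):
    """Analysis, Reduce: merge two partial results, keeping the larger maximum."""
    merged = dict(a)
    for region, value in b.items():
        merged[region] = max(merged.get(region, value), value)
    return merged

def dpostu(summary, threshold):
    """Post-processing: turn each regional maximum into a decision label."""
    return {region: ("alert" if value > threshold else "normal")
            for region, value in summary.items()}

def run(raw, n_blocks=2, threshold=50.0):
    """Chain the three units: DPREU, then DAU (map and reduce), then DPOSTU."""
    blocks = dpreu(raw, n_blocks)
    summary = reduce(dau_reduce, map(dau_map, blocks), {})
    return dpostu(summary, threshold)

# Made-up readings; None and negative values stand in for corrupt samples.
RAW = [
    {"region": "north", "value": 42.0},
    {"region": "north", "value": None},
    {"region": "south", "value": 61.5},
    {"region": "north", "value": 48.0},
    {"region": "south", "value": -1.0},
]

if __name__ == "__main__":
    print(run(RAW))
```

In a Hadoop deployment, the blocks would live on different data nodes and `dau_map` would run where each block is stored; the sketch keeps everything in memory so the flow between the units is easy to follow.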
Concurrency and Computation: Practice and Experience, 2017
In this paper, we describe work on an in-place, query-driven big data platform and the applications built on it, which include processing climate simulation data and air pollution monitoring. The system architecture of this experimental platform comprises the NCHC supercomputer ALPS, a storage pool, one master data node and 18 slave data nodes. The openSUSE operating system and the MariaDB database are installed on all nodes. The master node is responsible for metadata management and information integration, and the 18 slave nodes for the distributed database and for parallel model simulation, computation and analysis. The data are distributed to local nodes according to a pre-defined data partition plan. When application software, such as a simulation model or a post-processing application, is executed on slave nodes, the relevant input data can be obtained by querying the local database, and the computation is carried out locally. We have obtained performance benchmarks of the system from two applications, and both show satisfactory results. When the platform is applied to global climate simulation, the model simulation is carried out on the ALPS supercomputer, and the resulting temporal data are distributed to slave data nodes for parallel post-processing using MPI. The global forecast data can be further downscaled to regional and local areas through models refined at different spatial scales or through statistical data mining approaches. When applied to air pollution monitoring, the platform is connected to EPA open data, which are used as input for the GTx air pollution model simulation. The simulation and data post-processing are both carried out by GTx on the slave nodes by way of distributed, shared-nothing processing. The influence weighting between a point source and a receptor or local monitoring station can thus be determined. Air quality monitoring requires considering all kinds of scenarios, including fixed point source and mobile source management. Therefore, it is necessary to run many combinations of cases and to build a knowledge base for fast decision support. Further study will focus on air pollution pre-warning and response in Taichung city, which will be linked with the city's smart city operation.
IRJET, 2020
In this digital era, data is generated in great volume, variety and velocity. Not all of the data generated is significant. Insignificant and redundant data must be eliminated to form a quality dataset. Data generated in terabytes and petabytes led to the coining of the term Big Data. The data must be analyzed optimally so that better models can provide high-precision recommendations and solutions to serve mankind. A variety of big data tools are employed to speed up the processing of big data. However, there is not enough evidence to show whether the same tools and methods can be used to improve the analysis of much smaller data. To test this, some big data methods and techniques are applied experimentally to pollution data to improve the analysis of small data. The effect of quality of air on pollution is analyzed. Poor air quality is one of the major challenges that a country faces and is one of the leading causes of death. We analyse the major constituents of air that cause its contamination.
2022
There is no doubt that air pollution harms human health. Municipal areas are the most affected by the degradation of air quality caused by anthropogenic gases discharged from transport and industrial activities. This research collected remote sensing data from numerous satellite sensors to monitor air quality efficiently in near-real-time. This paper presents the software developed for this purpose, which is based on complex event processing and computes the air quality level in Morocco and Spain in streaming mode. The program rapidly extracts only the useful information from remote sensing big data, helping decision-makers. This investigation also includes a validation of the satellite sensor data used against the air quality measured by ground stations in the Andalucía and Madrid regions.
In the Republic of Korea, building-type fish and agricultural farms are expected to emerge in town areas or suburbs. Advanced farming technologies that employ water recirculation equipment or LED lights are becoming more common and convenient. However, there are still requirements that must be met to operate the farms successfully, and these must be identified by analysing the various factors surrounding the farms. This study visualizes atmospheric environment data for Gangnam District, provided by the Seoul Metropolitan Government, and investigates the characteristics of the analytical results in order to model a preliminary big data analysis of pollutants as a countermeasure to the bioaccumulation of heavy metals in agricultural and marine products. The basic research was performed by visualizing the data obtained from univariate, simple and multiple regression analyses for easy viewing, finding a log-transformed model, and modeling the overall characteristics through categorization of the explanatory variables. We hope that this research will assist farmers in selecting their farming locations.
International Journal of Embedded and Real-Time Communication Systems, 2019
The world is witnessing significant increases in industrial, transport and agricultural activities. This leads to economic growth but, on the other hand, causes substantial damage to urban air through emissions of harmful gases, mainly CO, SO2 and NO2, and of Particulate Matter (PM). The World Health Organization (WHO) confirms that daily exposure to pollutants causes approximately three million deaths. It is therefore necessary to assess air quality continuously. In this context, a Java-based application was developed to acquire data from EUMETSAT geostationary and polar-orbiting satellites through the Mediterranean Dialogue Earth Observatory (MDEO) terrestrial station. This application filters, subsets, processes and visualizes products covering the Morocco zone. Significant correlations were found between emissions and industrial activities related to thermal power plants, factories, transportation and ports.
Image processing algorithms related to remote sensing have been tested and utilized on the Hadoop MapReduce parallel platform by using an experimental 112-core high-performance cloud computing system that is situated in the Environmental Studies Center at the University of Qatar. Although there has been considerable research utilizing the Hadoop platform for image processing rather than for its original purpose of text processing, it had never been proved that Hadoop can be successfully utilized for high-volume image files. Hence, the successful utilization of Hadoop for image processing has been researched using eight different practical image processing algorithms. We extend the file approach in Hadoop to regard the whole TIFF image file as a unit by expanding the file format that Hadoop uses. Finally, we apply this to other image formats such as the JPEG, BMP, and GIF formats. Experiments have shown that the method is scalable and efficient in processing multiple large images used mostly for remote sensing applications, and the difference between the single PC runtime and the Hadoop runtime is clearly noticeable.
International Journal of Electrical and Computer Engineering (IJECE), 2018
This paper presents a data processing system based on an architecture of multiple stacked layers of computational processes. It transforms raw binary pollution data, received directly on our servers from two EUMETSAT MetOp satellites, into a continuous data stream that is ready to interpret and visualise in near real time. The techniques used range from task automation, data preprocessing and data analysis to machine learning with feedforward artificial neural networks. The proposed system handles the acquisition, cleaning, processing, normalizing, and predicting of pollution data in our area of interest, Morocco.
INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2021
Multi-area and multi-faceted remote sensing (SAR) datasets are widely used due to the increasing demand for accurate and up-to-date information on resources and the environment for regional and global monitoring. In general, the processing of RS data involves a complex multi-step sequence that includes several independent processing steps, depending on the type of RS application. The processing of RS data for regional disaster and environmental monitoring is recognized as both computationally and data demanding. By combining cloud computing and HPC technology, we propose a method to solve these problems efficiently: a large-scale RS data processing system suitable for various applications that provides real-time, on-demand service. The ubiquity, elasticity, and high-level transparency of the cloud computing model make it possible to run massive RS data management and processing for monitoring dynamic environments in any cloud via the web interface. Hilbert-based da...
