As new problems emerge or existing ones resurface, deep learning algorithms keep coming to the rescue. The availability of data and today’s computational power allow researchers and scientists to reach for machine learning solutions to real-world problems.
Identifying water bodies in images and videos is one of those problems. Computer vision seems promising, and its algorithms are becoming more and more reliable and effective. Here, I attempt to provide an overview of the problem of semantically segmenting water bodies and why we should bother.
Semantic Segmentation
Put simply, semantic segmentation is the process of assigning a class label to every pixel in a given image. The task is accomplished by generating a segmentation mask in which each pixel is colored according to the class label it has been assigned. It is an efficient and useful operation with a variety of computer vision applications: autonomous vehicles, robotics, scene understanding, and medical imaging are just a few. This is typically done with deep learning models trained on datasets of labeled images; convolutional neural networks (CNNs) have long been the standard choice, although attention mechanisms and vision transformers are increasingly taking over.
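To make this concrete, here is a minimal sketch of producing a per-pixel class mask with a pretrained DeepLabV3 model from torchvision. The model choice, the placeholder image path, and the preprocessing values are illustrative assumptions, not a prescribed pipeline:

```python
# A minimal sketch of pixel-wise classification with a pretrained
# DeepLabV3 model (assumes torchvision >= 0.13; "scene.jpg" is a
# placeholder path).
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("scene.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)       # shape: [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)["out"]             # shape: [1, num_classes, H, W]

# The segmentation mask: one class label per pixel.
mask = logits.argmax(dim=1).squeeze(0)       # shape: [H, W]
```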
Semantic Segmentation of Water Bodies in Images
Identifying water bodies in images is another application of computer vision algorithms and a problem that has recently attracted the attention of both academia and industry. The task here is reduced to separating the pixels into only two classes: water and non-water. On some occasions, techniques try to classify the different water bodies, assigning different class labels to rivers, lakes, oceans, etc. A few methods are also interested in providing segmentation information for related objects in the scene. For example, the ATLANTIS dataset provides segmentation masks that include class labels such as buildings, vegetation, and snow, among others.
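To illustrate this binary reduction, a multi-class mask can be collapsed into water vs. non-water in a couple of lines. The class IDs below are hypothetical; each dataset defines its own label map:

```python
import numpy as np

# Hypothetical label IDs for water classes; real datasets such as
# ATLANTIS define their own label maps.
WATER_IDS = [21, 22, 23]  # e.g., river, lake, sea

def to_binary_water_mask(label_mask: np.ndarray) -> np.ndarray:
    """Collapse a multi-class mask into water (1) vs. non-water (0)."""
    return np.isin(label_mask, WATER_IDS).astype(np.uint8)
```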
Why?
Our urge to automate procedures and processes, create more controllable environments, or even replace humans in labor-intensive or safety-critical work has resulted in a huge increase in the number of potential stakeholders interested in such algorithms. Not only can computers identify water bodies in an image, but as with other computer vision tasks, the prospect of accomplishing it with super-human performance is not far off. And to cut a long story short, here is a short list of applications in which semantic segmentation of water bodies in images can be used:
- Natural Disaster Management: Having systems that identify water bodies in images in real time can be beneficial for emergency responders, especially in flood-affected areas. Governments and city councils can also use such systems to understand the extent of flooding and the conditions under which it is most severe, and plan their response appropriately.
- Environmental Monitoring: By tracking changes in water bodies over time, researchers and scientists can monitor the health and quality of aquatic ecosystems and identify potential pollution or other environmental problems.
- Urban Planning: Identifying water bodies in images of urban areas can help planners make decisions about land use and development, such as where to locate parks or recreational areas.
- Coastal Zone Management: Water segmentation can be used to identify and map coastal features, such as beaches and coral reefs, to help in the management and conservation of these areas. (suggestion by ChatGPT)
As mentioned above, one of the main factors behind the success of water segmentation models and the noticeable growth in the literature has been the availability of data. Besides carefully constructed datasets such as ATLANTIS, FloodNet, or AquaNet, webcams placed in real-world locations also provide vast amounts of data, useful for training and fine-tuning the proposed approaches.
One can evaluate an algorithm’s performance by measuring the overlap between the predicted water pixels in the generated segmentation masks and the ground-truth water pixels, a metric commonly known as Intersection over Union (IoU). Other metrics also exist; together they provide valuable insights into the performance of a water segmentation model and can help identify areas for improvement.
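For binary water masks, IoU reduces to a few lines of array arithmetic. The sketch below assumes masks encoded as 0/1 NumPy arrays of equal shape:

```python
import numpy as np

def water_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between predicted and ground-truth binary water masks (1 = water)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two all-empty masks agree perfectly by convention.
    return float(intersection) / float(union) if union > 0 else 1.0
```

A perfect prediction yields 1.0, while a prediction that shares no water pixels with the ground truth yields 0.0.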
Without diving into technical details, it is worth mentioning some of the problems the current methods face. As with all existing computer vision and deep learning methods, the predictions are not perfectly accurate. At least, not yet. Some of the failure cases can be summarized as follows:
- Water appearance: The water body in an image can take many forms, such as calm or turbulent, with or without waves. Its surface can also be reflective, including shadows or specular highlights.
- Weather conditions: Fog, rain, snow, or even a very bright sun can make the task extremely difficult by affecting the appearance of the water and by making the image noisy.
- Occlusions: In the case of flooding, the water bodies might include various objects such as cars, trees, buildings, or even people. This makes the segmentation task quite difficult.
These are just a few scenarios, but one can easily think of a multitude of conditions that can negatively impact the predictions of these computational models (e.g., low visibility at night, unusual water color). All these scenarios justify the significance of the growing research focus on out-of-distribution generalization. In this context, an “out-of-distribution” scenario is one where the algorithm is asked to identify and segment water pixels in an image that differs significantly from the data it was trained on. Out-of-distribution generalization calls for algorithms that can generalize to unseen examples, and therefore become more accurate and reliable.
Of course, there are scenarios in which even a human finds it difficult to distinguish the water pixels from the non-water pixels in a given image. But maybe the time when our artificial neural networks perform this task better than we do is not too far off. Creating diverse datasets (even synthetic ones) that capture a wide variety of conditions might provide a solution to the problem. Or maybe different mechanisms and operations are required in the construction of the neural networks we use. Either way, water segmentation is just another problem whose solution will provide a better understanding of our own visual perception.