Fully Convolutional Network

description15 papers

group2 followers

lightbulbAbout this topic

A Fully Convolutional Network (FCN) is a type of deep learning architecture that utilizes convolutional layers without fully connected layers, enabling the model to accept input of arbitrary size and produce spatially structured outputs, commonly used for tasks such as image segmentation and pixel-wise classification.

lightbulbAbout this topic

Key research themes

1. How can residual learning improve the optimization and depth scalability of fully convolutional networks for visual recognition tasks?

This research area is centered on overcoming the degradation problem in deep convolutional networks by using residual learning frameworks within fully convolutional architectures. The challenge involves easing optimization of substantially deeper nets while maintaining or improving accuracy for tasks such as semantic segmentation, object detection, and image classification. Residual networks (ResNets) refactor convolutional layers as residual functions with identity shortcut connections enabling easier training for very deep architectures without increased complexity, thereby expanding the representational capacity of fully convolutional networks.

Deep Residual Learning for Image Recognition

by gonzalo gonzalez

2020

Key finding: Introduced a residual learning framework that explicitly lets layers fit residual mappings instead of unreferenced functions, implemented via identity shortcut connections in deep fully convolutional networks. This approach... Read more

articleView Paper downloadDownload

Wide deep residual networks in networks

by Hmidi Alaeddine

2022

Key finding: Proposed increasing the width of Deep Residual Network in Network (DrNIN) architectures while reducing depth to counteract decreasing feature reuse and training inefficiencies in very deep residual networks. This wider... Read more

articleView Paper downloadDownload

DN-ResNet: Efficient Deep Residual Network for Image Denoising

by Mostafa El-khamy

2023, Computer Vision – ACCV 2018

Key finding: Developed a deep residual fully convolutional network with residual blocks for blind image denoising under various noise types including Gaussian, Poisson, and Poisson-Gaussian. Using cascade training for stage-wise residual... Read more

articleView Paper downloadDownload

Free-Space Detection with Self-Supervised and Online Trained Fully Convolutional Networks

by Willem P Sanberg

2024, IS&T International Symposium on Electronic Imaging Science and Technology

Key finding: Presented a fully convolutional network leveraging residual learning for per-pixel free-space detection in automotive scenes, trained via self-supervised online methods using stereo disparity cues as weak labels. The residual... Read more

articleView Paper downloadDownload

keyboard_arrow_downShow more

2. What architectural modifications to fully convolutional networks can improve semantic segmentation accuracy, robustness, and computational efficiency across various domains?

This theme investigates CNN architectural variants that simplify, regularize, or extend fully convolutional networks to better capture spatial context, incorporate multi-scale feature representations, or reduce parameter complexity while maintaining or improving semantic segmentation accuracy. Innovations include replacing pooling layers with strided convolutions, dilated convolutions, incorporating spatial regularization terms, multi-modal data fusion, and light-weight designs tuned for real-time biomedical imaging and complex scene parsing.

STRIVING FOR SIMPLICITY: THE ALL CONVOLUTIONAL NET

by Gorge Washington

2017

Key finding: Demonstrated that max-pooling layers in fully convolutional networks for semantic segmentation and image recognition can be replaced effectively by convolutional layers with stride greater than one, simplifying network... Read more

articleView Paper downloadDownload

Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks

by Stavroula Mougiakakou

2023, IEEE journal of biomedical and health informatics

Key finding: Employed dilated convolutions within fully convolutional networks to perform end-to-end and semi-supervised semantic segmentation of pathological lung interstitial diseases in HRCT scans. The model handled arbitrary image... Read more

articleView Paper downloadDownload

A regularized convolutional neural network for semantic image segmentation

by Mohamed El Ansari

2024, Analysis and Applications

Key finding: Introduced spatial total variation (TV) regularization integrated into CNN segmentation models via modified activation functions (e.g., softmax), resulting in spatially smooth and robust segmentation maps within fully... Read more

articleView Paper downloadDownload

CFPNet-M: A light-weight encoder-decoder based network for multimodal biomedical image real-time segmentation

by Murray Loew

2023, Computers in Biology and Medicine

Key finding: Designed a novel lightweight fully convolutional encoder-decoder architecture incorporating dilated channel-wise CNN modules and feature pyramid representations for real-time segmentation of diverse biomedical images.... Read more

articleView Paper downloadDownload

Topology Aware Fully Convolutional Networks For Histology Gland Segmentation

by Ghassan Hamarneh

2016

Key finding: Incorporated geometric and topological priors, including smoothness and hierarchical label relations, directly into the loss function of fully convolutional networks for histology gland segmentation. This topology-aware... Read more

articleView Paper downloadDownload

keyboard_arrow_downShow more

3. How can fully convolutional networks be extended or fused with specialized modules and learning strategies for enhanced scene understanding and multimodal image analysis?

This theme focuses on augmenting the core fully convolutional architecture with supplementary networks, loss functions, or learning paradigms to handle diverse modalities, improve detail representation, and enable adaptable or online learning. Research explores multi-modal data fusion (e.g., RGB-D), combined fully connected-convolutional layers for GANs, dual-path networks for image restoration, self-supervised training with weak labels, and recurrent or LSTM modules integrated with FCNs for temporal or weather image classification tasks.

Fully Convolutional Networks for Semantic Segmentation from RGB-D images

by Nicolai Harich

2022

Key finding: Developed an FCN-based semantic segmentation pipeline that processes RGB-D camera data to detect and localize objects in warehouse environments, handling limited training datasets via data augmentation, transfer learning, and... Read more

articleView Paper downloadDownload

FCC-GAN: A Fully Connected and Convolutional Net Architecture for GANs

by Sukarna Barua

2022, ArXiv

Key finding: Proposed a novel GAN architecture combining deep fully connected layers with convolutional layers within both generator and discriminator networks, diverging from conventional convolution-only GAN designs. The FCC-GAN... Read more

articleView Paper downloadDownload

Learning Dual Convolutional Neural Networks for Low-Level Vision

by Yu-Wing Tai

2024, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Key finding: Introduced dual CNN architectures composed of complementary subnetworks estimating structures and details separately for image restoration tasks. This approach integrated two fully convolutional networks trained with combined... Read more

articleView Paper downloadDownload

Towards an automated weather forecasting and classification using deep learning, fully convolutional network, and long short-term memory

by International Journal of Electrical and Computer Engineering (IJECE) and

2025, International Journal of Electrical and Computer Engineering (IJECE)

Key finding: Proposed an FCN combined with Long Short-Term Memory (LSTM) to leverage spatial and temporal features from weather image datasets for real-time weather condition classification. The FCN component extracted spatial image... Read more

articleView Paper downloadDownload

keyboard_arrow_downShow more

All papers in Fully Convolutional Network

Towards an automated weather forecasting and classification using deep learning, fully convolutional network, and long short-term memory

by International Journal of Electrical and Computer Engineering (IJECE) and

2025, International Journal of Electrical and Computer Engineering (IJECE)

Historically, weather forecasting was unreliable and imprecise, relying on intuition and local knowledge. Inaccurate weather forecasts can cause severe impacts on agriculture, construction, and daily life. Existing methods struggle with... more

descriptionView Paper arrow_downwardDownload

Integrated U-Net segmentation and gated recurrent unit classification for accurate brain tumor diagnosis from magnetic resonance imaging images

by International Journal of Electrical and Computer Engineering (IJECE)

2024, International Journal of Electrical and Computer Engineering (IJECE)

Early diagnosis and proper grouping of tumors in the brain are critical for successful therapy and positive outcomes for patients. This work proposes a complete technique for identifying brain tumors that employ sophisticated artificial... more

descriptionView Paper arrow_downwardDownload

Adaptive Feature Recombination and Recalibration for Semantic Segmentation With Fully Convolutional Networks

by Victor Alves

2024, IEEE Transactions on Medical Imaging

Fully Convolutional Networks have been achieving remarkable results in image semantic segmentation, while being efficient. Such efficiency results from the capability of segmenting several voxels in a single forward pass. So, there is a... more

descriptionView Paper arrow_downwardDownload

A Detailed Look At CNN-based Approaches In Facial Landmark Detection

by Chin-Laung Lei

2024, arXiv (Cornell University)

Facial landmark detection has been studied over decades. Numerous neural network (NN)-based approaches have been proposed for detecting landmarks, especially the convolutional neural network (CNN)-based approaches. In general, CNN-based... more

descriptionView Paper arrow_downwardDownload

Adaptive Feature Recombination and Recalibration for Semantic Segmentation With Fully Convolutional Networks

by Carlos Silva

2024, IEEE Transactions on Medical Imaging

descriptionView Paper arrow_downwardDownload

An Implementation of Fully Convolutional Network for Surface Mesh Segmentation

by Taiyu Zhang

2024

descriptionView Paper arrow_downwardDownload

Classification of a Small Imbalanced Dataset of Vine Leaves Images using Deep Learning Techniques

by Tomas Horvath

2024

Convolutional Neural Network (CNN) has become one of the most popular techniques in image classification. Usually CNN models are trained on a large amount of data, but in this paper, it is discussed CNN usage on data shortage and class... more

descriptionView Paper arrow_downwardDownload

Fully Convolutional Network Based Ship Plate Recognition

by Jiehan Zhou

2024

Ship plate recognition is challenging due to variations of plate locations and text types. This paper proposes an effcient Fully Convolutional Network based Plate Recognition approach FCNPR, which uses a CNN (Convolutional Neural Network)... more

descriptionView Paper arrow_downwardDownload

Parallel quantum-inspired evolutionary algorithms for community detection in social networks

by Tamanna Gupta

2024, Applied Soft Computing

The world around us may be viewed as a network of entities interconnected via their social, economic, and political interactions. These entities and their interactions form a social network. A social network is often modeled as a graph whose nodes represent entities, and edges represent interactions between these entities. These networks are characterized by the collective latent behavior that does not follow trivially from the behaviors of the individual entities in the network. One such behavior is the existence of hierarchy in the network structure, the sub-networks being popularly known as communities. Discovery of the community structure in a social network is a key problem in social network analysis as it refines our understanding of the social fabric. Not surprisingly, the problem of detecting communities in social networks has received substantial attention from the researchers. In this paper, we propose parallel implementations of recently proposed community detection algorithms that employ variants of the well-known quantum-inspired evolutionary algorithm (QIEA). Like any other evolutionary algorithm, a quantum-inspired evolutionary algorithm is also characterized by the representation of the individual, the evaluation function, and the population dynamics. However, individual bits called qubits, are in a superposition of states. As chromosomes evolve individually, the quantum-inspired evolutionary algorithms (QIEAs) are intrinsically suitable for parallelization. In recent years, programmable graphics processing units-GPUs, have evolved into massively parallel environments with tremendous computational power. NVIDIA R compute unified device architecture (CUDA R) technology, one of the leading general-purpose parallel computing architectures with hundreds of cores, can concurrently run thousands of computing threads. The paper proposes novel parallel implementations of quantum-inspired evolutionary algorithms in the field of community detection on CUDA-enabled GPUs. The proposed implementations employ a single-population fine-grained approach that is suited for massively parallel computations. In the proposed approach, each element of a chromosome is assigned to a separate thread. It is observed that the proposed algorithms perform significantly better than the benchmark algorithms. Further, the proposed parallel implementations achieve significant speedup over the serial versions. Due to the highly parallel nature of the proposed algorithms, an increase in the number of multiprocessors and GPU devices may lead to a further speedup.

descriptionView Paper arrow_downwardDownload

RU-Net2+: A Deep Learning Algorithm for Accurate Brain Tumor Segmentation and Survival Rate Prediction

by Hussain Syed

2024, IEEE Access

Brain tumors present a significant medical concern, posing challenges in both diagnosis and treatment. Deep learning has emerged as an evolving technique for automating the diagnostic process for brain tumors. This research paper introduces a novel deep-learning framework designed explicitly for brain tumor diagnosis. The framework encompasses various tasks: tumor detection, classification, segmentation, and survival rate prediction. The framework was applied to the BraTS dataset, an extensive collection of brain tumor images, to evaluate its effectiveness. The proposed workflow initiates with data acquisition, followed by an enhancement of this data using a Convolutional Normalized Mean Filter (CNMF) during pre-processing. This prepares the data for the multi-class classification performed using the novel DBT-CNN classifier model. The RU-Net2+ model is employed for precise tumor demarcation, yielding segmented regions from which features are subsequently extracted utilizing the Cox model. These extracted features play a pivotal role in the final step, where the survival rate of patients is predicted using a logistic regression model. The experimental results showcased the exceptional performance of the proposed framework, surpassing current benchmarks in classification accuracy, tumor segmentation precision, and survival rate prediction. For high-grade glioma (HGG) tumors, the framework achieved an impressive classification accuracy of 99.51%, while for low-grade glioma (LGG) tumors, the accuracy reached 99.28%. The accuracy of tumor segmentation stood at 98.39% for HGG tumors and 99.1% for LGG tumors. The RU-Net2+ algorithm accurately predicts patient survival rates: 85.71% long-term, 72.72% medium-term, and 61.54% short-term, with corresponding Mean Squared Errors of 0.13, 0.21, and 0.31. These results provide valuable insights for medical professionals making brain tumor treatment decisions. Additionally, the framework shows promise for automating brain tumor diagnosis and enhancing patient care. INDEX TERMS Brain tumor, MRI images, deep learning, machine learning, CNMF, RU-Net2+, DBT-CNN, BraTs. I. INTRODUCTION Brain tumors pose a significant health concern and can cause severe patient consequences. It is crucial to promptly and precisely diagnose brain tumors to facilitate effective treatment strategies [1]. The conventional approaches to segmenting, classifying, and predicting the risks associated with brain tumors have encountered limitations in accuracy and efficiency. Deep learning-based models have recently emerged as The associate editor coordinating the review of this manuscript and approving it for publication was Zhan-Li Sun. powerful tools in medical imaging analysis [2]. These models can significantly improve the accuracy and efficiency of brain tumor diagnosis. However, significant challenges hinder their effective deployment in clinical settings. These challenges include data quality and availability, computational complexity, inter-modality variations, model generalization, overfitting, interpretability, temporal dynamics, annotation, labeling issues, integration into clinical workflows, and ethical considerations, including data privacy and biases [3]. In this environment, there's a pressing requirement for an advanced deep learning model that can adeptly and precisely

FIGURE 1. Proposed model architecture for brain tumor diagnosis.

FIGURE 3. A comparison between the original and processed images using CNMF. FIGURE 2. Sample representation of MRI scans of different modalities T1, T1C, T2, and FLAIR respectively from the BraTS dataset [23].

TABLE 2. Detail description of datasets, the total number of patients, images, and LGG, HGG cases.

FIGURE 4. Proposed RU-Net2+ architecture for segmentation [26]

FIGURE 5. Proposed classification architecture DBT-CNN for multi-class classification of brain tumor over the features of a particular layer instead of normalizing input features over batches as done in batch normalization. Mean (jz) and variance (o) of the layer are used to compute the layer normalization as shown in equation (12)

FIGURE 6. Represent a sample of HGG original, extracted affected region, and segmented region from T1, TIC, T2, AND FLAIR MRI sequences using proposed RU-Net2+. The left column represents the HGG pre-processed images, the middle column represents the affected extracted regions, and the right column represents the segmented images, where each row represents the T1, TIC, T2, and FLAIR MRI sequences.

FIGURE 7. Represent a sample of LGG original, extracted affected region, and segmented region from T1, TIC, T2, AND FLAIR MRI sequences using proposed RU-Net2+. The left column represents the LGG pre-processed images, the middle column represents the affected extracted regions, and the right column represents the segmented images, where each row represents the T1, T1C, T2, and FLAIR MRI sequences.

FIGURE 8. DBT-CNN model confusion matrix for HGG tumor multi-classification.

FIGURE 9. DBT-CNN model confusion matrix for LGG tumor multi-classification.

FIGURE 10. ROC-AUC curve for multi-classification of HGG brain tumor types.

FIGURE 12. Comparison of the proposed model with existing models. FIGURE 11. ROC-AUC curve for multi-classification of LGG brain tumor types.

TABLE 1. Summary of brain tumor segmentation and classification studies. radionics features [18]. The approach uses a combination of MRI images, 3D U-Nets, and radiomic features to segment brain tumors and predict patient survival. The approach was evaluated on the BraTS 2018 dataset and achieved an accu- racy of 91% for tumor segmentation and 82% for patient survival prediction. requirements, ethical considerations, and the validation of the proposed model in real-world clinical scenarios. The proposed work must address these challenges comprehen- sively to ensure the utilization of representative and diverse datasets, mitigate the risk of overfitting, enhance the accuracy of segmentation and classification, improve the prediction of survival rates, provide interpretability of the model’s out- puts, integrate information from various modalities, optimize computational efficiency, adhere to ethical guidelines, and validate the effectiveness of the model in real-world clini- cal settings. Successfully overcoming these challenges will contribute significantly to the development of a robust and practical deep learning-based multimodal diagnostic model for the analysis of brain tumors.

with probability p < 0.05 were considered and features with p > 0.05 were eliminated, in total. After finalizing the fea- tures survival rate was predicted using a logistic regression model. Predictions were made using the validation set and the performance was compared using accuracy and leave-one-out cross-validation. TABLE 3. Hyper parameters details defined for building the model.

In the above table TP, FP, TN, FN, TPR (True Positive, False Positive, True Negative False Negative, and True Positive Rate respectively). TABLE 4. Performance measure used for evaluating the proposed system.

TABLE 5. Performance of proposed segmented model RU-Net2+ for both HGG and LGG tumors in terms of pixel accuracy and dice-scor

TABLE 9. Accuracy of proposed model vs. existing models.

descriptionView Paper arrow_downwardDownload

CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild

by Bruce Mehler

2024, ACM Transactions on Computer-Human Interaction

Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans’ visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide... more

descriptionView Paper arrow_downwardDownload

Convolutional neural networks for the analysis of broadcasted tennis games

by Mustafa Jaber

2024, IS&T International Symposium on Electronic Imaging Science and Technology

The analysis of complex structured data like video has been a long-standing challenge for computer vision algorithms. Innovative deep learning architectures like Convolutional Neural Networks (CNNs), however are demonstrating remarkable... more

descriptionView Paper arrow_downwardDownload

Image-based MRI detection of brain tumours using convolutional neural networks

by perumal sankar and

2024, Review of Computer Engineering Research

Rapid and uncontrolled cellular proliferation is what distinguishes a brain tumor. Unfortunately, brain tumors cannot be prevented other than via surgery. As predicted, deep learning may help diagnose and cure brain cancers. The... more

descriptionView Paper arrow_downwardDownload

Current Trends in Human Pupil Localization: A Review

by Akila Subasinghe

2024, IEEE Access

Pupil localization extracts pupil center coordinates from images and videos of the human eye along with the pupillary boundary. Pupil localization essentially plays a major role in identity verification, disease recognition, visual focus... more

FIGURE 1. Parts of eye detected by a pupil localization scheme [3] assessing students’ engagement in learning activities [1]. Tracking VFOA assists in diagnosing autism or austism spectrum disorder (ASD), which is a lifelong developmental INDEX TERMS Eye and face databases, eye tracking, gaze tracking, iris recognition, learning-based methods, non-learning-based methods, pupil localization. The associate editor coordinating the review of this manuscript and

FIGURE 2. Trending applications of pupil localization in virtual reality gaming platforms, social interaction analysis, identity authentication, students’ engagement tracking, internet marketing, disease diagnosis, and driver fatigue detection.

FIGURE 3. Categorization of eye-tracking setups: Head mounted setups and Non-head mounted setups.

FIGURE 4. Classification of the eye database into highly controlled, less controlled, and uncontrolled categories.

FIGURE 5. Challenges recorded in pupil localization literature. Images are taken from [3] and [32].

image processing, machine learning, and their applications in disaster management. ASHMINI JEEVA received the B.Sc. degree in electrical and electronic engineering from South Eastern University, Sri Lanka, in 2021. She started her career as a Research and Development Engineer with Ceylon Agro Food Technologies (Pvt) Ltd., in 2022. She is currently a Research Assistant with the Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Sri Jayewardenepura (USJ). Her research interests are unmanned aerial vehicles, image processing, machine learning, and their applications in disaster manasement.

TABLE 2. VS eye database categorization according to highly controlled, less controlled, and uncontrolled with synthetic eye databases with details of each database.

TABLE 3. Classification of face databases.

TABLE 4. Summarization of learning based pupil localization algorithms.

descriptionView Paper arrow_downwardDownload

Exploring deep features: deeper fully convolutional neural network for image segmentation

by Bin Kabir

2023

Classification of images has been a widely regarded challenge for the past decade, but a new type of object recognition problem which deals with pixellevel segmentation is posing a more complex task for both computer vision enthusiasts... more

Figure 4.2: A graph to show the increase in mean accuracy per iterations reaches 78.6 percent after 400,000 iterations. If we plot a graph of iteration vs loss then plot seems to have

Figure 4.4: A graph showing the increase in mean intersection over union after each iteration. local context seems to be important over global contexts.

Figure 4.5: A graph showing the increase in pixel accuracy after each iteration.

Figure 4.6: Showing the frequency weighted accuracy increasing after each iteration.

Figure 5.2: FCN-8 Heavy Figure 5.3: FCN-8 At Once

Figure 5.6: Original Image Figure 5.4: VGG19 FCN

Table 4.1: Below is the chart of results we found after validating on the reduced set ot VOC2012 data.

Table 5.1: Below is our Pixel Accuracy, Mean Accuracy, MeanIOU and FW Accuracy com- pared with other models. As seen, most of the scores generate better outcome than the previous models for each metric.

Table 5.2: Comparative metrics and loss graph against iterations of the total validation process. spikes with the help of snapshot of every 5,000 iterations.

Table 5.4: A chart showing different model’s test result in VOC2012 segmentation challenge.

Table 5.3: Hyperparameter tuning for 2nd stage of training

descriptionView Paper arrow_downwardDownload

CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild

by Bruce Mehler

2023, ACM Transactions on Computer-Human Interaction

descriptionView Paper arrow_downwardDownload

Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

by Tejaswi Nimmagadda

2023, arXiv (Cornell University)

Deep learning has shown state-of-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which... more

descriptionView Paper arrow_downwardDownload

DeepPaint : A Tool for Image Inpainting

by Kushagr Gupta

2023

This project focuses on solving the inpainting problem using both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) approaches. Popular methods of inpainting include Adobe Photoshop’s Content Aware Fill, which do not... more

descriptionView Paper arrow_downwardDownload

A Brief Survey and an Application of Semantic Image Segmentation for Autonomous Driving

by aysegul ucar

2023, Handbook of Deep Learning Applications

Deep learning is a fast-growing machine learning approach to perceive and understand large amounts of data. In this paper, general information about the deep learning approach which is attracted much attention in the field of machine... more

descriptionView Paper arrow_downwardDownload

Mapping and Discriminating Rural Settlements Using Gaofen-2 Images and a Fully Convolutional Network

by Ziran Ye

2023, Sensors

New ongoing rural construction has resulted in an extensive mixture of new settlements with old ones in the rural areas of China. Understanding the spatial characteristic of these rural settlements is of crucial importance as it provides... more

descriptionView Paper arrow_downwardDownload

Sub-pixel face landmarks using heatmaps and a bag of tricks

by Sanjana Jain

2023, ArXiv

Accurate face landmark localization is an essential part of face recognition, reconstruction and morphing. To accurately localize face landmarks, we present our heatmap regression approach. Each model consists of a MobileNetV2 backbone... more

descriptionView Paper arrow_downwardDownload

Fully Convolutional Regression Network for Accurate Detection of Measurement Points

by Jimmy Jia

2023, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support

Accurate automatic detection of measurement points in ultrasound video sequences is challenging due to noise, shadows, anatomical differences, and scan plane variation. This paper proposes to address these challenges by a Fully... more

descriptionView Paper arrow_downwardDownload

A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains

by Lyndon Chan

2023, International Journal of Computer Vision

Recently proposed methods for weakly-supervised semantic segmentation have achieved impressive performance in predicting pixel classes despite being trained with only image labels which lack positional information. Because image... more

descriptionView Paper arrow_downwardDownload

Deep Learning for Multi-Animal Tracking

by Madalena Valente

2023

Tese de mestrado integrado, Engenharia Biomédica e Biofísica (Engenharia Clínica e Instrumentação Médica) Universidade de Lisboa, Faculdade de Ciências, 2021In collective behaviour studies, the use of multi-animal tracking systems is... more

descriptionView Paper arrow_downwardDownload

Robot localization using soft object detection

by Kostas Daniilidis

2023, 2012 IEEE International Conference on Robotics and Automation

In this paper, we give a new double twist to the robot localization problem. We solve the problem for the case of prior maps which are semantically annotated perhaps even sketched by hand. Data association is achieved not through the... more

descriptionView Paper arrow_downwardDownload

MRI image segmentation using machine learning networks and level set approaches

by wisam ali

2022, International Journal of Electrical and Computer Engineering (IJECE)

The segmented brain tissues from magnetic resonance images (MRI) always pose substantive challenges to the clinical researcher community, especially while making precise estimation of such tissues. In the recent years, advancements in... more

descriptionView Paper arrow_downwardDownload

PixISegNet: pixel‐level iris segmentation network using convolutional encoder–decoder with stacked hourglass bottleneck

by Shreshth Saini

2022, IET Biometrics

In this paper, we present a new iris ROI segmentation algorithm using a deep convolutional neural network (NN) to achieve the state-of-the-art segmentation performance on well-known iris image data sets. The authors' model surpasses the... more

descriptionView Paper arrow_downwardDownload

Eye Localisation using Cascaded U-Net for Autism Spectrum Disorder

by Nawres KHLIFA

2022, WSEAS TRANSACTIONS ON SIGNAL PROCESSING

Many computer vision applications Computer Aided Diagnosis require an accurate and efficient eye detector. We represent, in this work, an efficient approach for determining the position of the eye in images presenting faces. First, a... more

descriptionView Paper arrow_downwardDownload

Parallel Loopy Belief Propagation in Conditional Random Fields

by Katharina Morik

2022

Structured real world data can be represented with graphs whose structure encodes independence assumptions within the data. Due to statistical advantages over generative graphical models, Conditional Random Fields (CRFs) are used in a... more

descriptionView Paper arrow_downwardDownload

LOTR: Face Landmark Localization Using Localization Transformer

by Sanjana Prakash Jain

2022, IEEE Access

This paper presents a novel Transformer-based facial landmark localization network named Localization Transformer (LOTR). The proposed framework is a direct coordinate regression approach leveraging a Transformer network to better utilize... more

descriptionView Paper arrow_downwardDownload

Sub-pixel face landmarks using heatmaps and a bag of tricks

by Sanjana Prakash Jain

2022, ArXiv

descriptionView Paper arrow_downwardDownload

Accurate Pupil Center Detection in Off-the-Shelf Eye Tracking Systems Using Convolutional Neural Networks

by Arantxa Villanueva

2022, Sensors

Remote eye tracking technology has suffered an increasing growth in recent years due to its applicability in many research areas. In this paper, a video-oculography method based on convolutional neural networks (CNNs) for pupil center... more

descriptionView Paper arrow_downwardDownload

MRI image segmentation using machine learning networks and level set approaches

by wisam ali

2022, International Journal of Electrical and Computer Engineering (IJECE)

Figure 2. Fully convolutional network (FCN) 3.2. Level set . Image segmentation [18] makes wide application of level set (LS) method with active contouring (AC) as it performs automatic manoeuvring of possible topological changes. The potential of LS in attaining accuracy in segmenting brain tumour is documented in [12], [13].

4. EXPERIMENT RESULT 4.1. Dataset and measurement From the (7), it is observed that initial part denotes the inner contour and second part defied as the length of the contour C. Moreover, it is noticed that first part to be neglected in case of the u=0. Also, with similar to standard VLS,v > 0 and it is useful for the noise free. But the study introduced v <0 on case of the brain tumour. Next, the third part is represented as ground truth images and complete part is neglected in case of the a > 0. At final, the last two parts related the energy in and out contour C. In case of the brain tumour, in and out counters are defined as A, and A, and also these values to be always positive. Finally, the two constants C, and C, are specified in (8) and (9) and are used to optimize the energy function @ as shown in Figure 3.

Figure 4. Comparison results of sample images from left to right, ground truth, U-Net, and proposed method Table 1. Performance of proposed method vs standard methods tested via BraTS 2013 dataset

BIOGRAPHIES OF AUTHORS 25] S. Pereira, A. Pinto, V. Alves, and C. A. Silva, “Brain tumor segmentation using convolutional neural networks in MRI images, in IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1240-1251, May 2016, doi: 10.1109/TMI.2016.2538465.

Table 2. Performance variations in methods on BraTS 2015 dataset As shown in Table 1 and Table 2, there are consistent better dice scores in sensitivity and specificity produced by the proposed algorithm. The performance can be credited to the availability of additional training data from both 2013 and 2015 BraTS dataset that helped in fine tuning the hyper-parameters of this algorithm. Such as reliability against outliers, speed makes this algorithm achieve better segmentation of core tumour. Figure 4 shows the result of proposed method, quantitatively and shows that the proposed method produces best segmentation results as compared to the standard U-Net method. Based on the 2015 BraTS dataset, WT, CT, and enhancing CT regions receive dice score, 0.82, 0.73 and 0.68 in HGG case. In case of LGG, for WT, CT, and enhancing CT regions receive dice scores of 0.82, 0.57, and 0.40 in this model. Thus, it has achieved high performance, which is better than the other standard methods. The sensitivity and specificity values achieved on BraTS 2013 dataset are 0.83, 0.82, and 0.86); and 0.85, 0.77, and 0.79, respectively. Similarly, in case of LGG, the sensitivity and specificity values achieved on BraTS 2013 dataset are 0.82, 0.76, and 0.78; and 0.84, 0.74, and 0.77, respectively.

descriptionView Paper arrow_downwardDownload

Easymatch- An Eye Localization Method for Frontal Face Images Using Facial Landmarks

by Asaf Varol

2022, Tehnicki vjesnik - Technical Gazette

Eye detection algorithms are being used in many fields such as camera applications for entertainment and commercial purposes, gaze detection applications, computer-human interaction applications, and eye recognition applications for... more

Figure 1 Three main steps of eye finding algorithms which use Haar-Like cascades The purpose of detecting the face is to ensure that the search area has limited characteristics and appearances. There should not be objects or images like the eye in the search area. Also by reducing the area, the workload of the algorithm can be reduced. Face detection is a decade-old phenomenon, and there are many methods to solve this [20]. In literature, many choose the Viola-Jones method to detect the faces by using Haar-Like cascades. According to the tests carried out on the BioID database, this method manages to successfully detect faces in 1365 images out of 1521 images which means over 89% success rate. There are only eleven false positives out of 1521 images. These false positive results are usually a smaller region within a face so it can be easily cast off by choosing the biggest face from detected faces. For TalkingFace database success rate is 99.1%.

Figure 3 The numeric markup for landmark points According to the tests carried out on the BioID database, the facial landmarks successfully detect the faces in the 1509 images from 1521 images, which means more than 99% success rate. According to the test carried out on the TalkingFace database, the facial landmarks successfully detect all faces. There are five false positives in TalkingFace database. However, like the Viola-Jones method, these false positive results are usually a smaller region within a detected face, thus it can be easily discarded by choosing the biggest face from detected faces. The ColorFeret database success rate is also 100%. As seen in the results; using facial landmarks has a much better success rate than the Viola-Jones method.

Figure 4 Created eye region using landmarks and the natural rate of eye and distance between two eye corners

The circular characteristic of the eyes makes it easy to use in a matching algorithm. In this study, images are reduced to black and white, and then a circle is used as a template for matching. Even though eyes are not fully circle, but an ellipse with a ratio of 1.13 usually this difference is negligible because cameras do not have the necessary resolution to catch that difference. However, in a state where the camera has a good resolution to catch that difference, an ellipse must be used as a template instead of a circle. A simple input image which will be used in template matching is created using an estimated radius value as seen in Fig. 5. However, this approach does not provide satisfactory results.

Figure 7 Reduced window movement Considering characteristics of the eye, detecting eye posi of th wind ime ion in a windowed manner within an eye region may be unnecessarily time-consuming. Also, as the resolution e eye region grows, the time needed to compute every ow would also grow. In order to further reduce the consumption, instead of checking all of the windows, he two-stage window movement is tried as seen in Fig. 7. “Or best he starting position, the middle of the y-axis of the eye region is chosen as a center of the y-axis of the window. After finding the best x-axis position for given y-axis; using X-axis position the window is moved in y-axis to determine best overall position. 2.4 Matter of Blink

Using all points within the circle has a good success rate. However, in order to further reduce the workload of the method instead of using all points within a circle reduced points are used for experiment as seen in Fig. 6.

upper and bottom eyelids. If this distance is close to one or two pixels, then it can be assumed that the eye is shut. However, there is also a matter of deciding in which condition it should be deemed there is a blink. Should a moment where the eye is completely shut be chosen or should moments where eyes are partially shut also be added? As seen in Fig. 8, the act of blinking is composed of a few stages. When looking from the perspective of getting information out of images, in a state where eyes are partially shut and in a state where eyes are completely shut, they are the same because both have the same meaning that the eyes are in a state where they do not concentrate on looking. So, if the distance between the eyelids is smaller than half of the eye radius, it can be assumed that the eye is closed. Even though this is a practical approach there are still some negligible issues. When a person laughs or looks at a bright sight, people tend to partially close their eyes which may be interpreted as a blink. It is not easy to detect the difference between a blink and a laugh or a bright light reflex. It is possible to use other landmark points which correspond to mouth to detect the laugh or it may be possible to check the brightness by using a histogram. However, it would be too much effort to detect the difference, and it is an acceptable error. Also, it will not make much difference in practice.

Figure 9 Example of the successful results for BiolD database (e < 0.025)

Figure 11 Example of the unsuccessful result of BiolD database (e < 0.025) Figure 10 Example of the unsuccessful result for TalkingFace database (e < 0.025)

Figure 10 Example of the successful result for TalkingFace database (e < 0.025)

Figure 12 In some cases eye positions ns of the BiolD database are not accurate. These images are marked as unsuccessful for e < 0.05 In Fig. 13 and Fig. 14 unsuccessful results for BioID and TalkingFace databases are given. However, as seen in the images these results were successful. It has been noticed that some of the eye positions are not annotated accurately.

Figure 14 Example of the successful result of ColorFeret database (e < 0.025) Our attempt to correct the inaccurate eye positions showed that even the same person might mark different locations as an eye in different trials, which is why it has been decided to use given eye positions. However, it should be noted that for (e < 0.025) some of the successful results will be marked as unsuccessful.

Figure 13 Successful localization, which is marked as unsuccessful for Talkinc Face database (e < 0.025)

Figure 15 Example of the unsuccessful result of ColorFeret database (e< 0.025) out of the three databases. However, it has also the poorest eye position information. Usually, they merely did mark somewhere within an eye. They did not mark eye centers. In Fig. 15, some successful results from ColorFeret database are given. It can be seen that as the resolution grows the method can cope with it. Fig. 16 shows yet another example of failed eye corner detection for ColorFeret database. As seen in images, eye corners are not accurate which causes unsuccessful results. However, since eye positions are not accurate, ColorFeret database has not been used for success comparison.

(e), (f), and (g) are the lists which have a low point count. In descending order, they are sorted (g), (e), and (f) according to the number of points they contain. The results of the (f) are surprisingly good for TalkingFace database, As seen in Fig. 6, (h) is the input list where all points within the eye circle are used. Thus the best result is expected from this list and as seen in Tab. 2 for both databases (h) has the best results. (b), (c), and (d) have the lowest score in both databases, and all of these input lists have the points from a circle which has a half radius of the eye. Usually, reflection occurs within this circle and, we believe, that is the reason for the low success rates.

Table 3 Experimental Results For Two Stage Window Movements 3.3 Comparison with the State of the Art

Easymatch method seems to be average against state of the art for the BioID database. However, it should be noted even though all of the methods use BioID database some of them use the database partially such as removing images with people who use eyeglasses or removing images with closed eyes. Also, some of the studies [6, 8, 10, 11] had used the Viola-Jones face detector to detect the face and according to our tests; successful face detection

descriptionView Paper arrow_downwardDownload

Title Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models Permalink

by Tejaswi Nimmagadda

2022

Deep learning has shown state-of-art classification perfor mance on datasets such as ImageNet, which contain a single object in each image. How ever, multi-object classification is far more challenging. We present a unified f ramework... more

descriptionView Paper arrow_downwardDownload

Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

by Tejaswi Nimmagadda

2022

descriptionView Paper arrow_downwardDownload

Talking with Peter Brook

by Georges Banu

2022, The Drama Review: TDR

For the challenging semantic image segmentation task the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs. In more recent... more

descriptionView Paper arrow_downwardDownload

Smartphone application to evaluate the individual possibilities for the application of electric vehicles

by Constantinos Sourkounis

2022, 2017 Twelfth International Conference on Ecological Vehicles and Renewable Energies (EVER)

This paper deals with a mobile phone application that allows the user to analyse their driving habits and therefore recognize whether an electric vehicle would suit the users' requirements. The application analyses the daily trips and... more

[1] J. Tompson, A. Jain, Y. LeCun, C. Bregler. “Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation” proceedings of Neural Information Processing Systems (NIPS 2014) [2] J.Shotton et al. “Real-Time Human Pose Recognition in Parts from Single Depth Images’, in CVPR 2011

descriptionView Paper arrow_downwardDownload

Fully Convolutional Networks for Semantic Segmentation from RGB-D images

by Nicolai Harich

2022

In recent years new trends such as industry 4.0 boosted the research and development in the field of autonomous systems and robotics. Robots collaborate and even take over complete tasks of humans. But the high degree of automation... more

descriptionView Paper arrow_downwardDownload

Deep interpretable architecture for plant diseases classification

by Abdelouahab MOUSSAOUI

2022, 2019 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)

Recently, many works have been inspired by the success of deep learning in computer vision for plant diseases classification. Unfortunately, these end-to-end deep classifiers lack transparency which can limit their adoption in practice.... more

descriptionView Paper arrow_downwardDownload

The involvement of the factor V G1691A gene on recurrent ischemic stroke in young adults

by Fatma Megdiche

2022, Journal of the Neurological Sciences

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY

descriptionView Paper arrow_downwardDownload

MRI image segmentation using machine learning networks and level set approaches

by Layth Kamil

2022, International Journal of Electrical and Computer Engineering (IJECE)

descriptionView Paper arrow_downwardDownload

Sub-pixel face landmarks using heatmaps and a bag of tricks

by Pavit Noinongyao

2022, ArXiv

descriptionView Paper arrow_downwardDownload

Simulating Realistic MRI variations to Improve Deep Learning model and visual explanations using GradCAM

by Muhammad Patel

2022, ArXiv

In the medical field, landmark detection in MRI plays an important role in reducing medical technician efforts in tasks like scan planning, image registration, etc. First, 88 landmarks spread across the brain anatomy in the three... more

descriptionView Paper arrow_downwardDownload

ARSynth : Robust Real-Time Human Torso Tracking from Synthetically Trained Deep Neural Networks

by Endri Dibra

2022, Proceedings of 3DBODY.TECH 2019 - 10th International Conference and Exhibition on 3D Body Scanning and Processing Technologies, Lugano, Switzerland, 22-23 Oct. 2019

Robust real-time tracking of the human body is crucial to applications that benefit from live visualizations performed on the underlying body. Such applications could fall in the category of Augmented Reality for Human Bodies, finding... more

descriptionView Paper arrow_downwardDownload

LOTR: Face Landmark Localization Using Localization Transformer

by Ankush Ganguly

2022, ArXiv

FIGURE 1. The overview of Localization Transformer (LOTR). It consists of three main modules: 1) a visual backbone, 2) a Transformer network, and 3) a landmark prediction head. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: D | oo my, Pe 10.1109/ACCESS.2022.3149380, IEEE Access

Watchareeruetai et a/.: LOTR: Face Landmark Localization Using Localization Transformer FIGURE 2. Comparison of Wing loss and smooth-Wing loss (top) and their gradient (bottom) in the global view (left), at the outer threshold w (middle), and at equals zero (right). The parameters are set as follows: w = 10, € = 2, and only for smooth-Wing, t = 0.01. For the Wing loss (blue dashed lines), the gradient changes abruptly at the points |x| = w (bottom-middle) and at x = 0 (bottom-right). On the other hand, the proposed smooth-Wing loss (orange solid lines) is designed to eliminate these gradient discontinuities.

where ¢ is an additional threshold (0 < t < w). When the error is smaller than the inner threshold t, it behaves like L2 loss, allowing the gradient to be smooth at zero error; otherwise, it behaves like the Wing loss. We define In this work, we also propose a modified Wing loss, named smooth-Wing loss (s-wing()), which is given by: where the loss function g(a) can be any standard loss func- tion such as LI, L2, smooth-L1, or Wing loss, which are described in Section II-E. Although Feng ef al. [39] reported that the Wing loss was superior to other loss functions for landmark localization, its major drawback is the gradient discontinuity at the threshold w and around the zero error (Fig. 2). This discontinuity can affect the convergence rate and the stability of training.

Watchareeruetai et a/.: LOTR: Face Landmark Localization Using Localization Transformer TABLE 3. The evaluation results for different LOTR models on the JD-landmark test set; + and + denote the first and second place entries.

FIGURE 3. Sample images of the test set of the WFLW dataset with predicted landmarks from our model. Each column displays the images with different subsets. Each row displays images with a different range of NMEs: < 0.05 (top), 0.05-0.06 (middle), and > 0.06 (bottom).

TABLE 1. The architectures of LOTRs used in the experiments with the JD-landmark dataset. described in Section III-B, setting the inner threshold (¢) to 0.01, the outer threshold (w) to 10, and the steepness control parameter (€) to 2 as the parameters. For data augmentation and training tricks, we used the same steps described by [1]. We implemented the model using MXNet framework [73] with Gluon libraries [74], using a single NVIDIA Titan X GPU for training.

This work is licensed under a Creative Commons Attribution- NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

TABLE 6. Comparison of prediction accuracy (AUC) with different loss functions. the results from Feng et al. [39]. Unlike L1 loss, which maintains a constant gradient value across the error range, L2 produces a smaller gradient near zero, which causes the models to ignore small error values. Thus, L2 loss is sensitive to outliers making it less responsive to relatively smaller errors. Consequently, the models trained with L2 loss may end up omitting small errors, which may yield inaccurate predictions.

TABLE 5. Comparison of prediction accuracy, inference time, and model size. TABLE 7. Comparison of prediction accuracy (AUC) based on varying number of Transformer layers.

TABLE 4. The performance (TAR @ FAR = 107 *) of face recognition on several benchmarks.

descriptionView Paper arrow_downwardDownload

An Occluded Stacked Hourglass Approach to Facial Landmark Localization and Occlusion Estimation

by Mohan Trivedi

2022

A key step to driver safety is to observe the driver's activities with the face being a key step in this process to extracting information such as head pose, blink rate, yawns, talking to passenger which can then help derive higher... more

descriptionView Paper arrow_downwardDownload

LOTR: Face Landmark Localization Using Localization Transformer

by Ankush Ganguly

2022, ArXiv

descriptionView Paper arrow_downwardDownload

RITnet: Real-time Semantic Segmentation of the Eye for Gaze Tracking

by Rakshit Kothari

2022, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)

Accurate eye segmentation can improve eye-gaze estimation and support interactive computing based on visual attention; however, existing eye segmentation methods suffer from issues such as person-dependent accuracy, lack of robustness,... more

Figure 1. Comparison of model performance on difficult samples in the OpenEDS test-set. Top-row left to right shows eyes ob- structed due to prescription glasses, heavy mascara, dim light and partial eyelid closure. Rows from top to bottom show input test images, ground truth labels, predictions from mSegNet w/BR [4] and predictions from RITnet, respectively. (akc5959, rsk3900,ma7583,sxd7257,nrn2741,rjboves,kanan, gabriel.diaz, jeff.pelz) @rit.edu

Figure 2. Architecture details of RITnet. DB refers to Down- Block, UB refers to Up-Block, and BN stands for batch normaliza- tion. Similarly, m refers to the number of input channels (m = 1 for gray scale image), c refers to number of output labels and p refers to number of model parameters. Dashed lines denote the skip connections from the corresponding Down-Blocks. All of the Blocks output tensors of channel size m=32.

To accommodate variation in individual reflectance properties (e.g., iris pigmentation, eye makeup, skin tone or eyelids/eyelashes) [4] and HMD specific illumination (the position of infrared LEDs with respect to the eye), we per- formed two pre-processing steps. These steps were based on the difference in the train, validation and test distribu- tions of mean image brightness (Figure 11 in Garbin et. al [4]).Pre-processing reduced these differences and also in- creased separability of certain eye features. First, a fixed gamma correction with an exponent of 0.8 was applied to all input images. Second, we applied local Contrast Lim- ited Adaptive Histogram Equalization (CLAHE) with a grid size of 8x8 and clip limit value of 1.5 [19]. Figure 3 shows an image before and after pre-processing.

Figure 4. Generation of a starburst pattern from the training im- age 000000240768. Left to Right: Original image, selected reflec- tions, concatenating with its 180° rotation, final pattern mask (best viewed in color).

Figure 5. Our model struggles to do an accurate segmentation when eye masks are heavily blurred or defocused.

descriptionView Paper arrow_downwardDownload

Sub-pixel face landmarks using heatmaps and a bag of tricks

by Pavit Noinongyao

2022, ArXiv

descriptionView Paper arrow_downwardDownload