Due to the high demand for computing, available resources are always scarce. Approximate computing is a key technique for lowering hardware complexity and improving energy efficiency and performance. However, it is a challenge to properly design approximate multipliers, since the input data are not known in advance. This challenge can be overcome by Machine Learning (ML) classifiers, which can predict detailed features of upcoming input data. Previous approximate multipliers based on ML classifiers were designed using simple adders, but with a simple adder-based approximate multiplier the level of approximation cannot change at runtime. To overcome this drawback, this paper proposes using an accumulator and reconfigurable adders instead of simple adders. A rounding technique is also applied to approximate floating-point multipliers for further improvement. Our experimental results show that when the error tolerance of the target application is below 5%, the proposed approximate multiplier saves 70.98% area; when the error tolerance is below 3%, a rounding-enhanced simple adder-based approximate multiplier saves 65.9% area, and a reconfigurable adder-based approximate multiplier with rounding reduces the average delay and energy by 54.95% and 46.67% respectively, compared to an exact multiplier.
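The abstract does not spell out the rounding scheme, so the following is only a minimal sketch of the general idea behind rounding-based approximate multiplication: rounding one operand to the nearest power of two turns the multiplication into a single shift, trading a bounded relative error for far simpler hardware. The function names and the decision to round only one operand are illustrative assumptions, not the paper's design.

```python
# Minimal sketch of a rounding-based approximate multiplier (illustrative,
# not the paper's hardware design): rounding an operand to a power of two
# reduces the product to a shift.

def round_to_pow2(x: int) -> int:
    """Round a positive integer to the nearest power of two."""
    if x <= 0:
        return 0
    lower = 1 << (x.bit_length() - 1)   # largest power of two <= x
    upper = lower << 1
    return lower if (x - lower) <= (upper - x) else upper

def approx_mul(a: int, b: int) -> int:
    """Approximate a*b by rounding one operand to a power of two,
    so the product becomes a single shift in hardware."""
    p = round_to_pow2(b)
    return a * p  # in hardware: a << log2(p)

if __name__ == "__main__":
    a, b = 117, 53
    exact, approx = a * b, approx_mul(a, b)
    print(exact, approx, abs(exact - approx) / exact)  # relative error
```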
Object detection in hazy environments has always been a difficult task in the autonomous driving field. Major breakthroughs are hard to achieve due to the lack of a large-scale hazy image dataset with detailed labels. In this work, we present a simple and flexible algorithm to add synthetic haze to the MS COCO training dataset, aiming to enhance the performance of object detection in haze when the newly synthesized hazy images are used as the training dataset. Our algorithm is inspired by the Multiple Linear Regression Dark Channel Prior (MLDCP), and we obtain a general model that can add synthetic haze to haze-free images by applying Stochastic Gradient Descent (SGD) to the reversed MLDCP model. We further evaluate the mean average precision (mAP) of Mask R-CNN when the network is trained on the Hazy-COCO training dataset and the hazy test dataset is preprocessed with existing single-image dehazing algorithms.
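For orientation, here is a minimal sketch of haze synthesis using the classical atmospheric scattering model that DCP-style methods build on, I(x) = J(x)t(x) + A(1 - t(x)) with t(x) = exp(-beta*d(x)). The depth map, atmospheric light A, and scattering coefficient beta are assumed inputs here; the paper instead learns its haze parameters via SGD on the reversed MLDCP model.

```python
import numpy as np

def add_synthetic_haze(clear, depth, A=0.9, beta=1.2):
    """Apply I(x) = J(x)*t(x) + A*(1 - t(x)) with t(x) = exp(-beta*d(x)).

    clear : HxWx3 float array in [0, 1], the haze-free image J
    depth : HxW float array, scene depth d (assumed here; the paper
            learns its haze parameters from the reversed MLDCP model)
    A     : atmospheric light, beta : scattering coefficient
    """
    t = np.exp(-beta * depth)[..., None]  # transmission map
    return clear * t + A * (1.0 - t)

# Example: hazing a random "image" with a synthetic depth ramp
J = np.random.rand(240, 320, 3)
d = np.tile(np.linspace(0.1, 2.0, 320), (240, 1))
I = add_synthetic_haze(J, d)
```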
Convolutional Neural Networks (CNNs) have achieved particularly good results on depth estimation from a single image. However, certain disadvantages remain: (1) traditional CNNs adopt pooling layers to increase the receptive field, but these lower the resolution and cause information loss; (2) almost all CNN frameworks proposed for depth estimation apply fully connected layers to obtain global information, which introduce so many parameters that they often result in out-of-memory problems. In this paper, we present a new module, a dilated fully convolutional neural network, to tackle these disadvantages. On one hand, the developed method takes advantage of dilated convolutions, which support exponential expansion of the receptive field without loss of resolution; on the other hand, our module replaces the fully connected layers with fully convolutional layers, which significantly reduces the number of parameters and makes our module more universal. Experiments show that the presented module achieves state-of-the-art results on the NYU Depth V2 dataset.
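A minimal PyTorch sketch of the two ideas named above: stacked dilated convolutions that grow the receptive field while preserving spatial resolution, and a 1x1 fully convolutional head in place of fully connected layers. The channel widths and dilation schedule are illustrative assumptions, not the module's published architecture.

```python
import torch
import torch.nn as nn

# Stacking 3x3 convolutions with dilations 1, 2, 4 grows the receptive
# field roughly exponentially while the spatial resolution is preserved
# (padding == dilation keeps H and W unchanged), unlike pooling.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, dilation=4, padding=4), nn.ReLU(),
    # A 1x1 "fully convolutional" head replaces fully connected layers,
    # keeping the parameter count independent of the input size.
    nn.Conv2d(64, 1, kernel_size=1),
)

x = torch.randn(1, 3, 240, 320)
print(block(x).shape)  # torch.Size([1, 1, 240, 320]) -- resolution kept
```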
A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is at least as long as any other common subsequence. The longest common subsequence problem is to find an LCS of two given strings. The complexity of the problem on the decision tree model is known to be mn, where m and n are the lengths of the two strings, respectively, with m <= n. We present a parallel algorithm for this problem on the CREW PRAM model, which takes O(log² m · log log m) time with mn/(log² m · log log m) processors when log² m · log log m > log n, or otherwise O(log n) time with mn/log n processors.
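The CREW PRAM algorithm itself is beyond an abstract-length sketch, but the standard sequential O(mn) dynamic program below defines the recurrence that any LCS algorithm, parallel or not, must evaluate, and makes the mn work bound concrete.

```python
def lcs(x: str, y: str) -> str:
    """Standard O(mn) dynamic program for the longest common subsequence."""
    m, n = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back through the table to recover one LCS
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # e.g. "BCBA" (length 4)
```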
In this paper, we present a novel scheme called SEEK (Spatial Embedded Environment Knowledge) to efficiently route data while performing dynamic location management for a scalable distributed mobile multiprocessor network. Our first novel idea uses an 'age' parameter to elect non-static location servers in each local partition. We use the term 'location server' loosely, since all the nodes in the network have an equal likelihood of being elected Location Manager. Second, our method incorporates path setup for data transfer during the location inquiry and update steps. Furthermore, our algorithm accounts for the dynamic nature of intermediate nodes, which may also be moving during data routing. The main contributions of our paper are: a unique hierarchical addressing scheme that exploits locality; the SEEK set of algorithms, which perform location management and data routing by establishing partitions of nodes for local identification, using age as the parameter to determine partition location managers in each dynamic environment; the incorporation of a data routing strategy into the location inquiry/update stage; and a theoretical analysis of the complexity of our scheme together with computer simulations illustrating the performance of SEEK.
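As a rough illustration of the age-based election idea (not SEEK's full protocol), the sketch below elects, in each partition, the node that has stayed there longest, on the assumption that the longest-resident node is the most stable choice. The Node fields and the tie-breaking rule are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    partition: int
    age: float  # time spent in the current partition (assumed meaning)

def elect_location_managers(nodes):
    """Return {partition: node_id}, picking per partition the node with
    the greatest age (ties broken by smaller node_id)."""
    managers = {}
    for n in nodes:
        best = managers.get(n.partition)
        if best is None or (n.age, -n.node_id) > (best.age, -best.node_id):
            managers[n.partition] = n
    return {p: n.node_id for p, n in managers.items()}

nodes = [Node(1, 0, 4.2), Node(2, 0, 7.9), Node(3, 1, 1.5)]
print(elect_location_managers(nodes))  # {0: 2, 1: 3}
```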
Chroma uniformity is an important criterion in assessing LED displays. In this paper, we propose a method to assess the chroma uniformity of LED displays based on the human visual system (HVS). We define three feature factors to assess local features: a chroma characteristic factor, a texture detail factor, and a spatial location factor. Images of LED displays are captured by a CCD camera; after certain image processing procedures, these images are partitioned into several parts of equal size. The feature factors of each part are calculated and multiplied together to obtain the final feature value. This method fuses objective and subjective assessment results, and is an effective way to assess chroma uniformity that approaches the judgment of human eyes.
MC-CNN (Matching Cost Convolutional Neural Network), based on Convolutional Neural Networks (CNNs), obtains good results in stereo matching. However, it often takes tremendous time to run even on a GPU, because stereo images are large and the training module takes many samples. In this paper, we propose T-CNN, a stereo matching algorithm consisting of two main modules: training and testing. In the training module, fewer samples from the stereo image patches are trained in order to reduce the running time. In the testing module, we combine the values from the MC-CNN network, color information, and gradient information to obtain the matching cost function. In our experiments, the matching costs of T-CNN are compared with those of MC-CNN networks, showing that our T-CNN algorithm outperforms MC-CNN.
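A minimal sketch of the testing-module idea: fusing the network's matching cost with truncated color and gradient differences into one cost volume slice. The weights alpha/beta/gamma and the truncation thresholds below are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def combined_cost(c_cnn, left, right_shifted,
                  alpha=0.6, beta=0.3, gamma=0.1,
                  tau_color=7.0, tau_grad=2.0):
    """Fuse network, color, and gradient costs for one disparity.

    c_cnn         : HxW network matching cost for this disparity
    left          : HxW grayscale left image
    right_shifted : HxW right image shifted by the same disparity
    """
    c_color = np.minimum(np.abs(left - right_shifted), tau_color)
    gx_l = np.gradient(left, axis=1)           # horizontal gradients
    gx_r = np.gradient(right_shifted, axis=1)
    c_grad = np.minimum(np.abs(gx_l - gx_r), tau_grad)
    return alpha * c_cnn + beta * c_color + gamma * c_grad
```

Truncating the color and gradient terms is a common robustness choice in stereo cost functions; it caps the influence of occlusions and outliers on the fused cost.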
Long Short-Term Memory (LSTM) has proven an efficient way to model sequential data because of its ability to overcome the vanishing gradient problem during training. However, due to the limited memory capacity of LSTM cells, LSTM is weak at capturing long-range dependencies in sequential data. To address this challenge, we propose an Attention-aware Bidirectional Multi-residual Recurrent Neural Network (ABMRNN) to overcome this deficiency. Our model considers both past and future information at every time step, with omniscient attention based on LSTM. In addition, a multi-residual mechanism is leveraged in our model, which aims to model the relationship between the current time step and more distant time steps rather than just the previous one. Experimental results show that our model achieves state-of-the-art performance on classification tasks.
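A compact PyTorch sketch of two of the ingredients the abstract names: a bidirectional LSTM (past and future context at every step) with additive attention pooled over all time steps. The multi-residual connections of the full ABMRNN are omitted, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    """Bidirectional LSTM with attention pooling (ABMRNN simplified:
    no multi-residual connections)."""
    def __init__(self, vocab, embed=128, hidden=64, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # additive attention score
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, tokens):                  # tokens: (B, T) int64
        h, _ = self.lstm(self.emb(tokens))      # (B, T, 2H)
        w = torch.softmax(self.attn(h), dim=1)  # (B, T, 1) step weights
        ctx = (w * h).sum(dim=1)                # attention-pooled context
        return self.out(ctx)                    # class logits

model = AttentiveBiLSTM(vocab=10000)
logits = model(torch.randint(0, 10000, (4, 32)))
print(logits.shape)  # torch.Size([4, 2])
```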
Unlike previous unknown-noun tagging tasks, this is the first attempt to focus on out-of-vocabulary (OOV) lexical evaluation tasks that do not require any prior knowledge. OOV words are words that appear only in the test samples. The goal of the tasks is to provide solutions for OOV lexical classification and prediction. The tasks require annotators to infer the attributes of the OOV words from their surrounding contexts. We then utilize unsupervised word embedding methods such as Word2Vec and Word2GM to perform baseline experiments on the categorical classification task and the OOV word attribute prediction task.
Dark Channel Prior (DCP) is a widely recognized traditional dehazing algorithm. However, it may fail in bright regions, and the brightness of the restored image is darker than the hazy image. In this paper, we propose an effective method to optimize DCP. We build a multiple linear regression haze-removal model based on the DCP atmospheric scattering model and train it on the RESIDE dataset, aiming to reduce the unexpected errors caused by the rough estimation of the transmission map t(x) and the atmospheric light A. The RESIDE dataset provides enough synthetic hazy images and their corresponding ground-truth images for training and testing. We compare the performance of different dehazing algorithms in terms of two important full-reference metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). The experimental results show that our model achieves the highest SSIM value, and its PSNR value is also higher than that of most state-of-the-art dehazing algorithms. Our results also overcome the weakness of DCP on real-world hazy images.
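For reference, a sketch of the scene-radiance recovery step that such a model refines. In plain DCP, J(x) = (I(x) - A)/max(t(x), t0) + A; the paper instead learns linear corrections to the rough t(x) and A estimates on RESIDE. The coefficients w_t, b_t, w_a, b_a below merely stand in for those learned weights and are purely illustrative.

```python
import numpy as np

def recover(I, t_rough, A_rough,
            w_t=1.0, b_t=0.0, w_a=1.0, b_a=0.0, t0=0.1):
    """DCP-style radiance recovery with placeholder linear corrections.

    I       : HxWx3 hazy image in [0, 1]
    t_rough : HxW rough transmission estimate
    A_rough : scalar rough atmospheric light estimate
    """
    t = np.clip(w_t * t_rough + b_t, t0, 1.0)[..., None]  # refined t(x)
    A = np.clip(w_a * A_rough + b_a, 0.0, 1.0)            # refined A
    return np.clip((I - A) / t + A, 0.0, 1.0)             # J(x)
```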
Short text differs considerably from traditional long text documents due to its shortness and conciseness, which somewhat hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence methods, we divide short text classification into three steps: preprocessing, feature selection, and classifier comparison. In this paper, we illustrate step by step how we approach our goals. Specifically, in feature selection we compare the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec, and paragraph2vec; in the classification part, we deliberately choose and compare Naïve Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor, and Decision Tree as our classifiers. Then we compare and analyze the classifiers horizontally against each other and vertically against the feature selections. Regarding the datasets, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, big and small: there are eight labels in the big class and 59 labels in the small class.
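A minimal scikit-learn sketch of one cell of the feature/classifier comparison grid described above: tf-idf features paired with two of the five classifiers. The toy documents and labels are placeholders standing in for the crawled stock exchange filings, which are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder corpus; the real data are the labeled exchange filings.
docs = ["annual report disclosure", "dividend announcement",
        "share pledge notice", "board meeting resolution"]
labels = [0, 1, 1, 0]

# Compare classifiers on the same tf-idf feature pipeline.
for clf in (MultinomialNB(), LogisticRegression(max_iter=1000)):
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, docs, labels, cv=2)
    print(type(clf).__name__, scores.mean())
```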
To improve the accuracy of harmonic analysis under non-synchronous sampling, a new quadruple-spectrum-line (QSL) interpolation FFT algorithm with a multiple cosine window is proposed. First, the time-domain and frequency-domain characteristics of a multiple cosine window, a 6-term cosine window, are analyzed. Then the QSL interpolation algorithm is analyzed, and the parameter estimation formulas based on the new algorithm are derived. Finally, the algorithm is tested in a simulated harmonic environment. The experimental results show that multiple cosine windows have obvious advantages in sidelobe characteristics, which better suppress spectral leakage, and that the proposed algorithm can accurately calculate the parameters of each harmonic.
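A sketch of the "multiple cosine window" construction, w[n] = sum_k (-1)^k a_k cos(2*pi*k*n/(N-1)). The 3-term Blackman coefficients used below are only a stand-in for illustration; the paper derives its QSL interpolation formulas for a specific 6-term coefficient set that is not reproduced here.

```python
import numpy as np

def cosine_sum_window(N, coeffs):
    """K-term cosine-sum window: sum_k (-1)^k a_k cos(2*pi*k*n/(N-1))."""
    n = np.arange(N)
    return sum(((-1) ** k) * a * np.cos(2 * np.pi * k * n / (N - 1))
               for k, a in enumerate(coeffs))

N = 1024
w = cosine_sum_window(N, [0.42, 0.5, 0.08])  # Blackman as a 3-term example

# Non-synchronously sampled 50.3 Hz tone: the tone does not fall on an
# FFT bin, so windowing is what suppresses the resulting leakage.
fs, f0 = 1000.0, 50.3
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs)
X = np.abs(np.fft.rfft(x * w))
print(np.argmax(X) * fs / N)  # coarse peak near 50 Hz, before interpolation
```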
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) appears almost everywhere in data entry because of automated scripts such as bots. Nowadays, the text-based scheme is still the most widely applied; it typically requires users to answer questions involving a recognition task. In particular, the segmentation of different types of CAPTCHAs is not always the same, and so far there is no universal way to solve the segmentation problem. In this paper, we present a novel adaptive algorithm and, based on it, create a system that defeats several CAPTCHAs at the same time. The CAPTCHA datasets we used are from the State Administration for Industry & Commerce of the People's Republic of China, comprising 33 CAPTCHA entrances in total that we need to solve. In these experiments, we assume that each entrance is known. Results are provided showing how well our algorithms work on these CAPTCHAs.
Several applications have been proposed for mobile wireless sensor networks. Some of these applications require the transfer of a large amount of data in a short period of time. This is challenging, since mobility can lead to deterioration in the quality of an established link. Frequent link disconnection may in turn require a mobile node to repeatedly establish new links with the surrounding relay nodes to proceed with the data transfer. New link establishment may cause extra data communication latency, making most of these applications delay sensitive. To evaluate the effect of mobility on latency, this paper first sets up a mathematical model based on a hybrid medium access control (MAC) protocol in mobile scenarios. It then uses NS2 simulations to further analyze the latency associated with mobility. Both results show that the latency increases with increasing network density and duty cycle.
The challenges in satellite communication (SatCom) include, but are not limited to, the customary complications of telecommunication such as channel conditions, signal-to-noise ratio (SNR), etc. A SatCom system is also prone to transient and permanent radiation hazards. Hence, in spite of harsh environmental factors (weather phenomena, solar events, etc.), a SatCom system must maintain reliable and predictable communication functions with a limited source of power. This paper presents a SatCom system design that achieves both low-power and high-fidelity communication. The design uses cooperative multiple-input multiple-output (MIMO) for spectral efficiency and diversity, low-density parity-check (LDPC) decoding for near-Shannon-limit gain, and dynamic voltage and frequency scaling (DVFS)-assisted asynchronous circuit design to achieve low power and fault tolerance. The MIMO system permits uninterrupted service in the event of temporary or permanent link or unit failures. The results show resilience against injected radiation levels of up to about 25 femtocoulombs on the critical path, more than 600 times the minimum charge required to logically flip a gate output in an ordinary static CMOS gate.
We introduce a new across-peer rate allocation algorithm with successive refinement to improve video transmission performance in P2P networks, based on the combination of multiple description coding and network coding. Successive refinement is implemented through layered multiple description codes. The algorithm is designed to maximize the expected video quality at the receivers by partitioning the video bitstream into different descriptions according to the bandwidth conditions of each peer. Adaptive rate partition adjustment is applied to ensure that the packet drop rate in the network is faithfully reflected. The granularity is also changed to the scale of atomic blocks instead of the stream rates used in prior works. Simulation results show that the algorithm outperforms prior algorithms in terms of video playback quality at the peer ends and makes the system more adaptable to peer dynamics.
Manufacturing and operating wireless systems requires a practical solution for achieving low power and high performance when using advanced communication apparatus such as multiple-input multiple-output (MIMO). Algorithmic solutions often achieve very high performance, but only over a narrow range of operating parameters. This paper presents a hardware design for MIMO detection that allows real-time switching between various algorithms and detection efforts to achieve high performance over the wide range of signal-to-noise ratios (SNRs) found in realistic operating conditions. We illustrate a design with over 80% reduction in detection power that satisfies the required quality of service (QoS) at SNRs (Eb/No) as low as 8.7 dB.
A weak signal, usually submerged in strong noise, is very difficult to detect in terms of its amplitude and frequency. The dynamic properties of the Van der Pol-Duffing system are studied in this paper. Such a system can go into chaos under certain parameters, and in the chaotic state the disturbance of a weak periodic signal can make the system's dynamic behavior change dramatically. Our results show that the system moves from a period-doubling state to a chaotic state when the amplitude of the input signal is changed, and that varying the input frequency has a remarkable influence on the system's dynamic performance. The unknown frequency can be detected by counting the number of turning points in the phase diagram. The simulation results verify that the presented method is feasible and of considerable theoretical value.
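As a rough numerical illustration of the detection principle (not the paper's model), the sketch below integrates a plain Duffing-type oscillator, x'' + k*x' - x + x^3 = f*cos(t) + s(t), driven near its chaotic threshold; a weak added tone s(t) nudges it across the threshold and visibly changes the phase portrait. The damping k and drive amplitude f are common textbook values, not the paper's Van der Pol-Duffing parameters.

```python
import numpy as np
from scipy.integrate import solve_ivp

def duffing(t, y, k=0.5, f=0.825, a_weak=0.01):
    """Duffing oscillator driven near the chaotic threshold, plus a
    weak tone s(t) buried in the drive (illustrative parameters)."""
    x, v = y
    s = a_weak * np.cos(t)
    return [v, -k * v + x - x**3 + f * np.cos(t) + s]

sol = solve_ivp(duffing, (0, 500), [0.0, 0.0], max_step=0.05)
x, v = sol.y
# Inspecting the (x, v) phase trajectory, or counting its turning
# points as the abstract suggests, reveals the regime change.
print(x.min(), x.max())
```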
In optical systems, the energies are very low if the photon source is divided through polarization beam splitters. Most of the energy is submerged in noise, and the useful signal is even weaker than the noise. Therefore, we must detect the weak useful signal from the strong noise in order to better understand the radiant features of weak signals. Nowadays, many methods using chaotic systems are available to detect weak signals, among which the Duffing system is often used. In this paper we study several typical chaotic systems and analyze their dynamic properties. The purpose is to find promising chaotic models for detecting weak signals and to compare their detection precision with that of the Duffing system. The simulation results show that a high-order chaotic system can detect the weak signal amplitude, that the detectable signal-to-noise ratio is even lower than that of the Duffing system, and that the presented method is feasible.
In this paper, a novel scheme is introduced for human facial feature extraction. Unlike previous methods that fit a 3D morphable model to 2D intensity images, our scheme utilizes 3D range images to extract features without requiring manually defined initial landmark points. A linear transformation is used to achieve the mapping between the 3D model and a 3D range image, which makes the computation simple and fast. Moreover, our scheme is robust to illumination and pose variations. In addition to features from range images, extra features can be obtained by examining optional 2D texture images. Using our scheme, we can also perform automatic eye/mouth corner localization. Experimental results show the high accuracy and robustness of our scheme.
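A minimal numpy sketch of the mapping step: solving, in the least-squares sense, for the linear (affine) transform that best aligns 3D model points to corresponding range image points. The correspondences here are random placeholders; the paper's scheme obtains them from the range data itself without manual landmarks.

```python
import numpy as np

# Synthetic correspondences: apply a known affine map to model points.
model = np.random.rand(200, 3)                     # 3D model points
T_true = np.array([[0.9, 0.1, 0.0],
                   [-0.1, 0.9, 0.0],
                   [0.0, 0.0, 1.1]])
scan = model @ T_true.T + np.array([0.2, -0.1, 0.05])

# Augment with ones so translation is estimated jointly:
# scan ~= [model 1] @ M, solved by linear least squares.
A = np.hstack([model, np.ones((len(model), 1))])   # (N, 4)
M, *_ = np.linalg.lstsq(A, scan, rcond=None)       # (4, 3) affine map
print(np.round(M.T, 3))                            # recovers T_true and t
```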