Papers by Mohammad Samir Farooqi
Enhancing the Classification of Biosynthetic Gene Clusters through Comprehensive NLP-Based Approach

Internet of Things in Forestry and Environmental Sciences
Forum for interdisciplinary mathematics, 2020
Internet of Things (IoT) is a revolutionary technology that aims to interconnect everyday objects equipped with identity, sensors, networking, and processing capabilities and allow them to communicate with one another and with other devices and services over the Internet to accomplish some objective. This is a transition from interconnected computers to interconnected things, which requires support for interoperability among heterogeneous devices and simplifies the development of new applications within the IoT infrastructure. Middleware for IoT is a software layer interposed between the infrastructure and the applications that aims to support the important requirements of these applications (Yu et al. in Cybern. Inf. Technol. 14(5):51–62, 2014). Communication has generally been human–human or human–device, whereas IoT communication is machine–machine: a network of objects linked through a self-configured wireless network. IoT frameworks are used to collect, process, and analyze data streams in real time and facilitate the provision of smart solutions. IoT is seen as a natural evolution of environmental sensing systems: different sensors measure key parameters in forest areas on a regular basis, with no need for human intervention, and send this information via wireless communication to a central platform. IoT-based environment monitoring and smart home technologies are gaining acceptance because of their good prospects for development. IoT products in agriculture include a number of IoT devices and sensors as well as a powerful dashboard with analytical capabilities and built-in reporting features (Yu et al. in Cybern. Inf. Technol. 14(5):51–62, 2014). By applying IoT, a networked intelligent platform can monitor forest environmental factors in a timely manner. This technology has the advantages of low power dissipation, low data rate, and high-capacity transportation.
Machine Learning Algorithms for Protein Physicochemical Component Prediction Using Near Infrared Spectroscopy in Chickpea Germplasm
Indian Journal of Plant Genetic Resources
Expert System for wheat: an online electronic guide for appropriate and timely cultural operations in India
Agricultural expert systems are designed to emulate the logic and reasoning processes that an expert would use to solve a problem. Expert systems in agriculture integrate the knowledge and experience of specialists from different fields and can answer relevant questions and explain their reasoning process. The Expert System on Wheat Crop Management is one such example, developed by scientists of IASRI in collaboration with two premier institutions doing research on wheat, namely DWR, Karnal and IARI, New Delhi. The system holds a collection of general principles that can be applied to solve problems related to wheat crop management, and it provides extensive information to wheat-growing farmers.
International journal of plant and soil science, Mar 18, 2024

Bhartiya Krishi Anusandhan Patrika, Dec 31, 2018
Development of Parser Agents for Retrieval of Bibliographic Material
Murari Kumar, Mohammad Samir Farooqi, K.K. Chaturvedi, Chandan Kumar Deb and Pankaj Das, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi-110 012, India. Received: December 2018; Accepted: January 2018. Abstract: Bibliographic records are used extensively by bibliometricians for analysis and dissemination. However, with the increasing rate of literature publication in open-access journals such as Nucleic Acids Research, Springer, Oxford, etc., obtaining structured bibliographic information in the desired format is a difficult task. A digital bibliographic database provides essential, structured information about published literature. The bibliographic records of different articles are scattered across the Internet on various web pages. This research presents a system for retrieving the bibliographic data of “Nucleic Acids Research” in a single place. For this purpose, parser agents have been developed that visit the different web pages of “Nucleic Acids Research”, parse the scattered bibliographic data, and finally store it in a local bibliographic database. Based on this database, a “three-tier architecture” is used to display bibliographic information in an organized format. Using this system, statistics on articles published by various authors, existing links between different authors and institutions, and other analytical reports can be viewed. Keywords: bibliography, digital agent, ICT, information sharing, research paper.

An Algorithm for Automatic Text Annotation for Named Entity Recognition using spaCy Framework
Text annotation is the process of adding metadata to text, and it is used in tasks such as natural language processing (NLP) and machine learning. Named entity recognition (NER) is one of the interesting and challenging tasks of NLP and is used extensively in many domains. NER is also useful for handling documents, queries, reports and research articles related to agriculture, for example in identifying pests affecting crops. spaCy, a free and open-source library, is used for NER, but it requires text data in a complex annotated format. Manual annotation is a difficult and time-consuming task. Therefore, to streamline text annotation, we developed an algorithm and a tool for automatic annotation of text data. Approximately 3.6 million queries were collected from “Kisan Call Centre”, a helpline service for farmers run by the Government of India, and plant protection queries on Paddy and Wheat crops were extracted from this database. Th...
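As an illustration of what automatic annotation for spaCy involves, the sketch below does simple dictionary (lexicon) matching and emits character-offset entities in the `(text, {"entities": [(start, end, label)]})` shape commonly used for spaCy NER training data. The lexicon, labels and the function name `annotate` are hypothetical, and this is not presented as the authors' algorithm, only a minimal sketch under those assumptions.

```python
import re

# Hypothetical lexicon mapping surface forms to entity labels (illustrative only).
LEXICON = {
    "stem borer": "PEST",
    "brown plant hopper": "PEST",
    "paddy": "CROP",
    "wheat": "CROP",
}

def annotate(text, lexicon=LEXICON):
    """Return (text, {"entities": [(start, end, label), ...]}) with character offsets."""
    spans = []
    # Try longer terms first so multi-word names win over their sub-strings.
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.span()
            # Skip matches that overlap an accepted span: NER entities must not overlap.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, lexicon[term]))
    spans.sort()
    return text, {"entities": spans}

print(annotate("Farmer asked about stem borer attack in paddy"))
# → ('Farmer asked about stem borer attack in paddy', {'entities': [(19, 29, 'PEST'), (40, 45, 'CROP')]})
```

Matching longest terms first and rejecting overlaps keeps the output valid for training, since a token cannot belong to two entities at once.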

Frontiers in Plant Science
Wheat is widely cultivated in the Indo-Gangetic plains of India and forms the major staple food in the region. Understanding microbial community structure in wheat rhizosphere along the Indo-Gangetic plain and their association with soil properties can be an important base for developing strategies for microbial formulations. In the present study, an attempt was made to identify the core microbiota of wheat rhizosphere through a culture-independent approach. Rhizospheric soil samples were collected from 20 different sites along the upper Indo-Gangetic plains and their bacterial community composition was analyzed based on sequencing of the V3-V4 region of the 16S rRNA gene. Diversity analysis has shown significant variation in bacterial diversity among the sites. The taxonomic profile identified Proteobacteria, Chloroflexi, Actinobacteria, Bacteroidetes, Acidobacteria, Gemmatimonadetes, Planctomycetes, Verrucomicrobia, Firmicutes, and Cyanobacteria as the most dominant phyla in the w...

Synonymous codons are non-randomly distributed among genes, a phenomenon termed codon usage bias. Understanding the extent and pattern of codon bias and the forces affecting codon usage are key steps towards elucidating the adaptive choice of codons at the level of individual genes. Herein, trends in codon usage bias in a set of 1450 genes in Salinibacter ruber, an extremely halophilic bacterium, have been evaluated. Notably, synonymous codon usage varies considerably among genes of this bacterium. Owing to base composition (mutational bias), C- and G-ending codons predominate, with a greater preference for 'C' at synonymously variable sites. The effect of natural selection acting at the level of translation has been observed. Certain genes with high codon bias have been identified by a multivariate statistical approach and through various codon bias indices. These genes appear to be highly expressed, and their codon usage seems to have been shaped by selection favouring a limited number of translationally optimal codons. A subset of 27 optimal codons seems to be preferentially used in highly expressed genes. The frequency of these codons appears to be correlated with the level of gene expression, and may be a useful indicator in the case of genes (or open reading frames) whose expression levels are unknown.
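Relative Synonymous Codon Usage (RSCU) is one of the standard indices behind analyses like this one: it divides a codon's observed count by the count expected if all synonyms of that amino acid were used equally, so RSCU = 1 means no bias and values above 1 mark preferred codons. Optimal codons are often found by comparing RSCU between highly and lowly expressed gene sets. The sketch below only computes RSCU; the three-family codon table is a deliberately partial, illustrative subset of the standard genetic code, and `rscu` is a hypothetical helper, not the study's pipeline.

```python
from collections import Counter

# Partial synonymous-codon table (standard genetic code); a real analysis covers all families.
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(cds):
    """RSCU = observed codon count / count expected under equal synonymous usage."""
    cds = cds.upper()
    # Split the coding sequence into in-frame codons, ignoring a trailing partial codon.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    values = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence, so RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values

print(rscu("TTCTTCTTTAAGGGCGGCGGCGGA"))
```

Within each amino acid family the RSCU values sum to the number of synonymous codons, which is a handy sanity check.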
Assessment of queries of farmers at Kisan Call Center using natural language processing
Indian Journal of Extension Education

LEGUME RESEARCH - AN INTERNATIONAL JOURNAL
Background: Chickpea is the third major pulse produced globally, with 11.6 million tonnes produced per annum (Merga and Haji, 2019). Sugar alcohols, inulin and starch are all prebiotic carbohydrates found in chickpeas (Johnson et al., 2020). Near-Infrared (NIR) spectroscopy is a non-destructive, versatile and powerful analytical technique. Methods: Spectral data obtained from NIR spectroscopy require various techniques to extract useful information, which is then used to build models for predicting physical or chemical components present in agricultural crops. The main aim of this study is to apply various machine learning algorithms and identify those especially effective in predicting sugar concentration in chickpea. Sugar prediction models are developed using Linear Regression (LR), Artificial Neural Network (ANN), Random Forest (RF), Support Vector Regression (SVR) and Decision Tree Regression (DTR) algorithms. Performance of the models is evaluate...

Frontiers in Genetics, 2022
Rice is an important staple food grain consumed by most of the world's population. With climate and environmental changes, rice has come under tremendous stress, which has impacted crop production and productivity. Plant growth hormones are essential components that control the overall growth and development of the plant. Cytokinin is a hormone that plays an important role in plant immunity and defense systems. Trans-zeatin is an active form of cytokinin that affects plant growth through a multi-step two-component phosphorelay system with different roles at various developmental stages. A systems biology approach to pathway analysis of trans-zeatin-treated rice could provide a deep understanding of the molecules involved. In this study, we have used a weighted gene co-expression network analysis method to identify the functional modules and hub genes involved in the cytokinin pathway. We have identified ni...
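The core idea of weighted gene co-expression network analysis can be sketched briefly: the adjacency between genes i and j is |cor(x_i, x_j)|^β (an unsigned soft-thresholded network), and a gene's connectivity is the sum of its adjacencies, with the most highly connected genes being hub candidates. The sketch below assumes β = 6 purely for illustration (in practice β is chosen by the scale-free topology criterion), uses made-up expression profiles, and computes whole-network rather than intramodular connectivity; it is not the study's analysis, which would normally use the WGCNA R package.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)

def connectivity(expr, beta=6):
    """Soft-thresholded adjacency a_ij = |cor|**beta; connectivity k_i = sum over j != i."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k

# Hypothetical expression values across four samples.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
k = connectivity(expr)
print(sorted(k, key=k.get, reverse=True))
```

Raising correlations to a power keeps strong links nearly intact while shrinking weak ones, which is why g4, only moderately correlated with the others, ends up with the lowest connectivity.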

Biochemistry & Analytical Biochemistry, 2016
The codon is the basic unit of biological message transmission during protein synthesis in an organism. Codon usage bias is the preferential use of certain synonymous codons in an organism. This preference varies not only among species but also among genes within the same genome of a species. The variation in codon usage patterns is shaped by processes such as mutation, genetic drift and selection pressure. In this study, we have used computational as well as statistical techniques to find the codon usage bias and codon context patterns of Salinibacter ruber (extremely halophilic), Chromohalobacter salexigens (moderately halophilic) and Rhizobium etli (non-halophilic). In addition, compositional variation in translated amino acid frequency, the effective number of codons and optimal codons were also studied. A plot of ENc versus GC3s suggests that both mutation bias and translational selection contribute to these differences in codon bias. However, mutation bias is the driving force of synonymous codon usage patterns in the halophilic bacteria (Salinibacter ruber and Chromohalobacter salexigens), whereas translational selection seems to affect the codon usage pattern in the non-halophilic bacterium (Rhizobium etli). Correspondence analysis of Relative Synonymous Codon Usage revealed different clusters of genes, varying in number, in the bacteria under study. Moreover, codon context patterns also varied among these bacteria. These results clearly indicate variation in the codon usage patterns of these bacterial genomes.
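The ENc versus GC3s plot is read against Wright's expected curve, ENc_exp = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s, the GC content at synonymous third codon positions. Genes lying on or near the curve are consistent with codon usage driven mainly by mutational bias, while genes well below it suggest additional selection. The sketch below computes GC3s (excluding ATG, TGG and stop codons, which have no synonymous third position) and the expected curve; it assumes the standard genetic code, and computing the observed ENc itself is not shown.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons are excluded from GC3s.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc3s(cds):
    """GC content at third positions of synonymous codons (standard genetic code)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    third = [c[2] for c in codons if c not in EXCLUDED]
    if not third:
        raise ValueError("no synonymous codons in sequence")
    return sum(base in "GC" for base in third) / len(third)

def expected_enc(s):
    """Wright's expected ENc when codon bias is due to GC3s (mutation) alone."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)

s = gc3s("ATGGCCAAAGGCTAA")
print(s, expected_enc(s))
```

The curve peaks at 60.5 when s = 0.5 and falls towards about 31 to 32 at the GC extremes, so a strongly GC-rich genome is expected to have low ENc even without translational selection.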

The Indian Journal of Animal Sciences
MicroRNAs (miRNAs) are ~22 nt long non-coding RNAs that regulate gene expression at the post-transcriptional level in both plants and animals. miRNAs are evolutionarily conserved and hence provide a potential basis for predicting new miRNAs through homology search. No miRNAs had so far been identified in the economically important water buffalo (Bubalus bubalis). In this study, EST-based homology search, an established computational approach, is used to find potential miRNAs in buffalo. Six potential buffalo miRNAs were identified by searching publicly available buffalo ESTs against the known mature miRNAs of a closely related species, Bos taurus. Based on sequence complementarity, target genes were identified that encode transcription factors (8%), enzymes (30%) and transporters (14%), as well as other proteins involved in physiological and metabolic processes (48%). These target genes also encode proteins for signal transduction and normal development. This s...

International Journal of Molecular Sciences
Vegetable crops possess a prominent nutri-metabolite pool that not only contributes to crop performance in the field but also offers nutritional security for humans. In identifying, quantifying and functionally characterizing the cellular metabolome pool, biomolecule separation technologies, data acquisition platforms, chemical libraries, bioinformatics tools, databases and visualization techniques have come to play a significant role. High-throughput metabolomics unravels structurally diverse nutrition-rich metabolites and their entangled interactions in vegetable plants. It has helped to link identified phytometabolites with unique phenotypic traits, nutri-functional characters, defense mechanisms and crop productivity. In this study, we explore mining diverse metabolites, localizing cellular metabolic pathways, classifying functional biomolecules and establishing linkages between metabolic fluxes and genomic regulations, using comprehensive metabolomics deciphe...

Web Semantics for Textual and Visual Information Retrieval, 2017
With advancements in sequencing technologies, there has been exponential growth in the availability of biological databases. Biological databases consist of information and knowledge collected from scientific experiments, published literature and statistical analysis of text, numerical, image and video data. These databases are spread across the globe and maintained by many organizations. A number of tools have been developed to retrieve information from these databases. Most of these tools are available on the web but are scattered, so finding relevant information is a difficult and tedious task for researchers. Moreover, many of these databases use disparate storage formats but are linked to each other. An important issue concerning present biological resources is therefore their availability and integration on a single platform. This chapter provides an insight into existing biological resources with an aim to provide consolidated information at one pla...

Frontiers in Genetics, 2022
Cereals are the most important food crops and are considered key contributors to global food security. Losses due to abiotic stresses significantly limit the potential productivity of cereal crops. The primary causes of abiotic stress are abrupt temperature changes, variable rainfall, and the declining nutrient status of the soil. Varietal development is key to sustaining productivity under multiple abiotic stresses and must be studied in the context of genomics and molecular breeding. Recently, advances in a plethora of Next Generation Sequencing (NGS) based methods have accelerated the generation of enormous genomic data on stress-induced transcripts from technologies such as microarray, RNA-seq and Expressed Sequence Tags (ESTs). Many databases of microarray- and RNA-seq-based transcripts have been developed and widely used. However, an abundant amount of transcripts related to abiotic stresses in various cereal crops arising from EST technology are availa...
Wheat Expert System Poster
International Journal of Bioinformatics Research and Applications, 2021
RNA-Seq has gained immense popularity and emerged as a high-throughput platform for identifying differentially expressed (DE) genes. To characterize differentially expressed genes, it is important to determine the statistical distributional properties of the data. In the present study, we propose a new hybrid model (NBPFCROS) based on parametric and non-parametric statistics for identifying DE genes. The NBP model, based on a compound Poisson-gamma mixture distribution, serves as the parametric statistic, and the fold change value derived with the fold change rank ordering statistics (FCROS) algorithm serves as the non-parametric statistic. We used a gene significance score, the pi-value, which combines the expression fold change (f value) and statistical significance (p-value). The performance of
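A score that combines fold change and p-value rewards genes that are both strongly changed and statistically reliable. One common formulation of the pi-value multiplies the absolute log2 fold change by -log10(p); the exact combination used in NBPFCROS may differ, so treat the sketch below as an assumption-labelled illustration. The gene names, fold changes and p-values are hypothetical.

```python
import math

def pi_value(fold_change, p_value):
    """Assumed pi-value: |log2(fold change)| * -log10(p-value)."""
    if fold_change <= 0 or not 0 < p_value <= 1:
        raise ValueError("fold change must be > 0 and p-value in (0, 1]")
    return abs(math.log2(fold_change)) * -math.log10(p_value)

# Hypothetical (fold change, p-value) pairs for three genes.
genes = {"geneA": (4.0, 0.001), "geneB": (0.5, 0.01), "geneC": (1.1, 0.2)}
ranked = sorted(genes, key=lambda g: pi_value(*genes[g]), reverse=True)
print(ranked)
```

Taking the absolute log fold change treats up- and down-regulation symmetrically, so geneB (halved expression) still outranks geneC, whose change is small and not significant.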
PERMISnet-II: Personnel Management Information System Network-II for the Indian Council of Agricultural Research