The complexity and number of interconnected elements in today's networks have heightened the need for autonomic computing techniques. This need is even greater in Wireless Sensor Networks, because sensor nodes should work continuously without operator intervention, i.e., they need self-organization and self-configuration algorithms. The first thing a Wireless Sensor Network should do is assign each node a network address without collisions. However, these scenarios are dynamic: network partitions and rejoins are common, and node additions and failures are not uncommon, so the network should be able to reconfigure itself without compromising availability.
A Machine Learning Approach for Resolving Place References in Text
This paper presents a machine learning method for resolving place references in text, i.e., linking character strings in documents to locations on the surface of the Earth. This is a fundamental task in the area of Geographic Information Retrieval, supporting access through geography to large document collections. The proposed method is an instance of stacked learning, in which a first learner based on a Hidden Markov Model is used to annotate place references, and a second learner implementing a regression through a Support Vector Machine is then used to rank the possible disambiguations for the references that were initially annotated. The proposed method was evaluated on gold-standard document collections in three different languages, with place references annotated by humans. Results show that the proposed method compares favorably against commercial state-of-the-art systems such as the Metacarta geo-tagger and Yahoo! Placemaker.
Geotargeting is a specialization of contextual advertising where the objective is to target ads to Website visitors concentrated in well-defined areas. Current approaches involve targeting ads based on the physical location of the visitors, estimated through their IP addresses. However, there are many situations where it would be more interesting to target ads based on the geographic scope of the target pages, i.e., on the general area implied by the locations mentioned in the textual contents of the pages. Our proposal applies techniques from the area of geographic information retrieval to the problem of geotargeting. We address the task through a pipeline of processing stages, which involves (i) determining the geographic scope of target pages, (ii) classifying target pages according to locational relevance, and (iii) retrieving ads relevant to the target page, using both textual contents and geographic scopes. Experimental results attest to the adequacy of the proposed methods in each of the individual processing stages.
Learning to resolve geographical and temporal references in text
Geo-temporal information is pervasive in textual documents, since most of them contain references to particular locations, calendar dates, clock times or duration periods. An important text analytics problem is therefore related to resolving the place names and the temporal expressions referenced in the texts, i.e., linking the character strings in the documents that correspond to either locations or temporal instances to the specific geospatial coordinates or the time intervals that they refer to. However, geo-temporal ...
In this paper, we compare different methods for the automatic assignment of geographic scopes to Web pages, based on placenames mentioned in the text. The methods under study are the Yahoo! Placemaker Web service, the hierarchy-based method originally proposed for the Web-a-Where system, the spatial overlap-based method originally proposed in the GIPSY project, the graph-based method originally proposed in the GREASE project, and three simple baseline methods corresponding to using the most frequently occurring place, the spatial area that covers all mentioned places, or the spatial area that covers all mentioned non-outlier places. The task under study can be seen as part of the broader problem of Geographic Information Retrieval. Experiments were carried out on Web pages from the Regional Section of the Open Directory Project, comparing the automatically assigned scopes against those assigned by human editors. The results show that the Web-a-Where method gives the best results, closely followed by the GraphRank method and by the baseline based on the most frequently occurring place.
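The three simple baselines named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mention format `(name, latitude, longitude)`, the sample coordinates, the function names, and in particular the outlier rule (dropping mentions more than `max_deg` degrees from the median latitude or longitude) are all assumptions made for illustration. The bounding box also ignores wrap-around at the antimeridian.

```python
from collections import Counter
from statistics import median

# Hypothetical place mentions extracted from one page: (name, lat, lon).
MENTIONS = [
    ("Lisbon", 38.72, -9.14),
    ("Porto", 41.15, -8.61),
    ("Lisbon", 38.72, -9.14),
    ("Tokyo", 35.68, 139.69),
]

def most_frequent_place(mentions):
    """Baseline 1: the scope is the most frequently mentioned place."""
    counts = Counter(name for name, _, _ in mentions)
    return counts.most_common(1)[0][0]

def bounding_box(mentions):
    """Baseline 2: the scope is the box covering all mentioned places.

    Returns (min_lat, min_lon, max_lat, max_lon).
    """
    lats = [lat for _, lat, _ in mentions]
    lons = [lon for _, _, lon in mentions]
    return (min(lats), min(lons), max(lats), max(lons))

def drop_outliers(mentions, max_deg=5.0):
    """Assumed outlier rule: keep mentions within max_deg degrees of the
    median latitude and median longitude of all mentions."""
    med_lat = median(lat for _, lat, _ in mentions)
    med_lon = median(lon for _, _, lon in mentions)
    return [
        m for m in mentions
        if abs(m[1] - med_lat) <= max_deg and abs(m[2] - med_lon) <= max_deg
    ]

def non_outlier_bounding_box(mentions, max_deg=5.0):
    """Baseline 3: the box covering only the non-outlier places."""
    return bounding_box(drop_outliers(mentions, max_deg))
```

With the sample data, the single Tokyo mention stretches the second baseline's box across most of Eurasia, while the third baseline discards it and yields a box around Lisbon and Porto.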
This paper presents an approach for categorizing documents according to their implicit locational relevance. We report a thorough evaluation of several classifiers designed for this task, built using support vector machines with multiple alternatives for the feature vectors. Experimental results show that feature vectors combining document terms and URL n-grams with simple features related to the locality of the document (e.g., the total count of place references) lead to high accuracy. The paper also discusses how the proposed categorization approach can be used to help improve tasks such as document retrieval or online contextual advertisement.
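A feature vector of the kind described in this abstract might be assembled as in the following Python sketch. It is an illustration under stated assumptions, not the paper's feature extractor: the function names, the `term=` and `url=` key prefixes, the choice of character trigrams for the URL, and the use of a plain set lookup over single-word place names to count place references are all hypothetical.

```python
import re
from collections import Counter

def url_ngrams(url, n=3):
    """Character n-grams of a lowercased URL (n=3 is an assumed default)."""
    s = url.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def build_features(text, url, place_names, n=3):
    """Combine document terms, URL n-grams, and one locality feature.

    place_names is a hypothetical list of single-word place names; a real
    system would use a proper place reference tagger instead.
    """
    features = Counter()
    tokens = re.findall(r"[a-z]+", text.lower())
    for token in tokens:
        features["term=" + token] += 1
    for gram in url_ngrams(url, n):
        features["url=" + gram] += 1
    known = {p.lower() for p in place_names}
    # Locality feature: total count of place references in the text.
    features["place_ref_count"] = sum(1 for t in tokens if t in known)
    return dict(features)
```

The resulting sparse dictionary could then be vectorized and passed to a support vector machine classifier; that step is omitted here because the abstract does not specify the kernel or training setup.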
Papers by Ivo Anastácio