Papers by Brent Wenerstrom
Proceedings, Dec 1, 2006
BRIGHAM YOUNG UNIVERSITY As chair of the candidate's graduate committee, I have read the thesis of Brent Wenerstrom in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and department style requirements; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library. Date Dr. Christophe Giraud-Carrier Chair, Graduate Committee
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, 2012
We previously introduced ReClose, which provides summaries with both better content and better visual display for search engine results. We now seek to further improve summaries with the addition of structured text and multimedia, more specifically tables, lists, buttons and images. Currently, summaries provided by search engines rarely use structured text and images. We show in this paper that structured text and images lead to faster comprehension by search engine users and to more visually appealing summaries. 70% of non-expert users made decisions more quickly using summaries preserving document structure, and 65% of all users preferred summaries preserving structure to plain text summaries.
Flu Outbreak Detection Using Summarized Twitter Messages

Background: Life and death may be on the line for patients visiting an emergency department (ED). The time it takes a patient to see a doctor can be critical. This is made more difficult by the fact that waiting times have been increasing over the past ten years [2]. Objective: We would like to determine what factors affect current waiting times in emergency departments through the use of Enterprise Guide. Methods: We are using survey data from the 2006 Ambulatory Health Care Data survey [1] conducted by the National Center for Health Statistics (NCHS), containing about 36,000 data records. We used linear regression to model our data with the help of PROC GLM. Results: We found that self-reported pain levels did not correlate with waiting time, but that ED prioritization, time of day of visit, arrival by ambulance and previous waiting times from the same emergency department correlated with waiting time.
Click Fraud Detection Using Time and Space Contextual Information
Studies in Computational Intelligence, 2007
E-commerce greatly facilitates and enhances the level of interaction between a retailer and its customers, thus offering the potential for smarter marketing through thoughtful site design and analytics. This chapter presents and illustrates the three stages of knowledge discovery that move e-retailers from simple on-line catalog providers to finely-tuned, customer-centric service providers.

International Journal of Data Analysis Techniques and Strategies, 2012
Detecting duplicates in click data streams is an important task in the fight against click fraud, which is the act of generating false clicks in internet advertising. Revenue-generating advertising models that charge advertisers for each click leave room for individuals or rival companies to generate false clicks. The extent of click fraud's damage to online advertising has grown tremendously over the years. In this paper, we consider the problem of detecting duplicates in click data streams. Our solution uses a modified version of the counting Bloom filter. The temporal stateful Bloom filter (TSBF) extends the standard counting Bloom filter by replacing the bit-vector with an array of counters of states. These counters are dynamic and decay with time. We conducted a comprehensive set of experiments using synthetic and real-world data. Results are compared with buffering techniques used in NetMosaics, a click fraud detection and prevention solution. Our results show that the TSBF approach achieves 99% accuracy on duplicate detection while keeping its space requirement constant.
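The idea of a Bloom-style array whose counters decay with time can be illustrated with a minimal sketch. This is not the authors' TSBF: the class name `DecayingBloomFilter`, the `lifetime` parameter, the explicit `tick()` step, and the salted SHA-256 hashing are all assumptions made for illustration. It only shows how a fixed-size array of decaying counters can flag repeated clicks inside a time window.

```python
import hashlib


class DecayingBloomFilter:
    """Toy duplicate detector: a Bloom-style array of counters that decay over time.

    Hypothetical illustration only; not the TSBF described in the abstract.
    """

    def __init__(self, num_slots=1024, num_hashes=3, lifetime=5):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.lifetime = lifetime          # number of ticks a click stays remembered
        self.counters = [0] * num_slots   # fixed-size state, independent of stream length

    def _slots(self, key):
        # Derive num_hashes slot indices by salting a SHA-256 hash of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def tick(self):
        # Advance time by one unit: every nonzero counter decays by one.
        self.counters = [c - 1 if c > 0 else 0 for c in self.counters]

    def seen_recently(self, key):
        # True if every slot for the key is still alive (false positives are possible).
        return all(self.counters[s] > 0 for s in self._slots(key))

    def add(self, key):
        # Report whether the key looks like a duplicate, then refresh its slots.
        duplicate = self.seen_recently(key)
        for s in self._slots(key):
            self.counters[s] = self.lifetime
        return duplicate
```

In this sketch, calling `add` twice with the same click identifier returns `False` and then `True`; once `tick()` has been called `lifetime` times without a refresh, the same identifier is treated as new again. Memory use depends only on `num_slots`, which mirrors the constant-space property the abstract reports.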
Improving Click Fraud Detection by Real Time Data Fusion
2008 IEEE International Symposium on Signal Processing and Information Technology, 2008
... the Internet and it provides information access to millions of statistical advantages, the use of multiple data sources may increase the accuracy with which an event can be observed users per day. ... use of the Internet by individuals but also the way businesses welite. ...
Time and space contextual information improves click quality estimation
e-COMMERCE …, 2009
... 1998; Immorlica, Jain et al. 2005; Mahdian 2006). ... Ali and Scarr (Ali and Scarr 2007) developed a robust model to detect outliers and robots based on their number of clicks from the home page. Qu et al. (Qu, Vetter et al.) devised a multi-step process for intrusion detection. ...
Sentence Ranking for Search Document Summarization Based on the Wisdom of Three Search Engines
ReClose Fuzz: Improved Automatic Summary Generation using Fuzzy Sets
ReClose: web page summarization combining summary techniques
International Journal of Web Information Systems, 2011
Purpose – Search engine users are faced with long lists of search results, each entry being of a varying degree of relevance. Often, based on the short text of a search result, users hold false expectations about the linked web page. This leads users to skip relevant information, missing valuable insights, and to click on irrelevant web pages, wasting time.
For each search result presented by a search engine, a user has a choice to click through for more information or to skip the result. We aim to improve the accuracy of this click process by introducing a color-coding scheme built upon our improved summary text selection approach called ReClose. Color-coding adds an additional level of context to the text without requiring additional screen space. Our results showed an improvement in click precision from 66% when using Google summaries to 80% when using color-coded ReClose summaries. Improvements in user click precision will lead to better user experiences, more efficient finding of search results and higher confidence levels in search engine usage.

Twitter provides the freshest source of data about what is happening in the lives of people across the world. The publicly available streams of status updates on Twitter have been used to track earthquakes, forest fires and, most especially, flu outbreaks. Current techniques for tracking flu outbreaks rely on count data for a number of keywords. However, count data alone on the noisy Twitter streams is not reliable enough for health officials to make critical decisions. We propose a semi-automatic outbreak detection system. Rather than providing only alarms backed by count data, we propose a summarization system that will allow health officials to quickly verify outbreak alarms. This will lead to higher levels of trust in the system and allow the system to be used by health organizations around the world. We experimentally verify our summarization system and have found system users to have an accuracy of 0.86 when identifying multi-tweet summaries.

After a user types in a search query on a major search engine, they are presented with a number of search results. Each search result is made up of a title, brief text summary and a URL. It is then the user's job to select documents for further review. Our research aims to improve the accuracy of users selecting relevant documents by improving the way these web pages are summarized. Improvements in accuracy will lead to time improvements and user experience improvements. We propose ReClose, a system for generating web document summaries. ReClose generates summary content by combining summarization techniques from query-biased and query-independent summary generation. Query-biased summaries generally provide query terms in context. Query-independent summaries focus on summarizing documents as a whole. Combining these summary techniques led to a 10% improvement in user decision making over Google-generated summaries. Color-coded ReClose summaries provide keyword usage depth at a glance and also alert users to topic departures. Color-coding further enhanced ReClose results and led to a 20% improvement in user decision making over Google-generated summaries. Many online documents include structure and multimedia of various forms such as tables, lists, forms and images. We propose to include this structure in web page summaries. We found that the expert user was insignificantly slowed in decision making, while the majority of average users made decisions more quickly using summaries including structure without any decrease in decision accuracy. We additionally extended ReClose for use in summarizing large numbers of tweets in tracking flu outbreaks in social media. The resulting summaries have variable length and are
CLICK FRAUD DETECTION USING TIME AND SPACE CONTEXTUAL INFORMATION
Enhanced Visualization for Web-Based Summaries
Multi-Tweet Summarization for Flu Outbreak Detection