Qualifications: MS and Ph.D., North Carolina State University
Title: Principal Scientist
Affiliation: ABB Corporate Research Center (India)
Contact Details: email@example.com
Ashish Sureka is a Principal Scientist at ABB Corporate Research Center (India). He was a Faculty Member at Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi) from July 2009 to October 2014 and a visiting researcher at Siemens Corporate Research from August 2014 to July 2015. His current research interests are in the areas of Mining Software Repositories, Software Analytics, and Social Media Analytics. He graduated with MS and PhD degrees in Computer Science from North Carolina State University (NCSU) in May 2002 and May 2005 respectively. He has worked at IBM Research Labs in the USA and Siemens Research Lab (India), and was a Senior Research Associate at the R&D Unit of Infosys Technologies Limited before joining IIIT-D as a Faculty Member in July 2009. He has received research grants from the Department of Information Technology (DIT, Government of India), the Confederation of Indian Industry (CII) and the Department of Science and Technology (DST, Government of India). He has published several research papers in international conferences and journals, graduated several PhD and MTech students, organized workshops co-located with conferences, and received best paper awards. He holds seven granted US patents.
Title of Talk 1: Process Mining Software Repositories
Synopsis: Process mining software repositories is an emerging field at the intersection of process mining and mining software repositories. Process mining is a subfield of business process intelligence and consists of mining event-log data for the purpose of process discovery, conformance checking (or verification) and process enhancement. Mining software repositories consists of analyzing and mining structured and unstructured data stored in various software archives, such as version control systems, issue tracking systems, peer code review systems and mail archives, to solve problems encountered by practitioners. A few important applications of mining software repositories are: duplicate bug report detection, fault localization, effort and contribution estimation, automatic bug-report triaging, code clone detection and detecting defect-prone areas in the code. In this talk I will cover technical challenges, research problems and some of the applications of process mining software repositories.
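To make the idea of process discovery from event-log data concrete, here is a minimal sketch that counts which activity directly follows which within each case. It is an illustration only, not the method presented in the talk: the function name `directly_follows`, the `(case_id, activity, timestamp)` tuple layout and the issue-tracker status names are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count directly-follows pairs of activities per case.

    event_log: iterable of (case_id, activity, timestamp) tuples
    (a hypothetical layout, e.g. status changes of bug reports).
    """
    # Group events by case (e.g. one case per bug report).
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))

    # Order each trace by timestamp and count consecutive activity pairs.
    counts = Counter()
    for events in traces.values():
        ordered = [activity for _, activity in sorted(events)]
        for current, following in zip(ordered, ordered[1:]):
            counts[(current, following)] += 1
    return counts


# Hypothetical issue-tracker log: two bug reports and their status changes.
log = [
    ("bug-1", "NEW", 1),
    ("bug-1", "ASSIGNED", 2),
    ("bug-1", "RESOLVED", 5),
    ("bug-2", "NEW", 3),
    ("bug-2", "ASSIGNED", 4),
    ("bug-2", "REOPENED", 9),
]
graph = directly_follows(log)
```

The resulting counts can be read as a weighted graph of observed transitions, which is one simple starting point for comparing the actual workflow in a repository with the intended one.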
Title of Talk 2: Mining Hate and Extremism Promoting Users, Videos and Communities on YouTube
Synopsis: Online video sharing platforms such as YouTube contain several videos and users promoting hate and extremism. Due to the low barrier to publication and anonymity, YouTube is misused as a platform by some users and communities to post negative videos disseminating hatred against a particular religion, country or person. We formulate the problem of identifying such malicious videos as a search problem and present a focused-crawler based approach consisting of various components performing several tasks: search strategy or algorithm, node similarity computation metric, learning from exemplary profiles serving as training data, stopping criterion, node classifier and queue manager. We implement two versions of the focused crawler: best-first search and shark search. We conduct a series of experiments by varying the seed, the number of n-grams in the language model based comparer and the similarity threshold for the classifier, and present the results of the experiments using standard information retrieval metrics such as precision, recall and F-measure. The accuracy of the proposed solution on the sample dataset is 69% and 74% for the best-first and shark search respectively. We perform a characterization study (by manual and visual inspection) of the anti-India hate and extremism promoting videos retrieved by the focused crawler, based on terms present in the title of the videos, YouTube category, average length of videos, content focus and target audience. We present the results of applying social network analysis-based measures to extract communities and identify core and influential users.
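The following is a minimal sketch of how a best-first focused crawler of this kind might be put together, together with the precision, recall and F-measure computation. It is not the implementation described in the talk: it runs on a small in-memory graph instead of YouTube, uses Jaccard overlap of character n-grams as a stand-in similarity metric, and all names (`best_first_crawl`, `similarity`, `links`, `texts`, `exemplar`) and the toy data are assumptions made for illustration.

```python
import heapq


def ngrams(text, n=3):
    """Set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def similarity(text, exemplar, n=3):
    """Jaccard overlap of n-gram sets (an assumed, simplified comparer)."""
    a, b = ngrams(text, n), ngrams(exemplar, n)
    return len(a & b) / len(a | b)


def best_first_crawl(seed, links, texts, exemplar, threshold=0.2, max_visits=100):
    """Visit nodes in order of their parent's similarity score (best first)."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}              # queue manager: never enqueue a node twice
    relevant = []
    visits = 0
    while frontier and visits < max_visits:  # stopping criterion
        _, node = heapq.heappop(frontier)
        visits += 1
        score = similarity(texts[node], exemplar)
        if score >= threshold:  # node classifier
            relevant.append(node)
        for neighbour in links.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                heapq.heappush(frontier, (-score, neighbour))
    return relevant


def precision_recall_f1(retrieved, truth):
    """Standard information retrieval metrics over two collections of ids."""
    retrieved, truth = set(retrieved), set(truth)
    hits = len(retrieved & truth)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): node ids, their titles, and outgoing links.
texts = {
    "a": "hate speech video",
    "b": "cooking pasta recipe",
    "c": "hate speech rally",
    "d": "cute cat clip",
}
links = {"a": ["b", "c"], "c": ["d"]}
found = best_first_crawl("a", links, texts, exemplar="hate speech")
```

A shark-search variant would differ mainly in how the priority of an unvisited node is computed, for example by letting it inherit a decayed score from its ancestors; that variant is not shown here.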