Prasad M. Deshpande
Title: Senior Technical Staff Member
Affiliation: IBM Research - India
Contact Details: firstname.lastname@example.org
Prasad M. Deshpande is a Senior Technical Staff Member at IBM Research - India and Manager of the Watson Foundations - Platforms and Infrastructure Group. His expertise lies in data management, specifically data integration, OLAP, data mining and text analytics. He received a B.Tech. in Computer Science and Engineering from IIT Bombay and M.S. and Ph.D. degrees in Database Systems from the University of Wisconsin-Madison. He is an ACM Distinguished Scientist and a member of the IBM Academy of Technology. His current work focuses on data discovery and curation for Big Data platforms, data integration and machine data analytics. He worked at several companies, including IBM Almaden Research Center, before joining IBM Research - India in 2005. He has more than 40 publications in reputed conferences and journals and 11 issued patents. He has served on the Program Committees of many conferences, both as a PC member and as PC Co-Chair.
Title of Talk 1: Unlocking the Power of Unstructured Data for Master Data Management
Synopsis: Master data management (MDM) systems provide structured data about business entities by combining data from multiple structured data sources and building a consolidated 360-degree view of entities such as customers and products. In addition to this traditional data, enterprises increasingly want to include valuable information that resides outside traditional data stores, such as emails, call-center transcripts, chat logs, and comments.
In this talk, I will describe a new generation of MDM systems that can incorporate unstructured data into the master data for entities. MDM Extension for Unstructured Text Correlation (EUTC) bridges the gap between unstructured content and mastered data. It detects references to existing MDM entities in unstructured text, even when the relevant entity is not explicitly mentioned in the text, and links the documents to the MDM entities. It can also automatically discover new relationships between MDM entities, leading to better entity resolution. We have demonstrated that it is possible to achieve high precision and recall in matching entities to documents using domain-independent techniques.
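To make the idea of linking a document to an entity that is never named in it more concrete, here is a minimal sketch in Python. It is not EUTC's algorithm, which the synopsis does not describe; it simply assumes that a document can be matched to an entity by counting how many of the entity's attribute values appear in the text. The `Entity` class, the `link_document` function, and the sample customers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A mastered entity (hypothetical shape): an id plus attribute values."""
    entity_id: str
    attributes: dict  # attribute name -> value


def link_document(text, entities, min_score=1):
    """Return (entity_id, score) pairs, best match first.

    Assumption for illustration: the score is the number of an entity's
    attribute values that occur in the text (case-insensitive substring
    match). A real system would use far more robust matching.
    """
    lowered = text.lower()
    scored = []
    for entity in entities:
        score = sum(
            1
            for value in entity.attributes.values()
            if value and value.lower() in lowered
        )
        if score >= min_score:
            scored.append((entity.entity_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample data: the transcript never mentions the customer's name.
customers = [
    Entity("C1", {"name": "Asha Rao", "phone": "555-0101", "policy": "PX-4410"}),
    Entity("C2", {"name": "Ravi Iyer", "phone": "555-0199", "policy": "PX-7302"}),
]
transcript = "Caller on 555-0101 asked about a claim under policy PX-4410."
links = link_document(transcript, customers)
```

In this sketch the transcript is linked to customer C1 through the phone number and policy number alone, which is the kind of indirect reference the synopsis refers to.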
Title of Talk 2: Preventing Information Leakage from Unstructured Documents
Synopsis: In today's enterprise world, information about business entities, such as a person's name, address, and social security number, is often present in both relational databases and content repositories. In databases, this information is generally well protected by well-defined, fine-grained access control. However, current document retrieval systems do not provide user-specific, fine-grained redaction of documents to prevent leakage of information about business entities.
In this talk, I will present ZoRRo, an add-on for document retrieval systems that dynamically redacts sensitive information about business entities referenced in a document, based on the access control defined for those entities. ZoRRo exploits the fine-grained, label-based access control mechanisms of database systems to identify and redact sensitive information in unstructured text, according to the access privileges of the user viewing it. To make on-the-fly redaction efficient, ZoRRo exploits the concept of k-safety in combination with Lucene-based indexing and scoring.
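As a rough illustration of user-specific redaction driven by labels, the Python sketch below masks any attribute value whose label is more restrictive than the viewer's clearance. It is not ZoRRo's implementation: it leaves out k-safety and Lucene entirely, and the label names, the `redact` function, and the sample record are assumptions made for this example.

```python
import re

# Hypothetical security labels, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def redact(text, entity, clearance, mask="[REDACTED]"):
    """Mask attribute values the viewer is not cleared to see.

    `entity` maps attribute name -> (value, label). A value is masked when
    its label ranks above the viewer's clearance. Longer values are handled
    first so a short value cannot split a longer one.
    """
    pairs = sorted(entity.values(), key=lambda pair: len(pair[0]), reverse=True)
    for value, label in pairs:
        if LEVELS[label] > LEVELS[clearance]:
            # A function replacement avoids escape handling in the mask.
            text = re.sub(
                re.escape(value), lambda _m: mask, text, flags=re.IGNORECASE
            )
    return text


# Made-up record with a label attached to each attribute value.
person = {
    "name": ("Asha Rao", "public"),
    "address": ("12 Lake Road", "internal"),
    "ssn": ("123-45-6789", "confidential"),
}
document = "Asha Rao (SSN 123-45-6789) lives at 12 Lake Road."

internal_view = redact(document, person, "internal")
public_view = redact(document, person, "public")
```

The same document yields different views for different users: a viewer with `internal` clearance sees the address but not the social security number, while a `public` viewer sees neither.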