Ankur Narang

Qualifications: B.Tech, Computer Science, IIT Delhi; M.S., Engg. Mgmt., Santa Clara University; Ph.D., Computer Science, IIT Delhi

Title: Senior Research Scientist

Affiliation: IBM Research Labs, New Delhi

Contact Details: annarang@in.ibm.com

Short CV:
Ankur Narang has 20 years of work experience, including 12 years at Sun Research Labs, California, and IBM Research Labs, New Delhi, and 6 years in senior management and leadership positions in multinational R&D organizations. He currently works at IBM Research Labs, New Delhi, as a Senior Research Scientist and Manager, where he leads the High Performance Analytics Group. He has around 30 publications in top computer science conferences and journals in the areas of Parallel and Distributed Computing and Data Mining, along with 10 approved US patents and 6 filed patents pending approval. His research interests include Approximation and Randomized Algorithms, Distributed and High Performance Computing, Data Mining and Machine Learning, and Computational Biology/Computational Geosciences. He is a Senior Member of the IEEE, has held Industrial Track and Workshop Chair positions, and has given invited talks at multiple conferences.

Title of Talk 1: Distributed Scheduling for Massively Parallel Systems
Synopsis: The exascale computing roadmap has highlighted efficient locality-oriented scheduling in runtime systems as one of the most important challenges (the "Concurrency and Locality" challenge). Massively parallel many-core architectures exhibit NUMA characteristics in their memory behavior, with a large gap between local and remote memory latency. Further, future single nodes could have hundreds to thousands of cores with a deep cache hierarchy and shared caches. Unless efficiently exploited, this complicated system architecture can lead to poor scalability and performance. Languages such as X10, Chapel, and Fortress are based on the partitioned global address space (PGAS) paradigm. They have been designed and implemented as part of the DARPA HPCS program for higher productivity and performance on many-core massively parallel platforms. These languages have built-in support for the initial placement of threads (also referred to as activities) and data structures in the parallel program, so locality comes implicitly with the program. The runtime systems of these languages need to provide efficient algorithmic scheduling of parallel computations.
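As a rough, language-neutral illustration of this placement idea (a toy Python sketch, not actual X10 or Chapel syntax; the Place and at_place names are invented here for illustration), an activity can be spawned at the place that owns the data it reads:

    # Toy sketch of PGAS-style initial placement (hypothetical names, not an X10/Chapel API).
    from dataclasses import dataclass, field

    @dataclass
    class Place:
        """One locality domain (e.g., a node): owns a data partition and a queue of activities."""
        pid: int
        data: dict = field(default_factory=dict)   # locally owned partition
        tasks: list = field(default_factory=list)  # activities queued at this place

    def at_place(place, activity, *args):
        """Spawn an activity at the place that owns the data it will touch."""
        place.tasks.append((activity, args))

    # Two places, each owning half of a distributed array of squares.
    places = [Place(0, {i: i * i for i in range(0, 4)}),
              Place(1, {i: i * i for i in range(4, 8)})]

    def partial_sum(place):
        return sum(place.data.values())

    # Locality is implicit: each activity is placed with the partition it reads.
    for p in places:
        at_place(p, partial_sum, p)

    print(sum(act(*args) for p in places for act, args in p.tasks))  # 140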

Further, movement of massive amounts (terabytes to petabytes) of data is very expensive, which necessitates affinity-driven computation. Distributed scheduling of parallel computations across multiple places therefore needs to optimize multiple performance objectives: follow affinity as far as possible while ensuring efficient space, time, and message complexity. Moreover, achieving good load balance can conflict with preserving affinity, which leads to challenging trade-offs in distributed scheduling. In addition, parallel computations have data-dependent execution patterns, which requires online scheduling to effectively optimize the orchestration of the computation as it unfolds. With the continuous demand for processing ever larger data volumes (from petabytes to exabytes and beyond), one needs to ensure data scalability along with scalability with respect to the number of compute nodes and cores in the target system. Thus, the scheduling framework needs to consider I/O bottlenecks along with compute and memory bandwidth bottlenecks in the system to enable strong scalability and performance. Simultaneous consideration of these objectives makes distributed scheduling a particularly challenging problem.

With the advent of distributed memory architectures, much recent research on distributed scheduling looks at multi-core and many-core clusters. A dynamic tasking library (HotSLAW) was developed for many-core clusters; it uses a topology-aware hierarchical work-stealing strategy for both NUMA and distributed memory systems. These recent efforts primarily achieve load balancing through (locality-aware) work stealing across the nodes in the system. Although this strategy works well for slightly irregular computations, such as UTS on a geometric tree, it can result in parallel inefficiencies when the computation is highly irregular (UTS on a binomial tree) or when there are complicated trade-offs between affinity and load balance, as in sparse-matrix benchmarks such as the Conjugate Gradient benchmark. Certain other approaches consider only limited control dependence and no data dependencies in the parallel computation, which limits the scope of applicability of the scheduling framework.
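A minimal sketch of the locality-aware work-stealing idea these efforts share (toy Python in a single process; the two-level topology and the victim-selection order are simplifying assumptions, not HotSLAW's actual implementation):

    import collections, random

    class Worker:
        def __init__(self, wid, node):
            self.wid, self.node = wid, node
            self.deque = collections.deque()   # this worker's own task deque

        def run_one(self, workers):
            if self.deque:
                task = self.deque.pop()        # local work first (LIFO helps locality)
            else:
                task = self.steal(workers)     # otherwise become a thief
            if task:
                task()

        def steal(self, workers):
            # Topology-aware victim order: try same-node workers before remote nodes.
            near = [w for w in workers if w is not self and w.node == self.node]
            far = [w for w in workers if w.node != self.node]
            for victim in random.sample(near, len(near)) + random.sample(far, len(far)):
                if victim.deque:
                    return victim.deque.popleft()   # steal the oldest task from the top
            return None

    # Example: all initial work sits on node 0; the node-1 worker steals it remotely.
    workers = [Worker(0, node=0), Worker(1, node=1)]
    workers[0].deque.extend(lambda i=i: print("task", i) for i in range(4))
    for _ in range(4):
        workers[1].run_one(workers)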

In this talk, we present a novel distributed scheduling framework and algorithm (LDS) for multi-place parallel computations, that uses a unique combination of remote (inter-place) spawns and remote work steals to reduce the overheads in the scheduler, which helps to dynamically maintain load balance across the compute nodes of the system, while ensuring affinity maximally. Our design was implemented using GASNet API and POSIX threads. On affinity and load-balance oriented benchmarks such as CG (Conjugate Gradient) and Kmeans clustering, we demonstrate strong performance and scalability on 2048 node BG/P. Using benchmarks such as UTS we show that LDS has lower space requirement than hierarchical work-stealing based approaches such HotSLAW and better performance than Charm++. We also explore how distributed machine learning can help in performance tuning of the distributed scheduling framework.
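As a conceptual sketch only (toy Python; the actual LDS framework is implemented over the GASNet API and POSIX threads, and the threshold-based policy below is an invented simplification, not the published algorithm), the interplay of remote spawns and remote steals can be pictured as:

    from dataclasses import dataclass, field

    @dataclass
    class Place:
        tasks: list = field(default_factory=list)

    @dataclass
    class Task:
        affinity: int   # index of the place that owns this task's data

    def spawn(task, places, threshold=64):
        """Prefer a remote (inter-place) spawn to the task's affinity place;
        spill to the least-loaded place only if the home place looks overloaded."""
        home = places[task.affinity]
        target = home if len(home.tasks) < threshold else min(places, key=lambda p: len(p.tasks))
        target.tasks.append(task)

    def on_idle(place, places):
        """A place that runs out of work issues a remote work steal from the most loaded place."""
        victim = max(places, key=lambda p: len(p.tasks))
        if victim is not place and victim.tasks:
            place.tasks.append(victim.tasks.pop(0))

Pushing new work to its home place preserves affinity, while the steal on the idle path restores load balance; the space and message costs of both operations are what a real scheduler must keep bounded.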

Title of Talk 2: Research Directions in Large-Scale Inverse Problems
Synopsis: Combining complex data with large-scale models to create better predictions is of immense value in many areas of computational science and engineering, including geosciences, materials, chemical systems, biological systems, astrophysics, engineered systems in aerospace, transportation, buildings, and biomedicine, and beyond. At the heart of this challenge is an inverse problem: we seek to infer unknown model inputs (parameters, source terms, initial or boundary conditions, model structure, etc.) from observations of model outputs. Quantifying and mitigating uncertainty in the solution of such inverse problems has become important in recent years. Multiple techniques have been developed for statistical inverse problems governed by large-scale, complex computational models. Improvements in scalable forward solvers for many classes of large-scale models have made the repeated evaluation of model outputs for differing inputs feasible, and the exponential growth in high performance computing capabilities has multiplied the effect of these advances. The emergence of MCMC methods that exploit problem structure has radically improved the prospects of sampling probability densities for inverse problems governed by expensive models. Further, the recent exponential expansion of observational capabilities has produced massive volumes of data from which inference of large computational models can be carried out.
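A minimal sketch of the core computational pattern (toy Python: random-walk Metropolis with a deliberately cheap forward model; in the problems above, forward would be an expensive PDE solve, which is why scalable solvers and structure-exploiting MCMC matter):

    import math, random

    random.seed(0)

    def forward(m):
        """Toy forward model mapping the unknown input m to an observable output."""
        return m ** 3 + m

    # Synthetic data: noisy observations of the model output at the "true" input.
    m_true, sigma = 1.3, 0.1
    data = [forward(m_true) + random.gauss(0.0, sigma) for _ in range(20)]

    def log_posterior(m):
        pred = forward(m)                                  # one forward solve per evaluation
        misfit = sum((d - pred) ** 2 for d in data) / (2 * sigma ** 2)
        prior = m ** 2 / (2 * 10.0 ** 2)                   # broad Gaussian prior on m
        return -misfit - prior

    # Random-walk Metropolis: every step costs a forward-model evaluation.
    samples, m, lp = [], 0.0, log_posterior(0.0)
    for _ in range(5000):
        m_prop = m + random.gauss(0.0, 0.05)
        lp_prop = log_posterior(m_prop)
        if math.log(random.random()) < lp_prop - lp:
            m, lp = m_prop, lp_prop
        samples.append(m)

    kept = samples[1000:]                                  # discard burn-in
    print("posterior mean of m:", sum(kept) / len(kept))   # close to m_true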

This talk will present an overview of state-of-the-art techniques as well as research challenges that must be overcome to realize the promise of inferring large-scale complex models from large-scale complex data.




