Big Data Initiative @ CSA
Department of Computer Science and Automation
Indian Institute of Science


Overview

Today is the era of Big Data. The need to analyze vast amounts of data, being generated in different fields ranging from medicine to financial markets and from transportation to environmental modeling, is emerging as the next big challenge and opportunity. The primary objective of the Big Data Initiative is to help build a strong academic and research ecosystem that allows India to address this challenge and take a leadership position in this critical area.

In order to create awareness about developments in the emerging field of Big Data -- including for example new algorithms and systems design for Big Data and new concepts in machine learning related to Big Data -- and to bring together the broader community so as to leverage existing strengths, we are initiating a series of public lectures on various aspects of Big Data, open to both academia and industry, as a platform for open exchange of ideas.


Big Data Public Lectures 2016

To increase awareness and build a strong ecosystem around Big Data in India, a series of public lectures will be delivered by leading experts.

Venue: Faculty Hall, Indian Institute of Science
Registration: Free, open to all (limited to 250 seats per lecture)

Prof. Yogesh Simmhan
Scalable Graph Processing in a Connected World
&
Vianney Koelman
Bits, bytes, barrels: Data assimilation challenges in the oil & gas industry

Thursday, May 26, 2016
4-5:30 PM, followed by tea/coffee

[Video] [Abstract]
Prof. Partha Talukdar

From Big Text to Big Knowledge

Thursday March 31, 2016
4-5 PM (followed by tea/coffee)

[Video] [Abstract]
Prof. Vijay Natarajan
Symmetry in Scientific Data: An Approach to Feature-Directed Visualization
&
Dr. Mayur Thakur
Leveraging Big Data Analytics for Compliance in Financial Institutions

Thursday, February 25, 2016
4-5:30 PM, followed by tea/coffee

[Video] [Abstracts]

Big Data Public Lectures 2015



Venue: Faculty Hall, Indian Institute of Science
Registration: Free, open to all (limited to 250 seats per lecture)

Dr. Anurag Agrawal

The Role of Big Data in Public Health and Medicine

Thursday August 27, 2015
4-5 PM (followed by tea/coffee)

[Abstract]
Dr. Rajeev Rastogi
Machine Learning@Amazon
&
S Anand
Visualizing Big Data

Thursday, October 29, 2015
4-5:30 PM, followed by tea/coffee

[Video] [Abstracts]
Dr. Mayur Datar
Machine learning challenges in E-commerce
&
Prof. Shubhabrata Das
Selected Problems in Sports Analytics

Thursday, November 26, 2015
4-6 PM, followed by tea/coffee

[Video] [Abstracts]

Big Data Public Lectures 2014

Venue: Faculty Hall, Indian Institute of Science
Registration: Free, open to all (limited to 250 seats per lecture; prior registration required)

Prof. Ravindran Kannan

Foundations of Data Science

Tuesday February 25, 2014
4-5 PM, followed by tea/coffee

[Video]
Prof. Ramesh Hariharan

Using Data to Understand Biological Systems

Thursday March 27, 2014
4-5 PM, followed by tea/coffee

[Abstract] [Video]
Prof. Arnab Bhattacharyya
Spectral Graph Theory and Graph Partitioning
&
Prof. Rajesh Sundaresan
Belief Propagation for Large-Scale Optimization on Graphs

Friday April 18, 2014
4-5:30 PM, followed by tea/coffee

[Abstracts] [Video]
Prof. Chiranjib Bhattacharyya
Learning from Big Data: Using Statistics to tame the Complexity
&
Prof. Jayant Haritsa
Big Data, Small Testing?

Friday May 23, 2014
4-5:30 PM, followed by tea/coffee

[Abstracts] [Video]
Prof. Chandra Murthy
Role of Sparse Signal Recovery in Big Data Analytics
&
Prof. Y. Narahari
Mechanism Design for Strategic Networks, Crowds, and Markets

Tuesday June 10, 2014
4-5:30 PM, followed by tea/coffee

[Abstracts] [Video]
Prof. Uday Bondhugula
Scalable Programming Technologies and Architectures for Big Data
&
Prof. N. Viswanadham
Big Data Based Decision Making in Manufacturing Supply Chains

Thursday July 3, 2014
4-5:30 PM, followed by tea/coffee

[Abstracts] [Video]

Resources

  • Upcoming book: Foundations of Data Science, by John Hopcroft and Ravindran Kannan
  • Course: E0 229: Foundations of Data Science, taught at IISc by Navin Goyal, Ramesh Hariharan, and Ravindran Kannan

Big Data Fellowships Fundraising Campaign

To recruit and train top talent in areas related to Big Data, we are raising funds to institute a set of Big Data Fellowships at IISc. For details of how to sponsor a Fellowship, please feel free to contact us. As a small token of appreciation, sponsors will continue to receive invitations to selected events and lectures related to Big Data at IISc throughout the Fellowship period, including possible participation in a Big Data Workshop in 2015/2016.


Other Ways to Support the Initiative

We are also open to other mechanisms of support. If you are interested in supporting us in other ways, please feel free to contact us.


Organizing Team

The CSA Department at IISc has leading faculty in a wide spectrum of disciplines related to Big Data, including theory, algorithms, machine learning, optimization, parallel architectures, and visualization. The organizing team includes several of these faculty: Professors Shivani Agarwal, Arnab Bhattacharyya, Chiranjib Bhattacharyya, Uday Bondhugula, Ramesh Hariharan, Ravindran Kannan, Y. Narahari, Vijay Natarajan, and Chandan Saha; as well as Professor Chandra Murthy from the ECE Department and Professor Partha Pratim Talukdar from the SERC Department.


Abstracts

Prof. Yogesh Simmhan
Assistant Professor, CDS, IISc

Scalable Graph Processing in a Connected World

Abstract: Graph datasets exemplify the complexity and scaling challenges posed by Big Data. While linked data from the web and social networks were at the vanguard of network data analysis, the Internet of Things is generating property and temporal graphs that are feature-rich and generated continuously from billions of devices. Managing and processing such graphs, which run into billions of vertices and edges and thousands of properties and temporal snapshots, pose unique requirements: distributed programming abstractions that can operate on the graph structure and properties, across time; execution and query runtimes that can use distributed resources such as elastic Clouds; and new algorithms that can derive meaningful analytics. This talk will review these Big Data challenges related to graph datasets and offer a glimpse into related research.

Speaker Bio: Yogesh Simmhan is an Assistant Professor at the Department of Computational and Data Sciences at the Indian Institute of Science, Bangalore. Previously, he was a Research Assistant Professor at the University of Southern California, Los Angeles and Associate Director of the USC Center for Energy Informatics. His research explores abstractions, algorithms and applications on distributed systems. These span Cloud Computing, Distributed Graph Processing Platforms and Elastic Stream Processing to support emerging Big Data and Internet of Things (IoT) applications. He has won the IEEE/ACM Supercomputing HPC Storage Challenge and IEEE TCSC SCALE Challenge Awards, and has been funded by NSF, DARPA and DeitY. He is a Senior Member of IEEE and ACM, Associate Editor of IEEE Transactions on Cloud Computing and a member of the IEEE Future Directions Initiative on Big Data. Yogesh has a Ph.D. in Computer Science from Indiana University and was a Postdoc at Microsoft Research, San Francisco.


Dr. Vianney Koelman
Vice President Computational R&D at Shell

Bits, bytes, barrels: Data assimilation challenges in the oil & gas industry

Abstract: An overview and outlook will be given of the Computational Research and Development conducted from Shell's Technology Center in Bangalore. The talk will focus on examples where data-driven modeling and data assimilation pose fundamental challenges. A spectrum of methods involving advanced analytics, statistical analysis, compressive modeling, and machine learning, often involving open innovation collaborations with academia, is at the center of this research.

Speaker Bio: Following a PhD in physics (Eindhoven University of Technology, 1988) Vianney started with Shell in Research and Development in the Netherlands, and subsequently worked most of his career in various Petroleum Engineering and Oil and Gas Leadership roles in the Middle East, Europe, Africa and the Americas. Currently he combines his position as Chief Scientist with Vice President of Computational Technologies, based at Shell Technology Centre Bangalore, India. Throughout his career Vianney has focused on making technology contribute to the business. His work in Shell ranges from applied subsurface formation evaluation to fundamental computational Research and Development. The innovative methods he designed for complex micro-scale flow simulations led to Shell’s top-quoted publication in the open literature, and created new lines of computational research in micro- and nanoscale hydrodynamics aimed at optimizing the rheological behaviors of complex fluids. Currently Vianney and his teams are working on computational technologies ranging from catalyst technology optimization using molecular simulations, to data assimilation using machine learning and compressive modeling.


Prof. Partha Talukdar
Assistant Professor, CDS and CSA, IISc

From Big Text to Big Knowledge

Abstract: Knowledge harvesting from Web-scale text datasets has emerged as an important and active research area over the last few years, resulting in the automatic construction of large knowledge graphs (KGs) consisting of millions of entities and relationships among them. This has the potential to revolutionize Artificial Intelligence and intelligent decision making by removing the knowledge bottleneck which has plagued systems in these areas all along. In this talk, I shall provide an overview of research in this exciting and emerging area.

Speaker Bio: Partha Talukdar is an Assistant Professor in the Department of Computational and Data Sciences (CDS) and the Department of Computer Science and Automation (CSA) at the Indian Institute of Science (IISc), Bangalore. Before that, he was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University, working with Tom Mitchell on the NELL project. Partha received his PhD (2010) in CIS from the University of Pennsylvania, working under the supervision of Fernando Pereira, Zack Ives, and Mark Liberman. Partha is broadly interested in Machine Learning, Natural Language Processing, and Cognitive Neuroscience, with particular interest in large-scale learning and inference. He is a co-author of the book on Graph-based Semi-Supervised Learning published by Morgan & Claypool Publishers.


Prof. Vijay Natarajan
Associate Professor, CSA, IISc

Symmetry in Scientific Data: An Approach to Feature-Directed Visualization

Abstract: Several natural and man-made objects exhibit symmetry in different forms, both in their geometry and in the material distribution. The study of symmetry plays an important role in understanding both the structure of these objects and their physical properties. In this talk, I will introduce the problem of symmetry detection in scientific data, where the data is represented as a scalar field. The goal is to identify regions of interest within the domain of a scalar field that remain invariant under transformations of both domain geometry and the scalar values. I will present algorithms to detect symmetry and discuss applications to visualization, interactive exploration, and visual analysis of large and feature-rich scientific data.

Speaker Bio: Vijay Natarajan is an associate professor in the Department of Computer Science and Automation and the Department of Computational and Data Sciences at the Indian Institute of Science, Bangalore. He received his Ph.D. degree in computer science from Duke University and holds bachelor's and master's degrees in computer science and mathematics from BITS Pilani. His research interests include scientific visualization, computational geometry, and computational topology.


Dr. Mayur Thakur
Data Analytics Group, Goldman Sachs

Leveraging Big Data Analytics for Compliance in Financial Institutions

Abstract: It is critical for a financial institution to comply with government regulations. Non-compliance can result in criminal indictment, multi-billion dollar fines, and loss of banking and other licenses. Employees of Compliance departments are responsible for implementing proper policies, procedures, and monitoring to ensure compliance with regulations. This discussion will focus on how Compliance leverages large quantities of data to establish monitoring controls. In particular, we will discuss specific business problems and show how they map into problems in natural language processing, outlier detection, and graph analytics. No prior knowledge of finance will be assumed.

Speaker Bio: Mayur Thakur is head of the Data Analytics Group in the Global Compliance Division. He joined Goldman Sachs as a managing director in 2014. Prior to joining the firm, Mayur worked at Google, where he designed search algorithms for more than seven years. Previously, he was an assistant professor of computer science at the University of Missouri. Mayur earned a PhD in Computer Science from the University of Rochester in 2004 and a BTech in Computer Science and Engineering from the Indian Institute of Technology, Delhi, in 1999.


Dr. Mayur Datar
Principal Data Scientist, Flipkart

Machine learning challenges in E-commerce

Abstract: In this talk, we will look at what it means to be doing Big Data research in industry. We will discuss why Big Data is critical for the success of companies, and also look at examples where it is not needed. We will look at a selection of problems from the e-commerce space that are well served by standard machine learning constructs. We will take a breadth-first view of around a dozen problems from e-commerce, some of which are applicable to the larger area of consumer internet. Time permitting, we will also discuss the emerging field of Deep Learning (aka Deep Neural Networks) and its applicability to Big Data.

Speaker Bio: Mayur Datar works as a Principal Data Scientist with Flipkart. Prior to Flipkart, he was a Sr. Staff Research Scientist with Google, where he worked for almost 12 years and had enormous impact on various Google products through his research and execution. His research interests are in data mining, algorithms, machine learning, and computer science theory. Prior to joining Google, Mayur obtained his doctorate degree in computer science from Stanford University and a Bachelor of Technology degree from I.I.T. Bombay. He was awarded the President of India Gold Medal for being the most outstanding student of his graduating batch from I.I.T. Bombay. He has published several papers in renowned conferences such as SIGMOD, VLDB, KDD, FOCS, SODA, and WWW. He serves on the review committees for these conferences and journals.


Prof. Shubhabrata Das
Professor, IIM Bangalore

Selected Problems in Sports Analytics

Abstract: The application of advanced statistical methods in sports has been steadily rising, leading to academic conferences and journals devoted exclusively to this domain. In this talk, we will briefly discuss a few such problems.

1) Not-out scores in cricket. In cricket, the batting average has always been used as the primary measure of a batsman's performance. But the traditional batting average has serious limitations in reflecting the true performance of a batsman in light of not-out innings. Treating not-outs as censored data, an adaptation of the Kaplan-Meier estimator provides a more reasonable solution, but it still suffers from both conceptual and operational problems in certain situations. A generalized class of geometric distributions (GGD) is proposed in this work to model the runs scored by individual batsmen, with the generalization coming in the form of the hazard of getting out changing from one score to another. We consider the change points as known or specified parameters and derive general expressions for the restricted maximum likelihood estimators of the hazard rates under the generalized structure considered. Given the domain context, we propose and test ten different variations of the GGD model and carry out the test across the nested models using the asymptotic distribution of the likelihood ratio statistic. We propose two alternative approaches for improved estimation of batting average on the basis of the above modelling.

2) Tracking progress in a round-robin tournament (World Cup football, hockey, cricket). The up-to-date position of competing teams, based on points obtained by them in the middle of any round-robin (stage of a) tournament, may inadequately reflect their actual relative position because of the strength of the opposition faced up to that stage. To help followers of the game, and possibly to help the teams strategize, a simple probability-matrix-based approach followed by computation of the expected points may easily bring clarity to the situation. While an unstructured or unconstrained way of updating these probabilities, reflecting individual perspective, at successive stages of the tournament may be an acceptable approach, this method, being ad hoc, suffers from arbitrariness and may lack consistency. In that context, we explore how a model-based Bayesian adaptation can work effectively.

3) New models for repeated tournaments (illustration with NCAA college basketball). The primary objective here is to model the win-loss records of matches in a repeated tournament, using the strengths of the teams. Of particular focus is the case of a standard knockout tournament with teams ranked a priori; National Collegiate Athletic Association (NCAA) men's and women's basketball tournament data are considered for demonstration. The work considers modifications of the Bradley-Terry (BT) model that are consistent with the ranks of the participating teams. The BT model with restricted maximum likelihood strengths involves estimation of too many parameters, and the strength estimates typically lack strict monotonicity. A proposed class of rank-based percentile BT models from different parametric families provides an excellent fit to the past data using only a few parameters, and this validates the ranking procedure adopted by the NCAA. Parameter estimation, goodness-of-fit using a suitably framed test statistic and its null distribution, selection between nested models in the change point framework, as well as other estimation aspects are discussed. Adaptive variations of the model that allow strength to alter are also considered. The discussed model and analysis can be extended to more general tournament structures, as shown through an analysis of results from the Indian Premier League. The work has potential application in the wider domain of paired comparison.

4) Seeded contests and betting odds (illustration with tennis). We next develop a model to predict the outcome (win-loss) of a game based on the rank of the participating players and the betting odds set by the bookmakers. The model is based on the Bradley-Terry framework, where the participating players are linked by a measure of their competitive ability. We illustrate the application of our model with a data set comprising records from international tennis tournaments for women and men. A Bayesian approach has been adopted to make inferences about the parameters in the model. The estimates are also used to infer the margin by which the 'true odds' may be altered by the bookmakers. Predictions based on the estimated model are compared with true observations for the games played in the year 2015. Various strategies for selecting bets based on the model are discussed. We propose two very promising betting strategies that have yielded positive results, albeit in the short run.
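As an illustration of the first problem in this abstract (not-out scores treated as censored data), the following is a minimal Python sketch, not the speaker's method. It computes a Kaplan-Meier style batting average as the restricted mean of the estimated survival curve and compares it with the traditional average. The function names and the sample innings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def traditional_average(innings):
    """Total runs divided by the number of dismissals."""
    runs = sum(score for score, _ in innings)
    outs = sum(1 for _, dismissed in innings if dismissed)
    return runs / outs if outs else float("inf")


def km_batting_average(innings):
    """Restricted mean of the Kaplan-Meier survival curve.

    innings: list of (score, dismissed) pairs. dismissed=False marks a
    not-out innings, which is treated as right-censored at that score.
    """
    dismissals = Counter(score for score, dismissed in innings if dismissed)
    max_score = max(score for score, _ in innings)
    survival = 1.0  # running estimate of P(final score > k)
    mean = 0.0
    for k in range(max_score + 1):
        # Innings that reached at least k runs are still "at risk" at k.
        at_risk = sum(1 for score, _ in innings if score >= k)
        if at_risk and dismissals[k]:
            survival *= 1.0 - dismissals[k] / at_risk
        # For an integer-valued score X, E[X] = sum over k >= 0 of P(X > k).
        mean += survival
    return mean


# Hypothetical innings: out for 10, not out on 20, out for 30.
sample = [(10, True), (20, False), (30, True)]
print(traditional_average(sample), km_batting_average(sample))
```

If the highest score is a not-out, the survival curve never reaches zero and the sum is truncated at that score, which is one of the operational problems the abstract alludes to.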

Speaker Bio: Professor Shubhabrata Das is a Professor in the Quantitative Methods and Information Systems group of IIM Bangalore. His major research domains are Statistical Methods, Actuarial Mathematics and Operations Research. His specific topics of interest include Multivariate Statistics, Statistical Analysis of Fuzzy data, Sports Analytics, Business Forecasting, Measurement and Scaling, and Discrete Optimization problems. Professor Das has co-authored a book titled "Facing the Future: Indian Pension Systems" and has also co-authored the chapter on "Canonical Correlations" in the Encyclopedia of Biostatistics. Professor Das has held visiting faculty positions at ESSEC Business School, Indian Statistical Institute Calcutta, University of Nebraska, and University of Montana. He received his B.Stat. and M.Stat. from ISI Kolkata, and his M.S. and Ph.D. from University of North Carolina at Chapel Hill, USA.


Dr. Rajeev Rastogi
Director of Machine Learning, Amazon India

Machine Learning@Amazon

Abstract: In this talk, I will first provide an overview of the key Machine Learning (ML) applications we are developing at Amazon. I will then describe a matrix factorization model that we have developed for making product recommendations. The salient characteristics of the model are: (1) it uses a Bayesian approach to handle data sparsity, (2) it leverages user and item features to handle the cold start problem, and (3) it introduces latent variables to handle multiple personas associated with a user account (e.g. family members). Our experimental results with synthetic and real-life datasets show that leveraging user and item features and incorporating user personas enable our model to provide lower RMSE and perplexity compared to baselines.

Speaker Bio: Rajeev Rastogi is the Director of Machine Learning at Amazon. Previously, he was Vice President of Yahoo! Labs Bangalore and the founding Director of the Bell Labs Research Center in Bangalore, India. Rajeev is an ACM Fellow and a Bell Labs Fellow. He is active in the fields of databases, data mining, and networking, and has served on the program committees of several conferences in these areas. He currently serves on the editorial board of the CACM, and has been an Associate editor for IEEE Transactions on Knowledge and Data Engineering in the past. He has published over 125 papers, and holds over 50 patents. Rajeev received his B. Tech degree from IIT Bombay, and a PhD degree in Computer Science from the University of Texas, Austin.


S Anand
Chief Data Scientist, Gramener

Visualizing Big Data

Abstract: Today, more information is produced every year than in the entire history of human civilisation until 2000. This offers a unique opportunity: the ability to use this information to intelligently guide us. It also poses a challenge: how does one understand such vast quantities of data, which are well beyond most supercomputers' comprehension, let alone the human mind? Yet, analytics and visualisation research has made great strides. With modern visualisations such as treemaps, over 150 pages of productivity reports have been compressed into a single sheet without loss of information or insight. With animated visualisations, 100 years of weather data has been compressed into a half-minute video. A confluence of programming, statistics and design gives us new ways to visualise, experience, and interact with a world of information. This talk will cover:
  • How organisations use visuals to comprehend large-scale data.
  • What kinds of decisions can be driven through data, and how to enable this.
  • What techniques and support mechanisms are available in the market today.

Speaker Bio: Anand is the Chief Data Scientist at Gramener.com. He has advised and designed IT systems for organizations such as Citigroup, Honda, and IBM. Anand and his team explore insights from data and communicate these as visual stories. Anand also builds the Gramener Visualisation Server, Gramener's flagship product. Anand has an MBA from IIM Bangalore and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at s-anand.net.


Anurag Agrawal
Principal Scientist, CSIR Institute of Genomics & Integrative Biology (IGIB)

The Role of Big Data in Public Health and Medicine

Abstract: Public health and medical decision support occupies an interdisciplinary space that lies between Medical, Biological, Mathematical, and Engineering Sciences. Interdisciplinary marriages remain uncommon and multidisciplinary marriages even more so. I will discuss recent efforts where we used information technology, systems biology visualisation tools, and Bayesian frameworks to gain novel understanding of public health in India, while creating a framework for integrating transparent and effective healthcare delivery with big data collection for tomorrow's medicine.

Speaker Bio: Anurag Agrawal is a principal scientist at the CSIR Institute of Genomics & Integrative Biology (IGIB), where he leads the Center of Excellence for Translational Research in Asthma & Lung disease (TRiAL) and the CSIR program for Enabling Affordable Community Health through Information Technology (EACH-IT). He graduated in medicine from the All India Institute of Medical Sciences in 1994, followed by specialization in Internal Medicine, Pulmonary Disease, and Critical Care from Baylor College of Medicine, Houston (2003), and a PhD in physiology. After serving as a faculty member at Baylor, he joined IGIB in 2007. His research in respiratory health has covered a full spectrum from molecular pathobiology to informatics and epidemiology, leading to his being awarded the Shanti Swaroop Bhatnagar award for Medical Sciences in 2014 and the Wellcome Trust India Alliance senior fellowship in 2015. He has previously received the Lady Tata Young Researcher Award and the Swarnjayanti Fellowship of the Department of Science and Technology. Dr. Agrawal believes health data analytics to be the new frontier of medicine and is actively engaged in promoting the confluence of medicine and informatics through research and as an academic council member of the Public Health Foundation of India.


Ramesh Hariharan
CTO, Strand Life Sciences and Adjunct Professor, CSA, IISc

Using Data to Understand Biological Systems

Abstract: Hidden inside a living organism are a large number of molecular entities, all working in concert to make, and sometimes break, the organism. We are slowly learning to tease out information about these entities, less and less via direct observation, and more and more via indirect data generation. This talk will provide an introduction to the area and outline various challenges in generating this data, priming it for analysis, interpreting its meaning, and using it to impact lives.

Speaker Bio: Ramesh Hariharan is Founder-CTO at Strand Life Sciences and Adjunct Professor at the Computer Science Department of the Indian Institute of Science. At Strand, over the last 10 years, Ramesh has led teams building analytical tools for high-throughput molecular profiling; these tools are widely used and have been cited in several thousand publications. More recently, Ramesh and the team at Strand have been working on technology to make DNA sequencing for genetic disease diagnosis affordable in India. Ramesh's research contributions in computer science include fast algorithms for several combinatorial algorithmic problems. Ramesh got his Bachelor's degree from IIT, Delhi, in 1990, his Ph.D. from the Courant Institute of Mathematical Sciences, New York University, in 1994, and subsequently did a Post-Doc at the Max-Planck-Institut für Informatik in Saarbrücken, Germany.


Arnab Bhattacharyya
Assistant Professor, CSA, IISc

Spectral Graph Theory and Graph Partitioning

Abstract: Clustering is one of the most widely used techniques for big data analysis. This fundamental algorithmic primitive has found applications in biology, natural language processing, sociology, business analytics and many other fields. In this talk, I will describe how to cluster using spectral methods and the reasons behind the success of spectral partitioning. This algorithm has become the method of choice in many domains and can be implemented efficiently by standard linear algebra software. The talk will be for a general audience and will not require prior mathematical background.
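To make the idea of spectral partitioning concrete, here is a minimal Python sketch under stated assumptions; it is not taken from the talk. It splits an undirected, connected graph in two by the sign of the Fiedler vector, the eigenvector for the second-smallest eigenvalue of the graph Laplacian. To stay dependency-free it finds that vector with shifted power iteration, whereas in practice one would call an eigensolver from a linear algebra library. The function name `fiedler_partition` and the example graph are hypothetical.

```python
def fiedler_partition(n, edges, iterations=500):
    """Split vertices 0..n-1 into two sets by the sign of the Fiedler vector.

    Assumes an undirected, unweighted, connected graph given as an edge list.
    """
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    degree = [len(adj) for adj in neighbours]
    # 2 * max degree is an upper bound on the largest Laplacian eigenvalue,
    # so shift*I - L has no negative eigenvalues.
    shift = 2 * max(degree)

    # Deterministic, non-constant start vector.
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Project out the all-ones direction (Laplacian eigenvalue 0).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # Multiply by (shift*I - L), where (L x)_i = deg_i * x_i - sum of
        # x_j over the neighbours j of i.
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in neighbours[i])
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]

    left = {i for i in range(n) if x[i] < 0}
    right = set(range(n)) - left
    return left, right


# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(fiedler_partition(6, edges))
```

On this example the cut falls on the bridge edge, which is the intuitive two-cluster split.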

Speaker Bio: Arnab Bhattacharyya is an assistant professor in the Department of Computer Science and Automation at the Indian Institute of Science. He is a researcher in theoretical computer science and specializes in the design and analysis of algorithms operating on big data. He obtained his Ph.D., M.Eng. and B.S. degrees from the Massachusetts Institute of Technology and was a postdoctoral fellow at Princeton University and Rutgers University. He is a Ramanujan Fellow and a recipient of the U.S. Department of Energy Computational Science Graduate Fellowship.


Rajesh Sundaresan
Associate Professor, ECE, IISc

Belief Propagation for Large-Scale Optimization on Graphs

Abstract: Belief propagation algorithms pass messages along the edges of a graph and update them via local computations at the nodes of the graph. They are used for decoding error correcting codes in communication systems, for probabilistic inference in Bayesian networks, and for solving certain combinatorial optimization problems. The talk will give an overview of these algorithms and the challenges involved in showing their validity.
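The message-passing scheme described in this abstract can be sketched as follows. This is a minimal illustration of sum-product belief propagation and not material from the talk. It assumes binary variables, a tree-structured graph (where the algorithm is exact), strictly positive potentials, and a single symmetric pairwise potential shared by all edges. The function name `sum_product_tree` and the example potentials are hypothetical.

```python
def sum_product_tree(n, edges, unary, pairwise):
    """Node marginals of p(x) proportional to
    prod_i unary[i][x_i] * prod_(i,j) pairwise[x_i][x_j] on a tree."""
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # messages[(i, j)][s]: message sent from node i to node j about x_j = s.
    messages = {(i, j): [1.0, 1.0] for i in range(n) for j in neighbours[i]}

    # On a tree, n synchronous sweeps are enough for all messages to settle.
    for _ in range(n):
        updated = {}
        for (i, j) in messages:
            out = [0.0, 0.0]
            for xi in (0, 1):
                # Local computation at node i: its own potential times all
                # incoming messages except the one from the recipient j.
                local = unary[i][xi]
                for k in neighbours[i]:
                    if k != j:
                        local *= messages[(k, i)][xi]
                for xj in (0, 1):
                    out[xj] += local * pairwise[xi][xj]
            total = out[0] + out[1]
            updated[(i, j)] = [out[0] / total, out[1] / total]
        messages = updated

    # Belief at each node: local potential times all incoming messages.
    beliefs = []
    for i in range(n):
        b = [unary[i][0], unary[i][1]]
        for k in neighbours[i]:
            for s in (0, 1):
                b[s] *= messages[(k, i)][s]
        total = b[0] + b[1]
        beliefs.append([b[0] / total, b[1] / total])
    return beliefs


# Hypothetical star-shaped tree with node 1 at the centre.
edges = [(0, 1), (1, 2), (1, 3)]
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]  # favours neighbouring nodes agreeing
print(sum_product_tree(4, edges, unary, pairwise))
```

On graphs with cycles the same updates can still be run, but the beliefs are no longer guaranteed to equal the true marginals, which relates to the validity questions the abstract mentions.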

Speaker Bio: Rajesh Sundaresan is an associate professor at IISc's ECE department. His interests are in the areas of communication network algorithms and information theory. He received his B.Tech. from IIT Madras in 1994 and his Ph.D. from Princeton University in 1999. From 1999 to 2005, he designed, implemented, and tested wireless modems at Qualcomm Inc. He visited the University of Illinois at Urbana-Champaign during 2012-2013 on an Indo-US Science and Technology Forum fellowship.


Chiranjib Bhattacharyya
Associate Professor, CSA, IISc

Learning from Big Data: Using Statistics to tame the Complexity

Abstract: The problem of learning statistical models from data can be posed as an optimization program. These programs often become unwieldy in the Big Data setting, as the number of variables and constraints grows with the number of data points. Distributed optimization, requiring expensive parallel hardware, is the current state-of-the-art remedy for such problems. However, in statistics, growth in the number of data points is often welcomed as it yields more understanding. This then begs the question: Are there alternatives to distributed processing where statistical understanding, gleaned from large volumes of data, can be used for taming the computational complexity of optimisation programs? Following this paradigm, we present two ideas for solving classification problems: the first involving resampling constraints and the second involving chance constraint programming. Time permitting, we will show how these ideas can be leveraged to build large-scale focussed crawlers.

Speaker Bio: Chiranjib Bhattacharyya is an Associate Professor in the Department of Computer Science and Automation, Indian Institute of Science. He is interested in Robust Optimization and Machine Learning. Prior to joining the Department he was a postdoctoral fellow at UC Berkeley. He holds BE and ME degrees, both in Electrical Engineering, from Jadavpur University and the Indian Institute of Science, respectively, and completed his PhD from the Department of Computer Science and Automation, Indian Institute of Science.


Jayant Haritsa
Professor, SERC and CSA, IISc

Big Data, Small Testing?

Abstract: Big Data has become the buzzword of choice in recent times, especially in the software industry. The accompanying hoopla has spawned frenetic claims foretelling the development of great and wondrous solutions to Big Data challenges. However, there is very little said about the testing of such systems, an essential pre-requisite for deployment. In this talk, we will discuss the research challenges involved in the testing process, especially from the database perspective. We will also present CODD, a graphical tool that takes a first step towards the effective testing of Big Data deployments through a new metaphor of "data-less databases". CODD is currently in use at industrial and academic institutions worldwide.

Speaker Bio: Jayant Haritsa has been on the faculty of the Supercomputer Education & Research Centre and the Department of Computer Science & Automation at the Indian Institute of Science, Bangalore, since 1992. He received a BTech degree from the Indian Institute of Technology (Madras), and a PhD degree from the University of Wisconsin (Madison). His research interests are in database system design, analysis and testing.


Chandra Murthy
Associate Professor, ECE, IISc

Role of Sparse Signal Recovery in Big Data Analytics

Abstract: In this talk, we start by providing a brief overview of some of the signal processing challenges that arise in big data analytics. We then discuss the mathematical models that are commonly employed to address these challenges, and argue that sparsity and sparse signal recovery methods naturally arise as promising solutions to a variety of big data problems. We also discuss some of the recent sparse signal recovery algorithms that may be applicable to big data. We present example studies on the use of these techniques in distributed sparse signal recovery, spectrum cartography, and, time permitting, wideband channel estimation in wireless communications.

Speaker Bio: Chandra R. Murthy is an associate professor in the Dept. of ECE, Indian Institute of Science, Bangalore. His research interests are in the areas of sparse signal recovery, cognitive radio systems, energy-harvesting based communications and multiple antenna systems with channel-state feedback.


Y. Narahari
Professor and Chairman, CSA, IISc

Mechanism Design for Strategic Networks, Crowds, and Markets

Abstract: Social networks, crowdsourcing, and Internet markets represent modern institutions that present many big data challenges. A distinctive feature of these institutions is the presence of human agents who exhibit strategic, possibly manipulative, behaviour. In this talk, we address the following questions: can we ensure that the strategic agents behave honestly, and can we realize social goals in the presence of these self-interested agents? Clear answers to these questions have far-reaching implications for solving numerous economic and algorithmic problems in areas such as electronic commerce, online auctions, public procurements, Internet advertising, social network monetization, and crowdsourcing. Perfect answers are still elusive; however, the discipline of game theory and mechanism design provides a principled way of addressing these questions. In this talk, we bring out, through many examples, the fundamentally different way in which mechanism design combined with machine learning can enable the design of solutions and algorithms for problems involving strategic networks, crowds, and markets.

Speaker Bio: Y. Narahari is currently a Professor and the Chairperson at the Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India. The focus of his research in the last decade has been to explore problems at the interface of computer science and microeconomics. In particular, he is interested in applications of game theory and mechanism design to the design of auctions and electronic markets, multiagent systems, and social network research. He is the lead author of a research monograph - Game Theoretic Problems in Network Economics and Mechanism Design Solutions - published by Springer, London, in 2009. He has just completed a textbook entitled Game Theory and Mechanism Design brought out by the IISc Press and the World Scientific Publishing Company. He has been an active scientific collaborator with a host of global R & D companies and research labs including General Motors R & D, IBM Research, Infosys Technologies, Intel, Xerox Research, and Adobe Research Labs. More details at: http://lcm.csa.iisc.ernet.in/hari/


Uday Bondhugula
Assistant Professor, CSA, IISc

Scalable Programming Technologies and Architectures for Big Data

Abstract: This talk will present challenges associated with developing programs for big data along with some useful programming techniques and paradigms. Big data has a strong connection with high performance computing due to the need to extract parallelism when dealing with large amounts of data. We will highlight the problem of data movement and the need to exploit data locality and minimize data communication. We will also look at the relative strengths and weaknesses of various approaches: automatic compiler/runtime-based, domain-specific tools and code generators, tuned library-based, and completely manual. At a high level, we will also understand the merits and weaknesses of languages that provide a higher level of abstraction for expressive and productive programming at the expense of performance -- from C to R.

Speaker Bio: Uday Bondhugula is an Assistant Professor in the Department of Computer Science and Automation at the Indian Institute of Science. His research interests are in the design of parallelizing compilers and runtime systems for a wide range of parallel architectures including multicores, distributed-memory clusters, and accelerators such as GPUs. Before joining IISc, he was with the Advanced Compiler Technologies group at the IBM T.J. Watson Research Center, Yorktown Heights, New York. He received his PhD in Computer Science and Engineering from the Ohio State University, and his BTech in Computer Science and Engineering from the Indian Institute of Technology, Madras. He has been honoured with several grants and awards for his research, including research grants from C-DAC, Intel Labs, National Instruments (R&D) and AMD, an NVIDIA CUDA research center award, and an INRIA Associate Team award.


N. Viswanadham
INSA Senior Scientist, CSA, IISc

Big Data Based Decision making in Manufacturing Supply Chains

Abstract: Decision making in supply chains is based on optimization models and data from past sales. Software tools such as ERP, CRP, TMS, and WMS have been developed and are used in industry. The aim is to deliver quality products to customers at the right cost.

Currently, several new trends are emerging in the supply chain arena. Globalization has created dispersed supply chains which are vulnerable and dependent on entities and factors that are exogenous to the supply chain. Also, technologies such as Big Data, Cloud Computing, Blogs, Social Media, Internet of Things and Mobility have become sources of large volumes and several varieties of data.

In this lecture, I will first present some recent big data start-ups that are revolutionizing or disrupting the traditional manufacturing networks. We then discuss how the new developments in tagging, sensing and embedding affect the four important supply chain processes: procurement, manufacturing, maintenance & repair and retail. Next, we present the big data ecosystem model: big data service chain, institutions (governments and social groups) and their influence on data availability, resources (natural, human, financial, and industry inputs) and delivery service infrastructure (communication and decision). This leads us to the question: what data should I collect, and what algorithms should I use to make decisions that would result in better business outcomes?

Data-based decision making, particularly with unorganized and non-numerical data, is a relatively unexplored area of research with abundant opportunities. The takeaways from this lecture are opportunities for both research and start-ups in this evolving area.

Speaker Bio: N. Viswanadham is INSA Senior Scientist in the Department of Computer Science and Automation at the Indian Institute of Science. He has held several prestigious positions before joining IISc in this position: he was INAE Distinguished Professor at IISc during 2011-2013; Professor and Executive Director for The Center of Excellence Global Logistics And Manufacturing Strategies in the Indian School of Business, Hyderabad, during 2006-2011; Deputy Executive Director of The Logistics Institute-Asia Pacific and also Professor in Department of Mechanical and Production Engineering at the National University of Singapore during 1998-2005; and a faculty member at IISc from 1967 to 1998. While at IISc, he was Chairman of the Department of Computer Science and Automation from 1990 to 1996 and Chairman of Electrical Sciences Division from 1997 to 1998. Professor Viswanadham is a Fellow of the IEEE, and a Fellow of Indian National Science Academy, Indian Academy of Sciences, Indian National Academy of Engineering, and Third World Academy of Sciences. He has made significant contributions to the areas of manufacturing, logistics and global supply chain networks. He is the author of four textbooks, nine edited volumes, over two hundred articles in top tier journals and conferences, and has written several thought leadership papers on logistics, manufacturing, and services in India. His current research efforts are on Global supply/service chain networks, Green supply chain design, Food security in India and Smart village development.


Contact

You can contact us at bigdata@csa.iisc.ernet.in or by writing to any of the faculty members involved in organizing the initiative.

Join us on our Facebook page: