Georgetown University

Big Data

Big data is a relative term: data today are big by reference to the past, and to the methods and devices available to deal with them. The challenge big data presents is often characterized by the four V's: volume, velocity, variety, and veracity. Volume refers to the amount of data. Velocity refers to the flow rate, that is, the speed at which data are generated and changed. Variety refers to the different types of data being generated (currency, dates, numbers, text, etc.). Veracity refers to the fact that data are generated by organic, distributed processes (e.g., millions of people signing up for services or free downloads) and are not subject to the controls or quality checks that apply to data collected for a study.

Most large organizations face both the challenge and the opportunity of big data because most routine data processes now generate data that can be stored and, possibly, analyzed. The scale can be visualized by comparing the data in a traditional statistical analysis on the large side (e.g., 15 variables and 5000 records) to the Walmart database. If you consider the traditional statistical study to be the size of a period at the end of a sentence, then the Walmart database is the size of a football field. And that probably does not include other data associated with Walmart, such as social media data, which come in the form of unstructured text.

If the analytical challenge is substantial, so too can the reward be. Telenor, a Norwegian mobile phone service company, was able to reduce subscriber turnover by 37% by using models to predict which customers were most likely to leave, and then lavishing attention on them. Allstate, the insurance company, tripled the accuracy of predicting injury liability in auto claims by incorporating more information about vehicle type.

Learning Outcomes of Big Data

  • To provide both a theoretical and practical understanding of the key methods of classification, prediction, reduction, and exploration that are at the heart of data mining.
  • To provide a business decision-making context for these methods.
  • To illustrate the application and interpretation of these methods using real business cases.

Video Transcript

[MUSIC PLAYING] JOSE-LUIS GUERRERO: Welcome to big data analytics-- a course in the Master of Science-- so the MSF program at Georgetown University in the McDonough School of Business. Big data, as you know very well, is one of the most important topics in the area of business, in the areas of government, in the area of drawing information from large databases. But at the same time, one of the most important things is in which way you're going to analyze the data.

I am Professor Jose-Luis Guerrero-Cusumano. I am a professor at Georgetown University. I am looking forward to working with you in this course. The intention of this class is to go [INAUDIBLE] of the definitions, techniques, and what is coming next to us. The course will have three units, and I would like to give you an idea of what each of these units would be like.

As you know, big data plays a growing importance in the modern business world. We are surrounded, overwhelmed with information. We go from one Industrial Revolution to another Industrial Revolution. This is the fourth Industrial Revolution.

In the first unit, we will answer questions such as what is exactly big data? What's the role of analytics? What is data mining? Evidently, if you have the big data set, one of the things you have to do in the future is to analyze the data, to draw conclusions, to really go deeper into the data. That's the part of analytics.

So data mining is going to be important for business, and it is. Right now, everything related to your knowledge of quantitative methods, how to look at the data, which are the steps that you'd have to draw and also to take in order to have an idea how to analyze the data. That is going to be data mining.

In the second unit, we have to explore the data. It's like data exploration through visualization-- in which way, when I have a complex problem, I could reduce it. I could look in a different way. Evidently, I cannot really draw a conclusion from the data without using a technique.

So one of the things I want to do is to review multiple regression analysis-- in which way multiple regression now could really help in order to explore the data and draw conclusions. Also, when dealing with big data, the problem of big data is that it's very complex. It's not only information.

So the question would be, imagine you have 100 variables, 200 variables, and a million observations. How could I reduce the complexity of the problem? That will be our third and final unit-- in which way could I simplify the data but keep most of the information there? At the same time, something that is going to be fun for this class is a group project-- a final practical report in which you would be able to gather the data that you want to analyze and draw conclusions and present those conclusions.

Remember, the best way to work with big data analytics is looking at the data, organizing the data, analyzing the data, and drawing conclusions. It would be a pleasure for me to be with you these coming three weeks. And also remember that big data analytics is the beginning of analyzing complex situations, complex business analytics, in order to draw conclusions that are meaningful.