Big data webinars: Hadoop and Spark

Webinar: What is Hadoop?

23 February 2016
Online, 15.00 – 16.00

 

Have you heard of Hadoop but don’t know what it is or what it does?

Or do you know that Hadoop is used to store very large datasets but you don’t know what it can do or why it might be relevant to you?

If so, this webinar is for you.

This webinar will provide an overview of Hadoop, including:

  • Hadoop and Hadoop clusters
  • why Hadoop can process datasets far larger than those that desktop applications can comfortably handle
  • some key ‘add-in’ products and what they are used for (e.g. Hive for data manipulation on a grand scale and Spark for statistical analysis)
  • a demonstration of using the Hive package to process a large dataset into something more manageable on the desktop
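As a rough illustration of why splitting work across a cluster helps with large datasets, the sketch below imitates the general "map then reduce" pattern in plain Python. It is not Hadoop's API: the `partitions` list, the sample words and the function names `map_partition` and `merge_counts` are all made up for this example, and the "machines" are simulated as ordinary lists.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands in for one block of a large file
# that would be held on a different machine in a cluster.
partitions = [
    ["hadoop", "hive", "spark"],
    ["hive", "hadoop", "hadoop"],
    ["spark", "hive"],
]


def map_partition(words):
    """Map step: count the words in one block, independently of the others."""
    return Counter(words)


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


# Each block is processed on its own, so no single step needs the whole dataset.
partial_counts = [map_partition(block) for block in partitions]

# The small partial results are then merged into the final answer.
total_counts = reduce(merge_counts, partial_counts, Counter())
print(dict(total_counts))
```

Because each block is counted separately, the same approach could in principle be spread over many machines, with only the small summaries brought together at the end.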

This webinar is intended for researchers with no in-depth knowledge of programming with data.

The webinar will consist of a 30-minute presentation followed by 20 minutes for questions.

 

Resources

Booking

 

Webinar: What is Hive?

22 March 2016
Online, 15.00 – 16.00

 

Hive is a package that works with Hadoop and allows users to manipulate very large datasets. This webinar is intended as an overview of what Hive is and why you might want to learn more about it.

This webinar will provide an overview of:

  • how Hive integrates into a Hadoop system and provides access to the large distributed datasets stored in Hadoop
  • why you might want to use Hive
  • the range of things you can do with Hive
  • two different ways of accessing Hive: via a web interface and directly from desktop applications
  • examples that demonstrate selecting specific columns from a dataset, selecting rows with specific column values, aggregating and obtaining basic statistical measures for column values, and joining two datasets on a common column
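To give a flavour of the four kinds of operation listed above, here is a minimal sketch. Hive's query language is SQL-like, so the queries are written in ordinary SQL and run against Python's built-in `sqlite3` module as a stand-in; this is an assumption made so the example runs without a Hadoop cluster, and it is not the material used in the webinar. The `readings` and `regions` tables and their contents are invented for illustration.

```python
import sqlite3

# In-memory database standing in for tables that Hive would read from Hadoop.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (household_id INTEGER, region TEXT, kwh REAL);
    CREATE TABLE regions (region TEXT, country TEXT);
    INSERT INTO readings VALUES (1, 'north', 10.0), (2, 'north', 14.0), (3, 'south', 8.0);
    INSERT INTO regions VALUES ('north', 'England'), ('south', 'Wales');
""")

# 1. Select specific columns from a dataset.
columns = conn.execute(
    "SELECT household_id, kwh FROM readings ORDER BY household_id"
).fetchall()

# 2. Select rows with specific column values.
north_rows = conn.execute(
    "SELECT * FROM readings WHERE region = 'north' ORDER BY household_id"
).fetchall()

# 3. Aggregate and obtain basic statistical measures per group.
stats = conn.execute(
    "SELECT region, COUNT(*), AVG(kwh) FROM readings GROUP BY region ORDER BY region"
).fetchall()

# 4. Join two datasets on a common column.
joined = conn.execute(
    "SELECT r.household_id, g.country "
    "FROM readings r JOIN regions g ON r.region = g.region "
    "ORDER BY r.household_id"
).fetchall()

for label, rows in [("columns", columns), ("north", north_rows),
                    ("stats", stats), ("joined", joined)]:
    print(label, rows)

conn.close()
```

Queries of this general shape (SELECT, WHERE, GROUP BY and JOIN) are the sort of thing Hive is designed to run over much larger, distributed datasets.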

This webinar is intended for researchers with no in-depth knowledge of programming with data. However, attendees are more likely to find this webinar of interest if they already have some experience of doing simple data manipulations (e.g. obtaining summary statistics or aggregating data in SPSS, Stata or R).

The webinar will consist of a 30-minute presentation followed by 20 minutes for questions.

 

Resources

Booking

 

 

Webinar: What is Spark?

19 April 2016
Online, 15.00 – 16.00

Spark might be considered a one-stop tool for big data processing, providing data manipulation facilities to slice and dice datasets as well as statistical functionality and visualisation capabilities to present your results. This webinar is intended as an overview of the Spark system and what you can use it for.

This webinar will provide:

  • an overview of the Spark system and what it can be used for
  • how Spark can be used both as a standalone product and as a means of accessing large datasets on a Hadoop cluster
  • demonstrations of how Spark can be used to access and manipulate datasets in Hadoop and to present the results of analysis

This webinar is intended for researchers with no in-depth knowledge of programming with data. However, attendees are more likely to find this webinar of interest if they already have some experience of doing simple data manipulations and analyses (e.g. obtaining summary statistics and graphs in SPSS, Stata or R).

The webinar will consist of a 30-minute presentation followed by 20 minutes for questions.

 

Resources

Booking

 

 

 
