Spark is fast emerging as an alternative to Hadoop MapReduce because of its speed. Spark programming is often necessary to handle complex processing loads involving huge data volumes that Hadoop cannot process in a timely manner. Its in-memory computing engine makes Spark the platform of choice for real-time analytics, which requires high-speed data ingestion and processing within seconds. A whole new generation of analytics applications is now emerging to process geo-location data, streaming web events, sensor data, and data received from mobile and wearable devices.
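As a minimal sketch of what the in-memory model means in practice (not part of the course material; the file path, session settings, and object name below are placeholders), a dataset can be cached in cluster memory so that repeated passes over it avoid re-reading from disk:

```scala
import org.apache.spark.sql.SparkSession

object InMemoryDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("InMemoryDemo")
      .master("local[*]")   // placeholder: run on all local cores
      .getOrCreate()

    // Load events once; the path is a placeholder
    val events = spark.sparkContext.textFile("events.log")

    // cache() keeps the dataset in memory, so the second pass
    // below reads from RAM instead of re-scanning the file
    events.cache()

    val total  = events.count()                              // first pass: materializes the cache
    val errors = events.filter(_.contains("ERROR")).count()  // second pass: served from memory

    println(s"$errors errors out of $total events")
    spark.stop()
  }
}
```

It is this reuse of in-memory datasets across operations, rather than writing intermediate results to disk as MapReduce does, that underlies Spark's speed advantage for iterative and low-latency workloads.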
The program is designed to provide an overall conceptual framework along with common design patterns. Key concepts in each area are explained and working code is provided. Participants will be able to run the examples and are expected to understand the code. While key concepts are explained, a detailed code walk-through is usually not feasible in the interest of time. The code is written in Java and Scala, so prior knowledge of these languages will be helpful for understanding the code-level implementation of key concepts, as in the sketch below.
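To give a sense of the style of code involved (a generic illustration, not an actual course exercise; the input path is a placeholder), a Spark word count in Scala looks like this:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()

    // Split each line into words, pair each word with a count of 1,
    // and sum the counts per word across the cluster
    val counts = spark.sparkContext
      .textFile("input.txt")      // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```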
To provide a thorough understanding of the concepts of in-memory distributed computing and the Spark API, enabling participants to develop Spark programs of moderate complexity.