Call +65 6100 0613

Instructor-led Classroom Adult Training in Singapore - Modular Fast Track Skill-Based Trainings

Apache Impala with R Training

Apache Impala is an open source, massively parallel processing SQL query engine for Apache Hadoop. It can deliver order-of-magnitude faster performance than Hive, depending on the type of query and configuration. Apache Impala provides a familiar and unified platform for batch-oriented or real-time queries. This course shows how to use the R tidyverse packages and Impala to query, visualize and perform statistical analysis and machine learning on your data.


All participants will receive a Certificate of Completion from Tertiary Courses after achieving at least 75% attendance.

Funding and Grant Applications

Click the links below to apply. Note that you need to register for the course first.

For Singaporeans: SkillsFuture Credit

For Companies: SSG Training Grant

Course Code: CRS-N-0047041

Course Booking

$298.00 (GST-exclusive)


Course Cancellation/Reschedule Policy

We reserve the right to cancel or re-schedule the course due to unforeseen circumstances. If the course is cancelled, we will refund 100% to participants.
Note that the training venue is subject to change depending on class size and classroom availability.
Note that the minimum class size to start a class is 3 pax.

Course Details

Module 1: Get Started on Apache Impala

  • Why Impala?
  • Differences between Impala and Hive
  • Impala Concepts and Setup
  • Hadoop Cluster Components
  • Setting Up the Cloudera Hadoop VM
  • Adding Settings and Connections

Module 2: Importing Data into Impala

  • Creating Tables in Impala
  • Importing CSV files in Impala

Module 3: Getting Data from Impala

  • Impala Queries with dplyr
  • Analyzing data with Impala
  • Extracting data and storing as table
  • Joining multiple tables together

Module 4: Aggregating Data with Impala

  • Simple Aggregations
  • Grouping Sets
  • Data analysis

Module 5: Visualizing Data

  • Using ggplot2 for data analysis
  • Using tabplot to visualize queries
  • Using dbplot to visualize database

Module 6: Advanced Data Analysis

  • Correlations
  • t-test 
  • Linear regression
  • Random forest
  • KMeans

Course Admin


This course assumes basic R knowledge.

Software Requirement

Please download and install the following software prior to the class

Who Should Attend

  • Data Scientists
  • Data Analysts
  • Machine Learning Developers


Apache Spark R Trainer

Dwight Nuwan Fonseka has a degree in Biotechnology (from NUS), an Advanced Diploma in Pharmaceutical Management (from MDIS) and a Master's in Education (from NTU). He has 8 years of experience teaching biology at O and A levels/IB level in international schools in Singapore and overseas.


You May Be Interested In These Courses

R Statistics Essential Training
66 Review(s)
$298.00 (GST-exclusive)

Essential Machine Learning with R
12 Review(s)
$298.00 (GST-exclusive)

Statistical Analysis with R Commander
2 Review(s)
$298.00 (GST-exclusive)

Data Mining with R Rattle Training
$298.00 (GST-exclusive)

Apache Spark with R Training
$298.00 (GST-exclusive)