Feature comparison of Machine Learning Libraries
Machine learning is a subfield of computer science stemming from research into artificial intelligence. It is a scientific discipline that explores the construction and study of algorithms that can...
Common Functions in R
Google "What's R", and you'll see there are many ways in which R has been defined. As per my understanding, firstly it's a programming language. Secondly, it's solely meant for statistical computing....
Installing R on Linux
I was a bit skeptical about writing this post due to the scarcity of content, but anyhow, you know what I chose. So, installing R on Ubuntu is all about the following two steps: sudo apt-get install...
Socket Programming in Java
A socket literally means an electrical device that receives a plug or light bulb to make a connection. In computer programming, it means a method of communication between two programs, one acting as the...
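The client/server idea in this excerpt can be sketched in a few lines. The post itself is about Java; the snippet below is only an illustrative Python analogue using the standard `socket` module, with an assumed loopback address (127.0.0.1) and an OS-chosen port. One thread plays the server, the main code plays the client, and the helper names (`recv_all`, `serve_once`, `echo_round_trip`) are hypothetical.

```python
import socket
import threading


def recv_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:  # empty bytes means end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(server_sock: socket.socket) -> None:
    """Server role: accept one client and echo its request back."""
    conn, _addr = server_sock.accept()
    with conn:
        request = recv_all(conn)
        conn.sendall(b"echo: " + request)


def echo_round_trip(message: str) -> str:
    """Client role: send one message to a local echo server and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        host, port = server_sock.getsockname()
        server = threading.Thread(target=serve_once, args=(server_sock,))
        server.start()

        with socket.create_connection((host, port)) as client_sock:
            client_sock.sendall(message.encode("utf-8"))
            client_sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
            reply = recv_all(client_sock).decode("utf-8")

        server.join()
    return reply


if __name__ == "__main__":
    print(echo_round_trip("hello"))  # prints: echo: hello
```

Shutting down the client's write side is what lets the server's read loop see end-of-stream, so neither side has to guess the message length.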
Installing H2O and Running ML Implementations of H2O
H2O is an open source predictive analytics platform. Unlike traditional analytics tools, H2O provides a combination of extraordinary math and high performance parallel processing with unrivaled ease of...
Running Naive Bayes Classification algorithm using Weka
Wikipedia says, "Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are...
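As a rough illustration of that definition (not Weka's implementation), here is a minimal categorical Naive Bayes classifier with Laplace smoothing. The weather-style toy data and the helper names `train_nb` and `predict_nb` are hypothetical; the "naive" part is that each feature's likelihood is multiplied in independently of the others.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_nb(model, row, alpha=1.0):
    """Return the label maximizing log P(label) + sum_i log P(x_i | label)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace smoothing so an unseen value does not zero out the product.
            numerator = value_counts[(label, i)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
ROWS = [("sunny", "hot"), ("sunny", "mild"), ("overcast", "hot"),
        ("rainy", "mild"), ("rainy", "cool"), ("overcast", "cool")]
LABELS = ["no", "no", "yes", "yes", "yes", "yes"]

if __name__ == "__main__":
    model = train_nb(ROWS, LABELS)
    print(predict_nb(model, ("sunny", "hot")))      # prints: no
    print(predict_nb(model, ("overcast", "cool")))  # prints: yes
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.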
Installing sparkling-water and Running sparkling-water's Deep Learning
Sparkling Water is designed to be executed as a regular Spark application. It provides a way to initialize H2O services on each node in the Spark cluster and access data stored in data structures of...
Installing SparkMLlib on Linux and Running SparkMLlib implementations
SparkMLlib is a machine learning library which ships with Apache Spark and can run on any Hadoop2/YARN cluster without any pre-installation. It is Spark’s scalable machine learning library consisting...
Install and run Augustus on CentOS
Hello folks! If you are visiting this blog, you definitely know what Augustus is all about, but for the exceptions, here’s its short introduction, taken directly from its makers: “Augustus is an...
Installation Script for Apache Zookeeper-3.3.5 on Linux
One of my previous blog posts describes how to set up a Zookeeper cluster manually. Here's a quick fix: an installation script for the same. You need to run the following script (after storing the content in...
Start-up script for an installed Apache Zookeeper Cluster
If you have an installed Zookeeper-3.3.5 cluster, this script will save you from manually visiting each node and starting the zkServer there. All you have to do is grab a remote machine and run the...
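The idea behind such a start-up script can be sketched as a loop that runs `zkServer.sh start` on every node over SSH. This is a hypothetical Python sketch, not the script from the post: the node names, the install path and passwordless SSH from the remote machine are all placeholder assumptions, and it only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Hypothetical values: replace with your own node names and install path.
ZK_NODES = ["zk-node1", "zk-node2", "zk-node3"]
ZK_HOME = "/opt/zookeeper-3.3.5"


def start_commands(nodes, zk_home):
    """Build one ssh command per node that starts zkServer there."""
    remote = f"{shlex.quote(zk_home)}/bin/zkServer.sh start"
    return [["ssh", node, remote] for node in nodes]


def start_cluster(nodes=ZK_NODES, zk_home=ZK_HOME, dry_run=True):
    """Print (dry run) or execute the start command on every node in turn."""
    commands = start_commands(nodes, zk_home)
    for cmd in commands:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first node that fails
    return commands


if __name__ == "__main__":
    start_cluster()
```

Keeping command construction separate from execution makes it easy to review exactly what would be run on each node before starting anything.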
Installation Script for Apache Storm on Ubuntu
One of my blog posts here describes the steps for manually installing a Storm cluster. For added convenience, here's an installation script that you can use for setting up a Storm...
Installation Script for Apache Storm on CentOS
CentOS and Ubuntu are two widely used Linux distributions. My last post shared an installation script for a Storm cluster on Ubuntu machines, and this one is for CentOS. The few usage rules...
Start-up script for an installed Apache Storm Cluster
If you have installed a Storm cluster using the shell scripts from my previous posts, or even otherwise, this script will save you from manually visiting each node and starting the appropriate...
Installing Hadoop-1.x.x in Pseudo-Distributed Mode
Disclaimer: The installation steps shared in this blog post are specifically for the hadoop-1.x.x series. If you are looking for hadoop-2.x.x series installation steps, i.e. with YARN, this post isn’t the...
Understanding Data Pre-processing in Mahout – Part I
The two most common commands used for pre-processing train or test data when running Mahout algorithms are:
seqdirectory: Turns raw text in a directory into a Mahout sequence file.
seq2sparse: Creates...
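Conceptually, the two steps can be imitated in plain Python: one function maps each file in a directory to a (key, text) pair, much like the records of a sequence file, and another turns those texts into sparse term-weight vectors. This is a simplified sketch, not Mahout's code; the function names, the tokenizer and the plain tf × log(N/df) weighting are assumptions and may differ from Mahout's actual defaults.

```python
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def seqdirectory_like(input_dir):
    """Map each file under input_dir (key: relative path) to its text (value)."""
    base = Path(input_dir)
    return {
        path.relative_to(base).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def tokenize(text):
    """Very simple tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def seq2sparse_like(docs):
    """Turn {key: text} into {key: {term: weight}} using tf * log(N / df)."""
    term_counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each term counted once per document
    n_docs = len(docs)
    # Only terms that occur in a document get an entry, so vectors stay sparse.
    return {
        key: {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for key, counts in term_counts.items()
    }


def demo():
    """Write two hypothetical documents to a temp dir and vectorize them."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("spark storm spark", encoding="utf-8")
        Path(tmp, "b.txt").write_text("storm mahout", encoding="utf-8")
        docs = seqdirectory_like(tmp)
    return seq2sparse_like(docs)


if __name__ == "__main__":
    for key, vector in demo().items():
        print(key, vector)
```

A term that appears in every document ("storm" in the demo) gets weight 0 under this formula, which is the usual intuition behind IDF: it carries no information for telling documents apart.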
Understanding Data Pre-processing in Mahout – Part II
Continuing from my previous post, which described the first of the two commonly used commands for data pre-processing in Mahout, we shall now cover the second one, i.e. “seq2sparse”, in this...
Writing your first Storm Topology
This blog contains multiple posts on Storm: its installation, shell scripts for installing a Storm cluster, and integration of Storm with HBase and RDBMS. But if you're new to Storm, this one's...
Query Storm Data Streams using Esper
As you might already be aware and as documented on the web, “Esper is a component for complex event processing (CEP) which enables rapid development of applications that process large volumes of...
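To give a feel for the kind of continuous query a CEP engine evaluates, here is a plain-Python sketch of a length-based sliding window that raises an alert whenever the average of the last few events crosses a threshold. It does not use Esper's API or its query language; the function name, window size, threshold and event values are made up for illustration.

```python
from collections import deque


def sliding_average_alerts(events, window_size, threshold):
    """Yield (index, average) each time the mean of the last `window_size`
    events is strictly greater than `threshold`."""
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for index, value in enumerate(events):
        window.append(value)
        if len(window) == window_size:  # wait until the window is full
            average = sum(window) / window_size
            if average > threshold:
                yield index, average


if __name__ == "__main__":
    # Hypothetical stream of readings, window of 3 events, alert above 5.
    prices = [1, 2, 3, 10, 11, 1]
    for index, average in sliding_average_alerts(prices, 3, 5):
        print(index, round(average, 2))
```

Because the function is a generator, alerts are produced as events arrive rather than after the whole stream has been collected, which is the essential difference from a batch query.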
Mahout’s Naïve Bayes: Train Phase
Mahout’s Naïve Bayes Classification algorithm executes in two phases:
Train Phase: Trains a model using pre-processed train data.
Test Phase: Classifies (pre-processed) documents with the help of the model...
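The two phases can be illustrated with a small multinomial Naive Bayes sketch over tokenized documents: the train phase turns labeled documents into per-label term statistics (the "model"), and the test phase classifies held-out documents and reports accuracy along with confusion counts. This is an assumption-laden Python analogue, not what `trainnb` and `testnb` do internally; the function names and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def run_train_phase(docs, labels, alpha=1.0):
    """Train phase: build per-label term counts and log priors (the "model")."""
    label_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # label -> Counter of term occurrences
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "alpha": alpha, "labels": {}}
    for label, count in label_counts.items():
        model["labels"][label] = {
            "log_prior": math.log(count / len(labels)),
            "counts": term_counts[label],
            # Laplace-smoothed denominator for P(term | label).
            "denominator": sum(term_counts[label].values()) + alpha * len(vocab),
        }
    return model


def classify(model, tokens):
    """Pick the label with the highest log posterior for one document."""
    best_label, best_score = None, -math.inf
    for label, stats in model["labels"].items():
        score = stats["log_prior"]
        for term in tokens:
            if term not in model["vocab"]:
                continue  # ignore terms never seen during training
            score += math.log((stats["counts"][term] + model["alpha"]) / stats["denominator"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def run_test_phase(model, docs, labels):
    """Test phase: classify held-out docs; return accuracy and confusion counts."""
    confusion = Counter()  # (actual, predicted) -> count
    for tokens, actual in zip(docs, labels):
        confusion[(actual, classify(model, tokens))] += 1
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return correct / len(labels), confusion


# Hypothetical, already tokenized documents.
TRAIN_DOCS = [["spark", "rdd", "cluster"], ["spark", "mllib"],
              ["storm", "spout", "bolt"], ["storm", "topology", "bolt"]]
TRAIN_LABELS = ["spark", "spark", "storm", "storm"]
TEST_DOCS = [["spark", "cluster"], ["bolt", "spout"]]
TEST_LABELS = ["spark", "storm"]

if __name__ == "__main__":
    model = run_train_phase(TRAIN_DOCS, TRAIN_LABELS)
    accuracy, confusion = run_test_phase(model, TEST_DOCS, TEST_LABELS)
    print(accuracy, dict(confusion))
```

Keeping the model as a plain data structure mirrors the idea that the train phase's output is persisted and later consumed by a separate test step.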
Mahout’s Naïve Bayes: Test Phase
This post continues from my previous post, where the Mahout Naive Bayes "trainnb" command was explained. This one describes the internal execution steps of the "testnb" command, which is...
Spark Overview
Spark is a cluster computing framework, i.e. a framework which uses multiple workstations, multiple storage devices, and redundant interconnections to form a single, highly available abstract system....
Setting up Spark-0.7.x in Standalone Mode
A Spark Cluster in Standalone Mode comprises one Master and multiple Spark Worker processes. Standalone mode can be used either on a single local machine or on a cluster. This mode does not require...
Setting up a Mesos-0.9.0 Cluster
Apart from running in Standalone mode, Spark can also run on clusters managed by Apache Mesos. "Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across...
Deploying the Spark-0.7.x Cluster in Standalone Mode
To deploy the Spark Cluster in Standalone Mode, run the following script, present in the Spark setup, on the cluster's Master node:
bin/start-all.sh
If everything is fine, the Spark Master UI should be...