Channel: Jayati Tiwari
Browsing latest articles
Browse All 29 View Live

Feature comparison of Machine Learning Libraries

Machine learning is a subfield of computer science stemming from research into artificial intelligence. It is a scientific discipline that explores the construction and study of algorithms that can...

Common Functions in R

Google "What's R", and you'll see there are many ways in which R has been defined. As per my understanding, firstly it's a programming language. Secondly, it's solely meant for statistical computing....

Installing R on Linux

I was a bit skeptical about writing this post due to the scarcity of content, but anyhow you know what I chose. So, installing R on Ubuntu is all about the following two steps: sudo apt-get install...

Socket Programming in Java

A socket literally means an electrical device that receives a plug or light bulb to make a connection. In computer programming, it means a method of communication between two programs, one acting as the...

Installing H2O and Running ML Implementations of H2O

H2O is an open source predictive analytics platform. Unlike traditional analytics tools, H2O provides a combination of extraordinary math and high performance parallel processing with unrivaled ease of...

Running Naive Bayes Classification algorithm using Weka

Wiki says, "Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are...

Installing sparkling-water and Running sparkling-water's Deep Learning

Sparkling Water is designed to be executed as a regular Spark application. It provides a way to initialize H2O services on each node in the Spark cluster and access data stored in data structures of...

Installing SparkMLlib on Linux and Running SparkMLlib implementations

SparkMLlib is a machine learning library which ships with Apache Spark and can run on any Hadoop2/YARN cluster without any pre-installation. It is Spark’s scalable machine learning library consisting...

Install and run Augustus on CentOS

Hello folks. If you are visiting this blog, you probably know what Augustus is all about, but in case you don't, here’s a short introduction taken directly from its makers: “Augustus is an...

Installation Script for Apache Zookeeper-3.3.5 on Linux

One of my previous posts describes how to set up a Zookeeper cluster manually. Here's a quick fix: an installation script for the same. You need to run the following script (after storing the content in...

Start-up script for an installed Apache Zookeeper Cluster

If you have an installed Zookeeper-3.3.5 cluster, this script will save you from manually visiting each node and starting the zkServer there. All you have to do is grab a remote machine and run the...

Installation Script for Apache Storm on Ubuntu

One of my posts here describes the steps for manual installation of a Storm cluster. To make things more convenient for you, here's an installation script that you can use for setting up a Storm...

Installation Script for Apache Storm on CentOS

CentOS and Ubuntu are two popular, widely used Linux distributions. My last post shared an installation script for a Storm cluster on Ubuntu machines, and this one is for CentOS. The few usage rules...

Start-up script for an installed Apache Storm Cluster

If you have installed a Storm cluster using my shell scripts in the previous blogs or even otherwise, this script will save you from manually visiting each node and starting the appropriate...

Installing Hadoop-1.x.x in Pseudo-Distributed Mode

Disclaimer: The installation steps shared in this blog post are specifically for the hadoop-1.x.x series. If you are looking for hadoop-2.x.x series installation steps, i.e. with YARN, this post isn’t the...

Understanding Data Pre-processing in Mahout – Part I

The two most common commands used for pre-processing train or test data when running Mahout algorithms are:
seqdirectory: Turns raw text in a directory into a Mahout sequence file.
seq2sparse: Creates...

Understanding Data Pre-processing in Mahout – Part II

Continuing from my previous post, which described the first of the two commonly used commands for data pre-processing in Mahout, we shall continue with the second one, i.e. “seq2sparse”, in this...

Writing your first Storm Topology

This blog contains multiple posts on Storm, its installation, shell scripts for installation of a Storm cluster and integration of Storm with HBase and RDBMS. But if you're a newbie to Storm this one's...

Query Storm Data Streams using Esper

As you might already be aware and as documented on the web, “Esper is a component for complex event processing (CEP) which enables rapid development of applications that process large volumes of...

Mahout’s Naïve Bayes: Train Phase

Mahout’s Naïve Bayes Classification algorithm executes in two phases:
Train Phase: Trains a model using pre-processed train data
Test Phase: Classifies documents (pre-processed) with the help of the model...

Mahout’s Naïve Bayes: Test Phase

This post continues from my previous one, which explained Mahout's Naive Bayes "trainnb" command. This one describes the internal execution steps of the "testnb" command, which is...

Spark Overview

Spark is a cluster computing framework i.e. a framework which uses multiple workstations, multiple storage devices, and redundant interconnections, to form an abstract single highly available system....

Setting up Spark-0.7.x in Standalone Mode

A Spark Cluster in Standalone Mode consists of one Master and multiple Spark Worker processes. Standalone mode can be used either on a single local machine or on a cluster. This mode does not require...

Setting up a Mesos-0.9.0 Cluster

Apart from running in Standalone mode, Spark can also run on clusters managed by Apache Mesos. "Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across...

Deploying the Spark-0.7.x Cluster in Standalone Mode

To deploy the Spark Cluster in the Standalone Mode, run the following script present in the Spark Setup on the cluster's Master node:
bin/start-all.sh
If everything is fine, the Spark Master UI should be...
