Download Eclipse offline version
http://www.eclipse.org/downloads/eclipse-packages/
Java Fuzzy Markup Language
http://www.uco.es/JFML/home
Top 16 Java Utility Classes
Read this for the Weka plugins for PDI (Pentaho Data Integration): Scoring and Forecasting
LD_LIBRARY_PATH and Eclipse
use Eclipse to set environment variable LD_LIBRARY_PATH.
You do it from the "Run..." or "Debug..." dialog, in the "Environment" tab.
Example:
Name: LD_LIBRARY_PATH
Value: /home/choojun/Downloads/dist_linux_x86_64/
Threads and available processors
int numberOfThreads_ = Runtime.getRuntime().availableProcessors() ;
GPU Programming and Java
Be aware of the fact that CUDA/OpenCL will not automagically make computations faster. GPU programming is an art, and it can be very, very challenging to get it right. It is worth in noting that GPUs are well-suited only for certain kinds of computations.
Note 1: You may compute anything on the GPU. However, whether you will achieve a good speedup or not is an issue. It is due to a problem of ‘task parallel’ or ‘data parallel’. Task parallel refers to problems where several threads are working on their own tasks, more or less independently. Data parallel refers to problems where many threads are all doing the same - but on different parts of the data. Note that the latter is the kind of problem that GPUs are good at: They have many cores, and all the cores do the same, but operate on different parts of the input data.
Note 2: “Simple math but with huge amount of data”: it sound like a perfectly data-parallel problem and thus like it was well-suited for a GPU. It is no doubts and agreed that GPUs are ridiculously fast in terms of theoretical computational power (FLOPS, Floating Point Operations Per Second). However, they are often throttled down by the memory bandwidth which are categorised into memory bound and compute bound.
Memory bound refers to problems where the number of instructions that are done for each data element is low. For example, consider a parallel vector addition: You’ll have to read two data elements, then perform a single addition, and then write the sum into the result vector. You will not see a speedup when doing this on the GPU, because the single addition does not compensate for the efforts of reading/writing the memory.
Compute bound refers to problems where the number of instructions is high compared to the number of memory reads/writes. For example, consider a matrix multiplication: The number of instructions will be O(n^3) when n is the size of the matrix. In this case, one can expect that the GPU will outperform a CPU at a certain matrix size. Another example could be when many complex trigonometric computations (sine/cosine etc) are performed on “few” data elements.
Note 3: Suppose that reading/writing one data element from the “main” GPU memory has a latency of about 500 instructions…. GPUs is data locality: always kept as close as possible to the GPU cores. GPUs have certain memory areas, i.e. referred to as “local memory” or “shared memory”, that usually is only a few KB in size, but particularly efficient for data that is about to be involved in a computation.
Threads in Java, with all the concurrency infrastructure, give the impression that we just have to split work and distribute it among several processors. On the other hand, GPU programming is an art. We encounter challenges on a much lower level with GPU programming Occupancy, register pressure, shared memory pressure, memory coalescing … just to name a few CUDA examples over here :D
In summary, when you have a data-parallel, compute-bound problem to solve, the GPU is the way to go.
Output Boxplot in an SVG file
As a Maven project, it requires the following libraries.
<dependency>
<groupId>org.apache.xmlgraphics</groupId>
<artifactId>batik-svggen</artifactId>
<version>1.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.xmlgraphics</groupId>
<artifactId>batik-transcoder</artifactId>
<version>1.9.1</version>
</dependency>
<dependency>
<groupId>batik</groupId>
<artifactId>batik-dom</artifactId>
<version>1.6-1</version>
</dependency>
<dependency>
<groupId>org.apache.xmlgraphics</groupId>
<artifactId>batik-codec</artifactId>
<version>1.9</version>
</dependency>
Create the following class as the application.
import java.awt.Color;
import java.awt.Graphics2D;
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import org.apache.batik.dom.GenericDOMImplementation;
import org.apache.batik.svggen.DOMGroupManager;
import org.apache.batik.svggen.ExtensionHandler;
import org.apache.batik.svggen.ImageHandler;
import org.apache.batik.svggen.SVGGeneratorContext;
import org.apache.batik.svggen.SVGGraphics2D;
import org.apache.batik.util.SVGConstants;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class BoxPlot
{
public static int overallScale = 10;
public static int bpHeight = overallScale * 5;
public static void main(String[] args)
{
BoxPlot fictionPrice = new BoxPlot("Fiction");
fictionPrice.q1 = 5.24;
fictionPrice.q2 = 5.59;
fictionPrice.q3 = 6.29;
fictionPrice.low = 3.99;
fictionPrice.high = 7.00;
fictionPrice.outliers = new double[]
{ 13.60, 9.09, 12.91, 8.39, 10.49, 8.55, 1.00 };
BoxPlot nonfictionPrice = new BoxPlot("Non fiction");
nonfictionPrice.q1 = 5.565;
nonfictionPrice.q2 = 6.99;
nonfictionPrice.q3 = 10.49;
nonfictionPrice.low = 1.00;
nonfictionPrice.high = 17.68;
nonfictionPrice.outliers = new double[]
{ 26.00 };
BoxPlot childrensPrice = new BoxPlot("Children's ");
childrensPrice.q1 = 4.89;
childrensPrice.q2 = 5.24;
childrensPrice.q3 = 5.24;
childrensPrice.low = 4.49;
childrensPrice.high = 5.59;
childrensPrice.outliers = new double[]
{ 6.49, 5.84, 8.24, 7.69, 3.99, 20.40, 3.38, 7.17, 7.00 };
generate("price", 0, 30, 5, 10, "£", "", fictionPrice, nonfictionPrice, childrensPrice);
}
public static void generate(String filename, int chartMin, int chartMax, int interval, double scale,
String unitBefore, String unitAfter, BoxPlot... boxPlots)
{
DOMImplementation domImpl = GenericDOMImplementation.getDOMImplementation();
String svgNS = "http://www.w3.org/2000/svg";
Document document = domImpl.createDocument(svgNS, "svg", null);
SVGGraphics svgGenerator = new SVGGraphics(document);
int height = (int) (((boxPlots.length * 1.5) + 0.5) * bpHeight);
int xoffset = bpHeight * 2;
// grid
svgGenerator.setPaint(Color.LIGHT_GRAY);
for (int i = 1; i <= chartMax; i += (interval / 5))
{
svgGenerator.drawLine((int) (xoffset + (i * scale)), 1, (int) (xoffset + (i * scale)), height);
}
svgGenerator.setPaint(Color.BLACK);
// x axis
svgGenerator.drawLine(xoffset, 1, xoffset, height);
// y axis
svgGenerator.drawLine(xoffset, height, (int) (xoffset + (chartMax - chartMin) * scale), height);
// x axis labels
for (int i = 0; i <= chartMax; i += interval)
{
anchoredText(svgGenerator, unitBefore + i + unitAfter, (int) (xoffset + (i * scale)),
height + (int) (bpHeight * 0.25), "middle");
}
for (int i = 0; i < boxPlots.length; i++)
{
// label
anchoredText(svgGenerator, boxPlots[i].name, xoffset - overallScale, (int) (((i * 1.5) + 1) * bpHeight),
"end");
// draw
boxPlots[i].paint(svgGenerator, scale, xoffset, (int) (((i * 1.5) + 0.5) * bpHeight));
}
try
{
// write svg
Writer svgOut = new FileWriter(new File(filename + ".svg"));
svgGenerator.stream(svgOut, true);
svgOut.close();
/*
// write png
Rectangle areaOfInterest = new Rectangle((int) (xoffset + chartMax * scale + bpHeight),
(int) (height + 5 * scale + bpHeight));
PNGTranscoder p = new PNGTranscoder();
p.addTranscodingHint(ImageTranscoder.KEY_PIXEL_UNIT_TO_MILLIMETER, 1f);
p.addTranscodingHint(ImageTranscoder.KEY_WIDTH, areaOfInterest.width * 10f);
p.addTranscodingHint(ImageTranscoder.KEY_HEIGHT, areaOfInterest.height * 10f);
p.addTranscodingHint(ImageTranscoder.KEY_AOI, areaOfInterest);
Reader svgIn = new FileReader(new File(filename + ".svg"));
TranscoderInput input = new TranscoderInput(svgIn);
FileOutputStream pngOut = new FileOutputStream(new File(filename + ".png"));
TranscoderOutput output = new TranscoderOutput(pngOut);
p.transcode(input, output);
svgIn.close();
pngOut.close();
*/
}
catch (Exception e)
{
e.printStackTrace();
}
}
public static void anchoredText(SVGGraphics svgGenerator, String string, int x, int y, String textAnchor)
{
Element text = svgGenerator.getDOMFactory().createElementNS(SVGConstants.SVG_NAMESPACE_URI,
SVGConstants.SVG_TEXT_TAG);
text.setAttributeNS(null, SVGConstants.SVG_X_ATTRIBUTE, svgGenerator.generatorCtx().doubleString(x));
text.setAttributeNS(null, SVGConstants.SVG_Y_ATTRIBUTE, svgGenerator.generatorCtx().doubleString(y));
// center text
text.setAttributeNS(null, "text-anchor", textAnchor);
text.setAttributeNS(SVGConstants.XML_NAMESPACE_URI, SVGConstants.XML_SPACE_QNAME,
SVGConstants.XML_PRESERVE_VALUE);
text.appendChild(svgGenerator.getDOMFactory().createTextNode(string));
svgGenerator.domGroupManager().addElement(text, DOMGroupManager.FILL);
}
public String name;
public double[] outliers;
public double q1;
public double q2;
public double q3;
public double low;
public double high;
public BoxPlot(String s)
{
this.name = s;
}
public void paint(Graphics2D g2d, double scale, int xoffset, int yoffset)
{
// q1 line
g2d.drawLine(xoffset + (int) (q1 * scale), yoffset, xoffset + (int) (q1 * scale), yoffset + bpHeight);
// q2 line
g2d.drawLine(xoffset + (int) (q2 * scale), yoffset, xoffset + (int) (q2 * scale), yoffset + bpHeight);
// q3 line
g2d.drawLine(xoffset + (int) (q3 * scale), yoffset, xoffset + (int) (q3 * scale), yoffset + bpHeight);
// top
g2d.drawLine(xoffset + (int) (q1 * scale), yoffset, xoffset + (int) (q3 * scale), yoffset);
// bottom
g2d.drawLine(xoffset + (int) (q1 * scale), yoffset + bpHeight, xoffset + (int) (q3 * scale),
yoffset + bpHeight);
// left
g2d.drawLine(xoffset + (int) (low * scale), (int) (bpHeight * 0.25) + yoffset, xoffset + (int) (low * scale),
(int) (bpHeight * 0.75) + yoffset);
// left line
g2d.drawLine(xoffset + (int) (low * scale), (int) (bpHeight * 0.5) + yoffset, xoffset + (int) (q1 * scale),
(int) (bpHeight * 0.5) + yoffset);
// right
g2d.drawLine(xoffset + (int) (high * scale), (int) (bpHeight * 0.25) + yoffset, xoffset + (int) (high * scale),
(int) (bpHeight * 0.75) + yoffset);
// right line
g2d.drawLine(xoffset + (int) (q3 * scale), (int) (bpHeight * 0.5) + yoffset, xoffset + (int) (high * scale),
(int) (bpHeight * 0.5) + yoffset);
// outliers
for (double outlier : outliers)
{
g2d.drawOval(xoffset + (int) (outlier * scale), (int) (bpHeight * 0.5) + yoffset - 1, 2, 2);
}
}
}
class SVGGraphics extends SVGGraphics2D
{
public SVGGraphics(Document domFactory)
{
super(domFactory);
}
public SVGGraphics(Document domFactory, ImageHandler imageHandler, ExtensionHandler extensionHandler,
boolean textAsShapes)
{
super(domFactory, imageHandler, extensionHandler, textAsShapes);
}
public SVGGraphics(SVGGeneratorContext generatorCtx, boolean textAsShapes)
{
super(generatorCtx, textAsShapes);
}
public SVGGraphics(SVGGraphics2D g)
{
super(g);
}
public DOMGroupManager domGroupManager()
{
return domGroupManager;
}
public SVGGeneratorContext generatorCtx()
{
return generatorCtx;
}
}
MOA with ARFF file
import java.util.Date;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.CategoryPlot;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;
import org.jfree.ui.ApplicationFrame;
import org.jfree.ui.RefineryUtilities;
import moa.classifiers.Classifier;
import moa.classifiers.meta.AdaptiveRandomForest;
import moa.classifiers.meta.LeveragingBag;
import moa.classifiers.meta.OzaBagAdwin;
import moa.classifiers.meta.PairedLearners;
import moa.classifiers.meta.WEKAClassifier;
import moa.evaluation.LearningCurve;
import moa.streams.ArffFileStream;
import moa.tasks.EvaluatePrequential;
public class MainTwo extends ApplicationFrame
{
private static final long serialVersionUID = 1L;
ChartPanel chartPanel = null;
JFreeChart jfreechart = null;
boolean withoutLegend = true;
boolean tooltips = true;
boolean urls = false;
int height = 400;
int width = 1600;
public MainTwo(String title)
{
super(title);
doIt();
jfreechart = ChartFactory.createLineChart(title, "Record",
"Accuracy Rate", this.createDataset(),
PlotOrientation.VERTICAL, withoutLegend, tooltips, urls);
jfreechart.getPlot().setBackgroundPaint(java.awt.Color.WHITE);;
((CategoryPlot) jfreechart.getPlot()).getRangeAxis().setRange(min, max);
chartPanel = new ChartPanel(jfreechart);
// chartPanel.setBackground( java.awt.Color.WHITE );
chartPanel.setPreferredSize(new java.awt.Dimension(width, height));
this.setContentPane(chartPanel);
this.pack();
RefineryUtilities.centerFrameOnScreen(this);
this.setVisible(true);
}
private void refreshChart(String title)
{
doIt();
chartPanel.removeAll();
chartPanel.revalidate(); // This removes the old chart
jfreechart = ChartFactory.createLineChart(title, "Record",
"Accuracy Rate", this.createDataset(),
PlotOrientation.VERTICAL, withoutLegend, tooltips, urls);
jfreechart.getPlot().setBackgroundPaint(java.awt.Color.WHITE);
((CategoryPlot) jfreechart.getPlot()).getRangeAxis().setRange(min, max);
chartPanel = new ChartPanel(jfreechart);
chartPanel.setPreferredSize(new java.awt.Dimension(width, height));
// chartPanel.add(chartPanel);
chartPanel.repaint(); // This method makes the new chart appear
this.setContentPane(chartPanel);
this.pack();
RefineryUtilities.centerFrameOnScreen(this);
this.setVisible(true);
}
private static final String DEFAULT_INPUT_FILE = "data/iris_stream.arff";
public static void main(String[] args) throws InterruptedException
{
MainTwo myObj = new MainTwo(DEFAULT_INPUT_FILE + ": results initialed on "
+ new Date());
long count = 0;
while (true)
{
Thread.sleep(4000);
count++;
myObj.refreshChart(DEFAULT_INPUT_FILE + ": results reloaded on "
+ new Date());
System.out.println("refreshing " + count);
}
}
// set EvaluatePrequential's parameter
int maxInstances = 1000000;
int timeLimit = -1;
int sampleFrequencyOption = 5;
private void doIt()
{
// prepare input file for streaming evaluation
String arffFilePath = System.getenv("HOME") + "/" + DEFAULT_INPUT_FILE;
arffFilePath = DEFAULT_INPUT_FILE;
ArffFileStream myArffstream = null;
try
{
myArffstream = new ArffFileStream(arffFilePath, -1);
myArffstream.prepareForUse();
}
catch (Exception e)
{
System.out
.println("Problem with loading arff file. Quit the program");
System.exit(-1);
}
double[][] output = null;
Classifier targetClasifier = null;
targetClasifier = new PairedLearners();
output = doWork(targetClasifier, myArffstream);
xValLp = output[0];
accValLp = output[1];
targetClasifier = new LeveragingBag();
output = doWork(targetClasifier, myArffstream);
xValLb = output[0];
accValLb = output[1];
targetClasifier = new WEKAClassifier();
output = doWork(targetClasifier, myArffstream);
xValNbu = output[0];
accValNbu = output[1];
targetClasifier = new AdaptiveRandomForest();
output = doWork(targetClasifier, myArffstream);
xValArf = output[0];
accValArf = output[1];
targetClasifier = new OzaBagAdwin();
output = doWork(targetClasifier, myArffstream);
xValBa = output[0];
accValBa = output[1];
System.out.println("found records in " + DEFAULT_INPUT_FILE
+ ": nearly " + xValLp.length * sampleFrequencyOption
+ ", " + xValLb.length * sampleFrequencyOption
+ ", " + xValNbu.length * sampleFrequencyOption
+ ", " + xValArf.length * sampleFrequencyOption
+ ", " + xValBa.length * sampleFrequencyOption
);
}
private double[][] doWork(Classifier targetClasifier, ArffFileStream myArffstream )
{
// do the learning and checking using evaluate-prequential technique
EvaluatePrequential ep = new EvaluatePrequential();
ep.instanceLimitOption.setValue(maxInstances);
ep.learnerOption.setCurrentObject(targetClasifier);
ep.streamOption.setCurrentObject(myArffstream);
ep.sampleFrequencyOption.setValue(sampleFrequencyOption);
ep.timeLimitOption.setValue(timeLimit);
ep.prepareForUse();
// do the task and get the result
LearningCurve le = (LearningCurve) ep.doTask();
// System.out.println("Evaluate prequential using targetClasifier");
// System.out.println(le);
int size = le.numEntries();
double[] xValTemp = new double[size];
double[] accValTemp = new double[size];
for (int i = 0; i < size; i++)
{
xValTemp[i] = le.getMeasurement(i, 0);
accValTemp[i] = le.getMeasurement(i, 4);
}
double[][] result = new double[2][];
result[0] = xValTemp;
result[1] = accValTemp;
return result;
}
double[] xValLp = null;
double[] xValLb = null;
double[] xValNbu = null;
double[] xValArf = null;
double[] xValBa = null;
double[] accValLp = null;
double[] accValLb = null;
double[] accValNbu = null;
double[] accValArf = null;
double[] accValBa = null;
double min = 99.9999;
double max = 100.0;
private DefaultCategoryDataset createDataset()
{
DefaultCategoryDataset dataset = new DefaultCategoryDataset();
for (int i = 0; i < xValLp.length; i++)
{
if (this.accValLp[i] > max)
this.max = this.accValLp[i];
if (this.accValLp[i] < min)
this.min = this.accValLp[i];
dataset.addValue(this.accValLp[i], "PairedLearners", this.xValLp[i] + "");
}
for (int i = 0; i < xValLb.length; i++)
{
if (this.accValLb[i] > max)
this.max = this.accValLb[i];
if (this.accValLb[i] < min)
this.min = this.accValLb[i];
dataset.addValue(this.accValLb[i], "LeveragingBag", this.xValLb[i] + "");
}
for (int i = 0; i < xValNbu.length; i++)
{
if (this.accValNbu[i] > max)
this.max = this.accValNbu[i];
if (this.accValNbu[i] < min)
this.min = this.accValNbu[i];
dataset.addValue(this.accValNbu[i], "NaiveBayesUpdateable", this.xValNbu[i] + "");
}
for (int i = 0; i < xValArf.length; i++)
{
if (this.accValArf[i] > max)
this.max = this.accValArf[i];
if (this.accValArf[i] < min)
this.min = this.accValArf[i];
dataset.addValue(this.accValArf[i], "AdaptiveRandomForest", this.xValArf[i] + "");
}
for (int i = 0; i < xValBa.length; i++)
{
if (this.accValBa[i] > max)
this.max = this.accValBa[i];
if (this.accValBa[i] < min)
this.min = this.accValBa[i];
dataset.addValue(this.accValBa[i], "OzaBagAdwin", this.xValBa[i] + "");
}
return dataset;
}
}
Streaming data into arff file in Linux
cat iris_head.arff > iris_stream.arff;
cat iris_data.arff | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' >> iris_stream.arff;
cat iris_data.arff | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' >> iris_stream.arff;
cat iris_data.arff | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' >> iris_stream.arff;
sleep 10;
while true;
do
cat iris_data.arff | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' >> iris_stream.arff
cat iris_data.arff | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' >> iris_stream.arff
cat iris_data.arff | awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' >> iris_stream.arff;
sleep 2;
done