$ sudo cp -r /mnt/c/de/mllib /home/hduser
$ sudo chown hduser:hduser -R /home/hduser/mllib
$ hdfs dfs -put mllib /user/hduser
$ hdfs dfs -ls /user/hduser/mllib
$ hdfs dfs -ls /user/hduser/mllib/data
$ hdfs dfs -cat /user/hduser/mllib/matchmaker.py | more
$ $SPARK_HOME/bin/spark-submit mllib/matchmaker.py 1 M > matchmaking_recs.txt
Note that you need to change the file paths inside matchmaker.py to point to HDFS, e.g. from data/*.dat to hdfs://localhost:9000/user/hduser/mllib/data/*.dat
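The path change described in the note above could be centralized in one small helper rather than edited by hand at every call site. This is a minimal sketch only: the names HDFS_BASE and hdfs_path are hypothetical and do not come from matchmaker.py, and the base URI simply repeats the one used in this exercise.

```python
# Hypothetical helper, not part of matchmaker.py.
# Assumption: the base URI is the HDFS location used in this exercise.
HDFS_BASE = "hdfs://localhost:9000/user/hduser/mllib"


def hdfs_path(relative_path, base=HDFS_BASE):
    """Turn a relative path such as 'data/*.dat' into a full HDFS URI."""
    # Strip stray slashes so the two parts are joined by exactly one '/'.
    return base.rstrip("/") + "/" + relative_path.lstrip("/")


print(hdfs_path("data/*.dat"))
# prints hdfs://localhost:9000/user/hduser/mllib/data/*.dat
```

With a helper like this, each relative path in the script would be wrapped in hdfs_path(...) before being passed to Spark.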
$ hdfs dfs -ls /user/hduser/mllib/data
$ hdfs dfs -cat /user/hduser/mllib/earthquakes_clustering.py | more
$ $SPARK_HOME/bin/spark-submit mllib/earthquakes_clustering.py hdfs://localhost:9000/user/hduser/mllib/data/earthquakes.csv 6 > clusters.txt
Note that this exercise takes the name of the data file, located in the distributed file system, as a command-line parameter, so the script itself does not need to be edited.
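A script that receives its input path on the command line, as in the note above, might read its arguments roughly as follows. This is a sketch under assumptions, not the actual contents of earthquakes_clustering.py: the function name parse_args is hypothetical, and treating the second argument (6 in the command above) as an integer number of clusters is a guess based on the script name.

```python
# Hypothetical argument handling, not taken from earthquakes_clustering.py.
# Assumption: the second argument is an integer such as a cluster count.
def parse_args(argv):
    """Return (data_path, k) from a command line: script.py <data_path> <k>."""
    if len(argv) != 3:
        raise SystemExit("usage: earthquakes_clustering.py <data_path> <k>")
    return argv[1], int(argv[2])


# Simulate the command line used in this exercise.
data_path, k = parse_args([
    "earthquakes_clustering.py",
    "hdfs://localhost:9000/user/hduser/mllib/data/earthquakes.csv",
    "6",
])
print(data_path, k)
```

In a real script the list passed to parse_args would be sys.argv, and data_path would then be handed to Spark to load the file.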
$ hdfs dfs -ls /user/hduser/mllib/data
$ hdfs dfs -cat /user/hduser/mllib/naive_bayes_example.py | more
$ $SPARK_HOME/bin/spark-submit mllib/naive_bayes_example.py
Note that you need to change the file path inside naive_bayes_example.py to point to HDFS, e.g. from data/*.txt to hdfs://localhost:9000/user/hduser/mllib/data/*.txt
$ hdfs dfs -ls /user/hduser/mllib/data
$ hdfs dfs -cat /user/hduser/mllib/als_example.py | more
$ $SPARK_HOME/bin/spark-submit mllib/als_example.py
Note that you need to change the file path inside als_example.py to point to HDFS, e.g. from data/*.txt to hdfs://localhost:9000/user/hduser/mllib/data/*.txt