ChooJun

spark

I. Spark and Visualization

  1. Seaborn is a Python data visualization library built on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Read more at https://seaborn.pydata.org/
  2. Visualization.zip
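
As a rough illustration of the high-level interface described above, the sketch below draws a boxplot from a handful of records. It assumes seaborn, pandas and matplotlib are installed. The sample values, the `draw_boxplot` helper and the output file name are made up for illustration and are not taken from Visualization.zip.

```python
# Hypothetical sample data in long form: one row per observation.
records = [
    {"group": "A", "value": 2.1},
    {"group": "A", "value": 2.9},
    {"group": "A", "value": 3.4},
    {"group": "B", "value": 4.0},
    {"group": "B", "value": 4.8},
    {"group": "B", "value": 6.5},
]


def draw_boxplot(rows, out_path="boxplot_demo.png"):
    """Draw a boxplot of `value` per `group` and save it to out_path."""
    # Plotting libraries are imported here so the sample data above can be
    # inspected even on a machine where they are not installed yet.
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    sns.set_theme(style="whitegrid")  # seaborn's default styling
    df = pd.DataFrame(rows)

    # A single call selects the columns and computes the box statistics.
    ax = sns.boxplot(data=df, x="group", y="value")
    ax.set_title("Value by group")

    ax.figure.savefig(out_path)
    plt.close(ax.figure)
    return out_path
```

Calling `draw_boxplot(records)` should write the image to the current directory. In a notebook the figure would normally be displayed inline rather than saved.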

II. Visualization with PySpark

  1. Log in as hduser and install the seaborn package
    $ cd ~
    $ pip install seaborn
    
  2. Copy the C:\de\Visualization folder (visible as /mnt/c/de/Visualization) into hduser’s home directory, then give hduser ownership of the copy
    $ sudo cp -r /mnt/c/de/Visualization /home/hduser
    $ sudo chown -R hduser:hduser /home/hduser/Visualization
    
  3. Change to the Visualization directory and start the Jupyter notebook server with the following commands. Then copy one of the URLs printed by the server and paste it into any web browser
    $ cd ~/Visualization
    $ jupyter notebook --port=8888 --no-browser
    

    To terminate the Jupyter notebook server, press Ctrl-C in the terminal where it is running

  4. Open the Practical8_Visualization_with_Pyspark.ipynb notebook and work through the visualization exercises, which construct a pie chart, correlation matrix, boxplot, scatter chart, violin plot and 3D scatter plot
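
The notebook's own code is not reproduced here. As a hedged sketch of one common pattern for this kind of exercise, the example below aggregates in Spark, brings only the small result back to the driver, and plots it there. The names `pie_inputs` and `plot_category_pie`, the column argument and the output file name are hypothetical. Seaborn has no pie-chart function of its own, so the sketch uses matplotlib's `pie` with a seaborn colour palette.

```python
def pie_inputs(pairs):
    """Turn (label, count) pairs into labels and percentage shares,
    ordered from the largest count to the smallest."""
    pairs = sorted(pairs, key=lambda p: p[1], reverse=True)
    total = sum(count for _, count in pairs)
    labels = [label for label, _ in pairs]
    shares = [100.0 * count / total for _, count in pairs]
    return labels, shares


def plot_category_pie(spark_df, column, out_path="pie_demo.png"):
    """Count rows per category in Spark, then plot the result as a pie chart.

    `spark_df` is assumed to be a PySpark DataFrame containing `column`.
    """
    # Imported lazily: only needed when a chart is actually drawn.
    import matplotlib.pyplot as plt
    import seaborn as sns

    # groupBy/count runs in Spark; collect() returns one Row per category,
    # so only a small result reaches the driver.
    rows = spark_df.groupBy(column).count().collect()
    labels, shares = pie_inputs([(row[column], row["count"]) for row in rows])

    fig, ax = plt.subplots()
    ax.pie(
        shares,
        labels=labels,
        autopct="%1.1f%%",
        colors=sns.color_palette("pastel", len(labels)),
    )
    ax.set_title(f"Share of rows by {column}")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# The helper can be tried without Spark on hand-written counts.
labels, shares = pie_inputs([("a", 1), ("b", 3)])
print(labels, shares)
```

Inside the notebook, a call might look like `plot_category_pie(df, "some_column")`, where `df` and the column name are placeholders for whatever dataset the exercise loads. For plots that need row-level data, such as scatter or violin plots, a typical variant is to select or sample the needed columns in Spark and convert the result with `toPandas()` before passing it to seaborn.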

[Figures: pie chart, boxplot, scatter chart, violin plot, 3D scatter plot]