Component of effective Visualization:
- Visual cues
- Position
- Length
- Angle
- Visual Encoding
- Direction
- Shape
- Area
- Volume
- Color
- Hue
- Saturation
- Coordinate systems
- Scale / Data Types
- Numeric
- Categorical
- Time Series
- Context
Visual Cues Accuracy:
Python Plotting Package:
- MatPlotLib
- GGPloyt ( Grammar of Graphics)
- ggplot ( data, aes (xVar, yVar)) + geom _point(color='red') + geom_line(color='blue') + xlab ('x axis label') + ylab ( 'y axis label')
- print ggplot ( data, aes (xVar, yVar))
Plotting in Python
from pandas import *
from ggplot import *
import pandas
def lineplot(hr_year_csv):
# A csv file will be passed in as an argument which
# contains two columns -- 'HR' (the number of homerun hits)
# and 'yearID' (the year in which the homeruns were hit).
#
# Fill out the body of this function, lineplot, to use the
# passed-in csv file, hr_year_csv, and create a
# chart with points connected by lines, both colored 'red',
# showing the number of HR by year.
#
# You will want to first load the csv file into a pandas dataframe
# and use the pandas dataframe along with ggplot to create your visualization
#
# You can check out the data in the csv file at the link below:
# https://www.dropbox.com/s/awgdal71hc1u06d/hr_year.csv
#
# You can read more about ggplot at the following link:
# https://github.com/yhat/ggplot/
inData = pandas.read_csv(hr_year_csv)
gg = ggplot(inData, aes('yearID', 'HR')) + geom_point(color = 'red') + geom_line(color = 'red') + ggtitle('title') + xlab('x') + ylab('y')
return gg
- Numeric
- Discrete
- Continuous
- Categorical Data
- Time Series
Plotting Line Chart:
from pandas import *
from ggplot import *
import pandas
def lineplot_compare(hr_by_team_year_sf_la_csv):
# Write a function, lineplot_compare, that will read a csv file
# called hr_by_team_year_sf_la_csv and plot it using pandas and ggplot2.
#
# This csv file has three columns -- yearID, HR, and teamID,
# representing the total number of HR hit each year by the SF Giants
# and LA Dodgers. Produce a visualization comparing the total HR by
# year of the two teams.
#
# You can see the data in hr_by_team_year_sf_la_csv
# at the link below:
# https://www.dropbox.com/s/wn43cngo2wdle2b/hr_by_team_year_sf_la.csv
#
# Note that to differentiate between multiple categories on the
# same plot in ggplot, we can pass color in with the other arguments
# to aes, rather than in our geometry functions.
#
# For example, ggplot(data, aes(xvar, yvar, color=category_var)). This
# should help you.
inData = pandas.read_csv(hr_by_team_year_sf_la_csv)
gg = ggplot(inData, aes(x='yearID', y='HR', color = 'teamID')) + geom_point() + geom_line()
return gg
No comments:
Post a Comment