Exercise 1 - Ridership per Station
import sys
import logging

from util import mapper_logfile

logging.basicConfig(filename=mapper_logfile, format='%(message)s',
                    level=logging.INFO, filemode='w')


def mapper():
    """
    The input to this mapper will be the final Subway-MTA dataset, the same as
    in the previous exercise. You can check out the csv and its structure below:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv

    For each line of input, the mapper output should PRINT (not return) the UNIT as
    the key, the number of ENTRIESn_hourly as the value, and separate the key and
    the value by a tab. For example: 'R002\t105105.0'

    Since you are printing the output of your program, printing a debug
    statement will interfere with the operation of the grader. Instead,
    use the logging module, which we've configured to log to a file printed
    when you click "Test Run". For example:
    logging.info("My debugging message")

    The logging module can be used to give you more control over your debugging
    or other messages than you can get by printing them. In this exercise, print
    statements from your mapper will go to your reducer, and print statements
    from your reducer will be considered your final output. By contrast, messages
    logged via the loggers we configured will be saved to two files, one
    for the mapper and one for the reducer. If you click "Test Run", then we
    will show the contents of those files once your program has finished running.
    The logging module also has other capabilities; see
    https://docs.python.org/2/library/logging.html for more information.
    """
    for line in sys.stdin:
        data = line.strip().split(',')
        # Skip malformed rows (anything but 22 fields) and the header row.
        if len(data) != 22 or data[6] == "ENTRIESn_hourly":
            continue
        # Emit UNIT (column 1) as the key and ENTRIESn_hourly (column 6) as the value.
        print('{0}\t{1}'.format(data[1], data[6]))
        logging.info("{0}\t{1}".format(data[1], data[6]))


mapper()
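To try the mapper's filtering logic without the grader or the full CSV, here is a minimal sketch. The names `make_row` and `map_lines` are hypothetical helpers introduced only for this illustration, and the sample rows are made-up placeholders rather than real turnstile data. The only thing assumed about the dataset is what the mapper itself relies on: 22 comma-separated fields, with UNIT in column 1 and ENTRIESn_hourly in column 6.

```python
def make_row(unit, entries, n_fields=22):
    """Build a placeholder CSV row (hypothetical helper, not real data)."""
    fields = ["x"] * n_fields
    fields[1] = unit      # UNIT column
    fields[6] = entries   # ENTRIESn_hourly column
    return ",".join(fields)


def map_lines(lines):
    """Yield 'UNIT<tab>ENTRIESn_hourly' for each well-formed data row."""
    for line in lines:
        data = line.strip().split(",")
        # Same filter as the mapper: wrong field count or header row is skipped.
        if len(data) != 22 or data[6] == "ENTRIESn_hourly":
            continue
        yield "{0}\t{1}".format(data[1], data[6])


sample = [
    make_row("UNIT", "ENTRIESn_hourly"),  # header row, skipped
    make_row("R001", "10.0"),
    "too,short,row",                      # malformed row, skipped
    make_row("R002", "5.5"),
]
mapped = list(map_lines(sample))
for pair in mapped:
    print(pair)
```

Taking an iterable of lines instead of reading `sys.stdin` directly makes the same logic easy to check from a plain Python session.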
import sys
import logging

from util import reducer_logfile

logging.basicConfig(filename=reducer_logfile, format='%(message)s',
                    level=logging.INFO, filemode='w')


def reducer():
    '''
    Given the output of the mapper for this exercise, the reducer should PRINT
    (not return) one line per UNIT along with the total number of ENTRIESn_hourly
    over the course of May (which is the duration of our data), separated by a tab.
    An example output row from the reducer might look like this: 'R001\t500625.0'

    You can assume that the input to the reducer is sorted such that all rows
    corresponding to a particular UNIT are grouped together.

    Since you are printing the output of your program, printing a debug
    statement will interfere with the operation of the grader. Instead,
    use the logging module, which we've configured to log to a file printed
    when you click "Test Run". For example:
    logging.info("My debugging message")
    '''
    entries = 0
    old_key = None
    for line in sys.stdin:
        data = line.strip().split("\t")
        # Skip lines that are not a key/value pair.
        if len(data) != 2:
            continue
        key, count = data
        # The key changed: emit the finished UNIT's total and reset the sum.
        if old_key is not None and old_key != key:
            print("{0}\t{1}".format(old_key, entries))
            entries = 0
        old_key = key
        entries += float(count)
    # Emit the total for the last UNIT.
    if old_key is not None:
        print("{0}\t{1}".format(old_key, entries))


reducer()
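The reducer only works because its input arrives grouped by UNIT. The sketch below simulates that sort-then-reduce step locally. It is not the course's grader and not the reducer above: it swaps in `itertools.groupby` from the standard library to do the per-key summing. `reduce_sorted` is a hypothetical name and the mapper output lines are made up for illustration.

```python
from itertools import groupby


def reduce_sorted(pairs):
    """Sum values per key; expects 'key<tab>value' strings grouped by key."""
    split = (p.strip().split("\t") for p in pairs)
    valid = (kv for kv in split if len(kv) == 2)  # drop malformed lines
    # groupby only merges *adjacent* equal keys, like the reducer's old_key check.
    for key, group in groupby(valid, key=lambda kv: kv[0]):
        yield key, sum(float(value) for _, value in group)


# Made-up mapper output, deliberately out of order.
mapper_output = ["R002\t5.0", "R001\t10.0", "R002\t2.5", "R001\t1.0"]

# Sorting stands in for the shuffle/sort step between mapper and reducer.
totals = list(reduce_sorted(sorted(mapper_output)))
for unit, total in totals:
    print("{0}\t{1}".format(unit, total))

# Without sorting, each UNIT is split into several partial sums.
unsorted_totals = list(reduce_sorted(mapper_output))
```

With sorted input there is one total per UNIT. With the unsorted list, each UNIT shows up more than once, which is the failure the docstring's sorted-input assumption rules out.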
---------------------------------------------------------------------------------------------------------------------
Exercise 2 - Ridership by Weather Type