Collections Module Day 3 of 3
Day 6 of 100
After implementing a solution to the challenge yesterday, I went back, followed the tutorial more closely, and used the suggested defaultdict, namedtuple, and Counter tools. The result was clearly a simpler, more elegant solution. Once I looked at the example provided, it became clearer to me how to use the tools, and I think they will be useful in the future, as long as I can remember to use them...
from collections import defaultdict, namedtuple, Counter
import csv
from urllib.request import urlretrieve

movie_data = 'https://raw.githubusercontent.com/pybites/challenges/solutions/13/movie_metadata.csv'
movies_csv = 'movies.csv'
urlretrieve(movie_data, movies_csv)

Movie = namedtuple('Movie', 'title year score')


def get_movies_by_director(data=movies_csv):
    """Extracts all the movies from csv and stores them in a dictionary
    where keys are directors, and values is a list of movies (named tuples)"""
    directors = defaultdict(list)
    with open(data, encoding='utf-8') as f:
        for line in csv.DictReader(f):
            try:
                director = line['director_name']
                movie = line['movie_title'].replace('\xa0', '')
                year = int(line['title_year'])
                score = float(line['imdb_score'])
            except ValueError:
                continue
            m = Movie(title=movie, year=year, score=score)
            directors[director].append(m)
    return directors


directors = get_movies_by_director()

cnt = Counter()
for director, movies in directors.items():
    cnt[director] += len(movies)

print(cnt.most_common(5))
Which outputs:
[('Steven Spielberg', 26),
('Woody Allen', 22),
('Martin Scorsese', 20),
('Clint Eastwood', 20),
('Ridley Scott', 17)]
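For anyone who wants to try the same pattern without downloading the file, here is a minimal sketch that runs on a few in-memory rows. The sample rows, the SAMPLE_CSV name, and the group_by_director helper are all made up for illustration and are not part of the tutorial. It also shows one possible shortcut: building the Counter in a single step from a dict of counts instead of incrementing it in a loop.

```python
from collections import Counter, defaultdict, namedtuple
import csv
import io

# Hypothetical sample rows standing in for movie_metadata.csv (not real data)
SAMPLE_CSV = """director_name,movie_title,title_year,imdb_score
Director A,First Film,2001,7.1
Director A,Second Film,2005,6.4
Director B,Only Film,2010,8.0
Director B,Undated Film,,5.5
"""

Movie = namedtuple('Movie', 'title year score')


def group_by_director(csv_text):
    """Group rows by director, skipping rows whose year or score cannot be parsed."""
    directors = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            movie = Movie(
                title=row['movie_title'].strip(),
                year=int(row['title_year']),
                score=float(row['imdb_score']),
            )
        except ValueError:
            continue  # e.g. an empty title_year cannot be converted to int
        # defaultdict creates the empty list the first time a director is seen
        directors[row['director_name']].append(movie)
    return directors


directors = group_by_director(SAMPLE_CSV)

# Build the Counter in one step from a dict of director -> number of movies
cnt = Counter({name: len(movies) for name, movies in directors.items()})
print(cnt.most_common(2))  # [('Director A', 2), ('Director B', 1)]
```

The row with the missing year is dropped by the ValueError handler, which is why Director B ends up with one movie instead of two.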
While learning about the collections module has been great, revisiting file I/O and reading and parsing CSV has my wheels turning and has inspired me to start my first side project: a small networking utility that I think will benefit my real-world job. More to come soon!