{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 4: Python Dask Lab\n", "\n", "*Due May 7th, 2021 11:59 PM*\n", "\n", "Dask is an open-source library for parallel computing written in Python. We will use Dask over the next few weeks to illustrate the basics of parallel and distributed computation. This homework assignment will walk you through some of the basic syntax of Dask.\n", "\n", "It is your job to read the documentation and figure out how to do each step on your own. You are responsible for adding code in every \"FILL IN HERE\" statement below.\n", "\n", "## Installing Dask\n", "To get started, you need to install the Dask packages. If you are using `pip`:\n", "```\n", "pip install dask\n", "pip install \"dask[distributed]\"\n", "```\n", "If you are using `conda`:\n", "```\n", "conda install numpy pandas h5py pillow matplotlib scipy toolz pytables snakeviz scikit-image dask distributed -c conda-forge\n", "```\n", "Let us know if you have any difficulties installing Dask.\n", "\n", "\n", "## Exercise 1. Loading Data Sets\n", "\n", "We've given you a sample dataset of flights departing the New York City airports JFK, LGA, and EWR (arrival, departure, delays, etc.). Dask is similar to pandas in that it exposes a DataFrame interface. Write code below to load the data in `nycflights.csv` into a Dask DataFrame." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import dask.dataframe as dd\n", "df = #FILL IN HERE" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | 6 | 30 | 940 | 15 | 1216 | -4 | VX | N626VA | 407 | JFK | LAX | 313 | 2475 | 9 | 40 |
| 1 | 2013 | 5 | 7 | 1657 | -3 | 2104 | 10 | DL | N3760C | 329 | JFK | SJU | 216 | 1598 | 16 | 57 |
| 2 | 2013 | 12 | 8 | 859 | -1 | 1238 | 11 | DL | N712TW | 422 | JFK | LAX | 376 | 2475 | 8 | 59 |
| 3 | 2013 | 5 | 14 | 1841 | -4 | 2122 | -34 | DL | N914DL | 2391 | JFK | TPA | 135 | 1005 | 18 | 41 |
| 4 | 2013 | 7 | 21 | 1102 | -3 | 1230 | -8 | 9E | N823AY | 3652 | LGA | ORF | 50 | 296 | 11 | 2 |

| | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2013 | 5 | 7 | 1657 | -3 | 2104 | 10 | DL | N3760C | 329 | JFK | SJU | 216 | 1598 | 16 | 57 |
| 2 | 2013 | 12 | 8 | 859 | -1 | 1238 | 11 | DL | N712TW | 422 | JFK | LAX | 376 | 2475 | 8 | 59 |
| 5 | 2013 | 1 | 1 | 1817 | -3 | 2008 | 3 | AA | N3AXAA | 353 | LGA | ORD | 138 | 733 | 18 | 17 |
| 6 | 2013 | 12 | 9 | 1259 | 14 | 1617 | 22 | WN | N218WN | 1428 | EWR | HOU | 240 | 1411 | 12 | 59 |
| 7 | 2013 | 8 | 13 | 1920 | 85 | 2032 | 71 | B6 | N284JB | 1407 | JFK | IAD | 48 | 228 | 19 | 20 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32726 | 2013 | 2 | 4 | 1558 | -2 | 1854 | 4 | DL | N3737C | 1331 | JFK | DEN | 238 | 1626 | 15 | 58 |
| 32728 | 2013 | 7 | 13 | 1923 | 18 | 2124 | 18 | 9E | N922XJ | 3525 | JFK | ORD | 107 | 740 | 19 | 23 |
| 32729 | 2013 | 1 | 28 | 706 | 36 | 909 | 22 | EV | N13914 | 4419 | EWR | IND | 105 | 645 | 7 | 6 |
| 32731 | 2013 | 7 | 7 | 812 | -3 | 1043 | 8 | DL | N6713Y | 1429 | JFK | LAS | 286 | 2248 | 8 | 12 |
| 32733 | 2013 | 10 | 15 | 844 | 56 | 1045 | 60 | B6 | N258JB | 1273 | JFK | CHS | 93 | 636 | 8 | 44 |

13462 rows × 16 columns