added notes

Commit f9545c7b authored May 27, 2021 by Sanjay Krishnan
Showing with 581 additions and 0 deletions
inclass/Array Programming and Vectorization.ipynb
--- a/inclass/Array Programming and Vectorization.ipynb
+++ b/inclass/Array Programming and Vectorization.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b83f7ada",
+   "metadata": {},
+   "source": [
+    "# Array and Vector Systems \n",
+    "\n",
+    "Manipulating numerical arrays of data is a crucial part of machine learning and scientific computing. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.  While many of you know how to use this library, we will walk through some of the finer details of how it works.\n",
+    "\n",
+    "Let's first create an array:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "id": "df6f2e03",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "d = np.array([1.0,2.0,3.0,4.0])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 63,
+   "id": "bbdc8acc",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(4,)"
+      ]
+     },
+     "execution_count": 63,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "d.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "79f4e50d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(dtype('float64'), 4, 32)"
+      ]
+     },
+     "execution_count": 51,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "d.dtype, d.size, d.nbytes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 66,
+   "id": "18e82a35",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "4.0"
+      ]
+     },
+     "execution_count": 66,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Element i sits i * d.itemsize bytes past the start of the buffer; index 3 is the last element\n",
+    "d[3]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "73e2d59f",
+   "metadata": {},
+   "source": [
+    "## Precision\n",
+    "\n",
+    "We can control these properties much more carefully, depending on what we want to do with the data. For example, suppose I knew all my numbers were non-negative integers no larger than 255 (like the pixel values of an 8-bit image)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 67,
+   "id": "30be47da",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(dtype('uint8'), 4, 4)"
+      ]
+     },
+     "execution_count": 67,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "d = np.array([1,2,3,4], dtype=np.uint8)\n",
+    "d.dtype, d.size, d.nbytes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6244208a",
+   "metadata": {},
+   "source": [
+    "This knob is called the \"precision\" of a numerical computing system. Lower precision means that the framework can represent fewer distinct numerical values. The effect is far more interesting for decimal (floating-point) numbers. Let's see how this works."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "e0967c07",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[0.53846154]\n",
+      "[0.5386]\n"
+     ]
+    }
+   ],
+   "source": [
+    "d_full = np.array([7.0/13.0])\n",
+    "print(d_full)\n",
+    "\n",
+    "d_half = np.array([7.0/13.0],dtype=np.half)\n",
+    "print(d_half)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "72d0a5e7",
+   "metadata": {},
+   "source": [
+    "In cases where the trailing decimal digits are not necessary, reducing the precision can be useful. Interestingly enough, many machine learning systems now default to reduced-precision arithmetic.\n",
+    "\n",
+    "\n",
+    "## Layout\n",
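+    "Beyond precision, we can also consider data layout, that is, the order in which the elements of a multi-dimensional array are stored in memory. By default NumPy uses row-major (C) order, so summing a row reads adjacent memory while summing a column strides across it."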
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "id": "fcb29a83",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "matrix([[1, 2],\n",
+       "        [3, 4]])"
+      ]
+     },
+     "execution_count": 68,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "m = np.matrix([[1,2],[3,4]])\n",
+    "m"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "id": "d28d2e2d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "  C_CONTIGUOUS : True\n",
+       "  F_CONTIGUOUS : False\n",
+       "  OWNDATA : False\n",
+       "  WRITEABLE : True\n",
+       "  ALIGNED : True\n",
+       "  WRITEBACKIFCOPY : False\n",
+       "  UPDATEIFCOPY : False"
+      ]
+     },
+     "execution_count": 69,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "m.flags"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "id": "26964f80",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "m1 = np.random.randn(1000,1000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "id": "9d496b35",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 106 µs, sys: 233 µs, total: 339 µs\n",
+      "Wall time: 347 µs\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "-22.20328161673949"
+      ]
+     },
+     "execution_count": 60,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "m1[0,:].sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "a614b5de",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 489 µs, sys: 245 µs, total: 734 µs\n",
+      "Wall time: 805 µs\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "-10.074603830504874"
+      ]
+     },
+     "execution_count": 61,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "m1[:,0].sum()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11a05407",
+   "metadata": {},
+   "source": [
+    "## Vectorization\n",
+    "In computer science, vectorization refers to solutions which allow the application of operations to an entire set of values at once. Such solutions are commonly used in scientific and engineering settings."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "id": "01b93c5f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data1 = np.random.randn(10000000)\n",
+    "data2 = np.random.randn(10000000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "id": "70b6a67a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 8.27 s, sys: 205 ms, total: 8.48 s\n",
+      "Wall time: 10.9 s\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "data3 = np.zeros((10000000))\n",
+    "for i in range(10000000):\n",
+    "    data3[i] = data1[i]*data2[i]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "d3576084",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 23.3 ms, sys: 9.42 ms, total: 32.8 ms\n",
+      "Wall time: 30.5 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "c = data1*data2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "563dec79",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "4999309\n",
+      "CPU times: user 3.27 s, sys: 5.55 ms, total: 3.27 s\n",
+      "Wall time: 3.28 s\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%time \n",
+    "cnt = 0\n",
+    "for i in range(10000000):\n",
+    "    if data1[i] > 0:\n",
+    "        cnt += 1\n",
+    "print(cnt)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "117a5e34",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 12.3 ms, sys: 2.86 ms, total: 15.2 ms\n",
+      "Wall time: 13.6 ms\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "4999309"
+      ]
+     },
+     "execution_count": 46,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "np.count_nonzero(data1 > 0)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 90,
+   "id": "1c0ab3d4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 502 ms, sys: 47.2 ms, total: 549 ms\n",
+      "Wall time: 844 ms\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "3.280983713096637"
+      ]
+     },
+     "execution_count": 90,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "\n",
+    "# calculate the average of all elements greater than or equal to 3\n",
+    "data1 = np.random.randn(10000000)\n",
+    "np.mean(data1[data1 >= 3])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 91,
+   "id": "0248b4a6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 73.7 ms, sys: 55.9 ms, total: 130 ms\n",
+      "Wall time: 173 ms\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "array([[2242602],\n",
+       "       [4274764],\n",
+       "       [4987681]])"
+      ]
+     },
+     "execution_count": 91,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "# find positions where consecutive elements differ by more than 7 in absolute value\n",
+    "np.argwhere(np.abs(data1[1:] - data1[:-1])>7)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 86,
+   "id": "a53037d7",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "matrix([[1, 0, 0],\n",
+       "        [1, 0, 0],\n",
+       "        [0, 1, 0],\n",
+       "        [0, 0, 1]])"
+      ]
+     },
+     "execution_count": 86,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data = np.matrix([[1,0,0],[1,0,0],[0,1,0],[0,0,1]])\n",
+    "data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 87,
+   "id": "e285087a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "matrix([[1, 0, 0],\n",
+       "        [1, 0, 0],\n",
+       "        [1, 1, 0],\n",
+       "        [0, 0, 1]])"
+      ]
+     },
+     "execution_count": 87,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data[:,0] = data[:,0] + data[:,1]\n",
+    "data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 89,
+   "id": "19a949e5",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "matrix([[1, 0],\n",
+       "        [1, 0],\n",
+       "        [1, 0],\n",
+       "        [0, 1]])"
+      ]
+     },
+     "execution_count": 89,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data = np.matrix([[1,0,0],[1,0,0],[0,1,0],[0,0,1]])\n",
+    "data\n",
+    "\n",
+    "m = np.matrix([[1,1,0],[0,0,1]])\n",
+    "\n",
+    "data * m.T"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "284684ec",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#model serving\n",
+    "#service oriented architecture\n",
+    "#key performance indicators"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
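
A note on the Layout section of the notebook: the row-versus-column timing gap comes from how NumPy turns an index into a byte offset. The sketch below is not part of the commit; it is a minimal illustration that assumes a small float64 array `a` and a hypothetical helper `byte_offset`, and it relies only on the standard `strides`, `tobytes`, `frombuffer`, `asfortranarray`, and `flags` facilities.

```python
import numpy as np

# A small C-ordered (row-major) float64 matrix, standing in for m1 in the notebook.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# strides[k] is the number of bytes to skip to advance one step along axis k.
# Here the next row is 4 elements * 8 bytes away and the next column is 8 bytes away.
row_stride, col_stride = a.strides


def byte_offset(i, j):
    """Byte offset of a[i, j] from the start of the buffer (hypothetical helper)."""
    return i * row_stride + j * col_stride


# Read element (2, 1) directly from the raw bytes to check the arithmetic.
raw = a.tobytes()  # C-order copy of the underlying data
value = np.frombuffer(raw, dtype=np.float64, count=1, offset=byte_offset(2, 1))[0]

print(a.strides)          # (32, 8)
print(value == a[2, 1])   # True

# The same values in column-major (Fortran) order swap which axis is adjacent in memory.
f = np.asfortranarray(a)
print(f.strides)          # (8, 24)

# In C order a row slice is one contiguous run of memory; a column slice is not.
print(a[0, :].flags['C_CONTIGUOUS'], a[:, 0].flags['C_CONTIGUOUS'])  # True False
```

Under these assumptions, `m1[0,:].sum()` walks adjacent bytes while `m1[:,0].sum()` jumps a full row's worth of bytes between elements, which is consistent with the slower column timing recorded in the notebook. Single `%%time` runs at the microsecond scale are noisy, so the exact ratio should not be read too closely.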