Commit f9545c7b by Sanjay Krishnan

added notes

parent d13622a7
{
"cells": [
{
"cell_type": "markdown",
"id": "b83f7ada",
"metadata": {},
"source": [
"# Array and Vector Systems \n",
"\n",
"Manipulating numerical arrays of data is a crucial part of machine learning and scientific computing. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. While many of you know how to use this library, we will walk through some of the finer details of how it works.\n",
"\n",
"Let's first create an array:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "df6f2e03",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"d = np.array([1.0,2.0,3.0,4.0])"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "bbdc8acc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4,)"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.shape"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "79f4e50d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(dtype('float64'), 4, 32)"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.dtype, d.size, d.nbytes"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "18e82a35",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4.0"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index = 3  # element i sits i * d.itemsize bytes past the start of the buffer\n",
"d[index]"
]
},
{
"cell_type": "markdown",
"id": "73e2d59f",
"metadata": {},
"source": [
"## Precision\n",
"\n",
"We can control these properties much more carefully based on what we want to do with the data. For example, suppose I knew all my numbers were non-negative integers no larger than 255 (like the pixel values in an image)."
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "30be47da",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(dtype('uint8'), 4, 4)"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = np.array([1,2,3,4], dtype=np.uint8)\n",
"d.dtype, d.size, d.nbytes"
]
},
{
"cell_type": "markdown",
"id": "6244208a",
"metadata": {},
"source": [
"This knob is called the \"precision\" of a numerical computing system. Lower precision means that the framework can represent fewer distinct numerical values. The effect is most interesting for decimal (floating-point) numbers. Let's see how this works."
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "e0967c07",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.53846154]\n",
"[0.5386]\n"
]
}
],
"source": [
"d_full = np.array([7.0/13.0])\n",
"print(d_full)\n",
"\n",
"d_half = np.array([7.0/13.0],dtype=np.half)\n",
"print(d_half)"
]
},
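{
"cell_type": "markdown",
"id": "a1f0c3d2",
"metadata": {},
"source": [
"As a rough sketch of what each floating-point precision can represent, NumPy's `np.finfo` reports the storage width in bits, the machine epsilon (the gap between 1.0 and the next representable number), and the largest finite value of a float type. The loop and the choice of the three types below are illustrative additions, not part of the original notes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2e1d4c3",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# illustrative sketch: compare half, single, and double precision\n",
"for t in (np.float16, np.float32, np.float64):\n",
"    info = np.finfo(t)\n",
"    # name, bits of storage, machine epsilon, largest finite value\n",
"    print(t.__name__, info.bits, info.eps, info.max)"
]
},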
{
"cell_type": "markdown",
"id": "72d0a5e7",
"metadata": {},
"source": [
"In cases where the trailing decimal digits are not necessary, reducing the precision can be useful because each value takes fewer bytes. Interestingly enough, many machine learning systems now default to reduced-precision arithmetic.\n",
"\n",
"\n",
"## Layout\n",
"Beyond precision, we can also consider data layout: the order in which the elements of a multi-dimensional array are stored in memory."
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "fcb29a83",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matrix([[1, 2],\n",
"        [3, 4]])"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m = np.matrix([[1,2],[3,4]])\n",
"m"
]
},
{
"cell_type": "code",
"execution_count": 69,
"id": "d28d2e2d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" C_CONTIGUOUS : True\n",
" F_CONTIGUOUS : False\n",
" OWNDATA : False\n",
" WRITEABLE : True\n",
" ALIGNED : True\n",
" WRITEBACKIFCOPY : False\n",
" UPDATEIFCOPY : False"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m.flags"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "26964f80",
"metadata": {},
"outputs": [],
"source": [
"m1 = np.random.randn(1000,1000)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "9d496b35",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 106 µs, sys: 233 µs, total: 339 µs\n",
"Wall time: 347 µs\n"
]
},
{
"data": {
"text/plain": [
"-22.20328161673949"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# sum along a row: in the default C (row-major) layout these elements are adjacent in memory\n",
"m1[0,:].sum()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "a614b5de",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 489 µs, sys: 245 µs, total: 734 µs\n",
"Wall time: 805 µs\n"
]
},
{
"data": {
"text/plain": [
"-10.074603830504874"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# sum down a column: consecutive elements are a full row (1000 floats) apart in memory\n",
"m1[:,0].sum()"
]
},
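{
"cell_type": "markdown",
"id": "c3d2e5f4",
"metadata": {},
"source": [
"One plausible explanation for the gap between the two timings above is memory layout. NumPy stores arrays in row-major (C) order by default, so the elements of a row sit next to each other while the elements of a column are a full row apart. The `strides` attribute gives the number of bytes to step along each axis. This is a minimal sketch; the array names `a` and `f` are illustrative and not part of the original notes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e3f6a5",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.zeros((1000, 1000))  # C order (row-major) by default\n",
"print(a.strides)            # bytes to step per axis: (next row, next column)\n",
"\n",
"f = np.asfortranarray(a)    # same values, stored column-major\n",
"print(f.strides, f.flags['F_CONTIGUOUS'])"
]
},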
{
"cell_type": "markdown",
"id": "11a05407",
"metadata": {},
"source": [
"## Vectorization\n",
"In computer science, vectorization refers to solutions which apply an operation to an entire set of values at once, rather than looping over the values one at a time. Such solutions are commonly used in scientific and engineering settings."
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "01b93c5f",
"metadata": {},
"outputs": [],
"source": [
"data1 = np.random.randn(10000000)\n",
"data2 = np.random.randn(10000000)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "70b6a67a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 8.27 s, sys: 205 ms, total: 8.48 s\n",
"Wall time: 10.9 s\n"
]
}
],
"source": [
"%%time\n",
"# baseline: element-wise product with an explicit Python loop\n",
"data3 = np.zeros(10000000)\n",
"for i in range(10000000):\n",
"    data3[i] = data1[i]*data2[i]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "d3576084",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 23.3 ms, sys: 9.42 ms, total: 32.8 ms\n",
"Wall time: 30.5 ms\n"
]
}
],
"source": [
"%%time\n",
"# the same element-wise product as a single vectorized operation\n",
"c = data1*data2"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "563dec79",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4999309\n",
"CPU times: user 3.27 s, sys: 5.55 ms, total: 3.27 s\n",
"Wall time: 3.28 s\n"
]
}
],
"source": [
"%%time\n",
"# baseline: count the positive entries with an explicit Python loop\n",
"cnt = 0\n",
"for i in range(10000000):\n",
"    if data1[i] > 0:\n",
"        cnt += 1\n",
"print(cnt)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "117a5e34",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 12.3 ms, sys: 2.86 ms, total: 15.2 ms\n",
"Wall time: 13.6 ms\n"
]
},
{
"data": {
"text/plain": [
"4999309"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# vectorized version: build a boolean mask, then count its True entries\n",
"np.count_nonzero(data1 > 0)"
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "1c0ab3d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 502 ms, sys: 47.2 ms, total: 549 ms\n",
"Wall time: 844 ms\n"
]
},
{
"data": {
"text/plain": [
"3.280983713096637"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"# calculate the average value of all elements that are at least 3\n",
"data1 = np.random.randn(10000000)\n",
"np.mean(data1[data1 >= 3])"
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "0248b4a6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 73.7 ms, sys: 55.9 ms, total: 130 ms\n",
"Wall time: 173 ms\n"
]
},
{
"data": {
"text/plain": [
"array([[2242602],\n",
"       [4274764],\n",
"       [4987681]])"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# find positions where consecutive elements differ by more than 7\n",
"np.argwhere(np.abs(data1[1:] - data1[:-1])>7)"
]
},
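{
"cell_type": "markdown",
"id": "e5f4a7b6",
"metadata": {},
"source": [
"The expression above can be hard to check on ten million random values. A minimal sketch on a small hand-made array (the values in `x` are made up for illustration) shows what it returns. `np.diff(x)` computes the same consecutive differences as `x[1:] - x[:-1]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6a5b8c7",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"x = np.array([0.0, 1.0, 9.0, 2.0])  # made-up values for illustration\n",
"steps = np.diff(x)                   # same as x[1:] - x[:-1]\n",
"# an index i in the result means x[i] and x[i+1] differ by more than 7\n",
"print(np.argwhere(np.abs(steps) > 7))"
]
},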
{
"cell_type": "code",
"execution_count": 86,
"id": "a53037d7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matrix([[1, 0, 0],\n",
"        [1, 0, 0],\n",
"        [0, 1, 0],\n",
"        [0, 0, 1]])"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = np.matrix([[1,0,0],[1,0,0],[0,1,0],[0,0,1]])\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "e285087a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matrix([[1, 0, 0],\n",
"        [1, 0, 0],\n",
"        [1, 1, 0],\n",
"        [0, 0, 1]])"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# fold column 1 into column 0 in place (column 1 itself is left unchanged)\n",
"data[:,0] = data[:,0] + data[:,1]\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": 89,
"id": "19a949e5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matrix([[1, 0],\n",
"        [1, 0],\n",
"        [1, 0],\n",
"        [0, 1]])"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
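],
"source": [
"data = np.matrix([[1,0,0],[1,0,0],[0,1,0],[0,0,1]])\n",
"\n",
"# each row of m lists the input columns that are summed into one output column\n",
"m = np.matrix([[1,1,0],[0,0,1]])\n",
"\n",
"# np.matrix overloads * as matrix multiplication, so this merges columns 0 and 1\n",
"data * m.T"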
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "284684ec",
"metadata": {},
"outputs": [],
"source": [
"#model serving\n",
"#service oriented architecture\n",
"#key performance indicators"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}