"Manipulating numerical arrays of data is a crucial part of machine learning and scientific computing. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. While many of you know how to use this library, we will walk through some of the finer details of how it works.\n",
"\n",
"Let's first create an array:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "df6f2e03",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"d = np.array([1.0,2.0,3.0,4.0])"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "bbdc8acc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4,)"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.shape"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "79f4e50d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(dtype('float64'), 4, 32)"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.dtype, d.size, d.nbytes"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "18e82a35",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4.0"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d[index*sizeinbits]"
]
},
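{
"cell_type": "markdown",
"id": "a7c41e90",
"metadata": {},
"source": [
"Indexing is cheap because every element of the array occupies the same number of bytes. The cell below is a minimal sketch of that idea, assuming a contiguous 1-D array; the names `x`, `i`, and `byte_offset` are illustrative and not part of the walkthrough so far. It computes where element `i` starts in the underlying buffer and reads the same value back from the raw bytes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2f85b13",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative array; any contiguous 1-D float64 array behaves the same way.\n",
"x = np.array([1.0, 2.0, 3.0, 4.0])\n",
"i = 3\n",
"\n",
"# For a contiguous 1-D array, element i starts i * itemsize bytes into the buffer.\n",
"byte_offset = i * x.itemsize\n",
"\n",
"# Reinterpret the bytes at that offset as one float64 and compare with x[i].\n",
"raw = x.tobytes()\n",
"value = np.frombuffer(raw, dtype=x.dtype, count=1, offset=byte_offset)[0]\n",
"print(x.itemsize, x.strides, byte_offset, value, x[i])"
]
},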
{
"cell_type": "markdown",
"id": "73e2d59f",
"metadata": {},
"source": [
"## Precision\n",
"\n",
"We can control these properties much more carefull based on what we want to do with the data. For example, suppose I knew all my numbers were positive and less than 255 (like in an image pixel."
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "30be47da",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(dtype('uint8'), 4, 4)"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = np.array([1,2,3,4], dtype=np.uint8)\n",
"d.dtype, d.size, d.nbytes"
]
},
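{
"cell_type": "markdown",
"id": "e39d7a04",
"metadata": {},
"source": [
"To make the choice of integer type concrete, `np.iinfo` reports the range each integer dtype can hold. The sketch below uses a few dtypes picked for illustration, and also shows what happens when a `uint8` result exceeds 255: array arithmetic wraps around rather than raising an error."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b1f60cd",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Range and per-element size of a few integer dtypes (chosen for illustration).\n",
"for dt in (np.uint8, np.int16, np.int64):\n",
"    info = np.iinfo(dt)\n",
"    print(np.dtype(dt).name, info.min, info.max, np.dtype(dt).itemsize)\n",
"\n",
"# uint8 array arithmetic wraps modulo 256: 250 + 10 is stored as 4.\n",
"a = np.array([250], dtype=np.uint8)\n",
"b = np.array([10], dtype=np.uint8)\n",
"wrapped = a + b\n",
"print(wrapped, wrapped.dtype)"
]
},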
{
"cell_type": "markdown",
"id": "6244208a",
"metadata": {},
"source": [
"This knob is called the \"precision\" of a numerical computing system. Lower precision means that the framework can represent less numerical quantities--it's way more interesting for decimal numbers. Let's see how this works."
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "e0967c07",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.53846154]\n",
"[0.5386]\n"
]
}
],
"source": [
"d_full = np.array([7.0/13.0])\n",
"print(d_full)\n",
"\n",
"d_half = np.array([7.0/13.0],dtype=np.half)\n",
"print(d_half)"
]
},
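{
"cell_type": "markdown",
"id": "9d04b7e6",
"metadata": {},
"source": [
"The same comparison can be extended across the floating-point widths NumPy offers. The sketch below is illustrative (the loop and the `errors` dictionary are ours): `np.finfo` reports the bit width and machine epsilon of each type, and we measure how far the stored value of 7/13 drifts from the double-precision value."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f48a2c71",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"value = 7.0 / 13.0  # Python floats are double precision (float64)\n",
"errors = {}\n",
"for dt in (np.float16, np.float32, np.float64):\n",
"    info = np.finfo(dt)\n",
"    name = np.dtype(dt).name\n",
"    stored = np.array([value], dtype=dt)\n",
"    # Absolute error introduced by rounding to this dtype.\n",
"    errors[name] = abs(float(stored[0]) - value)\n",
"    print(name, info.bits, info.eps, errors[name])"
]
},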
{
"cell_type": "markdown",
"id": "72d0a5e7",
"metadata": {},
"source": [
"In cases where the trailing decimal points are not necessary playing with the precision can be useful. Interestingly enough, many machine learning systems now default to reduced precision arithmetic.\n",
"\n",
"\n",
"## Layout\n",
"Beyond precision, we can also consider data layout."
"In computer science, vectorization refers to solutions which allow the application of operations to an entire set of values at once. Such solutions are commonly used in scientific and engineering settings."