Commit dc062b6f by Sanjay Krishnan

hw1 added

parent 9d154ccc
# Homework 1. Introduction to Python and File I/O
This homework assignment is meant to be an introduction to Python programming and introduces some basic concepts of encoding and decoding.
Due Date: *Friday April 15, 2020 11:59 pm*
## Initial Setup
These initial setup instructions assume you've done ``hw0``. Before you start an assignment, you should sync your cloned repository with the online one:
```
$ cd cmsc13600-materials
$ git pull
```
Copy the folder ``hw1`` to your newly cloned submission repository. Enter that repository from the command line and enter the copied ``hw1`` folder. In this homework assignment, you will only modify ``encoding.py``. Once you are done, you must add ``encoding.py`` to git:
```
$ git add encoding.py
```
After adding your files, to submit your code you must run:
```
$ git commit -m"My submission"
$ git push
```
We will NOT grade any code that is not added, committed, and pushed to your submission repository. You can confirm your submission by visiting the [web interface](https://mit.cs.uchicago.edu/cmsc13600-spr-20/skr).
## Delta Encoding
Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files.
In this first assignment, you will implement a delta encoding module in Python.
The module will:
* Load a file of integers
* Delta encode them
* Write back a file in binary form
The instructions in this assignment are purposefully incomplete so that you read Python's API documentation and work out how the different functions behave. Every part that you need to write is marked with *TODO*.
## TODO 1. Loading the data file
In `encoding.py`, your first task is to write `load_orig_file`. This function reads from a specified filename and returns a list of integers in the file. You may assume the file is formatted like ``data.txt`` provided with the code, where each line contains a single integer number. The input of this function is a filename and the output is a list of numbers. If the file does not exist you must raise an exception.
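As an illustration of the file-reading mechanics described above, here is a minimal sketch. The helper name `read_ints` and the throwaway file names are hypothetical and not part of the provided code; the sketch assumes one integer per line and relies on `open` raising `FileNotFoundError` on its own when the file is missing.

```python
import os
import tempfile


def read_ints(path):
    """Return the integers stored one per line in the file at `path`.

    Hypothetical helper for illustration only.
    """
    # open() raises FileNotFoundError by itself when the file is missing.
    with open(path) as f:
        # int() tolerates surrounding whitespace, including the newline;
        # blank lines are skipped.
        return [int(line) for line in f if line.strip()]


# Demonstration on a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w') as f:
        f.write('107\n105\n106\n')

    values = read_ints(sample)
    print(values)  # [107, 105, 106]

    missing_error = None
    try:
        read_ints(os.path.join(tmp, 'missing.txt'))
    except FileNotFoundError as err:
        missing_error = err
        print('raised:', type(err).__name__)
```

Whether you let the built-in exception propagate or raise one of your own is a design choice; either way the caller sees an exception when the file does not exist.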
## TODO 2. Compute the basic encoding
In `encoding.py`, your next task is to write `delta_encoding`. This function takes a list of numbers and computes the delta encoding. The delta encoding encodes the list in terms of successive differences from the previous element. The first element is kept as is in the encoding.
For example:
```
> data = [1,3,4,3]
> enc = delta_encoding(data)
1,2,1,-1
```
Or,
```
> data = [1,0,6,1]
> enc = delta_encoding(data)
1,-1,6,-5
```
Your job is to write a function that computes this encoding. Pay close attention to how Python passes around references and where you make copies of lists vs. modify a list in place.
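The difference between sharing a reference and copying a list can be seen in a small sketch. The function names `add_one_in_place` and `add_one_copy` are made up for this illustration and do not appear in `encoding.py`.

```python
def add_one_in_place(values):
    """Modify the caller's list directly; no copy is made."""
    for i in range(len(values)):
        values[i] += 1
    return values


def add_one_copy(values):
    """Build and return a new list; the argument is left untouched."""
    return [v + 1 for v in values]


original = [1, 3, 4, 3]
alias = original            # a second name for the same list object
snapshot = list(original)   # a shallow copy: an independent list

add_one_in_place(alias)
print(original)   # [2, 4, 5, 4]  (changed through the alias)
print(snapshot)   # [1, 3, 4, 3]  (the copy is unaffected)

result = add_one_copy(snapshot)
print(snapshot)   # [1, 3, 4, 3]
print(result)     # [2, 4, 5, 4]
```

If an encoding function overwrites its input while it still needs the earlier values, the differences it computes will be wrong, which is why the distinction matters here.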
## TODO 3. Integer Shifting
When we write this data to a file, we will want to represent each encoded value as an unsigned 8-bit integer (a single byte of data, which can hold values from 0 to 255). To do so, we have to "shift" all of the values upwards so there are no negatives. You will write a function `shift` that adds a pre-specified offset to each value.
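To see why the shift is needed, the sketch below tries to pack a list containing a negative delta into a `bytearray`. The offset of 4 and the helper `fits_in_byte` are assumptions chosen for this sample, not values prescribed by the assignment.

```python
def fits_in_byte(values):
    """Return True if every value can be stored in one unsigned byte."""
    return all(0 <= v <= 255 for v in values)


deltas = [1, 2, 1, -1]
OFFSET = 4  # hypothetical offset, large enough for this sample

shifted = [d + OFFSET for d in deltas]
print(fits_in_byte(deltas))    # False
print(fits_in_byte(shifted))   # True

rejected = False
try:
    bytearray(deltas)          # -1 is outside range(0, 256)
except ValueError as err:
    rejected = True
    print('bytearray rejected the raw deltas:', err)

packed = bytearray(shifted)
print(list(packed))            # [5, 6, 5, 3]
```

The same offset must be subtracted again after reading the bytes back, otherwise the decoded values will be off.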
## TODO 4. Write Encoding
Now, we are ready to write the encoded data to disk. In the function `write_encoding`, you will do the following steps:
* Open the specified filename in the function arguments for writing
* Convert the encoded list of numbers into a bytearray
* Write the bytearray to the file
* Close the file
Reading from such a file is a little tricky, so we've provided that function for you.
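The steps listed above can be sketched as follows, assuming the values have already been shifted into the range 0 to 255. The file name `encoded.bin` is hypothetical, and the read at the end is a plain binary read for checking purposes, not the provided reading function.

```python
import os
import tempfile

values = [5, 6, 5, 3]   # already shifted into the range 0..255

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'encoded.bin')

    # 'wb' opens the file for binary writing; leaving the with-block
    # closes it, so no explicit close() call is needed.
    with open(path, 'wb') as f:
        f.write(bytearray(values))

    size = os.path.getsize(path)

    # Read the raw bytes back; iterating over bytes yields integers.
    with open(path, 'rb') as f:
        restored = list(f.read())

print(size)       # 4  (one byte per value)
print(restored)   # [5, 6, 5, 3]
```

Calling `f.close()` explicitly, as the steps describe, works just as well; the `with` statement is simply a way to guarantee the close happens even if the write fails.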
## TODO 5. Delta Decoding
Finally, you will write a function that takes a delta encoded list and recovers the original data. This should do the opposite of what you did before. Don't forget to unshift the data when you are testing!
For example:
```
> enc = [1,2,1,-1]
> data = delta_decoding(enc)
1,3,4,3
```
Or,
```
> enc = [1,-1,6,-5]
> data = delta_decoding(enc)
1,0,6,1
```
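One way to sanity-check your own functions is to compare them against an independent reference. The sketch below is such a reference under the assumption of a non-empty list; it uses `zip` and `itertools.accumulate` and is not meant to be the expected implementation of `delta_encoding` or `delta_decoding`.

```python
from itertools import accumulate

data = [1, 0, 6, 1]

# Differences between neighbours, with the first element kept as is.
deltas = [data[0]] + [b - a for a, b in zip(data, data[1:])]
print(deltas)      # [1, -1, 6, -5]

# A running sum of the deltas rebuilds the original values.
recovered = list(accumulate(deltas))
print(recovered)   # [1, 0, 6, 1]

# The deltas telescope, so their total equals the last original value.
print(sum(deltas) == data[-1])   # True
```

The last line shows the same property that the test for TODO 2 relies on when it compares `sum(encoded)` with `data[-1]`.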
import random

from encoding import *


def test_load():
    # TODO 1: the values loaded from data.txt must add up to the known total.
    data = load_orig_file('data.txt')
    try:
        assert(sum(data) == 1778744)
    except AssertionError:
        print('TODO 1. Failure check your load_orig_file function')


def test_encoding():
    # TODO 2: the deltas must sum to the last value and keep the length.
    data = load_orig_file('data.txt')
    encoded = delta_encoding(data)
    try:
        assert(sum(encoded) == data[-1])
        assert(sum(encoded) == 26)
        assert(len(data) == len(encoded))
    except AssertionError:
        print('TODO 2. Failure check your delta_encoding function')


def test_shift():
    # TODO 3: shifting adds the offset to every value and removes negatives.
    data = load_orig_file('data.txt')
    encoded = delta_encoding(data)
    N = len(data)
    try:
        assert(sum(shift(data, 10)) == N*10 + sum(data))
        assert(all([d >= 0 for d in shift(encoded, 4)]))
    except AssertionError:
        print('TODO 3. Failure check your shift function')


def test_decoding():
    # TODO 5: encode, shift, unshift, and decode must give back the input.
    data = load_orig_file('data.txt')
    encoded = delta_encoding(data)
    sencoded = shift(encoded, 4)
    data_p = delta_decoding(unshift(sencoded, 4))
    try:
        assert(data == data_p)
    except AssertionError:
        print('TODO 5. Cannot recover data with delta_decoding')


def generate_file(size, seed):
    # Write a random walk of `size` integers starting at `seed`,
    # one per line, with steps between -4 and 4.
    FILE_NAME = 'data.gen.txt'
    initial = seed
    # The with-block closes the file so the data is flushed before it is read.
    with open(FILE_NAME, 'w') as f:
        for i in range(size):
            f.write(str(initial) + '\n')
            initial += random.randint(-4, 4)


def generate_random_tests():
    # Run 10 round trips through the file on randomly generated inputs.
    SIZES = (1, 1000, 16, 99)
    SEEDS = (240, -3, 9, 1)
    cnt = 0
    for trials in range(10):
        generate_file(random.choice(SIZES), random.choice(SEEDS))
        data = load_orig_file('data.gen.txt')
        encoded = delta_encoding(data)
        sencoded = shift(encoded, 4)
        write_encoding(sencoded, 'data_out.txt')
        loaded = unshift(read_encoding('data_out.txt'), 4)
        decoded = delta_decoding(loaded)
        cnt += (decoded == data)
    try:
        assert(cnt == 10)
    except AssertionError:
        print('Failed Random Tests', str(10-cnt), 'out of 10')


test_load()
test_encoding()
test_shift()
test_decoding()
generate_random_tests()
\ No newline at end of file
107
105
106
103
105
108
109
112
110
110
109
111
109
106
110
112
115
114
116
118
117
116
113
112
115
111
114
118
122
124
127
125
123
124
126
128
130
134
131
133
129
129
129
130
127
130
130
128
131
131
129
126
125
124
121
117
118
120
117
117
118
116
118
120
124
123
122
125
124
123
123
126
126
124
120
117
117
118
116
118
115
115
111
115
114
115
115
115
114
118
115
116
116
116
117
114
111
113
117
116
119
115
115
114
111
113
115
111
110
114
113
115
119
122
122
121
125
126
130
126
128
131
131
132
134
133
135
131
130
132
134
134
138
134
131
131
133
136
134
133
129
133
131
130
126
127
125
125
127
130
128
130
133
136
139
141
142
138
142
142
143
145
149
153
152
149
145
148
144
147
147
148
147
150
152
148
152
152
152
150
152
153
155
159
158
162
164
160
162
166
167
165
165
168
172
171
170
170
173
175
175
173
172
173
173
176
177
175
172
175
174
174
171
172
168
169
165
164
163
162
163
161
161
161
158
159
163
162
158
162
160
160
163
161
159
163
163
164
165
168
165
169
166
164
160
161
165
168
166
167
167
171
167
170
173
172
175
178
180
177
173
177
180
178
182
186
182
179
180
182
180
177
177
174
175
178
179
175
173
173
170
173
171
169
166
164
166
169
165
161
162
160
162
162
159
161
165
161
163
167
163
159
160
156
154
154
156
153
151
147
144
144
140
141
144
147
143
144
146
147
146
147
150
150
151
149
150
147
147
148
148
149
148
148
144
147
144
145
144
147
143
142
142
144
140
142
142
141
143
141
143
145
144
148
150
146
144
148
146
143
140
144
148
146
143
143
139
142
145
146
145
142
142
144
141
143
145
142
140
141
138
139
136
137
136
134
138
142
140
138
138
139
137
136
133
136
137
133
137
137
133
132
131
135
131
128
128
132
136
132
134
130
128
126
122
120
117
117
120
120
123
126
130
126
129
128
125
127
124
124
125
122
123
125
124
125
129
127
131
132
128
127
130
128
125
126
130
132
128
132
133
137
133
133
131
129
130
126
122
124
120
123
121
119
119
122
125
125
128
125
123
126
126
124
124
120
124
121
120
121
120
120
121
117
114
110
114
112
108
104
102
106
104
108
104
105
105
108
111
114
111
113
114
112
113
116
119
120
120
121
117
114
113
112
115
113
115
111
113
113
109
110
110
109
108
110
106
104
107
107
107
111
109
109
110
108
110
110
110
108
106
108
105
104
106
108
111
114
114
117
117
113
111
110
111
112
114
117
117
121
117
115
116
120
116
112
110
106
108
111
107
110
111
107
104
106
107
111
111
112
116
118
120
123
121
119
116
112
112
116
119
118
116
112
111
115
113
112
110
113
112
114
115
111
109
110
106
109
110
107
109
110
113
115
119
115
116
118
119
118
116
119
115
115
113
113
111
109
112
108
106
103
102
101
104
108
105
102
101
98
98
96
92
93
89
89
91
89
88
90
93
89
85
84
84
86
89
89
89
86
87
87
83
85
86
85
83
87
83
80
80