Commit dc062b6f by Sanjay Krishnan

hw1 added

parent 9d154ccc
# Homework 1. Introduction to Python and File I/O
This homework assignment is meant to be an introduction to Python programming and introduces some basic concepts of encoding and decoding.
Due Date: *Friday April 15, 2020 11:59 pm*
## Initial Setup
These initial setup instructions assume you've done ``hw0``. Before you start an assignment, you should sync your cloned repository with the online one:
```
$ cd cmsc13600-materials
$ git pull
```
Copy the folder ``hw1`` to your newly cloned submission repository. Enter that repository from the command line and enter the copied ``hw1`` folder. In this homework assignment, you will only modify ``encoding.py``. Once you are done, you must add ``encoding.py`` to git:
```
$ git add encoding.py
```
After adding your files, to submit your code you must run:
```
$ git commit -m"My submission"
$ git push
```
We will NOT grade any code that is not added, committed, and pushed to your submission repository. You can confirm your submission by visiting the [web interface](https://mit.cs.uchicago.edu/cmsc13600-spr-20/skr).
## Delta Encoding
Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files.
In this first assignment, you will implement a delta encoding module in Python.
The module will:
* Load a file of integers
* Delta encode them
* Write back a file in binary form
The instructions in this assignment are purposefully incomplete so that you read Python's API documentation and work out how the different functions behave. Every part that you need to write is marked with *TODO*.
## TODO 1. Loading the data file
In `encoding.py`, your first task is to write `load_orig_file`. This function reads from a specified filename and returns a list of integers in the file. You may assume the file is formatted like ``data.txt`` provided with the code, where each line contains a single integer number. The input of this function is a filename and the output is a list of numbers. If the file does not exist you must raise an exception.
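As a rough illustration of the file-reading pattern (not the required implementation), the sketch below parses one integer per line. The helper name `read_int_lines` and the temporary sample file are hypothetical. It relies on the built-in `open` raising `FileNotFoundError` when the file does not exist.

```python
import os
import tempfile


def read_int_lines(path):
    """Return the integers stored one per line in `path` (hypothetical helper)."""
    # open() raises FileNotFoundError on its own if `path` does not exist.
    with open(path) as f:
        # Skip blank lines so a trailing empty line does not break int().
        return [int(line) for line in f if line.strip()]


# Write a small sample file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("5\n7\n-2\n")
    numbers = read_int_lines(sample)

    # A missing file surfaces as an exception rather than an empty list.
    missing_raises = False
    try:
        read_int_lines(os.path.join(tmp, "no_such_file.txt"))
    except FileNotFoundError:
        missing_raises = True
```

Using a `with` block means the file is closed even if parsing fails partway through.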
## TODO 2. Compute the basic encoding
In `encoding.py`, your next task is to write `delta_encoding`. This function takes a list of numbers and computes the delta encoding. The delta encoding encodes the list in terms of successive differences from the previous element. The first element is kept as is in the encoding.
For example:
```
> data = [1,3,4,3]
> enc = delta_encoding(data)
1,2,1,-1
```
Or,
```
> data = [1,0,6,1]
> enc = delta_encoding(data)
1,-1,6,-5
```
Your job is to write a function that computes this encoding. Pay close attention to how Python passes around references, and to where you make a copy of a list vs. modify a list in place.
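To make the point about references concrete, here is a small sketch that is unrelated to the encoding itself. The two helper functions are hypothetical and exist only to contrast in-place mutation with building a new list.

```python
def add_one_in_place(values):
    """Mutates the caller's list; no copy is made."""
    for i in range(len(values)):
        values[i] += 1
    return values


def add_one_copy(values):
    """Builds and returns a new list; the argument is left untouched."""
    return [v + 1 for v in values]


original = [1, 3, 4, 3]
alias = original            # same list object, not a copy
snapshot = list(original)   # shallow copy, independent of `original`

add_one_in_place(original)
# `original` and `alias` both changed, because they are one object.
# `snapshot` still holds the old values.

result = add_one_copy(snapshot)
# `snapshot` is unchanged; `result` is a separate list.
```

If a function like the first one is handed the list you still need later (for example, the original data you want to compare against), the comparison will silently use the modified values.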
## TODO 3. Integer Shifting
When we write this data to a file, we will want to represent each encoded value as an unsigned 8-bit integer (a single byte of data, holding values from 0 to 255). To do so, we have to "shift" all of the values upwards so there are no negatives. You will write a function `shift` that adds a pre-specified offset to each value.
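The sketch below illustrates why the offset is needed: Python refuses to build a `bytearray` from values outside 0 to 255. The sample deltas and the offset of 4 are assumptions chosen to match the range used by the provided tests.

```python
deltas = [1, -1, 4, -4]   # example deltas in the range -4..4
OFFSET = 4                # assumed offset, matching the provided tests

# A negative value cannot be stored in an unsigned byte.
try:
    bytearray(deltas)
    negative_ok = True
except ValueError:
    negative_ok = False

# After adding the offset, every value lies in 0..8 and fits in one byte.
shifted = [d + OFFSET for d in deltas]
packed = bytearray(shifted)

# Subtracting the same offset recovers the original deltas.
restored = [b - OFFSET for b in packed]
```

The same limit applies at the top of the range: a shifted value above 255 would also be rejected, so the offset only works when the values are known to be small.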
## TODO 4. Write Encoding
Now, we are ready to write the encoded data to disk. In the function `write_encoding`, you will do the following steps:
* Open the specified filename in the function arguments for writing
* Convert the encoded list of numbers into a bytearray
* Write the bytearray to the file
* Close the file
Reading from such a file is a little tricky, so we've provided that function for you.
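The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the required `write_encoding`: the file name `encoded.bin` is hypothetical, and the values are assumed to be shifted into the 0 to 255 range already. The read-back uses the same `struct.unpack` pattern as the provided `read_encoding`.

```python
import os
import struct
import tempfile

values = [5, 3, 8, 0]  # already shifted, so each fits in 0..255

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "encoded.bin")  # hypothetical file name

    # 'wb' opens the file for binary writing; the with-block closes it.
    with open(path, "wb") as f:
        f.write(bytearray(values))

    # One byte per value ends up on disk.
    size_on_disk = os.path.getsize(path)

    # Read back the raw bytes and unpack one unsigned byte ("B") per value.
    with open(path, "rb") as f:
        raw = f.read()
    unpacked = struct.unpack("B" * len(raw), raw)
```

Note that `struct.unpack` returns a tuple, not a list, which matters if you later compare the result to a list with `==`.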
## TODO 5. Delta Decoding
Finally, you will write a function that takes a delta encoded list and recovers the original data. This should do the opposite of what you did before. Don't forget to unshift the data when you are testing!
For example:
```
> enc = [1,2,1,-1]
> data = delta_decoding(enc)
1,3,4,3
```
Or,
```
> enc = [1,-1,6,-5]
> data = delta_decoding(enc)
1,0,6,1
```
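One way to think about decoding is as a running sum: each original value equals the sum of all deltas up to that position. The sketch below uses `itertools.accumulate` from the standard library to show this on the first example. It is one possible approach, not necessarily the one the assignment intends.

```python
from itertools import accumulate

enc = [1, 2, 1, -1]

# accumulate yields the running totals: 1, 1+2, 1+2+1, 1+2+1-1
dec = list(accumulate(enc))

# The sum of all deltas equals the last original value.
total = sum(enc)
```

The last observation is the same property the provided test checks with `sum(encoded) == data[-1]`.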
import random
from encoding import *
def test_load():
    data = load_orig_file('data.txt')
    try:
        assert(sum(data) == 1778744)
    except AssertionError:
        print('TODO 1. Failure check your load_orig_file function')
def test_encoding():
    data = load_orig_file('data.txt')
    encoded = delta_encoding(data)
    try:
        assert(sum(encoded) == data[-1])
        assert(sum(encoded) == 26)
        assert(len(data) == len(encoded))
    except AssertionError:
        print('TODO 2. Failure check your delta_encoding function')
def test_shift():
    data = load_orig_file('data.txt')
    encoded = delta_encoding(data)
    N = len(data)
    try:
        assert(sum(shift(data, 10)) == N*10 + sum(data))
        assert(all([d >= 0 for d in shift(encoded, 4)]))
    except AssertionError:
        print('TODO 3. Failure check your shift function')
def test_decoding():
    data = load_orig_file('data.txt')
    encoded = delta_encoding(data)
    sencoded = shift(encoded, 4)
    data_p = delta_decoding(unshift(sencoded, 4))
    try:
        assert(data == data_p)
    except AssertionError:
        print('TODO 5. Cannot recover data with delta_decoding')
def generate_file(size, seed):
    FILE_NAME = 'data.gen.txt'
    initial = seed
    # Use a with-block so the file is flushed and closed before it is read back
    with open(FILE_NAME, 'w') as f:
        for i in range(size):
            f.write(str(initial) + '\n')
            initial += random.randint(-4, 4)
def generate_random_tests():
    SIZES = (1, 1000, 16, 99)
    SEEDS = (240, -3, 9, 1)
    cnt = 0
    for trials in range(10):
        generate_file(random.choice(SIZES), random.choice(SEEDS))
        data = load_orig_file('data.gen.txt')
        encoded = delta_encoding(data)
        sencoded = shift(encoded, 4)
        write_encoding(sencoded, 'data_out.txt')
        loaded = unshift(read_encoding('data_out.txt'), 4)
        decoded = delta_decoding(loaded)
        cnt += (decoded == data)
    try:
        assert(cnt == 10)
    except AssertionError:
        print('Failed Random Tests', str(10-cnt), 'out of 10')
test_load()
test_encoding()
test_shift()
test_decoding()
generate_random_tests()
\ No newline at end of file
'''encoding.py provides utilities for compressing a file of data
using a 'delta' encoding (the change from the previous element).
'''
import struct
#TODO 1. Write a function that loads the input data
#and returns a list of numbers
def load_orig_file(in_filename):
    '''load_orig_file takes an input file and returns a list of
    numbers. The file is formatted with a single integer
    number on each line
    '''
    raise ValueError('Not implemented')
#TODO 2. Write a function that performs the delta encoding
def delta_encoding(data):
    '''delta_encoding takes a list of integers and performs
    a delta encoding, representing each element as a difference
    from the previous one. The first element is kept as is.
    delta_encoding returns a list where the first
    element is the original value and all the rest of the
    elements are deltas from the preceding value.
    '''
    raise ValueError('Not implemented')
#TODO 3. Apply a shift to all the elements so none of the deltas are negative
def shift(data, offset):
    '''shift adds 'offset' to all of the elements to ensure that
    every value in the delta encoding is non-negative.
    '''
    raise ValueError('Not implemented')
#GIVEN, should be obvious what it does
def unshift(data, offset):
    return shift(data, -offset)
#TODO 4. Convert the encoded data into a byte array and write
#to disk.
def write_encoding(data, out_file):
    raise ValueError('Not implemented')
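#GIVEN, read encoded file
def read_encoding(out_file):
    # Read the raw bytes; the with-block closes the file when done
    with open(out_file, 'rb') as f:
        encoded = f.read()
    # Unpack one unsigned byte ("B") per value; returns a tuple of ints
    return struct.unpack("B"*len(encoded), encoded)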
#TODO 5. Write a function that performs the delta decoding
def delta_decoding(data):
    '''delta_decoding takes a delta encoded list and returns
    the original data.
    '''
    raise ValueError('Not implemented')
\ No newline at end of file