Commit 369c6124 by Anselm Jia

readme

parent 60fd277a
Showing with 21 additions and 10 deletions
-find_averages.py: Given a set of files, finds the average length, number of
-functions, and number of variables.
-index.py: Creates the index file of pairs for the MapReduce jobs.
-most_average.py: Given statistics of the set of files, returns the files most similar
-to those statistics.
-similarity.py: Given a similarity score calculation, returns the files that are
-the most similar.
-split.py: Splits the original C text into individual files.
-util.py: Contains the functions for computing similarity and files metrics.
\ No newline at end of file
+C subdirectory: Contains our programs for finding the most similar file pairs under our previous scoring system as well as splitting C files into files and functions.
+Graphics subdirectory: Contains all graphics from the report.
+Non-code subdirectory: Contains the project proposal file and the presentation slides.
+Python subdirectory: Contains the files used for our final Python analysis
+-collectvariables.py: Pulls all of the unique variables from a code text file.
+-pyfilesplit.py: Splits out the desired number of program files from the raw file.
+-pyfuncsplit.py: Splits out the desired number of functions files from the raw file.
+-topfunctions.py: Find the function names with the lowest mean edit distance from all other function names.
+-toptext.py: Computes the least unique files overall.
+-topvariables.py: Computes the least unique variable names.
+-topvariables_intersection.py: Checks which variable files have the largest mean number of common variables with other files.
+histogram.py: Used to construct the histograms for the report.
+index.py: Creates the index file of tuples for the MapReduce jobs. Methodology detailed in the report.
+plot_graph.py: Used to create the distance plot in the report.
+runsim.sh: Used to send specific files for subset testing when there are other files in the directory so sending the entire directory is inefficient.
+text_to_dict.py: Turns the output from mrjob into a dictionary.
+util.py: Contains the functions for computing similarity and files metrics.
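
The topfunctions.py entry describes ranking function names by their mean edit distance to all other function names, but the commit does not show how that is computed. The following is a minimal sketch of one way to do it, assuming Levenshtein distance as the edit distance and an in-memory list of names. The names edit_distance and rank_by_mean_distance are hypothetical and are not taken from the repository.

```python
from statistics import mean


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev holds the previous row of the dynamic-programming table:
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def rank_by_mean_distance(names):
    """Return (name, mean distance to every other name), lowest mean first."""
    unique = sorted(set(names))  # assumption: duplicates are collapsed
    if len(unique) < 2:
        return [(n, 0.0) for n in unique]
    scored = [
        (n, mean(edit_distance(n, other) for other in unique if other != n))
        for n in unique
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Illustrative input only; not data from the project.
    for name, score in rank_by_mean_distance(["get", "set", "initialize"]):
        print(name, score)
```

The names at the top of the returned list are the least distinctive ones under this measure. The pairwise comparison is quadratic in the number of names, which is presumably why the project distributes the work as MapReduce jobs over an index of pairs.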