Ethan Mertz / CS-123-Final
Commit 454363b4 authored Jun 03, 2018 by Ethan Mertz
pushing latest version

parent be5fa434
Showing 7 changed files with 47 additions and 16 deletions
Python/pyfilesplit.py
Python/toptext.py
Python/topvariables.py
README.txt
find_top_python_variables.py
runsim.sh
util.py
Python/pyfilesplit.py
...
@@ -43,8 +43,8 @@ def go(subset):
            current_file.append(line)
    #Writes out the last file to the directory.
    with open('pydata/pyfile' + str(count) + ".txt", 'w') as new:
        for l in current_file:
            new.write(l)
        for l in current_file:
            new.write(l)

if __name__ == "__main__":
    subset = int(sys.argv[1])
...
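The hunk writes the last accumulated program out to pydata/pyfile<count>.txt. As a minimal sketch of that step in isolation, assuming `current_file` is a list of lines that already end in newlines (the helper name `write_last_file` and its `directory` parameter are hypothetical, not part of the project):

```python
import os
import tempfile


def write_last_file(current_file, count, directory="pydata"):
    """Write the accumulated lines to <directory>/pyfile<count>.txt.

    Hypothetical helper mirroring the hunk's
    open('pydata/pyfile' + str(count) + ".txt", 'w').
    """
    os.makedirs(directory, exist_ok=True)  # create the output folder if needed
    path = os.path.join(directory, "pyfile" + str(count) + ".txt")
    # Every write happens inside the with-block, while the file is still open.
    with open(path, "w") as new:
        new.writelines(current_file)
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_last_file(["x = 1\n", "print(x)\n"], 7, tmp))
```

`writelines` writes each string unchanged and adds no separators, so it is equivalent to the per-line `new.write(l)` loop; keeping the writes inside the `with` block guarantees the file has not been closed yet when they run.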
Python/toptext.py
...
@@ -23,7 +23,10 @@ class TopText(MRJob):
        filename = "pyfile" + file_number + ".txt"
        file_text = filename.read()
        for i in range(int(file1) + 1, int(total)):
            #Get score for body text.
++--
Python/pyfuncsplit.py | 24 +++++-
Python/topfunctions.py | 22 ++++--
Python/toptext.py | 30 ++++--
            #Get score for body text.
            comparison_file = "pyfile" + i + ".txt"
            comparison_text = comparison_file.read()
            ts = -jellyfish.jaro_winkler(filename_text, comparison_text)
...
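Each comparison in this loop calls jellyfish.jaro_winkler, a similarity score between 0 and 1 that counts characters matching within a sliding window, penalizes out-of-order matches, and boosts pairs that share a prefix. The sketch below is a pure-Python version of the standard Jaro-Winkler definition (prefix scale 0.1, at most 4 prefix characters, boost applied only when the Jaro score exceeds 0.7), useful for checking scores without the library; the function names are hypothetical and jellyfish's implementation may differ in edge cases. Note also that, as shown, the loop concatenates the integer `i` onto a string and calls `.read()` on a filename string, so those lines would need `str(i)` and an `open(...)` call to run.

```python
def jaro(s1, s2):
    """Jaro similarity: 1.0 for identical strings, 0.0 when nothing matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they sit within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4, boost_threshold=0.7):
    """Jaro similarity plus a bonus for a common prefix of up to max_prefix chars."""
    score = jaro(s1, s2)
    if score <= boost_threshold:
        return score
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)
```

Negating the score, as the hunk does with `ts = -jellyfish.jaro_winkler(...)`, is a common way to make Python's min-heap `heapq` return the most similar pair first; the diff does not show how `ts` is used afterward.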
Python/topvariables.py
...
@@ -4,7 +4,7 @@ import heapq
import jellyfish
import math
CAPACITY = 200
CAPACITY = 500
VARIABLE_REGEX = ("[\s]*([\w, ]+)[\s]*=")
...
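VARIABLE_REGEX captures the text to the left of an `=`, which is how the script picks out assigned variable names. It also fires on `==` comparisons, whereas the `=[^=]` variant that appears in find_top_python_variables.py does not. The sketch below runs both patterns on a made-up three-line sample, rewritten as raw strings because recent Python versions warn about `\s` and `\w` escapes in an ordinary string literal; `assigned_names`, `ASSIGNMENT_REGEX`, and `SAMPLE` are hypothetical names.

```python
import re

# Pattern from the topvariables.py hunk, as a raw string.
VARIABLE_REGEX = r"[\s]*([\w, ]+)[\s]*="
# Variant from find_top_python_variables.py: the trailing [^=] rejects "==".
ASSIGNMENT_REGEX = r"[\s]*([\w, ]+)[\s]*=[^=]"


def assigned_names(pattern, text):
    """Return the left-hand sides the pattern captures, with whitespace stripped."""
    return [name.strip() for name in re.findall(pattern, text)]


SAMPLE = "count = 0\nif count == 0:\n    a, b = 1, 2\n"
print(assigned_names(VARIABLE_REGEX, SAMPLE))    # ['count', 'if count', 'a, b']
print(assigned_names(ASSIGNMENT_REGEX, SAMPLE))  # ['count', 'a, b']
```

Tuple assignments such as `a, b = 1, 2` come back as the single string `a, b`; splitting on commas would be needed to count `a` and `b` separately.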
README.txt
...
@@ -5,13 +5,13 @@ Graphics subdirectory: Contains all graphics from the report.
Non-code subdirectory: Contains the project proposal file and the presentation slides.
Python subdirectory: Contains the files used for our final Python analysis
-collectvariables.py: Pulls all of the unique variables from a code text file.
-pyfilesplit.py: Splits out the desired number of program files from the raw file.
-pyfuncsplit.py: Splits out the desired number of functions files from the raw file.
-topfunctions.py: Find the function names with the lowest mean edit distance from all other function names.
-toptext.py: Computes the least unique files overall.
-topvariables.py: Computes the least unique variable names.
-topvariables_intersection.py: Checks which variable files have the largest mean number of common variables with other files.
-collectvariables.py: Pulls all of the unique variables from a code text file.
-pyfilesplit.py: Splits out the desired number of program files from the raw file.
-pyfuncsplit.py: Splits out the desired number of functions files from the raw file.
-topfunctions.py: Find the function names with the lowest mean edit distance from all other function names.
-toptext.py: Computes the least unique files overall.
-topvariables.py: Computes the least unique variable names.
-topvariables_intersection.py: Checks which variable files have the largest mean number of common variables with other files.
histogram.py: Used to construct the histograms for the report.
...
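The README describes topfunctions.py as finding the function names with the lowest mean edit distance from all other function names. A minimal sketch of that ranking, using a textbook dynamic-programming Levenshtein distance in place of whatever library call the script actually makes (the diff does not show it); `levenshtein` and `lowest_mean_distance` are hypothetical names:

```python
def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def lowest_mean_distance(names, top=3):
    """Return (mean distance, name) pairs for the names closest to all others."""
    scored = []
    for i, name in enumerate(names):
        distances = [levenshtein(name, other) for j, other in enumerate(names) if j != i]
        mean = sum(distances) / len(distances) if distances else 0.0
        scored.append((mean, name))
    scored.sort()  # lowest mean first, ties broken alphabetically
    return scored[:top]
```

The pairwise loop computes every distance twice and is quadratic in the number of names, so on a large corpus it is worth computing each unordered pair once and reusing the result.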
find_top_python_variables.py
import re
d = {}
with open("python.txt") as p:
with open("plot_graph.py") as p:
    for line in p:
        var = re.findall("[\s]*([\w, ]+)[\s]*=")
        d[var] = d.get(var, 0) + 1
        var = re.findall("[\s]*([\w, ]+)[\s]*=[^=]", line)
        if var:
            d[var[0].strip()] = d.get(var[0].strip(), 0) + 1
l = []
for item in d:
    l.append((d[item], item))
...
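This script tallies how often each name appears on the left of an assignment. In the earlier pair of lines, re.findall is called without the string to search (a TypeError) and its list result is used as a dictionary key (lists are unhashable); the later pair searches each `line` and keys on the stripped first match. A self-contained sketch of that working approach with collections.Counter, assuming the goal is a frequency-sorted list like the `(count, name)` tuples built at the end; `count_assignments` and `top_variables` are hypothetical names:

```python
import re
from collections import Counter

# Assignment pattern from the diff, as a raw string; [^=] skips "==" comparisons.
ASSIGNMENT_REGEX = re.compile(r"[\s]*([\w, ]+)[\s]*=[^=]")


def count_assignments(lines):
    """Count the first assignment target found on each line."""
    counts = Counter()
    for line in lines:
        found = ASSIGNMENT_REGEX.findall(line)
        if found:
            counts[found[0].strip()] += 1
    return counts


def top_variables(lines, n=10):
    """Return up to n (count, name) pairs, most frequent first."""
    counts = count_assignments(lines)
    return sorted(((c, name) for name, c in counts.items()), reverse=True)[:n]
```

Because the helpers accept any iterable of lines, an open file object such as the `p` in the script can be passed directly. Sorting the tuples in reverse puts the most frequent names first and breaks ties in reverse alphabetical order.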
runsim.sh
...
@@ -9,12 +9,12 @@ COM="$2 --jobconf mapreduce.job.reduces=1 "
COUNTER=0
for i in `seq 0 $1`
do
    COM="$COM --file pyfunctions/pyfile$i.txt "
    COM="$COM --file pyvariables/varfile$i.txt "
done
COM="$COM index.txt"
echo $COM
python3 $COM > results.txt
python3 $COM > variable_scores.txt
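runsim.sh grows one command string: the job script ($2), a --jobconf setting, one --file flag per numbered input file from 0 to $1 (`seq` includes both ends), and finally index.txt, then runs it with python3 and redirects the output. The same construction in Python, as a sketch that builds an argument list instead of relying on word splitting of an unquoted $COM; `build_command`, `run_job`, and the `file_pattern` default are hypothetical, and the pattern can be switched between the pyfunctions and pyvariables paths that appear in the diff.

```python
import subprocess


def build_command(script, n, file_pattern="pyvariables/varfile{}.txt"):
    """Return the argument list that runsim.sh assembles in $COM."""
    args = ["python3", script, "--jobconf", "mapreduce.job.reduces=1"]
    # `seq 0 $1` is inclusive, hence n + 1 files.
    for i in range(n + 1):
        args += ["--file", file_pattern.format(i)]
    args.append("index.txt")
    return args


def run_job(script, n, output="variable_scores.txt"):
    """Run the command with stdout redirected to output, like `> variable_scores.txt`."""
    command = build_command(script, n)
    print(" ".join(command))  # the shell script echoes $COM before running it
    with open(output, "w") as out:
        # check=True raises CalledProcessError on failure; the shell script does not stop.
        subprocess.run(command, stdout=out, check=True)
```

Passing a list to `subprocess.run` keeps each flag and path as one argument even if a path ever contains spaces, which the unquoted shell variable would split apart.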
util.py
...
@@ -25,6 +25,19 @@ def funcsim(file1name,file2name):
    return (name1, name2, params1, params2, text1, text2)

def get_variable_score(d, text):
    '''
    Given a dictionary of variable scores and a text,
    extracts the variables and determines the average
    variable score
    Inputs:
        d (dictionary): the dictionary containing the variable scores
        scores
        text (str): the string of the text of the file to compare
    Returns:
        score (float)
    '''
    variables = re.findall(REG_V, text)
    num = len(variables)
    total = 0
...
@@ -33,6 +46,19 @@ def get_variable_score(d, text):
    return total / num

def get_function_score(d, text):
    '''
    Given a dictionary of function scores and a text,
    extracts the functions and determines the average
    function score
    Inputs:
        d (dictionary): the dictionary containing the function scores
        scores
        text (str): the string of the text of the file to compare
    Returns:
        score (float)
    '''
    functions = re.findall(REG_F, text)
    num = len(functions)
    total = 0
...
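Both new helpers follow the same pattern: extract names with a regular expression, look each one up in a score dictionary, and return the mean. The loop body is elided in the diff, and unless those elided lines guard against it, `return total / num` raises ZeroDivisionError for a text with no matches. A hedged sketch of the whole helper, assuming names missing from the dictionary score 0 and an empty match list scores 0.0; `average_score` and the pattern standing in for REG_V are hypothetical, since REG_V itself is not shown.

```python
import re

# Hypothetical stand-in for REG_V; util.py's actual pattern is not shown in the diff.
VARIABLE_PATTERN = r"[\s]*([\w, ]+)[\s]*=[^=]"


def average_score(d, text, pattern=VARIABLE_PATTERN):
    """Mean of d[name] over the names the pattern finds in text.

    Assumes names missing from d count as 0 and that a text with
    no matches scores 0.0 instead of dividing by zero.
    """
    names = [name.strip() for name in re.findall(pattern, text)]
    num = len(names)
    if num == 0:
        return 0.0
    total = 0
    for name in names:
        total += d.get(name, 0)
    return total / num
```

Passing a different pattern (in util.py, REG_F) would give the function-score variant with the same averaging logic.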