Michelle Awh / project_kitty
Commit d33638a0 authored Mar 12, 2021 by Michelle Awh
docstrings
parent e7dc01bb
Showing 1 changed file with 13 additions and 0 deletions
SearchResults.py
...
@@ -568,6 +568,9 @@ class Article:
    def related_enough(self):
        '''
        '''
        links = self.all_peripheral_links(self.__soup, self.url)
        articles, categories = self.extract_titles(links)
        actual_related_articles = {}
...
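The hunk does not show `all_peripheral_links` or `extract_titles`, only that the latter splits a list of links into `articles` and `categories`. As a rough, hypothetical sketch of what such a split could look like, assuming the links are full WikiHow URLs and that category pages use paths of the form `/Category:Name`, a stand-alone version might be:

```python
from urllib.parse import unquote, urlparse


def extract_titles(links):
    """Hypothetical sketch: split WikiHow URLs into article and category titles.

    Assumes category pages live at paths like "/Category:Cats" and article
    pages at paths like "/Feed-a-Cat". The real method is not shown in this diff.
    """
    articles, categories = [], []
    for link in links:
        # Take the URL path, decode %-escapes, and drop the leading slash.
        slug = unquote(urlparse(link).path).lstrip("/")
        if slug.startswith("Category:"):
            categories.append(slug[len("Category:"):].replace("-", " "))
        elif slug:
            # Skip empty paths such as the site root.
            articles.append(slug.replace("-", " "))
    return articles, categories
```

For example, `extract_titles(["https://www.wikihow.com/Feed-a-Cat", "https://www.wikihow.com/Category:Cats"])` returns `(["Feed a Cat"], ["Cats"])`.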
@@ -590,6 +593,10 @@ class Article:
    def pre_processing(self, wh_page, caps):
        '''
        Takes a single string representing all of the text on a WikiHow page
        and returns it as a list of individual words.
        '''
        tokenizer = nltk.RegexpTokenizer(r'\w+')
        list_of_words = tokenizer.tokenize(wh_page)
        list_of_words = [word for word in list_of_words if word not in stopwords.words('english')]
...
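`nltk.RegexpTokenizer(r'\w+')` with its default `gaps=False` returns every maximal run of word characters, so `re.findall(r'\w+', ...)` gives the same tokens. The sketch below reproduces the two visible steps without needing the NLTK stopword corpus to be downloaded. `STOPWORDS` is a tiny hypothetical stand-in for `stopwords.words('english')`, and the `caps` parameter is left out because its role is not visible in this hunk. Two details are worth noting. `stopwords.words('english')` returns a list, so building a set once avoids rebuilding and linearly scanning that list for every word. Also, the NLTK list is all lowercase, so capitalized stopwords such as "The" survive the filter unless the text is lowercased first.

```python
import re

# Hypothetical stand-in for nltk.corpus.stopwords.words('english');
# the real NLTK list is much longer and, like this one, all lowercase.
STOPWORDS = frozenset({"a", "an", "and", "the", "to", "of", "in", "is", "it", "your"})


def pre_processing(wh_page):
    """Split a page's text into word tokens and drop English stopwords."""
    # Same tokens as nltk.RegexpTokenizer(r'\w+').tokenize(wh_page).
    list_of_words = re.findall(r"\w+", wh_page)
    # Set membership test; the stopword collection is built once, not per word.
    return [word for word in list_of_words if word not in STOPWORDS]
```

For example, `pre_processing("How to feed a cat: give the cat food.")` returns `["How", "feed", "cat", "give", "cat", "food"]`, while `pre_processing("The cat")` keeps `"The"` because the match is case-sensitive.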
@@ -597,6 +604,9 @@ class Article:
    def n_gram(self, wh_page, caps, n):
        '''
        Takes a list of individual words and creates ngrams of size n
        '''
        n_gram_lst = []
        list_of_words = self.pre_processing(wh_page, caps)
        start_value = 0
...
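Although the docstring says `n_gram` takes a list of words, the method actually receives the raw page text and gets the word list from `pre_processing`. The visible setup (`n_gram_lst = []`, `start_value = 0`) suggests a sliding window over that list. Here is a minimal sketch of such a window, taking the word list directly. It assumes each n-gram is stored as a space-joined string; the real method's loop and output type are not shown in the diff.

```python
def n_gram(list_of_words, n):
    """Sliding-window n-grams over a word list (sketch; output format assumed)."""
    n_gram_lst = []
    start_value = 0
    # Slide a window of width n one word at a time until it runs off the end.
    while start_value + n <= len(list_of_words):
        n_gram_lst.append(" ".join(list_of_words[start_value:start_value + n]))
        start_value += 1
    return n_gram_lst
```

With `n=2`, `["feed", "cat", "food"]` yields `["feed cat", "cat food"]`. With `n=1`, which `find_common_words` uses, each word becomes its own entry.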
@@ -609,6 +619,9 @@ class Article:
    def find_common_words(self, string):
        '''
        Takes a list of ngrams and finds the most common entries.
        '''
        words_lst = Counter()
        string_lst = self.n_gram(string, False, 1)
        words_lst.update(w for w in string_lst)
...
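The hunk stops after the `Counter` is filled, so how the "most common entries" are picked is not shown. One natural way, sketched here as an assumption, is `Counter.most_common(k)`. The cutoff parameter `k` is hypothetical. `Counter.update` accepts any iterable, so passing `string_lst` directly is equivalent to the generator expression in the diff.

```python
from collections import Counter


def find_common_words(unigrams, k=3):
    """Count entries and return the k most frequent (k is a hypothetical cutoff)."""
    words_lst = Counter()
    words_lst.update(unigrams)  # same as update(w for w in unigrams)
    # most_common sorts by count; ties keep first-encountered order.
    return words_lst.most_common(k)
```

For example, `find_common_words(["cat", "food", "cat", "feed", "cat", "food"], 2)` returns `[("cat", 3), ("food", 2)]`.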