Michelle Awh / project_kitty
Commit 8d94f134 authored Mar 12, 2021 by Michelle Awh
all docstrings done!
parent 063ea2a9
Showing 1 changed file with 23 additions and 23 deletions
SearchResults.py
...
...
@@ -24,8 +24,6 @@ inappropriate_words_lst = inappropriate_words()
class SearchResults:
...
...
@@ -64,7 +62,6 @@ class SearchResults:
        Returns:
            a SearchResults object
        '''
        print(filters)
        self.__filters = filters
        self.__required_words = []
        self.__required_words += self.get_required_words(query)
...
...
@@ -173,18 +170,8 @@ class SearchResults:
    def passes_all_filters(self, article):
        """
        Given an article object, check to make sure that the
        article matches the conditions set by the filters
        Input: article object
        Returns: boolean representing if the conditions have been met
        """
        if 'Main-Page' in article.url:
            return False
        if 'child_safe' in self.__filters:
            if self.__filters['child_safe']:
                self.__forbidden_words += inappropriate_words_lst
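The `child_safe` branch above extends a forbidden-word list only when the filter is both present and truthy. A minimal sketch of that pattern, assuming a stand-in `inappropriate_words_lst` and a stripped-down class (the project's real word list and surrounding class are not shown in this diff):

```python
# Stand-in word list; the project's real inappropriate_words() source
# is not part of this diff.
inappropriate_words_lst = ["badword1", "badword2"]

class SearchResults:
    def __init__(self, filters):
        self.__filters = filters
        self.__forbidden_words = []

    def apply_child_safe(self):
        # Mirrors the diff: extend the forbidden list only when the
        # 'child_safe' filter exists and is truthy.
        if 'child_safe' in self.__filters:
            if self.__filters['child_safe']:
                self.__forbidden_words += inappropriate_words_lst
        return self.__forbidden_words
```

With `{'child_safe': True}` the stand-in words are appended; with `{'child_safe': False}` or no key at all, the list stays empty.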
...
...
@@ -449,7 +436,6 @@ class Article:
        Input: soup object
        Returns: String containing the text on the article
        """
        print(url)
        title = soup.find("h1", id="section_0").text
        description = soup.find("div", class_="mf-section-0").text
        steps = soup.find_all("div", class_="step")
...
...
@@ -468,9 +454,11 @@ class Article:
    def all_peripheral_links(self, soup, url):
        '''
        Given a soup and url, returns a list of all related links,
        all expanded related links (as in 1 page away),
        and all related expanded categories.
        '''
        related_links = self.get_related(self.__soup, url)
        expanding, full_expanse = self.expanding_breadcrumbs(related_links, soup)
        link_family = {**related_links, **expanding, **full_expanse}
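The `link_family` line above merges three dictionaries with `**` unpacking; when the same key appears more than once, the later dict wins. A small sketch with hypothetical link values (the real dicts come from the scraped pages, which are not shown here):

```python
# Hypothetical link dicts; real values come from the WikiHow scrape.
related_links = {'breadcrumb': '/Category:Cats', 'tips': '/Pet-a-Cat'}
expanding = {'tips': '/Play-With-a-Cat'}
full_expanse = {'expanse': '/Adopt-a-Cat'}

# Dict unpacking merges left to right; duplicate keys take the
# value from the later dict ('tips' comes from `expanding`).
link_family = {**related_links, **expanding, **full_expanse}
print(link_family)
# {'breadcrumb': '/Category:Cats', 'tips': '/Play-With-a-Cat', 'expanse': '/Adopt-a-Cat'}
```

Because `expanding` is unpacked after `related_links`, its `'tips'` entry silently replaces the earlier one; if both values needed to survive, the keys would have to differ.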
...
...
@@ -482,11 +470,9 @@ class Article:
    def get_related(self, soup, starting_url):
        '''
        Given a soup object and a starting url,
        returns a dictionary of related WikiHow articles on the WikiHow page
        with their html id or class as a key and the url as a value
        '''
        related = {}
        ids = ['breadcrumb', 'tips', 'relatedwikihows']
...
...
@@ -506,6 +492,11 @@ class Article:
    def expanding_breadcrumbs(self, related, soup):
        '''
        Given a dictionary of related WikiHow links and a soup object,
        finds all category links and returns all WikiHow articles in that
        category as values in a dictionary, where the keys are category links.
        '''
        expanded = {}
        full_expanse = {}
        for link_group in related:
...
...
@@ -529,6 +520,10 @@ class Article:
    def linked_urls(self, a_lst, starting_url):
        '''
        Given a list of 'a' tags from a soup object and starting url,
        returns all WikiHow links
        '''
        links = []
        for link in a_lst:
            if link.has_attr("href"):
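The `has_attr("href")` check above guards against anchor tags with no `href` before collecting WikiHow links. A stdlib-only sketch of the same idea using `html.parser` instead of BeautifulSoup (the diff uses soup tags; the HTML snippet and the `wikihow.com` substring test here are assumptions for illustration):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors with no href."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            # Equivalent of the diff's link.has_attr("href") guard,
            # plus a hypothetical WikiHow-only filter.
            if "href" in attrs and attrs["href"] and "wikihow.com" in attrs["href"]:
                self.links.append(attrs["href"])

html = """
<a href="https://www.wikihow.com/Adopt-a-Cat">Adopt a Cat</a>
<a name="no-href-anchor">skip me</a>
<a href="https://example.com/elsewhere">skip me too</a>
"""
collector = LinkCollector()
collector.feed(html)
print(collector.links)  # only the wikihow.com link survives
```

The guard matters because a bare `<a name="...">` anchor has no `href` at all; indexing it directly would raise rather than simply being skipped.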
...
...
@@ -545,6 +540,9 @@ class Article:
    def extract_titles(self, lst):
        '''
        Create mappings between the titles of articles and their urls
        '''
        titles = {}
        categories = {}
        for url in lst:
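The body of `extract_titles` is elided in this diff, so the following is only a hypothetical sketch of a title/url mapping like the docstring describes: WikiHow slugs ("Adopt-a-Cat") turned back into titles, with category URLs split into their own dict. The URL shapes and the `Category:` convention are assumptions, not the project's code:

```python
# Hypothetical reimplementation of a titles/categories split; the
# real extract_titles body is not shown in the diff.
def extract_titles(lst):
    titles = {}
    categories = {}
    for url in lst:
        slug = url.rstrip("/").rsplit("/", 1)[-1]
        if slug.startswith("Category:"):
            # 'Category:Cats' -> key 'Cats'
            categories[slug.split(":", 1)[1].replace("-", " ")] = url
        else:
            # 'Adopt-a-Cat' -> key 'Adopt a Cat'
            titles[slug.replace("-", " ")] = url
    return titles, categories

titles, categories = extract_titles([
    "https://www.wikihow.com/Adopt-a-Cat",
    "https://www.wikihow.com/Category:Cats",
])
print(titles)      # {'Adopt a Cat': 'https://www.wikihow.com/Adopt-a-Cat'}
print(categories)  # {'Cats': 'https://www.wikihow.com/Category:Cats'}
```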
...
...
@@ -569,7 +567,9 @@ class Article:
    def related_enough(self):
        '''
        Determines whether each related article is relevant enough to this article
        to display by comparing the articles' titles against the tags from this
        article.
        '''
        links = self.all_peripheral_links(self.__soup, self.url)
        articles, categories = self.extract_titles(links)
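The docstring above describes a relevance test that compares related articles' titles against this article's tags. The rest of the method is elided, so here is a hedged sketch of one way such a comparison could work; the overlap threshold and the tag list are assumptions, not the project's logic:

```python
# Hypothetical title-vs-tags relevance check; the diff does not show
# how related_enough actually scores overlap.
def related_enough(title, tags, min_overlap=1):
    title_words = {w.lower() for w in title.split()}
    tag_words = {t.lower() for t in tags}
    # Count shared words between the title and this article's tags.
    return len(title_words & tag_words) >= min_overlap

tags = ["cat", "kitten", "pet"]
print(related_enough("How to Adopt a Cat", tags))   # True: shares 'cat'
print(related_enough("How to Fix a Faucet", tags))  # False: no overlap
```

Lowercasing both sides keeps the comparison case-insensitive, so "Cat" in a title still matches the tag "cat".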
...
...