Tl;dr: I built a web app that summarizes search articles

Because your time is important.

Mike Salem
6 min read · Oct 4, 2022

I enjoy reading, but I’m pretty crunched on time. Whenever I get a chance, I try to pull up articles on robotics, tech, or other general interests. Usually, I rely on the news feed on my phone for articles, but I find those articles start to deviate from my interests over time. Sometimes, when I find an interesting article, I follow the link only to find out it’s really long and I don’t have time to read it right now. I try to bookmark it and come back later, but by then the moment has passed and I’m on to the next piece of information fighting for my time. This got me thinking…

What if I was able to read just summaries of articles instead of reading the entire article? Almost like a…tl;dr?

Well, that’s exactly what I did. I built an app that allows the user to enter a few keywords (it even supports search-compatible syntax) and get back a few summarized articles from the day. This is a project I built in my spare time for enjoyment and to learn more about publishing public apps to Streamlit (I’m in it for the stickers). I wanted to play with some tech and make something I found useful, even if the usefulness is a bit limited (more on that later).

* This is not intended to be a commercial product, solution, or tool. Any likeness to any commercial or personal product is purely coincidental.

The Tools:

For this project, I decided to build everything in Python. The main tools are listed below. There are a few others like pandas and numpy but I am pretty sure most people are already familiar with those.

Streamlit — This has been my go-to for creating apps for the last three years now. I’ve used other frameworks, but as a data scientist / back-end engineer I prefer not to do a lot of heavy front-end development. Streamlit makes things easy to build in a templated fashion with very few lines of code.

GoogleNews and Newspaper — Admittedly, I could have used something like Beautiful Soup and Python’s requests library to build a crawler and grab news articles, but I’ve done this in the past and it’s quite a bit of work and not something I was very excited to do again. Instead, the GoogleNews package has some great functionality like getting information from Google News by topic and also performing web searches. Newspaper can take a URL, strip out the HTML and metadata, and put the result in a very nice, user-friendly format. It’s free to use, but please be mindful when pinging websites and use a sleep function to avoid overwhelming the sites with requests.

Spacy — Spacy is a great package for NLP. I’ve used it in the past for some projects at work and have been looking to see if I can break my dependency on NLTK. Overall, I really like what Spacy has to offer as I get more and more familiar with it.

NetworkX — NetworkX is a graph library that allows users to build and analyze network graphs. Just like Streamlit, NetworkX brings me to my happy place. I love how functional and powerful this package is and recommend you check it out if you have any interest in doing any network graphs of any kind.

Mpire — If you have to do any multiprocessing, this package makes things easy and it’s very fast.
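To make the advice about sleeping between requests concrete, here is a minimal sketch of a polite fetch loop. It assumes the newspaper package's Article class with its download() and parse() methods and its title, text, top_image and publish_date attributes. The helper names fetch_with_newspaper and fetch_politely and the one-second default delay are hypothetical and are not taken from the app's source.

```python
import time


def fetch_with_newspaper(url):
    """Download and parse one article using the third-party newspaper package."""
    # Imported lazily so the rest of the sketch works without the package installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return {
        "url": url,
        "title": article.title,
        "text": article.text,
        "top_image": article.top_image,
        "publish_date": article.publish_date,
    }


def fetch_politely(urls, fetch=fetch_with_newspaper, delay_seconds=1.0, sleep=time.sleep):
    """Fetch URLs one at a time, pausing between requests (hypothetical helper)."""
    articles = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        try:
            articles.append(fetch(url))
        except Exception:
            # Skip articles that fail to download or parse.
            continue
    return articles
```

Passing fetch and sleep as arguments is only a convenience for trying the loop without touching the network; something like fetch_politely(links) would use the Newspaper-based fetcher by default.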

How it Works:

When the user arrives at the app, there is a very basic UI that looks like it almost could have come out of the late 90s. The site is no frills and I built it that way by design. Streamlit can do some really great color modifications, backgrounds, etc. but I purposely kept this one simple.

The basic webpage

On the left sidebar there is a place for the user to enter some text. This is where the search criteria are entered. Since the search is powered by the GoogleNews Python package, the user can add Google search syntax to include or exclude words, etc. After entering the search criteria, pressing Ctrl + Enter will start the search.

GoogleNews can return very large lists of results, and summarizing every article could take an incredibly long time depending on the searched topic. To keep things small, only the first 10 results are fetched for their article contents. Newspaper is used to grab each article’s body, image, links, date published and more, while Mpire collects the article information in parallel to speed up the processing.
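The cap-then-parallelize step can be sketched roughly as follows. This is not the app's code: it uses the standard library's ThreadPoolExecutor as a stand-in for Mpire so that it runs without extra packages, and the names fetch_in_parallel and MAX_RESULTS, along with the worker count of 4, are hypothetical. Mpire works with processes rather than threads, so the performance characteristics will differ.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_RESULTS = 10  # only the first 10 search results are processed


def fetch_in_parallel(urls, fetch, max_results=MAX_RESULTS, workers=4):
    """Fetch at most `max_results` URLs concurrently, preserving input order."""
    limited = list(urls)[:max_results]
    if not limited:
        return []
    # Stand-in for Mpire: its WorkerPool also exposes a map() method,
    # but it distributes the work across processes instead of threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, limited))


if __name__ == "__main__":
    # Placeholder URLs and a fake fetcher, so nothing touches the network.
    fake_urls = [f"https://example.com/story-{i}" for i in range(25)]
    results = fetch_in_parallel(fake_urls, fetch=lambda url: {"url": url})
    print(len(results))  # 10, because of the MAX_RESULTS cap
```

Because map() returns results in input order, the feed can still be shown in the order the search returned it.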

Once the article body is retrieved, the text is processed for summarization. There are two different approaches to summarizing text: abstractive and extractive. Abstractive summarization takes the text that is written, tries to understand it, and paraphrases it. Extractive summarization looks at the text in the article and tries to reduce the number of sentences and “shuffle” them around to make a smaller summary. For the summary used in this app, I used an extractive approach that converts the text into a network graph and uses the PageRank algorithm to reconstruct the text into a summary article. The approach can be found here, but I used Spacy instead of NLTK.
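For readers who want to see the general shape of graph-based extractive summarization, here is a small, dependency-free sketch. It is not the app's implementation. A regex sentence splitter stands in for Spacy, a hand-written power iteration stands in for NetworkX's PageRank, and sentence similarity is assumed to be a plain bag-of-words cosine with no stop-word removal. All function names, the damping factor of 0.85 and the iteration count are illustrative.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter (stand-in for Spacy's sentence segmentation)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def sentence_vector(sentence):
    """Bag-of-words counts for one sentence, lowercased."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a square weight matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_totals = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            incoming = 0.0
            for i in range(n):
                if out_totals[i] > 0:
                    incoming += scores[i] * weights[i][j] / out_totals[i]
                else:
                    # A sentence with no similar neighbours spreads its score evenly.
                    incoming += scores[i] / n
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(text, max_sentences=3):
    """Keep the highest-ranked sentences, returned in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    vectors = [sentence_vector(s) for s in sentences]
    n = len(sentences)
    # Graph edges: similarity between every pair of distinct sentences.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Sentences that share vocabulary with many other sentences receive the highest scores, so an off-topic sentence tends to drop out of the summary. Swapping in a proper tokenizer and a graph library should mostly change the helper functions and leave the overall flow the same.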

Once the summarization is finished, we start to post the results in the web app. I wanted the web app to feel similar to the feed on my phone, so I emulated a lot of that look and feel. First, the article headline is written, hyperlinked to the URL for the article. Under the article headline, I added the domain for the article and the time the article was posted, if available. If the article contained an image, I tried to capture that and post it on the feed because I felt that gave it a more “polished” look and feel. Finally, underneath the photo, the summary is written, followed by a horizontal line to separate multiple stories.
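One way to picture that layout is as a small function that turns an article into a Markdown "card". This is only a sketch under assumptions: the function name format_card, its parameters, and the choice to build a single Markdown string are hypothetical, and the real app may call separate Streamlit elements instead. In a Streamlit script, the returned string could be passed to st.markdown.

```python
from urllib.parse import urlparse


def format_card(title, url, summary, published=None, image_url=None):
    """Build a Markdown card: linked headline, domain and time, image, summary, rule."""
    domain = urlparse(url).netloc
    # Show the publish time only when it is available.
    byline = domain if not published else f"{domain} | {published}"
    lines = [f"### [{title}]({url})", byline]
    if image_url:
        lines.append(f"![]({image_url})")
    lines.append(summary)
    lines.append("---")  # horizontal line separating stories
    return "\n\n".join(lines)
```

Keeping the formatting in a plain function makes it easy to check the output without launching the app.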

After creating the summaries, I wanted to try putting some analytics in the sidebar. This is not typical UI design, but I thought it would be fun to see if Streamlit had that functionality and to be able to hide the stats away if I wanted to look at just my feed. To my surprise, it was just as easy as putting them on the main page, and it was interesting to see just how much shorter the summaries were than the original articles. I found that the summaries were typically less than 50% of the original document length and were cohesive. Some summaries were less than 10% of the original size, which I found impressive.
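The length comparison itself is simple arithmetic. The sketch below assumes "length" means word count, which the text above does not specify, and the function name length_ratio is hypothetical.

```python
def length_ratio(original, summary):
    """Return the summary length as a percentage of the original, by word count."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    if original_words == 0:
        # Avoid dividing by zero for empty articles.
        return 0.0
    return 100.0 * summary_words / original_words
```

For example, a 40-word summary of an 800-word article gives 100 × 40 / 800 = 5%, which would fall in the "less than 10%" group mentioned above.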

Using the App:

Searching for the topic “Techcrunch”

Since this is a project I wanted to host on Streamlit, the app is available here. Access to the source code is available under the hamburger stack in the upper right corner. In the event you cannot access the code, please leave a comment and I will be more than happy to share the code with you. The entire app is less than 225 lines of code.

Final Words:

This article presented some tools for building a web app that summarizes searched articles. While the app only displays 10 search results, this could easily be expanded by not limiting the number of URLs within the code. Additionally, other functionality could be added, such as searching specific regions, languages and more. Publishing the app through Streamlit was dead simple and free, provided you make the app publicly available. I definitely plan to share more projects through their platform.

I have another project idea on using network graphs to display the connections between articles based on similarity so if that is interesting to you, please leave a comment and let me know. I hope you enjoy the app and it inspires you to build your own things!

Mike Salem

Assoc. Director of Data Science @gilead | Robotics Instructor @brandeisuniversity | Mathematician interested in multi-agent systems and self-organizing behavior