Web Scraping Reddit with Scrapy

1. Install Scrapy

On Windows, you first need to install Microsoft Visual C++ 14.0 from https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16, then run pip install scrapy.
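Before moving on, it can help to confirm that the install worked. The snippet below is a small sketch, not part of the original walkthrough, that checks whether a top-level package can be found in the current Python environment. The helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "scrapy" is the package installed in this step.
    print("scrapy installed:", is_installed("scrapy"))
```

If this prints False, pip probably installed into a different Python environment than the one running the script.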

2. Create a Scrapy project

C:\Users\zhuby\hans>scrapy startproject reddit
New Scrapy project 'reddit', using template directory 'c:\python\lib\site-packages\scrapy\templates\project', created in:
C:\Users\zhuby\hans\reddit

You can start your first spider with:
cd reddit
scrapy genspider example example.com

3. Write the spider

C:\Users\zhuby\hans\reddit\reddit\spiders>code redditspider.py

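import scrapy


class RedditSpider(scrapy.Spider):
    name = "reddit"
    start_urls = ["https://www.reddit.com/r/cats"]

    def parse(self, response):
        # Collect the src attribute of every <img> element on the page.
        links = response.xpath("//img/@src")
        html = ""

        for link in links:
            url = link.get()
            # Keep only URLs that look like image files.
            if any(extension in url for extension in [".jpg", ".gif", ".png"]):
                html += """<a href="{url}"
                target="_blank">
                <img src="{url}" height="33%" width="33%">
                </a>""".format(url=url)

        # Write the page once, after the loop. Writing the accumulated string
        # inside the loop in append mode would duplicate earlier images.
        # The with block closes the file, so no explicit close() is needed.
        with open("frontpage.html", "w") as page:
            page.write(html)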
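The extension check in parse matches ".jpg" anywhere in the URL, including the host name or query string. As a rough sketch of a stricter variant, the helpers below (hypothetical names is_image_url and build_gallery, not part of Scrapy) test only the path component of each URL and build the same kind of linked image markup. They use only the standard library, so they can be tried without running a crawl.

```python
from html import escape
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".gif", ".png")


def is_image_url(url):
    """Return True if the URL path ends with one of the image extensions."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def build_gallery(urls):
    """Build one linked <img> tag per image URL, skipping everything else."""
    parts = []
    for url in urls:
        if is_image_url(url):
            # Escape &, <, > and quotes so the URL is safe inside an attribute.
            safe = escape(url, quote=True)
            parts.append(
                '<a href="{u}" target="_blank">'
                '<img src="{u}" height="33%" width="33%"></a>'.format(u=safe)
            )
    return "\n".join(parts)


if __name__ == "__main__":
    sample = [
        "https://example.com/cat.jpg?width=640",  # image; query string ignored
        "https://example.com/page.html",          # not an image
    ]
    print(build_gallery(sample))
```

Inside the spider, one way to use this would be to pass response.xpath("//img/@src").getall() to build_gallery and write the returned string to frontpage.html.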

4. Test redditspider.py

C:\Users\zhuby\hans\reddit>scrapy crawl reddit
2020-06-08 16:14:25 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: reddit)
2020-06-08 16:14:25 [scrapy.utils.log] INFO: Versions: lxml 4.5.1.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform Windows-10-10.0.19041-SP0
2020-06-08 16:14:25 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-06-08 16:14:25 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'reddit',
………………..
When the crawl finishes, you will get file:///C:/Users/zhuby/hans/reddit/frontpage.html, which you can open in a browser.
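The file:/// address above is just the absolute path of frontpage.html written as a URI. Here is a minimal sketch, assuming the file sits in the directory the script is run from, that prints the equivalent address on any machine. The helper name file_uri is made up for illustration.

```python
from pathlib import Path


def file_uri(filename):
    """Return the file:// URI of a file name relative to the current directory."""
    # Path.cwd() is always absolute, which as_uri() requires.
    return (Path.cwd() / filename).as_uri()


if __name__ == "__main__":
    print(file_uri("frontpage.html"))
```

Passing the result to webbrowser.open from the standard library should open the page in the default browser.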

