someone’s never used a good api. like mastodon
I wanted to build a Discord bot that would check NIST for new CVEs every 24 hours. But their API leaves quiiiiiiite a bit to be desired.
Their pages, however…
Just use this https://github.com/CVEProject/cvelistV5/tree/main/cves
Oh yeah, that’s much more robust
It’s all fun and games until you have to support all this shit and it breaks weekly!
That being said, I do miss the simplicity of maintaining selenium projects for work
I scrape with bash lord help me.
you scrape WITH BASH?
Awk all the things!
pipe sed pipe grep pipe tr pipe grep… I would say I am a bit of a plumber
Let’s see what WEI (if implemented ) will do with the scrapers. The future doesn’t look promising.
What’s that?
A google/chrome proposal for browser verification, i.e. killing addons and custom browsers.
Nice name, beat me to it
Removed by mod
If you wanted a chad scraper, look at Pushshift. Reveddit relied on it before Reddit got it taken down.
Let me introduce you to WooB (formerly WEBooB).
Why on earth would they have changed that. WEBooB is a way better name.
But it’s got boob in it.
ROFL, Chad only thinks that shit works
I really hope Libreddit switches to scraping, the “Error: Too many request” thing is so annoying, I have to click the redirect button in Libredirect like 20 times until I can actually see a post.
Still a better experience than Reddits official site tho.
My undergrad project was a scraper - there just wasn’t a name for it yet,
Scrapers have been a thing since the web exists.
One of the first search engines is even called WebCrawler
I use scrapy. It has a steeper learning curve than other libraries, but it’s totally worth it.
Splash ftw
That’s why I use geddit
Ok then make a spotify scraper
Sorry, I’m ignorant in this matter. Why exactly would you want to scrape websites aside from collecting data for ML? What kind of irreplaceable API are you using? Someone please educate me here.
API might cost a lot of money for the amount of requests you want to send. API may not include some fields in the data you want. API is rate limited, scraping might not be. API requires agreement to usage terms, scraping does not (though the recent LinkedIn scraping case might weaken that argument.)
My understanding is that the result of the LinkedIn case is that you can scrape data that you have permission to view but not to access data that you were not intended to. The end result that ClickWrap agreements are unenforceable.
So uh…as someone who’s currently trying to scrape the web for email addresses to add to my potential client list … where do I start researching this?
Start looking into selenium, probably in Python. It’s one of the easier to understand forms of scraping. It’s mainly used to web testing, though you can definitely use it for less… nice purposes.