Chad scraper

dramaticcat@sh.itjust.works · 1 year ago

Chad scraper

darcy@sh.itjust.works · 1 year ago

Someone’s never used a good API, like Mastodon’s.

sebinspace@lemmy.world · 1 year ago

I wanted to build a Discord bot that would check NIST for new CVEs every 24 hours. But their API leaves quiiiiiiite a bit to be desired.

Their pages, however…

khaffner@lemmy.world · 1 year ago

Just use this https://github.com/CVEProject/cvelistV5/tree/main/cves
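For the "check every 24 hours" part, a rough sketch of filtering already-downloaded records from that repo by publish date might look like the block below. It assumes each file has been loaded into a dict following the CVE JSON 5 record layout (`cveMetadata.cveId`, `cveMetadata.datePublished`, `containers.cna.descriptions`). The function names and the sample records (including the placeholder CVE IDs) are made up for illustration, and fetching the files or posting to Discord is left out.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; treat values without an offset as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def recent_cves(records, now, window=timedelta(hours=24)):
    """Return (cve_id, description) pairs published within `window` before `now`."""
    recent = []
    for record in records:
        meta = record.get("cveMetadata", {})
        published = meta.get("datePublished")
        if not published:
            continue  # skip records without a publish date (e.g. rejected ones)
        age = now - parse_timestamp(published)
        if timedelta(0) <= age <= window:
            descriptions = (
                record.get("containers", {}).get("cna", {}).get("descriptions", [])
            )
            # Prefer the first English description, fall back to an empty string.
            text = next(
                (d.get("value", "") for d in descriptions
                 if d.get("lang", "").startswith("en")),
                "",
            )
            recent.append((meta.get("cveId"), text))
    return recent


# Hypothetical sample records; the IDs and descriptions are placeholders.
SAMPLE_RECORDS = [
    {
        "cveMetadata": {"cveId": "CVE-0000-0001",
                        "datePublished": "2023-08-02T01:00:00Z"},
        "containers": {"cna": {"descriptions": [
            {"lang": "en", "value": "Example flaw"}]}},
    },
    {
        "cveMetadata": {"cveId": "CVE-0000-0002",
                        "datePublished": "2023-07-01T01:00:00Z"},
        "containers": {"cna": {"descriptions": [
            {"lang": "en", "value": "Older example flaw"}]}},
    },
]
```

A bot would call `recent_cves(records, datetime.now(timezone.utc))` once a day and post whatever comes back.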

sebinspace@lemmy.world · 1 year ago

Oh yeah, that’s much more robust

lemmywizard@lemm.ee · 1 year ago

It’s all fun and games until you have to support all this shit and it breaks weekly!

That being said, I do miss the simplicity of maintaining selenium projects for work

lnee@lemm.ee · 1 year ago

I scrape with bash lord help me.

SubArcticTundra@lemmy.ml · 1 year ago

you scrape WITH BASH?

Dkarma@lemmy.world · 1 year ago

Awk all the things!

lnee@lemm.ee · 1 year ago

pipe sed pipe grep pipe tr pipe grep… I would say I am a bit of a plumber

Chemical Wonka@discuss.tchncs.de · 1 year ago

Let’s see what WEI (if implemented) will do to scrapers. The future doesn’t look promising.

kadotux@sopuli.xyz · 1 year ago

What’s that?

Username@feddit.de · 1 year ago

Web Environment Integrity: a Google/Chrome proposal for browser verification, i.e. killing add-ons and custom browsers.

Uresname@lemmy.dbzer0.com · 1 year ago

Nice name, beat me to it

PoolloverNathan@programming.dev · 1 year ago

If you wanted a chad scraper, look at Pushshift. Reveddit relied on it before Reddit got it taken down.

Irkam@jlai.lu · 1 year ago

Let me introduce you to WooB (formerly WEBooB).

Rodeo@lemmy.ca · 1 year ago

Why on earth would they have changed that. WEBooB is a way better name.

planish@sh.itjust.works · 1 year ago

But it’s got boob in it.

Crashumbc@lemmy.world · 1 year ago

ROFL, Chad only thinks that shit works

Fisch@lemmy.ml · 1 year ago

I really hope Libreddit switches to scraping. The “Error: Too many request” thing is so annoying: I have to click the redirect button in Libredirect like 20 times before I can actually see a post.

Still a better experience than Reddit’s official site tho.

NigelFrobisher@aussie.zone · 1 year ago

My undergrad project was a scraper - there just wasn’t a name for it yet.

newIdentity@sh.itjust.works · edit-2 1 year ago

Scrapers have been a thing for as long as the web has existed.

One of the first search engines is even called WebCrawler.

McBain@feddit.ch · 1 year ago

I use Scrapy. It has a steeper learning curve than other libraries, but it’s totally worth it.

rishado@lemmy.world · 1 year ago

Splash ftw

lnee@lemm.ee · 1 year ago

That’s why I use geddit

the_lone_wolf@lemmy.ml · 1 year ago

Ok then make a spotify scraper

UraniumBlazer@lemm.ee · 1 year ago

Sorry, I’m ignorant in this matter. Why exactly would you want to scrape websites aside from collecting data for ML? What kind of irreplaceable API are you using? Someone please educate me here.

coltorl@programming.dev · 1 year ago

The API might cost a lot of money for the number of requests you want to send. The API may not include some fields in the data you want. The API is rate limited, scraping might not be. The API requires agreement to usage terms, scraping does not (though the recent LinkedIn scraping case might weaken that argument).
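To make the "fields the API doesn't give you" point concrete, here is a minimal sketch of pulling one field straight out of page markup with Python's standard-library `html.parser`. The HTML snippet, the class names (`title`, `score`), and the helper names are all invented for illustration; a real page would need its own selectors, and a dedicated scraping library would handle messy markup better.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FieldScraper(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0  # greater than 0 while inside a matching element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested tag inside a matching element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self.values.append("")

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.values[-1] += data


def scrape_field(html, target_class):
    """Return the stripped text of each element carrying target_class."""
    parser = FieldScraper(target_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Hypothetical page fragment standing in for a fetched listing page.
SAMPLE_PAGE = """
<ul>
  <li class="post"><span class="title">First post</span><span class="score">12</span></li>
  <li class="post"><span class="title">Second <b>post</b></span><span class="score">7</span></li>
</ul>
"""
```

Calling `scrape_field(SAMPLE_PAGE, "score")` collects the text of both score spans, regardless of whether any API exposes that field.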

olympicyes@lemmy.world · 1 year ago

My understanding is that the result of the LinkedIn case is that you can scrape data that you have permission to view, but you can’t access data that you were not intended to see. The end result is that clickwrap agreements are unenforceable.

redw04@lemmy.ca · 1 year ago

So uh…as someone who’s currently trying to scrape the web for email addresses to add to my potential client list … where do I start researching this?

lutillian@sh.itjust.works · 1 year ago

Start looking into Selenium, probably in Python. It’s one of the easier-to-understand forms of scraping. It’s mainly used for web testing, though you can definitely use it for less… nice purposes.